Study participants
The China Health and Nutrition Survey (CHNS) is an ongoing, household-based cohort study. It began in 1989, with survey waves completed every 2–4 years through 2015. Before 2011, the sample included nine provinces (Heilongjiang, Liaoning, Jiangsu, Shandong, Henan, Hubei, Hunan, Guangxi, and Guizhou). In 2011, three megacities (Beijing, Shanghai, and Chongqing) were added, and in 2015, three additional provinces (Shaanxi, Yunnan, and Zhejiang) were added. A stratified, multistage, clustered sampling design was used to select households and communities within each province or megacity. The CHNS captures a range of geographic areas, levels of economic development, and health indicators [12]. Data were collected over a 7-day survey period and included biomarkers, three consecutive 24-hour dietary recalls, eating behaviors, financial resources, and other relevant measures, making the CHNS a robust resource for studying diet urbanization and cardiometabolic disease (CMD). The study met the standards for the ethical treatment of participants and was approved by the Institutional Review Boards of the University of North Carolina at Chapel Hill and the National Institute for Nutrition and Health, Chinese Center for Disease Control and Prevention. Participants gave informed consent [12].
To develop the urbanized diet index, we used cross-sectional data from the 2015 CHNS wave, covering all 12 provinces and three megacities, for all adults aged 18 years or older with dietary data collected in 2015 (n = 17,191). All inclusion and exclusion criteria are shown in Fig. 1. We excluded participants who were pregnant, who had implausible (< 500 kcal/day) or missing average daily energy intake, or who had missing covariate or health outcome data.
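The exclusion rules above can be expressed as a single eligibility check applied to each adult. This is a minimal sketch: the `Participant` record and its field names are hypothetical stand-ins rather than CHNS variable names, and the toy records are invented for illustration. An intake of exactly 500 kcal/day is treated as plausible because the text excludes only values below 500.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    pregnant: bool
    energy_kcal: Optional[float]  # average daily energy intake over the recall days
    covariates_complete: bool
    outcomes_complete: bool


MIN_PLAUSIBLE_KCAL = 500  # threshold stated in the text


def is_eligible(p: Participant) -> bool:
    """Apply the stated inclusion/exclusion criteria to one participant."""
    if p.age < 18 or p.pregnant:
        return False
    if p.energy_kcal is None or p.energy_kcal < MIN_PLAUSIBLE_KCAL:
        return False
    return p.covariates_complete and p.outcomes_complete


sample = [
    Participant(45, False, 2100.0, True, True),   # eligible
    Participant(30, True, 1900.0, True, True),    # pregnant
    Participant(52, False, 420.0, True, True),    # implausible intake
    Participant(17, False, 1800.0, True, True),   # under 18
    Participant(60, False, None, True, True),     # missing intake
    Participant(38, False, 2500.0, False, True),  # missing covariate
]
analytic = [p for p in sample if is_eligible(p)]
print(len(analytic))  # 1
```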
Diet data
Dietary intake was measured with 24-hour dietary recalls on three consecutive days. Household food consumption was determined from changes in household food inventory between the beginning and end of each day, combined with weighing and measuring. The three days of inventory and recall interviews were randomly allocated across the week (Monday through Sunday), and interviews were conducted by trained investigators. For each participant, interviewers recorded the type and amount of food consumed and the time and location of consumption, including food consumed away from home. Side dishes, condiments, and spices were recorded and weighed at the beginning and end of the inventory period and allocated to household members based on their individual 24-hour recall data. Food consumption data were converted to nutrient intakes using the Chinese Food Composition Table. Other variables were also derived from the 24-hour recall data, including the percentage of daily energy (kcal) from selected foods, food groups, and macronutrients (fruit, nuts and seeds, all snack foods, sweet snack foods, eggs, dairy, fried foods, fast food, instant noodles, high-fat meat, carbohydrates, fat, animal products, and processed foods), and daily averages of the number of snacks eaten, number of food groups eaten, sodium intake, and fiber intake. We classified the following as “snacks”: salty soda crackers or mooncakes; sweetened cookies, biscuits, cakes, pastries, and mooncakes; nuts and seeds (peanuts, sunflower seeds, pumpkin seeds, watermelon seeds, and other seeds); chocolate; and potato chips, French fries, and other fried snacks. Sweet snack foods included sweetened cookies, biscuits, cakes, pastries, and mooncakes. Processed foods included packaged, frozen, boxed, or bagged foods, as well as oils and condiments added during cooking [13].
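Two of the derived measures, percentage of energy from a food group and the daily average number of food groups eaten, can be sketched from recall records as follows. This assumes a hypothetical record format of `(day, food_group, kcal)` with invented values, and it assumes energy is pooled across the three days before computing percentages; averaging per-day percentages is another plausible reading that the text does not rule out.

```python
from collections import defaultdict

# Hypothetical 3-day recall records for one participant: (day, food_group, kcal).
# Group labels and energy values are invented for illustration.
recalls = [
    (1, "fruit", 120.0), (1, "fried foods", 300.0), (1, "cereals", 1580.0),
    (2, "fruit", 80.0), (2, "cereals", 1920.0),
    (3, "fried foods", 200.0), (3, "cereals", 1800.0),
]


def percent_energy_by_group(records):
    """Share of total recalled energy (all days pooled) from each food group, in %."""
    kcal_by_group = defaultdict(float)
    for _day, group, kcal in records:
        kcal_by_group[group] += kcal
    total = sum(kcal_by_group.values())
    return {group: 100.0 * kcal / total for group, kcal in kcal_by_group.items()}


def mean_food_groups_per_day(records):
    """Average number of distinct food groups eaten per recall day."""
    groups_by_day = defaultdict(set)
    for day, group, _kcal in records:
        groups_by_day[day].add(group)
    return sum(len(groups) for groups in groups_by_day.values()) / len(groups_by_day)


pct = percent_energy_by_group(recalls)
print(round(pct["fruit"], 2), round(pct["fried foods"], 2))  # 3.33 8.33
print(round(mean_food_groups_per_day(recalls), 2))           # 2.33
```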
In addition, we used diet-related data from other parts of the CHNS questionnaire. For example, we used self-reported weekly consumption of each type of alcohol to create an indicator of whether an individual consumed any wine. We used household questionnaire data to create diet-related infrastructure indicators of urbanization, including household ownership of a refrigerator and/or a microwave oven.
Health outcome variables
Hypertension (HTN)
After 10 minutes of seated rest, blood pressure was measured three times on the right arm by physicians using standard mercury sphygmomanometers [14]. Based on the ACC/AHA blood pressure guidelines [15], participants with a systolic blood pressure ≥ 130 mmHg or a diastolic blood pressure ≥ 80 mmHg were classified as having HTN. Participants who reported a previous HTN diagnosis or current use of HTN medication were also classified as having HTN.
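The HTN definition combines measured cut points with self-reported diagnosis or medication use. A minimal sketch follows; it assumes the three readings are averaged before the 130/80 mmHg cut points are applied, which the text does not state explicitly, and the function and argument names are hypothetical.

```python
from statistics import mean

SBP_CUTOFF_MMHG = 130  # ACC/AHA systolic threshold used in the text
DBP_CUTOFF_MMHG = 80   # ACC/AHA diastolic threshold used in the text


def has_hypertension(sbp_readings, dbp_readings, diagnosed=False, on_medication=False):
    """Classify HTN from repeated readings plus self-reported diagnosis/medication.

    Assumption: the repeated readings are averaged before applying the cut points.
    """
    sbp = mean(sbp_readings)
    dbp = mean(dbp_readings)
    return bool(sbp >= SBP_CUTOFF_MMHG or dbp >= DBP_CUTOFF_MMHG
                or diagnosed or on_medication)


print(has_hypertension([128, 132, 131], [76, 78, 77]))  # True (mean SBP is about 130.3)
print(has_hypertension([120, 118, 122], [75, 76, 74]))  # False
```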
Overweight
Height was measured without shoes to the nearest 0.1 cm using a portable stadiometer, and weight was measured without shoes and in light clothing to the nearest 0.1 kg on a calibrated floor scale. Both were measured by trained anthropometrists and were used to calculate body mass index (BMI, kg/m²). Overweight was defined as a BMI of 24 kg/m² or greater, the Chinese cut point for overweight [16].
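The overweight definition reduces to computing BMI as weight in kilograms divided by height in metres squared and comparing it with the 24 kg/m² Chinese cut point. A short sketch with hypothetical function names:

```python
CHINESE_OVERWEIGHT_BMI = 24.0  # kg/m^2, cut point stated in the text


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) / height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def is_overweight(weight_kg: float, height_cm: float) -> bool:
    return bmi(weight_kg, height_cm) >= CHINESE_OVERWEIGHT_BMI


print(round(bmi(70.0, 170.0), 2))   # 24.22
print(is_overweight(70.0, 170.0))   # True
print(is_overweight(60.0, 170.0))   # False
```

Because the definition is "24 kg/m² or greater", this category also includes participants above any higher obesity cut point.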
Type II diabetes mellitus (T2DM)
Health workers collected blood samples after an overnight fast, and fasting blood glucose was measured according to standard procedures [17]. Participants were classified as having T2DM if they met any of the following: a fasting blood glucose level of 126 mg/dL or greater [18]; the International Diabetes Federation (IDF) criteria for T2DM diagnosis (IDF, 2017 [19]); or a self-reported previous T2DM diagnosis or current use of T2DM medication.
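The T2DM definition is an "any of" rule. In the sketch below, `meets_idf_criteria` is a hypothetical placeholder flag, because this section does not list which IDF criteria beyond fasting glucose were applied. The mmol/L conversion helper is also an assumption, included only in case glucose is recorded in mmol/L; the factor of about 18 follows from the molar mass of glucose (about 180 g/mol).

```python
from typing import Optional

FBG_CUTOFF_MG_DL = 126.0
MG_DL_PER_MMOL_L = 18.0  # approximate glucose conversion factor (assumption)


def mmol_l_to_mg_dl(glucose_mmol_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL (approximate)."""
    return glucose_mmol_l * MG_DL_PER_MMOL_L


def has_t2dm(fbg_mg_dl: Optional[float], meets_idf_criteria: bool = False,
             diagnosed: bool = False, on_medication: bool = False) -> bool:
    """True if any criterion is met; a missing glucose value alone does not exclude T2DM."""
    high_fbg = fbg_mg_dl is not None and fbg_mg_dl >= FBG_CUTOFF_MG_DL
    return bool(high_fbg or meets_idf_criteria or diagnosed or on_medication)


print(has_t2dm(130.0))                        # True
print(has_t2dm(100.0))                        # False
print(has_t2dm(None, diagnosed=True))         # True
print(has_t2dm(mmol_l_to_mg_dl(7.0)))         # True
```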
Urbanization index
To define dietary consumption patterns relative to overall urbanization, we used the urbanization index, a validated multicomponent measure of urbanization in the CHNS that captures rapid and differential urbanization across China [6]. The index is calculated at the community level as a continuous scale derived from detailed direct measurements of community characteristics. Its 12 components are population density, economic activity, traditional markets, modern markets, transportation infrastructure, sanitation, communications, housing, education, diversity, health infrastructure, and social services.
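The community-level index is a composite of the 12 components listed above. The following minimal sketch assumes each component has already been scored on a common 0–10 scale and that the components are summed with equal weight. The per-component scoring rules are not given in this section, so both the scale and the equal weighting here are assumptions.

```python
URBANIZATION_COMPONENTS = (
    "population density", "economic activity", "traditional markets",
    "modern markets", "transportation infrastructure", "sanitation",
    "communications", "housing", "education", "diversity",
    "health infrastructure", "social services",
)
MAX_COMPONENT_SCORE = 10.0  # assumption: each component pre-scored on 0-10


def urbanization_index(component_scores: dict) -> float:
    """Equal-weight sum of the 12 pre-scored components (assumed scoring)."""
    missing = set(URBANIZATION_COMPONENTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = 0.0
    for name in URBANIZATION_COMPONENTS:
        score = component_scores[name]
        if not 0.0 <= score <= MAX_COMPONENT_SCORE:
            raise ValueError(f"{name} score out of range: {score}")
        total += score
    return total


community = {name: 5.0 for name in URBANIZATION_COMPONENTS}  # hypothetical community
print(urbanization_index(community))  # 60.0
```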
Covariates
Age, sex, educational level (highest attained), and smoking history were self-reported. Because sex and smoking history were highly correlated, we derived a combined sex and smoking status variable (female never smoker, male never smoker, female ever smoker, male former smoker, male current smoker) for use as a covariate. Region was categorized as North, Central, South, or Megacities. Per capita household income (yuan) was derived from individual and household questionnaire data on time use, assets, and economic activity. Average total daily energy intake (kcal) was calculated from the repeated 24-hour recalls, a method that has been validated using doubly labelled water. Physical activity was measured using a detailed weekly activity recall that captured occupational, domestic, travel, and active leisure activity. Metabolic equivalent of task (MET) hours per week were calculated for each of these physical activity domains using the Compendium of Physical Activities [20,21,22]. We calculated a total physical activity index as the sum of occupational, domestic, travel, and active leisure MET hours per week. Details on these physical activity variables, including how METs were calculated, are provided elsewhere [23,24].
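MET hours per week for an activity are its MET value multiplied by the hours per week spent on it. The total physical activity index sums these products over the four domains. This sketch uses hypothetical MET values and activity durations. Real MET values would come from the Compendium of Physical Activities, and the record format is an assumption.

```python
DOMAINS = ("occupational", "domestic", "travel", "leisure")


def met_hours_per_week(activities):
    """Sum MET x hours/week within each domain and overall.

    activities: iterable of (domain, met_value, hours_per_week).
    """
    by_domain = {domain: 0.0 for domain in DOMAINS}
    for domain, met, hours in activities:
        if domain not in by_domain:
            raise ValueError(f"unknown domain: {domain}")
        by_domain[domain] += met * hours
    return by_domain, sum(by_domain.values())


# Hypothetical MET values and durations, not taken from the Compendium.
week = [
    ("occupational", 3.0, 40.0),
    ("domestic", 2.5, 7.0),
    ("travel", 4.0, 3.5),
    ("leisure", 6.0, 2.0),
]
by_domain, total = met_hours_per_week(week)
print(by_domain["occupational"], total)  # 120.0 163.5
```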
Urbanized diet index development
Select dietary variables of interest
First, we selected a broad set of diet-related variables with literature-based evidence of an association with urbanization, with the intent of capturing the total diet. Second, we examined the consistency of the association between each variable and the urbanization index, overall and within each region (North, Central, South) and the Megacities. To do so, we compared the mean and standard deviation of each continuous dietary measure (e.g., mg of sodium) and the percentage for each dichotomous dietary measure (e.g., whether an individual consumed wine) across tertiles of the overall and within-region urbanization index. Third, we selected variables to carry forward based on (1) association with overall urbanization, (2) consistent association with urbanization across regions, and (3) consumption by more than 5% of participants. Inclusion and exclusion decisions were based on substantive differences in means or percentages across urbanization tertiles and regions rather than on formal statistical testing.
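The descriptive screening step can be sketched as follows: split participants into urbanization tertiles, summarize a continuous dietary measure within each tertile, and apply the "more than 5% consumers" rule. The data are invented for illustration, and the tertile cut points use Python's `statistics.quantiles` default ("exclusive") method, so other software may place the cut points slightly differently. For within-region tertiles, the same function would be applied to each region's subset.

```python
from statistics import mean, quantiles, stdev


def tertile(value, cuts):
    """Return 1, 2, or 3 given the two tertile cut points."""
    low, high = cuts
    if value <= low:
        return 1
    return 2 if value <= high else 3


def summarize_by_tertile(urban_index, dietary_values):
    """Mean and SD of a continuous dietary measure within each urbanization tertile."""
    cuts = quantiles(urban_index, n=3)  # two cut points
    groups = {1: [], 2: [], 3: []}
    for u, d in zip(urban_index, dietary_values):
        groups[tertile(u, cuts)].append(d)
    return {t: (mean(v), stdev(v)) for t, v in groups.items()}


def passes_frequency_rule(consumed_flags, min_share=0.05):
    """Keep a variable only if more than 5% of participants consume it."""
    return sum(consumed_flags) / len(consumed_flags) > min_share


# Hypothetical data for nine participants.
urban = [10, 20, 30, 40, 50, 60, 70, 80, 90]
sodium_mg = [4000, 4200, 4100, 3800, 3900, 3700, 3500, 3400, 3600]
summary = summarize_by_tertile(urban, sodium_mg)
print(summary)
```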
Score individual dietary variables
We categorized each diet variable for scoring. For uncommonly consumed foods (< 80% of the sample were consumers), we created a categorical variable with one category for non-consumers and quartiles of consumption among consumers. For commonly consumed foods (≥ 80% of the sample were consumers), we created quintiles of consumption, with non-consumers classified in the lowest quintile. Non-continuous variables were coded as dichotomous (consumed vs. not consumed, or owned vs. not owned).
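The categorization rule depends on the share of consumers, as sketched below. Several details are assumptions not fixed by the text. For common foods, quintile cut points are computed over the full sample, including zeros, and non-consumers are then forced into quintile 1. Values equal to a cut point fall into the lower category. Cut points use the `statistics.quantiles` default method.

```python
from statistics import quantiles


def category(value, cuts):
    """1-based category: 1 + number of cut points strictly below the value."""
    return 1 + sum(value > c for c in cuts)


def categorize_intake(intakes, common_threshold=0.80):
    """Categorize one food's intakes using the consumer-share rule."""
    consumers = [x for x in intakes if x > 0]
    share_consumers = len(consumers) / len(intakes)
    if share_consumers < common_threshold:
        # Uncommon food: 0 = non-consumer, 1-4 = quartiles among consumers.
        cuts = quantiles(consumers, n=4)
        return [0 if x == 0 else category(x, cuts) for x in intakes]
    # Common food: quintiles 1-5 over the full sample; non-consumers in quintile 1.
    cuts = quantiles(intakes, n=5)
    return [1 if x == 0 else category(x, cuts) for x in intakes]


print(categorize_intake([0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8]))
# [0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(categorize_intake([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
# [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```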
To determine how to score each diet variable in the urbanized diet index, we examined the association between each candidate diet variable and the overall urbanization index using logistic or multinomial logistic regression in three models: (1) variables for commonly consumed foods (≥ 80% consumers), (2) variables for infrequently consumed foods (< 80% consumers), and (3) items classified as present/consumed or not present/not consumed. In each model, the diet-related variables were the outcomes and overall urbanization was the exposure. For non-dichotomous dietary variables, we present results as relative risk ratios (RRRs) per one standard deviation change in the urbanization index, with the lowest consumption category or non-consumers as the reference. For dichotomous variables, we present odds ratios (ORs) per one-unit change in the overall urbanization index. Using these results, we scored each consumption or presence category from one to four based on the strength and direction of its association with overall urbanization, so that the range of scores reflected the range of relative risks (e.g., similar scores for similar risk ratios, and a wider range of scores for a wider range of relative risks).
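The text reports RRRs per one SD of the urbanization index but ORs per one unit. The two scales are linked by a standard identity: if β is the log-odds (or log-RRR) coefficient per unit of exposure, the ratio per one-SD increase is exp(β × SD). The sketch below shows only this conversion, using a hypothetical coefficient and SD. The mapping from these ratios to scores of one to four was judgment-based in the text and is not reproduced here.

```python
import math


def ratio_per_unit(beta_per_unit: float) -> float:
    """OR or RRR for a one-unit increase in the exposure."""
    return math.exp(beta_per_unit)


def ratio_per_sd(beta_per_unit: float, sd_of_exposure: float) -> float:
    """OR or RRR for a one-SD increase: exp(beta * SD)."""
    return math.exp(beta_per_unit * sd_of_exposure)


# Hypothetical coefficient and SD of the urbanization index, for illustration only.
beta, sd_urban = 0.03, 15.0
print(round(ratio_per_unit(beta), 3), round(ratio_per_sd(beta, sd_urban), 3))
```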
Analytical steps to create the urbanized diet index
First, we created six candidate urbanized diet indices that differed in the inclusion or exclusion of specific food variables. Second, we tested the association of each candidate index with overall urbanization in unadjusted, age- and sex-adjusted, and fully adjusted (age, sex/smoking status, average daily energy intake, region, educational attainment, per capita household income, and physical activity) linear mixed models, with random intercepts to account for clustering at the household and community levels. We used these results to select the final urbanized diet index based on the strength of its association with overall urbanization and its degree of missingness.
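Candidate indices that differ only in which items they include can be compared on missingness with a few lines of code. This is a minimal sketch under stated assumptions. Each index is taken to be the unweighted sum of its item scores (1–4), which the text does not state explicitly. An index is treated as missing when any included item is missing. The candidate definitions and item scores are hypothetical. The mixed models themselves are not shown.

```python
from typing import Dict, Optional, Sequence


def candidate_index(item_scores: Dict[str, Optional[int]],
                    included: Sequence[str]) -> Optional[int]:
    """Sum of item scores for the included items; None if any item is missing."""
    values = [item_scores.get(name) for name in included]
    if any(v is None for v in values):
        return None
    return sum(values)


def share_missing(people, included) -> float:
    """Fraction of participants for whom the candidate index cannot be computed."""
    return sum(candidate_index(p, included) is None for p in people) / len(people)


# Hypothetical candidate definitions and item scores.
CANDIDATES = {
    "A": ("fruit", "dairy", "fried foods", "owns refrigerator"),
    "B": ("fruit", "dairy", "fried foods", "owns refrigerator", "any wine"),
}
people = [
    {"fruit": 3, "dairy": 2, "fried foods": 4, "owns refrigerator": 1, "any wine": None},
    {"fruit": 1, "dairy": 1, "fried foods": 2, "owns refrigerator": 4, "any wine": 1},
]
for name, items in CANDIDATES.items():
    print(name, [candidate_index(p, items) for p in people], share_missing(people, items))
# A [10, 8] 0.0
# B [None, 9] 0.5
```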
We then used standardized residuals to test whether the final urbanized diet index was stable across sociodemographic characteristics. Using a fully adjusted linear mixed model, we classified individuals into two groups: those whose urbanized diet index was less accurately predicted (|standardized residual| > 2) and those whose index was more accurately predicted (|standardized residual| ≤ 2). A residual is the difference between an individual's observed value and the value predicted by the regression model. For interpretation, we present standardized residuals (each residual divided by the standard deviation of the residuals). We summarized demographic characteristics for each group and used ANOVA and chi-squared tests to assess the statistical significance of differences in overall urbanization index, age, sex and smoking status, region, average daily energy intake, educational attainment, per capita household income, and physical activity.
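The grouping step can be sketched as follows, given observed index values and model predictions. The sketch assumes the predicted values are already available from the fitted fully adjusted model, which is not reproduced here. The text also does not state whether residuals are conditional on the random effects. The toy data are invented so that one participant is poorly predicted.

```python
from statistics import stdev


def standardized_residuals(observed, predicted):
    """Residual (observed - predicted) divided by the SD of all residuals."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    sd = stdev(residuals)
    return [r / sd for r in residuals]


def poorly_predicted(observed, predicted, threshold=2.0):
    """True where |standardized residual| > threshold (the 'less accurate' group)."""
    return [abs(z) > threshold for z in standardized_residuals(observed, predicted)]


# Hypothetical data: ten small residuals and one large one.
observed = [10.5, 9.5] * 5 + [16.0]
predicted = [10.0] * 11
flags = poorly_predicted(observed, predicted)
print(sum(flags), flags[-1])  # 1 True
```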
Finally, we tested whether the final urbanized diet index was associated with three key CMD risk factors: HTN, overweight, and T2DM. In minimally adjusted models, we used mixed-effects logistic regression with the urbanized diet index as the exposure, adjusting for age and sex. In fully adjusted models, the urbanized diet index was the exposure, with adjustment for age, sex/smoking status, average daily energy intake, region, educational attainment, per capita household income, and physical activity, and with random intercepts to account for clustering at the community and household levels. In a third set of models, we added the overall urbanization index [6] to all previous covariates to compare associations with and without controlling for overall urbanization. We present these results as odds ratios for the association between each of the three CMD outcomes and a one standard deviation change in the urbanized diet index.
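One way to obtain odds ratios per one SD of the exposure is to z-score the urbanized diet index before model fitting. A one-unit change in the rescaled exposure then equals one SD, and exp(β) is directly the OR per SD. This sketch shows only that rescaling and exponentiation. The index values are hypothetical. The mixed-effects logistic models are not shown, and the text does not say whether the authors rescaled the exposure or converted coefficients afterwards.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Rescale to mean 0 and SD 1 so a one-unit change equals one SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def odds_ratio_per_sd(beta_on_z: float) -> float:
    """Exponentiate a log-odds coefficient estimated on the z-scored exposure."""
    return math.exp(beta_on_z)


diet_index = [18, 22, 25, 27, 30, 33, 35]  # hypothetical index values
z = zscore(diet_index)
print([round(v, 2) for v in z])
```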