Based on various aspects of validity and an extensive range of statistical tests, we demonstrated that the present FFQ developed for a Lebanese community is a useful tool for dietary assessment, when compared to six 24-h DRs. We obtained an acceptable agreement between nutrient intakes of both dietary instruments, given that most participants were correctly classified into the same and adjacent quartiles, with a low level of misclassification. Weighted kappa statistics also showed acceptable results. These findings were further confirmed in the Bland Altman plots and the indirect validity analysis relating nutrient intakes from the 24-h DRs to food groups from the FFQ, indicating a satisfactory agreement between the two methods.
There is no perfect reference method in validation studies. Objective methods such as biochemical indicators are relatively invasive and expensive especially when they aim to test many nutrients. Moreover, biochemical indicators do not exist for some nutrients (total fat, total CHO, total fibers). They are also influenced by dietary factors including day-to-day variation and physiological factors such as nutrient absorption and metabolism, diurnal and menstrual cycles [34]. Diet records also hold several limitations, such as decreased cooperation from the respondents and modification of their dietary intake. Therefore, multiple 24-h DRs appear to be the primary alternative [34], and they are used by most validation studies as a reference method [9].
In validation studies, it is important to cover many aspects of validity. An in-depth literature review carried out in 2015 showed that the most commonly used statistical approaches in FFQ validation studies were combinations of two to three tests, which may not be sufficient to capture the various facets of validity [31]. Moreover, the sole use of correlation analysis is not sufficient in validity studies, as it does not measure agreement between methods [31]. Hence, in the current study, we applied a broad set of statistical tests for a more reliable analysis. In addition to correlation analysis, we used percent difference, cross-classification into quartiles, weighted kappa statistics, and Bland Altman plots to measure agreement between the two methods, as well as indirect validity analysis between nutrient intakes from 24-h DRs and food consumption categories derived from the FFQ. While the correlation coefficient, kappa statistics, and cross-classification assess validity at the individual level, Bland Altman plots and percent difference do so at the group level [31].
Regarding correlation analyses, a desirable Pearson correlation coefficient generally ranges from 0.5 to 0.7 [34], with coefficients between 0.2 and 0.45 considered acceptable [31]. In the current study, correlation coefficient values fell within the acceptable range, with a good outcome for alcohol (> 0.5). They were similar to those of some FFQ validation studies [35, 36] and lower than those of others [14, 37]. Adjusting for factors such as age, gender, and energy intake is very important in validation studies. In line with the present findings, it is unrealistic to obtain high correlation coefficients after such an adjustment [34]. However, the Pearson correlation coefficient cannot be considered the only determinant of validity, as it does not test the level of agreement between the two dietary instruments [31].
Results showed that the FFQ tended to overestimate nutrient intakes as compared to 24-h DRs. This finding is consistent with most FFQ validation studies [35, 36, 38, 39]. A possible reason for this overestimation is the relatively large number of food items participants have to recall while filling in the FFQ in comparison with the 24-h DR [9]. We also described mean nutrient intakes by gender; they were generally higher in men than in women, which is consistent with previous findings [37].
The mean percent difference was calculated for both crude and energy-adjusted nutrient intakes. The difference decreased markedly with the energy-adjusted values. It showed acceptable to good results for macronutrients, vitamins such as vitamin D, thiamin, and niacin, and minerals such as phosphorus, potassium, sodium, and iron. Given that the FFQ overestimates energy intake, it seemed more plausible to compare intakes when they are energy-adjusted. This allowed evaluating the nutrient composition of the diet as assessed by both dietary instruments, rather than only crude intakes. In future epidemiological studies, especially those evaluating diet-disease associations, it is crucial to consider adjusting for energy intake among other confounding factors; diet-disease associations should not be the sole result of differences in total energy intake between cases and non-cases [30].
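To make the effect of energy adjustment on the percent difference concrete, the following is a minimal sketch rather than the computation used in this study. It assumes that the percent difference is taken per participant with the 24-h DR as reference and then averaged, and that energy adjustment uses the nutrient density approach (intake per 1000 kcal); neither choice is specified above, and the intake values are hypothetical.

```python
from statistics import mean

def mean_percent_difference(ffq, recall):
    """Mean of per-participant percent differences, 24-h DR as reference (assumed definition)."""
    return mean((f - r) / r * 100 for f, r in zip(ffq, recall))

def per_1000_kcal(nutrient, energy_kcal):
    """Nutrient density adjustment (assumed method): intake per 1000 kcal."""
    return [n / e * 1000 for n, e in zip(nutrient, energy_kcal)]

# Hypothetical protein (g/day) and energy (kcal/day) intakes for four participants.
ffq_protein = [90.0, 110.0, 75.0, 120.0]
ffq_energy = [2400.0, 2750.0, 2000.0, 3000.0]
dr_protein = [75.0, 100.0, 60.0, 100.0]
dr_energy = [2000.0, 2500.0, 1600.0, 2500.0]

crude = mean_percent_difference(ffq_protein, dr_protein)
adjusted = mean_percent_difference(
    per_1000_kcal(ffq_protein, ffq_energy),
    per_1000_kcal(dr_protein, dr_energy),
)
print(f"crude: {crude:.2f}%  energy-adjusted: {adjusted:.2f}%")
```

In this constructed example the FFQ overestimates protein and energy in the same proportion, so the crude difference is positive while the energy-adjusted difference vanishes, which is the pattern described above in an idealized form.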
Cross-classification of nutrient intakes into quartiles and weighted kappa calculation showed promising results regarding the agreement between the dietary instruments. In the quartile categorization, misclassification was less than 10% for most nutrients, while a relatively high proportion of participants were classified into the same or adjacent quartile. Results were similar to those of previous FFQ validation studies [35, 37, 40]. Moreover, most weighted kappa values fell within the acceptable range (between 0.2 and 0.6) [31], while Cohen’s kappa values reflected fair agreement (between 0.2 and 0.4) [32]. These results are of utmost importance, given that ranking individuals according to their dietary intakes is fundamental in the investigation of diet-disease associations [31].
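The cross-classification and weighted kappa described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of this study: quartiles are assigned by rank with ties broken by position, gross misclassification is taken as classification into opposite quartiles, and linear weights are used for kappa. The intake values are hypothetical.

```python
def quartiles(values):
    """Assign rank-based quartiles 0..3 (ties broken by position; an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 4 // len(values)
    return q

def cross_classification(q_a, q_b):
    """Percent classified into the same, adjacent, and opposite quartiles."""
    n = len(q_a)
    gaps = [abs(a - b) for a, b in zip(q_a, q_b)]
    return tuple(sum(g == d for g in gaps) / n * 100 for d in (0, 1, 3))

def weighted_kappa(q_a, q_b, k=4):
    """Weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    n = len(q_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(q_a, q_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    d_obs = sum(abs(i - j) / (k - 1) * observed[i][j] for i, j in cells)
    d_exp = sum(abs(i - j) / (k - 1) * row[i] * col[j] for i, j in cells)
    return 1 - d_obs / d_exp

# Hypothetical intakes of one nutrient for eight participants.
ffq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dr = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]

q_ffq, q_dr = quartiles(ffq), quartiles(dr)
same, adjacent, opposite = cross_classification(q_ffq, q_dr)
kappa = weighted_kappa(q_ffq, q_dr)
print(same, adjacent, opposite, round(kappa, 3))
```

Because the weights penalize a disagreement in proportion to the distance between quartiles, participants placed in adjacent quartiles lower kappa less than participants placed in opposite quartiles.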
Bland Altman plots showed a good level of agreement between the two methods. While the positive mean difference in most plots indicated that the FFQ overestimated intakes, the majority of data points fell within the limits of agreement (LOA).
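As an illustration of how the quantities behind a Bland Altman plot are obtained, the sketch below computes the mean difference (bias) and the LOA for one nutrient. It assumes the conventional definition of the LOA as the mean difference ± 1.96 standard deviations of the differences; the energy intakes are hypothetical, and the function name is ours.

```python
from statistics import mean, stdev

def bland_altman(ffq, recall):
    """Return bias, lower LOA, upper LOA, and percent of differences inside the LOA."""
    diffs = [f - r for f, r in zip(ffq, recall)]
    bias = mean(diffs)              # positive bias: the FFQ reports more than the 24-h DR
    sd = stdev(diffs)               # sample standard deviation of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs) * 100
    return bias, lower, upper, inside

# Hypothetical energy intakes (kcal/day) for five participants.
ffq_energy = [2100.0, 2500.0, 1900.0, 2300.0, 2700.0]
dr_energy = [2000.0, 2300.0, 1900.0, 2100.0, 2400.0]

bias, lower, upper, inside = bland_altman(ffq_energy, dr_energy)
print(f"bias={bias:.1f}  LOA=({lower:.1f}, {upper:.1f})  inside={inside:.0f}%")
```

In the plot itself, each difference is drawn against the mean of the two measurements for that participant, with horizontal lines at the bias and at the two LOA.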
Indirect validity assesses the relationship between the food consumption categories derived from the FFQ and the nutrient intakes extracted from the 24-h DRs [41]. This type of analysis has rarely been conducted in previous FFQ validation studies [42, 43]. Results suggested a good indirect validity; intakes of key nutrients increased significantly across the tertiles of the food groups to which they are usually and logically related.
Test-retest reliability displays not only the degree of correlation but also the agreement between measurements. In contrast to Pearson correlation coefficient, paired t-test, and Bland Altman plots, ICC is an advisable measure of reliability that assesses both degree of correlation and agreement between two measures [44]. In the current study, the FFQ yielded good to excellent reproducibility according to ICC results [44], similarly to previous studies [12, 14, 36, 45]. Bland Altman analysis for reproducibility confirmed these findings. Moreover, the interval between repeated measurements (1 month) is adequate in order to minimize dietary changes over time as well as the recall of previous answers [9]. In fact, following a time interval longer than 1 month (reaching 3 months), seasonality bias could emerge and affect food reporting during the second administration of the FFQ [22]. Hence, the resulting reproducibility correlation in the present study could be attenuated. Nevertheless, previous studies have adopted time intervals of 2 weeks [43, 46, 47], 3 weeks [12, 38], 4 weeks [11, 36, 48], 4 to 6 weeks [13], and 6 weeks [49].
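The difference between correlation and agreement mentioned above can be illustrated with a short sketch. It assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the ICC form used in this study is not stated above, so this choice and the data are illustrative only.

```python
def icc_2_1(first, second):
    """ICC(2,1) for two administrations, computed from two-way ANOVA mean squares."""
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = (sum(first) + sum(second)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(first) / n, sum(second) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between administrations
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical intakes: the second administration is uniformly one unit higher.
ffq_first = [1.0, 2.0, 3.0, 4.0]
ffq_second = [2.0, 3.0, 4.0, 5.0]

icc_shifted = icc_2_1(ffq_first, ffq_second)
icc_identical = icc_2_1(ffq_first, ffq_first)
print(round(icc_shifted, 3), round(icc_identical, 3))
```

Here the two administrations are perfectly correlated, so the Pearson coefficient would equal 1, yet the constant shift between them pulls this ICC below 1; only identical measurements give an ICC of 1.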
This is the first FFQ validation study conducted in Lebanon to assess most aspects of validity for a complete range of macro- and micronutrients. In fact, only a few studies worldwide have used an extensive number of statistical tests for FFQ validation. Another strength of this study is the number of 24-h DRs (6 days) collected as a reference method for the FFQ validation, which was not common in previous studies. In addition, the sample size, which is relatively larger than that of other validation studies, appears sufficient for deriving useful information on questionnaire validity when combined with 24-h DRs of 6 days [34].
We acknowledge that the present validation study has some limitations. First, the length of the FFQ could have increased the burden on participants, hence impairing the cooperation of the respondents and raising the risk of biased responses and of overestimated intakes. To mitigate this limitation, the FFQ was interviewer-administered, which ensured a more accurate completion of answers [23, 24]. Moreover, some studies have suggested that food lists reaching 200 items could perform better than shorter ones with 100 items, and that the resulting respondent burden “does not seem to be a decisive factor for FFQs” [6]. Second, the present sample of a university community is not necessarily representative of the total population; it includes a higher proportion of women, a higher education level, and a younger age distribution. Third, errors usually associated with both dietary instruments should be taken into consideration, including errors related to memory and to the estimation of energy and nutrient intakes. It would have been preferable to administer the multiple 24-h DRs several times over the period of 1 year. However, this was not possible due to technical and collaboration issues. To account for this limitation, we collected multiple 24-h DRs administered twice within an interval of 1 month. Finally, even though food composition databases were carefully chosen to reflect our community’s dietary habits as accurately as possible, the use of multiple food composition tables could still introduce a certain level of error.