### Study design

NOWAC is a national population-based cohort study with 102 443 women enrolled at age 30–70 years from 1991 to 1997. The cohort has been described in detail elsewhere [7]. Updated information can be found on the NOWAC web-site [8]. NOWAC includes the Norwegian sub-cohort in the European Prospective Investigation into Cancer and Nutrition (EPIC). The present methodological sub-study was undertaken to assess the reproducibility of the food frequency questionnaire developed for NOWAC and the Norwegian part of the EPIC study. The FFQ covers four consecutive pages within a larger self-instructive health and lifestyle questionnaire (eight pages) that is administered by post and optically read. The same questionnaire was mailed twice (test and retest) to the same subjects, about three months apart in February/March and May/June 2002. A letter of invitation and a return envelope with pre-paid postage were included. Non-responders received up to two written reminders for each questionnaire. No rewards were given to participants.

### Subjects

In 2002 a follow-up questionnaire was mailed to 36 000 women from the cohort aged 46–75 years. Those who returned the questionnaire within four weeks (n = 14 817) were taken as the sampling frame, from which a random sample of 2000 women was drawn for the reproducibility study. The sampling was done by Statistics Norway using the national population registry, which identifies all Norwegian residents by a unique 11-digit national person number incorporating birth date and sex. Information about name, address, emigration and death is continuously updated based on mandatory registration and notification to the registry. To retain confidentiality the person number was replaced by a serial number on the letter of invitation and questionnaire, and in the data files. The study was approved by the Regional Committee for Medical Research Ethics, Northern Norway, and the license for data storage and processing was issued by the national Data Inspectorate.

In the random sample of 2000 women, five had not given informed consent to further contact and were excluded. The retest questionnaire was returned by 1496 (75%) of the remaining 1995 women. One test questionnaire was not available at the time of analysis, and seven women with null energy intake in either test or retest were excluded. Thus, 1488 respondents with two FFQ measurements could be included in the reproducibility analyses. Background characteristics were compared between the respondents and 1994 women from the original sample to check for selection bias. Except for age, all characteristics were based on self-reported information in the test questionnaire.

The reproducibility analysis of single questions in the FFQ included pairs of test-retest responses without missing values, so the number of subjects included varied from question to question. The analysis of food groups and nutrients included 1370 women (92%) who answered at least 50% of the frequency questions and had energy intake in the range 2500–15000 kJ in both test and retest. Similar inclusion criteria have previously been used in NOWAC [9]. The effects of exposure measurement error on disease risk estimates were investigated using the 1370 subjects from the food group and nutrient analysis who had also completed a question about high blood pressure. Those who answered "yes" in both test and retest were defined as cases (n = 301), and those who answered "no" in both were defined as controls (n = 712). Subjects with inconsistent or missing answers were excluded.
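The inclusion rule for the food group and nutrient analysis can be sketched as a simple filter. This is an illustration only: the function names are hypothetical, and it assumes that the 50% threshold applies to both questionnaires and that the energy bounds are inclusive, neither of which is stated explicitly above.

```python
def meets_criteria(frac_answered, energy_kj,
                   min_frac=0.5, energy_range=(2500.0, 15000.0)):
    """Check one questionnaire (test or retest) against the inclusion rule.

    frac_answered: share of frequency questions answered (0.0 to 1.0).
    energy_kj: computed daily energy intake in kJ.
    Bounds are treated as inclusive (an assumption).
    """
    low, high = energy_range
    return frac_answered >= min_frac and low <= energy_kj <= high


def include_subject(test, retest):
    """A subject is kept only if both questionnaires meet the criteria.

    test, retest: (frac_answered, energy_kj) tuples.
    """
    return meets_criteria(*test) and meets_criteria(*retest)
```

For example, `include_subject((0.9, 8000), (0.6, 7000))` keeps the subject, whereas a retest with only 40% of the frequency questions answered would exclude her.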

### The food frequency questionnaire (FFQ)

The FFQ was designed to assess habitual diet over the past year, with emphasis on fish consumption and a traditional diet in the study population. Questions were asked about the intake of milk, coffee, orange juice, soft drinks, yoghurt, breakfast cereal, bread, fat on bread, toppings for open sandwiches (jam, cheeses, meat and fish products), fruit, vegetables, potatoes, rice, pasta, rice porridge, fish and fish products, shellfish, condiments and sauces for fish, meat and poultry, eggs, ice cream, cakes, desserts, chocolate, snacks, alcoholic beverages, and dietary supplements. Similar items were grouped together in blocks with question headings. The response options were predefined and listed in increasing order with check-boxes to facilitate completion and optical reading. For example, the items listed under the question "How often do you eat fruit?" were "apples/pears", "oranges", "bananas", and "other fruit" with the following options: "never/rarely", "1–3 per month", "1 per week", "2–4 per week", "5–6 per week", "1 per day", and "2+ per day". The first alternative for consumption frequencies was always "never/rarely", but the number of options ranged from 4 to 7 depending on the food. When convenient, the questions were phrased in terms of natural units, such as glasses (milk, fruit juice, soft drinks, and wine), cups (coffee), slices (bread), or number (eggs and potatoes). Separate questions about the usual amounts consumed were included for fat on bread, vegetables, fish and fish products, sauces and condiments for fish, meat and meat products, ice cream, chocolate, and cod liver oil supplements. The number of response options ranged from 3 to 5 with units in pieces, slices, decilitres, florets (broccoli and cauliflower), or spoonfuls. 
The dietary intake computations included a total of 132 questions in the FFQ (consumption frequencies = 91, types of fat used on bread = 7, amounts = 28, and time of year for the consumption of different species of fish = 6). A detailed list of the food items, including a specification of those with a separate amount question, can be found in Additional file 1. The original version of the test-retest FFQ is shown in Additional file 2.

### Computation of dietary intake

The daily intake of food groups, energy, and nutrients was computed using an analysis program developed at the Institute of Community Medicine, University of Tromsø, for SAS software. The program was run with an updated file version of the food composition table for Norway [10]. Broader categories of foods (e.g. "apples/pears") were split into single foods according to frequency weights (e.g. 80% apples and 20% pears) derived from 24-hour dietary recalls in a random sample of women within NOWAC [11, 12]. For season specific frequencies (ice cream, fish, and cod liver oil supplements) the average for the whole year was used. Missing frequencies were treated as null intake, and missing portion sizes were substituted by the smallest portion for a conservative intake estimate. Standard portion sizes and standard weights were taken from official tables for Norway [13]. The type of fat used on bread was taken into account in the calculations, but not fat in cooking since the intake of fried and cooked foods was computed using values for prepared foods in the food composition table. The only dietary supplement included was cod liver oil (liquid and capsules), which is commonly used in Norway as a source of vitamin A, vitamin D, and long-chain ω-3 fatty acids. The food groups were based on the classification system in the EPIC-SOFT program for conducting 24-hour dietary recalls in the EPIC study [14], but with some modifications. Peanuts and potato chips were added to the EPIC group "Sugar and confectionary" and called "Sweets and salty snacks". The EPIC groups "Potatoes and other tubers" and "Egg and egg products" only included one item each from the FFQ and were therefore called "Potatoes" and "Eggs". A new group was made for cod liver oil. The food groups included whole food items, not ingredients, as recipes were not used. The composition of the food groups is given in Additional file 1.
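The handling of missing values and the splitting of broader categories can be illustrated with a short sketch. It is not the SAS program used in the study; the function name, argument names, and portion sizes are hypothetical, and only the rules stated above (missing frequency as null intake, missing portion as the smallest portion, frequency weights such as 80% apples and 20% pears) are taken from the text.

```python
def daily_intake_grams(freq_per_day, portion_g, portion_options_g, split_weights):
    """Return grams per day for each single food in a broader category.

    freq_per_day: consumption frequency (times/day), or None if missing.
    portion_g: reported portion size in grams, or None if missing.
    portion_options_g: the portion sizes offered for this question.
    split_weights: share of each single food, e.g. {"apples": 0.8, "pears": 0.2}.
    """
    # A missing frequency is treated as null intake.
    frequency = 0.0 if freq_per_day is None else freq_per_day
    # A missing portion size is replaced by the smallest listed portion.
    portion = min(portion_options_g) if portion_g is None else portion_g
    total = frequency * portion
    # Split the category total into single foods by frequency weight.
    return {food: total * weight for food, weight in split_weights.items()}
```

With a frequency of once per day, a missing portion size, and hypothetical portion options of 100 g and 150 g, the "apples/pears" item would contribute about 80 g of apples and 20 g of pears per day.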

### Statistical analysis

Background characteristics of the study population are presented as mean and standard deviation (SD) or range for continuous variables, and proportion (%) for categorical variables. Single questions with predefined response options were treated as categorical variables, and calculated intake of food groups, energy, and nutrients as continuous variables. The reproducibility of single questions was evaluated by contingency tables for test-retest responses. The table diagonal represents the agreement, i.e. the responses in the same categories (test = retest). Total agreement (%) and agreement for the category "never/rarely" (%) were calculated for each table. Misclassification (%) was calculated for adjacent categories (± 1 and ± 2) and extreme opposite categories (lowest and highest). The symmetry of the misclassification was assessed by calculating the misclassification (%) on each side of the table diagonal (retest < test and retest > test). The difference across the diagonal indicates whether there is a shift towards higher or lower responses in the retest compared to the test. The simple and weighted Kappa coefficients were also calculated; they summarize the total agreement beyond that expected by chance [15].
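The contingency-table measures described above can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study. The function name and the coding of categories as integers starting at 0 are hypothetical, and the weighted Kappa uses linear weights, which is an assumption because the weighting scheme is not stated in the text.

```python
def agreement_stats(pairs, n_categories):
    """Summarize test-retest agreement for one categorical question.

    pairs: list of (test, retest) category indices, 0 = lowest category.
    n_categories: number of response options (at least 2).
    """
    n, k = len(pairs), n_categories
    table = [[0] * k for _ in range(k)]
    for test, retest in pairs:
        table[test][retest] += 1
    # Marginal proportions for the test (rows) and retest (columns).
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def kappa(weight):
        cells = [(i, j) for i in range(k) for j in range(k)]
        observed = sum(weight(i, j) * table[i][j] / n for i, j in cells)
        expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
        # Undefined when expected agreement equals 1 (all answers in one cell).
        return (observed - expected) / (1 - expected)

    def percent(condition):
        return 100 * sum(1 for t, r in pairs if condition(t, r)) / n

    return {
        "total_agreement": percent(lambda t, r: t == r),
        "retest_lower": percent(lambda t, r: r < t),
        "retest_higher": percent(lambda t, r: r > t),
        "off_by_one": percent(lambda t, r: abs(t - r) == 1),
        "kappa": kappa(lambda i, j: 1.0 if i == j else 0.0),
        # Linear weights (assumed): full credit on the diagonal, none at the extremes.
        "weighted_kappa": kappa(lambda i, j: 1 - abs(i - j) / (k - 1)),
    }
```

Comparing `retest_lower` with `retest_higher` gives the symmetry check: a clear imbalance suggests a systematic shift in responses between the two administrations.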

For food groups, energy, and nutrients, we calculated the mean and standard deviation (SD) for the test and retest, and the mean of the within-person differences with both a 95% confidence interval (± 2 SEM, i.e. standard error of the mean) and limits of agreement (± 2 SD). If the individual differences are normally distributed, about 95% will lie within these limits [16]. We estimated Pearson's product moment correlation coefficient, *r*, and Spearman's rank correlation coefficient, $r_s$. We also estimated the two intraclass correlation coefficients (ICCs) relevant to this reproducibility study with two measurements on every subject. Following the notation by Shrout and Fleiss [17], with $k = 2$ measurements per subject,

$$ICC(1,1) = \frac{BMS - WMS}{BMS + (k-1)\,WMS}, \qquad ICC(3,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS}.$$

The first number refers to one of three cases of random and fixed effects models used as examples in their paper. The second number indicates if the reliability is assessed for one single measurement, as in our case, or the mean of several measurements. The ICCs are based on variance decomposition, where BMS is the between-person mean square, WMS is the within-person mean square, and EMS is the residual mean square for the respective models. *ICC*(*1*, *1*) is a measure of the absolute agreement between the measurements, whereas *ICC*(*3*, *1*) should be interpreted in terms of consistency. This is because *ICC*(*3*, *1*) treats the variance between the two measurements as a fixed effect that does not contribute to the WMS.
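The mean squares and the two single-measurement ICCs can be computed from paired data as in the sketch below, which also shows the mean difference with its approximate 95% confidence interval and limits of agreement. It is an illustration under stated assumptions rather than the SPSS or SAS procedures used in the study: the function names are hypothetical, the difference is taken as retest minus test (the direction is not specified above), and the ICC expressions are the standard Shrout and Fleiss forms for $k = 2$.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest):
    """Mean within-person difference with +/- 2 SEM and +/- 2 SD bounds."""
    diffs = [r - t for t, r in zip(test, retest)]  # retest minus test (assumed)
    d_mean, d_sd = mean(diffs), stdev(diffs)
    sem = d_sd / len(diffs) ** 0.5
    return {
        "mean_diff": d_mean,
        "ci95": (d_mean - 2 * sem, d_mean + 2 * sem),
        "limits": (d_mean - 2 * d_sd, d_mean + 2 * d_sd),
    }


def mean_squares(test, retest):
    """Return (BMS, WMS, EMS) for two measurements per subject."""
    n, k = len(test), 2
    person_means = [(t + r) / 2 for t, r in zip(test, retest)]
    grand = mean(person_means)
    ss_between = k * sum((m - grand) ** 2 for m in person_means)
    ss_within = sum((t - m) ** 2 + (r - m) ** 2
                    for t, r, m in zip(test, retest, person_means))
    # Variation between the two occasions, removed from the error in case 3.
    ss_occasion = n * sum((m - grand) ** 2 for m in (mean(test), mean(retest)))
    bms = ss_between / (n - 1)
    wms = ss_within / (n * (k - 1))
    ems = (ss_within - ss_occasion) / ((n - 1) * (k - 1))
    return bms, wms, ems


def icc_single(test, retest):
    """Return (ICC(1,1), ICC(3,1)) for a single measurement."""
    k = 2
    bms, wms, ems = mean_squares(test, retest)
    icc11 = (bms - wms) / (bms + (k - 1) * wms)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc11, icc31
```

A constant shift between occasions illustrates the difference between the two coefficients: for test values 1, 2, 3 and retest values 2, 3, 4 the ranking and spacing of subjects are perfectly consistent, so ICC(3,1) equals 1, while ICC(1,1) is lower because the shift counts as disagreement.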

To estimate the effects of measurement error in dietary intake on disease risk, we demonstrate the method of regression calibration using alcohol intake and reported high blood pressure in the questionnaire as an example. The idea behind regression calibration is to predict the true intake for each subject in the study, and to include the predicted value in a standard analysis to get corrected estimates. Alcohol was assumed to be measured with random, additive error, which was estimated from the test-retest replicates. Based on a linear calibration function for replicate data [18], the calibrated mean alcohol intake for each subject, $\hat{x}_i$, can be calculated as $\hat{x}_i = \bar{x}_{..} + \lambda(\bar{x}_{i.} - \bar{x}_{..})$, where $\bar{x}_{..}$ is the grand mean of all observations, $\bar{x}_{i.}$ is the mean of the replicate measurements for each person, and *λ* is the reliability coefficient *ICC*(*1*, *2*) [17]. Alcohol (g/day) was then included as a continuous variable in a logistic regression model for high blood pressure (yes/no). Odds ratio (OR) estimates and 95% CIs were compared for the test, the retest, the test-retest mean, and the calibrated mean for 1 g and 10 g increases in alcohol intake. To avoid the influence of measurement errors in covariates we only present the crude estimates. Most analyses were done in SAS 8.2, but the ICCs with 95% CIs were calculated in SPSS 12.0. For the regression calibration we used the *rcal* program in STATA 8.0.
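The calibration step can be sketched in a few lines. This is an illustration and not the *rcal* program: the function name is hypothetical, and it assumes that the reliability coefficient of the mean of two measurements is computed as ICC(1,2) = (BMS − WMS)/BMS in the Shrout and Fleiss notation, and that each person's mean is shrunk towards the grand mean by that factor.

```python
from statistics import mean


def calibrate_replicates(test, retest):
    """Return (lambda, calibrated means) from two replicate measurements.

    lambda is ICC(1,2), the reliability of the mean of the two replicates
    (assumed form: (BMS - WMS) / BMS from a one-way ANOVA).
    """
    n, k = len(test), 2
    person_means = [(t + r) / 2 for t, r in zip(test, retest)]
    grand = mean(person_means)
    bms = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    wms = sum((t - m) ** 2 + (r - m) ** 2
              for t, r, m in zip(test, retest, person_means)) / (n * (k - 1))
    lam = (bms - wms) / bms
    # Shrink each person's mean towards the grand mean by the factor lambda.
    calibrated = [grand + lam * (m - grand) for m in person_means]
    return lam, calibrated
```

Because the calibrated value is a linear function of the test-retest mean, a logistic regression coefficient for the calibrated mean equals the coefficient for the test-retest mean divided by *λ*, if *λ* is treated as known. With imperfect reliability (*λ* < 1) the corrected OR therefore lies further from 1 than the uncorrected one.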