
Operations Research for Health Care 1 (2012) 30–33

Contents lists available at SciVerse ScienceDirect

Operations Research for Health Care

journal homepage: www.elsevier.com/locate/orhc

Use of OR to design food frequency questionnaires in nutritional epidemiology

J.C. Gerdessen a,∗, P.M. Slegers b, O.W. Souverein c, J.H.M. de Vries c
a Wageningen University, Group Operations Research and Logistics, Hollandseweg 1, 6707 KN Wageningen, The Netherlands
b Wageningen University, Systems and Control Group, Bornse Weilanden 9, 6708 WG Wageningen, The Netherlands
c Wageningen University, Division of Human Nutrition, Bomenweg 2, 6703 HD Wageningen, The Netherlands

Article info

Article history:
Received 27 October 2011
Accepted 3 March 2012
Available online 19 April 2012

Keywords:
0–1 knapsack problem
Food frequency questionnaire
Nutrition assessment
Nutritional epidemiology

Abstract

Nutritional epidemiology, investigating the relationship between diet and disease, often uses food frequency questionnaires (FFQs) to assess a population’s habitual dietary intake. An FFQ should include enough food items (i.e. questions) to capture sufficient information on all nutrients of interest. However, it should not be too long in order to avoid the fatigue of respondents.
Although the procedure of selecting questions is done by an expert, it is neither standardized nor transparent, and very time consuming. Moreover, it is hard to select questions in such a way that all nutrients of interest are sufficiently covered within a relatively short questionnaire. The resulting questionnaire is probably not optimal, e.g. with the same number of questions more information might be obtained. We have developed a 0–1 knapsack model to optimize the selection of questions for FFQs with interest in multiple nutrients. With this FFQ model we generated FFQs with interest in energy and 9 nutrients. We found that the FFQ model can be a valuable tool to optimize FFQs. With the FFQ model the selection of questions is less time-consuming and more standardized and transparent than in a manual procedure, and the resulting food lists of FFQs are either shorter or provide more information.
© 2012 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail addresses: joke.vanlemmen@wur.nl (J.C. Gerdessen), Ellen.Slegers@wur.nl (P.M. Slegers), Olga.Souverein@wur.nl (O.W. Souverein), Jeanne.deVries@wur.nl (J.H.M. de Vries).
2211-6923/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

1. Background

In nutritional epidemiological studies, factors affecting the health of populations are identified in order to develop interventions to improve the population’s health [1]. For these studies, it is necessary to assess long-term food consumption of the population of interest. For this assessment, food frequency questionnaires (FFQs) are often used [1], because they are simple to administer, suitable to assess habitual long-term dietary intake, and have relatively low cost. FFQs consist of a limited number of questions asking respondents to give the frequency of consumption of the foods of interest during a predefined period of time, e.g. the past month or year [2]. The basis of an FFQ is a food list for which the most informative food items (i.e. questions) are selected. For a food item to be informative it must have three general characteristics [1]. First, it must be used reasonably often by an appreciable number of respondents. Second, the food must have a substantial content of the nutrient(s) of interest in each edible portion. Third, the use of the food must vary from person to person, thus contributing to the between-person variation of intake. This is necessary to rank respondents according to their intake as is required for epidemiological purposes.
On the one hand an FFQ should include enough (questions on) food items to capture sufficient information on all nutrients of interest. On the other hand an FFQ should be as short as possible, because long FFQs may bore respondents and make them less motivated to fill out an FFQ accurately [1].
Usually the goal of an FFQ is to assess the intake of multiple nutrients. As most food items contribute to the intake of multiple nutrients a shift in the choice of food items will affect the amount of information obtained on multiple nutrients. If the length of the food list is fixed then increasing the amount of information on one nutrient can be at the expense of that of another. This makes it hard to develop short food lists that provide sufficient information for each of the nutrients.
Although the selection procedure is done by experts, it could be made more transparent and less time-consuming. The aim of this study is to investigate whether Operations Research (OR) methods could be used for the selection of food items based on the provided information on multiple nutrients while taking into account the aggregation level and number of selected food items. To the best of our knowledge, this is the first time that OR methods are used for the development of FFQs. However, OR methods have been used for other nutritional purposes, e.g. to create food plans that best resemble current eating habits while meeting pre-specified nutrition and cost constraints [3], or
for identification of nutritionally adequate mixtures of vegetable oils [4]. In epidemiology OR methods have been used to support cost-effective hepatitis B interventions in the United States and China [5].
We have developed a mixed integer linear programming (MILP) model to generate food lists for an FFQ targeted for multiple nutrients. The generated food lists were compared with an actual example of an FFQ.

2. Material and methods

In this section we first describe how food items are organized in a food tree. Then we present a MILP model that generates food lists for FFQs that are targeted for multiple nutrients. We also specify the data that have been used for the experiments in Section 3.

2.1. Development of a food list: food tree

The basis of every food list is a tree structure in which all potential food items – foods and food groups – are ordered. The tree structure used has been developed by Dutch experts on dietary assessment [6]. Fig. 1 shows an illustrative and simplified part of the tree structure that comprises Fresh fruit.

Fig. 1. Simplified and illustrative part of the tree structure that comprises Fresh fruit.

Level-4 contains the items that can be found as ‘‘food codes’’ in a food composition table. Based on similarities in consumption, portion sizes and nutrient content these detailed food items are aggregated into increasingly broad groups of food items in levels 3, 2, and 1 [7]. This process of aggregation causes a loss of information, e.g. a food list that asks for Fresh fruit will provide less detailed information than a food list that asks for Citrus and Non-citrus.
Generally, more details will lead to obtaining more information regarding nutrient intake. This indicates that it is beneficial to include food items from the less aggregated groups in the food list, since they capture more information on nutrient intake.
In the tree structure several paths can be seen, all starting at level-1 and ending at level-4, e.g. the path Fresh fruit – Citrus – Orange and the path Fresh fruit – Non-citrus – Soft fruit – Cherry. Each path consists of nodes, which represent the food items. The final node in a path is called the leaf node. The leaf nodes of the aforementioned paths are Orange and Cherry respectively. Nodes on the same path as leaf i are called predecessors of leaf i. The predecessors of Cherry are Soft fruit, Non-citrus, Fresh fruit. Orange has two predecessors: Citrus and Fresh fruit. To prevent overlap no more than one item of every path can be included in the food list, e.g. if a question is included on Orange then no question can be included on its predecessors Citrus and Fresh fruit.
It is neither necessary that all food items in the food list belong to the same level of detail nor that the food list covers all items or all paths in the food tree. The part of the food list that concerns fresh fruit could be e.g.:
– Fresh fruit, or
– Citrus and Non-citrus, or
– Non-citrus fruit and Orange and Tangerine and Lemon.
Note that in the third example not all types of fresh fruits are covered, for example Grapefruit is not included.
The full food tree contains 1697 items, of which 1340 are leaf node items.

2.2. Mixed Integer Linear Programming (MILP) model

In this section we describe an MILP model to optimize the food list of FFQs. The basis for the FFQ model is the tree structure that was presented in Section 2.1.
The quality of a food list with respect to nutrient n can be quantified as the explained variance $R_n^2$ [1,6], which is not a linear function of the items in the food list. Therefore we use a heuristic measure $q_{i,n}$ in the MILP model, and evaluate the generated food lists by calculating the $R_n^2$ for all nutrients n.
We first define the index sets, parameters and variables that are used in the FFQ model.

Index sets
$N$: set of all nutrients
$I$: set of all food items, i.e. all potential questions in the FFQ
$L$: set of all leaves
$P_i$: set of all predecessors of food item i ($i \in L$).

Parameters
$q_{i,n}$: heuristic measure for the amount of information regarding nutrient n that can be obtained by including the question on food item i in the food list
$b$: budget, i.e. upper bound on the number of food items in the food list

Decision variables
\[
X_i = \begin{cases} 1 & \text{if the question on food item } i \text{ is included in the food list} \\ 0 & \text{otherwise.} \end{cases}
\]

The FFQ model is defined as follows:
\[
\text{Maximize} \quad \sum_{i \in I} \sum_{n \in N} q_{i,n} X_i \tag{1}
\]
Subject to
\[
\sum_{i \in I} X_i \le b \tag{2}
\]
\[
X_i + \sum_{j \in P_i} X_j \le 1 \qquad i \in L \tag{3}
\]
\[
X_i \text{ is binary} \qquad (i \in I).
\]
Constraint (2) ensures that the total number of items in the food list does not exceed the budget b. Constraint set (3) ensures that for every path in the tree at most one food item is included in the food list.
The decision problem of selecting food items for the food list is a 0–1 knapsack problem. In a 0–1 knapsack problem a hitchhiker wants to fill up his knapsack by selecting from among various possible objects those which will give him maximum comfort, while the total set of objects fits in his knapsack [8]. In the FFQ problem a nutritionist (the hitchhiker) wants to fill a food list (the knapsack) by selecting from among various possible food items (the objects) those which will give maximum information (comfort), while the total number of selected food items stays within a predefined budget (the volume of the knapsack). The 0–1 knapsack problem is NP-hard [8], which has the practical implication that we do not have a guarantee that an instance of real-life size can be solved in reasonable time. However, it turned out that even for the largest instance of the FFQ problem the global optimal solution was found within 6 s (see Section 3).
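
As an illustration of model (1)–(3), the following minimal Python sketch solves a toy instance by exhaustive enumeration. It is not the authors' implementation (the paper reports using Xpress-Mosel), and enumeration is only workable for a handful of items. The tree is a small assumed subset of the Fresh fruit illustration, and every value in `SCORE` is hypothetical; each score stands for the sum of $q_{i,n}$ over all nutrients n for one item.

```python
from itertools import combinations

# Assumed toy subtree of the Fresh fruit example: item -> parent (None = root).
PARENT = {
    "Fresh fruit": None,
    "Citrus": "Fresh fruit",
    "Non-citrus": "Fresh fruit",
    "Orange": "Citrus",
    "Grapefruit": "Citrus",
    "Apple": "Non-citrus",
    "Soft fruit": "Non-citrus",
    "Cherry": "Soft fruit",
    "Strawberry": "Soft fruit",
}

# HYPOTHETICAL scores: SCORE[i] plays the role of sum over n of q_{i,n}.
SCORE = {
    "Fresh fruit": 0.60, "Citrus": 0.36, "Non-citrus": 0.35,
    "Soft fruit": 0.13, "Orange": 0.30, "Grapefruit": 0.10,
    "Apple": 0.25, "Cherry": 0.05, "Strawberry": 0.10,
}

_PARENTS = set(PARENT.values())
LEAVES = [item for item in PARENT if item not in _PARENTS]  # the set L


def predecessors(item):
    """Return P_i: every node on the path from item's parent up to the root."""
    result = []
    node = PARENT[item]
    while node is not None:
        result.append(node)
        node = PARENT[node]
    return result


def feasible(selection):
    """Constraint (3): at most one selected item on each root-to-leaf path."""
    chosen = set(selection)
    for leaf in LEAVES:
        path = {leaf, *predecessors(leaf)}
        if len(chosen & path) > 1:
            return False
    return True


def solve(budget):
    """Maximize objective (1) subject to (2) and (3) by brute force."""
    best, best_value = (), 0.0
    for k in range(1, budget + 1):          # constraint (2): at most b items
        for selection in combinations(PARENT, k):
            if feasible(selection):
                value = sum(SCORE[i] for i in selection)
                if value > best_value:
                    best, best_value = selection, value
    return set(best), best_value


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, solve(b))
```

With these made-up scores, a budget of one item selects the aggregated item Fresh fruit, while larger budgets move the selection toward more detailed items, which mirrors the qualitative behaviour described for the tree structure in Section 2.1.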

2.3. Data

The parameter values $q_{i,n}$ were obtained from food consumption data of the Dutch National Food Consumption Survey of 1997/1998 of the age group of 25 up to 64 years of age [9]. The consumption data were converted into the parameter values on energy (n = 1) and nine nutrients (n = 2, 3, ..., 10; representing total protein, total fat, saturated fat, monounsaturated fat, polyunsaturated fat, total carbohydrates, mono and disaccharides, dietary fiber, and potassium) with the use of the food composition database of 1996 [10]. The percentage that a food item (i) contributes to the levels of the intake of energy and each nutrient (n) of this population represents the amount of information $q_{i,n}$ of an item $i \in L$. For example, if in a population the average total intake of dietary fiber is 30 g/day, and the average intake of dietary fiber via fresh fruit is 3 g/day then $q_{\text{fresh fruit},\,\text{dietary fiber}} = 3/30 = 0.10$. To account for loss of information by aggregation for items $i \notin L$ the $q_{i,n}$ was calculated as 90% of the information in its constituent items (i.e. the items that have i as immediate predecessor). For example
\[
q_{\text{soft fruit},n} = 0.90\,(q_{\text{strawberry},n} + q_{\text{cherry},n} + q_{\text{raspberry},n} + q_{\text{blueberry},n})
\]
and
\[
q_{\text{non-citrus},n} = 0.90\,(q_{\text{apple},n} + q_{\text{pear},n} + q_{\text{banana},n} + q_{\text{kiwi},n} + q_{\text{soft fruit},n}).
\]
The value of 90% was set after experimenting with several values for the loss of information by aggregation. For all generated food lists all $R_n^2$ were calculated. It turned out that in general the value of 90% showed good performance in terms of $R_n^2$.

Fig. 2. Effect of budget b on explained variance $R_n^2$.
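
The aggregation rule of Section 2.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' data pipeline: the subtree follows the two examples given in the text, while the intake figures in `LEAF_INTAKE` are hypothetical and chosen only to make the arithmetic easy to follow. The total of 30 g/day is taken from the dietary fiber example above.

```python
AGGREGATION_FACTOR = 0.90  # share of information kept per aggregation step (Section 2.3)

# Subtree taken from the two worked examples: aggregated item -> constituent items.
CHILDREN = {
    "Non-citrus": ["Apple", "Pear", "Banana", "Kiwi", "Soft fruit"],
    "Soft fruit": ["Strawberry", "Cherry", "Raspberry", "Blueberry"],
}

# HYPOTHETICAL average dietary fiber intake via each leaf item, in g/day.
LEAF_INTAKE = {
    "Apple": 1.2, "Pear": 0.3, "Banana": 0.6, "Kiwi": 0.1,
    "Strawberry": 0.4, "Cherry": 0.1, "Raspberry": 0.2, "Blueberry": 0.1,
}
TOTAL_INTAKE = 30.0  # average total dietary fiber intake, g/day (example in the text)


def q(item):
    """Return q for one nutrient: intake share for leaves, 90% of children otherwise."""
    if item not in CHILDREN:
        # Leaf item: fraction of the population's total intake that it contributes.
        return LEAF_INTAKE[item] / TOTAL_INTAKE
    # Aggregated item: 90% of the information in its immediate constituents.
    return AGGREGATION_FACTOR * sum(q(child) for child in CHILDREN[item])


if __name__ == "__main__":
    for name in ("Strawberry", "Soft fruit", "Non-citrus"):
        print(name, round(q(name), 4))
```

Because the rule is applied at every level, the loss compounds: a soft fruit leaf such as Strawberry contributes only 0.90 × 0.90 = 0.81 of its own information to the score of Non-citrus.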

2.4. Comparison with the ValNed food frequency questionnaire

We compared the length of generated food lists and their $R_n^2$ with those of an actual FFQ, the so-called ValNed questionnaire [7]. This questionnaire was developed for the same nutrients, and with the same data source as the food lists of the MILP model. For constructing the food list of ValNed, the minimum set of food items was selected that yielded at least 80% of the variance in intake of each of the nutrients of interest using the second moment of the nutrient intake distribution [6,11], and at least 70% of the level of nutrient intake of the total population. No additional adaptations to improve the questionnaire, such as adding extra food items of which an expert assumes they will improve the food list, were made. After compilation, the ValNed food list consisted of 117 items.

3. Results

The FFQ model was used to generate food lists for an FFQ with interest in energy and nine nutrients. Food lists were generated for budget b = 10, 20, ..., 200 items. Standard optimization software (Xpress-Mosel 7.0.1) was used to obtain the global optimal solution.
For all generated food lists the $R_n^2$ was calculated for all nutrients n. Fig. 2 shows how $R_n^2$ increases with growing budget b. It can be seen that a budget of 30 food items suffices to assure that the lowest $R_n^2$ is 68% and the average $R_n^2$ is 79%. Doubling the budget to 60 questions increases the lowest $R_n^2$ to 82% and the average $R_n^2$ to 87%. In other words: if the nutritionist judges that for each nutrient the $R_n^2$ should at least be 68% then a food list of at least 30 items is needed. If the nutritionist decides that for each nutrient n the $R_n^2$ should at least be 80% then the length of the food list needs to be doubled to 60 items.
The graph shows the tradeoff between budget b and $R_n^2$, thus providing important information for the nutritionist. A graph like this helps the nutritionist to weigh the amount of added information against the number of extra questions needed. It provides objective information that can help the nutritionist to make a judgment about whether the extra information justifies the additional burden of answering more questions for the respondents.
Fig. 2 also shows the lowest $R_n^2$ and the average $R_n^2$ of ValNed (117 items). Both are lower than those of the food lists generated with the MILP model.
Fig. 3 shows the impact of the budget b on the type of food items selected for the FFQ. Both the absolute and the relative number of leaf node items grow with growing budget b, because a larger budget allows selection of more detailed food items, and thus the selection of relatively many leaf node items. ValNed uses fewer leaf node items than the MILP-generated food lists.

Fig. 3. Effect of budget b on the absolute and relative number of leaf node items.

4. Discussion

This paper presents a starting point for OR-supported development of food lists for FFQs. The generated food lists seem to have good performance in terms of $R_n^2$ and the number of food items used. The lists generated by the presented MILP model can be shortened by adding nutrient specific lower bound constraints, which is an interesting topic for future research. In order to
compose the questions for the final FFQ it may be necessary to add a few food items manually, for example if the model has excluded or aggregated items through which respondents may encounter problems in filling out the FFQ.
OR models appear to be suitable to optimize the selection of food items for the food list of a food frequency questionnaire. They may allow improvement of assessment of dietary intake by FFQs in epidemiological studies. The developed FFQ model allows nutritionists to objectively choose the most informative combination of food items from different levels of aggregation within a pre-defined number of food items. Also, the nutritionist is able to generate food lists of various lengths and to see how much extra information can be obtained by adding more or other items to the food list. This helps the nutritionist to weigh the amount of obtained information against the burden for respondents.
The major advantages are that FFQs can be generated in a faster, more standardized, and more transparent way and that FFQs are either shorter or provide more information than manually generated FFQs with the same number of food items. The procedure is highly reproducible, which makes food consumption data from different surveys more comparable. Based on the information provided by the model and the objectives of the nutrition survey, the nutritionist can select food items for multiple nutrients more objectively. In addition, the procedure can be included in a computer system.
In the future, other types of heuristic measures for the information content $q_{i,n}$ of food items can be investigated. Another interesting issue is the tradeoff between the quality of the FFQ and the number of respondents that need to complete the survey.

Acknowledgments

The authors thank Marja Molag and Saskia Meyboom for their assistance in data processing.

References

[1] W. Willett, Nutritional Epidemiology, Oxford University Press, New York, 1998.
[2] J. Cade, R. Thompson, V. Burley, D. Warm, Development, validation and utilisation of food-frequency questionnaires – a review, Public Health Nutrition 5 (2002) 567–587.
[3] G. Masset, P. Monsivais, M. Maillot, N. Darmon, A. Drewnowski, Diet optimization methods can help translate dietary guidelines into a cancer prevention food plan, J. Nutr. 139 (2009) 1541–1548.
[4] N. Darmon, M. Darmon, E. Ferguson, Identification of nutritionally adequate mixtures of vegetable oils by linear programming, J. Hum. Nutr. Diet. 19 (2006)
[5] D.W. Hutton, M.L. Brandeau, S.K. So, Doing Good with Good OR: supporting cost-effective hepatitis B interventions, Interfaces 41 (2011) 289–300.
[6] M.L. Molag, J.H.M. de Vries, N. Duif, M.C. Ocké, P.C. Dagnelie, R.A. Goldbohm, P. van 't Veer, Selecting informative food items for compiling food-frequency questionnaires: comparison of procedures, Br. J. Nutr. 104 (2010) 446–456.
[7] M.L. Molag, Towards transparent development of food frequency questionnaires, in: Human Nutrition and Epidemiology, Wageningen University, Wageningen, The Netherlands, 2010.
[8] S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations, Wiley, Chichester, 1990.
[9] The Dutch Nutrition Centre, Zo eet Nederland: Resultaten van de Voedselconsumptiepeiling 1997–1998, Results of the Dutch Food Consumption Survey 1997/1998, Voedingscentrum, Den Haag, 1998 (in Dutch).
[10] NEVO, Nederlands Voedingsmiddelentabel (Dutch Food Composition Table), De Commissie Nederlandse Voedingsmiddelentabel van de Voedingsraad, Den Haag, 1996 (in Dutch).
[11] S.D. Mark, D.G. Thomas, A. Decarli, Measurement of exposure to nutrients: an approach to the selection of informative foods, Am. J. Epidemiol. 143 (1996) 514–521.
