I. Introduction
Mapping of the human genome and the subsequent advances in molecular biology
forever changed epidemiologic research on disease etiology. (Refer to Figure 14-1 for a
photograph of the deoxyribonucleic acid [DNA] helix.) Gone are the days when
measurements of exposure were limited to simple interview data, mailed questionnaires,
inspection of secondary data, and surrogate measures of the primary exposure of interest.
Descriptive epidemiology and disease monitoring and surveillance remain
important applications of epidemiology. However, modern epidemiologists find
themselves armed with several new strategies to assess precursors of disease, identify
biologic markers of exposure, and search for the biologic bases for responses. The wide
differential in human responses to the same environmental exposure is an intriguing issue
for epidemiology that may be explored by using these advanced techniques.
The traditional epidemiologic approach characterized by examination of the
distribution of health conditions in populations and discerning risk factors for them has
proved useful for generating hypotheses and unraveling disease etiologies. However,
suppose that it were possible to go beyond these methods and look inside the black box
of disease processes. If this black box were to become transparent, epidemiologists would
be able to change the definition of risk factors or clarify their location in a causal model.
This chapter presents an overview of fundamental principles of molecular and
genetic epidemiology. For readers without a strong background in biology, we begin with
a review of basic principles of human genetics (Exhibit 14-1). Next, we will define the
terms genetic epidemiology and molecular epidemiology, describe epidemiologic approaches
to identify genetic components of disease, provide an overview of some strategies to identify genes in
studies of families, and present a description and recent results from genome-wide
association studies (GWAS). The chapter includes
Genetic Epidemiology
The field of genetic epidemiology is devoted to the identification of inherited
factors that influence disease, and how variation in the genetic material interacts with
environmental factors to increase (or decrease) risk of disease. The first textbook on the
subject defines it as a "discipline that seeks to unravel the role of genetic factors and their
interactions with environmental factors in the etiology of diseases, using family and
population study approaches." An important premise is that a better understanding of the
genetic etiology of disease can facilitate early detection in high-risk subjects and the
design of more effective intervention strategies. The unifying theme of genetic
epidemiology is the focus on genes and evidence for genetic influences. Note that to
answer questions two through four, families (or at least pairs of relatives) were
historically required. The approach of using related persons is quite different from
traditional epidemiology, in which the subjects under study are
independent. When study subjects are biologically related, by definition they are no longer
independent. Lack of independence necessitates special rules in selection of subjects and
analytic approaches. The epidemiologic approach to identify genetic factors that
influence disease does not require prior knowledge about the pathophysiologic process
that underlies the inherited susceptibility. Rather, given that the clustering of a disease in
families is not due to shared environment and is consistent with Mendelian transmission
of a major gene, the goal is to identify regions of DNA that cosegregate (are inherited in
the same pattern) with the disease of interest. Once the chromosomal region is narrowly
defined, the torch is passed to molecular geneticists to identify the appropriate gene using
highly specialized techniques. This approach, called positional cloning or physical
mapping, contrasts with a more traditional laboratory approach called functional cloning
or functional mapping. The latter does not require epidemiology or families and is based
instead on identification of proteins that are involved in a disease process. Once a protein
has been identified, scientists can determine its amino acid sequence. Working backward,
the researcher is able to decipher the DNA code for the sequence of amino acids. Finally,
the investigator finds where this DNA sequence occurs in the human genome. Note that
physical mapping has historically only been applied when the genetic influence on
disease is great enough that there will be a Mendelian pattern of disease in the family.
These contrasts in approaches are depicted in Figure 14-2, adapted from a review on the
subject by Dr. Francis Collins. Collins led the Human Genome Project, an ambitious
public/private venture to determine the full sequence of the roughly 3 billion nucleotides
in our DNA, and served as Director of the National Human Genome Research Institute.
GWAS have clearly demonstrated that they can identify susceptibility loci with modest
effect sizes. Drilling down to find the actual gene in a region of interest has traditionally
been done in families. However, with advances in genotyping, it is now possible to
refine the causal genetic variant through the study of unrelated
individuals. Thus, the once clear lines between molecular and genetic epidemiology are
beginning to blur somewhat.
Molecular Epidemiology
Basically, greater precision in estimating exposure-disease associations can be
achieved by using molecular biology to improve the measurement of exposures and disease.
The term molecular epidemiology has been attributed to researchers Perera and
Weinstein. Molecular epidemiology has the possibility of providing early warnings for
disease by flagging preclinical effects of exposure. The field is much broader than genetic
epidemiology and includes a wide variety of biologic measures of exposure and disease.
As it relates to her research on the causes of cancer, Perera noted that molecular
epidemiology combines "advances in the molecular biology and molecular genetics of
cancer with epidemiology to understand the molecular dose of specific agents, their
preclinical effects, and the biologic factors that modulate susceptibility to their exposure."
Many definitions of molecular epidemiology include the concept of biomarkers.1
Consider the following examples:
1. Rather than rely on individual recall of a usual diet to classify individuals according
to intake of fruits and vegetables, assess serum levels of micro-nutrients to obtain
more precise measurements of intake of fruits and vegetables.
2. Rather than conduct a clinical trial with colon cancer as the end point, use an
intermediate marker (an accepted precursor lesion: the adenomatous polyp).
3. Rather than treat all cases of breast cancer as the same disease, use tumor markers to
identify potentially more heterogeneous subsets.
4. In trying to identify whether clusters of cases of infectious disease are from a
common source, characterize the agents according to their DNA fingerprint.
From these examples, the reader should notice that the markers (exposures) are based
on biologic specimens (e.g., blood, tissue, urine, and sputum) rather than
questionnaire or medical records data. As stated earlier, the terms molecular
epidemiology and genetic epidemiology are often used interchangeably. One reason is
that molecular epidemiology commonly measures inherited variation in DNA (as opposed
to acquired variation, that is, somatic mutations in our DNA) to classify subjects. Thus, when
genes are involved, there is an overlap between molecular and genetic epidemiology. One
distinction between the two is that molecular epidemiology does not involve studies of
biologically related individuals. Another distinction is that most molecular epidemiologic
studies are conducted to evaluate the significance of variation in genes that would not
necessarily manifest as Mendelian patterns of disease in a family. As molecular biologists
and molecular geneticists work to unravel disease processes, they discover various
proteins that are involved. Genes determine these proteins; if individuals differ from one
another in the genetic sequence of a protein that is functionally involved in the disease
process, then evaluation of this genetic variation in epidemiologic studies could yield
important insights into disease etiology.
Molecular Versus Genetic Epidemiology
Genetic epidemiology: concerned with inherited factors that influence risk of disease
Molecular epidemiology: uses molecular markers (in addition to genes) to establish
exposure-disease associations.
Bad Luck
The first of the explanations for the familial aggregation of disease
is simply chance. Given that there is a finite probability for the development of a
particular disease or adverse health phenomenon, even in the absence of any genetic
contribution at all, the disease may afflict several members of the same family. This
occurrence is especially prevalent for the common diseases of major public health
importance, such as obesity, mental illness, heart disease, and cancer. For example, an
early study of the aggregation of cancer used cancer mortality rates for the adult British
population to show that half of all families with more than five adults would have at least
one case of cancer by chance alone. This is conceptually similar to the concept of
clustering, but with the grouping based on family rather than space or time.
Two other factors not necessarily a simple reflection of "bad luck" would affect the
likelihood that someone reports a positive family history: age and family size. Although
every person has at least two first-degree biologic relatives (our parents), the person's
numbers of siblings, aunts, uncles, cousins, and children are clearly random variables.
For example, someone who has 15 close relatives at risk for a common chronic disease is
more likely to have a relative with the disease than a person with only two relatives at
risk. Similarly, because age is the single most important risk factor for many diseases of
public health importance, an older person (with older relatives) is more likely than a
younger person (with correspondingly younger relatives) to have a family history for
nongenetic reasons. Consequently, adjustment for age and family size in the analysis is
encouraged when one is trying to assess the association of family history with risk of a
particular disease.
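The effect of family size on chance aggregation can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from the chapter: it assumes that relatives develop disease independently of one another and that every relative has the same lifetime risk, which ignores the age effect just described. The function name and the 10% lifetime risk are hypothetical values chosen only for illustration.

```python
def prob_any_affected(lifetime_risk: float, n_relatives: int) -> float:
    """Probability that at least one of n relatives is affected by chance alone.

    Assumes (for illustration only) independent relatives with equal risk.
    """
    if not 0.0 <= lifetime_risk <= 1.0:
        raise ValueError("lifetime_risk must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    # Complement of "no relative is affected".
    return 1.0 - (1.0 - lifetime_risk) ** n_relatives


ASSUMED_LIFETIME_RISK = 0.10  # hypothetical value, not from the text

# Compare a person with 2 relatives at risk to one with 15 relatives at risk.
for n in (2, 15):
    p = prob_any_affected(ASSUMED_LIFETIME_RISK, n)
    print(f"{n:2d} relatives at risk: P(positive family history) = {p:.3f}")
```

Under these assumptions the probability of a positive family history rises steadily with the number of relatives at risk, even though no genetic contribution has been built into the calculation.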
Bad Environment
Epidemiologists historically have devoted their energies and attention to the
identification of nongenetic risk factors for disease. As illustrated throughout the other
chapters of this book, history is replete with many success stories. This information on
exposure-disease relationships must be considered as an explanation for any observed
clustering of disease in families. For example, although roughly 95% of all lung cancer
cases are current or former smokers, fewer than 20% of heavy smokers ever develop the
disease. This observation has caused some to hypothesize that host factors might
influence response to environmental agents (tobacco). A complicating factor in the study
of a disease with such a strong environmental risk factor is the extent to which family
members also smoke cigarettes. Twin studies conducted in the 1930s provided strong
evidence that smoking habits clustered in families. Therefore, if family members of cases
were more likely to smoke than family members of controls, a greater proportion of cases
would be expected to report a positive family history of lung cancer. In this situation,
however, clustering of cases in families could be due to shared lifestyle, rather than
shared genes.
Studies of migrants from low-risk to high-risk countries clearly support the notion
that aspects of diet are associated with coronary heart disease and cancer. Diets low in
complex carbohydrates and fruits and vegetables are associated with diabetes and cancer.
Several studies suggest the familial nature of dietary intake patterns.16,17 To the extent
that family members share dietary habits that increase risk for a given disease, familial
clustering of disease may occur.
Similarly, health-related behaviors, such as exercise practices, alcohol intake, and
use of sunscreen, may be learned within a family and indirectly relate to clustering of
disease. Exercise levels are related to avoidance of overweight, promotion of bone
strength, and cardiovascular health. Moderate alcohol consumption, especially intake of
red wine, is thought to be protective for the heart; however, overconsumption is
associated with adverse health outcomes and car crashes among drivers who are driving
under the influence. Finally, avoidance of excessive sun exposure reduces the occurrence
of many forms of skin cancer. The foregoing are examples of some of the many health-
related practices that may be transmitted within the family environment. Many risk
factors shared by family members are a reflection of shared environment, such as water
supply, radon from the soil, air quality, pesticides, lead paints, and even occupation. A
case report of a family with four members with mesothelioma, a rare cancer of the lining
of the peritoneal cavity, was traced back to a common occupational exposure to asbestos.
Infectious diseases also may be included in this category, and there are a number of
published examples in which familial clustering of hepatitis or tuberculosis occurs from
common exposures.
Proband: the individual in a family who brings a disease of interest to the attention of the
investigator.
In a case family, the proband is likely to be a person affected with the disease (although one
can select families on the basis of an unusual family history among healthy study
participants). The proband in a control family is the subject matched to the case. The
definition of family also must be clearly stated: Three-generation families (or more) are
most valuable for elucidation of genetic mechanisms, but this comes at the expense of
less complete data and greater difficulty validating medical histories and collecting
biological specimens. For diseases with late ages at onset, parents might be deceased and
offspring will not be informative, owing to their youth. Therefore, some study designs
include only siblings of probands as families.
As Opposed to Most Types of Epidemiologic Study Designs, Two Steps (Rather Than
Only One) Are Required to Establish the Sampling Frame for a Case Control Family
Study:
Step 1: Ascertainment of cases (probands) and controls
Step 2: Enumeration of the relatives of the probands and controls
The purpose of control families derives from the need to be able to evaluate
whether familial clustering among cases is greater than can be expected. This
consideration is especially important when there are no population-based rates to
calculate the expected number of disease events in case families. Control families also are
needed to be able to rule out common familial (measured) exposure as an explanation for
any observed familial aggregation. Control families must be as similar as possible to the
case families for all other (unmeasured) environmental exposures in order to determine
properly whether or not a disease or trait truly has a genetic component (Exhibit 14-2).
Control families can be identified from several possible sources. One would be the same
disease registry, clinic, or hospital from which the probands were drawn, but with
sampling based on a different disease. Another choice is random selection from the
general population (e.g., neighborhood controls). Relatives of the proband's spouse have
been used as controls since the beginning of the 20th century. This clever approach
"matches" families on unmeasured lifestyle, economic status, education, and even
religious influences, because likes tend to marry likes. Although the primary focus of this
section has been on the selection of families through a proband with a disease of interest,
other strategies may be considered. For example, a random sample of families from a
cross-sectional study might be considered if one is interested in the genetic epidemiology of a
common biologic trait or process, such as steroid hormone levels or eating behaviors.
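The comparison of case families with control families described above can be sketched as a simple two-by-two analysis. This is an illustrative sketch only: the counts are hypothetical, the function name is an assumption, and the Woolf (log-based) confidence interval is a common textbook choice rather than a method named in this chapter. The crude odds ratio shown here does not adjust for age or family size and does not account for the lack of independence among relatives.

```python
import math


def family_history_odds_ratio(cases_pos, cases_neg, controls_pos, controls_neg, z=1.96):
    """Crude odds ratio for a positive family history, cases versus controls.

    Returns (odds_ratio, lower, upper) using Woolf's log-based interval.
    All four cell counts must be positive.
    """
    cells = (cases_pos, cases_neg, controls_pos, controls_neg)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (cases_pos * controls_neg) / (cases_neg * controls_pos)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: probands (cases) and matched controls classified by
# whether any enumerated relative has the disease.
or_, lo, hi = family_history_odds_ratio(
    cases_pos=40, cases_neg=160, controls_pos=20, controls_neg=180
)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1.0 would suggest more familial clustering among cases than among controls, but it cannot by itself separate shared genes from shared environment.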
Modes of Inheritance
The mode of inheritance refers to the association between the number of mutated
alleles at a locus and a given phenotype (disease or trait) and whether or not the locus is
situated on one of the 22 autosomes or on a sex chromosome. Consider the simple
situation in which there are only two possible alleles at a locus, B and b. Furthermore,
suppose that allele B is associated with disease. Because humans have two alleles at a
locus, the three possible genotypes are therefore BB, Bb, or bb. An individual's genotype
is determined by the genotypes of his or her parents and the particular two alleles that
were passed on at meiosis. According to the principles of Mendelian inheritance, the
alleles transmitted from parents to offspring are randomly determined. Consequently, a
parent whose genotype is Bb will transmit, on average, the B allele 50% of the time and
the b allele the other 50%. Parents who are either BB or bb are capable of transmitting
only the B and b alleles, respectively.
Another way of stating Mendelian transmission is in terms of the probability that the
deleterious gene is passed from a parent to child. The probabilities that BB, Bb, and bb
parents transmit a B allele are 1.0, 0.5, and 0.0, respectively. Figure 14-4 illustrates a
Punnett square, named after the English geneticist Reginald C. Punnett, who devised the
approach to determine the probability of an offspring having a particular genotype. The
Punnett square is a tabular summary of every possible combination of one maternal allele
with one paternal allele for each gene being studied and thus is a visual representation of
Mendelian inheritance.
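The Punnett square described above can be reproduced in a few lines of code. The sketch below is illustrative: the function name is hypothetical, genotypes are written as two-character strings using the B and b alleles from the text, and each parental allele is assumed to be transmitted with probability 0.5, as Mendelian inheritance predicts.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett_square(mother: str, father: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities at one locus, e.g. punnett_square("Bb", "Bb")."""
    if len(mother) != 2 or len(father) != 2:
        raise ValueError("each genotype must consist of exactly two alleles")
    # Pair every maternal allele with every paternal allele (the four cells
    # of the square); sorting makes "bB" and "Bb" the same genotype.
    cells = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(cells.values())
    return {genotype: Fraction(count, total) for genotype, count in cells.items()}


for mother, father in (("Bb", "Bb"), ("Bb", "bb"), ("BB", "bb")):
    result = punnett_square(mother, father)
    summary = ", ".join(f"{g}: {p}" for g, p in sorted(result.items()))
    print(f"{mother} x {father} -> {summary}")
```

Exact fractions are used instead of floating-point numbers so that the classic ratios remain readable.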
Some diseases are referred to as autosomal dominant or recessive. The term
autosomal dominant refers to the situation in which only a single copy of an altered gene
located on a non-sex chromosome is sufficient to cause an increased risk of disease. In
these situations, one typically expects to see an affected parent and, on average, half of
his or her offspring affected with the same disease or condition. Autosomal recessive
diseases denote those for which two copies of an altered gene are required to increase risk
of disease. Carriers, individuals who have only one copy of the altered gene
(heterozygotes), are typically not thought to be at increased risk of the disease. Matings
of two carriers will produce, on average, affected children one-fourth of the time.
Traits are said to be genetically additive or codominant when heterozygotes have a
phenotype that is distinguishable from the two homozygous states. Codominance may
occur for both quantitative and qualitative traits. Codominance is easy to imagine for a
quantitative trait such as blood pressure or enzyme activity that facilitates categorizing
individuals into one of three levels (say high, medium, or low enzyme activity). For a
qualitative trait (e.g., presence or absence of a type of cancer), the effect may be on the age
at onset of disease. Homozygous carriers would have an earlier mean age at onset than
heterozygotes, who in turn have an earlier mean age at onset than noncarriers of an
altered allele. Although disease-predisposing genes may be carried on the X or the Y
chromosome, there are very few known examples of the latter. X-linked traits are passed
from women to sons or daughters, but from men only to daughters (because fathers pass
their Y chromosome to sons). X-linked recessive traits affect men much more frequently
than women, because men need only have the gene on their one X chromosome mutated
to be in the recessive state at that locus.
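The X-linked transmission pattern just described can be checked by enumerating the possible offspring. This is a minimal sketch under stated assumptions: the function name is hypothetical, "A" denotes the normal allele and "a" the mutated recessive allele (labels chosen for illustration), the mother's genotype is given as her two X-linked alleles, and the father's as his single X-linked allele.

```python
from collections import Counter
from fractions import Fraction


def x_linked_offspring(mother_x: str, father_x: str):
    """Return (sons, daughters) genotype probabilities for an X-linked locus."""
    if len(mother_x) != 2 or len(father_x) != 1:
        raise ValueError("mother needs two X alleles and father exactly one")

    def to_probabilities(counts: Counter) -> dict[str, Fraction]:
        # Each of the mother's two X chromosomes is transmitted half the time.
        return {genotype: Fraction(n, 2) for genotype, n in counts.items()}

    # Sons receive one maternal X and the paternal Y.
    sons = Counter(m + "Y" for m in mother_x)
    # Daughters receive one maternal X and the paternal X.
    daughters = Counter("".join(sorted(m + father_x)) for m in mother_x)
    return to_probabilities(sons), to_probabilities(daughters)


# Carrier mother, unaffected father.
print(x_linked_offspring("Aa", "A"))
# Non-carrier mother, affected father: the allele reaches daughters only.
print(x_linked_offspring("AA", "a"))
```

In the second mating every daughter is a carrier and no son inherits the mutated allele, which matches the statement that men pass X-linked traits only to daughters.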
Groopman et al. pointed out that epidemiologists are concerned with the
potential associations between exposure variables and health effects. Before the advent of
molecular epidemiology, it was not feasible to quantify accurately the relationships
among exposure, dose, and health outcomes. Molecular biomarkers may be used to
denote exposures, outcomes, or vulnerability to disease because they are more sensitive
indicators than other measures of exposure. An example of the use of a biomarker in
environmental health studies is the p53 gene in liver tumors. The p53 gene is thought to
suppress tumor formation. Researchers in China examined the association between
mutations in the gene and exposure to the hepatitis B virus and aflatoxin. They found that
half of a small sample of liver cancer patients indeed had acquired mutations in this gene
in their tumor. This research and related studies suggest that hazardous exposures may
induce genetic changes that in turn are linked to cancer or other diseases.
Tailored Interventions
Many epidemiologists traditionally have ignored the possibility that individuals
respond differently to the challenges in the environment. Genetic epidemiology is
predicated upon the notion first stated by Galen in 200 AD: "But remember throughout that
no cause is efficient without a predisposition of the body itself. Otherwise, external
causes which affect one would affect all." Indications that individuals differ in response
to an intervention have been around for decades. For example, Garrod, the founder of
human biochemical genetics, and Haldane, the great British geneticist, had both
suggested that biochemical individuality might explain unusual reactions to drugs and
foods. We now know that the enzymes involved in these responses are genetically
determined, and many are polymorphic. Indeed, this area (known as pharmacogenetics) is
seeing a resurgence as pharmaceutical companies race to determine ways to tailor
prescriptions and doses to patients for therapeutic (and ultimately chemopreventive)
purposes.
Continuing with our example of colorectal cancer, the epidemiology suggests
several dietary strategies for risk reduction. For example, diets rich in fruits, vegetables,
fiber, and calcium are associated with lower risk for the disease. Conversely, diets high
in meat and fat are positively related to risk. It is tempting to speculate that dietary
associations observed at the population level would translate to effective risk-reduction
strategies for individuals at high risk for familial or genetic reasons. However, if the
biology is different for hereditary and nonhereditary colorectal cancer, then such an
approach may not be prudent. For example, a clinical trial of calcium among 30 members
of families with hereditary nonpolyposis colorectal cancer found no significant effect of
calcium on cell proliferation in the colon. Sellers et al., utilizing the Iowa Women's
Health Study cohort, observed that high intakes of calcium were associated with 50%
decreased risk of colon cancer (95% CI, 0.3-0.7) among women without, but not with
(relative risk, 1.2; CI, 0.6-2.2), a family history of the disease. In addition, high total
intakes of vitamin E had been previously associated with lower risk in this cohort. High
total vitamin E intakes were observed to decrease risk of colon cancer by 33% among
women without a family history (upper vs. lower tertile), but only a non-statistically
significant 10% among the family history-positive subset. These results suggest that care
should be exercised before implementing a risk reduction intervention among subjects at
elevated risk for disease based on their family history.
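The calcium results quoted above can be read more easily by converting each relative risk to a percent change and checking whether its confidence interval includes 1.0. The sketch below is illustrative only: the function names are hypothetical, and the relative risk of 0.5 for women without a family history is inferred from the "50% decreased risk" wording rather than stated directly in the text.

```python
def percent_risk_change(relative_risk: float) -> float:
    """Percent change in risk implied by a relative risk (negative = reduction)."""
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return (relative_risk - 1.0) * 100.0


def ci_excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the confidence interval does not contain the null value."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return not (lower <= null <= upper)


# (label, relative risk, CI lower, CI upper) for high calcium intake.
calcium_results = [
    ("no family history", 0.5, 0.3, 0.7),  # RR inferred from "50% decreased risk"
    ("family history", 1.2, 0.6, 2.2),
]

for label, rr, lower, upper in calcium_results:
    change = percent_risk_change(rr)
    verdict = "excludes" if ci_excludes_null(lower, upper) else "includes"
    print(f"{label}: {change:+.0f}% change in risk; interval {verdict} 1.0")
```

The interval for women without a family history lies entirely below 1.0, whereas the interval for women with a family history spans 1.0, consistent with the caution expressed in the text.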
XI. Conclusion
This chapter covered the topics of genetic and molecular epidemiology. These fields
are truly on the cutting edge of epidemiology as a result of their promise to provide
insights into the black box of disease etiology. We reviewed concepts and applications of
the field in diverse areas from infectious diseases to occupational health to the
epidemiology of cancer and other chronic diseases. Progress in the field will require
training a new generation of scientists with requisite skills, as well as greater
collaboration and interdisciplinary work among scientists with laboratory skills and those
versed in genetics who have training in analytic epidemiology. Scientific discoveries
utilizing these methods will not only improve our understanding of disease etiology, but
may lead to better and more tailored approaches to screening for disease and primary and
secondary prevention.