Beyond Gene Discovery

Workshop:
Developing the Genome Profile of the U.S.
Population and Assessing the Role of Genetic
Variation in Health and Disease

March 3, 2008
8:30 a.m. to 5:00 p.m.

Distance Learning Auditorium


Tom Harkin Global Communications Center
Centers for Disease Control and Prevention
1600 Clifton Road NE
Atlanta GA 30333
BEYOND GENE DISCOVERY WORKSHOP
Developing the Genome Profile of the US Population and Assessing the
Role of Genetic Variation in Health and Disease

March 3, 2008

Purpose of the workshop


To review CDC plans to launch a far-reaching initiative to assess the genome profile of
the U.S. population using nationally representative samples and to provide access to
research datasets for investigators interested in genotype-phenotype analyses.

The workshop will convene CDC programs, federal partners, academia, and the private
sector to review the Beyond Gene Discovery (BGD) plans, discuss analytic issues and
develop solutions regarding models of access to research datasets that maintain
human subjects protections.

Background and Public Health Impact


CDC and the CDC Foundation are launching a new initiative, Beyond Gene Discovery
(BGD), in collaboration with public, private, and academic partners, to assess
population genetic variation in the United States in relation to health and disease
and develop strategies for using genetic information to impact health and eliminate
disparities among population groups.

BGD has the following overarching goals to be accomplished in 3 years:

1. Produce the first comprehensive report of the Genome Profile of the United States
population, using data from the National Health and Nutrition Examination Survey
(NHANES).
2. Support the development of a CDC searchable online information system of
human genome variation: allele, genotype and haplotype frequencies at individual
and multiple genetic loci readily accessible to researchers, healthcare providers and
policy makers.
3. Develop and disseminate a comprehensive agenda for population research that
will help fill the gaps between gene discoveries and health benefits of genomic
information.
4. Enhance informatics and analytic capacity to analyze complex data as well as
develop datasets for access by researchers that link relevant genetic test results and
NHANES interview, examination and laboratory measurements.

BGD will offer an opportunity for an unprecedented look at interactions among
numerous genetic variants, environmental exposures and behavioral factors contained
in the clinical, biochemical and metabolic profiles of a large number of people of a
wide range of ages. The research made possible by BGD will enhance the value of
many ongoing gene discovery studies, helping to translate their findings into new
targets for prevention, diagnosis, and treatment of common diseases.
BEYOND GENE DISCOVERY WORKSHOP
AGENDA
Session 1: Genomics 2008: NHANES as a public health scientific treasure 
Moderator, Julie Gerberding

8:30-8:45 Welcome and charge to the group
Julie Gerberding, CDC Director
8:45-9:15 Keynote speech: Genomics 2008: The era of genome-wide association studies
Francis Collins, Director, National Human Genome Research Institute
9:15-9:45 Public Health Genomics 2008: Beyond Gene Discovery
Muin Khoury, Director, National Office of Public Health Genomics
9:45-10:00 Break

Session 2: NHANES in the genomics era--current practices and future options for Beyond
Gene Discovery
Moderator, Kathleen Toomey

10:00-10:20 CDC’s approach to Beyond Gene Discovery
Nicole Dowling, National Office of Public Health Genomics
10:20-10:50 NHANES: Overview, statutory context for data access, current and proposed new
options for data access
Jennifer Madans, National Center for Health Statistics
10:50-11:00 Q&A

Session 3: Analytic and statistical considerations for NHANES genome-wide association
studies - Co-moderators, Laura Scott & John Witte

11:00-12:00 Discussion of the analytic challenges of NHANES genomic data, in the context of
the proposed access options, and development of a plan to address these
challenges

Issues to include: genotyping quality control; detection of and adjustment for population
stratification; assessment of structural variants; statistical methods for evaluating the relationship
between the variations in genomic structure and function; statistical analysis of genetic
associations, gene-gene and gene-environment interactions; statistical methods appropriate for
analysis of weighted sample survey data.

12:00-1:00 Lunch break


1:00-2:00 Session 3 (continued): Analytic and statistical considerations discussion

Session 4: Policy options to promote access to NHANES genomic data while protecting
privacy and confidentiality - Co-moderators: Ellen Clayton & William Lowrance

2:00-4:00 Discussion of human subjects protections considerations regarding access to
genomic data collected by federal statistical agencies, in the context of the
proposed access options for NHANES genomic data.

Issues to include: identifying and meeting analytic needs while meeting privacy and
confidentiality requirements; the advantages and disadvantages of different access mechanisms;
alternative data access models and options to consider.

4:00-4:30 Summary and action items
Muin Khoury, Director, National Office of Public Health Genomics
Beyond Gene Discovery Workshop
Participant List
Presenters:

Francis Collins, MD, PhD
National Human Genome Research Institute, NIH

Muin Khoury, MD, PhD
National Office of Public Health Genomics, CDC

Nicole Dowling, PhD
National Office of Public Health Genomics, CDC

Jennifer Madans, PhD
National Center for Health Statistics, CDC

Session Moderators:

Julie Gerberding, MD, MPH
Centers for Disease Control and Prevention

Kathleen Toomey, MD, MPH
Coordinating Center for Health Promotion, CDC

Laura Scott, PhD
University of Michigan

John Witte, PhD
University of California, San Francisco

Ellen Clayton, MD, JD
Vanderbilt University

William Lowrance, PhD
Consultant in Health Policy

CDC’s NHANES DNA Bank Advisory Council:
(Not all members will be in attendance.)

Tanja Popovic, MD, PhD
Office of the Chief Science Officer

Henry Falk, MD, MPH
Coordinating Center for Environmental Health and Injury Prevention

Muin Khoury, MD, PhD
National Office of Public Health Genomics

Alison Mawle, PhD
National Center for Immunization and Respiratory Diseases

Steve Solomon, MD
Coordinating Center for Health Information and Services

Ed Sondik, PhD
National Center for Health Statistics

Kathleen Toomey, MD, MPH
Coordinating Center for Health Promotion

Deborah Tress, JD
Office of the Director

CDC Foundation:

Charles Stokes

Julie Rodgers

Chloe Tonney

Kevin Brady, MPH

John Moore, PhD, RN


Invited Discussants:

Shashi Amur, PhD
Food and Drug Administration

Terri Beaty, PhD
Johns Hopkins

Mark Bouzyk, PhD
Emory University

Wylie Burke, MD, PhD
University of Washington

Mark Daly, PhD
Harvard University

Greg Downing, DO, PhD
Department of Health and Human Services

Richard Fabsitz, PhD
National Heart, Lung and Blood Institute

Campbell Gardett
Department of Health and Human Services

Gerald Gates
Privacy Consultant

Raymond Greenberg, MD, PhD
Medical University of South Carolina

Emily Harris, PhD, MPH
National Human Genome Research Institute

Cashell Jaquish, PhD
National Heart, Lung and Blood Institute

Graham Kalton, PhD
Westat

Sharon Kardia, PhD
University of Michigan

Hormuzd Katki, PhD
National Cancer Institute

Debra Lappin, JD
B&D Consulting

Bryan Luce, PhD, MBA
United BioSource Corporation

Brad Malin, PhD
Vanderbilt University

Joan McGregor, PhD
Arizona State University

Francis McMahon, MD
National Institute of Mental Health

Kathleen Merikangas, PhD
National Institute of Mental Health

Thomas Murray, PhD
The Hastings Center

Jim Ostell, PhD
National Library of Medicine

Laura Rodriguez, PhD
National Human Genome Research Institute

Mark Rothstein, JD
University of Louisville

Charles Rotimi, PhD
National Human Genome Research Institute

James Scanlon
Department of Health and Human Services

Margo Schwab, PhD, MA
Office of Management and Budget

Matt Snipp, PhD
Stanford University

Brian Spear, PhD
Abbott Laboratories

Deborah Winn, PhD
National Cancer Institute
CDC’s Beyond Gene Discovery Workgroup Members (Not all members will be in attendance.)

Muin Khoury, MD, PhD
National Office of Public Health Genomics

Drue Barrett, PhD
Office of the Chief Science Officer

Scott Bowen
National Office of Public Health Genomics

Barbara Bowman, PhD
National Center for Chronic Disease Prevention and Health Promotion

Man-Huei Chang, MPH
National Office of Public Health Genomics

Nicole Dowling, PhD
National Office of Public Health Genomics

Kenya Ford, JD
Office of the Director

Mike Frace, PhD
National Center for Preparedness, Detection, and Control of Infectious Diseases

Peg Gallagher, PhD
National Center for Environmental Health

Susan Hariri, PhD
National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention

Robin Ikeda, MD, MPH
National Center for Injury Prevention and Control

Cliff Johnson, MPH
National Center for Health Statistics

Kathi Kellar, PhD, MS
National Center for Preparedness, Detection, and Control of Infectious Diseases

Ed Kilbourne, MD
Martin, Blanck & Associates

Katherine Kolor, PhD, MS, CGC
National Office of Public Health Genomics

Mechele Lynch, MA
National Office of Public Health Genomics

Jennifer Madans, PhD
National Center for Health Statistics

Alison Mawle, PhD
National Center for Immunization and Respiratory Diseases

Rick McCluskey, MD, PhD, MS
Coordinating Office for Terrorism Preparedness and Emergency Response

Gerry McQuillan, PhD
National Center for Health Statistics

Jim Mercy, PhD
National Center for Injury Prevention and Control

Cynthia Moore, MD
National Center on Birth Defects and Developmental Disabilities

Pat Mueller, PhD
National Center for Environmental Health

Peter Meyer, MA, MPH
National Center for Health Statistics

Renée Ned, PhD, MS
National Office of Public Health Genomics

Marilyn Radke, MD, MPH
Office of the Chief Science Officer

Mary Reichler, MD
National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention

Eric Sampson, PhD
National Center for Environmental Health

Christopher Sanders
National Center for Health Statistics

Tom Savel, MD
National Center for Public Health Informatics

Ed Sondik, PhD
National Center for Health Statistics

Karen Steinberg, PhD
Coordinating Center for Health Promotion

Deborah Tress, JD
Office of the Director

Venkatachalam Udhayakumar, PhD
National Center for Zoonotic, Vector-Borne, and Enteric Diseases

Ajay Yesupriya, MPH
McKing Consulting

Al Zarate, PhD
National Center for Health Statistics

Lyna Zhang, MD, PhD
National Office of Public Health Genomics

Beyond Gene Discovery
Data Release Options & Analytic Considerations:
Promoting Data Access while Maintaining Human Subjects Protections

Prepared by:
The Data Access Subgroup of CDC’s Beyond Gene Discovery Working Group

Please do not distribute or cite this briefing document.


Prepared for the Beyond Gene Discovery Workshop, March 3, 2008.
Table of Contents

1.0 Beyond Gene Discovery
2.0 Statutory Context for Access to NHANES Data
3.0 Analytic and Statistical Considerations for NHANES Genomic Data
4.0 Current Access Options and Future Considerations for NHANES Genomic Data
5.0 Proposed Data Access Option #1 – Reengineered email-based remote access system
6.0 Proposed Data Access Option #2 – Designated Agent Agreements
7.0 Proposed Data Access Option #3 – Additional Research Data Centers
8.0 Proposed Data Access Option #4 – NHANES Informed Consent Changes

Appendices
Appendix A. The Beyond Gene Discovery Initiative
Appendix B. The National Health and Nutrition Examination Survey
Appendix C. Statutory and Policy Considerations Related to the Release of and Access to NHANES Data Including Genetic Information
Appendix D. Considerations Related to Re-consent and Changes to Future Informed Consent in Order to Achieve Broader Access to NHANES Genetic Data
Appendix E. NIH Database of Genotypes and Phenotypes (dbGaP)
Appendix F. The Data Access Subgroup of CDC’s Beyond Gene Discovery Working Group—Membership List


1.0 Beyond Gene Discovery

With the completion of the Human Genome Project and the availability of technologies
to measure human genetic variation on a genome-wide scale, the Centers for Disease
Control and Prevention (CDC) and the CDC Foundation are launching the Beyond
Gene Discovery (BGD) initiative, in collaboration with public, private, and academic
partners. BGD will assess population genetic variation in the United States in relation to
health and disease and develop strategies for using genetic information to impact health
and eliminate health disparities among population groups.

The National Health and Nutrition Examination Survey (NHANES), a major program of
the National Center for Health Statistics (NCHS), provides a unique national resource
for investigating the effects of genetic variation on health and will serve as the initial
focus of BGD. Nationally representative probability samples from two NHANES data
collections include approximately 15,000 persons (about 7,000 participants from
NHANES III and 8,000 participants from NHANES 1999-2002), with oversampling of the
two largest race/ethnic minority groups, non-Hispanic blacks and Mexican Americans,
along with other subgroups of the population.

Information on multiple aspects of health obtained through interviews, laboratory tests
and direct examinations is available for the NHANES participants. Biological specimens
are also available, from which DNA has been banked by the Division of Laboratory
Sciences, National Center for Environmental Health. The banked DNA allows for the
measurement of the over one million genetic variants identifiable using currently
available technologies. BGD is the first large-scale effort in the United States to support
comprehensive population-based prevalence estimates for single nucleotide
polymorphisms (SNPs) and other genetic variants, thus creating an invaluable reference
to which results found in other studies can be compared. BGD will also facilitate the
validation and identification of the associations among variations in genotype,
phenotype, and environmental risk factors in a representative sample of the population,
laying the groundwork for understanding the relationship between human genome
variation and health status.
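
As a minimal sketch of what a population-based prevalence estimate for a single variant might involve, the Python fragment below computes a sample-weighted allele frequency. The genotype coding (0, 1, or 2 copies of the counted allele), the function name, and the weights are illustrative assumptions, not NHANES data or an NCHS method. A real analysis would also need design-based variance estimation and a principled treatment of missing genotypes, neither of which is shown.

```python
def weighted_allele_frequency(genotypes, weights):
    """Estimate the frequency of the counted allele from weighted genotypes.

    genotypes: copies of the allele per participant (0, 1, or 2), None if missing.
    weights:   survey sample weight per participant (hypothetical values below).
    """
    allele_total = 0.0      # weighted count of the allele of interest
    chromosome_total = 0.0  # weighted count of all chromosomes (2 per person)
    for genotype, weight in zip(genotypes, weights):
        if genotype is None:
            continue  # simplification: drop participants without a genotype call
        allele_total += weight * genotype
        chromosome_total += weight * 2
    if chromosome_total == 0:
        raise ValueError("no genotyped participants with positive weight")
    return allele_total / chromosome_total


# Toy data, invented for illustration only.
genotypes = [0, 1, 2, 1, None, 0]
weights = [1000.0, 2500.0, 500.0, 1500.0, 800.0, 3000.0]

unweighted = weighted_allele_frequency(genotypes, [1.0] * len(genotypes))
weighted = weighted_allele_frequency(genotypes, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

With these toy values the unweighted estimate is 0.400 and the weighted estimate is about 0.294, which illustrates why estimates from an oversampled design generally should not ignore the sample weights.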

The success of the BGD initiative relies on the development of new and enhanced data
access procedures that will facilitate state-of-the-art analytic methods, including those
for genome-wide association studies (GWAS), while protecting the confidentiality of
NHANES participants’ identifiable private information. New and enhanced methods are
under consideration to facilitate these types of studies; however, any mechanisms for
access to NHANES data must be consistent with applicable Federal confidentiality and
statistical statutes and guidance.

Appendices A and B provide additional background information regarding the Beyond
Gene Discovery initiative and NHANES, respectively.

2.0 Statutory Context for Access to NHANES Data

NCHS collected the data and samples for the NHANES under the authority of section
306 of the Public Health Service Act (PHSA) (42 U.S.C. 242k). They are therefore
subject to protection by the confidentiality provisions of section 308(d) of the PHSA (42
U.S.C. 242m(d)). Under this provision, NCHS may not release identifiable information
unless the participant has consented to the release. Genomic data, alone or in
combination with other NHANES data, are considered to be potentially identifiable, and
the NHANES III and NHANES 1999-2002 consents do not allow for release of
identifiable data. Appendix C provides a more detailed review of the statutory authority,
regulations and policies that have governed the collection of and access to NHANES
data.

NCHS, as a statistical agency, is also covered by the Confidential Information
Protection and Statistical Efficiency Act (CIPSEA) of 2002. This Act gives NCHS the
authority to designate agents to whom confidential, individually-identifiable data may be
released for statistical purposes only. These agents must act under the control and
supervision of NCHS. This requirement precludes NCHS and its agents from depositing
NHANES data in non-NCHS data repositories, such as the existing NIH database of
Genotypes and Phenotypes (dbGaP). Appendix E provides additional background
information on dbGaP.

It is currently unclear whether NCHS may use the designated agent authority for data
collected prior to the passage of CIPSEA or without specific reference to such agents in
the informed consent, as is the case for NHANES III and NHANES 1999-2002. NCHS
has requested an Office of Management and Budget (OMB) determination regarding
applying the CIPSEA designated agent provision to these data.

3.0 Analytic and Statistical Considerations for NHANES Genomic Data

The analytic and statistical requirements for NHANES genomic data will inform the type
of data access that is needed to maximize the scientific value of these data. Analytic
methods for genomic data are evolving rapidly, and would need to be custom-tailored to
accommodate the complex survey design of NHANES. Current NHANES data release
policies could present significant challenges for conducting state-of-the-art genomic
analyses, including those for GWAS. While identifiable data can be analyzed using a
remote access system, data that could identify the individual can only be viewed inside
a Research Data Center (RDC) and cannot be removed from the RDC in an identifiable
form.

The input of experts in both NHANES and genomic data analysis is needed to formally
assess the analytic and statistical requirements for conducting genomic/GWAS
analyses using NHANES data so that appropriate access mechanisms can be
developed. The overarching question is:

What are the analytic and statistical considerations that need to be
incorporated into the design of an access system?

Specific points for consideration include:

• What are the unique opportunities, from an analytic perspective, of the proposed
NHANES genomic dataset? (e.g., What are some of the standard and higher-
level analyses that end users might want to perform using these data?)

• What are the unique challenges, from an analytic perspective, of the proposed
NHANES genomic dataset for standard and higher-level genomic analyses?
(e.g., challenges associated with the complex survey design, data accessibility,
etc.)

• What should be included in an analytic plan to determine the requirements of an
access system and test prototype systems, and what databases (existing or
synthetic) can be used in conducting the analysis?

• What existing analytic and data access models for conducting standard and
higher-level genomic analyses might be applied to or modified for NHANES
data?

• What types of genomic analyses could be conducted using an email-based
remote access system as described in Section 5.0 below, and what types of
genomic analyses would be difficult or perhaps impossible to conduct using such
a remote access system?

4.0 Current Access Options and Future Considerations for NHANES Genomic
Data

The genetic data collected for NHANES III and NHANES 1999-2002 must be kept
strictly confidential consistent with the informed consent provided by the participant
under the NCHS confidentiality statute. The existing data (about 200 genetic variants)
are currently made available to researchers using the same system as other sensitive
datasets housed at NCHS through two methods of access. The first is the Research
Data Center (RDC) at NCHS where individual-level data can be viewed and analyzed in
a secure environment and results vetted for confidentiality risks before they are
removed from the facility. The second method of access is the NCHS Analytical Data
Research by Email (ANDRE) remote access system, which allows broader access to
the data. The remote system is email-based and allows researchers to submit statistical
analysis code and receive results from any location.

The increase in the availability of NHANES genomic data by four orders of magnitude
through the genome-wide studies planned as part of BGD will change the demands on
the RDC and the ANDRE system. Similarly, the future of genomic data analyses is
clearly moving in the direction of more complex analyses as millions of genetic
variations are analyzed in an attempt to untangle biological pathways, as in GWAS.
These analyses pose real challenges in terms of the analytic methods needed to
perform these statistical tests and the computing resources needed to process them.
While modifying some current practices is likely feasible, other requirements of genomic
analyses may require new solutions.

To address both the increasing demand for analytic complexity and the
anticipated high volume of users analyzing NHANES genomic data, the Data
Access Subgroup of CDC’s BGD Working Group has identified four possible
approaches for consideration. These options could be employed, independently or in a
combined strategy, to maximize access to NHANES genomic data while protecting the
privacy of the participants and the confidentiality of their information, and working within
the statutory framework for statistical agencies.

1. Remote access to the full NHANES genotypic and phenotypic database that is
centrally-located in the RDC, via electronic submission of code, automated
disclosure review of program code and output, and return of output that has passed
a confidentiality review. Researchers can perform analyses on individual-level data
but cannot see identifiable data.

2. Applying NCHS’ designated agent authority to enter into legally-binding agreements
with a limited number of outside researchers to allow the controlled, conditional
release of individual-level, potentially-identifiable data, for statistical purposes only,
with strict and substantial oversight by NCHS. It will be necessary to develop criteria
for determining which researchers would be granted designated agent status in
addition to the primary requirement that the project not be appropriate for the remote
access system. [Note: The feasibility of this option depends on clarification from
OMB regarding its availability for use with existing NHANES data.]

3. Establish and operate additional RDCs outside Hyattsville, with the first in Atlanta.

4. Informed consent changes to allow data to be shared more broadly, potentially
through dbGaP:
4a. Consider a new model for informed consent for future NHANES, and/or
4b. Consider recontacting NHANES III and NHANES 1999-2002 participants for their
consent.

5.0 Proposed Data Access Option #1 – Reengineered email-based remote access system

Remote access to the full NHANES genotypic and phenotypic database that is centrally-
located in the RDC, via email submission of code, automated disclosure review of code
and output, and return of output that has passed automated confidentiality review
conducted by the ANDRE system. Researchers can perform analyses on individual-
level data but cannot see identifiable data.

Reengineered Email-based Remote Access System: Pros and Cons

Data Access
Pros:
o Widely available access
o Minimal oversight required
o Minimal protocol review needed
Cons:
o Feasibility of conducting typical/higher-level genomic analyses via remote
access, and requirements of such a system, are not known and require
extensive expert input.

Privacy & Confidentiality
Pros:
o Identifiable data are not released outside NCHS.
Cons:
o Disclosure risk prevention strategies would need to be tailored to genomic
data, including its use in GWAS.

Prior Experience
Pros:
o NCHS is currently testing a remote access system for the NHANES genetic data
that are currently available, and the system will be made available to users
in the near future.
Cons:
o The email-based remote access system currently being tested was not
developed for complex genomic/GWAS analyses.

5.1 Background on NCHS Email-based Remote Access System

NCHS has provided researchers remote access to sensitive data through the ANDRE
remote system since April 1998. In the past nine years, ANDRE has served hundreds
of data analysts and executed tens of thousands of SAS programs. ANDRE has
multiple levels of disclosure risk prevention strategies built into the system.
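
ANDRE's actual disclosure rules are not described in this document. Purely as an illustration of the kind of automated output check such a system might apply, the Python sketch below masks small cells in a table of counts before release. The threshold of 5 and all names are hypothetical assumptions, not NCHS policy, and real disclosure review involves further safeguards (for example, protecting against recovery of a masked cell from row totals) that are not shown.

```python
MIN_CELL_COUNT = 5  # hypothetical threshold, NOT an NCHS rule


def suppress_small_cells(table, min_count=MIN_CELL_COUNT):
    """Return a copy of a {label: count} table with small cells masked.

    Cells whose count is below min_count are replaced by None so that the
    released output does not reveal very small groups of participants.
    The input table is left unchanged.
    """
    return {
        label: (count if count >= min_count else None)
        for label, count in table.items()
    }


# Invented genotype counts for one variant, for illustration only.
genotype_counts = {"AA": 412, "AG": 97, "GG": 3}
released = suppress_small_cells(genotype_counts)
print(released)
```

In this toy example the "GG" cell, with only 3 participants, is masked while the larger cells pass through unchanged.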

While the remote access system provides a convenient solution for researchers who do
not wish to travel to NCHS, it does have some limitations: primarily that statistical
software is limited to SAS and SAS-callable SUDAAN due to technical and resource
constraints and individual-level data are accessible but not viewable. Yet, even with
these restrictions in place, the remote system has successfully been used to process
thousands of data analyses for hundreds of users on various sensitive data sets housed
at NCHS.

Beta testing of the ANDRE system for use with current NHANES genetic data
(approximately 200 genetic variants) is underway. The results to date show
that the system can be successfully used to conduct standard genotype-phenotype
association analyses; however, it is not designed to conduct GWAS analysis. ANDRE
will be available for all users with NCHS Ethics Review Board (ERB)-approved
proposals in the first quarter of 2008. Future plans for development of the remote
system include the creation of a library of macros for more complex genetic analyses.

Clarifying the analytic and statistical needs of the end user will help to determine
whether mechanisms to provide the level of information needed for GWAS analyses can
be developed in an ANDRE-like system. Early attempts to produce regression results
for one million genetic variations have shown that p-values can be generated in under
24 hours using a system similar to ANDRE. However, further investigations are
underway to determine if additional modifications to these p-values will need to be done
in the remote access environment to complete the analyses, or if these p-values can be
further processed by the end user outside of the remote system. Further simulation
studies are underway to examine the capacity of the remote access system for GWAS
data analysis and the impact on the analyses of the requirement to suppress any
analytic products that would jeopardize confidentiality (e.g., individual-level data,
outliers, logs).
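
This document does not specify which further modifications of the p-values are under study. One common post-processing step in GWAS is adjustment for multiple comparisons, which needs only the vector of p-values and so could in principle be run by the end user outside the remote system. The sketch below, with hypothetical function names and made-up p-values, shows Bonferroni and Benjamini-Hochberg adjustments as one example of such a step.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the adjusted
    # values monotone non-decreasing in the original p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values standing in for per-variant regression output.
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Both functions return adjusted values in the same order as the input, so they can be merged back onto a list of variant identifiers returned by the remote system.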

6.0 Proposed Data Access Option #2 – Designated Agent Agreements

Applying NCHS’ designated agent authority to enter into legally-binding agreements
with a limited number of outside researchers to allow the controlled, conditional release
of individual-level, potentially-identifiable data, for statistical purposes only, with strict
and substantial oversight by NCHS. [Note: The feasibility of this option depends on
clarification from OMB regarding its availability for use with existing data.]

Designated Agent Agreements: Pros and Cons

Data Access
Pros:
o Facilitates the development of new analytic methods/tools and allows more
complex analyses than email-based remote access system
Cons:
o Significant NCHS oversight required, adding to expense and overhead.
o Extensive and lengthy protocol review needed
o Data misuse could have possible detrimental effects not only on NHANES, but
also other data collections of NCHS, CDC, and the federal statistical system.

Privacy & Confidentiality
Pros:
o Legal penalties apply if data use agreement is violated.
o Can implement disclosure control procedures to maximize protections.
Cons:
o Identifiable data are released outside NCHS creating potential risk to
participants, in part due to gaps in genetic information non-discrimination
protection.

Prior Experience
Pros:
o Policies and procedures for NHANES interagency agreements have been
developed & implemented on a limited basis.
Cons:
(none listed)

6.1 Background on NCHS Off-site Designated Agent Agreements

Until the passage of CIPSEA, NCHS did not have the authority to designate agents that
could access data not released to the public due to confidentiality requirements.
Designated agents are subject to penalties, including fines and imprisonment, if data
use agreements are violated and must be under the strict and substantial supervision of
NCHS. With the passage of CIPSEA, NCHS began developing policies for
implementing the designated agent authority in conjunction with the implementation
guidelines being developed by OMB and an interagency committee. NCHS decided to
adopt a stepwise process whereby designated agent status would be considered first
for users and data that presented the least risk (e.g., federal employees operating under
approved IT security plans accessing the least sensitive data). Proposal requirements
have included a clear and detailed description of the purpose of the access, a clear
justification of the need for the confidential data requested, a description of how the data
will be used, a description of how NCHS will benefit from granting the requested access,
and a description of the procedures planned to safeguard the confidentiality of
identifiable data. Only data needed for the project are accessed. Agreements include
provisions for oversight by NCHS.

At this time, NCHS does not have off-site NHANES data use agreements with non-
governmental entities; however, mechanisms to expand use of the designated agent
authority are under consideration, including access to genetic data by
researchers who are not federal employees (further work is subject to guidance from
OMB).

Given the anticipated costs associated with the oversight of these proposed
agreements, the number of designated agent agreements that could be supported
would likely be limited, and criteria for their selection would need to be developed. The
designated agent option would be intended only for those instances when the RDC and
remote access system are insufficient to meet the needs of the proposed research.

7.0 Proposed Data Access Option #3 – Additional Research Data Centers

Establish and operate additional RDCs outside Hyattsville, with the first in Atlanta.

Additional RDCs: Pros and Cons

Data Access
   Pros:
   o Allow broader access to all NCHS data collections, including NHANES genomic
     data, to CDC scientists and the scientific community.
   Cons:
   o Significant start-up and operation costs
   o Data enclave presents challenges for collaboration by large multidisciplinary
     teams that are typical for GWAS analysis.

Privacy & Confidentiality
   Pros:
   o Identifiable data are not released.

Prior Experience
   Pros:
   o Policies and procedures for NHANES RDC access have been developed and
     implemented in Hyattsville.

7.1 Background on the NCHS Hyattsville Research Data Center

NCHS provides qualified researchers on-site access to confidential data collections for
statistical purposes, under strict supervision, through the Hyattsville Research Data
Center (RDC). Data from virtually all of the NCHS data collection systems may be
made available through the RDC; also available are data from other data collection
systems, and RDC users may supply their own data to be merged with NCHS datasets.
In 2007, certain NCHS confidential data collections, including NHANES I, II and III, were
made available through the U.S. Census Bureau RDC network which includes several
locations around the country. The RDC provides a mechanism whereby researchers
can access detailed data files in a secure environment, without jeopardizing the
confidentiality of respondents.

To apply for RDC access to NCHS confidential data collections, the researcher submits
a proposal which includes key study questions or hypotheses, the analytic strategy and
statistical methods to be used, software requirements, curriculum vitae for each person
participating in the research activity, and a summary of the data requirements for the
proposed research, which is used by RDC staff to construct the necessary data files.
Additionally, the proposals must include the "Agreement Regarding Conditions of
Access to Confidential Data in the Research Data Center for the National Center for
Health Statistics" and "Affidavit of Confidentiality" signed by all participating
researchers. Research proposals are reviewed by a Proposal Review Committee which
consists of (at a minimum) the director of the NCHS RDC, the RDC staff liaison, the
NCHS Confidentiality Officer, and the director (or designee) of the NCHS data division
whose data are requested in the proposal. Approval of research proposals does not
constitute endorsement by NCHS of the substantive, methodological, theoretical, or
policy relevance or merit of the proposed research, but rather constitutes a judgment
that the research, as described in the application, is not an illegal use of the requested
data file and that there is high probability that the project can be successfully done in
the RDC.

The RDC computers have no electronic link to the NCHS network, the CDC-NCHS
mainframe, or the internet. The computers are configured such that researchers are
given read-only access to requested data files and can write only onto the local
workstation's hard disk—removable media such as floppy disks are inaccessible. All
printed output is routed to a central printer which is monitored by RDC staff.
Researchers may take the results of their analyses off-site only after disclosure review
by NCHS RDC staff. Disclosure review consists of looking for tabular cells with counts
below five, tables with geographic variables in any dimension, models with geographic
variables (or variables tantamount to geographic variables) as outcome variables, or
case listings.
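
The small-cell part of the disclosure review described above can be illustrated with a
short sketch. This is not NCHS software; the function name, the table layout, and the
decision to flag every cell with a count below five (including empty cells) are
assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Threshold taken from the "cells less than five" rule in the text.
MIN_CELL_COUNT = 5


def flag_small_cells(
    table: Dict[Tuple[str, str], int],
    min_count: int = MIN_CELL_COUNT,
) -> List[Tuple[str, str]]:
    """Return the (row, column) keys of cells whose count is below min_count.

    Hypothetical helper: the table is a mapping from (row label, column label)
    to an unweighted respondent count.
    """
    return sorted(key for key, count in table.items() if count < min_count)


if __name__ == "__main__":
    example = {
        ("Group A", "Variant present"): 12,
        ("Group A", "Variant absent"): 3,
        ("Group B", "Variant present"): 7,
        ("Group B", "Variant absent"): 40,
    }
    # Only the cell with a count of 3 falls below the threshold.
    print(flag_small_cells(example))
```

A reviewer would still need to check the other items on the list (geographic variables
and case listings), which this sketch does not attempt.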

Researchers using the NCHS RDC are charged for space and equipment rental and
staff time necessary for supervision, disclosure limitation review, maintenance of
computer facilities (including both hardware and software), and the creation and
maintenance of data files required by the researcher. The cost per project includes a
daily rate of $200 plus a $500 charge for new file creation.
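
As a rough illustration of the fee structure above, the following sketch estimates a
project cost from the two quoted figures. It assumes that the $500 charge applies once
per new data file and that the daily rate applies to each day of RDC use; neither
detail is stated in the text, and the function name is hypothetical.

```python
DAILY_RATE_USD = 200       # daily rate quoted in the text
NEW_FILE_CHARGE_USD = 500  # charge for new file creation quoted in the text


def rdc_project_cost(days_on_site: int, new_files: int = 1) -> int:
    """Estimate the RDC charge for a project in U.S. dollars.

    Assumption: the file-creation charge is applied per new file.
    """
    if days_on_site < 0 or new_files < 0:
        raise ValueError("days_on_site and new_files must be non-negative")
    return days_on_site * DAILY_RATE_USD + new_files * NEW_FILE_CHARGE_USD


if __name__ == "__main__":
    # Five days of use plus one new file: 5 * 200 + 500 = 1500
    print(rdc_project_cost(5))
```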

8.0 Proposed Data Access Option #4 – NHANES Informed Consent Changes

Informed consent changes to allow data to be shared more broadly, potentially through
dbGaP:
4a. Consider a new model for informed consent for future NHANES, and/or
4b. Consider re-consenting NHANES III and NHANES 1999-2002 participants.

Informed Consent Changes: Pros and Cons

Data Access
   Pros:
   o Potentially the most flexible mechanism for data access from the researcher
     perspective
   o Facilitates the development of new analytic methods/tools and allows more
     complex analyses than remote access
   Cons:
   o Possible loss of representativeness of the sample for both genomic and
     non-genomic data due to inability to recontact participants or refusal of consent
   o Data misuse could have possible detrimental effects not only on NHANES, but
     also other data collections of NCHS, CDC, and the federal statistical system.

Privacy & Confidentiality
   Pros:
   o Participants are informed and can opt out of genomic and other data releases.
   Cons:
   o Identifiable data are released outside NCHS creating potential risk to
     participants, in part due to gaps in genetic information non-discrimination
     protection.
   o Complex informed consent
   o Applicability of CIPSEA penalties is uncertain depending on the consent
     modification.

Prior Experience
   Pros:
   o In 1998, NCHS recontacted and reconsented a small non-representative sample
     of NHANES participants to permit previously-banked specimens to be released
     to a genetic biobank.
   o Only genetic data and limited demographic information were included in the
     biobank.
   Cons:
   o High cost for reconsenting even a non-random sample (i.e., $1M for 545
     participants successfully recontacted)
   o A small fraction of recontacted NHANES participants withdrew previous consent
     for banked specimens (i.e., requested that the specimens be destroyed)

8.1 Background on NCHS Experience with Re-consenting NHANES

In 1998, the National Human Genome Research Institute (NHGRI) requested that
NHANES III cell lines be added to the NIH-CDC DNA Polymorphism Discovery
Resource for the purpose of discovering polymorphisms in human DNA.

The NHGRI proposal involved a request to contact NHANES III participants for the
purposes of explaining the Polymorphism Discovery Resource and obtaining
informed consent for this research purpose. NHANES III was divided into two
phases: phase 1 (conducted from 1988-1991) and phase 2 (conducted from 1991-
1994). Each phase was a representative sample of the U.S. population; however,
since cell lines were not available for all phase 1 participants, and the Resource did
not require a representative sample, only phase 1 participants were re-contacted to
preserve the phase 2 samples for future studies. Participants were re-contacted and
re-consent was obtained by the NHANES III data collection contractor. Cell line
samples from participants who agreed to the research were sent to the Coriell
Institute. A randomly assigned ID was attached to the samples and no information
other than race/ethnicity was released.

A total of 545 participants were successfully re-contacted. These participants were
not a random subsample, but were selected because of their race/ethnicity and their
location so that travel and contact time for the interviewers would be minimized.
Only limited tracing activities were undertaken and attempts to contact respondents
were limited. In addition, only their genetic data and limited demographic
information were included in the Resource; no other NHANES data were made
available. The cost for re-consenting the 545 participants in 1998 was just under $1
million, and was funded by NHGRI.

Appendix D provides an analysis of options for re-consent and/or a change to the
NHANES consent process as well as a discussion of the potential impact of
changing the consent process on past and future NHANES data collection, on other
data collections conducted by NCHS, and on the federal statistical system.

Appendices

A. The Beyond Gene Discovery Initiative

B. The National Health and Nutrition Examination Survey (NHANES)

C. Statutory and Policy Considerations Related to the Release of
and Access to NHANES Data Including Genetic Information

D. Considerations Related to Re-consent and Changes to Future
Informed Consent in Order to Achieve Broader Access to
NHANES Genetic Data

E. NIH Database of Genotypes and Phenotypes (dbGaP)

F. The Data Access Subgroup of CDC’s Beyond Gene Discovery
Working Group – Membership List

Appendix A. The Beyond Gene Discovery Initiative

Background on the current NHANES III Collaborative Genomics Project

In 2002, after a Federal Register announcement on the availability of DNA
samples from NHANES III for genetic research, CDC’s National Office of Public
Health Genomics (NOPHG) formed a multidisciplinary working group with
members from across CDC to develop a prototype for a national report on
genomics and public health. One of the main goals of the working group was to
develop a proposal to measure the prevalence of selected genetic variants
believed to be of public health significance using data derived from the NHANES
III DNA Bank. The purpose of the study was to determine the prevalence of the
selected genetic variants in approximately 50 candidate genes known to be
important in major cellular and physiologic pathways, such as nutrient
metabolism, xenobiotic metabolism, and the immune and inflammatory
responses. Future analyses would examine the associations between the
selected genetic variants and numerous phenotypes and disease outcomes
available in NHANES III.

The NHANES III genotyping for this project was performed largely by medium-
throughput TaqMan and MGB Eclipse assays by the Core Genotyping Facility at
the National Cancer Institute (NCI) and the Division of Laboratory Services of the
National Center for Environmental Health at CDC. Quality assurance and quality
control criteria were established and implemented by NCHS, and a total of 90
polymorphisms in 50 genes were available for analyses. A manuscript detailing
the population-based allele frequencies and genotype prevalence of
polymorphisms in the U.S. population has been submitted for publication.
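
For readers less familiar with the two quantities named above, the following sketch
shows how genotype prevalence and allele frequencies relate for a single biallelic
polymorphism. It is an illustration only: the function names and genotype labels are
hypothetical, the counts are unweighted, and it does not represent the method used in
the submitted manuscript, which would need to account for the NHANES sample design.

```python
from typing import Dict, Tuple


def genotype_prevalence(counts: Dict[str, int]) -> Dict[str, float]:
    """Return the proportion of the sample carrying each genotype."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotyped participants")
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Return (frequency of allele A, frequency of allele a).

    Each participant carries two alleles, so homozygotes contribute two copies
    of their allele and heterozygotes contribute one copy of each.
    """
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    if total_alleles == 0:
        raise ValueError("no genotyped participants")
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    return freq_A, 1.0 - freq_A


if __name__ == "__main__":
    # Made-up counts for illustration, not NHANES data.
    print(genotype_prevalence({"AA": 50, "Aa": 40, "aa": 10}))
    print(allele_frequencies(50, 40, 10))
```

With the made-up counts shown, half of the sample carries genotype AA, and allele A
accounts for roughly 70% of all alleles.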

To fulfill the second goal of the project, investigators from the working group
developed and submitted research proposals to examine associations between
genetic variants of interest and numerous phenotypic data available in the
NHANES III public-use datasets. Statistical analyses for over 35 research
proposals are underway, and several investigators have presented their work at
scientific conferences and are preparing manuscripts and reports.

Throughout this process, the role of NOPHG has been to provide overall
leadership for and coordination of the project, develop collaborative research
proposals for the prevalence analysis and for genotype-phenotype analyses, and
conduct the bulk of statistical analysis for the proposed studies. Statistical
analyses of the genomic data have taken place within the Research Data Center
(RDC) of NCHS in Hyattsville, Maryland.

From candidate gene studies to whole-genome analyses: the Beyond Gene
Discovery initiative

Advancements in genomic technologies to measure human genetic variation now
provide the opportunity to move beyond studies of candidate genes to measure
human genetic variation on a genome-wide scale, and the National Health and
Nutrition Examination Survey (NHANES) provides a unique national resource for
investigating the effects of genetic variation on health. The Beyond Gene
Discovery initiative will employ the most advanced and comprehensive
technology available to achieve the following public health scientific goals:

• Population prevalence of human genetic variants: measure the true
prevalence in the U.S. population and its racial/ethnic subgroups of thousands of
genetic variants, many of which could be associated with risks for common
diseases. BGD will employ technology that assesses the NHANES sample set
with regard to racial/ethnic subgroups that have recognized health disparities in
the United States. Genotyping chips that are optimized for variants predominant
in Americans of African and Mexican descent will be used, depending on
availability.

• Genotype-phenotype associations: describe the associations of genetic
variants with nutritional, biochemical and clinical characteristics measured in
NHANES that serve as markers or risk factors for common diseases and disease
phenotypes (e.g., homocysteine levels, cholesterol, blood pressure, body mass
index, diabetes). (Figure A.1)

• Gene-environment interactions: evaluate the joint effects of genetic variants
and environmental factors on health characteristics measured in a subset of
NHANES samples, as well as the impact of genetic variation on the body burden
of environmental chemical exposures. Furthermore, gene-environment
interaction data will be used to assess differences in prevalence of health
outcomes among the population groups and thereby address gaps in knowledge
concerning health disparities in the United States. (Figure A.1)

Figure A.1. NHANES data provide a unique resource for dissecting gene-disease
associations by facilitating analyses of the associations between genetic variants,
environmental factors, and endophenotypes/intermediate outcomes, such as known
markers or risk factors for common diseases.

[Diagram labels: Genes, Disease, Endophenotypes, Intermediate Outcomes,
Environmental Variables]

Because BGD will use a large, nationally representative collection of genetic
samples, the initiative will have far-reaching potential for health impact. The
research undertaken under BGD will enhance the value of many ongoing gene
discovery studies, helping to translate their findings into new targets for
prevention, diagnosis, and treatment of common diseases. By measuring the
population prevalence of numerous gene variants, BGD will provide the basis for
estimating the numbers of people who may benefit from particular genotype-
based screening or diagnostic tests, drugs, or other preventive or therapeutic
interventions.

Project Management and Organization

An initiative of the scope of BGD requires engaging internal and external partners
in a collaborative effort. The necessary scientific, technical, strategic, and
financial resources will be brought together through a public-private partnership
established by the CDC Foundation (CDCF). The CDCF will work to forge this
partnership from interested government, academia, industry, and non-profit
sector organizations and parties.

The BGD Working Group, under the leadership of NOPHG and with membership
from across CDC’s National Centers and the Office of the Director, has been
established to develop CDC activities and to ensure that all activities performed
by CDC under this initiative are in the best interest of the public’s health and
consistent with CDC’s authorities. To this end, four subgroups have been
established to address various issues of project implementation (Figure A.2):

1) Genotyping: The group is working to address technical and practical
issues associated with selection of the most appropriate laboratory
methods for genotyping. Action items are: to establish quality assurance
and quality control (QA/QC) guidelines and procedures, to evaluate novel
genotyping methods for suitability and application, to evaluate and monitor
sample receipt and quality, and to develop a comprehensive request for
contract for completion of the laboratory activities of BGD.

2) Analysis and Statistics: The charge of the group is to evaluate the analytic
challenges of NHANES genomic data and, in the context of the proposed
access options, to develop an analytic plan for BGD that addresses these
challenges. Issues under consideration include:
evaluation of genotyping quality control; detection of and adjustment for
population stratification; assessment of structural variants; statistical
methods for evaluating the relationship between the variations in genomic
structure and function; statistical analysis of genetic associations, and
gene-gene and gene-environment interactions. Appropriate analysis of
genome-wide data from NHANES will require additional statistical
methods development given that all data analyses must account for the
clustered complex survey design.

3) Data Access: The group is actively exploring and developing options for
access to data collected by statistical agencies while protecting privacy
and confidentiality. Issues for discussion and resolution include:
identifying and meeting analytic needs while meeting privacy and
confidentiality requirements; the advantages and disadvantages of the
different access mechanisms; and alternative data access models and
options to consider.

4) Research Agenda: Members of the BGD Working Group, informed by the
work of the three subgroups exploring options for genotyping, analysis,
and data access, will develop a comprehensive research agenda for
genomic analyses of NHANES data.

Figure A.2. The four focus areas of the BGD Working Group are indicated. The
bidirectional arrow indicates the interrelationship between the data access options
and the analytic and statistical needs and capabilities.

[Diagram labels: BGD Working Group; Genotyping; Analytic & Statistical (Data
QA/QC, Develop Methods, Assess Disclosure Risk); Data Access Options (Remote
access, Designated agents, Additional RDC(s), Consent changes, Broader data use);
Research Agenda; BGD Implementation Plan]

Appendix B. The National Health and Nutrition Examination Survey

Introduction

The National Health and Nutrition Examination Survey (NHANES) is a program
of studies designed to assess the health and nutritional status of adults and
children in the United States. The survey is unique in that it combines interviews
and physical examinations. NHANES is a major program of the National Center
for Health Statistics (NCHS). NCHS, as part of the Centers for Disease Control
and Prevention (CDC), has the responsibility for producing vital and health
statistics for the nation.

The origin of the NHANES survey was the National Health Survey Act of 1956.
This act formulated the need for national population surveys to assess the extent
of illness and disability in the U.S. population. The National Health Interview
Survey (NHIS) was created to respond to this public health need, and the first
NHIS was fielded in July 1957. The need for additional data on health status that
could best, or only, be assessed using direct physical measures was recognized,
and in 1958 planning for the first National Health Examination Survey (NHES)
was begun. An early decision was to collect these measures in a standardized
environment, and this led to the construction of mobile examination centers to
collect this information. These mobile examination centers could be moved from
one location to another so that all data collected in the survey would utilize
standard procedures and equipment. A limited set of biological specimens was
included in the first NHES survey and in the subsequent surveys on children and
adolescents (NHES II and NHES III). In 1971, a nutrition component was added
to the NHES and the survey became known as the National Health and Nutrition
Examination Survey (NHANES). The number of biomarkers collected in the
NHANES I survey (1971-1975) was much greater than the number collected in
any of the three NHES surveys in the 1960s. In addition, the survey covered a
much wider age range, 1-74 years, than any of the previously conducted NHES
surveys. Tests were completed on whole blood, serum and urine samples.
Some of the new biomarkers were added to address specific nutrition issues, but
others were added to provide national reference data for selected immunization
and infectious disease assessments. There was a further expansion of the
number of biomarkers collected in NHANES II (1976-1980). NHANES II included
the first environmental assessments (blood lead and selected pesticides). During
NHANES III (1988-1994) the number of blood and urine biomarkers increased
significantly; and for the first time, blood lymphocytes were collected in
anticipation of advances in genetic research.

In 1999, conducting the survey in cycles gave way to a continuous survey
operation, which changes focus to meet emerging needs. A nationally
representative sample of about 5,000 persons is examined each year. These
persons are located in counties across the country, 15 of which are visited every
year. The NHANES detailed interview includes demographic, socioeconomic,
dietary, and health-related questions. The examination component consists of
medical and dental examinations, physiological measurements, and laboratory
tests. In 1999-2002, blood was collected for DNA extraction with a specific
consent for future genetic studies. DNA was not collected from 2003-2006
but was added back into the laboratory protocol in 2007-2008.

Survey Content
NHANES collects data on the prevalence of conditions in the population.
Estimates for previously undiagnosed conditions, as well as those known to and
reported by survey respondents, are produced. Risk factors related to lifestyle,
heredity, or the environment are examined. Smoking, alcohol consumption, sexual
practices, drug use, physical fitness and activity, weight, and dietary intake are
also included in the survey content. Data on certain aspects of reproductive
health, such as use of oral contraceptives and breastfeeding practices, are also
collected. The diseases, medical conditions, and health indicators studied in the
current NHANES include:

• Anemia
• Cardiovascular disease
• Diabetes
• Environmental exposures
• Hearing loss
• Infectious diseases
• Kidney disease
• Nutrition
• Obesity
• Oral health
• Osteoporosis
• Physical fitness and physical functioning
• Reproductive history and sexual behavior
• Respiratory disease (asthma, chronic bronchitis, emphysema)
• Sexually transmitted diseases
• Vision and eye diseases

The sample for the survey is selected to represent the U.S. population of all ages. To
produce reliable statistics, NHANES currently over-samples persons 60 and older,
African Americans, and Mexican Americans.
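
Because some groups are over-sampled, estimates for the whole population generally
need to weight each participant by the number of people he or she represents. The
sketch below shows a weighted prevalence calculated as a ratio of weighted totals. It
is illustrative only: the function name and the weights are hypothetical, and it does
not compute the design-based variance that an analysis of a clustered survey would
also require.

```python
from typing import Sequence


def weighted_prevalence(
    has_condition: Sequence[bool],
    weights: Sequence[float],
) -> float:
    """Return the weighted proportion of participants with a condition.

    Each weight is assumed to be the number of people in the population
    that the participant represents.
    """
    if len(has_condition) != len(weights):
        raise ValueError("has_condition and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_cases = sum(w for flag, w in zip(has_condition, weights) if flag)
    return weighted_cases / total_weight


if __name__ == "__main__":
    # Made-up values: the two participants with the condition come from an
    # over-sampled group and therefore carry smaller weights.
    flags = [True, False, True, False]
    weights = [1000.0, 3000.0, 1000.0, 5000.0]
    print(weighted_prevalence(flags, weights))
```

In this made-up example half of the participants have the condition, but the weighted
estimate is about 20% because the affected participants represent fewer people.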

During the examination all participants have their pulse and/or blood pressure
measured. Dietary interviews and body measurements are included for everyone.
Participants age one and older have a blood sample collected. DNA samples are
collected from consenting adults aged 20 years or older. Depending upon the age of the
participant, the rest of the examination includes tests and procedures to assess the
various aspects of health listed above. In general, the examinations become more
extensive with participant age.

Survey Operations

Health interviews are conducted in respondents’ homes. Health measurements are
performed in specially-designed and equipped mobile examination centers, which travel
to survey locations throughout the country. The study team consists of a physician,
medical and health technicians, as well as dietary and health interviewers. Many of the
study staff members are bilingual (English/Spanish). An advanced computer system
using high-end servers, desktop PCs, and wide-area networking collects and processes
all of the NHANES data, eliminating the need for paper forms and manual coding
operations. This system allows interviewers to use notebook computers with electronic
pens. The staff at the mobile exam center can automatically transmit data into
databases through devices such as digital scales and stadiometers. Touch-sensitive
computer screens let respondents enter their own responses to certain sensitive
questions in complete privacy. Survey information is available to NCHS staff within 24
hours of collection, which enhances the capability of collecting quality data and
increases the speed with which results are released to the public.

In each location, local health and government officials are notified of the upcoming
survey. Households in the survey receive a letter from the NCHS Director to introduce
the survey.

NHANES is designed to facilitate and encourage participation. Participants receive
compensation and, if necessary, transportation is provided to and from the examination
center. A report of medical findings is given to each participant. All information collected
in the survey is kept strictly confidential, and privacy is protected by public laws.

Uses of the Data

Information from NHANES is made available through an extensive series of publications
and articles in scientific and technical journals. For data users and researchers
throughout the world, survey data are available on the internet and on easy-to-use
CD-ROMs. Research organizations, universities, health care providers, and educators
benefit from survey information. Primary data users are federal agencies that
collaborated in the design and development of the survey. The National Institutes of
Health, the Food and Drug Administration, and CDC are among the agencies that rely
upon NHANES to provide data essential for the implementation and evaluation of
program activities. The U.S. Department of Agriculture and NCHS cooperate in planning
and reporting dietary and nutrition information from the survey. NHANES’ partnership
with the U.S. Environmental Protection Agency allows continued study of the many
important environmental influences on our health.

The following are some of the uses of NHANES data:

• Past surveys have provided data to create the growth charts used nationally by
pediatricians to evaluate children’s growth. The charts have been adapted and adopted
worldwide as a reference standard and have recently been updated using the latest
NHANES figures.

• Blood lead data were instrumental in developing policy to eliminate lead from gasoline
and from food and soft drink cans. Recent survey data indicate the policy has been even
more effective than originally envisioned, with a decline in elevated blood lead levels of
more than 70% since the 1970s.

• Overweight prevalence figures have led to the proliferation of programs emphasizing
diet and exercise, stimulated additional research, and provided a means to track trends
in obesity.

• Data have continued to indicate that undiagnosed diabetes is a significant problem in
the United States. Efforts by government and private agencies to increase public
awareness, especially among minority populations, have been intensified.

• Information collected in the survey assists the Food and Drug Administration in
deciding if there is a need to change vitamin and mineral fortification regulations for the
nation’s food supply.

• National programs to reduce hypertension and cholesterol levels continue to depend
on NHANES data to steer education and prevention programs toward those at risk and
to measure success in curtailing risk factors associated with heart disease, the nation’s
number one cause of death.

• New measures of lung function assist in the understanding of respiratory disease and
better describe the burden of asthma in the United States.

Because NHANES is now an ongoing program, the information collected contributes to
biannual estimates in topic areas included in the survey. For small population groups
and less prevalent conditions and diseases, data must be accumulated over several
years to provide adequate estimates. Data from the current NHANES can be compared
to information collected from previous surveys to assess trends. This allows health
planners to detect the extent to which various health problems and risk factors have changed in
the U.S. population over time. By identifying the health care needs of the population,
government agencies and private sector organizations can establish policies and plan
research, education, and health promotion programs that help improve present health
status and prevent future health problems.

Appendix C. Statutory and Policy Considerations Related to the Release of and
Access to NHANES Data Including Genetic Information

The NHANES database combines objective health status measurements with
behavioral data, environmental data and genetics in a unique representative sample of
Americans. The genetic component of NHANES has been available to researchers
since 1999 through standard NCHS mechanisms that allow access to data that cannot
be released as public-use files. The CDC BGD initiative will greatly expand the genetic
information available to researchers, thereby enhancing the analytic utility of the
NHANES database. Expanded access to this unique dataset is needed to achieve the
potential offered by the BGD initiative. In evaluating potential options, it is important to
understand the legislative authorities, principles, policies, and consent procedures
under which the data were collected and how they affect how the data can be accessed.

Informed consent, data release and access policies

For a statistical data collection agency such as NCHS, data release and access policies
are developed at the organization level and for specific surveys or data collections.
Human subjects protection requires that participants sign informed consent statements,
and good research practice dictates that data collection systems include a written data
release policy as part of the data collection protocol. Data release and access policies
must reflect the informed consent protocols and any legislation that governs the data
collection. Practices must then be developed to implement the policies, and these
practices will be judged on how well they meet the requirements of the legislation and
the informed consent. They must be consistent with current best practices, including
data stewardship and the adoption of mechanisms that minimize risk, such as limiting
the number of people with access to information that increases risk and limiting the
risky information released to only what is needed for the task. Policies and practices
should be publicly available.

The policies and practices that relate to NHANES are closely tied to the informed
consent statements used to obtain participation from sample persons, NCHS’
authorizing legislation, and the Principles and Practices of a Federal Statistical Agency.
NCHS, as a federal statistical agency designated by OMB and governed by the Federal
Statistics Confidentiality Order issued by OMB in 1997 (62 FR 35044), must abide by
the stated requirements for data stewardship and confidentiality protection
(http://www.whitehouse.gov/omb/inforeg/conf-order.pdf). Changes to the informed
consent document to request permission from the participant for broader data sharing
need to abide by the authorizing legislation and the data stewardship requirements
mentioned above.

Please do not distribute or cite this briefing document.

Additional NCHS policies and practices that support stewardship and confidentiality
protection are outlined in “How NCHS Protects Your Privacy”
(http://www.cdc.gov/nchs/about/policy/confiden.htm) and the NCHS staff manual
(http://www.cdc.gov/nchs/data/misc/staffmanual2004.pdf). All information on NHANES
operations, including the informed consent protocol and data release policies, is
available to the public, with the most recent version available on the NCHS web site.
Plans for data release and the protection of confidentiality undergo IRB and OMB
review.

Legislation related to privacy/confidentiality

The section of NCHS’s authorizing legislation that deals with confidentiality and data
release is Section 308(d) of the Public Health Service Act (42 U.S.C. 242m):

"No information, if an establishment or person supplying the information or
described in it is identifiable, obtained in the course of activities undertaken or
supported under section 304, 306, or 307 may be used for any purpose other
than the purpose for which it was supplied unless such establishment or person
has consented (as determined under regulations of the Secretary) to its use for
such other purpose and in the case of information obtained in the course of
health statistical or epidemiological activities under section 304 or 306, such
information may not be published or released in other form if the particular
establishment or person supplying the information or described in it is identifiable
unless such establishment or person has consented (as determined under
regulations of the Secretary) to its publication or release in other form."

According to Section 308(d), consent to release identifiable data must be obtained from
the survey participant. This would be true even if Section 308(d) were not mentioned in
the consent, since all NCHS data collections are covered by 308(d). Likewise, confidentiality
must be protected even if no promise to do so is included in the consent.

NCHS data collections are also covered by the Privacy Act, which requires that
identifiable data be stored securely and restricts access to personal information. In
2002, the Confidential Information Protection and Statistical Efficiency Act (CIPSEA)
was enacted; it deals directly with the protection of statistical data collected under a
pledge of confidentiality. In 2007, OMB released guidance for the adoption of CIPSEA
provisions. Of note, willful disclosure of confidential statistical information is a
“class E” felony under this Act, subject to imprisonment for up to 5 years,
a fine of up to $250,000, or both.

Similarly, NCHS and about a dozen other federal statistical agencies are governed by
the statistical confidentiality order issued by OMB. This order establishes a floor of
confidentiality protections similar to those established by law in CIPSEA. In effect, this
OMB order directs statistical agencies to provide survey participants with an informed
consent statement and to then limit disclosure of identifiable information to the
conditions and uses specified in the consent statement.

NHANES informed consent

DNA was collected during the second phase of NHANES III (1991-1994) and during
NHANES 1999-2002, and is currently being collected during NHANES 2007-2008. A different
informed consent statement has been used in each survey.

Informed Consent for NHANES 1991-1994 (NHANES III Phase 2): There was no
explicit mention of DNA testing, but the consent does state that blood would be stored
for future laboratory tests. There is a general consent for participation in the survey.
Information about the confidentiality of the data is found in several places in the consent
materials:

From the participant letter:


“The survey is authorized by the Public Health Service Act. All of your
answers will be kept in strict confidence.”

“We respect your privacy. The confidentiality of all the information you
give us is protected by public law.”

From the informed consent signature page:


“Health information collected in the NHANES III is kept in strictest
confidence. Without your approval our staff is not allowed to discuss your
participation in this study with anyone under penalty of Federal law:
Section 308(d) of the Public Health Service Act (42 USC 242m) and the
Privacy Act of 1974 (5 USC 552A).”

From the informed consent signature page and all data collection documents:
“Information contained on this form which would permit identification of
any individual or establishment has been collected with a guarantee that it
will be held in strict confidence, will be used only for purposes stated for
this study and will not be disclosed or released to others without the
consent of the individual or establishment in accordance with section
308(d) of the Public Health Service Act (42 USC 242m).”

Informed Consent for NHANES 1999-2002: DNA is explicitly mentioned in the
consent for specimen storage and further studies, and general information about the
confidentiality of the data is found in several places in the consent materials:

From the NHANES 2000 consent for specimen storage and further studies:

Q - What studies will be done with the samples?

A - At this time, no specific studies are planned besides the tests
included in the NHANES exam. As scientists learn more about health and
diseases, other studies will be conducted that may include stored
samples. People conducting these studies will not contact NHANES
participants for any additional information.

We will keep strictly private all health data and samples that we collect in
NHANES. Our staff is not allowed to discuss that any person is part of
this survey under penalty of Federal laws: Section 308(d) of the Public
Health Service Act (42 USC 242m) and the Privacy Act of 1974 (5 USC
552A).

“Q - What genetic studies will be done and what part will my DNA
sample play? (DNA samples will be collected only on those ages 20
or over.)

A - Genetic studies look at the DNA found in cells. We will store part of the
blood and saliva sample that we collect in the exam center for future
genetic studies. We will keep this material for an unlimited time. Studies of
human genes are helping us learn about many diseases and health
conditions. The information from people who are part of NHANES could
help that effort.

If you wish to have your samples used for future genetic studies, you will
have a chance to say so when you sign this consent form.”

At the signature section of the document:

“Genetic testing studies may be done with DNA samples collected only on
those ages 20 or over. If you wish to have your samples used for future
genetic studies, check the box below:

Only for persons ages 20 and over, check this box:


□ I agree that my blood and saliva may be kept for future studies using my
genes to help understand genetic links to medical conditions.”

[Note: the NHANES 1999 consent used similar, but not identical,
language to the NHANES 2000 consent for future genetic studies.]

From the consent to interview:


“We use data collected in this survey to study many health issues. We
use information only for research and statistical reports. All data collected
will be kept strictly private. We gather and protect all information in
keeping with the requirements of Federal Laws: the Public Health Service
Act (42 USC 242k) authorizes collection and Section 308(d) of that law (42
USC 242m) and the Privacy Act of 1974 (5 USC 552A) prohibit us from
giving out information that identifies you or your family without your
consent.”

From the consent to the exam:


“We respect your privacy. Public laws keep all information you give
confidential.”

“We will hold all data we collect in the strictest confidence. We gather and
protect all data in keeping with the requirements of Federal Laws: the
Public Health Service Act (42 USC 242k) authorizes collection and
Section 308(d) of that law (42 USC 242m) and the Privacy Act of 1974 (5
USC 552A) prohibit us from giving out information that identifies you or
your family without your consent. This means that we cannot give out any
fact about you, even if a court of law asks for it. However, if we find signs
of child abuse during an exam, we will report it to the local department of
social services or appropriate law enforcement agency. We will keep all
survey data safe and secure. When we allow researchers to use survey
data, we protect your privacy. We assign code numbers in place of
names or other facts that could identify you.”

From the question and answer fact sheet:


“Q - Who can use my stored samples for future research?”

“A - Researchers from Federal agencies, universities, and other scientific
centers can submit proposals to use your stored specimens. These
proposals will be reviewed for scientific merit and by a board that
determines if the research proposed is ethical. The NHANES program
will always know which samples belong to you, but we will not give other
researchers any information that could identify you.”

In summary, the consent materials for NHANES 1999-2002 promise that no information
that could identify a participant will be released.

Informed Consent for NHANES 2007-2008: There was a separate consent for
collection and storage of DNA from those age 20 years or more, and general
information about the confidentiality of the data is found in several places in the consent
materials:

From the consent for collection and storage of DNA from those age 20 years or
more:
Q - Why will a sample of my DNA be kept for future health studies?

A - We will store part of the blood sample that we collect in the exam
center for future genetic studies. These samples will be frozen and kept in
a specimen bank for as long as they last. Your participation is voluntary
and no loss of benefits will result if you refuse.

Q - What genetic studies will be done with the samples?

A - Genes are the “instruction book” for people. Genes are made out of
DNA. The DNA of a person is about 99.9% the same as the DNA of
another person, but no two people have the same DNA except identical
twins. Differences in DNA are called genetic variations and explain
differences such as eye color and partly explain why some people get
certain diseases. To look at these variations many genetic tests may be
done on your blood sample. We will keep the DNA for an unlimited time.
Studies of human genes are helping us learn about many diseases and
health conditions. The information from people who are part of NHANES
could help that effort.

People conducting these studies will not contact NHANES participants for
any additional information.

We will keep strictly confidential all health data and samples that we
collect in NHANES, as required by Federal law. By confidential we mean
that the information that we release to the public can not be used to
identify you. Our staff is not allowed to discuss that any person is part of
this survey under penalty of Federal laws: Section 308(d) of the Public
Health Service Act (42 USC 242m) and the Privacy Act of 1974 (5 USC
552A).

Q - Who can use the stored DNA samples for further study?

A - Researchers from Federal agencies, universities, and other
scientific centers can submit proposals to use the stored specimens.
These proposals will be reviewed for scientific merit and then by a
separate board that determines if the study proposed is ethical. The
NHANES program will always know which samples belong to you, but we
will not give other researchers any information that could identify you.

Q - Will I receive results from any future testing of my specimens?

A - Most studies using DNA samples will simply add to our knowledge
of health and disease. Therefore, we do not plan to contact you with
individual results from these studies. Periodically we will announce on our
web site general results from the studies being conducted
(http://www.cdc.gov/nchs/nhanes.htm). To get more general information
about a particular study, you can call our toll-free number, 1-800-452-6115.

Q - What are the benefits and risks for giving a blood sample for
future genetic studies?

A - You will not directly benefit but these studies may eventually
help the health of people in the future. The risk of giving a sample
includes the minor risk associated with taking the blood sample. There
may also be a risk that some people may use the information from the
genetic studies to exaggerate or downplay differences among people. The
ethics board that will review all studies using these samples will attempt to
prevent any misuse of the information gained from the NHANES DNA
samples.

Q - How can I remove my DNA samples from the specimen bank?

A - In the future, if you want samples removed from the specimen
bank, call us toll-free at 1-800-452-6115.

From the interview consent:


“The Public Health Service Act (42 USC 242k) authorizes collection and
Section 308(d) of that law (42 USC 242m), as well as the Privacy Act of
1974 (5 USC 552A), and the Confidential Information Protection and
Statistical Efficiency Act (PL 107-347), prohibit us from giving out
information that identifies you or your family without your consent. Any
NHANES employee who violates the law may be convicted of a class E
felony and imprisoned for up to 5 years, or fined as much as $250,000.”

From the consent to the exam:


“Will my information be kept private?”

We respect your privacy. Public laws keep all information you give private.
These laws do not allow us to give out data that identifies you or your
family without your permission. This means that we cannot give out
any facts about you, even if a court of law asks for it. However, if we
find signs of child abuse during an exam, we will report it to the local
department of social services or the police.
We will keep all survey data safe and secure. When we share data
with our partners, we do so in a way that protects your privacy as
required and guaranteed by law. Our interviewer can provide you a list
of our partners if you wish to learn more.”

Implications for release of NHANES DNA data

Public-use files: NCHS releases volumes of NHANES data within the confines of
these confidentiality authorities by taking steps to de-identify information collected from
individual participants. Since these public-use files do not contain information
considered to directly or indirectly identify individuals, they are freely available to
researchers and do not need to be accessed in controlled settings.

Potentially identifiable data files: Beyond these public-use files, data files that
include variables that would make the data identifiable are made available to
researchers, but under more carefully controlled circumstances. NCHS has developed
a Research Data Center and a remote access system, which provide data access while
protecting confidentiality.

The first critical issue that needs to be addressed in order to determine the most
appropriate release strategies for databases that include genetic variation data is
whether the addition of the genetic testing results to existing public-use files produces a
combined file that could identify individuals. The existence of a growing number of DNA
collections through which a match could be made is discussed in a recent editorial in
Science by Lowrance and Collins¹. If these collections contain personal
information, linkage to these files makes the NHANES file identifiable.

If the release of genetic data makes the file identifiable, then existing confidentiality
requirements would require that a number of special steps be taken when releasing the
NHANES genetic data to a wider audience. Specifically, these steps could include any
or all of the four options for controlled data access outlined in the main document: a
reengineered remote access system; additional Research Data Centers; designated
agent agreements; or informed consent changes.

The primary difference between the current NHANES model for data accessibility and
those used by other organizations, such as the dbGaP model used by NIH (see
Appendix E), is that in other models, the responsibility for protecting confidentiality is
transferred to the research analyst, with limited responsibility for oversight by the data
steward or the data collectors, once a researcher goes through the ‘front end’ process.
The NHANES model is to have the data steward (NCHS) retain this responsibility
through a variety of mechanisms, including maintaining a separation between the
researcher and the information that could be used to identify a participant. The
researcher rarely needs to see the ‘risky’ information but the mechanisms needed to
maintain the separation can limit the researcher’s analytic flexibility.

¹ Lowrance WW and FS Collins. (2007) Ethics. Identifiability in genomic research. Science. Aug 3;317(5838):600-2.

Appendix D. Considerations Related to Re-consent and Changes to Future
Informed Consent in Order to Achieve Broader Access to NHANES Genetic Data

NCHS data release policies are determined by both participant consent and federal law.
Past and current NHANES data collections promised that no personally identifiable data
would be released. To release identifiable data for genome-wide association studies (GWAS), new
approaches to the consent process for future NHANES data collections and re-consent
of past participants would be required. It will be necessary to carefully evaluate the
impact of any such changes to the consent on survey operations, response rates and
data quality before moving in that direction.

Re-consent of the 7,157 NHANES III participants who were age 12 or more years in
1991-1994 and the 7,962 NHANES 1999-2002 participants age 20 or more years would
require tracking and re-contact by NHANES interviewers. New consent documents that
address the sharing of genetic data with the research community through unsupervised
data release mechanisms (such as dbGaP) will need to be developed, cognitively
tested, and pilot tested to assure that participants understand the benefits and
consequences of release of their genetic data. Different consent documents may be
needed for NHANES III participants, who never consented to genetic research, and for
NHANES 1999-2002 participants, who consented but would now be asked to allow the
release of potentially identifiable data. The cost to re-consent would be substantial. In 1998 it
cost $988,500 to re-consent 545 phase 1 NHANES III participants for NIH’s
Polymorphism Discovery Resource project. These participants were not a random
subsample but were selected because of their race-ethnicity and a location that limited
travel for the interviewers. Only their DNA results were placed in the database; no other
NHANES data were included. It is reasonable to estimate that the process to contact
and re-consent all NHANES III and NHANES 1999-2002 participants would cost millions of dollars.
Currently, NCHS is in the process of obtaining cost estimates for potential re-consent
activities.
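
As a rough illustration of the "millions of dollars" estimate above, the sketch below scales the 1998 per-participant cost linearly to the two cohorts named in this appendix. The linear scaling and the function name are assumptions made for illustration only; they are not an NCHS costing method. The result is in unadjusted 1998 dollars, and because the 1998 subsample was chosen partly to limit interviewer travel, the actual cost per participant could well be higher.

```python
# Figures quoted in the text above; the linear scaling is an illustrative assumption.
COST_1998_USD = 988_500        # cost to re-consent the 1998 subsample
PARTICIPANTS_1998 = 545        # phase 1 NHANES III participants re-consented
NHANES_III_DNA = 7_157         # NHANES III participants who would need re-consent
NHANES_1999_2002_DNA = 7_962   # NHANES 1999-2002 participants who would need re-consent


def linear_reconsent_estimate(total_cost, n_done, n_target):
    """Scale an observed total cost to a new cohort size at constant cost per person."""
    cost_per_person = total_cost / n_done
    return cost_per_person, cost_per_person * n_target


cost_per_person, projected_total = linear_reconsent_estimate(
    COST_1998_USD, PARTICIPANTS_1998, NHANES_III_DNA + NHANES_1999_2002_DNA
)
print(f"Cost per participant (1998 dollars): ${cost_per_person:,.0f}")
print(f"Projected total (1998 dollars): ${projected_total:,.0f}")
```

Under these assumptions the calculation gives about $1,800 per participant and roughly $27 million for the 15,119 participants combined, which is consistent with the order of magnitude stated above.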

The success of a re-consent initiative will depend on the number of participants that are
successfully contacted and who consent to the new language. There are obvious
challenges in re-consenting participants with samples in the NHANES DNA bank. In
some cases, it has been up to sixteen years since the participant’s last contact with the
survey staff. During this time many participants may have changed addresses several
times and some may be deceased. For those who are contacted, response rates cannot
be predicted, but there will certainly be some who do not wish to have their genetic data,
which could be used to identify them, shared publicly. Therefore, it is entirely possible
that the re-consent process will yield a selection of participants who are no longer
representative of the U.S. population, in which case the major benefit of the NHANES
DNA samples may be lost.

Another possibility for identifiable NHANES data to be shared widely would be to
change the consent for future DNA collections. Due to the complex nature of this
consent document, this would need to be tested using a focus group of previous
NHANES participants to assess wording and acceptability, cognitively tested, and then
pilot tested. The pilot study would be done on an independent sample of persons whose
demographic characteristics are similar to NHANES participants. Extensive testing is
needed as there is the possibility of negative impacts on future NHANES data
collections. There is potential for reduction in response rates which could bias the data
collected. Another potential concern is that failure to promise confidentiality would
change how participants view their participation in the survey, which could have broader
unforeseen effects on data quality. The knowledge that their data would not be kept
confidential may actually alter participants’ willingness to participate in sensitive
components, such as the sexual behavior and drug use questionnaire component and
sensitive laboratory tests, or alter how participants respond to questions.

There are other potential effects of re-consent and/or changes to future consent that
should be considered because of their broader implications for the NHANES data collections
and for NCHS. Releasing NCHS data to researchers under the dbGaP model would
mean that NCHS could no longer monitor sensitive and potentially identifiable data
for appropriate use. Participants could potentially be identified through the matching of
NHANES genetic data to future databases that contain personally identifiable
information, which would put all NHANES data collected on that participant at risk of
disclosure and could harm the participant. A public breach of confidentiality could have
a negative impact on current and future NHANES data collections and on all data
collection activities at NCHS and other parts of CDC.

Appendix E. NIH Database of Genotypes and Phenotypes (dbGaP)

In 2006, the National Institutes of Health (NIH) launched the database of Genotypes
and Phenotypes (dbGaP), which was designed to archive and distribute data from
genome-wide association studies (GWAS). dbGaP provides controlled access to
individual-level data, and open access to summary data and study documentation,
including summaries of the measured variables in an organized and searchable web
format.

The NIH described the reasoning behind dbGaP as follows: “The NIH has concluded
that the full value of GWAS can be realized only if the genotype and phenotype datasets
derived from GWAS are made available as rapidly as possible to a wide range of
scientific investigators. The NIH recognizes that GWAS data release practices must be
consistent with the informed consent provided by individual participants. The NIH
considers broad access to data to be particularly important to GWAS because of the
significant resources involved, the serious analytical challenges involved in such large
datasets, and the powerful opportunities that will be provided by the ability to make
comparisons across multiple studies.”²

In 2007, the NIH issued its “Policy for Sharing of Data Obtained in NIH Supported or
Conducted Genome-wide Association Studies (GWAS)”³, which includes the following
data submission and data access guidance intended to promote access while protecting
privacy and confidentiality:

² NIH, Genome-wide Association Studies (GWAS) Policy Background.
http://grants.nih.gov/grants/gwas/background.htm. Accessed 10/03/07.
³ NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-wide Association Studies
(GWAS). http://edocket.access.gpo.gov/2007/pdf/E7-17030.pdf. Accessed 10/03/07.

Data submission guidance for dbGaP related to maintaining confidentiality of
individually identifiable data:

• Submitting investigators could consider whether a Certificate of Confidentiality
might be appropriate as an additional safeguard against involuntary disclosure of
research participant identities.
• Data submitted to dbGaP will be deidentified according to specific criteria and
coded using random, unique codes.
• A certification by the responsible Institutional Official will accompany the data
submission, attesting that:
  o The data submission is consistent with all applicable laws, regulations and
  institutional policies;
  o The appropriate research uses and exclusions are delineated;
  o The identities of the research participants will not be disclosed to the NIH
  data repository [Note: A significant difference in comparison to NHANES,
  where a federal agency has access to participant identities]; and
  o An IRB and/or Privacy Board has reviewed and verified that: the
  submission and sharing of data is consistent with the informed consent;
  the investigator’s plan for deidentifying the data is consistent with NIH
  policy; risks to individuals, families, and groups associated with the
  submitted data have been considered; and that the submitted data were
  collected in a manner consistent with human subjects regulations.
• Submitting investigators may request removal of data on individual participants
upon withdrawal of consent.
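
The coding step in the submission guidance above ("coded using random, unique codes") can be pictured with a minimal sketch. The function names, field names and sample records below are hypothetical and invented for illustration; the sketch does not reproduce dbGaP's actual deidentification criteria, which are not detailed in this document.

```python
import secrets


def assign_random_codes(participant_ids, nbytes=8):
    """Map each participant ID to a random, unique hexadecimal code."""
    codes = {}
    used = set()
    for pid in participant_ids:
        code = secrets.token_hex(nbytes)
        while code in used:  # regenerate on the (very unlikely) collision
            code = secrets.token_hex(nbytes)
        used.add(code)
        codes[pid] = code
    return codes


def deidentify(records, id_field, direct_identifiers, codes):
    """Return copies of the records with direct identifiers dropped and IDs replaced by codes."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items()
               if k != id_field and k not in direct_identifiers}
        row["code"] = codes[record[id_field]]
        cleaned.append(row)
    return cleaned


# Hypothetical records; field names and values are invented for illustration.
records = [
    {"participant_id": "P001", "name": "Example A", "genotype_rs0000001": "AG"},
    {"participant_id": "P002", "name": "Example B", "genotype_rs0000001": "GG"},
]
codes = assign_random_codes(r["participant_id"] for r in records)
released = deidentify(records, "participant_id", {"name"}, codes)
```

In this sketch the `codes` table stays with the submitting investigator, so the repository receives only coded rows. Removing direct identifiers in this way does not by itself address the concern raised in Appendix C that the genotype data could be matched against other DNA collections.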

Data access guidance related to maintaining confidentiality of individually
identifiable data:

• Requests for access to NIH GWAS data will be evaluated by federally-staffed
NIH Data Access Committees.
• Investigators and institutions seeking access to NIH GWAS data are asked to
submit a data access request that is cosigned by the investigator and the
designated institutional official.
• The data access request includes a Data Use Certification within which the
investigators agree to: use the data only for the approved research; protect data
confidentiality; follow the appropriate data security protections; follow all
applicable laws, regulations and local institutional policies and procedures for
GWAS data; not attempt to identify individual participants; not sell any of the data
elements; not share any of the data elements other than with those listed on the
request; agree to the listing of approved research uses with the GWAS repository
along with name and organizational affiliation; agree to report, in real time,
violations to the GWAS policy; acknowledge the GWAS policy regarding
publication and intellectual property; and provide annual progress reports.

Limitations of NIH GWAS policy for maintaining confidentiality of individually
identifiable data, and its applicability to NHANES:

The consequences of failing to comply with the NIH guidance for data access are
unclear. Once released to individual researchers, the dataset can no longer be
protected. For the dbGaP data, the responsibility for protecting the research subject’s
confidentiality seems to fall on the individual researcher; however, for NHANES data,
that responsibility falls on the federal government. Because misuse of NHANES data could
undermine confidence in federal data collections, its implications are potentially much
greater than those of misuse of data held by an individual researcher.

dbGaP Example: The Framingham Heart Study

The three-generation Framingham Heart Study serves as a case study for local access
to individually identifiable GWAS data with parallels to NHANES, including similarity in
size, with more than 15,000 participants and 13,000 variables⁴. Requests for access to
Framingham data are submitted through the standard dbGaP request system⁶ and
include a Research Use Statement specifying the hypotheses or questions to be
addressed in the proposed data analysis, the phenotypes and covariates on which the
analysis will focus, the clinical events that may be needed, any exclusions that are
expected to be part of the analytic approach, a description of the adequacy of the
computing facilities to complete the proposed analyses, and a detailed description of the
proposed analytic methods so that reviewers can determine whether the proposed key
personnel have the qualifications to complete the proposed research. The request also
includes a list of all collaborating investigators in the organization; collaborators at
different organizations must complete their own request for data use because
organizations are accountable for the actions of individuals. Since Framingham
research is always considered to be human subjects research due to the small, defined
population⁵, the request for data access must also include supplemental information,
including documentation of IRB approval and human subjects training, a data security
plan, completed confidentiality awareness forms for all staff with access to the data, and
key personnel biosketches for determining qualifications to complete the proposed
research.⁶

⁴ NCBI Database of Genotypes and Phenotypes (dbGaP). http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap.
Accessed 10/10/07.
⁵ Participant Protection Policy FAQ: The Framingham Heart Study.
http://0-www.ncbi.nlm.nih.gov.catalog.llu.edu/projects/gap/cgi-bin/GetPdf.cgi?id=phd000317. Accessed 10/03/07.
⁶ Instructions to apply for NHLBI authorized access datasets in dbGaP.
http://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?view_pdf&stacc=phs000007.v1.p1. Accessed 10/03/07.

Appendix F. The Data Access Subgroup of CDC’s Beyond Gene Discovery
Working Group: Membership List

Drue Barrett, PhD
Office of the Chief Science Officer

Scott Bowen
National Office of Public Health Genomics

Nicole Dowling, PhD
National Office of Public Health Genomics

Peg Gallagher, PhD
National Center for Environmental Health

Ed Kilbourne, MD
National Office of Public Health Genomics

Muin Khoury, MD, PhD
National Office of Public Health Genomics

Katherine Kolor, PhD, MS, CGC
National Office of Public Health Genomics

Mechele Lynch, MA
National Office of Public Health Genomics

Jennifer Madans, PhD
National Center for Health Statistics

Alison Mawle, PhD
National Center for Immunization and Respiratory Diseases

Gerry McQuillan, PhD
National Center for Health Statistics

Cynthia Moore, MD
National Center on Birth Defects and Developmental Disabilities

Renée Ned, PhD, MS
National Office of Public Health Genomics

Marilyn Radke, MD, MPH
Office of the Chief Science Officer

Christopher Sanders
National Center for Health Statistics

Tom Savel, MD
National Center for Public Health Informatics

Karen Steinberg, PhD
Coordinating Center for Health Promotion

Deborah Tress, JD
Office of the Director
