Genetics Lecture 8: Human Genome Projects

I. Terms
II. Human Genome Project
  A. Shotgun sequencing: approach in which the DNA sequence of interest is shredded randomly into numerous small fragments and the individual fragments are then sequenced. The sequences of these fragments are used to determine the sequence of the original DNA sample.
    1. Hierarchical shotgun sequencing: approach in which large, overlapping DNA fragments of known location in the human genome were subjected to shotgun sequencing. The sequences of these large fragments were then used to determine the sequence of the human genome.
    2. Whole-genome shotgun sequencing: approach in which a genome of interest is shredded randomly into numerous small fragments and the sequences of these small fragments are used to determine the sequence of the whole genome.
    3. The Human Genome Project used the hierarchical approach. It addressed the vexing issues arising from the abundance of repeat sequences in the human genome, which make it challenging to assemble our genome from scratch using a whole-genome shotgun approach.
    4. Whole-genome shotgun sequencing is now the preferred approach for sequencing a person’s genome.
    5. Next-generation sequencing
      a. Faster and cheaper each year
      b. Parallelized, miniaturized, and cost-effective acquisition of raw sequence
  B. Human protein-coding genes
    1. Exome: the complete exon content of an organism or individual
    2. Only about 22,000 protein-coding genes
      a. Introns: 28.5% of the genome
      b. Exons: 1.5% of the genome
    3. About 180,000 exons in the human genome
  C. Repeat sequences
    1. Interspersed repeats: specific class of repeat sequences in the human genome that range from one hundred to three thousand base pairs in length. They consist of LINEs, SINEs, and retrovirus-like elements as well as DNA transposon fossils (45% of the genome).
      a. LINEs: long interspersed nuclear elements are a specific type of interspersed repeat element that encode the proteins necessary for their own replication.
      b. SINEs: short interspersed nuclear elements are a specific type of interspersed repeat element that do not encode the proteins necessary for their own replication.
      c. Retrovirus-like elements: specific type of interspersed repeat element that may or may not encode the proteins necessary for their own replication.
      d. DNA transposon fossils: specific type of interspersed repeat element that may or may not encode the proteins necessary for their own replication.
    2. Segmental duplications: sequences >1 kb in length that share 90-98% identity.
      a. Special class of repeat elements that are especially large and share a high percentage identity with another genomic region
      b. Large enough to encompass entire genes
      c. Can contain interspersed repeat sequences

      d. Segmental duplications >1 kb in length with >90% identity make up 4% of the human genome
  D. Results from Human Genome Projects
    1. Single nucleotide polymorphism (SNP): polymorphism in DNA sequence consisting of variation in a single base
      a. Type of sequence variation present at >1% frequency in human populations
    2. Comparing the genomes of individuals from different parts of the world
      a. Greater divergence within African populations than within European or Asian populations
  E. Estimates of Mutation Frequency
    1. Non-synonymous SNPs (which change the amino acid sequence of a protein): 10,000
    2. About 100 loss-of-function sequence variants in annotated genes
    3. The number of truly deleterious mutations in any person’s genome is still under debate. Many of the loss-of-function variants could be in genes that are not essential for human health.
  F. De novo mutations and disease
    1. De novo mutation: a sequence change that is present for the first time in one family member as a result of a mutation in a germ cell of one of the parents or in the fertilized egg.
      a. Every person has about 70 de novo mutations in their genome
    2. Example: CHARGE syndrome
  G. Copy number variants (CNVs)
    1. Copy number variant: sequence that is at least one kilobase in length and polymorphic in copy number.
    2. Different genomes may differ in length by 4-24 Mb
      a. CNVs usually involve segmental duplications
III. Complex Traits and Disorders
  A. Complex Traits and Disorders
    1. Non-Mendelian inheritance patterns and familial aggregation (clustering in families), but no clearly defined pattern of transmission
      a. They do not follow Mendelian inheritance because multiple genes and environmental factors affect the expression of disease traits
    2. Population prevalence of diseases
      a. Chromosomal abnormalities: 4/1000
      b. Single-gene mutations: 20/1000
      c. Multifactorial inheritance: 300/1000
    3. Multifactorial inheritance: the type of non-Mendelian inheritance shown by traits that are determined by a combination of multiple factors, both genetic and environmental. It is also termed complex inheritance. In principle, multifactorial inheritance can be polygenic (involving many genes at different loci), but it always has to be influenced by the environment.
      a. Polygenic: inheritance determined by many genes at different loci, with small additive effects. It is distinct from multifactorial inheritance in that environmental factors are not involved (as with eye color).
      b. Multifactorial disorders include Alzheimer disease, arthritis, autism, coronary artery disease, orofacial clefting, schizophrenia, and type 2 diabetes
  B. Genome-Wide Association Studies

    1. Genome-wide association study (GWAS): a case-control study in which genetic variation, often measured as SNP genotypes, is compared between people with a particular trait and unaffected individuals.
      a. Examines the role that common genetic variants play in human disease
      b. Addresses the central hypothesis that common genetic variants are responsible for common human traits
    2. Common Disease – Common Variant Hypothesis: common, interacting disease alleles underlie most common diseases, perhaps in association with environmental factors
    3. Haplotype: a set of DNA variations, or polymorphisms, that tend to be inherited together.
      a. A haplotype can refer to a combination of alleles or to a set of single nucleotide polymorphisms (SNPs) found on the same chromosome.
      b. Combination of alleles transmitted together
      c. Allows clusters of SNPs to be transmitted together from generation to generation
      d. Linkage: genes on the same chromosome are linked if they are transmitted together in meiosis more frequently than expected by chance.
      e. Linkage disequilibrium: the nonrandom association between two or more alleles, such that certain combinations of alleles are more likely to occur together on a chromosome than other combinations of alleles
    4. SNPs found more frequently in affected groups indicate genomic regions relevant to disease
  C. Age-related macular degeneration (AMD)
    1. AMD is a common vision disorder that affects the function of the macula, a region near the center of the retina where visual perception is most acute
      a. “Wet” macular degeneration: abnormal blood vessels grow under the macula
      b. “Dry” macular degeneration: light-sensitive cells die
    2. Common human disease where common variation represents a major risk factor
      a. About 70% of the genetic risk can be inferred from SNP genotypes and smoking status
      b. Both protective and risk alleles have been found
      c. The genes identified highlight the role of inflammation in the disease
  D. Lessons from GWAS
    1. Most common variants have only modest effects on risk
    2. For most common diseases and traits, the identified SNPs account for only <5-10% of overall risk
    3. Potentially useful for understanding pathophysiology
      a. New biologic pathways, new drug targets
    4. Providing new leads into the molecular mechanisms of diabetes and many other common disorders
      a. Diabetes is an example of a common human disease where common genetic variation does not explain the bulk of genetic risk
  E. Overview of genetic risk factors
    1. Relationship between effect size (severity of disease) and the frequency of the genetic variation in the human population
      a. Rare Mendelian disorders (such as cystic fibrosis, hereditary breast and ovarian cancer, and Duchenne muscular dystrophy) involve rare alleles with large effect sizes
      b. Common alleles associated with most common disorders (such as type 2 diabetes) almost always show small effect sizes. The exception is age-related macular degeneration, for which common alleles can confer a high risk of disease.
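
Worked sketch (not from the lecture): the case-control comparison behind a GWAS, and the notion of effect size, can be illustrated for a single SNP with a 2x2 table of allele counts. The Python below is a minimal sketch under stated assumptions. The function name `allelic_association` and all counts are hypothetical. The odds ratio is used here as the effect-size measure and a 1-degree-of-freedom Pearson chi-square as the association test; both are common choices, but the notes do not say which statistics were used in the studies discussed.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the candidate risk allele and
    the other allele. All four counts are assumed to be positive.
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d

    # Effect size: odds of carrying the risk allele in cases vs. controls.
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # For 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts: 1000 cases and 1000 controls,
    # two alleles per person.
    odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
    print(f"odds ratio = {odds_ratio:.2f}")
    print(f"chi-square = {chi2:.2f}")
    print(f"p-value    = {p_value:.2e}")
```

With these made-up counts the odds ratio is about 1.29, the kind of modest effect described under Lessons from GWAS. A real GWAS repeats a test like this across very many SNPs, so the significance threshold must be corrected for multiple testing; that step is left out of this sketch.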