
Part IV, Chapter 21 (Genomics and Proteomics)

12/29/2014

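Different Approaches to Study Biological Questions

Levels: DNA, RNA, proteins
Approaches: methylation, gene expression, 2D PAGE, CNV (copy-number variation), ChIP-on-chip, LC-MS/MS (liquid chromatography with tandem mass spectrometry), SNP (single nucleotide polymorphism), microRNA, protein array

Each of these approaches can generate tens of thousands of data points.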


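Milestones in whole-genome DNA sequencing
- 1995: first complete genome DNA sequence, Haemophilus influenzae (1,830,137 bp; approx. 1,740 genes)
- Saccharomyces cerevisiae (budding yeast): approx. 6,000 genes
- C. elegans: 1998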

Prokaryotic Genomes
- Of interest because:
  - Bacteria cause diseases
  - Knowledge gained can be applied to more complex organisms
  - The origin of the first eukaryotic cell probably involved the union between an archaeal cell and a bacterial cell

- Entire genomes of many prokaryotes have been sequenced and analyzed
- Prokaryotic chromosomes are usually several hundred thousand to a few million bp long
- Most prokaryotes contain a single type of chromosome, though multiple copies may be found in a single cell
- Some prokaryotes are known to have two or more different chromosomes
- Bacterial chromosomes are usually circular
  - Linear chromosomes occur in some prokaryotes
  - Some have both linear and circular chromosomes
- The total number of genes is correlated with genome size
- Prokaryotic genomes are less complex than eukaryotic genomes
  - They lack centromeres and telomeres, have a single origin of replication, and contain relatively little repetitive DNA
- Prokaryotes often have plasmids, which are typically small

Venter, Smith, and Colleagues Sequenced the First Complete Genome, That of Haemophilus influenzae
- Causes a variety of human diseases
- Relatively small genome: 1.8 Mb (approx. 1,740 genes)
- One strategy for sequencing large genomes is to map them extensively first
- The alternative is shotgun DNA sequencing
  - Fragments are sequenced at random
  - Does not require extensive mapping, but time may be wasted sequencing the same DNA region more than once
- The number of fragments to sequence can be calculated from the genome size
- Based on similarities to known genes in other species, about 2/3 of the genes in this genome have predicted functions
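The number of fragments (reads) to sequence is commonly estimated with the Lander-Waterman model: with N reads of length L from a genome of size G, the average coverage is c = NL/G, and if reads start at random positions the expected fraction of the genome missed is about e^-c. A minimal sketch, assuming uniformly random, error-free reads; the 500 bp read length and 6-fold coverage below are illustrative assumptions, not figures from this chapter.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest number of reads N with N * L / G >= coverage (Lander-Waterman)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of the genome covered by no read, e^-c (Poisson model)."""
    return math.exp(-coverage)


# Illustrative: H. influenzae genome size, hypothetical 500 bp reads at 6x coverage.
genome_bp = 1_830_137
n_reads = reads_needed(genome_bp, read_length_bp=500, coverage=6)
missed_bp = genome_bp * fraction_unsequenced(6)
print(f"{n_reads} reads; about {missed_bp:,.0f} bp expected to remain unsequenced")
```

Raising the coverage shrinks the unsequenced fraction exponentially, which is why shotgun projects accept the redundancy of sequencing the same region many times.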

New approaches have accelerated the pace of genome sequencing
- The most ambitious mapping project to date has been the sequencing of the human genome
- Officially begun as the Human Genome Project in 1990, the sequencing was largely completed by 2003
- The project had three stages:
  1. Genetic (or linkage) mapping
  2. Physical mapping
  3. DNA sequencing

Three-Stage Approach to Genome Sequencing
- A linkage map (genetic map) maps the location of several thousand genetic markers on each chromosome
- A genetic marker is a gene or other identifiable DNA sequence
- Recombination frequencies are used to determine the order and relative distances between genetic markers
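The logic of a linkage map can be sketched in a few lines: the recombination frequency between two markers is converted into map units (1% recombination = 1 centimorgan), and the pair of markers farthest apart must flank any marker lying between them. A minimal sketch under those assumptions; the markers A, B, C and the offspring counts are hypothetical.

```python
def recombination_frequency(recombinants: int, total_offspring: int) -> float:
    """Fraction of offspring showing a recombinant combination of two markers."""
    return recombinants / total_offspring


def map_distance_cm(rf: float) -> float:
    """1% recombination = 1 map unit (centimorgan); reliable only for short distances."""
    return rf * 100


def order_three_markers(distances: dict[tuple[str, str], float]) -> list[str]:
    """Order three linked markers: the pair farthest apart flanks the third marker."""
    outer = max(distances, key=distances.get)
    markers = {m for pair in distances for m in pair}
    (middle,) = markers - set(outer)
    return [outer[0], middle, outer[1]]


# Hypothetical testcross results: recombinant offspring out of 1,000 for each marker pair.
counts = {("A", "B"): 120, ("B", "C"): 50, ("A", "C"): 165}
distances = {pair: map_distance_cm(recombination_frequency(n, 1000)) for pair, n in counts.items()}
print(distances)
print(order_three_markers(distances))
```

The A-C distance comes out slightly smaller than the sum of A-B and B-C because double crossovers between the outer markers go undetected.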

Figure 21.2-4: Three-stage approach to sequencing a genome. Cytogenetic map (chromosome bands; genes located by FISH), then 1. Linkage mapping (genetic markers), 2. Physical mapping (overlapping fragments), 3. DNA sequencing.

- A physical map expresses the distance between genetic markers, usually as the number of base pairs along the DNA
- It is constructed by cutting a DNA molecule into many short fragments and arranging them in order by identifying overlaps
- Sequencing machines are used to determine the complete nucleotide sequence of each chromosome
- A complete haploid set of human chromosomes consists of 3.2 billion base pairs
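Because a physical map measures distance in base pairs while a linkage map measures it in recombination frequency, the two can be compared to estimate how often recombination occurs along a stretch of DNA. A minimal sketch; the marker positions and the 3 cM genetic distance are hypothetical, and real recombination rates vary considerably along chromosomes.

```python
def physical_distance_bp(position_a: int, position_b: int) -> int:
    """Distance between two markers on a physical map, in base pairs."""
    return abs(position_b - position_a)


def cm_per_mb(genetic_distance_cm: float, physical_bp: int) -> float:
    """Recombination rate: centimorgans of genetic distance per million base pairs."""
    return genetic_distance_cm / (physical_bp / 1_000_000)


# Hypothetical markers at known base-pair positions, 3 cM apart on the linkage map.
bp = physical_distance_bp(1_250_000, 3_750_000)
rate = cm_per_mb(3.0, bp)
share_of_genome = bp / 3_200_000_000  # haploid human genome size from the text
print(bp, round(rate, 2), f"{share_of_genome:.5%}")
```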

Human Genome Project
- Officially began October 1, 1990
- Goals:
  - identify all of the approximately 20,000-25,000 genes in human DNA
  - determine the sequences of the 3 billion chemical base pairs that make up human DNA
  - store this information in databases
  - improve tools for data analysis
  - transfer related technologies to the private sector
  - address the ethical, legal, and social issues (ELSI) that may arise from the project

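1990: Human Genome Project begins
1998: Celera Genomics begins a private effort to sequence human DNA
2000: draft sequence of human DNA announced
2003: sequencing of human DNA largely completed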

(2005)
Whole-Genome Shotgun Approach to Genome Sequencing
- The whole-genome shotgun approach was developed by J. Craig Venter in 1992
- This approach skips genetic and physical mapping and sequences random DNA fragments directly
- Powerful computer programs are used to order fragments into a continuous sequence

Figure 21.3-3: Whole-genome shotgun approach
1. Cut the DNA into overlapping fragments short enough for sequencing.
2. Clone the fragments in plasmid or phage vectors.
3. Sequence each fragment.
4. Order the sequences into one overall sequence with computer software.
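Step 4 can be illustrated with the simplest assembly strategy, greedy merging: repeatedly join the two sequences that share the longest suffix-prefix overlap. A minimal sketch, assuming short, error-free, hypothetical reads with no read contained inside another; assemblers used on real genomes must also cope with sequencing errors and repeated DNA, which greedy merging handles poorly.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)  # candidate start of the overlap within a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical error-free reads taken from the sequence ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))
```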

Eukaryotic Genomes
- The nuclear genome is usually found in sets of linear chromosomes
- Extranuclear DNA is found in mitochondria and chloroplasts
- The entire nuclear genome has been sequenced for several species

- Genome size is not the same as the number of genes
- The relative size of the nuclear genome varies dramatically among species
- In general, increases in the amount of DNA are correlated with increasing cell size, cell complexity, and body complexity
- However, major variations are observed between organisms with similar form and function

- Eukaryotic genomes have repetitive sequences
  - Many genomes contain repeated copies of short DNA sequences
- Moderately repetitive sequences
  - Repeated a few hundred to several thousand times
  - Include rRNA genes, multiple origins of replication, and sequences with a role in gene transcription and translation
- Highly repetitive sequences
  - Repeated tens of thousands or millions of times
  - Most have no known function
- Coding regions are only 2% of our genome

Percentage of Genome
- 98% noncoding
  - Intron DNA: 24%
  - Unique noncoding DNA: 15%
  - Repetitive DNA: 59% (much of it derived from transposable elements)
- 2% coding regions of genes
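These percentages can be turned into absolute amounts by applying them to the 3.2 billion bp haploid human genome; for example, 2% coding DNA is only about 64 million bp. A quick arithmetic sketch; the percentages are the rounded values listed above, so the results are approximate.

```python
HAPLOID_GENOME_BP = 3_200_000_000  # human haploid genome size given earlier in the chapter

# Approximate percentages from the breakdown above (they sum to 100%).
composition_percent = {
    "intron DNA": 24,
    "unique noncoding DNA": 15,
    "repetitive DNA": 59,
    "coding regions of genes": 2,
}


def base_pairs_by_category(genome_bp: int, percents: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares of a genome into approximate base-pair counts."""
    return {name: genome_bp * pct // 100 for name, pct in percents.items()}


assert sum(composition_percent.values()) == 100
result = base_pairs_by_category(HAPLOID_GENOME_BP, composition_percent)
for name, bp in result.items():
    print(f"{name}: about {bp / 1e6:,.0f} Mb")
```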


1000 Genomes Project
Sequencing centers include:
- three U.S. sequencing centers funded by NHGRI
- the Sanger Institute in the UK
- the Beijing Genomics Institute (BGI) in Shenzhen, China

Genomes vary in size, number of genes, and gene density
- By early 2010, about 1,200 genomes had been completely sequenced, including 1,000 bacteria, 80 archaea, and 124 eukaryotes
- Sequencing of over 5,500 genomes and over 200 metagenomes was in progress at that time

Genome Size
- Genomes of most bacteria and archaea range from 1 to 6 million base pairs (Mb); genomes of eukaryotes are usually larger
- Most plants and animals have genomes greater than 100 Mb; humans have 3,000 Mb
- Within each domain there is no systematic relationship between genome size and phenotype

Genome Sizes and Estimated Numbers of Genes*


More Complex Organisms Have Decreased Gene Density
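Gene density is simply the number of genes divided by genome size, and a short calculation makes the contrast concrete: a bacterium packs on the order of a thousand genes into each Mb, while the human genome carries fewer than ten. A sketch using the approximate human figures given in this chapter; the H. influenzae count of about 1,740 genes is the commonly cited estimate for its genome, treated here as an outside assumption.

```python
def genes_per_mb(gene_count: int, genome_size_bp: int) -> float:
    """Gene density: number of genes per million base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)


# Approximate gene counts: about 1,740 for H. influenzae (assumed estimate);
# 20,000-25,000 for humans with a 3,000 Mb genome, as stated in this chapter.
examples = {
    "Haemophilus influenzae": (1_740, 1_830_137),
    "Homo sapiens (low estimate)": (20_000, 3_000_000_000),
    "Homo sapiens (high estimate)": (25_000, 3_000_000_000),
}
for name, (genes, size_bp) in examples.items():
    print(f"{name}: {genes_per_mb(genes, size_bp):.1f} genes per Mb")
```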

Comparing genome sequences provides clues to evolution and development
- Genome sequencing and data collection have advanced rapidly in the last 25 years
- Comparative studies of genomes:
  - Advance our understanding of the evolutionary history of life
  - Help explain how the evolution of development leads to morphological diversity

Comparing Genomes
- Genome comparisons of closely related species help us understand recent evolutionary events
- Genome comparisons of distantly related species help us understand ancient evolutionary events

Comparing Distantly Related Species
- Highly conserved genes have changed very little over time
- These help clarify relationships among species that diverged from each other long ago
- Bacteria, archaea, and eukaryotes diverged from each other between 2 and 4 billion years ago
- Highly conserved genes can be studied in one model organism, and the results applied to other organisms
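Comparing conserved genes usually starts by aligning the sequences and measuring how similar they are. A minimal sketch of percent identity for two sequences that are already aligned, assuming gap columns ("-") are simply skipped; other conventions count gaps as mismatches, and the sequences shown are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100 * matches / len(columns)


# Hypothetical aligned fragments of a conserved gene from two distantly related species.
print(percent_identity("ATGGCTAAGT", "ATGGCCAAGT"))
print(percent_identity("ATG-CTAAGT", "ATGGCCAA-T"))
```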

Comparing Closely Related Species
- Genetic differences between closely related species can be correlated with phenotypic differences
- For example, genetic comparison of several mammals with nonmammals helps identify what it takes to make a mammal

Comparing Genomes Within a Species
- As a species, humans have only been around about 200,000 years and have low within-species genetic variation
- Variation within humans is due to single nucleotide polymorphisms, inversions, deletions, and duplications
- Most surprising is the large number of copy-number variants
- These variations are useful for studying human evolution and human health

Proteomics
- The proteome is the entire collection of a species' proteins
- Due to gene regulation, any given cell produces only a subset of its proteome
- Which proteins are made depends on the cell type, the stage of development, and environmental conditions

2D Gel Electrophoresis
- Can separate thousands of different proteins in a cell extract
- A sample of proteins is loaded onto a tube-shaped gel
- Isoelectric focusing separates proteins according to their net charge at a given pH: each protein migrates along a pH gradient until it reaches its isoelectric point, where its net charge is zero
- The tube gel is then placed on a slab gel that separates proteins by molecular mass
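Isoelectric focusing depends on the fact that a protein's net charge changes with pH and reaches zero at its isoelectric point (pI). A minimal sketch that estimates net charge with the Henderson-Hasselbalch equation and finds the pI by bisection; the pKa table is an assumed set of approximate values (published tables differ), the peptides are hypothetical, and real proteins can deviate because of folding and modifications.

```python
# Approximate pKa values for ionizable groups (assumed; published tables differ slightly).
POSITIVE_PKA = {"N-term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C-term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def group_counts(sequence: str) -> dict[str, int]:
    """Count ionizable side chains plus one free N-terminus and one C-terminus."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N-term"] = counts["C-term"] = 1
    return counts


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = group_counts(sequence)
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """pH at which net charge is zero, found by bisection (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


# Hypothetical peptides: basic residues raise the pI, acidic residues lower it.
for peptide in ["GKRKG", "GDEDG", "GAGAG"]:
    print(peptide, round(isoelectric_point(peptide), 2))
```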
Mass Spectrometry
- Proteins can be identified by their unique amino acid sequence
- Tandem mass spectrometry uses 2 spectrometers:
  - The first measures the mass of a given peptide
  - The second measures masses after the peptide has been broken into smaller fragments; the fragment masses reveal the peptide's amino acid sequence
- From that amino acid sequence, a researcher can determine the possible codon sequences that could encode the peptide
  - More than one sequence is possible because of the degeneracy of the genetic code
- Using computer software, the entire genomic sequence is scanned
  - A match between a codon sequence and a specific gene can be located
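The search described above can be sketched as follows: list every codon for each amino acid of the peptide (standard genetic code), count how many DNA sequences could encode it, and scan a genome for any of them. A minimal sketch, assuming the standard code and scanning only the given strand; the peptide and genome string are hypothetical, and in practice proteomics software often compares measured masses with peptides predicted from the genome instead.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that specify one amino acid (one-letter code)."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


def count_coding_sequences(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide (code degeneracy)."""
    total = 1
    for aa in peptide:
        total *= len(codons_for(aa))
    return total


def find_peptide_in_genome(peptide: str, genome: str) -> list[int]:
    """0-based start positions of any possible coding sequence (all frames, this strand only)."""
    pattern = "".join("(?:" + "|".join(codons_for(aa)) + ")" for aa in peptide)
    # Lookahead so that overlapping matches are all reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", genome)]


print(count_coding_sequences("MWK"), count_coding_sequences("LSR"))
print(find_peptide_in_genome("MWK", "GGATGTGGAAATTT"))
```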

Proteomes
- Relative abundance of proteins can be described in two ways:
  - Abundance in genome: the number of genes that encode a particular type or category of protein
  - Abundance in cell: the amount of a given protein or protein category actually made by a living cell
- Liver and muscle cell example:
  - Same genes, so the percentage in the genome is identical
  - Cellular abundance is very different

- Metabolic enzymes
  - Accelerate chemical reactions within the cell
  - Genome abundance: 20-30%
  - Cellular abundance: 20-30%
- Structural proteins
  - Provide shape and form to cells and organisms
  - Genome abundance: 5% of the eukaryotic genome
  - Cellular abundance: 5%, but can be much higher
- Motor proteins
  - Use energy for intracellular or whole-cell movement
  - Genome abundance: less than 2%
  - Cellular abundance: can be high, 25-40% in skeletal muscle cells

- Cell-signaling proteins
  - Used to respond to environmental signals and to send signals
  - Genome abundance: 12%
  - Cellular abundance: less than 12%
- Transport proteins
  - Transport ions and molecules across membranes
  - Genome abundance: 10-15%
  - Cellular abundance: less than 10-15%
- Gene expression and regulation proteins
  - Carry out transcription, mRNA modification, and translation
  - Genome abundance: 25-30%
  - Cellular abundance: well over 25%

- Protective proteins
  - Help the organism survive environmental stress
  - Genome abundance: less than 2%
  - Cellular abundance: difficult to calculate, because how much is made depends on the cell's environment
  - Antibodies are the most diverse of all proteins

Proteomes are larger than genomes
- Due to:
  - Alternative splicing
    - A single pre-mRNA can be spliced into more than one version
    - Often cell-specific or related to environmental conditions
  - Post-translational covalent modification
    - Can be permanent or transient
    - Permanent modifications are involved in the assembly and construction of a protein
    - Phosphorylation, methylation, and acetylation are more often transient
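Alternative splicing can be illustrated with exon skipping: if a gene has k optional (cassette) exons that can each be included or skipped, one pre-mRNA can yield up to 2^k mRNA versions. A minimal sketch, assuming exon skipping is the only form of alternative splicing and exon order is always preserved; the exon names are hypothetical, and real genes also use alternative splice sites and other patterns.

```python
from itertools import product


def splice_variants(exons: list[tuple[str, bool]]) -> list[tuple[str, ...]]:
    """Enumerate mRNAs from (exon_name, is_optional) pairs listed 5' to 3'.

    Constitutive exons are always kept; each optional exon is either included or skipped.
    """
    optional = [name for name, is_optional in exons if is_optional]
    variants = []
    for choices in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choices))
        variants.append(tuple(name for name, is_optional in exons if not is_optional or keep[name]))
    return variants


# Hypothetical gene: exons E1 and E4 are constitutive, E2 and E3 are optional.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
variants = splice_variants(gene)
for mrna in variants:
    print("-".join(mrna))
```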
How Systems Are Studied: An Example
- A systems biology approach can be applied to define gene circuits and protein interaction networks
- Researchers working on the yeast Saccharomyces cerevisiae used sophisticated techniques to disable pairs of genes one pair at a time, creating double mutants
- Computer software then mapped the genes to produce a network-like functional map of their interactions
- The systems biology approach is possible because of advances in bioinformatics
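The text does not say how the yeast interactions were scored; one common convention compares a double mutant's fitness with the product of the two single-mutant fitnesses, so a large deviation (epsilon) signals that the genes interact. A minimal sketch under that assumption, with hypothetical gene names, fitness values, and threshold, which links interacting genes and groups them into connected modules as a crude stand-in for a functional map.

```python
def interaction_score(w_ab: float, w_a: float, w_b: float) -> float:
    """Epsilon: observed double-mutant fitness minus the multiplicative expectation."""
    return w_ab - w_a * w_b


def interaction_edges(single, double, threshold=0.1):
    """Keep gene pairs whose |epsilon| reaches the (assumed) threshold."""
    edges = {}
    for (a, b), w_ab in double.items():
        eps = interaction_score(w_ab, single[a], single[b])
        if abs(eps) >= threshold:
            edges[(a, b)] = round(eps, 3)
    return edges


def modules(edges):
    """Group genes into connected components of the interaction graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical relative fitness values (1.0 = wild type).
single = {"geneA": 0.9, "geneB": 0.8, "geneC": 1.0, "geneD": 0.7, "geneE": 0.95}
double = {("geneA", "geneB"): 0.30, ("geneC", "geneD"): 0.70,
          ("geneB", "geneE"): 0.40, ("geneD", "geneE"): 0.665}
edges = interaction_edges(single, double)
print(edges)
print(modules(edges))
```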
Figure 21.5: Network-like functional map of gene interactions in yeast. Labeled gene clusters: glutamate biosynthesis; translation and ribosomal functions; mitochondrial functions; vesicle fusion; RNA processing; peroxisomal functions; transcription and chromatin-related functions; amino acid permease pathway; metabolism and amino acid biosynthesis; nuclear-cytoplasmic transport; secretion and vesicle transport; nuclear migration and protein degradation; mitosis; DNA replication and repair; serine-related biosynthesis; cell polarity and morphogenesis; protein folding, glycosylation, and cell wall biosynthesis.

Application of Systems Biology to Medicine
- A systems biology approach has several medical applications
- The Cancer Genome Atlas project is currently seeking all the common mutations in three types of cancer by comparing gene sequences and expression in cancer versus normal cells
- This has been so fruitful that it will be extended to ten other common cancers
- Silicon and glass chips have been produced that hold a microarray of most known human genes
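Comparing expression in cancer versus normal cells often starts with a fold change per gene. A minimal sketch, assuming one measurement per gene per condition and hypothetical gene names and signal values; real studies use many samples and statistical tests before calling a gene differentially expressed.

```python
import math


def log2_fold_change(cancer: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of expression in cancer vs. normal cells (pseudocount avoids log of 0)."""
    return math.log2((cancer + pseudocount) / (normal + pseudocount))


def changed_genes(expression: dict[str, tuple[float, float]], min_abs_lfc: float = 1.0) -> dict[str, float]:
    """Genes whose expression differs at least 2**min_abs_lfc-fold between the two samples."""
    changed = {}
    for gene, (cancer, normal) in expression.items():
        lfc = log2_fold_change(cancer, normal)
        if abs(lfc) >= min_abs_lfc:
            changed[gene] = lfc
    return changed


# Hypothetical microarray signals: (cancer cells, normal cells).
expression = {"geneX": (255, 63), "geneY": (31, 31), "geneZ": (15, 127)}
print(changed_genes(expression))
```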