
GENOMICS AND BIOINFORMATICS
LEARNING OUTCOMES
• Explain why genomics-related research disciplines are developing rapidly. (CO3, CO4)
• Use online tools to analyze sequences. (CO1, CO3)
• Demonstrate a basic understanding of how bioinformatics can be used to analyze nucleic acid and protein sequences. (CO3, CO4)
• Discuss the social and ethical implications of genetic engineering. (CO3, CO4)
• Defend your position in a class debate on the use of recombinant DNA technology and its current restrictions. (CO3, CO4)
Genomics
• the study of genomes

Bioinformatics – an interdisciplinary field that applies computer science and information technology to promote an understanding of biological sequences
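Whole-Genome “Shotgun” Sequencing
• the entire genome, including introns and exons, is sequenced
• entire chromosomes are cut into small pieces (for example, by digestion with restriction enzymes), the pieces are sequenced, and the sequence of each whole chromosome is reconstructed from the overlapping fragments
• contigs (contiguous sequences) – continuous stretches of sequence assembled from overlapping fragments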
• 1995: The Institute for Genomic Research (TIGR) was the first to use the shotgun approach to sequence the complete genome of a free-living organism, the bacterium Haemophilus influenzae
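The fragment-and-assemble idea of shotgun sequencing can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method TIGR or any production assembler used. It makes several assumptions: the reads are error-free, they all come from one DNA strand, and the minimum overlap is a hypothetical 3 bases. The assembler repeatedly merges the pair of fragments with the longest suffix-prefix overlap until no overlaps remain, and each string left at the end is a contig. Real assemblers also have to handle sequencing errors, reads from the opposite strand and repeated sequences, which can mislead a greedy approach like this one.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` bases count; otherwise 0 is returned.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Is the rest of a (from that point on) a prefix of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge overlapping reads and return the resulting contigs."""
    # Drop exact duplicates, then reads contained entirely in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping "shotgun" fragments, listed out of order.
fragments = ["GCCGGAATAC", "ATTAGACCTG", "AGACCTGCCG"]
print(greedy_assemble(fragments))  # ['ATTAGACCTGCCGGAATAC']
```

Because the merge order depends only on overlap length, the input order of the fragments does not matter in this example.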
Bioinformatics: Merging Molecular Biology with Computing Technology
A newly identified gene or DNA sequence is
- reported in scientific publications
- submitted to databases

Database manipulation of DNA sequences
- involves the use of computer hardware and software to study, organize, share, and analyze data related to gene structure, gene sequence and expression, and protein structure and function
Examples of Bioinformatics in Action
Databases
– a new sequence can be searched against all other sequences in the database; if a match is found, an alignment of the similar nucleotide sequences is created
– can be used to predict the sequence of amino acids encoded by a nucleotide sequence
– can provide information on the function of the cloned gene
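Predicting the amino acid sequence encoded by a nucleotide sequence depends on the genetic code. The sketch below is a minimal illustration with several assumptions: it uses the standard genetic code (NCBI translation table 1), reads DNA 5' to 3', and reads codons as non-overlapping triplets. The function names are hypothetical. The correct reading frame of a newly cloned sequence is usually unknown, so the sketch also translates all six frames, three on each strand.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA (or RNA, read as DNA) sequence."""
    dna = dna.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0, to_stop: bool = True) -> str:
    """Translate DNA (or RNA) into one-letter amino acid codes.

    frame: 0, 1 or 2 = number of bases skipped before the first codon.
    to_stop: if True, stop at the first stop codon instead of emitting '*'.
    Codons containing ambiguous bases (e.g. N) become 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Translate all three frames of both strands, keeping stop codons as '*'."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f, to_stop=False)
        frames[f"-{f + 1}"] = translate(rc, f, to_stop=False)
    return frames


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(seq))                 # MAIVMGR
print(translate(seq, to_stop=False))  # MAIVMGR*KGAR*
```

In practice, the longest stretch without a stop codon (an open reading frame) among the six frames is a common first guess for the protein-coding frame.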
GenBank
– the largest publicly available database of DNA sequences
– contains the NIH collection of DNA sequences
– maintained by the National Center for Biotechnology Information (NCBI)
Basic Local Alignment Search Tool (BLAST)
- www.ncbi.nlm.nih.gov/BLAST
- can be used to search GenBank for sequences that match a cloned gene and to create DNA sequence alignments

• Click “standard nucleotide-nucleotide BLAST [blastn]”
• In the search box, type AATAAAGAAC CAGGAGTGGA
• Click the “Blast!” button
• Click “Format”
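BLAST reports local alignments: regions where the query closely matches part of a database sequence. BLAST gets its speed from heuristics (short word matches that are then extended), so the sketch below is not BLAST itself. It implements the exact Smith-Waterman local alignment algorithm that BLAST approximates, applied to the query from the steps above. The scoring values (match +2, mismatch -1, gap -2) and the "subject" sequence are arbitrary assumptions for illustration. No E-values or other BLAST statistics are computed.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the scoring matrix; scores never drop below 0 (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


query = "AATAAAGAACCAGGAGTGGA"  # the query typed into BLAST above, without the space
# Hypothetical database sequence containing the query with one substitution (G -> T).
subject = "GGCTT" + query[:12] + "T" + query[13:] + "CCGA"
score, q_aln, s_aln = smith_waterman(query, subject)
print(score)
print(q_aln)
print(s_aln)
```

Filling the full matrix takes time proportional to the product of the two sequence lengths. This is why searching all of GenBank this way is impractical and BLAST relies on heuristics instead.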
HUGO Gene Nomenclature Committee (HGNC)
- establishes rules for assigning names and symbols to newly cloned human genes

Accession number (e.g., U14680)
- a unique identifier that can be used to refer back to that cloned sequence
- the database entry provides the original journal reference that reported the sequence, the single-letter amino acid code of the protein encoded by the gene, and the nucleotide sequence (cDNA) of the gene
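Accession numbers are stable identifiers, so software often checks their format before looking them up. The sketch below is a hypothetical helper that recognizes only the two classic GenBank nucleotide formats: one letter followed by five digits (like U14680) or two letters followed by six digits. An optional version suffix such as ".1" is also accepted. GenBank and RefSeq use other formats as well, so this illustrates the idea and is not a complete validator.

```python
import re
from typing import Optional, Tuple

# Assumption: only the two classic GenBank nucleotide accession formats are
# recognized (1 letter + 5 digits, or 2 letters + 6 digits), optionally
# followed by a version number such as ".1". Real databases use more formats.
ACCESSION_RE = re.compile(r"([A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.(\d+))?")


def parse_accession(text: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (accession, version) if text looks like a classic accession, else None."""
    m = ACCESSION_RE.fullmatch(text.strip().upper())
    if m is None:
        return None
    version = int(m.group(2)) if m.group(2) else None
    return m.group(1), version


print(parse_accession("U14680"))    # ('U14680', None)
print(parse_accession("U14680.1"))  # ('U14680', 1)
print(parse_accession("BRCA1"))     # None
```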
The Human Genome Project (HGP)
- an international collaborative effort with a 15-year plan to identify all human genes and to sequence the approximately 3 billion base pairs thought to make up the 24 different human chromosomes

Objectives
• Analyze genetic variations among humans, including the identification of single-nucleotide polymorphisms (SNPs).
• Map and sequence the genomes of model organisms, including bacteria, yeast, roundworms, fruit flies, mice, and others.
• Develop new laboratory technologies, such as high-powered automated sequencers, and computing technologies, as well as widely available databases of genome information, that can be used to advance our analysis and understanding of gene structure and function.
• Disseminate genome information among scientists and the general public.
• Consider the ethical, legal, and social issues that accompany the HGP and genetic research.
The Human Genome Project (HGP)
- coordinated with the National Center for Human Genome Research
- primarily carried out by the International Human Genome Sequencing Consortium (China, France, Germany, Great Britain, Japan, and the USA)
- estimated budget: $3 billion
- Celera Genomics – a private company directed by Dr. J. Craig Venter (former director of The Institute for Genomic Research) that carried out a competing, privately funded sequencing effort
- In 1998, the target completion date was revised to 2003
- By June 26, 2000, 95% of the human genome had been assembled
- On April 14, 2003, a “map” of the human genome was essentially complete, with virtually all bases identified and placed in their proper order and potential genes assigned to chromosomes
The Human Genome Project (HGP)
• the human genome contains approximately 20,000 protein-coding genes
• 92–95% of human genes produce multiple proteins through alternative splicing
• The human genome consists of approximately 3.1 billion base pairs.
• The genome is approximately 99.9% identical between individuals of all nationalities.
• Single-nucleotide polymorphisms (SNPs) and copy number variations (CNVs), such as deletions, insertions, and duplications in the genome, account for much of the genomic diversity identified between humans.
• Less than 2% of the genome codes for proteins.
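Once sequences from two people have been aligned, single-base differences can be listed directly. The sketch below is a simplified illustration that assumes the two sequences are already aligned to the same length, with '-' marking an insertion or deletion. It separates single-base substitutions (candidate SNPs) from gap positions (small indels). It cannot detect larger copy number variations. Real variant calling must also account for sequencing errors and for the two copies of each chromosome.

```python
from typing import List, Tuple


def compare_aligned(reference: str, sample: str) -> Tuple[List[Tuple[int, str, str]], List[int]]:
    """Compare two aligned sequences position by position.

    Returns (substitutions, gap_positions) using 1-based positions;
    substitutions are (position, reference_base, sample_base) tuples.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions: List[Tuple[int, str, str]] = []
    gap_positions: List[int] = []
    for pos, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if r == s:
            continue
        if r == "-" or s == "-":
            gap_positions.append(pos)  # insertion/deletion, not a SNP
        else:
            substitutions.append((pos, r, s))  # single-base difference
    return substitutions, gap_positions


ref = "ATGC-TACGT"
alt = "ATGCGTTCGT"
print(compare_aligned(ref, alt))  # ([(7, 'A', 'T')], [5])
```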
• The genome contains approximately 20,000 protein-coding genes.
• The vast majority of our DNA is non-protein-coding, and repetitive DNA sequences account for at least 50% of the noncoding DNA.
• Many human genes are capable of making more than one protein, allowing human cells to make at least 100,000 proteins from only about 20,000 genes.
• The functions of over half of all human genes are unknown.
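The approximate figures given for the human genome (3.1 billion bp, 99.9% identity between individuals, less than 2% protein-coding, at least 100,000 proteins from about 20,000 genes) can be combined into a few rough, back-of-the-envelope estimates. These are order-of-magnitude illustrations only and inherit the approximations of the original numbers.

```latex
\begin{aligned}
\text{protein-coding DNA} &< 0.02 \times 3.1\times10^{9}\ \text{bp} \approx 6.2\times10^{7}\ \text{bp} \\
\text{differences between two people} &\approx (1 - 0.999) \times 3.1\times10^{9}\ \text{bp} \approx 3.1\times10^{6}\ \text{bp} \\
\text{average proteins per gene} &\geq \frac{100{,}000}{20{,}000} = 5
\end{aligned}
```

So roughly 62 million bp or less codes for protein, and two unrelated people differ at about 3 million positions.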
• Chromosome 1 contains the highest number of genes. The Y chromosome contains the fewest genes.
• Many of the genes in the human genome show a high degree of sequence similarity to genes in other organisms.
• Thousands of human disease genes have been identified and mapped to their chromosomal locations.
Figure: Estimated number of genes on each human chromosome and the approximate size of each chromosome in base pairs (bp)
Figure: Numbers of human genes assigned to different functional categories, based on their proposed functions
Comparison of Selected Genomes
| Organism (scientific name) | Approx. size of genome (date completed) | Number of genes | Approx. % of genes shared with humans | Web access to genome databases |
|---|---|---|---|---|
| Bacterium (Escherichia coli) | 4.1 million bp (1997) | 4,403 | Not determined | www.genome.wisc.edu/ |
| Chicken (Gallus gallus) | 1 billion bp (2004) | ≈20,000–23,000 | 60% | http://genomeold.wustl.edu/projects/chicken |
| Dog (Canis familiaris) | ≈2.4 billion bp (2003) | ≈18,400 | 75% | http://www.ncbi.gov/genome/guide/dog |
| Chimpanzee (Pan troglodytes) | ≈3 billion bp (initial draft, 2005) | ≈20,000–24,000 | 96% | http://www.nature.com/nature/focus/chimpgenome/index.html |
| Fruit fly (Drosophila melanogaster) | 165 million bp (2000) | ≈13,600 | 50% | www.fruitfly.org |
| Humans (Homo sapiens) | ≈2.9 billion bp (2004) | ≈20,000–25,000 | 100% | www.doegenomes.org |
| Mouse (Mus musculus) | ≈2.5 billion bp (2004) | ≈30,000 | ≈80% | www.informatics.jax.org |
“Omics” Revolution
• Proteomics – studying all of the proteins in a cell
• Metabolomics – studying the proteins and enzymatic pathways involved in cell metabolism
• Metabonomics – measuring the metabolic products produced by cells in response to stimuli (such as drug treatment) and genetic manipulation
• Glycomics – studying the carbohydrates of a cell
• Transcriptomics – studying all of the genes expressed (transcribed) in a cell
• Metagenomics – the analysis of the genomes of organisms collected from the environment
• Pharmacogenomics – customized medicine based on a person’s genetic profile for a particular condition
• Nutrigenomics – focuses on understanding interactions between diet and genes
• Model organisms: E. coli, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus
• Comparative genomics – allows researchers to study gene structure and function in these organisms as a way to understand gene structure and function in other species
Comparative Genomics
• Strongylocentrotus purpuratus (purple sea urchin) – an invertebrate that has served as an important model organism for developmental biologists; its genome (≈23,500 genes) contains genes with important functions in humans
• Genome 10K Plan – proposes to assemble the genomes of 10,000 vertebrate species in 5 years
• black cottonwood (45,555 genes) – can help the forestry industry make better products and biofuels and develop trees that capture high levels of CO2 from the atmosphere
• honeybee – can help the honey-producing industry and explain how bee toxins produce allergic responses
Stone Age Genomics (Paleogenomics)
• Analysis of DNA from a 2,400-year-old Egyptian mummy, mammoths, platypuses, Pleistocene-age cave bears, and Neanderthals
• Findings from McMaster University in Canada and Pennsylvania State University:
- approximately 98.5% sequence identity between mammoths and African elephants
- Siberian mammoths differ from African elephants by as little as 0.6%
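Identity figures such as the 98.5% reported for mammoths and African elephants are computed from an alignment of the two sequences. The sketch below assumes the sequences are already aligned. It uses one common convention: identical columns divided by all alignment columns, ignoring columns where both sequences have a gap. Other tools divide by a different length (for example, the shorter sequence), so reported percentages can differ between programs. A 98.5% identity corresponds to a 1.5% difference.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical bases ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a base opposite a
    gap counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))    # 90.0
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```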
• In 2009, Svante Pääbo reported the completion of a rough draft of the Homo neanderthalensis genome, encompassing more than 3 billion base pairs
• FOXP2 – a gene linked to speech and language ability
What is Next?
• Human Epigenome Project
- involves creating hundreds of maps of epigenetic changes in different cells and tissues and evaluating the potential roles of epigenetics in complex diseases.
• Encyclopedia of DNA Elements (ENCODE)
- uses both experimental approaches and bioinformatics to identify and analyze functional elements (such as transcriptional start sites, promoters, and enhancers) that regulate the expression of human genes.
Personalized Genomics
• sequencing the genomes of individual people
• In 2007, 454 Life Sciences sequenced James Watson’s genome for approximately $1 million
- the sequence was completed in mid-2007, and Watson made it available to researchers except for the sequence of his apolipoprotein E (ApoE) gene
• George Church
- started the Personal Genome Project
Cancer Genome Atlas Project (TCGA)
• maps important genes and genetic changes involved in cancer
• has sequenced over 100 partial genomes from various cancers

International Cancer Genome Consortium (ICGC)
- aims to sequence genomes from over 500 tumor samples representing more than 20 different cancers
