
TOPIC

FOR TERM PAPER


“HOW CAN YOU SEQUENCE A GENOME”

COURSE CODE: BTC 012


COURSE NAME: CONCEPTS IN BIOTECHNOLOGY.

SUBMITTED TO:

DR. HARSH
DEPTT. OF BIO-TECHNOLOGY
SUBMITTED BY:
RAKESH KUMAR,
B.TECH BIO-TECH
SEC-“A”
ROLL NO-41
REG. NO-1040070163
ACKNOWLEDGEMENT

On completion of the assignment project “HOW CAN YOU SEQUENCE A GENOME”, I wish to
acknowledge my thanks and gratitude to all those who have been associated with this work,
directly or indirectly, for their help, support, advice and suggestions during the various
stages of this project.

Above all, I thank our academic faculty member Dr. HARSH, of the faculty of our University,
for providing all the guidance and motivation which made it possible for me to complete a
project like this. I thank my respected teacher again for his encouraging and supportive
attitude towards my academic efforts on the project, which made this assignment a new and
interesting experience.

RAKESH KUMAR

Introduction:
Studying the human genome - the complete set of human genes - is a way of studying
fundamental details about ourselves. The three billion letters of the human genome are
written using the four-letter alphabet of DNA. The DNA is divided among 23 pairs of
chromosomes that are found in each of the trillions of cells in our bodies. In 2003, the
Human Genome Project produced a complete representative sequence of the human
genome. Of course, people are not identical, and DNA sequences do differ subtly
between individuals. Currently, a number of separate projects are charting sequence
variations found in human populations.

The representative sequence is a composite from several people who donated blood
samples. Originally, close to 100 people volunteered to give a sample of their blood.
Each person provided their informed consent, affirming that they agreed to the study of
their DNA. No names were attached to the blood samples and ultimately scientists used
only a few of them. These measures ensured that the DNA sequences remained
anonymous; not even the donors knew whether their samples were actually used or not.

The main goal of the Human Genome Project was to read, letter by letter, the three
billion bases of human DNA. Before starting to sequence the human genome, scientists
built maps of the chromosomes and developed and refined techniques for analyzing
DNA. With the tools in place, project scientists began large-scale DNA sequencing in
1999. In just one year, they had amassed sequence data covering more than 80 percent of
the genome.

The human genome is a massive text. If the three billion letters (or bases) of the genome
were printed in telephone books, they would require a stack of books nearly as tall as the
Washington Monument.

To accurately determine the sequence of every base in the genome, scientists needed to
read the three billion bases not just once, but at least six to ten times. Individual
sequencing reactions could only reveal the order of a few hundred bases of DNA at a
time - amounting to a fraction of a page. This meant that to place in order all of the DNA
bases, it was necessary to produce many thousands of overlapping segments of DNA
sequence.
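The arithmetic behind this requirement can be sketched as follows. This is only a rough illustration: the read length of 500 bases is an assumed stand-in for "a few hundred bases", and the function name `reads_needed` is hypothetical.

```python
import math

def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Reads required so that the total bases read equal coverage x genome size."""
    return math.ceil(coverage * genome_size / read_length)

genome_size = 3_000_000_000   # three billion bases (from the text)
read_length = 500             # assumed: "a few hundred bases" per reaction

for coverage in (6, 10):      # "at least six to ten times" (from the text)
    print(coverage, reads_needed(genome_size, read_length, coverage))
```

Under these assumptions the count runs into tens of millions of reads, which is why the sequence had to be pieced together from overlapping segments by computer.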

A History of Genome Sequencing:


The sequencing of the human genome, along with those of related organisms, represents one
of the largest scientific endeavors in the history of mankind. The information garnered from
sequencing will provide the raw data for the exploding field of bioinformatics, where
computer science and biology live in symbiotic harmony. The large-scale sequencing
proposed by the Human Genome Project in 1990 could never have been a reality without
modern computer facilities. Merely twenty years ago, computers would have been
powerless in the face of such a daunting amount and array of data. Homologue identification
and genome characterization between organisms comprising millions of nucleotides was
unimaginable until the rapid advancement of microchips and processors over the past two
decades. In addition, the first sequenced genome of a free-living organism, Haemophilus
influenzae, would have been impossible without the computational methods developed at
the facilities of The Institute for Genomic Research (TIGR). While this is not technically
an aspect of bioinformatics, this development would have been impossible with the
computers of yesterday. The short history of genome sequencing began with Frederick
Sanger's invention of sequencing almost twenty-five years ago.

The art of determining the sequence of DNA is known as Sanger sequencing after its
brilliant pioneer. This technique involves the separation of fluorescently labeled DNA
fragments according to their length on a polyacrylamide gel (PAGE). The base at the end
of each fragment can then be visualized and identified by the dye attached to it.
The time- and labor-intensive nature of gel preparation and running, as well as the large
amounts of sample required, increase the time and costs of genomic sequencing. These
conditions drastically reduce the efficiency of sequencing projects, ultimately limiting
researchers in their sequencing attempts.

Bacteriophage φX174, a viral genome with only 5,386 base pairs (bp), was the first
genome to be sequenced. Frederick Sanger, in another revolutionary discovery, invented the
method of "shotgun" sequencing, a strategy in which random pieces of DNA from the
genome are cloned and sequenced individually. The sequenced pieces are then assembled
by their overlapping regions to form contiguous sequences (otherwise known as contigs).
The final step involves the use of custom primers to close the gaps between the contigs,
thus giving the completely sequenced genome. Sanger first used "shotgun" sequencing five
years later to complete the bacteriophage λ sequence, which was significantly larger at
48,502 bp. This method allowed sequencing projects to proceed at a much faster rate, thus
expanding the scope of realistic sequencing ventures. Since then several other viral and
organellar genomes have been sequenced using similar techniques, such as the 229 kb
genome of cytomegalovirus (CMV), the 192 kb genome of vaccinia, the 187 kb mitochondrial
and the 121 kb chloroplast genomes of Marchantia polymorpha, and the 186 kb genome
of smallpox.
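The idea of joining shotgun fragments by their overlapping ends can be illustrated with a toy greedy merge. This is a minimal sketch on error-free, made-up reads; the function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers must also cope with sequencing errors, repeats and reads from both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments sharing the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three made-up reads that overlap end to end
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Fragments that share no sufficiently long overlap stay as separate contigs, which corresponds to the gaps that are later closed with custom primers.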

The success with viral genome sequencing stemmed from the relatively small length of
their genetic codes. In 1989, Andre Goffeau set up a European consortium to sequence
the genome of the budding yeast Saccharomyces cerevisiae (12.5 Mb). Goffeau's
European collaboration involved 74 different laboratories drawn to the project in hopes of
sequencing the homologs of their favorite genes. Most laboratories utilized Sanger's
"shotgun" method of sequencing that had become the accepted standard for genome
sequencing. S. cerevisiae had a sequence approximately 60 times larger than any
sequence previously attempted, indicating why Goffeau felt compelled to invite the
cooperation of a group of laboratories. At the time, the sequencing of model organisms
such as S. cerevisiae appeared to be the logical step towards the eventual characterization
of the human genome, a task that seemed beyond the scope of technology due to its
tremendous size of 3,000 Mb. Sequencing smaller genomes would highlight the problems
with sequencing techniques, eventually refining the technology to be used on large-scale
projects like H. sapiens. In addition, valuable insight concerning these organisms would
be gained with the elucidation of their genetic makeup.

The following year saw the initiation of a plethora of ambitious sequencing proposals, the
foremost being the introduction of the Human Genome Project in 1990. The U.S. Human
Genome Project (HGP) is a joint effort of the Department of Energy and the National
Institutes of Health that was designed as a three-step program to produce genetic maps,
physical maps, and finally the complete nucleotide sequence map of the human
chromosomes. The first two aims of the project are practically fulfilled and now the
majority of work is concentrated on the exact nucleotide sequence of the human genome. In
the wake of this pronouncement came the start of three projects aimed at elucidating the
sequences of smaller model organisms, similar to S. cerevisiae in their academic utility:
Escherichia coli, Mycoplasma capricolum, and Caenorhabditis elegans. It was
hoped that these projects would increase the efficiency of sequencing, but unfortunately
they fell short of this task. Many anticipated that E. coli would be the first genome to be
sequenced entirely, but to the shock of the scientific community, an outsider won the race
for the first complete genome sequence of a free-living organism: Haemophilus
influenzae.

A team headed by J. Craig Venter from The Institute for Genomic Research (TIGR) and
Nobel laureate Hamilton Smith of Johns Hopkins University sequenced the 1.8 Mb
bacterium with new computational methods developed at TIGR's facility in Gaithersburg,
Maryland. Previous sequencing projects had been limited by the lack of adequate
computational approaches to assemble the large amount of random sequences produced
by "shotgun" sequencing. In conventional sequencing, the genome is broken down
laboriously into ordered, overlapping segments, each containing up to 40 kb of DNA.
These segments are "shotgunned" into smaller pieces and then sequenced to reconstruct
the genome. Venter's team utilized a more comprehensive approach by "shotgunning" the
entire 1.8 Mb H. influenzae genome. Previously, such an approach would have failed
because the software did not exist to assemble such a massive amount of information
accurately. Software developed by TIGR, called the TIGR Assembler, was up to the task,
reassembling the approximately 24,000 DNA fragments into the whole genome. After the
H. influenzae genome was "shotgunned" and the clones purified sufficiently, the TIGR
Assembler software required approximately 30 hours of central processing unit time on a
SPARCenter 2000 containing half a gigabyte of RAM, testifying to the enormous
complexity of the computation.

Venter's H. influenzae project had failed to win funding from the National Institutes of
Health, indicating the serious doubts surrounding his ambitious proposal. It simply was
not believed that such an approach could sequence the large 1.8 Mb sequence of the
bacterium accurately. Venter proved everyone wrong and succeeded in sequencing the
genome in 13 months at a cost of 50 cents per base, which was half the cost of, and
drastically faster than, conventional sequencing. This new method of sequencing led to a
multitude of completed sequences over the ensuing years by TIGR. Mycoplasma
genitalium, a bacterium that is associated with reproductive-tract infections and is
renowned for having the shortest genome of all free-living organisms, was sequenced by
TIGR in a period of eight months between January and August of 1995, an extraordinary
example of the efficiency of TIGR's new sequencing method. TIGR subsequently
published the first genome sequence of a representative of the Archaea, Methanococcus
jannaschii; the first genome sequence of a sulfur-metabolizing organism, Archaeoglobus
fulgidus; the genome sequence of the pathogen involved in peptic ulcer disease,
Helicobacter pylori; and the genome sequence of the Lyme disease spirochaete, Borrelia
burgdorferi.

At the close of 1997, we are halfway through the time allotted for completing the Human
Genome Project, projected to finish on September 30, 2005, approximately fifty years after
the landmark paper of Watson and Crick. Currently major groups have sequenced
approximately 50 Mb of human DNA, representing roughly 1.5% of the 3,000 Mb
genome. The estimated finish of the human genome by the year 2000 appears quite
optimistic considering that the world's large-scale sequencing capacity is approximately
100 Mb per year. To complete the genome, the average production must increase to 400
Mb per year. Several factors, including the slow rate of Sanger sequencing and the high
accuracy goal of the HGP, which allows for one error in 10,000 bases, limit the ability of
researchers to proceed more quickly. Advancements in Sanger sequencing or possible
replacements for this time-intensive process will be necessary to ensure the HGP's goal of
completion by the year 2005.

As of September of 1997, thirteen genome sequences of free-living organisms had been
completed, including the two largest, E. coli and yeast, and eleven other microbial
genomes under the length of 4.2 Mb. Four other large-scale projects are in progress,
including the sequencing of the nematode C. elegans, which is 71% completed; the fruit
fly Drosophila melanogaster, which is 6% completed; the mouse, which has less than 1%
finished; and the human, which is only 1.5% completed. These statistics are impressive
considering that only four years ago no completed sequences existed.
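The production rate quoted in the 1997 progress summary above can be checked with a few lines of arithmetic. This is a rough sketch: the eight-year window is an assumption (end of 1997 to the 2005 target), and the function name `required_rate_mb_per_year` is hypothetical.

```python
def required_rate_mb_per_year(genome_mb: float, done_mb: float, years_left: float) -> float:
    """Average yearly output needed to finish the remaining sequence in time."""
    return (genome_mb - done_mb) / years_left

genome_mb = 3000            # human genome size in Mb (from the text)
done_mb = 50                # sequenced by late 1997 (from the text)
years_left = 2005 - 1997    # assumed: about eight years to the 2005 target

rate = required_rate_mb_per_year(genome_mb, done_mb, years_left)
print(rate)                 # needed Mb per year
print(rate / 100)           # multiple of the quoted 100 Mb/year world capacity
```

The result comes out a little under 400 Mb per year, in line with the figure stated in the text, and close to four times the capacity quoted for 1997.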

The rapid proliferation of biological information in the form of genome sequences has
been the major factor in the creation of the field of bioinformatics, which focuses on the
acquisition, storage, access, analysis, modeling, and distribution of the many types of
information embedded in DNA sequences. This field will be challenged by the
heightening demands that increased information places on the algorithms currently
utilized for sequence manipulation. The growing sequence knowledge of the human genome
has been likened to the establishment of the periodic table in the 19th century. Just as past
chemists systematically organized all elements in an array that captured their differences
and similarities, the Human Genome Project will allow modern scientists to construct a
biological periodic table relating units of nucleotides. The periodic table will not contain
100 elements, but tens of thousands of genes, reflecting not their similarity in electronic
configuration but their evolutionary and functional relationship. Bioinformatics will be the
tool of the modern scientist in interpreting this periodic table of biological information.

How to sequence a genome:


The essential steps in sequencing a genome are presented below.

1. Mapping:

To begin the project, researchers built maps of the human genome. They identified
thousands of DNA sequence landmarks that helped them navigate across the
chromosomes.

Developing genome maps was necessary preparation for DNA sequencing. These same
maps also served to orient geneticists who were hunting for disease genes.

With enough landmarks in place, project scientists created "libraries" of clones that
spanned the genome. Each clone contained a manageably small fragment of human DNA
that was stored in bacteria. Scientists used the landmarks to tell them what part of the
human genome each fragment came from.

This clone-by-clone approach made it possible to double check the location of each DNA
sequence. It also allowed participating laboratories from around the world to carve up the
genome and coordinate their work.

2. Building Libraries:

Clone libraries offered the same advantage as real libraries: orderly access to information.
In most clone libraries, the DNA fragments were stored in E. coli. These are bacteria that
normally live in our intestines. Each E. coli cell stored a single segment of human DNA
and represented a single book of the library. Clone libraries allowed each human
fragment to be tracked and easily copied.

3. Subclones:

The clone libraries were prepared using bacterial artificial chromosomes, or BACs. Each
BAC clone contained 100,000 to 200,000 bases of DNA sequence. The large BAC clones
were used to establish the order of the DNA sequences. To sequence the DNA, smaller-
sized clones were needed. Project scientists cut the large BAC clones into smaller
fragments of about 2,000 bases. These smaller fragments were typically stored in viruses
called phage that can infect E. coli cells.

4. E. coli to Store and Copy DNA:

E. coli cells containing fragments of human DNA, or any other type of DNA, can be
stored in freezers indefinitely. When scientists need to retrieve DNA from the library,
they simply revive the cells by bringing them back up to 37 degrees Centigrade - gut
temperature.

The E. coli cells act as copiers, producing many copies of the human DNA sequence that
they contain. To prepare to sequence DNA, a clone of cells containing the same bit of
human DNA is released into a rich, warm broth. The cells are shaken vigorously to
provide them with air. This causes them to divide rapidly - about once every half hour.
After incubating for just a single night, one third of a teaspoon of broth contains billions
of E. coli cells and so, billions of copies of the particular fragment of human DNA they
contained.
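The "billions of copies" figure follows from simple doubling. The sketch below assumes a 16-hour overnight incubation starting from a single cell and ignores the fact that a real culture stops doubling once nutrients run low; the function name `cells_after` is hypothetical.

```python
def cells_after(minutes: int, doubling_minutes: int = 30, start_cells: int = 1) -> int:
    """Cell count after unrestricted doubling every doubling_minutes."""
    generations = minutes // doubling_minutes
    return start_cells * 2 ** generations

overnight_minutes = 16 * 60   # assumed length of "a single night"
print(cells_after(overnight_minutes))   # 2**32 cells from a single starting cell
```

Thirty-two doublings already exceed four billion cells, each carrying a copy of the same human DNA fragment.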

5. Preparing DNA for Sequencing Reactions:

The next morning, the E. coli cells are broken open to release their DNA. The human
DNA is separated from the cell debris and washed clean.

Now there are enough copies of the human DNA fragment to set up a sequencing
reaction.

6. Sequencing Reactions:

A DNA sequencing reaction includes four main ingredients: "template" DNA copied by
the E. coli; free bases, the building blocks of DNA that come in 4 types; short pieces of
DNA called "primers"; and DNA polymerase, the enzyme that copies DNA.

The chemical reaction that makes DNA in a test tube is similar to what happens in a
living cell: both rely on DNA polymerase and, in both cases, DNA strands have a head
end, which is called the 5' end, and a tail end, which is called the 3' end. A DNA strand
can grow only from its 3' end.

Making DNA in cells and sequencing DNA in test tubes both depend on complementary
base pairing. The building blocks on opposite strands of DNA pair specifically - a C
always pairs with a G, and an A always pairs with a T.
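The pairing rule can be written down directly. This is a minimal sketch; the names `COMPLEMENT`, `complement` and `reverse_complement` are illustrative, and the reversal step reflects the fact that the two strands run in opposite 5' to 3' directions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Base-by-base partner of a strand, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """The opposite strand, written 5' to 3'."""
    return complement(strand)[::-1]

template = "ATGC"                      # made-up template, written 5' to 3'
print(complement(template))            # TACG
print(reverse_complement(template))    # GCAT
```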

The primer sequence binds to its complementary sequence on the template DNA.

Free bases that match the template sequence can attach to the new strand's growing (3')
end.

Among the free bases in the solution are a few that have a fluorescent dye attached to
them. When a dye-bearing base attaches to the growing strand, it stops the new DNA
strand from growing any further. A different colored dye is attached to each of the four
kinds of bases.

7. Products of Sequencing Reactions:

A completed sequencing reaction contains an array of colored DNA fragments. The
shortest fragments correspond to the length of the primer plus one dye-colored base. The
longest fragments are usually between 500 and 800 bases long, depending on when the
sequencing reaction ran out of steam.
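The set of products described above can be enumerated for a short made-up strand. This is a sketch only: the dye colours in `DYE` are arbitrary assumptions, and `terminated_fragments` is a hypothetical name. Each fragment is represented by its length and the colour of the dye-bearing base that stopped it.

```python
# Assumed colour assignment; the text only says each base has its own colour.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def terminated_fragments(new_strand: str, primer_len: int) -> list[tuple[int, str]]:
    """All (length, dye colour) products: one for each position after the primer
    where a dye-bearing base could have been added and stopped the strand."""
    return [
        (length, DYE[new_strand[length - 1]])
        for length in range(primer_len + 1, len(new_strand) + 1)
    ]

# Made-up new strand: a 4-base primer (ACGT) followed by T, G, C, A
fragments = terminated_fragments("ACGTTGCA", primer_len=4)
print(fragments)
```

The shortest product has length 5, the primer plus one dye-colored base, matching the description in the text.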

The products of sequencing reactions are fed into an automated sequencing machine.
Automated sequencers have become increasingly sophisticated during the past decade.
They can run more samples, process them more quickly, and are easier to operate.

8. Separating the Sequencing Products:

The DNA molecules produced during the sequencing reaction are separated from each
other by a process called electrophoresis. DNA molecules are negatively charged. The
sequencing machine sets up an electric field; all the DNA moves through a porous gel
toward the positive electrode. The gel acts like a sieve; shorter DNA fragments move
more quickly through the holes of the gel than do larger DNA fragments.

9. Reading the Sequencing Products:

As each DNA fragment reaches the end of the gel, a laser excites its fluorescent dye. A
camera detects the color of the emitted light and passes that information to a computer.
One by one, the machine records the colors of the DNA fragments that pass through the
gel.

A single sequencing reaction can reveal the order of several hundred DNA bases.
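Separation and reading can be sketched together: sorting fragments by length stands in for electrophoresis (shorter fragments reach the detector first), and translating each colour back to a base stands in for the laser and camera. The colour table is an assumption and `read_sequence` is a hypothetical name.

```python
# Assumed colour-to-base table; the real assignment depends on the dye chemistry.
BASE_FOR_DYE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """fragments are (length, dye colour) pairs in any order.
    Shorter fragments arrive first, so reading colours by increasing length
    spells out the bases added after the primer."""
    in_arrival_order = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(BASE_FOR_DYE[colour] for _, colour in in_arrival_order)

# Made-up mixture of products, deliberately out of order
mixture = [(7, "blue"), (5, "red"), (8, "green"), (6, "yellow")]
print(read_sequence(mixture))   # TGCA
```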

10. Assembling the Results:

A computer program integrates the data from individual sequencing reactions. It can spot
where DNA fragments overlap and order them as they originally were on the
chromosome.

Many overlapping sequence reads are needed to generate the uninterrupted sequence of
the original stretch of DNA. During the Human Genome Project, every base pair of DNA
was sequenced an average of nine times. Some stretches of DNA were easy to read and
needed to be sequenced less often, while other stretches were more difficult to read
and had to be sequenced more often.
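Why repeated reading improves accuracy can be shown with a simple majority vote over overlapping reads. This is a toy sketch: the reads and their start positions are made up, the positions are assumed to be known already, and `consensus` is a hypothetical name rather than the method used by the project.

```python
from collections import Counter

def consensus(reads: list[tuple[int, str]]) -> tuple[str, list[int]]:
    """reads are (start position, sequence) pairs.
    Returns the majority base at each position and the read depth there."""
    length = max(start + len(seq) for start, seq in reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    bases = "".join(col.most_common(1)[0][0] if col else "N" for col in columns)
    depth = [sum(col.values()) for col in columns]
    return bases, depth

# Four made-up reads; the third contains one misread base (A instead of G)
reads = [(0, "ATGGC"), (2, "GGCTA"), (2, "GACTA"), (4, "CTAAG")]
sequence, depth = consensus(reads)
print(sequence)                   # ATGGCTAAG
print(depth)                      # [1, 1, 3, 3, 4, 3, 3, 1, 1]
print(sum(depth) / len(depth))    # average coverage
```

The misread base is outvoted because two other reads cover the same position, whereas positions covered only once have no such protection.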

During the Human Genome Project, scientists ran more than 50 million sequencing
reactions. Some 2,000 scientists from more than two dozen labs around the world worked
on the project.

11. Working Draft Sequence:

Whenever a stretch of DNA that spanned 2,000 or more bases was assembled, it was
placed into public databases within 24 hours. Anyone with access to the Internet could
see and analyze the sequence data.

After sequencing the 3 billion letters in the human genome an average of nine times, the
Human Genome Project had released DNA sequence for 99 percent of the genome. This
finished sequence was 99.99 percent accurate. The project had completed all of its goals
ahead of schedule and under budget.
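To put these percentages in absolute terms, the figures can be combined as follows. This is a back-of-the-envelope sketch that reads "99.99 percent accurate" as one error per 10,000 bases; the function name `expected_errors` is hypothetical.

```python
def expected_errors(genome_bases: int, percent_covered: int, bases_per_error: int) -> int:
    """Errors expected in the covered part of the genome at the given error rate."""
    covered = genome_bases * percent_covered // 100
    return covered // bases_per_error

genome_bases = 3_000_000_000   # three billion letters (from the text)
errors = expected_errors(genome_bases, percent_covered=99, bases_per_error=10_000)
print(errors)                  # 297000
```

Even at this accuracy, a genome of this size would be expected to contain a few hundred thousand individual base errors.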

Multiple Sequence Alignment


One of the most important contributions of biological sequences to evolutionary analysis
is the discovery that sequences of different organisms are often related. Similar genes are
conserved across widely divergent species, often performing a similar or even identical
function, and at other times mutating or rearranging to perform an altered function
through the forces of natural selection. Thus, many genes are represented in highly
conserved forms in a wide range of organisms. Through simultaneous alignment of the
sequences of these genes, the patterns of change in the sequences may be analyzed.
Because the potential for learning about the structure and function of molecules by
multiple sequence alignment (MSA) is so great, the necessary computational methods have
received a great deal of attention. In MSA, sequences are aligned optimally by bringing the
greatest number of similar characters into register in the same column of the alignment,
just as in the alignment of two sequences.
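One common way to judge how well characters have been brought "into register" is a sum-of-pairs score, computed column by column. The sketch below scores an alignment that is already given; the scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration, and finding the best alignment is a much harder search problem.

```python
from itertools import combinations

def column_score(column: tuple[str, ...], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score every pair of characters in one alignment column."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue                  # two gaps tell us nothing
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score

def sum_of_pairs(alignment: list[str]) -> int:
    """Total score of an alignment whose rows all have the same length."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have equal length")
    return sum(column_score(col) for col in zip(*alignment))

# Made-up alignment of three short sequences ("-" marks a gap)
alignment = ["ATG-C", "ATGAC", "A-GAC"]
print(sum_of_pairs(alignment))   # 3
```

Columns in which all sequences agree raise the score, while columns that need gaps or substitutions lower it, so a higher total means more characters in register.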

As with aligning a pair of sequences, the difficulty in aligning a group of sequences
varies considerably, being much greater as the degree of sequence similarity decreases. If
the amount of sequence variation is minimal, it is quite straightforward to align the
sequences, even without the assistance of a computer program. However, if the amount of
sequence variation is great, it may be very difficult to find an optimal alignment of the
sequences because so many combinations of substitutions, insertions, and deletions, each
predicting a different alignment, are possible.

Human Genome Project
The Human Genome Project (HGP) was an international scientific research project
with a primary goal to determine the sequence of chemical base pairs which make up
DNA and to identify the approximately 20,000-25,000 genes of the human genome from
both a physical and functional standpoint.

The project began in 1990, initially headed by James D. Watson at the U.S. National
Institutes of Health. A working draft of the genome was released in 2000 and a complete
one in 2003, with further analysis still being published. A parallel project was conducted
by the private company Celera Genomics. Most of the sequencing was performed in
universities and research centers in the United States, the United Kingdom, Japan,
France, Germany, and China. The mapping of human genes is an important step in the
development of medicines and other aspects of health care.

While the objective of the Human Genome Project is to understand the genetic makeup of
the human species, the project also has focused on several other nonhuman organisms
such as E. coli, the fruit fly, and the laboratory mouse. It remains one of the largest single
investigational projects in modern science.

The HGP originally aimed to map the nucleotides contained in a haploid reference human
genome (more than three billion). Several groups have announced efforts to extend this to
diploid human genomes including the International HapMap Project, Applied
Biosystems, Perlegen, Illumina, JCVI, Personal Genome Project, and Roche-454.

The "genome" of any given individual (except for identical twins and cloned animals) is
unique; mapping "the human genome" involves sequencing multiple variations of each
gene. The project did not study the entire DNA found in human cells; some
heterochromatic areas (about 8% of the total) remain unsequenced.

Project

Background

Initiation of the project was the culmination of several years of work supported by the
United States Department of Energy, in particular workshops in 1984 and 1986 and a
subsequent initiative of the US Department of Energy. A 1987 report stated boldly,
"The ultimate goal of this initiative is to understand the human genome" and "knowledge
of the human genome is as necessary to the continuing progress of medicine and other
health sciences as knowledge of human anatomy has been for the present state of
medicine." Candidate technologies were already being considered for the proposed
undertaking at least as early as 1985.

James D. Watson was head of the National Center for Human Genome Research at the
National Institutes of Health (NIH) in the United States starting from 1988. Largely due
to his disagreement with his boss, Bernadine Healy, over the issue of patenting genes,
he was forced to resign in 1992. He was replaced by Francis Collins in April 1993, and
the name of the Center was changed to the National Human Genome Research Institute
(NHGRI) in 1997.

The $3-billion project was formally founded in 1990 by the United States Department of
Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In
addition to the United States, the international consortium comprised geneticists in China,
France, Germany, Japan, and the United Kingdom.

Due to widespread international cooperation and advances in the field of genomics
(especially in sequence analysis), as well as major advances in computing technology, a
'rough draft' of the genome was finished in 2000 (announced jointly by then U.S. President
Bill Clinton and British Prime Minister Tony Blair on June 26, 2000). Ongoing
sequencing led to the announcement of the essentially complete genome in April 2003, 2
years earlier than planned. In May 2006, another milestone was passed on the way to
completion of the project, when the sequence of the last chromosome was published in
the journal Nature.

Benefits

The work on interpretation of genome data is still in its initial stages. It is anticipated that
detailed knowledge of the human genome will provide new avenues for advances in
medicine and biotechnology. Clear practical results of the project emerged even before
the work was finished. For example, a number of companies, such as Myriad Genetics,
started offering easy ways to administer genetic tests that can show predisposition to a
variety of illnesses, including breast cancer, disorders of hemostasis, cystic fibrosis, liver
diseases and many others. Also, research into the etiologies of cancers, Alzheimer's
disease and other areas of clinical interest is considered likely to benefit from genome
information, which may lead in the long term to significant advances in their management.

There are also many tangible benefits for biological scientists. For example, a researcher
investigating a certain form of cancer may have narrowed down his/her search to a
particular gene. By visiting the human genome database on the world wide web, this
researcher can examine what other scientists have written about this gene, including
(potentially) the three-dimensional structure of its product, its function(s), its
evolutionary relationships to other human genes, or to genes in mice or yeast or fruit
flies, possible detrimental mutations, interactions with other genes, body tissues in which
this gene is activated, diseases associated with this gene or other datatypes.

The Human Genome Diversity Project (HGDP), spinoff research aimed at mapping the
DNA that varies between human ethnic groups, which was rumored to have been halted,
actually did continue and to date has yielded new conclusions. In the future, HGDP could
possibly expose new data in disease surveillance, human development and anthropology.
HGDP could unlock secrets behind, and create new strategies for managing, the
vulnerability of ethnic groups to certain diseases. It could also show how human
populations have adapted to these vulnerabilities.

There are essentially two ways to sequence a genome. The BAC-to-BAC method, the first
to be employed in human genome studies, is slow but sure. The BAC-to-BAC approach
draws on procedures developed by a number of researchers during the late 1980s and
1990s, and it continues to develop and change.

The other technique, known as whole genome shotgun sequencing, brings speed into the
picture, enabling researchers to do the job in months to a year. The shotgun method was
developed by GNN president J. Craig Venter in 1996 when he was at the Institute for
Genomic Research (TIGR).

Now that the human genome sequence is nearing completion

Conclusion:
The Human Genome Project also produced other advances, not expected to be
accomplished until much later. These included an advanced draft of the mouse genome
and an initial draft of the rat genome.

Medical researchers did not wait to use data from the Human Genome Project. When the
project began in 1990, fewer than 100 human disease genes had been identified. At the
project's conclusion in 2003, the number of identified disease genes had risen to more
than 1,400.

The Human Genome Project focused on the DNA sequence of an individual. The next
step was to analyze DNA sequences from different populations. This catalog of human
genetic variation was called the HapMap. Completed in 2005, the HapMap used single
nucleotide polymorphisms called SNPs to identify large blocks of DNA sequence called
haplotypes that tend to be inherited together. To use the data, researchers compare
haplotypes between people with and without a disease. Haplotypes shared by people with
the disease are then examined in detail to look for associated genes. Already, scientists
have used its data to identify a gene associated with age-related macular degeneration, a
disease responsible for blindness among the elderly. It is expected that the HapMap will
play an important role in identifying many more disease genes in the future.
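The comparison of haplotypes between people with and without a disease can be sketched as a frequency count. This is a toy illustration with made-up haplotypes: the threshold `min_diff` and the function names are assumptions, and real association studies rely on proper statistical tests and far larger samples.

```python
from collections import Counter

def haplotype_frequencies(haplotypes: list[str]) -> dict[str, float]:
    """Fraction of the group carrying each haplotype."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: count / total for hap, count in counts.items()}

def enriched_in_cases(cases: list[str], controls: list[str], min_diff: float = 0.2) -> list[str]:
    """Haplotypes whose frequency among cases exceeds that among controls by min_diff."""
    case_freq = haplotype_frequencies(cases)
    control_freq = haplotype_frequencies(controls)
    return sorted(
        hap for hap, freq in case_freq.items()
        if freq - control_freq.get(hap, 0.0) >= min_diff
    )

# Made-up haplotypes, each written as the alleles at three SNPs
cases = ["ACG", "ACG", "ACG", "TCG"]
controls = ["ACG", "TCG", "TCG", "TTA"]
print(enriched_in_cases(cases, controls))   # ['ACG']
```

A haplotype flagged in this way marks a region of DNA that would then be examined in detail for associated genes.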

