CONTENTS
1) Automated Biological Analysers.
1.1) Routine biochemistry analysers.
1.2) Immuno-based analysers.
1.3) Hematology analysers.
2) Biological Databases.
2.1) Nucleic Acids Research Database
2.2) Species-specific databases
2.3) Applications
2.3.1) Mass Spectrometry - MALDI TOF
2.3.2) Protein Fingerprinting
2.3.3) Proteomics
3) The Human Genome Project
3.1) Sequencing Applications
3.2) Computational Analysis
3.3) Computational Engine for Genomic Sequences
3.4) Data mining and information retrieval.
3.5) Data warehousing.
3.6) Visualization for data and collaboration.
4) Bibliography
AUTOMATED BIOLOGICAL ANALYSERS
Routine biochemistry analysers
The first applications were for clinical (medical) analysis. The AutoAnalyzer
profoundly changed the character of the chemical testing laboratory by
allowing significant increases in the numbers of samples that could be
processed. The design, based on separating a continuously flowing stream with
air bubbles, largely replaced slow, clumsy, and error-prone manual methods of
analysis.
The types of tests required include enzyme levels (such as many of the liver
function tests), ion levels (e.g. sodium and potassium), and other tell-tale
chemicals (such as glucose, serum albumin, or creatinine).
Simple ions are often measured with ion-selective electrodes, which let one
type of ion through and measure voltage differences. Enzymes may be measured
by the rate at which they change one coloured substance into another; in these
tests, the results for enzymes are given as an activity, not as a concentration
of the enzyme.
Other tests use colorimetric changes to determine the concentration of the
chemical in question. Turbidity may also be measured.
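As an illustrative sketch (not part of the original text), colorimetric assays
are commonly related to concentration through the Beer-Lambert law, A = ε·l·c.
The absorptivity and absorbance values below are hypothetical examples.

```python
# Hypothetical sketch: estimating an analyte concentration from a colorimetric
# reading via the Beer-Lambert law (A = epsilon * path_length * concentration).
# The molar absorptivity and absorbance values are invented for illustration.

def concentration_from_absorbance(absorbance, molar_absorptivity, path_length_cm=1.0):
    """Return concentration in mol/L given absorbance (unitless), molar
    absorptivity in L/(mol*cm), and cuvette path length in cm."""
    return absorbance / (molar_absorptivity * path_length_cm)

if __name__ == "__main__":
    a = 0.42                                            # measured absorbance (made up)
    c = concentration_from_absorbance(a, molar_absorptivity=6220.0)
    print(f"Estimated concentration: {c:.2e} mol/L")
```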
Immuno-based analysers
Some analysers use antibodies to detect many substances by immunoassay and
other methods that rely on antibody-antigen reactions.
When the concentration of these compounds is too low to cause a measurable
increase in turbidity when bound to antibody, more specialised methods must
be used. Recent developments include automation for the
immunohaematology lab, also known as transfusion medicine.
Hematology analysers
These are used to perform complete blood counts, erythrocyte sedimentation
rates (ESRs), or coagulation tests.
Cell counters
Automated cell counters sample the blood, and quantify, classify, and describe
cell populations using both electrical and optical techniques. Electrical
analysis involves passing a dilute solution of the blood through an aperture
across which an electrical current is flowing. The passage of cells through
the current changes the impedance between the terminals (the Coulter
principle). A lytic reagent is added to the blood solution to selectively lyse
the red cells (RBCs), leaving only white cells (WBCs) and platelets intact.
The solution is then passed through a second detector, which allows the counts
of RBCs, WBCs, and platelets to be obtained. The platelet count is easily
separated from the WBC count by the smaller impedance spikes platelets produce
in the detector due to their lower cell volumes.
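A minimal sketch of this idea, using invented pulse amplitudes and thresholds
(real analysers use calibrated volume gates and far more sophisticated logic),
showing how impedance spikes might be split into platelet and WBC events:

```python
# Hypothetical sketch of impedance-pulse classification in a Coulter-style counter.
# Pulse heights (arbitrary units) and thresholds are made up for illustration.

PLATELET_MAX = 30      # pulses below this amplitude are counted as platelets
WBC_MIN = 60           # pulses above this amplitude are counted as white cells

def classify_pulses(pulse_heights):
    """Split a list of impedance spike amplitudes into platelet and WBC counts."""
    platelets = sum(1 for h in pulse_heights if h < PLATELET_MAX)
    wbcs = sum(1 for h in pulse_heights if h >= WBC_MIN)
    return platelets, wbcs

if __name__ == "__main__":
    pulses = [12, 18, 75, 9, 140, 22, 88, 15]   # made-up detector readings
    plt_count, wbc_count = classify_pulses(pulses)
    print(f"platelets: {plt_count}, WBCs: {wbc_count}")
```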
Computer-Controlled Electron Microscopy
Computer-controlled scanning electron microscopy has been introduced as a
faster, more reliable, and cost-reducing alternative to conventional electron
microprobe analyses of biological samples. Many universities link their
electron microscopes to computer networks, allowing their doctoral candidates
to view images in real time from any point on the network.
BIOLOGICAL DATABASES
Biological databases are libraries of life sciences information, collected from
scientific experiments, published literature, high-throughput experimental
technology, and computational analysis.[2] They contain information from
research areas including genomics, proteomics, metabolomics, microarray
gene expression, and phylogenetics. Information contained in biological
databases includes gene function, structure, localization (both cellular and
chromosomal), clinical effects of mutations as well as similarities of biological
sequences and structures.
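As an illustrative aside not described in the text, records in public sequence
databases can also be retrieved programmatically. The sketch below assumes the
Biopython package is installed and network access to NCBI's Entrez service;
the accession number and email address are placeholders.

```python
# Sketch of programmatic retrieval from a public sequence database (NCBI Entrez
# via Biopython). The accession and email are example placeholders.
from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.org"        # NCBI asks for a contact address

handle = Entrez.efetch(db="nucleotide", id="NM_000518",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print("sequence length:", len(record.seq))
```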
Species-specific databases
Species-specific databases are available for some species, mainly those that
are often used in research. For example, coliBASE is an E. coli database.
Other popular species-specific databases include Mouse Genome Informatics for
the laboratory mouse, Mus musculus; the Rat Genome Database for Rattus; ZFIN
for Danio rerio (zebrafish); FlyBase for Drosophila; WormBase for the
nematodes Caenorhabditis elegans and Caenorhabditis briggsae; and Xenbase for
the Xenopus tropicalis and Xenopus laevis frogs.
THE HUMAN GENOME PROJECT
Just a few short years ago, most of us knew very little about our genes and
their impact on our lives. But more recently, it has been virtually impossible
to escape the popular media's attention to a number of breathtaking
discoveries of human genes, especially those related to diseases such as
cystic fibrosis, Huntington's chorea, and breast cancer. What has brought this
about is the Human Genome Project, an international effort started in 1988 and
sponsored in the United States by the Department of Energy (DOE) and the
National Institutes of Health (NIH). The goal of this project is to elucidate
the information that makes up the genetic blueprint of human beings.
COMPUTING THE GENOME
Despite this project's impact, the pace of gene discovery has actually been
rather slow. The initial phase of the project, called mapping, has been
primarily devoted to fragmenting chromosomes into manageable ordered
pieces for later high-throughput sequencing. In this period, it has often taken
years to locate and characterize individual genes related to human disease.
Thus, biologists began to appreciate the value of computing to mapping and
sequencing.
A good illustration of the emerging impact of computing on genomics is the
search for the gene for adrenoleukodystrophy (the disease featured in the
movie Lorenzo's Oil). A team of researchers in Europe spent about two years
searching for the gene using standard experimental methods. Then they
managed to sequence the region of the chromosome containing the gene.
Finally, they sent information on the sequence to the ORNL server containing
the ORNL-developed computer program called Gene Recognition and
Analysis Internet Link (GRAIL). Within a couple of minutes, GRAIL returned
the location of the gene within the sequence.
The Human Genome Project has entered a new phase. Six NIH genome
centers were funded recently to begin high-throughput sequencing, and plans
are under way for large-scale sequencing efforts at DOE genome centers at
Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore
National Laboratory (LLNL), and Los Alamos National Laboratory (LANL);
these centers have been integrated to form the Joint Genome Institute. As a
result, researchers are now focusing on the challenge of processing and
understanding much larger domains of the DNA sequence. It has been
estimated that, on average from 1997 to 2003, new sequence of approximately
2 million DNA bases will be produced every day. Each day's sequence will
represent approximately 70 new genes and their respective proteins. This
information will be made available immediately on the Internet and in central
genome databases.
Computational Analysis
Computers can be used very effectively to indicate the location of genes and
of regions that control the expression of genes and to discover relationships
between each new sequence and other known sequences from many different
organisms. This process is referred to as sequence annotation. Annotation
(the elucidation and description of biologically relevant features in the
sequence) is the essential prerequisite before the genome sequence data can
become useful, and the quality with which annotation is done will directly
affect the value of the sequence. In addition to considerable organizational
issues, significant computational challenges must be addressed if the DNA
sequences being produced are to be successfully annotated. It is clear that new
computational methods and a workable process must be implemented for
effective and timely analysis and management of these data.
In considering computing related to the large-scale sequence analysis and
annotation process, it is useful to examine previously developed models.
Procedures for high-throughput analysis have been most notably applied to
several microorganisms (e.g., Haemophilus influenzae and Mycoplasma
genitalium) using relatively simple methods designed to facilitate basically a
single pass through the data (a pipeline that produces a one-time result or
report).
However, this is too simple a model for analyzing genomes as complex as the
human genome. For one thing, the analysis of genomic sequence regions
needs to be updated continually throughout the course of the Genome Project;
the analysis is never really done. On any given day, new information relevant
to a sequenced gene may show up in any one of many databases, and new links
to this information need to be discovered and presented. Additionally, our
capabilities for analyzing the sequence will change with time.
The analysis of DNA sequences by computer is a relatively immature science,
and we in the informatics community will be able to recognize many features
(like gene regulatory regions) better in a year than we can now. There will be
a significant advantage in reanalyzing sequences and updating our knowledge
of them continually as new sequences appear from many organisms, methods
improve, and databases with relevant information grow. In this model,
sequence annotation is a living thing that will develop richness and improve
in quality over the years. The single pass-through pipeline is simply not the
appropriate model for human genome analysis, because the rate at which new
and relevant information appears is staggering.
Computational Engine for Genomic Sequences
Researchers at ORNL, LBNL, Argonne National Laboratory (ANL), and several
other genome laboratories are teaming to design and implement a new kind
of computational engine for analysis of large-scale genomic sequences. This
sequence analysis engine, which has become a Computational Grand
Challenge problem, will integrate a suite of tools on high-performance
computing resources and manage the analysis results.
In addition to the need for state-of-the-art computers at several
supercomputing centers, this analysis system will require dynamic and
seamless management of contiguous distributed high-performance
computing processes, efficient parallel implementations of a number of new
algorithms, complex distributed data mining operations, and the application
of new inferencing and visualization methods. A process of analysis that will
be started in this engine will not be completed for seven to ten years.
The data flow in this analysis engine works as follows. Updates of sequence
data will be retrieved through the use of Internet retrieval agents and stored
in a local data warehouse. Most human genome centers will post new sequences
daily on publicly available Internet or World Wide Web sites,
and they will establish agreed-upon policies for Internet capture of their data.
These data will feed the analysis engine that will return results to the
warehouse for use in later or long-term analysis processes, visualization by
researchers, and distribution to community databases and genome
sequencing centers. Unlike the pipeline analysis model, the warehouse
maintains the sequence data, analysis results, and data links so that continual
update processes can be made to operate on the data over many years.
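A minimal, entirely hypothetical sketch of the recurring retrieve-analyse-store
loop this warehouse model implies; the URLs, function names, and storage
details below are invented stand-ins, not the actual engine's design.

```python
# Hypothetical sketch of the continual-update loop behind the warehouse model:
# fetch newly posted sequences, run analyses, and keep results for later
# re-analysis. All names, URLs, and storage details are invented placeholders.

import sqlite3

SOURCES = ["https://example.org/center-a/new_sequences.fasta"]   # placeholder URLs

def fetch_new_sequences(url):
    """Stand-in for an Internet retrieval agent; a real one would download and parse FASTA."""
    return []   # pretend nothing new was posted today

def analyse(sequence):
    """Stand-in for the analysis engine (gene finding, database comparison, etc.)."""
    return {"length": len(sequence)}

def update_warehouse(db_path="warehouse.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (seq TEXT, report TEXT)")
    for url in SOURCES:
        for seq in fetch_new_sequences(url):
            conn.execute("INSERT INTO results VALUES (?, ?)", (seq, str(analyse(seq))))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    update_warehouse()   # one cycle; the real process repeats daily for years
```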
GRAIL evaluates candidate regions using a neural network [Fig. 4(a)] and is
able to determine regions of the sequence that contain genes [Fig. 4(b)], even
genes it has never seen before, based on its training from known gene examples.
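As a rough, hypothetical sketch of the idea only (these are not GRAIL's actual
features, architecture, or weights), a gene-region recognizer can score
fixed-length windows of sequence with a small network; real systems train the
weights on many known gene examples.

```python
# Hypothetical sketch of neural-network scoring of sequence windows, loosely in
# the spirit of GRAIL. Features, network size, and weights are placeholders.
import math
import random

random.seed(0)

def window_features(window):
    """Toy features for a DNA window: GC fraction and density of 'ATG' triplets."""
    gc = sum(1 for b in window if b in "GC") / len(window)
    atg = sum(1 for i in range(len(window) - 2) if window[i:i+3] == "ATG") / len(window)
    return [gc, atg]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Placeholder weights standing in for a trained single-hidden-layer network.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
W2 = [random.uniform(-1, 1) for _ in range(3)]

def coding_score(window):
    x = window_features(window)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)))

if __name__ == "__main__":
    seq = "ATGGCGCGCATCGGCTAAGGATATATATATACGCGCATGCCC" * 10
    for start in range(0, len(seq) - 60, 60):
        print(start, round(coding_score(seq[start:start + 60]), 3))
```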
High-speed sequence comparison represents another important class of
algorithms used to compare one DNA or protein sequence with another in a
way that extracts how and where the two sequences are similar. Many
organisms share many of the same basic genes and proteins, and information
about a gene or protein in one organism provides insight into the function of
its relatives or homologs in other organisms.
To get a sense of the scale, examination of the relationship between 2
megabases of sequence (one day's finished sequence) and a single database of
known gene sequence fragments (called ESTs) requires the calculation of 10^15
DNA base comparisons. And there are quite a number of databases to consider.
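A minimal sketch of what such a comparison computes, using the textbook
Smith-Waterman local-alignment score; this is illustrative only, since
production tools (e.g. BLAST) use heuristics to avoid the full pairwise cost
at genome scale, and the scoring values here are simple example choices.

```python
# Minimal Smith-Waterman local alignment score between two DNA sequences.
# Scoring parameters are illustrative; real tools use tuned scores and heuristics.

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(0, diag, score[i-1][j] + gap, score[i][j-1] + gap)
            best = max(best, score[i][j])
    return best

if __name__ == "__main__":
    print(local_alignment_score("ACGTTGCAAGT", "CGTTGCA"))   # strong local match
```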
The approach outlined is flexible enough to use a variety of hardware and data
resources; configure analysis steps, triggers, conditions, and updates; and
provide the means to maintain and update the description of each genomic
region. Users can specify recurrent analysis and data mining operations that
continue for years. The combined computational process will provide new
knowledge about the genome on a scale that is impossible for individual
researchers using current methods. Such a process is absolutely necessary to
keep up with the flood of data that will be gathered over the remainder of the
Human Genome Project.
BIBLIOGRAPHY
1) Hyman, E. D. A new method of sequencing DNA. Analytical
Biochemistry 174, 423-436 (1988)
2) Metzker, M. L. Emerging technologies in DNA sequencing. Genome
Research 15, 1767-1776 (2005)
3) Oak Ridge National Laboratory
http://web.ornl.gov/info/ornlreview/v30n3-4/genome.htm
4) Wikipedia
https://en.wikipedia.org
5) Google
https://google.co.in