
Introduction to Bioinformatics

Collection Editor:
Ewa Paszek
Authors:
Ewa Paszek
Lukasz Wita

Online:
< http://cnx.org/content/col10240/1.3/ >

CONNEXIONS

Rice University, Houston, Texas


This selection and arrangement of content as a collection is copyrighted by Ewa Paszek. It is licensed under the
Creative Commons Attribution 1.0 license (http://creativecommons.org/licenses/by/1.0).
Collection structure revised: October 9, 2007
PDF generated: March 18, 2010
For copyright and attribution information for the modules contained in this collection, see p. 45.
Table of Contents
1 Biological Background
1.1 Dogma of Molecular Biology
1.2 Gene Regulatory Networks
1.3 Microarrays
1.4 Gene Networks
2 Microarray
2.1 Normalisation
2.2 Data Analysis
2.3 Differential Expressions
2.4 Classification (Clustering)
2.5 Networks
3 Publications/References
3.1 Glossary
Glossary
Bibliography
Index
Attributions
Chapter 1

Biological Background

1.1 Dogma of Molecular Biology


1.1.1 Dogma of Molecular Biology

1.1.1.1 1. Introduction
Thousands of genes are being discovered for the first time by sequencing the genomes of model organisms,
a reminder that much of the natural world remains to be explored at the molecular level. DNA microarrays
provide a natural vehicle for this exploration. The model organisms are the first for which comprehensive
genome-wide surveys of gene expression patterns or function are possible. The results should be viewed
as maps that reflect the order and logic of the genetic program, rather than the physical order of genes
on chromosomes. Exploration of the genome using DNA microarrays and other genome-scale technologies
should narrow the gap in our knowledge of gene function and molecular biology.

1.1.1.2 2. Dogma of Molecular Biology


Deoxyribonucleic acid (DNA) is the elementary template carrying the essential genetic code of every living
organism. In bacteria and other simple-celled organisms, DNA is distributed more or less throughout the cell.
In the complex cells that make up plants, animals and other multicellular organisms, most of the DNA is
found in the chromosomes, which are located in the cell nucleus. The energy-generating organelles known as
chloroplasts and mitochondria also carry DNA, as do many viruses. DNA consists of a pair of strands that
entwine like vines to form a double helix. Each strand is composed of four nucleotide subunits, distinguished
by their bases: adenine (A), thymine (T), cytosine (C) and guanine (G). Each base readily forms hydrogen
bonds with only one other: A with T and C with G. The entire nucleotide sequence of each strand is therefore
complementary to that of the other and, when the strands are separated, each may act as a template with
which to replicate the other. The information contained in DNA allows for the development and control of the
processes taking place in a living organism over its lifetime, not only at the cellular level but also at the
level of the whole system. The general structure of DNA is depicted in Figure 1.1.
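The base-pairing rule described above (A with T, C with G) can be illustrated with a minimal Python sketch. The function names and the example sequence are illustrative assumptions, not part of the original text.

```python
# Base pairing as stated in the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand)

def reverse_complement(strand):
    """Return the complementary strand read in the opposite direction,
    which is how the partner strand of a double helix is usually written."""
    return complement(strand)[::-1]

# Hypothetical example sequence, chosen only for illustration.
strand = "ATGCCGTA"
partner = complement(strand)
# Using the partner as a template regenerates the original strand.
copy = complement(partner)
```

Because each strand determines the other, taking the complement twice returns the original sequence, which is the sense in which either strand can serve as a template for replication.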

1 This content is available online at <http://cnx.org/content/m12382/1.5/>.


Figure 1.1: The DNA structure.

1.1.1.2.1
In order to read the information contained in DNA, its functional units, the genes, are first transcribed
into messenger ribonucleic acid (mRNA), whose sequence is complementary to the DNA strand used as a
template. mRNA molecules serve as templates for protein synthesis; they are transported to the cytoplasm
and repeatedly read by the ribosomes. Before the mRNA is ready to be translated, it undergoes several
processing steps, including splicing, in which the pre-mRNA is modified to remove certain stretches of
non-coding sequence called introns. The stretches that remain include the protein-coding sequences and are
called exons. Finally, consecutive triplets of nucleotide bases in the mRNA sequence are translated into the
corresponding amino acids, which are linked together to form protein chains. Proteins are required for the
structure, function, and regulation of cells, tissues and organs. Each protein has its own unique function.
The process of reading the content of a gene is depicted in Figure 1.2.
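The three steps just described (transcription, splicing, translation) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the gene is given as its coding strand, intron positions are supplied by hand, translation starts at the first base, and the toy gene is invented for the example. Only the codon table is the standard genetic code.

```python
# Standard genetic code in compact form: codons are enumerated with the
# bases of each position in the order U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(coding_strand):
    """Pre-mRNA for a gene given its coding (non-template) DNA strand:
    the same sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.replace("T", "U")

def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs,
    and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)

def translate(mrna):
    """Read consecutive triplets from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical toy gene: exon, intron (positions 6 to 11), exon.
gene = "ATGGCC" + "GTAAGT" + "TGGTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, [(6, 12)])
protein = translate(mrna)
```

For the toy gene, the spliced mRNA is AUGGCCUGGUAA and the resulting chain is M-A-W (methionine, alanine, tryptophan), with UAA acting as the stop signal.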
In order to understand the role and function of genes, one needs complete information about their mRNA
transcripts and proteins. Unfortunately, exploring protein function is very difficult because of the
complicated three-dimensional structure unique to each protein and a shortage of efficient technologies. To
overcome this difficulty, one may concentrate on the mRNA molecules produced by the genes of interest (gene
expression) and use this information to investigate the functional roles of the genes. This idea motivated
the development of the microarray technique, a method for studying the interactions among thousands of genes
on the basis of their mRNA transcript levels.

1.1.1.2.2

The Central Dogma of Molecular Biology.

Figure 1.2: Block diagram representation



Figure 1.3: Cellular representation

1.1.1.2.3
note: http://128.32.135.2/~sandrine/presentations.html

1.2 Gene Regulatory Networks


1.2.1 Gene Networks

1.2.1.1 Gene Networks.


A gene regulatory network (also called a GRN or genetic regulatory network) is a collection of DNA
segments in a cell which interact with each other and with other substances in the cell, thereby governing
the rates at which genes are transcribed into mRNA. Genes can be viewed as nodes in such a network, with
the inputs being proteins such as transcription factors and the outputs being the levels of gene expression.
The node itself can also be viewed as a function obtained by combining basic functions of the inputs (in a
Boolean network these are Boolean functions, or gates, built from the basic AND, OR and NOT gates familiar
from electronics). These functions have been interpreted as performing a kind of information processing
within the cell, which determines cellular behaviour. The basic drivers within cells are the levels of
certain proteins, which determine both the spatial (tissue-related) and temporal (developmental stage)
co-ordinates of the cell, as a kind of "cellular memory". Gene networks are only beginning to be understood,
and a next step for biology is to attempt to deduce the function of each gene "node", to assist in modeling
the behaviour of a cell. Mathematical models of GRNs have been developed to allow predictions of the models
to be tested. Various modeling techniques have been used, including Boolean networks, Petri nets, Bayesian
networks, and sets of differential equations. Conversely, techniques have been proposed for generating
models of GRNs that best explain a set of time series observations.
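To make the idea of a gene as a Boolean function of its inputs concrete, the following Python sketch simulates a tiny Boolean network. The three genes X, Y and Z and their update rules are hypothetical, chosen only so that each of the NOT, OR and AND gates appears once; all genes are updated synchronously, which is one common modeling assumption among several.

```python
# Hypothetical three-gene network. Each rule maps the current state of
# the network (gene name -> on/off) to the next state of one gene.
RULES = {
    "X": lambda s: not s["Z"],         # NOT gate: Z represses X
    "Y": lambda s: s["X"] or s["Z"],   # OR gate: X or Z activates Y
    "Z": lambda s: s["X"] and s["Y"],  # AND gate: X and Y together activate Z
}

def step(state, rules=RULES):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}

def trajectory(state, steps, rules=RULES):
    """Return the list of states visited, starting with the initial state."""
    states = [state]
    for _ in range(steps):
        state = step(state, rules)
        states.append(state)
    return states

start = {"X": True, "Y": False, "Z": False}
path = trajectory(start, 5)
```

Starting with only X switched on, this particular network passes through five distinct states and returns to the starting state, so the model predicts a repeating expression pattern. Comparing such predicted trajectories with time series measurements is one way of testing a proposed set of rules.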

1.2.1.2
One gene can affect the expression of another gene when its gene product binds to the promoter region of
the other gene. When more than two genes are considered, the set of regulatory interactions between them is
referred to as the regulatory network. Given a large number of measurements of the expression levels of a
number of genes, we should be able to model, or reverse engineer, the regulatory network that controls
their expression levels. The problem can be attacked in two fundamentally different ways: using time-series
data, or using steady-state data from gene knockouts.
GRNs act as analog biochemical computers to specify the identity and level of expression of groups of
target genes. Central to this computation are DNA recognition sequences with which transcription factors
associate. When active transcription factors associate with the promoter region of target genes, they can
function to specifically repress (down-regulate) or induce (up-regulate) synthesis of the corresponding RNA.
The immediate molecular output of a gene regulatory network is the constellation of RNAs and proteins
encoded by network target genes. The resulting cellular outputs are changes in the structure, metabolic
capacity, or behavior of the cell mediated by new expression of up-regulated proteins and elimination of
down-regulated proteins.
GRNs are remarkably diverse in their structure, but several basic properties are illustrated in
Figure 1.4. In this example, two different signals converge on a single target gene, where
the cis-regulatory elements provide for an integrated output in response to the two inputs. Signal molecule A
triggers the conversion of inactive transcription factor A (green oval) into an active form that binds directly
to the target gene's cis-regulatory sequence. The process for signal B is more complex. Signal B triggers
the separation of inactive B (red oval) from an inhibitory factor (yellow rectangle). B is then free to form
an active complex that binds to the active A transcription factor on the cis-regulatory sequence. The net
output is expression of the target gene at a level determined by the action of factors A and B. In this way,
cis-regulatory DNA sequences, together with the proteins that assemble on them, integrate information from
multiple signaling inputs to produce an appropriately regulated readout. A more realistic network might
contain multiple target genes regulated by signal A alone, others by signal B alone, and still others by the
pair of A and B. Co-regulated target genes often code for proteins that act together to build a specific cell
structure or to effect a concerted change in cell function. For example, genes encoding components of the
multiprotein proteasome machine (see The Machines of Life) are co-regulated at the RNA level. This was
shown by microarray gene chip analyses in yeast cells, and each gene was found to possess a similar
cis-regulatory DNA sequence that mediates binding of a particular transcription factor. Similarly, a bacterium
may respond to a shortage of its preferred energy source by activating expression of genes whose protein
products function in a biochemical pathway that allows it to use a different, more abundant source of energy.

3 This content is available online at <http://cnx.org/content/m12383/1.4/>.
4 http://www.doegenomestolife.org/science/generegulatorynetwork.shtml

Figure 1.4: The gene regulatory network.

note: Boolean Networks (Section 1.4.1.2: Boolean Networks)

note: Probabilistic Boolean Networks (Section 1.4.2.1: Probabilistic Boolean Networks)

note: Bayesian Networks (Section 1.4.2.2: Bayesian Networks)



1.3 Microarrays
1.3.1 cDNA Arrays

1.3.1.1 cDNA-Basic Concept


1.3.1.1.1 cDNA-Basic Concept
Recently, several types of DNA microarrays have been introduced. Applications of microarrays range
from the study of gene expression in yeast under different environmental stress conditions (Lashkari et al.,
1997) to the comparison of gene expression profiles for tumors from cancer patients (Golub et al.,
1999). The first approach is to use the chemically synthesized form of DNA called complementary
DNA (cDNA), which contains only the coding part of the sequence and is complementary to its corresponding
mRNA transcript. Microarrays take the form of microscope slides containing hundreds to thousands of
immobilized DNA samples that are hybridized in a manner very similar to the Northern blot (Alwine et al.,
1977) and the Southern blot (Southern, 1975). The main function of a microarray is to detect the level of
mRNA transcripts of the genes of interest. The plates are incubated in a solution containing the genetic
material under consideration. The transcripts floating in the solution hybridize to their complementary
cDNA, previously placed on the microarray chip. Because the hybridized material is fluorescently labeled,
each spot emits light when illuminated, with an intensity that depends on the amount of hybridized mRNA
(Schena et al., 1995). Labeling the samples with different fluorescent dyes allows the comparison of gene
expression under different experimental conditions (case-control studies). The preparation of a microarray
for a case-control study is depicted schematically in Figure 1.5. Initial data obtained from DNA microarrays
are in the form of scanned images. Coding the gene expression by means of colors can be helpful for building
genetic maps and for graphical data processing. A gene expression map is presented in the form of a table
whose rows correspond to consecutive genes and whose columns represent different samples, for example
multiple experimental conditions or different patients.
More information is available at Bioconductor (follow the link to training).
5 This content is available online at <http://cnx.org/content/m12384/1.5/>.
6 http://www.wp.pl
7 http://www.bioconductor.org/

The spotted array technology.

Figure 1.5: Overview of Procedures for Preparing and Analyzing Microarrays of Complementary DNA
(cDNA). As shown in Panel A, reference RNA and tumor RNA are labeled by reverse transcription with
different fluorescent dyes (green for the reference cells and red for the tumor cells) and hybridized to
a cDNA microarray containing robotically printed cDNA clones. As shown in Panel B, the slides are
scanned with a confocal laser-scanning microscope, and color images are generated for each hybridization
with RNA from the tumor and reference cells. Genes up-regulated in the tumors appear red, whereas
those with decreased expression appear green. Genes with similar levels of expression in the two samples
appear yellow. Genes of interest are selected on the basis of the differences in the level of expression
by known tumor classes (e.g., BRCA1-mutation-positive and BRCA2-mutation-positive). Statistical
analysis determines whether these differences in the gene-expression profiles are greater than would be
expected by chance. As shown in Panel C, the differences in the patterns of gene expression between
tumor classes can be portrayed in the form of a color-coded plot, and the relations between tumors can
be portrayed in the form of a multidimensional-scaling plot. Tumors with similar gene-expression profiles
cluster close to one another in the multidimensional-scaling plot.

note: cDNA arrays - detailed information. (Section 1.3.1.2.1: Detailed information on the cDNA technology)

note: Oligonucleotide arrays. (Section 1.3.2.1.1: Oligonucleotide arrays.)

1.3.1.2 cDNA-Detailed Information


1.3.1.2.1 Detailed information on the cDNA technology
To prepare microarrays, glass or nylon microplates are used, onto which thousands of single-stranded
pieces of DNA, each tens of nucleotides long, are placed (Cheung et al., 1999). Each spot on the plate
corresponds to a particular gene. Special computer-controlled three-axis robots generate high-density,
gridded arrays of cDNA. Figure 1.6 a, b and c present an example of a workstation for producing
microarrays. Figure 1.6 d depicts a scanner used for reading a microarray with genetic material
introduced during the course of an experiment.

8 This content is available online at <http://cnx.org/content/m12385/1.5/>.
9 http://array.ucsd.edu/pubs/99112701.pdf

The microarray robot.

Figure 1.6: a, The Pennsylvania University's microarray robot (Cheung et al., 1999). The X-, Y- and
Z-axes are labeled 1, 2, and 3, respectively. The key component of the arrayer is the print-head, containing
pens (4). Microscope glass slides are placed on the slide station (5). Samples are prepared and arrayed
from 96-well sample plates (6). The pins are cleaned between sample acquisitions at the washing (7) and
drying (8) stations. b, AECOM microarray robot. The table configuration shown contains 160 slides with
four microtitre plates, two wash stations and the dryer. The print-head (c) shows four of the possible
twelve pen tips in use. d, AECOM laser scanner. Visible are the optical table, power supplies for lasers
and PMT cooling, the Ludl stage, and lasers. The 20× microscope objective is inside the Ludl stage, while
lenses, mirrors and other optics are enclosed in the metal casing. PMTs are to the right and outside the
photo.

1.3.1.2.1.1
In a single reaction, two different probes can be labeled with different colors and simultaneously incubated
with a microarray. Robots (arrayers) are required to place (or array) a large number of probes onto slides.
The AECOM arrayer generates high-density, gridded arrays of cDNA, genomic DNA or similar biological
material on glass surfaces. Its principal components are a computer-controlled three-axis robot and a unique
pen tip assembly. The wash stations are stationary basins containing distilled water that is replaced after
every two microtitre plates. When the pen tips are immersed, the robot shakes the pen assembly back and
forth to enhance cleaning. A computer-controlled water bath sonicator and/or flowing water bath could be
substituted. The dryer is essentially a computer-controlled wet/dry vacuum cleaner and an adapter fitted
with restricting inlet holes into which the pen tips are inserted. Drying is accomplished by the rapid airflow
around the tips and the partial vacuum this creates.
After DNA samples are arrayed onto slides, they are air-dried. The samples are immobilized by ultraviolet
(UV) irradiation, which forms covalent bonds between the thymidine residues in the DNA and the positively
charged amine groups on the silane slides. After cross-linking, excess DNA molecules are removed by washing
the arrays at room temperature, and the arrayed samples are denatured in water before hybridization. There
are many methods for hybridizing targets and probes; they differ with respect to the solvents and
temperatures used. Figure 1.7 presents the typical process of nucleic acid hybridization.
Once extracted from the two populations, the RNA samples are typically labeled with fluorescent dyes
in order to generate probes. The commercial cyanine dyes Cy3 and Cy5 are commonly used in labeling
reactions. Fluorescently labeled probes can be prepared by several different methods, including direct or
indirect cDNA labeling (Hegde et al., 2000; Richter et al., 2002; Van Gelder et al., 1990). After cDNA
synthesis, a fluorescent cascade molecule with hundreds of dye molecules per complex is hybridized to the
cDNA. The labeled probes prepared from the two RNA sources are co-hybridized to the same DNA chip.
Important parameters include hybridization temperature, length of hybridization, concentration of salts, pH
of the solution, and the presence or absence of denaturants such as formaldehyde in the hybridization buffer.
The hybridized array is typically scanned with a system that uses lasers as a source of excitation light and
photomultiplier tubes as detectors. This system is capable of differentiating the fluorescently labeled probes.

1.3.1.2.1.2

Figure 1.7: The nucleic acid hybridization.

1.3.1.2.2
note: Oligonucleotide arrays. (Section 1.3.2.1.1: Oligonucleotide arrays.)

note: cDNA arrays - basic Concepts (Section 1.3.1.1.1: cDNA-Basic Concept)


1.3.2 Oligonucleotide Arrays

1.3.2.1 Affymetrix Chip-Basic Concepts


1.3.2.1.1 Oligonucleotide arrays.
1.3.2.1.1.1 4.2.1.Overview
The oligonucleotide arrays developed by Affymetrix are a new approach in microarray
technology, based on hybridization to small, high-density arrays containing tens of thousands of synthetic
oligonucleotides. The arrays are designed based on sequence information alone and are synthesized in situ
using a combination of photolithography and oligonucleotide chemistry. RNAs present at a frequency of
1:300,000 are unambiguously detected, and detection is quantitative over more than three orders of
magnitude. This approach provides a way to make direct use of the growing body of sequence information for
highly parallel experimental investigations. Because of the combinatorial nature of the chemistry and the
ability to synthesize small arrays containing hundreds of thousands of specifically chosen oligonucleotides,
the method is readily scalable to the simultaneous monitoring of tens of thousands of genes. The Affymetrix
integrated GeneChip arrays contain up to 500,000 unique probes corresponding to tens of thousands of gene
expression measurements.
Affymetrix manufactures arrays that monitor the global activities of genes in yeast, Arabidopsis, Drosophila,
mice, rats, and humans. In addition, custom expression arrays can be designed for other model organisms,
proprietary sequences, or specific subsets of known genes. For human arrays, expressed sequences from
databases are collected and clustered into groups of similar sequences. Using clusters as a starting point,
sequences are further subdivided into subclusters representing distinct transcripts. This categorization
process involves alignment to the human genome, which reveals splicing and polyadenylation variants.

10 This content is available online at <http://cnx.org/content/m12387/1.4/>.


11 http://www.affymetrix.com/index.affx

1.3.2.1.1.2

Oligonucleotide chips.

Figure 1.8: A typical experiment with an oligonucleotide chip; preparation of sample for GeneChip
arrays. Messenger RNA (mRNA) is extracted from the cell and converted to cDNA. It then undergoes
an amplification and labeling step before fragmentation and hybridization to 25-mer oligos on the surface
of the chip. After unhybridized material is washed away, the chip is scanned in a confocal laser scanner
and the image is analyzed by computer.

note: Oligonucleotide arrays - detailed information (Section 1.3.2.2.1: Detailed Information on the Oligonucleotide Arrays.)

note: cDNA arrays (Section 1.3.1.1.1: cDNA-Basic Concept)

1.3.2.2 Oligonucleotide Arrays-Detailed Information


1.3.2.2.1 Detailed Information on the Oligonucleotide Arrays.
12 This content is available online at <http://cnx.org/content/m12388/1.5/>.

A core element of array design, the Perfect Match/Mismatch probe strategy, is universally applied to the
production of GeneChip arrays. For each probe designed to be perfectly complementary to a target sequence,
a partner probe is generated that is identical except for a single base mismatch at its center. These probe
pairs, consisting of the Perfect Match probe (PM) and the Mismatch probe (MM), allow the quantitation and
subtraction of signals caused by non-specific cross-hybridization (see the further web presentation,
footnote 13). The difference in hybridization signals between the partners, as well as their intensity
ratio, serves as an indicator of specific target abundance.
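The Perfect Match/Mismatch strategy can be illustrated with a short Python sketch. Two points are assumptions made for the example rather than statements from the text: the mismatch is created by replacing the central base with its complement, and a probe set is summarized by the plain average of the PM-MM differences and of the PM/MM ratios. The intensity values are invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_mismatch(perfect_match):
    """Return a partner probe identical to the PM probe except for the
    central base (assumed here to be swapped for its complement)."""
    center = len(perfect_match) // 2
    return (perfect_match[:center]
            + COMPLEMENT[perfect_match[center]]
            + perfect_match[center + 1:])

def probe_set_summary(pm_signals, mm_signals):
    """Average PM-MM difference and average PM/MM ratio over a probe set."""
    differences = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    ratios = [pm / mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(differences) / len(differences), sum(ratios) / len(ratios)

# A hypothetical 25-mer probe and invented intensities for four probe pairs.
pm_probe = "A" * 12 + "C" + "T" * 12
mm_probe = make_mismatch(pm_probe)
mean_difference, mean_ratio = probe_set_summary(
    [1200, 950, 1100, 1300],   # PM intensities
    [300, 250, 275, 325],      # MM intensities
)
```

In this invented example the PM signals exceed the MM signals by 850 units on average, which would be read as evidence of specific hybridization; a probe set in which PM and MM signals are similar suggests that most of the signal is cross-hybridization.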

GeneChip Expression Array Design

Figure 1.9: The Affymetrix GeneChip technology. There may be 5,000-20,000 probe sets per chip.
The presence of messenger RNA (mRNA) is detected by a series of probe pairs that differ in only one
nucleotide. Hybridization of fluorescent mRNA to these probe pairs on the chip is detected by laser
scanning of the chip surface. A probe set consists of 11-20 PM/MM pairs.

1.3.2.2.1.1
Probe synthesis occurs in parallel, resulting in the addition of an A, C, T, or G nucleotide to multiple
growing chains simultaneously. To define which oligonucleotide chains will receive a nucleotide in each step,
photolithographic masks, carrying 18 to 20 square micron windows that correspond to the dimensions of
individual features, are placed over the coated wafer. The windows are distributed over the mask based
on the desired sequence of each probe. When ultraviolet light is shone over the mask in the first step of
synthesis, the exposed linkers become deprotected and are available for nucleotide coupling. Critical to this
step is the precise alignment of the mask with the wafer before each synthesis step. The nucleotide attaches
to the activated linkers, initiating the synthesis process. In the following synthesis step, another mask is
placed over the wafer to allow the next round of deprotection and coupling. The process is repeated until
the probes reach their full length, usually 25 nucleotides.
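The mask-by-mask logic of parallel synthesis can be imitated in a small Python simulation. It is only a sketch: the cyclic order in which nucleotides are offered (A, C, G, T), the short three-base probes, and the representation of a mask as a list of open or closed windows are assumptions for illustration, not details of the actual manufacturing process.

```python
def synthesize(targets, order="ACGT"):
    """Grow all probes in parallel. In each step one nucleotide is offered;
    the mask opens a window only over features whose next base matches it."""
    if any(base not in order for target in targets for base in target):
        raise ValueError("targets may only contain the bases in 'order'")
    chains = ["" for _ in targets]
    masks = []   # list of (nucleotide, windows) pairs, one per mask used
    step = 0
    while any(len(c) < len(t) for c, t in zip(chains, targets)):
        base = order[step % len(order)]
        windows = [len(c) < len(t) and t[len(c)] == base
                   for c, t in zip(chains, targets)]
        if any(windows):   # skip steps in which no feature needs this base
            masks.append((base, windows))
            # Exposed (deprotected) features couple the offered nucleotide.
            chains = [c + base if is_open else c
                      for c, is_open in zip(chains, windows)]
        step += 1
    return chains, masks

# Three hypothetical features with short probes.
chains, masks = synthesize(["ACG", "AAT", "GCA"])
```

For these three probes the simulation uses seven masks. With this cyclic scheme every unfinished chain gains at least one base per round of four nucleotides, so probes of length 25 need at most 100 steps no matter how many features are on the array.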

13 http://www.bioinformatica.unito.it/bioinformatics/AIRBB_courses/1stday/mas.5.0.pdf

Figure 1.10: Using technologies adapted from the semiconductor industry, GeneChip manufacturing
begins with a 5-inch square quartz wafer (Affymetrix). Initially the quartz is washed to ensure uniform
hydroxylation across its surface. The wafer is placed in a bath of silane, which reacts with the hydroxyl
groups of the quartz, and forms a matrix of covalently linked molecules. Each of these features harbors
millions of identical DNA molecules. The silane film provides a uniform hydroxyl density to initiate
probe assembly. Linker molecules, attached to the silane matrix, provide a surface that may be spatially
activated by light.

1.3.2.2.1.2
Once the synthesis is completed, the wafers are deprotected, diced, and the resulting individual arrays are
packaged in flow cell cartridges. Depending on the number of probe features per array, a single wafer can
yield between 49 and 400 arrays. The manufacturing process ends with a comprehensive series of quality
control tests.
The design and manufacture of GeneChip probe arrays are highly stereotyped and consistent, eliminating
the need to make arrays in individual labs, thereby significantly reducing user setup time and providing
a higher degree of reproducibility between experiments. Taking advantage of these capabilities, researchers
have used GeneChip probe arrays to study the regulation of gene expression associated with a wide variety
of basic biological functions, including development, hormonal signaling, and circadian rhythms. Many
studies have also used GeneChip probe arrays to tackle disease. A rapidly growing area of application is
cancer research, in which arrays have helped researchers discover new tumor classes, assign patient
samples to known tumor classes, reveal cancer-related alterations in molecular pathways, predict clinical
outcomes, and identify new drug targets (Shipp et al., 2002; Pomeroy et al., 2002; Schadt et al., 2001;
Golub et al., 1999; Lockhart et al., 1996).

14 http://www.affymetrix.com/index.affx

Figure 1.11: Standard eukaryotic gene expression assay (Affymetrix). The basic concept behind the
use of GeneChip arrays for gene expression is simple: labeled cDNA or cRNA targets derived from the
mRNA of an experimental sample are hybridized to nucleic acid probes attached to the solid support. By
monitoring the amount of label associated with each DNA location, it is possible to infer the abundance of
each mRNA species represented. Although hybridization has been used for decades to detect and quantify
nucleic acids, the combination of the miniaturization of the technology and the large and growing amounts
of sequence information has enormously expanded the scale at which gene expression can be studied.

1.3.2.2.2
note: data analysis (Section 1.3.1.2.1: Detailed information on the cDNA technology)

note: cdna arrays (Section 1.3.2.1.1: Oligonucleotide arrays.)

15 http://www.affymetrix.com/index.affx

1.3.3 Data Analysis

1.3.3.1 Data Analysis


1.3.3.1.1 Data Analysis.
After scanning, a grid must be placed on the image and the spots representing the arrayed genes must
be identified. The background fluorescence is calculated locally for each spot and is subtracted from the
hybridization intensities. Comparing the fluorescence intensities of the control and experimental probes
hybridized to each spot identifies differentially expressed genes (Freeman et al., 2000; Bowtell, 1999;
Knudsen, 2002). Typically, the experimental target sequences are labeled with Cy5, which fluoresces red
(667 nm), and control targets are labeled with Cy3, which fluoresces green (568 nm). The ratio of red to
green signal can then be used as a measure of the effect of the experimental treatment on the expression of
each gene. A ratio of 1 (yellow spot) indicates no change in the expression level between experimental and
control samples, while a ratio greater than 1 (red spot) indicates increased transcription in the
experimental sample, and a ratio less than 1 (green spot) indicates decreased transcription in the
experimental sample. A scatter plot is a very useful representation of the expression data: the signal
intensities of the experimental and control samples are plotted along the x- and y-axes, so that the ratio
for each spot appears as its distance from the diagonal (Schena, 2003). The diagonal separates spots with
higher activity than the control sample from spots with lower activity than the control. The scatter plot
provides a visualization of the fluorescence ratios obtained from the experimental and control samples. One
can then easily choose points that represent a several-fold increase or decrease in gene expression and
focus additional analyses on these genes.
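The calculation described above (local background subtraction followed by a red-to-green ratio) can be written out as a minimal Python sketch. The two-fold threshold used to call a gene induced or repressed, the handling of spots at or below background, and the intensity values are assumptions made for illustration.

```python
import math

def spot_ratio(red, red_background, green, green_background):
    """Background-corrected Cy5/Cy3 (experimental/control) ratio for one
    spot. Returns None if either corrected signal is not positive."""
    red_signal = red - red_background
    green_signal = green - green_background
    if red_signal <= 0 or green_signal <= 0:
        return None
    return red_signal / green_signal

def classify(ratio, fold=2.0):
    """Label a spot using an assumed fold-change threshold."""
    if ratio >= fold:
        return "increased"    # red spot
    if ratio <= 1.0 / fold:
        return "decreased"    # green spot
    return "unchanged"        # yellow spot

# Invented raw intensities: (red, red background, green, green background).
spots = {
    "gene_a": (2100, 100, 600, 100),
    "gene_b": (600, 100, 600, 100),
    "gene_c": (350, 100, 1100, 100),
}
ratios = {name: spot_ratio(*values) for name, values in spots.items()}
calls = {name: classify(ratio) for name, ratio in ratios.items()}
log_ratios = {name: math.log2(ratio) for name, ratio in ratios.items()}
```

Ratios are often reported on a log2 scale, as in the last line, because a four-fold increase (+2) and a four-fold decrease (-2) then lie at equal distances from zero.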

16 This content is available online at <http://cnx.org/content/m12389/1.4/>.



1.3.3.1.1.1

The hybridized microarray.

Figure 1.12: A hybridized microarray printed by the AECOM robot (Cheung et al., 1999). A 5550-gene
mouse cDNA microarray was printed and hybridized to Cye3-dUTP and Cye5-dUTP probes from
wild-type and mutant mouse cell lines and imaged using the AECOM laser scanner. Shown is one out
of four of the pen tip printing areas of the array.

With just one experimental condition and a control, the data analysis is limited to a list of regulated
genes ranked by the fold-change or by the significance of the change determined in a t test. Normalization
of data must be performed to compare separate arrays. With multiple experimental conditions (e.g.,
time points or drug doses), the genes are often grouped into clusters that behave similarly under the
different conditions. Computational methods such as hierarchical clustering or k-means are used
to analyze the massive amounts of data generated by these experiments. Gene clusters are visualized
with trees or color-coded matrices by placing genes with similar patterns of expression into a clustered
group (Figure 1.13). Image processing and analysis software is commercially available, and several packages
are available as freeware:
<http://www.bio.davidson.edu/projects/GCAT/GCATprotocols.html>,
<http://www.tigr.org/software/>,
<http://research.nhgri.nih.gov/microarray/main.html>,
<http://www.bio.davidson.edu/projects/magic/magic.html>.
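To make the clustering step more concrete, here is a minimal k-means sketch in plain Python. It is not the procedure used by any of the packages listed above. The four expression profiles are made-up log ratios over three hypothetical conditions, and seeding the centroids with the first k profiles is an assumption chosen only to keep the example deterministic.

```python
def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles (lists of log ratios) into k groups."""
    # Assumption: seed the centroids with the first k profiles.
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each gene joins the nearest centroid
        # (squared Euclidean distance between profiles).
        for i, p in enumerate(profiles):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up log ratios for four hypothetical genes across three conditions.
profiles = [
    [0.1, 1.0, 2.0],    # gene 1: induced
    [0.0, -1.1, -2.0],  # gene 2: repressed
    [0.2, 1.2, 1.8],    # gene 3: induced
    [-0.1, -0.9, -2.2], # gene 4: repressed
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Genes that receive the same label behave similarly across the conditions, which is the grouping that a color-coded matrix displays.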


1.3.3.1.1.1.1

Clustering of gene expression patterns.

Figure 1.13: Clustering of gene expression patterns. a, the ratio of gene expression in the experimental
sample relative to the control is displayed for individual genes using a color scale. Black indicates no change
in expression, while an increase in the experimental relative to the control is shown as red, and a decrease in
the experimental relative to the control is shown as green. Genes displaying similar patterns of induction or
repression are clustered together. b, clustering of thousands of genes by patterns of gene induction or
repression following a treatment (Campbell and Heyer, 2003).

1.3.3.1.1.2
Microarray analysis of gene expression does have limitations that researchers must consider. Induced mRNA
levels and induced protein levels are not always well correlated. Translational and post-translational
regulatory mechanisms that affect the activity of various cellular proteins are not examined by DNA
microarrays, though the emerging field of proteomics is beginning to address this issue. Other limitations of
microarray analysis include the impact of alternative splicing during transcript processing and the limited
detectability of unstable mRNAs. Differential gene expression results must be confirmed through direct
examination of selected genes. These analyses are typically performed at the level of RNA blots or quantitative
RT-PCR to examine transcripts of a specific gene, and/or by detecting protein concentration using immunoblots.
Additional studies often include alteration of gene function with targeted mutations, antisense technology,
or protein inhibition.

1.3.3.1.2
note: cDNA arrays (Section 1.3.1.1.1: cDNA-Basic Concept)

note: Oligonucleotide arrays (Section 1.3.2.1.1.1: 4.2.1.Overview)

note: Gene Networks (Section 1.4.1.1: Introduction)

1.4 Gene Networks


21
1.4.1 Boolean Networks

1.4.1.1 Introduction
A central goal of molecular biology is to understand the regulation of protein synthesis and its reactions to
external and internal signals. All the cells in an organism carry the same genomic data, yet their protein
makeup can be drastically different both temporally and spatially, due to regulation. Protein synthesis is
regulated by many mechanisms at its different stages. These include mechanisms for controlling transcription
initiation, RNA splicing, mRNA transport, translation initiation, post-translational modifications, and
degradation of mRNA/protein. One of the main junctions at which regulation occurs is mRNA transcription.
A major role in this machinery is played by proteins themselves, which bind to regulatory regions along the
DNA, greatly affecting the transcription of the genes they regulate. In recent years, technical breakthroughs
in spotting hybridization probes and advances in genome sequencing efforts have led to the development of DNA
microarrays, which consist of many species of probes, either oligonucleotides or cDNA, that are immobilized
in a predefined organization on a solid phase. By using DNA microarrays, researchers are now able to
measure the abundance of thousands of mRNA targets simultaneously (DeRisi et al., 1997 ([6]); Lockhart et al.,
1996; Wen et al., 1998). Unlike classical experiments, where the expression levels of only a few genes were
reported, DNA microarray experiments can measure all the genes of an organism, providing a genomic
viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for
understanding gene expression and regulation (Iyer et al., 1999; Spellman et al., 1998).
A central focus of genomic research concerns understanding the manner in which cells execute and control
the enormous number of operations required for their function. Biological systems behave in an exceedingly
parallel and extraordinarily integrated fashion. Feedback and damping are routine even for the most common
activities. Thus, in this area of genomic biology, single gene perspectives are becoming increasingly limited
for gaining insight into biological processes. Network applications are becoming increasingly important
for making progress in our understanding of the manner in which genes and molecules collectively form a
biological system and harnessing this understanding in educated intervention for correcting human diseases.
Such approaches inevitably require computational and formal methods to process massive amounts of data,
understand general principles governing the system under study, and make useful predictions about system
behavior in the presence of known conditions. There is a rather wide spectrum of approaches for modeling
gene regulatory networks, each with its own assumptions, data requirements, and goals. The most popular
models include Boolean, probabilistic Boolean, and Bayesian networks.

21 This content is available online at <http://cnx.org/content/m12394/1.5/>.



1.4.1.2 Boolean Networks


The Boolean network model, introduced by Kauffman (Kauffman, 1969, 1974; Kauffman and Glass, 1973) and
more recently developed by Shmulevich (Shmulevich, 2002), has received the most attention, not only from the
biology community, but also in physics. In this model, gene expression is quantized to only two levels: ON
and OFF. The expression level (state) of each gene is functionally related to the expression states of some
other genes, using logical rules. A Boolean network G(V,F) is defined by a set of nodes corresponding to
genes V = {x_1, ..., x_n} and a list of Boolean functions F = (f_1, ..., f_n). The state of a node (gene)
at time t+1 is completely determined by the values of other nodes at time t by means of the underlying
Boolean functions. The model is represented in the form of a directed graph. Each x_i represents the state
(expression) of gene i, where x_i = 1 means that gene i is expressed and x_i = 0 means that it is not
expressed. The list of Boolean functions F represents the rules of regulatory interactions between genes.
That is, any given gene transforms its inputs (regulatory factors that bind to it) into an output, which is
the state or expression of the gene itself. The maximum connectivity of a Boolean network is defined by
K = max_i k_i, where k_i is the number of inputs of the function f_i. All genes are assumed to update
synchronously in accordance with the functions assigned to them, and this process is then repeated. The
artificial synchrony simplifies computation while preserving the qualitative, generic properties of global
network dynamics (Kauffman, 1993; Huang, 1999; Wuensche, 1998).
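The synchronous update rule can be sketched as follows. The three-gene network below is hypothetical and chosen only for illustration. Each function is stored as a truth table together with the indices of its input genes, and the names inputs, tables, and step are illustrative choices, not notation from the cited papers.

```python
# Hypothetical 3-gene network. Genes are indexed from 0 in the code, so
# index 0 is x_1, index 1 is x_2, and index 2 is x_3.
# inputs[i] lists the genes wired into gene i; tables[i] is the truth table
# of f_i, indexed by the input values read as a binary number.
inputs = [(1, 2), (0,), (0, 1)]
tables = [
    (0, 0, 0, 1),  # f_1 = x_2 AND x_3
    (1, 0),        # f_2 = NOT x_1
    (0, 1, 1, 1),  # f_3 = x_1 OR x_2
]

def step(state):
    """One synchronous update: every gene reads the state at time t."""
    new_state = []
    for idx, table in zip(inputs, tables):
        row = 0
        for j in idx:
            row = 2 * row + state[j]   # build the truth-table row number
        new_state.append(table[row])
    return tuple(new_state)

# Maximum connectivity K = max_i k_i.
K = max(len(idx) for idx in inputs)

state = (1, 0, 1)
print(state, "->", step(state), " K =", K)
```

Because all new values are computed from the old state before any gene is overwritten, the update is synchronous in the sense described above.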
An example is presented below. Consider a Boolean network consisting of 5 genes {x_1, ..., x_5}
with the corresponding Boolean functions given by the truth tables shown in Figure 1.14. The
maximum connectivity is K = 3, although we allow some input variables to duplicate, essentially reducing the
connectivity. The dynamics of this Boolean network are shown in Figure 1.15. Since there are 5
genes, there are 2^5 = 32 possible states that the network can be in. Each state is represented by a circle,
and the arrows between states show the transitions of the network according to the functions in
Figure 1.14. Because of the inherent deterministic directionality in Boolean networks and the finite number
of possible states, it is easy to see that every trajectory must eventually return to a state it has already
visited and from then on repeat the same cycle of states.
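A state-transition diagram of this kind can be explored by enumerating all states and following each trajectory until it repeats. The sketch below uses a hypothetical three-gene network rather than the five-gene network of Figure 1.14, whose truth tables are not reproduced in the text. The cycles that trajectories fall into are commonly called attractors.

```python
from itertools import product

def next_state(s):
    """Hypothetical rules: x1' = x2 AND x3, x2' = NOT x1, x3' = x1 OR x2."""
    x1, x2, x3 = s
    return (x2 & x3, 1 - x1, x1 | x2)

def attractor_from(start):
    """Follow the trajectory until a state repeats; return the cycle it enters."""
    seen = []
    s = start
    while s not in seen:
        seen.append(s)
        s = next_state(s)
    return seen[seen.index(s):]

# With n = 3 genes there are 2^3 = 8 possible states.
states = list(product((0, 1), repeat=3))
attractors = {frozenset(attractor_from(s)) for s in states}

for cycle in attractors:
    print("attractor with", len(cycle), "states:", sorted(cycle))
```

The loop in attractor_from always terminates, because a deterministic map on a finite state space must revisit some state after at most 2^n steps.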

Figure 1.14: Truth tables of the functions in a Boolean network with 5 genes. The indices j1, j2, and
j3 indicate the input connections for each of the functions.

Figure 1.15: The state-transition diagram for the Boolean network defined in Figure 1.14.

In the context of Boolean networks as models of genetic regulatory networks, there is no doubt that the
binary approximation of gene expression is an oversimplification (Huang, 1999). However, even though most
biological phenomena manifest themselves in the continuous domain, they are often described in a binary
logical language such as 'on and off,' 'upregulated and downregulated,' and 'responsive and nonresponsive.'
There are several examples showing that a Boolean formalism is meaningful in biology. In (Shmulevich and
Zhang, 2002), the authors reasoned that if genes quantized to only two levels (1 or 0) were not
informative in separating known sub-classes of tumors, then there would be little hope for Boolean modeling
of realistic genetic networks based on gene expression data.
Fortunately, the results were very promising. By using binary gene expression data, generated via cDNA
microarrays, and the Hamming distance as a similarity metric, a clear separation between different sub-types
of gliomas as well as between different sarcomas was shown. This seems to suggest that a good deal of
meaningful biological information, to the extent that it is contained in the measured continuous-domain gene
expression data, is retained when it is binarized.
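The idea of comparing binarized expression profiles can be illustrated with a short sketch. The three samples and the fixed threshold of 1.0 are made up for this example. The binarization procedure actually used in the cited study is not described in the text, so the thresholding rule here is an assumption.

```python
def binarize(profile, threshold):
    """Quantize continuous expression values to 1 (ON) or 0 (OFF)."""
    return [1 if value >= threshold else 0 for value in profile]

def hamming(a, b):
    """Number of genes whose binary state differs between two samples."""
    return sum(x != y for x, y in zip(a, b))

# Made-up expression values for four genes in three hypothetical samples.
sample_a = [2.3, 0.1, 1.8, 0.2]
sample_b = [2.0, 0.3, 1.5, 0.1]
sample_c = [0.2, 1.9, 0.1, 2.2]

bin_a, bin_b, bin_c = (binarize(s, threshold=1.0) for s in (sample_a, sample_b, sample_c))

print(hamming(bin_a, bin_b))  # samples with the same ON/OFF pattern
print(hamming(bin_a, bin_c))  # samples with opposite patterns
```

Samples separated by a small Hamming distance share most of their ON/OFF pattern, which is what allows binarized data to separate sub-classes.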

1.4.1.3 Biological Example


An example borrowed from (Shmulevich et al., 2002) is presented below, showing the logical representation
of cell cycle regulation. The process of cellular growth and division is highly regulated. An imbalance
in this process results in unregulated cell growth in diseases such as cancer. In order for cells to move
from the G1 phase to the S phase, when the genetic material, DNA, is replicated for the daughter cells, a
series of molecules such as cyclin E and cyclin-dependent kinase 2 (cdk2) work together to phosphorylate
the retinoblastoma (Rb) protein and inactivate it, thus releasing cells into the S phase. Cdk2/cyclin E is
regulated by two switches: the positive switch complex called cdk activating kinase (CAK) and the negative
switch p21/WAF1. The CAK complex can be composed of two gene products: cyclin H and cdk7. When
cyclin H and cdk7 are present, the complex can activate cdk2/cyclin E. A negative regulator of cdk2/cyclin
E is p21/WAF1, which in turn can be activated by p53. When p21/WAF1 binds to cdk2/cyclin E, the kinase
complex is turned off (Gartel and Tyner, 1999). Further, p53 can inhibit cyclin H, a positive regulator of
cyclin E/cdk2 (Schneider et al., 1998). This negative regulation is an important defensive system in the
cells. For example, when cells are exposed to a mutagen, DNA damage occurs. It is to the benefit of cells to
repair the damage before DNA replication so that the damaged genetic material is not passed on to the next
generation. An extensive amount of work has demonstrated that DNA damage triggers switches that turn on
p53, which then turns on p21/WAF1. p21/WAF1 then inhibits cdk2/cyclin E; thus Rb becomes activated
and DNA synthesis stops. As an extra measure, p53 also inhibits cyclin H, thus turning off the switch
that turns on cdk2/cyclin E. Such delicate genetic switch networks in the cells are the basis for cellular
homeostasis, the ability of an organism to maintain equilibrium.
For purposes of illustration, let us consider a simplified diagram, shown in Figure 1.16, illustrating
the effects of cdk7/cyclin H, cdk2/cyclin E, and p21/WAF1 on Rb. Thus, p53 and other known regulatory
factors are not considered. While this diagram represents the above relationships from a pathway perspective,
one may also represent the activity of Rb in terms of the other variables in a logic-based fashion. Figure 1.17
contains a logic circuit diagram of the activity of Rb ('on' or 'off') as a Boolean function of
four input variables: cdk7, cyclin H, cyclin E, and p21/WAF1. Note that cdk2 is shown to be completely
determined by the values of cdk7 and cyclin H using the AND operation; thus, cdk2 is not an independent
input variable. Also, in Figure 1.16, p21/WAF1 is shown to have an inhibitory effect on the
cdk2/cyclin E complex, which in turn regulates Rb, while in Figure 1.17 we see that, from a
logic-based perspective, the value of p21/WAF1 works together with cdk2 and cyclin E to determine the
value of Rb.

Figure 1.16: A diagram illustrating the cell cycle regulation example. Arrowed lines represent activation
and lines with bars at the end represent inhibition.

Figure 1.17: The logic diagram describing the activity of retinoblastoma (Rb) protein in terms of 4
inputs: cdk7, cyclin H, cyclin E, and p21. The gate with inputs cdk7 and cyclin H is an AND gate, the
gate with input p21/WAF1 is a NOT gate, and the gate whose output is Rb is a NAND (negated AND)
gate.
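The logic circuit described in the caption of Figure 1.17 can be written directly as a Boolean function. The sketch below follows the three gates named there (AND, NOT, NAND). The function name rb_active and the use of Python booleans are illustrative choices.

```python
from itertools import product

def rb_active(cdk7, cyclin_h, cyclin_e, p21):
    """Activity of Rb as a Boolean function of the four inputs."""
    cdk2 = cdk7 and cyclin_h                        # AND gate
    p21_absent = not p21                            # NOT gate
    return not (cdk2 and cyclin_e and p21_absent)   # NAND gate

# Enumerate the full truth table (2^4 = 16 input combinations).
rows = [(inputs, rb_active(*inputs)) for inputs in product((False, True), repeat=4)]
rb_off = [inputs for inputs, rb in rows if not rb]

print("Rb is off only for (cdk7, cyclin H, cyclin E, p21) =", rb_off)
```

Under this logic Rb is switched off only when cdk7, cyclin H, and cyclin E are all present and p21/WAF1 is absent, which matches the pathway description: an active cdk2/cyclin E complex that is not blocked by p21/WAF1 inactivates Rb.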

note: Probabilistic Boolean Networks (Section 1.4.2.1: Probabilistic Boolean Networks)

note: Bayesian Networks (Section 1.4.2.2: Bayesian Networks)

22
1.4.2 Probabilistic Boolean and Bayesian Networks

1.4.2.1 Probabilistic Boolean Networks


In a Boolean network, each (target) gene is 'predicted' by several other genes by means of a Boolean function
(predictor). Thus, after having inferred such a function from gene expression data, it could be concluded
that if we observe the values of the predictive genes, we know, with full certainty, the value of the target
gene. Conceptually, such inherent determinism seems problematic, as it assumes an environment with no
uncertainty. However, the data used for the inference exhibit uncertainty on several levels.
Another class of models, called Probabilistic Boolean Networks (PBNs) (Shmulevich et al., 2002), shares the
appealing properties of Boolean networks but is able to cope with uncertainty, both in the data and in the
model selection. A model incorporates only a partial description of a physical system. This means that a
Boolean function giving the next state of a variable is likely to be only partially accurate.
The basic idea is to extend the Boolean network to accommodate more than one possible function for
each node. Thus, to every node x_i there corresponds a set F_i = {f_j}, j = 1, ..., l(i), where each f_j is a
possible function determining the value of gene x_i and l(i) is the number of possible functions for gene x_i.
A realization of the PBN at a given instant of time is determined by a vector of Boolean functions, where
the ith element of that vector contains the predictor selected at that instant for gene x_i. In other words,
the vector function f_k: {0,1}^n → {0,1}^n acts as a transition function (mapping) representing a
possible realization of the entire PBN. Such functions are commonly referred to as multiple-output Boolean
functions. Each of the N possible realizations can be thought of as a standard Boolean network that operates
for one time step. In other words, at every state x(t) ∈ {0,1}^n, one of the N Boolean networks is
chosen and used to make the transition to the next state x(t+1) ∈ {0,1}^n. The probability P_i
that the ith (Boolean) network or realization is selected can be easily expressed in terms of the individual
selection probabilities C_j; see (Shmulevich et al., 2002). The dynamics of the PBN are essentially the same
as for Boolean networks, but at any given point in time, the value of each node is determined by one of the
possible predictors, chosen according to its corresponding probability. This can be interpreted by saying that
at any point in time, we have one out of N possible networks. The basic building block of a PBN is shown
in Figure 1.18.

22 This content is available online at <http://cnx.org/content/m12395/1.5/>.

AN EXAMPLE

Figure 1.18: A basic building block of a probabilistic Boolean network. A number of predictors
share common inputs while their outputs are synthesized, in this case by random selection, into a single
output. This type of structure is known as a synthesis filter bank in digital signal processing literature.
The wiring diagram for the entire PBN would consist of n such building blocks. Although the `wiring' of
the inputs to each function is shown to be quite general, in practice, each function (predictor) has only
a few input variables.
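One step of a probabilistic Boolean network can be sketched as follows. The three-gene network, its predictors, and the selection probabilities are all hypothetical. The sketch also assumes that the predictor for each gene is selected independently of the others, in which case the number of possible realizations is N = l(1) * l(2) * ... * l(n).

```python
import random
from math import prod

# predictors[i] plays the role of the set F_i for gene i: a list of
# (Boolean function, selection probability) pairs. All values are made up.
predictors = [
    [(lambda s: s[1] & s[2], 0.7), (lambda s: s[1] | s[2], 0.3)],  # gene 1: l(1) = 2
    [(lambda s: 1 - s[0], 1.0)],                                     # gene 2: l(2) = 1
    [(lambda s: s[0] | s[1], 0.6), (lambda s: s[0], 0.4)],           # gene 3: l(3) = 2
]

def pbn_step(state, rng):
    """Pick one predictor per gene at random, then update synchronously."""
    new_state = []
    for options in predictors:
        funcs = [f for f, _ in options]
        probs = [c for _, c in options]
        chosen = rng.choices(funcs, weights=probs)[0]
        new_state.append(chosen(state))
    return tuple(new_state)

# Number of realizations under the independence assumption: 2 * 1 * 2 = 4.
N = prod(len(options) for options in predictors)

rng = random.Random(0)          # fixed seed so the run is repeatable
trajectory = [(1, 0, 1)]
for _ in range(5):
    trajectory.append(pbn_step(trajectory[-1], rng))

print("N =", N)
print(trajectory)
```

Each call to pbn_step corresponds to choosing one of the N Boolean networks and running it for a single time step, as described above.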

1.4.2.2 Bayesian Networks


Bayesian networks, a well-studied statistical tool (Friedman et al., 2000; Pearl, 1988), represent the
dependence structure between multiple interacting quantities (e.g., expression levels of different genes).
Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly
useful for describing processes composed of locally interacting components; that is, the value of each
component directly depends on the values of a relatively small number of components. Second, statistical
foundations for learning Bayesian networks from observations, and computational algorithms to do so, are
well understood and have been used successfully in many applications. Finally, Bayesian networks provide
models of causal influence: although Bayesian networks are mathematically defined strictly in terms of
probabilities and conditional independence statements, a connection can be made between this
characterization and the notion of direct causal influence (Heckerman et al., 1999; Pearl and Verma, 1991;
Spirtes et al., 1993). Although this connection depends on several assumptions that do not necessarily hold
in gene expression data, the conclusions of Bayesian network analysis might be indicative of some causal
connections in the data.
A Bayesian network (also known as a causal probabilistic network) is an annotated directed acyclic graph
that encodes a joint probability distribution of a set of random variables X. Formally, a Bayesian network
for X is a pair B = (G, Q). The first component, G, is a directed acyclic graph (DAG) whose vertices
correspond to the random variables x_1, ..., x_n, and whose edges represent direct dependencies between the
variables. The graph G encodes the Markov assumption: each variable x_i is independent of its
nondescendants, given its parents in G. The second component of the pair, namely Q, represents the set of
parameters that quantifies the network and describes a conditional distribution for each variable, given its
parents in G. Together, these two components specify a unique distribution on x_1, ..., x_n. The conditional
independence assumptions represented by G allow the joint distribution to be decomposed into one
conditional distribution per variable, economizing on the number of parameters. Given a Bayesian network, we
might want to answer many types of questions that involve the joint probability (e.g., what is the
probability of X = x given observation of some of the other variables?) or independencies in the domain
(e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that can
answer such queries efficiently by exploiting the explicit representation of structure (Jensen, 1996; Pearl,
1988).
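The decomposition of the joint distribution can be illustrated on a tiny network. The sketch below assumes three hypothetical binary variables A, B, and C, with A the only parent of both B and C, and made-up conditional probabilities. The joint probability is the product P(A) P(B | A) P(C | A). Queries are answered by brute-force enumeration, which is practical only for very small networks and is not one of the efficient algorithms referred to above.

```python
from itertools import product

# Structure G: each variable maps to a tuple of its parents.
parents = {"A": (), "B": ("A",), "C": ("A",)}

# Parameters Q: cpt[var][parent values] = P(var = 1 | parents). Values are made up.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.8},
}

def joint(assignment):
    """Joint probability as a product of one conditional term per variable."""
    p = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p

def probability(query, evidence):
    """P(query | evidence), summing the joint over all consistent assignments."""
    names = list(parents)
    num = den = 0.0
    for values in product((0, 1), repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in evidence.items()):
            pj = joint(a)
            den += pj
            if all(a[k] == v for k, v in query.items()):
                num += pj
    return num / den

print(probability({"B": 1}, {"C": 1}))          # B and C are dependent...
print(probability({"B": 1}, {"A": 1}))          # ...but once A is observed,
print(probability({"B": 1}, {"A": 1, "C": 1}))  # C carries no further information
```

The network needs 5 parameters (1 for A, 2 each for B and C), whereas an unrestricted joint distribution over three binary variables needs 7. The last two queries return the same value, which is the conditional independence of B and C given A.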

1.4.2.3 Biological Example


Let us apply the approach to the data of Spellman et al. (1998). This data set contains 76 gene
expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. These experiments measure six
time series under different cell cycle synchronization methods. Spellman et al. (1998) identified 800 genes
whose expression varied over the different cell-cycle stages. In learning from this data, one treats each
measurement as an independent sample from a distribution and does not take into account the temporal aspect
of the measurement. Since it is clear that the cell cycle process is of a temporal nature, compensation
is made by introducing an additional variable denoting the cell cycle phase. This variable is forced to be
a root in all the networks learned. Its presence allows one to model the dependency of expression levels on
the current cell cycle phase. Two experiments were performed, one with the discrete multinomial
distribution, the other with the linear Gaussian distribution. The learned features show that we can recover
intricate structure even from such small data sets. It is important to note that the learning algorithm uses
no prior biological knowledge or constraints. All learned networks and relations are based solely on the
information conveyed in the measurements themselves. These results are available at the following web page:
<http://www.cs.huji.ac.il/labs/compbio/expression>. Figure 1.19 illustrates the graphical display of some
results from this analysis.


SVS1 Gene Interaction Network

Figure 1.19: The graph shows a local Bayesian network for the gene SVS1. The width (and color) of
edges corresponds to the computed confidence level. An edge is directed if there is a sufficiently high
confidence in the order between the genes connected by the edge. This local map shows that CLN2
separates SVS1 from several other genes. Although there is a strong connection between CLN2 and all
these genes, there are no other edges connecting them. This indicates that, with high confidence, these
genes are conditionally independent given the expression level of CLN2.

1.4.2.4
note: Boolean Networks (Section 1.4.1.2: Boolean Networks)
Chapter 2

Microarray

2.1 Normalisation
2.2 Data Analysis
2.3 Differential Expressions
2.4 Classification (Clustering)
2.5 Networks

Chapter 3

Publications/References

3.1 Glossary
1
3.1.1 Glossary

3.1.1.1 Alphabet
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

1 This content is available online at <http://cnx.org/content/m12371/1.13/>.


Glossary

A

ADENINE
One of the four bases in DNA that make up the letters ATGC; adenine is the "A". The others are guanine,
cytosine, and thymine. Adenine always pairs with thymine. (from National Human Genome Research
Institute, footnote 2)

ALA (ALANINE)(A)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 3)

AMINO ACIDS
A group of 20 different kinds of small molecules that link together in long chains to form proteins. Often
referred to as the "building blocks" of proteins. (from National Human Genome Research Institute,
footnote 4)

ANAPHASE
The stage of meiosis or mitosis when chromosomes move toward opposite ends of the nuclear spindle.
(from WordNet, footnote 5)

ARG (ARGININE)(R)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 6)

ASN (ASPARAGINE)(N)

ASP (ASPARTIC ACID)(D)

ASX (ASPARAGINE OR ASPARTIC ACID)(B)

B

BASE PAIR
Two bases which form a "rung of the DNA ladder." A DNA nucleotide is made of a molecule of sugar, a
molecule of phosphoric acid, and a molecule called a base. The bases are the "letters" that spell out the
genetic code. In DNA, the code letters are A, T, G, and C, which stand for the chemicals adenine, thymine,
guanine, and cytosine, respectively. In base pairing, adenine always pairs with thymine, and guanine always
pairs with cytosine. (from National Human Genome Research Institute, footnote 7)

C

CENTROMERE
The constricted region near the center of a human chromosome. This is the region of the chromosome
where the two sister chromatids are joined to one another. (from National Human Genome Research
Institute, footnote 8)

CHROMATID
One of two identical strands into which a chromosome splits during mitosis. (from WordNet, footnote 9)

CHROMATIN (CHROMATIN GRANULE)
The readily stainable substance of a cell nucleus consisting of DNA and RNA and various proteins; during
mitotic division the chromatin condenses into chromosomes. (from WordNet, footnote 10)

CHROMOSOME
One of the threadlike "packages" of genes and other DNA in the nucleus of a cell. Different kinds of
organisms have different numbers of chromosomes. Humans have 23 pairs of chromosomes, 46 in all: 44
autosomes and two sex chromosomes. Each parent contributes one chromosome to each pair, so children get
half of their chromosomes from their mothers and half from their fathers. (from National Human Genome
Research Institute, footnote 11)

CLUSTER (GENE CLUSTER)
A set of closely related genes that code for the same or similar proteins and which are usually grouped
together on the same chromosome. (from BioTech Dictionary, footnote 12)

CODON
Three bases in a DNA or RNA sequence which specify a single amino acid. (from National Human Genome
Research Institute, footnote 13)

COMPLEMENTARY DNA (CDNA)
A single-stranded DNA synthesized from a mature mRNA template. cDNA is often used to clone eukaryotic
genes in prokaryotes. (from Wikipedia, footnote 14)

COVALENT BOND
A bond between two or more atoms that is provided by electrons that travel between the atoms' nuclei,
holding them together but keeping them a stable distance apart. (from BioTech Dictionary, footnote 15)

CROSSING OVER
The breaking during meiosis of one maternal and one paternal chromosome, the exchange of corresponding
sections of DNA, and the rejoining of the chromosomes. This process can result in an exchange of alleles
between chromosomes. (from Human Genome Project Information, footnote 16)

CROSSLINKING
The linking of two strands of DNA by covalent bonds (as opposed to the normal hydrogen bonds between base
pairs), which can occur by exposure to X-rays. (from BioTech Dictionary, footnote 17)

CYS (CYSTEINE)(C)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 18)

CYTOPLASM
All the contents of a cell, including the plasma membrane, but not including the nucleus. (from UCMP
Glossary, footnote 19)

CYTOSINE
One of the four bases in DNA that make up the letters ATGC; cytosine is the "C". The others are adenine,
guanine, and thymine. Cytosine always pairs with guanine. (from National Human Genome Research
Institute, footnote 20)

D

DNA MICROARRAY (DNA CHIP)
A piece of glass or plastic on which single-stranded pieces of DNA have been affixed in a microscopic array.
(from Wikipedia, footnote 21)

DNA
The chemical inside the nucleus of a cell that carries the genetic instructions for making living organisms.
(from National Human Genome Research Institute, footnote 22)

E

ENHANCER
A short region of DNA which can be bound with proteins (namely, the trans-acting factors, much like a set
of transcription factors) to enhance transcription levels of nearby genes (hence the name) in a gene-cluster.
(from Wikipedia, footnote 23)

EXON
The region of a gene that contains the code for producing the gene's protein. Each exon codes for a specific
portion of the complete protein. In some species (including humans), a gene's exons are separated by long
regions of DNA (called introns or sometimes "junk DNA") that have no apparent function. (from National
Human Genome Research Institute, footnote 24)

EXPRESSION (GENE EXPRESSION)
The process by which a gene's coded information is converted into the structures present and operating in
the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and
those that are transcribed into RNA but not translated into protein. (from BioTech Dictionary, footnote 25)

G

GAMETE
Mature male or female reproductive cell (sperm or ovum) with a haploid set of chromosomes (23 for
humans). (from Human Genome Project Information, footnote 26)

GENE
The functional and physical unit of heredity passed from parent to offspring. Genes are pieces of DNA, and
most genes contain the information for making a specific protein. (from National Human Genome Research
Institute, footnote 27)

GENETIC MAP (LINKAGE MAP)
A chromosome map of a species that shows the position of its known genes and/or markers relative to each
other, rather than as specific physical points on each chromosome. (from National Human Genome Research
Institute, footnote 28)

GENOME
All the DNA contained in an organism or a cell, which includes both the chromosomes within the nucleus
and the DNA in mitochondria. (from National Human Genome Research Institute, footnote 29)

GLN (GLUTAMINE)(Q)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 30)

GLU (GLUTAMIC ACID)(E)

GLX (GLUTAMINE OR GLUTAMIC ACID)(Z)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 31)

GLY (GLYCINE)(G)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 32)

GUANINE
One of the four bases in DNA that make up the letters ATGC; guanine is the "G". The others are adenine,
cytosine, and thymine. Guanine always pairs with cytosine. (from National Human Genome Research
Institute, footnote 33)

H

HIS (HISTIDINE)(H)
One of the twenty naturally occurring amino acids. (from BioTech Dictionary, footnote 34)

HYBRIDIZATION
A genetics lab technique used to identify which colonies of bacteria on a plate contain a particular sequence
of DNA or a particular gene. The technique involves pressing a nylon or nitrocellulose membrane onto the
plate so that each colony contributes a small smudge of itself to the membrane, then treating the membrane
with chemicals and heat, then washing the membrane with a labeled probe to find the specific DNA sequence.
The smudges which are indicated by the probe are then compared back to the colonies on the plate. (from
BioTech Dictionary, footnote 35)

I

ILE (ISOLEUCINE)(I)

2 http://www.genome.gov/glossary.cfm?key=adenine
3 http://biotech.icmb.utexas.edu/search/dict-search.html
4 http://www.genome.gov/glossary.cfm?key=amino%20acids
5 http://wordnet.princeton.edu/index.shtml
6 http://biotech.icmb.utexas.edu/search/dict-search.html
7 http://www.genome.gov/glossary.cfm?key=base%20pair
8 http://www.genome.gov/glossary.cfm?key=centromere
9 http://wordnet.princeton.edu/index.shtml
10 http://wordnet.princeton.edu/index.shtml
11 http://www.genome.gov/glossary.cfm?key=chromosome
12 http://biotech.icmb.utexas.edu/search/dict-search.mhtml
13 http://www.genome.gov/glossary.cfm?key=codon
14 http://en.wikipedia.org/wiki/CDNA
15 http://biotech.icmb.utexas.edu/search/dict-search.html
16 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
17 http://biotech.icmb.utexas.edu/search/dict-search.html
18 http://biotech.icmb.utexas.edu/search/dict-search.html
19 http://www.ucmp.berkeley.edu/glossary/glossary.html
20 http://www.genome.gov/glossary.cfm?key=cytosine
21 http://en.wikipedia.org/wiki/DNA_microarray
22 http://www.genome.gov/glossary.cfm?key=deoxyribonucleic%20acid%20%28DNA%29
23 http://en.wikipedia.org/wiki/Enhancer
24 http://www.genome.gov/glossary.cfm?key=exon
25 http://biotech.icmb.utexas.edu/search/dict-search.html
26 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
27 http://www.genome.gov/glossary.cfm?key=gene
28 http://www.genome.gov/glossary.cfm?key=genetic%20map
29 http://www.genome.gov/glossary.cfm?key=genome
30 http://biotech.icmb.utexas.edu/search/dict-search.html
31 http://biotech.icmb.utexas.edu/search/dict-search.html
32 http://biotech.icmb.utexas.edu/search/dict-search.html
33 http://www.genome.gov/glossary.cfm?key=guanine
34 http://biotech.icmb.utexas.edu/search/dict-search.html
35 http://biotech.icmb.utexas.edu/search/dict-search.html
36 http://biotech.icmb.utexas.edu/search/dict-search.html
36 GLOSSARY

One of the twenty naturally occurring The phase of mitosis (Denition:


amino acids (Denition: "AMINO "MITOSIS", p. ??), or cell division,
ACIDS", p. ??). (from BioTech when the chromosomes (Denition:
Dictionary)
36 "CHROMOSOME", p. ??) align along
IN SITU HYBRIDIZATION the center of the cell. Because metaphase
chromosomes are highly condensed,
The base pairing of a sequence of DNA
scientists use these chromosomes for gene
(Denition: "DNA", p. ??) to metaphase (Denition: "GENE", p. ??) mapping
chromosomes (Denition:
and identifying chromosomal aberrations.
"CHROMOSOME", p. ??) on a (from National Human Genome Research
microscope slide. (from National Human
Institute)
42
Genome Research Institute)
37

INTRON MITOSIS
A noncoding sequence of DNA (Denition: The process of nuclear division in cells

"DNA", p. ??) that is initially copied that produces daughter cells that are

into RNA (Denition: "RNA", p. ??) genetically identical to each other and to

but is cut out of the nal RNA the parent cell. (from Human Genome
43
(Denition: "RNA", p. ??) transcript. Project Information)

(from National Human Genome Research


38
MRNA
Institute)
Template for protein (Denition:

L LEU (LEUCINE)(L) "PROTEIN", p. ??) synthesis. Each set


of three bases, called codons (Denition:
One of the twenty naturally occurring
"CODON", p. ??), species a certain
amino acids (Denition: "AMINO
protein in the sequence of amino acids
ACIDS", p. ??). (from BioTech
(Denition: "AMINO ACIDS", p. ??)
Dictionary)
39
that comprise the protein (Denition:
LYS (LYSINE)(K) "PROTEIN", p. ??). The sequence of a
strand of mRNA (Denition: "MRNA",
M MEIOSIS p. ??) is based on the sequence of a
The process of two consecutive cell complementary strand of DNA

divisions in the diploid progenitors of sex (Denition: "DNA", p. ??). (from

cells. Meiosis results in four rather than National Human Genome Research
Institute)
44
two daughter cells, each with a haploid
set of chromosomes (Denition:
"CHROMOSOME", p. ??). (from N NORTHERN BLOT
Human Genome Project Information)
40
A technique used to identify and locate
MET (METHIONINE)(M) mRNA (Denition: "MRNA", p. ??)
One of the twenty naturally occurring sequences that are complementary to a

amino acids (Denition: "AMINO piece of DNA (Denition: "DNA", p. ??)


ACIDS", p. ??). (from BioTech called a probe. (from National Human
45 .
Dictionary)
41 Genome Research Institute)

METAPHASE NUCLEOTIDE
37 http://www.genome.gov/glossary.cfm?key=in%20situ%20hybridization
38 http://www.genome.gov/glossary.cfm?key=intron
39 http://biotech.icmb.utexas.edu/search/dict-search.html
40 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
41 http://biotech.icmb.utexas.edu/search/dict-search.html
42 http://www.genome.gov/glossary.cfm?key=metaphase
43 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
44 http://www.genome.gov/glossary.cfm?key=messenger%20rna%20%28mrna%29
45 http://www.genome.gov/glossary.cfm?key=northern%20blot
GLOSSARY 37

One of the structural components, or A protein (Denition: "PROTEIN", p.


building blocks, of DNA (Denition: ??) or part of a protein (Denition:
"DNA", p. ??) and RNA (Denition: "PROTEIN", p. ??) made of a chain of
"RNA", p. ??). A nucleotide consists of amino acids (Denition: "AMINO
a base (one of four chemicals: adenine ACIDS", p. ??) joined by a peptide
(Denition: "ADENINE", p. ??), bond. (from Human Genome Project
thymine (Denition: "THYMINE", p. Information)
51
??), guanine (Denition: "GUANINE", PRO (PROLINE)(P)
p. ??), and cytosine (Denition:
"CYTOSINE", p. ??)) plus a molecule of
One of the twenty naturally occurring
amino acids (Denition: "AMINO
sugar and one of phosphoric acid. (from
National Human Genome Research
ACIDS", p. ??). (from BioTech
46 Dictionary)
52 .
Institute)

NUCLEUS PROMOTER
The central cell structure that houses the
a DNA (Denition: "DNA", p. ??)
sequence that enables a gene (Denition:
chromosomes (Denition:
"CHROMOSOME", p. ??). (from
"GENE", p. ??) to be transcribed
(Denition: "TRANSCRIPTION", p.
National Human Genome Research
Institute)
47 ??). The promoter is recognized by RNA
(Denition: "RNA", p. ??) polymerase,
O OLIGO which then initiates transcription. (from
53 .
Wikipedia)
Oligonucleotide, short sequence of
single-stranded DNA (Denition:
PROPHASE
"DNA", p. ??) or RNA (Denition: 1. the rst stage of meiosis (Denition:
"RNA", p. ??). Oligos are often used as "MEIOSIS", p. ??)
probes for detecting complementary DNA
2. the rst stage of mitosis (Denition:
or RNA because they bind readily to
"MITOSIS", p. ??)(from WordNet)54
their complements. (from National
Human Genome Research Institute)
48 PROTEIN
A large complex molecule made up of one
P PHE (PHENYLALANINE)(F) or more chains of amino acids (Denition:

One of the twenty naturally occurring


"AMINO ACIDS", p. ??). Proteins
perform a wide variety of activities in the
amino acids (Denition: "AMINO
ACIDS", p. ??). (from BioTech
cell. (from National Human Genome
55
Dictionary)
49 Research Institute)

POLYMER R RECOMBINATION
A polymer is formed from the fusion of
The process by which progeny derive a
two monomers which join completely
combination of genes (Denition:
without losing any small molecules.
50 .
"GENE", p. ??) dierent from that of
(from BioTech Dictionary)
either parent. In higher organisms, this
POLYPEPTIDE can occur by crossing over (Denition:
46 http://www.genome.gov/glossary.cfm?key=nucleotide
47 http://www.genome.gov/glossary.cfm?key=nucleus
48 http://www.genome.gov/glossary.cfm?key=oligo
49 http://biotech.icmb.utexas.edu/search/dict-search.html
50 http://biotech.icmb.utexas.edu/search/dict-search.html
51 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
52 http://biotech.icmb.utexas.edu/search/dict-search.html
53 http://en.wikipedia.org/wiki/Promoter
54 http://wordnet.princeton.edu/index.shtml
55 http://www.genome.gov/glossary.cfm?key=protein
38 GLOSSARY

"CROSSING OVER", p. ??). (from another piece of DNA called a probe.


Human Genome Project Information)
56 (from National Human Genome Research
62 .
REPLICATION Institute)

The process by which DNA (Denition: SPLICING


"DNA", p. ??) copies itself before cell The joining of separate strands of DNA
division. Unless mutation occurs, the new (Denition: "DNA", p. ??) or RNA
copy of DNA (Denition: "DNA", p. ??) (Denition: "RNA", p. ??). (from
is identical to the original DNA Wikipedia)
63 .
(Denition: "DNA", p. ??). (from
57
HOPES) T TELOMERE
RIBOSOME The end of a chromosome (Denition:
Cellular organelle that is the site of "CHROMOSOME", p. ??). This
protein (Denition: "PROTEIN", p. ??) specialized structure is involved in the
synthesis (from National Human Genome replication (Denition:
Research Institute)
58 "REPLICATION", p. ??) and stability
RNA of linear DNA (Denition: "DNA", p.
??) molecules. (from Human Genome
A chemical similar to a single strand of
Project Information)
64
DNA (Denition: "DNA", p. ??). In
RNA, the letter U, which stands for
TELOPHASE
uracil (Denition: "URACIL", p. ??), is 1. the nal stage of meiosis (Denition:
substituted for T (Denition: "MEIOSIS", p. ??) when the
"THYMINE", p. ??) in the genetic code. chromosomes (Denition:
RNA delivers DNA (Denition: "DNA", "CHROMOSOME", p. ??) move toward
p.??)'s genetic message to the cytoplasm opposite ends of the nuclear spindle
(Denition: "CYTOPLASM", p. ??) of a
2. the nal stage of mitosis (Denition:
cell where proteins (Denition:
"PROTEIN", p. ??) are made. (from
??)(from WordNet)65
"MITOSIS", p.

National Human Genome Research THR (THREONINE)(T)


Institute)
59 60
One of the twenty naturally occurring
amino acids (Denition: "AMINO
S SER (SERINE)(S) ACIDS", p. ??). (from BioTech
Dictionary)
66 .
One of the twenty naturally occurring
amino acids (Denition: "AMINO THYMINE
ACIDS", p. ??). (from BioTech
Dictionary)
61 . One of the four bases in DNA (Denition:
"DNA", p. ??) that make up the letters
SOUTHERN BLOT ATGC, thymine is the "T". The others
A technique used to identify and locate are adenine (Denition: "ADENINE", p.
DNA (Denition: "DNA", p. ??) ??), guanine (Denition: "GUANINE",
sequences which are complementary to p. ??), and cytosine (Denition:
56 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
57 http://www.stanford.edu/group/hopes/sttools/gloss/r.html
58 http://www.genome.gov/glossary.cfm?key=ribosome
59 http://www.genome.gov/glossary.cfm?key=ribonucleic%20acid%20%28rna%29
60 http://www.genome.gov/glossary.cfm?key=ribonucleic%20acid%20%28rna%29
61 http://biotech.icmb.utexas.edu/search/dict-search.html
62 http://www.genome.gov/glossary.cfm?key=southern%20blot
63 http://en.wikipedia.org/wiki/Splicing
64 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
65 http://wordnet.princeton.edu/index.shtml
66 http://biotech.icmb.utexas.edu/search/dict-search.html
67 http://www.genome.gov/glossary.cfm?key=thymine
GLOSSARY 39

"CYTOSINE", p. ??). Thymine always complementary to the triplet nucleotide


pairs with adenine. (from National coding sequences of mRNA (Denition:
Human Genome Research Institute)
67 "MRNA", p. ??). The role of tRNAs in

TRANSCRIPTION FACTOR protein (Denition: "PROTEIN", p. ??)


synthesis is to bond with amino acids
a protein that binds DNA (Denition:
(Denition: "AMINO ACIDS", p. ??)
"DNA", p. ??) at a specic promoter and transfer them to the ribosomes,
(Denition: "PROMOTER", p. ??) or
where proteins are assembled according
enhancer (Denition: "ENHANCER", p.
to the genetic code carried by mRNA
??) region or site, where it regulates (Denition: "MRNA", p. ??). (from
transcription (Denition:
Human Genome Project Information)
71
"TRANSCRIPTION", p. ??).
Transcription factors can be selectively TRP TRYPTOPHAN)(W)
activated or deactivated by other One of the twenty naturally occurring
proteins, often as the nal step in signal amino acids (Denition: "AMINO
68 .
transduction. (from Wikipedia) ACIDS", p. ??). (from BioTech
TRANSCRIPTION Dictionary)
72 .

the organic process whereby the DNA TYR (TYROSINE)(Y)


(Denition: "DNA", p. ??) sequence in a
gene (Denition: "GENE", p. ??) is U URACIL
copied into mRNA (Denition:
"MRNA", p. ??); the process whereby a One of the four bases in RNA (Denition:
"RNA", p. ??). The others are adenine
base sequence of messenger RNA
(Denition: "MRNA", p. ??) is (Denition: "ADENINE", p. ??),
guanine (Denition: "GUANINE", p.
synthesized on a template of
??), and cytosine (Denition:
"CYTOSINE", p. ??). Uracil replaces
complementary DNA (Denition:
??)(from WordNet)69
"DNA", p.
thymine (Denition: "THYMINE", p.
TRANSLATION ??), which is the fourth base in DNA
the process whereby genetic information (Denition: "DNA", p. ??). Like

coded in messenger RNA (Denition: thymine (Denition: "THYMINE", p.

"MRNA", p. ??) directs the formation of ??), uracil always pairs with adenine
a specic protein (Denition: (Denition: "ADENINE", p. ??). (from

"PROTEIN", p. ??) at a ribosome National Human Genome Research

"RIBOSOME", p. ??) in the


73
Institute)
(Denition:
cytoplasm (Denition: "CYTOPLASM",
p. ??). (from WordNet)
70 V VAL (VALINE)(V)
TRNA One of the twenty naturally occurring

A class of RNA (Denition: "RNA", p. amino acids (Denition: "AMINO

??) having structures with triplet ACIDS", p. ??). (from BioTech


Dictionary)
74 .
nucleotide sequences that are

68 http://en.wikipedia.org/wiki/Transcription_factor
69 http://wordnet.princeton.edu/index.shtml
70 http://wordnet.princeton.edu/index.shtml
71 http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
72 http://biotech.icmb.utexas.edu/search/dict-search.html
73 http://www.genome.gov/glossary.cfm?key=uracil
74 http://biotech.icmb.utexas.edu/search/dict-search.html
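
The TRANSCRIPTION, TRANSLATION and URACIL entries of the glossary describe how a DNA
template is copied into mRNA and how the mRNA is then read three bases at a time. The
short Python sketch below illustrates that flow under stated assumptions: the function
names, the example template strand and the five-codon excerpt of the standard genetic code
are chosen here for illustration only and are not part of the original course notes.

```python
# Base pairing as stated in the glossary: G pairs with C, and A pairs with T;
# in RNA, uracil (U) replaces thymine and pairs with adenine.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Small excerpt of the standard genetic code (codon -> one-letter amino acid).
# Only the codons needed for the example are listed; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # MET (methionine)
    "GGC": "G",  # GLY (glycine)
    "CAU": "H",  # HIS (histidine)
    "UGG": "W",  # TRP (tryptophan)
    "UAA": "*",  # stop
}


def transcribe(template_strand: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    The template is assumed to be written 3'->5', so the result reads 5'->3'.
    """
    return "".join(DNA_TO_MRNA[base] for base in template_strand.upper())


def translate(mrna: str) -> str:
    """Read the mRNA one codon (three bases) at a time until a stop codon."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        amino_acids.append(residue)
    return "".join(amino_acids)


template = "TACCCGGTAACCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
peptide = translate(mrna)
print(mrna)     # AUGGGCCAUUGGUAA
print(peptide)  # MGHW
```

A real analysis would use the full 64-codon table (for example from an established
bioinformatics library) instead of this excerpt; any codon missing from the excerpt raises
a KeyError here.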
Index of Keywords and Terms

Keywords are listed by the section with that keyword (page numbers are in parentheses). Keywords
do not necessarily appear in the text of the page. They are merely associated with that section.
Ex. apples, § 1.1 (1)
Terms are referenced by the page they appear on. Ex. apples, 1

A Affymetrix, § 1.3.2.1(12)
  Affymetrix chip, § 1.3.2.2(13)
B Bayesian Networks, § 1.4.2(25), 26
  Boolean_network, § 1.4.1(20)
C cDNA, § 1.1.1(1)
  cDNA microarray, § 1.3.1.1(7), § 1.3.1.2(8)
  Chip, § 1.3.2.1(12)
D DNA, § 1.1.1(1)
  DNA_microarray, § 1.3.3.1(17)
G Gene Regulatory Networks, § 1.2.1(5)
  gene_clustering, § 1.3.3.1(17)
  glossary, § 3.1.1(31)
H hybridization, § 1.3.2.2(13), § 1.3.3.1(17)
M mRNA, § 1.1.1(1)
O Oligonucleotide microarray, § 1.3.2.2(13)
  Oligonucleotide_microarray, § 1.3.2.1(12)
P Probabilistic Boolean Networks, § 1.4.2(25)

Attributions
Collection: Introduction to Bioinformatics
Edited by: Ewa Paszek
URL: http://cnx.org/content/col10240/1.3/
License: http://creativecommons.org/licenses/by/1.0

Module: "Dogma of Molecular Biology"


By: Ewa Paszek
URL: http://cnx.org/content/m12382/1.5/
Pages: 1-4
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Gene Networks"


By: Ewa Paszek
URL: http://cnx.org/content/m12383/1.4/
Pages: 5-6
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "cDNA-Basic Concept"


By: Ewa Paszek
URL: http://cnx.org/content/m12384/1.5/
Pages: 7-8
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "cDNA-Detailed Information"


By: Ewa Paszek
URL: http://cnx.org/content/m12385/1.5/
Pages: 8-11
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Affymetrix Chip-Basic Concepts"


By: Ewa Paszek
URL: http://cnx.org/content/m12387/1.4/
Pages: 12-13
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Oligonucleotide Arrays-Detailed Information"


By: Ewa Paszek
URL: http://cnx.org/content/m12388/1.5/
Pages: 13-16
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Data Analysis"


By: Ewa Paszek
URL: http://cnx.org/content/m12389/1.4/
Pages: 17-20
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Boolean Networks"


By: Ewa Paszek
URL: http://cnx.org/content/m12394/1.5/
Pages: 20-25
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Probabilistic Boolean and Bayesian Networks"


By: Ewa Paszek
URL: http://cnx.org/content/m12395/1.5/
Pages: 25-28
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0

Module: "Glossary"
By: Lukasz Wita, Ewa Paszek
URL: http://cnx.org/content/m12371/1.13/
Page: 31
Copyright: Ewa Paszek
License: http://creativecommons.org/licenses/by/1.0
Based on: Glossary
By: Lukasz Wita
URL: http://cnx.org/content/m12137/1.1/
Introduction to Bioinformatics
This course is a short series of lectures on Statistical Bioinformatics. Topics covered are listed in the Table
of Contents. The notes were prepared by Ewa Paszek, Lukasz Wita and Marek Kimmel. The development
of this course was supported by NSF grant 0203396.

About Connexions
Since 1999, Connexions has been pioneering a global system where anyone can create course materials and
make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and
learning environment open to anyone interested in education, including students, teachers, professors and
lifelong learners. We connect ideas and facilitate educational communities.

Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners. Connexions materials are in many languages, including
English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part
of an exciting new information distribution system that allows for Print on Demand Books. Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.
