Bioinformatics

CHAPTER 2
Computational Molecular Biology
Bioinformatics
Bioinformatics combines biology, computer science, and information technology.
Biological Database
A biological database is a large, organized body of persistent data. It is usually associated
with computerized software designed to update, query, and retrieve the stored data.
Computational Biology
Computational biology provides tools that enable efficient access to various types of data,
together with algorithms and statistics to assess relationships among members of data sets.
Primary and secondary data

Primary: DNA/protein sequence (letters), with relevant information.
Secondary: the result of analysis of primary source databases.
Secondary data are produced by two main kinds of analysis:
Pairwise sequence comparison and alignment.
Motif (family)-based sequence comparison and alignment.
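As a minimal sketch of pairwise sequence comparison, the function below computes a global alignment score by dynamic programming (the Needleman-Wunsch method). The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not values taken from these notes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the Needleman-Wunsch global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + step,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACT"))
```

A higher score means the two sequences can be lined up with more matching positions and fewer gaps.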
Genetic & Physical Mapping
Genetic Maps
A genetic map is a scaffold for orienting sequence information. Many high-quality,
genome-wide genetic maps are available. The first use of a genetic map is to assign a gene
to a small area of a chromosome, followed by using a physical map to examine the region of
interest close up and determine the precise location of the gene. Computerized maps make
gene hunting faster.
Genetic markers
A genetic marker is an observable variation resulting from an alteration or mutation at a single
genetic locus. It can lie within a gene coding for a noticeable physical characteristic (for example,
eye colour) or for a less noticeable trait (for example, a disease). It can also be a DNA-based
reagent within the non-coding regions of genes.
*Linkage analysis: estimating the probability that a recombination occurs between each pair of
markers.
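The recombination probability used in linkage analysis can be illustrated with a short sketch. It estimates the recombination fraction from offspring counts and converts it to a map distance with Haldane's map function, d = -50 ln(1 - 2r) centimorgans. The function names and the example counts are assumptions made for illustration.

```python
import math


def recombination_fraction(recombinants, total):
    """Fraction of offspring in which the two markers were separated."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("counts must satisfy 0 <= recombinants <= total, total > 0")
    return recombinants / total


def haldane_cm(r):
    """Haldane map distance in centimorgans for a recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("Haldane's function is defined for 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


r = recombination_fraction(10, 100)  # hypothetical: 10 recombinants in 100 offspring
print(r, round(haldane_cm(r), 2))
```

Markers that recombine rarely have a small r and therefore sit close together on the genetic map.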
Gene hunting
For example, an inherited disease can be located on the map by following the inheritance of a DNA
marker (present in affected but absent in unaffected persons), even though the gene(s) responsible
may not yet be identified.
Types of physical Maps

1. Chromosomal/Cytogenetic maps
2. Radiation Hybrid (RH) maps
3. Sequence maps
1. Chromosomal/Cytogenetic maps
A cytogenetic map is the visual appearance of a chromosome when stained and
examined under a microscope. Particularly important are visually distinct regions, called
light and dark bands, which give each of the chromosomes a unique appearance. This
feature allows a person's chromosomes to be studied in a clinical test known as a
karyotype, which allows scientists to look for chromosomal alterations. The field of
cytogenetics emerged in the early twentieth century, when scientists realized that
chromosomes are the physical carriers of genes. Researchers built on the observations
of their fellow investigators to synthesize the chromosome theory of heredity. This
ground-breaking theory had its foundations in the detailed observations that cytologists
had made about chromosome movements during mitosis and meiosis, which suggested
that chromosome behaviour could explain Mendel's principles of inheritance. In the early
years of cytogenetics, scientists had a difficult time distinguishing individual
chromosomes, but over the years they continued to refine the conditions for preserving
and staining chromosomes.
References:
- http://www.nature.com/scitable/topic/chromosomes-and-cytogenetics-7
- http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/cytogenetic.html
2. Radiation Hybrid (RH) maps
RH maps provide precise information about the distance between markers. Rather than
relying on natural recombination to separate two markers, scientists use breaks induced
by radiation to determine the distance between them. In RH mapping, a scientist exposes
DNA to measured doses of radiation and, by doing so, controls the average distance
between breaks in a chromosome. By varying the degree of radiation exposure, scientists
can induce breaks between two markers that lie very close to each other. The ability to
separate closely linked markers allows scientists to produce more detailed maps. RH
mapping provides a way to localize almost any genetic marker, as well as other genomic
fragments, to a defined map position, and it is extremely useful for ordering markers in
regions where highly polymorphic genetic markers are hard to come by.
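To make the idea of radiation-induced distance concrete, the sketch below converts a breakage frequency theta (the estimated probability that a radiation-induced break falls between two markers) into centiRays using the commonly used relation D = -100 ln(1 - theta). The function name is an assumption, and because a centiRay value depends on the radiation dose, distances from panels made at different doses are not directly comparable.

```python
import math


def centirays(theta):
    """Convert a breakage frequency between two markers into centiRays (cR)."""
    if not 0 <= theta < 1:
        raise ValueError("theta must satisfy 0 <= theta < 1")
    return -100.0 * math.log(1.0 - theta)


# Hypothetical breakage frequencies for a close and a distant marker pair
for theta in (0.05, 0.5):
    print(theta, round(centirays(theta), 1))
```

Closely spaced markers are rarely separated by a break, so they have a small theta and a small distance in cR.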
3. Sequence maps
Sequence Tagged Site (STS) mapping is a physical mapping technique. An STS is a short
DNA sequence that has been shown to be unique. To qualify as an STS, the exact
location and order of the bases of the sequence must be known, and this sequence may
occur only once in the chromosome being studied, or in the genome as a whole if the
DNA fragment set covers the entire genome. To map a set of STSs, a collection of
overlapping DNA fragments from a chromosome is digested into smaller fragments using
restriction enzymes (agents that cut up DNA molecules at defined target points). The
data from which the map will be derived are then obtained by noting which fragments
contain which STSs. To accomplish this, scientists copy the DNA fragments using a
process known as molecular cloning. Cloning involves the use of a special technology,
called recombinant DNA technology, to copy DNA fragments inside a foreign host. First,
the fragments are united with a carrier, or a vector. Following introduction into a
suitable host, the DNA fragments can then be reproduced along with the host cell DNA,
providing unlimited material for experimental study. An unordered set of cloned DNA
fragments is called a library. Next, the clones, or copies, are assembled in the order they
would be found in the original chromosome by determining which clones contain
overlapping DNA fragments. This assembly of overlapping clones is called a clone contig.
Once the order of the clones in a chromosome is known, the clones are placed in frozen
storage and the information about the order of the clones is stored in a computer,
providing a valuable resource that may be used for further studies. This data is then
used as the base material for generating a lengthy, continuous DNA sequence, and the
STSs serve to anchor the sequence onto a physical map.
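The step of noting which fragments contain which STSs, and then grouping overlapping clones, can be sketched as follows. Two clones are treated as overlapping when they share at least one STS, and each group of connected clones is reported as one contig. The clone and STS names are hypothetical, and a real mapping project would also order the clones within each contig and allow for experimental error.

```python
def build_contigs(clone_sts):
    """Group clones into contigs; clones sharing at least one STS are joined."""
    contigs = []  # list of (set of clone names, set of STS names)
    for clone, sts in clone_sts.items():
        sts = set(sts)
        merged_clones, merged_sts = {clone}, set(sts)
        untouched = []
        for clones, stss in contigs:
            if stss & sts:            # existing contig overlaps the new clone
                merged_clones |= clones
                merged_sts |= stss
            else:
                untouched.append((clones, stss))
        contigs = untouched + [(merged_clones, merged_sts)]
    return [sorted(clones) for clones, _ in contigs]


# Hypothetical STS content of four cloned fragments
library = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts5"},
    "cloneD": {"sts3", "sts4"},
}
print(build_contigs(library))
```

Here cloneA, cloneB and cloneD form one contig because they are chained together by shared STSs, while cloneC remains on its own.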
Limitations of STS mapping
DNA fragments may become lost or be mistakenly mapped to a wrong position. To
overcome these problems and improve overall mapping accuracy, researchers compare
and integrate STS-based physical maps with genetic, radiation hybrid, and cytogenetic
maps. Cross-referencing different genomic maps enhances the utility of a given map,
confirms STS order, and helps order and orient evolving contigs.
Protein Structure modelling
Proteins are fundamental components of all living cells. They exhibit an enormous amount of
chemical and structural diversity, enabling them to carry out an extraordinarily diverse range of
biological functions. Proteins help us digest our food, fight infections, control body chemistry, and in
general, keep our bodies functioning smoothly. Scientists know that the critical feature of a protein
is its ability to adopt the right shape for carrying out a particular function. But sometimes a protein
twists into the wrong shape or has a missing part, preventing it from doing its job. Many diseases,
such as Alzheimer's disease and "mad cow" disease, are now known to result from proteins that have
adopted an incorrect structure. Identifying the shape, or structure, of a protein is key to understanding
its biological function and its role in health and disease. Determining the structure of a protein
also paves the way for the development of new agents and devices to treat a disease. Yet solving the
structure of a protein is no easy feat. It often takes scientists working in the laboratory months,
sometimes years, to experimentally determine a single structure. Therefore, scientists have begun to
turn towards computers to help predict the structure of a protein based on its sequence. The
challenge lies in developing methods for accurately and reliably understanding this intricate
relationship.
Protein Structure hierarchy
Cellular structures (ribosomes) join together long chains of amino acids.
The 20 different amino acids can be arranged in any order to form a polypeptide.
Chains loop about each other, or fold, in a variety of ways, but only one of these ways allows
a protein to function properly.
Critical feature: the correct fold creates the structural features a protein needs to fulfil its role.
Ex. Surface grooves, ridges & pockets.
Conformation of a protein: 4 levels of structure
Primary structure: the linear string of amino acids that forms a polypeptide chain
Secondary structure: local folding of the primary sequence (the path that the
polypeptide backbone of the protein follows in space, e.g. alpha helices and beta sheets)
Tertiary structure: organization in 3D of all of the atoms in the polypeptide
Quaternary structure: conformation assumed by a multimeric protein
Correct Conformation of a protein
The primary amino acid sequence of a protein is crucial in determining its final structure.
In some cases, it is the sole determinant.
In other cases, additional interactions (e.g. the presence of a cofactor) may be required
before a protein can attain its final conformation.
Allosteric protein: a protein having a stable alternate conformation that enables it to carry out a
different biological function.
A change in conformation brought about at one site causes an alteration in the structure, and thus
the function (but not the primary amino acid sequence), at another site.
Determination of protein structure
1. X-ray Crystallography
Crystal: a solid form of a substance in which the component molecules are present
in an ordered array (lattice)
Unit cell: one unique set of the crystal's components, the smallest possible set fully
representative of the crystal
Crystals of a complex molecule (protein) produce a complex pattern of X-ray diffraction
2. Nuclear Magnetic Resonance Spectroscopy
Experiments are performed in solution as opposed to a crystal lattice
Computational Modeling
Homology modeling: predicts protein structure computationally; it is less time-consuming than
experimental methods and is not hindered by size and solubility constraints
Closely related organisms tend to have similar sequences
Uses experimentally determined protein structures (templates) to predict the structure of
another protein that has a similar amino acid sequence (target)
PDB database: a collection of publicly available 3D structures of proteins, nucleic acids,
carbohydrates and a variety of other complexes, experimentally determined by X-ray
crystallography and NMR
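Choosing a template for homology modeling depends on how similar its sequence is to the target. The sketch below computes percent identity between two already-aligned sequences (gaps written as "-") and picks the most similar template. The function names and the short sequences are hypothetical, and real template selection also weighs alignment coverage and structure quality.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned (non-gap) positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pick_template(aligned_target, aligned_templates):
    """Return the name of the template most identical to the target."""
    return max(aligned_templates,
               key=lambda name: percent_identity(aligned_target, aligned_templates[name]))


# Hypothetical target and templates, already aligned to equal length
templates = {"tmplA": "MKTAF", "tmplB": "MRSAF"}
print(pick_template("MKTAY", templates))
```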
Single nucleotide polymorphism

A Single Nucleotide Polymorphism (SNP) is a small genetic change, or variation, that can occur within
a person's DNA sequence. The genetic code is specified by the four nucleotide letters A (adenine), C
(cytosine), T (thymine), and G (guanine). SNP variation occurs when a single nucleotide, such as an A,
replaces one of the other three nucleotide letters C, G, or T.
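A single-letter substitution of this kind can be found by comparing two aligned sequences position by position. The sketch below is a minimal illustration with hypothetical sequences; it reports every differing position and does not by itself show that a difference is a true polymorphism in the population.

```python
def single_base_differences(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


# Hypothetical sequences from two individuals; positions are 0-based
print(single_base_differences("ACGTTA", "ACGCTA"))
```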
SNPs occur in more than one percent of the human population. Because only
about three to five percent of a person's DNA sequence codes for the production of proteins, most
SNPs are found outside of "coding sequences". SNPs found within a coding sequence are of
particular interest to researchers as they are more likely to alter the biological function of a protein.
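The one-percent figure can be checked against a sample of observed alleles. The sketch below computes the frequency of the second most common allele at one site and compares it with a 1% threshold. The function names and sample counts are assumptions for illustration, and a real study would need a much larger and more carefully chosen sample.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele at a single site."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # only one allele observed, so no variation
    second = sorted(counts.values(), reverse=True)[1]
    return second / len(alleles)


def exceeds_one_percent(alleles, threshold=0.01):
    """True when the less common allele is seen in more than 1% of the sample."""
    return minor_allele_frequency(alleles) > threshold


sample = ["A"] * 97 + ["G"] * 3  # hypothetical observations at one position
print(minor_allele_frequency(sample), exceeds_one_percent(sample))
```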
Due to recent advances in technology, coupled with the unique ability of these genetic variations to
facilitate gene identification, there has been a recent flurry of SNP discovery and detection.
SNP in the human genome

Finding single nucleotide changes in the human genome seems like a daunting prospect, but, over
the last 20 years, biomedical researchers have developed a number of techniques that make it
possible to do just that. Each technique uses a different method to compare selected regions of a
DNA sequence obtained from multiple individuals who share a common trait. In each test, the result
shows a physical difference in the DNA samples only when a SNP is detected in one individual and
not in the other.
Many common diseases in humans are not caused by a genetic variation within a single gene, but
are influenced by complex interactions among multiple genes as well as environmental and lifestyle
factors. Although both environmental and lifestyle factors add tremendously to the uncertainty of
developing a disease, it is currently difficult to measure and evaluate their overall effect on a disease
process. Therefore, we refer here mainly to a person's genetic predisposition, or the potential of an
individual to develop a disease.
cDNA
Gene identification is very difficult in humans, as most of our genome consists of non-coding DNA,
such as introns and the regions between genes, interspersed with relatively few coding sequences, or
genes. These genes are expressed as proteins through a complex process comprising two main steps.
First, each gene (DNA) must be converted,
or transcribed, into mRNA, RNA that serves as a template for protein synthesis. The resulting mRNA
then guides the synthesis of a protein through a process called translation.
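The two steps, transcription and translation, can be sketched in a few lines. The example assumes the input is the coding (sense) strand of an intron-free gene, written 5' to 3', and uses the standard genetic code. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna):
    """mRNA sequence corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```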
Interestingly, mRNAs in a cell do not contain sequences from the regions between genes, nor from
the non-coding introns that are present within many genes. Therefore, isolating mRNA is key to
finding expressed genes in the vast expanse of the human genome.
The problem, however, is that mRNA is very unstable outside of a cell, so scientists use special
enzymes (reverse transcriptases) to convert it to complementary DNA, or cDNA. cDNA is a much more
stable compound and, importantly, because it is generated from an mRNA in which the introns have
been removed, cDNA represents only expressed DNA sequence.
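The relationship between a transcript with introns, the spliced mRNA, and its cDNA can be sketched as below. Exon positions are given as hypothetical half-open (start, end) index pairs, and the cDNA returned is the first strand, written 5' to 3', which is the reverse complement of the mRNA.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> DNA base


def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


# Hypothetical transcript: exon (AUGGCC), intron (GUAAGUAG), exon (UAA)
pre_mrna = "AUGGCC" + "GUAAGUAG" + "UAA"
mrna = splice(pre_mrna, [(0, 6), (14, 17)])
print(mrna, first_strand_cdna(mrna))
```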
Once cDNA representing an expressed gene has been isolated, scientists can then sequence a few
hundred nucleotides from either end of the molecule to create two different kinds of expressed
sequence tags (ESTs). Sequencing only the beginning portion of the cDNA produces what is called a
5' EST. A 5' EST is obtained from the portion of a transcript that usually codes for a protein. These
regions tend to be conserved across species and do not change much within a gene family.
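Taking a read from each end of a cDNA can be pictured with simple slicing. In this sketch the cDNA is assumed to be written in the same orientation as the transcript, so the first bases give the 5' EST and the last bases give the read from the other end (a 3' EST). The function name and default read length are illustrative assumptions.

```python
def make_ests(cdna, read_length=300):
    """Return (five_prime_est, three_prime_est) read from the ends of a cDNA."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    five_prime = cdna[:read_length]
    three_prime = cdna[-read_length:]
    return five_prime, three_prime


# Short hypothetical cDNA with a 4-base read length, for readability
print(make_ests("ACGTACGTAC", read_length=4))
```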
Microarray
Microarrays are a significant advance both because they may contain a very large number of genes
and because of their small size. Microarrays are therefore useful when one wants to survey a large
number of genes quickly or when the sample to be studied is small. Microarrays may be used to
assay gene expression within a single sample or to compare gene expression in two different cell
types or tissue samples, such as in healthy and diseased tissue. Because a microarray can be used to
examine the expression of hundreds or thousands of genes at once, it promises to revolutionize the
way scientists examine gene expression. This technology is still considered to be in its infancy;
therefore, many initial studies using microarrays have represented simple surveys of gene
expression profiles in a variety of cell types. Nevertheless, these studies represent an important and
necessary first step in our understanding and cataloging of the human genome.
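Comparing expression between two samples, such as healthy and diseased tissue, is often summarized as a log2 ratio per gene. The sketch below assumes each sample is a mapping from gene name to a measured intensity. The gene names, intensities, and the pseudocount of 1 (added to avoid division by zero) are hypothetical choices.

```python
import math


def log2_fold_changes(healthy, diseased, pseudocount=1.0):
    """log2(diseased / healthy) for every gene measured in both samples."""
    return {
        gene: math.log2((diseased[gene] + pseudocount) / (healthy[gene] + pseudocount))
        for gene in healthy
        if gene in diseased
    }


healthy = {"geneA": 99, "geneB": 7}
diseased = {"geneA": 399, "geneB": 7}
print(log2_fold_changes(healthy, diseased))
```

A value of 2 means roughly four times more signal in the diseased sample, and 0 means no change.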
As more information accumulates, scientists will be able to use microarrays to ask increasingly
complex questions and perform more intricate experiments. With new advances, researchers will be
able to infer probable functions of new genes based on similarities in expression patterns with those
of known genes. Ultimately, these studies promise to expand the size of existing gene families,
reveal new patterns of coordinated gene expression across gene families, and uncover entirely new
categories of genes. Furthermore, because the product of any one gene usually interacts with those
of many others, our understanding of how these genes coordinate will become clearer through such
analyses, and precise knowledge of these inter-relationships will emerge. The use of microarrays
may also speed the identification of genes involved in the development of various diseases by
enabling scientists to examine a much larger number of genes. This technology will also aid the
examination of the integration of gene expression and function at the cellular level, revealing how
multiple gene products work together to produce physical and chemical responses to both static and
changing cellular needs.
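Inferring a probable function from similar expression patterns can be sketched with a correlation measure. The example below uses the Pearson correlation between expression profiles measured across the same conditions and returns the known gene whose profile is most similar to that of a new gene. All gene names and values are hypothetical, and a high correlation only suggests, rather than proves, a shared function.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def most_similar_known_gene(profile, known_profiles):
    """Name of the known gene whose expression profile correlates best."""
    return max(known_profiles, key=lambda name: pearson(profile, known_profiles[name]))


known = {"kinaseX": [2, 4, 6, 8.5], "transporterY": [4, 3, 2, 1]}
print(most_similar_known_gene([1, 2, 3, 4], known))
```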
Now that you understand the concept behind microarray technology, picture this: a hand-held
instrument that a physician could use to quickly diagnose cancer or other diseases during a routine
office visit. What if that same instrument could also facilitate a personalized treatment regimen,
exactly right for you? Personalized drugs, molecular diagnostics, and the integration of diagnosis
and therapeutics are the long-term promises of microarray technology. For the first time,
arrays offer hope for obtaining global views of biological processes (simultaneous readouts of all the
body's components) by providing a systematic way to survey DNA and RNA variation.
Goals of the Human Genome Project

Create genetic map
Create physical map
Sequence all base pairs
Create information, tools, training, etc.
A brief description of each of the above should be given.
