
0303

Functional Genomics and Microarray Technology

Purushothaman.

UNIT-1 GENOMICS

Genomics is the development and application of new mapping, sequencing, and computational procedures for the analysis of the entire genome of an organism. It deals with the systematic molecular characterization of genomes. Some of the methods used are traditional genetic-mapping procedures; in addition, specialized techniques have been developed for manipulating the large amounts of DNA in a genome. Genomic analysis is important for two reasons: (1) it gives an overview of the genetic architecture of an organism, and (2) it provides a set of basic information that can be used to find new genes, such as those responsible for disease. Genomic analysis generally proceeds from low-resolution techniques to techniques with higher resolution.

Genomics is divided into three basic areas: structural genomics, which characterizes the physical nature of whole genomes; functional genomics, which characterizes the gene and non-gene sequences in the entire genome; and comparative genomics, which compares genomes to gain a better understanding of function, including evolutionary relationships.

Structural Genomics: As its name suggests, the aim of structural genomics is to characterize the structure of the genome. Knowledge of the structure of an individual genome can be useful in manipulating genes and DNA segments in that particular species. For example, genes can be cloned on the basis of knowing where they are in the genome. When a number of genomes have been characterized at the structural level, the hope is that, through comparative genomics, it will become possible to deduce the general rules that govern the


overall structural organization of all genomes. Structural genomics proceeds through increasing levels of analytic resolution, starting with the assignment of genes and markers to individual chromosomes, then the mapping of these genes and markers within a chromosome, and finally the preparation of a physical map culminating in sequencing.

Functional genomics: Functional genomics uses a variety of approaches, such as defining all ORFs, using gene knockouts to probe gene function, using the yeast two-hybrid system to look for gene interactions, and using DNA microarrays to determine which genes are transcribed. It attempts to understand the broad sweep of genome function at different developmental stages and under different environmental conditions.

Comparative genomics: The basis of comparative genomics is that the genomes of related organisms are similar. The argument is the same one that applies to homologous genes. Two organisms with a relatively recent common ancestor will have genomes that display species-specific differences built onto the common plan possessed by the ancestral genome. The closer two organisms are on the evolutionary scale, the more related their genomes will be. Studies of comparative genomics also offer a powerful opportunity to identify highly conserved, and therefore functionally important, sequence motifs in coding and non-coding genomic DNA. This identification helps researchers confirm predictions of protein-coding regions of the genome and identify important regulatory elements within DNA.

Computational Genomics: This involves the use of computational programs to design synthetic genomes and to make them functional in a model organism. It makes extensive use of programming


languages to develop various assemblers and to study the structure of genomes in order to modify them.

Genome and Genome size

The completed and ongoing genome projects are revealing a great deal about how genomes are organized, including a number of unexpected discoveries that have taken molecular biologists by surprise. It is therefore important to survey the information that has arisen from genome projects and to learn how the genome is organized in a eukaryotic organism. Every organism possesses a genome that contains the biological information needed to construct and maintain a living example of that organism. Most genomes, including the human genome and those of all other cellular life forms, are made of DNA (deoxyribonucleic acid), but a few viruses have RNA (ribonucleic acid) genomes. DNA and RNA are polymeric molecules made up of chains of monomeric subunits called nucleotides.

Humans are fairly typical eukaryotes and the human genome is in many respects a good model for eukaryotic genomes in general. All of the eukaryotic nuclear genomes that have been studied are, like the human version, divided into two or more linear DNA molecules, each contained in a different chromosome; all eukaryotes also possess smaller, usually circular, mitochondrial genomes. The only general eukaryotic feature not illustrated by the human genome is the presence in plants and other photosynthetic organisms of a third genome, located in the chloroplasts.

Although the basic physical structures of all eukaryotic nuclear genomes are similar, one important feature is very different in different organisms. This is genome size, the smallest eukaryotic genomes being less than 10 Mb in length, and the largest over 100,000 Mb, as seen in the following table. Here genome size is the total amount of DNA contained within one copy of a genome. Genome size can be converted to and from DNA mass using the relation 1 pg = 978 Mb = 978,000,000 bp.
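The mass-to-length relation quoted above can be applied in either direction. The following is a minimal sketch, assuming only the factor 1 pg = 978 Mb given in the text; the function names are illustrative, and the two inputs are the approximate lower and upper genome sizes mentioned above.

```python
# Conversion factor taken from the text: 1 pg of DNA corresponds to 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(mass_pg: float) -> float:
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return mass_pg * MB_PER_PG


def mb_to_pg(size_mb: float) -> float:
    """Convert a genome size in megabases to a DNA mass in picograms."""
    return size_mb / MB_PER_PG


# Illustrative inputs: the smallest (~10 Mb) and largest (~100,000 Mb)
# eukaryotic genome sizes mentioned in the text.
for size_mb in (10, 100_000):
    print(f"{size_mb:>7} Mb is about {mb_to_pg(size_mb):.3f} pg")
```

Genome sizes reported in picograms can be converted back to megabases with `pg_to_mb`.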


Genome size coincides to a certain extent with the complexity of the organism, the simplest eukaryotes such as fungi having the smallest genomes, and higher eukaryotes such as vertebrates and flowering plants having the largest ones. This might appear to make sense, as one would expect the complexity of an organism to be related to the number of genes in its genome: higher eukaryotes need larger genomes to accommodate the extra genes. However, the correlation is far from precise. If it were precise, then the nuclear genome of the yeast S. cerevisiae, which at 12 Mb is 0.004 times the size of the human nuclear genome, would be expected to contain 0.004 × 35,000 genes, which is just 140. In fact the S. cerevisiae genome contains about 5800 genes. For many years the lack of precise correlation between the complexity of an organism and the size of its genome was looked on as a bit of a puzzle, the so-called C-value paradox.
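The arithmetic behind the yeast example can be checked directly. The sketch below uses only the figures quoted in the text (the 0.004 size ratio, 35,000 human genes and about 5800 yeast genes); the variable names are illustrative.

```python
# Figures quoted in the text.
SIZE_RATIO = 0.004           # yeast genome size / human genome size
HUMAN_GENES = 35_000         # human gene count used in the text
YEAST_GENES_OBSERVED = 5_800 # approximate observed yeast gene count

# Gene count expected if gene number scaled exactly with genome size.
expected_yeast_genes = SIZE_RATIO * HUMAN_GENES

# How far the observed count departs from that naive expectation.
fold_excess = YEAST_GENES_OBSERVED / expected_yeast_genes

print(f"Expected from genome size alone: {expected_yeast_genes:.0f} genes")
print(f"Observed: {YEAST_GENES_OBSERVED} genes")
print(f"Observed / expected: about {fold_excess:.0f}-fold")
```

The observed gene count exceeds the size-based expectation by roughly forty-fold, which is the mismatch the C-value paradox refers to.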


Eukaryotic Genome Organization

The organization of the genome in eukaryotes is best studied at the following two levels:
1. Genome organization at chromosome level
2. Genome organization at DNA level

1. Genome Organization at Chromosome level (Packaging of DNA into chromosomes)

Chromosomes are much shorter than the DNA molecules that they contain. A highly organized packaging system is therefore needed to fit a DNA molecule into its chromosome. DNA in the nucleus exists mainly in combination with histone proteins; the DNA-histone complex is called chromatin. Chromatin can undergo changes in its structure in response to various cellular metabolic demands. Chromatin can be envisioned as a repeat of structural units called nucleosomes. The nucleosome core particle is composed of a histone octamer plus the DNA that wraps around it.

The histone octamer contains two molecules each of histones H2A, H2B, H3, and H4. DNA wraps around the octamer in a left-handed supercoil of about 1.65 turns, which encloses about 150 bp.

The histone proteins interact with the DNA in the following ways:
1. Helix dipoles from alpha-helices in H2B, H3, and H4 cause a net positive charge to accumulate at the point of interaction with negatively charged phosphate groups on DNA.
2. Hydrogen bonds form between the DNA backbone and the amide group on the main chain of histone proteins.
3. Nonpolar interactions occur between the histone and deoxyribose sugars on DNA.
4. Salt bridges and hydrogen bonds form between side chains of basic amino acids (especially lysine and arginine) and phosphate oxygens on DNA.
5. Non-specific minor groove insertions of the H3 and H2B N-terminal tails occur into two minor grooves each on the DNA molecule.

The highly basic nature of histones, aside from facilitating DNA-histone interactions, contributes to their water solubility.

Histone H1 is a linker histone that, along with linker DNA (the DNA between two nucleosome core particles), physically connects adjacent nucleosome core particles. The length of linker DNA varies with species and cell type. Usually, a nucleosome core particle together with the linker DNA on both sides of the core encompasses between 180 and 200 bp of DNA. Between the nucleosome unit structure and the metaphase chromosome structure containing two chromatids, there are several levels of organization and compaction of the chromatin. Each nucleosome has a diameter of 10 nm; the nucleosomes are compacted into a solenoid fiber 30 nm wide, called the 30-nm fiber; the 30-nm solenoid fibers are compacted into a 300-nm filament; and finally, these filaments are further compacted into a 700-nm chromosome. During cell division, when the chromosomes duplicate, a 1,400-nm metaphase chromosome is produced containing two chromatids, each chromatid being 700 nm wide.
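The repeat length quoted above (180 to 200 bp of DNA per nucleosome core plus linker) gives a rough way to estimate how many nucleosomes are needed to package a DNA molecule. This is a minimal sketch under that assumption only; the function name is hypothetical, and the 12 Mb input is just an illustrative length, not a measured value for any particular chromosome.

```python
def nucleosome_count_range(dna_length_bp: int,
                           repeat_min_bp: int = 180,
                           repeat_max_bp: int = 200) -> tuple[int, int]:
    """Return (low, high) estimates of the number of nucleosomes on a DNA
    molecule, assuming one nucleosome per 180-200 bp repeat (core + linker)."""
    low = dna_length_bp // repeat_max_bp   # longest repeat -> fewest nucleosomes
    high = dna_length_bp // repeat_min_bp  # shortest repeat -> most nucleosomes
    return low, high


# Illustrative input: 12 Mb of DNA.
low, high = nucleosome_count_range(12_000_000)
print(f"Roughly {low:,} to {high:,} nucleosomes")
```

The estimate ignores nucleosome-free regions, so it is best read as an upper bound on the order of magnitude rather than an exact count.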

Figure: From DNA to chromosome.

The 30 nm fiber is probably the major type of chromatin in the nucleus during interphase, the period between nuclear divisions. When the nucleus divides, the DNA adopts a more compact form of packaging, resulting in the highly condensed metaphase chromosomes that can be seen with the light microscope and which have the appearance generally associated with the word 'chromosome'. The metaphase chromosomes form at a stage in the cell cycle after DNA replication has taken place and so each one contains two copies of its chromosomal DNA molecule. The two copies are held together at the centromere, which has a specific position within each chromosome.

Individual chromosomes can therefore be recognized because of their size and the location of the centromere relative to the two ends.

An important part of the chromosome is the terminal region, or telomere. Telomeres are important because they mark the ends of chromosomes and therefore enable the cell to distinguish a real end from an unnatural end caused by chromosome breakage, an essential requirement because the cell must repair the latter but not the former. Telomeric DNA is made up of hundreds of copies of a repeated motif, 5'-TTAGGG-3' in humans, with a short extension of the 3' terminus of the double-stranded DNA molecule.

2. Genome organization at DNA sequence level

Functional DNA content of genome: This includes coding and non-coding gene content and contributes 25% of the nuclear genome. A comparison of genome fragments from different organisms makes one thing clear: genes are not arranged in a definite pattern but are distributed unevenly throughout the genome.


Figure: Functional Gene content

Space is saved in the genomes of less complex organisms because the genes are more closely packed together. This can be seen by comparing 50-kb fragments of the genomes of humans, yeast, fruit flies, maize and Escherichia coli.


The yeast genome segment, which comes from chromosome III (the first eukaryotic chromosome to be sequenced), has the following distinctive features:
o It contains more genes than the human segment
o Relatively few of the yeast genes are discontinuous
o There are fewer genome-wide repeats

The picture that emerges is that the genetic organization of the yeast genome is much more economical than that of the human version. The genes themselves are more compact, having fewer introns, and the spaces between the genes are relatively short, with much less space taken up by genome-wide

repeats and other non-coding sequences. The hypothesis that more complex organisms have less compact genomes holds when other species are examined. Consider the fruit-fly fragment. If we agree that a fruit fly is more complex than a yeast cell but less complex than a human, then we would expect the organization of the fruit-fly genome to be intermediate between that of yeast and humans. This is what is observed: the gene density in the fruit-fly genome is intermediate between that of yeast and humans, and the average fruit-fly gene has many more introns than the average yeast gene but still about a third as many as the average human gene.

Genome-wide repeats appear to play an intriguing role in dictating the compactness or otherwise of a genome. This is strikingly illustrated by the maize genome, which at 5000 Mb is larger than the human genome but still relatively small for a flowering plant. Only a few limited regions of the maize genome have been sequenced, but some remarkable results have been obtained, revealing a genome dominated by repetitive elements. The only gene in the 50-kb region is one member of a family of genes coding for the alcohol dehydrogenase enzymes. Instead of genes, the dominant feature of this genome segment is the genome-wide repeats. The majority of these are of the LTR element type, which comprise virtually all of the non-coding part of the segment, and on their own are estimated to make up approximately 50% of the maize genome.

It is becoming clear that one or more families of genome-wide repeats have undergone a massive proliferation in the genomes of certain species. This may provide an explanation for the most puzzling aspect of the C-value paradox, which is not the general increase in genome size that is seen in increasingly complex organisms, but the fact that similar organisms can differ greatly in genome size.
A good example is provided by Amoeba dubia which, being a protozoan, might be expected to have a genome of 100-500 Mb, similar to other protozoa such as Tetrahymena pyriformis. In fact the Amoeba genome is over 200,000 Mb. Similarly, we might guess that the genomes of crickets are similar in size to those of other insects, but crickets have genomes of approximately 2000 Mb, 11 times that of the fruit fly.

Nuclear genome:

Figure: Classification of nuclear genome into various categories

The nuclear genome is split into a set of linear DNA molecules, each contained in a chromosome. No exceptions to this pattern are known: all eukaryotes that have been studied have at least two chromosomes and the DNA molecules are always linear. The only variability at this level of eukaryotic genome structure lies with chromosome number, which appears to be unrelated to the biological features of the organism. For example, yeast has 16 chromosomes, four times as many as the fruit fly.

The Repetitive DNA Content of Genomes

Repetitive DNA is found in all organisms, and in some, including humans, it makes up a substantial fraction of the entire genome. There are various types of repetitive DNA, and several classification systems

have been devised. The scheme that we will use begins by dividing the repeats into those that are clustered into tandem arrays and those that are dispersed around the genome.

a) Tandemly repeated DNA: Tandemly repeated DNA is a common feature of eukaryotic genomes but is found much less frequently in prokaryotes. Satellite DNA in eukaryotes is made up of fragments composed of long series of tandem repeats, possibly hundreds of kb in length. Two further types of tandemly repeated DNA are also classed as 'satellite' DNA:
1. Minisatellites (variable number tandem repeats, VNTRs)
2. Microsatellites (short tandem repeats, STRs)

Minisatellites form clusters up to 20 kb in length, with repeat units up to 25 bp; microsatellite clusters are shorter, usually < 150 bp, and the repeat unit is usually 13 bp or less. The functions of most minisatellite sequences have not been identified. The function of microsatellites is equally mysterious. The typical microsatellite consists of a 1-, 2-, 3- or 4-bp unit repeated 10-20 times, as illustrated by the microsatellites in the human T-cell receptor locus. Although each microsatellite is relatively short, there are many of them in the genome. In humans, for example, microsatellites with a CA repeat make up 0.25% of the genome, 8 Mb in all. Single base-pair repeats such as (A)15 make up another 0.15%.

Although their function, if any, is unknown, microsatellites have proved very useful to geneticists. Many microsatellites are variable, meaning that the number of repeat units in the array is different in different members of a species. This is because 'slippage' sometimes occurs when a microsatellite is copied during DNA replication, leading to insertion or, less frequently, deletion of one or more of the repeat units. No two individuals have exactly the

same combination of microsatellite length variants: if enough microsatellites are examined then a unique genetic profile can be established for every individual. The only exceptions are genetically identical twins. Genetic profiling is well known as a tool in forensic science, but identification of criminals is a fairly trivial application of microsatellite variability. More sophisticated methodology makes use of the fact that a person's genetic profile is inherited partly from the mother and partly from the father. This means that microsatellites can be used to establish kinship relationships and population affinities, not only for humans but also for other animals, and for plants.

b) Interspersed genome-wide repeats: Tandemly repeated DNA sequences are thought to have arisen either by replication slippage, as described for microsatellites, or by DNA recombination processes. Interspersed repeats must therefore have arisen by a different mechanism, one that can result in a copy of a repeat unit appearing in the genome at a position distant from the location of the original sequence. The most frequent way in which this occurs is by transposition, and most interspersed repeats have inherent transpositional activity. There are two alternative modes of transposition, one that involves an RNA intermediate and one that does not. The version that involves an RNA intermediate is called retrotransposition. The basic mechanism involves three steps:


1. An RNA copy of the transposon is synthesized by the normal process of transcription.
2. The RNA transcript is copied into DNA. This conversion of RNA to DNA, the reverse of the normal transcription process, requires a special enzyme called reverse transcriptase. Often the reverse transcriptase is coded by a gene within the transposon and is translated from the RNA copy synthesized in step 1.
3. The DNA copy of the transposon integrates into the genome, possibly back into the same chromosome occupied by the original unit, or possibly into a different chromosome.

The end result is that there are now two copies of the transposon, at different points in the genome.
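The three steps above can be sketched as a toy string model. This is only an illustration of the copy-and-paste logic, not a biological simulation: the genome is a single-stranded string, the sequences and function names are invented for the example, and the insertion site is chosen at random outside the original element.

```python
import random


def transcribe(dna: str) -> str:
    """Step 1: make an RNA copy of the element (T written as U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Step 2: copy the RNA back into DNA (U written as T)."""
    return rna.replace("U", "T")


def integrate(genome: str, dna_copy: str, position: int) -> str:
    """Step 3: insert the DNA copy into the genome at the given position."""
    return genome[:position] + dna_copy + genome[position:]


def retrotranspose(genome: str, start: int, end: int, rng: random.Random) -> str:
    """Copy the element at genome[start:end] to a new site via an RNA intermediate."""
    rna = transcribe(genome[start:end])
    dna_copy = reverse_transcribe(rna)
    # Candidate sites lie outside the original element, so the original stays intact.
    sites = list(range(0, start + 1)) + list(range(end, len(genome) + 1))
    return integrate(genome, dna_copy, rng.choice(sites))


# Hypothetical toy genome: one element flanked by simple filler sequence.
ELEMENT = "TTGACCATGGCA"
GENOME = "A" * 10 + ELEMENT + "C" * 10

new_genome = retrotranspose(GENOME, 10, 10 + len(ELEMENT), random.Random(0))
print("copies before:", GENOME.count(ELEMENT))
print("copies after: ", new_genome.count(ELEMENT))
```

Because the original element is never excised, each round of retrotransposition increases the copy number and the genome length, which is why this mode of transposition can inflate genome size.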


RNA transposons, or retroelements, are features of eukaryotic genomes but have not so far been discovered in prokaryotes. Some retroelements are LTR elements, so called because they have long terminal repeats at either end, which play a role in the transposition process. Other retroelements do not have LTRs. These are called retroposons and in mammals include the following:
1. LINEs (long interspersed nuclear elements)
2. SINEs (short interspersed nuclear elements)

LINEs (long interspersed nuclear elements)

A LINE contains a reverse-transcriptase-like gene probably involved in the retrotransposition process. An example is the human element LINE-1, which is 6.1 kb and has a copy number of 516,000 in the human genome. A LINE contains a pol II promoter and two open reading frames (ORFs): ORF1 encodes an RNA-binding protein, and ORF2 encodes a protein with both endonuclease and reverse transcriptase activities. LINE activity proceeds as follows: RNA pol II transcribes the LINE DNA into LINE RNA; the LINE RNA is translated into proteins; the proteins and RNA join together and re-enter the nucleus; the endonuclease cuts a strand of the target genomic DNA, often in the intron of a gene; the reverse transcriptase copies the LINE RNA into LINE DNA, which is inserted into the target DNA, forming a new LINE element there. Three distantly related LINE families are found in the human genome: LINE1, LINE2, and LINE3. Only LINE1 (L1) is still active.

SINEs (short interspersed nuclear elements)

A SINE does not have a reverse transcriptase gene but can still transpose, probably by 'borrowing' reverse transcriptase enzymes that have been synthesized by other retroelements. SINEs are short sequences (about

100-400 bp) and they contain an internal pol III promoter but do not encode any proteins. All currently known SINEs are derived from tRNA and 7SL RNA genes. Most non-autonomous SINEs share the 3' end with a resident LINE. The only active SINE in the human genome is the Alu element, which is the major SINE, constituting about 11% of the genome (~1 million Alu elements).

Not all transposons require an RNA intermediate. Many are able to transpose in a more direct DNA-to-DNA manner. In eukaryotes, DNA transposons are less common than retrotransposons, but they have a special place in genetics because a family of plant DNA transposons, the Ac/Ds elements of maize, were the first transposable elements to be discovered, by Barbara McClintock in the 1950s. DNA transposons are a much more important component of prokaryotic genome anatomies than the RNA transposons.

Genome Complexity - C0t value - Calculation of Repetitive content of the Genome

Genome complexity can be analyzed using reassociation kinetics, expressed through the C0t value. When double-stranded DNA in solution is heated, it denatures (melts), releasing the complementary single strands. If the solution is cooled quickly, the DNA remains in a single-stranded state. However, if the solution is cooled slowly, reassociation will occur. The incubation time and the DNA concentration must be sufficient to permit an adequate number of collisions so that the DNA can reassociate. The size of the DNA fragments affects the rate of reassociation and is conveniently controlled by shearing the DNA into small fragments. The reassociation of a pair of complementary sequences results from their collision, and therefore the rate depends on their concentration. As two strands are involved, the process follows second-order kinetics.


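The C0t value can be used to calculate the percentage of repetitive content of the genome. For ideal second-order reassociation, the fraction of DNA remaining single-stranded is

$$\frac{C}{C_0} = \frac{1}{1 + K\,C_0 t}$$

where C is the concentration of single-stranded DNA at time t, K is the rate constant, C0 is the initial DNA concentration and t is time. Half of the DNA has reassociated (C/C0 = 1/2) when C0t = C0t1/2 = 1/K.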

The greater the C0t1/2 value, the slower the reaction at a given DNA concentration.
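The relationship between the rate constant and C0t1/2 can be explored numerically. The sketch below assumes the ideal second-order form C/C0 = 1 / (1 + K × C0t), which is the standard expression for reassociation kinetics; the value of K is a hypothetical number chosen only for illustration, and the function names are not from the text.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """C/C0 for ideal second-order reassociation: 1 / (1 + K * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def c0t_half(k: float) -> float:
    """C0t value at which half of the DNA has reassociated (C/C0 = 1/2)."""
    return 1.0 / k


K = 0.5  # hypothetical rate constant, in L / (mol * s)

print(f"C0t1/2 = {c0t_half(K):.2f} mol*s/L")
for c0t in (0.01, 0.1, 1.0, 2.0, 10.0, 100.0):
    frac = fraction_single_stranded(c0t, K)
    print(f"C0t = {c0t:>6g}  single-stranded fraction = {frac:.3f}")
```

Halving K doubles C0t1/2, so at the same DNA concentration the sample takes twice as long to reach half reassociation.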

Procedure

The procedure involves heating a sample of genomic DNA until it denatures into the single-stranded form, and then slowly cooling it, so the strands can pair back together. While the sample is cooling, measurements are taken of how much of the DNA is base paired at each temperature. The amount of single- and double-stranded DNA is measured by rapidly diluting the sample, which slows reassociation, and then binding the DNA to a hydroxylapatite column. The column is first washed with a low concentration of sodium phosphate buffer, which elutes the single-stranded DNA, and then with high concentrations of phosphate, which elute the double-stranded DNA. The amount of DNA in these two solutions is then measured using a spectrophotometer.

Analysis

Since a sequence of single-stranded DNA needs to find its complementary strand to reform a double helix, common sequences renature more rapidly than rare sequences. Indeed, the rate at which a sequence will reassociate is proportional to the number of copies of that sequence in the DNA sample. A sample with a highly repetitive sequence will renature rapidly, while complex sequences will renature slowly. However, instead of simply measuring the percentage of double-stranded DNA versus time, the amount of

renaturation is measured relative to a C0t value. The C0t value is the product of C0 (the initial concentration of DNA), t (time in seconds), and a constant that depends on the concentration of cations in the buffer. Repetitive DNA will renature at low C0t values, while complex and unique DNA sequences will renature at high C0t values.
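To illustrate why repetitive DNA renatures at low C0t values, a genome can be modelled as a mixture of components that each follow ideal second-order kinetics. This is a minimal sketch, not a curve-fitting method: the 40% repetitive and 60% unique fractions and both rate constants are hypothetical values chosen so that the two components separate clearly.

```python
def fraction_reassociated(c0t: float, components: list[tuple[float, float]]) -> float:
    """Fraction of total DNA that is double-stranded at a given C0t.

    components: (fraction_of_genome, rate_constant) pairs; each component is
    assumed to reassociate independently as 1 / (1 + K * C0t).
    """
    single_stranded = sum(f / (1.0 + k * c0t) for f, k in components)
    return 1.0 - single_stranded


# Hypothetical genome: 40% repetitive DNA (fast, K = 1000) and
# 60% unique DNA (slow, K = 1). These numbers are assumptions.
COMPONENTS = [(0.4, 1000.0), (0.6, 1.0)]

for c0t in (0.0001, 0.001, 0.01, 0.03, 0.1, 1.0, 10.0, 100.0):
    print(f"C0t = {c0t:>7g}  reassociated = {fraction_reassociated(c0t, COMPONENTS):.3f}")
```

In this model the curve flattens at intermediate C0t, once the repetitive component has largely renatured but the unique component has barely started; the height of that plateau approximates the repetitive fraction of the genome.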

Applications

1. It can be used to analyze the genome complexity of any organism.
2. It can be used to assess the nature of a genome before spending large amounts of money on genome sequencing.
3. It can be used to find the repetitive DNA content of the genome.

Table: Percentage of Repeat DNA in the Genome

