
Genome Wide Detection of Protein-DNA Interactions

A. M. Patil

Contents
Introduction; Detection techniques (DNA-protein); ChIP; ChIP-on-chip; ChIP-SAGE; ChIP-PET; Applications; Conclusion

Genome
Predominantly composed of non-coding sequences (>98%).

A significant portion of the non-coding sequence serves as transcriptional regulatory elements.

Previous investigations using molecular, genetic, and biochemical approaches identified several DNA-binding proteins. Identifying and characterizing gene regulatory sequences reveals the gene regulatory networks in cells, which is fundamental to understanding developmental biology and to determining the molecular basis of many biological processes.

Continued

The interaction between DNA and proteins plays an important role in: repairing damaged DNA; recombination and replication; maintaining genomic integrity; controlling epigenetic changes; differentiation and development.

These proteins interact with DNA by means of various structural motifs.

Detecting protein-DNA interactions


Gel retardation assays; DNase footprinting; DNA modification (protection) assay; nitrocellulose filter binding assay; crystallography.

Gel retardation assay. Principle: DNA-protein complexes move more slowly through an electrophoretic gel than free DNA.

[Figure: lanes with free DNA and with DNA plus protein; the DNA-protein complex forms the retarded band.]
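Gel shift results are often summarised as the fraction of labelled DNA that moves into the retarded band. A minimal sketch, assuming background-subtracted band intensities (arbitrary units) have already been measured from the gel image; the function name and the lane values are hypothetical.

```python
def fraction_bound(retarded: float, free: float) -> float:
    """Fraction of labelled DNA found in the retarded (protein-bound) band."""
    total = retarded + free
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return retarded / total


# Hypothetical lanes: (retarded band, free DNA band) at increasing protein amounts
lanes = [(0.0, 1000.0), (250.0, 750.0), (600.0, 400.0)]
for retarded, free in lanes:
    print(f"{fraction_bound(retarded, free):.2f}")  # 0.00, then 0.25, then 0.60
```

A rising bound fraction across lanes with more protein is the quantitative form of the "retarded band" seen by eye.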

DNase footprinting


Principle: DNA with bound protein is protected from nuclease digestion, so the protected region appears as a gap (the footprint) in the ladder of cleavage products.

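[Figure: cleavage ladders (positions 5 to 20) of DNA alone and of DNA with bound protein (P); the bands missing from the protein lane mark the footprint.]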
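Reading a footprint amounts to comparing band intensities position by position between the two lanes. A minimal sketch, assuming per-position intensities already normalised for loading; the 0.5 ratio cut-off and the minimum run length are hypothetical choices, and the returned values are list indices rather than nucleotide numbers.

```python
def find_footprints(control, with_protein, max_ratio=0.5, min_len=2):
    """Return inclusive (start, end) index ranges where cleavage is suppressed.

    A position counts as protected when with_protein / control < max_ratio.
    Only runs of at least `min_len` protected positions are reported.
    """
    if len(control) != len(with_protein):
        raise ValueError("lanes must cover the same positions")
    protected = [c > 0 and p / c < max_ratio for c, p in zip(control, with_protein)]
    regions, start = [], None
    for i, flag in enumerate(protected + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    return regions


control = [100.0] * 10
with_protein = [95.0, 100.0, 90.0, 20.0, 10.0, 15.0, 98.0, 30.0, 100.0, 97.0]
print(find_footprints(control, with_protein))  # [(3, 5)]
```

The isolated weak band at index 7 is ignored because a single missing band is more likely noise than a protein footprint.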

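Modification protection assay: protein bound to DNA can protect specific bases (here, G residues) from chemical modification.

[Figure: DNA with and without bound protein (P) is treated with a modifying agent; accessible G residues become modified (mG), while the G residue covered by the protein is protected.]

After cleavage at modified residues, the protected position shows up as a missing cleavage product, because the protected residue was not modified.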

Nitrocellulose filter binding assay

Incubate labeled DNA with the protein, then filter the mixture through a nitrocellulose filter disk. Proteins bind to nitrocellulose, but double-stranded DNA does not, so any DNA retained on the filter is there because it is interacting with the protein.

DNA sequences interacting with proteins (TFs), contd. Conventionally, these interactions were studied in test tubes using purified proteins and naked DNA fragments. Drawback: in vitro methods do not faithfully replicate in vivo physiological conditions. Inside the cell, DNA exists in a compact chromatin state whose properties differ from those of naked DNA, owing to chromatin organization, the distribution of histone variants, and histone modifications such as acetylation and methylation.

ChIP
Allows the study of protein-DNA interactions directly inside the cell under in vivo physiological conditions. Protocol: (1) cross-linking of proteins and DNA (formaldehyde); (2) extraction and fragmentation of chromatin DNA (mechanical shearing or enzymatic digestion); (3) immunoaffinity purification using an antibody specific for the protein; (4) assay of the purified DNA (Southern blotting or PCR).

Continued
Preparation of cross-linked chromatin from a cell line, tissue, or whole organism. Formaldehyde (H2C=O) is commonly used; dimethyl adipimidate (DMA) and disuccinimidyl glutarate (DSG) are alternatives. Formaldehyde readily permeates cells and generates zero-length cross-links. It is a reactive dipolar compound whose carbon atom acts as an electrophilic centre: amino and imino groups of amino acids and of DNA bases react with it to give a Schiff base, which then reacts with a second amino group and condenses to give the DNA-protein cross-link.

Continued ..
Cells or tissues are incubated with 1% formaldehyde for 10-15 min at room temperature, or for a longer time at 4 °C. Increasing the cross-linking time makes fragmentation of chromatin more difficult. Cells are lysed and nuclei isolated. Chromatin is released from the nuclei by ultrasonic sonication, which solubilizes it and fragments it to 500-2000 bp. Sonication is an important step to optimize: long fragments are not immunoprecipitated efficiently and are not detected on the microarray as well as short fragments. Cross-linking time and the number of sonication sessions must be determined empirically.

Chromatin immunoprecipitation, contd.

Immunoprecipitation: antibody affinity and specificity are crucial.

Not all available antibodies efficiently immunoprecipitate protein-DNA complexes, for example because formaldehyde cross-linking masks the epitope.

Chromatin (2ng) is incubated with magnetic or agarose beads coupled to antibodies, followed by several washes to remove the unbound fraction. The purified DNA is then assayed by Southern blotting or PCR.


Chromatin immunoprecipitation, contd. Demerits

Allows the study of interactions between a protein and only a limited number of suspected DNA sequences; genome-wide interactions cannot be studied.

ChIP-on-chip

Genomic binding location of regulatory factors can be determined using Chromatin immunoprecipitation (ChIP) followed by determination of enriched fragments by DNA microarray (chip) hybridisation.

General features of DNA microarrays

Microarray features
Density of the array: number of features in one array.
Resolution: distance between adjacent features on a contiguous genomic region.
Level of genome representation: total amount of genomic sequence represented by the array.

ChIP-enriched targets: probes whose Cy5/Cy3 ratios are significantly high at a high confidence level (P < 0.001).

Contd
One approach to detect ChIP-enriched DNA is two-color competitive hybridization: ChIP DNA is labeled with one fluorescent dye and control DNA with another, and both are hybridized to the same array. The ratio between the ChIP and control signals correlates with the enrichment of specific DNA sequences.

Alternatively, ChIP-enriched DNA can be detected by one-color hybridization, in which test and control samples are hybridized to different arrays.
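For two-color arrays, enrichment is usually handled on a log scale, one ratio per probe. A minimal sketch, assuming raw background-corrected intensities with ChIP in one channel (e.g. Cy5) and control in the other (e.g. Cy3); median centring is used here as a simple normalisation and is an assumption, not the method of any specific study in this deck.

```python
import math
from statistics import median


def normalized_log2_ratios(chip, control):
    """Per-probe log2(ChIP / control), median-centred.

    Median centring assumes most probes are NOT enriched, so the typical
    probe should sit at a log ratio of 0 after normalisation.
    """
    if len(chip) != len(control):
        raise ValueError("one ChIP and one control intensity per probe")
    raw = [math.log2(c / r) for c, r in zip(chip, control)]
    m = median(raw)
    return [x - m for x in raw]


# Hypothetical intensities: ChIP channel and control channel for five probes
chip = [100.0, 100.0, 200.0, 100.0, 800.0]
control = [100.0, 100.0, 100.0, 100.0, 100.0]
print(normalized_log2_ratios(chip, control))  # [0.0, 0.0, 1.0, 0.0, 3.0]
```

On the log2 scale a 2-fold and an 8-fold enrichment become 1 and 3, which makes ratios additive and easier to model statistically.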

Microarray data analysis


Two models are used to test whether enrichment, measured as the ratio of ChIP to control DNA signal, is significant: the single-array error model and the rank-based model. The single-array error model assumes a normal distribution, and significance is then calculated as a one-sided probability. The rank-based model makes no distributional assumption but requires multiple replicates. ChIPOTle (Chromatin ImmunoPrecipitation On Tiled arrays), implemented as a Microsoft Excel macro written in Visual Basic, uses a sliding-window approach that improves the identification of bona fide sites of protein-DNA interaction (Michael et al., 2005).
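The two ideas above, a one-sided normal test and a sliding window over neighbouring probes, can be combined in a few lines. This is a simplified illustration in that spirit, not ChIPOTle's actual Excel/Visual Basic code; it assumes the background standard deviation `sigma` is known and that probe noise is independent.

```python
import math
from statistics import mean


def one_sided_p(x, mu, sigma):
    """Upper-tail probability P(X >= x) for X ~ Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def sliding_window_means(values, window=3):
    """Mean log ratio of each run of `window` adjacent probes (genomic order)."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def enriched_windows(log_ratios, sigma, window=3, alpha=0.001):
    """Start indices of windows whose mean log ratio is significantly high.

    Assumes background log ratios are independent Normal(0, sigma), so the
    mean of `window` probes has standard deviation sigma / sqrt(window).
    """
    se = sigma / math.sqrt(window)
    return [i for i, m in enumerate(sliding_window_means(log_ratios, window))
            if one_sided_p(m, 0.0, se) < alpha]


# Hypothetical log2 ratios for eight adjacent tiled probes
probes = [0.1, -0.2, 0.0, 1.5, 1.8, 1.6, 0.0, -0.2]
print(enriched_windows(probes, sigma=0.3))  # [2, 3, 4]
```

Averaging over a window shrinks the background spread by a factor of sqrt(window), which is why a run of moderately elevated neighbouring probes can be called with more confidence than any single probe.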

Data interpretation
Expression microarrays vs. ChIP-on-chip: in expression experiments, each element on the microarray measures the abundance of RNA molecules of fixed length, whereas in ChIP-on-chip each element measures the abundance of a population of fragments of various lengths, owing to chromatin shearing. In expression experiments the data are two-tailed and roughly symmetric, and both low and high ratio measurements carry biological significance. ChIP-on-chip measurements instead arise as a mixture of two distributions: the first corresponds to the population of genomic fragments specifically enriched by ChIP, and the second to the remaining population of genomic DNA that is not ChIP-enriched, representing background or noise.
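Because enrichment only adds values to the upper tail, one simple way to model the background component of this mixture is to use the lower half of the distribution and reflect it about the centre. The sketch below shows that idea under a stated symmetry assumption; it is not presented as the procedure of any particular study here, and the data and the 3-SD cut-off are hypothetical.

```python
import math
from statistics import median


def mirrored_background(log_ratios):
    """Return (centre, sd) of the background estimated from the lower half only.

    Assumption: the non-enriched background is symmetric about its centre and
    ChIP enrichment only adds values to the upper tail, so values at or below
    the median are treated as pure background and reflected about it.
    """
    c = median(log_ratios)
    lower = [x - c for x in log_ratios if x <= c]
    mirrored = lower + [-d for d in lower]
    sd = math.sqrt(sum(d * d for d in mirrored) / len(mirrored))
    if sd == 0:
        raise ValueError("background has zero spread")
    return c, sd


def enriched(log_ratios, z_cut=3.0):
    """Values lying more than z_cut background SDs above the centre."""
    c, sd = mirrored_background(log_ratios)
    return [x for x in log_ratios if (x - c) / sd > z_cut]


data = [-0.3, -0.1, 0.0, 0.0, 0.1, 0.3, 3.0]  # hypothetical log2 ratios
print(enriched(data))  # [3.0]
```

Estimating the spread from all values would let the enriched tail inflate it; using only the lower half keeps the background estimate uncontaminated.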

Validating the Identified Protein-Binding Sites

Microarray- and SAGE-coupled ChIP assays involve multiple steps to identify protein-binding sites in the genome, and errors can be introduced at each step. Validation: test the relative enrichment of the identified sequences in the ChIP DNA compared with total genomic DNA, by quantitative PCR.
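Quantitative PCR validation is commonly expressed as "percent of input" using the delta-Ct method. A minimal sketch, assuming roughly 100% PCR efficiency (one doubling per cycle) and that a known fraction of chromatin was kept as input; the Ct values and the idea of a non-bound control region are illustrative assumptions.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input chromatin recovered in the ChIP (delta-Ct method).

    Assumes ~100% PCR efficiency. `input_fraction` is the fraction of
    chromatin saved as input (e.g. 0.01 for a 1% input), so the input Ct is
    first adjusted to represent 100% of the chromatin.
    """
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)


def fold_enrichment(pct_target, pct_control_region):
    """Enrichment at a candidate site relative to a region not expected to bind."""
    return pct_target / pct_control_region


target = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
background = percent_input(ct_ip=29.0, ct_input=25.0, input_fraction=0.01)
print(f"{target:.2f}% input, {fold_enrichment(target, background):.0f}-fold")  # 2.00% input, 32-fold
```

An IP that amplifies one cycle earlier than a 1% input recovered twice that amount, i.e. 2% of input; a site identified on the array is considered validated when it is clearly enriched over a control region.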

Merits of ChIP-on-chip: the availability of high-density oligonucleotide arrays representing entire genomes allows comprehensive mapping of protein-DNA interactions. The technique doesn't rely on prior knowledge of transcriptional regulatory elements. It is an unbiased and comprehensive approach.

Contd..
Major drawbacks: highly specific antibodies are required for the proteins; specific antibodies are not available for all proteins; a large number of nonspecific DNA sequences are immunoprecipitated.

Alternative ways to purify DNA-binding proteins when no specific antibodies are available


Using engineered cell systems: introducing an epitope tag into the endogenous locus of the TF.

DNA adenine methyltransferase identification (DamID). Method: the protein of interest is expressed fused to E. coli Dam; binding of the fusion protein causes methylation (of adenines in GATC) in the local DNA sequences; DNA is then extracted and digested with a methylation-sensitive restriction enzyme. Drawback: Dam fusion proteins may have properties distinct from those of the endogenous proteins.
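The DamID readout can be illustrated on a toy sequence. The slide names only "a methylation-sensitive RE"; this sketch assumes DpnI, which cuts GA^TC only when the adenine is methylated. The rule that Dam methylates every GATC within a fixed `reach` of the binding site (and none elsewhere) is a hypothetical simplification.

```python
def gatc_sites(seq):
    """0-based start positions of every GATC (the Dam target motif)."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


def methylated_sites(seq, bound_at, reach):
    """GATC sites within `reach` bp of where the fusion protein is bound.

    Hypothetical model: all GATCs within `reach` are methylated, none beyond.
    """
    return [s for s in gatc_sites(seq) if abs(s - bound_at) <= reach]


def dpni_fragments(seq, methylated):
    """Fragment lengths after cutting GA^TC only at methylated sites (DpnI-like)."""
    cuts = sorted(s + 2 for s in methylated)  # cut between GA and TC
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


seq = "AA" + "GATC" + "A" * 10 + "GATC" + "A" * 30 + "GATC" + "AA"
meth = methylated_sites(seq, bound_at=10, reach=10)
print(gatc_sites(seq), meth, dpni_fragments(seq, meth))  # [2, 16, 50] [2, 16] [4, 14, 38]
```

Only the GATCs near the bound protein are cut, so the short fragments released by digestion mark the neighbourhood of the binding site.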

ChIP-SAGE

Principles of SAGE
A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Sequence tags can be linked together to form long serial molecules that can be cloned and sequenced. Counting the number of times a particular tag is observed gives the expression level of the corresponding transcript.
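Tag extraction and counting can be sketched as string operations. This assumes NlaIII (CATG) as the anchoring site, as mentioned later for ChIP-SAGE, and takes the `tag_len` bases immediately 3' of the last CATG in each sequence; real protocols release the tag with a tagging enzyme, which this sketch does not model.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII recognition site, used here as the anchoring site


def extract_tag(seq, tag_len=10):
    """Tag = the `tag_len` bases immediately 3' of the last CATG in `seq`.

    Returns None if there is no CATG or too little sequence after it.
    """
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR):pos + len(ANCHOR) + tag_len]
    return tag if len(tag) == tag_len else None


def count_tags(sequences, tag_len=10):
    """How often each tag is observed across the sequenced pool."""
    return Counter(t for t in (extract_tag(s, tag_len) for s in sequences) if t)


pool = ["GGCATGAAAAATTTTTGG", "TTCATGAAAAATTTTTCC", "CATGCCCCCGGGGG", "ACGT"]
print(count_tags(pool))  # Counter({'AAAAATTTTT': 2, 'CCCCCGGGGG': 1})
```

The count for each tag is the quantitative readout: in ChIP-SAGE, tags from enriched regions are observed more often than tags from background DNA.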

Advantages over ChIP-chip: it does not depend on preselected sequences and can scan the whole genome. Disadvantages: the majority of ChIP DNA actually corresponds to non-specifically immunoprecipitated sequences, so a large number of clones must be sequenced to distinguish the truly enriched sequences from background, and the cost of sequencing is high.

ChIP-SAGE vs. ChIP-chip

Prior knowledge: ChIP-SAGE does not require prior knowledge of the sequence to be analyzed; ChIP-chip requires prior knowledge of the sequence to be analyzed.

Discovery: with ChIP-SAGE, discovery of new sequences is possible; with ChIP-chip, comparison is only possible with existing sequences.

Quantification: with ChIP-SAGE, quantification of sequences is possible; with ChIP-chip, quantification is not easy.

Applications

Decoding the Transcriptional Regulatory Function of Genome Sequences. Dissecting the Transcriptional Regulatory Circuits. Unravelling Epigenetic Mechanisms.

Tae-young Roh, 2004

ChIP-SAGE: high-resolution genome-wide mapping of histone modifications in yeast. The distribution of hyperacetylated histones H3 and H4 was mapped with antibodies (AcH3, AcH4) that recognize hyperacetylated histones in yeast, and acetylation levels were compared directly between promoter and coding regions among 6040 genes.

NlaIII site appears once every 316 bp

class II enzyme, MmeI
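The spacing of the NlaIII site sets how densely ChIP-SAGE tags can sample the genome. For a uniformly random sequence, a given 4-bp site is expected once every 4^4 = 256 bp; the spacing in a real genome depends on its base composition. A small sketch of both calculations; the example sequence is hypothetical and the base-independence model is an assumption.

```python
def mean_site_spacing(seq, site="CATG"):
    """Average bp per occurrence of `site` in `seq`."""
    k = len(site)
    hits = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)
    return len(seq) / hits if hits else float("inf")


def expected_spacing(site, base_freq):
    """Expected bp per site if bases occur independently with given frequencies."""
    p = 1.0
    for base in site:
        p *= base_freq[base]
    return 1 / p


uniform = {b: 0.25 for b in "ACGT"}
print(expected_spacing("CATG", uniform))           # 256.0
print(mean_site_spacing("CATG" + "A" * 96))        # 100.0
```

Running `mean_site_spacing` on real chromosome sequence gives the empirical figure that a statement like "once every 316 bp" summarises.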

Tae-young Roh, 2004
It is generally believed that promoter regions are highly acetylated. Unexpectedly, the analysis revealed that the highest histone H3 acetylation occurs after the ATG start codon, within the first 500 bp of ORFs. Acetylation of H4 followed a similar pattern. GCN5 is the critical catalytic subunit of the ADA and SAGA complexes and plays a global role in regulating histone H3 acetylation. Deletion of the gene encoding GCN5 specifically eliminated the peak of acetylation in promoters and increased acetylation in the 3′ region of ORFs.

These results suggest that GCN5 functions to maintain higher levels of H3 acetylation in the promoter and 5′ end of the coding region of transcription units.

Transcriptional regulatory networks in yeast (Tonglee et al., 2002)


Epitope tagging

Continued
Genome-wide location analysis was used to determine where yeast transcriptional regulators bind to promoter sequences across the genome, using epitope tagging. All 141 transcription factors listed in the Yeast Proteome Database were selected for study. Yeast strains were constructed so that each transcription factor carried a c-Myc epitope tag, one tagged factor per strain. An epitope tag might be expected to affect the function of some transcriptional regulators: for 17 of the 141 factors, viable tagged cells could not be obtained despite three attempts to tag each regulator. Only 106 tagged transcription factors were detected by immunoblot.

Continued
A confidence value (P value) was calculated for each spot on each array using an error model. Results are generally described at a P value threshold of 0.001, because this threshold maximizes the inclusion of legitimate regulator-DNA interactions while minimizing false positives. Various experimental and analytical methods indicate that the frequency of false positives in the genome-wide location data at the 0.001 threshold is 6% to 10%.

Continued
Nearly 4000 interactions between regulators and promoter regions were detected at a P value threshold of 0.001. The promoter regions of 2343 of 6270 yeast genes (37%) were bound by one or more of the 106 transcriptional regulators. Many yeast promoters were bound by multiple transcriptional regulators, a feature previously associated with gene regulation in higher eukaryotes, suggesting that yeast genes are also frequently regulated through combinations of regulators.

Continued

Effect of P value threshold: the sum of all regulator-promoter region interactions is displayed as a function of varying P value thresholds applied to the entire location data set for the 106 regulators. More stringent P values reduce the number of interactions reported but also decrease the likelihood of false-positive results.
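The threshold sweep and the "bound by multiple regulators" count are simple filters over a table of binding P values. A minimal sketch, assuming one P value per (regulator, promoter) pair; the regulator and promoter names below are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def interactions_at(pvalues, threshold):
    """(regulator, promoter) pairs whose binding P value is at or below threshold."""
    return {pair for pair, p in pvalues.items() if p <= threshold}


def threshold_sweep(pvalues, thresholds):
    """Number of regulator-promoter interactions reported at each threshold."""
    return {t: len(interactions_at(pvalues, t)) for t in thresholds}


def regulators_per_promoter(pairs):
    """Map each bound promoter to the number of distinct regulators binding it."""
    counts = defaultdict(int)
    for _regulator, promoter in pairs:
        counts[promoter] += 1
    return dict(counts)


pvals = {
    ("RegA", "promoter1"): 1e-5,
    ("RegB", "promoter1"): 5e-4,
    ("RegA", "promoter2"): 0.002,
    ("RegC", "promoter3"): 0.02,
}
print(threshold_sweep(pvals, [0.0001, 0.001, 0.01]))        # {0.0001: 1, 0.001: 2, 0.01: 3}
print(regulators_per_promoter(interactions_at(pvals, 0.001)))  # {'promoter1': 2}
```

Loosening the threshold admits more pairs (and more false positives); at a fixed threshold, promoters with a count above 1 are the combinatorially regulated ones.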

Chromosome-wide analysis of differential methylation in normal and transformed human cells
Michael et al., 2005

Methylation of cytosine at CpG dinucleotides is a major epigenetic modification observed in mammalian cells.

CpG islands are exceptions: they are less methylated and are associated with the promoter regions of several housekeeping genes.

Aberrant promoter hypermethylation is believed to contribute to sporadic cancer.

Relevance of DNA methylation in normal development and disease

An immunocapturing technique combined with DNA microarrays is used for the enrichment and detection of methylated DNA. Methylation-sensitive restriction enzymes (HpaII) can assay only 3.9% of all CpGs, whereas antibodies specific to 5-methylcytosine allow MeDIP (methylated DNA immunoprecipitation).
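The limited coverage of restriction-based methods follows from sequence context: HpaII only reports on a CpG that lies inside its CCGG recognition site. A small sketch counting assayable CpGs in a hypothetical sequence; it illustrates the principle and does not reproduce the 3.9% genome-wide figure.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def hpaii_assayable_cpgs(seq):
    """CpGs inside a CCGG site, i.e. the only CpGs HpaII can report on."""
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def fraction_assayable(seq):
    """Share of all CpGs in `seq` that sit inside an HpaII site."""
    total = cpg_positions(seq)
    return len(hpaii_assayable_cpgs(seq)) / len(total) if total else 0.0


seq = "ACGTTCCGGAACGA"  # hypothetical: three CpGs, one of them inside CCGG
print(cpg_positions(seq), hpaii_assayable_cpgs(seq))  # [1, 6, 11] [6]
```

An antibody against 5-methylcytosine has no such sequence requirement, which is the practical advantage of MeDIP.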

Relevance of MeDIP technique

Assessing aberrant CpG island methylation in transformed cells

CpG island methylation profiles were generated for both SW48 colon cancer cells and normal colon mucosa, using a microarray representing ~12,000 CpG island probes (derived from a CpG island library: 75% unique sequences, the remainder repetitive elements (SINEs, LINEs, satellites) as well as ribosomal and mitochondrial genes).

Observation
Normal cells

Most CpG islands, including ribosomal and mitochondrial DNA, showed basal/low levels of methylation. Sequences with the highest methylation included repetitive DNA, promoters of imprinted genes, and genes residing on the X chromosome.

Transformed cells

The methylation profile was almost the same. Hypermethylation was observed in 108 clones, of which 82 represented ribosomal DNA (reported in cells during aging/neoplasia, but of unknown physiological role); the remaining 26 clones were unique.

[Figure: new targets for hypermethylation in colon cancer cells, grouped by function: cell cycle progression, apoptosis, cell-matrix interaction.]

Contd

Combining MeDIP with hybridization on a CpG island microarray could be used to detect epigenetically silenced genes in cancer cells. The pattern of CpG island methylation is conserved, and the number of genes hypermethylated in transformed cells is unexpectedly low.

Transcriptional regulatory circuitry in human ESCs (Laurie et al., 2005)


Mammalian development requires over 200 specialized cell types, all derived from a single totipotent zygotic cell.

Embryonic Stem Cells


Derived from the inner cell mass of the blastocyst. Capable of generating any cell type of the body (pluripotent). To realize their therapeutic potential, understanding their transcriptional regulatory circuitry is fundamental to knowing the molecular basis of ESC pluripotency and self-renewal.

Contd ..
Three major TFs, OCT4, SOX2 and Nanog, are key regulators of the self-renewal ability of stem cells. They regulate genes encoding other transcriptional regulators involved in determining the developmental potential of cells, and are known to interact with one another in a complex, synergistic way.

Genome-wide location analysis (ChIP-on-chip). Objective: to identify the target sites of the three key stem cell regulators. DNA microarray design: oligonucleotides covering -8 to +2 kb relative to the transcription start site of 17,917 annotated genes.

Examples of target sites for OCT4

RESULTS
OCT4 bound 623 (3%) of the protein-coding genes, SOX2 bound 1271 (7%), and Nanog bound 1681 (9%). OCT4 and SOX2 shared about half of their bound genes, and >90% of the genes bound by both OCT4 and SOX2 were also bound by Nanog. The regulators bind to both active and inactive/repressed genes (the latter encoding TFs involved in regulating genes required for mesodermal, endodermal, and ectodermal differentiation).
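The percentages above follow from dividing each bound-gene count by the 17,917 arrayed genes, and the overlap statements are ratios of set intersections. A short sketch; the three toy gene sets are hypothetical and only show how the overlap fractions are computed.

```python
def percent_of_genes(n_bound, n_genes=17_917):
    """Share (%) of the arrayed protein-coding genes bound by a regulator."""
    return 100 * n_bound / n_genes


def overlap_summary(oct4, sox2, nanog):
    """Overlap fractions for three sets of bound gene identifiers."""
    shared = oct4 & sox2
    return {
        "oct4_targets_shared_with_sox2": len(shared) / len(oct4),
        "co_bound_also_nanog": len(shared & nanog) / len(shared) if shared else 0.0,
    }


for name, n in [("OCT4", 623), ("SOX2", 1271), ("Nanog", 1681)]:
    print(name, round(percent_of_genes(n)))  # OCT4 3, SOX2 7, Nanog 9

toy = overlap_summary({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5", "g6"}, {"g1", "g2", "g7"})
print(toy)  # {'oct4_targets_shared_with_sox2': 0.5, 'co_bound_also_nanog': 1.0}
```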

Examples of protein coding genes co-occupied by OCT4, SOX2 and Nanog in close proximity

Myogenic differentiation (Alexandre et al., 2005)


Genome-wide TF binding and expression profiling were used to assemble the regulatory network controlling myogenic differentiation (myoblasts to multinucleate myotubes) in mammalian cells.

Governed by the TFs MRFs (MyoD and myogenin) and MEF2. MyoD is a bHLH factor that binds to the E boxes of genes.

The first step of the regulatory cascade involves expression of MyoD, which subsequently leads to expression of myogenin and MEF2, promoting myogenesis.

Contd..

Beyond these first steps knowledge is somewhat fragmentary: Relatively few physiological targets of MRFs and MEF2 have been identified. The number of genes known to be regulated by these factors is considerably smaller than the number of genes induced upon myogenic differentiation

Contd..
Recent attempts to identify MyoD-binding sites exploited gene expression profiling of cells that ectopically express MyoD. Complications:

Ectopic expression of a bHLH transcription factor can lead to promiscuous binding to E boxes throughout the genome, as suggested by the fact that another E-box-binding factor, Myc, binds to an extensive portion of the genome.

To determine the contribution of MRF family members to gene expression patterns, ChIP-chip was performed with a promoter chip using C2C12 myoblasts.

Confirmation of binding by MyoD, myogenin, and MEF2 to a subset of targets

In growing myoblasts, MyoD bound a set of genes involved in synapse specification and utilization (neuromuscular junction, NMJ) and neuromuscular function. Genes bound by MyoD in myotubes (MT) play roles in muscle development and contraction.

[Figure legend: triangles, MRFs and MEF2; squares, TFs; circles, non-TFs.]

ChIP-PET
Ruan and colleagues: a pair of signature tags, the paired-end tags (PETs), is taken from both ends of each DNA fragment and concatenated. Like SAGE, the PET method can obtain large numbers of sequence tags at moderate cost. PETs additionally record the start and end of each ChIP DNA fragment, which is used to locate binding sites precisely: one identifies overlapping PETs, and the overlap between different PET fragments corresponds to the binding site.
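The "overlap of PET fragments" idea can be sketched as interval clustering followed by a coverage sweep. This assumes each PET has already been mapped to give an inclusive (start, end) fragment interval; the minimum cluster size `min_pets` and the coordinates below are hypothetical.

```python
def pet_clusters(fragments, min_pets=2):
    """Group (start, end) fragments inferred from PETs into overlap clusters."""
    clusters = []
    for start, end in sorted(fragments):
        if clusters and start <= clusters[-1]["end"]:
            clusters[-1]["members"].append((start, end))
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
        else:
            clusters.append({"end": end, "members": [(start, end)]})
    return [c["members"] for c in clusters if len(c["members"]) >= min_pets]


def peak_region(members):
    """Span covered by the most fragments (the inferred binding site) and its depth."""
    # Starts sort before ends at the same coordinate, so touching fragments overlap.
    events = sorted([(s, 1) for s, _ in members] + [(e, -1) for _, e in members],
                    key=lambda ev: (ev[0], -ev[1]))
    depth = best = 0
    best_span = None
    for i, (pos, delta) in enumerate(events):
        depth += delta
        if delta == 1 and depth > best:
            best = depth
            best_span = (pos, events[i + 1][0])  # runs until the next event
    return best_span, best


frags = [(100, 300), (150, 350), (200, 320), (1000, 1200), (5000, 5100), (5050, 5300)]
for cluster in pet_clusters(frags):
    print(peak_region(cluster))  # ((200, 300), 3) then ((5050, 5100), 2)
```

The isolated fragment at 1000-1200 is dropped as a likely non-specific singleton, while regions supported by several overlapping PETs are reported with the narrow shared span as the binding-site estimate.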
