CTTCCAGTTCAACCGGCCGGTCGTCGCGGACGACGCGGCCGCCG
GCGCCGCGATGCTGGCGGACGTACCGCACACCCGCCCCATCTCC
ATCTTCGCTTC
transcription translation
Monteiro-Vitorello et al. 2004
December 7, 2021 JC Setubal 13
Molecular Plant-Microbe Interactions
Comparative genomics
• There are currently more than 300 completely
sequenced microbial genomes publicly
available
• Many are of closely related species
• In a few years there will be thousands
• Why compare?
• How to do it?
Boussau, Bastien et al. (2004) Proc. Natl. Acad. Sci. USA 101, 9722-9727
[Figure: blocks A, B, C, D shown in different orders]
www.somethinkodd.com/.../2006/01/suffixtree.png
Proteome alignment done with LCS (longest common subsequence); top: Xcc; bottom: Xac
Blue: BBHs (bidirectional best hits) that are in the LCS; dark blue: BBHs not in the LCS;
red: Xac specifics; yellow: Xcc specifics
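The LCS step in the caption above can be sketched as follows. This is a minimal illustration, not the method used to produce the figure: it assumes each proteome has already been reduced to a list of BBH pair identifiers in genome order (the names `xcc_order` and `xac_order` and the identifiers are hypothetical), and it uses the textbook dynamic-programming algorithm.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of the sequences a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical gene orders: each label stands for one BBH pair.
xcc_order = ["g1", "g2", "g3", "g4", "g5"]
xac_order = ["g1", "g3", "g2", "g4", "g5"]

backbone = longest_common_subsequence(xcc_order, xac_order)
# BBHs outside the backbone correspond to the "dark blue" class in the caption.
rearranged = set(xcc_order) - set(backbone)
```

The table costs time and memory proportional to the product of the two list lengths, which is fine for a sketch but may need a more economical algorithm for whole proteomes.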
RSA 493
RSA 331
10 genomes
Orthology + Phylogeny
AG: ancestral (bellii [2], canadensis) TG: typhus (prowazekii, typhi)
TRG: transitional (akari, felis) SFG: spotted fever (rickettsii, conorii, sibirica)
How to find orthologs
• Desired features of ortholog clustering
– Ability to distinguish between in- and out-paralogs
• In-paralogs should be clustered with their orthologs
– Ability to cluster genes that have the same domain
architecture, rather than simply sharing just one domain
• Methods
– Phylogenetic trees
– BLAST
– MCL
– orthoMCL
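As a point of reference for the BLAST-based method listed above, here is a minimal sketch of bidirectional best hits (BBHs) between two proteomes. It assumes the BLAST results have already been parsed into `(query, subject, score)` tuples; the function names and the example hits are hypothetical. Note that plain BBH does not provide the in-paralog handling named as a desired feature, which is one reason to use a clustering tool such as orthoMCL instead.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Hypothetical parsed hits: (query, subject, bit score).
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 150.0), ("b2", "a2", 80.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this example `a2` picks `b2`, but `b2` prefers `a1`, so only the pair `("a1", "b1")` is reported.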
Gene Set Computations
• Given a set of genomes, represented by their
‘proteomes’ or sets of protein sequences
• Given homologous relationships (as given for
example by orthoMCL)
– Which genes are shared by genomes X and Y?
– Which genes are unique to genome Z?
– Venn or extended Venn diagrams
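The questions above reduce to set operations over ortholog groups. The sketch below assumes the groups are available as a dictionary from group identifier to a list of `(genome, gene)` members; this layout, the function names, and the example data are illustrative assumptions, not the output format of orthoMCL.

```python
from collections import Counter


def genomes_per_group(groups):
    """Map each group id to the set of genomes that have a member in it."""
    return {gid: {genome for genome, _ in members}
            for gid, members in groups.items()}


def shared_by(groups, genomes):
    """Groups with at least one member in every listed genome."""
    wanted = set(genomes)
    return sorted(gid for gid, present in genomes_per_group(groups).items()
                  if wanted <= present)


def unique_to(groups, genome):
    """Groups whose members all come from a single genome."""
    return sorted(gid for gid, present in genomes_per_group(groups).items()
                  if present == {genome})


def venn_counts(groups):
    """Count groups per presence pattern (one count per Venn region)."""
    return Counter(frozenset(present)
                   for present in genomes_per_group(groups).values())


# Hypothetical ortholog groups for genomes X, Y and Z.
groups = {
    "OG1": [("X", "x1"), ("Y", "y1"), ("Z", "z1")],
    "OG2": [("X", "x2"), ("Y", "y2")],
    "OG3": [("Z", "z2"), ("Z", "z3")],
    "OG4": [("Y", "y3")],
}

shared_xy = shared_by(groups, ["X", "Y"])
only_z = unique_to(groups, "Z")
regions = venn_counts(groups)
```

Genes that the clustering step leaves out of every group do not appear here, so a full "unique to genome Z" answer would also need the list of ungrouped genes.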
[Figure: Venn diagram of sets A and B]
[Figure: Genome 1, Genome 2, ..., Genome n → Ortholog set Builder (orthoMCL) → Script 1, Script 2 → report → annotators]
Replicon color key for HTML tables
[Table residue: column headers R/G, S4, C58, K84, R. etli, R. leguminosarum, S. meliloti, M. loti MAFF, M. loti BNC; row label II]
Pseudogenes
• Nonfunctional protein coding genes
• Mutations introduce “sequence problems”
(frameshifts, in-frame stop codons, absence of a stop codon)
• Natural mutation or sequencing error?
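The three "sequence problems" listed above can be screened for with a simple check on a candidate coding sequence. This is a rough heuristic sketch under stated assumptions: the input is a DNA string read in frame from its start codon, the stop codons are TAA, TAG and TGA (not valid for organisms that reassign TGA), and the function name and labels are hypothetical. It cannot tell a natural mutation from a sequencing error, and a length that is not a multiple of three is only a hint of a frameshift.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def sequence_problems(cds):
    """Return a list of problem labels for a candidate coding sequence."""
    seq = cds.upper()
    problems = []
    # A length that is not a multiple of 3 hints at an indel (frameshift).
    if len(seq) % 3 != 0:
        problems.append("frameshift?")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon before the last codon truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal_stop")
    # The last complete codon should be a stop codon.
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing_stop")
    return problems


intact = sequence_problems("ATGAAATAA")
truncated = sequence_problems("ATGTAAAAATAA")
unterminated = sequence_problems("ATGAAAAAA")
shifted = sequence_problems("ATGAAATA")
```

A screen like this only flags candidates; confirming a pseudogene usually also requires comparison against an intact homolog.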
Pseudogene cases
Why study pseudogenes?
• “Normal” bacterial genomes contain 1-5%
pseudogenes [Liu et al]
• Pseudogenes can give interesting clues to
evolutionary pathways
Why study pseudogenes? Cont’d
• High fractions of pseudogenes suggest a “genome
degradation” process
• May be cause or effect of niche restriction
• Examples
– Mycobacterium leprae: 36% (~1,100 genes)
– Leifsonia xyli subsp. xyli: 13% (~300 genes)
• Pseudogenes do not show up in BLAST searches
– Ortholog computations will in general not include them!
Pseudogene Identification by Sequence Similarity
Study of 8 Brucella Genomes
BLASTN: annotated pseudogenes vs. genome sequences
Total alignments    4120    0.98
Gene hits           2627    0.62
Pseudogenes         1493    0.35
Legend: Previously Known Pseudogene; Known Gene (Homologous to Pseudogene); Newly Identified Pseudogene
[Figure: bar chart per genome (Bab9941, BabS19, Bcan23365, Bmel16M, Bab2308, Bovi25840, Bsui1330, Bsui23445); series: Tot. Alignments, Known Genes, PG Count: Final]
Genomics is just the beginning
[Figure: levels arranged along a complexity axis: populations; whole organisms; cell processes; interactions between molecules; genomics/proteomics]
21st century Biology: integration
• Nalvo Almeida
• Chris Lasher
• Brett Tyler
• Rebecca Wattam