and proteome annotation using automatically recognized concepts and functional networks
Adrian Bivol, Tobias Wittkop, Darcy Davis, and Sean Mooney
Mooney laboratory, Buck Institute for Research on Aging, Novato, CA
National Center for Biomedical Ontology, Stanford University, Stanford, CA
Typically uses Gene Ontology (GO) or disease annotation (e.g. OMIM)
Many tools use a similar set of features/networks, e.g. PPI networks, co-expression networks, sequence similarity, ...
Input: set of genes with known function/disease
Output: ranked list of remaining genes (closest at the top)
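The input/output contract above can be illustrated with a minimal guilt-by-association sketch. This is not any specific tool's method: the scoring rule (sum of edge weights to the seed set), the function name `rank_candidates`, and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict

def rank_candidates(edges, seeds):
    """Rank non-seed genes by total edge weight to the seed set (closest first)."""
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    seeds = set(seeds)
    scores = {}
    for gene in neighbors:
        if gene in seeds:
            continue
        # Assumed scoring rule: sum the weights of edges that touch a seed gene
        scores[gene] = sum(w for other, w in neighbors[gene].items() if other in seeds)
    # Highest score first; gene name breaks ties so the order is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy network of (gene, gene, weight) edges
toy_edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.4),
    ("G2", "G3", 0.7),
    ("G3", "G4", 0.2),
    ("G4", "G5", 0.8),
]
ranking = rank_candidates(toy_edges, seeds=["G1", "G2"])
for gene, score in ranking:
    print(gene, round(score, 2))
```

In this toy case G3 ranks first because it is the only candidate directly connected to the seed genes. Real tools combine several networks and use more robust scores.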
1. Annotate all (human) genes to terms from ontologies outside GO and OMIM, e.g. Phenotype Ontology, CHEBI, or Pathway Ontology.
2. For each term (gene set), evaluate predictability, i.e. how well we can predict the genes annotated to it using existing gene function prediction methods.
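One common way to quantify predictability for a term is the area under the ROC curve (AUC) of a ranked gene list, with the held-out genes annotated to the term as positives. The sketch below assumes a ranking without ties; the function name `auc_from_ranking` and the gene labels are hypothetical.

```python
def auc_from_ranking(ranked_genes, positives):
    """AUC = fraction of (positive, negative) pairs in which the positive ranks higher.

    ranked_genes is ordered best-first and is assumed to contain no ties.
    """
    positives = set(positives)
    n_pos = sum(1 for gene in ranked_genes if gene in positives)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    correct_pairs = 0
    negatives_below = n_neg  # negatives not yet seen while walking down the list
    for gene in ranked_genes:
        if gene in positives:
            correct_pairs += negatives_below
        else:
            negatives_below -= 1
    return correct_pairs / (n_pos * n_neg)

# Hypothetical ranking for one term; A and C are the held-out annotated genes
example_auc = auc_from_ranking(["A", "B", "C", "D", "E"], positives={"A", "C"})
perfect_auc = auc_from_ranking(["A", "C", "B", "D", "E"], positives={"A", "C"})
print(example_auc, perfect_auc)
```

An AUC near 1 means the annotated genes are ranked almost entirely above the others; 0.5 corresponds to a random ordering.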
NCBO currently includes over 250 ontologies
Ontologies are structured controlled vocabularies
Gene/protein summaries in Entrez Gene and UniProt are often more up to date than manually curated GO
NCBO provides an Annotator service that matches text to terms
1  Q147X3 (human)
2  The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). ; Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle. ; A quantitative atlas of mitotic phosphorylation. ; A synopsis of eukaryotic Nalpha-terminal acetyltransferases: nomenclature, subunits and substrates. ; Knockdown of human N alpha-terminal acetyltransferase complex C leads to p53-dependent apoptosis and aberrant human Arl8b localization. ; Lysine acetylation targets protein complexes and co-regulates major cellular functions. ; -!- FUNCTION: Catalytic subunit of the N-terminal acetyltransferase C (NatC) complex. Catalyzes acetylation of the N-terminal methionine residues of peptides beginning with Met-Leu-Ala and Met-Leu-Gly. Necessary for the lysosomal localization and function of ARL8B. -!- CATALYTIC ACTIVITY: Acetyl-CoA + peptide = N(alpha)-acetylpeptide + CoA. -!- SUBUNIT: Component of the N-terminal acetyltransferase C (NatC) complex, which is composed of NAA35, LSMD1 and NAA30. -!- SUBCELLULAR LOCATION: Cytoplasm. -!- ALTERNATIVE PRODUCTS: Event=Alternative splicing; Named isoforms=2; Name=1; IsoId=Q147X3-1; Sequence=Displayed; Name=2; IsoId=Q147X3-2; Sequence=VSP_031581; Note=No experimental confirmation available; -!- SIMILARITY: Belongs to the acetyltransferase family. MAK3 subfamily. -!- SIMILARITY: Contains 1 N-acetyltransferase domain.
Gene Ontology
2. Collect descriptive text for each gene/protein from Entrez Gene/UniProt
3. Annotate text to over 200 ontologies via NCBO Annotator
3  [Figure: ontology terms shown: Biological process; Cytokinetic process; DNA replication initiation; Apoptosis; Signaling; Cell cycle ontology; Biological continuant; Acetyltransferase]
Q147X3 (human): the same entry text as above, followed by its database cross-references:
-. KEGG; hsa:122830; -. UCSC; uc001xcx.2; human. CTD; 122830; -. GeneCards; GC14P038022; -. H-InvDB; HIX0011696; -. HGNC; HGNC:19844; NAA30. neXtProt; NX_Q147X3; -. PharmGKB; PA134931315; -. eggNOG; prNOG15463; -. GeneTree; ENSGT00390000005665; -. HOGENOM; HBG282398; -. HOVERGEN; HBG082671; -. InParanoid; Q147X3; -. OMA; AGVHSGE; -. OrthoDB; EOG4KKZ4S; -. PhylomeDB; Q147X3; -. NextBio; 81013; -. ArrayExpress; Q147X3; -. Bgee; Q147X3; -. CleanEx; HS_NAT12; -. Genevestigator; Q147X3; -. GO; GO:0005737; C:cytoplasm; IEA:UniProtKB-SubCell. GO; GO:0004596; F:peptide alpha-N-acetyltransferase activity; IEA:EC. InterPro; IPR000182; AcTrfase_GCN5-related_dom. InterPro; IPR016181; Acyl_CoA_acyltransferase. Gene3D; G3DSA:3.40.630.30; Acyl_CoA_acyltransferase; 1. Pfam; PF00583; Acetyltransf_1; 1. SUPFAM; SSF55729; Acyl_CoA_acyltransferase; 1. PROSITE; PS51186; GNAT; 1.
Protein complexes, domains, interactions
We filter for author names, db names, numbers
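A rough sketch of such a filter is shown below. The slides do not describe how the filtering was implemented, so the stoplists, the rule of dropping every token that contains a digit, and the name `filter_tokens` are all assumptions.

```python
import re

# Hypothetical stoplists; the slides do not say which names were filtered
DB_NAMES = {"kegg", "ucsc", "genecards", "hgnc", "pfam", "interpro"}
AUTHOR_NAMES = {"smith", "doe"}

def filter_tokens(text):
    """Drop number-bearing tokens, database names and author names before annotation."""
    kept = []
    for token in re.findall(r"[A-Za-z0-9_:.\-]+", text):
        bare = token.strip(".:-_")
        if not bare:
            continue
        # Assumed rule: any digit marks a number or an accession-like identifier
        if any(ch.isdigit() for ch in bare):
            continue
        if bare.lower() in DB_NAMES or bare.lower() in AUTHOR_NAMES:
            continue
        kept.append(bare)
    return " ".join(kept)

sample = "KEGG; hsa:122830; Pfam; PF00583; Acetyltransf_1; acetyltransferase family"
cleaned = filter_tokens(sample)
print(cleaned)
```

The digit rule is deliberately crude: it would also discard informative tokens such as p53-dependent, so a real filter would need a finer distinction between identifiers and gene names.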
Simple string matching using mgrep
Synonyms are annotated
Annotations are propagated to the root
No NLP
Very fast
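The three ideas on this slide (plain string matching, synonym handling, propagation to the root) can be sketched in a few lines. This is a plain-Python stand-in, not mgrep itself; the mini-ontology, the synonym table, and the function names are hypothetical.

```python
import re

# Hypothetical mini-ontology: term -> parent (None marks the root)
PARENT = {
    "biological process": None,
    "cell death": "biological process",
    "apoptosis": "cell death",
}

# Hypothetical label table: surface string -> preferred term (synonyms included)
LABELS = {
    "apoptosis": "apoptosis",
    "apoptotic process": "apoptosis",
    "cell death": "cell death",
}

def match_terms(text):
    """Return the terms whose label or synonym occurs as a whole phrase in the text."""
    lowered = text.lower()
    hits = set()
    for label, term in LABELS.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            hits.add(term)
    return hits

def propagate_to_root(terms):
    """Add every ancestor of each matched term, up to the root."""
    closed = set()
    for term in terms:
        while term is not None and term not in closed:
            closed.add(term)
            term = PARENT[term]
    return closed

direct = match_terms("Knockdown leads to p53-dependent apoptosis")
annotations = propagate_to_root(direct)
print(sorted(annotations))
```

Because there is no linguistic analysis, negated or speculative mentions are matched just like positive ones; that is the trade-off for speed.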
3  [Figure: ontology terms shown: Gene Ontology (Biological process, Molecular function, Cellular function); Cytokinetic process; DNA replication initiation; Apoptosis; Signaling; Cell cycle ontology; Biological continuant; Acetyltransferase]
Annotation results
683,753,623 annotations of 426,392 genes and proteins to 529,544 terms from 267 ontologies for 7 organisms (human, mouse, rat, fly, worm, yeast, E. coli)
For human:
94,844,772 annotations of 43,823 genes to 436,576 terms
146,221,448 annotations of 68,079 proteins to 373,222 terms
Availability:
Find closest genes in genome
Fast, accurate and can be executed locally
Examples
GO:0072599 (establishment of protein localization in endoplasmic reticulum) Gene Ontology AUC = 0.99
Conclusions
Special Thanks to ...
Buck Institute for Research on Aging
Adrian Bivol, Darcy Davis, Emily TerAvest, Uday Evani, Ari Berman, Tal Oron Ronnen, Mathew Fleisch, Corey Powell
Funding
NIH R01 LM009722 (PI:Mooney), Stanford University National Center for Biomedical Ontology U54 HG004028, and the Buck Trust.