Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India
Department of Physical Sciences, Indian Institute of Science Education and Research-Kolkata, Mohanpur 741246, Nadia, West Bengal, India
Article info
Article history:
Received 13 January 2015
Received in revised form 25 March 2015
Accepted 26 March 2015
Available online 2 April 2015
Keywords:
Orphan genes
Evolutionary rate
Protein disorder
Interaction and trafficking motifs
Host–parasite interaction
Lineage-specific adaptation
Abstract
Orphan genes are protein-coding genes that lack recognizable homologs in other organisms. They have been reported to make up a considerable fraction of the coding regions in all sequenced genomes and are thought to be associated with an organism's lineage-specific traits. However, their evolutionary persistence and functional significance remain elusive. Because they lack homologs in the host genome and probably have lineage-specific functional roles, the orphan gene products of pathogenic protozoa might be considered possible therapeutic targets. Leishmania major is an important parasitic protozoan of the genus Leishmania that is associated with the disease cutaneous leishmaniasis. Evolutionary and functional characterization of orphan genes in this organism may therefore help in understanding the factors governing pathogen evolution and parasitic adaptation. In this study, we systematically identified the orphan genes of L. major and used several in silico analyses to understand their evolutionary and functional attributes. To trace the signatures of molecular evolution, we compared their evolutionary rate with that of non-orphan genes. In agreement with prior observations, we found that orphan genes evolve faster than non-orphan genes. The lower sequence conservation of orphan genes was previously attributed solely to their younger age. Here, however, we observed that, together with gene age, a number of genomic factors (such as expression level, GC content and variation in codon usage) and proteomic factors (such as protein length, intrinsic disorder content and hydropathicity) could independently modulate their evolutionary rate. We considered the interplay of all these factors and used regression analysis to estimate their relative contributions to protein evolutionary rate. At the functional level, we observed that orphan genes are associated with regulatory, growth-factor and transport-related processes. Moreover, these genes were found to be enriched in various types of interaction and trafficking motifs, implying their possible involvement in host–parasite interactions. Thus, our comprehensive analysis of L. major orphan genes provides evidence for their extensive roles in host–pathogen interactions and virulence.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Orphan genes are protein-coding genes that share no detectable sequence similarity with the genomes of other organisms (Tautz and Domazet-Loso, 2011). Because of their phylogenetic restriction, these genes are also called lineage-specific or taxonomically restricted genes (Wilson et al., 2005). Orphan genes comprise a considerable fraction of genes in all domains of life including
Abbreviations: L. major, Leishmania major; BLAST, Basic Local Alignment Search
Tool; GRAVY, grand average of hydropathy index; Nc, effective number of codons;
CAI, Codon Adaptation Index; FPKM, Fragments Per Kilobase of exon per Million
fragments mapped.
Corresponding author. Tel.: +91 33 2355 6626; fax: +91 33 2355 3886.
E-mail address: tapash@jcbose.ac.in (T.C. Ghosh).
http://dx.doi.org/10.1016/j.meegid.2015.03.031
1567-1348/© 2015 Elsevier B.V. All rights reserved.
novo from non-coding regions (Cai et al., 2008; Heinen et al., 2009;
Knowles and McLysaght, 2009; Neme and Tautz, 2013; Wu et al.,
2011; Xie et al., 2012; Yang and Huang, 2011). Orphan genes have also been found to arise from overlapping antisense reading frames and from frameshift mutations in protein-coding sequences (Wissler et al., 2013).
Orphan genes are emerging as critical players in the lineage-specific adaptation of different species to a broad range of ecological conditions (Khalturin et al., 2009). In plants, these genes have been reported to play substantial roles in the response to a variety of abiotic stresses (Donoghue et al., 2011). Important roles of orphan genes have also been shown in several developmental processes. For instance, orphan gene products were found to be crucial for early human brain development (Zhang et al., 2011) and for the regulation of tentacle formation in Hydra (Khalturin et al., 2008). Lineage-specific putative surface antigens of Plasmodium were shown to be involved in host–parasite interactions (Kuo and Kissinger, 2008). Zhang and Matlashewski (2010) ectopically expressed 14 Leishmania donovani-specific genes in Leishmania major and observed that two of these genes could increase L. major survival in visceral organs.
Studies of different eukaryotes have demonstrated that orphan genes evolve faster than non-orphan genes (Cai et al., 2006; Domazet-Loso and Tautz, 2003; Donoghue et al., 2011; Kuo and Kissinger, 2008; Toll-Riera et al., 2009). An inverse relationship between gene age and protein evolutionary rate has been widely observed in a broad range of organisms, including primates (Toll-Riera et al., 2009), mammals (Alba and Castresana, 2005), Drosophila (Domazet-Loso and Tautz, 2003), Plasmodium (Kuo and Kissinger, 2008), fungi (Cai et al., 2006) and bacteria (Daubin and Ochman, 2004). Because orphan genes are young within a particular lineage, it was hypothesized that they evolve faster mainly because of their recent evolutionary origin (Cai et al., 2006; Domazet-Loso and Tautz, 2003; Toll-Riera et al., 2009). It was later found that protein evolutionary rate is not determined by a single factor; rather, a protein's intrinsic properties and its evolutionary age independently modulate its rate of evolution (Toll-Riera et al., 2012). Protein evolutionary rate has been shown to correlate with a number of gene-level and protein-level attributes, such as expression level (Drummond et al., 2005; Drummond et al., 2006; Pal et al., 2001), number of protein–protein interactions (Fraser et al., 2002), protein complex number (Chakraborty and Ghosh, 2013), centrality in the protein interaction network (Hahn and Kern, 2005), protein dispensability (Hirsh and Fraser, 2001), sequence length (Marais and Duret, 2001), Codon Adaptation Index (CAI), effective number of codons (Nc) (Pal et al., 2001; Wall et al., 2005) and protein disorder content (Chen et al., 2011; Podder and Ghosh, 2010). Despite these findings, the factors determining the evolutionary rate of orphan genes are still under debate, and the relative contributions of different genomic and proteomic attributes to that rate remain elusive.
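The Codon Adaptation Index listed above is usually defined as the geometric mean, over a gene's codons, of each codon's relative adaptiveness w, where w is the codon's count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon for the same amino acid. The Python sketch below illustrates that definition only; it is not the tool used in this study, and the reference set, the 0.5 pseudocount for unobserved codons (a common convention) and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ... GGG.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def codons(seq):
    """Split an in-frame CDS into upper-case codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = codon count / count of the most-used synonymous codon in the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODE)
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, fam in families.items():
        if aa in "*MW":  # stop codons and single-codon amino acids carry no choice
            continue
        # Unobserved codons get a small pseudocount so that log(w) stays finite.
        obs = {c: (counts[c] if counts[c] > 0 else pseudocount) for c in fam}
        top = max(obs.values())
        for c in fam:
            w[c] = obs[c] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the gene's scorable codons (NaN if none)."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


ref = ["TTT" * 30 + "TTC" * 10]  # toy reference set: Phe encoded 3:1 by TTT
w = relative_adaptiveness(ref)
print(round(cai("TTTTTC", w), 4))
```

A gene made only of the preferred codons scores 1, and the score falls as rarer synonymous codons are used.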
With the availability of high-throughput genome sequences, expression data and bioinformatic prediction tools, it has become easier to identify and characterize orphan genes in different species. L. major is one of the most important protozoan parasites of the genus Leishmania. It is associated with the disease cutaneous leishmaniasis, affecting more than 2 million people throughout the world every year (Ivens et al., 2005). Despite multiple research efforts, no vaccine is yet available for this disease. Because they are absent from host genomes, the orphan gene products of pathogenic protozoa have been considered possible therapeutic targets (Kuo and Kissinger, 2008). Profiling the orphan genes of L. major in terms of protein evolutionary rate, comparing them with non-orphan genes and understanding their functional roles should therefore help to recognize the molecular signatures of parasitic adaptation. With this aim, we carried out a rigorous analysis of the functionality of orphan genes and investigated the evolutionary forces affecting their evolution. To evaluate the attributes of orphan genes in an evolutionary framework, we performed a comprehensive analysis comparing orphan genes with non-orphan genes. Our primary objective in this study is to characterize all the possible determinants that may have shaped the evolutionary rate of orphan genes in L. major. One of the main obstacles to such a study is the limited data available on orphan genes. We therefore considered several genomic and proteomic attributes that can be readily derived from coding sequences and analyzed their relative influence on the evolutionary rate heterogeneity between orphan and non-orphan genes.
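Two of the sequence-derived attributes considered here, genic GC content and protein hydropathicity, can be computed directly from a coding sequence and its translation. The Python sketch below computes GC content over unambiguous A/C/G/T bases and GRAVY as the mean Kyte-Doolittle hydropathy value. The function names are hypothetical, and skipping residues outside the 20 standard amino acids (such as a trailing stop '*') is one of several possible conventions, assumed here rather than taken from this study.

```python
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def gc_content(cds):
    """Fraction of G + C among unambiguous A/C/G/T bases (NaN if there are none)."""
    s = cds.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else float("nan")


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value over standard residues."""
    vals = [KD[aa] for aa in protein.upper() if aa in KD]
    return sum(vals) / len(vals) if vals else float("nan")


print(gc_content("ATGGCC"), gravy("MKV*"))
```

Protein length, the third protein-level attribute in this group, is simply the number of residues.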
Confirming earlier observations, our study revealed that orphan genes evolve faster than non-orphan genes (Domazet-Loso and Tautz, 2003; Toll-Riera et al., 2009). However, contrary to the suggestions of those studies, we found that gene age accounts for only a fraction of the variation in their evolutionary rate. Instead, together with gene age, a number of factors, such as gene expression, codon bias, genic GC content, protein hydropathicity, protein disorder content and protein length, were found to contribute substantially to the evolutionary rate difference between orphan and non-orphan genes. At the functional level, we found that the sequences of orphan genes are endowed with host targeting motifs, prenylation motifs, heparin-binding consensus sequences, signal peptides and transmembrane domains, implying possible roles in host–parasite interactions. Thus, our study of the orphan genes of L. major sheds light on the factors governing pathogen evolution and reveals their contribution to parasitic adaptation.
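Several of the motif classes named above are short sequence patterns that can be scanned with regular expressions. The Python sketch below uses simplified, assumed definitions: a C-terminal CaaX box (Cys, two aliphatic residues, any residue) for prenylation, and the Cardin-Weintraub heparin-binding consensus sequences XBBXBX and XBBBXXBX with B taken as Lys or Arg and X relaxed to any residue. These patterns and function names are illustrative, not the motif definitions or tools used in this study.

```python
import re

# Simplified, assumed patterns; published motif definitions vary.
# CaaX prenylation box at the C-terminus: Cys, two aliphatic residues, any residue.
CAAX = re.compile(r"C[AVILM]{2}[A-Z]$")
# Cardin-Weintraub consensus with B = K/R and X = any residue; the lookahead
# lets overlapping occurrences be reported.
HEPARIN = [re.compile(r"(?=(.[KR]{2}.[KR].))"),    # XBBXBX
           re.compile(r"(?=(.[KR]{3}..[KR].))")]   # XBBBXXBX


def has_caax(protein):
    """True if the protein (ignoring a trailing stop '*') ends in a CaaX box."""
    return CAAX.search(protein.upper().rstrip("*")) is not None


def heparin_sites(protein):
    """0-based start positions of (possibly overlapping) heparin-binding matches."""
    seq = protein.upper()
    return sorted({m.start() for pat in HEPARIN for m in pat.finditer(seq)})


print(has_caax("MSKGCVLS"), heparin_sites("GAKKAKAG"))
```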
2. Materials and methods
2.1. Collection of dataset and gene expression data
We retrieved the protein-coding sequences of L. major (strain Friedlin) from TriTrypDB version 7.0 (http://tritrypdb.org/tritrypdb/) (Aslett et al., 2010). CDS sequences containing internal stop codons or partial codons were removed using CodonW (http://codonw.sourceforge.net). Signal peptide, transmembrane domain, epitope, paralog and pathway information for all L. major genes was downloaded from TriTrypDB version 7.0. To compute gene expression levels, we retrieved the high-throughput RNA-seq expression profile of the L. major promastigote stage from the dataset of Rastrojo et al. (2013). We searched for protein domains with InterProScan (Zdobnov and Apweiler, 2001).
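The two preprocessing steps of this subsection can be illustrated in a few lines of Python. The sketch below is not CodonW: it simply rejects a CDS whose length is not a multiple of three (a partial codon) or that has a stop codon before its final codon, and it computes FPKM from the definition given in the abbreviations (fragments per kilobase of exon per million mapped fragments). The function names and inputs are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard-code stop codons


def is_clean_cds(cds):
    """True if the CDS has whole codons only and no in-frame stop before the last codon."""
    s = cds.upper()
    if len(s) == 0 or len(s) % 3 != 0:
        return False  # empty sequence or trailing partial codon
    inner = [s[i:i + 3] for i in range(0, len(s) - 3, 3)]  # every codon except the last
    return not any(c in STOPS for c in inner)


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon per Million fragments mapped."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


print(is_clean_cds("ATGAAATAA"), fpkm(100, 1000, 1_000_000))
```

The 10^9 factor combines the per-kilobase (10^3) and per-million (10^6) normalizations.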
2.2. Identification of orphan genes
To identify orphan gene models restricted to the Leishmania genus, we used a systematic homology-search approach. First, a BLASTP search followed by TBLASTN filtering (E < 10⁻⁵, with low-complexity filters) was run against the NCBI nr database. Additionally, to screen further for similarity between sequences, we employed Position-Specific Iterated BLAST (PSI-BLAST) (Altschul et al., 1997), which can detect weaker homologous relationships that would otherwise be missed by the standard BLAST algorithms.
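Conceptually, the homology screen described above reduces to a set difference: a gene is kept as an orphan when none of its hits outside the Leishmania genus passes the E-value cutoff. The minimal Python sketch below assumes the hits have already been parsed from tabular BLAST output into (query, subject genus, E-value) tuples; that parsing, the genus lookup, the tuple layout and the function name are assumptions, and the TBLASTN and PSI-BLAST rounds are not modeled.

```python
def find_orphans(query_ids, hits, genus="Leishmania", evalue_cutoff=1e-5):
    """Return, sorted, the queries with no significant hit outside `genus`.

    `hits` is an iterable of (query_id, subject_genus, evalue) tuples (assumed
    to be prepared upstream). Queries with no hits at all are orphans as well,
    which is why the full list of query IDs is passed in.
    """
    has_homolog = {q for q, subj_genus, e in hits
                   if e < evalue_cutoff and subj_genus != genus}
    return sorted(set(query_ids) - has_homolog)


hits = [("g1", "Trypanosoma", 1e-30),  # significant hit outside the genus
        ("g2", "Leishmania", 1e-50),   # hit only within the genus
        ("g3", "Homo", 0.01)]          # hit above the cutoff
print(find_orphans(["g1", "g2", "g3", "g4"], hits))
```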
2.3. Calculation of nucleotide substitution rate
The ratio of the rate of non-synonymous substitutions (dN) to the rate of synonymous substitutions (dS) has been widely used as an
For Gene Ontology (GO) annotations of orphan genes, we relied primarily on TriTrypDB v7.0. However, only 43 orphan genes had annotated GO terms there. For the remaining orphan genes in our dataset, we therefore predicted GO categories using the ProtFun 2.2 web server (http://www.cbs.dtu.dk/services/ProtFun/) (Jensen et al., 2003). ProtFun 2.2 is a homology-independent method that predicts protein function from physicochemical properties, so it was considered useful for predicting the function even of orphan genes (Yang et al., 2013).
Subcellular localization of orphan genes was predicted using two independent web servers: CELLO v.2.5 (http://cello.life.nctu.edu.tw/) (Yu et al., 2004) and SubCellProt (http://www.databases.niper.ac.in/SubCellProt) (Garg et al., 2009). CELLO predicts protein subcellular localization using a two-level support vector machine (SVM), whereas SubCellProt is based on two machine learning approaches, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN). When two of these three approaches (k-NN, PNN
Table 1
Evolutionary rate of L. major genes according to their phylogenetic distribution.
Phylogenetic class of L. major genes
dN/dS (mean ± SE)
P-values
0.55 ± 0.0068
<1 10
0.27 ± 0.0022
0.17 ± 0.0040
0.11 ± 0.0044
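Table 1 reports dN/dS ratios. To illustrate what the ratio measures, the Python sketch below follows the simple counting approach of the Nei-Gojobori type: the proportions of differing non-synonymous and synonymous sites are each corrected with the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3), and dN is divided by dS. This is a swapped-in textbook method, not necessarily the estimator behind Table 1 (the reference list cites Yang and Nielsen, 2000, whose model is more elaborate); the site and difference counts are assumed to come from a codon alignment prepared beforehand, and the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """dN/dS from counts of sites and observed differences of each class."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn / ds  # raises ZeroDivisionError when no synonymous differences exist


print(round(dn_ds(100, 300, 20, 15), 3))
```

Values well below 1, like those in Table 1, indicate purifying selection; the higher ratio for orphan genes means weaker constraint.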
Table 2
Categorical regression to illustrate independent influence of different variables on protein evolutionary rate.
Parameter
Protein level properties
Intrinsic disorder content
Protein length
Protein hydrophilicity
Gene age
Gene level properties
Expression level (FPKM)
CAI
GC content
Nc
β score
P-values
0.371
0.035
0.237
0.072
<1 10
0.108
0.270
0.256
0.330
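Table 2 reports coefficients from a categorical regression. As a simpler stand-in, the sketch below fits ordinary least squares on z-scored variables, so each coefficient is a standardized β that can be compared across predictors measured in different units (FPKM, CAI, GC content and so on). It is not categorical regression: no optimal scaling is applied to the predictors, and the function names are hypothetical. Only the Python standard library is used.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center and scale a list to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_betas(columns, y):
    """OLS slopes after z-scoring each predictor and the response (no intercept needed)."""
    cols = [zscore(c) for c in columns]
    yz = zscore(y)
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], yz)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X^T X) beta = X^T y


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 * a - 3 * b for a, b in zip(x1, x2)]
print(standardized_betas([x1, x2], y))
```

The sign of each β gives the direction of a predictor's independent effect and its magnitude the relative strength, which is how Table 2 is meant to be read.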
et al., 2003). Using ProtFun, we were able to predict GO annotations for 674 orphan genes. As in the study of Yang et al. in zebrafish (Yang et al., 2013), our analysis revealed a non-random distribution of orphan genes across functional categories. Growth factor was the most abundant functional category for orphan genes (29.2%), followed by transcription regulation (24.18%) and cellular transport (20.02%) (Table 4). The predicted functional annotations thus place the largest share of orphan genes in the growth factor category. Growth factors can stimulate cell growth and proliferation and are important for regulating a variety of cellular processes, suggesting that these genes could be involved in various biochemical pathways leading to lineage-specific adaptations of the parasite.
Table 3
Functional categorization of orphan genes as per annotated GO term in TriTrypDB.
Annotated GO function    Number of orphan genes
1
5
1
2
1
1
1
3
1
13
2
5
1
1
1
1
1
1
12
Note: In total, 43 orphan genes were assigned to GO terms in TriTrypDB. Some orphan genes were assigned to multiple GO functional terms.

Table 4
Functional category                  Number of orphan genes   Percentage of orphan genes
Growth_factor                        197                      29.22
Transcription_regulation             163                      24.18
Transporter                          135                      20.02
Structural_protein                   64                       9.49
Transcription                        51                       7.56
Central_intermediary_metabolism      15                       2.22
Receptor                             11                       1.63
Signal_transducer                    9                        1.33
Cation_channel                       8                        1.18
Ion_channel                          8                        1.18
Stress_response                      7                        1.03
Voltage-gated_ion_channel            6                        0.89

Prediction of protein subcellular localization is an important component of in silico prediction of protein function (Yu et al., 2006). Because computational prediction of the subcellular localization of proteins can be error-prone (Nair and Rost, 2003), we used two prediction servers for the orphan genes: CELLO (Yu et al., 2004) and SubCellProt (Garg et al., 2009), which are based on three different methods (k-NN, PNN and SVM). We assigned a subcellular localization to a protein if at least two of those three methods predicted the same location. Predictions from the two web servers consistently suggest that orphan genes are located mainly in the nucleus and the plasma membrane (Supplementary_dataset). Further gene ontology (GO) analysis of the plasma membrane orphan genes revealed that most of them are involved in processes such as transport, ion channel and voltage-gated ion channel activity.
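The two-out-of-three consensus rule used to assign subcellular localization amounts to a small majority vote. In the hypothetical Python sketch below, each prediction is a location label from one method (k-NN, PNN or SVM), with None for a missing call; the label strings are placeholders, and harmonizing the differing label vocabularies of CELLO and SubCellProt is assumed to happen beforehand.

```python
from collections import Counter


def consensus_location(predictions, min_votes=2):
    """Return the location predicted by at least `min_votes` methods, else None."""
    votes = Counter(p for p in predictions if p)  # ignore missing predictions
    if not votes:
        return None
    location, n = votes.most_common(1)[0]
    return location if n >= min_votes else None


print(consensus_location(["Nuclear", "Nuclear", "PlasmaMembrane"]))
```

With three voters and a threshold of two, at most one location can qualify, so ties cannot produce an ambiguous assignment.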
Involvement in metabolic pathways indicates that genes have important functional consequences for biosynthetic processes. We therefore investigated whether orphan genes have any role in the metabolic pathways of L. major, retrieving from TriTrypDB the list of L. major genes involved in various pathways. In this way, we found evidence for the involvement of three orphan genes in different biosynthetic pathways of L. major. For instance, one orphan gene (Gene ID: LmjF.36.4180) was found to be associated with the N-glycan biosynthesis pathway. Two other orphan genes (Gene IDs: LmjF.06.0780 and LmjF.35.0550) were found to be associated with the ascorbate and aldarate metabolism pathway, the ubiquinone and other terpenoid-quinone biosynthesis pathway, and the glycosaminoglycan degradation pathway. To search for their functional roles in those pathways, we considered their GO annotations and Enzyme Commission (EC) numbers. The GO process annotated in TriTrypDB indicates that LmjF.36.4180 is involved in the dolichol-linked oligosaccharide biosynthetic process of the N-glycan biosynthesis pathway (annotated GO function: UDP-N-acetylglucosamine-dolichyl-phosphate N-acetylglucosaminephosphotransferase activity; EC number 2.7.8.15, UDP-N-acetylglucosamine-dolichyl-phosphate N-acetylglucosaminephosphotransferase). Annotated GO terms and EC numbers are unavailable in TriTrypDB for LmjF.06.0780 and LmjF.35.0550, so we used EC numbers inferred from OrthoMCL (Li et al., 2003) for their functional assignment. These indicate that the two genes are possibly involved in glycosidase activity in those pathways (EC number inferred from OrthoMCL: 3.2.1.-, glycosidases, i.e. enzymes hydrolyzing O- and S-glycosyl compounds). Thus, our findings suggest that orphan genes of L. major could be integrated into its metabolic pathways and play important functional roles.
3.7. Orphan genes of L. major putatively involved in secretory pathways, host–parasite interactions and virulence
Host–pathogen interactions are a type of environmental interaction in which intracellular pathogens exploit host cells to ensure their survival and replication within the host (Tautz and Domazet-Loso, 2011). Virulence is one of the potential outcomes of host–pathogen interaction (Casadevall and Pirofski, 2001). The involvement of orphan genes in host–pathogen interactions may therefore suggest a crucial role in parasitic adaptation to host systems. To understand the role of orphan genes in host–parasite interactions, we investigated the presence of
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J.H., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Amaya, M., Baranova, A., van Hoek, M.L., 2011. Protein prenylation: a new mode of host–pathogen interaction. Biochem. Biophys. Res. Commun. 416, 1–6.
Arinaminpathy, Y., Khurana, E., Engelman, D.M., Gerstein, M.B., 2009. Computational analysis of membrane proteins: the largest class of drug targets. Drug Discovery Today 14, 1130–1135.
Aslett, M., Aurrecoechea, C., Berriman, M., Brestelli, J., Brunk, B.P., Carrington, M.,
Depledge, D.P., Fischer, S., Gajria, B., Gao, X., Gardner, M.J., Gingle, A., Grant, G.,
Harb, O.S., Heiges, M., Hertz-Fowler, C., Houston, R., Innamorato, F., Iodice, J.,
Kissinger, J.C., Kraemer, E., Li, W., Logan, F.J., Miller, J.A., Mitra, S., Myler, P.J.,
Nayak, V., Pennington, C., Phan, I., Pinney, D.F., Ramasamy, G., Rogers, M.B.,
Roos, D.S., Ross, C., Sivam, D., Smith, D.F., Srinivasamoorthy, G., Stoeckert Jr., C.J.,
Subramanian, S., Thibodeau, R., Tivey, A., Treatman, C., Velarde, G., Wang, H.,
2010. TriTrypDB: a functional genomic resource for the Trypanosomatidae.
Nucleic Acids Res. 38, D457–D462.
Bhattacharjee, S., Stahelin, R.V., Speicher, K.D., Speicher, D.W., Haldar, K., 2012. Endoplasmic reticulum PI(3)P lipid binding targets malaria proteins to the host cell. Cell 148, 201–212.
Botzman, M., Margalit, H., 2011. Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles. Genome Biol. 12, 109.
Cai, J., Zhao, R., Jiang, H., Wang, W., 2008. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179, 487–496.
Cai, J.J., Woo, P.C.Y., Lau, S.K.P., Smith, D.K., Yuen, K.-Y., 2006. Accelerated evolutionary rate may be responsible for the emergence of lineage-specific genes in Ascomycota. J. Mol. Evol. 63, 1–11.
Casadevall, A., Pirofski, L.A., 2001. Host–pathogen interactions: the attributes of virulence. J. Infect. Dis. 184, 337–344.
Chakraborty, S., Ghosh, T.C., 2013. Evolutionary rate heterogeneity of core and attachment proteins in yeast protein complexes. Genome Biol. Evol. 5, 1366–1375.
Chen, S.C.-C., Chuang, T.-J., Li, W.-H., 2011. The relationships among microRNA regulation, intrinsically disordered regions, and other indicators of protein evolutionary rate. Mol. Biol. Evol. 28, 2513–2520.
Daubin, V., Ochman, H., 2004. Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli. Genome Res. 14, 1036–1042.
de Castro Cortes, L.M., de Souza Pereira, M.C., da Silva, F.S., Santini Pereira, B.A., de
Oliveira Junior, F.O., de Araujo Soares, R.O., Brazil, R.P., Toma, L., Vicente, C.M.,
Nader, H.B., Madeira, M.d.F., Bello, F.J., Alves, C.R., 2012. Participation of heparin
binding proteins from the surface of Leishmania (Viannia) braziliensis
promastigotes in the adhesion of parasites to Lutzomyia longipalpis cells (Lulo)
in vitro. Parasit. Vectors 5, 142.
Domazet-Loso, T., Tautz, D., 2003. An evolutionary analysis of orphan genes in Drosophila. Genome Res. 13, 2213–2219.
Domazet-Loso, T., Tautz, D., 2010. Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biol. 8, 66.
Donoghue, M.T.A., Keshavaiah, C., Swamidatta, S.H., Spillane, C., 2011. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 11, 47.
Dosztanyi, Z., Csizmok, V., Tompa, P., Simon, I., 2005. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21, 3433–3434.
Drummond, D.A., Bloom, J.D., Adami, C., Wilke, C.O., Arnold, F.H., 2005. Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. U.S.A. 102, 14338–14343.
Drummond, D.A., Raval, A., Wilke, C.O., 2006. A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23, 327–337.
Dumonteil, E., 2009. Vaccine development against Trypanosoma cruzi and Leishmania species in the post-genomic era. Infect. Genet. Evol. 9, 1075–1082.
Dunker, A.K., Brown, C.J., Lawson, J.D., Iakoucheva, L.M., Obradovic, Z., 2002. Intrinsic disorder and protein function. Biochemistry 41, 6573–6582.
Dyson, H.J., Wright, P.E., 2005. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 6, 197–208.
Fraser, H.B., Hirsh, A.E., Steinmetz, L.M., Scharfe, C., Feldman, M.W., 2002. Evolutionary rate in the protein interaction network. Science 296, 750–752.
Garg, P., Sharma, V., Chaudhari, P., Roy, N., 2009. SubCellProt: predicting protein subcellular localization using machine learning approaches. In Silico Biol. 9, 35–44.
Gouy, M., Gautier, C., 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10, 7055–7074.
Gupta, A., Kapil, R., Dhakan, D.B., Sharma, V.K., 2014. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data. PLoS ONE 9, e93907.
Hahn, M.W., Kern, A.D., 2005. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol. Biol. Evol. 22, 803–806.
Heinen, T.J.A.J., Staubach, F., Haeming, D., Tautz, D., 2009. Emergence of a new gene from an intergenic region. Curr. Biol. 19, 1527–1531.
Hirsh, A.E., Fraser, H.B., 2001. Protein dispensability and rate of evolution. Nature 411, 1046–1049.
Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Ivens, A.C., Peacock, C.S., Worthey, E.A., Murphy, L., Aggarwal, G., Berriman, M., Sisk,
E., Rajandream, M.A., Adlem, E., Aert, R., Anupama, A., Apostolou, Z., Attipoe, P.,
Bason, N., Bauser, C., Beck, A., Beverley, S.M., Bianchettin, G., Borzym, K., Bothe,
G., Bruschi, C.V., Collins, M., Cadag, E., Ciarloni, L., Clayton, C., Coulson, R.M.R.,
Piani, A., Ilg, T., Elefanty, A.G., Curtis, J., Handman, E., 1999. Leishmania major proteophosphoglycan is expressed by amastigotes and has an immunomodulatory effect on macrophage function. Microbes Infect. 1, 589–599.
Podder, S., Ghosh, T.C., 2010. Exploring the differences in evolutionary rates between monogenic and polygenic disease genes in human. Mol. Biol. Evol. 27, 934–941.
Rastrojo, A., Carrasco-Ramiro, F., Martin, D., Crespillo, A., Reguera, R.M., Aguado, B.,
Requena, J.M., 2013. The transcriptome of Leishmania major in the axenic
promastigote stage: transcript annotation and relative expression levels by
RNA-seq. BMC Genomics 14, 223.
Sharp, P.M., Tuohy, T.M.F., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143.
Silverman, J.M., Clos, J., deOliveira, C.C., Shirvani, O., Fang, Y., Wang, C., Foster, L.J., Reiner, N.E., 2010. An exosome-based secretion pathway is responsible for protein export from Leishmania and communication with macrophages. J. Cell Sci. 123, 842–852.
Tautz, D., Domazet-Loso, T., 2011. The evolutionary origin of orphan genes. Nat. Rev. Genet. 12, 692–702.
Toll-Riera, M., Bosch, N., Bellora, N., Castelo, R., Armengol, L., Estivill, X., Mar Alba, M., 2009. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 26, 603–612.
Toll-Riera, M., Bostick, D., Mar Alba, M., Plotkin, J.B., 2012. Structure and age jointly influence rates of protein evolution. PLoS Comput. Biol. 8, e1002542.
Uversky, V.N., Gillespie, J.R., Fink, A.L., 2000. Why are natively unfolded proteins unstructured under physiologic conditions? Proteins 41, 415–427.
Vishnoi, A., Kryazhimskiy, S., Bazykin, G.A., Hannenhalli, S., Plotkin, J.B., 2010. Young proteins experience more variable selection pressures than old proteins. Genome Res. 20, 1574–1581.
Wall, D.P., Hirsh, A.E., Fraser, H.B., Kumm, J., Giaever, G., Eisen, M.B., Feldman, M.W., 2005. Functional genomic analysis of the rates of protein evolution. Proc. Natl. Acad. Sci. U.S.A. 102, 5483–5488.
Wilson, G.A., Bertrand, N., Patel, Y., Hughes, J.B., Feil, E.J., Field, D., 2005. Orphans as taxonomically restricted and ecologically important genes. Microbiology-Sgm 151, 2499–2501.
Wissler, L., Gadau, J., Simola, D.F., Helmkampf, M., Bornberg-Bauer, E., 2013. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 5, 439–455.
Wright, F., 1990. The effective number of codons used in a gene. Gene 87, 23–29.
Wu, D.-D., Irwin, D.M., Zhang, Y.-P., 2011. De novo origin of human protein-coding
genes. PLoS Genet. 7, e1002379.
Xia, Y., Franzosa, E.A., Gerstein, M.B., 2009. Integrated assessment of genomic
correlates of protein evolutionary rate. PLoS Comput. Biol. 5, e1000413.
Xie, C., Zhang, Y.E., Chen, J.-Y., Liu, C.-J., Zhou, W.-Z., Li, Y., Zhang, M., Zhang, R., Wei, L., Li, C.-Y., 2012. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet. 8, e1002942.
Yang, L., Zou, M., Fu, B., He, S., 2013. Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish. BMC Genomics 14, 65.
Yang, Z., Huang, J., 2011. De novo origin of new genes with introns in Plasmodium vivax. FEBS Lett. 585, 641–644.
Yang, Z.H., Nielsen, R., 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43.
Yin, Y., Fischer, D., 2008. Identification and investigation of ORFans in the viral world. BMC Genomics 9, 24.
Yu, C.-S., Chen, Y.-C., Lu, C.-H., Hwang, J.-K., 2006. Prediction of protein subcellular localization. Proteins 64, 643–651.
Yu, C.S., Lin, C.J., Hwang, J.K., 2004. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci. 13, 1402–1406.
Zdobnov, E.M., Apweiler, R., 2001. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848.
Zhang, F.L., Casey, P.J., 1996. Protein prenylation: molecular mechanisms and functional consequences. Annu. Rev. Biochem. 65, 241–269.
Zhang, W.-W., Matlashewski, G., 2010. Screening Leishmania donovani-specific genes required for visceral infection. Mol. Microbiol. 77, 505–517.
Zhang, Y.E., Landback, P., Vibranovski, M.D., Long, M., 2011. Accelerated recruitment of new brain development genes into the human genome. PLoS Biol. 9, e1001179.