
Epitope Predictions

Roman Kogay, Nazarbayev University, Astana, Republic of Kazakhstan


Christian Schönbach, Nazarbayev University, Astana, Kazakhstan
© 2018 Elsevier Inc. All rights reserved.

Abbreviations

ACO Ant colony optimization
ANN Artificial neural network
AROC Area under receiver operating characteristic curve
MCC Matthews correlation coefficient
SVM Support vector machine
SVR Support vector regression
WEKA Waikato environment for knowledge analysis

Introduction

A hallmark of the adaptive immune system is the discrimination of self from non-self, which is recognized as antigenic determinants or epitopes, through concerted activation of antibody- and cell-mediated primary or memory responses. A complex network of molecular and cellular interactions that includes antigen presenting cells, T cells and B cells regulates the steps from antigen processing, epitope presentation and recognition to memory recall (Litman et al., 2010; Frank, 2002). One key question in epitope prediction is whether a predicted epitope will be immunogenic. Indeed, theoretical estimates for MHC class I restricted T cell epitopes indicate that only 1 in 2000 peptides of non-self antigens may induce a dominant CTL response (Yewdell and Bennink, 1999).

T Cell Epitopes
Antigen presenting cells process proteins into peptides that, if recognized by T cells, are called T cell epitopes. Two distinct pathways, comprehensively reviewed by Blum et al. (2013), facilitate the processing of exogenous and endogenous (self and foreign) proteins into peptides. Most peptides generated by proteolysis through the 26S proteasome are transported by TAP (transporter associated with antigen processing) into the endoplasmic reticulum where they bind to MHC class I molecules. If the affinity of the peptides to MHC class I molecules is sufficiently high, stable peptide-MHC-I complexes are transported through the Golgi apparatus to the cell surface where they are recognized by T-cell receptors (TCR) of CD8+ T cells (Fig. 1). In contrast, MHC class II molecules usually bind in the endosome to peptides derived from lysosomal proteolysis of exogenous proteins trafficked by phago- and endocytosis. Peptide-MHC II complexes are then transported in endosomal vesicles to the cell surface where they are recognized by TCR expressed on CD4+ T cells (Fig. 2). Although the binding of the peptides to MHC is crucial in defining whether peptides may become epitopes, cross-presentation of peptides generated from phagocytosed exogenous proteins by MHC class I, or of endogenous proteins processed in the lysosome by MHC class II, complicates matters in epitope prediction-assisted vaccine design. Cross-priming of naïve CD8+ T cells mediated by cross-presentation is just one example where the choice of epitopes in a vaccine will affect primary and secondary T-cell responses (Grotzke et al., 2017). Since epitope-based vaccines are meant to mimic the natural protective immunity that activates the functions of both T and B cells (Fig. 3), we also need to consider B cell epitopes.

Fig. 1 Processing, presentation and recognition of MHC class I-restricted T cell epitopes. Modified from Fig 17.21 (a) in Karp, G., 2008. Cell and Molecular Biology. Concepts and Experiments, fifth ed. Asia: John Wiley & Sons (Asia) Pte. Ltd.

Fig. 2 Processing, presentation and recognition of MHC class II restricted T cell epitopes. Modified from Fig 17.21 (a) in Karp, G., 2008. Cell and Molecular Biology. Concepts and Experiments, fifth ed. Asia: John Wiley & Sons (Asia) Pte. Ltd.

Encyclopedia of Bioinformatics and Computational Biology doi:10.1016/B978-0-12-809633-8.20248-3

B Cell Epitopes
B cell epitopes do not require processing and are predominantly of conformational or discontinuous nature (Barlow et al., 1986; Greenbaum et al., 2007). The B cell epitope is a specific 3D surface area of an antigen that is recognized by the paratope, the antigen-binding part of the antibody (Sela-Culang et al., 2013). Antibodies represent the secreted form of B-cell receptors (hence B cell epitopes) and are produced by mature B cells, the plasma cells (Fig. 3). The antibodies produced by plasma cells are the result of a somatic hypermutation process of the complementarity-determining regions (CDRs) that increases the antibody affinity to the antigen (affinity maturation) (Schroeder and Cavacini, 2010; Neu and Wilson, 2016). Unlike in T cell epitope prediction, the conformational nature of B cell epitopes and the conformational diversity of antibodies themselves, in addition to conformational changes induced in both antibodies and epitopes when they bind to each other, pose an additional challenge in identifying immunogenic candidate B cell epitopes. Unsurprisingly, advances in predicting B cell epitopes are less pronounced than for T cell epitopes, although the problem of conformational epitopes has been known since 1986 when the first crystal structure of a lysozyme-antibody complex was published (Barlow et al., 1986).

Background

Immunogenicity of Epitopes
In both T and B cell epitope predictions of pathogen-derived proteins the goal is to identify potential immunogenic epitopes that induce a protective immune response. In the case of T cell epitopes, peptides that are predicted to bind with high affinity (good binders) to MHC molecules are indeed very often immunogenic. Yet the MHC binding affinity remains a surrogate that does not provide clues about the type of T cell response an epitope may trigger. Co-stimulatory signals mediated by expression of various members of the cluster of differentiation (CD) family on both antigen presenting cells and T cells will determine which signaling pathways downstream of the TCR-peptide-MHC complex are activated, and hence whether the epitope triggers an effector T cell response or a suppressive, regulatory T cell response. Thus, future assessments of the immunogenicity of predicted epitopes, particularly in the context of human vaccine development, should take into account data derived from systems biology approaches (Querec et al., 2009) and biomarker assays (Rappuoli and Aderem, 2011) on the context, dynamics and quality of a protective immune response to further boost the efficacy of data-driven predictions.

Fig. 3 Interaction with activated T helper cells activates B cells, which differentiate into memory and plasma cells. The latter secrete antigen-specific antibodies. Modified from Fig 17.10 (a) in Karp, G., 2008. Cell and Molecular Biology. Concepts and Experiments, fifth ed. Asia: John Wiley & Sons (Asia) Pte. Ltd.

Overview of Epitope Predictions

Epitope predictions utilize statistical, machine learning or structural models derived from experimental data as reviewed by Soria-Guerra et al. (2015). Early epitope prediction methods such as BIMAS (Parker et al., 1994) and EpiMatrix (Schafer et al., 1998) were based on quantitative matrices (QM), whereas SYFPEITHI (Rammensee et al., 1999) was based on binding motifs; both approaches capture simple linear relationships in binding and non-binding data. Both are limited by the amount of experimental binding and non-binding data available for a specific MHC allotype, from which either a binding motif is derived based on the amino acids observed at peptide residue positions, or a peptide's binding potential is scored using position-dependent coefficients of amino acids. The limitations of quantitative matrices, including overfitting, are balanced in practice by their easy implementation and use combined with an acceptable accuracy for selected MHC allotypes (Schönbach et al., 2002; De Groot et al., 2002).
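
To make the idea of position-dependent coefficients concrete, the following minimal Python sketch scores a peptide by summing one coefficient per residue position and compares the total with a cutoff. The matrix values, the three-residue frame, the additive scoring and the threshold are all hypothetical choices made for illustration; they are not taken from BIMAS, EpiMatrix or any published matrix.

```python
# Hypothetical position-specific coefficients for a toy 3-residue peptide frame.
# Real quantitative matrices cover all 20 amino acids at every position
# (typically 9 positions for MHC class I); these numbers are invented.
TOY_MATRIX = [
    {"A": 0.2, "L": 1.5, "K": -0.8},   # position 1
    {"A": 0.0, "L": 0.4, "K": 0.1},    # position 2
    {"A": -0.3, "L": 2.0, "K": -1.2},  # position 3
]
DEFAULT_COEFFICIENT = 0.0  # assumption: residues missing from a column add nothing


def score_peptide(peptide, matrix=TOY_MATRIX):
    """Sum the position-dependent coefficient of each residue (additive model)."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the matrix length")
    return sum(
        column.get(residue, DEFAULT_COEFFICIENT)
        for residue, column in zip(peptide, matrix)
    )


def is_predicted_binder(peptide, threshold=2.0, matrix=TOY_MATRIX):
    """Call a peptide a binder when its score reaches a (hypothetical) cutoff."""
    return score_peptide(peptide, matrix) >= threshold


if __name__ == "__main__":
    for pep in ("LAL", "KAK", "ALA"):
        print(pep, round(score_peptide(pep), 2), is_predicted_binder(pep))
```

Because every coefficient has to be estimated from binding data for one allotype, a matrix built from few peptides can fit those peptides well and still generalize poorly, which is the overfitting risk noted above.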
The continuous growth of data enabled the application of more sophisticated methods utilizing artificial neural networks (ANN) (Adams and Koziol, 1995), hidden Markov models (HMM) ranging from fully connected HMMs to profile HMMs (Mamitsuka, 1998; Brusic et al., 2002; Zhang et al., 2006; Larsen et al., 2006), and support vector machines (SVM) (Zhang et al., 2007; Jacob and Vert, 2007; Chen et al., 2007) that are adept at extracting and processing complex nonlinear relationships in peptide-MHC binding or B cell epitope data to derive predictive models. The predictive power of these models largely depends on the quantity and quality of (unbiased) annotated data. The Immune Epitope Database (IEDB) (Vita et al., 2014), a manually curated database of experimentally characterized immune epitopes, has been instrumental since 2005 in providing high-quality data for training and testing datasets. Successful application of these data-driven methods, which reduced experimental time and costs, spurred further improvements in the accuracy of CTL epitope predictions by combining different individual prediction models and integrating their top-predicting scores into a consensus score (consensus methods) (Moutaftsi et al., 2006). Similarly, integrating predictions for distinct processes such as proteasomal processing, TAP binding and peptide-MHC binding, while decreasing the dependency on MHC allotype-specific binding data, as implemented in NetCTLpan (Stranzl et al., 2010), improved the predictive power.
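
One simple way to form a consensus from several predictors is sketched below: each method's scores are converted to ranks and the median rank per peptide is reported. This rank-median scheme, the method names and all scores are assumptions made for illustration and are not necessarily the procedure used by Moutaftsi et al. (2006) or by any of the tools discussed here.

```python
from statistics import median


def ranks_from_scores(scores):
    """Map each peptide to its rank (1 = best) given higher-is-better scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {peptide: position for position, peptide in enumerate(ordered, start=1)}


def consensus_rank(predictions):
    """Median rank of every peptide across all methods.

    `predictions` maps a method name to a {peptide: score} dictionary.
    All methods are assumed to have scored the same set of peptides.
    """
    per_method_ranks = [ranks_from_scores(s) for s in predictions.values()]
    peptides = per_method_ranks[0].keys()
    return {p: median(r[p] for r in per_method_ranks) for p in peptides}


if __name__ == "__main__":
    # Hypothetical scores from three unnamed predictors (higher = stronger binder).
    toy_predictions = {
        "method_a": {"PEP1": 0.9, "PEP2": 0.4, "PEP3": 0.1},
        "method_b": {"PEP1": 0.7, "PEP2": 0.8, "PEP3": 0.2},
        "method_c": {"PEP1": 0.6, "PEP2": 0.3, "PEP3": 0.5},
    }
    ranked = sorted(consensus_rank(toy_predictions).items(), key=lambda kv: kv[1])
    for peptide, rank in ranked:
        print(peptide, rank)
```

Working on ranks rather than raw scores sidesteps the fact that different predictors report values on different scales, at the cost of discarding the size of the score differences.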
Structure-based methods (Patronov and Doytchinova, 2013) yield high-quality quantitative predictions of peptide-MHC binding affinities based on Gibbs free energy calculations that take into account the interactions and distances between atoms of peptide and MHC amino acid residues. Threading of peptides predicted to bind to MHC is based on the computationally intensive free energy calculations of individual interactions (Logean and Rognan, 2002) or pairwise interactions using an energy potential matrix (Schueler-Furman et al., 2000). Quantitative structure-activity relationship (QSAR) methods, which are employed in drug discovery on the assumption that similar molecular structures result in similar activities, analyze differences in free energy and structures (Doytchinova and Flower, 2002). The limited number of available high-quality structures restricts the applicability of structure-based methods, particularly for B cell epitope predictions.
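
The pairwise-potential variant of threading can be pictured as a table lookup: for every contact between a peptide position and an MHC residue, an energy term is read from a residue-pair matrix and the terms are summed. The sketch below assumes a hypothetical contact map and invented potential values; it illustrates only the bookkeeping and is not the energy function of Schueler-Furman et al. (2000).

```python
def threading_energy(peptide, contacts, potential):
    """Sum pairwise potentials over peptide-position / MHC-residue contacts.

    `contacts` maps a 0-based peptide position to the MHC residues (one-letter
    codes) it touches; `potential` maps an unordered residue pair (a frozenset)
    to an energy term. Lower totals are interpreted as more favourable.
    """
    total = 0.0
    for position, mhc_residues in contacts.items():
        for mhc_residue in mhc_residues:
            pair = frozenset((peptide[position], mhc_residue))
            total += potential[pair]
    return total


if __name__ == "__main__":
    # Invented values: two peptide positions, each contacting one MHC residue.
    toy_potential = {
        frozenset("LY"): -1.0,
        frozenset("KE"): -2.0,
        frozenset("KY"): 0.5,
        frozenset("LE"): 0.3,
    }
    toy_contacts = {0: ["Y"], 1: ["E"]}
    for pep in ("LK", "KL"):
        print(pep, threading_energy(pep, toy_contacts, toy_potential))
```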
The performance of prediction methods is evaluated by applying Pearson correlation, Matthews correlation coefficients (MCC), receiver operating characteristic (ROC) analyses and related measures, which include positive-predictive value (PPV), negative-predictive value (NPV), area under the receiver operating characteristic curve (AROC), sensitivity (SE) and specificity (SP), or root mean squared deviations (RMSD) of distances in Å for structure predictions (Yang and Yu, 2009; Hattotuwagama et al., 2007). SE is the proportion of real positives that are correctly predicted, whereas SP is the proportion of real negatives that are correctly predicted. PPV and NPV provide an overall success rate as the proportion of true positives (negatives) among all positive (negative) predictions, whereas the AROC value reflects the overall quality of predictions across the SE and SP value ranges. The AROC value is a robust measure as long as the numbers of positives and negatives in a test set are not extremely lopsided. When no independent test data sets are available, leave-one-out cross-validation (LOOC) (Sammut and Webb, 2010) is often used. LOOC may indicate an acceptable performance of a tool that becomes unacceptable when applied to a new data set.
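
The measures named above follow directly from the counts of true and false positives and negatives. The sketch below uses the standard textbook definitions; the function names and the toy data in the usage example are hypothetical, and AROC is computed with the pairwise-comparison formulation (the fraction of positive/negative pairs ranked correctly), which is one of several equivalent ways to obtain it.

```python
from math import sqrt


def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for two equally long lists of booleans."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """SE, SP, PPV, NPV and MCC; assumes no row or column total is zero."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SE": tp / (tp + fn),    # correctly predicted share of real positives
        "SP": tn / (tn + fp),    # correctly predicted share of real negatives
        "PPV": tp / (tp + fp),   # share of positive predictions that are right
        "NPV": tn / (tn + fn),   # share of negative predictions that are right
        "MCC": (tp * tn - fp * fn) / denominator,
    }


def aroc(actual, scores):
    """Fraction of positive/negative pairs in which the positive scores higher
    (ties count one half); this equals the area under the ROC curve."""
    positives = [s for a, s in zip(actual, scores) if a]
    negatives = [s for a, s in zip(actual, scores) if not a]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: experimental labels and predictor scores for eight peptides.
    labels = [True, True, True, False, False, False, False, True]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
    calls = [s >= 0.5 for s in scores]  # hypothetical decision threshold
    print(summary_metrics(*confusion_counts(labels, calls)))
    print("AROC:", aroc(labels, scores))
```

SE, SP, PPV, NPV and MCC depend on the chosen threshold, whereas AROC summarizes the ranking over all thresholds.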

Approaches

T Cell Epitope Prediction


T cell epitope predictions can be divided into integrated, MHC-pan and allotype-specific binding, TAP binding, proteasomal
cleavage, and a few miscellaneous approaches which are represented by 38 publicly available, mostly web accessible tools shown
in Table 1. Each tool is summarized by its name, URL, applicable species, and salient features including limitations, performance
and methods applied to provide a convenient desk reference for choosing a T cell epitope prediction tool.
Independent of the approach categories the majority of tools rely on the application of QM, ANN, SVM, consensus or hybrid methods. An ANN is a network of interconnected nodes that is trained by adjusting the numeric weights of the connections between nodes (Honeyman et al., 1998). Before an ANN can be applied to peptide sequences, the sequences must be transformed into numeric descriptors that serve as input to the network layers. Popular ANN-based prediction tools include MULTIPRED2 (Zhang et al., 2011a), NetMHC 4.0 (Andreatta and Nielsen, 2015) and NetMHCpan 3.0 (Nielsen and Andreatta, 2016) for MHC class I and NetMHCII 2.2 (Nielsen and Lund, 2009) and NetMHCIIpan 3.1 (Andreatta et al., 2015) for MHC class II peptide binding.
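
As an illustration of turning a peptide into numeric descriptors, the sketch below uses sparse (one-hot) encoding, in which every residue becomes a vector of 20 values with a single 1. This is only one possible descriptor, chosen here for simplicity; the tools named above may use different or additional encodings, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot_encode(peptide):
    """Flatten a peptide into a list of 0/1 values, 20 per residue."""
    encoded = []
    for residue in peptide.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unsupported residue: {residue!r}")
        encoded.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return encoded


if __name__ == "__main__":
    vector = one_hot_encode("ACD")
    print(len(vector), sum(vector))  # 20 inputs per residue, one of them set
```

With this encoding a nonamer yields 180 input values, which fixes the size of the network's input layer.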
SVMs comprise machine learning algorithms that classify complex data by mapping them into a multidimensional space through an appropriate kernel function and constructing a separating hyperplane in that space. Representative tools using SVM are TAP binding predictor TAPreg (Diez‐Rivero et al., 2010), MHC class II peptide binding predictor MHC2SKpan (Guo et al., 2013) and T cell epitope immunogenicity predictor Repitope (Ogishi and Yotsuyanagi, 2017).
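
The role of the kernel and the hyperplane can be seen in the decision function of an already trained SVM, sketched below with a Gaussian (RBF) kernel. Training is not shown, and the support vectors, coefficients, bias and gamma value are invented for illustration; none of them come from TAPreg, MHC2SKpan or Repitope.

```python
from math import exp


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """Score of x: sum_i coef_i * K(sv_i, x) + bias.

    `dual_coefs` holds alpha_i * y_i for each support vector, i.e. values that
    a training procedure would supply.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def classify(x, support_vectors, dual_coefs, bias):
    """Return +1 (e.g. binder) or -1 (non-binder) from the sign of the score."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical model with one support vector per class in a 2D feature space.
    svs = [(0.0, 0.0), (2.0, 2.0)]
    coefs = [1.0, -1.0]
    for point in [(0.1, 0.1), (1.9, 2.0)]:
        print(point, classify(point, svs, coefs, bias=0.0))
```

The hyperplane is never built explicitly: the kernel evaluates similarity to the support vectors, and the sign of the weighted sum tells on which side of the boundary a new sample falls.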
Several MHC-pan (PSSMHCpan (Liu et al., 2017) and TEPITOPEpan (Zhang et al., 2012)) and allotype-specific tools (PickPocket 1.1 (Zhang et al., 2009), HLaffy (Mukherjee et al., 2016)) and PREDIVAC 2.0 (Oyarzún et al., 2013) use QM-based methods that assign coefficients for each amino acid in a peptide frame and discriminate potential binders from non-binders through the derived peptide score. The only QM-based tool that successfully integrated proteasomal processing, TAP, MHC class I and II binding prediction is TEpredict (Antonets and Maksyutov, 2010), whereas the ANN- and QM-based NetCTLpan 1.1 (Stranzl et al., 2010) does not cover MHC class II.
Integrated approaches that are based on the consensus of independent methods tend to outperform single methods.
NetMHCcons 1.1 which couples NetMHC 3.4, NetMHCpan 2.8 and PickPocket 1.1 methods is a leading example (Karosiene et al.,
2012).
The only MHC class II allele-specific candidate epitope prediction tools that utilize structure-based methods (Yao et al., 2013) are EpiDOCK (Atanasova et al., 2013) and EpiTOP (Dimitrov et al., 2010a; Dimitrov et al., 2010b). EpiDOCK uses docking score-based quantitative matrices to evaluate the binding potential of overlapping nonamer peptides generated from an input sequence to five HLA class II DPA/DPB, six DQA/DQB and 12 DRB1 proteins. EpiTOP employs QSAR-derived quantitative matrices that encode both peptide and HLA protein pocket interactions derived from 12 HLA-DRB1 allotypes to predict peptide-HLA-DRB1 binding.
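
Generating overlapping nonamers from an input sequence is a sliding-window operation, sketched below. The function name and the example sequence are hypothetical; each resulting peptide would then be passed to whatever scoring matrix a tool provides.

```python
def overlapping_peptides(sequence, length=9):
    """Yield (start_position, peptide) for every window of `length` residues.

    Start positions are 1-based, as is usual when reporting protein residues.
    Sequences shorter than `length` yield nothing.
    """
    for start in range(len(sequence) - length + 1):
        yield start + 1, sequence[start:start + length]


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRQ"  # invented 11-residue sequence
    for position, nonamer in overlapping_peptides(toy_sequence):
        print(position, nonamer)
```

A protein of n residues yields n - 8 nonamers, so the number of candidates grows linearly with sequence length.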
Two tools categorized under miscellaneous approaches use hybrid SVM and motif methods to identify potential T-helper cell epitopes by predicting cytokine-inducing peptides. IL4Pred (Dhanda et al., 2013a) and IFNepitope (Dhanda et al., 2013b) predict interleukin 4- and interferon gamma-inducing MHC class II peptides, respectively.

B Cell Epitope Prediction


The conformational and linear B cell epitope prediction approaches comprising 21 web accessible tools are summarized according
to their characteristics including performance, limitations and methods in Table 2.
Representatives of conformational epitope prediction approaches using structure-based methods are DiscoTope 2.0 (Kringelum et al., 2012) and BEpro (Sweredoski and Baldi, 2008b). DiscoTope 2.0 uses amino acid residue propensity scores within a defined sphere, whereas BEpro applies an amino acid propensity scale in combination with side chain orientation and solvent accessibility properties.
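
Summing propensity scores within a sphere can be sketched as follows: for each residue, the propensities of all residues whose coordinates lie within a fixed radius are added up. The coordinates, propensity values and radius below are invented, and the sketch covers only the neighborhood sum; it is not the DiscoTope scoring function and omits any surface-exposure term.

```python
from math import dist


def neighborhood_propensity(residues, propensity, radius=10.0):
    """For each residue, sum the propensities of all residues within `radius`.

    `residues` is a list of (one_letter_code, (x, y, z)) tuples, for example
    C-alpha coordinates in angstroms; a residue is included in its own sphere.
    """
    totals = []
    for _, center in residues:
        total = sum(
            propensity[code]
            for code, position in residues
            if dist(center, position) <= radius
        )
        totals.append(total)
    return totals


if __name__ == "__main__":
    # Hypothetical three-residue "structure" and propensity scale.
    toy_residues = [("K", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("L", (20.0, 0.0, 0.0))]
    toy_scale = {"K": 1.0, "D": 0.5, "L": -1.0}
    print(neighborhood_propensity(toy_residues, toy_scale))
```

Because the neighborhood is defined in space rather than along the sequence, residues far apart in the primary structure can contribute to the same score, which is what makes such measures suited to discontinuous epitopes.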
Table 1 T cell epitope prediction approaches

Integrated approaches

NetCTLpan 1.1
URL: http://www.cbs.dtu.dk/services/NetCTLpan/
Species: Hosa, Patr, Mamu, Susc, Mumu, Gogo, Bota
Functionality and Performance: Integrates prediction of proteasomal cleavage, TAP and MHC class I binding affinity; AROC: 0.920–0.977 (depending on data) (Stranzl et al., 2010).
Limitations: Predictions for 8–11mer; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC allotypes per submission.
Method: ANN and matrix (matrix for TAP).

NetTepi 1.0
URL: http://www.cbs.dtu.dk/services/NetTepi/
Species: Hosa
Functionality and Performance: Integrates peptide-MHC binding affinity, peptide-MHC stability and TCR propensity; prediction values are weighted sum or % rank; sorted by combined, affinity, stability, TCR propensity scores; AUC0.1: 0.9285–0.9305 (depending on the combined model) (Trolle and Nielsen, 2014).
Limitations: 13 HLA allotypes; 8–14mer peptides; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; T cell propensity tool was validated only on 9mer peptides.
Method: ANN and matrix.

NetMHCcons 1.1
URL: http://www.cbs.dtu.dk/services/NetMHCcons/
Species: Hosa, Patr, Mamu, Susc, Mumu, Gogo, Bota
Functionality and Performance: Integrates methods of NetMHC, NetMHCpan and PickPocket; predictions for 8–15mer peptides; prediction values in nM IC50 and % Rank; sorting by predicted binding affinity; PCC: 0.23–0.72 (depending on the group of allotypes) (Karosiene et al., 2012).
Limitations: 101 MHC I allotypes; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC allotypes per submission.
Method: Consensus method: ANN, pan-specific ANN, and matrix.

RANKPEP
URL: http://imed.med.ucm.es/Tools/rankpep.html
Species: Hosa, Mumu
Functionality and Performance: Cleavage prediction; immunodominance filter; molecular weight filter; predictions can be made from MSA; >80% of CD8+ T cell epitopes are among top 2% of scoring peptides; 3–10% threshold was required to predict 80% of CD4+ T cell epitopes (Reche et al., 2004).
Limitations: 102 MHC I and 80 MHC II allotypes; predictions for MHC I binders are limited to 8–11mer peptides; matrices are limited by the quality of sequences.
Method: Matrix and SVM (immunodominance).

NetCTL 1.2
URL: http://www.cbs.dtu.dk/services/NetCTL/
Species: Hosa
Functionality and Performance: Integrates HLA class I binding, C terminal cleavage, TAP transport efficiency; (depending on the threshold); AROC: 0.941, sensitivity: 72% (among 5% top-scoring peptides) (Larsen et al., 2007).
Limitations: 12 HLA supertypes; predicts only 9mer epitopes.
Method: ANN (HLA binding and C terminal cleavage) and matrix (TAP transport efficiency).

ProPred1
URL: http://crdd.osdd.net/raghava/propred1/
Species: Hosa, Mumu
Functionality and Performance: Prediction of the standard proteasome and immunoproteasome cleavage sites; identification of MHC binders with cleavage sites at C terminus; output displayed in different formats; allows subsequence analysis; accuracy: 38–80% for HLA-A*0201, 70–80% for H2-Kb (depends on the threshold) (Singh and Raghava, 2003).
Limitations: 47 MHC I allotypes; only 9mer peptides are predicted; over-generalized, as the matrices were obtained for enolase-I protein.
Method: Matrix.

nHLAPred
URL: http://crdd.osdd.net/raghava/nhlapred/
Species: Hosa, Mumu
Functionality and Performance: Average accuracy for hybrid approach: 92.8%; prediction of the standard proteasome and immunoproteasome cleavage sites; identification of MHC binders with cleavage sites at C terminus (Bhasin and Raghava, 2007).
Limitations: 67 MHC I allotypes; only nonamer peptides.
Method: ANN and matrix or matrix only.

MULTIPRED 2.0
URL: http://cvc.dfci.harvard.edu/multipred2/
Species: Hosa
Functionality and Performance: Allows input of pre-calculated viral proteomes; employs NetMHCpan and NetMHCIIpan as predictive engines; heatmaps are generated to visualize results (Zhang et al., 2011).
Limitations: 26 HLA class I and II supertypes; 8–11mer peptides for HLA I and 9mer peptides for HLA II.
Method: ANN.

TepiTool (Pipeline)
URL: http://tools.iedb.org/tepitool/
Species: Hosa, Patr, Mamu, Susc, Mumu, Gogo, Bota and others
Functionality and Performance: Integrates several predictive tools; predicts MHC I and MHC II binders; allows selection of the most frequent allotypes; different types of filtering, such as percentile rank, absolute rank, based on IC50 (Paul et al., 2016).
Limitations: Predicts 8–14mer epitopes.
Method: Includes but not limited to consensus, ANN, and matrix methods.

MHCPred 2.0
URL: http://www.ddg-pharmfac.net/mhcpred/MHCPred/
Species: Hosa, Mumu
Functionality and Performance: Two models of prediction: amino acids contribution; amino acids + their interactions; anchor positions (maximum 4) can be chosen; predicts affinity to TAP (Guan et al., 2006).
Limitations: 17 MHC class I and II allotypes; sequences are limited to 1000 residues; only plain format is supported.
Method: Matrix.

FRED 2.0 (Pipeline)
URL: http://fred-2.github.io/
Species: N/A
Functionality and Performance: HLA typing, epitope prediction, epitope selection, and epitope assembly; implemented in Python; provides unified access to different epitope prediction tools and databases (Schubert et al., 2016).
Limitations: Difficult to execute for unfamiliar users; requires separate installation of several external tools from Center for Biological Sequence Analysis, Technical University of Denmark.
Method: Python-based framework.

TEpredict
URL: http://tepredict.sourceforge.net/downloads.html
Species: N/A
Functionality and Performance: Predicts MHC I and MHC II binders, proteasomal and immunoproteasomal processing, peptide and TAP binding; sensitivity: 50–80%; specificity: 75–99% (Antonets and Maksyutov, 2010).
Limitations: Considers only 9mer peptides.
Method: Matrix.

SVMHC (works only via Epitoolkit pipeline)
URL: https://abi.inf.uni-tuebingen.de/Services/SVMHC
Species: Hosa, Mumu
Functionality and Performance: Allows accessions/database identifiers as input; analysis of single amino acid polymorphism; different output views (Dönnes and Kohlbacher, 2006).
Limitations: Predictions for 8–10mer peptides; 26 MHC I allotypes from MHCPEP, 24 MHC I allotypes from SYFPEITHI and 51 MHC II allotypes; only one sequence per submission; thresholds cannot be set by users.
Method: SVM (MHC I) and matrices (MHC II).

PickPocket 1.1
URL: http://www.cbs.dtu.dk/services/PickPocket/
Species: Hosa, Patr, Mamu, Susc, Mumu, Gogo, Bota
Functionality and Performance: Robust when data is scarce and when the similarity to MHC molecules with characterized binding specificities is low; PCC: 0.26–0.6 (depending on the evaluated set) (Zhang et al., 2009).
Limitations: 8–12mer peptides; 4150 MHC I and II allotypes; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC per submission.
Method: Matrix.

MetaMHCpan
URL: http://datamining-iip.fudan.edu.cn/MetaMHCpan/index.php/pages/view/info
Species: Hosa, Mumu
Functionality and Performance: Meta-server with different predictive methods (Xu et al., 2016).
Limitations: 8–11mer peptides for MHC I and 9–25mer peptides for MHC II; 41 MHC I and 4600 MHC II allotypes.
Method: Matrix, SVM, and multiple instance learning method.

MHC I pan approaches

NetMHCpan 3.0
URL: http://www.cbs.dtu.dk/services/NetMHCpan/
Species: Hosa, Patr, Mamu, Susc, Mumu, Gogo, Bota
Functionality and Performance: 91% of ligands are recovered at a rank threshold of 2% with a specificity of 98%; AROC: 0.83–0.89 (depending on the epitope length) (Nielsen and Andreatta, 2016).
Limitations: 172 MHC I molecules; 8–14mer peptides; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC allotypes per submission.
Method: ANN.

PSSMHCpan
URL: https://github.com/BGI2016/PSSMHCpan
Species: Hosa
Functionality and Performance: AROC: 0.94, accuracy: 85%; able to predict neoantigens; 8–25mer peptide length prediction (Liu et al., 2017).
Limitations: 87 HLA class I allotypes.
Method: Matrix.

MHC I allotype-specific approaches

NetMHC 4.0
URL: http://www.cbs.dtu.dk/services/NetMHC/
Species: Hosa, Patr, Mamu, Susc, Mumu, Bota
Functionality and Performance: AROC: 0.882–0.895 (depending on the peptide epitope length) (Andreatta and Nielsen, 2015).
Limitations: 122 MHC I allotypes; 8–13mer peptides; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC allotypes per submission.
Method: ANN.

HLaffy
URL: http://proline.biochem.iisc.ernet.in/HLaffy/?tab=1
Species: Hosa
Functionality and Performance: Estimates peptide affinity for HLA class I; accuracy: 92% and correlation: 0.85, for IEDB dataset accuracy: 82.5%; provides a histogram view; representative molecular models with favorable peptide-HLA residue interactions are available (Mukherjee et al., 2016).
Limitations: Only for 9mer peptides.
Method: Matrix.

MHC II pan approaches

NetMHCIIpan 3.1
URL: http://www.cbs.dtu.dk/services/NetMHCIIpan/
Species: Hosa, Mumu
Functionality and Performance: Prediction values are given as IC50 (inhibitory concentration) and as % ranks; AROC: 0.80–0.90 for the binding between peptide and MHC (Andreatta et al., 2015).
Limitations: 4 MHC II isotypes; at most 5000 sequences per submission and each sequence must be <20,000 amino acids; maximum 20 MHC per submission.
Method: ANN.

TEPITOPEpan
URL: http://datamining-iip.fudan.edu.cn/service/TEPITOPEpan/TEPITOPEpan.html
Species: Hosa
Functionality and Performance: Shows binding cores and percentile ranks; prediction for 9–25mer peptides; average AROC: 0.717–0.833 (depending on the dataset); predicts over 700 HLA-DR allotypes (Zhang et al., 2012).
Limitations: Limited to HLA-DR.
Method: Matrix.

MHC2SKpan
URL: http://datamining-iip.fudan.edu.cn/service/MHC2SKpan/info.html
Species: Hosa
Functionality and Performance: Predicts MHC class II peptide binding; AROC: 0.734–0.843 (depending on the dataset) (Guo et al., 2013).
Limitations: 9–25mer peptides; DRB only.
Method: SVM.

PREDIVAC 2.0
URL: http://predivac.biosci.uq.edu.au/
Species: Hosa
Functionality and Performance: AROC: 0.842–0.872 for HLA II binding; AROC: 0.749 for CD4-T-cell epitopes; 883 HLA II allotypes; target population epitope predictions (Oyarzún et al., 2013).
Limitations: DRB only.
Method: Matrix.

MHC II allotype-specific approaches

NetMHCII 2.2
URL: http://www.cbs.dtu.dk/services/NetMHCII/
Species: Hosa, Mumu
Functionality and Performance: Predictions are given as % Rank and in nM IC50 values; AROC: 0.68–0.82 (depending on the dataset and method) (Nielsen and Lund, 2009).
Limitations: 28 MHC II allotypes; at most 5000 sequences per submission, each sequence must be <20,000 amino acids.
Method: ANN.

EpiDOCK
URL: http://epidock.ddg-pharmfac.net/
Species: Hosa
Functionality and Performance: Identifies 90% true binders and 76% true non-binders; overall accuracy: 83% (Atanasova et al., 2013).
Limitations: 23 HLA II allotypes; only 9mer peptides are predicted; no sorting mechanism.
Method: Matrix.

EpiTOP
URL: http://www.pharmfac.net/EpiTOP/
Species: Hosa
Functionality and Performance: Identifies 89% of known epitopes within top 20% of predicted binders (Dimitrov et al., 2010a,b).
Limitations: 12 HLA-DRB1 allotypes; only 9mer peptides are predicted; allotype selection did not work at the time of testing.
Method: Matrix.

MHC2Pred
URL: http://crdd.osdd.net/raghava/mhc2pred/
Species: Hosa, Mumu
Functionality and Performance: Accuracy: >78% (Lata et al., 2007).
Limitations: 42 MHC II allotypes; only 9mer peptides are predicted; inconvenient output format without sorting options.
Method: SVM.

Propred
URL: http://crdd.osdd.net/raghava/propred/
Species: Hosa
Functionality and Performance: Output displayed in different formats (Singh and Raghava, 2001).
Limitations: 51 HLA-DR allotypes.
Method: Matrix.

HLA-DR4Pred
URL: http://crdd.osdd.net/raghava/hladr4pred/
Species: Hosa
Functionality and Performance: Accuracy: 86% and 78% for SVM and ANN, respectively (Bhasin and Raghava, 2004b).
Limitations: Only for HLA-DRB1*0401.
Method: SVM and ANN.

Proteasomal cleavage approaches

Pcleavage
URL: http://crdd.osdd.net/raghava/pcleavage/
Species: N/A
Functionality and Performance: Predicts proteasome cleavage sites; MCCs: 0.54 and 0.43 for in vitro and MHC ligand data respectively (Bhasin and Raghava, 2005).
Limitations: The C-terminal of MHC ligands represents only a subset of cleavages that occur in vivo.
Method: SVM.

NetChop 3.1
URL: http://www.cbs.dtu.dk/services/NetChop/
Species: Hosa
Functionality and Performance: Predicts cleavage sites of human proteasome; two different methods: C-terminus and 20S; specificity: 48%, sensitivity: 81%, MCC: 0.31 (Nielsen et al., 2005).
Limitations: At most 100 sequences and 100,000 amino acids per submission.
Method: ANN.

TAP binding approaches

TAPPred
URL: http://crdd.osdd.net/raghava/tappred/
Species: Hosa
Functionality and Performance: Predicts TAP-binding peptides; correlation coefficients: 0.88 (Cascade SVM) and 0.8 (SVM); cascade SVM uses features of amino acids along with sequence whereas SVM uses only sequence (Bhasin et al., 2007).
Limitations: N/A
Method: SVM or Cascade SVM.

TAPreg
URL: http://imed.med.ucm.es/Tools/tapreg/
Species: N/A
Functionality and Performance: Predicts affinity of TAP binding ligands; maximum PCC: 0.89 ± 0.03 (Diez‐Rivero et al., 2010).
Limitations: Data training was done on 9mer peptides only.
Method: SVM.

Miscellaneous T cell epitope related prediction approaches

Repitope
URL: https://github.com/masato-ogishi/Repitope
Species: Hosa
Functionality and Performance: Epitope immunogenicity prediction; accuracy: 70–80% (Ogishi and Yotsuyanagi, 2017).
Limitations: Retrospective observational study; simplified assumptions about the biophysical-chemical nature of the TCR-peptide-MHC interactions; limited to TCR-V beta.
Method: SVM.

IL4pred
URL: http://crdd.osdd.net/raghava/il4pred/
Species: N/A
Functionality and Performance: Predicts interleukin 4-inducing MHC class II binders; maximum accuracy: 75.76% and MCC: 0.51 (hybrid method) (Dhanda et al., 2013a).
Limitations: Training set consists of only 8–22mer peptides created without species considerations.
Method: Motif, SVM or hybrid.

IFNepitope
URL: http://crdd.osdd.net/raghava/ifnepitope/
Species: N/A
Functionality and Performance: Prediction and design of interferon-γ inducing MHC class II binding peptides; maximum prediction accuracy of hybrid approach: 81.3% (MCC: 0.57) to 82.1% (MCC: 0.62) (depends on dataset) (Dhanda et al., 2013b).
Limitations: Analysis of position-specific preference of residues was done only for a small number of peptides.
Method: Motif, SVM or hybrid.

CTLPred
URL: http://crdd.osdd.net/raghava/ctlpred/
Species: N/A
Functionality and Performance: Predicts CTL epitopes; accuracy for QM, ANN and SVM: 70.0%, 72.2% and 75.2% respectively; has combined and consensus prediction approaches (Bhasin and Raghava, 2004a).
Limitations: Training data contains only 9mer epitopes; small blind dataset (63 epitopes for 2 subgroups).
Method: ANN, SVM and matrix.

MMBPred
URL: http://crdd.osdd.net/raghava/mmbpred/
Species: Hosa, Mamu, Mumu
Functionality and Performance: Prediction of mutated promiscuous binders e.g. increase/decrease of peptide-MHC binding; analyzes position and type of mutations (Bhasin and Raghava, 2003).
Limitations: 67 MHC I allotypes; only 9mer peptides are predicted; promiscuous binding prediction was not supported at the time of testing.
Method: Matrix.

Abbreviations: ANN: artificial neural network, AROC or AUC: area under receiver operating characteristic curve, Bota: Bos taurus; Gogo: Gorilla gorilla, Hosa: Homo sapiens, HLA: human leukocyte antigen, Mamu: Macaca mulatta, MCC: Matthews
correlation coefficient, MHC: major histocompatibility complex, MSA: multiple sequence alignment, Mumu: Mus musculus, Patr: Pan troglodytes, PCC: Pearson correlation coefficient, SVM: support vector machine, Susc: Sus scrofa, TAP: transporter
associated with antigen processing, and QM: quantitative matrix.
Table 2 B cell epitope prediction approaches

Name and URL Functionality and Performance Limitations Method

Discontinuous/conformational epitope prediction approaches


SEPPA 2.0
URL: http://lifecenter.sgst.cn/seppa2/
Functionality and performance: Subcellular localization of an antigen and host species are considered; AROC: 0.785–0.823 (depending on the host and localization of antigen protein); Jmol is used for visualization (Qi et al., 2014).
Limitations: Antigens with less than 25 residues were not considered.
Method: Optimized logistic regression algorithm.

SEPIa
URL: https://github.com/SEPIaTool/SEPIa
Functionality and performance: AROC: 0.65; prediction is based on the amino acid sequence; amino acid features and sequence-based features are taken into account (Dalkas and Rooman, 2017).
Limitations: Only window of 9 residues is used to test whether the middle residue is an epitope or not.
Method: Gaussian Naïve Bayes and Random Forest algorithms based on 13 features.

DiscoTope 2.0
URL: http://www.cbs.dtu.dk/services/DiscoTope/
Functionality and performance: AROC: 0.824 and 0.727 (for training and independent data sets) (Kringelum et al., 2012).
Limitations: Trained on a dataset with the total number of residues per epitope 9–22 and with the longest sequential stretch 3–12 residues per epitope.
Method: Definition of the spatial neighborhood to sum propensity scores and half-sphere exposure as a surface measure.

EPSVR
URL: http://sysbio.unl.edu/EPSVR/
Functionality and performance: AROC: 0.597; evaluates 6 different features: residue epitope propensity, conservation score, side chain energy score, contact number, surface planarity score and secondary structure composition (Liang et al., 2010).
Limitations: Multiple epitopes in one antigen were not considered.
Method: SVR with six attributes.

EPMeta
URL: http://sysbio.unl.edu/EPMeta/
Functionality and performance: AROC: 0.638 (Liang et al., 2010).
Limitations: Works only on Linux OS.
Method: Consensus method for EPSVR, EPCES, Epitopia, SEPPA, BEpro and Discotope 1.2.

BEpro
URL: http://pepito.proteomics.ics.uci.edu/index.html
Functionality and performance: AROC: 0.754 and 0.683 (for Discotope and Epitome datasets respectively); Jmol is used for visualization (Sweredoski and Baldi, 2008b).
Limitations: Propensity scale scores averaged over a window of exactly 9 residues; only PDB file format is supported.
Method: Linear combination of amino-acid propensity scale, side chain orientation and solvent accessibility information using half sphere exposure values.

CBTOPE
URL: http://crdd.osdd.net/raghava/cbtope/
Functionality and performance: Accuracy: 86.59%, AROC: 0.90, MCC: 0.73; prediction is based on the amino acid sequence using different window length patterns, standard binary and physicochemical profiles of pattern (Ansari and Raghava, 2010).
Limitations: Lacks 3D structural output.
Method: SVM.

EPCES
URL: http://sysbio.unl.edu/EPCES/
Functionality and performance: Sensitivity: 47.8%, Specificity: 69.5%, AROC: 0.632; evaluates residue epitope propensity, conservation score, side chain energy score, contact number, surface planarity score and secondary structure composition (Liang et al., 2009).
Limitations: Low accuracy for the bound structures.
Method: Consensus scoring utilizing six different scoring functions.

Bpredictor
URL: https://code.google.com/archive/p/my-project-bpredictor/downloads
Functionality and performance: AROC: 0.633 and 0.654 for bound and unbound datasets respectively; impact of interior residues and different contributions of adjacent residues were considered (Zhang et al., 2011a,b,c).
Limitations: Window size: 3–15 residues.
Method: Random-forest algorithm with distance based feature.

EpiSearch
URL: http://curie.utmb.edu/episearch.html
Functionality and performance: In most cases covers >50% of experimentally validated residues; Jmol is used for visualization (Negi and Braun, 2009).
Limitations: In addition to the 3D structure of the antigen the input requires the set of mimotopes (up to 12 mer).
Method: Patch analysis that identifies cluster of residues on the surface of antigen with similar physicochemical properties as in mimotopes.

PEP-3D-Search
URL: http://kyc.nenu.edu.cn/Pep3DSearch/
Functionality and performance: MCC: 0.176, sensitivity: 36.4%, Precision: 69.5% (Huang et al., 2008).
Limitations: In addition to the 3D structure of the antigen the input requires the set of mimotopes; compatible only with Windows OS.
Method: Mimotope-based analysis via ACO algorithm.

CEP
URL: http://196.1.114.49/cgi-bin/cep.pl
Functionality and performance: Accuracy: 75%; Jmol is used for visualization (Kulkarni-Kale et al., 2005).
Limitations: Simplified approach with few attributes.
Method: Algorithm uses accessibility of residues and spatial distance cut-off.

Linear B cell epitope prediction approaches


LBtope
URL: http://crdd.osdd.net/raghava/lbtope/
Functionality and performance: Able to identify mutation(s) in peptide and convert it/them to the epitope; accuracy: 81% (54–86% depending on model and dataset); mutation tool is available to design better epitope or for de-immunization purpose (Singh et al., 2013).
Limitations: 5–30mer epitope prediction.
Method: SVM, using diverse features; binary profile, di-peptide composition and amino acid pair profile.

COBEpro
URL: http://scratch.proteomics.ics.uci.edu/
Functionality and performance: AROC: 0.606–0.829 (depending on the dataset); predicts epitopes of any length (Sweredoski and Baldi, 2008a).
Limitations: Prediction is limited to sequences of 1500 amino acids length.
Method: SVM for the similarity measure based on the total number of identical substrings.

ABCpred
URL: http://crdd.osdd.net/raghava/abcpred/
Functionality and performance: Sensitivity: 67.14%, specificity: 64.71%, accuracy: 65.93%; has overlapping filter (Saha and Raghava, 2006).
Limitations: 10, 14, 16, 18, 20mer predicted epitope length in plain text format; developed with a small dataset derived from database BCIPEP (Saha et al., 2005).
Method: ANN.

SVMTriP
URL: http://sysbio.unl.edu/SVMTriP/
Functionality and performance: Precision: 54.1–57.1%, sensitivity: 68.5–80.1%, AROC: 0.674–0.702 (depending on the length of epitope); prediction combines tri-peptide similarity and propensity scores (Yao et al., 2012).
Limitations: 10, 14, 16, 18, 20mer predicted epitope length; only one sequence per time in FASTA format; waiting time at the time of testing was 10–30 min.
Method: SVM.

IgPred
URL: http://crdd.osdd.net/raghava/igpred/
Functionality and performance: Predicts B-cell epitopes that can induce a specific class of antibody; MCC: 0.44 (IgG), 0.7 (IgE), 0.45 (IgA), accuracy is around 80%; epitope mapping and motif scan functions are available (Gupta et al., 2013).
Limitations: 4–20 mer epitopes.
Method: SVM and WEKA package tools.

Bcepred
URL: http://crdd.osdd.net/raghava/bcepred/
Functionality and performance: Prediction accuracy for various properties: 52.92–57.53%; highest accuracy via combination of properties: 58.70% (Saha and Raghava, 2004).
Limitations: Only plain text format is supported; developed with a small dataset derived from database BCIPEP (Saha et al., 2005).
Method: Evaluation of different physicochemical scales by different method for each.

Linear and discontinuous/conformational epitope prediction approaches


Epitopia
URL: http://epitopia.tau.ac.il/
Functionality and performance: AROC: 0.6 (conformational) and 0.59 (linear); visualization through Jmol and RasMol; calculates immunogenicity for each solvent accessible residue for 3D structure input and for every amino acid for sequence input (Rubinstein et al., 2009).
Limitations: Immunogenicity of each residue is determined by analysis of physicochemical properties of three flanking residues.
Method: Naïve Bayes classifier.

ElliPro
URL: http://tools.iedb.org/ellipro/
Functionality and performance: AROC: 0.732, sensitivity: 60.1%; Jmol and the MODELLER program are available for visualization and prediction of 3D structure (Ponomarenko et al., 2008).
Limitations: Generalizes all proteins as ellipsoids.
Method: Modified Thornton's method with residue clustering by protrusion index (PI) value.

BepiPred 2.0
URL: http://www.cbs.dtu.dk/services/BepiPred/index.php
Functionality and performance: AROC for structural epitopes: 0.62, AROC for linear epitopes: 0.574 (Jespersen et al., 2017).
Limitations: 5–25mer epitope predictions; at most 50 sequences (max. 300,000 residues per submission, <6000 residues/sequence).
Method: Random Forest algorithm.


Implementations of machine learning algorithms for conformational epitope predictions are Epitopia (Rubinstein et al., 2009),
EPSVR (Liang et al., 2010) and Bpredictor (Zhang et al., 2011c). These tools utilize naïve Bayesian classifiers, SVR and random
forest algorithms, respectively. A consensus method (Liang et al., 2009) was developed for EPMeta (Liang et al., 2010). EPMeta
combines EPSVR, EPCES, Epitopia, SEPPA, BEpro and Discotope 1.2 predictions and predicts a surface residue as part of an epitope
if two or more of the individual tools have voted for it. None of these methods raised predictive performance above moderate
levels; reported AROC values range from 0.597 to 0.824, depending on the datasets tested.
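The voting rule described for EPMeta can be illustrated with a minimal sketch. It is not the EPMeta implementation; the tool names, residue numbers and the function `consensus_epitope_residues` are hypothetical and only show how a "two or more votes" rule over surface residues could be expressed.

```python
from typing import Dict, Set


def consensus_epitope_residues(
    predictions: Dict[str, Set[int]],
    surface: Set[int],
    min_votes: int = 2,
) -> Set[int]:
    """Return surface residues predicted as epitope by at least min_votes tools."""
    votes: Dict[int, int] = {}
    for residues in predictions.values():
        # Only surface residues are eligible for a vote.
        for residue in residues & surface:
            votes[residue] = votes.get(residue, 0) + 1
    return {residue for residue, count in votes.items() if count >= min_votes}


# Hypothetical per-tool predictions (residue numbers are made up).
predictions = {
    "tool_a": {10, 11, 12, 40},
    "tool_b": {11, 12, 13},
    "tool_c": {12, 40, 77},
}
surface = {10, 11, 12, 13, 40}

print(sorted(consensus_epitope_residues(predictions, surface)))  # [11, 12, 40]
```

Raising `min_votes` makes the consensus stricter: with `min_votes=3` only residue 12 would remain in this toy example.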
Sequence-based methods rely on amino acid sequence-based features to evaluate the probability of each residue to be a part of
a conformational epitope (Sun et al., 2013). CBTOPE is an SVM-based tool that applies standard binary and physico-chemical
profiles of patterns (Ansari and Raghava, 2010). SEPIa employs naïve Bayesian classifier and random forest ensemble methods to
classify residues using 13 different features (Dalkas and Rooman, 2017). Although sequence-based methods can be a solution for
conformational epitope predictions, the performance improvement of complex methods as implemented in SEPIa is minor
compared to simpler methods.
Mimotopes are peptides selected from random libraries for their ability to bind to an antibody directed against a specific
antigen. Since the mimicry relies on similarities in physicochemical properties and spatial organization (Moreau et al., 2006),
mimotope analysis methods applied to conformational epitope predictions require both mimotopes and a 3D structure of the
target antigen as input. The mimotopes are mapped to the surface of the antigen to identify the best sequence alignment and to
predict potential epitope regions (Sun et al., 2016). For example, PEP-3D-Search searches for matching paths on an antigen surface
with respect to the query mimotopes using an ant colony optimization algorithm (Huang et al., 2008).
Linear epitope prediction approaches utilize SVM, ANN and random forest machine learning methods to evaluate multiple
amino acid residue properties (Potocnakova et al., 2016). The machine learning approaches have substituted older, poorly
performing methods that used only amino acid propensity scores (Blythe and Flower, 2005). The features assessed include, for
example, hydrophilicity, flexibility, turns, solvent accessibility and the amino acid pair antigenicity scale (Yasser and Honavar, 2010).
SVMTriP (Yao et al., 2012), LBtope (Singh et al., 2013) and COBEpro (Sweredoski and Baldi, 2008a) are representatives of SVM-based
tools. ABCpred (Saha and Raghava, 2006) and BepiPred 2.0 (Jespersen et al., 2017) are ANN and random forest-based tools,
respectively.
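The older propensity-score approach mentioned above can be sketched as a sliding-window average over a single amino acid scale. This is an illustration only, not the algorithm of any tool listed here. As an assumption, the Kyte-Doolittle hydropathy scale is negated and used as a stand-in for a hydrophilicity scale; the window size of 7 and the example sequence are arbitrary.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumption for this sketch: negated hydropathy serves as a hydrophilicity proxy.
HYDROPHILICITY = {aa: -value for aa, value in KYTE_DOOLITTLE.items()}


def window_propensity(
    sequence: str, scale: Dict[str, float], window: int = 7
) -> List[Tuple[int, float]]:
    """Average a per-residue scale over an odd-sized window.

    Returns (1-based position of the window's central residue, mean score).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        mean = sum(scale[aa] for aa in segment) / window
        scores.append((center + 1, mean))
    return scores


# Made-up sequence; the highest-scoring windows are the most hydrophilic ones.
example = "MKTAYIAKQRQISFVKSHFSRQ"
ranked = sorted(window_propensity(example, HYDROPHILICITY), key=lambda x: -x[1])
for position, score in ranked[:3]:
    print(position, round(score, 2))
```

Residues closer to either end than half a window receive no score in this sketch; the machine learning tools described above combine many such features rather than relying on one scale.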
A few tools allow the prediction of both conformational and linear B cell epitopes. ElliPro (Ponomarenko et al., 2008) is based
on the implementation of three different algorithms that treat a protein as an ellipsoid shape (Taylor et al., 1983), derive a residue
protrusion index using a modified Thornton method (Thornton et al., 1986), and cluster neighboring residues according to their
protrusion index (PI) values. When tested on a conformational epitope dataset constructed from antibody-protein complex 3D
structures, ElliPro's performance was moderate (AROC 0.732). The naïve Bayes classifier-based Epitopia (Rubinstein et al., 2009)
performs somewhat worse for conformational (AROC 0.60) and linear (AROC 0.59) epitopes. Similarly modest performances were
reported for BepiPred 2.0 with AROC 0.62 for conformational and AROC 0.574 for linear epitopes (Jespersen et al., 2017).
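Since most tools above are compared by their AROC, it may help to recall how such a value can be obtained from per-residue scores and known epitope labels. The sketch below uses the rank-based (Mann-Whitney) formulation: the fraction of epitope/non-epitope pairs in which the epitope residue receives the higher score, with ties counted as one half. The scores and labels are invented for illustration and do not come from any benchmark.

```python
from typing import Sequence


def aroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both epitope and non-epitope residues are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Invented prediction scores for five residues; 1 marks a true epitope residue.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(aroc(scores, labels), 3))  # 0.833
```

An AROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 0.6 to 0.8 into perspective.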

MHC and Epitope Databases


Sustained public accessibility of curated high-quality data on HLA allele sequences, experimentally derived epitope and
non-epitope data has enabled the development and improvement of epitope prediction tools. IMGT® (the international
ImMunoGeneTics information system®), established in 1989, became the global reference in immunogenetics and immunoinformatics for
immunoglobulins or antibodies, T-cell receptors, human and vertebrate major histocompatibility, immunoglobulin superfamily
and related proteins of the immune system of vertebrates and invertebrates (Lefranc et al., 2014).
SYFPEITHI, one of the oldest databases in the field, has been used together with IEDB (Vita et al., 2014) to develop T cell
epitope prediction tools. SYFPEITHI includes more than 7000 peptides that bind to MHC class I and II molecules (Rammensee
et al., 1999). With increasing utility of IEDB and integration of epitope data, SYFPEITHI became static in 2012.
IEDB stores data on more than 300,000 peptide epitopes, including epitope information derived from antigen-antibody
complexes in the PDB, and more than 2500 non-peptide epitopes. The epitope data is not restricted to human and mouse but also
includes chimpanzee, macaque, cow and swine. Integrated epitope prediction tools for linear and discontinuous B cell epitopes such as
ElliPro (Ponomarenko et al., 2008) or the TepiTool pipeline (Paul et al., 2016) for vaccine, diagnostic and therapeutic
candidate epitope discovery render IEDB by far the most user-friendly and effective resource. Yet a prediction tool that identifies
candidate B and T cell epitopes in one process still awaits implementation.

Illustrative Examples or Case Studies

The applications of epitope predictions are as multifarious as immunoinformatics, the field that rose from its beginnings in
theoretical immunology to make an impact on vaccine research and development. A representative example is reverse vaccinology
(Sette and Rappuoli, 2010). This strategy allows the rapid design of novel vaccines based on pathogen genome information used
in epitope predictions. Gutiérrez and collaborators successfully demonstrated the design of an effective epitope-based
vaccine against swine Influenza A virus through MHC class I and II restricted epitope prediction using PigMatrix (Gutiérrez et al.,
2016). The vaccinated pigs responded to virus re-stimulation, showing that the epitope-based vaccination gave rise to T cells that
were cross-reactive in vitro with epitopes present in the whole virus.

Another important application of epitope predictions is personalized therapeutic vaccines that aim to treat epithelial
cancer and melanoma, as reviewed by Bobisse et al. (2016). Typically, epitope prediction for mutated antigens
(neoantigens) involves NetMHC or NetMHCpan to identify high-affinity candidate neo-epitopes for in vitro validation before
stimulating and expanding a patient's tumor infiltrating lymphocytes ex vivo for use in adoptive cell therapy. Several clinical trials
are ongoing, for example a phase I study on a personalized melanoma neoantigen cancer vaccine that started in 2013 (see relevant
websites). Although it is too early to pass judgement on the success or failure of the approach, even negative data, provided they are
published and shared, can help to improve predictions, particularly prediction methods for class II restricted epitopes, which
perform less accurately than those for class I.
In a recent report by Anagnostou et al. (2017), epitope prediction played an essential role in elucidating the mechanism of resistance
to immune checkpoint blockade drugs in cancer patients. The study of neoantigen evolution during immune checkpoint blockade
in non-small cell lung cancer required assessing the immunogenicity of somatic mutations in tumors using wild-type and mutated
peptides. NetMHCpan was used to predict the class I binding potential of each peptide, and NetCTLpan to evaluate antigen
processing to classify epitopes and non-epitopes. Interestingly, candidate mutation-associated neoantigens with high HLA binding
affinity disappeared, probably driven by immune-mediated elimination of cancer cells, thereby depriving patients of the means to
mount an effective functional immune response.
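Comparing wild-type and mutated peptides, as in the neoantigen studies above, starts with enumerating every peptide of a given length that spans the mutated position. The sketch below shows one possible way to do this for a single amino acid substitution. The protein sequence, the mutation and the function names are hypothetical; the resulting peptide pairs would then be submitted to a binding predictor such as NetMHCpan, which is not called here.

```python
from typing import List, Tuple


def peptides_spanning(sequence: str, position: int, length: int = 9) -> List[str]:
    """All peptides of the given length that contain the 1-based position."""
    index = position - 1
    first_start = max(0, index - length + 1)
    last_start = min(index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


def wildtype_mutant_pairs(
    wild_type: str, position: int, new_residue: str, length: int = 9
) -> List[Tuple[str, str]]:
    """Pair each wild-type peptide with its mutated counterpart."""
    index = position - 1
    mutant = wild_type[:index] + new_residue + wild_type[index + 1:]
    return list(
        zip(
            peptides_spanning(wild_type, position, length),
            peptides_spanning(mutant, position, length),
        )
    )


# Made-up 22-residue sequence with a hypothetical R10W substitution.
wild_type = "MKTAYIAKQRQISFVKSHFSRQ"
pairs = wildtype_mutant_pairs(wild_type, position=10, new_residue="W")
for wt_peptide, mut_peptide in pairs:
    print(wt_peptide, mut_peptide)
```

For a 9-mer window and a mutation far enough from either terminus, this yields nine peptide pairs; fewer are produced when the mutation lies near the start or end of the protein.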
A lesser known application area of epitope prediction is graft-versus-host disease (GVHD) in organ transplantation research.
Monitoring GVHD onset and prevention depends critically on knowledge of the minor histocompatibility antigen
(MiHA) match/mismatch between donor and recipient. Van Bergen and collaborators used NetCTLpan to evaluate amino acid
polymorphisms among 19 MiHAs for the potential to be targeted by alloreactive CD8 T cells in GVHD patients (van Bergen et al.,
2017). The approach yielded 13 new MiHA T cell epitopes.
Numerous other case studies ranging from allergen cross-reactivity, infectious diseases, autoimmunity to therapeutic proteins
could be listed here that demonstrate the applicability of current epitope prediction methods in basic, applied and clinical research
accompanied by savings in time and cost.

Results and Discussion

One of the challenges in epitope prediction is the high polymorphism of MHC genes. The Immuno Polymorphism Database
IPD-IMGT/HLA (release 3.29.0, 2017) contains 12,544 HLA class I alleles and 4622 HLA class II alleles (Robinson et al., 2014). In IEDB
only 272 (2.17%) HLA class I and 247 (5.34%) HLA class II allotypes are associated with experimental peptide binding or T cell
epitope data. Therefore allotype-specific prediction methods (e.g. QM-based methods) are not able to deal with MHC allotypes
that are uncharacterized with regard to peptide binding. The first method that satisfactorily addressed the issue for HLA-A and -B
allotypes was NetMHCpan (Nielsen et al., 2007; Zhang et al., 2011a,b,c), which combines all HLA sequence and peptide data as input
into an ANN to derive general features between HLA sequences and peptides and make inferences on binding affinities. Although
the initial method has been refined to overcome the problem of variation in peptide length (Lundegaard et al., 2008) and
potential multiple binding frames for MHC class II (NetMHCIIpan), it has been of limited use for various non-human primate,
bovine and swine MHC where dissimilarities among MHC sequences exceeded the threshold for reasonably accurate epitope
predictions. Considering the effects of MHC sequence and peptide data diversity and redundancy on the accuracy of epitope
predictions (Kim et al., 2014), Mattsson et al. (2016) proposed an MHC-pan method improvement that relies on MHC class I
similarity redundancy reduction rather than peptide similarity by using pseudo-sequences derived from the binding cleft amino
acid environment of each MHC class I molecule in the input space. In principle, the method could be
extended to all MHC class II allotypes, opening the possibility to integrate OptiType HLA-typing from next-generation sequencing
data (Szolek et al., 2014) with epitope prediction for patient-specific therapeutic vaccines or in transplantation.
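The coverage percentages quoted at the start of this section follow directly from the allele counts. As a small worked check, the sketch below recomputes them; the counts are those given in the text, while the dictionary layout and function name are illustrative choices.

```python
# (allotypes with binding or epitope data in IEDB, alleles in the HLA database)
counts = {
    "HLA class I": (272, 12544),
    "HLA class II": (247, 4622),
}


def coverage_percent(characterized: int, total: int) -> float:
    """Share of known alleles that have experimental data, in percent."""
    return 100.0 * characterized / total


for hla_class, (characterized, total) in counts.items():
    print(hla_class, round(coverage_percent(characterized, total), 2))
```

Both values stay below 6%, which is the gap that pan-specific methods aim to close.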
Despite the limitations of QM-based methods compared to ANN-based methods, they still yield robust results, as demonstrated
by De Groot and collaborators, who applied a genome-to-vaccine approach to design (not produce) an HLA class I and II
epitope-based vaccine for avian H7N9 influenza using EpiMatrix in 20 h (De Groot et al., 2013). The integration of EpiMatrix and
JanusMatrix (Moise et al., 2013), a prediction tool that evaluates potential undesired cross-reactivity of predicted
epitopes with human proteins or commensal gut bacteria that may trigger autoimmune responses, into iVax (Moise et al., 2015)
improves the selection of epitope candidates to be tested for a vaccine. Unfortunately iVax is not an unrestricted-access tool, which
limits its spread and use, and may trigger future improvements of freely accessible ANN-based tools emulating the
JanusMatrix concept.
The performance of B cell epitope predictions is limited by the lack of sufficiently accurate propensity scales for sequence-based
methods and the scarcity of structural data of antigen-antibody complexes. For instance, the training data sets of EPSVR and
Discotope 2.0 comprised only 48 and 75 complexes, respectively. Therefore it is not too surprising that experimental validations of
predicted B cell epitopes result in an average accuracy of 60% (Bergmann-Leitner et al., 2013). Since NMR or X-ray crystallographic
antigen-antibody and unbound structural data will increase only slowly, the development of new hybrid and meta-learning
methods is likely to improve the performance of B cell epitope predictions in the near future. Such methods would incorporate
qualitative and quantitative features extracted from data generated by experimental methods of lower cost and effort, such as
phage-display library screening and "quality of antibody response" workflows that generate antibody binding data from surface
plasmon resonance (Davidoff et al., 2015) and hydrogen deuterium exchange combined with mass spectrometry (Yang et al., 2016).

Epitope prediction tools are an integral part of the reverse vaccinology approach, yet a straightforward performance assessment
similar to Critical Assessment of Structure Prediction (CASP) (Moult et al., 2014) has not been conducted so far. CASP has helped
to raise the accuracy of homology models and their acceptance as a bona fide source of structural information. In 2006 a NIAID B cell
epitope prediction tool workshop (Greenbaum et al., 2007) recommended the creation of annotated datasets and the development
of methods and metrics to assess the prediction tools. IEDB has been very successful in assembling richly annotated T and B
cell epitope data including negative non-epitope data. The latter were found to be biased towards short non-B-cell epitopes, which
negatively affected the performance of B cell epitope predictions when included in testing data sets (Rahman et al., 2016). Yet, we
are still lacking a community-based consensus procedure and minimal standards to assess the prediction tools. The establishment
of a Critical Assessment of Epitope Predictions (CAEP) initiative is overdue to guide the improvement of prediction tools.

Future Directions

The success of epitope predictions particularly for HLA class I T cell epitopes has been largely driven by data and their level of
integration with the current state of knowledge of immunological processes and host-pathogen interactions. Knowledge gaps in
the details of immunological processes that define relations and differentiation dynamics of effector and memory T cell pools, and
central and peripheral tolerance represent just two bottlenecks among others that require many more experimental data to advance
predictive methods. At present we can identify immunogenic epitope candidates, but not optimal protective ones. For example,
more work and data on molecular structures, TCR and BCR repertoire data are needed to predict cross-reactive and -neutralizing
epitopes that will not be reduced in efficacy nor trigger autoimmunity when a pathogen is encountered two or more times.
Rational vaccine design and evaluation resembles, in its temporal and mechanistic order of steps, a biological pathway with
epitope prediction positioned fairly upstream. Efforts to improve connectivity and feedback among individual components including
epitope prediction may come to fruition at the proof-of-concept level within the next five to ten years if concerted and standardized
generation and collection of longitudinal data downstream of epitope predictions on gene and protein expression levels,
protein-protein interactions and pathways, and pathogen and antigen variation is funded and performed. The Human Vaccines Project
(Koff et al., 2014), a nonprofit public-private project that aims to elucidate the molecular and cellular principles of vaccine-induced
immunity, is a promising high-impact initiative to enhance rational vaccine design. Translation into effective and affordable preventive
vaccines for complex targets such as tuberculosis or HIV and therapeutic cancer vaccines may take longer because research progress on
the effects and impact of natural genome and transcriptome variations in populations, and of environmental factors such as metals
and chemicals, on the development, functioning and ageing of the immune system is slower. Areas deserving more attention include
the development of algorithms to predict non-protein epitopes, e.g. carbohydrates, which are important for vaccines.

Closing Remarks

In the hierarchy of epitope prediction performance, HLA class I binding predictions occupy the top position. In essence, current
systems can predict T cell epitope candidates for any classical HLA class I allotype. Second ranked are MHC class I-restricted T cell
epitope predictions for chimpanzee, macaque, cow and swine. Therefore the development and actual use of epitope-based veterinary
vaccines might precede that of human vaccines. Third ranked are predictions of HLA class II DR- and DQ-restricted epitope
candidates. There is still room for significant improvements with regard to correctly identifying the core binding residues and increasing
the number of experimental data for less studied allotypes, especially for HLA-DP. At the bottom of the performance hierarchy are B
cell epitope predictions. This fact is expected to incentivize the development of both new experimental and data-driven computational
methods that are anticipated to lift the accuracy and efficacy of B cell epitope predictions to acceptable levels by 2025.

Acknowledgement

R.K. acknowledges the award of an IRCMS Internship by International Research Center for Medical Sciences, Kumamoto
University. C.S. acknowledges the support for the work on epitope predictions by Kumamoto University, International Research
Center for Medical Sciences (#005–5700101122).

References

Adams, H.-P., Koziol, J.A., 1995. Prediction of binding to MHC class I molecules. Journal of Immunological Methods 185, 181–190.
Anagnostou, V., Smith, K.N., Forde, P.M., et al., 2017. Evolution of neoantigen landscape during immune checkpoint blockade in non-small cell lung cancer. Cancer Discovery
7, 264–276.
Andreatta, M., Karosiene, E., Rasmussen, M., et al., 2015. Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification.
Immunogenetics 67, 641–650.
Andreatta, M., Nielsen, M., 2015. Gapped sequence alignment using artificial neural networks: Application to the MHC class I system. Bioinformatics 32, 511–517.
Ansari, H.R., Raghava, G.P., 2010. Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Research 6, 6.

Antonets, D., Maksyutov, A., 2010. TEpredict: Software for T-cell epitope prediction. Molecular Biology 44, 119–127.
Atanasova, M., Patronov, A., Dimitrov, I., Flower, D.R., Doytchinova, I., 2013. EpiDOCK: A molecular docking-based tool for MHC class II binding prediction. Protein
Engineering, Design & Selection 26, 631–634.
Barlow, D., Edwards, M., Thornton, J., 1986. Continuous and discontinuous protein antigenic determinants. Nature 322, 747–748.
Bergmann-Leitner, E.S., Chaudhury, S., Steers, N.J., et al., 2013. Computational and experimental validation of B and T-cell epitopes of the in vivo immune response to a
novel malarial antigen. PLoS One 8, e71610.
Bhasin, M., Lata, S., Raghava, G., 2007. TAPPred prediction of TAP-binding peptides in antigens. Immunoinformatics: Predicting Immunogenicity In Silico. 381–386.
Bhasin, M., Raghava, G., 2003. Prediction of promiscuous and high-affinity mutated MHC binders. Hybridoma and Hybridomics 22, 229–234.
Bhasin, M., Raghava, G., 2004a. Prediction of CTL epitopes using QM, SVM and ANN techniques. Vaccine 22, 3195–3204.
Bhasin, M., Raghava, G., 2004b. SVM based method for predicting HLA-DRB1* 0401 binding peptides in an antigen sequence. Bioinformatics 20, 421–423.
Bhasin, M., Raghava, G., 2005. Pcleavage: An SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences.
Nucleic Acids Research 33, W202–W207.
Bhasin, M., Raghava, G., 2007. A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. Journal of Biosciences 32, 31–42.
Blum, J.S., Wearsch, P.A., Cresswell, P., 2013. Pathways of antigen processing. Annual Review of Immunology 31, 443–473.
Blythe, M.J., Flower, D.R., 2005. Benchmarking B cell epitope prediction: Underperformance of existing methods. Protein Science 14, 246–248.
Bobisse, S., Foukas, P.G., Coukos, G., Harari, A., 2016. Neoantigen-based cancer immunotherapy. Annals of Translational Medicine 4, 262.
Brusic, V., Petrovsky, N., Zhang, G., Bajic, V.B., 2002. Prediction of promiscuous peptides that bind HLA class I molecules. Immunology and Cell Biology 80, 280.
Chen, J., Liu, H., Yang, J., Chou, K.-C., 2007. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino acids 33, 423–428.
Dalkas, G.A., Rooman, M., 2017. SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence. BMC Bioinformatics 18, 95.
Davidoff, S.N., Ditto, N.T., Brooks, A.E., Eckman, J., Brooks, B.D., 2015. Surface plasmon resonance for therapeutic antibody characterization. Label-Free Biosensor Methods in
Drug Discovery. 35–76.
De Groot, A.S., Einck, L., Moise, L., et al., 2013. Making vaccines “on demand” A potential solution for emerging pathogens and biodefense? Human Vaccines and
Immunotherapeutics 9, 1877–1884.
De Groot, A.S., Sbai, H., Saint Aubin, C., et al., 2002. Immuno-informatics: Mining genomes for vaccine components. Immunology and Cell Biology 80, 255.
Dhanda, S.K., Gupta, S., Vir, P., Raghava, G., 2013a. Prediction of IL4 inducing peptides. Clinical and Developmental Immunology 2013, 263952.
Dhanda, S.K., Vir, P., Raghava, G.P., 2013b. Designing of interferon-gamma inducing MHC class-II binders. Biology Direct 8, 30.
Diez‐Rivero, C.M., Chenlo, B., Zuluaga, P., Reche, P.A., 2010. Quantitative modeling of peptide binding to TAP using support vector machine. Proteins: Structure, Function,
and Bioinformatics 78, 63–72.
Dimitrov, I., Garnev, P., Flower, D.R., Doytchinova, I., 2010a. EpiTOP – a proteochemometric tool for MHC class II binding prediction. Bioinformatics 26, 2066–2068.
Dimitrov, I., Garnev, P., Flower, D.R., Doytchinova, I., 2010b. Peptide binding to the HLA-DRB1 supertype: A proteochemometrics analysis. European Journal of Medicinal
Chemistry 45, 236–243.
Dönnes, P., Kohlbacher, O., 2006. SVMHC: A server for prediction of MHC-binding peptides. Nucleic Acids Research 34, W194–W197.
Doytchinova, I.A., Flower, D.R., 2002. Physicochemical explanation of peptide binding to HLA‐A* 0201 major histocompatibility complex: A three‐dimensional quantitative
structure‐activity relationship study. Proteins: Structure, Function, and Bioinformatics 48, 505–518.
Frank, S.A., 2002. Immunology and Evolution of Infectious Disease. Princeton, NJ: Princeton University Press.
Greenbaum, J.A., Andersen, P.H., Blythe, M., et al., 2007. Towards a consensus on datasets and evaluation metrics for developing B‐cell epitope prediction tools. Journal of
Molecular Recognition 20, 75–82.
Grotzke, J.E., Sengupta, D., Lu, Q., Cresswell, P., 2017. The ongoing saga of the mechanism(s) of MHC class I-restricted cross-presentation. Current Opinion in Immunology
46, 89–96.
Guan, P., Hattotuwagama, C.K., Doytchinova, I.A., Flower, D.R., 2006. MHCPred 2.0. Applied Bioinformatics 5, 55–61.
Guo, L., Luo, C., Zhu, S., 2013. MHC2SKpan: A novel kernel based approach for pan-specific MHC class II peptide binding prediction. BMC Genomics 14, S11.
Gupta, S., Ansari, H.R., Gautam, A., Raghava, G.P., 2013. Identification of B-cell epitopes in an antigen for inducing specific class of antibodies. Biology Direct 8, 27.
Gutiérrez, A.H., Loving, C., Moise, L., et al., 2016. In vivo validation of predicted and conserved T cell epitopes in a swine influenza model. PLoS One 11, e0159237.
Hattotuwagama, C.K., Doytchinova, I.A., Flower, D.R., 2007. Toward the prediction of class I and II mouse major histocompatibility complex-peptide-binding affinity: In silico
bioinformatic step-by-step guide using quantitative structure-activity relationships. Immunoinformatics: Predicting Immunogenicity In Silico. 227–245.
Honeyman, M.C., Brusic, V., Stone, N.L., Harrison, L.C., 1998. Neural network-based prediction of candidate T-cell epitopes. Nature Biotechnology 16, 966–969.
Huang, Y.X., Bao, Y.L., Guo, S.Y., et al., 2008. Pep-3D-Search: A method for B-cell epitope prediction based on mimotope analysis. BMC Bioinformatics 9, 538.
Jacob, L., Vert, J.-P., 2007. Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics 24, 358–366.
Jespersen, M.C., Peters, B., Nielsen, M., Marcatili, P., 2017. BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids
Research 45, W24–W29.
Karosiene, E., Lundegaard, C., Lund, O., Nielsen, M., 2012. NetMHCcons: A consensus method for the major histocompatibility complex class I predictions. Immunogenetics
64, 177–186.
Kim, Y., Sidney, J., Buus, S., et al., 2014. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC
Bioinformatics 15, 241.
Koff, W.C., Gust, I.D., Plotkin, S.A., 2014. Toward a human vaccines project. Nature Immunology 15, 589–592.
Kringelum, J.V., Lundegaard, C., Lund, O., Nielsen, M., 2012. Reliable B cell epitope predictions: Impacts of method development and improved benchmarking. PLOS
Computational Biology 8, e1002829.
Kulkarni-Kale, U., Bhosle, S., Kolaskar, A.S., 2005. CEP: A conformational epitope prediction server. Nucleic Acids Research 33, W168–W171.
Larsen, J.E., Lund, O., Nielsen, M., 2006. Improved method for predicting linear B-cell epitopes. Immunome Research 2, 2.
Larsen, M.V., Lundegaard, C., Lamberth, K., et al., 2007. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8, 424.
Lata, S., Bhasin, M., Raghava, G.P., 2007. Application of machine learning techniques in predicting MHC binders. Methods in Molecular Biology 409, 201–215.
Lefranc, M.-P., Giudicelli, V., Duroux, P., et al., 2014. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Research 43, D413–D422.
Liang, S., Zheng, D., Standley, D.M., et al., 2010. EPSVR and EPMeta: Prediction of antigenic epitopes using support vector regression and multiple server results. BMC
Bioinformatics 11, 381.
Liang, S., Zheng, D., Zhang, C., Zacharias, M., 2009. Prediction of antigenic epitopes on protein surfaces by consensus scoring. BMC Bioinformatics 10, 302.
Litman, G.W., Rast, J.P., Fugmann, S.D., 2010. The origins of vertebrate adaptive immunity. Nature Reviews. Immunology 10, 543.
Liu, G., Li, D., Li, Z., et al., 2017. PSSMHCpan: A novel PSSM-based software for predicting class I peptide-HLA binding affinity. GigaScience 6, 1–11.
Logean, A., Rognan, D., 2002. Recovery of known T-cell epitopes by computational scanning of a viral genome. Journal of Computer-aided Molecular Design 16, 229–243.
Lundegaard, C., Lund, O., Nielsen, M., 2008. Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools
trained on 9mers. Bioinformatics 24, 1397–1398.
Mamitsuka, H., 1998. Predicting peptides that bind to MHC molecules using supervised learning of hidden Markov models. Proteins Structure Function and Genetics 33,
460–474.
Mattsson, A.H., Kringelum, J.V., Garde, C., Nielsen, M., 2016. Improved pan‐specific prediction of MHC class I peptide binding using a novel receptor clustering data
partitioning strategy. HLA 88, 287–292.
Moise, L., Gutierrez, A., Kibria, F., et al., 2015. iVAX: An integrated toolkit for the selection and optimization of antigens and the design of epitope-driven vaccines. Human
Vaccines and Immunotherapeutics 11, 2312–2321.
Moise, L., Gutierrez, A.H., Bailey-Kellogg, C., et al., 2013. The two-faced T cell epitope: Examining the host-microbe interface with JanusMatrix. Human Vaccines and
Immunotherapeutics 9, 1577–1586.
Moreau, V., Granier, C., Villard, S., Laune, D., Molina, F., 2006. Discontinuous epitope prediction based on mimotope analysis. Bioinformatics 22, 1088–1095.
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., Tramontano, A., 2014. Critical assessment of methods of protein structure prediction (CASP) – round x. Proteins:
Structure, Function, and Bioinformatics 82, 1–6.
Moutaftsi, M., Peters, B., Pasquetto, V., et al., 2006. A consensus epitope prediction approach identifies the breadth of murine TCD8+-cell responses to vaccinia virus. Nature
Biotechnology 24, 817.
Mukherjee, S., Bhattacharyya, C., Chandra, N., 2016. HLaffy: Estimating peptide affinities for Class-1 HLA molecules by learning position-specific pair potentials. Bioinformatics
32, 2297–2305.
Negi, S.S., Braun, W., 2009. Automated detection of conformational epitopes using phage display peptide sequences. Bioinformatics and Biology Insights 3, 71.
Neu, K.E., Wilson, P.C., 2016. Taking the broad view on B cell affinity maturation. Immunity 44, 518–520.
Nielsen, M., Andreatta, M., 2016. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length
datasets. Genome Medicine 8, 33.
Nielsen, M., Lund, O., 2009. NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics 10, 296.
Nielsen, M., Lundegaard, C., Blicher, T., et al., 2007. NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known
sequence. PLOS One 2, e796.
Nielsen, M., Lundegaard, C., Lund, O., Keşmir, C., 2005. The role of the proteasome in generating cytotoxic T-cell epitopes: Insights obtained from improved predictions of
proteasomal cleavage. Immunogenetics 57, 33–41.
Ogishi, M., Yotsuyanagi, H., 2017. Epitope immunogenicity prediction through repertoire-wide TCR-peptide contact profiles. bioRxiv 155317.
Oyarzún, P., Ellis, J.J., Bodén, M., Kobe, B., 2013. PREDIVAC: CD4+ T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity. BMC
Bioinformatics 14, 52.
Parker, K.C., Bednarek, M.A., Coligan, J.E., 1994. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. The
Journal of Immunology 152, 163–175.
Patronov, A., Doytchinova, I., 2013. T-cell epitope vaccine design by immunoinformatics. Open Biology 3, 120139.
Paul, S., Sidney, J., Sette, A., Peters, B., 2016. TepiTool: A pipeline for computational prediction of T cell epitope candidates. Current Protocols in Immunology 114, 18.19.1.
Ponomarenko, J., Bui, H.-H., Li, W., et al., 2008. ElliPro: A new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 9, 514.
Potocnakova, L., Bhide, M., Pulzova, L.B., 2016. An Introduction to B-cell epitope mapping and in silico epitope prediction. Journal of Immunology Research 2016, 6760830.
Qi, T., Qiu, T., Zhang, Q., et al., 2014. SEPPA 2.0 – more refined server to predict spatial epitope considering species of immune host and subcellular localization of protein
antigen. Nucleic Acids Research 42, W59–W63.
Querec, T.D., Akondy, R.S., Lee, E.K., et al., 2009. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nature Immunology 10,
116–125.
Rahman, K.S., Chowdhury, E.U., Sachse, K., Kaltenboeck, B., 2016. Inadequate reference datasets biased toward short non-epitopes confound B-cell epitope prediction. The
Journal of Biological Chemistry 291, 14585–14599.
Rammensee, H.-G., Bachmann, J., Emmerich, N.P.N., Bachor, O.A., Stevanović, S., 1999. SYFPEITHI: Database for MHC ligands and peptide motifs. Immunogenetics 50,
213–219.
Rappuoli, R., Aderem, A., 2011. A 2020 vision for vaccines against HIV, tuberculosis and malaria. Nature 473, 463.
Reche, P.A., Glutting, J.-P., Zhang, H., Reinherz, E.L., 2004. Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles.
Immunogenetics 56, 405–419.
Robinson, J., Halliwell, J.A., Hayhurst, J.D., et al., 2014. The IPD and IMGT/HLA database: Allele variant databases. Nucleic Acids Research 43, D423–D431.
Rubinstein, N.D., Mayrose, I., Martz, E., Pupko, T., 2009. Epitopia: A web-server for predicting B-cell epitopes. BMC Bioinformatics 10, 287.
Saha, S., Bhasin, M., Raghava, G.P., 2005. Bcipep: A database of B-cell epitopes. BMC Genomics 6, 79.
Saha, S., Raghava, G., 2006. Prediction of continuous B‐cell epitopes in an antigen using recurrent neural network. Proteins: Structure, Function, and Bioinformatics 65, 40–48.
Saha, S., Raghava, G.P.S., 2004. BcePred: Prediction of continuous B-Cell epitopes in antigenic sequences using physico-chemical Properties. In: Nicosia, G., Cutello, V.,
Bentley, P.J., Timmis, J. (Eds.), ICARIS. Berlin: Springer, pp. 197–204.
Sammut, C., Webb, G.I., 2010. Leave-One-Out Cross-Validation. In: Sammut, C., Webb, G.I. (Eds.), Encyclopedia of Machine Learning, 2nd edn. Boston: Springer US.
Schafer, J.R.A., Jesdale, B.M., George, J.A., Kouttab, N.M., De Groot, A.S., 1998. Prediction of well-conserved HIV-1 ligands using a matrix-based algorithm, EpiMatrix.
Vaccine 16, 1880–1884.
Schönbach, C., Kun, Y., Brusic, V., 2002. Large-scale computational identification of HIV T-cell epitopes. Immunology and Cell Biology 80, 300.
Schroeder, H.W., Cavacini, L., 2010. Structure and function of immunoglobulins. Journal of Allergy and Clinical Immunology 125, S41–S52.
Schubert, B., Walzer, M., Brachvogel, H.-P., et al., 2016. FRED 2: An immunoinformatics framework for Python. Bioinformatics 32, 2044–2046.
Schueler-Furman, O., Altuvia, Y., Sette, A., Margalit, H., 2000. Structure-based prediction of binding peptides to MHC class I molecules: Application to a broad range of MHC
alleles. Protein Science 9, 1838–1846.
Sela-Culang, I., Kunik, V., Ofran, Y., 2013. The structural basis of antibody-antigen recognition. Frontiers in Immunology 4, 302.
Sette, A., Rappuoli, R., 2010. Reverse vaccinology: Developing vaccines in the era of genomics. Immunity 33, 530–541.
Singh, H., Ansari, H.R., Raghava, G.P., 2013. Improved method for linear B-cell epitope prediction using antigen’s primary sequence. PLOS One 8, e62216.
Singh, H., Raghava, G., 2001. ProPred: Prediction of HLA-DR binding sites. Bioinformatics 17, 1236–1237.
Singh, H., Raghava, G., 2003. ProPred1: Prediction of promiscuous MHC Class-I binding sites. Bioinformatics 19, 1009–1014.
Soria-Guerra, R.E., Nieto-Gomez, R., Govea-Alonso, D.O., Rosales-Mendoza, S., 2015. An overview of bioinformatics tools for epitope prediction: Implications on vaccine
development. Journal of Biomedical Informatics 53, 405–414.
Stranzl, T., Larsen, M.V., Lundegaard, C., Nielsen, M., 2010. NetCTLpan: Pan-specific MHC class I pathway epitope predictions. Immunogenetics 62, 357–368.
Sun, P., Ju, H., Liu, Z., et al., 2013. Bioinformatics resources and tools for conformational B-cell epitope prediction. Computational and Mathematical Methods in Medicine
2013, 943636.
Sun, P., Qi, J., Zhao, Y., et al., 2016. A novel conformational B-cell epitope prediction method based on mimotope and patch analysis. Journal of Theoretical Biology 394,
102–108.
Sweredoski, M.J., Baldi, P., 2008a. COBEpro: A novel system for predicting continuous B-cell epitopes. Protein Engineering, Design and Selection 22, 113–120.
Sweredoski, M.J., Baldi, P., 2008b. PEPITO: Improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 24,
1459–1460.
Szolek, A., Schubert, B., Mohr, C., et al., 2014. OptiType: Precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316.
Taylor, W., Thornton, J.T., Turnell, W., 1983. An ellipsoidal approximation of protein shape. Journal of Molecular Graphics 1, 30–38.
Thornton, J., Edwards, M., Taylor, W., Barlow, D., 1986. Location of 'continuous' antigenic determinants in the protruding regions of proteins. The EMBO Journal 5, 409.
Trolle, T., Nielsen, M., 2014. NetTepi: An integrated method for the prediction of T cell epitopes. Immunogenetics 66, 449–456.
van Bergen, C.A., Van Luxemburg-Heijs, S.A., De Wreede, L.C., et al., 2017. Selective graft-versus-leukemia depends on magnitude and diversity of the alloreactive T cell
response. The Journal of Clinical Investigation 127, 517.
Vita, R., Overton, J.A., Greenbaum, J.A., et al., 2014. The immune epitope database (IEDB) 3.0. Nucleic Acids Research 43, D405–D412.
Xu, Y., Luo, C., Mamitsuka, H., Zhu, S., 2016. MetaMHCpan, a meta approach for pan-specific MHC peptide binding prediction. Vaccine Design: Methods and Protocols,
Volume 2: Vaccines for Veterinary Diseases. 753–760.
Yang, D., Frego, L., Lasaro, M., et al., 2016. Efficient qualitative and quantitative determination of antigen-induced immune responses. The Journal of Biological Chemistry 291,
16361–16374.
Yang, X., Yu, X., 2009. An introduction to epitope prediction methods and software. Reviews in Medical Virology 19, 77–96.
Yao, B., Zheng, D., Liang, S., Zhang, C., 2013. Conformational B-cell epitope prediction on antigen protein structures: A review of current algorithms and comparison with
common binding site prediction methods. PLOS One 8, e62249.
Yao, B., Zhang, L., Liang, S., Zhang, C., 2012. SVMTriP: A method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity.
PLOS One 7, e45152.
Yasser, E.-M., Honavar, V., 2010. Recent advances in B-cell epitope prediction methods. Immunome Research 6, S2.
Yewdell, J.W., Bennink, J.R., 1999. Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses. Annual Review of Immunology 17,
51–88.
Zhang, C., Bickis, M.G., Wu, F.-X., Kusalik, A.J., 2006. Optimally-connected hidden Markov models for predicting MHC-binding peptides. Journal of Bioinformatics and
Computational Biology 4, 959–980.
Zhang, G.L., Bozic, I., Kwoh, C.K., August, J.T., Brusic, V., 2007. Prediction of supertype-specific HLA class I binding peptides using support vector machines. Journal of
Immunological Methods 320, 143–154.
Zhang, G.L., Deluca, D.S., Keskin, D.B., et al., 2011a. MULTIPRED2: A computational system for large-scale identification of peptides predicted to bind to HLA supertypes and
alleles. Journal of Immunological Methods 374, 53–61.
Zhang, H., Lund, O., Nielsen, M., 2009. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: Application to MHC-
peptide binding. Bioinformatics 25, 1293–1299.
Zhang, L., Chen, Y., Wong, H.-S., et al., 2012. TEPITOPEpan: Extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules. PLOS One 7, e30483.
Zhang, L., Udaka, K., Mamitsuka, H., Zhu, S., 2011b. Toward more accurate pan-specific MHC-peptide binding prediction: A review of current methods and tools. Briefings in
Bioinformatics 13, 350–364.
Zhang, W., Xiong, Y., Zhao, M., et al., 2011c. Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature. BMC
Bioinformatics 12, 341.

Further Reading
Belden, O.S., Baker, S.C., Baker, B.M., 2015. Citizens unite for computational immunology! Trends in Immunology 36, 385–387.
Brusic, V., Gottardo, R., Kleinstein, S.H., Davis, M.M., 2014. Computational resources for high-dimensional immune analysis from the Human Immunology Project Consortium.
Nature Biotechnology 32, 146–148.
De, R.K., Tomar, N. (Eds.), 2014. Immunoinformatics. In: Walker, J.M. (Ed.), Methods in Molecular Biology, second ed., vol. 1184. New York: Humana Press.
De Gregorio, E., Rappuoli, R., 2014. From empiricism to rational design: A personal perspective of the evolution of vaccine development. Nature Reviews. Immunology 14, 505.
Ditto, N.T., Brooks, B.D., 2016. The emerging role of biosensor-based epitope binning and mapping in antibody-based drug discovery. Expert Opinion on Drug Discovery 11,
925–937.
Fleri, W., Paul, S., Dhanda, S.K., et al., 2017. The immune epitope database and analysis resource in epitope discovery and synthetic vaccine design. Frontiers in Immunology
8, 278.
He, L., Zhu, J., 2015. Computational tools for epitope vaccine design and evaluation. Current Opinion in Virology 11, 103–112.
Liljeroos, L., Malito, E., Ferlenghi, I., Bottomley, M.J., 2015. Structural and computational biology in the design of immunogenic vaccine antigens. Journal of Immunology
Research 2015, 156241.
Scheuermann, R.H., Sinkovits, R.S., Schenkelberg, T., Koff, W.C., 2017. A bioinformatics roadmap for the human vaccines project. Expert Review of Vaccines 16, 535–544.
Sette, A., Peters, B., 2007. Immune epitope mapping in the post-genomic era: Lessons for vaccine development. Current Opinion in Immunology 19, 106–110.

Relevant Websites

https://clinicaltrials.gov/ct2/show/NCT01970358
A phase I study with a personalized neoantigen cancer vaccine in melanoma.
https://www.immunespace.org/
Enabling integrative modelling of human immunological data. The Human Immunology Project Consortium.
http://www.iedb.org/
Immune Epitope Database Analysis Resource.
http://www.imgt.org
IMGT®, the international ImMunoGeneTics information system®.
http://www.humanvaccinesproject.org
The Human Vaccines Project.