
Article
Rationalizing Drug Response in Cancer Cell Lines
Teresa Juan-Blanco 1 , Miquel Duran-Frigola 1 and Patrick Aloy 1, 2

1 - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona),
The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
2 - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
Correspondence to Patrick Aloy: IRB-BSC-CRG Program in Computational Biology, Institute for Research in
Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
patrick.aloy@irbbarcelona.org
https://doi.org/10.1016/j.jmb.2018.03.021
Edited by Juan Fuxman
Abstract
Cancer cell lines (CCLs) play an important role in the initial stages of drug discovery, enabling, among other
uses, the screening of drug candidates. As CCL panels continue to grow in size and diversity, many
polymorphisms in genes encoding drug-metabolizing enzymes, transporters and drug targets, as well as
disease-related genes have been linked to altered drug sensitivity. However, identifying the correlation
between this variability and pharmacological responses remains challenging due to the heterogeneity of
cancer biology and the intricate interplay between cell lines and drug molecules. Here, we propose a network-
based strategy that exploits information on gene expression and somatic mutations of CCLs to group cells
according to their molecular similarity. We then identify genes that are characteristic of each cluster and
correlate their status with drug response. We find that CCLs with similar characteristic active network regions
present specific responses to certain drugs, and identify a limited set of genes that might be directly involved in
drug sensitivity or resistance.
© 2018 Elsevier Ltd. All rights reserved.
Introduction

Cancer cell lines (CCLs) are one of the most widely used experimental models to understand cancer biology, as well as to test the efficacy of novel anticancer therapies [1]. A large number and variety of CCLs exist, and their cost-effectiveness and unlimited auto-replicative nature facilitate the fast acquisition of results [2]. However, the clinical relevance of CCLs remains controversial. First, CCLs present technical constraints such as cross-contamination with other cell lines or genomic instability produced by culture conditions [3]. Second, and more important, CCLs often do not represent primary tumors [4,5], which may lead to misleading conclusions and difficulties in translating the outcomes to in vivo models. For example, two-dimensional monolayer cultures are more sensitive to cytotoxic agents, and therefore, sensitivity results are over-optimistic [6–8].

To overcome these limitations, CCL panels continue to improve in size and in-depth characterization. In 1990, the NCI-60 was launched, containing only 59 cell lines representing 9 cancer types [9]. Modern panels such as the pioneering Cancer Cell Line Encyclopedia (CCLE) offer a comprehensive collection of gene expression, chromosome copy number and sequencing data for about one thousand CCLs [10]. Currently, efforts are being put into increasing the number of drugs that are screened against these vast collections, and into mapping clinical data onto their molecular profiles [11].

The large volume of molecular data accumulated allows for the systematic study of drug response in CCLs. Many polymorphisms in genes encoding drug-metabolizing enzymes, transporters and drug targets, as well as disease-related genes, have been proposed as determinants of drug sensitivity [12]. However, true mechanistic understanding of drug action remains challenging. One of the main difficulties
0022-2836/© 2018 Elsevier Ltd. All rights reserved. J Mol Biol (2018) xx, xxxxxx
Please cite this article as: T. Juan-Blanco, et al., Rationalizing Drug Response in Cancer Cell Lines, J. Mol. Biol. (2018), https://doi.
org/10.1016/j.jmb.2018.03.021
is that differential pharmacological profiles are caused by the interplay of multiple genes [13], and other factors such as the tissue of origin may confound the analysis and translation to the clinic [3,14]. Often, there is no correlation between the expression of the intended pharmacological targets and the observed drug activity [14,15]. All of the above suggests that systems biology approaches may help elucidate the complexity of drug response [16]. So far, and given the significant amount of data collected, the most successful strategies to predict drug activity are based on supervised machine learning approaches [17,18]. However, although they yield promising results in terms of accuracy of the predictions, these methods are largely unable to pinpoint the biological players (i.e., over/under-expression of certain genes, mutations, etc.) responsible for the observed drug effects.

Here, we describe a systems biology, cell-centered approach to characterize drug response in CCL panels. Instead of grouping CCLs according to their drug sensitivity, we exploited molecular data provided by the CCLE, together with a protein interaction network, to bring together CCLs with similar molecular profiles. Only then do we correlate these molecular profiles to differential drug response and propose potential gene expression signatures that are beneficial for predicting drug sensitivity/resistance in CCLs.

Results and Discussion

Stratifying CCLs according to their molecular profiles

To describe molecular variability in CCLs, we exploited the mutational and transcriptional landscapes provided by the CCLE. This panel contains somatic mutations and indels for 1651 genes, and expression data for 18,987 genes. We kept for analysis the 448 CCLs, representing 23 tissues, for which somatic mutations, transcriptional data and pharmacological profiles are available. Differential gene expression analysis was done by comparing basal expression of CCLs to the rest of the panel (see Methods).

Figure 1 summarizes the strategy implemented and the content of our data set. The most represented tissues in the data set are lung and hematopoietic and lymphoid cells. The number of somatic mutations per CCL is, on average, 10 times smaller than the number of differentially expressed genes.

To integrate the CCL molecular profiles and to determine their area of influence, we applied the network-based stratification (NBS) method [19]. In brief, NBS combines network propagation with clustering analysis to identify groups of CCLs whose somatic mutations and other molecular signals affect
[Figure 1: (a) workflow diagram: somatic mutations (1651 genes) and gene expression (18,987 genes) of 448 CCLs are propagated over the interaction network (12,486 proteins, 62,387 interactions), CCLs are clustered, and pharmacological profiles for 24 antineoplastic drugs are mapped onto the clusters; (b) plots of the number of differentially expressed genes, the number of mutated genes and the number of CCLs per tumor type.]
Fig. 1. Summary of the methodology and data set. (a) Somatic mutations and gene expression data from the CCLE were
integrated with PPI data through the network propagation algorithm. Then, CCLs were clustered and drug sensitivity to 24
antineoplastic drugs was mapped onto them. (b) Distribution of the number of differentially expressed and mutated genes
among the 448 CCLs grouped by tumor type, and number of CCLs per tumor type. CNS stands for central nervous system;
H&L, hematopoietic and lymphoid; and UAT, upper aerodigestive tract.
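The network propagation step in panel (a) can be illustrated with a small sketch. It assumes the random-walk-with-restart update commonly used in network-based stratification, F(t+1) = alpha * F(t) * A + (1 - alpha) * F(0), where A is the degree-normalized adjacency matrix. The restart weight alpha = 0.7, the four-gene path network and all function names are illustrative assumptions, not values or code taken from this study.

```python
# Minimal sketch of network propagation on a toy network (pure Python).
# Assumptions: update rule F <- alpha * F * A + (1 - alpha) * F0,
# alpha = 0.7, and a hypothetical 4-gene path network g0 - g1 - g2 - g3.

def row_normalize(adjacency):
    """Divide each row by its degree so that every row sums to 1."""
    normalized = []
    for row in adjacency:
        degree = sum(row)
        normalized.append([v / degree if degree else 0.0 for v in row])
    return normalized


def propagate(profile, adjacency, alpha=0.7, tol=1e-12, max_iter=1000):
    """Smooth a binary alteration profile over the network until convergence."""
    a_norm = row_normalize(adjacency)
    n = len(profile)
    current = list(profile)
    for _ in range(max_iter):
        # (current * A)[j] = sum_i current[i] * A[i][j]
        spread = [sum(current[i] * a_norm[i][j] for i in range(n)) for j in range(n)]
        updated = [alpha * spread[j] + (1 - alpha) * profile[j] for j in range(n)]
        if max(abs(u - c) for u, c in zip(updated, current)) < tol:
            return updated
        current = updated
    return current


adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
binary_profile = [1.0, 0.0, 0.0, 0.0]  # only g0 is altered in this toy cell line

smoothed = propagate(binary_profile, adjacency)
print([round(v, 3) for v in smoothed])
```

In this toy case the smoothed profile is no longer binary: the score decays with network distance from the altered gene, so unaltered genes in its neighborhood still carry signal into the clustering step.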
similar areas of the interactome. We used an in-house binary protein–protein interaction (PPI) network generated by merging major public PPI repositories [20]. In total, the PPI network used in this study contains 12,486 proteins and 62,387 interactions.

The NBS protocol requires that both mutation and expression are expressed as binary features (see Methods). The network propagation step smoothens these profiles so that they are compliant with the network architecture, leading to a molecular signature that is no longer binary but captures the proximity to the altered genes. Then, we clustered the resulting network-smoothed profiles into an optimal number of groups and used the tissue of origin as a proxy to assess the biological relevance of the clusters [21]. The results are displayed in the co-clustering networks in Fig. 2a and b. As evident, gene expression and somatic mutation data do not cluster cell lines in the same way. Differential expression clusters are able to recapitulate the tissue of origin of the cell lines (e.g., hematopoietic and lymphoid cells form a well-distinguished group),
[Figure 2: (a) differential expression co-clustering network; (b) mutations co-clustering network; (c) merged co-clustering network, with nodes colored by tissue of origin; (d) distribution of % co-clustering for the differential expression, merged and mutation data sets; (e) NMI versus number of clusters for differential expression vs. merged, differential expression vs. mutation, and mutation vs. merged.]
Fig. 2. Co-clustering network using differential expression (a), somatic mutations (b) and the merged data set (c) after
applying NR-NMF onto the network-smoothed matrix for a number of clusters from 3 to 100. Nodes symbolize the CCLs.
Edges connect two CCLs if they cluster together in at least 30% of the clustering results. Node colors represent the tissue
of origin. Edge transparency is proportional to the co-clustering score. (d) Distribution of co-clustering scores for the
differential expression, mutation and merged data sets. (e) Normalized mutual information between the clustering results
of the three data sets. CNS stands for central nervous system; H&L, hematopoietic and lymphoid; and UAT, upper
aerodigestive tract.
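The two quantities named in this legend, the co-clustering score and the normalized mutual information (NMI), can be sketched as follows. The three toy clusterings are hypothetical, and the NMI here is normalized by the arithmetic mean of the two entropies, which is one of several conventions; the normalization actually used in the study is not stated, so that choice is an assumption.

```python
# Minimal sketch: co-clustering scores across repeated clusterings, and NMI
# between two clusterings. Toy labels are hypothetical.
from collections import Counter
from itertools import combinations
from math import log


def co_clustering_scores(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n_items = len(labelings[0])
    scores = {}
    for i, j in combinations(range(n_items), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        scores[(i, j)] = together / len(labelings)
    return scores


def nmi(labels_a, labels_b):
    """Mutual information divided by the mean entropy of the two labelings."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mi = sum(
        c / n * log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )
    mean_entropy = (h_a + h_b) / 2
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Three hypothetical clusterings of four cell lines (e.g. different cluster numbers).
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
scores = co_clustering_scores(labelings)

# Keep an edge when two cell lines co-cluster in at least 30% of the results.
edges = [pair for pair, score in scores.items() if score >= 0.3]
print(edges)
print(round(nmi(labelings[0], labelings[1]), 3))
```

Cluster labels are arbitrary, so the NMI of two identical partitions is 1 even when the label names differ, as with the first and third toy clusterings.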
[Figure 3: box plots of drug activity (ActA, y-axis) per CCL cluster (1 to 15, x-axis), one panel per drug, with asterisks marking clusters as described in the legend. ANOVA statistics from the panel titles:
17.AAG: F = 3.46, p value = 2.25e−05
AEW541: F = 6.71, p value = 2.03e−12
AZD0530: F = 4.03, p value = 1.44e−06
AZD6244: F = 13.32, p value = 2.76e−26
Erlotinib: F = 5.5, p value = 9.15e−10
Irinotecan: F = 9.84, p value = 1.34e−17
L.685458: F = 13.53, p value = 1.43e−26
Lapatinib: F = 6.69, p value = 2.17e−12
Nilotinib: F = 14.18, p value = 6.61e−27
Nutlin.3: F = 4.08, p value = 1.09e−06
PD.0325901: F = 16.03, p value = 1.46e−31
PD.0332991: F = 10.76, p value = 1.52e−20
PF2341066: F = 10.11, p value = 9.88e−20
PHA.665752: F = 4.79, p value = 3.14e−08
PLX4720: F = 10.6, p value = 1.03e−20
Paclitaxel: F = 7.6, p value = 2.25e−14
Panobinostat: F = 20.92, p value = 2.32e−40
RAF265: F = 5.53, p value = 9.27e−10
Sorafenib: F = 9.68, p value = 7.98e−19
TAE684: F = 8.34, p value = 5.6e−16
TKI258: F = 8.18, p value = 1.26e−15
Topotecan: F = 9.58, p value = 1.26e−18
ZD.6474: F = 4.78, p value = 3.37e−08]
Table 1. Summary of the functional enrichment analysis of genes altered in each cluster
Cluster  Function annotation (GO terms, KEGG and Reactome pathways)
1   Neuron differentiation and development, signaling by GPCR, synaptic transmission
2   Pathways in cancer, signaling by NGF, ErbB signaling pathway, protein kinase activity
3   Apoptosis, ectoderm and endodermis development, cytoskeleton, calcium ion binding, cell–cell junction
5   Extracellular matrix structural constituent, ECM–receptor interaction, growth factor binding, focal adhesion, axon guidance, hemostasis, integrin cell surface interactions
6   Primary immunodeficiency, intestinal immune network for IgA production, signaling in immune system, homeostasis, cytokine–cytokine interaction, B-cell receptor signaling pathway
7   Glutathione metabolism
8   Complement and coagulation cascades, hemostasis, metabolism of lipids and lipoproteins
11  MHC class II receptor activity, ECM–receptor interaction
12  B-cell homeostasis, cytokine binding
13  Signaling in immune system, hemostasis, natural killer cell-mediated cytotoxicity, hematopoietic cell lineage, T-cell receptor signaling pathway, chemokine signaling pathway, integrin cell surface interactions, cell adhesion molecules, Fc gamma R-mediated phagocytosis, Jak–STAT signaling pathway, leukocyte transendothelial migration, regulation of actin cytoskeleton
14  Apical part of cell, sensory perception of sound and mechanical stimulus
15  Epidermis and ectoderm development, axon guidance, signaling by PDGF, collagen type IV
and the result of the clustering is robust (Fig. 2d). However, and despite their recognized importance in tumor development [22] and drug response [23], we found that CCL clusters obtained from somatic mutations alone are less robust (Fig. 2d), rather scattered, and do not correlate with the tissues of origin (Fig. 2b). In order to retain mutation data in our clusters, we combined the network-smoothed expression and mutation profiles into a single, merged profile (see Methods). The resulting co-clustering network of this merged data set is depicted in Fig. 2c. Here, clusters resemble those obtained by gene expression analysis alone (Fig. 2e), thereby keeping their biological relevance and robustness (Fig. 2d), while still incorporating the valuable mutation data. We chose this merged data set for further analyses.

To optimize the clustering parameters, we required the number of CCLs per cluster to be comparable among clusters and large enough to enable statistical analysis. We chose to carry on our analyses with 15 clusters, as this ensured a minimal number of groupings with fewer than 10 CCLs and a small standard deviation of the number of cells per group (Fig. S1). Figure S2 shows that the number of CCLs in each cluster varies from 16 to 45, and illustrates the tissue distribution along the 15 clusters. Some clusters are clearly dominated by CCLs belonging to a single tissue (i.e., clusters 6, 12 and 13 are composed only of hematopoietic and lymphoid cells), while others are more diverse. Finally, we re-ran our pipeline using two functional networks gathered from STRING [24] and inBioMap [25] to measure the robustness of the results and to check whether we could increase the protein coverage (Figs. S3–S5). The clusters obtained from functional networks were considerably similar to the previous results with physical protein interactions in terms of cluster robustness and protein coverage (normalized mutual information of 0.66 and 0.68 for STRING and inBioMap, respectively).

Cluster correlation with drug response

We then assessed whether CCLs in the same cluster respond similarly to drug treatment. Following the recommendation of the CCLE authors, we used the area above the drug-response curve (ActA) as the drug activity measure: the higher the ActA, the more effective the drug is. Remarkably, for 23 of the 24 drugs, we could find at least one cluster whose CCLs had a differential drug sensitivity (ANOVA p value < 0.05; Fig. 3). Some of the drugs, like Panobinostat, had specific drug activity ranges in most of the clusters, while others, like PLX4720, were mostly related to a few of them. Some clusters contained CCLs that were in general more sensitive, as is the case for two out of the three hematopoietic and lymphoid clusters (12 and 13). Also, several tissue-diverse clusters were specific to some of the drugs (e.g., cluster 3 and Erlotinib), supporting the notion that key molecular features may drive drug response across tumor types [14,15,26].

Interestingly, CCL clusters respond similarly to drugs with the same mechanism of action. For instance, clusters 11, 13 and 14 are more sensitive to the two MEK inhibitors AZD6244 and PD-0325901. Their drug targets, MEK1 and MEK2, are down- and over-expressed, respectively, in these sensitive clusters (Wilcoxon's p value < 0.005). Another example relates to cluster 3, whose CCLs are tissue heterogeneous and respond to EGFR inhibition by Erlotinib and Lapatinib.
Fig. 3. Drug activity distribution along CCL clusters for the 23 drugs with a response significantly associated with
at least one cluster. “F” and “p value” are the ANOVA F value and p value, respectively. Red asterisks indicate that the
drug response of the cluster is significantly different from that of more than 5 (*) or 10 (**) clusters (Benjamini–Hochberg
adjusted p value < 0.05).
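For a single drug, the F value in each panel title is a one-way ANOVA statistic across clusters, and the asterisks rely on Benjamini–Hochberg adjusted p values. The sketch below shows both computations in plain Python. The ActA values, the raw p values and the function names are made up for illustration, and converting F into a p value (which requires the F distribution) is deliberately left out.

```python
# Minimal sketch: one-way ANOVA F statistic and Benjamini-Hochberg adjustment.
# All numbers below are hypothetical and only illustrate the arithmetic.

def anova_f(groups):
    """Between-group mean square divided by within-group mean square."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical ActA values of one drug in three clusters.
acta_by_cluster = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = anova_f(acta_by_cluster)

# Hypothetical raw p values from pairwise cluster comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw_p)

print(f_value)
print([round(p, 4) for p in adjusted])
```

A comparison would then count toward an asterisk when its adjusted p value falls below 0.05.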
Molecular signatures and drug response

Given the observed relevance of our CCL clusters to drug response, we sought to identify the representative genes of each cluster, that is, those with a similar behavior in all the CCLs belonging to it. For this, we applied the Significance Analysis of Microarrays (SAM) algorithm [27] and selected genes with a significantly high SAM score (FDR < 0.01). We then compared every CCL cluster with the remaining cohort. Figure 4 shows the number of significant genes identified per cluster. Finally, we exploited the DAVID database [28] to functionally annotate the relevant genes identified. Table 1 summarizes the biological terms associated with the clusters.

The number of detected genes is highly variable across clusters (Fig. 4). The clusters with the highest number of genes are clusters 6 and 13, which contain hematopoietic and lymphoid cells, as well as cluster 11, composed of skin cells. As expected, we found fewer associated genes when CCLs in the cluster were more diverse, as is the case for clusters 9 and 10, which contain cell lines from all of the tissues.

A noteworthy observation is that, thanks to the network-propagation algorithm, we could detect 724 gene–cluster associations (i.e., 21% of the total) that would have been otherwise lost. Seventy-one out of these 724 involve genes whose expression was not measured by the CCLE, 214 comprise genes that do not significantly change their expression in any CCL, and the remaining 439 are neither differentially expressed nor mutated in the cluster compared to the other clusters. Some interesting examples among these associations are cancer-related genes such as the transcription factor FOXO1, the kinase FYN, the proto-oncogene VAV1 and PIDD1, which promotes apoptosis downstream of TP53 and mediates NF-κB activation in response to DNA damage [29,30].

To check whether cluster-specific molecular signatures can help discover drug response biomarkers, we compared the expression/mutation status of the previous cluster-associated genes between sensitive and resistant CCLs. That is, for each drug–gene pair, we tested whether the expression/mutation status of the gene was significantly different in sensitive/resistant CCLs. Figure 5a has examples of the three potential drug sensitivity associations, namely down-regulated gene expression, up-regulated gene expression, and mutated genes. Figure 5b–e quantifies the potential biomarkers that we could find (Bonferroni-corrected Fisher's p value < 0.05). With gene expression data, we identified 1733 gene–drug correlations, involving 563 genes and 23 drugs. A portion of these associations is listed in Table 2, and the rest are available in Table S1 (please note that genes associated with clusters only through the network-propagation step had no chance to enter this differential expression analysis). Regarding mutations, we could only find six significant mutation–drug associations, all of them involving BRAF and KRAS (Table 3). Reassuringly, all of them are well-documented drug response biomarkers [31,32]; unfortunately, due to the low mutation rate and to the smaller number of mutated genes, we lacked statistical power to find novel biomarker mutations.

Benefit of the systems biology approach

As we showed above, the network-propagation algorithm was helpful to detect cluster-associated genes that would otherwise have gone unnoticed. To further evaluate this feature, we re-ran our clustering analysis, this time skipping the network propagation step. By considering the basal gene expression of 16,077 genes, we now grouped the CCLs into 7 clusters (Fig. S6A). Similar to the results obtained with NBS, 23 of the 24 drugs could be associated with one or more of these clusters (Fig. S7). However, we found a rather different profile of genes in the clusters, detecting more than 2000 genes per cluster and, overall, associating 88% of the genes to at
[Figure 4: bar chart of the number of significant genes (y-axis, # Genes) per cluster (x-axis, clusters 1 to 15); bar labels range from 25 to 514.]
Fig. 4. Number of significant genes identified by SAM in each cluster. FDR < 0.01.
[Figure 5: (a) example contingency tables of response (sensitive/resistant) versus gene status (yes/no) for FLI1 and L-685458 (up-regulated, Fisher p-value = 2e-15), BTBD3 and L-685458 (down-regulated, Fisher p-value = 6e-7), and BRAF and AZD6244 (mutated, Fisher p-value = 2e-12); (b) number of unique genes per cluster for expression and for mutations, split into significant biomarkers, no significant association, and no data in the CCLE; (c) number of unique drugs per cluster; (d) number of unique genes per drug; (e) number of genes versus number of associated drugs.]
Fig. 5. Statistics on drug response associations. (a) Examples of the contingency tables used to find associations between
differential expression or mutations and drug response. (b) Proportion of genes significantly associated with drug response,
not significantly associated, and without molecular data available in the CCLE, for both differential expression and mutation.
(c) Number of unique drugs per cluster. (d) Number of unique genes per drug. (e) Specificity of the drug–gene associations, that
is, number of genes associated with a certain number of drugs.
least one cluster (Fig. S6B). By integrating network information and starting from a comparable number of genes, we decreased this percentage to 25%. Hence, our NBS approach has allowed us to drastically reduce the number of genes associated with clusters, allowing a much simpler interpretation.

Drug response prediction

The described methodology has grouped CCLs according to their basal molecular profiles, and we have shown that these cell clusters, in many cases, can be associated with the response to specific drugs. We have also identified characteristic molecular features of each cluster, and it remains to be seen whether these are sufficient to predict drug sensitivity or resistance of CCLs. To address this question, and given the relatively small number of samples and the recognized coexpression between genes, we used a classifier based on Bayesian Additive Regression Trees (BART) [33]. Like most Bayesian methods, BART works well with relatively few samples and, thanks to its tree structure, can natively deal with highly correlated variables.

We applied the BART algorithm on the network-smoothed profiles to classify CCLs into sensitive or resistant, according to their ActA (Fig. 6a). Furthermore, we included the cluster identity and the cluster-specific genes identified to assess whether the predictive power of our model would improve with this information (Fig. 6).

We used the area under the ROC curve (AUROC) to evaluate the performance of the leave-one-out cross-validated classifier (Fig. 6b). In total, by using cluster information and cluster-specific genes, we could rank sensitive CCLs with an AUROC higher than 0.7 for 16 drugs. Remarkably, the overall performance of the classifier does not drop if we only use the set of cluster-associated genes (i.e., four times fewer genes), and the
Table 2. Drug response associations to differentially expressed genes

Drug       Gene      Expression  Cluster    % Total  % Sensitive  % Resistant  Bonferroni p value
Nilotinib  ASAP2     Down        6, 12, 13  12       65.5         7.9          1.34E−08
TKI258     BIN2      Up          12, 13     6        57.9         4.0          1.75E−06
Sorafenib  ARHGAP15  Up          6, 13      10       55.6         7.1          8.16E−06
Sorafenib  SNX7      Down        6, 13      11       55.6         8.6          6.37E−05
Sorafenib  TJP1      Down        6, 12, 13  11       55.6         8.1          1.47E−05
Nilotinib  SH3BP4    Down        6, 12, 13  12       55.2         8.5          1.74E−05
L.685458   ASAP2     Down        6, 12, 13  11       53.7         4.7          8.11E−15
L.685458   SNX7      Down        6, 13      12       53.7         5.8          4.86E−13
TKI258     CD53      Up          12, 13     10       52.6         7.9          6.78E−03
TKI258     DOCK8     Up          6, 13      8        52.6         6.5          3.31E−03
TKI258     LCP2      Up          13         7        52.6         4.7          2.46E−04

“Expression” indicates whether the gene is up- or down-regulated. “% Total” is the total percentage of CCLs with the expression biomarker. “% Sensitive” and “% Resistant” are the percentages of sensitive or resistant CCLs with the expression biomarkers. “Bonferroni p value” is the corrected Fisher's test p value.
number of drugs with an AUROC > 0.7 is higher. Under these settings, the best performing drugs are L-685458 and Nilotinib, with a cross-validated AUROC above 0.8.

In addition, we sought to confirm the predictive power of our approach with an external data set. We chose the GDSC CCL panel described by Iorio et al. [11] in 2016, where they characterized molecular alterations in 1001 human CCLs and correlated them with sensitivity to 265 drugs. The CCLE and the GDSC data sets share 343 CCLs and 15 drugs. However, the drug activities reported by the two studies are poorly correlated (Fig. S9), which effectively limits the validity of the comparison. We nevertheless carried out the analysis by using the BART models obtained from the CCLE panel to predict drug sensitivity in the GDSC data set. Figure 6c shows the performance of the BART classification for all methods. As can be observed, the performance of the classification decreases in the external data set compared to the cross-validation results.

We repeated the analyses considering only those drugs that, at least, had the same type of qualitative response in the two data sets (i.e., sensitive or resistant; Fig. S10), obtaining better results for 8 out of 10 drugs (SAM & cluster approach). Finally, we cleaned up the sets further by including only those CCLs whose drug response falls below the 33rd or above the 66th percentile in both cases (i.e., CCLs that are in the top sensitive or the top resistant in both data sets; Figs. 6d and S11). The median AUROC rises for all the approaches and, although the increase is not significant, the performance improved for 9 out of 15 drugs. This increase, however, has the drawback of a lower applicability since the number of sensitive CCLs is very limited for some drugs.

Conclusions

In this report, we have described an approach to group CCLs according to their molecular profiles and to capture how specific profile alterations are spread across their protein–protein wiring. Combining gene expression and mutational data, we obtained robust clusters that recapitulate the CCL tissue of origin, indicating that gene expression profiles are highly influenced by the histology. Moreover, we observed that, starting from the molecular variability described in the CCLE alone, our CCL clusters showed significant specificity for most of the drugs. We also identified those genes that were altered in each cluster and looked for associations between their expression status and the observed drug response. In total, we identified 1712 potential gene–drug associations and found, in addition, a strong, potentially confounding influence of the CCL tissue of origin on drug response, especially in hematopoietic and lymphoid cells, which tend to be more sensitive.
Table 3. Drug response associations for mutations

Drug        Gene  Cluster  % Total  % Sensitive  % Resistant  Bonferroni p value
PLX4720     BRAF  11       16       66           10           3.21E−13
AZD6244     BRAF  11       16       42           9            2.69E−08
PD.0325901  BRAF  11       16       27           6            6.39E−06
PD.0325901  KRAS  7, 15    22       32           13           3.87E−03
PLX4720     KRAS  7, 15    21       0            24           1.56E−02
RAF265      BRAF  11       16       26           9            4.04E−02

“% Total” is the total percentage of CCLs with the mutated gene. “% Sensitive” and “% Resistant” are the percentages of sensitive or resistant CCLs with the mutation biomarker. “Bonferroni p value” is the corrected Fisher test p value.
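The corrected Fisher test p values reported in the tables can be illustrated with a short sketch of a two-sided Fisher's exact test on a 2 x 2 contingency table (response versus gene status), followed by a Bonferroni correction. The counts and the number of tests below are hypothetical and were chosen only to keep the arithmetic easy to follow; the function names are likewise illustrative.

```python
# Minimal sketch: two-sided Fisher's exact test plus Bonferroni correction.
# Counts and number of tests are hypothetical, not taken from the study.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)


# Rows: sensitive / resistant CCLs; columns: gene altered yes / no (hypothetical).
p_value = fisher_exact_two_sided(8, 2, 1, 5)
p_bonferroni = bonferroni(p_value, n_tests=3)

print(round(p_value, 4), round(p_bonferroni, 4))
```

With thousands of drug–gene pairs tested, the Bonferroni factor becomes large, which is consistent with the authors' remark that rarely mutated genes leave little statistical power.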
[Figure 6: (a) stacked bars with the percentage of sensitive and resistant CCLs per drug; (b) to (d) ROC curves (TPR versus FPR) for four feature sets: NS matrix, SAM genes, NS matrix & Clusters, and SAM & Clusters. Median AUC in panel (b): 0.709 (NS matrix), 0.715 (SAM genes), 0.722 (NS matrix & Clusters) and 0.72 (SAM & Clusters).]
Fig. 6. Drug response prediction. (a) Percentage of sensitive CCLs per drug. (b) ROC curves after applying a leave-one-out cross-validation. Lines represent the 23 drugs. "NS matrix" is the complete network-smoothed matrix (448 CCLs and 12,486 genes); "SAM genes" only includes genes identified as relevant in the clusters (448 CCLs and 3119 genes); "NS matrix & Clusters" is the complete network-smoothed matrix plus the cluster information; and "SAM & Clusters" refers to both the relevant genes and the cluster information. (c) ROC curves for the prediction of the external data set and (d) ROC curves including only the top resistant/sensitive CCLs in both data sets.
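To make the evaluation in panel (b) concrete, here is a minimal sketch of a leave-one-out loop followed by an AUC computation. It assumes binary labels (1 = sensitive, 0 = resistant) and uses a simple nearest-centroid scorer purely as a stand-in: it is not the BART classifier used in the paper, and all names and the toy matrix are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (sensitive, resistant) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_score(train_X, train_y, x):
    """Stand-in scorer (NOT BART): distance to the resistant centroid minus
    distance to the sensitive centroid, so higher means 'more sensitive'."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    sens = centroid([r for r, l in zip(train_X, train_y) if l == 1])
    res = centroid([r for r, l in zip(train_X, train_y) if l == 0])
    return dist(x, res) - dist(x, sens)


def leave_one_out_scores(X, y, scorer):
    """Score each CCL with a model that never saw that CCL during training."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(scorer(train_X, train_y, X[i]))
    return scores


# Toy network-smoothed profiles for six hypothetical CCLs (two genes each)
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]
auc = roc_auc(y, leave_one_out_scores(X, y, centroid_score))
```

Repeating this per drug and taking the median of the resulting AUCs would give a summary comparable in spirit to the "Median AUC" values reported in the figure, although the numbers would of course depend on the classifier and data.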
Thanks to the systems biology approach, which propagates the effect of over/under-expression and mutations throughout the network, we detected 724 genes that were not characterized in the CCLE and might, we believe, guide mechanistic interpretation due to their proximity to altered genes.

Finally, we checked whether the network-propagated profiles could be used to predict drug response. We predicted drug sensitivity with an AUROC higher than 0.7 for 13 of the 24 drugs. However, the poor correlation between drug responses in different CCL panels makes the external validation of the predictions difficult. This has been a classical concern of machine-learning-based prediction of CCL drug response [34–36]. We hope that approaches like the one presented here, more focused on the identification of key, robust genetic determinants of drug response, will lead to more informed predictions and finally shed light onto the molecular bases of drug action.

Methods

CCL data

We retrieved CCL data from the CCLE portal (https://portals.broadinstitute.org/ccle). We downloaded the gene expression data (version 2012-09-29), the Hybrid capture sequence (version 2012-10-18) and the drug profiling (version 2015-02-24). We included in our analysis the 448 CCLs that are present in the three data sets. The final data set contains somatic mutations and indels for 1651 genes, expression data for 18,987 genes and pharmacological profiles for 24 antineoplastic drugs.

PPI data

We used an in-house PPI network, generated by merging data available from major public PPI databases (version 2016_06) [20]. We only considered physical binary interactions. To incorporate the molecular data onto the network, we mapped ENTREZ Gene IDs to UniProtACs using the UniProt ID mapping tool. We also gathered two functional interaction networks from the STRING [24] and inBioMap [25] databases. We applied confidence cutoffs of 0.7 and 0.2, respectively, regarded as "high confidence" by the databases' authors. The STRING network contains 14,725 proteins and 300,686 interactions and the inBioMap network, 10,100 proteins and 168,970 interactions.

NBS of CCLs

We applied the NBS method [19] to integrate gene expression and somatic mutation data with the PPI network. We used the MATLAB code provided by the authors. Expression and mutation data were binarized to map them onto the PPI network. For somatic mutations, we assigned 1 if a given gene is mutated in a CCL and 0 otherwise. The following variants were filtered out: common polymorphisms, variants with an allelic fraction < 10%, putative neutral variants (missenses present in less than 2 warm-blooded vertebrates) and variants located outside of the CDS for all transcripts. Regarding transcriptomic data, 1 refers to differentially expressed genes, either up- or down-regulated. To detect differentially expressed genes, we normalized the basal expression of each gene in a cell line to the expression distribution of the gene in all CCLs, yielding a differential gene-expression Z-score. We defined as up- or down-regulated those genes with Z ≥ 2 or Z ≤ −2, respectively. To merge expression and mutation data, we summed the network-smoothed matrices and then scaled the merged matrix between 0 and 1. To optimize the number of clusters, we minimized the number of groups with fewer than 10 CCLs and the standard deviation of the number of cells per group (Fig. S1). To compare the clustering results, we calculated the normalized mutual information.

We applied the SAM [27] method to identify those genes with a significantly high score within clusters (FDR < 0.01). We compared every cluster with the remaining cohort. We ran the functional enrichment analysis on the DAVID database Web Services. We considered biological processes and molecular functions from the Gene Ontology, as well as KEGG and Reactome pathways.

Cluster correlation with drug response

To assess whether CCLs within clusters present similar drug response, we performed the ANOVA test on the drug response distribution along clusters. We used the activity area (ActA) as the drug response measure since it provides a comprehensive representation of drug activity according to the CCLE. Clusters were also compared pairwise to identify significant differences in drug response (Wilcoxon rank test, Benjamini–Hochberg p value < 0.05).

Gene–drug response associations

To analyze whether the genes identified previously are associated with differential drug response, we compared their expression/mutation status between sensitive and resistant CCLs. For each drug–gene pair, we built a contingency table to see whether the expression/mutation status of the gene is significantly different between sensitive and resistant CCLs. We applied the Bonferroni correction to adjust the Fisher's exact test p values for multiple testing. We used the waterfall plot to classify CCLs as resistant or sensitive as described in Haibe-Kains et al. [34]. In brief, for each compound, the shape of the rank-ordered plot of the response values was inspected to classify CCLs into sensitive or resistant. If the drug response distribution was linear,
we used as inflection point the median. If it was not linear, we took the point on the curve with the maximal distance to a line drawn between the start and end points of the distribution.

Drug response prediction

To predict drug sensitivity in each CCL, we applied the BART classifier [33] on the network-smoothed matrix, using the implementation provided in the R package bartMachine [37]. We applied both a leave-one-out and a fivefold cross-validation (Fig. S8). In the fivefold cross-validation, we applied the SMOTE R package [38] to balance the number of resistant/sensitive CCLs. SMOTE blends under-sampling of the majority class with a special form of over-sampling of the minority class. We split the data into training and test sets, preserving the proportion of resistant/sensitive CCLs, and then applied SMOTE only to the training set.

We chose the GDSC CCL panel described in Ref. [11] as the external data set to validate the classifier. GDSC contains 15 drugs in common with CCLE. We included only CCLs analyzed in both data sets (343 CCLs). We mapped the binarized gene expression values onto the network and applied the network-propagation algorithm as described previously. We finally used these data to test the BART classifier trained with the results from the CCLE panel.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jmb.2018.03.021.

Acknowledgments

T.J.-B. is a recipient of an FPI-SO fellowship. P.A. acknowledges the support of the Spanish Ministerio de Economía y Competitividad (BIO2013-48222-R; BIO2016-77038-R) and the European Research Council (SysPharmAD: 614944).

Received 31 January 2018;
Received in revised form 19 March 2018;
Accepted 22 March 2018
Available online xxxx

Keywords:
cancer cell lines;
drug response;
molecular signatures;
antineoplastic drugs;
network-based stratification

Abbreviations used:
CCLs, cancer cell lines; NBS, network-based stratification; PPI, protein–protein interaction; CCLE, Cancer Cell Line Encyclopedia; SAM, Significance Analysis of Microarrays; BART, Bayesian Additive Regression Trees; AUROC, area under the ROC curve.

References

[1] S.V. Sharma, D.A. Haber, J. Settleman, Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents, Nat. Rev. Cancer 10 (2010) 241–253.
[2] D. Ferreira, F. Adega, R. Chaves, The importance of cancer cell lines as in vitro models in cancer methylome analysis and anticancer drugs testing, in: C. López (Ed.), Oncogenomics and Cancer Proteomics – Novel Approaches in Biomarkers Discovery and Therapeutic Targets in Cancer, InTech, 2013.
[3] J.L. Wilding, W.F. Bodmer, Cancer cell lines for drug discovery and development, Cancer Res. 74 (2014) 2377–2384.
[4] M. Lukk, M. Kapushesky, J. Nikkila, H. Parkinson, A. Goncalves, W. Huber, et al., A global map of human gene expression, Nat. Biotechnol. 28 (4) (2010) 322.
[5] S. Domcke, R. Sinha, D.A. Levine, C. Sander, N. Schultz, Evaluating cell lines as tumour models by comparison of genomic profiles, Nat. Commun. 4 (2013) 2126.
[6] R.M. Hoffman, Three-dimensional histoculture: origins and applications in cancer research, Cancer Cells 3 (1991) 86–92.
[7] T. Reya, S.J. Morrison, M.F. Clarke, I.L. Weissman, Stem cells, cancer, and cancer stem cells, Nature 414 (2001) 105–111.
[8] R.M. Hoffman, The three-dimensional question: can clinically relevant tumor drug resistance be measured in vitro? Cancer Metastasis Rev. 13 (1994) 169–173.
[9] R.H. Shoemaker, The NCI60 human tumour cell line anticancer drug screen, Nat. Rev. Cancer 6 (2006) 813–823.
[10] J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A.A. Margolin, S. Kim, et al., The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity, Nature 483 (2012) 603–607.
[11] F. Iorio, T.A. Knijnenburg, D.J. Vis, G.R. Bignell, M.P. Menden, M. Schubert, et al., A landscape of pharmacogenomic interactions in cancer, Cell 166 (2016) 740–754.
[12] M. Pirmohamed, Personalized pharmacogenomics: predicting efficacy and adverse drug reactions, Annu. Rev. Genomics Hum. Genet. 15 (2014) 349–370.
[13] W.E. Evans, M.V. Relling, Moving towards individualized medicine with pharmacogenomics, Nature 429 (2004) 464–468.
[14] S. Jaeger, M. Duran-Frigola, P. Aloy, Drug sensitivity in cancer cell lines is not tissue-specific, Mol. Cancer 14 (2015) 40.
[15] M.G. Rees, B. Seashore-Ludlow, J.H. Cheah, D.J. Adams, E.V. Price, S. Gill, et al., Correlating chemical sensitivity and basal gene expression reveals mechanism of action, Nat. Chem. Biol. 12 (2016) 109–116.
[16] M. Danhof, Systems pharmacology – towards the modeling of network interactions, Eur. J. Pharm. Sci. 94 (2016) 4–14.
[17] M. Bansal, J. Yang, C. Karan, M.P. Menden, J.C. Costello, H. Tang, et al., A community computational challenge to predict the activity of pairs of compounds, Nat. Biotechnol. 32 (2014) 1213–1222.
[18] F. Eduati, L.M. Mangravite, T. Wang, H. Tang, J.C. Bare, R. Huang, et al., Prediction of human population responses to toxic compounds by a collaborative competition, Nat. Biotechnol. 33 (2015) 933–940.
[19] M. Hofree, J.P. Shen, H. Carter, A. Gross, T. Ideker, Network-based stratification of tumor mutations, Nat. Methods 10 (2013) 1108–1115.
[20] R. Mosca, A. Ceol, P. Aloy, Interactome3D: adding structural details to protein networks, Nat. Methods 10 (2013) 47–53.
[21] M. Uhlen, L. Fagerberg, B.M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, et al., Proteomics. Tissue-based map of the human proteome, Science 347 (2015) 1260419.
[22] D. Huang, W. Sun, Y. Zhou, P. Li, F. Chen, H. Chen, et al., Mutations of key driver genes in colorectal cancer progression and metastasis, Cancer Metastasis Rev. 37 (2018) 173–187.
[23] P. Geeleher, Z. Zhang, F. Wang, R.F. Gruener, A. Nath, G. Morrison, et al., Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies, Genome Res. 27 (2017) 1743–1751.
[24] D. Szklarczyk, A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, et al., STRING v10: protein–protein interaction networks, integrated over the tree of life, Nucleic Acids Res. 43 (2015) D447–52.
[25] T. Li, R. Wernersson, R.B. Hansen, H. Horn, J. Mercer, G. Slodkowicz, et al., A scored human protein–protein interaction network to catalyze genomic interpretation, Nat. Methods 14 (2017) 61–64.
[26] I. Cortes-Ciriano, L.H. Mervin, A. Bender, Current trends in drug sensitivity prediction, Curr. Pharm. Des. 22 (2016) 6918–6927.
[27] V.G. Tusher, R. Tibshirani, G. Chu, Significance analysis of microarrays applied to the ionizing radiation response, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 5116–5121.
[28] G. Dennis Jr., B.T. Sherman, D.A. Hosack, J. Yang, W. Gao, H.C. Lane, et al., DAVID: Database for Annotation, Visualization, and Integrated Discovery, Genome Biol. 4 (2003) P3.
[29] Y. Lin, W. Ma, S. Benchimol, Pidd, a new death-domain-containing protein, is induced by p53 and promotes apoptosis, Nat. Genet. 26 (2000) 122–127.
[30] S. Janssens, A. Tinel, S. Lippens, J. Tschopp, PIDD mediates NF-kappaB activation in response to DNA damage, Cell 123 (2005) 1079–1092.
[31] S. Temraz, D. Mukherji, A. Shamseddine, Dual inhibition of MEK and PI3K pathway in KRAS and BRAF mutated colorectal cancers, Int. J. Mol. Sci. 16 (2015) 22976–22988.
[32] L.A. Dossett, R.R. Kudchadkar, J.S. Zager, BRAF and MEK inhibition in melanoma, Expert Opin. Drug Saf. 14 (2015) 559–570.
[33] H.A. Chipman, E.I. George, R.E. McCulloch, BART: Bayesian Additive Regression Trees, Ann. Appl. Stat. 4 (2010) 33.
[34] B. Haibe-Kains, N. El-Hachem, N.J. Birkbak, A.C. Jin, A.H. Beck, H.J. Aerts, et al., Inconsistency in large pharmacogenomic studies, Nature 504 (2013) 389–393.
[35] C. Hatzis, P.L. Bedard, N.J. Birkbak, A.H. Beck, H.J. Aerts, D.F. Stern, et al., Enhancing reproducibility in cancer drug screening: how do we move forward? Cancer Res. 74 (2014) 4016–4023.
[36] I. Cortes-Ciriano, G.J. van Westen, G. Bouvier, M. Nilges, J.P. Overington, A. Bender, et al., Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel, Bioinformatics 32 (2016) 85–95.
[37] A. Kapelner, J. Bleich, bartMachine: Machine Learning with Bayesian Additive Regression Trees, J. Stat. Softw. 70 (2016) 40.
[38] M. Nakamura, Y. Kajiwara, A. Otsuka, H. Kimura, LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data, BioData Min. 6 (2013) 16.