Correspondence to Patrick Aloy: IRB-BSC-CRG Program in Computational Biology, Institute for Research in
Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
patrick.aloy@irbbarcelona.org
https://doi.org/10.1016/j.jmb.2018.03.021
Edited by Juan Fuxman
Abstract
Cancer cell lines (CCLs) play an important role in the initial stages of drug discovery, enabling, among other uses,
the screening of drug candidates. As CCL panels continue to grow in size and diversity, many
polymorphisms in genes encoding drug-metabolizing enzymes, transporters and drug targets, as well as
disease-related genes have been linked to altered drug sensitivity. However, identifying the correlation
between this variability and pharmacological responses remains challenging due to the heterogeneity of
cancer biology and the intricate interplay between cell lines and drug molecules. Here, we propose a network-
based strategy that exploits information on gene expression and somatic mutations of CCLs to group cells
according to their molecular similarity. We then identify genes that are characteristic of each cluster and
correlate their status with drug response. We find that CCLs with similar characteristic active network regions
present specific responses to certain drugs, and identify a limited set of genes that might be directly involved in
drug sensitivity or resistance.
© 2018 Elsevier Ltd. All rights reserved.
0022-2836/© 2018 Elsevier Ltd. All rights reserved. J Mol Biol (2018) xx, xxxxxx
Please cite this article as: T. Juan-Blanco, et al., Rationalizing Drug Response in Cancer Cell Lines, J. Mol. Biol. (2018), https://doi.org/10.1016/j.jmb.2018.03.021
[Figure 1 graphic. Panel (a): workflow from the 448 CCLs through clustering to mapping drug sensitivity
(pharmacological profiles for 24 antineoplastic drugs) on clusters. Panel (b): number of mutated genes and
number of CCLs (y-axes) per tumor type (x-axis): Skin, Thyroid, Lung, Liver, Endometrium, Bone, UAT, Pleura,
H&L, Ovary, Prostate, Kidney, Stomach, Breast, CNS, Soft tissue, Urinary tract, Biliary tract, Autonomic
ganglia, Large intestine, Pancreas, Salivary gland, Oesophagus.]
Fig. 1. Summary of the methodology and data set. (a) Somatic mutations and gene expression data from the CCLE were
integrated with PPI data through the network propagation algorithm. Then, CCLs were clustered and drug sensitivity to 24
antineoplastic drugs was mapped onto them. (b) Distribution of the number of differentially expressed and mutated genes
among the 448 CCLs grouped by tumor type and number of CCLs per tumor type. CNS stands for central nervous system;
H&L, hematopoietic and lymphoid; and UAT, upper aerodigestive tract.
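As a rough illustration of the two steps summarized in panel (a), the sketch below binarizes one gene's expression by Z-score (|Z| ≥ 2, the cutoff given in the Methods) and then smooths a binary profile over a toy network. The update rule, the value `alpha = 0.7`, the degree normalization and the use of the population standard deviation are assumptions made for illustration only; the study itself relied on the MATLAB code released with the NBS method [19], and the function names here are hypothetical.

```python
from statistics import mean, pstdev


def binarize_expression(values, z_cut=2.0):
    """Return 1 for cell lines where the gene is differentially expressed.

    `values` holds one gene's expression across all cell lines. A cell line
    is flagged when |Z| >= z_cut, i.e. up- or down-regulated.
    """
    mu, sd = mean(values), pstdev(values)  # population SD: an assumption
    if sd == 0:
        return [0] * len(values)
    return [1 if abs((v - mu) / sd) >= z_cut else 0 for v in values]


def propagate(profile, adjacency, alpha=0.7, tol=1e-8, max_iter=1000):
    """Smooth a binary gene profile over an undirected network.

    Iterates F <- alpha * F * W + (1 - alpha) * F0, where W[i][j] is
    adjacency[i][j] divided by the degree of node i (assumed update rule).
    """
    n = len(profile)
    degree = [sum(row) for row in adjacency]
    f0 = [float(x) for x in profile]
    f = f0[:]
    for _ in range(max_iter):
        new = []
        for j in range(n):
            incoming = sum(
                f[i] * adjacency[i][j] / degree[i]
                for i in range(n)
                if adjacency[i][j]
            )
            new.append(alpha * incoming + (1 - alpha) * f0[j])
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


# Toy path network 0-1-2-3 with an alteration in gene 0 only.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
smoothed = propagate([1, 0, 0, 0], path)
print([round(v, 3) for v in smoothed])
```

In this toy case the smoothed score is no longer binary and decays with network distance from the altered gene, which is the property the clustering step exploits.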
similar areas of the interactome. We used an in-house binary protein–protein interaction (PPI) network
generated by merging major public PPI repositories [20]. In total, the PPI network used in this study contains
12,486 proteins and 62,387 interactions.
The NBS protocol requires that both mutation and expression are expressed as binary features (see
Methods). The network propagation step smoothens these profiles so that they are compliant with
the network architecture, leading to a molecular signature that is no longer binary but captures the
proximity to the altered genes. Then, we clustered the resulting network-smoothed profiles into an
optimal number of groups and used the tissue of origin as a proxy to assess the biological relevance
of the clusters [21]. The results are displayed in the co-clustering networks in Fig. 2a and b. As evident,
gene expression and somatic mutation data do not cluster cell lines in the same way. Differential
expression clusters are able to recapitulate the tissue of origin of the cell lines (e.g., hematopoietic
and lymphoid cells form a well-distinguished group),
[Figure 2 graphic. Panels (a) to (c): co-clustering networks, with panel (c) the merged co-clustering network
and a color legend listing the 23 tissues of origin. Panel (d): % co-clustering (y-axis, 0.0 to 1.0) against
number of clusters (x-axis, 20 to 100).]
Fig. 2. Co-clustering network using differential expression (a), somatic mutations (b) and the merged data set (c) after
applying NR-NMF onto the network-smoothed matrix for a number of clusters from 3 to 100. Nodes symbolize the CCLs.
Edges connect two CCLs if they cluster together in at least 30% of the clustering results. Node colors represent the tissue
of origin. Edge transparency is proportional to the co-clustering score. (d) Distribution of co-clustering scores for the
differential expression, mutation and merged data sets. (e) Normalized mutual information between the clustering results
of the three data sets. CNS stands for central nervous system; H&L, hematopoietic and lymphoid; and UAT, upper
aerodigestive tract.
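The co-clustering score behind the edges in panels (a) to (c) can be sketched as follows, assuming each clustering run (for example, one per number of clusters between 3 and 100) is available as a list of cluster labels in a fixed CCL order. The function names and the toy labels are hypothetical; only the 30% edge threshold comes from the caption.

```python
from itertools import combinations


def co_clustering_scores(runs):
    """Fraction of clustering runs in which each pair of samples co-clusters.

    `runs` is a list of label lists, one per clustering run, all in the same
    sample order. Returns {(i, j): score} for every pair with i < j.
    """
    n_samples = len(runs[0])
    scores = {}
    for i, j in combinations(range(n_samples), 2):
        together = sum(1 for labels in runs if labels[i] == labels[j])
        scores[(i, j)] = together / len(runs)
    return scores


def network_edges(scores, threshold=0.30):
    """Keep the pairs that co-cluster in at least `threshold` of the runs."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


# Three toy runs over four cell lines.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
scores = co_clustering_scores(runs)
edges = network_edges(scores)
```

Here cell lines 0 and 3 never share a cluster, so they are the only pair left unconnected.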
[Figure 3 graphic: one panel per drug, showing ActA (y-axis) by cluster 1 to 15 (x-axis), with asterisks marking
significant clusters. ANOVA statistics from the panel titles, with drug labels as printed:]

Drug           ANOVA F   p value
17.AAG         3.46      2.25e−05
AEW541         6.71      2.03e−12
AZD0530        4.03      1.44e−06
AZD6244        13.32     2.76e−26
Erlotinib      5.5       9.15e−10
Irinotecan     9.84      1.34e−17
L.685458       13.53     1.43e−26
Lapatinib      6.69      2.17e−12
Nilotinib      14.18     6.61e−27
Nutlin.3       4.08      1.09e−06
PD.0325901     16.03     1.46e−31
PD.0332991     10.76     1.52e−20
PF2341066      10.11     9.88e−20
PHA.665752     4.79      3.14e−08
PLX4720        10.6      1.03e−20
Paclitaxel     7.6       2.25e−14
Panobinostat   20.92     2.32e−40
RAF265         5.53      9.27e−10
Sorafenib      9.68      7.98e−19
TAE684         8.34      5.6e−16
TKI258         8.18      1.26e−15
Topotecan      9.58      1.26e−18
ZD.6474        4.78      3.37e−08
Table 1. Summary of the functional enrichment analysis of genes altered in each cluster
Cluster   Function annotation (GO terms, KEGG and Reactome pathways)
1         Neuron differentiation and development, signaling by GPCR, synaptic transmission
2         Pathways in cancer, signaling by NGF, ErbB signaling pathway, protein kinase activity
3         Apoptosis, ectoderm and endodermis development, cytoskeleton, calcium ion binding, cell–cell junction
5         Extracellular matrix structural constituent, ECM–receptor interaction, growth factor binding, focal adhesion, axon guidance, hemostasis, integrin cell surface interactions
6         Primary immunodeficiency, intestinal immune network for IgA production, signaling in immune system, homeostasis, cytokine–cytokine interaction, B-cell receptor signaling pathway
7         Glutathione metabolism
8         Complement and coagulation cascades, hemostasis, metabolism of lipids and lipoproteins
11        MHC class II receptor activity, ECM–receptor interaction
12        B-cell homeostasis, cytokine binding
13        Signaling in immune system, hemostasis, natural killer cell-mediated cytotoxicity, hematopoietic cell lineage, T-cell receptor signaling pathway, chemokine signaling pathway, integrin cell surface interactions, cell adhesion molecules, Fc gamma R-mediated phagocytosis, Jak–STAT signaling pathway, leukocyte transendothelial migration, regulation of actin cytoskeleton
14        Apical part of cell, sensory perception of sound and mechanical stimulus
15        Epidermis and ectoderm development, axon guidance, signaling by PDGF, collagen type IV
and the result of the clustering is robust (Fig. 2d). However, and despite their recognized importance in
tumor development [22] and drug response [23], we found that CCL clusters obtained from somatic
mutations alone are less robust (Fig. 2d), rather scattered, and do not correlate with the tissues of
origin (Fig. 2b). In order to retain mutation data in our clusters, we combined the network-smoothed
expression and mutation profiles into a single, merged profile (see Methods). The resulting co-clustering
network of this merged data set is depicted in Fig. 2c. Here, clusters resemble those obtained by gene
expression analysis alone (Fig. 2e), thereby keeping their biological relevance and robustness (Fig. 2d),
while still incorporating the valuable mutation data. We chose this merged data set for further analyses.
To optimize the clustering parameters, we required the number of CCLs per cluster to be comparable
among clusters and large enough to enable statistical analysis. We chose to carry on our analyses
with 15 clusters, as this ensured a minimal number of groupings with fewer than 10 CCLs and a small standard
deviation of the number of cells per group (Fig. S1). Figure S2 shows that the number of CCLs in each
cluster varies from 16 to 45, and illustrates the tissue distribution along the 15 clusters. Some clusters are
clearly dominated by CCLs belonging to a single tissue (i.e., clusters 6, 12 and 13 are composed only of
hematopoietic and lymphoid cells), while others are more diverse. Finally, we re-ran our pipeline using two
functional networks gathered from STRING [24] and inBioMap [25] to measure the robustness of the results
and to check whether we could increase the protein coverage (Figs. S3–S5). The clusters obtained from
functional networks were broadly similar to the previous results obtained with physical protein interactions in
terms of cluster robustness and protein coverage (normalized mutual information of 0.66 and 0.68 for
STRING and inBioMap, respectively).

Cluster correlation with drug response

We then assessed whether CCLs in the same cluster respond similarly to drug treatment. Following the
recommendation of the CCLE authors, we used the area above the drug-response curve (ActA) as the drug
activity measure: the higher the ActA, the more effective the drug is. Remarkably, for 23 of the 24
drugs, we could find at least one cluster whose CCLs had a differential drug sensitivity (ANOVA p value <
0.05; Fig. 3). Some of the drugs, like Panobinostat, had specific drug activity ranges in most of the clusters,
while others, like PLX4720, were mostly related to a few of them. Some clusters contained CCLs that were in
general more sensitive, as is the case for two out of the three hematopoietic and lymphoid clusters (12
and 13). Also, several tissue-diverse clusters were specific to some of the drugs (e.g., cluster 3 and
Erlotinib), supporting the notion that key molecular features may drive drug response across tumor types
[14,15,26].
Interestingly, CCL clusters respond similarly to drugs with the same mechanism of action. For instance,
clusters 11, 13 and 14 are more sensitive to the two MEK inhibitors AZD6244 and PD-0325901. Their
drug targets, MEK1 and MEK2, are down- and over-expressed, respectively, in these sensitive clusters
(Wilcoxon's p value < 0.005). Another example relates to cluster 3, whose CCLs are tissue heterogeneous and
respond to EGFR inhibition by Erlotinib and Lapatinib.
Fig. 3. Drug activity distribution along CCL clusters for the 23 drugs with a response significantly associated with
at least one cluster. “F” and “p value” are the ANOVA F value and p value, respectively. Red asterisks indicate that the
drug response of the cluster is significantly different from that of more than 5 (*) or 10 (**) clusters (Benjamini–Hochberg
adjusted p value < 0.05).
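The two statistics named in this caption can be sketched in plain Python: a one-way ANOVA F value across clusters, and a Benjamini–Hochberg adjustment of the pairwise p values. This is a minimal sketch under stated assumptions; the function names and the toy numbers are hypothetical, the pairwise Wilcoxon p values are taken as given, and converting F to a p value (which needs the F distribution) is left to a statistics library.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy ActA values for three clusters (illustrative numbers only).
f_value = anova_f([[1, 2, 3], [2, 3, 4], [7, 8, 9]])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A cluster would then earn one asterisk when its adjusted pairwise p value falls below 0.05 against more than 5 other clusters, and two when this holds against more than 10.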
[Figure 4 graphic: bar chart of the number of genes (y-axis) per cluster 1 to 15 (x-axis); the bar labels range
from 25 to 514 genes.]
Fig. 4. Number of significant genes identified by SAM in each cluster. FDR < 0.01.
[Figure 5 graphic. Panel (a): example contingency tables of response (sensitive or resistant) against alteration
(yes/no) for FLI1 and L-685458 (up-regulated; sensitive 21/30, resistant 7/337), BTBD3 and L-685458
(down-regulated; sensitive 14/34, resistant 14/330) and BRAF and AZD6244 (mutated; sensitive 33/48,
resistant 23/300). Remaining panels: per-cluster (1 to 15) proportions of genes that are significant biomarkers,
not significant, or without expression or mutation data in the CCLE; number of unique drugs per cluster;
number of unique genes per drug; and number of genes by number of associated drugs.]
Fig. 5. Statistics on drug response associations. (a) Examples of the contingency tables to find associations between
differential expression or mutations and drug response. (b) Proportion of genes significantly associated with drug
response, not significantly associated, and without molecular data available in the CCLE, for both differential
expression and mutation. (c) Number of unique drugs per cluster. (d) Number of unique genes per drug. (e) Specificity
of the drug–gene associations, that is, number of genes associated with a certain number of drugs.
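A contingency-table test of the kind shown in panel (a) can be sketched with a two-sided Fisher's exact test followed by a Bonferroni correction, the combination named in the Methods. The implementation below is a plain-Python illustration, not the authors' code; the function names, the toy table and the number of tests are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows could be sensitive/resistant CCLs and columns altered/not altered.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x altered CCLs in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value for `n_tests` comparisons."""
    return min(1.0, p_value * n_tests)


# Toy table: 8 of 10 sensitive and 1 of 6 resistant CCLs carry the alteration.
p = fisher_exact_two_sided(8, 2, 1, 5)
p_adjusted = bonferroni(p, n_tests=10)  # n_tests is an illustrative value
```

Bonferroni is the more conservative choice among common corrections, which suits a screen where many gene–drug pairs are tested and false positives are costly to follow up.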
least one cluster (Fig. S6B). By integrating network information and starting from a comparable number of
genes, we decreased this percentage to 25%. Hence, our NBS approach drastically reduces
the number of genes associated with clusters, allowing a much simpler interpretation.

Drug response prediction

The described methodology has grouped CCLs according to their basal molecular profiles, and we
have shown that these cell clusters, in many cases, can be associated with the response to specific drugs. We
have also identified characteristic molecular features of each cluster, and it remains to be seen whether these
are sufficient to predict drug sensitivity or resistance of CCLs. To address this question, and given the
relatively small number of samples and the recognized coexpression between genes, we used a classifier
based on Bayesian Additive Regression Trees (BART) [33]. Like most Bayesian methods, BART works well with
relatively few samples and, thanks to its tree structure, can natively deal with highly correlated variables.
We applied the BART algorithm to the network-smoothed profiles to classify CCLs into sensitive or
resistant, according to their ActA (Fig. 6a). Furthermore, we included the cluster identity and the
cluster-specific genes identified to assess whether the predictive power of our model would improve
with this information (Fig. 6).
We used the area under the ROC curve (AUROC) to evaluate the performance of the leave-one-out
cross-validated classifier (Fig. 6b). In total, by using cluster information and cluster-specific genes, we could rank
sensitive CCLs with an AUROC higher than 0.7 for 16 drugs. Remarkably, the overall performance of the
classifier does not drop if we only use the set of cluster-associated genes (i.e., four times fewer genes), and the
number of drugs with an AUROC > 0.7 is higher. Under these settings, the best performing drugs are
L-685458 and Nilotinib, with a cross-validated AUROC above 0.8.
In addition, we sought to confirm the predictive power of our approach with an external data set. We chose the
GDSC CCL panel described by Iorio et al. [11] in 2016, where they characterized molecular alterations in 1001
human CCLs and correlated them with sensitivity to 265 drugs. The CCLE and the GDSC data sets share
343 CCLs and 15 drugs. However, the drug activities reported by the two studies are poorly correlated
(Fig. S9), which effectively limits the validity of the comparison. We nevertheless carried out the analysis
by using the BART models obtained from the CCLE panel to predict drug sensitivity in the GDSC data set.
Figure 6c shows the performance of the BART classification for all methods. As can be observed,
the performance of the classification decreases in the external data set compared to the cross-validation
results.
We repeated the analyses considering only those drugs that, at least, had the same type of qualitative
response in the two data sets (i.e., sensitive or resistant; Fig. S10), obtaining better results for 8 out
of 10 drugs (SAM & cluster approach). Finally, we cleaned up the sets further by including only those
CCLs whose drug response falls in the bottom or top third (33rd and 66th percentiles) in both cases (i.e.,
CCLs that are among the most sensitive or the most resistant in both data sets; Figs. 6d and S11). The median
AUROC rises for all the approaches and, although the increase is not significant, the performance improved
for 9 out of 15 drugs. This increase, however, has the drawback of a lower applicability since the number of
sensitive CCLs is very limited for some drugs.

Conclusions

In this report, we have described an approach to group CCLs according to their molecular profiles and
to capture how specific profile alterations are spread across their protein–protein wiring. Combining gene
expression and mutational data, we obtained robust clusters that recapitulate the CCL tissue of origin,
indicating that gene expression profiles are highly influenced by the histology. Moreover, we observed
that, starting from the molecular variability described in the CCLE alone, our CCL clusters showed significant
specificity for most of the drugs. We also identified those genes that were altered in each cluster and
looked for associations between their expression status and the observed drug response. In total, we identified
1712 potential gene–drug associations and found, in addition, a strong, potentially confounding influence of
the CCL tissue of origin on drug response, especially in hematopoietic and lymphoid cells, which tend to be
more sensitive.
[Figure 6 graphic. Panel (a): percentage of sensitive versus resistant CCLs per drug. Panels (b) to (d): ROC curves
(TPR against FPR) for the four feature sets; median AUC values from the panel titles:]

Panel   NS matrix   SAM genes   NS matrix & Clusters   SAM & Clusters
(b)     0.709       0.715       0.722                  0.72
(c)     0.535       0.497       0.514                  0.519
(d)     0.551       0.545       0.534                  0.522
Fig. 6. Drug response prediction. (a) Percentage of sensitive CCLs per drug. (b) ROC curves after applying a leave-one-out
cross-validation. Lines represent the 23 drugs. “NS matrix” is the complete network-smoothed matrix (448 CCLs and 12,486
genes); “SAM genes” only includes genes identified as relevant in the clusters (448 CCLs and 3119 genes); “NS matrix &
Clusters” is the complete network-smoothed matrix plus the cluster information; and “SAM & Clusters” refers to both the
relevant genes and the cluster information. (c) ROC curves for the prediction of the external data set and (d) including only
the top resistant/sensitive CCLs in both data sets.
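The evaluation scheme in panel (b) can be sketched as a leave-one-out loop that scores each CCL with a model trained on all the others, followed by a rank-based AUROC. The sketch is classifier-agnostic: `fit_predict` is a hypothetical stand-in for any model (BART in the study), and the toy data are illustrative only.

```python
def leave_one_out_scores(features, labels, fit_predict):
    """Score every sample with a model trained on all remaining samples.

    `fit_predict(train_x, train_y, test_x)` must return a sensitivity score
    for `test_x`; it is a placeholder for the actual classifier.
    """
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        scores.append(fit_predict(train_x, train_y, features[i]))
    return scores


def auroc(scores, labels):
    """Probability that a sensitive CCL (label 1) outranks a resistant one (label 0).

    Ties contribute one half, which matches the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sensitive and one resistant sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: the "model" simply returns the first feature as its score.
toy_x = [[0.9], [0.8], [0.3], [0.1]]
toy_y = [1, 0, 1, 0]
toy_scores = leave_one_out_scores(toy_x, toy_y, lambda tx, ty, x: x[0])
toy_auc = auroc(toy_scores, toy_y)
```

An AUROC of 0.5 corresponds to random ranking, which is a useful reference point when reading the median values for the external data set.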
Thanks to the systems biology approach to propagate the effect of over/under-expression and mutations
throughout the network, we detected 724 genes that were not characterized in the CCLE and might, we
believe, guide mechanistic interpretation due to their proximity to altered genes.
Finally, we checked whether the network-propagated profiles could be used to predict drug response. We
predicted drug sensitivity with an AUROC higher than 0.7 for 13 of the 24 drugs. However, the poor correlation
between drug responses in different CCL panels makes the external validation of the predictions difficult.
This has been a classical concern of machine-learning based prediction of CCL drug response [34–36]. We
hope that approaches like the one presented here, more focused on the identification of key, robust genetic
determinants of drug response, will lead to more informed predictions and finally shed light onto the
molecular bases of drug action.

Methods

CCL data

We retrieved CCL data from the CCLE portal (https://portals.broadinstitute.org/ccle). We downloaded the
gene expression data (version 2012-09-29), the hybrid capture sequencing data (version 2012-10-18) and the drug
profiling data (version 2015-02-24). We included in our analysis the 448 CCLs that are present in the three
data sets. The final data set contains somatic mutations and indels for 1651 genes, expression data for 18,987
genes and pharmacological profiles for 24 antineoplastic drugs.

PPI data

We used an in-house PPI network, generated by merging data available from major public PPI databases
(version 2016_06) [20]. We only considered physical binary interactions. To incorporate the molecular data
onto the network, we mapped ENTREZ Gene IDs to UniProt accessions (UniProtACs) using the UniProt ID
mapping tool. We also gathered two functional interaction networks from the STRING [24] and inBioMap [25]
databases. We applied confidence cutoffs of 0.7 and 0.2, respectively, regarded as “high confidence” by the
databases' authors. The STRING network contains 14,725 proteins and 300,686 interactions and the inBioMap
network, 10,100 proteins and 168,970 interactions.

NBS of CCLs

We applied the NBS method [19] to integrate gene expression and somatic mutation data with the PPI
network. We used the MATLAB code provided by the authors. Expression and mutation data were binarized
to map them onto the PPI network. For somatic mutations, we assigned 1 if a given gene is mutated
in a CCL and 0 otherwise. The following variants were filtered out: common polymorphisms, variants with an
allelic fraction < 10%, putative neutral variants (missense variants present in fewer than 2 warm-blooded
vertebrates) and variants located outside of the CDS for all transcripts. Regarding transcriptomic data, 1 refers
to differentially expressed genes, either up- or down-regulated. To detect differentially expressed genes, we
normalized the basal expression of each gene in a cell line to the expression distribution of the gene in all CCLs,
yielding a differential gene-expression Z-score. We defined as up- or down-regulated those genes with Z ≥ 2 or
Z ≤ −2, respectively. To merge expression and mutation data, we summed the network-smoothed
matrices and then scaled the merged matrix between 0 and 1. To optimize the number of clusters,
we minimized the number of groups with fewer than 10 CCLs and the standard deviation of the number of cells
per group (Fig. S1). To compare the clustering results, we calculated the normalized mutual information.
We applied the SAM [27] method to identify those genes with a significantly high score within clusters
(FDR < 0.01). We compared every cluster with the remaining cohort. We ran the functional enrichment
analysis on the DAVID database Web Services. We considered biological processes and molecular functions
from the Gene Ontology, as well as KEGG and Reactome pathways.

Cluster correlation with drug response

To assess whether CCLs within clusters present similar drug response, we performed the ANOVA test
on the drug response distribution along clusters. We used the activity area (ActA) as the drug response measure
since it provides a comprehensive representation of drug activity according to the CCLE. Clusters were also
compared pairwise to identify significant differences in drug response (Wilcoxon's rank test, Benjamini–Hochberg
adjusted p value < 0.05).

Gene–drug response associations

To analyze whether the genes identified previously are associated with differential drug response, we
compared their expression/mutation status between sensitive and resistant CCLs. For each drug–gene pair,
we built a contingency table to see whether the expression/mutation status of the gene is significantly
different between sensitive and resistant CCLs. We tested each table with Fisher's exact test and applied the
Bonferroni correction to adjust for multiple testing. We used the waterfall plot to
classify CCLs as resistant or sensitive as described in Haibe-Kains et al. [34]. In brief, for each compound, the
shape of the rank-ordered plot of the response values was inspected to classify CCLs into sensitive or
resistant. If the drug response distribution was linear, we used the median as the inflection point. If it was not
linear, we took the point on the curve with the maximal distance to a line drawn between the start and end
points of the distribution.

Line Encyclopedia; SAM, Significance Analysis of Microarrays; BART, Bayesian Additive Regression Trees; AUROC,
area under the ROC curve.
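The waterfall rule described in the Methods (median for a linear response distribution, otherwise the point farthest from the line joining the first and last ranked responses) can be sketched as below. Several details are assumptions, since the text only says the shape was inspected: linearity is judged by a Pearson correlation of at least 0.95 between rank and response, both axes are rescaled to [0, 1] before measuring distances, and the function name is hypothetical.

```python
from statistics import median


def waterfall_cutoff(responses, linear_r=0.95):
    """Return the response value separating sensitive from resistant CCLs.

    Responses are ranked from highest to lowest. If rank and response are
    nearly linearly related (|r| >= linear_r, an assumed criterion), the
    median is used. Otherwise the cutoff is the ranked point farthest from
    the chord joining the first and last points, with both axes rescaled
    to [0, 1] (also an assumption).
    """
    ys = sorted(responses, reverse=True)
    n = len(ys)
    xs = list(range(n))
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5 if sxx and syy else 1.0
    if abs(r) >= linear_r:
        return median(ys)
    y_max, y_min = ys[0], ys[-1]
    # After rescaling, the chord runs from (0, 1) to (1, 0): x + y = 1.
    distances = [
        abs(x / (n - 1) + (y - y_min) / (y_max - y_min) - 1) / 2 ** 0.5
        for x, y in zip(xs, ys)
    ]
    return ys[distances.index(max(distances))]


# Toy ActA values: two clearly responsive CCLs and a long resistant tail.
cutoff = waterfall_cutoff([8, 7, 1, 0.5, 0.2, 0.1, 0.0])
```

Whether the CCL sitting exactly at the cutoff counts as sensitive or resistant is a further convention that the description leaves open.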
[20] R. Mosca, A. Ceol, P. Aloy, Interactome3D: adding structural details to protein networks, Nat. Methods 10 (2013) 47–53.
[21] M. Uhlen, L. Fagerberg, B.M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, et al., Proteomics. Tissue-based map of the human proteome, Science 347 (2015), 1260419.
[22] D. Huang, W. Sun, Y. Zhou, P. Li, F. Chen, H. Chen, et al., Mutations of key driver genes in colorectal cancer progression and metastasis, Cancer Metastasis Rev. 37 (2018) 173–187.
[23] P. Geeleher, Z. Zhang, F. Wang, R.F. Gruener, A. Nath, G. Morrison, et al., Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies, Genome Res. 27 (2017) 1743–1751.
[24] D. Szklarczyk, A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, et al., STRING v10: protein–protein interaction networks, integrated over the tree of life, Nucleic Acids Res. 43 (2015) D447–52.
[25] T. Li, R. Wernersson, R.B. Hansen, H. Horn, J. Mercer, G. Slodkowicz, et al., A scored human protein–protein interaction network to catalyze genomic interpretation, Nat. Methods 14 (2017) 61–64.
[26] I. Cortes-Ciriano, L.H. Mervin, A. Bender, Current trends in drug sensitivity prediction, Curr. Pharm. Des. 22 (2016) 6918–6927.
[27] V.G. Tusher, R. Tibshirani, G. Chu, Significance analysis of microarrays applied to the ionizing radiation response, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 5116–5121.
[28] G. Dennis Jr., B.T. Sherman, D.A. Hosack, J. Yang, W. Gao, H.C. Lane, et al., DAVID: Database for Annotation, Visualization, and Integrated Discovery, Genome Biol. 4 (2003) P3.
[29] Y. Lin, W. Ma, S. Benchimol, Pidd, a new death-domain-containing protein, is induced by p53 and promotes apoptosis, Nat. Genet. 26 (2000) 122–127.
[30] S. Janssens, A. Tinel, S. Lippens, J. Tschopp, PIDD mediates NF-kappaB activation in response to DNA damage, Cell 123 (2005) 1079–1092.
[31] S. Temraz, D. Mukherji, A. Shamseddine, Dual inhibition of MEK and PI3K pathway in KRAS and BRAF mutated colorectal cancers, Int. J. Mol. Sci. 16 (2015) 22976–22988.
[32] L.A. Dossett, R.R. Kudchadkar, J.S. Zager, BRAF and MEK inhibition in melanoma, Expert Opin. Drug Saf. 14 (2015) 559–570.
[33] H.A. Chipman, E.I. George, R.E. McCulloch, BART: Bayesian Additive Regression Trees, Ann. Appl. Stat. 4 (2010) 33.
[34] B. Haibe-Kains, N. El-Hachem, N.J. Birkbak, A.C. Jin, A.H. Beck, H.J. Aerts, et al., Inconsistency in large pharmacogenomic studies, Nature 504 (2013) 389–393.
[35] C. Hatzis, P.L. Bedard, N.J. Birkbak, A.H. Beck, H.J. Aerts, D.F. Stern, et al., Enhancing reproducibility in cancer drug screening: how do we move forward? Cancer Res. 74 (2014) 4016–4023.
[36] I. Cortes-Ciriano, G.J. van Westen, G. Bouvier, M. Nilges, J.P. Overington, A. Bender, et al., Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel, Bioinformatics 32 (2016) 85–95.
[37] A. Kapelner, J. Bleich, bartMachine: Machine Learning with Bayesian Additive Regression Trees, J. Stat. Softw. 70 (2016) 40.
[38] M. Nakamura, Y. Kajiwara, A. Otsuka, H. Kimura, LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data, BioData Min. 6 (2013) 16.