Genomics of Cancer Drug Sensitivity
Received August 29, 2012; Revised October 15, 2012; Accepted October 20, 2012
ABSTRACT
Alterations in cancer genomes strongly influence clinical responses to treatment and in many instances are potent biomarkers for response to drugs. The Genomics of Drug Sensitivity in Cancer (GDSC) database ([Link]) is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. Data are freely available without restriction. GDSC currently contains drug sensitivity data for almost 75 000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. To identify molecular markers of drug response, cell line drug sensitivity data are integrated with large genomic datasets obtained from the Catalogue of Somatic Mutations in Cancer database, including information on somatic mutations in cancer genes, gene amplification and deletion, tissue type and transcriptional data. Analysis of GDSC data is through a web portal focused on identifying molecular biomarkers of drug sensitivity based on queries of specific anticancer drugs or cancer genes. Graphical representations of the data are used throughout with links to related resources and all datasets are fully downloadable. GDSC provides a unique resource incorporating large drug sensitivity and genomic datasets to facilitate the discovery of new therapeutic biomarkers for cancer therapies.

INTRODUCTION
There is compelling evidence that alterations in cancer genomes can strongly influence clinical responses to anticancer therapies. Indeed, there are now several examples where genomic changes can be used as molecular biomarkers to identify patients most likely to benefit from a treatment. For example, the use of drugs to target the protein product of the BCR–ABL translocation in chronic myeloid leukemia, or the BRAF gene in malignant melanoma, has transformed the treatment of these diseases and substantially improved survival rates (1,2). Despite these notable successes, many cancer drugs in use or development have not been linked to specific genomic markers that could direct their clinical use to maximize patient benefit. Moreover, even among appropriately selected patients, a poorly explained range of clinical responses is observed (2,3). Thus, there exists a need for the development of new and improved biomarkers to guide therapies and ultimately improve clinical responses.

Recent years have seen significant advances in our understanding of the molecular nature of cancer (4). This has been driven in part by advances in high-throughput technologies and, in particular, DNA sequencing technologies that allow us to sequence on a scale that was previously unthinkable. In the near future, sequencing efforts will provide a complete description of the genomic changes that occur in many cancer subtypes. A complete list of the repertoire of cancer genes will provide profound insights into the origins, evolution and progression of cancer and will act as an impetus for the development of new cancer therapies.
*To whom correspondence should be addressed. Tel: +44 1223 494878; Fax: +44 1226 494919; Email: mg12@[Link]
Correspondence may also be addressed to Ultan McDermott. Tel: +44 1223 494856; Fax: +44 1226 494919; Email: um1@[Link]
To exploit this increased understanding, preclinical studies that link the genomic complexity of cancer with functional readouts such as drug sensitivity are required. Cancer cell lines derived from naturally occurring tumours have been generated from many different cancer types and in many respects recapitulate the tissue type and genomic context of cancer. They are a facile system for experimental manipulation and are a standard research tool in molecular biology and drug discovery. Significantly, several studies have used cancer cell lines to link pharmacological data with genomic information and helped define therapeutic biomarkers (5–7). Collectively, these studies have demonstrated that pharmacogenomic profiling in cancer cell lines can be used as a biomarker discovery platform to guide the development of new cancer therapies.

The Genomics of Drug Sensitivity in Cancer (GDSC) database ([Link]) is designed to facilitate an increased understanding of the molecular features that influence drug response in cancer cells and which will enable the design of improved cancer therapies. GDSC holds and annotates large datasets on drug sensitivity in cancer cells and links these data to detailed genomic information to facilitate the discovery of molecular biomarkers of drug response. The website is designed to provide straightforward access to querying the database, and interactive graphical interfaces are used throughout to provide readily interpretable summaries of data and analyses.

DATABASE CONTENT
The GDSC database is based on three types of datasets as described in the following sections.

Cell line drug sensitivity data
Cancer cell line drug sensitivity data are generated from ongoing high-throughput screening performed by the Cancer Genome Project at the Wellcome Trust Sanger Institute (WTSI) and the Center for Molecular Therapeutics at Massachusetts General Hospital using a collection of >1000 cell lines (7). Compounds selected for screening are anticancer therapeutics encompassing both targeted agents and cytotoxic chemotherapeutics. They comprise approved drugs used in the clinic, drugs undergoing clinical development or in clinical trials, and tool compounds in early phase development. They cover a wide range of targets and processes implicated in cancer biology including receptor tyrosine kinase signalling, cell cycle control, DNA damage response and the cytoskeleton. Compounds are sourced from commercial vendors or provided by collaborators in academia, biotech and the pharmaceutical industry.

Cell line drug sensitivity is measured using fluorescence-based cell viability assays following 72 h of drug treatment. Dose–response curves are fitted to fluorescence signal intensities over nine drug concentrations (2-fold dilution series) to derive a multi-parameter signature of drug response. Values reported on the website include the half maximal inhibitory concentration (IC50), the slope of the dose–response curve and the area under the curve for each experiment.

The current release of GDSC (release 2, July 2012) includes drug sensitivity data for 138 anticancer compounds screened across a range of 329–668 cell lines per drug (mean = 525 cell lines per drug) representing 73 169 cell line–drug interactions. This is the largest public resource available on drug sensitivity in cancer cells. Screening is ongoing and the objective is to screen these compounds, as well as additional compounds in the future, across the entire collection of >1000 cell lines. Data release occurs every 4 months and with each release, these results are updated with new data for existing drugs, as well as data for newly screened drugs.

Genomic datasets for cell lines
The total collection available for screening includes >1000 different cancer cell lines. These have been selected to represent the spectrum of common and rare types of adult and childhood cancers of epithelial, mesenchymal and haematopoietic origin. The cell lines have been extensively genomically characterized as part of the cancer cell line project from the Cancer Genome Project at the WTSI. The genomic datasets currently available for each cell line include information on somatic mutations in 75 cancer genes, genome-wide gene copy number for amplification and deletion, targeted screening for seven gene rearrangements, markers of microsatellite instability, tissue type and transcriptional data. Using various statistical approaches as described below, genomic datasets are used together with drug sensitivity data for each cell line to identify genomic biomarkers of drug response.

Genomic datasets within GDSC are obtained and updated directly from the Catalogue of Somatic Mutations in Cancer (COSMIC) database, a comprehensive freely available resource for the annotation and presentation of somatic mutations in cancer (8).

Analysis of genomic features of drug sensitivity
An essential component of the GDSC database is the systematic integration of large-scale genomic and drug sensitivity datasets. To identify genomic markers of drug response, we currently use two complementary analytical approaches (7). A multivariate analysis of variance (MANOVA) is used to correlate drug sensitivity (IC50 values and slope of the dose–response curve) with genomic alterations in cancer including point mutations, amplifications and deletions of common cancer genes, cancer gene rearrangements and microsatellite instability. The MANOVA identifies individual genomic features associated with drug sensitivity and for each drug–gene association reports an effect size and the statistical significance of the association.

We also apply elastic net regression, a penalized linear modelling technique, to identify multiple interacting genomic features influencing each drug response. Genomic data used in the elastic net analysis include all of those used in the MANOVA and also incorporate genome-wide transcriptional profiles and tissue type. The elastic net selects which of these features are associated
Nucleic Acids Research, 2013, Vol. 41, Database issue D957
with drug response as measured by IC50 values across the cell line panel. For each drug, a feature list is built comprised of mutations, transcripts and tissue with an effect size assigned to each.

A more detailed description of the different statistical analyses performed, as well as guidance on interpreting the results, can be found on the ‘Help & Documentation’ webpages under the ‘statistical analysis’ tab.

DATA ACCESS

Querying the GDSC database
The website is focused on presenting cell line drug sensitivity data and genomic correlates of drug sensitivity. Although data on the genomic characterization of the cell lines are available through the GDSC website, these data are presented in more detail within the COSMIC database.

To facilitate data interpretation, graphical representations with interactive features are used wherever possible. Querying the database is primarily based on either specific screening ‘Compounds’ or ‘Cancer Genes’ in the ‘Browse our data’ section of the homepage (Figure 1). Browsing by ‘Compounds’ displays a list of drug names together with their associated synonyms, putative therapeutic target(s), the number of cell lines screened for each drug (sample size) and date of the most recent data update for each compound. A link to the PUBCHEM database of chemical structures is provided (9). By clicking a specific drug name, users enter the individual drug page where drug sensitivity and genomic correlation data are presented.

Similarly, browsing ‘Cancer Genes’ leads to a list of cancer genes identified by their HUGO name. This page provides direct links to the COSMIC page for the gene and to the UniProt databases for further protein information (10). Clicking on the gene name accesses the drug sensitivity and genomic correlation data on the individual gene page.

It is also possible to query the database using a ‘Search’ function (Figure 1). The ‘Search’ box accepts queries based on compound (including synonyms), cancer gene or cell line name. An auto-completion feature enables users to quickly select their drug, gene or cell line of interest. The search result page lists matching compounds, cancer genes or cell lines with links to the detailed drug/gene page of the website. In the case of cell line matches, links are provided to detailed cell line information within COSMIC.

Data analysis and visualization
Screening data and genomic correlations are accessed through specific drug or gene pages (Figures 2a and 3). The top panel provides drug or gene information and links to PUBCHEM, COSMIC and UniProt databases as appropriate. Notably, the top panel also provides links to relevant help pages to explain the data and analyses performed. Additional information is also available from the ‘Help & Documentation’ link found in the header at the top of all pages. The actual screening data and analyses are presented in the bottom panel of a drug/gene page and are split into the following tabs: Volcano plot, Volcano data, Elastic net (drug pages only), Scatter plots and Download data.

A volcano plot is used to visualize the correlation of drug sensitivity data with genetic events as calculated using the MANOVA. The drug page shows a drug-specific volcano plot, which represents how different genomic changes influence response to a specific drug (Figure 2a). The gene page shows a gene-specific volcano plot, which represents the effect of a mutated cancer gene on the responses to all drugs analysed (Figure 3). For example, the
[Figure 1 panel labels: Genomics (e.g. mutations, copy number); Drug sensitivity (e.g. IC50, AUC); Statistical analyses (MANOVA, Elastic net); Data analysis (Volcano plot; Volcano data, (M)ANOVA data; Elastic net, EN heatmap; IC50 Scatter plots); Data downloads; Data archive.]
Figure 1. A schematic representation of the GDSC database structure and content. Data are accessed in a hierarchical fashion by either querying by
screening compound or cancer gene of interest. This gives access to graphical representations of cell line drug sensitivity data and genomic correlations
of drug response in multiple formats through either drug- or gene-specific pages. All data are freely available for download either through
gene- or drug-specific pages, or as a whole through the download page.
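The drug sensitivity values reported by the database (IC50 and area under the curve) are derived from nine-point, 2-fold dilution dose–response data. As an illustration only, the Python sketch below estimates both quantities by log-linear interpolation and the trapezoid rule. This is a simplification and not the curve-fitting model used for GDSC; the function names, the top concentration and any viability values passed in are hypothetical.

```python
import math


def dilution_series(top_conc, n_points=9, fold=2.0):
    """Return concentrations, lowest to highest, for a fold-dilution series."""
    return [top_conc / fold ** i for i in range(n_points)][::-1]


def estimate_ic50(concs, viability):
    """Interpolate on a log10 concentration scale where viability crosses 0.5.

    `viability` is the fraction of signal relative to untreated controls.
    Returns None when the curve never crosses 50% within the tested range.
    """
    points = list(zip(concs, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


def area_under_curve(concs, viability):
    """Trapezoid-rule area under viability versus log10(concentration).

    The area is divided by the log-concentration range, so a completely
    inactive compound (viability 1.0 throughout) gives a value of 1.0.
    """
    logs = [math.log10(c) for c in concs]
    area = sum(
        (logs[i + 1] - logs[i]) * (viability[i] + viability[i + 1]) / 2
        for i in range(len(logs) - 1)
    )
    return area / (logs[-1] - logs[0])
```

Calling `estimate_ic50(dilution_series(8.0), viability)` with nine viability fractions returns a concentration in the same units as the top concentration. A fitted sigmoid model would additionally yield the slope parameter, which this interpolation does not attempt to estimate.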
Figure 2. Querying the GDSC database by compound name. Drug-specific pages demonstrate the effect of genomic features on cell line sensitivity to
a particular drug. In this example, we show the effect of genomic features on sensitivity to the BRAF-inhibitor PLX4720. (a) A volcano plot
representation of MANOVA results showing the magnitude (x-axis) and significance (P-value, log scale on inverted axis) for each cancer gene
association. Each circle represents a single drug–gene interaction and the size is proportional to the number of mutant cell lines screened for each
drug. For clarity, the y-axis is capped at P = 1 × 10^-8 and a plus sign (+) next to a circle indicates that the P-value is smaller than this threshold. The
dashed red line represents a Benjamini–Hochberg multiple testing correction for significance and only significant associations are coloured either
green for drug sensitivity or red for resistance. (b) Elastic net analysis of genomic features associated with sensitivity to PLX4720. Features with
negative effect size are associated with drug sensitivity and features with positive effect size are associated with drug resistance (all features are
negative in this example). Mutation and tissue features are at the top of the heatmap to represent the presence (black) or absence (grey) of a
mutation/tissue subtype. Below this are gene expression and copy number features with blue corresponding to lower expression or copy number, and
red to indicate higher expression or copy number.
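The significance line in panel (a) rests on a Benjamini–Hochberg multiple testing correction. A minimal Python sketch of that step-up procedure follows. The false discovery rate applied by GDSC is not stated here, so the `fdr` default is an assumption, as are the function name and any P-values supplied.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag P-values that are significant under the Benjamini-Hochberg procedure.

    Returns a list of booleans in the same order as `p_values`.
    The default `fdr` is an illustrative assumption.
    """
    m = len(p_values)
    # Indices of the P-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every P-value ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant
```

Because the procedure is step-up, a P-value that misses its own rank threshold is still flagged when a larger P-value further down the ranking meets its threshold.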
drug-specific volcano plot for the BRAF-inhibitor PLX4720 shows that mutations in the gene BRAF are significantly associated with sensitivity to this compound (Figure 2a). Conversely, the gene-specific volcano plot for BRAF shows that mutations in this gene are associated with sensitivity to multiple drugs including several different BRAF inhibitors (i.e. PLX4720, SB590885 and AZ628) (Figure 3). In both cases, the x-axis represents the magnitude of the effect of a gene–drug interaction on IC50 values across the cell lines screened and the y-axis is the significance of the interaction (P-value). By hovering over each circle, the following information is provided: genetic
Figure 3. Querying the GDSC database by cancer gene. Gene-specific pages show how a cancer gene mutation affects response to many drugs.
A volcano plot representation shows results of the MANOVA analysis for drug sensitivity associated with BRAF mutations.
event sample size (i.e. the number of cell lines screened with a specific mutation), effect size and P-value. By clicking on an individual circle, it is possible to link to a scatter plot of cell line IC50 values for this association (see below). The volcano data tab represents the volcano plot data as a sortable table. Three buttons at the top of the table allow the download of the table in .csv, .tab or .xlsx file format.

Similarly, the elastic net tab contains a graphical representation of results from the elastic net analysis of drug sensitivity (Figure 2b). For effective visualization, a maximum of 10 significant features associated with drug response are shown. These may include tissue type, mutations in cancer genes, expression levels and gene copy number. Each graphic contains three elements: a bar plot of effect size for significant features (right-hand side), a heatmap of genomic features (central panel) and a second heatmap of IC50 values for the 20 least and most sensitive cell lines (bottom). For example, the elastic net analysis for BRAF-inhibitor PLX4720 identified mutations in the BRAF gene, the tissue-type skin, as well as several transcriptional features (BCL2A1, GYPC and DAAM2) as associated with drug sensitivity (Figure 2b). Unlike the MANOVA analysis, gene-specific correlations for the elastic net analysis are not represented since the elastic net describes how multiple genes affect drug sensitivity together.

The ‘Scatter plots’ tab shows a plot of cell line IC50 values to a drug. IC50 values are split into two populations according to a cell line's mutational status for a given gene that is significantly associated with the drug response (Figure 4). In the example provided, cell lines with a BRAF mutation are on average more sensitive to PLX4720 compared with BRAF wild-type cell lines (Figure 4). The table in the middle shows the statistics for the plot including sample size, and the mean and median IC50 values for the two populations (mutated or wild type). Additional functionality includes the ability of users to select which drugs (or genes from the drug-specific pages) to plot. It is possible to link directly to relevant scatter plots by clicking on circles within the volcano plot pages. Furthermore, by clicking on circles within scatter plots, cell line IC50 values are directly linked to the COSMIC database, facilitating integration of drug sensitivity data with detailed cell line information such as tissue type, tumour histology and a description of cell line origin and genotype.

Download data
As the emphasis of the website is on the graphical representation of data, both the volcano and scatter plots are downloadable as either .png or .svg files. In addition, the raw data are available to download in either .csv or .xlsx format. As described below, it is possible to download the data for a specific drug or gene on their associated pages, or to download the data from all of our analyses in a series of large spreadsheets.

On the drug page for a specific compound, the available downloads include (i) sensitivity data for the drug (a table of cell line IC50 values); (ii) genomic alterations in cell lines; (iii) genomic correlations with MANOVA and (iv) elastic net analysis of drug sensitivity. On a gene page a single data download is available, containing the
Figure 4. Scatter plot of cell line IC50 values and the effect of a cancer mutation. A scatter plot of cell line IC50 values for BRAF-mutated versus
wild-type cell lines following drugging with PLX4720. Each circle represents the IC50 value for an individual cell line plotted on a logarithmic scale
and the red line is the geometric mean of the population. Cell lines are colour coded to indicate whether the mutation is a coding mutation detected
by sequencing, or an amplification or deletion detected by copy number analysis. The lower and upper brown lines indicate the minimum and
maximum concentration (micromolar) of drug used for screening. Superimposed on the scatter plot is a box-and-whisker plot showing the median,
interquartile ranges and max and min for each plot. The central panel contains statistics for the plot and the right-hand table allows users to select
which drug data to plot.
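The per-population statistics shown alongside the scatter plot (sample size, geometric mean and median IC50 for mutated versus wild-type cell lines) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the dictionary layout and the cell line identifiers a caller would pass in are hypothetical and do not reflect the format of the GDSC download files.

```python
import statistics


def summarise_ic50(ic50_by_line, mutant_lines):
    """Split IC50 values by mutation status and summarise each population.

    `ic50_by_line` maps a cell line identifier to its IC50 (micromolar);
    `mutant_lines` is the set of identifiers carrying the mutation.
    """
    groups = {"mutated": [], "wild type": []}
    for line, ic50 in ic50_by_line.items():
        key = "mutated" if line in mutant_lines else "wild type"
        groups[key].append(ic50)

    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "n": len(values),
            # An empty population has no defined mean or median.
            "geometric_mean": statistics.geometric_mean(values) if values else None,
            "median": statistics.median(values) if values else None,
        }
    return summary
```

The geometric mean is used because IC50 values are plotted on a logarithmic scale, where it corresponds to the arithmetic mean of the log-transformed values.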
MANOVA correlation for how a gene correlates with drug response across the entire panel of compounds.

Rather than downloading drug- or gene-specific data, drug sensitivity and genomic datasets can also be directly downloaded as a whole through the ‘Downloads’ page. This can be directly accessed from the header on each page. Downloadable files include (i) cell line tissue type, drug sensitivity and genomic data used for the MANOVA; (ii) the MANOVA results for all compounds; (iii) a tissue-specific ANOVA to examine the effect of tissue type on drug response; (iv) the elastic net results for all compounds; (v) cell line genomic and transcriptional data used for elastic net analysis and (vi) a continuously updated list of cancer cell lines in our collection. Please note that some of these files contain a large number of columns and data will be lost if files are opened in Excel 2003 or earlier versions because the worksheet size is limited to 256 columns. The ‘Downloads’ page also provides access to archive files of previous data releases.

FUTURE WORK
The GDSC database will expand significantly in coming years as the size and complexity of datasets increase. The database currently contains data for almost 75 000 experiments across 138 drugs and the amount of drug sensitivity data is expected to increase in size 2–3-fold within the next 2 years and even further in the future. This will include drug sensitivity data for many new cell lines to bring the total number to >1000 lines, and the inclusion of data for hundreds of newly screened anticancer drugs. Collectively, this will expand the number of different cancer subtypes and genotypes represented within the cell line collection, as well as the number of different drug targets interrogated by screening compounds.

Additional developments will see the further genomic characterization of the cell line collection to increase its utility as a resource. Notably, this will include whole-exome sequencing of all 22 000 coding genes across the entire collection. Whole genome SNP6.0 copy number data currently include 750 cell lines and this will be expanded to include the entire cell line collection. Similarly, basal transcriptional data are currently being updated to include the entire cell line collection using the latest Affymetrix human genome U219 mRNA expression array. These new genomic datasets, together with our expanding drug sensitivity datasets, will be incorporated into our analytical models to enhance our ability to identify therapeutic biomarkers predictive of drug response.

Large numbers of primary tumours across different cancer types are being extensively genomically characterized by systematic efforts such as the International Cancer Genome Consortium. This will give us profound insights into the molecular taxonomy of cancer and, for the first time, enable us to directly assess the genomic similarity of our cell line models to primary tumours. Based on these comparisons, we will refine and expand the cell line collection to ensure that it is as representative as possible of primary tumours. Similarly, it is