Small Molecules and siRNA: Methods to Explore Bioactivity Data

Rajarshi Guha
NIH Center for Translational Therapeutics
August 17, 2011
Pfizer, Groton

Background
Cheminformatics methods: QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
More recently: siRNA screening, high content imaging, combination screening
Extensive use of machine learning
All tied together with software development
Integrate small molecule information & biosystems – systems chemical biology

Outline
Exploring the SAR landscape: the landscape view of SAR data; quantifying SAR landscapes; extending an SAR landscape
Linking small molecule & RNAi HTS: overview of the Trans NIH RNAi Screening Initiative; infrastructure components; linking small molecule & siRNA screens

The Landscape View of Structure Activity Datasets

Structure Activity Relationships
Similar molecules will have similar activities
Small changes in structure will lead to small changes in activity
One implication is that SARs are additive
This is the basis for QSAR modeling
Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358

Structure Activity Landscapes
Rugged gorges or rolling hills?
Small structural changes associated with large activity changes represent steep slopes in the landscape
But traditionally, QSAR assumes gentle slopes
Machine learning is not very good for special cases
Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535

Characterizing the Landscape
A cliff can be numerically characterized
Structure Activity Landscape Index (SALI)
Cliffs are characterized by elements of the matrix with very large values
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
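The index itself did not survive in this transcript. As published by Guha and Van Drie, SALI for a pair of molecules is the absolute activity difference divided by one minus their structural similarity. A minimal Python sketch follows. It assumes fingerprints are stored as sets of on-bit indices, and it adds a small `eps` guard (an assumption of this sketch, not part of the talk) for pairs with identical fingerprints. All function names are illustrative.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sali(act_i, act_j, sim, eps=1e-6):
    """SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)).

    eps is an assumption of this sketch: it avoids division by zero
    when two molecules have identical fingerprints (sim == 1).
    """
    return abs(act_i - act_j) / max(1.0 - sim, eps)


def sali_matrix(activities, fingerprints):
    """Symmetric matrix of pairwise SALI values; the diagonal is left at 0."""
    n = len(activities)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        matrix[i][j] = matrix[j][i] = sali(activities[i], activities[j], sim)
    return matrix
```

Large entries in the resulting matrix correspond to pairs that are structurally similar but differ strongly in activity, which is what the slide calls a cliff.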

Visualizing SALI Values
The SALI graph
Compounds are nodes
Nodes i, j are connected if SALI(i, j) > X
Only display connected nodes
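The graph construction on this slide can be sketched in a few lines of Python. The input is assumed to be a symmetric list-of-lists SALI matrix. The function name and the returned `(nodes, edges)` shape are choices made for this illustration.

```python
def sali_graph(sali, cutoff):
    """Build the SALI graph at a given cutoff X.

    Nodes i and j are joined when SALI(i, j) > cutoff; nodes without
    any edge are dropped, mirroring "only display connected nodes".
    """
    n = len(sali)
    edges = [
        (i, j, sali[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if sali[i][j] > cutoff
    ]
    nodes = sorted({k for i, j, _ in edges for k in (i, j)})
    return nodes, edges
```

Raising the cutoff prunes the graph down to the most pronounced cliffs.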

What Can We Do With SALIs?
SALI characterizes cliffs & non-cliffs
For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
Models try to encode this landscape
Use the landscape to guide descriptor or model selection

Descriptor Space Smoothness
Edge count of the SALI graph for varying cutoffs
Measures smoothness of the descriptor space
Can reduce this to a single number (AUC)
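One way to turn the edge-count idea into a number is sketched below. The details are assumptions of this sketch and are not taken from the talk: cutoffs are expressed as fractions of the largest SALI value, edge counts are normalized by the number of pairs, and the area is computed with the trapezoidal rule.

```python
def edge_count_curve(sali, n_steps=11):
    """Fraction of pairs that remain edges as the cutoff rises.

    Cutoffs run from 0 to the maximum SALI value in n_steps steps
    (an assumed normalization chosen for this sketch).
    """
    n = len(sali)
    values = [sali[i][j] for i in range(n) for j in range(i + 1, n)]
    vmax = max(values)
    curve = []
    for k in range(n_steps):
        frac = k / (n_steps - 1)
        count = sum(1 for v in values if v > frac * vmax)
        curve.append((frac, count / len(values)))
    return curve


def auc(curve):
    """Trapezoidal area under a curve given as (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
    )
```

With this normalization the area lies between 0 and 1, so curves from different descriptor sets can be compared directly.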

Other Examples
Instead of fingerprints, we use molecular descriptors
SALI denominator now uses Euclidean distance
2D & 3D random descriptor sets
None are really good: too rough, or too flat
[Figure panels: 2D, 3D]

Feature Selection Using SALI
Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
Not entirely clear what type of curve is desirable

Measuring Model Quality
A QSAR model should easily encode the “rolling hills”
A good model captures the most significant cliffs
Can be formalized as: how many of the edge orderings of a SALI graph does the model predict correctly?
Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
Repeat for varying X and obtain the SALI curve
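A rough Python sketch of the SALI curve follows. It makes a simplifying assumption: S(X) is computed as the fraction of edges at cutoff X whose activity ordering the model reproduces. The published definition may score edges differently, so treat this as an illustration of the idea only. Names and the `None` convention for empty graphs are choices of this sketch.

```python
def sali_curve(observed, predicted, sali, cutoffs):
    """For each cutoff X, the fraction of SALI-graph edges whose
    ordering (which molecule is more active) the model gets right.

    Returns a list of (X, S) pairs; S is None when no edge survives.
    """
    n = len(observed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    curve = []
    for x in cutoffs:
        edges = [(i, j) for i, j in pairs if sali[i][j] > x]
        if not edges:
            curve.append((x, None))
            continue
        correct = sum(
            1
            for i, j in edges
            # Same sign of observed and predicted differences
            # means the edge ordering is reproduced.
            if (observed[i] - observed[j]) * (predicted[i] - predicted[j]) > 0
        )
        curve.append((x, correct / len(edges)))
    return curve
```

A model that only captures the "rolling hills" would score well at low cutoffs and degrade as X rises and only the steepest cliffs remain.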

Model Search Using the SCI (SALI Curve Integral)
We’ve used the SALI to retrospectively analyze models
Can we use SALI to develop models?
Identify a model that captures the cliffs
Tricky: cliffs are fundamentally outliers
Optimizing for good SALI values implies overfitting
Need to trade off between SALI & generalizability

Predicting the Landscape
Rather than predicting activity directly, we can try to predict the SAR landscape
Implies that we attempt to directly predict cliffs
Observations are now pairs of molecules
A more complex problem
Choice of features is trickier
Still face the problem of cliffs as outliers
Somewhat similar to predicting activity differences
Scheiber et al., Statistical Analysis and Data Mining, 2009, 2, 115–122

Motivation
Predicting activity cliffs corresponds to extending the SAR landscape
Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset
Can be useful for guiding lead optimization, but not necessarily useful for lead hopping

Predicting Cliffs
Dependent variables are pairwise SALI values, calculated using fingerprints
Independent variables are molecular descriptors, but considered pairwise: absolute difference of descriptor pairs, or geometric mean of descriptor pairs, …
Develop a model to correlate pairwise descriptors to pairwise SALI values
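The two aggregation functions named on this slide can be sketched as follows. The function name and the dictionary return shape are illustrative. The geometric mean is taken as `sqrt(a * b)`, which assumes non-negative descriptor values (an assumption of this sketch).

```python
import math
from itertools import combinations


def pairwise_features(descriptors, how="absdiff"):
    """Aggregate per-molecule descriptor vectors into per-pair vectors.

    descriptors: list of equal-length lists, one per molecule.
    how: "absdiff" for |a - b|, "geomean" for sqrt(a * b)
         (geomean assumes non-negative descriptor values).
    Returns {(i, j): feature_vector} for all i < j.
    """
    if how == "absdiff":
        combine = lambda a, b: abs(a - b)
    elif how == "geomean":
        combine = lambda a, b: math.sqrt(a * b)
    else:
        raise ValueError("unknown aggregation: " + how)
    return {
        (i, j): [combine(a, b) for a, b in zip(descriptors[i], descriptors[j])]
        for i, j in combinations(range(len(descriptors)), 2)
    }
```

Because every pair of molecules becomes one observation, a set of n molecules yields n(n-1)/2 rows, for example 435 rows for 30 molecules.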

A Test Case
We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50s
Evaluate topological and physicochemical descriptors
Developed random forest models: on the original observed values (30 observations), and on the SALI values (435 observations, i.e. all 30 × 29 / 2 pairs)
Cavalli, A. et al., J. Med. Chem., 2002, 45, 3844–3853

Double Counting Structures?
The dependent and independent variables both encode structure
But there are fairly low correlations between individual pairwise descriptors and the SALI values

Model Summaries
Original pIC50: RMSE = 0.97
SALI, AbsDiff: RMSE = 1.10
SALI, GeoMean: RMSE = 1.04
All models explain similar % of variance of their respective datasets
Using geometric mean as the descriptor aggregation function seems to perform best
SALI models are more robust due to larger size of the dataset

Test Case 2
Considered the Holloway docking dataset, 32 molecules with pIC50s and Einter
Similar strategy as before
Need to transform SALI values
Descriptors show minimal correlation
Holloway, M.K. et al., J. Med. Chem., 1995, 38, 305–317

Model Summaries
Original pIC50: RMSE = 1.05
SALI, AbsDiff: RMSE = 0.48
SALI, GeoMean: RMSE = 0.48
The SALI models perform much worse in terms of % of variance explained
Descriptor aggregation method does not seem to have much effect
The SALI models appear to perform decently on the cliffs, but miss the most significant ones

Model Summaries
Original pIC50: RMSE = 1.05
SALI, AbsDiff: RMSE = 9.76
SALI, GeoMean: RMSE = 10.01
With untransformed SALI values, models perform similarly in terms of % of variance explained
The most significant cliffs correspond to stereoisomers

Test Case 3
38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing
Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R2 = 0.62)
Upper end of SALI range is better predicted
Kalla, R.V. et al., J. Med. Chem., 2006, 48, 1984–2008

Test Case 3
The dataset does not contain really big cliffs

Generally, performance is poorer for smaller cliffs
For any given hold-out molecule, range of error in SALI prediction is large
Suggests that some form of domain applicability metric would be useful

Model Caveats
Models based on SALI values are dependent on there being an SAR in the original activity data
Scrambling results for these models are poorer than the original models but aren’t as random as expected

Conclusions
SALI is the first step in characterizing the SAR landscape
Allows us to directly analyze the landscape, as opposed to individual molecules
Being able to predict the landscape could serve as a useful way to extend an SAR landscape

Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens

RNAi Facility Mission
Perform collaborative genome-wide RNAi screening-based projects with intramural investigators
Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs
Range of assays:
Pathway (reporter assays, e.g. luciferase, β-lactamase)
Simple phenotypes (viability, cytotoxicity, oxidative stress, etc.)
Complex phenotypes (high-content imaging, cell cycle, translocation, etc.)

RNAi Informatics Infrastructure

RNAi Analysis Workflow
[Workflow diagram: raw and processed data; GO annotations; pathways; interactions; hit list; follow-up]

RNAi Informatics Toolset
Local databases (screen data, pathways, interactions, etc.)
Commercial pathway tools
Custom software for loading, analysis and visualization

Back End Services
Currently all computational analysis is performed on the back end
R & Bioconductor code
Custom R package (ncgcrnai) to support NCGC infrastructure; partly derived from cellHTS2
Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports
Some Java tools for data loading and for library and plate registration

RNAi & Small Molecule Screens
What targets mediate activity of siRNA and compound?
Pathway elucidation, identification of interactions
Reuse pre-existing MLI data

Develop new annotated libraries
Target ID and validation
Link RNAi-generated pathway perturbations to small molecule activities; could provide insight into polypharmacology
Run parallel RNAi screen
Goal: develop systems-level view of small molecule activity

HTS for NF-κB Antagonists
NF-κB controls DNA transcription
Involved in cellular responses to stimuli: immune response, memory formation
Inflammation, cancer, auto-immune diseases
http://www.genego.com

HTS for NF-κB Antagonists
ME-180 cell line
Stimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporter
Identify small molecules and siRNAs that block the resultant activation

Small Molecule HTS Summary
2,899 FDA-approved compounds screened
55 compounds retested active
Which components of the NF-κB pathway do they hit?
17 molecules have target/pathway information in GeneGO
Literature searches list a few more
Most potent actives: Proscillaridin A, Trabectedin, Digoxin
Miller, S.C. et al., Biochem. Pharmacol., 2010, ASAP

RNAi HTS Summary
Qiagen HDG library: 6886 genes, 4 siRNAs per gene
A total of 567 genes were knocked down by 1 or more siRNAs
We consider >= 2 as a “reliable” hit
16 reliable hits
Added in 66 genes for follow up via triage procedure
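The ">= 2 siRNAs" rule can be sketched as a simple count of distinct active siRNAs per gene. The input format (a list of `(gene, siRNA id)` pairs already called active) and the gene and siRNA labels used in any example are hypothetical. How activity is called for a single siRNA is outside this sketch.

```python
from collections import defaultdict


def reliable_hits(active_sirnas, min_sirnas=2):
    """Genes supported by at least min_sirnas distinct active siRNAs.

    active_sirnas: iterable of (gene, sirna_id) pairs that were
    already scored as active by some upstream hit-calling step.
    """
    per_gene = defaultdict(set)
    for gene, sirna_id in active_sirnas:
        per_gene[gene].add(sirna_id)  # a set ignores duplicate records
    return sorted(g for g, ids in per_gene.items() if len(ids) >= min_sirnas)
```

Counting distinct siRNA identifiers, and not raw records, keeps a replicate measurement of the same siRNA from being mistaken for independent support.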

The Obvious Conclusion
The active compounds target the 16 hits (at least) from the RNAi screen
Useful if the RNAi screen was small & focused
But what if we’re investigating a larger system?
Is there a way to get more specific?
Can compound data suggest RNAi non-hits?

Small Molecule Targets
Some small molecules interact with core components
Bortezomib (proteasome inhibitor)
Daunorubicin (IκBα inhibitor)

Small Molecule Targets
Others are active against upstream targets
Montelukast (LTD4 antagonist)
We also get an idea of off-target effects

Compound Networks - Similarity
Evaluate fingerprint-based similarity matrix for the 55 actives
Connect pairs that exhibit Tc > 0.7
Edges are weighted by the Tc value
Most groupings are obvious
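A minimal sketch of this similarity network follows. It assumes fingerprints are available as sets of on-bit indices keyed by compound name. The fingerprint type used in the talk is not stated, and the function names here are illustrative.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient (Tc) between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_network(fingerprints, threshold=0.7):
    """Weighted edge list joining compounds with Tc > threshold.

    fingerprints: {compound_name: set_of_on_bits}
    Returns a list of (name_a, name_b, tc) tuples, names sorted.
    """
    edges = []
    for a, b in combinations(sorted(fingerprints), 2):
        tc = tanimoto(fingerprints[a], fingerprints[b])
        if tc > threshold:
            edges.append((a, b, tc))
    return edges
```

The Tc value is kept on each edge so that a graph layout or clustering step can use it as a weight.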

A “Dictionary” Based Approach
Create a small-ish annotated library of “seed” compounds
Use it in parallel small molecule/RNAi screens
Use a similarity based approach to prioritize larger collections, in terms of anticipated targets
Currently, we’d use structural similarity
Diversity of prioritized structures is dependent on the diversity of the annotated library

Compound Networks - Targets
Predict targets for the actives using SEA
Target based compound network maps nearly identically to the similarity based network
But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes
Keiser, M.J. et al., Nat. Biotech., 2007, 25, 197–206

Gene Networks - Pathways
Nodes are 1374 HDG genes contained in the NCI PID
Edge indicates two genes/proteins are involved in the same pathway
“Good” hits tend to be very highly connected
Wang, L. et al., BMC Genomics, 2009, 10, 220
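The edge rule on this slide (two genes share an edge when they appear in the same pathway) can be sketched as below. Node degree is used as a simple stand-in for "highly connected". The input format and the gene and pathway labels in any example are hypothetical, and no real NCI PID data is assumed.

```python
from collections import defaultdict
from itertools import combinations


def pathway_gene_network(pathways):
    """Edges between genes that co-occur in at least one pathway.

    pathways: {pathway_name: set_of_gene_symbols}
    Returns a set of (gene_a, gene_b) tuples with gene_a < gene_b.
    """
    edges = set()
    for members in pathways.values():
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


def degrees(edges):
    """Number of neighbours per gene in the network."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)
```

Ranking screen hits by degree in such a network is one simple way to check whether "good" hits sit in densely connected regions.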

(Reduced) Gene Networks – Pathways
Nodes are 526 genes with >= 1 siRNA showing knockdown
Edge indicates two genes/proteins are involved in the same pathway

Pathway Based Integration
Direct matching of targets is not very useful
Try to map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway
Considering 16 reliable hits, we cover 26 pathways
Predicted compound targets cover 131 pathways, for 18 out of 41 compounds
3 RNAi-derived pathways not covered by compound-derived pathways: rhodopsin, alternative NF-κB, FAS
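The pathway-level matching described here can be sketched with plain dictionaries. The inputs are assumed for this illustration: a mapping from compounds to predicted targets, a list of RNAi hit genes, and a gene-to-pathway lookup. All names are hypothetical.

```python
def pathway_links(compound_targets, hit_genes, gene_pathways):
    """Link compounds to RNAi hit genes through shared pathways.

    compound_targets: {compound: set_of_predicted_target_genes}
    hit_genes: iterable of genes that were hits in the RNAi screen
    gene_pathways: {gene: set_of_pathway_names}
    Returns {compound: {hit_gene: shared_pathways}}; compounds with
    no shared pathway are omitted.
    """
    links = {}
    for compound, targets in compound_targets.items():
        # All pathways reachable from any predicted target.
        compound_paths = set().union(
            *(gene_pathways.get(t, set()) for t in targets)
        )
        matched = {}
        for gene in hit_genes:
            shared = compound_paths & gene_pathways.get(gene, set())
            if shared:
                matched[gene] = shared
        if matched:
            links[compound] = matched
    return links
```

Compounds without target predictions, or whose targets lack pathway annotations, drop out of the result, which is consistent with the slide's point that only a subset of compounds can be mapped this way.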

Pathway Based Integration
Still not completely useful, as it only handled 18 compounds
Depending on target predictions is probably not a great idea

Integration Caveats
Biggest bottleneck is lack of resolution
Currently, both small molecule and RNAi data are 1-D: active or inactive, high/low signal
CRCs for small molecules alleviate this a bit
High content screens can provide significantly more information and so better resolution
Data size & feature selection are of concern

Integration Caveats
Compound annotations are key
Currently working on using ChEMBL data to provide target ‘suggestions’
More comprehensive pathway data will be required
RNAi and small molecule inhibition do not always lead to the same phenotype: could be indicative of promiscuity, or could indicate true biological differences
Weiss, W.A. et al., Nat. Chem. Biol., 2007, 12, 739–744

Conclusions
Building up a wealth of small molecule and RNAi data
“Standard” analysis of RNAi screens relatively straightforward
Challenges involve integrating RNAi data with other sources
Primary bottleneck is dimensionality of the data
Simple fluorescence-based approaches do not provide sufficient resolution
High-content is required

Acknowledgements
John Van Drie, Gerry Maggiora, Mic Lajiness, Jurgen Bajorath, Scott Martin, Pinar Tuzmen, Carleen Klump, Dac Trung Nguyen, Ruili Huang, Yuhong Wang

CPT Sensitization & “Central” Genes
TOP1 poisons prevent DNA religation, resulting in replication-dependent double strand breaks
Cell activates DNA damage response (e.g. ATR)
Yves Pommier, Nat. Rev. Cancer, 2006

Screening Protocol
Screen conducted in the human breast cancer cell line MDA-MB-231
Many variables to optimize, including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls

Hit Selection
Follow-up dose response analysis
[Figure, Screen #1 (ATR): sensitization ranked by log2 fold change; viability (%) vs. CPT (log M) for siNeg, siATR-A, siATR-B, siATR-C]
[Figure, Screen #2 (MAP3K7IP2): sensitization ranked by log2 fold change; viability (%) vs. CPT (log M) for siNeg, siMAP3K7IP2-A, siMAP3K7IP2-B, siMAP3K7IP2-C, siMAP3K7IP2-D]
Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1
