Small Molecules and siRNA: Methods to Explore Bioactivity Data
Rajarshi Guha
NIH Center for Translational Therapeutics
August 17, 2011
Pfizer, Groton
Background
  • Cheminformatics methods
    • QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
  • More recently
    • siRNA screening, high content imaging, combination screening
  • Extensive use of machine learning
  • All tied together with software development
  • Integrate small molecule information & biosystems – systems chemical biology
Outline
  • Exploring the SAR landscape
    • The landscape view of SAR data
    • Quantifying SAR landscapes
    • Extending an SAR landscape
  • Linking small molecule & RNAi HTS
    • Overview of the Trans NIH RNAi Screening Initiative
    • Infrastructure components
    • Linking small molecule & siRNA screens
The Landscape View of Structure Activity Datasets
Structure Activity Relationships
  • Similar molecules will have similar activities
  • Small changes in structure will lead to small changes in activity
  • One implication is that SARs are additive
  • This is the basis for QSAR modeling
  • Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
Structure Activity Landscapes
  • Rugged gorges or rolling hills?
  • Small structural changes associated with large activity changes represent steep slopes in the landscape
  • But traditionally, QSAR assumes gentle slopes
  • Machine learning is not very good for special cases
  • Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535
Characterizing the Landscape
  • A cliff can be numerically characterized
  • Structure Activity Landscape Index (SALI)
  • Cliffs are characterized by elements of the matrix with very large values
  • Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
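The slide's formula did not survive extraction. As a minimal sketch, assuming the definition from the cited Guha and Van Drie paper, SALI(i,j) = |A_i − A_j| / (1 − sim(i,j)), the pairwise matrix can be computed as below. Representing fingerprints as Python sets of on-bit indices, and treating identical fingerprints with different activities as an infinite SALI value, are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def sali_matrix(activities, fingerprints):
    """Symmetric matrix of SALI(i,j) = |A_i - A_j| / (1 - sim(i,j))."""
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        diff = abs(activities[i] - activities[j])
        if sim == 1.0:
            # Assumption: identical fingerprints make the denominator zero,
            # so report an infinite cliff unless the activities also match.
            value = float("inf") if diff > 0 else 0.0
        else:
            value = diff / (1.0 - sim)
        sali[i][j] = sali[j][i] = value
    return sali
```

Large entries of the returned matrix correspond to pairs that are structurally similar but differ strongly in activity, which is the slide's notion of a cliff.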
Visualizing SALI Values
  • The SALI graph
    • Compounds are nodes
    • Nodes i,j are connected if SALI(i,j) > X
    • Only display connected nodes
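A minimal sketch of the graph construction described on this slide, taking a precomputed SALI matrix as input. Orienting each edge from the less active to the more active compound is an assumption added here (the slide only says the nodes are connected), and the function name is hypothetical.

```python
def sali_graph(sali, activities, cutoff):
    """Return (nodes, edges) for pairs with SALI(i,j) > cutoff.

    Each edge is (less_active, more_active, sali_value); the orientation
    is an illustrative assumption. Only nodes with an edge are returned.
    """
    edges = []
    n = len(activities)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                lo, hi = (i, j) if activities[i] <= activities[j] else (j, i)
                edges.append((lo, hi, sali[i][j]))
    nodes = sorted({k for lo, hi, _ in edges for k in (lo, hi)})
    return nodes, edges
```

Raising the cutoff X prunes the graph down to the most significant cliffs.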
What Can We Do With SALIs?
  • SALI characterizes cliffs & non-cliffs
  • For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
  • Models try to encode this landscape
  • Use the landscape to guide descriptor or model selection
Descriptor Space Smoothness
  • Edge count of the SALI graph for varying cutoffs
  • Measures smoothness of the descriptor space
  • Can reduce this to a single number (AUC)
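The edge-count curve and its reduction to a single number can be sketched as follows. The slides do not say how the AUC is computed or normalized, so the plain trapezoidal rule over raw cutoffs used here is an assumption.

```python
def edge_count_curve(sali_values, cutoffs):
    """Number of SALI graph edges (pairs with SALI > X) at each cutoff X."""
    return [sum(1 for v in sali_values if v > x) for x in cutoffs]


def curve_auc(cutoffs, counts):
    """Trapezoidal area under the edge-count curve (assumed AUC definition)."""
    return sum(
        (cutoffs[k + 1] - cutoffs[k]) * (counts[k] + counts[k + 1]) / 2
        for k in range(len(cutoffs) - 1)
    )
```

Here `sali_values` is the flat list of pairwise SALI values (one per pair of molecules), and `cutoffs` must be sorted in increasing order.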
Other Examples
  • Instead of fingerprints, we use molecular descriptors
  • SALI denominator now uses Euclidean distance
  • 2D & 3D random descriptor sets
  • None are really good
    • Too rough, or
    • Too flat
  • (Figure panels: 2D, 3D)
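A minimal sketch of the descriptor-based variant, assuming the denominator is simply the Euclidean distance between the two descriptor vectors (the slides do not say whether descriptors were scaled first). The function name is hypothetical.

```python
import math


def sali_descriptor(act_i, act_j, desc_i, desc_j):
    """SALI variant: |A_i - A_j| divided by Euclidean descriptor distance."""
    dist = math.dist(desc_i, desc_j)
    diff = abs(act_i - act_j)
    if dist == 0.0:
        # Assumption: coincident descriptor vectors give an infinite value
        # unless the activities are also identical.
        return float("inf") if diff > 0 else 0.0
    return diff / dist
```

Because Euclidean distance is unbounded, values from this variant are not on the same scale as fingerprint-based SALI values.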
Feature Selection Using SALI
  • Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
  • Not entirely clear what type of curve is desirable
Measuring Model Quality
  • A QSAR model should easily encode the “rolling hills”
  • A good model captures the most significant cliffs
  • Can be formalized as: how many of the edge orderings of a SALI graph does the model predict correctly?
  • Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
  • Repeat for varying X and obtain the SALI curve
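A sketch of S(X) as worded on this slide: for each threshold X, count the edges of the SALI graph whose activity ordering the model reproduces. Reporting the fraction alongside the count, and treating predicted ties as incorrect, are assumptions; the published SALI curve may use a different scoring or normalization.

```python
def sali_curve(sali, observed, predicted, cutoffs):
    """For each cutoff X return (X, n_correct, fraction_correct).

    An edge (i, j) exists when SALI(i,j) > X. It counts as correct when the
    predicted activities order i and j the same way as the observed ones.
    The fraction is None when the graph has no edges at that cutoff.
    """
    n = len(observed)
    curve = []
    for x in cutoffs:
        total = correct = 0
        for i in range(n):
            for j in range(i + 1, n):
                if sali[i][j] > x:
                    total += 1
                    obs_diff = observed[i] - observed[j]
                    pred_diff = predicted[i] - predicted[j]
                    if obs_diff * pred_diff > 0:
                        correct += 1
        curve.append((x, correct, correct / total if total else None))
    return curve
```

Plotting the fraction against X gives a curve whose right-hand end shows how well the model orders the steepest cliffs.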
SALI Curves
Model Search Using the SCI (SALI Curve Integral)
  • We’ve used the SALI to retrospectively analyze models
  • Can we use SALI to develop models?
    • Identify a model that captures the cliffs
  • Tricky
    • Cliffs are fundamentally outliers
    • Optimizing for good SALI values implies overfitting
    • Need to trade off between SALI & generalizability
Predicting the Landscape
  • Rather than predicting activity directly, we can try to predict the SAR landscape
  • Implies that we attempt to directly predict cliffs
    • Observations are now pairs of molecules
  • A more complex problem
    • Choice of features is trickier
    • Still face the problem of cliffs as outliers
  • Somewhat similar to predicting activity differences
    • Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
Motivation
  • Predicting activity cliffs corresponds to extending the SAR landscape
  • Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset
  • Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
Predicting Cliffs
  • Dependent variables are pairwise SALI values, calculated using fingerprints
  • Independent variables are molecular descriptors – but considered pairwise
    • Absolute difference of descriptor pairs, or
    • Geometric mean of descriptor pairs
    • …
  • Develop a model to correlate pairwise descriptors to pairwise SALI values
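The two aggregation functions named on this slide can be sketched as below. The function name and the `how` argument are hypothetical, and the geometric mean is only defined here for non-negative descriptor values, which is an assumption about the descriptors used.

```python
from itertools import combinations
import math


def pairwise_features(descriptors, how="absdiff"):
    """Build one feature row per unordered pair of molecules.

    descriptors: list of equal-length numeric vectors, one per molecule.
    how: "absdiff" for |x - y|, "geomean" for sqrt(x * y).
    Returns a list of ((i, j), feature_vector) tuples.
    """
    rows = []
    for i, j in combinations(range(len(descriptors)), 2):
        a, b = descriptors[i], descriptors[j]
        if how == "absdiff":
            feats = [abs(x - y) for x, y in zip(a, b)]
        elif how == "geomean":
            # Assumes non-negative descriptors; math.sqrt raises otherwise.
            feats = [math.sqrt(x * y) for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown aggregation: {how}")
        rows.append(((i, j), feats))
    return rows
```

For n molecules this yields n(n−1)/2 rows, so a 30-molecule dataset produces 435 pairwise observations.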
A Test Case
  • We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50 values
  • Evaluate topological and physicochemical descriptors
  • Developed random forest models
    • On the original observed values (30 observations)
    • On the SALI values (435 observations)
  • Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
Double Counting Structures?
  • The dependent and independent variables both encode structure
  • But correlations between individual pairwise descriptors and the SALI values are pretty low
Model Summaries
  • Original pIC50: RMSE = 0.97
  • SALI, AbsDiff: RMSE = 1.10
  • SALI, GeoMean: RMSE = 1.04
  • All models explain a similar % of variance of their respective datasets
  • Using geometric mean as the descriptor aggregation function seems to perform best
  • SALI models are more robust due to the larger size of the dataset
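For reference, the two summary statistics quoted in these model summaries can be sketched as follows. The slides do not say how "% of variance explained" was computed (for a random forest it is often based on out-of-bag predictions), so the plain 1 − SS_res/SS_tot form below is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def variance_explained(observed, predicted):
    """1 - SS_res / SS_tot; assumes the observed values are not all equal."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot
```

RMSE carries the units and scale of the dependent variable, so RMSE values for pIC50 models and for SALI models are not directly comparable; the variance explained is the scale-free figure.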
Test Case 2
  • Considered the Holloway docking dataset, 32 molecules with pIC50 values and Einter
  • Similar strategy as before
  • Need to transform SALI values
  • Descriptors show minimal correlation
  • Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
Model Summaries (transformed SALI values)
  • Original pIC50: RMSE = 1.05
  • SALI, AbsDiff: RMSE = 0.48
  • SALI, GeoMean: RMSE = 0.48
  • The SALI models perform much worse in terms of % of variance explained
  • Descriptor aggregation method does not seem to have much effect
  • The SALI models appear to perform decently on the cliffs, but miss the most significant ones
Model Summaries (untransformed SALI values)
  • Original pIC50: RMSE = 1.05
  • SALI, AbsDiff: RMSE = 9.76
  • SALI, GeoMean: RMSE = 10.01
  • With untransformed SALI values, models perform similarly in terms of % of variance explained
  • The most significant cliffs correspond to stereoisomers
Test Case 3
  • 38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing
  • Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R² = 0.62)
  • Upper end of SALI range is better predicted
  • Kalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
Test Case 3
  • The dataset does not contain really big cliffs
  • Generally, performance is poorer for smaller cliffs
  • For any given hold-out molecule, the range of error in SALI prediction is large
  • Suggests that some form of domain applicability metric would be useful
Model Caveats
  • Models based on SALI values are dependent on there being an SAR in the original activity data
  • Scrambling results for these models are poorer than the original models but aren’t as random as expected
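The scrambling check mentioned here is commonly done by permuting the activity values across molecules, recomputing the pairwise SALI values, and refitting the model. A minimal sketch of the permutation step only; the function names, the number of replicates, and the seeding scheme are assumptions, not details from the slides.

```python
import random


def scramble_activities(activities, seed=None):
    """Return a shuffled copy of the activities; the input is left untouched."""
    rng = random.Random(seed)
    shuffled = list(activities)
    rng.shuffle(shuffled)
    return shuffled


def scrambled_replicates(activities, n_replicates=10, seed=0):
    """Generate several reproducible scrambled copies for repeated refits."""
    return [scramble_activities(activities, seed + k) for k in range(n_replicates)]
```

Each replicate would then be passed through the same SALI calculation and model fitting as the real data, and the resulting error statistics compared with those of the original model.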
Conclusions
  • SALI is the first step in characterizing the SAR landscape
  • Allows us to directly analyze the landscape, as opposed to individual molecules
  • Being able to predict the landscape could serve as a useful way to extend an SAR landscape
Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
RNAi Facility Mission
  • Perform collaborative genome-wide RNAi screening-based projects with intramural investigators
  • Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs
  • Range of Assays
    • Pathway (reporter assays, e.g. luciferase, β-lactamase)
    • Simple phenotypes (viability, cytotoxicity, oxidative stress, etc.)
    • Complex phenotypes (high-content imaging, cell cycle, translocation, etc.)
RNAi Informatics Infrastructure
RNAi Analysis Workflow
  • Raw and processed data
  • GO annotations
  • Pathways
  • Interactions
  • Hit list
  • Follow-up
RNAi Informatics Toolset
  • Local databases (screen data, pathways, interactions, etc.)
  • Commercial pathway tools
  • Custom software for loading, analysis and visualization
Back End Services
  • Currently all computational analysis is performed on the back end
  • R & Bioconductor code
    • Custom R package (ncgcrnai) to support NCGC infrastructure
    • Partly derived from cellHTS2
    • Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports
  • Some Java tools for
    • Data loading
    • Library and plate registration
User Accessible Tools
RNAi & Small Molecule Screens
  • What targets mediate activity of siRNA and compound
  • Pathway elucidation, identification of interactions
  • Reuse pre-existing MLI data
  • Develop new annotated libraries
  • Target ID and validation
  • Link RNAi-generated pathway perturbations to small molecule activities; could provide insight into polypharmacology
  • Run parallel RNAi screen
  • Goal: Develop systems level view of small molecule activity
HTS for NF-κB Antagonists
  • NF-κB controls DNA transcription
  • Involved in cellular responses to stimuli
    • Immune response, memory formation
    • Inflammation, cancer, auto-immune diseases
  • http://www.genego.com
HTS for NF-κB Antagonists
  • ME-180 cell line
  • Stimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporter
  • Identify small molecules and siRNAs that block the resultant activation
Small Molecule HTS Summary
  • 2,899 FDA-approved compounds screened
  • 55 compounds retested active
  • Which components of the NF-κB pathway do they hit?
    • 17 molecules have target/pathway information in GeneGO
    • Literature searches list a few more
  • Most potent actives
    • Proscillaridin A
    • Trabectedin
    • Digoxin
  • Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
RNAi HTS Summary
  • Qiagen HDG library – 6886 genes, 4 siRNAs per gene
  • A total of 567 genes were knocked down by 1 or more siRNAs
  • We consider >= 2 as a “reliable” hit
    • 16 reliable hits
  • Added in 66 genes for follow up via triage procedure
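The "reliable hit" rule on this slide (a gene counts when at least 2 of its siRNAs are active) can be sketched as follows. How an individual siRNA is called active is not stated in the slides, so the input is assumed to be an already-filtered list of (gene, siRNA id) pairs, and the gene names in any usage are placeholders.

```python
from collections import defaultdict


def reliable_hits(active_sirnas, min_active=2):
    """Genes with at least `min_active` distinct active siRNAs.

    active_sirnas: iterable of (gene, sirna_id) pairs, one per active siRNA.
    Duplicate pairs are counted once.
    """
    per_gene = defaultdict(set)
    for gene, sirna_id in active_sirnas:
        per_gene[gene].add(sirna_id)
    return sorted(g for g, ids in per_gene.items() if len(ids) >= min_active)
```

Setting `min_active=1` reproduces the looser "1 or more siRNAs" list.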
The Obvious Conclusion
  • The active compounds target the 16 hits (at least) from the RNAi screen
  • Useful if the RNAi screen was small & focused
  • But what if we’re investigating a larger system?
    • Is there a way to get more specific?
    • Can compound data suggest RNAi non-hits?
Small Molecule Targets
  • Some small molecules interact with core components
    • Bortezomib (proteasome inhibitor)
    • Daunorubicin (IκBα inhibitor)
Small Molecule Targets
  • Others are active against upstream targets
    • Montelukast (LTD4 antagonist)
  • We also get an idea of off-target effects
Compound Networks - Similarity
  • Evaluate fingerprint-based similarity matrix for the 55 actives
  • Connect pairs that exhibit Tc > 0.7
  • Edges are weighted by the Tc value
  • Most groupings are obvious
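A minimal sketch of the similarity network described here: compute the Tanimoto coefficient (Tc) for every pair and keep pairs above the 0.7 cutoff as weighted edges. Fingerprints are assumed to be sets of on-bit indices keyed by compound name; the fingerprint type used in the talk is not stated.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similarity_network(fingerprints, cutoff=0.7):
    """Return weighted edges (name_a, name_b, Tc) for pairs with Tc > cutoff.

    fingerprints: dict mapping compound name to its set of on bits.
    """
    edges = []
    for a, b in combinations(sorted(fingerprints), 2):
        tc = tanimoto(fingerprints[a], fingerprints[b])
        if tc > cutoff:
            edges.append((a, b, tc))
    return edges
```

Connected components of the resulting edge list correspond to the compound groupings the slide refers to.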
A “Dictionary” Based Approach
  • Create a small-ish annotated library
    • “Seed” compounds
  • Use it in parallel small molecule/RNAi screens
  • Use a similarity-based approach to prioritize larger collections, in terms of anticipated targets
    • Currently, we’d use structural similarity
  • Diversity of prioritized structures is dependent on the diversity of the annotated library
Compound Networks - Targets
  • Predict targets for the actives using SEA (Similarity Ensemble Approach)
  • Target-based compound network maps nearly identically to the similarity-based network
  • But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes
  • Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197-206
Gene Networks - Pathways
  • Nodes are 1374 HDG genes contained in the NCI PID
  • Edge indicates two genes/proteins are involved in the same pathway
  • “Good” hits tend to be very highly connected
  • Wang, L. et al, BMC Genomics, 2009, 10, 220
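A sketch of the pathway co-membership network: two genes are linked when they appear in at least one common pathway, and node degree gives a simple measure of how highly connected a gene is. The input format (a dict of pathway name to gene list) is an assumption and not the NCI PID file format.

```python
from itertools import combinations


def pathway_gene_network(pathways):
    """Map each gene pair (a, b), a < b, to the set of pathways they share.

    pathways: dict mapping pathway name to an iterable of gene symbols.
    """
    edges = {}
    for name, genes in pathways.items():
        for a, b in combinations(sorted(set(genes)), 2):
            edges.setdefault((a, b), set()).add(name)
    return edges


def node_degrees(edges):
    """Number of distinct neighbours for each gene in the network."""
    degrees = {}
    for a, b in edges:
        degrees[a] = degrees.get(a, 0) + 1
        degrees[b] = degrees.get(b, 0) + 1
    return degrees
```

Ranking screen hits by `node_degrees` is one way to check the observation that good hits tend to be highly connected.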
(Reduced) Gene Networks – Pathways
  • Nodes are 526 genes with >= 1 siRNA showing knockdown
  • Edge indicates two genes/proteins are involved in the same pathway
Pathway Based Integration
  • Direct matching of targets is not very useful
  • Try to map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway
  • Considering 16 reliable hits, we cover 26 pathways
  • Predicted compound targets cover 131 pathways
    • For 18 out of 41 compounds
  • 3 RNAi-derived pathways not covered by compound-derived pathways
    • Rhodopsin, alternative NF-κB, FAS
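The pathway-level mapping can be sketched as a set intersection: a compound is linked to an siRNA hit gene when any of the compound's predicted targets shares a pathway with that gene. All names and the input layout below are hypothetical; the talk's actual target predictions and pathway annotations are not reproduced here.

```python
def map_compounds_to_hits(compound_targets, hit_genes, gene_pathways):
    """Link compounds to siRNA hit genes through shared pathways.

    compound_targets: dict of compound -> iterable of predicted target genes.
    hit_genes: iterable of reliable siRNA hit genes.
    gene_pathways: dict of gene -> set of pathway names.
    Returns dict of compound -> {hit_gene: shared_pathways}; compounds with
    no shared pathway are omitted.
    """
    mapping = {}
    for compound, targets in compound_targets.items():
        compound_pathways = set()
        for target in targets:
            compound_pathways |= gene_pathways.get(target, set())
        shared = {}
        for gene in hit_genes:
            common = compound_pathways & gene_pathways.get(gene, set())
            if common:
                shared[gene] = common
        if shared:
            mapping[compound] = shared
    return mapping
```

Compounds missing from the result are those that cannot be handled by this approach, either because no target was predicted or because the predicted targets fall in unrelated pathways.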
Pathway Based Integration
  • Still not completely useful, as it only handled 18 compounds
  • Depending on target predictions is probably not a great idea
Integration Caveats
  • Biggest bottleneck is lack of resolution
  • Currently, both small molecule and RNAi data are 1-D
    • Active or inactive, high/low signal
    • CRCs for small molecules alleviate this a bit
  • High content screens can provide significantly more information and so better resolution
    • Data size & feature selection are of concern
Integration Caveats
  • Compound annotations are key
    • Currently working on using ChEMBL data to provide target ‘suggestions’
  • More comprehensive pathway data will be required
  • RNAi and small molecule inhibition do not always lead to the same phenotype
    • Could be indicative of promiscuity
    • Could indicate true biological differences
  • Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
Conclusions
  • Building up a wealth of small molecule and RNAi data
  • “Standard” analysis of RNAi screens is relatively straightforward
  • Challenges involve integrating RNAi data with other sources
  • Primary bottleneck is dimensionality of the data
    • Simple fluorescence-based approaches do not provide sufficient resolution
    • High-content is required
Acknowledgements
  • John Van Drie
  • Gerry Maggiora
  • Mic Lajiness
  • Jurgen Bajorath
  • Scott Martin
  • Pinar Tuzmen
  • Carleen Klump
  • Dac-Trung Nguyen
  • Ruili Huang
  • Yuhong Wang
CPT Sensitization & “Central” Genes
  • TOP1 poisons prevent DNA religation, resulting in replication-dependent double strand breaks
  • Cell activates DNA damage response (e.g. ATR)
  • Yves Pommier, Nat. Rev. Cancer, 2006
Screening Protocol
  • Screen conducted in the human breast cancer cell line MDA-MB-231
  • Many variables to optimize, including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls
Hit Selection
  • Follow-up dose response analysis
  • (Figure panels: Screen #1, ATR, with siNeg and siATR-A/B/C; Screen #2, MAP3K7IP2, with siNeg and siMAP3K7IP2-A/B/C/D. Axes: viability (%) vs. CPT (log M); sensitization ranked by log2 fold change.)
  • Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1

  • 1. Small Molecules and siRNA:Methods to Explore Bioactivity DataRajarshi GuhaNIH Chemical for Translational TherapeuticsAugust 17, 2011Pfizer, Groton
  • 2. BackgroundCheminformatics methodsQSAR, diversity analysis, virtual screening, fragments, polypharmacology, networksMore recentlysiRNAscreening, high content imaging,combination screeningExtensive use of machine learningAll tied together with software developmentIntegrate small molecule information & biosystems – systems chemical biology
  • 3. OutlineExploring the SAR landscapeThe landscape view of SAR dataQuantifying SAR landscapesExtending an SAR landscapeLinking small molecule & RNAiHTSOverview of the Trans NIH RNAi Screening InitiativeInfrastructure componentsLinking small molecule & siRNA screens
  • 4. The Landscape View of Structure Activity Datasets
  • 5. Structure Activity RelationshipsSimilar molecules will have similar activitiesSmall changes in structure will lead to small changes in activityOne implication is that SAR’s are additiveThis is the basis for QSAR modelingMartin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
  • 6. Structure Activity LandscapesRugged gorges or rolling hills?Small structural changes associated with large activity changes represent steep slopes in the landscapeBut traditionally, QSAR assumes gentle slopesMachine learning is not very good for special casesMaggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
  • 7. Characterizing the LandscapeA cliff can be numerically characterizedStructure Activity Landscape Index (SALI)Cliffs are characterized by elements of the matrix with very large valuesGuha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
  • 8. Visualizing SALI ValuesThe SALI graphCompounds are nodesNodes i,j are connected if SALI(i,j) > XOnly display connected nodes
  • 9. What Can We Do With SALI’s?SALI characterizes cliffs & non-cliffsFor a given molecular representation, SALI’s gives us an idea of thesmoothness of the SAR landscapeModels try and encodethis landscapeUse the landscape to guidedescriptor or model selection
  • 10. Descriptor Space SmoothnessEdge count of the SALI graph for varying cutoffsMeasures smoothness of the descriptor spaceCan reduce this to a single number (AUC)
  • 11. Other ExamplesInstead of fingerprints, we use molecular descriptorsSALI denominator now uses Euclidean distance2D & 3D random descriptor setsNone are really goodToo rough, orToo flat2D3D
  • 12. Feature Selection Using SALISurprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curvesNot entirely clear what type of curve is desirable
  • 13. Measuring Model QualityA QSAR model should easily encode the “rolling hills”A good model captures the most significantcliffsCan be formalized as How many of the edge orderings of a SALI graph does the model predict correctly?Define S (X ), representing the number of edges correctly predicted for a SALI network at a threshold XRepeat for varying X and obtain the SALI curve
  • 15. Model Search Using the SCIWe’ve used the SALI to retrospectively analyze modelsCan we use SALI to develop models?Identify a model that captures the cliffsTrickyCliffs are fundamentally outliersOptimizing for good SALI values implies overfittingNeed to trade-off between SALI & generalizability
  • 16. Predicting the LandscapeRather than predicting activity directly, we can try to predict the SAR landscapeImplies that we attempt to directly predict cliffsObservations are now pairs of moleculesA more complex problemChoice of features is trickierStill face the problem of cliffs as outliersSomewhat similar to predicting activity differencesScheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
  • 17. MotivationPredicting activity cliffs corresponds to extending the SAR landscapeIdentify whether a new molecule will perform better or worse compared to the specific molecules in the datasetCan be useful for guiding lead optimization, but not necessarily useful for lead hopping
  • 18. Predicting CliffsDependent variable are pairwise SALI values, calculated using fingerprintsIndependent variables are molecular descriptors – but considered pairwiseAbsolute difference of descriptor pairs, orGeometric mean of descriptor pairs…Develop a model to correlate pairwise descriptors to pairwise SALI values
  • 19. A Test CaseWe first consider the CavalliCoMFA dataset of 30 molecules with pIC50’sEvaluate topological and physicochemical descriptorsDeveloped random forest modelsOn the original observed values (30 obs)On the SALI values (435 observations)Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
  • 20. Double Counting Structures?The dependent and independent variables both encode structure. But pretty low correlations between individual pairwisedescriptors and the SALI values
  • 21. Model SummariesOriginal pIC50RMSE = 0.97SALI, AbsDiffRMSE = 1.10SALI, GeoMeanRMSE = 1.04All models explain similar % of variance of their respective datasets Using geometric mean as the descriptor aggregation function seems to perform bestSALI models are more robust due to larger size of the dataset
  • 22. Test Case 2Considered the Holloway docking dataset, 32 molecules with pIC50’s and EinterSimilar strategy as beforeNeed to transform SALI values Descriptors show minimal correlationHolloway, M.K. et al, J Med Chem, 1995, 38, 305-317
  • 23. Model SummariesOriginal pIC50RMSE = 1.05SALI, AbsDiffRMSE = 0.48SALI, GeoMeanRMSE = 0.48The SALI models perform much poorer in terms of % of variance explainedDescriptor aggregation method does not seem to have much effectThe SALI models appear to perform decently on the cliffs – but misses the most significant
  • 24. Model SummariesOriginal pIC50RMSE = 1.05SALI, AbsDiffRMSE = 9.76SALI, GeoMeanRMSE = 10.01With untransformed SALI values, models perform similarly in terms of % of variance explainedThe most significant cliffs correspond to stereoisomers
  • 25. Test Case 338 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testingRandom forest model on the SALI values performed reasonable well (RMSE = 7.51, R2=0.62)Upper end ofSALI rangeis better predictedKalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
  • 26. Test Case 3The dataset does not containing really big cliffs
  • 27. Generally, performance is poorer for smaller cliffsFor any given hold out molecule, range of error in SALI prediction is largeSuggests that some form of domain applicability metric would be useful
  • 28. Model CaveatsModels based on SALI values are dependent on their being an SAR in the original activity dataScrambling results for these models are poorer than the original models but aren’t as random as expected
  • 29. ConclusionsSALI is the first step in characterizing the SAR landscapeAllows us to directly analyze the landscape, as opposed to individual moleculesBeing able to predict the landscape could serve as a useful way to extend an SAR landscape
  • 30. Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
  • 31. RNAi Facility MissionPathway (Reporter assays, e.g. luciferase, b-lactamase)Simple Phenotypes (Viability, cytotoxicity, oxidative stress, etc)Perform collaborative genome-wide RNAi screening-based projects with intramural investigatorsAdvance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs.Complex Phenotypes (High-content imaging, cell cycle, translocation, etc)Range of Assays
  • 33. RNAi Analysis WorkflowRaw and Processed DataGO annotationsPathwaysInteractionsHit ListFollow-up
  • 34. RNAi Informatics ToolsetLocal databases (screen data, pathways, interactions, etc).Commercial pathway tools. Custom software for loading, analysis and visualization.
  • 35. Back End ServicesCurrently all computational analysis performed on the backendR & Bioconductor codeCustom R package (ncgcrnai) to support NCGC infrastructurePartly derived from cellHTS2Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reportsSome Java tools forData loadingLibrary and plate registration
  • 38. RNAi& Small Molecule ScreensCAGCATGAGTACTACAGGCCATACGGGAACTACCATAATTTAWhat targets mediate activity of siRNA and compoundPathway elucidation, identification of interactions Reuse pre-existing MLI data
  • 39. Develop new annotated librariesTarget ID and validationLink RNAi generated pathway peturbations to small molecule activities. Could provide insight into polypharmacology Run parallel RNAi screenGoal: Develop systems level view of small molecule activity
  • 40. HTS for NF-κB AntagonistsNF-κB controls DNA transcription Involved in cellular responses to stimuliImmune response, memory formationInflammation, cancer, auto-immune diseaseshttp://www.genego.com
  • 41. HTS for NF-κB AntagonistsME-180 cell lineStimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporterIdentify small molecules and siRNA’s that block the resultant activation
  • 42. Small Molecule HTS Summary2,899 FDA-approved compounds screened55 compounds retested activeWhich components of the NF-κB pathway do they hit?17 molecules have target/pathway information in GeneGOLiterature searches list a few moreMost Potent ActivesProscillaridin ATrabectidinDigoxinMiller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
  • 43. RNAi HTS SummaryQiagen HDG library – 6886 genes, 4 siRNA’s per geneA total of 567 genes were knockeddown by 1 or more siRNA’sWe consider >= 2 as a “reliable” hit16 reliable hitsAdded in 66 genes for follow up via triage procedure
  • 44. The Obvious ConclusionThe active compounds target the 16 hits (at least) from the RNAi screenUseful if the RNAi screen was small & focusedBut what if we’re investigating a larger system?Is there a way to get more specific?Can compound data suggest RNAi non-hits?
  • 45. Small Molecule Targets: Some small molecules interact with core components, e.g. Bortezomib (proteasome inhibitor) and Daunorubicin (IκBα inhibitor).
  • 46. Small Molecule Targets: Others are active against upstream targets, e.g. Montelukast (LTD4 antagonist). We also get an idea of off-target effects.
  • 47. Compound Networks - Similarity: Evaluate fingerprint-based similarity matrix for the 55 actives. Connect pairs that exhibit Tc > 0.7. Edges are weighted by the Tc value. Most groupings are obvious.
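A minimal sketch of the similarity network described above, assuming fingerprints are available as sets of on-bit indices. The slide does not say which fingerprint was used, so the toy bit sets, compound names, and function names here are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient (Tc) between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def similarity_edges(fingerprints, cutoff=0.7):
    """Return (name_i, name_j, Tc) for every compound pair with Tc > cutoff.

    The Tc value is kept on each edge so it can serve as the edge weight.
    """
    edges = []
    for n1, n2 in combinations(sorted(fingerprints), 2):
        tc = tanimoto(fingerprints[n1], fingerprints[n2])
        if tc > cutoff:
            edges.append((n1, n2, tc))
    return edges

# Toy fingerprints (hypothetical bit positions).
fps = {
    "cmpd_1": {1, 2, 3, 4, 5},
    "cmpd_2": {1, 2, 3, 4, 5, 6},  # shares 5 of 6 bits with cmpd_1
    "cmpd_3": {7, 8, 9},           # shares nothing with the others
}

edges = similarity_edges(fps)
for n1, n2, tc in edges:
    print(f"{n1} -- {n2} (Tc = {tc:.2f})")
```

Compounds with no edge above the cutoff simply do not appear in the edge list, which matches the "only display connected nodes" convention used for the SALI graph earlier in the deck.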
  • 48. A “Dictionary” Based Approach: Create a small-ish annotated library of “seed” compounds. Use it in parallel small molecule/RNAi screens. Use a similarity-based approach to prioritize larger collections in terms of anticipated targets. Currently, we’d use structural similarity. Diversity of prioritized structures is dependent on the diversity of the annotated library.
  • 49. Compound Networks - Targets: Predict targets for the actives using SEA. The target-based compound network maps nearly identically to the similarity-based network. But depending on the predicted target quality, we get poor (or no) mappings to the RNAi-targeted genes. Keiser, M.J. et al., Nat. Biotech., 2007, 25, 197-206
  • 50. Gene Networks - Pathways: Nodes are 1374 HDG genes contained in the NCI PID. An edge indicates that two genes/proteins are involved in the same pathway. “Good” hits tend to be very highly connected. Wang, L. et al., BMC Genomics, 2009, 10, 220
  • 51. (Reduced) Gene Networks – Pathways: Nodes are 526 genes with >= 1 siRNA showing knockdown. An edge indicates that two genes/proteins are involved in the same pathway.
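The edge rule used in these two gene networks (two genes are connected if they share a pathway) can be sketched as follows. The pathway table here is a toy stand-in for NCI PID membership, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def pathway_gene_network(pathways):
    """Build an undirected gene network from pathway membership.

    pathways: dict mapping pathway name -> set of member genes.
    Returns a dict mapping each gene to the set of genes it shares
    at least one pathway with.
    """
    adjacency = defaultdict(set)
    for genes in pathways.values():
        for g1, g2 in combinations(sorted(genes), 2):
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return dict(adjacency)

# Toy pathway membership (hypothetical gene and pathway names).
toy_pathways = {
    "P1": {"A", "B", "C"},
    "P2": {"C", "D"},
}

network = pathway_gene_network(toy_pathways)
degree = {gene: len(neighbours) for gene, neighbours in network.items()}
print(degree)
```

Node degree in such a network is one simple way to check the observation that "good" hits tend to be highly connected.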
  • 52. Pathway Based Integration: Direct matching of targets is not very useful. Try to map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway. Considering the 16 reliable hits, we cover 26 pathways. Predicted compound targets cover 131 pathways, for 18 out of 41 compounds. 3 RNAi-derived pathways are not covered by compound-derived pathways: Rhodopsin, alternative NF-κB, FAS.
  • 53. Pathway Based Integration: Still not completely useful, as it only handled 18 compounds. Depending on target predictions is probably not a great idea.
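The pathway-level matching described on these two slides can be sketched as below: a compound is linked to an RNAi hit when any of its predicted targets shares a pathway with the hit gene. This is an illustrative Python sketch under the assumption of a simple gene-to-pathways lookup; the names and toy data are hypothetical and do not reproduce the SEA or NCI PID inputs.

```python
def pathways_of(genes, gene_to_pathways):
    """Union of the pathways that any of the given genes belongs to."""
    found = set()
    for gene in genes:
        found |= gene_to_pathways.get(gene, set())
    return found

def map_compounds_to_hits(compound_targets, hit_genes, gene_to_pathways):
    """Link each compound to the RNAi hit genes that share a pathway
    with at least one of the compound's predicted targets.

    Compounds whose targets map to no pathway, or to no hit, are omitted.
    """
    mapping = {}
    for compound, targets in compound_targets.items():
        compound_pathways = pathways_of(targets, gene_to_pathways)
        linked = sorted(
            hit for hit in hit_genes
            if gene_to_pathways.get(hit, set()) & compound_pathways
        )
        if linked:
            mapping[compound] = linked
    return mapping

# Toy inputs (hypothetical names).
gene_to_pathways = {
    "T1": {"P1"}, "T2": {"P2"},
    "H1": {"P1", "P3"}, "H2": {"P4"},
}
compound_targets = {"cmpd_x": {"T1"}, "cmpd_y": {"T2"}, "cmpd_z": {"T9"}}
hit_genes = {"H1", "H2"}

mapping = map_compounds_to_hits(compound_targets, hit_genes, gene_to_pathways)

# RNAi-derived pathways with no compound-derived counterpart.
all_targets = set().union(*compound_targets.values())
uncovered = (pathways_of(hit_genes, gene_to_pathways)
             - pathways_of(all_targets, gene_to_pathways))
print(mapping, sorted(uncovered))
```

In the toy data, `cmpd_z` drops out because its predicted target maps to no pathway, which mirrors how the compound count shrinks when predicted genes cannot be mapped.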
  • 54. Integration Caveats: Biggest bottleneck is lack of resolution. Currently, both small molecule and RNAi data are 1-D: active or inactive, high/low signal. CRCs for small molecules alleviate this a bit. High content screens can provide significantly more information and so better resolution. Data size & feature selection are of concern.
  • 55. Integration Caveats: Compound annotations are key. Currently working on using ChEMBL data to provide target ‘suggestions’. More comprehensive pathway data will be required. RNAi and small molecule inhibition do not always lead to the same phenotype: this could be indicative of promiscuity, or could indicate true biological differences. Weiss, W.A. et al., Nat. Chem. Biol., 2007, 12, 739-744
  • 56. Conclusions: Building up a wealth of small molecule and RNAi data. “Standard” analysis of RNAi screens is relatively straightforward. Challenges involve integrating RNAi data with other sources. Primary bottleneck is dimensionality of the data. Simple fluorescence-based approaches do not provide sufficient resolution; high-content is required.
  • 57. Acknowledgements: John Van Drie, Gerry Maggiora, Mic Lajiness, Jurgen Bajorath, Scott Martin, Pinar Tuzmen, Carleen Klump, Dac-Trung Nguyen, Ruili Huang, Yuhong Wang
  • 58. CPT Sensitization & “Central” Genes: Yves Pommier, Nat. Rev. Cancer, 2006. TOP1 poisons prevent DNA religation, resulting in replication-dependent double strand breaks. Cell activates DNA damage response (e.g. ATR).
  • 59. Screening Protocol: Screen conducted in the human breast cancer cell line MDA-MB-231. Many variables to optimize, including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls.
  • 60. Hit Selection / Follow-Up Dose Response Analysis: [Figure panels: Screen #1 and Screen #2, sensitization ranked by log2 fold change; dose-response curves of viability (%) versus CPT (log M) for ATR (siNeg, siATR-A, siATR-B, siATR-C) and MAP3K7IP2 (siNeg, siMAP3K7IP2-A, siMAP3K7IP2-B, siMAP3K7IP2-C, siMAP3K7IP2-D).] Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.
  • 61. Are These Genes Relevant? Some are well known to be CPT-sensitizers. Consider an HPRD PPI sub-network corresponding to the Qiagen HDG gene set. How “central” are these selected genes? Larger values of betweenness indicate that the node lies on many shortest paths. Makes sense: a number of them are stress-related. But some of them have very low betweenness values.
  • 62. Are These Genes Relevant? Most selected genes are densely connected. A few are not; these generally did not reconfirm. Network metrics could be used to provide confidence in selections.
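For readers unfamiliar with betweenness, the sketch below computes it for a small unweighted, undirected graph using Brandes' algorithm. It is a plain-Python illustration, not the code used for the HPRD analysis; the toy adjacency dictionary stands in for a PPI sub-network, and every neighbour is assumed to appear as a key.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adjacency: dict mapping node -> set of neighbours (undirected,
    unweighted; every neighbour must also be a key).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:              # first visit to w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

# Toy network: B sits between A, C and D.
toy_network = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
centrality = betweenness(toy_network)
print(centrality)
```

In the toy network, B lies on every shortest path between the other three nodes, so it gets the highest score while the leaf nodes score zero; genes with very low values play the role of the poorly connected selections discussed above.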

Editor's Notes

  • #17: Outliers in a cliff prediction model are not as severe since SALI changes more slowly than just activity differences
  • #23: For SALI = 0, had to set log10(SALI) = 0. Similar performance if we use SALI and not log10(SALI); at least more % variance is explained. Still fail on most significant cliffs.
  • #36: View plates (raw, normalized, adjusted, …). Highlight specific genes, siRNAs. View assay statistics. View pathway membership (via Wikipathways). Linkout to external resources (Entrez, GeneCards, …). Hit selection, follow-up (DRC).
  • #37: View plates (raw, normalized, adjusted, …). Highlight specific genes, siRNAs. View assay statistics. View pathway membership (via Wikipathways). Linkout to external resources (Entrez, GeneCards, …). Hit selection, follow-up (DRC).
  • #41: * Proscillaridin A was not selected in the 20 compounds for further analysis in the paper. * 2 cardiac glycosides in the top 3; target appears to be caspase-3 (activating it). CG inhibition of NF-κB is well known; see PNAS 2005, by Pollard. * Trabectedin induces lethal DNA strand breaks and blocks the cell cycle in G2 phase.
  • #42: PSM* genes code for proteasome subunits, so they likely prevent the ubiquitination of the IκBα complex, so that RelA+cp50 cannot be released from the IκBα complex and enter the nucleus.
  • #46: Size of node indicates potency: larger is more potent. Lanatoside C and A have Tc = 1 and hence the edge was not shown (ideally it should be shown).
  • #48: Good confirmation that SEA works. Size of node corresponds to SEA confidence score.
  • #51: We consider 41 compounds rather than 55, since a number of them did not have sufficiently confident target predictions. We then get to 18 compounds, since many of the predicted genes did not map to an NCI PID pathway.
  • #54: Phenotypic differences can arise when PPIs are involved.
  • #60: HPRD subnetwork corresponding to the Qiagen HDG has 6782 genes
  • #61: HPRD subnetwork corresponding to the Qiagen HDG has 6782 genes