
03 Wright Carolina EBRC

The Carolina Environmental Bioinformatics Research Center has three major research projects: (1) Biostatistics, (2) Cheminformatics, and (3) Computational Infrastructure for Systems Toxicology. It is organized under a director and co-director with units for research oversight, public outreach and training, administration, and quality assurance. The Cheminformatics project seeks to establish a universally applicable and robust predictive toxicology modeling framework through various computational methods.

Uploaded by

Srikantapatra
Copyright
© Attribution Non-Commercial (BY-NC)

The Carolina Environmental Bioinformatics Research Center

Fred Wright, Ph.D.


University of North Carolina at Chapel Hill

1
Organization of Center
• Three major Research Projects: (1) Biostatistics,
(2) Cheminformatics, and (3) Computational
Infrastructure for Systems Toxicology
• Administrative Unit
• Public Outreach and Training Activity (POTA)
• “Functional areas” of Analysis, Methods
Development and Tools Development overseen
by a panel of experienced investigators

2
CEBRC Center Organization
• Director: Dr. Fred Wright
• External Science Advisory Committee
• Scientific Co-Director: Dr. Ivan Rusyn
• Administrative Co-Director: Dr. Mary Sym
• Research Oversight: Analysis - Dr. Lawrence Kupper; Methods - Dr. J. Stephen Marron; Tools - Dr. Jan Prins
• Translation Integration (POTA): Dr. Brad Hemminger
• Quality Assurance: Dr. Mary Sym
• Project 1, Biostatistics: Dr. Fred Wright
• Project 2, Chem-informatics: Dr. Alex Tropsha
• Project 3, Computation and Systems Toxicology: Dr. Ivan Rusyn, Dr. David Stotts
3
(1) Biostatistics in
Computational Toxicology
• Emphasis on strengths in
microarray analysis,
elucidation of
networks/pathways,
Bayesian approaches
• Stresses existing
capabilities

4
(2) Chem-informatics
• Seeks to establish a universally applicable and robust predictive toxicology modeling framework
• Focuses on Quantitative Structure Activity/Property Relationships (QSAR)
• Establishes a modeling workflow, toxicity prediction scheme and plan for software development

[Figure 4. Predictive QSPR Modeling Workflow. Stages shown: Descriptor Generation, Descriptor Formatting, Build & Test Models, Report & Visualize Results, Screen Database. Tools named in the figure include dbtranslate, Babel, MolconnZ, GenAP, MolconnZ-ToDescr, SE6, RWKNN, SAPLS, KNNPredict, SAPLSPred, ModStat, DBMine, Weblab, TSAR, MOE and Spotfire; several are marked (UNC). The legend distinguishes utility functions, programs and user input.]

[Figure 5. Workflow Logic of Model Generation Process. Flowchart steps: descriptor preparation, train/test selection (SE6), optional y-randomization testing, QSAR model building and prediction (GRID modeling), a pass/fail check, status database updates, a loop over remaining training sets, and collection of results.]

5
(3) Computational
Framework for
Systems Toxicology
• Uses model for toxicity
profiling in multiple strains
of mice to set up
computational infrastructure
• Some data mining activity
• Will develop user-friendly software tools from methods in Projects 1 and 2

[Figure: Liver Injury. Panels: Control, C57BL/10J, BUB/BnJ, MSM/Ms]

6
Project 1
Biostatistics in Computational Toxicology
• Fred Wright, Ph.D. (P.I.) – statistical genetics, genomic analysis
• Mayetri Gupta, Ph.D. – sequence analysis, motif detection
• Young Truong, Ph.D. – Bayesian network genetic analysis, SVM methods for metabolomic data
• Joseph Ibrahim, Ph.D. – Bayesian analysis of microarray data
• Danyu Lin, Ph.D. – haplotype-phenotype analysis, microarray analysis
• Fei Zou, Ph.D. – statistical genetics, genomic analysis
• Andrew Nobel, Ph.D. – clustering, data dimension reduction, genetic pathway analysis
• Master's-trained personnel
7
Project 1 objectives

• to provide statistical analysis capability
• to develop appropriate new methods to apply to computational toxicology problems
• to develop computational tools
• to disseminate research findings, train students, and coordinate additional statistical research in computational toxicology.

8
Methods (to name a few)
• Sample size estimation for high-throughput data
• P-value computation, significance testing
• Multiple-testing issues, false discovery rates
• Dose-response modeling
• New measures of differential expression
• Transcriptional regulation and motif discovery
• Network analysis, discrimination methods
• Pathway analysis
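
To make the multiple-testing item concrete, here is a minimal sketch of one standard false discovery rate method, the Benjamini-Hochberg step-up adjustment. It is offered as an illustration only; the slides do not say which FDR procedure the project uses, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q <= 0.05]
print(significant)
```

Genes whose adjusted value falls at or below the chosen FDR level (0.05 here) would be reported as significant.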

9
Tools
• Much of the initial code has been implemented in
R/Bioconductor. This is directly useful to other
statistical investigators.
• Working with Project 3 investigators and students to
produce user-friendly web-based and/or standalone
applications
• Working to increase utility of methods by integration
with informatics and biological annotation
• We view the SAM software as a model for
independent successful dissemination. Project 3
personnel are working on implementing appropriate
procedures in ArrayTrack
10
Example 1. New ways of detecting differential expression
Expression measurements show a mean-variance relationship… which we can exploit to reduce the false discovery rate…

[Figure: two panels plotted against Log(sample mean). The grey envelope shows all the SAM procedures; the lowest curve is a new maximum likelihood procedure.]

Hu and Wright, in press
11


Example 2. Significant genes/pathways/categories: the Significance Analysis of Function and Expression (SAFE) procedure (honest pathway significance testing)

[Figure: genes in each category, with shading indicating individually significant genes, alongside the category p-value.]

Barry et al., Bioinformatics 21:1943-1949, 2005
12
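
The idea of permutation-based category testing can be sketched as follows. This is a simplified illustration in the spirit of the procedure cited above, not the published implementation: the choice of a Welch t statistic per gene, a rank-sum score per category, and all function names and simulated data are assumptions made for the example.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic with unequal variances."""
    se2 = variance(x) / len(x) + variance(y) / len(y)
    return (mean(x) - mean(y)) / se2 ** 0.5


def category_score(expr, labels, category):
    """Sum of the ranks of |t| over the genes in the category."""
    local = []
    for row in expr:
        a = [v for v, g in zip(row, labels) if g == 1]
        b = [v for v, g in zip(row, labels) if g == 0]
        local.append(abs(welch_t(a, b)))
    order = sorted(range(len(local)), key=lambda i: local[i])
    rank = {gene: r + 1 for r, gene in enumerate(order)}
    return sum(rank[g] for g in category)


def permutation_pvalue(expr, labels, category, n_perm=500, seed=0):
    """Empirical p-value from permuting sample labels (keeps gene correlation)."""
    rng = random.Random(seed)
    observed = category_score(expr, labels, category)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if category_score(expr, shuffled, category) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated data: 20 genes, 6 treated vs 6 control samples; genes 0-4 respond.
data_rng = random.Random(1)
labels = [1] * 6 + [0] * 6
expr = []
for gene in range(20):
    shift = 4.0 if gene < 5 else 0.0
    expr.append([data_rng.gauss(0.0, 1.0) + (shift if g == 1 else 0.0) for g in labels])
category = [0, 1, 2, 3, 4]
p_value = permutation_pvalue(expr, labels, category)
print(p_value)
```

Permuting sample labels rather than gene labels is the design choice that matters here: it preserves the correlation among genes within a category, which is why the resulting p-value can be called honest.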


GO Tree with significant nodes
Key: blue (p<0.001), green (0.001<=p<0.01), red (0.01<=p<0.1).

[Figure: GO tree; labeled nodes include "nucleotide metabolism" and "cell organization and biogenesis".]

13
Example 3. Isotonic regression: gene expression dose-response data

Model: f should be strictly increasing or decreasing

Hu et al., 2005, Bioinformatics 21: 3524-3529.
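
A monotone fit of this kind can be computed with the pool-adjacent-violators algorithm. The sketch below is a generic least-squares version, not the method of the cited paper; note that it produces a non-decreasing (or non-increasing) fit in which ties are allowed, rather than a strictly monotone one. Function names and the example values are hypothetical.

```python
def pava(y, weights=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the monotone constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit


def isotonic_fit(y):
    """Return whichever monotone fit (increasing or decreasing) has lower error."""
    inc = pava(y)
    dec = [-v for v in pava([-v for v in y])]

    def sse(fit):
        return sum((a - b) ** 2 for a, b in zip(y, fit))

    return inc if sse(inc) <= sse(dec) else dec


# Hypothetical mean expression of one gene at four increasing doses.
print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))
```

With replicate arrays per dose, the per-dose means could be passed as `y` and the replicate counts as `weights`.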

14
Pyrethroid Biomarker Project (J. Harrill, K.
Crofton and colleagues, U.S. E.P.A.)
• Problem: Lack of a cost-efficient biomarker of effect
hampers assessments of the cumulative risk of
pyrethroid insecticides.
• Aim: Develop a biochemical biomarker of effect for
pyrethroids that reflects changes in neuronal firing
rates.
• Methods: Use gene arrays and RT-PCR to identify
dose-responsive transcripts in rat CNS. Permethrin
and deltamethrin were each examined at four doses
on Affymetrix arrays.
15
Dose-response, cont.: a statistic to rank genes…

$$M = \frac{\hat{f}(\mathrm{dose}_{\mathrm{highest}}) - \hat{f}(\mathrm{dose}_{\mathrm{lowest}})}{v}$$

where v is a standard error estimate (could be improved).

Dose-response data on pyrethroid in rat brains, courtesy of J. Harrill and K. Crofton, U.S. E.P.A.
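
As a worked illustration of the ranking statistic, the sketch below computes M for one gene from its fitted values at each dose. The slide does not specify how v is estimated, so the pooled within-dose standard error used here is purely an assumption, as are the function name and the example numbers.

```python
from statistics import mean


def rank_statistic(fitted_by_dose, replicates_by_dose):
    """M = (f_hat(highest dose) - f_hat(lowest dose)) / v.

    ASSUMPTION: v is taken as the pooled within-dose standard error of the
    difference between two dose means. The slide only says v is a standard
    error estimate, so this choice is illustrative.
    """
    resid_ss = 0.0
    df = 0
    for reps in replicates_by_dose:
        centre = mean(reps)
        resid_ss += sum((r - centre) ** 2 for r in reps)
        df += len(reps) - 1
    pooled_var = resid_ss / df
    n_low = len(replicates_by_dose[0])
    n_high = len(replicates_by_dose[-1])
    v = (pooled_var * (1 / n_low + 1 / n_high)) ** 0.5
    return (fitted_by_dose[-1] - fitted_by_dose[0]) / v


# Hypothetical log-expression replicates at four doses, lowest to highest.
replicates = [[1.0, 1.2], [1.1, 1.3], [1.9, 2.1], [2.4, 2.6]]
fitted = [mean(r) for r in replicates]  # already monotone, so the fit equals the means
m_stat = rank_statistic(fitted, replicates)
print(round(m_stat, 2))
```

Genes would then be ranked by the absolute value of M, largest first.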

16
Project 2
Chem-informatics

• Alex Tropsha, Ph.D. (P.I.) – computational chemistry, QSAR
• Weifan Zheng, Ph.D. – computational methods in drug discovery, QSAR
• Alexander Golbraikh, Ph.D. – mathematical approaches in QSAR development
• Yufeng Liu, Ph.D. – support vector machines, semi-supervised machine learning
• Additional personnel

17
Project 2 objectives

• to develop an innovative QSPR modeling workflow based on the principles of combinatorial QSPR modeling, model validation and consensus prediction
• to develop toxicity predictors using the workflow
• to integrate modeling tools and endpoint predictors using workflow design middleware and workflow deployment in a predictive toxicology web portal
• Applied to toxicology datasets


18
• Project 2 builds on years of research in the Tropsha lab
on QSAR/QSPR modeling and developing robust
predictors
• Many of the machine learning and cross-validation ideas
are used in statistical genomics
• Descriptors – topological molecular indices, size and
shape, hydrophilic/phobic indices, physical properties,
etc.
• Try to predict biological activity
• Analysis of the Carcinogenic Potency Database
(collaboration with Dr. A. Richard, EPA) has been
performed, applied to 693 compounds, with
classification kNN QSAR prediction accuracies
estimated at 85%-90%.
19
[Figure: flowchart. Original Dataset, split into multiple training and test sets; Combi-QSAR Modeling with Y-Randomization on the training sets; Activity Prediction on the test sets; only accept models that have q2 > 0.6, R2 > 0.6, etc.; the result is Validated Predictive Models with High Internal & External Accuracy, used for Database Screening.]

Flowchart of predictive toxicology framework based on validated combi-QSAR models. Numerous public datasets proposed.
20
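
The acceptance rule and the y-randomization check in the flowchart can be sketched with a toy k-nearest-neighbour regression. This is a minimal illustration under stated assumptions, not the Combi-QSAR software: the data are synthetic, the function names are hypothetical, and only the internal leave-one-out q2 is shown (the external R2 on held-out test sets would be computed analogously).

```python
import random


def knn_predict(train_x, train_y, query, k=3):
    """Average activity of the k training compounds nearest to the query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_q2(xs, ys, k=3):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    y_mean = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        press += (ys[i] - knn_predict(rest_x, rest_y, xs[i], k)) ** 2
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss_total


def y_randomization(xs, ys, n_rounds=20, k=3, seed=0):
    """q2 values obtained after shuffling activities against descriptors."""
    rng = random.Random(seed)
    shuffled = list(ys)
    scores = []
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        scores.append(loo_q2(xs, shuffled, k))
    return scores


# Synthetic dataset: one descriptor per compound, activity linear in it.
descriptors = [[float(i)] for i in range(30)]
activities = [2.0 * i for i in range(30)]

q2 = loo_q2(descriptors, activities)
random_q2 = y_randomization(descriptors, activities)
accepted = q2 > 0.6 and max(random_q2) < 0.6
print(round(q2, 3), accepted)
```

A model that still scores well after its activities are shuffled is fitting noise, which is why the real q2 must clear the threshold while the randomized ones must not.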
Project 3
Computational Infrastructure for Systems Toxicology

• David Stotts, Ph.D. (co-P.I.) – computer science, software engineering
• Ivan Rusyn, Ph.D. (co-P.I.) – toxicology, genomics
• Wei Wang, Ph.D. – computer science, data mining
• David Threadgill, Ph.D. – mammalian genetics, genomics
• Additional programmers and students

21
Project 3 objectives
• Develop and implement algorithms that streamline analysis of multi-dimensional data streams.
• Facilitate the development of a standard workflow for (i) analysis of -omics data, (ii) linkages to classical indicators of adverse health effects, and (iii) integration with other types of biological information such as sequence and cross-species comparisons.
• Implement user-friendly software for approaches from Projects 1-3

22
A driving biological problem:

• Toxicogenetic analysis of susceptibility to toxicant-induced organ injury
• The model being used by Drs. Threadgill and
Rusyn involves extensive profiling of numerous
mouse strains (over 40) for relevant organs
• Early data on acetaminophen and alcohol on liver
• Proposals for trichloroethylene and other toxicants
on liver, kidney, and other organs

23
The Mouse as a Model for Studying
Genotype-Phenotype Interactions

24
Image courtesy of D.W. Threadgill
A large variation in response by
genetic background…
Strain-specific susceptibility to
acetaminophen (APAP)-induced
liver injury. Serum ALT levels
(top panel) and tissue
histopathological changes
(bottom panel) were assessed
24 hrs after a single dose
exposure to APAP (300
mg/kg, i.g., 24 hrs).

25
Toxicological and expression analysis of genotype-specific responses to
ethanol in liver. Serum and liver tissues were collected from mice of 6
different strains after acute (5 g/kg, 6 hrs; A) or subchronic (4 weeks, B)
treatment with ethanol.
26
Systems Biology Approach

27
Examination of genetic networks that regulate
gene expression in liver (webQTL and beyond)

Transcriptome map for the murine brain and liver.


Source: Ivan Rusyn and colleagues
28
Source: Ivan Rusyn and colleagues

Correlation between gene expression of CYP2C29 and several liver-specific phenotypes recorded for BXD strains.

29
Development of new methods for -omics data analysis:
Finding associations between gene expression profiles and
strain-specific genotyping data

SiZer
Smoothing
approach

30
• Data analysis procedures in concert with project 1,
including principal component analyses, distance-
weighted discrimination, SAFE, etc.
• Specific data mining approaches also proposed, such
as subspace clustering (SNPs vs. phenotypes, gene
expression), that fall outside of typical statistical
framework
• The computational challenges are immense when we
compare different -omics platforms (e.g., 100,000
SNPs × 30,000 transcripts)
• This requires serious computer science (activities of
UNC SNP group).
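
To give a sense of the scale mentioned above, 100,000 SNPs against 30,000 transcripts is 3 billion pairs, too many to hold as a full matrix of results. One common workaround is to process the SNPs in blocks and keep only strong associations. The sketch below illustrates that pattern with Pearson correlation; it is an assumption-laden toy (hypothetical function names, tiny data, pure Python) and not the approach of the UNC SNP group.

```python
from math import sqrt


def standardize(row):
    """Centre a profile and scale it to unit length, so dot products are correlations."""
    n = len(row)
    mu = sum(row) / n
    centred = [v - mu for v in row]
    norm = sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred] if norm > 0 else [0.0] * n


def top_associations(snps, transcripts, threshold=0.9, block=1000):
    """Yield (snp index, transcript index, r) for pairs with |r| >= threshold.

    SNPs are standardized one block at a time, so only one block of
    standardized genotypes is held in memory alongside the transcripts.
    """
    z_tx = [standardize(t) for t in transcripts]
    for start in range(0, len(snps), block):
        z_block = [standardize(s) for s in snps[start:start + block]]
        for i, zs in enumerate(z_block, start):
            for j, zt in enumerate(z_tx):
                r = sum(a * b for a, b in zip(zs, zt))
                if abs(r) >= threshold:
                    yield i, j, r


n_pairs = 100_000 * 30_000  # 3,000,000,000 SNP-transcript pairs at full scale

# Toy data: genotypes coded 0/1/2 for six strains, two transcripts.
snps = [[0, 1, 2, 0, 1, 2], [0, 0, 1, 1, 2, 2]]
transcripts = [[1.0, 2.0, 3.0, 1.0, 2.0, 3.0], [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]]
hits = list(top_associations(snps, transcripts, threshold=0.9, block=1))
print([(i, j) for i, j, _ in hits])
```

Because the function is a generator, results can be streamed to disk or a database as each block finishes instead of accumulating in memory.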
31
Collaborations with the NCCT and NJ ebCTC
• Several collaborative projects identified and ongoing
• Several graduate students working on projects at
NCCT or shared advising with CEBRC members
• Implementation of ArrayTrack extensions underway
in Project 3
• Plan for deliberate expansion of collaborations with
NCCT
• The ebCTC is highly complementary to our Center.
Coordination with ebTrack/ArrayTrack development
is one key area of interchange.
32
