Predicting Neurotoxicity of Solvents

This document describes a study that developed nonlinear qualitative and quantitative structure-toxicity relationship (QSTR) models to predict the acute neurotoxicity of organic solvents using probabilistic neural network (PNN) and generalized regression neural network (GRNN) modeling approaches. The models were able to reliably classify solvents as neurotoxic or non-neurotoxic and predict the endpoint neurotoxicities of diverse organic solvents with high accuracy, demonstrating the potential of these QSTR methods for reducing animal testing of solvent neurotoxicity.

NeuroToxicology 53 (2016) 45–52

Contents lists available at ScienceDirect

NeuroToxicology

Full length article

Predicting the acute neurotoxicity of diverse organic solvents using probabilistic neural networks based QSTR modeling approaches
Nikita Basant (a), Shikha Gupta (b), Kunwar P. Singh (a,*)
(a) ETRC, Gomtinagar, Lucknow 226010, India
(b) Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226001, India

ARTICLE INFO

Article history:
Received 22 July 2015
Received in revised form 17 December 2015
Accepted 17 December 2015
Available online 22 December 2015

Keywords:
Neurotoxicity
Solvents
Nonlinear structure–toxicity relationships
Interspecies correlations
Quantitative activity–activity relationships
Global structure–toxicity relationship

ABSTRACT

Organic solvents are widely used chemicals and the neurotoxic properties of some are well established. In this study, we established nonlinear qualitative and quantitative structure-toxicity relationship (STR) models for predicting neurotoxic classes and neurotoxicity of structurally diverse solvents in rodent test species following OECD guideline principles for model development. Probabilistic neural network (PNN) based qualitative and generalized regression neural network (GRNN) based quantitative STR models were constructed using neurotoxicity data from rat and mouse studies. Further, interspecies correlation based quantitative activity–activity relationship (QAAR) and global QSTR models were also developed using the combined data set of both rodent species for predicting the neurotoxicity of solvents. The constructed models were validated through deriving several statistical coefficients for the test data and the prediction and generalization abilities of these models were evaluated. The qualitative STR models (rat and mouse) yielded classification accuracies of 92.86% in the test data sets, whereas the quantitative STRs yielded correlation (R²) of >0.93 between the measured and model predicted toxicity values in both the test data (rat and mouse). The prediction accuracies of the QAAR (R² 0.859) and global STR (R² 0.945) models were comparable to those of the independent local STR models. The results suggest the ability of the developed QSTR models to reliably predict binary neurotoxicity classes and the endpoint neurotoxicities of the structurally diverse organic solvents.
© 2015 Elsevier Inc. All rights reserved.

* Corresponding author.
E-mail addresses: [email protected], [email protected] (K.P. Singh).

http://dx.doi.org/10.1016/j.neuro.2015.12.013
0161-813X/© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Organic solvents are widely used in various applications including emulsion and micro-emulsion formulation, shoe making, degreasing, detergents, cosmetics, paint, metal processing, auto manufacturing, aeronautical maintenance and manufacturing, and pharmaceutical industries. Moreover, solvents may be used in liquid-liquid extraction and absorption processes, as a reaction medium and as a carrier, to deliver chemical compounds in solutions in the required amounts (Gani et al., 2005; Al-Malah, 2012). Many organic solvents are low molecular weight compounds and are volatile, thus transferring a fraction of their volume to the atmospheric environment at room temperature. Inhalation of solvent vapors is the most frequent type of occupational exposure (Dick, 2006). The ability of various solvents to evoke acute neurotoxic symptoms and signs is one crucial parameter for the assessment of the hazard of the solvents for adverse human health effects. This warrants assessing the risk of solvents in a systematic manner. Although experimental test protocols for assessing the neurotoxicity of solvents in rodents have been developed (OECD, 1997), these are tedious and time and resource intensive. On the other hand, computational toxicology continues to be an attractive, viable approach to reduce the amount of effort and cost of experimental toxicity assessment (Chandler et al., 2011) and provides a method for the early evaluation in the development of new solvents (Cronin et al., 2003; Jaworska et al., 2003). The European Union (EU) regulation “Registration, Evaluation, Authorization and Restriction of Chemicals” (REACH, 2015) advocates the use of non-animal testing methods and in particular quantitative structure-toxicity/activity relationship (QSTR/QSAR) approaches. The OECD has provided a set of guidelines for development of QSARs (OECD, 2007). A qualitative QSAR model may be useful in classifying solvents into relative neurotoxicity classes (high or low) and quantitative QSAR is expected to be a useful tool in predicting the neurotoxicity potential of chemicals. A few attempts have been made to develop QSAR models for the neurotoxicity of solvents in rodents (Cronin, 1996; Estrada et al., 2001). However, both of these QSAR studies, based on linear modeling methods, reported low
prediction accuracies. Moreover, none of the studies attempted to develop qualitative QSARs. Poor performance of the QSAR model may be due to the selection of inappropriate modeling method or irrelevant descriptors. Experimental toxicity data generally have nonlinear structure and linear methods fail to capture nonlinear dependence. Further, interspecies quantitative activity–activity relationships (QAARs) (Cronin, 2010; Cassani et al., 2013; Furuhama et al., 2015), which extrapolate data for one toxicity endpoint to those for another toxicity endpoint, can be used to determine the species-specific toxicity of a chemical. When the toxicity values of defined chemicals for one endpoint correlate well with the values for another endpoint, the chemicals can be expected to have similar modes of action with respect to both endpoints.

Probability density function (PDF) based neural networks, such as probabilistic neural networks (PNNs) and generalized regression neural networks (GRNNs), capable of capturing the nonlinearities in the data, have successfully been used in various qualitative (classification) and quantitative (regression) QSAR studies (Mosier and Jurs, 2002; Panaye et al., 2006; Singh et al., 2013, 2014). These methods learn quickly and produce reproducible outputs without any risk for a local minimum of the error surface (Walzack and Massart, 2000).

In this study, the PNN and GRNN based QSTR models were established for the qualitative (neurotoxicity classes) and quantitative neurotoxicity predictions of structurally diverse organic solvents in rodents (Cronin, 1996) following the OECD guidelines for QSAR validation. The predictive and generalization abilities of the proposed QSTR models constructed here were evaluated using several statistical criteria. The external predictive power of the QSTR model was evaluated using the OECD recommended external validation tests. Moreover, the possibility of finding interspecies correlations (ISC) for the experimental data for rat and mouse has been investigated in order to derive a quantitative activity–activity relationship (QAAR) model able to predict rat neurotoxicity from the experimental data measured in mouse.

2. Materials and methods

2.1. Datasets

The rodent neurotoxicity data (pEC30, mM) of 47 organic solvents were collected from the literature (Cronin, 1996). This database contained experimental values for the neurotoxicities of organic solvents in rats and mice. A detailed methodology for the experimental measurements of the neurotoxicities of solvents is provided elsewhere (Frantik et al., 1994). In brief, the experimental values refer to a whole body exposure for 2 h in mouse and 4 h in rats. Inhibition of propagation and maintenance of the electrically evoked seizure discharge was used as a criterion of the acute neurotropic effect. Out of a range of concentrations of solvents, an effective concentration amounting to 30% of the maximum possible effect (EC30) was reported. The selected solvent database includes aromatic and aliphatic hydrocarbons, chlorinated hydrocarbons, alcohols, ketones, and acetates. The neurotoxicity values (pEC30, mM) of the solvents in rat and mouse vary between −2.94 and −0.57; −2.98 and −0.82, respectively (Table S1, Supplementary material).

2.2. Molecular descriptors and data processing

For calculating the descriptors, the SMILES (simplified molecular input line entry system) codes of the solvent molecules were obtained using Chemspider (2015). The SMILES codes were then used for the geometry optimization of the molecules using the PM7 semi-empirical method (ChemMop, 2015). PM7 is a parameterized Hartree-Fock method, which is able to capture any specific chemical interaction (Stewart, 2013). The optimized molecular structures were transferred to Chemopy (2015) for the descriptor calculation. A total of 1135 molecular descriptors were calculated for each chemical, including 1D, 2D (constitutional, connectivity, Basak, topology, Kappa, Burden, E-state, autocorrelations, molecular property, charge, MOE-type) and 3D (geometrical, charged partial surface area, Randic molecular profiles from the geometrical matrix, MoRSE) descriptors. To reduce redundant and useless information, descriptors with constant and near constant values (variance < 0.5) were removed. Finally, 262 descriptors were retained to undergo subsequent descriptor selection for QSTR analysis. The most relevant parameters were then selected using the model-fitting approach. Prior to model construction, the neurotoxicity datasets (rat and mouse) were split into respective training (70%) and test (30%) subsets using a random distribution method. Using this approach, the samples are selected randomly with a uniform distribution. For the training subset $T_{tr}$: $p(X \in T_{tr}) = n_{tr}/n$, with $n = |T|$ and $n_{tr} = |T_{tr}|$, so each sample has an equal probability of selection. This method leads to low bias of the model performance (Reitermanova, 2010). For determining the optimal values of the model parameters, the models were trained (training set) with the retained pool of descriptors through a 5-fold cross-validation (CV), computing the scoring function (mean squared error, MSE) to rank the contribution of the descriptors in the current set. The lowest ranked descriptors (<10% contribution) were then removed in the successive steps (Singh et al., 2015). The most significant descriptors were then retained and the corresponding prediction accuracies were computed. The descriptor selection process was performed separately for each modeling method (PNN and GRNN). The finally retained descriptors for the qualitative and quantitative QSTRs in both the test species (rat and mouse) are presented in Table S2 (Supplementary material). For qualitative QSTR modeling, the solvents were categorized as high neurotoxic (EC30 < 50 mM) and low neurotoxic (EC30 > 50 mM), rendering a total of 25 compounds in the high neurotoxicity (class = 1) and the remaining 22 compounds in the low neurotoxicity (class = 2) categories.

2.3. Model development, validation and applicability domain analysis

In this study, PNN and GRNN based QSTR models (qualitative and quantitative) were established for predicting the class and neurotoxicity of structurally diverse organic solvents in rats and mice. An ISC based linear QAAR model was also constructed using the rat and mouse neurotoxicity datasets. A brief account of these methods is provided here.

2.3.1. QSTR modeling

PNN estimates the probability density function (PDF) of the features of each class from the available training samples using the Gaussian kernel function; these estimates are then used in a Bayes decision rule to perform the classification (Gelman et al., 2003). PNN uses a nonparametric technique known as the Parzen window to construct the class-dependent PDF for each classification category required by Bayes’ theory. This allows determination of the chance that a given vector pattern lies within a given category. If the jth training pattern for category C1 is $x_j$, then the Parzen estimate of the PDF for category C1 is

$$F_1(x) = \frac{1}{(2\pi)^{m/2}\,\sigma^{m}\,n} \sum_{j=1}^{n} \exp\left[-\frac{(x - x_j)^T (x - x_j)}{2\sigma^2}\right],$$

where n is the number of training patterns, m is the input space dimension, j is the pattern number, and σ is the adjustable smoothing parameter (Goh, 2002). A PNN consists of a node in
layer one for each of the N training samples. The node computes the distance d(s,x) from the test vector x to the training sample s and outputs the value of the Gaussian kernel function. The outputs of the layer-one cells are summed separately for the different classes through connections to the output cells with weight one.

GRNN is a four-layered neural network consisting of an input layer, a pattern layer, a summation layer, and an output layer. The RBF units in GRNN are called “kernels” and are usually PDFs (Celikoglu, 2006). GRNN estimates any arbitrary function between the input and output vectors, drawing the function estimate directly from the training data. The PDF is estimated using Parzen’s nonparametric estimator (Parzen, 1962). This function is calculated as a weighted sum of kernel functions represented by normalized Gaussian functions, with a common “width” ($\rho_j$). Each training pattern is weighted exponentially according to its distance to the unknown pattern and to the smoothing factor. The regression of a dependent variable y on independent variable x evaluates the most probable value for y, given x as a training set,

$$y(x) = \frac{\sum_{j=1}^{n} y_j \exp\left[-(x - x_j)^T (x - x_j)/2\rho_j^2\right]}{\sum_{j=1}^{n} \exp\left[-(x - x_j)^T (x - x_j)/2\rho_j^2\right]},$$

where $x_j$, $y_j$ correspond to the structural descriptors and the property value of the hidden node corresponding to the training pattern j. The numerator and denominator of the equation are evaluated on the two units of the summation layer. Their quotient, calculated in the output unit, gives the predicted value.

2.3.2. ISC QAAR modeling

Since the experimental neurotoxicity values (pEC30) of the organic solvents in two rodent test species (rat and mouse) exhibited a high correlation (r = 0.90), a quantitative linear relationship (ISC) between the two biological endpoints was defined. The QAAR is a mathematical relationship between two different biological endpoints measured in the same species or the same endpoint in different species. This approach is widely used for the extrapolation of toxicological data for a surrogate species to a predicted species. Here, a QAAR model was established using the mouse neurotoxicity (pEC30) data of 47 organic solvents as the independent variable to predict rat neurotoxicity.

2.3.3. QSTR validation

Model validation is needed to estimate the predictive power of the model. Here, the constructed QSTR and QAAR models were validated using both the internal and external validation procedures. Internal validation was performed through a 5-fold cross-validation (CV). In this method, the data D are divided into k = 5 non-overlapping sets, D1, ..., Dk. At each iteration i (from 1 to k = 5), the model was trained with D − Di and tested on Di (Singh et al., 2015). The criteria of low misclassification rate, MR (classification), and MSE (regression) values in the training and validation data in CV were used for the optimal model selection. Further, Y-randomization was performed to check any chance correlation of the constructed QSTR and QAAR models (Rücker et al., 2007) using a 5-fold CV. A value of the coefficient of determination of the non-random model (R²) exceeding the average value for the random models ($R_r^2$) argues against chance correlation. The extent of the difference in the values of R² and $R_r^2$ that signifies the reliability of the developed QSTR model was determined in terms of $^cR_p^2$ (Mitra et al., 2010). The threshold value of $^cR_p^2$ is 0.5 and a model exceeding this value might be considered not to be the outcome of mere chance. For the external validation, a separate test set was used, which was kept out of the model building data. In binary classification, the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) were calculated. In regression QSTRs, statistical parameters like the CCC (concordance correlation coefficient), $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$ and $r_m^2$ were taken into account to consider the model as a robust one (Lin, 1992; Shi et al., 2001; Schuurmann et al., 2008; Consonni et al., 2009; Chirico and Gramatica, 2011; Roy et al., 2013). The performance of the proposed QSTR models here was also assessed by calculating R² and the root mean squared error (RMSE) of the training and test data arrays.

2.3.4. Applicability domain analysis

The applicability domain (AD) of the constructed QSTR models was defined using the leverage method. In this approach the distance of a chemical from the centroid of its training set was measured. The leverage value, $h_i$, for each ith compound is calculated (Netzeva et al., 2005) from the descriptor (i × j) matrix (X) as

$$h_i = x_i^T (X^T X)^{-1} x_i,$$

where $x_i$ is a row vector of molecular descriptors for a particular ith compound. A value of $h_i$ greater than the critical h* value indicates that the structure of the compound substantially differs from those used for the calibration. The h* value can be calculated (Netzeva et al., 2005) as

$$h^* = \frac{3(p + 1)}{n},$$

where p is the number of variables used in the model and n is the number of training data.

3. Results and discussion

3.1. Qualitative modeling and evaluation

Here, PNN based separate qualitative (STR) models were constructed for categorizing organic solvents into high and low neurotoxic classes for the rat and mouse data. The two models were based on separate sets of descriptors and 5-fold CV was used to validate the model robustness. The average classification accuracy in the training and CV data for the two models was 95.21%, 82.89% (rat), and 98.42%, 80.67% (mouse). A 5-fold Y-randomization was performed to check any chance correlation in

Table 1
Optimal model parameters in PNN and GRNN based QSTR models.

Model parameters              Rat QSTR models               Mouse QSTR models
                              Qualitative   Quantitative    Qualitative   Quantitative
Neurons in input layer        2             3               3             5
Neurons in pattern layer      33            33              33            33
Neurons in summation layer    2             2               2             2
Neurons in output layer       2             1               2             1
Kernel function               Gaussian      Gaussian        Gaussian      Gaussian
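
The Parzen-window PNN classifier and the GRNN estimator described in Section 2.3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single shared smoothing parameter `sigma` (the study tunes a separate value per input variable), assumes equal class priors, and all function names and toy descriptor values are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, xj: Vector, sigma: float) -> float:
    """exp(-(x - xj)^T (x - xj) / (2 sigma^2)), shared by both networks."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pnn_class_densities(x: Vector, train_X: List[Vector],
                        train_labels: List[Hashable],
                        sigma: float) -> Dict[Hashable, float]:
    """Parzen estimate of the class-conditional density for every class."""
    m = len(x)
    norm = (2.0 * math.pi) ** (m / 2.0) * sigma ** m
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for xj, label in zip(train_X, train_labels):
        sums[label] = sums.get(label, 0.0) + gaussian_kernel(x, xj, sigma)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / (norm * counts[label]) for label in sums}


def pnn_classify(x: Vector, train_X: List[Vector],
                 train_labels: List[Hashable], sigma: float) -> Hashable:
    """Bayes decision with equal priors (an assumption): pick the densest class."""
    densities = pnn_class_densities(x, train_X, train_labels, sigma)
    return max(densities, key=densities.get)


def grnn_predict(x: Vector, train_X: List[Vector],
                 train_y: List[float], sigma: float) -> float:
    """Kernel-weighted average of the training responses."""
    weights = [gaussian_kernel(x, xj, sigma) for xj in train_X]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical two-descriptor toy data (not values from the study).
toy_X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
toy_labels = [1, 1, 2, 2]            # 1 = high, 2 = low neurotoxicity
toy_y = [-1.0, -1.2, -2.5, -2.7]     # made-up pEC30-like responses
```

Because each training compound contributes one kernel, the pattern layer has as many neurons as training compounds, which is presumably why Table 1 lists 33 pattern-layer neurons for every model.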

the constructed models and yielded the respective classification accuracies of 54.04% and 55.74% in the rat and mouse models. The results suggest that the constructed models are not due to chance correlation. The architectures and optimal parameters of the constructed PNN-STR models determined through the internal and external validation for the rat and mouse data are given in Table 1. The value of the spread (σ) parameter of the Gaussian function was optimized using the conjugate-gradient algorithm. Here, separate σ values were considered for each input variable and category, and the search for each was made in the range of 0.0001–10. Selection of σ values for each variable provided a relatively better model as compared to a single model σ value (Singh et al., 2013). The optimal values of σ for the considered input variables ranged between 0.06 and 0.66. The values of the performance parameters (sensitivity, specificity, accuracy and MCC) for the rat and mouse models are given in Table 2.

The results show that the accuracy of classification in the two models (rat and mouse) was 90.91%, 96.97% in training and 92.86%, 92.86% in the test data, respectively. High values for the sensitivity (≥87.50%), specificity (≥83.33%) and MCC (≥0.83) for both the models suggest the robustness of the models. Sensitivity is considered the most important parameter in a classification model. A low sensitivity value indicates the low ability of a model to recognize the toxicity of diverse compounds. Specificity is another important indicator. A high specificity value indicates the ability of the model to correctly identify negatives and thus avoid false positives, which can save experimental costs (Cheng et al., 2011). Fjodorova et al. (2010) suggested that a classification model for regulatory purposes should have high sensitivity. The MCC usually varies from −1 to +1, referring to an inverse classification and a perfect classification, respectively, whereas a value of 0 corresponds to random classification performance. The values of the MCC calculated for both the models in the training (0.83, 0.94) and in test (0.87, 0.86) were close to unity, suggesting the adequacy of the proposed model in predicting the neurotoxicity behavior of the solvents. Further, the area under the ROC curve (AUROC) was also determined to check the performance of the classification models for both the rat and mouse. The calculated values of AUROC of the training and test sets are 0.917, 0.938 (rat) and 0.972, 0.944 (mouse), respectively. These values are well above the 0.5 expected for a random classifier, so the AUROC also strongly supports the reliability of our developed classification models. The discriminating ability of a classification model is largely based on the modeling approach and the selected descriptors. A high value of classification accuracy for the chemicals into two categories in both the data (rat and mouse) here may be due to the fact that the chemicals in two classes differ significantly in their characteristics (descriptors) considered for modeling. The values of the descriptors of the chemicals in two classes (low and high neurotoxic) are shown in the Box-Whisker plots (Fig. S1, Supplementary material). The mean values of these descriptors for the two categories of high (class 1) and low (class 2) neurotoxic compounds in rat and mouse models were: 6.76, 9.09 (nhyd); 278.84, 220.07 (TASA); 1.87, 1.53 (Chiv2); 15.00, 13.64 (Shev); and 1.95, 1.37 (Log P), respectively. The mean values of all the three descriptors (Chiv2, Shev, Log P) in the mouse model were high for the compounds in the high neurotoxic category and low for those in the low neurotoxic class. It is evident that in general, the mean values of all the descriptors (except nhyd in rat model) are higher for the chemicals in the high neurotoxicity class and the values of all the descriptors for the chemicals in two classes are significantly different. Moreover, the PNN models are known as relatively insensitive to the outliers and generate accurate predicted target probability scores (Sawant and Topannavar, 2015).

3.2. Quantitative modeling and evaluation

Here, the GRNN based quantitative STR models were constructed for predicting the neurotoxicity of the solvents. Accordingly, mathematical functions establishing relationships between the independent set of descriptors and the endpoint property were developed separately in rat and mouse data. The optimal architectures and the model parameters of the two GRNN based STR models for the rat and mouse data were determined using a 5-fold CV. The average MSE in the training and CV data for the two models were 0.02, 0.09 (rat), and 0.01, 0.17 (mouse), respectively. A 5-fold Y-scrambling was performed to check any chance correlation in the two models. Low R² and high $^cR_p^2$ values of 0.07, 0.867 (rat) and 0.07, 0.965 (mouse) in Y-randomization revealed that the original STR models are unlikely to arise as a result of chance correlation. The architectures and the optimal parameters of the constructed STR models determined through the internal and external validation for the rat and mouse data are given in Table 1. Here, separate σ values were considered for each of the input variables and were searched in the range of 0.0001 to 7. Selection of the σ values for each variable provided a relatively better model as compared to a single model σ value. The optimal values of σ for the considered input variables ranged between 0.25–0.58 (rat) and 0.19–0.39 (mouse), respectively. The finally selected optimal STR models in training captured 89.21% (rat) and 99.98% (mouse) of the total data variance, respectively. The proportion of the variance captured by the model descriptors is a measure of the closeness of the predicted and actual values of the endpoint property. The selected GRNN yielded RMSE and R² values of 0.13, 0.932 (rat) and 0.14, 0.952 (mouse) in test data. The values of the GRNN model performance criteria parameters are presented in Table 3.

It may be noted that the optimal GRNN models yielded high correlation between the measured and the model predicted values of the endpoint toxicity both in rat and mouse in the training and test data. A closely followed pattern of variation of the measured and model predicted response (Fig. S2, Supplementary material) and reasonably low values of prediction errors (Table 3) suggest a good fit of the GRNN model to the data sets and the adequacy of the selected model for predicting the neurotoxicity of the chemicals.

Several stringent validation coefficients in the external test set, such as CCC, $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$ and $r_m^2$, were derived to attain higher confidence in the constructed models. In the OECD guideline,
Table 2
Performance metrics for the qualitative STR models.

Model/data set    Sensitivity (%)   Specificity (%)   Accuracy (%)   MCC    AUC
Rat model
  Training set    100.00            83.33             90.91          0.83   0.917
  Test set        87.50             100.00            92.86          0.87   0.938
Mouse model
  Training set    94.44             100.00            96.97          0.94   0.972
  Test set        88.89             100.00            92.86          0.86   0.944

Table 3
Performance parameters for the quantitative STR models.

Model/data set    RMSE   R²      Q²F1    Q²F2    Q²F3    CCC     r²m
Rat model
  Training        0.18   0.902   –       –       –       –       –
  Test            0.13   0.932   0.932   0.931   0.941   0.965   0.873
Mouse model
  Training        0.01   0.999   –       –       –       –       –
  Test            0.14   0.952   0.925   0.921   0.928   0.961   0.872
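
The classification metrics reported in Table 2 follow from the counts of a binary confusion matrix, as the short Python sketch below illustrates. The confusion counts are an assumption: 7 true positives, 1 false negative, 6 true negatives and 0 false positives were chosen because they are consistent with the rat test-set percentages for a 14-compound test set, but the paper does not report the matrix itself. The function name is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)          # fraction of positives recovered
    specificity = tn / (tn + fp)          # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Assumed counts consistent with the rat test-set row of Table 2.
rat_test = binary_metrics(tp=7, fn=1, tn=6, fp=0)
```

With these assumed counts the sketch gives a sensitivity of 87.5%, a specificity of 100%, an accuracy of about 92.86% and an MCC of about 0.87.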

Principle 4 advocates for a rigorous validation of the constructed QSTR models prior to applying them for new chemicals. The values of these coefficients along with their respective thresholds (Tropsha et al., 2011; Chirico and Gramatica, 2012) and the quality metric R² are given in Table 3. From the results (Table 3), it is evident that all the validation metrics for the developed STR models were within their acceptable limits. Thus, the predictive potential of the developed STR models in terms of external validation tests is reflected in the acceptable values of the metrics.

Further, we have compared the results of previous studies on the toxicity prediction of solvents in rodents. It may be noted that both of the previous studies (Cronin, 1996; Estrada et al., 2001) used multiple linear regression (MLR) models for neurotoxicity prediction in rat and mouse. Cronin (1996) reported R² of 0.50 for rat (n = 44) and 0.57 (n = 44) for mouse data. On the other hand, in another QSAR study, Estrada et al. (2001) reported R² of 0.81 in both the rat (n = 45) and mouse (n = 46) data. It is evident that the proposed STR models in the present study outperformed both of the earlier ones. A relatively low variance captured in both the previous QSAR studies may be due to the linear modeling approach considered there. Moreover, external validation is an important step in QSAR analysis to bring a higher confidence in the prediction results, and it was not performed in either of the two studies.

3.3. Applicability domain analysis

Leverage is one of the standard methods for determining the AD of QSTR models. Here, to visualize the AD of the constructed QSTRs (rat and mouse), the Williams plots were examined (Fig. 1) for the detection of the response outliers (standardized residuals beyond ±3) and structurally influential chemicals in the model (h > h*) (in the training data). As evident from Fig. 1, all predictions were reliable for QSTR models and there is no response outlier compound in the training and test sets of rat, whereas a single compound (propylbenzene) in the test set (mouse model) was detected as the response outlier. None of the compounds in the training and test data sets of the two models was detected as structurally influential. The anomalous behavior of these chemicals may be due to one of the following reasons: (i) incorrect experimental input data, (ii) the selected descriptors do not capture some relevant structural features present in these molecules and absent in others, and (iii) their biological mechanism is different from the remaining chemicals. For future predictions, predicted toxicity data must be considered reliable only for those chemicals that fall within the AD on which the model is constructed.

3.4. Mechanistic interpretation of QSTRs

In the OECD guideline, Principle 5 states that a QSTR model should be mechanistically interpretable, which can link the descriptors used in the model and the endpoint property to be predicted. Here, we used nine different descriptors (nhyd, TASA, Shev, Smin, LogP, Chiv2, IDET, phi, WNSA2) to establish the qualitative and quantitative STR models for the neurotoxicity prediction of chemical solvents in rodent test species. The contributions of the selected descriptors in different QSTR models developed here are presented in Fig. 2. In rat data, nhyd in qualitative and Shev in quantitative STR exhibited the highest (100%) contribution, whereas in mouse data, Shev in qualitative and phi in quantitative STRs were the most influential descriptors. A correlation matrix of the descriptors and the endpoint toxicities of the chemicals is given in Table S3 (Supplementary material).

In the quantitative rat neurotoxicity model, all of the three descriptors (Shev, Smin and Log P) have a positive correlation (r = 0.1–0.49) with the endpoint. In the case of the mouse model, Shev, IDET, and Log P have a positive correlation (r = 0.22–0.34) with the endpoint and phi and WNSA2 have a negative (r = −0.4 and −0.48) dependence. The Log P and nhyd are both constitutional descriptors. The Log P is a measure of the hydrophobicity which determines the uptake, transport, and distribution of organic toxicants via passive diffusion. For compounds with a strong lipophilic character, the lipid bilayer of the cellular membrane is a potential site for interaction.

Fig. 2. Plots of the contributions of the input variables in (a) qualitative STR, and (b) quantitative STR models.

Fig. 1. Williams plot for the (a) Rat QSTR, and (b) Mouse QSTR models. [Standardized residuals versus leverage for the training and test compounds; warning leverage h* = 0.36 in (a) and h* = 0.55 in (b).]

Partitioning of the molecule into the membrane permits it to interfere with the organism. Hence, the Log P descriptor might be related to the ability of solvents to penetrate through the cellular membrane and reach the target site (Katritzky et al., 2009). The nhyd is an element count descriptor that represents the number of hydrogen atoms in a molecule, suggesting double, triple, or aromatic substituent over alkyl substituent (Sharma et al., 2013). The Shev (E-state indices over all non-H atoms) and Smin (the minimal E-state value in all atoms) are both electro-topological indices. E-state indices describe the electronic character and topological environment of a skeletal atom in the molecule. These indices depend on π bonds, lone-pair electrons and σ bonds that reflect quantitative availability of valence electrons for ligand-target interactions (Samat et al., 2014). The Chiv2 is a valence molecular connectivity index of order 2. It represents the degree of skeletal branching and molecular size and includes heteroatoms and valence state information with high sensitivity (Contrera et al., 2005). The phi is a kappa shape molecular flexibility descriptor that increases with homologation and decreases with increased branching or cyclicity. A larger phi value indicates greater molecular flexibility. The IDET is a topological descriptor which represents a total information index on distance equality. The TASA (total hydrophobic surface area) and WNSA2 (surface weighted charged partial negative surface area) are both charged partial surface area (CPSA) descriptors. The presence of these descriptors in our models underlines the importance of the polarity in the toxic process. Polarity is directly related to the hydrophobicity which enhances the permeability of the molecule, leading to higher toxicity. Thus, it is clear that the selected descriptors have quantitative mechanistic relationships with the endpoint property investigated here.

3.5. Interspecies QAAR modeling and evaluation

For the QAAR development, the neurotoxicity data (rat and mouse) were split into training (70%) and test (30%) sets by a random distribution approach. Here, a linear regression model was constructed considering the mouse neurotoxicity data as the independent and the rat toxicity data as the dependent variable. The linear equation obtained was: pEC30 (rat) = 0.26 + 0.847 pEC30 (mouse) (n = 33). The model yielded R² (training) and RMSE values of 0.815 and 0.21, respectively. The model applied to the test data (n = 14) yielded R² (test) and RMSE values of 0.859 and 0.27, respectively. The values of the coefficients for the test set, Q²F1 (0.791), Q²F2 (0.791), Q²F3 (0.705), CCC (0.861) and r²m (0.40), were above their respective thresholds (except for r²m). Fig. 3 shows the plot of experimental and QAAR predicted values and the Williams graph for the training and test data. All the other compounds fall within the structural AD of the QAAR model (Fig. 3). The proposed QAAR model can be a possible supporting tool to reduce the experimental tests performed in rodents.

Fig. 3. (a) Plot of the measured and model predicted pEC30 values in QAAR model, and (b) Williams plot of the QAAR model. [(a) pEC30 pred versus pEC30 exp; (b) standardized residuals versus leverage, h* = 0.18; training and test compounds.]

Fig. 4. (a) Plot of the measured and model predicted pEC30 values in the global QSTR model, and (b) Williams plot of the global QSTR model. [(a) pEC30 pred versus pEC30 exp; (b) standardized residuals versus leverage, h* = 0.23; training and test compounds.]

3.6. Global QSTR modeling

The GRNN based global QSTR model was constructed using the combined data set (n = 94) of both of the test species. The applicability domain of the global model was thus wider than that of the local models. The data were split into training (70%) and test (30%) sets using a random approach. Four descriptors (Shev, Log P, phi and WNSA2) were selected by the model fitting approach on the training data using a 5-fold CV. Y-scrambling and external validation were performed to verify the chance correlation and applicability of the global model. In CV, the average MSE in the training and CV data were 0.02 and 0.06, respectively. Low R² and high cR²p values of 0.46 and 0.64 in the Y-randomization test revealed that the original global GRNN QSTR was not the result of chance correlation. The global QSTR model yielded R² and RMSE values of 0.909, 0.16 (training) and 0.945, 0.13 (test). The values of all the statistical coefficients derived from the test set, Q²F1 (0.943), Q²F2 (0.941), Q²F3 (0.940), CCC (0.968) and r²m
(0.851) were above their respective thresholds. The plot of the actual and predicted values of the endpoint toxicity (Fig. 4a) suggested a high correlation between them. The AD of the global QSTR model constructed here was also evaluated using the leverage approach, and the leverage and the standardized residuals of the chemical solvents (Williams plot) are plotted (Fig. 4b). Visual inspection of the plot suggested that none of the chemical solvents considered here exhibited high leverage or a high standardized residual. The absence of any outlier in the test and training data implies a wider applicability of the developed GRNN based global QSTR model for predicting the neurotoxicity of solvents in rodents. Further, it may be noted that the prediction accuracy of the global QSTR is comparable with those of the local models.

4. Conclusions

In this study, PNN based qualitative STR and GRNN based quantitative STR models were established for predicting neurotoxicity classes and the endpoint neurotoxicities of structurally diverse solvents following OECD principles. Accordingly, rodent neurotoxicity (pEC30) datasets (rat and mouse) of 47 chemical solvents were considered for model development. The structural diversity of the organic solvents was established using the TSI statistics. A total of nine different molecular descriptors were selected for the development of the qualitative and quantitative STRs. In addition, the ISC QAAR and global quantitative STR models were also constructed for the quantitative prediction of the endpoint neurotoxicity of solvents in the rodents. Several statistical validation tests performed revealed a high predictivity for the qualitative and quantitative models and rendered high statistical confidence in the developed STRs. Performances of both the ISC QAAR and global QSTR models were comparable with those of the individual rat and mouse STRs. The STRs developed in the present study performed better than those reported earlier for the prediction of neurotoxicity. The excellent predictive and generalization abilities achieved for the PNN and GRNN based STR models here may be attributed to their ability to work well in noisy environments. The results of the AD analysis using the leverage method revealed a single compound (in mouse) as the response outlier and thus confirmed the applicability of the constructed QSTR models over a wide chemical space. This study provided a powerful tool for the prediction of the acute neurotoxicity of organic solvents in rodent test species, hence useful in cost and effort reduction towards the neurotoxicity evaluation of new chemicals.

Conflict of interest

None.

Acknowledgement

The authors thank the Director, CSIR-Indian Institute of Toxicology Research, Lucknow (India) for his keen interest in this work and providing all necessary facilities.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.neuro.2015.12.013.

References

Al-Malah, K.I., 2012. Prediction of aqueous solubility of organic solvents as a function of selected molecular properties. J. Pharm. Drug Deliv. Res. 1, 2. doi: http://dx.doi.org/10.4172/2325-9604.1000106.
Cassani, S., Kovarich, S., Papa, E., Roy, P.P., van der Wal, L., Gramatica, P., 2013. Daphnia and fish toxicity of (benzo) triazoles: validated QSAR models, and interspecies quantitative activity–activity modeling. J. Hazard. Mater. 258–259, 50–60.
Celikoglu, H.B., 2006. Application of radial basis function and generalized regression neural networks in non-linear utility function specification for travel mode choice modeling. Math. Comput. Model. 44, 640–658.
Chandler, K.J., Barrier, M., Jeffay, S., Nichols, H.P., Kleinstreuer, N.C., Singh, A.V., Reif, D.M., Sipes, N.S., Judson, R.S., Dix, D.J., Kavlock, R., Hunter, E.S., Knudsen, T.B., 2011. Evaluation of 309 environmental chemicals using a mouse embryonic stem cell adherent cell differentiation and cytotoxicity assay. PLoS One 6, e18540. doi: http://dx.doi.org/10.1371/journal.pone.0018540.
ChemMop, http://www.scbdd.com/mopac-optimization/optimize/.
Chemopy Webserver, http://www.scbdd.com/chemopy_desc/index/.
Chemspider, www.chemspider.com.
Cheng, F., Shen, J., Yu, Y., Li, W., Liu, G., Lee, P.W., Tang, Y., 2011. In silico prediction of tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods. Chemosphere 82, 1636–1643.
Chirico, N., Gramatica, P., 2011. Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J. Chem. Info. Model. 51, 2320–2335.
Chirico, N., Gramatica, P., 2012. Real external predictivity of QSAR models: part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Info. Model. 52.
Consonni, V., Ballabio, D., Todeschini, R., 2009. Comments on the definition of the Q2 parameter for QSAR validation. J. Chem. Info. Model. 49, 1669–1678.
Contrera, J.F., MacLaughlin, P., Hall, L.H., Kier, L.B., 2005. QSAR modeling of carcinogenic risk using discriminant analysis and topological molecular descriptors. Curr. Drug Discov. Technol. 2, 55–67.
Cronin, M.T.D., Jaworska, J.S., Walker, J.D., Comber, M.H.I., Watts, C.D., Worth, A.P., 2003. Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Persp. 111, 1391–1401.
Cronin, M.T.D., 1996. Quantitative structure–activity relationship (QSAR) analysis of the acute sublethal neurotoxicity of solvents. Toxicol. In Vitro 10, 103–110.
Cronin, M.T.D., 2010. Biological read-across: mechanistically-based species-species and endpoint-endpoint extrapolations. In: Cronin, M.T.D., Madden, J.C. (Eds.), In Silico Toxicology: Principles and Applications. Royal Society of Chemistry, Cambridge, pp. 446–477 (Chapter 18).
Dick, F.D., 2006. Solvent neurotoxicity. Occup. Environ. Med. 63, 221–226.
Estrada, E., Molina, E., Uriarte, E., 2001. Quantitative structure-toxicity relationships using tops-mode. 2. Neurotoxicity of a non-congeneric series of solvents. SAR QSAR Environ. Res. 12, 445–459.
Fjodorova, N., Vracko, M., Novic, M., Roncaglioni, A., Benfenati, E., 2010. New public QSAR models for carcinogenicity. Chem. Cent. J. 4, 1–15.
Frantik, E., Hornychová, M., Horváth, M., 1994. Relative acute neurotoxicity of solvents: isoeffective air concentrations of 48 compounds evaluated in rats and mice. Environ. Res. 66, 173–185.
Furuhama, A., Hasunuma, K., Aoki, Y., 2015. Interspecies quantitative structure–activity–activity relationships (QSAARs) for prediction of acute aquatic toxicity of aromatic amines and phenols. SAR QSAR Environ. Res. 26, 301–323.
Gani, R., Jimenez-Gonzalez, C., Constable, D.J.C., 2005. Method for selection of solvents for promotion of organic reactions. Comput. Chem. Eng. 29, 1661–1676.
Gelman, A., Carlin, J., Stern, H., Rubin, D., 2003. Bayesian Data Analysis. CRC Press, Boca Raton, FL.
Goh, T.C., 2002. Probabilistic neural network for evaluating seismic liquefaction potential. Can. Geotech. J. 39, 219–232.
Jaworska, J.S., Comber, M., Auer, C., Van Leeuwen, C.J., 2003. Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ. Health Perspect. 111, 1358–1360.
Katritzky, A.R., Slavov, S.H., Stoyanova-Slavova, I.S., Kahn, I., Karelson, M., 2009. Quantitative structure–activity relationship (QSAR) modeling of EC50 of aquatic toxicities for Daphnia magna. J. Toxicol. Environ. Health 72, 1181–1190.
Lin, L.I., 1992. Assay validation using the concordance correlation coefficient. Biometrics 48, 599–604.
Mitra, I., Saha, A., Roy, K., 2010. Exploring quantitative structure–activity relationship studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants. Mol. Simul. 36, 1067–1079.
Mosier, P.D., Jurs, P.C., 2002. QSAR/QSPR studies using probabilistic neural networks and generalized regression neural networks. J. Chem. Info. Comput. Sci. 42, 1460–1470.
Netzeva, T.I., Worth, A.P., Aldenberg, T., Benigni, R., Cronin, M.T.D., Gramatica, P., Jaworska, J.S., Kahn, S., Klopman, G., Marchant, C.A., Myatt, G., Nikolova-Jeliazkova, N., Patlewicz, G.Y., Perkins, R., Roberts, D.W., Schultz, T.W., Stanton, D.T., van de Sandt, J.J.M., Tong, W., Veith, G., Yang, C., 2005. Current status of methods for defining the applicability domain of (quantitative) structure–activity relationships. Altern. Lab. Anim. 33, 155–173.
OECD, Test No. 424: Neurotoxicity Study in Rodents, OECD Guidelines for Testing of Chemicals, Section 4: Health effects, OECD Publishing: Paris, France, 1997. doi: 10.1787/9789264071025-en.
OECD, Environment Health and Safety Publications Series on Testing and Assessment No. 69, Guidance Document On The Validation Of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models, 2007. Accessed from http://search.oecd.org/officialdocuments/displaydocumentpdf/?cote=env/jm/mono(2007)2&doclanguage=en.
Panaye, A., Fan, B.T., Doucet, J.P., Yao, X.J., Zhang, R.S., Liu, M.C., Hu, Z.D., 2006. Quantitative structure-toxicity relationships (QSTRs): a comparative study of various nonlinear methods general regression neural network, radial basis
function neural network and support vector machine in predicting toxicity of nitro- and cyano-aromatics to Tetrahymena pyriformis. SAR QSAR Environ. Res. 17, 75–91.
Parzen, E., 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.
Rücker, C., Rücker, G., Meringer, M., 2007. Y-Randomization and its variants in QSPR/QSAR. J. Chem. Info. Comput. Sci. 47, 2345–2357.
REACH - European Community Regulation on chemicals and their safe use. Available online: http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm.
Reitermanova, Z., 2010. Data splitting. WDS'10 Proceedings of Contributed Papers, Part 1, 31–36.
Roy, K., Chakraborty, P., Mitra, I., Ojha, P.K., Kar, S., Das, R.N., 2013. Some case studies on application of "rm2" metrics for judging quality of quantitative structure–activity relationship predictions: Emphasis on scaling of response data. J. Comput. Chem. 34, 1071–1082.
Samat, N.H.A., Abdualkader, A.M., Mohamed, F., Abdullahi, A.D., 2014. Group-based quantitative structural activity relationship analysis of B-cell lymphoma extra large (bcl-xl) inhibitors. Int. J. Pharm. Pharm. Sci. 6, 284–290.
Sawant, S.S., Topannavar, P.S., 2015. Introduction to probabilistic neural network-used for image classifications. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 5, 279–283.
Schuurmann, G., Ebert, R., Chen, J., Wang, B., Kuhne, R., 2008. External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean. J. Chem. Info. Model. 48, 2140–2145.
Sharma, M.C., Sharma, S., Sahu, N.K., Kohli, D.V., 2013. QSAR studies of some substituted imidazolinones angiotensin II receptor antagonists using Partial Least Squares Regression (PLSR) method based feature selection. J. Saudi Chem. Soc. 17, 219–225.
Shi, L.M., Fang, H., Tong, W., Wu, J., Perkins, R., Blair, R.M., Branham, W.S., Dial, S.L., Moland, C.L., Sheehan, D.M., 2001. QSAR models using a large diverse set of estrogens. J. Chem. Info. Comput. Sci. 41, 186–195.
Singh, K.P., Gupta, S., Rai, P., 2013. Predicting carcinogenicity of diverse chemicals using probabilistic neural network modeling approaches. Toxicol. Appl. Pharmacol. 272, 465–475.
Singh, K.P., Gupta, S., Basant, N., Mohan, D., 2014. QSTR modeling for qualitative and quantitative toxicity predictions of diverse chemical pesticides in honey bee for regulatory purposes. Chem. Res. Toxicol. 27, 1504–1515.
Singh, K.P., Gupta, S., Basant, N., 2015. In silico prediction of cellular permeability of diverse chemicals using qualitative and quantitative SAR modeling approaches. Chemom. Intell. Lab. Syst. 140, 61–72.
Stewart, J.J., 2013. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J. Mol. Model. 19, 1–32.
Tropsha, A., Golbraikh, A., Cho, W.J., 2011. Development of kNN QSAR models for 3-arylisoquinoline antitumor agents. Bull. Korean Chem. Soc. 32, 2397–2404.
Walzack, B., Massart, D.L., 2000. Local modeling with radial basis function networks. Chemom. Intell. Lab. Syst. 50, 179–198.
