
Toxicology 517 (2025) 154230

Contents lists available at ScienceDirect

Toxicology
journal homepage: [Link]/locate/toxicol

AI-based toxicity prediction models using ToxCast data: Current status and
future directions for explainable models
Donghyeon Kim 1, Jinhee Choi *
School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea

ARTICLE INFO

Handling Editor: Dr. Mathieu Vinken

Keywords: ToxCast; Artificial intelligence; Toxicity prediction; Next generation risk assessment

ABSTRACT

Artificial intelligence (AI) offers new opportunities for developing toxicity prediction models to screen environmental chemicals. U.S. EPA’s ToxCast program provides one of the largest toxicological databases and has consequently become the most widely used data source for developing AI-driven models. In this review, we analyzed 93 peer-reviewed papers published since 2015 to provide an overview of ToxCast data-based AI models. We overviewed the current landscape in terms of database structure, target endpoints, molecular representations, and learning algorithms. Most models focus on data-rich endpoints and organ-specific toxicity mechanisms, particularly endocrine disruption and hepatotoxicity. While conventional molecular fingerprints and descriptors are still common, recent studies employ alternative representations (graphs, images, and text), leveraging advances in deep learning. Likewise, traditional supervised machine-learning algorithms remain prevalent, but newer work increasingly adopts semi- and unsupervised approaches to tackle data-sparsity challenges. Beyond classical structure-based QSAR, ToxCast data are also being used as biological features to predict in vivo toxicity. We conclude by discussing current limitations and future directions for applying ToxCast-based AI models to accelerate next-generation risk assessment (NGRA).

1. Background

Next Generation Risk Assessment (NGRA) is described as an exposure-led, hypothesis-driven approach that incorporates in silico, in chemico, and in vitro methods (Dent et al., 2018). New approach methodologies (NAMs) can be used individually or together to enhance chemical safety assessment through more protective and relevant models, thereby reducing the need for animal testing (Punt et al., 2020; Stucki et al., 2022). Although animals are still heavily relied upon or legally required for safety assessments in some sectors, the scientific community is increasingly adopting the 3Rs (replacement, reduction, and refinement of animal use in research) not just for ethical reasons (Sewell et al., 2024). Adopting NAMs addresses significant animal welfare issues while also offering the potential for substantial scientific advancements and, in some instances, economic benefits. Computational toxicology, in particular, is poised for further innovation through the application of big data-driven artificial intelligence (AI) technology (Raslan et al., 2023; Rusyn and Daston, 2010). Today, AI is set to transform socioeconomic lifestyles, driving innovations in both research and industry (Jebali et al., 2024; Qin et al., 2023). Computational toxicology aims to provide insights for drug discovery and prioritize environmental chemicals for risk assessment, integrating the latest advances in life sciences, including molecular biology and computational modeling (Perkins et al., 2003). Various computational techniques and databases are being actively developed to support these goals (Cavasotto and Scardino, 2022; Liu et al., 2023a,b).

In the toxicology field, most toxicity prediction models rely on quantitative structure activity relationship (QSAR) principles, the concept of inferring the relationship between a chemical structure and its biological activity, including toxicity (Cherkasov et al., 2014; Gadaleta et al., 2018). One of the most important factors in the development of a robust QSAR model is the availability of high-quality and sufficient data (Füzi et al., 2023). Thus, collecting and refining the training dataset is the most important step in modeling. Our previous review on artificial intelligence (AI)-based toxicity prediction models highlights the ToxCast data as the preeminent choice among researchers in the pursuit of developing QSAR-based prediction models (Jeong and Choi, 2022). The data were produced under the Toxicity Forecasting (ToxCast) program,

* Corresponding author.
E-mail address: jinhchoi@[Link] (J. Choi).
1 [Link]/0000-0002-3860-0648

[Link]
Received 11 May 2025; Received in revised form 24 June 2025; Accepted 4 July 2025
Available online 9 July 2025
0300-483X/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license ([Link]
nc-nd/4.0/).

launched by the US Environmental Protection Agency (EPA) in 2007, to produce toxicity data using high-throughput in vitro assays (Jeong et al., 2022a; Richard et al., 2016). The preference for ToxCast data in developing AI models is attributed to two main factors: the large quantity of data and the high degree of homogeneity resulting from standardized data processing (Feshuk et al., 2023). ToxCast data contain publicly available in vitro screening results for thousands of chemicals across diverse assay platforms (Filer et al., 2017; Judson et al., 2014; Richard et al., 2016), providing a substantial amount of data for model development. Although these data originate from various heterogeneous assay platforms and targets, the standardized ToxCast pipeline systematically processes and integrates the results, yielding a homogeneous dataset suitable for toxicity prediction modeling. Thus, compared to other toxicity datasets lacking such systematic processing, the ToxCast data output demonstrates higher homogeneity, facilitating more consistent and reliable model development (Jeong et al., 2022a).

Meanwhile, our interest lies in exploring the potential of ToxCast data-based models in developing explainable AI models. Explainable AI can be classified into intrinsic explainability and post-hoc explainability, and it can be achieved before or after model training (Jia et al., 2023). Post-hoc interpretability is achieved after obtaining a trained model. The goal of post-hoc methods is to understand the model predictions based on the training data. Many studies propose post-hoc methods, such as Shapley Additive exPlanations (SHAP), to explain the predictive results by identifying important model features that impact the model’s performance (Hassija et al., 2024). However, although this approach can identify statistical relationships between features and labels in the training dataset, it is inherently limited in explaining the prediction results within conventional QSAR modeling, because there is little prior knowledge linking chemical moieties and physicochemical properties to apical toxicity. The OECD QSAR framework also includes mechanistic interpretation, where possible, as an important factor in QSAR modeling (Tomoko, 2023). Within this context, ToxCast data have the potential to be used in developing explainable AI models since they consist of thousands of bioassays covering a wide range of biological mechanisms.

In response, this study aimed to overview the current status of AI-based prediction models using ToxCast bioassay data. In this paper, AI methods refer to data-driven machine learning approaches, including both traditional algorithms (e.g., Random Forest, Support Vector Machine) and deep learning networks. We analyzed the study purposes, target endpoints, molecular representation methods, and novel algorithms used in 93 papers published since 2015. We also suggested the potential limitations and perspectives on using ToxCast data in developing explainable AI models for implementing NGRA.

2. Study characteristics

We analyzed annual publications indexed in PubMed ([Link] [Link]/). The keywords “ToxCast” or “Tox21” were searched in PubMed (accessed 4 February 2024), and papers related to QSAR-based AI models published since 2015 were selected. Our search returned 562 publications, of which 93 met the eligibility criteria and were included in the review (Lee et al., 2017) (Fig. 1(A)). Literature featuring AI toxicity prediction models utilizing ToxCast data, including both chemical library and bioactivity data, was chosen for in-depth analysis. For each of the 93 papers, we examined the research purpose, target endpoint, molecular representation, algorithms, and use of ToxCast data for in vivo toxicity prediction (Table S1).

3. Recent trend of studies on AI-based toxicity prediction models

In our previous review (Jeong and Choi, 2022), we analyzed the research trends of AI in the field of toxicity prediction by analyzing annual publications from 2014 to 2022. In that study, we confirmed that in vitro data-based models (51 papers) were studied more than in vivo data-based models (32 papers) and that only a few studies leveraged both in vitro and in vivo data for developing models (11 papers) (Fig. 2(A)). Of the in vivo data-based models, hepatotoxicity prediction models were the most studied (26 %), followed by acute toxicity prediction models (24 %) (Fig. 2(B)). For in vitro data-based models, many studies targeted various endpoints (31 %) rather than focusing on specific ones. Among the studies targeting specific endpoints, the most studied were endocrine disruption (21 %), followed by cytotoxicity (12 %) and hepatotoxicity-related molecular/cellular endpoints (10 %) (Fig. 2(C)). Hepatotoxicity, a major target in drug discovery, has relatively sufficient in vitro and in vivo data, leading to the use of both in vitro and in vivo data during model development (33 %) (Fig. 2(D)).

This research trend can be explained by the fact that AI-based models heavily rely on the quantity and quality of the dataset (Pushkaran and Arabi, 2024). This has resulted in a focus on developing models using a

Fig. 1. (A) Flow diagram for literature search. (B) Number of publications indexed by PubMed annually from 2015 to 2023 using search terms “ToxCast” or “Tox21” (blue) and curated literature featuring “ToxCast/Tox21 data-based AI” (orange).


Fig. 2. Recent trends in studies on AI-based toxicity prediction models (Adapted from the study by Jeong et al., 2022) (A) Overview of the studies regarding the data
types used and a summary of the targeted toxicity endpoints in studies using (B) in vivo data-based models, (C) in vitro data-based models, and (D) combined in vitro
and in vivo data-based models.

few data-rich endpoints and datasets. Moreover, this trend poses several challenges with respect to applying these models in chemical screening. Although in vitro data-based models can achieve high performance due to their sufficient datasets compared to in vivo data-based models, they may not be directly used in chemical screening since they are not directly related to regulatory endpoints. Meanwhile, in vivo data-based models have limitations regarding low explainability, which constrains their regulatory use. Therefore, the current trend of developing in vitro and in vivo data-based models separately has limitations for being adopted in chemical screening.

4. Overview of ToxCast data-based AI models

To understand the research trends in AI-based toxicity prediction models using ToxCast data, we analyzed the annual number of publications (Fig. 1(B)) and research purposes of the studies (Fig. 3). The number of studies focusing on ToxCast experienced a gradual increase from 2015 to 2023, nearly tripling during this period. This growth trend

Fig. 3. Overview of the research purposes of studies on ToxCast data-based prediction models.


is similarly reflected in papers specifically addressing toxicity prediction models utilizing ToxCast data. In 2021, approximately 25 % of all papers pertained to AI applications based on ToxCast data.

Because the ToxCast database contains information on a broad range of toxicological targets, it is unique in serving both as features and as labels for AI models. So far, the majority of the studies (85 papers) have utilized ToxCast data as labels, with a smaller portion (8 papers) employing ToxCast data as features. Among the studies using ToxCast data as labels, the papers typically focused on model development (63 papers), aiming to propose novel algorithms (35 papers), molecular representations (19 papers), data balancing (4 papers), and others (5 papers). In contrast, a smaller portion explored the application of these models (22 papers), including chemical screening (13 papers), mechanism analysis (5 papers), and chemotype enrichment (4 papers). Meanwhile, among the studies that used ToxCast data as model features for predicting in vivo toxicity, the initial papers compared prediction performance with models using only chemical structure information (3 papers). More recently, studies have been published not only for model development but also for identifying important mechanisms (5 papers).

4.1. Landscape of ToxCast database and target toxicity endpoints

ToxCast data, generated by labs and processed by EPA through the pipeline, can be downloaded from the EPA website ([Link] gov/comptox-tools/exploring-toxcast-data). The most recent ToxCast data are available in the invitroDBv4.2 database (accessed in June 2025). The data include chemical information, bioassay information, and summary information about the model and the data generated through its own analytical pipeline (Feshuk et al., 2023; Filer et al., 2017). AC50 or hit-call data are mainly used when working with ToxCast, but information on the assay should be considered for mechanism-based toxicity evaluation and classification of chemicals (Jeong et al., 2022a). The assay summary file provides various types of information for each assay endpoint, such as assay ID, assay source, organism or cell model used, assay component, detection technology, target information, citation information, and reagent information. In invitroDBv4.1, there are 1485 assay endpoints from 24 assay sources (Table S2) (Fig. 4(A)). The number of tested chemicals per assay endpoint varied from 1 to 8305 (Fig. 4(B)). Because assay platform characteristics, such as the species used and the exposure time, vary across assay sources, it is necessary to check which assay platform is used for each assay. Among the assay sources, Tox21 provided the most assays (n = 276), followed by NVS (n = 275) (Fig. 4(A)). Tox21 assays also had the most tested chemicals on average (n = 6757) (Fig. 4(B)). Most Tox21 assays are cell-based reporters or viability assays, whereas NVS assays predominantly measure binding or enzymatic activity. Across all sources, test systems span human tissues as well as non-human models, including rat, mouse, guinea pig, sheep, rabbit, bovine, pig and chimpanzee.

Leveraging these ToxCast data, studies have focused on developing toxicity prediction models for diverse endpoints (Fig. 4(C)). In our analysis, most studies target various endpoints (45 papers) rather than one specific endpoint. Among studies focusing on specific toxicity endpoints, endocrine-disrupting endpoints were the most prevalent (20 papers), followed by hepatotoxicity (9 papers), developmental toxicity (4 papers), cytotoxicity (4 papers), neurotoxicity (2 papers), pulmonary toxicity (2 papers), and others. These endpoints have sufficient datasets or relatively well-defined mechanisms; thus, they are frequently used to develop toxicity prediction models.

Interestingly, most of the studies leveraged Tox21 assay data collected from MoleculeNet ([Link]) as a benchmark dataset to demonstrate the applicability of their proposed novel methods for molecular representation and learning (Wu et al., 2018). The Tox21 data used in MoleculeNet were generated in the 2014 Tox21 Data Challenge ([Link]). This dataset contains qualitative toxicity measurements for 8014 chemicals on 12 different targets, including nuclear receptors and stress response pathways. This trend may be due to the easily accessible data format of the Tox21 Data Challenge for experts from non-toxicology fields (e.g., chemistry or computer science). Therefore, to adequately adopt emerging solutions in the toxicology

Fig. 4. Landscape of the latest ToxCast/Tox21 database and target endpoint of the studies. (A) Number of ToxCast/Tox21 assays for each assay source and (B)
Average number of tested chemicals for each assay source and (C) Target assay endpoints of the studies. Note that the reported “average chemicals per assay source”
is obtained by taking the mean across all assays within each source, regardless of their underlying platform differences. Consequently, sources with just a handful of
assays are treated the same way as those with dozens of assays.
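The per-source statistics described in the note to Fig. 4 can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record layout, the endpoint names, and all counts are invented and are not taken from invitroDB. It groups assay endpoints by assay source, counts them (as in panel A), and takes an unweighted mean of the tested-chemical counts within each source (as in panel B).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical assay-summary records:
# (assay_endpoint, assay_source, n_tested_chemicals).
# Names and counts are made up for illustration; they are NOT invitroDB values.
ASSAY_SUMMARY = [
    ("EXAMPLE_TOX21_A", "TOX21", 8000),
    ("EXAMPLE_TOX21_B", "TOX21", 6000),
    ("EXAMPLE_NVS_A", "NVS", 400),
    ("EXAMPLE_NVS_B", "NVS", 200),
    ("EXAMPLE_NVS_C", "NVS", 300),
    ("EXAMPLE_OTHER_A", "OTHER", 50),
]


def summarize_by_source(records):
    """Return {source: (number_of_assays, mean_tested_chemicals)}."""
    counts_by_source = defaultdict(list)
    for _endpoint, source, n_chemicals in records:
        counts_by_source[source].append(n_chemicals)
    # Unweighted mean across the assays of one source: a source with a
    # single assay is summarized the same way as one with many assays.
    return {
        source: (len(counts), mean(counts))
        for source, counts in counts_by_source.items()
    }


summary = summarize_by_source(ASSAY_SUMMARY)
for source, (n_assays, avg_chemicals) in sorted(summary.items()):
    print(f"{source}: {n_assays} assays, {avg_chemicals:.0f} chemicals on average")
```

Because the mean is taken per source without weighting, a source represented by one small assay and a source represented by hundreds of large assays each contribute a single summary value, which is the caveat the figure note raises.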


field, it is important to harmonize toxicology datasets like the Tox21 assay dataset. It is also necessary to consider how to apply these advanced models in chemical screening within the toxicological field.

4.2. Molecular representations

Initial QSAR models used predefined molecular features as model input, necessitating the conversion of chemical structure or characteristics into computer-readable numeric vectors (Belfield et al., 2023). This conversion is achieved through molecular fingerprints (MFs) and molecular descriptors (MDs) (Orosz et al., 2022). In our analysis, Extended-Connectivity Fingerprints (ECFPs) (Rogers and Hahn, 2010) (13 papers) and Mold2 (Hong et al., 2008) (5 papers) were the most frequently used MFs and MDs, respectively (Figure S1).

More recently, with the advancement of deep learning methods capable of processing unstructured data, diverse novel molecular representation methods have been proposed (Fig. 5) (Choudhary et al., 2022; Gebauer et al., 2022). Since molecules consist of atoms and the bonds connecting them, Graph Neural Network (GNN)-based models are well-suited for learning the structure of molecules. Additionally, two-dimensional (2D) and three-dimensional (3D) images of molecules have been used to train models. Fernandez et al. processed 2D sketches of molecules with a supervised 2D convolutional neural network (2DConvNet) and demonstrated that modern image recognition technology yields high accuracy comparable to state-of-the-art cheminformatics tools (Fernandez et al., 2018). Matsuzaka et al. developed a novel molecular image-based deep learning method, DeepSnap-DL, which produced multiple snapshots from 3D chemical structures and achieved high performance in predicting Tox21 data (Matsuzaka and Uesawa, 2019). Meanwhile, molecules can also be represented as strings, ensuring consistency irrespective of the sequential atom representation. Common string representations include the International Chemical Identifier (InChI) (Heller et al., 2015) and the Simplified Molecular Input Line Entry System (SMILES) (Islam and Pillay, 2016), among others. These provide a consistent and computationally conducive format for molecular structure analysis. By treating these string representations as sequences, molecular property prediction can be approached as a natural language processing (NLP) task. Liu et al. proposed a model leveraging both SMILES strings and graph structures of molecules to extract drug molecular features for toxicity prediction (Liu et al., 2023a,b).

Although deep learning-based feature extraction and representation learning can improve model predictivity, these methods still face a critical challenge with low explainability. In contrast, ‘features’ in conventional machine learning can be designed to be intuitive and closely related to toxicity datasets. This intuitive design facilitates explaining the model’s decisions to non-experts, as these features correlate directly with familiar toxicity datasets. Consequently, domain-specific feature engineering in toxicology remains important despite the recent trend of developing novel representation learning methods. In this study, we broadly categorized model features into chemistry-backed and biological feature-backed categories. Among chemistry-backed feature-based models, including two-dimensional (2D) graphs and three-dimensional (3D) images, Sedykh et al. developed Saagar, a new, extensible set of molecular substructures, as a molecular fingerprint that can explain prediction results with meaningful substructures (Sedykh et al., 2021). Other studies also attempted to leverage molecular descriptors with chemistry-backed knowledge such as quantum chemistry information, fragmentation spectra of chemicals, 3D molecular surface point clouds with electrostatic potential, and others (Arturi and Hollender, 2023; Wang et al., 2021a,b; Wang et al., 2023).

Moreover, to overcome limitations of approaches solely relying on cheminformatics to predict in vivo toxicity, various biological descriptors relevant to toxicity mechanisms have been developed. Interestingly, Seal et al. compared cellular morphological descriptors and molecular fingerprints for predicting cytotoxicity- and proliferation-related assays, finding that cell morphological descriptors offer complementary information to molecular fingerprints, especially in novel structural space (Seal et al., 2021). Additionally, biological descriptors not only enhance toxicity prediction accuracy but can also provide mechanistic insights into relevant biological pathways (Liu et al., 2023a,b; Ring et al., 2021a). Likewise, recent studies highlight the importance of selecting optimal model features reflecting the target endpoint, rather than using all available features without considering mechanisms.

Fig. 5. Summary of novel representation learning methods in ToxCast/Tox21 data-based models.
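To make the "SMILES as a sequence" idea from Section 4.2 concrete, the sketch below tokenizes SMILES strings with a regular expression and maps the tokens to padded integer ids, which is the usual first step before feeding a string representation to a sequence model. It is a minimal illustration rather than the preprocessing used by any of the reviewed studies: the token pattern is a simplified assumption that covers common organic SMILES only, and the function names and the `<pad>` token are hypothetical.

```python
import re

# Simplified token pattern (an assumption, not a full SMILES grammar).
# Multi-character tokens (bracket atoms, Br, Cl, two-digit ring closures)
# are listed before the single-character alternatives so they match first.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.:@*~$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign an integer id to each token in order of first appearance."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab[token] for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


smiles_list = ["CC(=O)Oc1ccccc1C(=O)O", "C[N+](C)(C)CCl"]
token_lists = [tokenize_smiles(s) for s in smiles_list]
vocab = build_vocab(token_lists)
encoded = [encode(tokens, vocab, max_len=24) for tokens in token_lists]
```

Keeping bracket atoms such as `[N+]` and two-letter elements such as `Cl` as single tokens preserves chemically meaningful units; a character-level split would instead break them into pieces the model has to reassemble.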


4.3. Model algorithms

So far, various algorithms have been employed in toxicity prediction model development using the ToxCast dataset. Machine learning methods like random forest (RF, 32 papers) and support vector machine (SVM, 29 papers) are prevalent, while neural networks and deep learning, including deep neural networks (DNNs, 6 papers) and artificial neural networks (ANNs, 6 papers), are gaining traction (Figure S2). In more recent years (2021–2024), novel deep learning methods have been studied and evaluated for their applicability to the toxicological field using the Tox21 assay dataset. These deep learning models can be broadly categorized as supervised learning, (semi) self-supervised learning, transfer learning, meta-learning, and hybrid learning (Table 1).

Beyond the conventional supervised learning models, self-supervised learning, which has seen great success in NLP (Devlin et al., 2018), has been applied to graph-structured data (Hu et al., 2019). These techniques are designed to leverage large amounts of unlabeled data, with self-supervised learning creating labels from the data itself and semi-supervised learning combining a small amount of labeled data with a large amount of unlabeled data. This can improve model performance in data-scarce scenarios. Google recently introduced the ‘Noisy Student’ self-training method, where a simple teacher model trains increasingly complex student models (Xie et al., 2019). These models demonstrated superior performance and robustness compared to standard convolutional neural networks (CNNs) in their experiments. In practice, Liu et al. designed a combined self-training and self-supervised learning strategy and found that it could improve performance by a large margin in both cases on the Tox21 dataset (Liu et al., 2022a). Therefore, these methods may also be adopted for modeling diverse endpoints that have limited labels.

Transfer learning involves taking a model pre-trained on a large dataset and fine-tuning it on a smaller, task-specific dataset. This approach is particularly useful when limited data are available for the target datasets (Cai et al., 2020). Hu et al. found that the transfer learning algorithm could improve model performance for skin sensitization using in vivo acute toxicity data and in vitro Tox21 data (Hu et al., 2023). Meta-learning, often referred to as ‘learning to learn’, enables models to quickly adapt to new tasks with minimal data by leveraging knowledge gained from similar tasks (Tian et al., 2022). Vella et al. explored few-shot machine learning for hit discovery and demonstrated that few-shot learning models on Tox21 data outperform benchmark

Table 1
Studies proposing novel learning methods for modelling AI models and applying them to the Tox21 dataset.

| Category | Aim of study (Finding) | ToxCast data: Assay | ToxCast data: Source | Features | Model algorithms | Performance (mean value) | Ref |
|---|---|---|---|---|---|---|---|
| (Semi) Self-supervised learning | To develop a self-training method, Partially LAbeled Noisy Student (PLANS), and a novel self-supervised graph embedding, Graph-Isomorphism-Network Fingerprint (GINFP), for chemical compounds representations with substructure information using unlabeled data. | Tox21 | MoleculeNet | Graph-Isomorphism-Network Fingerprint (GINFP) | Partially LAbeled Noisy Student (PLANS) | F1 = 0.25 | (Liu et al., 2022b) |
| (Semi) Self-supervised learning | To propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and trained the network by the Mean Teacher (MT) SSL algorithm | Tox21 | MoleculeNet | Graph | Mean Teacher (MT) SSL algorithm | N/A | (Chen et al., 2021) |
| Transfer learning | To provide comprehensive in silico prediction models for eight significant human organ level toxicity end points using machine learning, deep learning, and transfer learning algorithms. | Tox21 | Tox21 data challenge | Morgan, Graph | COVIDVS | AUC = 0.63 | (Hu et al., 2023) |
| Meta learning | To explore few-shot machine learning for hit discovery and lead optimization | Tox21 | DeepChem | ECFP, GCN-based embeddings | Prototypical Networks | AUC = 0.83 | (Vella and Ebejer, 2023) |
| Meta learning | To propose a novel Adaptive Transfer framework of GNN for FSMPP, called ATGNN, which transfers the knowledge of pretrained and finetuned GNNs in a task-adaptive manner to adapt novel properties. | Tox21 | MoleculeNet | Graph | Adaptive Transfer framework of GNN for FSMPP (ATGNN) | AUC = 0.86 | (Zhang et al., 2023b) |
| Multi-tasking learning | To propose a multitask deep learning framework called BiLAT based on SMILES representation | Tox21 | MoleculeNet | SMILES | BiLAT | AUC = 0.89 | (Qian et al., 2023) |
| Multi-tasking learning | To construct three types of models for single and multi-tasking based on 2D and 3D descriptors, fingerprints and molecular graphs | Tox21 | Tox21 data challenge | MACCS, ECFP4, Rdkit2D, Mordred3D, Graph | Multi-tasking DNN, EGCN | AUC = 0.90, F1 = 0.57 | (Yuan Li et al., 2023) |
| Multi-tasking learning | To integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder, together with self-attention to better capture local and global molecular information simultaneously. | Tox21 | MoleculeNet | SMILES | TranGRU | AUC = 0.79 | (Jiang et al., 2023) |
| Multi-tasking learning | To develop a new, multitask framework based on a capsule neural network (multitask CapsNet) | Tox21 | Tox21 data challenge | MACCS, 13 molecular properties | Multitask CapsNet | AUC = 0.86 | (Wang et al., 2021a,b) |
| Multi-tasking learning | To compare the performance of 3 multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR) | Tox21 | Tox21 data challenge | CDK, PaDEL | Three multi-tasking models (Classifier Chains, Label Powerset, and Stacking) | N/A | (Yap and Raymer, 2021) |
| Multi-tasking learning | To introduce an alternative framework (multi-tasking model) by using the problem transformation methods. | Tox21 | Tox21 data challenge | ECFP | Stacked single target (SST) multitask architecture with SVM | AUC = 0.78 | (Tan et al., 2021) |
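Table 1 reports performance as mean F1 or AUC values. As a rough illustration of why the choice of metric matters for toxicity data, where inactive chemicals usually dominate, the sketch below computes several threshold-based metrics from binary predictions. The labels and predictions are invented for illustration and do not come from any study in the table; AUC is omitted because it requires continuous scores rather than binary calls.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = toxic)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, balanced accuracy and F1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


# Invented example: 10 toxic and 90 non-toxic chemicals. The hypothetical
# model finds only 2 of the 10 toxic chemicals and raises 1 false alarm.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 2 + [0] * 8 + [1] * 1 + [0] * 89

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the accuracy is high because most chemicals are inactive, while sensitivity, balanced accuracy, and F1 expose that most toxic chemicals are missed, which is the false-negative concern relevant to conservative risk assessment.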


models significantly (Vella and Ebejer, 2023). Similarly, Zhang et al. proposed a novel adaptive transfer framework for GNNs for few-shot prediction, which transfers the knowledge of pretrained and fine-tuned GNNs in a task-adaptive manner to adapt to novel properties (Zhang et al., 2023a). In that study, experimental results on ToxCast datasets showed that the models obtained superior performance over previous state-of-the-art methods. Both transfer learning and meta-learning are designed to increase model accuracy, even with a small amount of data. Given that toxicity datasets are often limited in size, these methods can be useful for developing robust toxicity prediction models.

Moreover, much research is currently being conducted to develop multi-tasking models regardless of specific learning methods. Multi-task learning, which models multiple toxicity datasets simultaneously, has proven to enhance performance by sharing modeling architectures among different toxicity endpoints (Sosnin et al., 2019; Wu et al., 2021). These methods are also often integrated, such as using a multi-tasking model with transfer learning (Mehmood et al., 2020). Qian et al. proposed a multi-task deep learning framework based on SMILES representation for predicting the inhibitory activity of molecules on eight cyclin-dependent protein kinase (CDK) subtypes (Qian et al., 2023). Similarly, Wang et al. developed a new multi-task framework based on a capsule neural network to predict 12 different bioactivities in Tox21 assays simultaneously (Wang et al., 2021a,b). Both studies demonstrated that multi-task learning can improve model performance more efficiently than conventional single-task methods. Given that toxicity endpoints are diverse when predicting the toxicity of chemicals, well-performing multi-tasking models can be useful for the screening of chemicals.

However, note that these deep learning methods have only been demonstrated with Tox21 assays, which provide the largest amount of data among the publicly available datasets in the field of toxicology. Even though these novel methods showed great feasibility with the Tox21 dataset, given that most toxicity datasets are much smaller than the Tox21 assay dataset, their potential for adoption with other toxicity datasets is still unclear. Another aspect hindering the adoption of novel methods developed with the Tox21 dataset relates to the performance metrics used in studies. Most toxicity data have an overwhelming number of negative cases, and reducing the false-negative ratio is the most important consideration from a conservative risk assessment perspective. Therefore, when addressing highly imbalanced datasets with predominantly non-toxic chemicals, it is essential to evaluate the predictivity of

5. Leveraging ToxCast data for explainable AI models

Interestingly, ToxCast data can be used not only as labels but also as features for AI-based prediction models. This is because bioactivity at the molecular/cellular level can serve as an intermediate step in inducing in vivo toxicity. Several studies have focused on using ToxCast data to predict toxicity outcomes observed in vivo (Fig. 5). In these studies, hepatotoxicity was the most targeted endpoint (4 papers), followed by repeated dose toxicity (3 papers), and acute toxicity (1 paper). All studies using repeated dose toxicity data also focused on hepatotoxicity, which may be related to the mechanistic targets, such as hepatotoxicity marker gene activity and liver cell death, covered in ToxCast bioassays.

However, the studies reported different results regarding whether ToxCast data can improve model performance for predicting in vivo toxicity. For example, Liu et al. (2015, 2017) and Allen et al. (2019) reported that hybrid classifiers using both chemical and biological descriptors achieved higher balanced accuracy. In contrast, several studies reported that adding biological descriptors did not significantly increase model performance or worsened it (Nukaga et al., 2023; Tate et al., 2024; Truong et al., 2018). Xu et al. (2020) and Adeluwa et al. (2021) reported that biological descriptors slightly improved performance or had nearly equal success. Several reasons partly explain these results, including limited biological space in ToxCast assays and a lack of consideration of kinetics for the full chemical library. Interestingly, Ring et al. (2021b) reported that incorporating toxicokinetic information enabled different sets of in vitro data to predict more accurately the doses at which in vivo liver effects occur.

These findings suggest that it is important to leverage significant features rather than all available features without considering mechanisms. Moreover, these approaches can be used not only to develop toxicity prediction models but also to identify significant molecular initiating events (MIEs) leading to adverse outcomes. For example, in 2015, Liu et al. built machine learning models that predict in vivo chronic toxicity observed in liver based on either chemical descriptors, ToxCast bioactivity descriptors, or a combination of both (Liu et al., 2015). This study was extended in 2017 to 19 other organs (Liu et al., 2017). They extracted the in vitro ToxCast assays expected to be the most correlated with target organ toxicity. These selected ToxCast assays, serving as significant MIEs, can improve sensitivity for predicting toxicity of chemicals in the early stage of assessment (Nukaga et al., 2023) and ultimately contribute to mechanism-based chemical assessment.
the model using the F1 score. The F1 score is a harmonic mean of recall 6. Perspectives of ToxCast data for developing AI models and
and precision, meaning false negatives (FN) and false positives (FP) their application to NGRA
should be low for a high F1 score (Figure S3). On the other hand, AUC is
a mean of the true positive rate and true negative rate. In an imbalanced The key limitation of current QSAR models are ’black boxes’ natures,
dataset where most cases are negative, the true negative rate, TN/ making interpretation challenging for toxicologists and hindering the
(TN+FP), stays stable even if FP are high, while the true positive rate regulatory acceptance of these models. The mechanisms leading to the
fluctuates. Therefore, AUC cannot detect cases where only FN is low onset of apical toxicity are complex, and in the absence of process evi­
(Bae et al., 2021). (Note that even though the exact definition of AUC dence, it is difficult to trust the results, which may be, in the worst case, a
(ROC AUC) is the area under the curve between the true positive rate mere coincidence. In the field of toxicology, it is more valuable to
and the true negative rate given a range of thresholds, the described develop a model with a reliable scientific basis than one with good
formula is used as an approximation due to some models not being able performance. To address this problem, AOP framework can be used to
to compute the prediction probability. In this case, AUC is the same as improve the intrinsic explainability of models by linking mechanistic
balanced accuracy). targets to regulatory endpoints. In particular, ToxCast data provides
However, in our analysis, among the 32 papers utilizing the Tox21 abundant toxicological targets, thus they can be effectively incorporated
benchmark dataset from 2016 to 2024, only 4 papers (12.5 %) reported to AOP framework. As described in Section 5, if we use ToxCast bioac­
the F1 score of models, while 28 papers (87.5 %) did not report the F1 tivity data, particularly those corresponding to target AOPs as model
score of models (Figure S4). For the studies reporting the F1 score for features, we can build explainable AI models that address the black box
each model, the highest F1 score was 0.573 (mean value). Therefore, problem of conventional QSAR models (Fig. 6). Therefore, we strongly
additional case studies should be conducted that adopt these novel believe that ToxCast data have significant potential to be used to
methods on different datasets and evaluate them using adequate per­ develop robust explainable AI models considering both predictivity and
formance metrics. explainability. Despite the potential of ToxCast data to develop
explainable AI models, there are still limitations to achieving this pur­
pose. Here, we have outlined several key considerations for future
studies.
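As a worked illustration of the evaluation-metric argument made earlier in this review (F1 score versus the single-threshold approximation of AUC, which equals balanced accuracy), the following minimal Python sketch computes both from one confusion matrix. The counts are hypothetical and were chosen only to mimic an imbalanced screening set (50 toxic chemicals out of 1000); they are not taken from any study reviewed here, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (toxic = positive class)."""
    tp: int  # toxic chemicals predicted toxic
    fn: int  # toxic chemicals predicted non-toxic
    fp: int  # non-toxic chemicals predicted toxic
    tn: int  # non-toxic chemicals predicted non-toxic


def recall(c: Confusion) -> float:
    """True positive rate, TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True negative rate, TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c: Confusion) -> float:
    """Mean of the true positive rate and the true negative rate."""
    return (recall(c) + specificity(c)) / 2


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denominator if denominator else 0.0


# Hypothetical, illustrative counts: 50 toxic and 950 non-toxic chemicals.
toy = Confusion(tp=25, fn=25, fp=95, tn=855)

print(f"balanced accuracy = {balanced_accuracy(toy):.3f}")  # 0.700
print(f"F1 score          = {f1_score(toy):.3f}")           # 0.294
```

With these assumed counts, balanced accuracy is 0.70 while the F1 score is about 0.29: the 95 false positives barely move the true negative rate (855/950 = 0.90) but reduce precision to 25/120, which is the behaviour the text attributes to imbalanced datasets.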


Fig. 6. Framework to apply the AOP-based explainable AI models for mechanism-based toxicity prediction supporting next generation risk assessment.

6.1. Development of AI models using ToxCast data

Firstly, the validity of ToxCast data remains limited, particularly in terms of their correlation with in vivo toxicity results. Inherently, the application of in vitro test systems in toxicology has been hampered by the inability to translate perturbations at the molecular level to possible tissue-, organ-, and organism-level effects. To address these issues, many studies have tried to profile ToxCast bioassays against traditional animal data such as the ToxRef database (Martin et al., 2009; Watford et al., 2019). For example, Eytcheson et al. (2023) assessed the utility of thyroid in vitro screening assays from ToxCast by comparing them with observed impacts from in vivo tests. They observed concordance between in vitro bioactivity and in vivo thyroid impacts ranging from 58 % to 78 %. However, other studies suggested that results from a single ToxCast assay are not sufficient to predict in vivo toxicity and did not show consistent concordance with in vivo toxicity data (Phifer et al., 2021; Silva et al., 2015; Thomas et al., 2012). These results may be explained partly by the fact that ToxCast assays provide inadequate coverage of biological targets and pathways, as well as reduced or distinct xenobiotic metabolism compared to in vivo studies (Thomas et al., 2019). Such conflicting research results make it difficult to draw definitive conclusions about in vitro–in vivo consistency. Therefore, when constructing predictive models with ToxCast data, it is essential to select the significant assays to be applied.

Secondly, by integrating ToxCast bioassays with other high-throughput screening databases, we can gain a more comprehensive, mechanistic understanding of chemical toxicity. For example, efforts are currently being made to develop high-throughput phenotypic effect screening methods, such as high-throughput phenotypic profiling (HTPP) (Thomas et al., 2019). Beyond leveraging ToxCast data alone, we also emphasize the need to integrate heterogeneous datasets into AOPs. In our previous study, we presented an integrative data-mining approach to building AOPs by incorporating multiple heterogeneous databases (Jeong et al., 2022b). Similarly, we suggest that continued efforts should be made to improve methodologies for integrating overlapping data for building robust models.

6.2. Application of ToxCast data-based models to NGRA

In addition, there is a limited number of models that consider the quantitative bioactive concentrations of ToxCast assays. To accelerate the use of in vitro bioactivity data in NGRA, it is necessary to move beyond binary hit-calls and embrace richer dose-response metrics such as AC50 (or ACC), maximum observed efficacy, curve slope, and the area under the curve (AUC), as well as measures of uncertainty around potency estimates. By leveraging an in vitro to in vivo extrapolation (IVIVE) approach (Chang et al., 2022; Jeong et al., 2022a), these endpoints can be used to calculate an equivalent in vivo administered dose based on the in vitro response concentration (Hines et al., 2022). These approaches enabled the derivation of a bioactivity-exposure ratio (BER) that could be used as a NAMs-based risk parameter in NGRA case studies (Baltazar et al., 2020; Lin and Lin, 2020; Moxon et al., 2020; Paul Friedman et al., 2020). However, until now, most studies using ToxCast data have relied only on hit-call results rather than quantitative bioactive concentrations (AC50, ACC). Equally important is the adoption of robust data-cleaning protocols, such as curve-fit quality control, outlier detection and removal, cytotoxicity filtering, normalization, and other related procedures. Since regression prediction is generally a more difficult task than binary classification, sufficient data are required for model training, but the individual ToxCast assay data available to date are insufficient to support this task. Therefore, further research is needed to address this problem.

Despite several challenges in using ToxCast data for developing AI models, ToxCast remains the largest toxicity database offering such a comprehensive dataset. By refining data to improve reliability and expanding its applicability through integration with other databases, we believe that ToxCast data can serve as a crucial foundation for developing explainable AI models. These ToxCast data-based explainable AI models have the potential to be used within the NGRA framework for predicting specific adverse outcomes or identifying threshold doses for systemic effects as points of departure in risk assessment. Because AOP-based explainable models outperform traditional QSAR approaches and yield explainable results, they hold tremendous promise for modernizing chemical risk assessments and minimizing reliance on animal testing.

CRediT authorship contribution statement

Donghyeon Kim: Methodology, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing. Jinhee Choi: Supervision, Conceptualization, Writing – original draft, Writing – review & editing.


Declaration of Competing Interest

The authors declare that there are no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Acknowledgment

This work was supported by Korea Environmental Industry & Technology Institute (KEITI) through ’Core Technology Development Project for Environmental Diseases Prevention and Management’ (2021003310005) and the 2025 Chemical Substance Safety Management Cooperation Course funded by the Ministry of Environment.

Supporting Information

The Supporting Information is available free of charge.

• Robust summary of the papers on ToxCast data-based AI models for toxicity prediction; Summary of assays by assay source in ToxCast INVITRODB_V4_1; Summary of the molecular representation methods used in the papers; Summary of the algorithms used in the papers; Confusion matrix and formula of F1 score and AUC; Performance measurement on AI models using Tox21 benchmark dataset (2016–2024) (XLSX).

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/[Link].2025.154230.

Data availability

Data will be made available on request.

References

Adeluwa, T., McGregor, B.A., Guo, K., Hur, J., 2021. Predicting drug-induced liver injury using machine learning on a diverse set of predictors. Front Pharmacol. 12. https://[Link]/10.3389/fphar.2021.648805.
Allen, C.H.G., Mervin, L.H., Mahmoud, S.Y., Bender, A., 2019. Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity. J. Cheminform. 11. [Link]
Arturi, K., Hollender, J., 2023. Machine learning-based hazard-driven prioritization of features in nontarget screening of environmental high-resolution mass spectrometry data. Environ. Sci. Technol. 57, 18067–18079. [Link] est.3c00304.
Bae, S.Y., Lee, J., Jeong, J., Lim, C., Choi, J., 2021. Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints. Comput. Toxicol. 20. [Link] comtox.2021.100178.
Baltazar, M.T., Cable, S., Carmichael, P.L., Cubberley, R., Cull, T., Delagrange, M., Dent, M.P., Hatherell, S., Houghton, J., Kukic, P., Li, H., Lee, M.Y., Malcomber, S., Middleton, A.M., Moxon, T.E., Nathanail, A.V., Nicol, B., Pendlington, R., Reynolds, G., Reynolds, J., White, A., Westmoreland, C., 2020. A next-generation risk assessment case study for coumarin in cosmetic products. Toxicol. Sci. 176, 236–252. [Link]
Belfield, S.J., Cronin, M.T.D., Enoch, S.J., Firman, J.W., 2023. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 18. [Link] 10.1371/[Link].0282924.
Cai, C., Wang, S., Xu, Y., Zhang, W., Tang, K., Ouyang, Q., Lai, L., Pei, J., 2020. Transfer learning for drug discovery. J. Med Chem. 63, 8683–8694. [Link] [Link].9b02147.
Cavasotto, C.N., Scardino, V., 2022. Machine learning toxicity prediction: latest advances by toxicity end point. ACS Omega. [Link]
Chang, X., Tan, Y.M., Allen, D.G., Bell, S., Brown, P.C., Browning, L., Ceger, P., Gearhart, J., Hakkinen, P.J., Kabadi, S.V., Kleinstreuer, N.C., Lumen, A., Matheson, J., Paini, A., Pangburn, H.A., Petersen, E.J., Reinke, E.N., Ribeiro, A.J.S., Sipes, N., Sweeney, L.M., Wambaugh, J.F., Wange, R., Wetmore, B.A., Mumtaz, M., 2022. IVIVE: facilitating the use of in vitro toxicity data in risk assessment and decision making. Toxics. [Link]
Chen, J., Si, Y.W., Un, C.W., Siu, S.W.I., 2021. Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network. J. Cheminform. 13. [Link]
Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I.I., Cronin, M., Dearden, J., Gramatica, P., Martin, Y.C., Todeschini, R., Consonni, V., Kuz’Min, V.E., Cramer, R., Benigni, R., Yang, C., Rathman, J., Terfloth, L., Gasteiger, J., Richard, A., Tropsha, A., 2014. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. [Link]
Choudhary, K., DeCost, B., Chen, C., Jain, A., Tavazza, F., Cohn, R., Park, C.W., Choudhary, A., Agrawal, A., Billinge, S.J.L., Holm, E., Ong, S.P., Wolverton, C., 2022. Recent advances and applications of deep learning methods in materials science. NPJ Comput. Mater. [Link]
Dent, M., Amaral, R.T., Da Silva, P.A., Ansell, J., Boisleve, F., Hatao, M., Hirose, A., Kasai, Y., Kern, P., Kreiling, R., Milstein, S., Montemayor, B., Oliveira, J., Richarz, A., Taalman, R., Vaillancourt, E., Verma, R., Posada, N.V.O.R.C., Weiss, C., Kojima, H., 2018. Principles underpinning the use of new methodologies in the risk assessment of cosmetic ingredients. Comput. Toxicol. 7, 20–26. [Link] comtox.2018.06.001.
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
Eytcheson, S.A., Olker, J.H., Friedman, K.P., Hornung, M.W., Degitz, S.J., 2023. Assessing utility of thyroid in vitro screening assays through comparisons to observed impacts in vivo. Regul. Toxicol. Pharmacol. [Link] yrtph.2023.105491.
Fernandez, M., Ban, F., Woo, G., Hsing, M., Yamazaki, T., Leblanc, E., Rennie, P.S., Welch, W.J., Cherkasov, A., 2018. Toxic colors: the use of deep learning for predicting toxicity of compounds merely from their graphic images. J. Chem. Inf. Model 58, 1533–1543. [Link]
Feshuk, M., Kolaczkowski, L., Dunham, K., Davidson-Fritz, S.E., Carstens, K.E., Brown, J., Judson, R.S., Paul Friedman, K., 2023. The ToxCast pipeline: updates to curve-fitting approaches and database structure. Front. Toxicol. 5. [Link] ftox.2023.1275980.
Filer, D.L., Kothiya, P., Woodrow Setzer, R., Judson, R.S., Martin, M.T., 2017. Tcpl: the ToxCast pipeline for high-throughput screening data. Bioinformatics 33, 618–620. [Link]
Füzi, B., Mathai, N., Kirchmair, J., Ecker, G.F., 2023. Toxicity prediction using target, interactome, and pathway profiles as descriptors. Toxicol. Lett. 381, 20–26. https://[Link]/10.1016/[Link].2023.04.005.
Gadaleta, D., Manganelli, S., Roncaglioni, A., Toma, C., Benfenati, E., Mombelli, E., 2018. QSAR modeling of ToxCast assays relevant to the molecular initiating events of AOPs leading to hepatic steatosis. J. Chem. Inf. Model 58, 1501–1517. [Link] org/10.1021/[Link].8b00297.
Gebauer, N.W.A., Gastegger, M., Hessmann, S.S.P., Müller, K.R., Schütt, K.T., 2022. Inverse design of 3d molecular structures with conditional generative neural networks. Nat. Commun. 13. [Link]
Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., Scardapane, S., Spinelli, I., Mahmud, M., Hussain, A., 2024. Interpreting black-box models: a review on explainable artificial intelligence. Cogn. Comput. [Link] s12559-023-10179-8.
Heller, S.R., McNaught, A., Pletnev, I., Stein, S., Tchekhovskoi, D., 2015. InChI, the IUPAC International Chemical Identifier. J. Cheminform. 7. [Link] s13321-015-0068-4.
Hines, D.E., Bell, S., Chang, X., Mansouri, K., Allen, D., Kleinstreuer, N., 2022. Application of an accessible interface for pharmacokinetic modeling and in vitro to in vivo extrapolation. Front Pharm. 13. [Link] fphar.2022.864742.
Hong, H., Xie, Q., Ge, W., Qian, F., Fang, H., Shi, L., Su, Z., Perkins, R., Tong, W., 2008. Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J. Chem. Inf. Model 48, 1337–1344. [Link] ci800038f.
Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V., Leskovec, J., 2019. Strategies for Pre-training Graph Neural Networks.
Hu, Y., Ren, Q., Liu, X., Gao, L., Xiao, L., Yu, W., 2023. In silico prediction of human organ toxicity via artificial intelligence methods. Chem. Res Toxicol. 36, 1044–1054. [Link]
Islam, M.A., Pillay, T.S., 2016. Simplified molecular input line entry system-based descriptors in QSAR modeling for HIV-protease inhibitors. Chemom. Intell. Lab. Syst. 153, 67–74. [Link]
Jebali, F., Majumdar, A., Turck, C., Harabi, K.E., Faye, M.C., Muhr, E., Walder, J.P., Bilousov, O., Michaud, A., Vianello, E., Hirtzlin, T., Andrieu, F., Bocquet, M., Collin, S., Querlioz, D., Portal, J.M., 2024. Powering AI at the edge: a robust, memristor-based binarized neural network with near-memory computing and miniaturized solar cell. Nat. Commun. 15. [Link] 44766-6.
Jeong, J., Choi, J., 2022. Artificial intelligence-based toxicity prediction of environmental chemicals: future directions for chemical management applications. Environ. Sci. Technol. 56, 7532–7543. [Link]
Jeong, J., Kim, D., Choi, J., 2022a. Application of ToxCast/Tox21 data for toxicity mechanism-based evaluation and prioritization of environmental chemicals: perspective and limitations. Toxicol. Vitr. 84, 105451. [Link] tiv.2022.105451.
Jeong, J., Kim, D., Choi, J., 2022b. Integrative data mining approach: case study with adverse outcome pathway network leading to pulmonary fibrosis. Chem. Res Toxicol. [Link]
Jia, X., Wang, T., Zhu, H., 2023. Advancing computational toxicology by interpretable machine learning. Environ. Sci. Technol. [Link]


Jiang, J., Zhang, R., Ma, J., Liu, Y., Yang, E., Du, S., Zhao, Z., Yuan, Y., 2023. TranGRU: focusing on both the local and global information of molecules for molecular property prediction. Appl. Intell. 53, 15246–15260. [Link] s10489-022-04280-y.
Judson, R., Houck, K., Martin, M., Knudsen, T., Thomas, R.S., Sipes, N., Shah, I., Wambaugh, J., Crofton, K., 2014. In vitro and modelling approaches to risk assessment from the U.S. environmental protection agency ToxCast programme. Basic Clin. Pharm. Toxicol. 115, 69–76. [Link]
Lee, S.Y., Sagoo, H., Farwana, R., Whitehurst, K., Fowler, A., Agha, R., 2017. Compliance of systematic reviews in ophthalmology with the PRISMA statement. BMC Med Res Method. [Link]
Lin, Y.J., Lin, Z., 2020. In vitro-in silico-based probabilistic risk assessment of combined exposure to bisphenol A and its analogues by integrating ToxCast high-throughput in vitro assays with in vitro to in vivo extrapolation (IVIVE) via physiologically based pharmacokinetic (PBPK) modeling. J. Hazard Mater. 399. [Link] [Link].2020.122856.
Liu, A., Seal, S., Yang, H., Bender, A., 2023a. Using chemical and biological data to predict drug toxicity. SLAS Discov. [Link]
Liu, J., Lei, X., Zhang, Y., Pan, Y., 2023b. The prediction of molecular toxicity based on BiGRU and GraphSAGE. Comput. Biol. Med 153. [Link] compbiomed.2022.106524.
Liu, J., Mansouri, K., Judson, R.S., Martin, M.T., Hong, H., Chen, M., Xu, X., Thomas, R.S., Shah, I., 2015. Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure. Chem. Res Toxicol. 28, 738–751. [Link] tx500501h.
Liu, J., Patlewicz, G., Williams, A.J., Thomas, R.S., Shah, I., 2017. Predicting organ toxicity using in vitro bioactivity data and chemical structure. Chem. Res Toxicol. 30, 2046–2059. [Link]
Liu, Y., Lim, H., Xie, L., 2022a. Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding. BMC Bioinforma. 23. [Link]
Liu, Y., Lim, H., Xie, L., 2022b. Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding. BMC Bioinforma. 23. [Link]
Martin, M.T., Judson, R.S., Reif, D.M., Kavlock, R.J., Dix, D.J., 2009. Profiling chemicals based on chronic toxicity results from the U.S. EPA ToxRef database. Environ. Health Perspect. 117, 392–399. [Link]
Matsuzaka, Y., Uesawa, Y., 2019. Optimization of a deep-learning method based on the classification of images generated by parameterized Deep Snap a novel molecular-image-input technique for Quantitative Structure-Activity Relationship (QSAR) analysis. Front Bioeng. Biotechnol. 7. [Link]
Mehmood, T., Gerevini, A.E., Lavelli, A., Serina, I., 2020. Combining multi-task learning with transfer learning for biomedical named entity recognition. Procedia Computer Science. Elsevier B.V, pp. 848–857. [Link]
Moxon, T.E., Li, H., Lee, M.Y., Piechota, P., Nicol, B., Pickles, J., Pendlington, R., Sorrell, I., Baltazar, M.T., 2020. Application of physiologically based kinetic (PBK) modelling in the next generation risk assessment of dermally applied consumer products. Toxicol. Vitr. 63, 104746. [Link]
Nukaga, T., Takemura, A., Endo, Y., Uesawa, Y., Ito, K., 2023. Estimating drug-induced liver injury risk by in vitro molecular initiation response and pharmacokinetic parameters for during early drug development. Toxicol. Res. 12, 86–94. [Link] org/10.1093/toxres/tfac083.
Orosz, Á., Héberger, K., Rácz, A., 2022. Comparison of descriptor- and fingerprint sets in machine learning models for ADME-tox targets. Front Chem. 10. [Link] 10.3389/fchem.2022.852893.
Paul Friedman, K., Gagne, M., Loo, L.H., Karamertzanis, P., Netzeva, T., Sobanski, T., Franzosa, J.A., Richard, A.M., Lougee, R.R., Gissi, A., Lee, J.Y.J., Angrish, M., Dorne, J., Lou, Foster, S., Raffaele, K., Bahadori, T., Gwinn, M.R., Lambert, J., Whelan, M., Rasenberg, M., Barton-Maclaren, T., Thomas, R.S., 2020. Utility of in vitro bioactivity as a lower bound estimate of in vivo adverse effect levels and in risk-based prioritization. Toxicol. Sci. 173, 202–225. [Link] kfz201.
Perkins, R., Fang, H., Tong, W., Welsh, W.J., 2003. Quantitative structure-activity relationship methods: perspectives on drug discovery and toxicology. Environ. Toxicol. Chem. [Link]
Phifer, A., Gray, G., Kratchman, J., Attene-Ramos, M.S., 2021. Assessing how in vitro assay types predict in vivo toxicology data. Journal Toxicology Environmental Health Part A Current Issues 84, 710–728. [Link] 15287394.2021.1937418.
Punt, A., Bouwmeester, H., Blaauboer, B.J., Coecke, S., Hakkert, B., Hendriks, D.F.G., Jennings, P., Kramer, N.I., Neuhoff, S., Masereeuw, R., Paini, A., Peijnenburg, A.A.C.M., Rooseboom, M., Shuler, M.L., Sorrell, I., Spee, B., Strikwold, M., van der Meer, A.D., van der Zande, M., Vinken, M., Yang, H., Bos, P.M.J., Heringa, M.B., 2020. New Approach Methodologies (NAMs) for Human-Relevant Biokinetics Predictions: Meeting the Paradigm Shift in Toxicology Towards an Animal-Free Chemical Risk Assessment. ALTEX 37, 607–622. [Link]
Pushkaran, A.C., Arabi, A.A., 2024. From understanding diseases to drug design: can artificial intelligence bridge the gap? Artif. Intell. Rev. 57. [Link] s10462-024-10714-5.
Qian, X., Dai, X., Luo, L., Lin, M., Xu, Y., Zhao, Y., Huang, D., Qiu, H., Liang, L., Liu, H., Liu, Y., Gu, L., Lu, T., Chen, Y., Zhang, Y., 2023. An interpretable multitask framework BiLAT enables accurate prediction of cyclin-dependent protein kinase inhibitors. J. Chem. Inf. Model 63, 3350–3368. [Link] jcim.3c00473.
Qin, Y., Xu, Z., Wang, X., Skare, M., 2023. Artificial intelligence and economic development: an evolutionary investigation and systematic review. J. Knowl. Econ. [Link]
Raslan, M.A., Raslan, S.A., Shehata, E.M., Mahmoud, A.S., Sabri, N.A., 2023. Advances in the applications of bioinformatics and chemoinformatics. Pharmaceuticals. https://[Link]/10.3390/ph16071050.
Richard, A.M., Judson, R.S., Houck, K.A., Grulke, C.M., Volarath, P., Thillainadarajah, I., Yang, C., Rathman, J., Martin, M.T., Wambaugh, J.F., Knudsen, T.B., Kancherla, J., Mansouri, K., Patlewicz, G., Williams, A.J., Little, S.B., Crofton, K.M., Thomas, R.S., 2016. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res Toxicol. [Link]
Ring, C., Sipes, N.S., Hsieh, J.H., Carberry, C., Koval, L.E., Klaren, W.D., Harris, M.A., Auerbach, S.S., Rager, J.E., 2021a. Predictive modeling of biological responses in the rat liver using in vitro Tox21 bioactivity: benefits from high-throughput toxicokinetics. Comput. Toxicol. 18. [Link] comtox.2021.100166.
Ring, C., Sipes, N.S., Hsieh, J.H., Carberry, C., Koval, L.E., Klaren, W.D., Harris, M.A., Auerbach, S.S., Rager, J.E., 2021b. Predictive modeling of biological responses in the rat liver using in vitro Tox21 bioactivity: benefits from high-throughput toxicokinetics. Comput. Toxicol. 18. [Link] comtox.2021.100166.
Rogers, D., Hahn, M., 2010. Extended-connectivity fingerprints. J. Chem. Inf. Model 50, 742–754. [Link]
Rusyn, I., Daston, G.P., 2010. Computational toxicology: realizing the promise of the toxicity testing in the 21st century. Environ. Health Perspect. 118, 1047–1050. [Link]
Seal, S., Yang, H., Vollmers, L., Bender, A., 2021. Comparison of cellular morphological descriptors and molecular fingerprints for the prediction of cytotoxicity- and proliferation-related assays. Chem. Res Toxicol. 34, 422–437. [Link] 10.1021/[Link].0c00303.
Sedykh, A.Y., Shah, R.R., Kleinstreuer, N.C., Auerbach, S.S., Gombar, V.K., 2021. Saagar-A New, Extensible Set of Molecular Substructures for QSAR/QSPR and Read-Across Predictions. Chem. Res Toxicol. 34, 634–640. [Link] chemrestox.0c00464.
Sewell, F., Alexander-White, C., Brescia, S., Currie, R.A., Roberts, R., Roper, C., Vickers, C., Westmoreland, C., Kimber, I., 2024. New approach methodologies (NAMs): identifying and overcoming hurdles to accelerated adoption. Toxicol. Res (Camb.). [Link]
Silva, M., Pham, N., Lewis, C., Iyer, S., Kwok, E., Solomon, G., Zeise, L., 2015. A Comparison of ToxCast Test Results with In Vivo and Other In Vitro Endpoints for Neuro, Endocrine, and Developmental Toxicities: A Case Study Using Endosulfan and Methidathion. Birth Defects Res B Dev. Reprod. Toxicol. 104, 71–89. https://[Link]/10.1002/bdrb.21140.
Sosnin, S., Karlov, D., Tetko, I.V., Fedorov, M.V., 2019. Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space. J. Chem. Inf. Model 59, 1062–1072. [Link]
Stucki, A.O., Barton-Maclaren, T.S., Bhuller, Y., Henriquez, J.E., Henry, T.R., Hirn, C., Miller-Holt, J., Nagy, E.G., Perron, M.M., Ratzlaff, D.E., Stedeford, T.J., Clippinger, A.J., 2022. Use of new approach methodologies (NAMs) to meet regulatory requirements for the assessment of industrial chemicals and pesticides for effects on human health. Front. Toxicol. 4. [Link] ftox.2022.964553.
Tan, Z., Li, Y., Shi, W., Yang, S., 2021. A Multitask Approach to Learn Molecular Properties. J. Chem. Inf. Model. [Link]
Tate, T., Patlewicz, G., Shah, I., 2024. A comparison of machine learning approaches for predicting hepatotoxicity potential using chemical structure and targeted transcriptomic data. Comput. Toxicol. 29. [Link] comtox.2024.100301.
Thomas, R.S., Bahadori, T., Buckley, T.J., Cowden, J., Deisenroth, C., Dionisio, K.L., Frithsen, J.B., Grulke, C.M., Gwinn, M.R., Harrill, J.A., Higuchi, M., Houck, K.A., Hughes, M.F., Sidney Hunter, E., Isaacs, K.K., Judson, R.S., Knudsen, T.B., Lambert, J.C., Linnenbrink, M., Martin, T.M., Newton, S.R., Padilla, S., Patlewicz, G., Paul-Friedman, K., Phillips, K.A., Richard, A.M., Sams, R., Shafer, T.J., Woodrow Setzer, R., Shah, I., Simmons, J.E., Simmons, S.O., Singh, A., Sobus, J.R., Strynar, M., Swank, A., Tornero-Valez, R., Ulrich, E.M., Villeneuve, D.L., Wambaugh, J.F., Wetmore, B.A., Williams, A.J., 2019. The next generation blueprint of computational toxicology at the U.S. Environmental protection agency. Toxicol. Sci. [Link] org/10.1093/toxsci/kfz058.
Thomas, R.S., Black, M.B., Li, L., Healy, E., Chu, T.M., Bao, W., Andersen, M.E., Wolfinger, R.D., 2012. A comprehensive statistical analysis of predicting in vivo hazard using high-throughput in vitro screening. Toxicol. Sci. 128, 398–417. https://[Link]/10.1093/toxsci/kfs159.
Tian, Y., Zhao, X., Huang, W., 2022. Meta-learning approaches for learning-to-learn in deep learning: A survey. Neurocomputing. [Link] neucom.2022.04.078.
Tomoko, A., 2023. (Q)SAR Assessment Framework: Guidance for the regulatory assessment of (Quantitative) Structure Activity Relationship models, predictions, and results based on multiple predictions Series on Testing and Assessment No. 386.
Truong, L., Ouedraogo, G., Pham, L.L., Clouzeau, J., Loisel-Joubert, S., Blanchet, D., Noçairi, H., Setzer, W., Judson, R., Grulke, C., Mansouri, K., Martin, M., 2018. Predicting in vivo effect levels for repeat-dose systemic toxicity using chemical, biological, kinetic and study covariates. Arch. Toxicol. 92, 587–600. [Link] 10.1007/s00204-017-2067-x.
Vella, D., Ebejer, J.P., 2023. Few-Shot Learning for Low-Data Drug Discovery. J. Chem. Inf. Model 63, 27–42. [Link]


Wang, L., Zhao, L., Liu, X., Fu, J., Zhang, A., 2021a. SepPCNET: Deeping Learning on a 3D Surface Electrostatic Potential Point Cloud for Enhanced Toxicity Classification and Its Application to Suspected Environmental Estrogens. Environ. Sci. Technol. 55, 9958–9967. [Link]
Wang, X., Wang, L., Wang, S., Ren, Y., Chen, W., Li, X., Han, P., Song, T., 2023. QuantumTox: Utilizing quantum chemistry with ensemble learning for molecular toxicity prediction. Comput. Biol. Med 157. [Link] compbiomed.2023.106744.
Wang, Y., Wang, B., Jiang, J., Guo, J., Lai, J., Lian, X.Y., Wu, J., 2021b. Multitask CapsNet: An Imbalanced Data Deep Learning Method for Predicting Toxicants. ACS Omega 6, 26545–26555. [Link]
Watford, S., Ly Pham, L., Wignall, J., Shin, R., Martin, M.T., Friedman, K.P., 2019. ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses. Reprod. Toxicol. 89, 145–158. [Link] reprotox.2019.07.012.
Wu, L., Huang, R., Tetko, I.V., Xia, Z., Xu, J., Tong, W., 2021. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets. Chem. Res Toxicol. 34, 541–549. [Link] org/10.1021/[Link].0c00373.
Wu, Z., Ramsundar, B., Feinberg, E.N., Gomes, J., Geniesse, C., Pappu, A.S., Leswing, K., Pande, V., 2018. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530. [Link]
Xie, Q., Luong, M.-T., Hovy, E., Le, Q.V., 2019. Self-training with Noisy Student improves ImageNet classification.
Xu, T., Ngan, D.K., Ye, L., Xia, M., Xie, H.Q., Zhao, B., Simeonov, A., Huang, R., 2020. Predictive models for human organ toxicity based on in vitro bioactivity data and chemical structure. Chem. Res Toxicol. 33, 731–741. [Link] chemrestox.9b00305.
Yap, X.H., Raymer, M., 2021. Multi-label classification and label dependence in in silico toxicity prediction. Toxicol. Vitr. 74. [Link]
Yuan Li, Y., Chen, L., Pu, C., Zang, C., Yan, Y.C., Chen, Y., Zhang, Y., Liu, H., 2023. Co-model for chemical toxicity prediction based on multi-task deep learning. Mol. Inf. 42. [Link]
Zhang, Baoquan, Luo, C., Jiang, H., Feng, S., Li, X., Zhang, Bowen, Ye, Y., 2023a. Adaptive transfer of graph neural networks for few-shot molecular property prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 3863–3875. [Link] org/10.1109/TCBB.2023.3327452.
Zhang, Baoquan, Luo, C., Jiang, H., Feng, S., Li, X., Zhang, Bowen, Ye, Y., 2023b. Adaptive transfer of graph neural networks for few-shot molecular property prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 3863–3875. [Link] org/10.1109/TCBB.2023.3327452.
