Synthesis of Covalent Organic Frameworks Using Sustainable Solvents and Machine Learning

This article discusses the synthesis of covalent organic frameworks (COFs) using sustainable solvents and machine learning to enhance the process. It evaluates twelve green solvents for COF synthesis, identifying γ-butyrolactone, para-cymene, and PolarClean as effective options for producing high-quality COFs. The study also employs quantitative structure–property relationships to predict COF properties based on solvent and building block structures, marking a novel approach in COF design.

Green Chemistry

PAPER

Synthesis of covalent organic frameworks using sustainable solvents and machine learning†

Sushil Kumar, Gergo Ignacz and Gyorgy Szekely *

Cite this: Green Chem., 2021, 23, 8932

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.
Open Access Article. Published on 08 October 2021. Downloaded on 10/22/2024 10:49:02 PM.

Covalent organic frameworks (COFs) have attracted considerable interest owing to their structural predesign ability, controllable chemistry, long-range periodicity, and pore interior functionalization ability. The most widely adopted solvothermal synthesis of COFs requires the use of toxic organic solvents. In line with the 5th principle of green chemistry and the United Nations’ 12th Sustainable Development Goal, we aim to mitigate the adverse effect of solvents on COF synthesis. Here we have investigated twelve green solvents for the sustainable synthesis of five series of COFs using the solvothermal approach. Crystallinity and porosity were used to assess the quality of the obtained COFs. In addition, the suitability of the solvents in the synthesis of crystalline and porous COFs was investigated and color-coded for the final green assessment. In particular, γ-butyrolactone (for TpPa, TpBD, and TpAzo), para-cymene (TpAnq), and PolarClean (TpTab) were found to be excellent green solvents to produce high-quality COFs. For the first time, we successfully used quantitative structure–property relationships in combination with machine learning approaches to predict both the surface area and crystallinity of COFs using the structure of the solvents and COF building blocks.

Received 4th August 2021, Accepted 8th October 2021
DOI: 10.1039/d1gc02796d
rsc.li/greenchem

Advanced Membranes and Porous Materials Center, Physical Science and Engineering Division (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia. E-mail: [email protected]; Tel: +966128082769; http://www.szekelygroup.com
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1gc02796d

Green Chem., 2021, 23, 8932–8939. This journal is © The Royal Society of Chemistry 2021

Introduction

Two-dimensional (2D) covalent organic frameworks (COFs) have gained both academic and industrial interest owing to their unique design, ordered network, pore engineering, high porosity, and crystallinity.1,2 The conventional synthesis of long-range ordered COFs involves the formation of transposable connectivity through covalent bonds between symmetric organic building blocks in a symmetrical fashion. Consequently, COFs exhibit structural uniformity, periodicity, porosity, crystallinity, and framework robustness. Owing to these unique structural properties, the range of application of COFs is vast, including gas storage, separation, heterogeneous catalysis, energy storage and separation, supercapacitors and batteries, sensing, drug delivery, and optoelectronics.1,2

In the past few years, we have witnessed a significant development in synthetic methods for the preparation of highly porous and long-range ordered COFs. The methods include solvothermal synthesis, mechanochemical grinding,3,4 ionothermal synthesis,5 microwave-assisted synthesis,6 interfacial polymerization,7,8 and microfluidic synthesis.9 Among these methods, the solvothermal approach has been widely used in the construction of high-quality COFs.10 This approach relies on solvent selection for reaction media. In particular, the nature of the solvent, the solubility of the precursors, the temperature, and the duration of the reaction are considered crucial factors that affect the crystallinity and porosity of the resultant COFs. The solvothermal preparation of COFs often requires a combination of two organic solvents (e.g., mesitylene–dioxane) in a particular ratio. This method is not applicable to all types of COFs. Moreover, solvent mixtures are more difficult to recover and recycle, and are therefore undesirable from a green chemistry perspective.

The synthesis of newly designed COFs requires a cumbersome screening of organic solvents and their mixtures. The limited solubility of the precursors and their rate of diffusion in the selected solvent system significantly affect the crystallization process and, ultimately, the quality of the obtained COFs. Therefore, understanding the structure–property relationship of the solvent–precursor nexus is crucial in the synthesis of high-quality COFs. The reaction medium contributes substantially to the sustainability of synthetic processes.11 The application of green solvents in the solvothermal synthesis of COFs is scarce. Banerjee and co-workers successfully synthesized COFs in water using the dynamic covalent chemistry approach.12 Water is considered an environmentally friendly reaction medium. The resulting COFs are porous
and crystalline in nature. COFs with high surface areas were successfully prepared in ethanol, which is considered a green solvent.13,14 Deep eutectic solvents as green media for the synthesis of 2D and three-dimensional (3D) COFs based on Schiff-base chemistry were also reported. However, the porosity and crystallinity of the prepared COFs were compromised.15

Identification of efficient green solvents in the synthesis of COFs is a tedious task that is commonly performed via trial-and-error experimentation. However, the quantitative structure–property relationship (QSPR) tool, which is an emerging technique among the major computational methods in modern molecule design, could offer a resource- and time-efficient solution.16 QSPR analysis refers to any practical approach by which the chemical structure is quantitatively correlated with the physicochemical properties of the molecule or material. QSPR models have already found application in assessing the potential impacts of chemicals and nanomaterials on both living and synthetic systems. There have been no QSPR or any related quantitative structure–activity relationship-based studies on the property prediction of COFs.

In this work, we surveyed various green solvents as reaction media for the synthesis of high-quality COFs. We prepared five series of β-ketoenamine-based COFs in twelve different green solvents (Fig. 1). We identified the best solvent for each series that is suitable to deliver highly porous and crystalline COFs. The QSPR was used to identify the key structural elements affecting the surface area and to determine if the resultant COFs are crystalline or amorphous by analysing the solvent–precursor pairs. We used the partial least squares (PLS) regression tool and 11 different machine learning (ML) algorithms for binary classification. Our study initiates the exploration of the field of COFs by design using advanced molecule design tools.

Experimental

COF synthesis
The solvothermal syntheses of five series of β-ketoenamine-based COFs were performed by employing twelve different bio-based green solvents, namely dimethyl carbonate (DC), propylene carbonate (PC), γ-butyrolactone (GBL), 1,2-ethylene sulfite (ES), 1,3-propylene sulfite (PS), cyrene (Cyr), isosorbide dimethyl ether (IDE), 2,5-dimethyl furan (DF), 2-methyl-1-propanol (MP), terpineol (Tn), para-cymene (Cym), and PolarClean (PCl) (Fig. 1). A Pyrex tube was charged with 0.3 mmol Tp, either 0.45 mmol of the corresponding diamine, i.e., 1,4-phenylenediamine (Pa), benzidine (BD), 4,4′-azodianiline (Azo), or 2,6-diaminoanthraquinone (Anq), or 0.3 mmol of the triamine, i.e., 1,3,5-tris(4-aminophenyl)benzene (Tab), and 3 mL of a green solvent containing 0.2 mL of glacial acetic acid (3 M) as a green catalyst. After sonication for 15 min, the reaction mixture was subjected to three consecutive freeze–pump–thaw cycles under liquid nitrogen. The tube was sealed under 1 mbar vacuum and heated at 120 °C for 72 h in a preheated oven (section S2, ESI†). Prior to characterization studies, the resulting solid COF material was washed and dried at 90 °C under 1 mbar vacuum overnight.
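
The quantities charged into each tube imply equal amounts of aldehyde and amine functional groups. The following is a minimal sketch of that bookkeeping, assuming three aldehyde groups per Tp (it is described as a trialdehyde in the Fig. 1 caption), two amino groups per diamine, and three per triamine. The function and variable names are illustrative and do not come from the paper.

```python
# Functional-group bookkeeping for the charges quoted above (amounts in mmol).
# Group counts per molecule are assumptions read off the compound names.
ALDEHYDES_PER_TP = 3
AMINES_PER_LINKER = {"diamine": 2, "triamine": 3}


def group_ratio(tp_mmol, amine_mmol, amine_kind):
    """Return (mmol CHO, mmol NH2, CHO:NH2 ratio) for one reaction tube."""
    cho = tp_mmol * ALDEHYDES_PER_TP
    nh2 = amine_mmol * AMINES_PER_LINKER[amine_kind]
    return cho, nh2, cho / nh2


# 0.3 mmol Tp with 0.45 mmol diamine, and 0.3 mmol Tp with 0.3 mmol triamine.
diamine_case = group_ratio(0.3, 0.45, "diamine")
triamine_case = group_ratio(0.3, 0.3, "triamine")
print(diamine_case, triamine_case)
```

Under these assumptions both cases come out at roughly 0.9 mmol of each functional group, which is consistent with a stoichiometric Schiff-base condensation.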

Fig. 1 Schematic representation of COF synthesis using Tp trialdehyde and five different amines in green solvents.

COF characterization
The crystallinities of the COFs prepared were determined from the powder X-ray diffraction (PXRD) patterns collected using a Bruker D8 ADVANCE with a high-intensity microfocus rotating anode X-ray generator. The PXRD patterns of the COFs were recorded in the 2θ range between 2.5° and 40°, and the data were obtained using the DIFFRACplus XRD Commander software. The radiation used was Cu Kα (λ = 1.54 Å) with a Ni filter, and the data collection was performed using a quartz holder at a scan speed of 1° min⁻¹ and a step size of 0.01°. Fourier-transform infrared (FTIR) spectra were obtained using a Thermo Scientific Nicolet iS10 spectrometer with a universal Zn–Se attenuated total reflection accessory. Solid-state ¹³C cross polarization magic angle spinning (CP-MAS) NMR spectra were measured using a Bruker Avance III 400 MHz widebore instrument. Thermogravimetric analyses (TGA) were performed on a TGA 209 F1 analyser (Netzsch) under an N₂ atmosphere at a heating rate of 10 °C min⁻¹ within the temperature range of 30–900 °C. Scanning electron microscopy (SEM) measurements were performed using a Magellan FEI 400. The samples were prepared by casting a drop of COFs dispersed in propan-2-ol on a silicon wafer. To avoid charging during the SEM analyses, all the samples were coated with a

3 nm-thick layer of iridium using a Q150 T S sputter coater prior to the analyses. Nitrogen adsorption analyses were performed at 77 K using a liquid nitrogen bath on a Micromeritics ASAP 2420 BET instrument. All the samples were degassed for 12 h at 140 °C under vacuum prior to gas adsorption studies. The surface areas were evaluated using a Brunauer–Emmett–Teller (BET) model applied between P/P₀ values that fall in the range of 0.05–0.3 for the COFs. The pore size distributions were calculated using the non-localized density functional theory (NLDFT) method.

Dataset generation
The dataset was generated using the chemical structures of the precursors and the solvents (section S1, ESI†). The results were transformed into a matrix of (60 × 1) for the surface area, yield, and crystallinity. Chemical descriptors corresponding to each experimental data point (precursor and solvent) were calculated by Mordred and RDKit packages using a Python script,17 and the NaN values were removed. A total of 1860 classical 1D, 2D, and 3D molecular descriptors18 were calculated from the amine precursors and solvents each. The majority of descriptors belonged to the autocorrelation, the Barysz matrix, electrotopological atomic state, different topological and MoRSE type descriptors. For a more comprehensive collection of different descriptor types, refer to section S1, ESI.† The final dataset was a matrix of (60 × 2639) containing 158 340 data points. The dataset was split into train and test sets, and subjected to data analysis, PLS regression, and classification. The reduced and clean dataset contained 2631 molecular descriptors. The amorphous COFs, including those with low yield and surface area, were omitted from the dataset for surface area prediction and only used for crystallinity binary classification. The descriptors containing non-float values (e.g., lists, NaN, or string values) were also removed.

QSPR and ML-based predictions
PLS prediction was made in PLS Toolbox (Eigenvector Research) under a MATLAB environment. For cross-validation, we used random samples with seven-fold cross-validation. Optimal parameter selection was based on the global minimum of the root-mean-square error of cross-validation (RMSECV). Autoscaling was used to pre-process the dataset, and the outliers were removed by plotting the first two latent variables on a 95% confidence ellipse. Variable importance in projection (VIP) scoring was used to reveal the relative impact of each molecular descriptor on the surface area. Validation of the PLS results was performed using cross-validation, external validation, and Y-scrambling to reduce and eliminate possible overfitting.19 The data were split into training and test datasets in an 80 : 20 ratio. The training root-mean-square error of calibration (RMSEC) and the RMSECV were recorded. The test dataset was used to quantify the goodness of the model by predicting the test data from the known descriptors.

Binary classification was used for the prediction of the crystallinity of the COFs. The dataset consisted of the same descriptors that were used in the PLS dataset. The binary outcome of the reaction was “1” if the reaction resulted in a crystalline COF, and “0” if the reaction did not occur or resulted in an amorphous COF or a polymer. The final dataset contained 60 binary-valued outcomes and descriptors. The binary classification problem was chosen over regression analysis for the reaction outcome due to the small dataset and the lack of correlation between the surface area, crystallinity, and yield. The dataset was split into training and test datasets in an 85 : 15 ratio. It was necessary to perform principal component analysis (PCA) and Y-scrambling (Y-randomization) due to the high dimensionality and the small dataset, respectively.20 The algorithms employed were k-nearest neighbours, sigmoid support vector machine (SVM), radial basis function (RBF) SVM, polynomial SVM, decision tree, random forest, artificial neural network, adaptive boosting (AdaBoost), naïve Bayes, and quadratic classifier algorithms (section S1, ESI†). All Python calculations were performed on 100% sustainable Google Cloud Platform.21

Results and discussion

A Schiff-base condensation reaction was performed between Tp and the respective amines in various green solvents using the solvothermal approach, thereby affording TpPa, TpBD, TpAzo, TpAnq, and TpTab COFs (Fig. 1). All the COFs were synthesized under identical reaction conditions for all the green solvents investigated. The crystallinity of the COFs was determined from PXRD patterns (Fig. 2; section S4, ESI†). The high-intensity first peak observed at a 2θ lower than 5° can be attributed to the strong diffraction from the [100] planes, while the broad peak observed at a 2θ higher than 25° can be attributed to the diffraction from the [001] planes. The PXRD observations suggest the π–π stacking of the COF layers along the [001] plane. The experimental PXRD patterns of the COFs were found to match well with the PXRD patterns simulated for the eclipsed AA stacking model (section S5, ESI†) and are in good agreement with the results of previous studies.4 The relatively high intensities of the first peaks demonstrate the high crystallinity of the COFs.

The FTIR spectra of the COFs are in good agreement with those reported in the literature.4 The presence of strong peaks at 1250 cm⁻¹ for ν(C–N) and 1575 cm⁻¹ for ν(C=C) confirmed that the precursors, i.e., Tp and amines, were covalently linked together via the formation of β-ketoenamine moieties in the framework (section S6, ESI†). We have performed ¹³C CP-MAS solid-state NMR studies to explore the composition of the framework structure. The carbon signal present at approximately 180 ppm was assigned to the keto group, while the peak at 100 ppm corresponded to the C=C bond adjacent to the keto group (section S7, ESI†).

The chemical structure of the COFs was characterized using XPS profiles (section S8, ESI†). For example, the TpPa COF showed three intense peaks at 284.62, 399.63, and 530.62 eV, which correspond to C (1s), N (1s), and O (1s) signals, respectively. Detailed analysis of the high-resolution XPS profile is

shown in Fig. S25, ESI.† The high-resolution profile for C (1s) displayed three main peaks and one additional π–π* satellite peak. The peak at 284.13 eV corresponded to the C=C bond of the aromatic rings, where the shoulders at 285.36 and 287.01 eV were assigned to the C–O and C=O bonds, respectively, present in the framework backbone. The high-resolution profile for N (1s) showed a peak at 399.63 eV, which corresponded to the =C–NH moiety of the ketoenamine bond of the framework. In the high-resolution profile of O (1s), the peak signals that appeared at 530.49 and 532.21 eV were assigned to the C=O and C–O bonds, respectively. For the detailed analysis of the XPS profiles, refer to section S8 in the ESI.† All the COFs exhibited good thermal stability up to approximately 350 °C (section S9, ESI†). All the COFs displayed a sheet texture with lateral dimensions of 1–5 µm (section S10, ESI†).

Fig. 2 Examples of experimental PXRD patterns and SEM images of TpPa-GBL, TpBD-GBL, TpAzo-GBL, TpAnq-PCl, and TpTab-PCl COFs.

The permanent porosity of the COFs was evaluated by measuring the nitrogen gas uptake at 77 K (section S12, ESI†). The obtained BET surface area (SABET) of the COFs spanned a wide range of 30 to 1674 m² g⁻¹ depending on the green solvent employed (Fig. 3). Among all the COFs reported in this work, TpAzo-GBL exhibited the highest surface area of 1674 m² g⁻¹, followed by 1046 (TpBD-GBL), 1036 (TpTab-PCl), 1033 (TpAnq-Cym), and 888 (TpPa-GBL). Note that most of the COFs synthesized here exhibited improved surface area values as compared to the ones reported in conventional organic solvents.2 The pore size distributions for the as-synthesized COFs are presented in section S13 (ESI)† and were found to be approximately 15 Å (TpPa), 18 Å (TpBD), 22 Å (TpAzo), 18 Å (TpAnq), and 14 Å (TpTab), which were calculated on the basis of the NLDFT model.

Fig. 3 Forty-three COFs were synthesized in twelve different green solvents. Surface area values for each COF have been provided at the bottom of each COF structure. The cross sign signifies either no reaction or amorphous polymer formation.

Fig. 1 shows the list of the green solvents used for the synthesis of the COFs. Solvents can be classified into seven classes: carbonates, esters, ethers, sulfites, alcohols, aromatic solvents, and aprotic solvents. A color-coding system was introduced in the GlaxoSmithKline and CHEM21 solvent selection guides,22–24 which were successfully used to describe the sustainable synthesis of UiO-66.25 We employed the same color-coding system in this work (section S14, ESI†). The column “overall green assessment”, which shows the color code for the green solvents utilized for the synthesis of the COFs, is based on the solvent greenness mentioned in the solvent selection guides (section S14, ESI†). The color codes for boiling point, viscosity, the presence of a characteristic PXRD peak (corresponding to diffraction from the [100] planes), and the SABET column are defined according to the ranges mentioned in Table S14, ESI.† The conventional solvents reported for the synthesis of COFs were also included as a reference for comparison.

The color codes for the last two columns define the rank by default and ranking after discussion. The column named “rank by default” indicates the composite color extracted from the combined evaluation of solvent as well as the COF properties. Owing to the prime importance of the crystallinity and surface area of the COFs in a wide range of applications, the final color code in the “rank by default” column is dominated by the porosity of the COFs. Finally, the color code in the column “ranking after discussion” indicates the compatibility of the employed solvent and has been interpreted after an overall evaluation of solvent properties in the generation of crystalline and porous COFs. In general, the green code denotes efficient solvents with minor issues, the yellow code denotes solvents that can be used but are found to be less efficient, and the red code denotes solvents that are either not recommended (according to solvent selection guides) or resulted in COFs with very low crystallinity and porosity.

To assess the suitability of green solvents in the preparation of high-quality COFs, we calculated the relative SABET, relative crystallinity, and relative yield for the COFs. As shown in Fig. 4a, the TpPa, TpBD, and TpAzo COFs synthesized in GBL displayed high BET surface area values. In contrast, in the case of the TpAnq and TpTab COFs, the Cym and PCl solvents were found to be efficient in delivering highly porous COFs. In terms of the crystallinity of the COFs, the results were quite vague and the data points were scattered all over the plot (Fig. 4b). All the solvents afforded COFs of relatively moderate to low crystallinity. This suggests difficulty in correlating the crystallinity of the as-synthesized COFs with the solvents used. A similar observation was made with the relative yield plot (Fig. 4c); the data points were randomly distributed across the plot, making it difficult to directly correlate with the COFs synthesized in this study. For example, PC resulted in high yields for TpBD and TpAzo; however, it afforded moderate to low yields of other COFs. In other words, on the basis of relative crystallinity and yield, it is difficult to observe a strong correlation of these COF properties with the solvents employed.

To address this problem, for the very first time, we utilized an ML approach to deduce the structure–property relationship between the solvents and resultant COFs. The surface area of the COFs is co-dependent on the type of solvent(s) used. Thus, classical ab initio DFT calculations would require overly complex methods to quantify the properties of COFs.26 To overcome the issues with solvent dependency, we used QSPR computational tools to predict the surface area and to verify if the resultant COF can be synthesized in the crystalline form. We hypothesized that by determining the structure of the solvent and the structure of the COF, a predictive relationship could be drawn while other parameters can be kept constant. Using a dataset with 60 points with high-capacity ML and deep learning methods remains a challenge since they generally require a large amount of data to obtain good predictive results. Using the QSPR approach, we developed a quantitative structure–property relationship to predict the key structural elements necessary to generate high surface area and crystalline COFs by analyzing the solvent–amine precursor pairs.

Initially, a cross-correlation analysis between the obtained results was necessary to filter out relationships across the surface area, crystallinity, and yield. No direct correlation was observed for the highly scattered, randomly distributed points of the yield-surface area results (Fig. S54a, ESI†). Similarly, the crystallinity-yield (Fig. S54b, ESI†) and the crystallinity-surface area (Fig. S54c, ESI†) datasets did not reveal any correlation. The non-correlated data indicate that, for example, a COF obtained in a high yield does not necessarily have a high surface area. The absence of correlations across the results suggests that the surface area, crystallinity, and yield data need to be predicted separately, because none of them can be inferred from another.

With only 43 measured surface area data points and 2639 calculated descriptors (predictor features), the original dataset was high-dimensional and prone to suffer from dimensionality issues, making the application of classical prediction methods challenging.27 To overcome the issues related to high dimensionality datasets, PLS regression and PCA were applied to the dataset. PLS regression and PCA are useful when the number of predictor features is high, and they are possibly cross-correlated. Using a PLS model, the response features were predicted from a large set of predictor features by reducing the set of the latter to a smaller set of uncorrelated components (projection to latent structures). In the model-building phase, the original dataset contained a matrix of 3672 molecular descriptors of the used solvents and amine precursors as the X matrix, and the surface area and the binary results of the corresponding COF as the Y vector. The first two PLS components were plotted against each other, and the outliers were removed based on a 95% confidence ellipse. The resultant matrix of (39 × 2631) was split and standardized.
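
The final step above, splitting the descriptor matrix and standardizing it, can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' procedure: the paper used PLS Toolbox under MATLAB, while here the scaling statistics are taken from the training rows only, the sample standard deviation is used, zero-variance columns are dropped, and the toy matrix and all function names are hypothetical.

```python
import random
import statistics


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def autoscale(train_rows, test_rows):
    """Mean-centre and unit-scale each column using training statistics.

    Columns that are constant in the training set are dropped, because they
    carry no information and would cause a division by zero.
    """
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    keep = [j for j, sd in enumerate(stds) if sd > 0]

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in keep] for row in rows]

    return scale(train_rows), scale(test_rows)


# Toy stand-in for a descriptor matrix: 10 samples x 3 descriptors (made-up numbers).
X = [[float(i), float(i % 3), 1.0] for i in range(10)]
train_idx, test_idx = split_indices(len(X), test_fraction=0.2)
X_train, X_test = autoscale([X[i] for i in train_idx], [X[i] for i in test_idx])
print(len(X_train), len(X_test), len(X_train[0]))
```

Fitting the scaler on the training rows alone keeps information from the test samples out of the model, which is the usual motivation for doing the split before the standardization.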

The optimal number of PLS components was found to be 3 with seven-fold cross-validation and a blind thickness of 1 based on the average minimum of the RMSECV values. The RMSEC and RMSECV values were found to be 119 and 174 from the Y-scrambling test, respectively. In contrast, RMSEP was 199 based on the Y-scrambling test. The insignificant difference between the cross-validation and the test R² score values indicates no overfitting. The prediction error agrees well with the measured general error of the surface area of the microporous materials.28 Fig. 5 shows general model training and test data with the corresponding trend line. The error of the surface area was found to increase with an increase in the surface area. In general, the model shows a strong correlation between the predicted and measured surface area. Based on the VIP scoring, 196 descriptors were selected (refer to VIP scoring, section S15, ESI†) from descriptors with the highest VIP scoring related to the amine precursors’ and the solvents’ electronic structures. Of the best 196 descriptors, 90 were ligand descriptors (45%), which means that the BET surface area is dependent on the structure of both the solvent and the ligand. Interestingly, out of the top 50 descriptors, only 12 belonged to the ligands (24%), and the first ligand descriptor was only the 17th in the PLS prediction list sorted by absolute value. The highest scoring descriptors belonged to hybridization factor, spatial autocorrelation values (Moran’s index), electrotopological state indexes and the log P of the solvent. The highest scoring ligand descriptor was also a spatial autocorrelation index (electronegativity weighted Geary index). Fig. S1† shows the VIP scoring in decreasing absolute order. There was no single outstanding descriptor; instead, several descriptors had mid-range VIP scores, emphasizing the complexity in surface area prediction. For the captured variance values and model parameter diagram, refer to section S15 (ESI).†

The crystallinity of the COFs depends on the PXRD measurement parameters, while yield results generally have a high error. Thus, the yield and crystallinity results were combined and simplified for use in the prediction. The binary classification problem was created by combining the yield and crystallinity results into simple crystalline COF/amorphous COF data. The original dataset contained a matrix of (60 × 2631) molecular descriptors of the used solvents and amine precursors as the X matrix and the binary values of crystalline/amorphous COFs as the Y vector. The results of the binary classification ML algorithms and classical statistical methods are shown in Fig. 5. The performance of the naïve Bayes and quadratic discriminant analysis (QDA) algorithms was better than those of the SVM, decision tree, random forest, artificial neural network, and boosting algorithms. This difference can be attributed to the small dataset, for which the ML algorithms tend to underperform the classical statistical methods. Both the naïve Bayes and QDA reached an accuracy score of 0.87. For details of each algorithm, refer to section S15, ESI.†

Fig. 4 (a) Relative surface area, (b) relative crystallinity, and (c) relative yield of the TpPa, TpBD, TpAzo, TpAnq, and TpTab series of COFs synthesized in twelve green solvents. The value provided in parenthesis along the x-axis denotes the maximum value of (a) BET surface area (m² g⁻¹), (b) crystallinity, and (c) yield (%) used in the calculations (section S4, ESI†).

Real-world application
To test our model in a real-world application, we first used the best performing binary classification models (QDA and naïve Bayes) to predict the expected crystallinity of two new COFs in GBL and PCl solvents (Fig. 6). We chose these two solvents because, from the previous measurement, they yielded high surface area COFs. The two new COFs, namely TpPa2 and TpTta, were selected because the ligand amine is inherently different from that in the training set. Using diverse ligands, we further tested the robustness of the model. The Pa2 ligand contains two methyl groups at para position to each other, while Tta contains a 1,3,5-triazine group in its core. Note that in the training set, not a single ligand included either an ali-

Fig. 5 (a) Visualization of the predicted versus measured BET surface areas (m2 g−1). Visual representation of the binary classification results using
different algorithms, where the accuracy score is provided in parenthesis: (b) input data projected on the principal component 1 (x-axis) and principal
component 2 (y-axis), (c) k-nearest neighbor algorithm (0.83), (d) sigmoid support vector machine (0.70), (e) radial basis function support vector
machine (0.71), (f ) polynomial kernel support vector machine (0.71), (g) Gaussian process (0.82), (h) decision-tree algorithm (0.77), (i) random forest
algorithm (0.73), ( j) artificial neural network (shallow) (0.76), (k) adaptive boosting algorithm (0.79), (l) naïve Bayes method (0.87), and (m) quadratic
statistical classifier (0.87). The higher the accuracy score, the higher the predictive power of the method.

dictions, lower than the test RMSE from the model building
phase. We demonstrated that our ML-based methodology has
excellent predictive power with respect to crystallinity and
surface area of COFs, which could open new avenues for
in silico COF design strategies.

Conclusions
We synthesized forty-three COFs, falling into five series, in
twelve green solvents using an acetic acid green catalyst
through a solvothermal method. The suitability of the green
solvents in the synthesis of the high-quality COFs was investi-
gated by correlating the relative surface area, crystallinity, and
yield of the resultant COFs with varying parameters of the
green solvents. The gas adsorption studies and PXRD patterns
Fig. 6 Comparison of predicted vs. measured SABET of two COFs syn- indicate the possible role of green solvents as reaction media
thesized in PCl and GBL solvents. in navigating the formation of high-quality COFs. Using ML
approaches for the first time, we successfully demonstrated
that the surface area of the COFs can be predicted using
solvent and amine precursor descriptors with 0.83 R2 values in
phatic side group or a heteroaromatic core. The predicted the PLS regression analysis. We also demonstrated that the for-
surface area was 364 and 175 m2 g−1 for the TpPa2 COF in PCl mation of crystalline or amorphous COFs can be predicted
and GBL, respectively. The predicted surface area was 963 and using ML binary classification by only using the solvent media
774 m2 g−1 for the TpTta COF in PCl and GBL, respectively and the amine precursor’s descriptors, achieving an accuracy
(Fig. 6). The TpPa2 and TpTta COFs were synthesized using score of 0.87. In future, we aim to design new ML experiments
the same solvothermal method described above. All four COFs to identify a better correlation of the efficiency of the most
were crystalline with moderately high yield and PXRD results promising solvent with high-quality COF preparation. We
(section S16, ESI†). The measured surface areas were in close believe that these preliminary results will provide a fundamen-
agreement with the predictions. The RMSE was 124 for the pre- tal understanding of solvent behavior and provide access to

8938 | Green Chem., 2021, 23, 8932–8939 This journal is © The Royal Society of Chemistry 2021
View Article Online

Green Chemistry Paper

several other green solvents used in preparing high-performance COFs. The real-world application showed the robustness of the model, which can be extended to design new COFs. The binary classification model is an excellent tool to predict whether a COF can be synthesized in an amorphous or crystalline form, while the surface area predictions were similar to the measured values.

Author contributions
Sushil Kumar: Investigation, validation, formal analysis, data curation, visualization, and writing – original draft. Gergo Ignacz: modeling, formal analysis, visualization, and writing – original draft. Gyorgy Szekely: conceptualization, methodology, resources, visualization, writing – review & editing, supervision, funding acquisition, and project administration.

Conflicts of interest
There are no conflicts to declare.

Acknowledgements
This work was supported by the King Abdullah University of Science and Technology (KAUST). The postdoctoral (SK) and PhD (GI) fellowships from the Advanced Membranes and Porous Materials Center at KAUST are gratefully acknowledged.

References
1 S. J. Lyle, P. J. Waller and O. M. Yaghi, Trends Chem., 2019, 1, 172–184.
2 K. Geng, T. He, R. Liu, S. Dalapati, K. T. Tan, Z. Li, S. Tao, Y. Gong, Q. Jiang and D. Jiang, Chem. Rev., 2020, 120, 8814–8933.
3 B. P. Biswal, S. Chandra, S. Kandambeth, B. Lukose, T. Heine and R. Banerjee, J. Am. Chem. Soc., 2013, 135, 5328–5331.
4 S. Karak, S. Kandambeth, B. P. Biswal, H. S. Sasmal, S. Kumar, P. Pachfule and R. Banerjee, J. Am. Chem. Soc., 2017, 139, 1856–1862.
5 P. Kuhn, M. Antonietti and A. Thomas, Angew. Chem., Int. Ed., 2008, 47, 3450–3453.
6 M. Dogru, A. Sonnauer, S. Zimdars, M. Döblinger, P. Knochel and T. Bein, CrystEngComm, 2013, 15, 1500–1502.
7 K. Dey, M. Pal, K. C. Rout, S. Kunjattu H, A. Das, R. Mukherjee, U. K. Kharul and R. Banerjee, J. Am. Chem. Soc., 2017, 139, 13083–13091.
8 D. B. Shinde, G. Sheng, X. Li, M. Ostwal, A.-H. Emwas, K.-W. Huang and Z. Lai, J. Am. Chem. Soc., 2018, 140, 14342–14349.
9 D. Rodríguez-San-Miguel, A. Abrishamkar, J. A. R. Navarro, R. Rodriguez-Trujillo, D. B. Amabilino, R. Mas-Ballesté, F. Zamora and J. Puigmartí-Luis, Chem. Commun., 2016, 52, 9212–9215.
10 P. J. Waller, F. Gándara and O. M. Yaghi, Acc. Chem. Res., 2015, 48, 3053–3063.
11 T. Welton, Proc. R. Soc. A, 2015, 471, 20150502.
12 J. Thote, H. Barike Aiyappa, R. Rahul Kumar, S. Kandambeth, B. P. Biswal, D. Balaji Shinde, N. Chaki Roy and R. Banerjee, IUCrJ, 2016, 3, 402–407.
13 C.-X. Yang, C. Liu, Y.-M. Cao and X.-P. Yan, Chem. Commun., 2015, 51, 12254–12257.
14 L. Cseri, M. Razali, P. Pogany and G. Szekely, Organic solvents in sustainable synthesis and engineering, ed. B. Török and T. B. T.-G. C. Dransfield, Elsevier, 2018, pp. 513–553.
15 J. Qiu, P. Guan, Y. Zhao, Z. Li, H. Wang and J. Wang, Green Chem., 2020, 22, 7537–7542.
16 E. N. Muratov, J. Bajorath, R. P. Sheridan, I. V. Tetko, D. Filimonov, V. Poroikov, T. I. Oprea, I. I. Baskin, A. Varnek, A. Roitberg, O. Isayev, S. Curtalolo, D. Fourches, Y. Cohen, A. Aspuru-Guzik, D. A. Winkler, D. Agrafiotis, A. Cherkasov and A. Tropsha, Chem. Soc. Rev., 2020, 49, 3525–3564.
17 H. Moriwaki, Y.-S. Tian, N. Kawashita and T. Takagi, J. Cheminf., 2018, 10, 4.
18 R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000, vol. 11, p. 688.
19 P. Gramatica, QSAR Comb. Sci., 2007, 26, 694–701.
20 P. F. J. Lipiński and P. Szurmak, Chem. Pap., 2017, 71, 2217–2232.
21 Carbon neutral since 2007. Carbon free by 2030., https://sustainability.google.
22 R. K. Henderson, C. Jiménez-González, D. J. C. Constable, S. R. Alston, G. G. A. Inglis, G. Fisher, J. Sherwood, S. P. Binks and A. D. Curzons, Green Chem., 2011, 13, 854–862.
23 C. M. Alder, J. D. Hayler, R. K. Henderson, A. M. Redman, L. Shukla, L. E. Shuster and H. F. Sneddon, Green Chem., 2016, 18, 3879–3890.
24 D. Prat, A. Wells, J. Hayler, H. Sneddon, C. R. McElroy, S. Abou-Shehada and P. J. Dunn, Green Chem., 2016, 18, 288–296.
25 D. Morelli Venturi, F. Campana, F. Marmottini, F. Costantino and L. Vaccaro, ACS Sustainable Chem. Eng., 2020, 8, 17154–17164.
26 A. Datar, Y. G. Chung and L.-C. Lin, J. Phys. Chem. Lett., 2020, 11, 5412–5417.
27 I. M. Johnstone and D. M. Titterington, Philos. Trans. R. Soc., A, 2009, 367, 4237–4253.
28 P. Sinha, A. Datar, C. Jeong, X. Deng, Y. G. Chung and L.-C. Lin, J. Phys. Chem. C, 2019, 123, 20195–20209.
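
Illustrative note on the validation metrics. The model-validation figures quoted in the main text (RMSEC, RMSECV, RMSEP, and the classification accuracy score) all rest on two simple quantities: a root-mean-square error between measured and predicted values, and the fraction of correctly classified samples. The sketch below is a minimal illustration under stated assumptions, not the authors' code. The function names `rmse`, `accuracy_score`, and `venetian_blind_folds` are hypothetical, the numbers are toy values rather than data from this work, and the fold assignment assumes that the "blind thickness" mentioned in the text refers to venetian-blinds cross-validation, which the paper does not state explicitly.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between paired measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy_score(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def venetian_blind_folds(n_samples, n_splits=7, blind_thickness=1):
    """Deal consecutive blocks of `blind_thickness` samples to folds in rotation.

    Assumption: this mirrors venetian-blinds cross-validation; the default of
    seven splits and thickness 1 echoes the settings named in the text.
    """
    if n_splits < 1 or blind_thickness < 1:
        raise ValueError("n_splits and blind_thickness must be positive")
    folds = [[] for _ in range(n_splits)]
    for index in range(n_samples):
        folds[(index // blind_thickness) % n_splits].append(index)
    return folds


if __name__ == "__main__":
    # Toy values for illustration only (not measurements from the paper).
    toy_measured = [100.0, 200.0, 300.0]
    toy_predicted = [110.0, 190.0, 320.0]
    print("RMSE:", rmse(toy_measured, toy_predicted))

    toy_true = ["crystalline", "amorphous", "crystalline", "crystalline"]
    toy_pred = ["crystalline", "amorphous", "amorphous", "crystalline"]
    print("accuracy:", accuracy_score(toy_true, toy_pred))

    # Each inner list holds the sample indices left out in one CV round.
    print("folds:", venetian_blind_folds(14))
```

In such a scheme, an RMSECV-type value would be obtained by fitting the model on all folds but one, predicting the held-out fold, and pooling the held-out errors in `rmse`; RMSEC and RMSEP would use the calibration set and an independent test set, respectively.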

This journal is © The Royal Society of Chemistry 2021 Green Chem., 2021, 23, 8932–8939 | 8939
