Synthesis of Covalent Organic Frameworks Using Sustainable Solvents and Machine Learning
Covalent organic frameworks (COFs) have attracted considerable interest owing to their structural
predesign ability, controllable chemistry, long-range periodicity, and pore interior functionalization
ability. The most widely adopted solvothermal synthesis of COFs requires the use of toxic organic
solvents. In line with the 5th principle of green chemistry and the United Nations’ 12th Sustainable
Development Goal, we aim to mitigate the adverse effect of solvents on COF synthesis. Here we have
investigated twelve green solvents for the sustainable synthesis of five series of COFs using the
solvothermal approach. Crystallinity and porosity were used to assess the quality of the obtained COFs.
In addition, the suitability of the solvents in the synthesis of crystalline and porous COFs was
investigated and color-coded for the final green assessment. In particular, γ-butyrolactone (for TpPa,
TpBD, and TpAzo), para-cymene (TpAnq), and PolarClean (TpTab) were found to be excellent green solvents
to produce high-quality COFs. For the first time, we successfully used quantitative structure–property
relationships in combination with machine learning approaches to predict both the surface area and
crystallinity of COFs using the structure of the solvents and COF building blocks.
Received 4th August 2021
Accepted 8th October 2021
DOI: 10.1039/d1gc02796d
rsc.li/greenchem
8932 | Green Chem., 2021, 23, 8932–8939 This journal is © The Royal Society of Chemistry 2021
and crystalline in nature. COFs with high surface areas were successfully prepared in ethanol, which is
considered a green solvent.13,14 Deep eutectic solvents as green media for the synthesis of 2D and
three-dimensional (3D) COFs based on Schiff-base chemistry were also reported. However, the porosity and
crystallinity of the prepared COFs were compromised.15 Identification of efficient green solvents in the
synthesis of COFs is a tedious task that is commonly performed via trial-and-error experimentation.
However, the quantitative struc-

The QSPR was used to identify the key structural elements affecting the surface area and to determine if
the resultant COFs are crystalline or amorphous by analysing the solvent–precursor pairs. We used the
partial least squares (PLS) regression tool and 11 different machine learning (ML) algorithms for binary
classification. Our study initiates the exploration of the field of COFs by design using advanced
molecule design tools.
This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.
COF characterization
The crystallinities of the COFs prepared were determined from the powder X-ray diffraction (PXRD)
patterns collected using a Bruker D8 ADVANCE with a high-intensity microfocus rotating anode X-ray
generator. The PXRD patterns of the COFs were recorded in the 2θ range between 2.5° and 40°, and the
data were obtained using the DIFFRACplus XRD Commander software. The radiation used was Cu Kα
(λ = 1.54 Å) with a Ni filter, and the data collection was performed using a quartz holder at a scan
speed of 1° min−1 and a step size of 0.01°. Fourier-transform infrared (FTIR) spectra were obtained
using a Thermo Scientific Nicolet iS10 spectrometer with a universal ZnSe attenuated total reflection
accessory. Solid-state 13C cross polarization magic angle spinning (CP-MAS) NMR spectra were measured
using a Bruker Avance III 400 MHz widebore instrument. Thermogravimetric analyses (TGA) were performed
on a TGA 209 F1 analyser (Netzsch) under an N2 atmosphere at a heating rate of 10 °C min−1 within the
temperature range of 30–900 °C. Scanning electron microscopy (SEM) measurements were performed using a
Magellan FEI
400. The samples were prepared by casting a drop of COFs dispersed in propan-2-ol on a silicon wafer.
To avoid charging during the SEM analyses, all the samples were coated with a 3 nm-thick layer of
iridium using a Q150 T S sputter coater prior to the analyses. Nitrogen adsorption analyses were
performed at 77 K using a liquid nitrogen bath on a Micromeritics ASAP 2420 BET instrument. All the
samples were degassed for 12 h at 140 °C under vacuum prior to gas adsorption studies. The surface
areas were evaluated using a Brunauer–Emmett–Teller (BET) model applied between P/P0 values that fall
in the range of 0.05–0.3 for the COFs. The pore size distributions were calculated using the non-local
density functional theory (NLDFT) method.

Fig. 1 Schematic representation of COF synthesis using Tp trialdehyde and five different amines in
green solvents.

outcome of the reaction was “1” if the reaction resulted in a crystalline COF, and “0” if the reaction
did not occur or resulted in an amorphous COF or a polymer. The final dataset contained 60 binary-valued
outcomes and descriptors. The binary classification problem was chosen over regression analysis for the
reaction outcome due to the small dataset and the missing correlation between the surface area,
crystallinity, and yield. The dataset was split into training and test datasets in an 85 : 15 ratio. It
was necessary to perform principal component analysis (PCA) and Y-scrambling (Y-randomization)
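
The outcome encoding, 85 : 15 split, and Y-scrambling mentioned above can be sketched in a few lines of plain Python. This is an illustration only, not the authors' code. The 0/1 encoding, the dataset size of 60, and the split ratio come from the text; the outcome strings, the class counts, the random seed, and the helper names `encode_outcome`, `split_85_15`, and `y_scramble` are assumptions made for the example.

```python
import random

def encode_outcome(outcome: str) -> int:
    """Return 1 for a crystalline COF, 0 for anything else
    (no reaction, amorphous COF, or polymer)."""
    return 1 if outcome == "crystalline" else 0

def split_85_15(samples, seed=0):
    """Shuffle the samples and split them into training and test sets (85 : 15)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(0.15 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]

def y_scramble(labels, seed=0):
    """Permute the labels relative to the descriptors (Y-randomization).
    A model refitted on scrambled labels should score near chance level;
    if it does not, the original model is probably fitting noise."""
    rng = random.Random(seed)
    scrambled = list(labels)
    rng.shuffle(scrambled)
    return scrambled

# Placeholder outcomes for 60 solvent-amine pairs (counts are NOT the paper's data).
outcomes = ["crystalline"] * 40 + ["amorphous"] * 12 + ["no reaction"] * 8
labels = [encode_outcome(o) for o in outcomes]
train, test = split_85_15(range(len(labels)))
scrambled = y_scramble(labels)
```

With 60 samples, the 85 : 15 ratio leaves 51 pairs for training and 9 for testing, which is one reason a simple binary target is easier to validate here than a continuous one.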
Open Access Article. Published on 08 October 2021. Downloaded on 10/22/2024 10:49:02 PM.
Fig. 2 Examples of experimental PXRD patterns and SEM images of TpPa-GBL, TpBD-GBL, TpAzo-GBL,
TpAnq-PCl, and TpTab-PCl COFs.

shown in Fig. S25, ESI.† The high-resolution profile for C (1s) displayed three main peaks and one
additional π–π* satellite peak. The peak at 284.13 eV corresponded to the C=C bond of the aromatic
rings, where the shoulders at 285.36 and 287.01 eV were assigned to the C–O and C=O bonds,
respectively, present in the framework backbone. The high-resolution profile for N (1s) showed a peak
at 399.63 eV, which corresponded to the =C–NH moiety of the ketoenamine bond of the framework. In the
high-resolution profile of O (1s), the peak signals that appeared at 530.49 and 532.21 eV were assigned
to the C=O and C–O bonds, respectively. For the detailed analysis of the XPS profiles, refer to section
S8 in the ESI.† All the COFs exhibited good thermal stability up to approximately 350 °C (section S9,
ESI†). All the COFs displayed a sheet texture with lateral dimensions of 1–5 µm (section S10, ESI†).
The permanent porosity of the COFs was evaluated by measuring the nitrogen gas uptake at 77 K (section
S12, ESI†). The obtained BET surface area (SABET) of the COFs spanned a wide range of 30 to
1674 m2 g−1 depending on the green solvent employed (Fig. 3). Among all the COFs reported

Fig. 1 shows the list of the green solvents used for the synthesis of the COFs. Solvents can be
classified into seven classes: carbonates, esters, ethers, sulfites, alcohols, aromatic solvents, and
aprotic solvents. A color-coding system was introduced in the GlaxoSmithKline and CHEM21 solvent
selection guides,22–24 which were successfully used to describe the sustainable synthesis of UiO-66.25
We employed the same color-coding system in this work (section S14, ESI†). The column “overall green
assessment”, which shows the color code for the green solvents utilized for the synthesis of the COFs,
is based on the solvent greenness mentioned in the solvent selection guides (section S14, ESI†). The
color codes for the boiling point, viscosity, presence of a characteristic PXRD peak (corresponding to
diffraction from the (100) planes), and SABET columns are defined according to the ranges mentioned in
Table S14, ESI.† The conventional solvents reported for the synthesis of COFs were also included as a
reference for comparison.
The color codes for the last two columns define the rank by default and the ranking after discussion.
The column named “rank by default” indicates the composite color extracted from the combined
evaluation of the solvent as well as the COF properties. Owing to the prime importance of the
crystallinity and surface area of the COFs in a wide range of applications, the final color code in the
“rank by default” column is dominated by the porosity of the COFs. Finally, the color code in the
column “ranking after discussion” indicates the compatibility of the employed solvent and has been
interpreted after an overall evaluation of solvent properties in the generation of crystalline and
porous COFs. In general, the green code denotes efficient solvents with minor issues, the yellow code
denotes solvents that can be used but are found to be less efficient, and the red code denotes solvents
that are either not recommended (according to solvent selection guides) or resulted in COFs with very
low crystallinity and porosity.
To assess the suitability of green solvents in the preparation of high-quality COFs, we calculated the
relative SABET, relative crystallinity, and relative yield for the COFs. As shown in Fig. 4a, the TpPa,
TpBD, and TpAzo COFs synthesized in GBL displayed high BET surface area values. In contrast, in the
case of the TpAnq and TpTab COFs, the Cym and PCl solvents were found to be efficient in delivering
highly porous COFs. In terms of the crystallinity of the COFs, the results showed no clear trend and
the data points were scattered all over the plot (Fig. 4b). All the solvents afforded COFs of moderate
to low relative crystallinity. This suggests difficulty in correlating the crystallinity of the
as-synthesized COFs with the solvents used. A similar observation was made with the relative yield plot
(Fig. 4c); the data points were randomly distributed across the plot, making it difficult to directly
correlate the yield with the COFs synthesized in this study. For example, PC resulted in high yields
for TpBD and TpAzo; however, it afforded moderate to low yields of other COFs. In other words, on the
basis of relative crystallinity and yield, it is difficult to observe a strong correlation of these COF
properties with the solvents employed.

Fig. 3 Forty-three COFs were synthesized in twelve different green solvents. Surface area values for
each COF have been provided at the bottom of each COF structure. The cross sign signifies either no
reaction or amorphous polymer formation.

To address this problem, for the very first time, we utilized an ML approach to deduce the
structure–property relationship between the solvents and resultant COFs. The surface area of the COFs
is co-dependent on the type of solvent(s) used. Thus, classical ab initio DFT calculations would
require overly complex methods to quantify the properties of COFs.26 To overcome the issues with
solvent dependency, we used QSPR computational tools to predict the surface area and to verify if the
resultant COF can be synthesized in the crystalline form. We hypothesized that by determining the
structure of the solvent and the structure of the COF, a predictive relationship could be drawn while
other parameters can be kept constant. Applying high-capacity ML and deep learning methods to a dataset
of only 60 points remains a challenge since they generally require a large amount of data to obtain
good predictive results. Using the QSPR approach, we developed a quantitative structure–property
relationship to predict the key structural elements necessary to generate high surface area and
crystalline COFs by analyzing the solvent–amine precursor pairs.
Initially, a cross-correlation analysis between the obtained results was necessary to filter out
relationships across the surface area, crystallinity, and yield. No direct correlation for the highly
scattered, randomly distributed points was observed for the yield-surface area results (Fig. S54a,
ESI†). Similarly, the crystallinity-yield (Fig. S54b, ESI†) and the crystallinity-surface area
(Fig. S54c, ESI†) datasets did not reveal any correlation. The non-correlated data indicate that, for
example, a COF obtained in a high yield does not necessarily have a high surface area. Having no
correlations across the results suggests that the surface area, crystallinity, and yield data need to
be predicted separately, since none of them could be obtained from the others.
With only 43 measured surface area data points and 2639 calculated descriptors (predictor features),
the original dataset was high-dimensional and prone to suffer from dimensionality issues, making the
application of classical prediction methods challenging.27 To overcome the issues related to
high-dimensionality datasets, PLS regression and PCA were applied to the dataset. PLS regression and
PCA are useful when the number of predictor features is high and the features are possibly
cross-correlated. Using a PLS model, the response features were predicted from a large set of predictor
features by reducing the set of the latter to a smaller set of uncorrelated components (projection to
latent structures). In the model-building phase, the original dataset contained a matrix of 3672
molecular descriptors of the used solvents and amine precursors as the X matrix, and the surface area
and the binary results of the corresponding COF as the Y variables. The first two PLS components were
plotted against each other, and the outliers were removed based on a 95% confidence ellipse. The
resultant matrix of (39 × 2631) was split and standardized.
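
The outlier removal and standardization steps that close this passage can be illustrated with a small, dependency-free sketch. The text does not state how the 95% confidence ellipse was constructed, so the version below is an assumption: it uses the squared Mahalanobis distance of each two-component score from the sample mean and compares it with the chi-square 95% quantile for two degrees of freedom (a Hotelling T²-based limit is another common choice). The function names and the toy scores are hypothetical.

```python
import math

# Chi-square 95% quantile for 2 degrees of freedom: -2 * ln(1 - 0.95), about 5.99.
CHI2_95_2DOF = -2.0 * math.log(0.05)

def inside_95_ellipse(scores):
    """Flag 2D score points lying inside the 95% confidence ellipse,
    based on the squared Mahalanobis distance from the sample mean."""
    n = len(scores)
    mx = sum(p[0] for p in scores) / n
    my = sum(p[1] for p in scores) / n
    sxx = sum((p[0] - mx) ** 2 for p in scores) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in scores) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in scores) / (n - 1)
    det = sxx * syy - sxy ** 2
    flags = []
    for x, y in scores:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(d2 <= CHI2_95_2DOF)
    return flags

def standardize(column):
    """Scale one descriptor column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

# Toy component scores (hypothetical): a tight cluster plus one distant point.
scores = [(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)] * 4 + [(10.0, 0.0)]
keep = inside_95_ellipse(scores)
filtered = [p for p, ok in zip(scores, keep) if ok]
z = standardize([1.0, 2.0, 3.0])
```

In this toy set only the distant point falls outside the ellipse, so 16 of the 17 scores are kept. In practice the standardization statistics would be computed on the training portion and then reused for the test portion to avoid leaking information across the split.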
Fig. 5 (a) Visualization of the predicted versus measured BET surface areas (m2 g−1). Visual representation of the binary classification results using
different algorithms, where the accuracy score is provided in parentheses: (b) input data projected on principal component 1 (x-axis) and principal
component 2 (y-axis), (c) k-nearest neighbor algorithm (0.83), (d) sigmoid support vector machine (0.70), (e) radial basis function support vector
machine (0.71), (f) polynomial kernel support vector machine (0.71), (g) Gaussian process (0.82), (h) decision-tree algorithm (0.77), (i) random forest
algorithm (0.73), (j) artificial neural network (shallow) (0.76), (k) adaptive boosting algorithm (0.79), (l) naïve Bayes method (0.87), and (m) quadratic
statistical classifier (0.87). The higher the accuracy score, the higher the predictive power of the method.
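
To make the accuracy scores in the caption concrete, the sketch below classifies points in a two-component projection with a plain k-nearest-neighbor vote and reports the fraction of correct predictions. It is a toy illustration under stated assumptions, not the model behind Fig. 5: the coordinates, labels, choice of k = 3, and the helper names `knn_predict` and `accuracy` are all hypothetical.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training points (squared Euclidean distance in the 2D projection)."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: (train_points[i][0] - query[0]) ** 2
        + (train_points[i][1] - query[1]) ** 2,
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical PC1/PC2 coordinates and labels (1 = crystalline, 0 = not crystalline).
train_points = [(-2.0, -1.0), (-1.5, -0.5), (-1.0, -1.5),
                (1.0, 1.5), (1.5, 0.5), (2.0, 1.0)]
train_labels = [0, 0, 0, 1, 1, 1]
test_points = [(-1.2, -1.0), (1.4, 1.1), (0.2, 0.3)]
test_labels = [0, 1, 0]

predictions = [knn_predict(train_points, train_labels, q) for q in test_points]
score = accuracy(test_labels, predictions)
```

Here the third test point sits near the class boundary and is misclassified, so two of three predictions are correct. With a test set as small as the one implied by a 60-point dataset, a single misclassified sample shifts the accuracy by more than ten percentage points, which is worth keeping in mind when comparing the scores of the different algorithms.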
Fig. 6 Comparison of predicted vs. measured SABET of two COFs synthesized in PCl and GBL solvents.

phatic side group or a heteroaromatic core. The predicted surface area was 364 and 175 m2 g−1 for the
TpPa2 COF in PCl and GBL, respectively. The predicted surface area was 963 and 774 m2 g−1 for the TpTta
COF in PCl and GBL, respectively (Fig. 6). The TpPa2 and TpTta COFs were synthesized using the same
solvothermal method described above. All four COFs were crystalline, as shown by their PXRD results,
and were obtained in moderately high yields (section S16, ESI†). The measured surface areas were in
close agreement with the predictions. The RMSE was 124 for the predictions, lower than the test RMSE
from the model building phase. We demonstrated that our ML-based methodology has excellent predictive
power with respect to crystallinity and surface area of COFs, which could open new avenues for
in silico COF design strategies.
Conclusions
We synthesized forty-three COFs, falling into five series, in twelve green solvents using acetic acid
as a green catalyst through a solvothermal method. The suitability of the green solvents in the
synthesis of the high-quality COFs was investigated by correlating the relative surface area,
crystallinity, and yield of the resultant COFs with varying parameters of the green solvents. The gas
adsorption studies and PXRD patterns indicate the possible role of green solvents as reaction media in
navigating the formation of high-quality COFs. Using ML approaches for the first time, we successfully
demonstrated that the surface area of the COFs can be predicted using solvent and amine precursor
descriptors with an R2 value of 0.83 in the PLS regression analysis. We also demonstrated that the
formation of crystalline or amorphous COFs can be predicted using ML binary classification by using
only the solvent and amine precursor descriptors, achieving an accuracy score of 0.87. In future, we
aim to design new ML experiments to identify a better correlation of the efficiency of the most
promising solvent with high-quality COF preparation. We believe that these preliminary results will
provide a fundamental understanding of solvent behavior and provide access to several other green
solvents used in preparing high-performance COFs. The real-world application showed the robustness of
the model, which can be extended to design new COFs. The binary classification model is an excellent
tool to predict whether a COF can be synthesized in an amorphous or crystalline form, while the surface
area predictions were similar to the measured values.