Using A Machine Learning Approach To Determine The Space Group of A Structure From The Atomic Pair Distribution Function
Chia-Hao Liu,a Yunzhe Tao,a Daniel Hsu,b Qiang Dua and Simon J. L. Billingea,c*
a Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York, 10027, USA, b Department of Computer Science, Columbia University, New York, New York, 10027, USA, and c Condensed Matter Physics and Materials Science Department, Brookhaven National Laboratory, Upton, New York, 11973, USA.
Received 26 February 2019; accepted 24 April 2019
*Correspondence e-mail: [email protected]
1. Introduction
Crystallography is used to determine crystal structures from
diffraction patterns (Giacovazzo, 1999), including patterns
from powdered samples (Pecharsky & Zavalij, 2005). The
analysis of single-crystal diffraction is the most direct
approach for solving crystal structures. However, powder
diffraction becomes the best option when single crystals of
the desired size and quality are not available.
A crystallographic structure solution makes heavy use of
symmetry information to succeed. The first step is to deter-
mine the unit cell and space group of the underlying structure.
Information about this is contained in the positions (and
characteristic absences) of Bragg peaks in the diffraction
pattern. This process of determining the unit cell and space
group of the structure is known as ‘indexing’ the pattern
(Giacovazzo, 1999). Indexing is inherently challenging for
powder diffraction due to the loss of explicit directional
information in the pattern, which is the result of projecting the
data from three dimensions into a one-dimensional pattern
(de Wolff, 1957; Mighell & Santoro, 1975). However, there are
a number of different algorithms available that work well in
different situations (Visser, 1969; Coelho, 2003; Boultif &
Louër, 2004; Altomare, Campi et al., 2009). Once the unit-cell
information is determined, an investigation of the systematic
absences of diffraction peaks is carried out to identify the
space group. Various methods for determining space-group
information, based on either statistical or brute-force searches,
have been used (Neumann, 2003; Markvardsen et al., 2008;
Altomare, Camalli et al., 2009; Coelho, 2017).
Table 1
Space group and corresponding number of entries considered in this study.
Columns: Space group (No.); No. of entries.

Table 2
Parameters used to calculate PDFs from atomic structures.
ADP stands for isotropic atomic displacement parameter. All parameters follow the same definitions as in Farrow et al. (2007).
3.1. Space-group determination based on the logistic regression (LR) model
We start our learning experiment with a rather simple model, LR. In the setup of the LR model, the probability of a given feature being classified as a particular space group is parametrized by a 'logistic function' (Hastie et al., 2009). Forty-five space groups are considered in our study; therefore there are the same number of logistic functions, each with a set of parameters left to be determined. Since the space-group label is known for each data entry in the training set, the learning algorithm is then used to find an optimized set of parameters for each of the 45 logistic functions such that the overall probability of determining the correct space group on all training data is maximized. As is common practice, we also include 'regularization' (Hastie et al., 2009) to reduce overfitting in the trained model. The regularization scheme chosen in our implementation is the 'elastic net', which is known for encouraging sparse selections of strongly correlated variables (Zou & Hastie, 2005). Two hyperparameters, $\alpha$ and $\lambda$, are introduced in the context of our regularization scheme. The explicit definition of these two parameters is presented in Appendix A. Our LR model is implemented through scikit-learn (Pedregosa et al., 2011). The optimum $(\alpha, \lambda)$ for our LR model is determined by cross-validation (Hastie et al., 2009) in the training stage.
The best LR model with X as the input yields an accuracy of 20% at $(\alpha, \lambda) = (10^{-5}, 0.75)$. This result is better than a random guess from 45 space groups (2%) but is still far from satisfactory. We reason that the symmetry information depends not on the absolute values of the PDF peak positions, which depend on the specifics of the chemistry, but on their relative positions. This information may be more apparent in an autocorrelation of the PDF with itself, which is a quadratic feature in ML language. Our quadratic feature, $X^2$, is defined as

$X^2 = \{X_i X_j \mid i, j = 1, 2, \ldots, d;\ j > i\}$,    (5)

where $d$ is the dimension of $X$ and $X^2$ is a vector of dimension $[d(d-1)/2] \times 1$. An example of the quadratic feature from Li18Ta6O24 (space group P2/c) is shown in Fig. 1(b).
The best LR model with $X^2$ as the input yields an accuracy of 44.5% at $(\alpha, \lambda) = (10^{-5}, 1.0)$. This is much better than for the linear feature, but still quite low. However, the goal of the space-group determination problem is to find the right space group, not necessarily to have it returned in the top position in a rank-ordered list of suggestions. We therefore define an alternative accuracy ($A_6$) that allows the correct space group to appear at any position in the top-6 space groups returned by the model. The values of $A_i$ ($i = 1, 2, \ldots, 6$) and their first discrete differences $\Delta A_i = A_i - A_{i-1}$ ($i = 2, 3, \ldots, 6$) for our best LR model are shown in Fig. 2. We observed a more than 10% improvement in the alternative accuracy after considering the top-2 predictions from the LR model ($\Delta A_2$), and the improvement ($\Delta A_i$) diminishes monotonically when more predictions are considered, as expected. A top-6 estimate yields a good accuracy (77%), and this is still a small enough number of space groups that they could be tested manually in any structure determination.
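To make the quadratic feature concrete, the following minimal sketch computes equation (5) and fits an elastic-net-regularized LR model with scikit-learn, the library used for our LR implementation. The toy arrays, the 30-point grid and the solver settings are illustrative assumptions, not the actual pipeline of this work; scikit-learn's C (inverse strength) and l1_ratio play roles analogous to our $\alpha$ and $\lambda$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def quadratic_feature(x):
    """Equation (5): all pairwise products X_i * X_j with j > i,
    giving a vector of length d*(d-1)/2."""
    i, j = np.triu_indices(len(x), k=1)  # index pairs with j > i
    return x[i] * x[j]

# Hypothetical toy data standing in for normalized PDFs and space-group labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))       # 200 'PDFs' sampled on a 30-point grid
y = rng.integers(0, 45, size=200)    # labels for 45 space groups

X2 = np.apply_along_axis(quadratic_feature, 1, X)

# Elastic-net-penalized multinomial logistic regression (sketch settings).
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.75, C=1e5, max_iter=1000)
clf.fit(X2, y)
print(clf.score(X2, y))
```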
Figure 1
Example of (a) the normalized PDF X and (b) its quadratic form $X^2$ for the compound Li18Ta6O24 (space group P2/c).

Figure 2
Accuracy in determining the space group when the top-i predictions are considered ($A_i$). The inset shows the first discrete differences ($\Delta A_i = A_i - A_{i-1}$) when i predictions are considered. Blue represents the result of the logistic regression model with $X^2$ and red is the result from the convolutional neural network model.
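The alternative accuracy $A_i$ can be computed from any classifier that returns class probabilities; the short sketch below illustrates the definition, where clf, X_test and y_test are hypothetical placeholders for a fitted classifier and held-out data.

```python
import numpy as np

def alternative_accuracy(clf, X_test, y_test, i=6):
    """Fraction of samples whose true label appears among the
    classifier's top-i most probable space groups (A_i in the text)."""
    proba = clf.predict_proba(X_test)          # shape (n_samples, n_classes)
    top_i = np.argsort(proba, axis=1)[:, -i:]  # column indices of the i largest
    labels = clf.classes_[top_i]               # map indices back to labels
    return np.mean([y in row for y, row in zip(y_test, labels)])
```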
The ratio of correctly classified structures versus space-group number is shown in Fig. 3(a). The space-group numbering follows the standard convention (Hahn, 2002). A higher space-group number means a more symmetric structure, and we find that, in general, the LR model yields decent performance in predicting space groups from structures with high symmetry but it performs poorly on classifying low-symmetry structures.

Figure 3
The ratio of correctly classified structures versus space-group number from (a) the logistic regression model (LR) with quadratic feature $X^2$ and (b) the convolutional neural network (CNN) model. Marker size reflects the relative frequency of the space group in the training set. Markers are color coded with corresponding crystal systems [triclinic (dark blue), monoclinic (orange), orthorhombic (green), tetragonal (blue), trigonal (gray), hexagonal (yellow) and cubic (dark red)].

3.2. Space-group determination based on the convolutional neural network (CNN)
The result from the linear ML model (LR) is promising, prompting us to move to a more sophisticated deep learning model. Deep learning models (LeCun et al., 2015; Goodfellow et al., 2016) have been successfully applied in various fields, ranging from computer vision (He et al., 2016; Krizhevsky et al., 2012; Radford et al., 2015) and natural language processing (Bahdanau et al., 2014; Sutskever et al., 2014; Kim, 2014) to materials science (Ramprasad et al., 2017; Ziletti et al., 2018). In particular, we sought to use a CNN (Lecun et al., 1998).
The performance of a CNN depends on the overall architecture as well as on the choice of hyperparameters such as the size of the kernels, the number of channels in each convolutional layer, the pooling size and the dimension of the fully connected (FC) layer (Goodfellow et al., 2016). However, there is no well-established protocol for selecting these parameters, and the selection is largely a trial-and-error effort for any given problem. We built our CNN by tuning the hyperparameters and validating the performance on the testing data, which is just 20% of the total data.
The resulting CNN built for the space-group determination problem is illustrated in Fig. 4. The input PDF is a one-dimensional signal sequence of dimension 209 x 1 x 1. We first apply a convolution layer of 256 channels with kernel size 32 x 1 to extract the first set of feature maps (Lecun et al., 1998) of dimension 209 x 1 x 256. It has been shown that applying a nonlinear activation function to each output improves not only the ability of a model to learn complex decision rules but also the numerical stability during the optimization step (LeCun et al., 2015). We chose the rectified linear unit (ReLU) (Dahl et al., 2013) as the activation function for the network. After the first convolution layer, we apply a 64-channel kernel of size 32 x 1 to the first feature map and generate the second set of feature maps of dimension 209 x 1 x 64. As with the first convolution layer, the second feature map is also activated by ReLU. This is followed by a max-pooling layer (Jarrett et al., 2009) of size 2, which is applied to reduce overfitting. After the subsampling in the max-pooling layer, the output is of size 104 x 1 x 64 and it is then flattened to a vector of size 6656 x 1 before two fully connected layers of size 128 and 45 are applied. The first FC layer is used to further reduce the dimensionality of the output from the max-pooling layer and is activated with ReLU. The second FC layer is activated with the softmax function (Goodfellow et al., 2016) to output the probability of the input PDF belonging to each of the 45 space groups considered in our study.
Categorical cross-entropy loss (Bishop, 2006) is used for training our model. It is apparent from Table 1 that the number of data entries in each space group is not evenly distributed, varying from 373 (I42d) to 7392 (P21/c) per space group. We would like to avoid obtaining a neural network that is biased towards space groups with abundant data entries. To mitigate the effect of the unbalanced data set, the loss from each training sample is multiplied by a class weight (King & Zeng, 2001), which is the inverse of the ratio between the number of data entries with the same space-group label in the training set and the size of the entire training set. We then use adaptive moment estimation (Adam) (Kingma & Ba, 2014) as the stochastic optimization method to train our model with a mini-batch size of 64. During the training step, we follow the protocol outlined in the work of He et al. (2016) to perform the weight initialization (He et al., 2015) and batch normalization (Ioffe & Szegedy, 2015). A dropout strategy (Srivastava et al., 2014) is also applied in the pooling layer to reduce overfitting in our neural network. The parameters in the CNN model are iteratively updated through the stochastic gradient descent method (Adam).
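The architecture described above maps onto a short Keras specification (Keras is the framework used for our implementation, as noted below). In this sketch the layer sizes follow the text, while the dropout rate and all variable names are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(209, 1)),                                 # PDF on a 209-point grid
    layers.Conv1D(256, 32, padding="same", activation="relu"),   # 209 x 256 feature maps
    layers.Conv1D(64, 32, padding="same", activation="relu"),    # 209 x 64 feature maps
    layers.MaxPooling1D(pool_size=2),                            # subsample to 104 x 64
    layers.Dropout(0.5),                                         # dropout rate assumed
    layers.Flatten(),                                            # 104 * 64 = 6656 units
    layers.Dense(128, activation="relu"),                        # first FC layer
    layers.Dense(45, activation="softmax"),                      # 45 space-group probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The inverse-frequency class weights described above would then be supplied to model.fit through its class_weight argument.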
Figure 4
Schematic of our convolutional neural network (CNN) architecture.

The learning rate is a parameter that affects how drastically the parameters are updated at each iteration. A small learning rate is preferable when the parameters are close to some set of optimal values, and vice versa. Therefore, an appropriate learning-rate schedule is crucial for training a model. Our training starts with a learning rate of 0.1, and the value is reduced by a factor of 10 at epochs 81 and 122. With the learning-rate schedule described, the optimization loss against the testing set, along with the prediction accuracy on the training and testing sets, are plotted with respect to the number of epochs in Fig. 5. Our training is terminated after 164 epochs, when the training accuracy, testing accuracy and optimization loss all plateau, meaning that no significant improvement to the model would be gained from further updates to the parameters.
Our CNN model is implemented with Keras (Chollet et al., 2015) and trained on a single Nvidia Tesla K80 GPU.
Under the architecture and training protocol discussed above, our best CNN model yields an accuracy of 70.0% from the top-1 prediction and 91.9% from the top-6 predictions, which outperforms the LR model by 15%. Similarly, from Fig. 2, we observe a more than 10% improvement in the alternative accuracy after considering the top-2 predictions ($\Delta A_2$) in the CNN model, and the improvement ($\Delta A_i$) decreases monotonically, on an even steeper trend than in the case of the LR model, when more predictions are considered.
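The schedule described above corresponds to a simple step decay, sketched here with a standard Keras callback under the stated epochs (81, 122) and starting rate (0.1); the commented fit call is only illustrative.

```python
from tensorflow import keras

def step_decay(epoch, lr):
    """Start at 0.1 and divide by 10 at epochs 81 and 122 (0-indexed)."""
    if epoch < 81:
        return 0.1
    if epoch < 122:
        return 0.01
    return 0.001

lr_schedule = keras.callbacks.LearningRateScheduler(step_decay)
# model.fit(X_train, y_train, epochs=164, batch_size=64,
#           callbacks=[lr_schedule], class_weight=class_weights)
```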
samples. However, from Fig. 3, it is clear that the LR model fails to identify even well-represented space groups across all space-group numbers. On the other hand, a positive correlation between the size of the training data and the classification ratio is observed for the CNN model. Furthermore, except for space group Ia3d, which is the most symmetric space group, the classification ratios for the rarely seen groups are lower than those for the well-represented groups in our CNN model. However, the main result is that the CNN performs significantly better than the LR model for all space groups, especially on structures with lower symmetry. There is an overall trend towards an increase in the prediction ability as the symmetry increases, and there are outliers, but there seems to be a trend that the CNN model is better at predicting space groups for the more highly populated space groups.
The confusion matrix (Stehman, 1997) is a common tool for assessing the performance of an ML model. The confusion matrix, M, is an N-by-N matrix, where N is the number of labels in the data set. The rows of M identify the true label (the correct answer) and the columns of M the label predicted by the model. The numbers in the matrix are the proportion of results in each category.

Figure 6
The confusion matrix of our CNN model. The row labels indicate the correct space group and the column labels the space group returned by the model. An ideal model would result in a confusion matrix with all diagonal values being 1 and all off-diagonal values being zero. The numbers in parentheses are the space-group number.
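A row-normalized confusion matrix of the kind shown in Fig. 6 can be computed directly with scikit-learn; the labels and predictions below are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted space-group labels for five structures.
y_true = ["Fd-3m", "Fd-3m", "F-43m", "Pnma", "Fd-3m"]
y_pred = ["Fd-3m", "F-43m", "F-43m", "Pnma", "Fd-3m"]

labels = ["Fd-3m", "F-43m", "Pnma"]
M = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
# M[i, j] is the fraction of structures with true label labels[i]
# that the model classified as labels[j]; each row sums to 1.
print(np.round(M, 2))
```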
Table 3
Top-6 space-group predictions from the CNN model on experimental PDFs.
Entries in bold are the most probable space group from the existing literature listed in the References column. More than one prediction is highlighted when these space groups are regarded as highly similar in the literature. Details about these cases are discussed in the text. The Note column specifies whether the PDF is from a crystalline (C) or nanocrystalline (NC) sample. The experimental data were collected under various instrumental conditions which are not identical to those of the training set, and were measured at room temperature unless otherwise specified.

Sample | 1st | 2nd | 3rd | 4th | 5th | 6th | References | Note
Ni | Fm3m | Pm3m | Fd3m | F43m | P4/mmm | P63/mmc | Owen & Yates (1936) | C
Fe3O4 | Fd3m | I41/amd | R3m | Fm3m | F43m | P63/mmc | Fleet (1981) | C
CeO2 | Fm3m | Fd3m | Pm3m | F43m | Pa3 | P4/mmm | Yashima & Kobayashi (2004) | C
Sr2IrO4† | Fm3m | P6/mmm | P63/mmc | Pm3m | Fd3m | R3m | Huang et al. (1994), Shimura et al. (1995) | C
CuIr2S4 | Fd3m | Fm3m | F43m | R3m | Pm3m | R3m | Furubayashi et al. (1994) | C
CdSe† | P21/c | P1 | C2/c | Pnma | Pna21 | P212121 | Masadeh et al. (2007) | C
IrTe2 | C2/m | P3m1 | P21/c | P1 | P21/m | C2/c | Matsumoto et al. (1999), Yu et al. (2018) | C
IrTe2@10 K | C2/m | P63/mmc | P6/mmm | P4/mmm | P1 | P21/c | Matsumoto et al. (1999), Toriyama et al. (2014) | C
Ti4O7 | P1 | C2/c | P21/c | C2/m | Pnnm | P42/mnm | Marezio & Dernier (1971) | C
MAPbI3@130 K | P1 | P21/c | C2/c | P212121 | Pnma | Pna21 | Swainson et al. (2003) | C
MoSe2 | P63/mmc | R3m | R3m | P63mc | P4/mmm | Fd3m | James & Lavik (1963) | C
TiO2 (anatase) | I41/amd | C2/m | P21/m | C2/c | P1 | P21/c | Horn et al. (1972) | NC
TiO2 (rutile) | P42/mnm | C2/m | P21/c | P1 | P21/m | Pnma | Baur & Khan (1971) | NC
Si† | P63mc | I42d | R3m | C2/c | P1 | Pbca | Rohani et al. (2019) | NC
BaTiO3 | R3m | P4/mmm | C2/m | P63/mmc | Pnma | Cmcm | Kwei et al. (1993), Page et al. (2010) | NC
† Indicates where the CNN model fails to predict the correct space group.
For example, the diagonal elements indicate the proportion of outcomes where the correct label was predicted in each case, and the matrix element in the Fd3m row and the F43m column (value 0.05) is the proportion of PDFs from an Fd3m space-group structure that were incorrectly classified as being in space group F43m. For an ideal prediction model, the diagonal elements of the confusion matrix would be 1.0 and all off-diagonal elements would be zero. The confusion matrix from our CNN model is shown in Fig. 6.
We observe 'teardrop' patterns in the columns of P1, P21/c and Pnma, meaning that the CNN model tends to incorrectly assign a wide range of space groups to these groups. On the surface this behavior is worrying, but the confusions actually correspond to real group–subgroup relations which have been known and tabulated in the literature (Ascher et al., 1969; Boyle & Lawrenson, 1972; Hahn, 2002). For the case of P1, the major confusion groups (P21/c, C2/c and P2/c) are in fact minimal non-isomorphic supergroups of P1. Moreover, P212121 shares the same subgroup (P21) with P21/c, and Pbca is a supergroup of P212121 while Pbcn is a supergroup of P21/c. Similar reasoning can be applied to the cases of P21/c and Pnma as well. The statistical model appears to be picking up some real underlying mathematical relationships.
We also investigate the cases with low classification accuracy (low values in the diagonal elements) from the CNN model. P21 is the group with the lowest accuracy (27%) among all labels. Similar group–subgroup reasoning holds for this case as well: P21/c (32% error rate) is, again, a supergroup of P21, and C2/c (10% error rate) is a supergroup of P21/c. The same reasoning holds for other confusion cases and we will not explicitly go through them here, but this suggests that these closely group/subgroup-related space groups should also be considered whenever the CNN model returns another one in the series. It is possible to train a different CNN model which focuses on disambiguating space groups that are closely related by the group/subgroup relationship. However, we did not implement this kind of hierarchical model in our study.

4.2. Space-group determination on experimental PDFs
The CNN model is used to determine the space group of 15 experimental PDFs and the results are reported in Table 3. For each experimental PDF, structures are known from previous studies, which are referenced in the table. Both crystalline (C) and nanocrystalline (NC) samples with a wide range of structural symmetries are covered in this set of experimental PDFs. It is worth noting that the sizes of the NC samples chosen are roughly equal to or larger than 10 nm, at which size, in our measurements, the PDF signal from the NC material falls off at roughly the same rate as that from the crystalline PDFs in the training set. Every experimental PDF is subject to experimental noise and was collected under various instrumental conditions that result in aberrations to the PDF that are not identical to the parameter values used to generate our training set (Table 2). It is therefore expected that the CNN classifier will work less well than on the testing set. From Table 3, it is clear that the CNN model yields an overall satisfactory result in determining space groups from experimental data, with the space group from 12 out of 15 test cases properly identified in the top-6 predictions.
Here we comment on the performance of the CNN. In the case of IrTe2 at 10 K, the material has been reported in the literature in both C2/m and P1 space groups (Matsumoto et al., 1999; Toriyama et al., 2014), and it is not clear which is correct. The CNN returned both space groups in the top-6. Furthermore, for data from the same sample at room temperature, the CNN model identifies not only the correct space group (P3m1), but also the space groups that the structure will occupy below the low-temperature symmetry-lowering transition (C2/m, P1). For the case of BaTiO3
nanoparticles, the CNN model identifies two space groups that are considered in the literature to yield rather equivalent explanatory power (R3m, P4/mmm) (Kwei et al., 1993; Page et al., 2010). It is encouraging that the CNN appears to be getting the physics right in these cases.
Investigating the failing cases from the CNN model (entries with a dagger in Table 3) also reveals insights into the decision rules learned by the model. Sr2IrO4 was first identified as a perovskite structure with space group I4/mmm (Randall et al., 1957), but later work pointed out that a lower-symmetry group, I41/acd, is more appropriate due to correlated rotations of the corner-shared IrO6 octahedra about the c axis (Huang et al., 1994; Shimura et al., 1995). There is a long-wavelength modulation of the rotations along the c axis, resulting in a supercell with a five-times expansion along that direction (a = 5.496, c = 25.793 Å). The PDF will not be sensitive to such a long-wavelength superlattice modulation, which may explain why the model does not identify a space group close to the I41/acd space group, reflecting the additional symmetry breaking due to the supermodulation. It is not completely clear what the space group would be for the rotated octahedra without the supermodulation, so we are not sure whether this space group is among the top-6 that the model found.
Somewhat surprisingly, the CNN fails to find the right space group for wurtzite CdSe, which is a very simple structure, but rather finds space groups with low symmetries. One possible reason is that we know there is a high degree of stacking faulting in the bulk CdSe sample that was measured. This was best modeled as a phase mixture of wurtzite (space group P63mc) and zinc-blende (space group F43m) (Masadeh et al., 2007). The prediction of low-symmetry groups might reflect the fact that the underlying structure cannot be described with a single space group.

APPENDIX A
The logistic regression (LR) model
Suppose there are M structures in the training set; we denote the space group of the mth structure as $k_m$, where $k_m \in \{1, 2, \ldots, K\}$, our complete set of space groups. In the setup of the LR model, the probability of a feature $x_m$ of dimension $d$, which is computable from the mth structure, belonging to a specific space group $k_m$ is parametrized as

$\Pr(k_m \mid x_m; \beta^{k_m}) = \dfrac{\exp\!\left(\beta_0^{k_m} + \sum_{i=1}^{d}\beta_i^{k_m} x_{m,i}\right)}{1 + \exp\!\left(\beta_0^{k_m} + \sum_{i=1}^{d}\beta_i^{k_m} x_{m,i}\right)}$,    (6)

where $\beta^{k_m} = \{\beta_0^{k_m}, \beta_1^{k_m}, \ldots, \beta_d^{k_m}\}$ is a set of parameters to be determined. The index $k_m$ runs from 1 to 45, which corresponds to the total number of space groups considered in our study. Since the space group $k$ and feature $x$ are both known for the training data, the learning algorithm is then used to find an optimized set of $\beta = \{\beta^{k_m} : k_m = 1, 2, \ldots, K\}$ which maximizes the overall probability of determining the correct space group, $\Pr(k_m \mid x_m; \beta^{k_m})$, on all M training data.
For each of the M structures, there will be a binary result for the classification: either the space-group label is correctly classified or it is not. This process can be regarded as M independent Bernoulli trials. The probability function for a single Bernoulli trial is expressed as

$f(k_m \mid x_m; \beta^{k_m}) = \Pr(k_m \mid x_m; \beta^{k_m})^{\delta_m}\,\left[1 - \Pr(k_m \mid x_m; \beta^{k_m})\right]^{1-\delta_m}$,    (7)

where $\delta$ is an indicator: $\delta_m = 1$ if the space-group label $k_m$ is correctly predicted and $\delta_m = 0$ if the prediction is wrong. Since each classification is independent, the joint probability function for M classifications on the space-group labels, $f_M(K \mid x; \beta)$, is written as

$f_M(K \mid x; \beta) = \prod_{m=1}^{M} f(k_m \mid x_m; \beta^{k_m})$.    (8)
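As a numerical illustration of equations (6)-(8), the sketch below evaluates the logistic probability and the joint Bernoulli likelihood; the feature matrix, parameter vector and indicator values are all made up for illustration.

```python
import numpy as np

def pr_logistic(x, beta):
    """Equation (6): logistic probability with beta = [beta_0, beta_1..beta_d]."""
    z = beta[0] + np.dot(beta[1:], x)
    return np.exp(z) / (1.0 + np.exp(z))

def bernoulli_likelihood(p, delta):
    """Equation (7): p**delta * (1 - p)**(1 - delta) for one classification."""
    return p**delta * (1.0 - p)**(1.0 - delta)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))          # M = 4 structures, d = 3 features
beta = rng.normal(size=4)            # one parameter set beta^{k_m}
delta = np.array([1, 0, 1, 1])       # 1 where the label was predicted correctly

# Equation (8): the joint likelihood is the product over the M Bernoulli trials.
probs = np.array([pr_logistic(x, beta) for x in X])
joint = np.prod(bernoulli_likelihood(probs, delta))
print(joint)
```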
Maximizing equation (8) is equivalent to maximizing its logarithm, and to reduce overfitting an elastic-net regularization term is added to this log-likelihood, giving the objective function of equation (11), in which $\|\beta\|_1$ and $\|\beta\|_2^2$ denote the $\ell_1$ norm and the squared $\ell_2$ norm (Horn, 2012), respectively. Two hyperparameters, $\alpha$ and $\lambda$, are introduced under this regularization scheme: $\alpha$ determines the overall 'strength' of the regularization and $\lambda$ governs the relative ratio between the $\ell_1$ and $\ell_2$ regularization (Zou & Hastie, 2005). Describing the detailed steps in optimizing equation (11) is beyond the scope of this paper, but they are available in most standard ML references (Hastie et al., 2009; Bishop, 2006).

APPENDIX B
Robustness of the CNN model
The classification accuracies from CNN models with different sets of hyperparameters, such as the number of filters, kernel size and pooling size, are reproduced in Table 4. The classification accuracy varies only modestly across the different sets of hyperparameters, which implies the robustness of our CNN architecture. We determined the final architecture of our CNN model based on the classification accuracy on the testing set and the learning curves (loss, training accuracy and testing accuracy) reported in Fig. 5.

Table 4
Accuracies of the CNN model with different sets of hyperparameters.
The last row specifies the optimum set of hyperparameters for our final CNN model.

No. filters | Kernel size | No. hidden units | No. ensembles | Top-1 accuracy (%) | Top-6 accuracy (%)
128, 32 | 24 | 128 | 2 | 64.1 | 90.7
256, 64 | 24 | 128 | 2 | 68.6 | 91.6
64, 64 | 24 | 128 | 2 | 67.4 | 91.1
128, 64 | 32 | 128 | 2 | 69.0 | 91.7
128, 64 | 16 | 128 | 2 | 66.6 | 91.3
128, 64 | 24 | 256 | 2 | 69.2 | 91.6
128, 64 | 24 | 64 | 2 | 66.4 | 91.2
128, 64 | 24 | 128 | 1 | 65.7 | 91.1
128, 64 | 24 | 128 | 3 | 68.2 | 91.6
256, 64 | 32 | 128 | 3 | 70.0 | 91.9

Funding information
Funding for this research was provided by: National Science Foundation, Division of Materials Research (grant No. 1534910); National Science Foundation, Division of Mathematical Sciences (grant No. 1719699); National Science Foundation, Division of Computing and Communication Foundations (grant No. 1704833). X-ray PDF measurements were conducted on beamlines 28-ID-1 (PDF) and 28-ID-2 (XPD) of the National Synchrotron Light Source II, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704.

References
Altomare, A., Camalli, M., Cuocci, C., Giacovazzo, C., Moliterni, A. & Rizzi, R. (2009). J. Appl. Cryst. 42, 1197–1202.
Altomare, A., Campi, G., Cuocci, C., Eriksson, L., Giacovazzo, C., Moliterni, A., Rizzi, R. & Werner, P.-E. (2009). J. Appl. Cryst. 42, 768–775.
Ascher, E., Gramlich, V. & Wondratschek, H. (1969). Acta Cryst. B25, 2154–2156.
Bahdanau, D., Cho, K. & Bengio, Y. (2014). arXiv:1409.0473 [cs.CL].
Baur, W. H. & Khan, A. A. (1971). Acta Cryst. B27, 2133–2139.
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. (2002). Acta Cryst. B58, 364–369.
Billinge, S. J. L., Duxbury, P. M., Gonçalves, D. S., Lavor, C. & Mucherino, A. (2018). Ann. Oper. Res. pp. 1–43.
Billinge, S. J. L. & Levin, I. (2007). Science, 316, 561–565.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer-Verlag.
Boultif, A. & Louër, D. (2004). J. Appl. Cryst. 37, 724–731.
Boyle, L. L. & Lawrenson, J. E. (1972). Acta Cryst. A28, 489–493.
Choi, J. J., Yang, X., Norman, Z. M., Billinge, S. J. L. & Owen, J. S. (2014). Nano Lett. 14, 127–133.
Chollet, F., et al. (2015). Keras. https://keras.io.
Cliffe, M. J., Dove, M. T., Drabold, D. A. & Goodwin, A. L. (2010). Phys. Rev. Lett. 104, 125501.
Coelho, A. A. (2003). J. Appl. Cryst. 36, 86–95.
Coelho, A. A. (2017). J. Appl. Cryst. 50, 1323–1330.
Dahl, G. E., Sainath, T. N. & Hinton, G. E. (2013). 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8609–8613.
Egami, T. & Billinge, S. J. L. (2012). Underneath the Bragg Peaks: Structural Analysis of Complex Materials, 2nd ed. Amsterdam: Elsevier.
Farrow, C. L. & Billinge, S. J. L. (2009). Acta Cryst. A65, 232–239.
Farrow, C. L., Juhás, P., Liu, J., Bryndin, D., Božin, E. S., Bloch, J., Proffen, T. & Billinge, S. J. L. (2007). J. Phys. Condens. Matter, 19, 335219.
Fleet, M. E. (1981). Acta Cryst. B37, 917–920.
Furubayashi, T., Matsumoto, T., Hagino, T. & Nagata, S. (1994). J. Phys. Soc. Jpn, 63, 3333–3339.
Giacovazzo, C. (1999). Direct Phasing in Crystallography: Fundamentals and Applications, 1st ed. Oxford University Press/International Union of Crystallography.
Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. MIT Press.
Hahn, T. (2002). International Tables for Crystallography, Vol. A: Space-group Symmetry, 5th ed. Dordrecht: Springer.
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics. New York: Springer-Verlag.
He, K., Zhang, X., Ren, S. & Sun, J. (2015). Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
He, K., Zhang, X., Ren, S. & Sun, J. (2016). Computer Vision – ECCV 2016, edited by B. Leibe, J. Matas, N. Sebe & M. Welling, Lecture Notes in Computer Science, pp. 630–645. New York: Springer International Publishing.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N. & Kingsbury, B. (2012). IEEE Signal Process. Mag. 29, 82–97.
Horn, M., Schwerdtfeger, C. F. & Meagher, E. P. (1972). Z. Kristallogr. 136, 273–281.
Horn, R. A. (2012). Matrix Analysis, 2nd ed. New York: Cambridge University Press.
Huang, Q., Soubeyroux, J. L., Chmaissem, O., Sora, I. N., Santoro, A., Cava, R. J., Krajewski, J. J. & Peck, W. F. (1994). J. Solid State Chem. 112, 355–361.
Ioffe, S. & Szegedy, C. (2015). arXiv:1502.03167 [cs.LG].
James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013). An Introduction to Statistical Learning, Vol. 103 of Springer Texts in Statistics. New York: Springer.
James, P. B. & Lavik, M. T. (1963). Acta Cryst. 16, 1183.
Jarrett, K., Kavukcuoglu, K., Ranzato, M. & LeCun, Y. (2009). 2009 IEEE 12th International Conference on Computer Vision, pp. 2146–2153.
Juhás, P., Cherba, D. M., Duxbury, P. M., Punch, W. F. & Billinge, S. J. L. (2006). Nature, 440, 655–658.
Juhás, P., Granlund, L., Gujarathi, S. R., Duxbury, P. M. & Billinge, S. J. L. (2010). J. Appl. Cryst. 43, 623–629.
Keen, D. A. & Goodwin, A. L. (2015). Nature, 521, 303–309.
Kim, Y. (2014). arXiv:1408.5882 [cs.CL].
King, G. & Zeng, L. (2001). Polit. Anal. 9, 137–163.
Kingma, D. P. & Ba, J. (2014). arXiv:1412.6980 [cs.LG].
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012). Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger, pp. 1097–1105. Red Hook, New York, USA: Curran Associates, Inc.
Kwei, G. H., Lawson, A. C., Billinge, S. J. L. & Cheong, S.-W. (1993). J. Phys. Chem. 97, 2368–2377.
LeCun, Y., Bengio, Y. & Hinton, G. (2015). Nature, 521, 436–444.
Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998). Proc. IEEE, 86, 2278–2324.
Marezio, M. & Dernier, P. D. (1971). J. Solid State Chem. 3, 340–348.
Markvardsen, A. J., Shankland, K., David, W. I. F., Johnston, J. C., Ibberson, R. M., Tucker, M., Nowell, H. & Griffin, T. (2008). J. Appl. Cryst. 41, 1177–1181.
Masadeh, A. S., Božin, E. S., Farrow, C. L., Paglia, G., Juhás, P., Billinge, S. J. L., Karkamkar, A. & Kanatzidis, M. G. (2007). Phys. Rev. B, 76, 115413.
Matsumoto, N., Taniguchi, K., Endoh, R., Takano, H. & Nagata, S. (1999). J. Low Temp. Phys. 117, 1129–1133.
Mighell, A. D. & Santoro, A. (1975). J. Appl. Cryst. 8, 372–374.
Neumann, M. A. (2003). J. Appl. Cryst. 36, 356–365.
Owen, E. & Yates, E. (1936). London Edinb. Dubl. Philos. Mag. J. Sci. 21, 809–819.
Page, K., Proffen, T., Niederberger, M. & Seshadri, R. (2010). Chem. Mater. 22, 4386–4391.
Park, W. B., Chung, J., Jung, J., Sohn, K., Singh, S. P., Pyo, M., Shin, N. & Sohn, K.-S. (2017). IUCrJ, 4, 486–494.
Pecharsky, V. K. & Zavalij, P. Y. (2005). Fundamentals of Powder Diffraction and Structural Characterization of Materials. New York, USA: Springer.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. (2011). J. Mach. Learn. Res. 12, 2825.
Peterson, P. F., Božin, E. S., Proffen, Th. & Billinge, S. J. L. (2003). J. Appl. Cryst. 36, 53–64.
Proffen, T., Page, K. L., McLain, S. E., Clausen, B., Darling, T. W., TenCate, J. A., Lee, S.-Y. & Ustundag, E. (2005). Z. Kristallogr. 220, 1002–1008.
Radford, A., Metz, L. & Chintala, S. (2015). arXiv:1511.06434 [cs.LG].
Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. (2017). NPJ Comput. Mater. 3, 54.
Randall, J. J., Katz, L. & Ward, R. (1957). J. Am. Chem. Soc. 79, 266–267.
Rohani, P., Banerjee, S., Ashrafi-Asl, S., Malekzadeh, M., Shahbazian-Yassar, R., Billinge, S. J. L. & Swihart, M. T. (2019). Adv. Funct. Mater. 29, 1807788.
Shimura, T., Inaguma, Y., Nakamura, T., Itoh, M. & Morii, Y. (1995). Phys. Rev. B, 52, 9143–9146.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. & Hassabis, D. (2017). Nature, 550, 354–359.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). J. Mach. Learn. Res. 15, 1929–1958.
Stehman, S. V. (1997). Remote Sens. Environ. 62, 77–89.
Sutskever, I., Vinyals, O. & Le, Q. V. (2014). Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, pp. 3104–3112. Red Hook, New York, USA: Curran Associates, Inc.
Swainson, I. P., Hammond, R. P., Soullière, C., Knop, O. & Massa, W. (2003). J. Solid State Chem. 176, 97–104.
Toriyama, T., Kobori, M., Konishi, T., Ohta, Y., Sugimoto, K., Kim, J., Fujiwara, A., Pyon, S., Kudo, K. & Nohara, M. (2014). J. Phys. Soc. Jpn, 83, 033701.
Urusov, V. S. & Nadezhina, T. N. (2009). J. Struct. Chem. 50, 22–37.
Visser, J. W. (1969). J. Appl. Cryst. 2, 89–95.
Wolff, P. M. de (1957). Acta Cryst. 10, 590–595.
Yashima, M. & Kobayashi, S. (2004). Appl. Phys. Lett. 84, 526–528.
Yu, R., Banerjee, S., Lei, H. C., Sinclair, R., Abeykoon, M., Zhou, H. D., Petrovic, C., Guguchia, Z. & Bozin, E. (2018). Phys. Rev. B, 97, 174515.
Ziletti, A., Kumar, D., Scheffler, M. & Ghiringhelli, L. M. (2018). Nat. Commun. 9, 2775.
Zobel, M., Neder, R. B. & Kimber, S. A. J. (2015). Science, 347, 292–294.
Zou, H. & Hastie, T. (2005). J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320.