Automatic Prediction and Linguistic
Interpretation of Chinese Directional
Complements Based on BERT Model
Young Hoon Jeong, Ming Yue Li, Su Min Kang, Yun Kyung Eum,
and Byeong Kwu Kang
Sogang University, Seoul, Korea
{boychaboy,kbg43}@sogang.ac.kr,
{sinabeurolmy,kksm9801}@naver.com
Abstract. The Chinese directional complement is one of the trickiest concepts
for second language learners because of its derivative meanings. In particular, 出
来, 起来, 下来, and 下去 are easily confused. This study aims to gain
grammatical and educational insights into these complements with a neural
network model. We fine-tuned Chinese BERT models on a Chinese directional
complement dataset composed of sentences containing the above four
complements, drawn from literature, media, and textbooks. By measuring these fine-
tuned models' accuracy, we show how accurately and efficiently a neural
network model predicts Chinese directional complements. Furthermore, we
interpret and analyze the models' decisions using the Sampling and Occlusion
algorithm and visually present which components of a sentence influence the
choice of complement.
Keywords: Chinese directional complement · BERT · Transfer learning ·
Explainable AI · Sampling and occlusion
1 Introduction
In recent natural language processing, neural network models are among the most
popular methodologies. Neural network models use contextual information about words
acquired by learning large amounts of text data, as we can see from the mechanisms of
ELMo, GPT, and BERT. These algorithms are used in various fields because they are
superior to other models in efficiency and accuracy. Taking BERT as an example, this
language model far surpasses existing models on eleven tasks such as POS tagging,
NER, dependency parsing, and QA [1]. The latest neural network models have reached
a level at which they can assist researchers in NLP and linguistic research. In this
situation, there is an increasing need to utilize neural models more actively in Chinese
grammar research and Teaching Chinese as a Second Language (TCSL).
The purpose of this study is to explore a method of automatically predicting the
Chinese Directional Complement (趋向补语: CDC) using the BERT model. Furthermore,
we would like to explore how to use BERT in Chinese grammar research and
TCSL. The core themes of this study are the following two questions.
First, how accurately and efficiently can a neural network language model predict
and classify CDC? Second, what clues or weights are used in inferring CDC? The
former checks the performance of neural models, and the latter attempts an
interpretation of them.
The grammatical phenomenon we examine with neural network models is the CDC
(Chinese Directional Complement). We chose the CDC as the subject of study because it is
important in grammar research and language education, yet its meanings and functions
are hard to grasp. The CDC is a multifunctional sentence component that is located
after a verb; it is not only used frequently but also conveys a range of meanings,
from lexical to grammatical. In terms of TCSL, the CDC is essential but difficult to
master completely. In phrases such as 坚持下来 (hold + come down), 坚持下去 (hold + go down),
想出来 (think + come out) and 想起来 (think + get up), the CDC represents the resultative or stative
status of an action rather than a specific direction of movement. Several previous
studies have shown that the CDC is a challenging sentence component to learn. According
to the results of [2, 3], and [4], a large number of second language learners
misuse the CDC, and among all incorrect answers, those involving the CDC accounted for the
highest proportion. In terms of TCSL, it is therefore necessary to teach the grammatical functions of the
CDC effectively.
2 Related Work and Our Framework
The core methodology used in this study is a transfer learning model based on BERT.
BERT. BERT (Bidirectional Encoder Representations from Transformers) is a deep
learning algorithm that learns a language model based on the Transformer encoder.
BERT learns not only individual words but whole sentences, so it has an excellent ability to
capture meaning and grammatical features. The BERT model has a flexible
structure and can sufficiently learn contextual information at the sentence level.
BERT uses a masked language model to learn sentences. A masked language
model is a learning method that predicts words that have been hidden from a given word
sequence. As shown in the figure below, the model learns contextual information by
repeatedly predicting which word should appear at the corresponding mask
position after a mask is placed in the middle of the sentence. BERT pursues a bi-directional
approach: the whole sentence is observed from both directions to learn the contextual
information. BERT models that use bi-directional encoders produce better embedding
quality than unidirectional encoders [5] (Fig. 1).
Fig. 1. Different learning methods of masked language models.
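To make the masked-prediction mechanism concrete, the short sketch below queries a pre-trained Chinese BERT for a masked character. It is only an illustration of the MLM idea, using the publicly available bert-base-chinese checkpoint and the Hugging Face transformers library; the example sentence is ours, not taken from the paper's corpus.

```python
# Minimal masked-language-model illustration (assumes the Hugging Face
# "transformers" library and the public bert-base-chinese checkpoint).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

# bert-base-chinese tokenizes character by character, so one [MASK]
# stands for exactly one Chinese character here.
for candidate in fill_mask("他把书拿了出[MASK]。", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```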
Transfer Learning. Transfer learning is a deep learning technique that increases
efficiency by further training a pre-trained model on newly constructed data. Transfer learning
aims to improve target learners' performance on target domains by transferring
knowledge from different but related source domains. The dependency on a large amount
of target-domain data can thereby be reduced when constructing target learners. Due to its broad
application prospects, transfer learning has become a popular and promising area in
machine learning [6].
Our Framework. The architecture of this study is shown in the figure below.
The transfer learning process for CDC prediction combines the existing pre-trained
Chinese BERT model with the CDC corpus: the model first learns basic information from
large-scale data and then learns the target data for CDC prediction and classification. Since the
pre-trained BERT data are mainly composed of encyclopedias and newspapers, varied
CDC examples are relatively scarce in them. Therefore, in this study, fine-tuning
was conducted on CDC examples extracted from literary works, broadcast scripts,
and Chinese textbooks, which contain many colloquial expressions, and performance
was improved through this transfer learning process (Fig. 2).
Fig. 2. Model architecture for CDC prediction.
3 Design of the Prediction Model for CDC
3.1 Data Processing
CDC Dataset. The CDC Dataset consists of sentences containing a CDC, which we collected from
5,475 literary works, 108 broadcast scripts, and 290 Chinese textbooks.1 We chose
these sources to obtain unbiased and sufficient data and because this project is intended to be
used for educational purposes. Since the pre-trained data tend to contain more literary
expressions, we added colloquial expressions.
1 The raw data that we used were collected from the following sites, respectively.
Literary works: CCL Corpus of Peking University, http://ccl.pku.edu.cn:8080/ccl_corpus/.
Broadcast scripts: Media Language Corpus (媒体语言语料库), http://ling.cuc.edu.cn/RawPub/.
Chinese textbooks: Corpus of teaching Chinese as a second language, http://www.aihanyu.org/.
While collecting, we focused on four directional complements: 出来 (come
out), 起来 (get up), 下来 (come down), and 下去 (go down). These four complements
are used frequently, but their use is tricky because of the similarities between
them. They also have abundant derivative meanings, such as a resultative or stative
state. Because of these semantic features, it is not easy for second language learners to
fully understand these complements' functions. This becomes clear in the HSK
Dynamic Composition Corpus2 data, where we can find many instances of misuse of the four
directional complements by second language learners.
After extracting sentences containing one of these four complements, we had to remove
the sentences in which these tokens are not used as directional complements. After a
rough sorting with the Corpus Word Parser3, we checked the
remaining sentences one by one. For instance, in 她在他的陪伴下来到医院。 (He
accompanied her to the hospital.), 下来 (come down) is not used as a directional
complement, but the Corpus Word Parser could not recognize this. These sentences were
removed using regular expressions and NLPIR-Parser4. The final CDC Dataset
consists of 98,327 sentences, of which we used 94,327 for training and 4,000
for testing. For more accurate measurement, the data sources were kept strictly
separate to minimize contextual similarity between the training and test sets.
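The filtering step described above can be sketched roughly as follows. This is a simplified, hypothetical prefilter: the actual pipeline relied on the Corpus Word Parser, NLPIR-Parser, and manual checking, and the regular expression shown only covers the single non-complement pattern mentioned in the example.

```python
import re

CDC_TOKENS = ["出来", "起来", "下来", "下去"]
CDC_PATTERN = re.compile("|".join(CDC_TOKENS))

# One non-complement pattern mentioned above: 在…下 + 来到, where 下来 spans
# the end of a prepositional phrase and the verb 来到.
NON_CDC_PATTERN = re.compile(r"在.{0,10}下来到")

def build_example(sentence):
    """Return (masked_sentence, label) if the sentence looks usable, else None."""
    if not CDC_PATTERN.search(sentence) or NON_CDC_PATTERN.search(sentence):
        return None
    for label, token in enumerate(CDC_TOKENS):
        if token in sentence:
            return sentence.replace(token, "[MASK]", 1), label
    return None

print(build_example("他把书拿出来了。"))          # ('他把书拿[MASK]了。', 0)
print(build_example("她在他的陪伴下来到医院。"))  # None (not a directional complement)
```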
3.2 Model Analysis
In this study, we trained the following three pre-trained masked language models using
the CDC Dataset above. We used the BERT Classification Model, a traditional BERT
model with an additional single linear layer on the top, to classify four directional
complements.
The first pre-trained model we used is bert-base-chinese, which was presented in
the same year Google first presented BERT. The bert-base-chinese model
was pre-trained on Chinese Wikipedia, which contains both Simplified and Traditional
Chinese, amounting to 0.4B words. The model consists of 12 layers, a hidden size of 768,
12 attention heads, and 110M parameters.
While the bert-base-chinese model relies on character-based tokens for tokenization,
the updated version of the model, BERT-wwm-ext, presented in [7], masks complete
words, which forces the model to recover the whole word in the Masked Language
Model (MLM) objective. This model also uses extended training data about ten times
larger than Chinese Wikipedia, leading to higher performance than the original
models.
2 The HSK Dynamic Composition Corpus covers the HSK composition papers of foreign exam takers from 1992 to 2005.
3 Corpus Word Parser is a parser providing word segmentation and Part-Of-Speech tagging, http://www.aihanyu.org/.
4 The NLPIR system is a multi-functional system that supports Chinese word segmentation, English tokenization, Part-Of-Speech (POS) tagging, named entity recognition, new word identification, keyword extraction, and user-defined lexicons. NLPIR-ICTCLAS home page: http://ictclas.nlpir.org/index_e.html.
MacBERT-large, presented in [8], is our third pre-trained language model. It is a
modified version of the original masked language model that achieved high scores in
experiments on various NLP tasks. MacBERT-large uses whole word
masking, similar word replacement, and N-gram masking, making it more competitive
than other masked language models.
With our CDC Dataset, we trained the three language models and fine-tuned the
classification layer to classify the four labels (出来: 0, 起来: 1, 下来: 2, 下去: 3)
in the test sentences. Each model was trained with a learning rate of 5e-5, a
training batch size of 64, and 10 epochs.
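The fine-tuning setup above can be sketched as follows. This is a minimal, hypothetical script using the Hugging Face transformers library: the toy sentences, the in-memory dataset class, and the hub checkpoint names suggested for the two HFL models are our assumptions, while the label mapping and hyperparameters follow the values reported above.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

LABELS = {"出来": 0, "起来": 1, "下来": 2, "下去": 3}

class CDCDataset(Dataset):
    """In-memory dataset of sentences whose complement is replaced by [MASK]."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model_name = "bert-base-chinese"  # or e.g. "hfl/chinese-bert-wwm-ext", "hfl/chinese-macbert-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)

texts = ["他把书拿[MASK]了。", "天色渐渐暗了[MASK]。"]   # toy examples only
labels = [LABELS["出来"], LABELS["下来"]]
train_ds = CDCDataset(texts, labels, tokenizer)

args = TrainingArguments(output_dir="cdc-classifier", learning_rate=5e-5,
                         per_device_train_batch_size=64, num_train_epochs=10)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```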
4 Analysis of Accuracy Rate for Neural Network Models
Experiment Results. In this section, we further analyze the test results of each
fine-tuned model. The following table shows the accuracies (Table 1).
Table 1. The accuracy of each CDC classification model.
Model Correct Multiple Wrong Accuracy
bert-base-chinese 86.8% 10.2% 3.0% 97.0%
BERT-wwm-ext 86.9% 10.2% 2.9% 97.1%
MacBERT-Large 87.7% 10.0% 2.3% 97.7%
As we can see from the table, each model's accuracy was 86.8%, 86.9%, and 87.7%,
respectively. Despite the significant gap between the amounts of pre-training data used by the
models, the overall accuracy differed little. This suggests not only that the quality of
the training data determines the performance of the model, but also that our training data
were of good quality.
Multiple Answers. We found that about 10% of the sentences in our data can have
multiple answers. There are two main reasons why sentences can have multiple
answers. First, some sentences have insufficient contextual information. For instance,
example (1) lacks contextual information about the exact direction, making two of the
complements applicable despite their semantic differences. Second, multiple directional
complements may have similar functions that make them interchangeable, as shown in
example (2).
(1) 你看他把碗拿[MASK]了。(label:下去; Prediction:出来)
(2) 我认为蔡英文接[MASK]还要面对更多事情。(label:下去; Prediction:下来)
Therefore, when evaluating sentences that even native speakers judge to allow
multiple answers, the answer predicted by BERT cannot be said to be wrong. When
multiple answers are considered correct, each model's overall accuracy increases by
about 10 percentage points, with MacBERT-large remaining the highest.
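The scoring scheme used here, counting a prediction as correct when it matches either the gold label or an alternative judged acceptable by native speakers, can be sketched as follows; the acceptable-answer set in the usage line is hypothetical.

```python
def accuracy_with_multiple(preds, golds, acceptable):
    """preds, golds: lists of label ids; acceptable: per-sentence sets of
    additional label ids that native-speaker review marked as also valid."""
    hits = sum(1 for p, g, ok in zip(preds, golds, acceptable) if p == g or p in ok)
    return hits / len(preds)

# Example (2) above: gold 下去 (3), prediction 下来 (2), with 下来 judged acceptable.
print(accuracy_with_multiple(preds=[2], golds=[3], acceptable=[{2}]))  # 1.0
```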
Wrong Answers. Unlike the previous samples of multiple answers, there are also
wrong answers, where the prediction is not applicable in the given sentence. These errors
make up 3% of the total test sentences, and most of the cases are due to the co-occurrence
patterns of verbs and complements. As seen in the following sentences, the frequency
of "verb + label" is lower than that of "verb + prediction". We also found another
feature of these error sentences: they have implicative meanings.
(3) 这么亲热, 一天两天的还真混不[MASK]。(label:出来; Prediction:下去)
(4) 看见金字塔就有一种恒心, 一定要把二战熬[MASK], 人类有和平才有希望。
(label:下来; Prediction:下去)
In example (3), 混不出来 implies that the relationship between the two is very
close, and in example (4), 熬下来 implies that they will certainly endure difficult times.
We can therefore summarize that low co-occurrence frequencies and implicative meanings
interfere with BERT's prediction and cause the wrong answers.
Complement Accuracy. Table 2 below shows the individual scores for the four
directional complements. The accuracy was calculated with multiple answers counted as
correct. Among the four complements, 起来 (get up) had the highest prediction
accuracy, followed by 出来 (come out), 下来 (come down), and 下去 (go down) in
descending order. We only attach the table for MacBERT-large here, but the ranking
of the four directional complements was the same in all three models.
Table 2. The accuracy of each CDC predicted by MacBERT-large
CDC Correct Multiple Wrong Accuracy
出来 91.7% 6.7% 1.6% 98.4%
起来 95.2% 4.1% 0.7% 99.3%
下来 87.5% 10.5% 2.0% 98.0%
下去 76.2% 18.9% 4.9% 95.1%
Total 87.7% 10.0% 2.3% 97.7%
5 How Does BERT Pay Attention to CDC Selection?
5.1 Model Analysis with Sampling and Occlusion
Interpreting neural network models is a challenging research area, and there are
various approaches to explaining why a model has made certain decisions. To analyze
and interpret our CDC classification models, we used the Sampling and Occlusion
(SOC) algorithm proposed by [9] because it enables hierarchical analysis and visualization.
The SOC algorithm is a formal and general way to quantify the importance of each word
and phrase in a sentence. It outperforms existing hierarchical explanation algorithms
such as agglomerative contextual decomposition (ACD) because it approximates
the N-context-independent importance by sampling multiple sentences. We
further visualized the syntactic composition captured by the models for linguistic analysis.
Training Models. We trained four different binary classifiers, each labeled true if
the corresponding directional complement should replace the masked ([MASK]) token
in a sentence and false if not. The model architecture was changed from the previous
four-label classifier to fit the SOC algorithm. We used the CDC corpus to train each
model. The pre-trained model used was bert-base-chinese, considering accuracy and
model size. Every model was trained with a learning rate of 5e−5, a training batch size of 64,
and 5 epochs. The test accuracy of each model is shown in the following table (Table 3).
Table 3. The accuracy of each binary CDC classifier
Directional complement Accuracy
出来 94.87%
起来 95.47%
下来 93.25%
下去 92.87%
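A rough sketch of how the four-way data can be converted into the four binary tasks described above; the example pairs are hypothetical, and each resulting list would then be used to fine-tune one bert-base-chinese binary classifier as in Sect. 3.2.

```python
CDC_TOKENS = ["出来", "起来", "下来", "下去"]

def binary_labels(examples, target):
    """examples: (masked_sentence, gold_complement) pairs; returns data for the
    binary classifier of `target` (1 = target fills the [MASK] slot, 0 = it does not)."""
    return [(sent, 1 if gold == target else 0) for sent, gold in examples]

data = [("他把书拿[MASK]了。", "出来"), ("天色渐渐暗了[MASK]。", "下来")]
for token in CDC_TOKENS:
    print(token, binary_labels(data, token))
```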
Algorithmic Details. The SOC algorithm is an extension of the input occlusion algorithm
[10], which calculates the importance of a phrase p specific to an input sentence x. The
importance score is measured by observing the difference in the prediction score when
the phrase p is replaced with padding tokens, the occluded sentence being denoted here as x\p.
However, the score measured by the input occlusion algorithm depends on the
context words in sentence x. The SOC algorithm overcomes this limit by sampling the N
context words surrounding the phrase p from a pre-trained language model. For the
language model, we trained a BiGRU language model [11] with our training data. We
set N to 5, so five context words are resampled when generating the set of sentences
S. The final score is averaged over the sampled sentences x̂ ∈ S with the following
formulation, which is a simplified version of the formulation in [10]:
\[ \phi(p, x) = \frac{1}{|S|} \sum_{\hat{x} \in S} \big[ \mathrm{score}(\hat{x}) - \mathrm{score}(\hat{x}_{\setminus p}) \big] \tag{1} \]
Using the SOC method for linguistic interpretation is our main contribution. After
measuring the importance scores with the SOC algorithm, we used LTP [12] to parse the
sentence into words. In-depth syntactic analysis was possible by looking at the
word-level score differences in sentence x. After each phrase's score was calculated,
we visualized the result, marking the most important phrases in red and the least
important phrases in blue. An example of the visualization of the hierarchical
SOC explanation is shown below (Fig. 3).
Fig. 3. The visualization of the Sampling and Occlusion algorithm result of the sentence 他拿起
一个红苹果, 继续吃了[MASK]。
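To make the occlusion step concrete, the sketch below computes the inner difference of Eq. (1) for a single phrase: the classifier's score with and without the phrase, which is replaced by padding tokens. This is a simplified illustration, not the authors' code: it omits the sampling over S (the BiGRU-based resampling of the N surrounding context words) and simply loads bert-base-chinese with an untrained binary head in place of the fine-tuned classifier.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
# Stand-in for one fine-tuned binary CDC classifier (head here is untrained).
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
model.eval()

def true_class_score(sentence):
    """Logit of the 'this complement fits here' class for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits[0, 1].item()

def occlusion_importance(sentence, phrase):
    """score(x) - score(x with the phrase replaced by [PAD] tokens)."""
    occluded = sentence.replace(phrase, tokenizer.pad_token * len(phrase), 1)
    return true_class_score(sentence) - true_class_score(occluded)

print(occlusion_importance("他拿起一个红苹果，继续吃了[MASK]。", "继续"))
```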
5.2 Linguistic Interpretation of CDC Selection Constraints
CDC is a subgroup of Chinese complements following the verb/adjective to indicate
direction movement, action result, and state change. As CDC is a sentence component
combined with the main predicate, the most important clue for CDC selection are verbs
and adjectives. However, in addition to verbs/adjectives, adverbs, prepositional phra-
ses, auxiliary verbs, and temporal expressions also play an essential role in CDC
selection [13].
Verbs and Adjectives. Verbs and adjectives are the words that the BERT model pays
the most attention to when predicting the CDC. However, there are certain restrictions on
the choice, depending on the type of verb or adjective. For example, 吃, 写, and 看 can
combine with various CDCs, whereas verbs such as 停, 站, and 贴 combine only with
certain CDCs. Chinese adjectives in particular have more restrictions on their choice
than verbs. When certain adjectives are used as main predicates, it is easy to predict an
appropriate CDC. For example, adjectives with dynamic situations or positive semantic
prosody are often combined with 起来, as in 热闹起来 and 高兴起来.
Conversely, adjectives with a static situation or negative semantic prosody (安静, 平
静, 黑, 暗, etc.) are often combined with 下来. When adjectives are used as main
predicates, they serve as an essential clue in predicting the CDC. As shown in the example
below, in the BERT model, 热闹 or 安静 plays a positive role in CDC selection (the
closer to red, the more positive the contribution). Given these words, the BERT model
immediately predicts the appropriate CDC.
Adverbs. Various adverbs appear in CDC constructions. Some adverbs (已经, 刚刚,
终于, etc.) correspond to motion events that have already been completed, while other
adverbs (一直, 渐渐, 慢慢地, etc.) correspond to the durative meaning of an action. In
many cases, the CDC's predictability increases depending on whether an adverb is used
in the sentence. For example, 渐渐 (increasingly) means "greater in number or amount"
and is semantically very close to the CDC 起来, which indicates the beginning of an
action or entry into a new state. The adverb 继续 (continuously) means that an event
continues for a while without stopping; therefore, it is appropriate to combine it with 下去,
which expresses the continuation of an action. As in the example below, the BERT model
predicts the appropriate CDC based on such specific adverbs (Fig. 4).
Fig. 4. Predictive weight of verbs, adjectives, and adverbs
Automatic Prediction and Linguistic Interpretation of Chinese Directional Complements 413
Prepositional Phrase. The prepositional phrase is a component representing the place
or time of an action, and it can serve as an important clue in
determining the direction of movement. In addition, 把, which can be regarded as a special
preposition (sometimes called a disposal marker), also influences CDC selection. In the
example below, we can see that prepositional phrases play a positive role in CDC
selection.
Auxiliary Verbs. Auxiliary verbs express the speaker's ability, will, and desire, and they
mainly refer to the future, in which the action has not yet occurred.
Therefore, when these auxiliary verbs are used, they exert a certain influence on CDC
selection. If auxiliary verbs (能, 会, etc.) are used in a sentence, it becomes possible to
determine which CDC is appropriate later in the sentence. As shown in the example below, the BERT
model used auxiliary verbs as a positive clue for CDC selection.
Temporal Expressions. Temporal expressions representing the past or the future also
play a role in CDC selection. The CDC indicates the beginning, continuation, or completion
of an action, and these meanings are closely related to time. For example, 下来, which
represents the completion of an action, usually implies that the action started at some point in
the past and was completed in the present. Therefore, in sentences in which 下来 is
used, temporal expressions indicating past situations are often observed. On the other
hand, 下去, which indicates the continuation of an action, generally describes a situation
in which the action continues to a point in the future. Therefore, 下去 and future
markers have a semantic correspondence. This tendency is captured by the BERT
prediction model (Fig. 5).
Fig. 5. Predictive weight of prepositions, auxiliary verbs, and temporal expressions
As shown above, when choosing a CDC, a neural network model such as BERT
does not simply predict based on the frequency of verbs or adjectives. In addition to the
verb, the CDC prediction is made by considering other components of the sentence.
When adverbs, auxiliary verbs, or temporal expressions are added to a sentence, the
predicted probability of each CDC changes, and so can the type of CDC selected.
5.3 Using BERT Model for Educational Purpose
Using the BERT model to study the CDC's grammatical functions is meaningful in itself,
but it can become more valuable through educational use. If the BERT model and the SOC
algorithm are appropriately utilized, it will be possible to establish an application
system for learning and tutoring the CDC.
Our methodology is particularly worth using when the subject is narrowed
down to the field of Teaching Chinese as a Second Language (TCSL). Korea is one of
the countries with a very large number of Chinese learners. Many secondary
schools and universities in Korea teach Chinese as a second language, and the
number of Chinese literature departments established in universities is the second
largest in the world, right after China. Korea is also the country where the largest
number of students in the world take the HSK test. These facts suggest that a BERT-
based tutor that has acquired an advanced level of Chinese grammar knowledge can be
helpful.
Most students rely on Chinese instructors or textbooks. However, compared
to students' demand, the number of Chinese tutors is insufficient, and textbooks often do
not have enough examples. In particular, Chinese directional complements are easily
confused by Korean students because of their derivative meanings. Unfortunately,
instructors' explanations are teacher-dependent, subjective, and time-consuming, and paper
textbooks often omit explanations of complex grammar points concerning the CDC. Applying the
SOC explanation and visualization when learning the CDC can help solve these problems
in two ways.
First, the BERT model can automatically give learners a clue for choosing the right
CDC. Students can learn the relationship between specific clues and the CDC and
deepen their understanding of its use. Moreover, using the visualization tool enables
more intuitive learning by arousing students' interest.
Second, the ability to analyze various sentences beyond the scope of textbooks and
manuals will also support efficient learning. Compared to textbooks with limited
content, BERT draws on a nearly unlimited range of contexts from big data and can better
improve students' knowledge and skills. The result of this research functions like a
CDC-related HSK question bank, assisting tutors by lessening their burden of giving
good examples and explanations to every student.
6 Conclusion
In this study, we investigated how accurately the BERT model can predict the CDC.
We also analyzed which words the BERT model uses as an essential clue in the CDC
inference process. According to the results of this study, it can be seen that the BERT
model shows excellent performance in inferring distributional features and grammatical
relationships based on transfer learning. Results of experiments with four types of CDC
with different meanings and functions show that the accuracy rate of predictions is
relatively high. In addition, as a result of analysis using the SOC algorithm, we found
that the BERT model appropriately uses important clues to determine CDC in context.
We believe that this study is meaningful for NLP and provides insight into Chinese
grammar research and TCSL. If this methodology is utilized correctly, it will be possible
to establish an application system for Chinese grammar research and education. With
sufficient language data, neural network models allow us to predict which
language expressions are more natural to use. Proper use of these advantages will give
us insight into Chinese grammatical functions. This Chinese grammar prediction system
will also help Chinese learners improve their skills by showing them which
expressions are grammatically correct.
Acknowledgments. This work was supported by the Ministry of Education of the Republic of
Korea and the National Research Foundation of Korea (NRF-2020S1A5A2A01045437).
References
1. Tenney, I., Das, D., Pavlick, E.: BERT rediscovers the classical NLP pipeline. In: Proceedings
of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4593–
4601 (2019)
2. Che, H.: Error Analysis of Chinese Complements Acquisition by Korean students. School of
Liberal Arts, Liaoning Normal University Doctoral dissertation (2006). (in Chinese). (车慧.
韩国留学生习得汉语补语的偏误分析. 辽宁师范大学文学院.)
3. Jung, E.: Difficulties and Strategies for Korean Students in Learning Chinese Grammar. East
China Normal University Doctoral dissertation (2010). (in Chinese)
4. Yang, Q.: Study on the Learning Method of Chinese Complement: Focusing on the error
analysis of Korean learners. Dong-A university Doctoral dissertation (2019). (in Korean)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
6. Zhuang, F., et al.: A comprehensive survey on transfer learning. Proc. IEEE 109(1), 43–76
(2020)
7. Cui, Y., et al.: Pre-training with whole word masking for Chinese BERT. arXiv preprint
arXiv:1906.08101 (2019)
8. Cui, Y., Che, W., Liu, T., Qin, B., Wang, S., Hu, G.: Revisiting pre-trained models for
Chinese natural language processing. arXiv preprint arXiv:2004.13922 (2020)
9. Jin, X., Wei, Z., Du, J., Xue, X., Ren, X.: Towards hierarchical importance attribution:
explaining compositional semantics for neural sequence models. arXiv preprint
arXiv:1911.06194 (2020)
10. Li, J., Monroe, W., Jurafsky, D.: Understanding neural networks through representation
erasure. arXiv preprint arXiv:1612.08220 (2016)
11. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical
machine translation. arXiv preprint arXiv:1406.1078 (2014)
12. Che, W., Li, Z., Liu, T.: LTP: a Chinese language technology platform. In: COLING 2010,
23rd International Conference on Computational Linguistics, Demonstrations Volume,
August 2010, Beijing, China, pp. 23–27 (2010)
13. Kang, B.: Deep learning language model and Chinese grammar. J. Chin. Lit. 106 (2021).
The Society for Chinese Language and Literature. (in Korean) (강병규. 딥러닝 언어모델과
중국어문법, 중국문학)
14. Han, Y., Zhong, M., Zhou, L., Zan, H.: Statistical analysis and automatic recognition of
grammatical errors in teaching Chinese as a second language. In: Hong, J.-F., Zhang, Y.,
Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 406–414. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-38189-9_42