Paper 9
ABSTRACT Financial fraud cases that cause serious damage to investors' interests are not uncommon. As a result, a wide range of intelligent detection techniques have been proposed to support decision-making at financial institutions. However, existing methods suffer from poor detection accuracy, slow inference, and weak generalization. We therefore propose a Transformer-based distributed knowledge distillation architecture for financial fraud detection. First, a multi-head attention mechanism assigns weights to the features, a feed-forward neural network then extracts high-level features that carry the relevant information, and a final neural network classifies financial fraud. Second, to handle inconsistent financial data indicators and unbalanced data distributions across industries, we propose a distributed knowledge distillation algorithm. This algorithm combines the detection knowledge of the multi-teacher network and transfers it to the student network, which detects the financial data of different industries. Experimental results show that the proposed method outperforms traditional detection methods, achieving an F1 score of 92.87%, accuracy of 98.98%, precision of 81.48%, recall of 95.45%, and an AUC of 96.73%.
and identify new forgery methods in a timely manner, correlations between data features are difficult for the models to learn. It is hard for a model to extract the information most critical to the task from large, complex sets of data features, which greatly limits the performance of existing counterfeiting detection models. More importantly, by identifying the relationships between features, an attention model can uncover more concealed counterfeiting information and reveal more counterfeiting patterns. For example, [6] proposes a two-level attention model that captures deep feature representations at the sample level and the feature level, respectively.

Existing financial fraud detection methods are mostly based on machine learning and deep learning algorithms [4]. These techniques pay little attention to the internal correlations within financial data and instead concentrate on mining its basic features. In addition, different industries may face different challenges in financial data fraud, and the internal correlations among financial data features differ across industries. Furthermore, as the scale of financial data keeps growing, these models become increasingly deep and complex, leading to model bloat and slow inference. How to effectively mine the internal correlations of financial data, compress the model, and strengthen the model's ability to detect financial data falsification across industries is therefore a new direction for researchers to explore.

To address these problems, this research proposes a Transformer-based distributed knowledge distillation architecture. First, the method uses a multi-head attention mechanism to extract the internal correlations of the data. A feed-forward neural network then extracts high-level features containing information related to the financial data, and a neural network classifier uses these features to classify financial data fraud. Second, to address inconsistent financial data indicators and unbalanced data distributions across industries, and to reduce the complexity and improve the accuracy of the financial fraud detection model, this paper proposes a distributed knowledge distillation algorithm. The algorithm transfers the detection knowledge of the multi-teacher network to each student network separately, and the student networks detect the financial data of different industries. Experimental results show that the proposed method achieves a better F1 score, accuracy, precision, recall, and AUC than traditional detection methods, improving the accuracy of financial forgery detection.

The primary contributions of our research are as follows:
(1) The Transformer has strong generalization and expressive ability, which makes it easier to adapt to diverse financial data. We therefore propose a Transformer-based financial fraud detection model that uses the multi-head attention mechanism and a feed-forward neural network to mine high-level features that incorporate the relevant information of financial data, thus improving the characterization of data relevance.
(2) To address inconsistent financial data indicators and unbalanced data distributions across industries, and to reduce the complexity and improve the accuracy of the financial fraud detection model, this paper proposes a distributed knowledge distillation algorithm. The algorithm transfers the detection knowledge of the multi-teacher network to each student network separately, and the student networks detect the financial data of different industries.
(3) The proposed distributed network was evaluated on the dataset of the 9th "TipDM Cup" listed company financial analysis competition. Experimental results demonstrate that our Transformer-based financial fraud detection method with distributed knowledge distillation outperforms traditional tree models and ensemble models on key performance metrics on this dataset. This confirms the feasibility and effectiveness of the proposed method.

The rest of the paper is structured as follows. Section II reviews related research, Section III introduces our proposed financial fraud detection model, Section IV describes the distributed knowledge distillation framework for detecting fraudulent data in different industries, and Section V discusses the experimental results. Finally, Section VI summarizes the conclusions of this study.

II. BACKGROUND AND RELATED WORK
A. TRADITIONAL FINANCIAL FRAUD DETECTION METHODS
Financial fraud detection technology can reduce investor losses, preserve fairness and justice in the trading market, and help the China Securities Regulatory Commission (CSRC) determine whether listed companies are suspected of fraud. Traditional approaches to determining whether a listed firm is involved in fraudulent operations rely on analyzing financial data, information from listed firms, and third-party evidence. With the continuous development of science and technology, fraud detection methods have also made significant progress. Artificial intelligence technologies driven by big data have been widely applied and have shown promising results in fraud detection. The core idea is to use large amounts of data to train a model with strong generalization capability, so that it can accurately estimate the likelihood that a listed company is engaging in financial data fraud. Depending on whether the sample data are labeled, these methods can be roughly divided into two categories: supervised learning and unsupervised learning.
In a supervised learning approach, financial forgery detection can be viewed as a binary classification task, i.e., deciding whether or not a company has falsified its financial data. The result is often given as a probability: the higher the probability, the more likely the company has committed forgery. Many classification algorithms have been proposed and have achieved good
results in various industries. Depending on whether the distribution of the observed variables is modeled, supervised learning models can be divided into two categories: discriminative models and generative models. Generative models include Naive Bayes (NB), the Restricted Boltzmann Machine (RBM), and the Hidden Markov Model (HMM). Discriminative models include Logistic Regression (LR), the Multilayer Perceptron (MLP), the Support Vector Machine (SVM), K-Nearest Neighbors (KNN), the Maximum Entropy model (ME), Conditional Random Fields (CRF), Decision Trees (DT), and Random Forests (RF). For example, [7] analyzes and compares the accuracy of four machine learning algorithms (LR, RF, DT, and CatBoost) for financial fraud detection. Using a financial fraud dataset, Liu et al. applied RF and compared it with other algorithms such as LR, KNN, DT, and SVM. They found that RF offered the best interpretability and the highest accuracy [8]. Unsupervised learning does not require labeled data. It works much like a statistical tool that detects anomalous data, treating samples that do not belong to the main class as potentially deceptive. Two common types of unsupervised learning algorithms are clustering and dimensionality reduction. Clustering algorithms include K-means clustering and hierarchical clustering, and dimensionality reduction algorithms include Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). For example, [9] proposed a model framework that separates clusters using the K-means method and compared its performance with two of the most important financial fraud detection systems. Reference [10] introduced an unsupervised approach that combines Particle Swarm Optimization (PSO) with K-means clustering, which detected financial fraud better than K-means alone.

B. DEEP LEARNING COUNTERFEIT DETECTION METHODS
Classical machine learning algorithms typically use shallow models, which are effective for linearly separable or simple nonlinear tasks. Deep learning algorithms, in contrast, use deep models that provide stronger nonlinear modeling capability and better performance on complex real-world tasks. For tasks with higher complexity and deeper concealment, such as financial data fraud detection, deep learning algorithms generally outperform machine learning algorithms [4]. For example, Rushin et al. compared LR, gradient boosted trees, and deep learning for detecting credit card fraud and found that deep learning outperformed the other two approaches [11]. Deep learning algorithms can also explore latent connections within the data, uncovering more ways to detect financial fraud and improving detection effectiveness. For example, conventional classifiers depend on features constructed from domain-specific knowledge and ignore other attributes of the data, such as temporal attributes. Jurgovsky et al. treated fraud detection as a sequence classification task and used Long Short-Term Memory (LSTM) networks for prediction. Their experiments show that LSTM improves the accuracy of credit card fraud detection compared with random forest [12]. Zhou et al. use a graph embedding algorithm to learn topological features from financial network graphs and represent them as low-dimensional dense vectors. Deep neural networks then use these vectors to classify and predict data samples from large-scale datasets intelligently and efficiently [13]. Taking into account the homogeneity of the data structure, [14] proposes a graph learning algorithm that learns both topological features and transaction amount features in financial transaction network graphs. In [15], a novel graph neural network (GNN) architecture with a temporal debiasing constraint based on an adversarial loss is proposed. This architecture captures fraud patterns that remain fundamentally consistent over time and performs well in fraud detection tasks. In [16], a new credit card fraud detection model named CCFD-Net is introduced, featuring a hybrid architecture that combines 1D convolution with a Residual Neural Network (ResNet). This model demonstrates good effectiveness and robustness in credit card fraud detection.

C. MULTI-TEACHER KNOWLEDGE DISTILLATION METHODS
The single-student, multi-teacher distillation paradigm has made significant progress in transferring complex, multi-attribute teacher knowledge into lightweight student networks. Research on multi-teacher distillation focuses on designing appropriate distillation strategies for instructing students. In 2017, You et al. [17] proposed a multi-teacher distillation framework that averages the soft labels derived from the logits of several teacher models and provides them to the student model for learning. Shi et al. [18] instead directly concatenated the logits of multiple teachers and then applied PCA dimensionality reduction for a face recognition model. Shin [19] extended the multi-teacher, single-student distillation architecture to a visual multi-attribute recognition task. In this setting, each teacher specializes in learning one attribute, and the knowledge of all teachers is then synthesized and transferred to the student to achieve multi-attribute recognition. Furthermore, in a recent study, Zhang et al. [20] proposed an adaptive multi-teacher knowledge distillation strategy that allows diverse teacher knowledge to be jointly used to improve student performance. The multi-teacher knowledge distillation paradigm proposed in [21] enables students to integrate and capture a variety of knowledge from different sources. Although many studies have used a multi-teacher distillation framework, less attention has been paid to the uneven distribution of positive and negative samples. In this research, we employ a multi-teacher knowledge distillation strategy to aggregate various teachers' knowledge of financial fraud detection across
where $Q \in \mathbb{R}^{m\times d}$, $K \in \mathbb{R}^{m\times d}$, $V \in \mathbb{R}^{m\times d}$, $W_q \in \mathbb{R}^{m\times d}$, $W_k \in \mathbb{R}^{m\times d}$, and $W_v \in \mathbb{R}^{m\times d}$.
Then compute the similarity score matrix S between the query matrix Q and the key matrix K. Excessively large scores could cause exploding gradients, so each score is divided by $\sqrt{d}$:

$$S = \frac{QK^{T}}{\sqrt{d}} \tag{4}$$

where $S \in \mathbb{R}^{m\times m}$. Its entries represent the correlation between each financial data feature and the other features.
Finally, normalize the scores with the softmax function and multiply the normalized correlation scores by the value matrix V to obtain the self-attention output O for the financial data features:

$$O = \mathrm{softmax}(S)\,V \tag{5}$$

where $O \in \mathbb{R}^{m\times d}$.
The self-attention scores for financial data features can be summarized as formula (6):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V \tag{6}$$

The multi-head attention mechanism enables the model to capture richer correlations among financial data features, which supports a more in-depth exploration of patterns related to data falsification. Multi-head attention performs the self-attention mechanism h times in parallel, much like having h individuals each focus on different positions of the financial data features. This increases the likelihood of detecting crucial information related to data falsification:

$$\mathrm{MultiHeadAtt}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\,W_o \tag{7}$$
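Formulas (4)–(7) can be traced in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Each of the m financial indicators is treated as one token embedded in d dimensions, which is consistent with $S \in \mathbb{R}^{m\times m}$. Each head's queries, keys, and values are assumed to be $XW_q$, $XW_k$, and $XW_v$, with d × d projection matrices per head. That shape is a conventional choice and differs from the listed $W_q \in \mathbb{R}^{m\times d}$; formulas (1)–(3) are not reproduced in this text. The helper names `matmul`, `attention`, and `multi_head_attention` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix; matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(s):
    """Row-wise softmax; each row's max is subtracted for numerical stability."""
    out = []
    for row in s:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Formulas (4)-(6): S = QK^T / sqrt(d), then O = softmax(S) V.
    Returns the (m x m) weights softmax(S) and the (m x d) output O."""
    d = len(q[0])
    s = [[x / math.sqrt(d) for x in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(s)
    return weights, matmul(weights, v)


def multi_head_attention(x, heads, w_o):
    """Formula (7): one attention per head, concatenated along the feature axis
    (m x hd), then projected by W_o (hd x d).
    ASSUMPTION: `heads` is a list of (W_q, W_k, W_v) triples, each d x d,
    and Q_i = X W_q, K_i = X W_k, V_i = X W_v."""
    outputs = []
    for w_q, w_k, w_v in heads:
        _, o = attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        outputs.append(o)
    concat = [sum((o[r] for o in outputs), []) for r in range(len(x))]
    return matmul(concat, w_o)


def rand_matrix(rng, rows, cols):
    """Small random matrix used only for the illustrative demo below."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    m, d, h = 5, 4, 2  # illustrative sizes, not taken from the paper
    x = rand_matrix(rng, m, d)
    heads = [tuple(rand_matrix(rng, d, d) for _ in range(3)) for _ in range(h)]
    out = multi_head_attention(x, heads, rand_matrix(rng, h * d, d))
    print(len(out), len(out[0]))  # 5 4
```

Running the file prints `5 4`. Each of the five feature positions is mapped back to d = 4 dimensions by $W_o \in \mathbb{R}^{hd\times d}$, so stacked encoder layers can consume the output directly.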
FIGURE 2. The architecture of the financial fraud detection model based on Transformer.

where $\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$, $i \in \{1, \ldots, h\}$, and $W_o \in \mathbb{R}^{hd\times d}$.

2) FEEDFORWARD NEURAL NETWORK
The multi-head attention scores obtained from formula (7) pass through a residual connection and layer normalization. The residual connection eases the training of deep networks by adding a sublayer's input to its output, which enhances the network's representational capacity [25]. Layer normalization normalizes its inputs to a mean of 0 and a standard deviation of 1. This helps alleviate internal covariate shift and makes training more stable and faster:

$$\mathrm{LayerNorm}\big(X + \mathrm{MultiHeadAtt}(Q, K, V)\big) \tag{8}$$

Next, the result of the residual connection and layer normalization is processed by two linear transformations with a ReLU activation between them. This step extracts higher-level features with richer contextual information:

$$\mathrm{FFN}(X) = \max(0,\, XW_1 + b_1)\,W_2 + b_2 \tag{9}$$

The same linear transformations are applied at every position within an encoder layer, but their parameters differ from layer to layer.
To prevent overfitting, we apply dropout to the output of each fully connected layer, which helps the model generalize. Dropout randomly discards each neuron with probability p. The neurons that are kept are scaled by $1/(1-p)$, the reciprocal of the retention probability, so the expected value of the data is unchanged. Because a different sub-network is trained in each iteration, dropout introduces variability and weakens the interdependence among neurons. This improves the model's ability to generalize the internal correlations in financial data. The dropout computation is given by formula (10):

$$\mathrm{dropout}(X) = \begin{cases} 0, & \text{with probability } p \\ \dfrac{X}{1-p}, & \text{with probability } 1-p \end{cases} \tag{10}$$

3) OUTPUT NEURAL NETWORK
After the financial data pass through the stacked encoders, the last encoder produces the high-level features X, which carry internal correlation information. We map X through a linear layer and normalize the output with the softmax function, as shown in formula (11):

$$Y^{pre} = \mathrm{softmax}\left(W \cdot X^{T} + b\right) \tag{11}$$

where $Y^{pre} \in \mathbb{R}^{1\times 2}$ is the probability distribution vector, W is the weight matrix of the neural network, and b is the bias vector.

4) OVERALL LOSS CALCULATION
The financial dataset $D = \{(X_n, Y_n)\}_{n=1}^{N}$ is passed into the Transformer-based financial fraud detection model. After extracting high-level features related to the data, the model
The second step is to compute the student loss. A temperature softmax distiller with T = 1 is applied to the student network's output $Z^{(s)}$ to obtain the output probabilities P. The cross-entropy loss between P and the hard targets $Y_n$ of the financial data is then computed with formula (19):

$$L_{cls} = -\frac{1}{B}\sum_{i=1}^{B}\sum_{j=1}^{C}\left[Y_{i,j}\log P_{i,j} + \left(1 - Y_{i,j}\right)\log\left(1 - P_{i,j}\right)\right] \tag{19}$$

The final knowledge distillation loss is obtained by taking the weighted sum of both the distillation loss and the student

$$W \leftarrow W - \eta \cdot \frac{\partial L_{tr}}{\partial W} \tag{21}$$

where η represents the learning rate.

V. EXPERIMENT
In this section, we first describe the structure of the dataset. We then compare the performance metrics of the teacher model and the student model, compare the student model with other machine learning algorithms, and finally present visualizations and a parameter analysis.
Algorithm 1 Multi-Teacher Model Training Algorithm
Hyperparameters: input feature dimension d; number of attention heads nhead = 6; number of feed-forward neurons dim = 1024; dropout rate dropout = 0.2; number of encoder layers layers = 2; learning rate η = 0.001; number of iterations T = 100; training data size $N_1$; batch size $n_1$ = 32; optimizer = Adam.
Input: multi-industry financial dataset collection $I = \{D_1, D_2, \ldots, D_m\}$, where $D_m = \{(X_n, Y_n)\}_{n=1}^{N}$.
Output: teacher model convergence parameters $W^{(t)}$.
1: Randomly initialize $W^{(t)} \leftarrow N(0, 1)$;
2: Randomly shuffle the different industries in the collection I;
3: while t ≤ T do
4:   for n = 1 : $N_1/n_1$ do
5:     Select a batch of samples $(X_n, Y_n)$ from dataset I;
6:     for k = 1 : layers do
7:       for i = 1 : nhead do
8:         Calculate $Q_i$, $K_i$, $V_i$ from $X_n$ using formulas (1), (2), and (3);
9:         Calculate $\mathrm{head}_i$ from $Q_i$, $K_i$, $V_i$ using formula (6);
10:      end for
11:      Calculate the multi-head attention score M from the $\mathrm{head}_i$ using formula (7);
12:      Calculate the residual connection and layer normalization L from X and M using formula (8);
13:      Feed L into the feed-forward neural network FFN(L) using formula (9), and apply dropout to each fully connected layer using formula (10);
14:      Apply the residual connection and layer normalization to obtain the encoder output $\bar{X}$ using formula (8);
15:      Feed the output back as the input to stack the encoders: $X = \bar{X}$;
16:    end for
17:    Apply the linear output layer to the output of the last encoder using formula (11) to obtain the output $Y^{pre}$;
18:    Calculate the cross-entropy loss for the dataset using formula (14);
19:    Update the model parameters W using formula (13);
20:  end for
21: end while
22: return the convergence parameters $W^{(t)}$ of the teacher model.

Algorithm 2 Student Model Training Algorithm
Hyperparameters: input feature dimension d; number of attention heads nhead = 2; number of feed-forward neurons dim = 1024; dropout rate dropout = 0.2; number of encoder layers layers = 2; learning rate η = 0.001; distillation temperature Tem = 7; number of iterations T = 100; training data size $N_1$; batch size $n_1$ = 32; optimizer = Adam.
Input: multi-industry financial dataset collection $I = \{D_1, D_2, \ldots, D_m\}$, where $D_m = \{(X_n, Y_n)\}_{n=1}^{N}$.
Output: multi-industry student network convergence parameters $W^{(s)} = \{w_1^{(s)}, w_2^{(s)}, \ldots, w_n^{(s)}\}$, where $w_n^{(s)}$ denotes the network convergence parameters for a given industry.
1: Randomly initialize $W^{(s)} \leftarrow N(0, 1)$;
2: for i = 1 : m do
3:   Select the industry dataset $D_i$ from the collection I;
4:   Randomly shuffle the samples of the industry dataset $D_i$;
5:   while t ≤ T do
6:     for n = 1 : $N_1/n_1$ do
7:       Select a batch of samples $(X_n, Y_n)$ from dataset $D_i$;
8:       Calculate the teacher network output $Z_n^{(t)}$ from $X_n$ and the teacher network parameters $W^{(t)}$ obtained by Algorithm 1;
9:       Calculate the student network output $Z_n^{(s)}$ from $X_n$ and the student network parameters $w_n^{(s)}$;
10:      Using equations (16) and (17), distill the classification results $Z_n^{(t)}$ and $Z_n^{(s)}$ with the distillation temperature Tem, obtaining the distilled outputs $P_n^{(t)}$ and $P_n^{(s)}$;
11:      Using equation (15), distill the student classification result $Z_n^{(s)}$ with distillation temperature Tem = 1, obtaining the distilled output $P_n$;
12:      Calculate the final loss $L_n$ for dataset $D_i$ using formulas (18), (19), and (20);
13:      Update the student network parameters $w_n^{(s)}$ based on the final loss $L_n$ using formula (21);
14:    end for
15:  end while
16: end for
17: return the multi-industry student network convergence parameters $W^{(s)} = \{w_1^{(s)}, w_2^{(s)}, \ldots, w_n^{(s)}\}$, where $w_n^{(s)}$ denotes the network convergence parameters for a given industry.
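Steps 10–12 of Algorithm 2 combine a temperature-softened teacher/student term with the hard-label loss of formula (19). Only formula (19) is given explicitly in this text, so the sketch below fills the remaining pieces with clearly labeled assumptions:
- The temperature softmax of formulas (15)–(17) is taken to be softmax(z / T).
- The distillation term of formula (18) is assumed to be a KL divergence scaled by T², a widely used convention.
- Formula (20) is assumed to be a convex combination with a hypothetical weight `alpha`.

The function names are illustrative and not the authors' code.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Temperature-scaled softmax, softmax(z / T), for one sample's logits
    (ASSUMED form of formulas (15)-(17))."""
    scaled = [z / temperature for z in logits]
    mx = max(scaled)
    exps = [math.exp(z - mx) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cls_loss(probs, targets, eps=1e-12):
    """Formula (19): -(1/B) * sum_i sum_j [Y log P + (1 - Y) log(1 - P)]."""
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
            total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(probs)


def distill_loss(teacher_logits, student_logits, temperature, eps=1e-12):
    """ASSUMED stand-in for formula (18): batch-mean KL(P_t || P_s) at
    temperature T, multiplied by T**2 (a common convention, not confirmed here)."""
    total = 0.0
    for zt, zs in zip(teacher_logits, student_logits):
        pt = softmax_t(zt, temperature)  # softened teacher distribution
        ps = softmax_t(zs, temperature)  # softened student distribution
        total += sum(a * (math.log(a) - math.log(max(c, eps)))
                     for a, c in zip(pt, ps) if a > 0)
    return temperature ** 2 * total / len(student_logits)


def final_loss(teacher_logits, student_logits, targets, temperature=7.0, alpha=0.5):
    """ASSUMED form of formula (20): alpha * L_distill + (1 - alpha) * L_cls.
    Tem = 7 follows Algorithm 2; alpha = 0.5 is a hypothetical weight."""
    probs = [softmax_t(z, 1.0) for z in student_logits]  # step 11: T = 1
    return (alpha * distill_loss(teacher_logits, student_logits, temperature)
            + (1.0 - alpha) * cls_loss(probs, targets))


if __name__ == "__main__":
    teacher = [[2.0, -1.0], [-0.5, 1.5]]  # hypothetical teacher logits Z^(t)
    student = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical student logits Z^(s)
    labels = [[1.0, 0.0], [0.0, 1.0]]     # one-hot hard targets Y_n
    print(round(final_loss(teacher, student, labels), 4))
```

Raising `temperature` flattens both distributions. The distillation term then conveys the teacher's relative preferences among classes rather than only its top prediction, while `cls_loss` still anchors the student to the hard labels.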
TABLE 2. Summary of the analyzed data sets.
TABLE 3. Comparison of evaluation metrics between teacher and student models.
FIGURE 4. Comparative analysis of the MAE values of the proposed method and other models.
FIGURE 6. Comparative analysis of the MCC values of the proposed method and other models.
FIGURE 7. AUC curves of the proposed method and other machine learning algorithms on datasets from various industries.
FIGURE 8. AUC curves of the proposed method and other ML algorithms on a manufacturing dataset.
FIGURE 9. Precision-recall curves of the proposed method and other ML algorithms on datasets from various industries.
FIGURE 10. Precision-recall curves of the proposed method and other ML algorithms on a manufacturing dataset.

the proposed fraud detection model on both datasets. These results demonstrate the effectiveness of our proposed method.

VI. CONCLUSION
Detecting fraudulent financial data in listed companies is of significant importance for safeguarding the interests of shareholders and investors. This paper proposes a Transformer-based distributed knowledge distillation framework for detecting fraudulent financial data in listed companies. Experimental validation was conducted on the dataset of the 9th "TipDM Cup" Financial Analysis Competition for Listed Companies. The proposed method was compared with other machine learning algorithms, including logistic regression, linear support vector machine, decision tree, random forest, XGBoost, and AdaBoost. The experimental results demonstrate that the proposed method outperforms these algorithms, achieving the highest AUC, accuracy, precision, recall, and F1 score.

REFERENCES
[1] C. Defang and L. Baichi, "SVM model for financial fraud detection," Northeastern Univ., Natural Sci., vol. 40, pp. 295–299, Feb. 2019.
[2] T. Shahana, V. Lavanya, and A. R. Bhat, "State of the art in financial statement fraud detection: A systematic review," Technological Forecasting Social Change, vol. 192, Jul. 2023, Art. no. 122527.
[3] W. Xiuguo and D. Shengyong, "An analysis on financial statement fraud detection for Chinese listed companies using deep learning," IEEE Access, vol. 10, pp. 22516–22532, 2022.
[4] M. N. Ashtiani and B. Raahemi, "Intelligent fraud detection in financial statements using machine learning and data mining: A systematic literature review," IEEE Access, vol. 10, pp. 72504–72525, 2022.
[5] M. El-Bannany, A. H. Dehghan, and A. M. Khedr, "Prediction of financial statement fraud using machine learning techniques in UAE," in Proc. 18th Int. Multi-Conf. Syst., Signals Devices (SSD), Mar. 2021, pp. 649–654.
[6] R. Cao, G. Liu, Y. Xie, and C. Jiang, "Two-level attention model of representation learning for fraud detection," IEEE Trans. Computat. Social Syst., vol. 8, no. 6, pp. 1291–1301, Dec. 2021.
[7] A. Singh, A. Singh, A. Aggarwal, and A. Chauhan, "Design and implementation of different machine learning algorithms for credit card fraud detection," in Proc. Int. Conf. Electr., Comput., Commun. Mechatronics Eng. (ICECCME), Nov. 2022, pp. 1–6.
[8] C. Liu, Y.-C. Chan, S. H. Alam, and H. Fu, "Financial fraud detection model: Based on random forest," in Econometrics: Econometric Model Construction, 2015.
[9] H. Shivraman, U. Garg, A. Panth, A. Kandpal, and A. Gupta, "A model frame work to segregate clusters through K-means method," in Proc. 2nd Int. Conf. Comput. Sci., Eng. Appl. (ICCSEA), Sep. 2022, pp. 1–6.
[10] N. Sharma and V. Ranjan, "Credit card fraud detection: A hybrid of PSO and K-means clustering unsupervised approach," in Proc. 13th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2023, pp. 445–450.
[11] G. Rushin, C. Stancil, M. Sun, S. Adams, and P. Beling, "Horse race analysis in credit card fraud—Deep learning, logistic regression, and gradient boosted tree," in Proc. Syst. Inf. Eng. Design Symp. (SIEDS), Apr. 2017, pp. 117–121.
[12] J. Jurgovsky, M. Granitzer, K. Ziegler, S. Calabretto, P.-E. Portier, L. He-Guelton, and O. Caelen, "Sequence classification for credit-card fraud detection," Exp. Syst. Appl., vol. 100, pp. 234–245, Jun. 2018.
[13] H. Zhou, G. Sun, S. Fu, L. Wang, J. Hu, and Y. Gao, "Internet financial fraud detection based on a distributed big data approach with node2vec," IEEE Access, vol. 9, pp. 43378–43386, 2021.
[14] R. Li, Z. Liu, Y. Ma, D. Yang, and S. Sun, "Internet financial fraud detection based on graph learning," IEEE Trans. Computat. Social Syst., vol. 10, no. 3, pp. 1394–1401, 2023.
[15] A. Singh, A. Gupta, H. Wadhwa, S. Asthana, and A. Arora, "Temporal debiasing using adversarial loss based GNN architecture for crypto fraud detection," in Proc. 20th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2021, pp. 391–396.
[16] X. Liu, K. Yan, L. Burak Kara, and Z. Nie, "CCFD-net: A novel deep learning model for credit card fraud detection," in Proc. IEEE 22nd Int. Conf. Inf. Reuse Integr. Data Sci. (IRI), Aug. 2021, pp. 9–16.
[17] S. You, C. Xu, C. Xu, and D. Tao, "Learning from multiple teacher networks," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2017, pp. 1285–1294.
[18] W. Shi, G. Ren, Y. Chen, and S. Yan, "ProxylessKD: Direct knowledge distillation with inherited classifier for face recognition," 2020, arXiv:2011.00265.
[19] M. Shin, "Semi-supervised learning with a teacher–student network for generalized attribute prediction," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 509–525.
[20] H. Zhang, D. Chen, and C. Wang, "Adaptive multi-teacher knowledge distillation with meta-learning," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2023, pp. 1943–1948.
[21] A. Amirkhani, A. Khosravian, M. Masih-Tehrani, and H. Kashiani, "Robust semantic segmentation with multi-teacher knowledge distillation," IEEE Access, vol. 9, pp. 119049–119066, 2021.
[22] B. An and Y. Suh, "Identifying financial statement fraud with decision rules obtained from modified random forest," Data Technol. Appl., vol. 54, no. 2, pp. 235–255, May 2020.
[23] P. Craja, A. Kim, and S. Lessmann, "Deep learning for detecting financial statement fraud," Decis. Support Syst., vol. 139, Dec. 2020, Art. no. 113421.
[24] J. Geng and B. Zhang, "Credit card fraud detection using adversarial learning," in Proc. Int. Conf. Image Process., Comput. Vis. Mach. Learn. (ICICML), 2023, pp. 891–894.
[25] E. Orhan, "Skip connections as effective symmetry-breaking," 2017, arXiv:1701.09175.
[26] H. Hong and H. Kim, "Feature distribution-based knowledge distillation for deep neural networks," in Proc. 19th Int. SoC Design Conf. (ISOCC), Oct. 2022, pp. 75–76.
[27] D. Varmedja, M. Karanovic, S. Sladojevic, M. Arsenovic, and A. Anderla, "Credit card fraud detection–machine learning methods," in Proc. 18th Int. Symp. Infoteh-Jahorina (INFOTEH), Mar. 2019, pp. 1–5.
[28] T. Priyaradhikadevi, S. Vanakovarayan, E. Praveena, V. Mathavan, S. Prasanna, and K. Madhan, "Credit card fraud detection using machine learning based on support vector machine," in Proc. 8th Int. Conf. Sci. Technol. Eng. Math. (ICONSTEM), Apr. 2023, pp. 1–6.
[29] C.-C. Lin, A.-A. Chiu, S. Y. Huang, and D. C. Yen, "Detecting the financial statement fraud: The analysis of the differences between data mining techniques and experts' judgments," Knowl.-Based Syst., vol. 89, pp. 459–470, Nov. 2015.
[30] V. Arora, R. S. Leekha, K. Lee, and A. Kataria, "Facilitating user authorization from imbalanced data logs of credit cards using artificial intelligence," Mobile Inf. Syst., vol. 2020, pp. 1–13, Oct. 2020.
[31] L. Torlay, M. Perrone-Bertolotti, E. Thomas, and M. Baciu, "Machine learning–XGBoost analysis of language networks to classify patients with epilepsy," Brain Informat., vol. 4, no. 3, pp. 159–169, Sep. 2017.
[32] P. Yu and X. Liu, "Construction and application of bid fraud prediction model based on AdaBoost algorithm," in Proc. 2nd Int. Conf. Electron. Inf. Eng. Comput. Technol. (EIECT), Oct. 2022, pp. 292–295.
[33] T. Zhang and S. Gao, "Graph attention network fraud detection based on feature aggregation," in Proc. 4th Int. Conf. Intell. Inf. Process. (IIP), Oct. 2022, pp. 272–275.
[34] A. Vaswani, N. M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Neural Inf. Process. Syst., 2017, pp. 1–11.

YUXUAN TANG is currently pursuing the bachelor's degree in accounting with the School of Accounting, Southwestern University of Finance and Economics, Chengdu, Sichuan, China. Her current research interests include financial big data analysis, financial fraud detection, credit card fraud detection, machine learning, and deep learning.

ZHANJUN LIU received the Ph.D. degree in circuits and systems from Chongqing University, Chongqing, China, in 2018. He is currently a Professor with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, China. His current research interests include network intelligence, big data analysis, and deep learning.