Research Paper
Dr. Manju Choudhary
Assistant Professor
SKITM
Anomaly detectors can use these clustering algorithms to learn regular behavior automatically
and to flag new, previously unseen behavior. However, clustering is used mainly on small,
structured datasets, because the number of possible cluster assignments grows rapidly as
clusters and observations are added to the database, which increases the computational
complexity. Clustering therefore faces problems similar to those of instance-based learning
when dealing with large and new data. Malware detection, by contrast, relies mainly on
classification algorithms and aims to build a model from pre-labelled data that covers both
regular (legitimate) and malicious behavior. When evaluating on the test dataset, it is
important to determine how many of the malicious samples are detected by antivirus software.
Because this form of malware detection classifies pre-labelled data into two groups, blocked
and not blocked, it cannot detect new and unknown malware.
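The clustering idea described above can be illustrated with a minimal sketch. It assumes a plain k-means model fitted on regular behavior only, and it flags a new observation when its distance to the nearest centroid exceeds a threshold derived from the training data. The two-dimensional points, the `slack` factor, and all function names are hypothetical choices made for illustration, not part of the method discussed in this paper.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's k-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            idx, _ = nearest(p, centroids)
            groups[idx].append(p)
        new_centroids = []
        for old, members in zip(centroids, groups):
            if members:
                # Mean of each coordinate over the cluster members.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def fit_detector(normal_points, initial_centroids, slack=1.5):
    """Learn regular behavior; the threshold is slack times the largest
    distance from any training point to its nearest centroid (an assumption)."""
    centroids = kmeans(normal_points, initial_centroids)
    radius = max(nearest(p, centroids)[1] for p in normal_points)
    return centroids, slack * radius

def is_anomaly(point, centroids, threshold):
    """Flag a point that lies farther from every centroid than the threshold."""
    return nearest(point, centroids)[1] > threshold

# Hypothetical "regular" observations forming two tight groups.
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centroids, threshold = fit_detector(normal, initial_centroids=[normal[0], normal[3]])

close_to_normal = is_anomaly((1.1, 1.0), centroids, threshold)
far_from_normal = is_anomaly((3.0, 3.0), centroids, threshold)
```

In this sketch every new observation is compared against every centroid, and each k-means iteration compares every training point against every centroid, so the cost grows with both the number of clusters and the number of observations.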
1.1.2 Strengths and Weaknesses of the Techniques
Each detection technique has its own strengths and weaknesses, which depend on the
characteristics of the datasets used in the cyber research domain. No approach is
universally better than the others. In some experiments, an approach produces promising
results on unseen data for a given dataset and yields the most accurate model, yet on
another dataset it is inferior to other methods in terms of accuracy and precision.
Consequently, it is essential to determine appropriate models by conducting a feasibility
study of the different techniques. Furthermore, a list of suggested requirements to help
decide whether deviation-based anomaly detection should be preferred includes training
time, the quantity and types of labelled and unlabelled data, real-time processing,
performance evaluation, and interpretability of outputs; these are described in Section 3.
Finally, the approach for developing DMT models is similar to that of machine learning
algorithms, and a similar taxonomy of approaches has been introduced for Intrusion
Detection Systems (IDS).
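A feasibility study of this kind can be sketched as a small comparison routine that scores each candidate model on each dataset by accuracy and precision and then ranks the candidates per dataset. The labels, the predictions, and the names `model_x` and `model_y` below are invented purely for illustration; they are not results from any experiment.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Fraction of samples predicted positive that are truly positive."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # convention chosen here when nothing is flagged
    return sum(t == positive for t in flagged) / len(flagged)

def rank_models(y_true, predictions):
    """Rank model names by (accuracy, precision), best first."""
    scores = {
        name: (accuracy(y_true, pred), precision(y_true, pred))
        for name, pred in predictions.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Hypothetical dataset A: 1 = attack, 0 = benign.
labels_a = [1, 1, 0, 0, 0, 1, 0, 0]
preds_a = {
    "model_x": [1, 1, 0, 0, 0, 1, 0, 0],
    "model_y": [1, 0, 0, 0, 1, 1, 0, 0],
}

# Hypothetical dataset B with a different class layout.
labels_b = [0, 1, 0, 1, 1, 0, 0, 1]
preds_b = {
    "model_x": [1, 1, 1, 0, 1, 0, 0, 0],
    "model_y": [0, 1, 0, 1, 1, 0, 1, 1],
}

ranking_a, scores_a = rank_models(labels_a, preds_a)
ranking_b, scores_b = rank_models(labels_b, preds_b)
```

With these made-up values, `model_x` ranks first on dataset A while `model_y` ranks first on dataset B, which is the situation the paragraph above describes: the best model on one dataset need not be the best on another.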
In data mining, a classifier is selected based on the learning capacity of the algorithm
itself and its accuracy on the test data. Performance on the test data indicates the
reliability and relevance of the algorithm. Among fraud data classifiers, Decision Stump is
recognized as a decision tree methodology for fraud detection that generally improves fraud
detection capability when used in tandem with other data mining methods. The decision stump
algorithm generally acts as a weak component: a weighted set of such components helps reduce
the inaccuracy of the other data mining methods, particularly when transactions are added to
the selected fraud characteristics or when a large share of the suspended records in the
machine-readable transactions turn out to be real transactions. Once the individual methods
are combined, better insight can be drawn from the input data, in line with the
theoretically attractive goal of minimizing generalization error.
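As an illustration of a decision stump working in tandem with a combining method, the sketch below uses AdaBoost. This is an assumption made here for the example, since the text does not name the combining method. The single numeric feature, the toy labels (+1 and -1), and all function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: one threshold test on a single numeric feature."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    thresholds = [values[0] - 0.5] + midpoints + [values[-1] + 0.5]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Combine stumps with AdaBoost; returns (alpha, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(x, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

def misclassified(xs, ys, ensemble):
    """Number of training samples the ensemble labels incorrectly."""
    return sum(ensemble_predict(x, ensemble) != y for x, y in zip(xs, ys))

# Toy data that no single threshold can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)   # one stump on its own
boosted = adaboost(xs, ys, rounds=3)  # three stumps combined

single_errors = misclassified(xs, ys, single)
boosted_errors = misclassified(xs, ys, boosted)
```

On this toy data a single stump misclassifies three of the ten samples, while three boosted stumps classify all ten correctly. Zero training error does not by itself show lower generalization error, so held-out test data is still needed for that claim.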