
IEEE Con Temp FIN

The document discusses MalScanner, a tool that analyzes files for malicious behavior using machine learning. It can perform static and dynamic analysis, and stores results on a blockchain database. The tool is presented to users through a web application. Related work on malware analysis sandboxes and machine learning techniques is also reviewed.


MalScanner – File Behavior Analysis using Machine Learning

Basil Abdulrahman, Abdulrahman Qanadeely, Abdulaziz Al-Hassan, Omar Al-Ghamdi, Nawaf Al-Sukaibi, Dr. Nazar Abbas Saqib
Dept. of Networks and Communications
Imam Abdulrahman Bin Faisal University
Dammam, Saudi Arabia

Abstract: MalScanner is a tool that aims to provide a simple, effective, and user-friendly method of scanning files for malicious behavior. MalScanner scans a file, extracts features to be used in machine-learning-assisted static malware analysis, and inspects the file's behavior dynamically. The tool also implements a blockchain database to store analysis results. The solution will be presented to the user in a straightforward manner via a web application.

Keywords: Malware Analysis, Machine Learning, Static Analysis, Dynamic Analysis, Sandboxing.

979-8-3503-0030-7/23/$31.00 ©2023 IEEE

I. INTRODUCTION

While the methods attackers use to create, pack, and disseminate malicious software have evolved greatly over the years, the malware detection and threat assessment methods used by security researchers lag behind advanced persistent threats. Current detection methods fall into three categories: signature-based, behavior-based, and heuristic-based. Although the latter two are the most effective and accurate, signature-based detection remains the most common method in commercial security products [1]. Meanwhile, many innovations have been made in the machine learning field, which is why we chose to integrate ML into cyber security practices to raise accuracy without sacrificing security. In particular, the features used can be reused in other endeavors based on their precision in distinguishing between malicious and legitimate software. Malicious Windows binary executables share similar metadata that attackers intentionally forge to provide anonymity. These same features can be used to indicate the presence of malicious software.

II. RELATED WORK

In [2], a user-friendly model for ransomware analysis is developed. The results of the analysis are shown in a report that is summarized to make it easier for the user to understand. The user interface contains only two buttons, one to upload the file and one to submit it. The model is built on the Cuckoo open-source sandbox environment for automated malware analysis. The Cuckoo sandbox runs on Linux and can be used to analyze PDF, EXE, and DOC files. The files are uploaded to the Cuckoo sandbox through a REST API. The usability and accessibility of this model are evaluated using a comprehensive user survey of professionals and participants from the academic community. Around 92% of survey participants gave positive feedback on the usability of this model. As seen in the figure below, users on multiple devices can send their malware to the server for analysis and receive the results.

Fig. 1. Cuckoo Sandbox Architecture.

In [3], the Cuckoo sandbox is used to conduct dynamic malware analysis. The information extracted from file analysis includes registry changes, file changes, API calls, network traffic, and summary information. Different combinations of this information can be used to detect malicious behavior. Based on an experiment using 800 ordinary and 2200 malware files, features are extracted with 94.64% accuracy. The limitations found in this study include limited network connectivity, which prevents complete analysis of the network behavior of malware; the intelligent behavior of sophisticated malware; and the ease with which malware can detect a virtual environment. Table I lists the feature combinations that were compared. API calls combined with summary information, classified with the Gradient B. Classifier algorithm, give the best AUC at 95.86%.

Table I. Extracted Feature Combinations Compared Based on their AUC Percentage.

Combination                     Algorithm                 AUC
APIs+DLLs                       AdaBoost Classifier       84.60
APIs+Summary information        Gradient B. Classifier    95.86
Registry                        Gradient B. Classifier    86.10
DLLs+Summary information        Gradient B. Classifier    94.84
Registry+APIs                   Gradient B. Classifier    84.87
DLLs                            Logistic regression       80.42
Registry+Summary information    Gradient B. Classifier    94.64
API Calls                       Gradient B. Classifier    94.44

There are many techniques and methodologies for applying sandbox technology to malware analysis, and many sandboxes that are either open-source or proprietary. Cuckoo sandbox is the most used and most thoroughly tested, as it is an open-source sandbox with its source code publicly available on GitHub. It is used as the base for the models in [2], [3], [4], [5], and [6]. Based on the experiments conducted on each model, VTCSandbox [7] has the most accurate results in detecting malware features, with an accuracy of 98.6% while maintaining a high performance of 23,933 system calls within 140 seconds. The second-best approach is [5], which has high efficiency with an accuracy of 97.9%, although it should be noted that [5] is built on Cuckoo sandbox, which is publicly tested.

Many recent malware detection methods use machine learning, with both supervised and unsupervised models. The reviewed models were compared by machine learning model, efficiency, advantage, extracted features, and whether the model is supervised or unsupervised. The paper [8] introduces a new model that detects malware in backup systems infected by ransomware and can restore the files even if the system was infected. The authors of [9], [10], [11], and [12] did not consider the detection of files in backups.

Fig. 2. Machine Learning Process (Data Collection, Data Cleaning, Feature Selection, Selecting Appropriate Model, Training Model, Model Testing).

There are many blockchain technologies that are either public or private. Most of the papers reviewed discuss signature and anomaly-based detection techniques, which they use to ensure the integrity of the file hashes and to categorize the hashes as malicious or benign. Signature and anomaly-based detection techniques are the base approach in [13], [14], [15], and [16].

The experimental results showed that the signature and anomaly-based detection techniques in [14] have the most accurate results, with a 95.74% detection rate for malware hashes. They also perform well in updating the antivirus process and can eliminate intermediate processes. The next-best approach is Biserial Miyaguchi–Preneel Blockchain-Based Ruzicka-Indexed Deep Perceptive Learning for Malware Detection [17]. It has 95% accuracy in detecting malicious behavior, and it has high efficiency due to the lower time consumed in detecting malware activity.

Fig. 3. Blockchain Architecture.

III. DISCUSSION

An isolated environment is needed to perform dynamic analysis on malicious files. This environment needs to provide security for the host device and the network. Furthermore, a virtual environment can be detected by sophisticated malware variants, so our sandbox must be able to circumvent detection attempts by malware. We have reviewed various sandbox technologies and have concluded that Cuckoo is the best sandbox technology for our project because of its ease of use and customizability. Because Cuckoo is open source, it has been tested by many developers and researchers for performance and stability. Additionally, Cuckoo sandbox has been used for applications that resemble our project idea.

It is apparent that the dataset greatly impacts the results of the experiment. Both the EMBER [18] and APT1 [19] datasets seem to be credible datasets that reflect the current state of malware. They are also of significant size, which should help prevent bias in our machine learning model while supporting a higher detection rate. The machine learning algorithms that seemed to provide the highest accuracy for static analysis are the random forest algorithm [20] [21] and the Logistic Regression algorithm supplemented by XGBoost [19]. These two algorithms will be considered for performing static malware analysis. Features such as entropy [21], image file header, optional header, image section header, machine, and subsystem [22] appear to be accurate indicators of the presence of malware. Entropy in particular will help detect Cryptor malware, which tries to obfuscate itself using encryption.

Machine learning (ML) plays a significant role in the malware detection industry, as it enhances both static and dynamic analysis. Effective methods of malware detection should therefore incorporate machine learning, and we will apply machine learning in our solution. Additionally, papers [8], [9], [10], and [11] used deep learning models in their proposed methodologies. The authors used supervised and unsupervised models, and sometimes merged the two to achieve speed and adaptivity.
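The role of entropy as a static feature, discussed in this section, can be illustrated with a short sketch. It computes the Shannon entropy of a byte string in bits per byte, where values close to 8 suggest encrypted or compressed content. This is an illustration only, not the feature extractor used in MalScanner: the function names and the 7.0 threshold are assumptions introduced here, and a real pipeline would typically compute entropy per PE section and feed it to the classifier rather than apply a fixed cut-off.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs in the input.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical heuristic: flag content whose entropy exceeds the threshold.

    The default threshold is an assumption for illustration, not a value
    taken from the paper or from the cited works.
    """
    return shannon_entropy(data) > threshold


if __name__ == "__main__":
    repetitive = b"hello world" * 10   # low entropy: few distinct bytes
    uniform = bytes(range(256))        # every byte value exactly once
    print(shannon_entropy(repetitive), looks_encrypted(repetitive))
    print(shannon_entropy(uniform), looks_encrypted(uniform))
```

In practice the function would be applied to the bytes of an uploaded file or of each of its sections, and the resulting value would be one column in the feature vector alongside the header fields listed above.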
Malware detection using blockchain databases plays a significant role in speeding up the analysis, since the process compares unknown hashes against the malicious hashes listed in the blockchain database. In the part of the literature review covering malware detection using blockchain databases, we found that the most common approach uses signature and anomaly-based detection techniques to assure the integrity of the file hashes. These techniques have the most accurate results, with a detection rate greater than 95.74 percent. The blockchain database method will help us integrate this approach with machine learning: the other methodologies take a large amount of time, whereas with the blockchain database method the process of analyzing files will be much faster and easier.

IV. PROPOSED FRAMEWORK

This malware scanner platform can be used by three types of users: admins, users with an account, and guest users. Admins, as the name suggests, are the administrators who keep an eye on the platform, regularly update the databases, control the blockchain infrastructure, and have complete control over the system. This type of user is not available to the general public; the position can only be held by the creators of the platform, who work behind the scenes. In addition to these platform-specific roles, admins will also hold typical admin roles such as user management (viewing, deleting, and creating users) and moderating the comments that will be made available to check that they are appropriate.

The second type of user signs up on the platform and is able to save files and view their scanning history. These users can also upload suspected files to the platform for checking and add comments on them for future users to see. Guest users are limited to a single function, which is uploading files. The following figure shows an overview of the system's architecture:

Fig. 4. System Architecture.

Since this platform is a web-based application, it should run smoothly on any browser the user wishes to use. The platform itself will authenticate valid users against the databases where the login credentials are stored. The browser should be compatible for all three user types so that they can upload their files and add or view comments on the website. Web-based platforms work independently of operating systems, so this platform should also work regardless of the operating system. The only mandatory requirement is that the operating system has a usable browser.

Assumptions about the platform users are as follows:

• They have valid accounts: this applies only to admins and signed-up users, who must have valid credentials to access their accounts.

• Functional internet connection: web-based platforms require an internet connection to work, and this is especially important when files are being uploaded.

To ease the governance of the MalScanner platform, a special admin portal page will be created for admins. This page will simplify the admin's control over the platform through a simple, easily understood interface. Below is a wireframe of the admin portal interface.

Fig. 5. MalScanner Admin Portal.

The main interface gives the user quick access to the functionalities provided by the web application. Below is a wireframe design of the interface to be implemented.

Fig. 6. MalScanner User Portal.

The results interface needs to be clear and concise, as users of all technical backgrounds are meant to take one look and understand the results; this is reflected in the design of the interface. It is composed of simple, easy-to-read panels that show the results of the static analysis and dynamic analysis; important metadata about the file itself, such as the upload date and hash value; and, as the most significant section of the interface, the analysis summary.

Fig. 7. MalScanner Results Page.

Since this is meant to be an application used by anyone, the results themselves are presented as three separate components: the time taken to get the result, the final verdict on whether or not the file is malicious, and the application's confidence percentage in that verdict. For simplicity's sake, only these parts have been included in the results interface to show the nature of the file.

The following sequence diagram shows how this solution should be used and the sequence of its use cases.

Fig. 8. Use Case Sequence Diagram.

V. CONCLUSION

MalScanner is an elegant solution that can be used by anyone, anywhere, with a simple upload, helping to prevent an attack from occurring in the first place. With an easy-to-use interface and a quick response time thanks to the underlying AI training and blockchain technology, employees will be more likely to use it. Analyzing received files is critical in a tech-dependent environment. This solution will save companies money and data loss in the long run while simultaneously educating computer users on the dangers of downloading files without inspecting them beforehand.

Working on the project proposal helped us get a grasp on the state of malware analysis today. We also learned about project management, the requirements needed to implement a project of such scope, and the importance of teamwork and systematic coordination.

For future work, we will implement this project in line with what was stated in the software design specifications phase, including testing, deploying, and launching the platform.

REFERENCES

[1] Z. Bazrafshan, H. Hashemi and S. Fard, "A survey on heuristic malware detection techniques," in IEEE, Shiraz, 2013.
[2] A. Kamal, M. Derbali, S. Jan, J. I. Bangash, F. Q. Khan, H. Jerbi, R. Abbassi and G. Ahmed, "A User-friendly Model for Ransomware Analysis Using Sandboxing," CMC-Computers, Materials & Continua, vol. 67, no. 3, pp. 3833-3846, 2021.
[3] M. Ijaz, M. H. Durad and M. Ismail, "Static and Dynamic Malware Analysis Using Machine Learning," in 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 2019.
[4] H.-V. Le and Q.-D. Ngo, "V-Sandbox For Dynamic Analysis IoT Botnet," IEEE Access, vol. 8, pp. 145768-145786, 2020.
[5] R. Kumar, K. Sethi, N. Prajapati, R. R. Rout and P. Bera, "Machine Learning based Malware Detection in Cloud Environment using Clustering Approach," in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020.
[6] J. Sainadh, Navya, Y. Sai, P. Raja, G. Tagore and G. K. Rao, "Dynamic Malware analysis Using Cuckoo Sandbox," in 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 2018.
[7] C.-H. Lin, H.-K. Pao and J.-W. Liao, "Efficient dynamic malware analysis using virtual time control mechanics," Computers & Security, vol. 73, no. 0167-4048, pp. 359-373, 2018.
[8] K. a. L. S.-Y. a. Y. K. Lee, "Machine Learning Based File Entropy Analysis for Ransomware Detection in Backup Systems," IEEE Access, vol. 7, pp. 110205-110215, 2019.
[9] D. a. S. Z. Yuxin, "Malware Detection Based on Deep Learning Algorithm," Neural Computing and Applications, vol. 31, pp. 461-472, 2019.
[10] M. a. R. A. a. I. M. R. Chowdhury, "Malware Analysis and Detection Using Data Mining and Machine Learning Classification," in International Conference on Applications and Techniques in Cyber Security and Intelligence, 2018.
[11] S. a. Y. Y. a. S. H. a. I. T. a. Y. T. Tobiyama, "Malware Detection with Deep Neural Network Using Process Behavior," in 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 2016.
[12] S. a. L. S. a. C. W. a. L. Y. Yang, "A Real-Time and Adaptive-Learning Malware Detection Method Based on API-Pair Graph," IEEE Access, vol. 8, pp. 208120-208135, 2020.
[13] O. Ajayi, M. Cherian and T. Saadawi, "Secured Cyber-Attack Signatures Distribution using," in IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International, New York, 2019.
[14] S. Talukder, S. Roy and T. Al Mahmud, "A Distributed Anti-Malware Database Management System using BlockChain," BlockChain for IoT Security and Management, p. 6, December 2018.
[15] R. Fuji, S. Usuzaki, K. Aburada, H. Yamaba, T. Katayama, M. Park, N. Shiratori and N. Okazaki, "Investigation on Sharing Signatures of Suspected," in International MultiConference of Engineers and Computer Scientists 2019, Hong Kong, 2019.
[16] N. A. Dawit, S. S. Mathew and K. Hayawi, "Suitability of Blockchain for Collaborative Intrusion," in 12th Annual Undergraduate Research Conference on Applied Computing (URC2020), Abu Dhabi, 2020.

[17] A. S. Alotaibi, "Biserial Miyaguchi–Preneel Blockchain-Based Ruzicka-Indexed," Sensors, vol. 21, p. 21, 2021.
[18] S. Lad and A. Adamuthe, "Improved Deep Learning Model for Static PE Files Malware Detection and Classification," I. J. Computer Network and Information Security, vol. 2, pp. 14-26, 2022.
[19] N. Balram, G. Hsieh and C. McFall, "Static Malware Analysis using Machine Learning Algorithms on APT1 Dataset with String and PE Header Features," in International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, 2019.
[20] K. Malik, M. Kumar, M. Sony, R. Mukhraiya, P. Girdhar and B. Sharma, "Static Malware Detection And Analysis Using Machine Learning Methods," Advances and Applications in Mathematical Sciences, vol. 21, no. 7, pp. 4183-4196, 2022.
[21] A. Radwan, "Machine Learning Techniques to Detect Maliciousness of Portable Executable Files," in International Conference on Promising Electronic Technologies (ICPET), Gaza, 2019.
[22] S. Naz and D. Singh, "Review of Machine Learning Methods for Windows Malware Detection," in International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, 2019.
