PlausMal-GAN Plausible Malware Training Based On Generative Adversarial Networks For Analogous Zero-Day Malware Detection

The article presents PlausMal-GAN, a framework utilizing Generative Adversarial Networks (GAN) to enhance detection of analogous zero-day malware by generating high-quality and diverse malware images. The framework trains a discriminator to learn various malware features from both real and generated data, demonstrating improved performance in detecting zero-day malware. The study indicates that this approach is beneficial for developing and updating malware detection systems, addressing the challenges posed by traditional antivirus methods that often fail against zero-day threats.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2022.3170544, IEEE Transactions on Emerging Topics in Computing
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, VOL. XX, NO. XX, XX 2022

PlausMal-GAN: Plausible Malware Training Based on Generative Adversarial Networks for Analogous Zero-day Malware Detection
Dong-Ok Won, Yong-Nam Jang, and Seong-Whan Lee, Fellow, IEEE

Abstract—Zero-day malicious software (malware) exploits a previously unknown or newly discovered software vulnerability. The fundamental objective of this paper is to enhance the detection of analogous zero-day malware by learning efficiently from plausible generated data. To detect zero-day malware, we propose a malware training framework based on analogous malware data generated with generative adversarial networks (PlausMal-GAN). PlausMal-GAN produces analogous zero-day malware images with high quality and high diversity from existing malware data. The discriminator, acting as a detector, learns various malware features from both real and generated malware images. The proposed framework showed higher and more stable performance on the generated analogous malware images, which can be regarded as analogous zero-day malware data. We obtained reliable accuracy with the proposed PlausMal-GAN framework using representative GAN models (i.e., deep convolutional GAN, least-squares GAN, Wasserstein GAN with gradient penalty, and evolutionary GAN). These results indicate that the proposed framework is beneficial for detecting and predicting numerous analogous zero-day malware samples derived from known malware when developing and updating malware detection systems.

Index Terms—Zero-day Malware, Analogous Malware Detection, Malware Augmentation, Malware Data, Generative Adversarial Networks

• D.-O. Won is with the Department of Artificial Intelligence Convergence, Hallym University, Republic of Korea. E-mail: [email protected]
• Y.-N. Jang is with the Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea. E-mail: yn [email protected]
• S.-W. Lee is with the Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea. E-mail: [email protected]
D.-O. Won and Y.-N. Jang contributed equally to this work. S.-W. Lee is the corresponding author.
Manuscript received February 24, 2021; revised October 7, 2021.

1 INTRODUCTION

MALWARE can be defined as malicious software that is designed to cause outages, denial of activity, collection of personal data without user consent, unauthorized access to system resources, and similar inappropriate behaviors. With the rapid development of information technology, the exponential increase in malware has become one of the main threats to computer security [1]–[3]. Malicious software detection has become more difficult as the number and variety of applications increase [4]–[6]: more than 143 thousand new malicious programs targeting mobile devices were detected during 2013 [5], and Kaspersky Lab’s research shows that nearly 30% of all computers were threatened at least once during 2018 [7].

Zero-day malware exploits an unknown or unaddressed software vulnerability that hackers use to do malicious things, such as destroying programs, stealing data, or paralyzing networks [8]. A range of antivirus systems and other strategies are used to protect against the introduction of malware and to detect malware that is already present. Antivirus systems typically fail to detect zero-day malware because they rely on signatures to identify malware, which makes computers more vulnerable to zero-day malware than to general malware. Zero-day malware is therefore an important threat to computer security, and zero-day malware detection is a top priority for malware detection systems.

To detect zero-day malware, we propose a deep learning method that generates arbitrarily modified malware features using the malware’s raw code without running it. Malware code based on specific rules and actions generates certain patterns. Examples of the malware samples used in this study are shown in Figure 1 and Figure 11 [9], [10].

Meanwhile, in classification tasks using neural networks, data augmentation techniques have been used to compensate for imbalance or data insufficiency problems. In the malware detection research area, several papers also used simple data augmentation techniques (e.g., sliding window, transformation, etc.) to deal with these issues [11], [12].

In this study, we took a different direction: a malware training technique that generates zero-day malware data, rather than one aimed at imbalance or data insufficiency. We propose a plausible malware training framework capable of detecting analogous zero-day malware and handling newly plausible malware (plausible malware training framework based on generative adversarial networks, PlausMal-GAN). Our main contribution is the proposed malware training framework based on generative adversarial networks (GAN) with generated analogous malware samples. The proposed framework trains a generator and discriminator based on real malware data and the generated malware data in the first phase. In the second phase, the generator is fixed and the discriminator is re-trained based on real

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

malware data and the malware data generated by the fixed generator. Ideally, the proposed framework can apply any kind of GAN model, so we evaluated the performance by applying recent and representative GAN models. Moreover, we obtained stable performance on abundant analogous zero-day malware test data even with relatively little training data.

Fig. 1: Examples of nine classes of malware images (Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, and Gatak).

2 BACKGROUND

2.1 Malware Detection

Owing to the increasing damage caused by malware and zero-day malware, research on malware detection methods has been continuously improving. We discuss two aspects: malware detection and zero-day malware detection.

Several reported studies have dealt with malware detection [10], [13]–[17]. Nataraj et al. presented a visualization approach that differs from traditional approaches for malware detection [10], where they transformed the malware’s binary information into grayscale malware images. Ye et al. and Ndibanje et al. used Windows Audit Log and API Call for malware detection [18], [19]. Traditional machine learning algorithms such as hidden Markov models, support vector machines (SVMs), and random forests were also used for malware detection [20]–[23]. Singh et al. proposed a big data analysis framework based on random forests for malware detection [24]. Chen et al. attempted to detect malware by analyzing mobile network traffic with machine-learning methods [25]. Recently, many methods have used deep learning and generative adversarial networks (GAN) because the available computing power has increased [11], [12], [26]–[31]. Pascanu et al. used recurrent neural networks for time-series information in malware classification [26], [32]. Ye et al. presented a heterogeneous deep-learning framework composed of an autoencoder stacked up with a layer of associative memory and multilayer restricted Boltzmann machines [27]. Kabanga et al. used data from converted malware images as input to convolutional neural networks (CNNs) [28]. Yan et al. used CNN and long short-term memory networks to learn from grayscale images and opcode sequences, respectively, and used a stacking ensemble for malware classification [11]. The aforementioned methods have the disadvantage that they detect only certain variants of malware. The developers of malware use obfuscation techniques, such as null byte injection, code exchange, and subroutine reordering, to create new variants with signatures different from existing malware. However, the aforementioned methods use malware that has been discovered so far. Thus, unlearned malware will not be detected. To detect attacks that bypass deep-learning methods [33], Wang et al. proposed a resistant method that is robust to adversarial malware samples by nullifying arbitrary features [33]. However, in this way, malware characteristics are randomly removed, which risks removing not only unnecessary features but also important ones. There are now hybrid methods that combine static and dynamic methods [22], [34]. While these methods can be effective for malware detection, they have the disadvantage of being time-consuming and highly complex.

Recently, some methods have been developed for zero-day malware detection [13], [14], [35], [36]. Venkatraman and Alazab used a similarity matrix of malware for visualization in order to detect zero-day malware [14]. This method can be used to visually observe that different malware families exhibit significantly different behavior patterns. Gupta and Rani proposed a big data framework to address the big data problem caused by the increase in malware [35]. They also attempted to detect zero-day malware using big data analysis techniques and machine-learning algorithms.

This method modeled a series of opcodes to detect zero-day malware. Due to the increasing threat of malware in cyber-physical systems, Huda et al. proposed a detection method that uses methods like SVM and K-means to detect unknown malware by extracting knowledge and essential structures from already available, cheap, unlabeled data [36]. In the aforementioned zero-day malware detection methods, certain rules are fixed, and zero-day malware that does not follow these rules cannot be detected. Recently, Kim et al. proposed the transferred deep-convolutional generative adversarial network (tDCGAN), which generates fake malware and learns to distinguish it from real malware [13]. This method not only obtained enhanced performance in malware detection but also showed promise in a zero-day attack experiment. However, since the method does not consider high diversity (e.g., plausible diversity) or quality in the generated zero-day malware, and neither was measured numerically (e.g., with the Fréchet inception distance), it is difficult to regard it as focused on zero-day malware detection. In contrast, we implemented an analogous zero-day malware classifier with GAN models that create new high-diversity and high-quality malware images for plausible malware augmentation. The generated data is used to create a robust detector for zero-day malware detection.

2.2 Data Augmentation

Data augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better deep learning models can be built using them [37], [38]. Simple data augmentations based on basic image manipulations include flipping, cropping, rotation, and translation [37], [38]. More recently, GAN-based approaches create artificial instances from a dataset such that they retain characteristics similar to the original set [39], [40]. In malware detection, several papers applied data augmentation methods to solve imbalance or data insufficiency issues [12], [41].

To the best of our knowledge, no studies to date have focused on the high diversity and quality of plausible malware in terms of analogous malware augmentation, which is an important factor to be investigated for various transformations or analogous data augmentation in a zero-day malware detection system. In this study, we propose a plausible malware training framework based on GAN that considers high diversity in generating analogous zero-day malware data. Moreover, the proposed method showed stable performance even with relatively little training data. We applied several recent GAN models (i.e., deep convolutional GAN (DCGAN) [42], least-squares GAN (LSGAN) [43], Wasserstein GAN with gradient penalty (WGAN-GP) [44], and evolutionary GAN (E-GAN) [40]) to our design, which suggests that the framework can be adapted reliably to state-of-the-art GAN models.

2.3 Generative Adversarial Networks

GAN [39] is a deep-learning model that emerged for the purpose of generating data similar to the given training data. Unlike the original GAN, which uses only one objective function (e.g., minimax), Wang et al. proposed E-GAN [40] using several objective functions (i.e., minimax, heuristic, and least-squares). Generators using each objective function are evaluated by a discriminator, and the best-performing generator is chosen to evolve to the next stage. In the process of evolution, the evolved generator is expected to gradually adapt to the discriminator, which means that the evolved generator can provide high-quality, high-diversity samples and learn the real data distribution. The evolutionary process consists of three stages (i.e., variation, evaluation, and selection):

First, the variation stage uses variation operators to produce offspring $\{G_{\theta_1}, G_{\theta_2}, \ldots\}$ from an individual $G_\theta$ in the population. In particular, several copies of each individual or parent are created, each of which is modified by a different mutation. Each modified copy is then regarded as one child. Second, in the evaluation stage, the performance or individual quality of each child is evaluated by a fitness function $F$ that depends on the current environment (i.e., discriminator $D$). Third, in the selection stage, all children are ranked according to their fitness values and the worst ones are removed. The rest remain alive (i.e., free to act as parents) and evolve to the next iteration.

In contrast to the generator, which uses multiple objective functions, the discriminator uses the same objective function as the original GAN. The discriminator $D$ is trained to distinguish between the real data sample $x \sim p_{\mathrm{data}}(x)$ and the generated data sample $\hat{x} \sim p_{\mathrm{gen}}(\hat{x})$.

$$L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] - \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[\log(1 - D(\hat{x}))]. \tag{1}$$

3 METHODS

In this section, we describe a plausible malware training framework based on generative adversarial networks (GAN) that generates analogous malware with a malware classifier and trains the discriminator as a malware detector. Figure 2 shows the architecture of our proposed framework.

Fig. 2: The architecture of the proposed framework for analogous zero-day malware detection.
[Figure labels: class $c$ and noise $z \sim P(c, z)$; generator $G_\theta$ with offspring $G_{\theta_1}, G_{\theta_2}, \ldots, G_{\theta_\mu}$; generated samples; malware samples $x \sim P_{\mathrm{data}}(x)$; discriminator $D$; adversarial loss (real/fake); classification loss (class); convergence check on $G_\theta$.]

3.1 PlausMal-GAN Framework

To generate analogous malware samples for each kind of malware, the proposed framework trains a generator and discriminator based on GAN with a malware classifier using real malware data and the generated malware data in the first step. The discriminator not only discriminates real from fake, but also learns to classify malware classes. In the second step, the generator is fixed and the discriminator is re-trained based on real malware data and the malware data generated by the fixed generator. Figure 3 shows the overview and process of the proposed framework. The auxiliary classifier GAN (AC-GAN) [45] proposed a structure that produces data that matches class labels as well as data that are close to real data. For the malware classifier, the architecture of the proposed framework follows the AC-GAN structure (Figure 2). Our malware generator generates fake malware samples $\hat{x}$ from a noise sample $z$ conditioned on malware class $c$, and the discriminator distinguishes not only between real $x \sim p_{\mathrm{data}}(x)$ and fake $\hat{x} \sim p_{\mathrm{gen}}(\hat{x})$ but also the class $c$. The difference between our method and the existing AC-GAN is that the discriminator does not learn the class information of the generated malware sample, only the class information of the real malware sample. Our discriminator training loss is defined as follows:

$$L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x) + \log p(c \mid x)] - \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[\log(1 - D(\hat{x}))]. \tag{2}$$
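As a minimal sketch of how the phase-1 discriminator loss could be estimated on a batch, the following pure-Python function averages the three log terms. It assumes the class term enters with the same sign as in Algorithm 1, so that the discriminator is rewarded for classifying real samples correctly. The function name, argument names, and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def discriminator_loss(d_real, p_class_real, d_fake):
    """Batch estimate of the phase-1 discriminator loss.

    d_real:       D(x) for each real sample, in (0, 1)
    p_class_real: p(c | x) assigned to the true class of each real sample
    d_fake:       D(x_hat) for each generated sample, in (0, 1)
    Assumption: generated samples contribute no class term, as described in the text.
    """
    real_term = sum(
        math.log(d) + math.log(p) for d, p in zip(d_real, p_class_real)
    ) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return -real_term - fake_term

# Hypothetical outputs: a fairly confident discriminator vs. an undecided one.
confident = discriminator_loss([0.9, 0.8], [0.7, 0.9], [0.1, 0.2])
undecided = discriminator_loss([0.5, 0.5], [0.2, 0.2], [0.5, 0.5])
print(confident, undecided)
```

The loss shrinks as the discriminator scores real samples near 1, generated samples near 0, and assigns high probability to the true class of real samples.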

Fig. 3: The proposed PlausMal-GAN framework consists of two phases. (a) Training of the generator and discriminator based on GAN with a malware classifier. (b) Training of the discriminator as a zero-day malware detector using plausible malware augmentation. For an intuitive explanation, the framework is shown with evolutionary GAN, which is one of the representative GANs.
[Figure labels: (a) noise $z$ and class $c$; generator $G_\theta$; mutations $M_1, M_2, \ldots, M_k$ producing $G_{\theta_1}, G_{\theta_2}, \ldots, G_{\theta_k}$; generated and real samples; discriminator $D$ (real or fake, malware classes); evaluation with $F_1 > F_2 > \ldots > F_k$. (b) fixed (not training) generator $G_\theta$; generated and real samples; discriminator $D$ as malware detector. Malware classes in both panels: Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, Gatak.]

We considered the standard GAN approach (minimax), the least-squares approach, the heuristic approach, and the combination of the preceding three approaches for the DCGAN, LSGAN, WGAN-GP, and E-GAN models in the proposed framework, respectively. In E-GAN, an evolutionary step consists of three sub-steps: variation, evaluation, and selection. In the variation step, we adopt three interpretable and complementary objectives as mutations, as proposed by Wang et al. [40]. As shown in Figure 4, the three objective functions are the minimax mutation, the heuristic mutation, and the least-squares mutation. In addition, we added a classification loss function to the existing mutation functions, because the generated data must not only be close to real data but also correspond to the given class. The minimax mutation is similar to the minimax objective function of the original GAN, which aims to minimize the log probability that the discriminator is correct. In the original GAN, gradient vanishing can occur when the discriminator produces a result close to zero (i.e., $D(\hat{x}) \to 0$). In other words, if the discriminator is confident that the generated malware data is fake, the generator may not train well. However, we have been able to solve this problem to some extent by adding a classification loss. Unlike the gentle gradients early in training, once the generated malware distribution is somewhat similar to the real malware distribution, the minimax mutation provides a steep gradient, which later allows stable learning.

$$M_G^{\mathrm{minimax}} = \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[\log(1 - D(\hat{x})) - \log p(c \mid \hat{x})]. \tag{3}$$

The heuristic mutation minimizes the log probability that the discriminator will do well, which maximizes the log probability that the discriminator is mistaken. Using this mutation, the gradient is steep even when the discriminator is convinced that the generated malware data is fake. Thus, the heuristic mutation can avoid a vanishing gradient, unlike the minimax mutation, which suggests the possibility

of better learning in the early stages than the minimax mutation.

$$M_G^{\mathrm{heuristic}} = -\mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[\log D(\hat{x}) + \log p(c \mid \hat{x})]. \tag{4}$$

Lastly, the least-squares mutation is similar to the least-squares objective function of the LSGAN, which aims at deceiving the discriminator by penalizing the generator. Using this mutation, we get a gentle slope overall and can avoid a vanishing gradient, as with the heuristic mutation. Besides, compared to the heuristic mutation, the least-squares mutation does not assign a very high cost to generating fake malware samples, nor a very low cost to mode dropping, which partially avoids mode collapse [43].

$$M_G^{\mathrm{least\text{-}squares}} = \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[(D(\hat{x}) - 1)^2 - \log p(c \mid \hat{x})]. \tag{5}$$

Fig. 4: Mutation (or objective) functions with classification loss function.

In the evaluation step, 1) the quality and 2) the diversity of the generated malware samples are measured and evaluated. To detect zero-day malware, it is important to generate high-diversity malware samples with high quality, so we adopted the evaluation step of the E-GAN architecture. First, the quality fitness score is used as a measure of quality. The malware image generated from the noise sample for a given class is fed into discriminator $D$, and the output value is used. We use the output of $D$ multiplied by the probability of that class to measure the image quality score for each class, and we use the average output value. The closer the value is to 1, the closer the malware data is to reality, that is, the higher its quality.

$$F_q = \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[D(\hat{x}) \times \log p(c \mid \hat{x})]. \tag{6}$$

Second, the diversity fitness score is used as a measure of malware diversity. This method uses the minus log-gradient-norm of the discriminator. When the generator generates data that greatly changes the gradient of the discriminator, the discriminator is likely to determine that the generated malware data is fake. In contrast, when the generator generates data that does not change the discriminator gradient significantly, the generated malware data is not labeled as fake and tends to achieve high diversity.

$$F_d = -\log \lVert \nabla_D \rVert. \tag{7}$$

Using the two fitness scores mentioned above, the criterion for the E-GAN evaluation is as follows:

$$F = F_q + \gamma F_d, \tag{8}$$

where $\gamma > 0$ balances the quality and diversity measurements.

In the selection step, the offspring with the highest fitness score is selected and proceeds to the next variation step. Throughout the evolution process, the generator gradually learns to generate data for each class as well as data similar to real data. We use the converged generator for malware detection in the next step.

Algorithm 1 Plausible malware training framework (E-GAN case)
Require: batch size $m = 32$; discriminator’s updating steps per iteration $n_D = 1$; number of parents $\mu = 1$; number of mutations $n_m = 3$; Adam hyper-parameters $\alpha = 0.0002$, $\beta_1 = 0.5$, $\beta_2 = 0.99$; the hyper-parameter $\gamma$ of the evaluation function.
Require: initial discriminator’s parameters $w_0$; initial generator’s parameters $\{\theta_0^1, \theta_0^2, \ldots, \theta_0^\mu\}$.
for number of training iterations do
    for $k = 0, \ldots, n_D$ do
        Sample a batch of $\{x^{(i)}\}_{i=1}^{m} \sim p_{\mathrm{data}}$ (training data), and a batch of $\{(c, z)^{(i)}\}_{i=1}^{m} \sim p_{c,z}$ (noise sample $z$ by class $c$).
        $g_w \leftarrow \nabla_w \Big[ \frac{1}{m} \sum_{i=1}^{m} \log D_w(x^{(i)}) + \frac{1}{m} \sum_{j=1}^{\mu} \sum_{i=1}^{m/\mu} \log\big(1 - D_w(G_{\theta^j}((c, z)^{(i)}))\big) + \frac{1}{m} \sum_{j=1}^{\mu} \sum_{i=1}^{m/\mu} \log p(c^{(i)} \mid x^{(i)}) \Big]$
        $w \leftarrow \mathrm{Adam}(g_w, w, \alpha, \beta_1, \beta_2)$
    end for
    for $j = 0, \ldots, \mu$ do
        for $h = 0, \ldots, n_m$ do
            Sample a batch of $\{(c, z)^{(i)}\}_{i=1}^{m} \sim p_{c,z}$ (noise sample $z$ by class $c$).
            $g_{\theta^{j,h}} \leftarrow \nabla_{\theta^j} M_G^h\big(\{(c, z)^{(i)}\}_{i=1}^{m}, \theta^j\big)$
            $\theta_{\mathrm{child}}^{j,h} \leftarrow \mathrm{Adam}(g_{\theta^{j,h}}, \theta^j, \alpha, \beta_1, \beta_2)$
            $F^{j,h} \leftarrow F_q^{j,h} + \gamma F_d^{j,h}$
        end for
    end for
    $\{F^{j_1,h_1}, F^{j_2,h_2}, \ldots\} \leftarrow \mathrm{sort}(\{F^{j,h}\})$
    $\theta^1, \theta^2, \ldots, \theta^\mu \leftarrow \theta_{\mathrm{child}}^{j_1,h_1}, \theta_{\mathrm{child}}^{j_2,h_2}, \ldots, \theta_{\mathrm{child}}^{j_\mu,h_\mu}$
end for
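To make the variation, evaluation, and selection steps concrete, the sketch below evaluates the three mutation objectives on a small batch and then ranks three children by the fitness $F = F_q + \gamma F_d$. It is only an illustration under stated assumptions: the discriminator outputs, the $(F_q, F_d)$ pairs, and $\gamma = 0.1$ are hypothetical numbers, the function names are invented for the example, and $F_d$ is taken as given because computing the gradient norm requires an automatic-differentiation library.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def minimax_mutation(d_fake, p_class):
    """Mean of log(1 - D(x_hat)) - log p(c | x_hat)."""
    return mean(math.log(1.0 - d) - math.log(p) for d, p in zip(d_fake, p_class))

def heuristic_mutation(d_fake, p_class):
    """Minus the mean of log D(x_hat) + log p(c | x_hat)."""
    return -mean(math.log(d) + math.log(p) for d, p in zip(d_fake, p_class))

def least_squares_mutation(d_fake, p_class):
    """Mean of (D(x_hat) - 1)^2 - log p(c | x_hat)."""
    return mean((d - 1.0) ** 2 - math.log(p) for d, p in zip(d_fake, p_class))

def fitness(f_quality, f_diversity, gamma):
    """Combined fitness F = F_q + gamma * F_d."""
    return f_quality + gamma * f_diversity

def select_survivors(children, mu):
    """Keep the mu children with the highest fitness (selection step)."""
    ranked = sorted(children, key=lambda child: child["fitness"], reverse=True)
    return ranked[:mu]

# Hypothetical discriminator outputs for a batch of generated samples.
d_fake = [0.05, 0.10, 0.02]   # D is confident the samples are fake
p_class = [0.60, 0.70, 0.50]  # p(c | x_hat) for the intended class

losses = {
    "minimax": minimax_mutation(d_fake, p_class),
    "heuristic": heuristic_mutation(d_fake, p_class),
    "least_squares": least_squares_mutation(d_fake, p_class),
}

# Hypothetical (F_q, F_d) pairs for the three children of one parent.
scores = {
    "minimax": (0.20, -1.5),
    "heuristic": (0.35, -1.2),
    "least_squares": (0.30, -2.0),
}
children = [
    {"mutation": name, "fitness": fitness(fq, fd, gamma=0.1)}
    for name, (fq, fd) in scores.items()
]
best = select_survivors(children, mu=1)
print(losses)
print(best)
```

With a confident discriminator, the heuristic objective takes a much larger value than the minimax objective on the same batch, which reflects the stronger training signal the text attributes to it early in training.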

3.2 Malware Detection

For analogous zero-day malware augmentation, the malware generator generates high-quality and high-diversity images. We use the discriminator’s classifier as a malware detector. The discriminator is trained anew as a malware detector, without adversarial training against the generator. As a malware detector, the discriminator is trained using both generated and real malware images. The objective function of the discriminator is redefined as:

$$L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p(c \mid x)] - \mathbb{E}_{\hat{x} \sim p_{\mathrm{gen}}}[\log p(c \mid \hat{x})]. \tag{9}$$

When training the discriminator, the generator is not trained and only generates malware images. Figure 3 shows the training of the discriminator as a malware detector with data augmentation.

Algorithm 2 Training discriminator based on the proposed framework
Require: batch size $m = 32$; discriminator’s updating steps per iteration $n_D = 1$; Adam hyper-parameters $\alpha = 0.0002$, $\beta_1 = 0.5$, $\beta_2 = 0.99$.
Require: initial discriminator’s parameters $w_0$; initial generator’s parameters $\theta_0$.
for number of training iterations do
    for $k = 0, \ldots, n_D$ do
        Sample a batch of $\{x^{(i)}\}_{i=1}^{m} \sim p_{\mathrm{data}}$ (training data), and a batch of $\{(c, z)^{(i)}\}_{i=1}^{m} \sim p_{c,z}$ (noise sample $z$ by class $c$).
        $g_w \leftarrow \nabla_w \Big[ \frac{1}{m} \sum_{i=1}^{m} \log p(c^{(i)} \mid x^{(i)}) + \frac{1}{m} \sum_{i=1}^{m} \log p(c^{(i)} \mid G_\theta((c, z)^{(i)})) \Big]$
        $w \leftarrow \mathrm{Adam}(g_w, w, \alpha, \beta_1, \beta_2)$
    end for
end for

4 EXPERIMENTS AND RESULTS

This section describes the experiments and results for evaluating the proposed framework.

4.1 Datasets

4.1.1 Microsoft malware classification challenge dataset

To verify the data generation and detection performance of the proposed framework, we used malware data from the Microsoft dataset [9]. Each malware file was a byte file, and we used the binary code written in it. The total number of malware samples is 10,868, divided into 9,781 training samples and 1,087 test samples (9:1 train-test ratio). Appendix B shows the malware data types used and the number of malware samples for each malware type [9].

Following Nataraj et al. [10], we convert malware binary code into an image called a malware image. If $k$ is the length of the binary code, $C$ is the size of the converted columns, and $R$ is the size of the converted rows, the sizes are calculated as follows:

$$C = 2^{\frac{\log \sqrt{16k}}{\log 2} + 1} \tag{10}$$

$$R = \frac{16k}{C} \tag{11}$$

The malware images were so large that they were reduced to 128 × 128 using Pillow, a Python imaging library. Then we used the jet colormap to represent them as RGB color images.

4.1.2 Malimg dataset

In Supplementary Materials Appendix C, we show the frequency distribution of malware families and their variants in the Malimg dataset [10]. We were able to find malware data from malware classes that shared a family name (i.e., Worm: Allaple.A and Allaple.L, PWS: C2Lop.gen!G and C2Lop.P, Trojan: Lolyda.AA1 and Lolyda.AA2, TDownloader: Swizzor.gen!I and Swizzor.gen!E). As shown in Table 1 and Figure 11, these eight malware classes form four pairs whose members have different but similar family names and share similar properties. For the second zero-day malware experiment, we evaluated these malware families with similar properties in the Malimg dataset, which amount to 5,543 malware samples from 8 different malware families.

TABLE 1: Malware data with similar family names in the Malimg dataset for the second zero-day malware experiment

Malware family name    Type           No. of variants
Allaple.A              Worm           2949
Allaple.L              Worm           1591
C2Lop.gen!G            PWS            200
C2Lop.P                PWS            146
Lolyda.AA1             Trojan         213
Lolyda.AA2             Trojan         184
Swizzor.gen!I          TDownloader    132
Swizzor.gen!E          TDownloader    128

4.2 Experimental Details

The experiments are divided into two parts: an existing malware classification experiment and analogous zero-day malware attack experiments. In the existing malware classification experiment, we compared the proposed framework with representative GANs (i.e., DCGAN, LSGAN, WGAN-GP, and E-GAN) against the experimental results of previous methods [13]. In the proposed framework, we used the same network structure (Supplementary Table S2). In the first analogous zero-day malware attack experiment, we also compared our framework with the four GAN models against the results of previous methods (i.e., random forest, decision tree, nearest neighbors, Naive Bayes, multi-layer perceptron (MLP) [46], CNN [47], GAN [39], and tDCGAN [13]). In the second zero-day malware experiment, we compared phase 1 alone and phases 1 and 2 of the proposed framework with the four representative GAN models.

The operating system of the computer used in the experiments was Ubuntu 16.04.2 LTS, and the central processing unit was an Intel Xeon Gold 6148. The random-access memory was Samsung DDR4 16 GB × 4, and the graphics processing unit was a TITAN XP. We implemented the proposed framework using the PyTorch library. The generative and discriminative network architectures used in the generator and discriminator, respectively, are shown in Supplementary Table S2.
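As a worked illustration of the image-size rule in Eqs. (10) and (11) of Section 4.1.1, the sketch below evaluates the two formulas literally, reading Eq. (10) as $C = 2^{\log\sqrt{16k}/\log 2 + 1}$. Under that reading the expression simplifies to $C = 2\sqrt{16k}$, so $C \times R = 16k$ and the image is four times as wide as it is tall. The function name and the value $k = 1024$ are hypothetical, and any rounding to integer pixel dimensions is not specified in the text and is therefore left out.

```python
import math

def malware_image_shape(k):
    """Return (columns C, rows R) from Eqs. (10) and (11), evaluated literally.

    k is the length of the binary code; the results are floats because the
    text does not state how the sizes are rounded (assumption: no rounding).
    """
    columns = 2 ** (math.log(math.sqrt(16 * k)) / math.log(2) + 1)
    rows = 16 * k / columns
    return columns, rows

columns, rows = malware_image_shape(1024)  # hypothetical code length k
print(columns, rows)
```

For $k = 1024$ this gives approximately 256 columns and 64 rows, up to floating-point error.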

Fig. 5: Examples of (a) real malware images and (b) generated malware images in the proposed framework.

TABLE 2: FIDs between generated malware images and real malware images in the proposed framework

Model    DCGAN     LSGAN     WGAN-GP    E-GAN (r = 0.1)    E-GAN (r = 0.5)
FID      220.16    190.70    206.23     146.39             127.96
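The FID values in Table 2 are Fréchet distances between Gaussian fits of Inception features of real and generated images: a squared distance between the two means plus a trace term over the two covariances. As a minimal sketch of that arithmetic, the function below assumes diagonal covariances, in which case the trace term reduces to a sum of squared differences of standard deviations. This is a simplification for illustration only; the metric itself uses full covariance matrices and a matrix square root. The function name and inputs are hypothetical.

```python
import math

def fid_diagonal(mu_x, var_x, mu_g, var_g):
    """Fréchet distance between two Gaussians with diagonal covariances.

    mu_*  : per-feature means
    var_* : per-feature variances (the diagonal of each covariance matrix)
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_g))
    # For diagonal matrices, Tr(Sx + Sg - 2 (Sx Sg)^(1/2)) is a per-feature sum.
    cov_term = sum(
        vx + vg - 2.0 * math.sqrt(vx * vg) for vx, vg in zip(var_x, var_g)
    )
    return mean_term + cov_term

# Hypothetical two-feature statistics for real and generated images.
same = fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = fid_diagonal([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
print(same, shifted)
```

Identical statistics give a distance of zero, and the distance grows as the means or variances of the generated features drift away from those of the real features.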

4.3 Analysis of Generated Malware Data

Figure 5 shows examples of the generated malware images using the Microsoft dataset [9]. In qualitative terms, Figure 5 shows that the generated malware images are similar to the real malware images, which indicates that the proposed framework can also generate modified malware or analogous zero-day malware.

We choose the Fréchet inception distance (FID) [48] as a quantitative metric for evaluating generator convergence. The FID uses a pre-trained Inception v3 network to extract features of the generated images and real images. The distribution of the extracted features is then modeled as a multivariate Gaussian distribution with mean $\mu$ and covariance $\Sigma$. The FID between the real images $x$ and generated images $g$ is computed as below:

$$\mathrm{FID}(x, g) = \lVert \mu_x - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_x + \Sigma_g - 2(\Sigma_x \Sigma_g)^{\frac{1}{2}}\right), \tag{12}$$

where Tr is the sum of all the diagonal elements.

A lower FID implies that the distribution of the generated images is closer to that of the real images. It also means that the generated images have high quality and high diversity. As shown in Table 2, the proposed framework with E-GAN has the lowest FID score. This means that the generator of our proposed framework generated high-quality and high-diversity malware samples. Although a low FID does not mean that genuinely new malware is produced, the generated sample is likely a variant of existing malware. We can therefore expect the generated data to be useful for data augmentation.

4.4 Malware Classification

To derive a more accurate estimate of model prediction performance, we used 10-fold cross-validation for all methods in the existing malware classification experiment using the Microsoft dataset [9]. The average classification accuracy achieved by the proposed framework was 95.56%, which is higher than that of the previous methods. Table 3 shows the numerical classification results with four different models (i.e., DCGAN, LSGAN, WGAN-GP, and E-GAN). Because the performance was highest when using the E-GAN model, only the proposed framework with this model was used for some further analysis (i.e., Table 4, Figures 7 and 9).

To verify the performance of the proposed malware classifier model, we show a confusion matrix in Figure 7. We calculated the precision, recall, and F1-score for each malware type and summarized them in Table 4. We also compared the classification accuracies of the proposed framework with the four different GAN models according to the training iterations in Figure 6. As a result, the E-GAN model showed higher classification performance than the other representative models.

Fig. 6: Classification accuracy according to the training iterations for the proposed framework with four representative models.
[Plot: accuracy versus iterations for DCGAN, LSGAN, WGAN-GP, and E-GAN.]

4.5 Zero-day Malware

4.5.1 Zero-day malware experiment I using generated analogous zero-day malware

We modeled plausible zero-day malware for analogous zero-day malware attack experiments using the Microsoft

TABLE 3: Comparison of malware classification accuracies in the proposed framework with four representative GAN
models and previous methods

Proposed Framework (PlausMal-GAN)

Model MLP CNN GAN tGAN
DCGAN LSGAN WGAN-GP E-GAN
Accuracy (%) 83.06 94.63 87.81 88.10 94.99 96.02 94.86 96.35
Std. dev. 7.54e-04 2.12e-05 3.44e-05 8.05e-05 0.596 0.351 0.255 0.539

TABLE 5: Comparison of analogous zero-day malware

attack performances in two difference combined rates (CR)
for the proposed framework and previous methods (%)

Model\SSIM 0.60 0.62 0.64 0.66 0.68 CR

Random Forest 91.28 95.19 92.88 95.58 91.40
Decision Tree 95.64 96.46 96.71 96.41 96.18
Nearest Neighbors 97.71 97.72 98.36 98.34 98.09
Naive Bayes 90.60 90.89 91.51 91.16 90.45
MLP 96.78 96.46 97.26 97.23 96.82
CNN 98.16 98.23 98.63 98.61 98.41
GAN 96.32 96.96 96.99 96.95 96.50
tGAN 97.24 96.96 97.81 97.78 97.45
8:2
tDCGAN 98.39 98.73 98.63 98.61 98.41
Proposed framework
(with DCGAN) 99.59 99.66 99.60 98.58 99.54
(with LSGAN) 99.43 99.66 99.64 98.41 99.54
(with WGAN-GP) 99.59 99.66 99.60 98.58 98.63
(with E-GAN) 99.94 100.0 99.74 99.58 99.84
Proposed framework
(with DCGAN) 99.02 99.25 99.28 96.52 99.05
(with LSGAN) 97.28 99.10 99.42 97.00 99.05
Fig. 7: Confusion matrix for malware classification results (with WGAN-GP) 99.02 99.70 99.38 96.52 97.94
7:3
in 9:1 train-test ratio. (with E-GAN) 99.86 100.0 99.51 98.42 99.68

TABLE 4: Results of precision, recall, and F1-score for each

malware type in the proposed framework 5). The plausible zero-day malware modeling with noise is
calculated as follows:
R L K3 V S T K1 O G
Precision 0.954 0.971 0.996 0.854 0.666 0.864 1.000 0.982 0.960
Recall 0.961 0.975 0.993 0.979 0.500 0.933 0.975 0.894 0.950 Nk (x, y) = (1 − ξ)x + ξy, (14)
F1-score 0.957 0.973 0.994 0.912 0.571 0.897 0.987 0.936 0.955
where SSIM(x, y) > k , ξ is 0.3, 0.2 in combined ratio 7:3
and 8:2, respectively. Figure 8 shows examples of deformed
dataset (Figure 8) [9]. The previous study assumed that plausible zero-day malware.
the zero-day attacks can be modeled by introducing noise The results of the analogous zero-day malware attack
into existing malware data [13]. The noise was generated experiment in Table 5 divided the malware images into an
by the structure similarity (SSIM) method, which uses the experiment with an 8:2 combined ratio and a 7:3 combined
structural similarity of images [49]. We likewise used the ratio. We used 10-fold cross-validation (i.e., the train-test
SSIM method for systematic noise generation. The method ratios: 9:1). In 8:2 combined ratio experiments, the proposed
of calculating the SSIM values for a pair of images x,y frameworks’ models were more accurate than other previ-
includes calculating µx , µy as the means for the pixels of ous recent methods [13], and we obtained stable accuracy
the images x, y . performance in our frameworks with tested GAN mod-
els in all SSIM conditions. Moreover, in the 7:3 combined
ratio experiments, we also obtained reliable high aver-
(2µx µy + c1 ) (2σxy + c2 ) aged performance 98.62%, 98.37%, 98.51%, and 99.49% for
SSIM(x, y) = (13)
µ2x+ µ2y + c1 σx2 + σy2 + c2 the proposed framework methods with DCGAN, LSGAN,
WGAN-GP, and E-GAN model, respectively. In particular,
2 2
where, c1 = (k1 L) , c2 = (k2 L) , k1 = 0.01, k2 = 0.03, L = the decreasing SSIM values or combined high noise ratio
2# bits per pixel − 1. could be an analogous zero-day attack compared to exist-
We used altered malware images with 0.02 intervals ing malware, but the proposed framework showed stable
between 0.60 and 0.68 (SSIM value) for analogous zero-day performances in any SSIM values or combined ratios. As
malware evaluation [13], [49]. Then, for more diverse zero- a result, the proposed framework obtained high and stable
day malware evaluation, we regenerated the transformed performance even the large variations of existing malware
malware images with two combined ratios such as 7:3 and (e.g., combined ratio 7:3 or SSIM value 0.6) in a analogous
8:2 ratios. We also compared the proposed framework with zero-day malware attack. Moreover, we were conducted in
previous methods results (in 8:2 combined ratio) [13] (Table few training data condition by the changing train-test ratios

‫ݔ‬ ‫ݕ‬ ‫ݔ‬ ‫ݕ‬

+ +
ͳ െ ߦ ‫ ݔ‬൅ ߦ‫ݕ‬ ͳ െ ߦ ‫ ݕ‬൅ ߦ‫ݔ‬ ͳ െ ߦ ‫ ݔ‬൅ ߦ‫ݕ‬ ͳ െ ߦ ‫ ݕ‬൅ ߦ‫ݔ‬

(a) (b)

Fig. 8: Examples of plausible zero-day malware with SSIM values of (a) 0.60 and (b) 0.68.

experiment (9:1→5:5) for a thorough performance verifica-

tion evaluation with 2-fold cross-validation (10-fold → 2-
fold cross-validation). This experiment was able to evaluate
more various zero-day malware data by increasing the num-
ber of existing test data (the average number of zero-day
malware data: 506 (122∼1,122) → 14,850 (4,262∼33,710)).
As shown in Table 6 and Figure 9, we obtained stable
test performance even though not only the relatively few
training data (reduced to half) but also increased analogous
zero-day malware test data in the proposed framework (>
99%).
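
The plausible zero-day modeling of Eqs. (13) and (14) can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes SSIM is computed from whole-image (global) statistics with population variances, whereas [49] also describes windowed variants, and the function names `ssim_global`, `blend`, and `plausible_zero_day` are hypothetical.

```python
import numpy as np


def ssim_global(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """Eq. (13) evaluated on whole-image statistics (an assumption of this sketch)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    dynamic_range = 2 ** bits_per_pixel - 1          # L in Eq. (13)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                   # population variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def blend(x, y, xi):
    """Eq. (14): N_k(x, y) = (1 - xi) * x + xi * y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return (1.0 - xi) * x + xi * y


def plausible_zero_day(x, y, k, xi):
    """Return the blended image if SSIM(x, y) > k, otherwise None."""
    if ssim_global(x, y) > k:
        return blend(x, y, xi)
    return None


if __name__ == "__main__":
    # Hypothetical 8-bit grayscale "malware images".
    rng = np.random.default_rng(0)
    image_a = rng.integers(0, 256, size=(32, 32))
    image_b = np.clip(image_a + rng.integers(-20, 21, size=(32, 32)), 0, 255)
    # xi = 0.2 corresponds to the 8:2 combined ratio, xi = 0.3 to 7:3.
    sample = plausible_zero_day(image_a, image_b, k=0.60, xi=0.2)
    print("pair accepted" if sample is not None else "pair rejected")
```

The threshold k plays the role of the SSIM values 0.60 to 0.68 in Table 5: only sufficiently similar pairs are combined, and a larger ξ moves the result further from the original image x.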

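Per-class precision, recall, and F1-score of the kind reported in Table 4 can be derived from a confusion matrix such as those in Figures 7 and 9. The sketch below assumes rows are true classes and columns are predicted classes; the 2×2 matrix is a hypothetical example, not data from the paper. The `f1_score` helper can be checked against Table 4: the rounded precision and recall of a column reproduce its F1-score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def per_class_metrics(confusion):
    """Precision, recall, and F1 per class.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    metrics = []
    for c in range(n_classes):
        true_positive = confusion[c][c]
        predicted_as_c = sum(confusion[i][c] for i in range(n_classes))
        actually_c = sum(confusion[c])
        precision = true_positive / predicted_as_c if predicted_as_c else 0.0
        recall = true_positive / actually_c if actually_c else 0.0
        metrics.append((precision, recall, f1_score(precision, recall)))
    return metrics


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    example = [[8, 2],
               [1, 9]]
    for index, (p, r, f1) in enumerate(per_class_metrics(example)):
        print(f"class {index}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    # Consistency check against two columns of Table 4 (R and S).
    print(round(f1_score(0.954, 0.961), 3), round(f1_score(0.666, 0.500), 3))
```
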
[Fig. 9: Confusion matrix for zero-day malware classification results at a 5:5 train-test ratio with an 8:2 combined ratio and an SSIM value of 0.64.]

TABLE 6: Performance in the zero-day malware attack experiment under the limited-training-data condition (%)

Model \ SSIM          0.60    0.62    0.64    0.66    0.68    CR
Proposed framework
  (with DCGAN)        97.58   98.74   98.83   97.76   97.73   8:2
  (with LSGAN)        98.79   99.17   99.29   98.79   98.74   8:2
  (with WGAN-GP)      97.92   98.73   98.96   98.43   98.23   8:2
  (with E-GAN)        99.11   99.42   99.51   99.06   99.25   8:2
Proposed framework
  (with DCGAN)        97.99   98.82   98.96   98.00   98.09   7:3
  (with LSGAN)        98.98   99.18   99.64   99.06   98.78   7:3
  (with WGAN-GP)      98.10   98.64   99.04   98.33   98.30   7:3
  (with E-GAN)        99.40   99.67   99.70   99.31   99.13   7:3

4.5.2 Zero-day malware experiment II using malware data with similar family names

We conducted zero-day malware attack experiment II with malware data from different classes that share a family name and have similar properties, taken from the Malimg dataset [10]. We found data in the Malimg dataset that are well suited to zero-day malware experiments (Table 1 and Figure 11). We trained and tested four classes, each comprising two differently named families with similar properties (four types, 5,543 samples in total; Worm: Allaple.A (2,949) and Allaple.L (1,591); PWS: C2Lop.gen!G (200) and C2Lop.P (146); Trojan: Lolyda.AA1 (213) and Lolyda.AA2 (184); TDownloader: Swizzor.gen!I (132) and Swizzor.gen!E (128)). For richer interpretation and analysis, we designed the zero-day experiment as two sessions of training and testing. For session A, the training dataset (3,494) consists of Allaple.A, C2Lop.gen!G, Lolyda.AA1, and Swizzor.gen!I, and the test dataset (2,049) consists of Allaple.L, C2Lop.P, Lolyda.AA2, and Swizzor.gen!E. Conversely, session B swaps the two, with a training dataset of 2,049 samples and a test dataset of 3,494 samples.

Session B poses the challenging problem of learning from a small amount of training data. This is a major issue not only in machine learning in general but also in developing malware detection, especially zero-day malware detection technology. Even if it is derived from the same malware family, zero-day malware has not been learned previously, and it can cause severe performance degradation in the initial period because very limited data are available to learn from. To verify that the proposed framework can handle zero-day malware problems and limited-data issues, we designed a second zero-day experiment using similar malware families from the Malimg dataset. In addition to the two training sessions A and B, we evaluated the proposed framework with only phase 1 and with phases 1 and 2. The proposed framework deals with analogous new data by using phase 1 to train the generator and discriminator and phase 2 to train the discriminator on the analogous zero-day malware data. Table 7 and Figure 10 show that the models trained through phase 2 performed better than those trained with only phase 1 in both sessions (A and B). In particular, very interesting results were obtained in session B, where training used a small amount of data. In session B, training with only phase 1 of the proposed framework gave very poor results (about 39% accuracy) for all tested GAN models. This experiment demonstrates that existing GAN studies (i.e., phase 1 of the proposed framework) may not respond properly to new data. In contrast, the final model trained through phase 2 of the proposed framework showed very stable and high average accuracy (> 98.65%) (Table 7 and Figure 10). Consequently, the proposed framework can learn very effectively when there is little data, showing excellent performance in the zero-day malware detection problem.

[Fig. 10: Confusion matrices for zero-day malware classification results (session B) in the proposed framework (E-GAN) with (a) only phase 1 and (b) whole phases (1&2), using similar malware families from the Malimg dataset.]

TABLE 7: Comparison of zero-day malware classification accuracies for the second zero-day experiment (Malimg dataset) in the proposed framework with only phase 1 and with whole phases

Proposed framework    Train session   DCGAN   LSGAN   WGAN-GP  E-GAN
with only phase 1     A               87.70   82.67   85.06    90.53
with only phase 1     B               39.15   39.78   39.32    39.78
with phases 1&2       A               100     98.19   99.95    99.21
with phases 1&2       B               98.42   97.82   96.88    98.74

[Fig. 11: Examples of eight malware images from the Malimg dataset for the second zero-day experiment [10]. Top row: Allaple.A, C2Lop.gen!G, Lolyda.AA1, Swizzor.gen!I. Bottom row: Allaple.L, C2Lop.P, Lolyda.AA2, Swizzor.gen!E.]

In practice, it is known that zero-day malware is often derived from variations of existing malware [8], [13]. To explore the limits of the proposed framework's performance, we evaluated it on restricted data drawn from two different datasets [9], [10]. The first zero-day experiment assumes a plausible zero-day malware attack produced by transforming existing malware, instead of using actual zero-day malware attack data. Additionally, we designed a second zero-day experiment using similar malware families from different malware types. Although we obtained outstanding results in the various zero-day experiments, a richer malware database might have allowed a more meaningful interpretation and discussion.

The GAN-based image-processing approach has a one-way limitation in the malware detection field: malware code is converted to an image, but not back [8], [13], [29]. However, conversion back to malware code is not required to achieve the goals and objectives of this study. In this paper, the proposed framework aims to detect the myriad of similar malware that can be made with slight changes. Even if the proposed framework cannot reproduce the malware code, it is a model that can detect and classify analogous malware with high similarity to the learned sample malware
data. In addition, if a new type of zero-day malware that was not used for learning appears, the proposed method also has the advantage of being able to quickly learn the new type of malware and apply it. Therefore, in terms of practicality and convenience, it is a very helpful framework for developing zero-day malware detection software.

Meanwhile, as is known from adversarial attacks, the performance of many machine-learning-based systems is greatly reduced or neutralized by small distortions (e.g., added noise) [50], [51]. This field is no different, and some attackers will exploit this vulnerability. Therefore, it is necessary to build a robust and stable security system against such easy modifications. The proposed framework intuitively generates and learns plausible new malware from existing malware, and it can be a complementary measure for dealing with these challenging problems.

5 CONCLUSIONS

In the present study, we proposed a framework based on plausible malware training and augmentation using a generative adversarial network to address the problems caused by malware and analogous zero-day malware. In particular, because zero-day malware is often created by deforming existing malware, the proposed framework with representative GAN models augments the data with high-quality and high-diversity evolved malware images. For detection and classification, the discriminator was trained using malware images generated by the generator and was robust to zero-day malware. Moreover, the proposed framework achieved high and stable average accuracy in the analogous zero-day malware attack experiment. We believe that the proposed detection approach based on plausible zero-day malware has important advantages for antivirus systems in computer security because it does not require inefficient malware signature analysis. In this study, the malware code was converted to malware images of fixed size through crop and pad operations for efficient learning. In fact, these processes could reduce the signatures of malware. In future studies, we will expand the malware types with various malware datasets (including zero-day malware) and solve the problem of varying malware lengths. Moreover, further research should be conducted to develop an optimized GAN model for our proposed framework for extensive zero-day malware detection. In future studies it will be interesting to use explainable AI techniques (e.g., [52]) to gain a further understanding of zero-day malware features, thus allowing the zero-day malware detection AI and its creators to learn better from their mistakes. Moreover, cases of extreme changes, such as new types of zero-day malware, deserve further investigation to extend the possible application spectrum.

ACKNOWLEDGMENTS

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University); No. 2021-0-02068, Artificial Intelligence Innovation Hub).

REFERENCES

[1] A. Mosenia and N. K. Jha, “A Comprehensive Study of Security of Internet-of-Things,” IEEE Transactions on Emerging Topics in Computing, vol. 5, no. 4, pp. 586–602, 2016.
[2] A. Li, S. Xue, X. Li, L. Zhang, and J. Qian, “AppDNA: Profiling App Behavior via Deep-Learning on Function Call Graphs,” IEEE Transactions on Emerging Topics in Computing, 2020.
[3] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, and R. Khayami, “Know Abnormal, Find Evil: Frequent Pattern Mining for Ransomware Threat Hunting and Intelligence,” IEEE Transactions on Emerging Topics in Computing, vol. 8, no. 2, pp. 341–351, 2017.
[4] T. Saha, N. Aaraj, N. Ajjarapu, and N. K. Jha, “Sharks: Smart hacking approaches for risk scanning in internet-of-things and cyber-physical systems based on machine learning,” IEEE Transactions on Emerging Topics in Computing, 2021.
[5] W. Zhang, Y. Wen, and X. Zhang, “Towards Virus Scanning as a Service in Mobile Cloud Computing: Energy-Efficient Dispatching Policy Under N-Version Protection,” IEEE Transactions on Emerging Topics in Computing, vol. 6, no. 1, pp. 122–134, 2015.
[6] S. D. SL and C. Jaidhar, “Windows malware detector using convolutional neural network based on visualization images,” IEEE Transactions on Emerging Topics in Computing, 2019.
[7] “Kaspersky Security Bulletin 2018. Statistics,” 2018. [Online]. Available: https://securelist.com/kaspersky-security-bulletin-2018-statistics/89145
[8] M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang, “Riskranker: Scalable and Accurate Zero-day Android Malware Detection,” in Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, 2012, pp. 281–294.
[9] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, and M. Ahmadi, “Microsoft Malware Classification Challenge,” arXiv preprint arXiv:1802.10135, 2018.
[10] L. Nataraj, S. Karthikeyan, G. Jacob, and B. Manjunath, “Malware images: Visualization and Automatic Classification,” in Proceedings of the 8th International Symposium on Visualization for Cyber Security, 2011, p. 4.
[11] J. Yan, Y. Qi, and Q. Rao, “Detecting Malware with an Ensemble Method Based on Deep Neural Network,” Security and Communication Networks, vol. 2018, p. 7247095, 2018.
[12] Z. Cui, F. Xue, X. Cai, Y. Cao, G.-g. Wang, and J. Chen, “Detection of Malicious Code Variants Based on Deep Learning,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3187–3196, 2018.
[13] J.-Y. Kim, S.-J. Bu, and S.-B. Cho, “Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders,” Information Sciences, vol. 460, pp. 83–102, 2018.
[14] S. Venkatraman and M. Alazab, “Use of Data Visualisation for Zero-Day Malware Detection,” Security and Communication Networks, vol. 2018, p. 1728303, 2018.
[15] F. Xiao, Z. Lin, Y. Sun, and Y. Ma, “Malware Detection Based on Deep Learning of Behavior Graphs,” Mathematical Problems in Engineering, vol. 2019, p. 8195395, 2019.
[16] J. Zhu, J. Jang-Jaccard, and P. A. Watters, “Multi-loss siamese neural network with batch normalization layer for malware detection,” IEEE Access, vol. 8, pp. 171 542–171 550, 2020.
[17] S. Sharmeen, S. Huda, J. Abawajy, and M. M. Hassan, “An adaptive framework against android privilege escalation threats using deep learning and semi-supervised approaches,” Applied Soft Computing, vol. 89, p. 106089, 2020.
[18] K. Berlin, D. Slater, and J. Saxe, “Malicious Behavior Detection using Windows Audit Logs,” in Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, 2015, pp. 35–44.
[19] B. Ndibanje, K. H. Kim, Y. J. Kang, H. H. Kim, T. Y. Kim, and H. J. Lee, “Cross-Method-Based Analysis and Classification of Malicious Behavior by API Calls Extraction,” Applied Sciences, vol. 9, no. 2, p. 239, 2019.
[20] C. Annachhatre, T. H. Austin, and M. Stamp, “Hidden Markov models for malware classification,” Journal of Computer Virology and Hacking Techniques, vol. 11, no. 2, pp. 59–73, 2015.
[21] S.-W. Lee and A. Verri, Pattern Recognition with Support Vector Machines: Proc. of First International Workshop, Niagara Falls, Canada. Springer, 2003.
[22] P. Wang and Y.-S. Wang, “Malware behavioural detection and vaccine development by using a support vector model classifier,” Journal of Computer and System Sciences, vol. 81, no. 6, pp. 1012–1026, 2015.

[23] F. C. C. Garcia, I. Muga, and P. Felix, “Random Forest for Malware Classification,” arXiv preprint arXiv:1609.07770, 2016.
[24] K. Singh, S. C. Guntuku, A. Thakur, and C. Hota, “Big Data Analytics framework for Peer-to-Peer Botnet Detection using Random Forests,” Information Sciences, vol. 278, pp. 488–497, 2014.
[25] Z. Chen, Q. Yan, H. Han, S. Wang, L. Peng, L. Wang, and B. Yang, “Machine learning based mobile malware detection using highly imbalanced network traffic,” Information Sciences, vol. 433, pp. 346–364, 2018.
[26] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, pp. 1916–1920.
[27] Y. Ye, L. Chen, S. Hou, W. Hardy, and X. Li, “DeepAM: a heterogeneous deep learning framework for intelligent malware detection,” Knowledge and Information Systems, vol. 54, no. 2, pp. 265–285, 2018.
[28] E. K. Kabanga and C. H. Kim, “Malware Images Classification using Convolutional Neural Network,” Journal of Computer and Communications, vol. 6, no. 1, p. 153, 2017.
[29] V. S. Bhaskara and D. Bhattacharyya, “Emulating malware authors for proactive protection using gans over a distributed image visualization of dynamic file behavior,” arXiv preprint arXiv:1807.07525, 2018.
[30] W. Hu and Y. Tan, “Generating adversarial malware examples for black-box attacks based on gan,” arXiv preprint arXiv:1702.05983, 2017.
[31] M. Kawai, K. Ota, and M. Dong, “Improved malgan: Avoiding malware detector by leaning cleanware features,” in 2019 international conference on artificial intelligence in information and communication (ICAIIC). IEEE, 2019, pp. 040–045.
[32] S.-W. Lee and H.-H. Song, “A new recurrent neural-network architecture for visual pattern recognition,” IEEE Transactions on Neural Networks, vol. 8, no. 2, pp. 331–340, 1997.
[33] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L. Giles, “Adversary Resistant Deep Neural Networks with an Application to Malware Detection,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1145–1153.
[34] Z.-U. Rehman, S. N. Khan, K. Muhammad, J. W. Lee, Z. Lv, S. W. Baik, P. A. Shah, K. Awan, and I. Mehmood, “Machine learning-assisted signature and heuristic-based detection of malwares in Android devices,” Computers & Electrical Engineering, vol. 69, pp. 828–841, 2018.
[35] D. Gupta and R. Rani, “Big Data Framework for Zero-Day Malware Detection,” Cybernetics and Systems, vol. 49, no. 2, pp. 103–121, 2018.
[36] S. Huda, S. Miah, M. M. Hassan, R. Islam, J. Yearwood, M. Alrubaian, and A. Almogren, “Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data,” Information Sciences, vol. 379, pp. 211–228, 2017.
[37] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” Journal of Big Data, vol. 6, no. 1, p. 60, 2019.
[38] H.-G. Jung and S.-W. Lee, “Few-Shot Learning with Geometric Constraints,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
[39] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[40] C. Wang, C. Xu, X. Yao, and D. Tao, “Evolutionary Generative Adversarial Networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 6, pp. 921–934, 2019.
[41] R. Burks, K. A. Islam, Y. Lu, and J. Li, “Data Augmentation with Generative Models for Improved Malware Detection: A Comparative Study,” in Proceedings of the IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference, 2019, pp. 0660–0665.
[42] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434, 2015.
[43] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least Squares Generative Adversarial Networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2794–2802.
[44] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
[45] A. Odena, C. Olah, and J. Shlens, “Conditional Image Synthesis with Auxiliary Classifier GANs,” in Proceedings of International Conference on Machine Learning. JMLR. org, 2017, pp. 2642–2651.
[46] J. L. McClelland, D. E. Rumelhart, P. R. Group et al., “Parallel Distributed Processing,” Explorations in the Microstructure of Cognition, vol. 2, pp. 216–271, 1986.
[47] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[48] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” in Advances in Neural Information Processing Systems, 2017, pp. 6626–6637.
[49] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[50] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards Deep Learning Models Resistant to Adversarial Attacks,” arXiv preprint arXiv:1706.06083, 2017.
[51] N. Carlini and D. Wagner, “Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods,” in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017, pp. 3–14.
[52] S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and K.-R. Müller, “Unmasking clever hans predictors and assessing what machines really learn,” Nature communications, vol. 10, no. 1, p. 1096, 2019.

Dong-Ok Won received his B.S. degree in Computer Engineering from Tech University of Korea, Republic of Korea, in 2012, and his Ph.D. degree in Department of Brain and Cognitive Engineering from Korea University, Republic of Korea, in 2019. He is currently working as an assistant professor in the Department of Artificial Intelligence at Hallym University, Republic of Korea. His research interests are pattern recognition, machine learning, artificial intelligence, and computer security.

Young-Nam Jang received M.S. degree in Department of Brain and Cognitive Engineering from Korea University, Republic of Korea, in 2020. His research interests are pattern recognition, machine learning, and computer security.

Seong-Whan Lee (S’84–M’89–SM’96–F’10) received the B.S. degree in computer science and statistics from Seoul National University, Seoul, Republic of Korea, in 1984, and the M.S. and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology, Republic of Korea, in 1986 and 1989, respectively. He is currently the Head of the Department of Artificial Intelligence, Korea University, Republic of Korea. His current research interests include artificial intelligence, pattern recognition, and brain engineering. Dr. Lee is a fellow of the International Association of Pattern Recognition (IAPR) and the Korea Academy of Science and Technology.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

No ratings yet
A - Multi-Strategy - Adversarial - Attack - Method - For - Deep - Learning - Based - Malware - Detectors
5 pages
3malware Husein
No ratings yet
3malware Husein
4 pages
Electronics 11 03665 v2
No ratings yet
Electronics 11 03665 v2
20 pages
12741-Article Text-43097-3-10-20240910
No ratings yet
12741-Article Text-43097-3-10-20240910
14 pages
Chapter One 1.1 Background of The Study
No ratings yet
Chapter One 1.1 Background of The Study
40 pages
Malware Detection Using Deep Learning (DL)
No ratings yet
Malware Detection Using Deep Learning (DL)
21 pages