Proceedings of the Asia-Pacific Advanced Network 2013 v. 36, p. 23-31.
[Link]
ISSN 2227-3026
Detecting Android Malware by Analyzing Manifest Files
Ryo Sato1,*, Daiki Chiba2 and Shigeki Goto1,*
1 Waseda University / 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555 Japan
2 NTT Secure Platform Laboratories / 3-9-11 Midori-cho, Musashino-shi, Tokyo, 180-8585 Japan
E-mails: r-sato@[Link], [Link]@[Link], goto@[Link]
* Tel.: +81-3-5286-3182; Fax: +81-3-5286-3182
Abstract: The threat of Android malware has increased owing to the increasing
popularity of smartphones. Once an Android smartphone is infected with malware, the
user may suffer various kinds of damage, such as the theft of personal information stored on the
smartphone, the sending of short messages to premium-rate numbers
without the user’s knowledge, and remote operation of the infected smartphone
for other malicious attacks. However, there are currently insufficient
defense mechanisms against Android malware. This study proposes a new method to
detect Android malware. The new method analyzes only manifest files that are required in
Android applications. It realizes a lightweight approach for detection, and its effectiveness
is experimentally confirmed by employing real samples of Android malware. The result
shows that the new method can effectively detect Android malware, even when the
sample is unknown.
Keywords: Android; Malware; Manifest files; Mobile Security; Static Analysis.
1. Introduction
With the rapid entry of smartphones into daily lives, Android malware has been rapidly
spreading. The Android operating system (OS) is an easy target for attackers, because the market
share of Android has increased, and many Android applications are written in the Java
programming language.
According to a global survey of the smartphone OS market, Android possessed 68.8% of the
market share in 2012 [1], implying that the popularity of Android has undergone significant
growth. It is easy for malware to infect Android smartphones because of the large number of
phones. Moreover, Android applications are easy targets for reverse engineering, which is a
specific characteristic of Java applications in general, and which is often abused by malicious
attackers, who attempt to embed malicious programs into benign applications, hence creating
variants of existing malware. Yajin et al. [2] reported that 86.0% of Android malware samples are
created by repackaging benign applications. Hence, Android is considered to be an easy
target for malicious attackers, and therefore, the privacy and integrity of the user’s data are
seriously threatened.
There have been numerous studies focusing on the detection of Android malware. One
popular approach is the signature-based method, which extracts signatures from malware
samples. While it is effective for detecting known malware, it is inadequate for detecting
unknown malware. Iland et al. [3] suggested a detection method at the network level. They
observed network traffic originating from a sample application and tried to detect malware by
comparing with DNS-based and IP-address-based blacklists. This method cannot detect
unknown malware, because the blacklists are generated from known malicious activities. Isohara
et al. [4] presented a method for detecting malware by analyzing attributes of files within sample
applications. While this approach can detect some unknown malware that is undetected by
blacklist or signature-based methods, the analysis cost depends on the number of files within
sample applications. Enck et al. [5] proposed a lightweight method to block the installation of
applications that have dangerous permissions or intent filter (a mechanism for realizing
cooperation between Android applications) combinations. However, the method may lead to
incorrect detections, because the information used in the method is not sufficient to differentiate
malware from benign applications. Wu et al. [6] developed a system to provide a static analysis
paradigm for detecting malware, called DroidMat. They obtained some distinguishable
characteristics such as permissions, components (essential functions such as Activity, Service
and Receiver) and API calls by analyzing manifest files and smali files (disassembly codes). This
system can discriminate between malware and benign applications. However, the cost of their
analysis depends on the size and number of smali files. Our preliminary study measured the
average file size and number of files for the main resources in Android applications. Tables
1 and 2 show the results. We investigated 30 benign samples and 30 malware samples.
Table 1. Average file size (KB).
                      smali files   Resources   Manifest file
30 benign samples         6305         4759           7
30 malware samples        3036         1431           4
Table 2. Average number of files.
                      smali files   Resources   Manifest file
30 benign samples          674          385           1
30 malware samples         249          101           1
From Tables 1 and 2, we can observe that the cost of analyzing smali files is higher than that of
analyzing the manifest file.
This study proposes a new method for detecting Android malware by analyzing only manifest
files. Each Android application must have a manifest file, which presents essential information
about the application. Our preliminary investigations revealed that there are certain differences
between the manifest files of benign applications and malware. Our proposed method is based on
the characteristic analysis of Android manifest files and is effective for detecting well-known
existing malware and unknown malware. Moreover, the cost is low, because this method
analyzes only a manifest file. Tables 1 and 2 show that a manifest file is usually small.
The remainder of this paper is organized as follows: Section 2 proposes our new method.
Section 3 describes the experiment used to demonstrate the effectiveness of the new detection
method. Section 4 concludes this paper and discusses the possible future extension of the
proposed method.
2. New Method for Detecting Android Malware
This study proposes a new method for detecting Android malware by analyzing only manifest
files. Android applications consist of the following resources: a manifest file, application
programs for Dalvik virtual machine (VM), and application resources. Figure 1 shows an
Android application package (.apk), which includes a manifest file.
[Figure: an Android application (.apk) containing a manifest file, application programs, and application resources.]
Figure 1. Android application package (.apk).
The manifest file takes the form of “[Link],” which must be present in all
Android applications. Application programs are collected as “[Link].” Application resources
consist of pictures, music, and some xml files, which describe the layout information.
Android malware is detected by the following steps:
[Step 1] Extract specific information described in the manifest file of a sample application.
[Step 2] Compare the extracted information with the keyword lists that are provided in this new
method, and calculate the malignancy score of the sample from the result.
[Step 3] Compare the malignancy score from Step 2 with the threshold values, which are set by
this new method. If the malignancy score exceeds the threshold value, the sample is judged as
malware.
Figure 2 shows the flow of the new detection method.
[Figure: flowchart. Sample application -> Step 1: extract specific information described in the manifest file -> Step 2: compare with the keyword lists and calculate the malignancy score -> Step 3: compare with the threshold values and perform judgment -> Benign / Malicious.]
Figure 2. Flow of Detecting Android Malware.
2.1. Extract information items
Manifest files have essential information about Android applications, such as the version
number of an application, the name of a package, required permissions, and the API level. The
format of the manifest file is identical in both benign applications and malware. However, there
are certain differences in the characteristics of several information items. We investigated 30
benign samples and 30 malware samples, giving a total of 60 samples. We selected specific
information items whose values differ markedly between malware and benign applications.
Table 3 shows six information items that are extracted from manifest files and that are used to
detect Android malware by the proposed method. The items are represented as text strings or
numbers.
Table 3. List of information items.
(1) Permission
(2) Intent filter (action)
(3) Intent filter (category)
(4) Process name
(5) Intent filter (priority)
(6) Number of redefined permissions
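The extraction of the six items in Table 3 can be sketched as follows. This is only an illustration under several assumptions, not the implementation used in this study: the manifest is assumed to have already been decoded to plain-text XML (inside an .apk it is stored in a binary form), item (5) is read as the largest android:priority value of any intent filter, and item (6) is read as the number of <permission> elements that the application declares. The function name extract_items, the helper short_name, and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix of android:* attributes, in ElementTree's {uri}name form.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical manifest used only to exercise the sketch.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET" />
  <uses-permission android:name="android.permission.SEND_SMS" />
  <permission android:name="com.example.app.CUSTOM" />
  <application android:process=":remote2">
    <receiver android:name=".Boot">
      <intent-filter android:priority="2147483647">
        <action android:name="android.intent.action.BOOT_COMPLETED" />
        <category android:name="android.intent.category.HOME" />
      </intent-filter>
    </receiver>
  </application>
</manifest>"""


def short_name(value):
    """Keep the last dot-separated token, e.g. 'a.b.SEND_SMS' -> 'SEND_SMS'."""
    return value.rsplit(".", 1)[-1]


def extract_items(manifest_xml):
    """Return the six information items of Table 3 as a dictionary."""
    root = ET.fromstring(manifest_xml)
    name = ANDROID_NS + "name"
    process = ANDROID_NS + "process"
    priority = ANDROID_NS + "priority"

    # Assumption: item (5) is the largest priority found on any intent filter.
    priorities = [int(e.get(priority)) for e in root.iter("intent-filter")
                  if e.get(priority) is not None]

    return {
        "permission": [short_name(e.get(name, ""))
                       for e in root.iter("uses-permission")],
        "action": [short_name(e.get(name, "")) for e in root.iter("action")],
        "category": [short_name(e.get(name, ""))
                     for e in root.iter("category")],
        # Any element that carries an android:process attribute.
        "process": [e.get(process) for e in root.iter()
                    if e.get(process) is not None],
        "priority": max(priorities, default=0),
        # Assumption: item (6) counts <permission> declarations.
        "redefined_permissions": len(list(root.iter("permission"))),
    }


items = extract_items(SAMPLE_MANIFEST)
print(items)
```

Process names are returned exactly as written in the manifest; any normalization needed before matching them against a keyword list is left open here.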
2.2. Keyword lists and malignancy score
In this new method, several keyword lists are compiled. Strings that characterize benign or
malicious applications in manifest files are recorded in the keyword lists. We make four types of
keyword lists: (1) Permission, (2) Intent filter (action), (3) Intent filter (category), and (4)
Process name. Because (5) Intent filter (priority) and (6) Number of redefined permissions are
represented by integers, and not text strings, we have no keyword lists for them. Figure 3
shows the number of occurrences of keywords that are classified as “Permission” items.
[Figure: two bar charts of the most frequent permission keywords, one for the 30 benign samples and one for the 30 malware samples. Permissions appearing in the charts: INTERNET, READ_PHONE_STATE, WRITE_EXTERNAL_STORAGE, ACCESS_NETWORK_STATE, SEND_SMS, WAKE_LOCK, VIBRATE, RECEIVE_SMS, ACCESS_WIFI_STATE, READ_SMS, USE_CREDENTIALS, MANAGE_ACCOUNTS, WRITE_SMS, SET_WALLPAPER, READ_CONTACTS.]
Figure 3. Permission keywords in each set of 30 samples.
Figure 3 shows the occurrences of popular permission keywords. This figure shows that the
permissions which are related to short message service (SMS), such as SEND_SMS,
RECEIVE_SMS, and READ_SMS, are frequently used by malware samples. These permissions
are registered with the keyword list as malicious strings. A similar process is also performed for
(2) Intent filter (action), (3) Intent filter (category), and (4) Process name. We have four keyword
lists, which are shown in Table 4. Most keywords are considered to be malicious, while some are
classified as benign.
Table 4. Keyword lists.
(List 1) Permission
 1. READ_SMS
 2. SEND_SMS
 3. RECEIVE_SMS
 4. WRITE_SMS
 5. PROCESS_OUTGOING_CALLS
 6. MOUNT_UNMOUNT_FILESYSTEMS
 7. READ_HISTORY_BOOKMARKS
 8. WRITE_HISTORY_BOOKMARKS
 9. READ_LOGS
10. INSTALL_PACKAGES
11. MODIFY_PHONE_STATE
(List 2) Intent-filter (action)
 1. BOOT_COMPLETED
 2. SMS_RECEIVED
 3. CONNECTIVITY_CHANGE
 4. USER_PRESENT
 5. PHONE_STATE
 6. NEW_OUTGOING_CALL
 7. UNINSTALL_SHORTCUT
 8. INSTALL_SHORTCUT
 9. left_up
10. right_up
11. left_down
12. right_down
13. SIG_STR
14. VIEW (benign keyword)
(List 3) Intent-filter (category)
 1. HOME
 2. BROWSABLE (benign keyword)
(List 4) Process name
 1. remote2
 2. main
 3. two
 4. three
After we obtained the keyword lists, the malignancy scores for the above four information
items are calculated. This process is performed by classifying the keywords as being benign or
malicious. The malignancy score is calculated by formula (1).
$$P = \frac{M - B}{E} \tag{1}$$
where P is the malignancy score, M is the number of malicious strings, B is the number of benign
strings, and E is the total number of information items of the type being scored.
Table 5 shows an example. This sample uses five permission items.
Table 5. Permission keywords in a sample.
<uses-permission android:name=”[Link]” />
<uses-permission android:name=”[Link] PHONE STATE” />
<uses-permission android:name=”[Link] SMS” />
<uses-permission android:name=”[Link] SMS” />
<uses-permission android:name=”[Link] SMS” />
Among these five permissions, READ_SMS, RECEIVE_SMS, and SEND_SMS are recorded
in the keyword list and are classified as malicious strings in Table 4. Then, the malignancy score
of this sample is calculated by formula (2).
$$P = \frac{3 - 0}{5} = 0.6 \tag{2}$$
Similar calculations are also performed for (2) Intent filter (action), (3) Intent filter (category),
and (4) Process name. With regard to (5) Intent filter (priority), the set-up value is counted and
used for the judgment in Step 3. (6) Number of redefined permissions is also counted and
considered.
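The score calculation of formula (1) can be sketched in a few lines. The keyword sets below are transcribed from Lists 1 and 3 of Table 4; the function name malignancy_score and the handling of an empty item list (score 0) are assumptions made for this illustration, since the text does not state what happens when E is zero. In the permission example, the two entries that are not keywords are stand-ins: any permission absent from List 1 yields the same result.

```python
# Keyword lists transcribed from Table 4 (Lists 1 and 3 only, for brevity).
MALICIOUS = {
    "permission": {
        "READ_SMS", "SEND_SMS", "RECEIVE_SMS", "WRITE_SMS",
        "PROCESS_OUTGOING_CALLS", "MOUNT_UNMOUNT_FILESYSTEMS",
        "READ_HISTORY_BOOKMARKS", "WRITE_HISTORY_BOOKMARKS",
        "READ_LOGS", "INSTALL_PACKAGES", "MODIFY_PHONE_STATE",
    },
    "category": {"HOME"},
}
BENIGN = {
    "permission": set(),
    "category": {"BROWSABLE"},
}


def malignancy_score(item_type, strings):
    """Formula (1): P = (M - B) / E for one type of information item."""
    if not strings:
        # Assumption: an application with no items of this type scores 0.
        return 0.0
    m = sum(1 for s in strings if s in MALICIOUS[item_type])  # malicious hits
    b = sum(1 for s in strings if s in BENIGN[item_type])     # benign hits
    return (m - b) / len(strings)


# Five permissions, three of which are malicious keywords, as in Table 5.
sample_permissions = ["INTERNET", "READ_PHONE_STATE",
                      "READ_SMS", "RECEIVE_SMS", "SEND_SMS"]
print(malignancy_score("permission", sample_permissions))

# A benign keyword pulls the score below zero.
print(malignancy_score("category", ["BROWSABLE"]))
```

The first call reproduces the value of formula (2); the second shows how a score can become negative, which matters for Condition 2 in Section 2.3.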
2.3. Thresholds and judgment
The proposed method provides threshold values for the malignancy score. We use a data
mining tool, Weka [7], to determine the threshold values. For the four categories of
information items (1), (2), (3), and (4), the threshold values are set using the Weka J48
algorithm, which is based on a decision tree. We use both benign samples and malicious samples
for machine learning. Specific samples are explained in Section 3. With regard to the threshold
values for items (5) and (6), we set the threshold value at 1000 for (5) and 3 for (6) based on the
result of our preliminary analysis, which was described in Section 2.1.
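To give an intuition for how a decision-tree learner can turn labeled scores into a threshold, the following sketch chooses a single cut point on one numeric feature by minimizing the weighted entropy of the two resulting groups. It is a simplified stand-in and not Weka's J48 implementation, which builds full trees and applies further criteria. The function names and the training values are hypothetical and are not taken from the experiment.

```python
import math


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def best_threshold(scores, labels):
    """Return the cut t minimizing the weighted entropy of the two groups
    {score <= t} and {score > t}; candidates are midpoints between
    consecutive distinct score values."""
    values = sorted(set(scores))
    best_t, best_h = None, float("inf")
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(scores, labels) if x <= t]
        right = [y for x, y in zip(scores, labels) if x > t]
        h = (len(left) * entropy(left)
             + len(right) * entropy(right)) / len(scores)
        if h < best_h:
            best_t, best_h = t, h
    return best_t


# Hypothetical permission scores for four benign and four malware samples.
train_scores = [0.0, 0.0, 0.2, 0.25, 0.5, 0.6, 0.75, 1.0]
train_labels = ["benign"] * 4 + ["malware"] * 4
print(best_threshold(train_scores, train_labels))
```

With these made-up values the two classes are perfectly separable, so the chosen cut falls midway between the largest benign score and the smallest malware score.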
Judgment for an application sample is performed on the basis of conditions 1, 2, and formula
(3), which are given below. Condition 1 describes the characteristics of malware. Condition 2 is
made to avoid incorrect judgments. In formula (3), the SCORE refers to the final malignancy
score of the sample. C1 and C2 count the number of items satisfied by a sample in condition 1,
and condition 2, respectively.
Condition 1:
- Malignancy score is greater than the threshold value determined by Weka.
- Count of Intent filter (priority) is greater than the threshold value.
- Count of redefined permissions is greater than the threshold value.
Condition 2:
- Malignancy score of (2) Intent filter (action) is negative (< 0).
- Malignancy score of (3) Intent filter (category) is negative (< 0).
Criteria formula:
$$\mathrm{SCORE} = C_1 - C_2 \tag{3}$$
If the final SCORE is greater than or equal to 1, the sample application is considered to be
malware.
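The judgment rule can be written compactly as code. In this sketch the thresholds 1000 and 3 for items (5) and (6) are the values stated above, whereas the per-item score thresholds of 0.5 are hypothetical placeholders, because the values learned with Weka are not listed in the text. The function names final_score and is_malware are likewise introduced only for illustration.

```python
PRIORITY_THRESHOLD = 1000   # item (5), value stated in Section 2.3
REDEFINED_THRESHOLD = 3     # item (6), value stated in Section 2.3


def final_score(scores, score_thresholds, priority, redefined):
    """Formula (3): SCORE = C1 - C2."""
    # C1: number of Condition 1 items that the sample satisfies.
    c1 = sum(1 for item, p in scores.items() if p > score_thresholds[item])
    c1 += int(priority > PRIORITY_THRESHOLD)
    c1 += int(redefined > REDEFINED_THRESHOLD)
    # C2: number of Condition 2 items that the sample satisfies.
    c2 = sum(1 for item in ("action", "category") if scores[item] < 0)
    return c1 - c2


def is_malware(scores, score_thresholds, priority, redefined):
    """A sample is judged as malware when SCORE is at least 1."""
    return final_score(scores, score_thresholds, priority, redefined) >= 1


# Hypothetical thresholds for items (1) to (4); not the values from Weka.
thresholds = {"permission": 0.5, "action": 0.5, "category": 0.5, "process": 0.5}

# High permission score only: C1 = 1, C2 = 0, so SCORE = 1.
sms_sample = {"permission": 0.6, "action": 0.0, "category": 0.0, "process": 0.0}
print(is_malware(sms_sample, thresholds, priority=0, redefined=0))

# Same permissions, but a benign category keyword: C1 = 1, C2 = 1, SCORE = 0.
browsable_sample = dict(sms_sample, category=-1.0)
print(is_malware(browsable_sample, thresholds, priority=0, redefined=0))
```

The second sample illustrates the purpose of Condition 2: a negative intent-filter score offsets one satisfied item of Condition 1 and prevents the sample from being flagged.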
3. Experiment
To evaluate the performance of the proposed method, we conducted the following experiment
with Android application samples.
3.1. Overview of the experiment
We collected 235 benign application samples and 130 malware samples. Benign samples
were collected from Google Play [8] and some related markets. Malware samples were obtained
from a web site that provides samples for research purposes [9]. All samples have a unique MD5
hash value and are classified into two groups: “Learning data” and “Test data.” Learning data is
used to determine suitable threshold values with Weka, and the keyword lists are the same
as in Table 4. Test data is used to evaluate the proposed new method. In this experiment, the
samples are first analyzed by VirusTotal [10], which is an on-line scanning tool for malware. We
classified a malware sample into the Learning data if its first registration date is before September
2011. The remaining malware samples are used for Test data. This date is selected to enable the
acquisition of a sufficient number of malware samples for learning and testing. We can treat
malicious Learning data as known samples and malicious Test data as unknown samples.
Note that malicious Test data include samples that are not detected by signature-based
methods. Incidentally, benign Learning data and Test data were randomly selected from the
collected benign application samples. Table 6 shows the number of samples that were used in
this experiment.
Table 6. Number of samples used in the experiment.
                   Learning data   Test data   Total
Benign samples           60           175       235
Malware samples          34            96       130
3.2. Result of the evaluation
Table 7 shows the result of the experiment. The detection accuracy is 91.4% for benign
samples, 87.5% for malware samples, and 90.0% in total. This result
indicates that the proposed method can accurately classify Android applications. The samples
that are used as Learning data consist of only those whose first detected time is earlier than that
of any Test data samples. Thus, although the proposed method learns only from old samples
whose first detected times were before September 2011, it successfully extracts
essential information from manifest files and can detect unknown malware
samples.
Table 7. Result of the experiment.
                   Correct detection (%)   Incorrect detection (%)
Benign samples             91.4                     8.6
Malware samples            87.5                    12.5
Total                      90.0                    10.0
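The total in Table 7 is consistent with the per-class figures when they are weighted by the Test data sizes in Table 6. The short check below assumes that the total is such a weighted average, since the underlying detection counts are not reported; the function name overall_accuracy is hypothetical.

```python
def overall_accuracy(class_results):
    """Weighted average of per-class accuracies (in %), weighted by class size."""
    total = sum(n for _, n in class_results)
    return sum(acc * n for acc, n in class_results) / total


# (correct detection %, number of Test data samples) from Tables 7 and 6.
results = [(91.4, 175),   # benign samples
           (87.5, 96)]    # malware samples
print(f"{overall_accuracy(results):.1f}")  # prints 90.0
```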
3.3. Discussion
Some malware samples were not detected by the proposed method. We found that the
proposed method was inadequate for detecting adware samples. Apart from actions that
display advertisements excessively, there is often only a marginal difference between a benign
application and adware. This means that their manifest files appear similar, and it is
difficult for the proposed method to detect adware effectively based on the manifest analysis.
4. Conclusion and Future Work
This paper proposed a new detection method for Android malware. The advantage of this new
method is that it uses only manifest files to detect malware. Manifest files are required in all
Android applications, and thus, the proposed method is applicable to all Android applications.
Our results show that the proposed method can detect unknown malware samples that are
undetectable by a simple signature-based approach. Moreover, the cost of analyzing only the
manifest file is extremely low. The new method can also be combined with other methods to
realize an even more precise detection method.
Our evaluation uses only a small number of samples: 365 in total. In the future, we
plan to collect additional samples to obtain more precise results for the evaluation experiments.
The proposed method extracts six types of information from manifest files and uses them to
detect Android malware. The essential information items can be easily changed, and we should
closely observe trends in Android malware to determine whether to keep or revise the effective
information items in the manifest file.
References
1. IDC, “Android and iOS Combine for 91.1% of the Worldwide Smartphone OS Market in
4Q12 and 87.6% for the Year,” Feb 2013.
[Link] (accessed on 21 Feb. 2013).
2. Yajin Z.; Xuxian J. Dissecting Android Malware: Characterization and Evolution. Security
and Privacy (SP), 2012 IEEE Symposium on, San Francisco, USA, 2012, 5, 95–109.
3. Iland D.; Pucher A.; Schauble T. Detecting Android Malware on Network Level.
University of California, Santa Barbara, 2011, 12.
4. Isohara T.; Kawabata H.; Yakemori K.; Kubota A.; Kani J.; Agematsu H.; Nishigaki A.
Detection Technique of Android Malware with Second Application. Proceedings of
Computer Security Symposium 2011, Niigata, Japan, 2011, 10, 19-21.
5. Enck W.; Ongtang M.; McDaniel P. On Lightweight Mobile Phone
Application Certification. ACM CCS 2009, 2009, New York, USA, 11, 9-13.
6. Wu D.; Mao C.; Wei T.; Lee H.; Wu K. DroidMat: Android Malware Detection through
Manifest and API Calls Tracing. Seventh Asia Joint Conference on Information Security,
2012, 8, 62-69.
7. Weka. [Link]
8. Google Play. [Link]
9. contagio mobile. [Link]
10. VirusTotal. [Link]
11. Sato R.; Chiba D.; Goto S. Analysis of Manifest File for Detecting Android Malware. The
75th National Convention of IPSJ, 2013, 3.
© 2013 by the authors; licensee Asia Pacific Advanced Network. This article is an open-access
article distributed under the terms and conditions of the Creative Commons Attribution license
([Link]