
By: Intan Nurfarahin Binti Ahmad

Supervisor: Dr Farida Hazwani Mohd Ridzuan


Co-Supervisor: Prof. Madya Dr. Madihah Mohd Saudi
Master of Science (Computer Science)

Android Mobile Malware Classification using Tokenization Approach based on System Call Sequenced
Outlines
1. Introduction
2. Problem Statement
3. Research Objectives and Activities
4. Literature Review
5. Research Methodology
6. Result
7. Evaluation
8. Future Work
Introduction
Advances in mobile technology and the wide range of multifunction applications are attracting people to choose smartphones as their primary communication device (Saudi et al., 2015).
Along with its popularity, Android continues to be the most targeted mobile operating system.
It allows users to install applications from many sources, which creates a large opening for malware attacks.
Introduction (contd)
According to the McAfee Labs Threats Report of March 2016, more than 12 billion mobile malware samples have been reported, and the number continues to increase from year to year (McAfee Labs, 2016).
Current classification and detection approaches still fall short of providing efficient and accurate results.
Therefore, a suitable method is needed to improve the efficiency and accuracy of mobile malware classification and detection.
Problem Statement
The increasing number of smartphone users is reflected in the increasing number of malicious applications in the market.
New generations of malware exploit users' sensitive information and profit from it.
Existing malware classification and detection tools still fall short of providing efficient and accurate results.
Objectives and Activities
No. | Research Objectives | Research Activities
1 | To study and evaluate the existing works related to the tokenization approach for Android mobile malware classification and detection. | Reviewing the literature on existing Android mobile malware classification and detection approaches and making comparative studies.
2 | To develop an Android mobile malware classification model based on malicious system call sequenced using tokenization approach. | Developing a new Android mobile malware classification model based on malicious system call sequenced using tokenization approach.
3 | To evaluate the proposed model and measure its effectiveness in achieving a better classification accuracy rate. | Calculating the classification accuracy rate of the new Android mobile malware classification model using WEKA.
Literature Review
Table 1: Comparison of Techniques and Features Used in Previous Studies for Android Mobile Malware Classification and Detection
Techniques | Features | References
Static analysis | Permission | (Wang et al., 2013)
Static analysis | Permission | (Bai et al., 2010)
Static analysis | Intent filter (manifest file and API calls) | (Wu et al., 2012)
Static analysis | String | (Sanz et al., 2014)
Dynamic analysis | System call | (Saudi et al., 2015)
Dynamic analysis | System call | (Zhao et al., 2011)
Dynamic analysis | System call | (Burguera et al., 2011)
Hybrid analysis (static + dynamic) | Permission, Java code, system call | (Bläsing et al., 2010)
Hybrid analysis (static + dynamic) | Java code, user interaction, system call, network, AndroidManifest.xml | (Wei et al., 2012)
Literature Review (cont)
Table 2: Comparison of Previous Studies on Mobile Malware Detection and Classification
Author/Title | Approach | Features | Dataset | Classifier | Accuracy rate
Amos et al. | Dynamic analysis | Memory, CPU and binder information | 1330 malware and 408 benign applications | Random forest, Naïve Bayes, multilayer perceptron, Bayes nets, logistic regression and decision trees | 95%
MALINE (Dimjasevic et al.) | Dynamic analysis | System calls | 4289 malware and 12789 benign applications | Histogram, Markov chain representation | 93%
Droid-Sec (Yuan et al.) | Hybrid analysis | Permissions, sensitive API, system call (200 features) | 250 malicious and 250 benign applications | DNN | 96.5%
DroidDetector (Yuan et al.) | Hybrid analysis | Permissions, sensitive API, system call (198 features) | 20,000 benign and 1760 malware applications | DNN-based deep learning model | 96.76%
Research Methodology
Figure 1: Research processes
1.0 Define Lab Architecture
2.0 Download Dataset
3.0 Tools Installation
4.0 Analysis of Permissions and System Call Structure in Android Malware Application
5.0 System Call Classification using Tokenization Approach
6.0 Testing
7.0 Result Evaluation

Research Methodology (contd)
Figure 2: Lab Architecture
Dataset: .apk files. Training: Drebin dataset (5560). Testing: Google Playstore (500).
Static analysis: permission-based analysis, producing the malicious permission database.
Dynamic analysis: system call-based analysis, producing the malicious system call database.
Environment and tools shown: Windows 8; Android emulator (Genymotion); Android emulator, Ver. 4.1.1, API level 16; ADB shell; strace tools; system call extraction.
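The system call extraction step in Figure 2 can be illustrated with a minimal sketch that turns raw strace output into an ordered system call sequence. The slides do not show the extraction script, so the function name `extract_syscall_sequence`, the regular expression, and the sample log below are assumptions made for illustration only.

```python
import re

# Matches the system call name at the start of a strace line, optionally
# preceded by a PID column such as "1234 " or "[pid 1234] " (strace -f).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def extract_syscall_sequence(strace_text):
    """Return the ordered list of system call names found in strace output."""
    sequence = []
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:  # signal lines and other non-call lines are skipped
            sequence.append(match.group(1))
    return sequence

# Hypothetical log excerpt, not taken from the study's data.
sample_log = """\
open("/data/data/com.example/databases/calls.db", O_RDONLY) = 12
read(12, "SQLite format 3", 16) = 16
--- SIGCHLD (Child exited) @ 0 (0) ---
1234 ioctl(9, 0xc0186201, 0xbe9a1f50) = 0
close(12) = 0
"""

print(extract_syscall_sequence(sample_log))
```

Keeping only the call names, in order, gives the kind of sequence that the later tokenization step operates on; arguments and return values are discarded in this sketch.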
Research Methodology (contd)
Figure 3: System Call Classification using Tokenization approach
Flow shown in the figure:
Android apps sample -> permissions extraction -> Android apps that are expected to exploit call logs
-> system call extraction -> system call sequences
-> tokenization (binary to hex converter; n=1, n=2, n=3, n=4, n=5)
-> system call sequenced patterns classifier (malicious system call patterns, benign system call patterns)
-> compare classification accuracy
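The tokenization step for n=1 to n=5 can be sketched as follows. The slides do not define exactly how tokens are formed, so this example assumes that n is the number of hexadecimal characters per token and that tokens are taken with a sliding window of one character; both are assumptions, and the function name `tokenize` is hypothetical.

```python
def tokenize(hex_pattern, n):
    """Split a hex string into overlapping tokens of n characters each."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [hex_pattern[i:i + n] for i in range(len(hex_pattern) - n + 1)]

# First hexadecimal pattern listed in Table 3.
pattern = "03BE600803E608A000"

for n in range(1, 6):
    tokens = tokenize(pattern, n)
    print(n, len(tokens), tokens[:3])
```

A non-overlapping split (stepping by n instead of 1) would be an equally plausible reading of the figure; only the range step in the list comprehension would change.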
Result & Findings
1) The final extraction yields 464 unique patterns of malicious system call sequenced that are expected to exploit call logs.
Table 3: Example of malicious system call sequenced patterns
Dataset | Binary pattern | Hexadecimal pattern (N=1)
1 | 111011111001100000000010000000001111100110000010001010000000000000 | 03BE600803E608A000
2 | 101010000111000000010000000000001111101001000000000000000011100001 | 02A1C04003E90000E1
3 | 110110111001110001100001000000000000000110000000001010000000000000 | 036E7184000600A000
4 | 100001110000100000000000000000000000010110010010001010000000000000 | 021C2000001648A000
5 | 111111111001100000000010000100001111100110000010001010000000000000 | 03FE600843E608A000
6 | 111011111001100000000010000000000000000000000000000000000000000000 | 03BE60080000000000
7 | 111111111101110111100000000000000000010110010010001010000000000000 | 03FF7780001648A000
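The relationship between the two pattern columns in Table 3 can be checked with a short sketch. It assumes the hexadecimal pattern is the binary pattern read as one integer and left-padded with zeros to a whole number of bytes, which is consistent with the first row; the function name `binary_to_hex` is hypothetical and the converter used in the study may differ.

```python
def binary_to_hex(bits):
    """Convert a bit string to upper-case hex, left-padded to whole bytes."""
    n_bytes = (len(bits) + 7) // 8          # round up to full bytes
    return format(int(bits, 2), "0{}X".format(2 * n_bytes))

# Binary pattern of dataset 1 in Table 3, written as the two fragments
# in which it appears on the slide.
bits = (
    "11101111100110000000001000000000111110011000001000101000000"
    "0000000"
)

print(len(bits), binary_to_hex(bits))
```

For this row the 66-bit pattern is padded to 72 bits, which produces the 18-character value 03BE600803E608A000 with its leading zero.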
Result & Findings (contd)
2) Classification accuracy
Chart: Classification accuracy by classifier (J48, Naïve Bayes, Random Forest, SVM) and pattern representation
Classifier | Binary Pattern | Hex n=1 | Hex n=2 | Hex n=3 | Hex n=4 | Hex n=5
SVM | 98.09% | 34.31% | 99.58% | 99.02% | 97.62% | 96.92%
Random Forest | 99.05% | 64.99% | 97.76% | 94.54% | 82.77% | 69.19%
Naïve Bayes | 97.96% | 34.31% | 99.86% | 99.02% | 97.48% | 95.80%
J48 | 97.41% | 64.99% | 95.38% | 94.12% | 82.63% | 88.94%
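The accuracy figures above can be compared programmatically. The sketch below copies the values from the table and looks up the best classifier and pattern representation; the variable names are hypothetical and the code is only a convenience for reading the table, not part of the study's WEKA workflow.

```python
representations = ["Binary Pattern", "Hex n=1", "Hex n=2",
                   "Hex n=3", "Hex n=4", "Hex n=5"]

# Accuracy in percent, copied from the classification accuracy table.
accuracy = {
    "SVM":           [98.09, 34.31, 99.58, 99.02, 97.62, 96.92],
    "Random Forest": [99.05, 64.99, 97.76, 94.54, 82.77, 69.19],
    "Naïve Bayes":   [97.96, 34.31, 99.86, 99.02, 97.48, 95.80],
    "J48":           [97.41, 64.99, 95.38, 94.12, 82.63, 88.94],
}

# Flatten to (classifier, representation, accuracy) and take the maximum.
results = [
    (clf, rep, acc)
    for clf, row in accuracy.items()
    for rep, acc in zip(representations, row)
]
best = max(results, key=lambda item: item[2])

print(best)
```

The highest value in the table is 99.86% for Naïve Bayes with Hex n=2, which is the figure reported for the current work in the comparison with previous works.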
Result & Findings (contd)
3) Result comparison with previous works
Author/work | Dataset | Features used | Analysis Approach | Accuracy Rate
Amos et al. | 1330 malware and 408 benign | Memory, CPU and binder information | Dynamic | 95%
MALINE (Dimjasevic et al.) | 4289 malware, 12789 benign | System calls | Dynamic | 93%
Canfora et al. | 200 benign, 200 malware | System calls, permissions | Hybrid | 80%
Droid-Sec (Yuan et al.) | 250 malware, 250 benign | Permissions, API, system calls | Hybrid | 96.5%
DroidDetector (Yuan et al.) | 20,000 benign, 1760 malware | Permissions, API, system calls | Hybrid | 96.76%
Current work | 5560 malware, 500 benign | Permissions, system calls | Hybrid | 99.86%
Evaluation

Not yet completed; awaiting 500 sample apps from the Google Play Store.
*Detection will be tested with 500 random apps from the Google Play Store.
Future Work
For further research, new classification and detection models need to focus on different features and methods.

In addition, more powerful classification and detection tools can be developed by improving performance in terms of accuracy and speed.
Thank you

www.usim.edu.my