ACCEPTED Meidan Et Al (2017) ProfilIoT A Machine Learning Approach For IoT Device Identification Based On Network Traffic Analysis

ProfilIoT: A Machine Learning Approach for IoT Device
Identification Based on Network Traffic Analysis
Yair Meidan1 , Michael Bohadana1 , Asaf Shabtai1 ,

Juan David Guarnizo2 , Martn Ochoa2 , Nils Ole Tippenhauer2 , and Yuval Elovici1,2
1
Department of Software and Information Systems Engineering, Ben-Gurion University, Beer-Sheva, Israel
2
Singapore University of Technology and Design, Singapore
ABSTRACT connected to their network, a situation which threatens the

In this work we apply machine learning algorithms on net- security and integrity of the network and the devices.
work traffic data for accurate identification of IoT devices In this work we address the challenge of IoT device identi-
connected to a network. To train and evaluate the classification within a network by analyzing and classifying net-
fier, we collected and labeled network traffic data from nine work traffic data. Even if the prefixes of MAC addresses can
distinct IoT devices, and PCs and smartphones. Using su- be used to identify the manufacturer of a particular device,
pervised learning, we trained a multi-stage meta classifier; in there is no standard to identify brands or types of devices.
the first stage, the classifier can distinguish between traffic However, high-level network statistics, such as the the ratio
generated by IoT and non-IoT devices. In the second stage, between incoming and outgoing bytes and the average time
each IoT device is associated a specific IoT device class. The to live have been used to identify malicious traffic in net-
overall IoT classification accuracy of our model is 99.281%. work communications [3]. We propose to use such network
features to identify IoT devices. We believe our approach is
generic and flexible enough to be applied in the rapid evolv-
CCS Concepts ing IoT landscape and at the same time targeted enough to
Security and privacy Mobile and wireless secu- cater to efficient device identification.
rity; Computing methodologies Machine learn-
ing; Research questions. This research proposes a novel method
to classify devices connected to an organizations network
based solely on network traffic analysis. More specifically,
Keywords we focus on the following questions: a) Can we accurately
Internet of Things (IoT); Cyber Security; Machine Learning; distinguish between IoT devices and non-IoT devices? b)
Device Identification; Network Traffic Analysis. For a specific IoT device (e.g., smart TV, IP camera), can
we accurately model its network behavior and detect the
devices presence within the network traffic?
1. INTRODUCTION
Summary of contributions. a) To the best of our knowl-
The term Internet of Things (IoT), is used as an umbrella
edge, we are the first to apply machine learning techniques
keyword covering various aspects related to the extension
to network traffic for IoT device classification and identifica-
of the Internet and Web into the physical realm, by means
tion. b) We demonstrate that we can accurately distinguish
of the widespread deployment of spatially distributed de-
between IoT and non-IoT devices using traffic analysis, ma-
vices with embedded identification, sensing and/or actua-
chine learning, and HTTP packet property (user agent). c)
tion capabilities [9]. Among the challenges the IoT poses to
We show that our approach can accurately detect the pres-
organizations are security and governance issues stemming
ence of a specific IoT device within the network traffic.
from the proliferation of such devices and the ever increasing
number of IoT-enabled organizational assets. In the future,
organizations might not know exactly what IoT devices are
2. DATA ACQUISITION
Permission to make digital or hard copies of part or all of this work for To induce our model and evaluate it we collected traffic data
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that from local IoT devices, PCs, and smartphones (see Table 1).
copies bear this notice and the full citation on the first page. Copyrights As commonly practiced, these devices were connected to
for third-party components of this work must be honored. For all other a Wi-Fi access point, and their network traffic data were
uses, contact the owner/author(s). recorded as *.pcap files [6] for further analysis.
SAC17, April 3-7, 2017, Marrakech, Morocco
Copyright 2017 ACM Feature Extraction. The TCP packets were first con-
ACM 978-1-4503-4486-9/17/04. . . $15.00 verted by the feature extractor tool [4] to sessions (unique
http://dx.doi.org/10.1145/3019612.3019878 4-tuples consisting of source and destination IP addresses
i
DStest [a]: The ath session (originating from di ) in DStest
i .
Table 1: Devices included in the dataset
Specific Device
Make and Model
Number of 3.2 Model Training
Device Type Type TCP Sessions Let D be the set of known devices (i.e., devices that we want
Baby Monitor IoT Beseye Baby Monitor Pro 2,072 to be able to identify based on their traffic). Deriving the
Motion Sensor IoT Wemo F7C028uk 254
Printer IoT HP Officejet Pro 6830 70 device identification model consists of the following steps.
Refrigerator IoT Samsung RF30HSMRTSL 7,008
Security Camera IoT Withings WBP02/WT9510 980
Induce single-session binary classifier. For each di D
Socket IoT Efergy Ego 342 we induce a single-session binary classifier, denoted by Ci ,
Thermostat IoT Nest Learning Thermostat 3 6,353 that given a feature vector of a single session (denoted by
TV IoT Samsung UA55J5500AKXXS 4,854
Smartwatch IoT LG Urban 687
s), outputs a posterior probability psi that the session was
PC Non-IoT Dell Optiplex 9020 3,138 generated by device di . The single-session classifiers C are
Laptop Non-IoT Lenovo X260 4,907 obtained using the DSs dataset. For training Ci for de-
Smartphone Non-IoT LG G2 2,178
Smartphone Non-IoT Galaxy S4 643
vice di we derive binary labels, such that all feature vectors
of sessions that belong to di are labeled as di , and feature
vectors of sessions that do not belong to di are labeled as
and port numbers, from SYN to FIN). Then, each session other. Thus, given an unlabeled feature vector extracted
was represented by a vector of features from the network, from a session s, we can apply all single-session classifiers C
transport, and application layers, and enriched with pub- to obtain a vector of posterior probabilities (ps1 , . . . , psn ).
licly available data such as Alexa Rank [1] and GeoIP [2]. Determine optimal thresholds for single-session clas-
sifiers. For each classifier Ci we determined the optimal
Data Partitioning. After constructing the labeled dataset, classification threshold (cutoff value), denoted by tri , for la-
we chronologically divided it into three mutually exclusive beling a given session s with probability psi as di or other.
sets. The first set, denoted DSs , is used for inducing a set We used the DSm dataset to evaluate the performance of C,
of single-session classifiers. The second set, denoted DSm , the set of single-session classifiers, and for setting the thresh-
is used for optimizing the parameters of a multi-session clas- old values of the classifiers. Each optimal threshold tri was
sifier. As practiced in machine learning research, the third selected such that it maximizes the accuracy of classifier Ci .
set was used as a test set (denoted DStest ) for evaluating
our proposed method and deriving performance measures. Determine optimal session sequence size si for each
classifier. Here we derive the optimal session sequence size
si for each classifier Ci that is used for defining the multi-
3. PROPOSED METHOD FOR IOT DEVICE session classifier. First, for each IP (device) in DSm we
apply the set C of single-session classifiers to all session fea-
IDENTIFICATION ture vectors for obtaining classifications. Then we utilize tri
We propose a multi-stage process in which a set of machine and DSm to analyze the classification results of each opti-
learning based classifiers are applied to a stream of sessions mized classifier. Afterwards we look for the minimal number
that originate from a specific device (i.e., a specific IP ad- of consecutive session classifications, based on which a ma-
dress). The goal is to determine whether the traffic belongs jority vote will provide zero false positives and zero false
to a PC, a smartphone or a specific (known) IoT device. negatives on the entire DSm . We denote this number by si
and refer to it as the optimal size of the moving window.
3.1 Notation The lower si is for a given di , the smaller number of consec-
The notation we use to describe our method and the means utive sessions we need to accurately determine whether the
of evaluating it are summarized below. sessions that emanated from an IP were generated by di or
not. Algorithm 1 describes how si is calculated.
D: Set {d1 , . . . , dn } of known IoT devices.
DSs : Dataset for inducing single-session classifiers. To conclude, for every device di we have a classifier Ci with
Ci : Single-session classifier for di , induced from DSs . threshold tri , and upon a majority voting on its si consec-
tri : Optimal classification threshold for Ci . utive classifications we can determine whether sessions that
DSm : Dataset, sorted in chronological order, for inducing
emanated from a given IP were generated by di with 100%
multi-session based classifiers.
DSm i : Subset of sessions in DSm , origin device di . accuracy. Note that it was easy to differentiate between IoT
DSmi [a]: The ath session, originating from di in DSm i . devices and PCs and smartphones based on a single session
|DSmi |: The number of sessions in DSm i . (see Section 4), so the rest of the discussion in this section
psi : Posterior probability of a session s to originate will focus on identifying the specific IoT device. Table 2
from di ; derived by applying Ci to session s. presents the performance of the single-session classifiers af-
si : The optimal (minimal) size of a sequence of ses-
ter being optimized with tri and their optimal si .
sions for which Ci classifies correctly most sessions
in any sequence of sessions of size si in DSm .
Sd: Sequence of sessions originating from device d. 3.3 Application for Device Identification
C: Set {(C1 , tr1 , s1 ), . . . , (Cn , trn
, s )} of single-session
n Algorithm 2, our final classification algorithm, is based on C:
classifiers for devices in D with optimal thresholds

tri and sequence sizes si . the trained classifiers and their corresponding parameters
DStest : Dataset used for evaluating the proposed method (C1 , tr1 , s1 ), . . . , (Cn , trn , sn ). The classification algorithm
(sorted in chronological order). receives a stream of session vectors that emanated from an
i
DStest : Subset of DStest , originating from device di . IP and were generated by an unknown device. It checks if
Algorithm 1: Calculating si Algorithm 2: IoT device classification
1: procedure findSiStar(D, DSm , Ci ) 1: procedure classifydevice(C, S d )
2: si 1 2: Sort C by ascending si
3: for dj in D do 3: for (Ci , tri , si ) in C do
j
4: DSm subset of DSm with origin dj 4: a1
5: a1 5: n0
6: s1 6: while a + si 1 <= |S d | do
j
7: while a + s 1 <= |DSm | do 7: for sess in {S d [a], ..., S d [a + si 1]} do
8: n0 8: psi classify(Ci , sess)
j j
9: for sess in {DSm [a], . . . , DSm [a + s 1]} do 9: if psi tri then
10: psi classify(Ci , sess) 10: nn+1
11: if psi > tri then 11: if n > si /2 then
12: nn+1 12: return di
13: if i = j and n > s/2 then 13: else
14: aa+1 14: aa+1
15: else 15: return unknown
16: a1
17: ss+2
18: if si < s then evaluating the performance for IoT device identification. We
19: si s note that Algorithm 2 is optimized to derive the type of an
20: return si IoT device by analyzing a minimal number of consecutive
sessions. In a worst case scenario it needs to analyze max(si )
consecutive sessions. To properly evaluate the performance
Table 2: Single-session based classifier performance of our method we reran Algorithm 2 multiple times, and each
time we omitted the first session of the sequence from the
IoT Device tr Method FNR FPR s previous run. This was done to compensate for a possible
Printer 0.35 GBM 0.3 0 11 bias that may occur when the sequence begins with differ-
Sec. Camera 0.5 Random Forest 0 0 1 ent sessions. Given dataset DStest sorted in a chronological
i
Refrigerator 0.2 XGBoost 0.001 0.001 3 order, let DStest be a subset of sessions in DStest originat-
i
Motion Sensor 0.2 XGBoost 0.012 0 3 ing from di , and let DStest [a] be the ath session originating
i
Baby Monitor 0.3 XGBoost 0.006 0 9 from di in DStest . We used DStest for evaluating the pro-
Thermostat 0.2 Random Forest 0.011 0.004 45 posed method, and for each device di D we repeated the
TV 0.1 GBM 0.026 0.001 23 evaluation by applying Algorithm 2 (i.e., the trained model)
i
Smartwatch 0.8 XGBoost 0.184 0 77 on all of the subsequences of the sessions in DStest , starting
Socket 0.25 Random Forest 0 0 1 i
from session a {1, . . . , |(DStest )| si + 1} and ending at
a + si 1 (with maximal value a + si 1 = |(DStest i
)|). so,
for each device di D we repeated the evaluation as follows:
the stream of sessions was generated by device di by clas-
sifying using Ci with si consecutive sessions, and checking
whether most of the si sessions were classified as di . In order i
1: for a in {1, . . . , (|(DStest )| si + 1)} do
to optimize the search for the device, the device inspection 2: d i
s {DStest [a], . . . , DStest i
[a + si 1]}
order is determined by si , so the algorithm starts to inspect 3: CLASSIFYDEVICE (C, sd )
devices with the lowest si , and continues with ascending or-
der of si . A possible modification is to take into account
also the prior probability of a device being observed. As seen in Table 3, the classification accuracy on DStest was
high. Out of 7,376 test cases (each defined by the first session
in the sequence) 19 cases were misclassified and 34 were
4. EVALUATION unclassified, thus the total accuracy was 99.281%. Note that
We evaluate our method using the third dataset, DStest . the classification accuracy on DStest was not 100%, so we
The results indicate that by analyzing network traffic we executed Algorithm 1 (previously run on DSm ) once again,
can distinguish between IPs that belong to IoT devices and this time on DStest . We then compared si s obtained from
IPs that belong to PCs and smartphones. Smartphones were DSm to si s obtained from DStest . Classification accuracy
classified by analyzing the user agent HTTP property, and measures on DStest , plus the recalculated si s, are presented
thus the classification accuracy for smartphones was 100%. in Table 4. We note that the required si for perfect results
The classification of PCs was performed by classifying a ses- for all devices in DStest should be higher. For perfect results
sion by a single-session classifier. The performance for PCs on DStest we recommend using an si which is 4.333 times
was almost perfect (a very low false positive rate of 0.003 higher than the ones computed by Algorithm 1 on DSm .
and a very low false negative rate of 0.003).
Having accurately classified smartphones and PCs, we ap-
plied Algorithm 2 (IoT device classification) on DStest for 5. RELATED WORK
ergy consumption) and provide a very preliminary proof of
Table 3: Accuracy (Algorithm 2) on DStest concept of their algorithm based on simulations.
Number of sessions classified

Tested Device Correctly Incorrectly Unknown
6. CONCLUSION
Printer 14 0 0
In this paper, we demonstrated that an IoT device can be
Security camera 325 0 1 accurately identified based on characteristics of the network
Refrigerator 2334 0 0 traffic it generates. Our method can classify an IoT de-
Motion Sensor 83 0 0 vice, including by brand and model, with 99.281% accuracy
Baby Monitor 663 5 15 which makes our method feasible for real organizations. We
Thermostat 2074 0 0 believe that our novel approach can be used to automati-
TV 1566 12 18 cally and accurately recognize (possibly unauthorized) con-
Smartwatch 151 2 0 nections of IoT devices to an enterprises computer network,
Socket 113 0 0 and thus mitigate violations of operational policies. In fu-
ture research, we plan to explore applications and adapt our
technology to additional scenarios, possibly including differ-
Table 4: Accuracy and recalculation of si on DStest ent network protocols and various data capturing points, to
better understand how our approach scales and generalizes.
s on
Device tr s Method FNR FPR Acc.
DStest
Printer 0.35 11 GBM 0 0 1 5 7. REFERENCES
Security Camera 0.5 1 Random Forest 0.004 0 0.999 3 [1] Alexa top sites. Retrieved April 14, 2016 from
Refrigerator 0.2 3 XGBoost 0 0.001 0.999 5
Motion Sensor 0.2 3 XGBoost 0 0 1 1 http://www.alexa.com/topsites.
Baby Monitor 0.3 9 XGBoost 0.03 0 0.999 39 [2] Geoip lookup service. Retrieved April 14, 2016 from
Thermostat 0.2 45 Random Forest 0 0 1 39
TV 0.1 23 GBM 0.014 0 0.997 45
http://geoip.com/.
Smartwatch 0.8 77 XGBoost 0 0 1 43 [3] D. Bekerman. Network features. Retrieved April 14,
Socket 0.25 1 Random Forest 0 0 1 1 2016 from http://www.ise.bgu.ac.il/dima/network
traffic features set.pdf.
[4] D. Bekerman, B. Shapira, L. Rokach, and A. Bar.
Device identification based on characteristics of communica- Unknown malware detection using network traffic
tion properties such as signals and emissions was discussed classification. In Proc. of IEEE Conference on
in [12], which reports respective efforts since the 1960s. More Communications and Network Security (CNS), 2015.
recently, the authors of [5] leverage passive radio frequency [5] V. Brik, S. Banerjee, M. Gruteser, and S. Oh. Wireless
analysis to identify network interface cards (NIC) that trans- device identification with radiometric signatures. In
mitted IEEE 802.11 frames. Our work, however, is more Proc. of ACM conference on Mobile computing and
closely related to efforts on identifying devices based on log- networking, 2008.
ical characteristics of their network traffic. The idea of using
[6] G. Combs et al. Wireshark-network protocol analyzer.
network characteristics to identify different kinds of nodes
Version 0.99, 5, 2008.
in a network has been applied in several contexts. For in-
stance, [10] describes a technique to identify rogue wireless [7] G. Gu, R. Perdisci, J. Zhang, and W. Lee. BotMiner:
access points that try to mislead victims to connect to them. Clustering analysis of network traffic for protocol-and
structure-independent botnet detection. In Proc. of
A large area of research is the identification of clients in USENIX Security Symposium, 2008.
a network that have been infected with malware (e.g., in [8] P. N. Mahalle, N. R. Prasad, and R. Prasad. Object
the context of botnets) and the identification of the asso- classification based context management for identity
ciated command and control servers. In [7, 11], techniques management in internet of things. International
to cluster network traffic patterns associated with botnets Journal of Computer Applications, 63(12), 2013.
are presented. The characteristics observed include the flow [9] D. Miorandi, S. Sicari, F. De Pellegrini, and
patterns between hosts, such as the number of connections I. Chlamtac. Internet of Things: Vision, applications
and amount of data exchanged. Similarly, using machine and research challenges. Ad Hoc Networks,
learning and network traffic features, [4] presents an ap- 10(7):14971516, 2012.
proach to detect malware related traffic. Their work is close [10] I. H. Saruhan. Detecting and preventing rogue devices
to ours, as we use similar network traffic features. However, on the network. SANS Institute InfoSec Reading
our work differs from this and the works on botnet detec- Room, sans. org, 2007.
tion cited above, because we build a classifier for IoT devices
[11] W. T. Strayer, D. Lapsely, R. Walsh, and C. Livadas.
based on the network traffic communication data gathered
Botnet detection based on network behavior. In
from a heterogeneous set of devices.
Botnet Detection: Countering the Largest Security
To the best of our knowledge, there is no other work in the Threat, pages 124. Springer, 2008.
literature that applies machine learning on network traffic [12] K. I. Talbot, P. R. Duley, and M. H. Hyatt. Specific
features to identify IoT devices. In [8], a logical framework emitter identification and verification. Technology
to classify IoT devices is proposed, however they classify Review, page 113, 2003.
IoT devices into just two categories (i.e., high vs. low en-

ACCEPTED Meidan Et Al (2017) ProfilIoT A Machine Learning Approach For IoT Device Identification Based On Network Traffic Analysis

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

ACCEPTED Meidan Et Al (2017) ProfilIoT A Machine Learning Approach For IoT Device Identification Based On Network Traffic Analysis

Transféré par

Droits d'auteur :

Formats disponibles

ProfilIoT: A Machine Learning Approach for IoT Device

Identification Based on Network Traffic Analysis

Yair Meidan1 , Michael Bohadana1 , Asaf Shabtai1 ,

ABSTRACT connected to their network, a situation which threatens the

Number of sessions classified

Vous aimerez peut-être aussi