
An Investigation on Identifying SSL Traffic

Curtis McCarthy
Dalhousie University
Halifax, Nova Scotia, Canada
Email: mccarthy@cs.dal.ca

A. Nur Zincir-Heywood
Dalhousie University
Halifax, Nova Scotia, Canada
Email: zincir@cs.dal.ca

Abstract: The importance of knowing what type of traffic is flowing through a network is paramount to its success. Traffic engineering, quality of service, identifying critical business applications, intrusion detection systems, as well as network management activities all require the base knowledge of what traffic is flowing over a network before any further steps can be taken. With Secure Socket Layer (SSL) traffic on the rise due to applications securing or concealing their traffic via encryption, the ability to determine what applications are running within a network is getting more and more difficult. Traditional methods of traffic classification through port numbers and deep packet inspection tools have been deemed inadequate despite their continued popular usage. The purpose of this work is to investigate if a machine learning approach can be used with flow features to identify SSL traffic in a given network trace. To this end, different machine learning methods, namely AdaBoost, C4.5, RIPPER, and Naive Bayesian techniques, are investigated without the use of port numbers, Internet Protocol addresses, or payload information.

I. INTRODUCTION

Correct classification of network traffic is a fundamental step for many pivotal services required by various stakeholders including Internet Service Providers (ISPs), governments, and system administrators. These services include traffic shaping, ensuring the uptime of networked mission-critical applications, workload modeling, managing bandwidth budgets, detecting bottlenecks, and balancing Quality of Service (QoS) [1], [2], [3]. Successful methods pursued in the past have relied on deep packet inspection by examining the contents of the payload or using port numbers to correctly identify the application behind a traffic stream. Unfortunately, they no longer hold as much weight as they once did due to encryption (rendering packet payloads non-transparent) and dynamic port allocation (enabling applications to connect on ports other than the ones assigned by the Internet Assigned Numbers Authority, IANA). Alternatively, applications may conceal packets by masquerading as a different protocol through tunneling or even employ encryption to protect their payloads.

Secure Socket Layer (SSL) is a fundamental security protocol belonging to the application layer in the Internet Transmission Control Protocol / Internet Protocol (TCP/IP) model but residing below the higher level application protocols and above TCP. It enables e-commerce transactions and other applications to communicate securely over a public network by encrypting the packet payload. Despite its good intentions, SSL also creates a black hole of encrypted network traffic, which may be used for illegitimate or non-network sanctioned purposes. Generally speaking, the disadvantages brought about by encrypting traffic create a security trade-off between a secure protocol and losing knowledge about a network [4].

Past investigations into traffic classification have shown promising results primarily in the classification of unencrypted traffic with more recent explorations into encrypted traffic. These methods rely on the statistical patterns left behind by the packet attributes or flows to determine which application is within a given stream. Nevertheless, few have incorporated the use of SSL in their training, and those that have done so conglomerated the traffic together, treating it as a single label with no regard to the underlying application [5], [6], [7]. As such, the objective of this research is to investigate a statistical flow-based approach for identifying SSL traffic in a given network trace based on machine learning techniques, namely AdaBoost, C4.5, RIPPER and Naive Bayesian, without using IP addresses, port numbers, and payload information. The generated machine learning models are then tested against an unseen dataset to demonstrate the robustness of the machine learning techniques.

In the rest of this paper, Section II summarizes the related work. The data sets employed and the methodology followed are presented in Section III and Section IV, respectively. Section V discusses the results, and conclusions are drawn and future work is given in Section VI.

II. BACKGROUND

A. SSL Overview

The concept behind SSL is to provide secure communication over a public network through the use of multiple algorithms for cryptography, digests, and signatures. By supporting this type of dynamic authentication, SSL servers are able to adapt to any legal obligations surrounding the use of cryptography by choosing which algorithms to use during the handshake. SSL is designed to be application independent, lying between the transport layer (specifically over TCP) and the application layer of the TCP/IP protocol stack. It was originally designed by Netscape to secure e-commerce transactions over the HyperText Transfer Protocol (HTTP). However, the more prominent HyperText Transfer Protocol Secure (HTTPS) is not the only application that can run over SSL. As stated by Bernaille et al. [8], other application protocols are realizing the need to encrypt and conceal their data from packet sniffers. They are implementing Application Programming Interfaces (APIs) to use back-end SSL libraries.

978-1-4244-9941-0/11/$26.00 ©2011 IEEE


For those application instances that are not able to support SSL communication, unencrypted TCP traffic can also be wrapped by programs such as Stunnel [9]. Regrettably, nefarious users may try to conceal their attacks within this protocol, creating the illusion that a high amount of legitimate SSL traffic is occurring.

In 1994 Netscape developed SSLv2 to protect the consumer during e-commerce transactions [10]. It was later superseded by SSLv3 in 1995 due to security issues [10]. The upgrade permitted the use of a wider variety of encryption algorithms and certificate authentication [10]. Then, in an attempt to standardize and have an open community supported standard, TLSv1 was created by the Internet Engineering Task Force (IETF) in 1999 [11]. It is important to note that TLS and SSL are not interchangeable and one of them must be selected before the handshake is completed [11]. As such, it is widely accepted that SSLv3 has evolved into TLSv1 despite several minor differences.

B. Previous Work

Most research on SSL analysis has concentrated either on building Intrusion Detection Systems (IDSs) that rely on behavior-based anomaly detection [4] or developing algorithmic optimizations to reduce computational cost for SSL [12], [13], [14], [15]. Yet of the few researchers that have investigated the classification of SSL traffic [16], none have implemented a binary SSL classification methodology. Most traffic classification methods focus on unencrypted data, despite the rise in applications securing or concealing their data with SSL [8], [13]. Of those works that include encrypted traffic, whether it be SSL or another form of encrypted traffic, many simply state that their methods are not affected by the encryption process as they rely on packet or flow attributes, or they simply conglomerate the encrypted traffic together into a single class [17], [18], [5], [7]. Nevertheless, few researchers actually attempt to parse those encrypted streams to determine the SSL application instances [19].

The level of classification granularity becomes further obfuscated by the use of tunneling programs in an attempt to encrypt or conceal network traffic. With the exception of Dusi et al., very few seem to have actually attempted to classify tunneled traffic, much less encrypted tunneled traffic. Research done by Dusi et al. has shown promising results, but it focuses solely on HTTP [20] and Secure Shell (SSH) [21] tunnel detection. Even then, in order for Tunnel Hunter to work on encrypted traffic, network administrators must enforce only one type of user authentication. Indeed, such a requirement is not realistic in practice.

On the other hand, research studying encrypted flow-based streams has clearly demonstrated that machine learning algorithms are able to classify encrypted network traffic with a greater degree of accuracy than expert driven systems [22], [1], [2], [3]. This was shown in the results found by Riyad Alshammari et al. [23] compared to the performance of the expert driven systems by Montigny-Leboeuf [24], from the Communications Research Centre Canada (CRC). The purpose of the works done by Riyad Alshammari et al. was to determine the possibilities of traffic signatures amongst encrypted SSH traffic. Williams et al. showed that greater classification accuracy can be obtained through reduced feature sets [25], which was also confirmed in the encrypted traffic classification work of Alshammari et al. [2], [3]. SSL and SSH share many similarities but also differ greatly as shown by Barrett et al. [26]. Aside from focusing on encrypted SSL traffic, this work also differs from previous literature in the datasets employed. The usage of public data sets for training can be misleading as they are only as accurate as the labeler employed [19]. Others have failed to properly represent their datasets by excluding parts of traffic, such as UDP [17], or fixing the algorithms used to only account for one type of handshake encryption [27].

III. DATA COLLECTION

In this work, we have generated our own data set on our lab network in order to evaluate our approach for SSL traffic classification. The reasoning behind generating a new dataset can be attributed to the lack of publicly available labeled datasets. Indeed, the absolute ground truth is imperative for the ML algorithms to accurately identify key classification features. Finally, many existing datasets have very little SSL traffic, with the majority belonging to HTTPS and not much tunneling. Thus, a training data set is generated to overcome these problems.

In this case, to prevent a single network's hardware from influencing the packet attributes, two networks were used to gather the data: one which operated under Dalhousie University and the other independently under TARA (a telecommunication organization in our city). The Dalhousie network contained one computer, nims, whereas the TARA network housed both taraserver and taraclient computers. Since there was only one physical computer at Dalhousie, nims hosted two virtual machines using VMware¹: nims-server and nims-client. All machines ran the open source OS Ubuntu² Linux distribution, but each site differed in versions varying from 9.04 'Jaunty Jackalope' to 9.10 'Karmic Koala'. At each network, the machines were segregated on a separate subnet with static IP addresses to prevent contamination and respect network privacy policies³. Since some of the services required hostnames, dynamic Domain Name System (DNS) through Dynamic Network Services Inc.⁴ was used for all of the TARA services with the exception of mail. The machines on the Dalhousie network were supplied with their own hostnames through Dalhousie's DNS servers.

¹ http://www.vmware.com/
² www.ubuntu.com
³ Dalhousie University Computing and Information Services policy on capturing network traffic http://its.dal.ca/policies/5.5.2-data-sets.pdf
⁴ http://www.dyndns.com/
⁵ http://www.wireshark.org/docs/man-pages/tshark.html
⁶ http://www.tcpdump.org/

Packet captures were acquired using the TShark⁵ and TCPdump⁶ programs in libpcap binary format. Each machine


captured its own traffic to avoid any possibility of packet loss between the networks. The applications selected for the data set were based on their ability to be reproduced by being mainly open source, the heterogeneity of protocol traffic they provided, their popularity, and their inclusion in the literature. TCP, UDP, and DNS traffic were all important to provide a good mix of protocols and packet data. Without this background noise in the dataset, the classifiers would be too restrictive and not scalable to different networks [22], [1]. Furthermore, these traffic traces included many different and popular applications as shown in Table I. Traffic for all of the application instances was captured over a period of 4 months. The application instances chosen were by no means meant to represent an exhaustive list of those that generate traffic on a given network, but instead to cover different areas of application traffic flowing through many networks.

TABLE I
APPLICATIONS AND THEIR ASSOCIATED NUMBER OF FLOWS.

Application    Flows       Application        Flows
SSH            852748      Subversion         40897
HTTP           10092691    HTTPS              8562626
POP3           29818       SMTP               1479638
POP3S          8786        SMTPS              420502
FTP            130826      FTPS               151580
Bit-torrent    1422414     SSL Bit-torrent    2332976
Chat           489282      SSL Chat           457024
Telnet         732541      Telnet-SSL         122
Skype          3489

Three open source tunneling programs were used to conceal and encrypt the traffic from several applications listed in Table I. The additional concealed or encrypted tunneled traffic was captured between a client and its local server while the underlying actual application traffic was captured between the local proxy server and the final destination. To this end, HTTP tunneling encapsulates the original data packet as an HTTP packet, thus concealing, but not encrypting, the original payload. HTTP tunnel⁷ is a useful tool for users restricted by network firewalls that only permit HTTP traffic, enabling them to tunnel other traffic through the firewall. On the other hand, Ptunnel⁸ takes ordinary TCP packets and transforms them into ICMP (ping) echo request and reply packets, thus tunneling the connection between a server, proxy, and client. By doing so, many network firewalls can be bypassed since the TCP packets are concealed, but not encrypted, as ICMP packets. Moreover, for application instances that do not support SSL encryption, Stunnel was used to wrap TCP-only traffic within a secure stream [9]. Unlike the other tunnels discussed thus far, Stunnel actually encrypts the tunneled application packets by making use of the external libraries found in OpenSSL.

⁷ http://www.nocrew.org/software/httptunnel.html
⁸ http://www.cs.uit.no/~daniels/PingTunnel/

IV. METHODOLOGY

This section discusses the steps followed to set up the data generation system and the ML algorithms employed using WEKA [28].

A. System Overview

One of the primary goals in generating this dataset was to design the system to be self-sufficient. In order to achieve this level of automation, primarily open source programs were employed, with the exception of the proprietary closed-source Skype application. The datasets generated for this research are made available to the research community at the Dalhousie University NIMS Laboratory website⁹.

⁹ http://www.cs.dal.ca/projectx

B. Client Setup

The client machines, nims-client and taraclient, were responsible for generating the data through the use of customized shell scripts and cron jobs. The process began with a cron job being set to run at a certain time in order to automatically execute a certain shell script with specific command-line arguments. To keep the classifiers from becoming biased toward consistent packet sizes and trends due to automation, commands were chosen that would alter the packet sizes of the data being transferred. For instance, the SVN application instance script would not only check out some files from the repository, but also alter and commit files to simulate human interaction. Additionally, optional command-line arguments were programmed for each application script that supported tunneling. This allowed scripts to either send packets straight through the Internet, or traverse an optional proxy tunnel.

C. Server Setup

The server machines had primarily two responsibilities: (i) running a variety of services, which respond to various client requests, and (ii) acting as a proxy server for the tunneling programs. Both of these duties required nims-server and taraserver to become host connection platforms responding to client queries. To set up proxy tunnels for the tunneling programs, each client relied on its local network server to tunnel connections to the final destination. Depending on the tunneling program used, the proxy servers handled requests accordingly, with configuration files either specifying background daemons to listen on a certain port or providing the end destination IP and port for incoming tunnels.

D. Data Pre-processing

Once all of the TCPdump traffic was collected, it was converted into flows using NetMate. NetMate is a traffic monitoring tool, which converts IP packets into bi-directional flows and generates several statistics regarding these flows [29]. Flows are defined by sequences of packets that present the same values for source IP address, destination IP address, source port, destination port, and type of protocol. Since flows are of limited duration, in this work UDP flows are terminated by a flow timeout, and TCP flows are terminated upon proper connection tear-down or by a flow timeout, whichever occurs first. A 600 second flow timeout value was employed here. The
NetMate flow attributes used in the experiments are listed in Table II.

TABLE II
NETMATE FLOW ATTRIBUTES USED.

Attribute name                                     Abbreviation
Minimum forward packet length                      min_fpktl
Mean forward packet length                         mean_fpktl
Standard deviation forward packet length           std_fpktl
Minimum backwards packet length                    min_bpktl
Mean backwards packet length                       mean_bpktl
Maximum backwards packet length                    max_bpktl
Standard deviation backwards packet length         std_bpktl
Minimum forward inter-arrival time                 min_fiat
Mean forward inter-arrival time                    mean_fiat
Maximum forward inter-arrival time                 max_fiat
Standard deviation forward inter-arrival time      std_fiat
Minimum backwards inter-arrival time               min_biat
Mean backwards inter-arrival time                  mean_biat
Maximum backwards inter-arrival time               max_biat
Standard deviation backwards inter-arrival time    std_biat
Duration of the flow                               duration
Protocol used                                      proto
Total amount of forward packets                    total_fpackets
Total forward volume in bytes of packets           total_fvolume
Total amount of backward packets                   total_bpackets
Total backward volume in bytes of packets          total_bvolume
Class label (added later by labeler)               class

Since supervised machine learning techniques are employed in this work to identify SSL traffic, labeled data is required for the training of the classifiers. To this end, generating a data set in a lab environment ensured that we know the ground truth about the data set and therefore could label each packet based on the tunnel and the application used. To evaluate the performance of the classifiers on the dataset, traffic was split into several sub-classes. The different class labels to identify the types of traffic within an application instance included the following: Native SSL, SSL-Tunneled, and Non-SSL. The Native SSL class consisted of applications that employ SSL encryption through back-end libraries or APIs to OpenSSL. The SSL-Tunneled class was composed of application instances making use of the open source Stunnel program and its ties to the back-end OpenSSL libraries to encrypt communication between a client and a proxy. Finally, the Non-SSL class was composed of applications that were either not encrypted or did not contain any form of SSL encryption.

Once flow records were separated into their classes, multiple experiments were performed to produce the final results. Each of these comparisons generated a different model showing how well the classifiers performed using binary classification. There were three distinct sets of experiments: SSL vs Non-SSL, SSL vs Stunnel, and Non-SSL vs Stunnel. In total, 5 runs were performed for each set of experiments and each run employed 10 fold cross-validation to minimize any data set biases.

E. Training

Once all the applications were labeled and contained all the NetMate flow attributes, a sample set had to be chosen randomly for training purposes. Given the 46 different types of applications employed in the data sets and their varying sizes, performing subset sampling required a custom shell script to generate the training data set randomly. We employed training data sets of different sizes in order to understand the effect of size on performance. Specifically, we experimented with 6000, 9000, 12000, 20000, 50000, 100000, 150000, and 500000 flows of traffic. Each training set was balanced in a 50/50 split of the respective classes to be trained.

F. WEKA and Machine Learning Algorithms

In order to apply the ML algorithms and determine which among them had the best performance given the different sizes of training sets, the WEKA platform was chosen. While WEKA supports many different ML algorithms that could be used to classify the data, we wanted to choose those that have had success in the past and could provide human readable solutions [22], [1]. To this end, we have employed the AdaBoost meta-learner, the C4.5 decision tree, the rule-based RIPPER, and Naive Bayesian as an example of a probabilistic machine learning algorithm. It is important to note that each algorithm was run with 10 fold cross-validation and WEKA's default settings were used unless explicitly stated.

1) AdaBoost: AdaBoost, formulated by Yoav Freund and Robert Schapire [30], is an ML algorithm using ensemble learning, which alters weak learning classifiers by assigning weights with the overall goal of properly identifying instances. Generic boosting is accomplished by starting with all weights at one and then iteratively adjusting them by increasing the weights of misclassified examples and decreasing those of correctly classified examples. Each iteration generates a new hypothesis, which holds a unique weight depending on how accurately it classified the set. A hypothesis achieving a higher degree of correctly classified examples will hold more weight in the final ensemble; meaning the size of any given hypothesis correlates to the weight it has in the total ensemble. The final hypothesis output is a weighted-majority collection of the entire hypothesis collection. AdaBoosting differs from generic boosting by using weak learners to determine a weighted error on the example, giving it a more accurate result than random guessing [31]. In this work, the AdaBoost+C4.5 algorithm as well as decision stumps are employed.

2) C4.5 Decision Tree: Created by Ross Quinlan [32], the C4.5 decision tree takes into account all available input attributes, making numerous decisions based on the data to predict the outcome with a high degree of accuracy [31]. The reasoning behind calling them decision trees can be attributed to their resulting structure resembling that of a tree with each decision creating a branch. Branches, or decisions, in a decision tree are created by the attribute comparisons that yield the highest amount of information gain. Information gain represents how well a given decision will separate the output classes. This process is repeated until all the data is properly classified; as such, they are particularly useful when dealing with noisy data.
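To make the information gain criterion from the C4.5 description concrete, the following is a minimal sketch in Python of the entropy reduction obtained by splitting one numeric flow attribute at a threshold. It is an illustration only and not the WEKA implementation used in this work: the function names, the threshold values, and the toy `mean_fpktl` values are hypothetical. C4.5 itself further normalizes this quantity into a gain ratio, which is omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


# Hypothetical toy flows: mean forward packet length and class label.
mean_fpktl = [60, 70, 80, 200, 220, 240]
labels = ["Non-SSL", "Non-SSL", "Non-SSL", "SSL", "SSL", "SSL"]

gain_good = information_gain(mean_fpktl, labels, 100)  # separates the classes
gain_poor = information_gain(mean_fpktl, labels, 65)   # isolates a single flow
```

In this toy data the threshold of 100 separates the two classes completely, so it yields the larger gain and would be preferred as the branch condition.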
3) Naive Bayes: As a probabilistic classifier, Naive Bayes uses conditional probabilities to arrive at a final decision by applying Bayes' theorem [31]. Making use of the probability rules, Rev. Thomas Bayes formulated Bayes' theorem to deduce the inverse probability of an event. For instance, if we know the individual probabilities of events A and B, as well as the probability of B given A, we can determine the probability of A given B using the following notation: P(A|B) = P(B|A)P(A) / P(B).

4) Repeated Incremental Pruning to Produce Error Reduction (RIPPER): Originally developed by William Cohen [33], RIPPER concatenates rules using logical OR and AND operators to generate a rule set with which the classifier can accurately detect the out-classes [31]. RIPPER contains two main phases: the first loops through a building stage where it grows and prunes the classifier, while the second is responsible for optimization. During the growing step, the algorithm continually adds conditional statements until the rule obtains complete and accurate classification. The second stage involves optimizing the initial set of conditionals, or rule set, that has been created thus far. Firstly, two pruned alternatives are produced for each rule of the initial rule set from a random data sample using the growth and pruning processes. Specifically, one of these alternatives is produced from an empty rule whereas the other comes from greedily adding antecedents to the original rule. The rule with the smallest length is chosen as the final representative in the rule set. Then, more rules are generated using the remaining newly calculated positives in the building stage.

G. Performance Metrics Employed

In order to properly evaluate the performance of classifiers, various metrics popular amongst other network traffic classification methods were employed. Specifically, Detection Rate (DR), False Positive Rate (FPR), and recall are all utilized to determine the overall performance.

DR measures how close the ML algorithm came to perfect (100%) classification. It is calculated using the ratio of the number of incorrectly classified instances and the total number of instances belonging to that class. DR is calculated as follows:

$$\mathrm{DR} = 1 - \frac{\#\,\text{False Negative Classifications}}{\text{Total Number of Classifications}}$$

The FPR, according to the WEKA documentation, represents how many instances the classifier misclassified as the other class. It is calculated using the ratio of the number of incorrectly classified instances and the total number of instances belonging to the misclassified class. FPR is calculated as follows:

$$\mathrm{FPR} = \frac{\#\,\text{False Positive [IN | OUT] Class Instances}}{\text{Total Number of [IN | OUT] Class Instances}}$$

Recall is the same as the True Positive Rate and measures how completely the ML algorithm identifies the instances of a certain class. It is calculated using the ratio of the number of instances correctly identified as a certain class and the total number of all possible instances truly belonging to that class. Recall is calculated as follows:

$$\mathrm{Recall} = \frac{\#\,\text{Correctly Classified [IN | OUT] Class Instances}}{\text{Total Number of [IN | OUT] Class Instances}}$$

Finally, the False Positive Rate Analysis (FPRA) was performed in order to understand the behavior of the classifiers in terms of which in-class application is confused with which out-class application.

V. RESULTS AND ANALYSIS

This section highlights the results obtained for the different experiments and presents an analysis behind the performance of the top algorithms. It also discusses the results of testing the proposed approach against an unseen dataset.

A. Results on SSL vs Non-SSL Experiments: Baseline

In this set of experiments, the AdaBoost ML algorithm achieved the best classification performance with the largest training set size of 500000. Over 95% of the total flow records were classified correctly as SSL with a 4% SSL FPR (Table III). In general, the results show that all the classifiers improved in performance as the training set sizes increased. Of the total 22 flow attributes available, AdaBoost only used seven to classify the SSL traffic. The algorithm focused primarily on mean_fpktl (mean forward packet length in a flow) as the attribute carrying the most weight throughout all of the larger training set sizes. By doing so, AdaBoost was classifying SSL and Non-SSL traffic by relying on the mean forward packet length sent from the client. This indicates AdaBoost was picking up on the extra SSL overhead, which adjusted the size of the packets. The packet's new mean size could then be used to separate encrypted SSL requests from the unencrypted requests coming from the client. As the packets made their way back from the server to the client, AdaBoost picked up on the mean_bpktl (mean backward packet length in a flow) attribute, indicating a unique response from the server for many of the application instances. This was most likely accomplished by focusing on the SSL handshake process response from the server side. A breakdown of the AdaBoost weights is presented in Table IV, while the results of the classifiers can be found in Table III.

B. Results on SSL vs SSL-Tunnel and Non-SSL vs SSL-Tunnel Experiments

1) SSL vs SSL-Tunnel: AdaBoost+C4.5 using a training set size of 6000 was ranked the highest amongst the ML algorithms with 98% correctly identified flow records and a 0.6% SSL FPR (Table V). Compared to the previous set of experiments, the increased FPR levels amongst most of the ML algorithms may be attributed to the fact that both classes use SSL at one point during packet communication between the client, the optional proxy, and the server. Omitting the extra hop in network traffic, this would cause the feature sets to become almost identical, leading to a higher misclassification of SSL traffic. Figure 1 shows the AdaBoost classifier focusing on
TABLE III
RESULTS FROM VARIOUS TRAINING SET SIZES USING THE DIFFERENT PERFORMANCE METRICS FOR THE SSL VS NON-SSL RUN.

Training size: 6000     AdaBoost+C4.5  AdaBoost  C4.5     Naive Bayes  RIPPER
DR                      71.56%         62.28%    58.79%   8.12%        63.86%
FPR SSL                 0.28           0.37      0.40     0.94         0.36
FPR Non-SSL             0.02           0.02      0.02     0.01         0.01
Recall SSL              0.99           0.96      0.98     0.99         0.99
Recall Non-SSL          0.72           0.63      0.60     0.06         0.64

Training size: 12000    AdaBoost+C4.5  AdaBoost  C4.5     Naive Bayes  RIPPER
DR                      71.86%         67.94%    70.50%   7.80%        70.30%
FPR SSL                 0.28           0.32      0.29     0.95         0.30
FPR Non-SSL             0.01           0.01      0.01     0.01         0.01
Recall SSL              0.99           0.99      0.99     1            0.99
Recall Non-SSL          0.72           0.68      0.71     0.05         0.70

Training size: 500000   AdaBoost+C4.5  AdaBoost  C4.5     Naive Bayes  RIPPER
DR                      87.45%         95.69%    85.13%   89.26%       82.59%
FPR SSL                 0.12           0.04      0.14     0.11         0.17
FPR Non-SSL             0.01           0.02      0.01     0.01         0.01
Recall SSL              0.99           0.99      0.99     0.99         0.99
Recall Non-SSL          0.88           0.95      0.86     0.89         0.83
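The DR, FPR, and Recall values reported in Table III follow the definitions of Section IV-G. As a hedged illustration, the sketch below computes them from the four confusion-matrix counts of a binary experiment. The function name and the counts are hypothetical, and reading the FPR denominator as "all instances that truly belong to the out-class" is an assumption about the paper's wording rather than something taken from the WEKA source.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute DR, FPR and Recall for the in-class of a binary experiment.

    tp: in-class flows labelled in-class    fn: in-class flows labelled out-class
    fp: out-class flows labelled in-class   tn: out-class flows labelled out-class
    """
    total = tp + fn + fp + tn
    return {
        # DR = 1 - (#false negative classifications / total classifications)
        "DR": 1 - fn / total,
        # FPR = false positives / instances truly in the out-class (assumed reading)
        "FPR": fp / (fp + tn),
        # Recall = correctly classified in-class / all true in-class instances
        "Recall": tp / (tp + fn),
    }


# Hypothetical counts, not taken from the experiments in this paper.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
```

With these made-up counts, 10 of 200 flows are false negatives, 5 of the 100 out-class flows are false positives, and 90 of the 100 in-class flows are recovered.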

TABLE IV
FEATURES CHOSEN BY ADABOOST FOR THE SSL VS NON-SSL EXPERIMENTS

mean_fpktl       37.33%
total_fvolume    20.12%
mean_bpktl       15.86%
std_biat         10.83%
total_bvolume    7.16%
min_bpktl        6.00%
max_bpktl        2.71%

the packet length from the server as either being encrypted for native SSL or unencrypted for SSL-Tunnel communicating with a proxy. Additionally, the difference in the flow attribute size can be attributed to how the tunneled packets shed their encryption overhead at one point during the flow.

Fig. 1. Highest weighted AdaBoost C4.5 decision tree in SSL vs SSL-Tunnel run

2) Non-SSL vs SSL-Tunnel: RIPPER with a training set size of 12000 held the best performance with 99% correctly identified flow records and a 0.5% SSL-Tunnel FPR (Table VI). Only seven of the 22 flow attributes presented by NetMate were used by RIPPER to build a classifying ruleset. In this case, the algorithm starts by focusing primarily on the bi-directional packet length flow statistics and inter-arrival times. The use of the forward packet length statistics can be attributed to the added overhead of encryption while packets are being sent to the proxy. Furthermore, the inter-arrival times will differ for the packets as they have an extra hop to go through the proxy server. The usage of the mean backwards packet lengths illustrates the algorithm's ability to recognize return trip packet connections in subsequent rules. The duration of the flow plays an important role as Stunnel traffic would have a longer flow time due to the extra proxy hop. Finally, during the return trip the packets must maintain a minimum size to support the SSL encryption overhead.

C. Comparison against unseen dataset

In order to test the robustness of the classifiers, a different unseen dataset called NIMS, generated by Alshammari et al. [22], [1], [2], [3], was employed in this work, too. Since its purpose was judging the effectiveness of ML algorithms on other forms of encrypted traffic, little SSL traffic was included, with more emphasis placed on SSH and Skype. Also, there were no tunneling experiments done in the NIMS dataset. The
TABLE V
RESULTS FROM VARIOUS TRAINING SET SIZES USING THE DIFFERENT PERFORMANCE METRICS FOR THE SSL VS SSL-TUNNEL RUN.

Training size: 6000 AdaBoost C4.5 AdaBoost C4.5 Naive Bayes RIPPER
DR 98.38% 91.89% 98.14% 70.33% 87.49%
FPR SSL 0.01 0 0.01 0.04 0.01
FPR SSL-Tunnel 0.01 0.11 0.01 0.30 0.12
Recall SSL 0.99 0.89 0.99 0.70 0.88
Recall SSL-Tunnel 0.99 1 0.99 0.96 0.99
Training size: 12000 AdaBoost C4.5 AdaBoost C4.5 Naive Bayes RIPPER
Correctly Id. 98.98% 93.31% 98.59% 70.11% 97.33%
FPR SSL 0 0.01 0.01 0.02 0.02
FPR SSL-Tunnel 0.01 0.06 0.01 0.35 0.01
Recall SSL 0.99 0.94 0.99 0.65 0.99
Recall SSL-Tunnel 1 0.99 0.99 0.98 0.98

TABLE VI
RESULTS FROM VARIOUS TRAINING SET SIZES USING THE DIFFERENT PERFORMANCE METRICS FOR THE NON-SSL VS SSL-TUNNEL RUN.

Training size: 6000 AdaBoost C4.5 AdaBoost C4.5 Naive Bayes RIPPER
DR 94.74% 93.88% 93.51% 9.69% 91.73%
FPR SSL-Tunnel 0.05 0.06 0.06 0.91 0.08
FPR Non-SSL 0 0 0 0 0
Recall SSL-Tunnel 1 1 1 1 1
Recall Non-SSL 0.95 0.94 0.94 0.09 0.92
Training size: 12000 AdaBoost C4.5 AdaBoost C4.5 Naive Bayes RIPPER
DR 89.64% 93.54% 91.27% 10.43% 99.41%
FPR SSL-Tunnel 0.10 0.07 0.09 0.90 0.01
FPR Non-SSL 0 0 0 0 0
Recall SSL-Tunnel 1 1 1 1 1
Recall Non-SSL 0.9 0.93 0.91 0.1 1

best performing model from the SSL vs Non-SSL experiments
was chosen to be tested on the NIMS data set. The results
showed that 95% of the flow records were correctly identified
as either SSL or Non-SSL traffic with a 0.5% SSL FPR, clearly
indicating the robustness of the ML algorithm across different
datasets. The accuracy achieved despite the different network
setup, background noise, and applications used demonstrates
that the flow features chosen by AdaBoost to identify SSL
applications are generalizable enough to recognize traffic on
previously unseen datasets.

VI. CONCLUSION

In conclusion, the investigation into classifying applications
encrypted by SSL achieved promising results using flow-based
statistics and ML algorithms. This was accomplished without
employing IP addresses, port numbers, or packet payloads. To
this end, we have generated and captured data sets in a lab
environment and benchmarked the performances of AdaBoost,
C4.5, RIPPER, and Naive Bayesian learning algorithms to
identify SSL traffic.
The generated dataset not only represented real network
traffic without imposing restrictions on SSL encryption
algorithms, but also ensured the ground truth during the labeling
process, and included the use of tunnels to increase the overall
entropy of the dataset. We have made the data set public to
the research community to encourage further research in the
area. Additionally, investigation on the sizes of the training set
helped to further optimize based on FPRA. The comparison of
the results from the multiple class runs against recent literature
as well as a popular traffic analysis tool further solidifies the
methodology taken.

For SSL vs Non-SSL, AdaBoost proved to have the best
classification performance with a 96% classification accuracy
and a 4% SSL FPR. In the case of the native SSL vs SSL-Tunnel,
a modified version of AdaBoost using C4.5 decision trees
instead of decision stumps performed the best with a 98%
classification accuracy and 0.6% SSL FPR. Finally, the ML
algorithm best able to distinguish the Non-SSL vs SSL-Tunnel
class run was RIPPER, achieving a 99% classification accuracy
and 0.5% SSL-Tunnel FPR. In general, it was found that
AdaBoost maintained the highest overall performance across
all the different class runs. While the training set sizes varied
amongst all the runs, it can be seen that without adequate
representation from an application instance, the classifiers will
be unable to perform well during testing. Comparing these
results to the Wireshark traffic analysis tool confirmed the
inadequacy of payload inspection methods and the inability
to rely on port numbers for classification.

There are several areas for future work, given that this work is
only investigative in nature. To this end, exploring alternative
ML algorithms and other data sets are the next natural
directions. Furthermore, with the changes implemented between
IPv4 and IPv6, it would be interesting to see how much (if at
all) the results achieved in this work may be affected. Further
investigation on parameter sensitivity and the effect it has
on the ML algorithms may offer more explanation as to why
the classifiers acted the way they did.
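
The accuracy, FPR, and recall figures summarized above and reported in Tables V and VI can be related through a two-class confusion matrix. The sketch below assumes that DR denotes the fraction of all flows labeled correctly and that the FPR of a class is the fraction of the other class's flows assigned to it; the function name and the counts are hypothetical. Under these definitions the FPR of one class equals one minus the recall of the other, which agrees with the table rows (for example, an SSL-Tunnel FPR of 0.05 alongside a Non-SSL recall of 0.95).

```python
def two_class_metrics(tp_a, fn_a, tp_b, fn_b):
    """Metrics for classes A and B from hypothetical confusion counts.

    tp_a: class A flows labeled A; fn_a: class A flows labeled B.
    tp_b: class B flows labeled B; fn_b: class B flows labeled A.
    """
    total = tp_a + fn_a + tp_b + fn_b
    return {
        "dr": (tp_a + tp_b) / total,        # fraction of all flows labeled correctly
        "recall_a": tp_a / (tp_a + fn_a),
        "recall_b": tp_b / (tp_b + fn_b),
        "fpr_a": fn_b / (tp_b + fn_b),      # class B flows wrongly labeled A
        "fpr_b": fn_a / (tp_a + fn_a),      # class A flows wrongly labeled B
    }


# Made-up counts: 1000 SSL-Tunnel flows (class A) and 19000 Non-SSL flows (class B).
m = two_class_metrics(tp_a=1000, fn_a=0, tp_b=18050, fn_b=950)
```

With a test set dominated by one class, as in this made-up example, the overall rate tracks the recall of the majority class, so the per-class FPR and recall are worth reading alongside it.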
ACKNOWLEDGMENT

This work was supported by NSERC. Our thanks to Dana
Echtner, Jeff Allen, Krista Skodje, and TARA for providing
us the lab environment to generate our data sets. This
research was conducted at the Dalhousie NIMS Laboratory,
http://www.cs.dal.ca/projectx.

REFERENCES

[1] R. Alshammari and A. N. Zincir-Heywood, “Machine learning based encrypted traffic classification: Identifying ssh and skype,” IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 8–8, 2009.
[2] R. Alshammari and A. Zincir-Heywood, “Investigating two different approaches for encrypted traffic classification,” Proceedings - 6th Annual Conference on Privacy, Security and Trust, pp. 156–166, 2008.
[3] R. Alshammari and A. N. Zincir-Heywood, “Generalization of signatures for ssh encrypted traffic identification,” Proceedings - IEEE Symposium on Computational Intelligence in Cyber Security, pp. 174–174, 2009.
[4] A. Yamada, Y. Miyake, K. Takemori, A. Studer, and A. Perrig, “Intrusion detection for encrypted web accesses,” 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia, vol. 2, pp. 569–576, 2007.
[5] A. Este, F. Gringoli, and L. Salgarelli, “Support vector machines for tcp traffic classification,” Computer Networks, vol. 53, no. 14, pp. 2476–2490, 2009.
[6] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, “Offline/realtime traffic classification using semi-supervised learning,” Performance Evaluation, vol. 64, no. 9-12, pp. 1194–1213, 2007.
[7] A. Callado, J. Kelner, D. Sadok, C. A. Kamienski, and S. Fernandes, “Better network traffic identification through the independent combination of techniques,” Journal of Network and Computer Applications, vol. 33, no. 4, pp. 433–446, 2010.
[8] T. R. Bernaille, Laurent, “Early recognition of encrypted applications,” Passive and Active Network Measurement, vol. 4427, pp. 165–175, 2007.
[9] M. Trojnara, “Stunnel: Ssl tunnel,” www.stunnel.org. Accessed June 2010.
[10] J. Viega, P. Chandra, and M. Messier, Network Security with Openssl, ISBN: 059600270X, O’Reilly & Associates, Inc., 2002.
[11] T. IETF, Accessed June 2010, http://www.ietf.org/rfc/rfc2246.txt.
[12] D. M. Nicol and N. Schear, “Models of privacy preserving traffic tunneling,” Simulation, vol. 85, no. 9, pp. 589–607, 2009.
[13] N. Schear and D. M. Nicol, “Performance analysis of real traffic carried with encrypted cover flows,” Workshop on Principles of Advanced and Distributed Simulation, pp. 80–87, 2008.
[14] L. Qing and L. Yaping, “Analysis and comparison of several algorithms in ssl/tls handshake protocol,” ITCS ’09: Proceedings of the 2009 International Conference on Information Technology and Computer Science, pp. 613–617, 2009.
[15] L. Zhao, R. Iyer, S. Makineni, and L. Bhuyan, “Anatomy and performance of ssl processing,” IEEE International Symposium on Performance Analysis of Systems and Software, pp. 197–206, 2005.
[16] F. Allard, R. Dubois, P. Gompel, and M. Morel, “Tunneling activities detection using machine learning techniques,” NATO Research and Technology Organization Symposium on Information Assurance and Cyber Defence, 2010.
[17] A. B. Mohd and D. S. bin Mohd Nor, “Towards a flow-based internet traffic classification for bandwidth optimization,” International Journal of Computer Science and Security, vol. 3, pp. 146–153, 2009.
[18] R. Yuan, Z. Li, X. Guan, and L. Xu, “An svm-based machine learning method for accurate internet traffic classification,” Information Systems Frontiers, vol. 12, pp. 149–156, 2010.
[19] M. Soysal and E. G. Schmidt, “Machine learning algorithms for accurate flow-based network traffic classification: Evaluation and comparison,” Performance Evaluation, vol. 67, no. 6, pp. 451–467, 2010.
[20] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, “Detecting http tunnels with statistical mechanisms,” IEEE International Conference on Communications, pp. 6162–6168, 2007.
[21] M. Dusi, M. Crotti, F. Gringoli, and L. Salgarelli, “Tunnel hunter: Detecting application-layer tunnels with statistical fingerprinting,” Computer Networks, vol. 53, no. 1, pp. 81–97, 2009.
[22] R. Alshammari, A. Zincir-Heywood, and A. Farrag, “Performance comparison of four rule sets: An example for encrypted traffic classification,” Privacy, Security, Trust and the Management of e-Business, 2009, pp. 21–28, 2009.
[23] R. Alshammari and A. N. Zincir-Heywood, “Investigating two different approaches for encrypted traffic classification,” Proceedings - IEEE Conference on Privacy, Security and Trust, pp. 156–166, 2008.
[24] A. Montigny-Leboeuf, “Flow attributes for use in traffic characterization,” Journal of CRC Technical Note, CRCTN-2005-003 Ottawa, ON, Canada, 2005.
[25] N. Williams, S. Z, and G. Armitage, “A preliminary performance comparison of five machine learning algorithms for practical ip traffic flow classification,” Computer Communication Review, vol. 30, 2006.
[26] D. J. Barrett and R. E. Silverman, SSH, The Secure Shell: The Definitive Guide, ISBN: 978-0-596-00011-0, O’Reilly & Associates, Inc., 2001.
[27] C. V. Wright, F. Monrose, and G. M. Masson, “On inferring application protocol behaviors in encrypted network traffic,” Journal of Machine Learning Research, vol. 7, no. 12, pp. 2745–2769, 2006.
[28] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The weka data mining software: an update,” SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, 2009.
[29] A. Dupay, S. Sengupta, O. Wolfson, and Y. Yemini, “Netmate: A network management environment,” Network, IEEE, vol. 5, no. 2, pp. 35–40, 43, Mar. 1991.
[30] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in Proceedings of the Second European Conference on Computational Learning Theory. London, UK: Springer-Verlag, 1995, pp. 23–37.
[31] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, ISBN: 0137903952, Pearson Education, 2003.
[32] J. R. Quinlan, C4.5: programs for machine learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[33] W. W. Cohen, “Fast effective rule induction,” in Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, 1995, pp. 115–123.
