2018
1 TABLE OF CONTENTS
2 Acknowledgements
3 Abstract
4 Introduction
5 Related work
6 Methodology, Architecture and Design
6.1 IntelMQ
6.2 IntelMQ Manager
6.3 Scripts
6.4 General overview of the study
7 Evaluation and Results
7.1 Requirements
7.1.1 Hardware requirements
7.1.2 Software requirements
7.2 Installation
7.3 Utilization of the software
7.3.1 Command line
7.3.2 IntelMQ Manager
7.3.3 Management
7.3.4 Monitor
7.4 Obtaining File Output:
7.4.1 Command lines
7.4.2 IntelMQ Manager
7.5 Statistical Analysis
7.5.1 Map of the world
7.5.2 Pie charts and tables
8 Conclusion
9 Declaration
10 References
11 Appendix
11.1 Configuration files of IntelMQ:
11.1.1 Runtime.conf
11.1.2 Pipeline.conf
11.2 GenerateMap.py
11.3 ISO3166-1-Alpha-2.txt
11.4 GenerateChart.py
2 ACKNOWLEDGEMENTS
I would like to use this section to express my warm thanks to the University of
Alicante, which gave me the opportunity to study at Athlone Institute of Technology
through the Erasmus programme during the 2017/2018 academic year. I am also
thankful to my whole family for supporting my decision to study abroad. In
particular, I would like to make a special mention of my parents, Antonio and Pilar,
and my little sister, Nuria, for their advice and counsel during this project.
3 ABSTRACT
Cybercrime activity has been growing over the years and there is no evidence that
this trend will stop in the future. This obliges an organization's cybersecurity team
to strengthen its defences in order to avoid serious damage in a connected world.
Nowadays, several external sources publish large amounts of up-to-date data about
cyber threats that an organization's cybersecurity team can draw on. However, the
data held by these sources are quite heterogeneous and are presented to the cyber
analyst in different formats (text files, HTML pages, CSV files…) and structures. This
makes the study and analysis of the threat feeds difficult. For this reason, a
mechanism is needed to correlate these threat feeds.
This paper therefore describes how to integrate and correlate the data obtained
from a few external cyber threat sources using a tool called IntelMQ. We then
produce visualizations using scripts written in Python 3 with specific frameworks
and libraries in order to extract results, evaluations and conclusions.
4 INTRODUCTION
The connected electronic information network has become an integral part of our
lives. All kinds of organizations (financial, medical and educational institutions,
governments…) use the network to collect, process, store and share large amounts
of digital information such as bank accounts, passwords, private documents,
contracts or personal identities, among others. Protecting and controlling this data
is crucial to guarantee the privacy and safety of users on the Internet. Cybersecurity
plays an important role here because it studies how to protect the systems
connected to the network from unauthorized use or harm. Relatedly, Cyber Threat
Intelligence (CTI) is based on services and sets of organised files with in-depth
information about specific threats, which provide reports and analysis to users
through external feeds or security feeds. This paper presents the study,
analysis and correlation of some of the existing cyber threats using a
platform that obtains the data from these security feeds.
Studies confirm that cyberattacks have increased in recent years because attackers
are finding new ways to target networks in order to access, change, destroy, extort
or interrupt digital data over the Internet. As an example, the number of
ransomware attacks increased by 300% in 2016 compared with the previous year,
when 1,000 ransomware attacks were seen per day [1]. As for bot activity, Symantec
observed an increase of 6.7 million hosts in 2016 [2].
Moreover, new attacks are mostly distributed and are reported by different tools,
and each report may look like normal activity when seen individually. The main
problem is that multiple alerts must be correlated to raise an alarm about an actual
attack.
The interest in this field derives from the recognition that it is impossible to stop
technically advanced adversaries without foreknowledge of their intentions and
methods.
The main goal of this paper is therefore to gather alerts from diverse Threat
Intelligence resources and perform a statistical analysis in which the data
is collected, examined, summarized, manipulated and interpreted to
discover patterns, trends, relationships or underlying causes. The tool that
we focus on to obtain the cyber threat data is IntelMQ. It is a solution for CERTs
(Computer Emergency Response Teams) for collecting and processing security
feeds using a message queue protocol [3]. After this, we create a couple of scripts to
analyse and study the obtained data graphically: a map of the world that indicates
where the attacks come from, tables and charts. The results depend on the
parameters that the user passes to these scripts.
The rest of this paper is organized as follows. Section five presents the Related
Work, which describes other studies on the correlation of attacks. Section six
explains the Methodology, Architecture and Design, describing this work
through diagrams of the system components and their functionality. Section seven
covers the Evaluation and Results, which are essential to understand the
achievements obtained and the proposal itself. Section eight concludes the whole
study. It is followed by a Declaration chapter stating that this is original
research. The document also has a Bibliography, which lists the links used in
this research. The paper ends with the Appendix section, which contains the code
used to perform our study.
5 RELATED WORK
There are few studies on the correlation of cyber threats. Most of the literature
discusses spam features [4], and only a few studies also include other types of
malicious traffic. The author of [5] analyses alert reports from various detection
systems, such as honeypots and flow-based traffic analysis systems, deployed in a
local network, the National Research and Education Network of the Czech Republic
(CESNET NREN). The study splits the alerts by source and attack type (scanning
activity, brute force, web accesses to honeypots, SYN flood attacks…) into
individual datasets and analyses them from two perspectives. The first concerns the
time correlation of alerts: the authors ask whether it is usual for the same IP
address to be detected and reported as malicious repeatedly, and how long it takes
for such an address to be reported again. Here the study shows that the observation
of more than one report from the same IP address is probably affected by dynamic
address assignment, which means that a single malicious host may use several IP
addresses. The second perspective concerns the correlation between individual types
of alerts, i.e. how many addresses from one dataset group are also found in another
group, where datasets are grouped by type of malicious traffic. The results show that
the characteristics of malicious traffic known from blacklists and other sources also
hold when observing traffic in a local network.
structured presentation of interconnected and linked objects in order to reveal
correlations. Then, the output of the program is a filtered structured dataset, which
is clustered based on common linked patterns from all involved sources. The result
of the study demonstrates that the deLink method facilitates the detection of
correlations in evidence existing on the hard drives of multiple machines.
Most of the related work relies on its own mechanisms, such as local networks or
specific methods, to perform correlations. In contrast, this research gathers alerts
from external security feeds and performs some data mining to extract correlation
patterns using IntelMQ, an open source platform that is easy to handle and
manipulate. Since it is open source, the software is available to the user for use and
modification from its original design. The Python scripts then give us a better way
to analyse and study the results obtained from this program.
6 METHODOLOGY, ARCHITECTURE AND DESIGN
Information about cyber threats, and its storage, is very important in cybersecurity
management. As soon as incidents and vulnerabilities are detected, a management
process starts that creates plenty of information which must be known and
processed in the shortest possible time. These data come from several tracking
methods used by cybersecurity organizations (private or public). Processing and
sharing this information is therefore a critical aspect of cybersecurity management.
Nowadays, the cybersecurity community offers diverse, up-to-date information
resources about cyber threats. However, the ways in which they are delivered to the
user vary widely, and integrating them can be difficult for platforms that want to
automate their processing. The sources that we use in our study are the following:
1. http://www.abuse.ch: It offers feeds that track the threats associated with
the malicious code of ZeuS, Palevo, SpyEye and Feodo.
1.1. ZeuS is a Trojan that runs on versions of Microsoft Windows. It is
used to steal banking information by logging keystrokes in the browser
and grabbing forms.
1.2. Palevo is a worm that affects computers running a Windows
operating system. Once a computer has been infected, it becomes part
of a network of bots controlled remotely by a central node. It can be
used to carry out a multitude of criminal activities, for example denial
of service (DoS) attacks. Another feature is its ability to block security
software.
1.3. SpyEye is a banking Trojan that allows an attacker to create a botnet
very easily and collect sensitive data from its victims.
1.4. Feodo is another banking Trojan which can record sensitive user data
such as bank access credentials, card details and credentials for
additional services such as PayPal or Amazon. When the victim
accesses the online banking site, and before the data is transmitted
over HTTPS, the Trojan saves the data in plain text; it is then collected
and sent to the attacker.
2. http://malwaredomains.lehigh.edu/: It is promoted by the North American
Lehigh University and provides information about various kinds of
malicious code.
3. http://www.malwaredomainlist.com: It is a non-commercial initiative that
provides lists of domains related to malicious code: phishing, fraud, Trojans,
ransomware…
4. http://www.phishtank.com: An open initiative for reporting the URLs of
phishing sites.
5. http://malc0de.com/dashboard/: It provides a large database of malware,
malicious websites and spam as text files.
6. https://www.spamhaus.org/: The Spamhaus Project is a non-profit
organization that tracks spam, phishing, malware and botnets, providing
real-time, highly accurate, actionable threat intelligence.
In order to provide a better way to analyse the data held on each website, we use a
platform whose functionality is based on graphs. A graph is a model of
interconnected data in which the connections are just as important as the data
elements themselves. Graphs are modelled as nodes, edges and properties. Many
technologies exist to work with graphs, including graph databases, graph analytics
and graph visualization libraries. Because a graph is a visual model of data, it is
accessible to non-scientists and can convey a deeper understanding of the
information. Graphs are used in cyber security and cyber intelligence, anti-fraud,
government and intelligence because the data is very complex for the following
reasons:
1. Large: for big organizations, storing years of raw data means a large number
of pieces of information.
2. Unstructured: the data comes from different sources, may be incomplete
and evolves. Therefore, it is hard to apply a structured data model.
3. Dynamic: the IT systems generate new data constantly.
As an example, the next two images are screenshots of data representing malicious
sites. The data is published on two different websites, Spamhaus and Abuse.ch. As
we can see, the data is given in different formats (text file and URL) and with
different values (IP and website):
Security teams use graphs to extract insights from complex data and to look at it
from distinct points of view. From the analytical point of view, graphs help to analyse
large datasets to find interesting data. From the visualization point of view, they help
users to interpret the data and, therefore, make smart decisions. In this context, the
platform that we use in this research is IntelMQ.
6.1 INTELMQ
IntelMQ is an open source program, which means that the user can modify the
source of the program. Moreover, it is a solution for Information Technology (IT)
security teams (CSIRTs, CERTs, abuse departments…) to collect and process
security feeds from the different external sources using a message queueing system.
In our case, we use Redis, an in-memory database engine based on hash table
(key/value) storage. Basically, IntelMQ processes the data mostly automatically,
ensures its accuracy, enriches it (AS, geolocation) and filters it in order to collect and
process threat intelligence.
The design of this platform was influenced by AbuseHelper, a tool that also allows
receiving and redistributing threat and abuse feeds, but IntelMQ was coded from
scratch to reduce the complexity and the losses when the tool performs the
correlation. Moreover, it provides an easy way to store the results, to create your
own blacklists and to communicate with other systems, for example through a
RESTful API. Its configuration files are written in JSON so that they are easy to
understand.
Each bot owns one source queue and can have several destination queues. Output
bots, however, do not have destination queues. Moreover, multiple bots can write to
the same queue, so the next bot receives multiple inputs. Every bot runs in a
separate process and is identified by a unique identifier, called the bot ID. We can
currently execute multiple processes of the same bot in parallel with different bot
IDs.
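The queue behaviour described above can be illustrated with a minimal in-memory sketch. It is not IntelMQ code: the `queues` dictionary stands in for the Redis broker, and the helper names `send` and `receive` and the queue name `parser-queue` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for the message broker (the study uses Redis).
queues = defaultdict(deque)


def send(destination_queues, message):
    """Write one message to every destination queue of a bot."""
    for name in destination_queues:
        queues[name].append(message)


def receive(source_queue):
    """Take the oldest message from a bot's single source queue, or None if empty."""
    pending = queues[source_queue]
    return pending.popleft() if pending else None


# Two collectors write to the same queue, so the next bot sees both inputs.
send(["parser-queue"], "report from collector A")
send(["parser-queue"], "report from collector B")

first = receive("parser-queue")
second = receive("parser-queue")
print(first, "|", second)
```

The sketch only shows the ordering and fan-in of messages; separate processes and bot IDs are left out.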
[Figure: directory structure of /opt/intelmq. The /etc folder holds defaults.conf, harmonization.conf, pipeline.conf and runtime.conf; the /var folder holds the .log and .dump files, the .pid files and /bots/file-output/events.txt.]
The /var folder is used by the program to store the file output where the cyber
threats are written; the log files and dump files are stored in its /log directory, and
the process ID (PID) of each bot is stored in its /run folder. We describe these files
and folders below:
/etc/ directory
Defaults.conf
It contains the default values for all bots: their behaviour, error handling and
logging options. The following listing shows these values:
{
    "accuracy": 100,
    "broker": "redis",
    "destination_pipeline_db": 2,
    "destination_pipeline_host": "127.0.0.1",
    "destination_pipeline_password": null,
    "destination_pipeline_port": 6379,
    "error_dump_message": true,
    "error_log_exception": true,
    "error_log_message": true,
    "error_max_retries": 3,
    "error_procedure": "pass",
    "error_retry_delay": 15,
    "http_proxy": null,
    "http_timeout_max_tries": 3,
    "http_timeout_sec": 30,
    "http_user_agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
    "http_verify_cert": true,
    "https_proxy": null,
    "load_balance": false,
    "log_processed_messages_count": 500,
    "log_processed_messages_seconds": 900,
    "logging_handler": "file",
    "logging_level": "DEBUG",
    "logging_path": "/opt/intelmq/var/log/",
    "logging_syslog": "/dev/log",
    "proccess_manager": "intelmq",
    "rate_limit": 0,
    "source_pipeline_db": 2,
    "source_pipeline_host": "127.0.0.1",
    "source_pipeline_password": null,
    "source_pipeline_port": 6379
}
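To make the error-handling values in this listing more concrete (error_max_retries, error_retry_delay and error_procedure), the following is a minimal sketch of a retry loop. It is not taken from IntelMQ: the function `process_with_retries` is hypothetical, and we assume that the "pass" procedure means giving up on the message and moving on once the retries are exhausted.

```python
import time


def process_with_retries(process, message, max_retries=3, retry_delay=15,
                         sleep=time.sleep):
    """Call process(message); on failure, retry up to max_retries more times.

    Returns True on success and False when every attempt failed
    (assumed meaning of the "pass" procedure: skip the message).
    """
    for attempt in range(1 + max_retries):
        try:
            process(message)
            return True
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
            if attempt < max_retries:
                sleep(retry_delay)  # wait before the next retry
    return False


# Hypothetical bot step that fails twice before succeeding.
calls = []


def flaky(message):
    calls.append(message)
    if len(calls) < 3:
        raise ValueError("temporary failure")


# The sleep function is replaced so that the example does not really wait.
ok = process_with_retries(flaky, "event", sleep=lambda seconds: None)
print(ok, len(calls))
```

The injectable `sleep` argument is only a convenience for trying the sketch without waiting 15 seconds between attempts.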
Value: Description
broker: Selects which broker to use. In this case, the value is "redis".
destination_pipeline_db, source_pipeline_db: The broker database that the bot uses to connect and exchange messages. This is a requirement for the Redis broker. The value that we used is 2 for both attributes.
destination_pipeline_host, source_pipeline_host: The broker IP, FQDN or Unix socket that the bot uses to connect and send messages. The value is 127.0.0.1.
destination_pipeline_password, source_pipeline_password: The broker password that the bot uses to connect and exchange messages. It can be null for an unprotected broker. The value is null.
destination_pipeline_port, source_pipeline_port: The broker port that the bot uses to connect and exchange messages. Its value can be null for a Unix socket. The value is 6379.
error_dump_message: If the value is true, the bot writes queued-up messages to its dump file. The dump file is used to inspect possible errors when we run the botnet.
error_log_exception: If the value is true and an exception occurs when we run the botnet, the error report is written to the log file.
error_log_message: If the value is true and an error occurs when we run the botnet, the error report is written to the log file.
error_max_retries: If there is an error, the bot tries to restart processing the current message up to the number of times defined here. In this case, the value is 3.
error_procedure: If there is an error, this option defines the procedure that the bot adopts. The value here is "pass".
error_retry_delay: An integer that defines the number of seconds to wait between subsequent retries if there is an error. The value in this case is 15.
http_proxy: The HTTP proxy that the bot uses when performing HTTP requests (for instance, in bots/collectors/collector_http.py). Since this field is null, this parameter does not affect our results.
http_timeout_max_tries: The number of times that the bot tries to connect when there is a timeout. In this case, the value is 3.
http_timeout_sec: The timeout in seconds. In this case, the value is 30.
http_user_agent: The user-agent string that the bot uses when performing HTTP/HTTPS requests. The configured string mentions Mozilla, AppleWebKit, Chrome and Safari.
http_verify_cert: If the value is true, the bot verifies SSL certificates when performing HTTPS requests.
https_proxy: The HTTPS proxy that the bot uses when performing secure HTTPS requests. Since this field is null, this parameter does not affect our results.
load_balance: Chooses the behaviour of the queue. If the value is true, messages are split across the destination queues without duplication; if the value is false, each message is duplicated into every queue.
log_processed_messages_count: The bot logs how many messages it has processed every time this many messages have been handled, 500 in this case.
log_processed_messages_seconds: The bot logs how many messages it has processed at least every this many seconds, 900 in this case.
logging_handler: There are two options: "file" or "syslog".
logging_level: The system-wide log level used by all bots and the intelmqctl tool. The possible values are "CRITICAL", "ERROR", "WARNING", "INFO" and "DEBUG".
logging_path: Only applies when logging_handler is "file". It defines the system-wide log/ folder used by all bots and the intelmqctl tool. The default value is /opt/intelmq/var/log/.
logging_syslog: Only applies when logging_handler is "syslog". Either a list with the hostname and UDP port of the syslog service, or a device path; the value in the listing above is "/dev/log".
rate_limit: An integer that indicates the time interval (in seconds) between processing messages. In our study, the time interval is 0.
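The two behaviours of load_balance described above can be sketched as follows. This is an illustration under assumptions, not IntelMQ's implementation: the function `distribute` is hypothetical, and round-robin order is assumed for the splitting case because the description only says that messages are split without duplication.

```python
from itertools import cycle


def distribute(messages, destination_queues, load_balance):
    """Return {queue name: [messages]} following the load_balance description."""
    result = {name: [] for name in destination_queues}
    if load_balance:
        # Split: each message goes to exactly one queue (round-robin is assumed).
        for message, name in zip(messages, cycle(destination_queues)):
            result[name].append(message)
    else:
        # Duplicate: every queue receives a copy of every message.
        for message in messages:
            for name in destination_queues:
                result[name].append(message)
    return result


messages = ["m1", "m2", "m3"]
split = distribute(messages, ["q1", "q2"], load_balance=True)
duplicated = distribute(messages, ["q1", "q2"], load_balance=False)
print(split)
print(duplicated)
```

With the value false used in this study, every destination queue therefore receives all three messages.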
Runtime.conf
It contains the configuration of the individual bots through specific fields.
Each bot defined here corresponds to a node in the graph.
Structure:
{
    "<bot ID>": {
        "group": "<bot type (Collector, Parser, Expert, Output)>",
        "name": "<human-readable bot name>",
        "module": "<bot code (python module)>",
        "description": "<generic description of the bot>",
        "parameters": {
            "<parameter 1>": "<value 1>",
            "<parameter 2>": "<value 2>",
            "<parameter 3>": "<value 3>"
        }
    }
}
Examples:
"abusech-feodo-ip-collector": {
    "parameters": {
        "feed": "Abuse.ch Feodo IP",
        "provider": "Abuse.ch",
        "http_url": "https://feodotracker.abuse.ch/blocklist/?download=ipblocklist",
        "http_url_formatting": false,
        "http_username": null,
        "http_password": null,
        "ssl_client_certificate": null,
        "rate_limit": 129600
    },
    "name": "Generic URL Fetcher",
    "group": "Collector",
    "module": "intelmq.bots.collectors.http.collector_http",
    "description": "Abuse.ch Feodo IP",
    "enabled": true,
    "run_mode": "continuous"
},
"Abusech-IP-Parser": {
    "parameters": {},
    "name": "Abuse.ch IP",
    "group": "Parser",
    "module": "intelmq.bots.parsers.abusech.parser_ip",
    "description": "Abuse.ch IP Parser is the bot responsible to parse the report and sanitize the information.",
    "enabled": true,
    "run_mode": "continuous"
},
The previous listing shows the values that each bot can have. Firstly, the group
attribute indicates whether the bot is a Collector, Parser, Expert or Output.
The name field gives a human-readable name that identifies the bot, and the
module field gives the Python module that contains the bot code. Moreover, there
can be a description to provide details about the bot.
After that, we can add additional parameters such as SSL certificates, the HTTP
username and password, or the URL formatting. In some cases, the bots are
configured in continuous run mode so that they are always running and collect
data constantly. In addition, if the value of enabled is true, the bot is started
when we start the whole botnet. To disable a bot, we change the value of this
attribute to false. This file is included in the Appendix section and in the remote
repository:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/runtime.conf
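As an illustration of how these attributes can be read, the following sketch parses a shortened runtime.conf and lists the bots that would be started, grouped by type. The helper `enabled_bots_by_group` and the third bot entry are hypothetical, and treating a missing enabled attribute as true is an assumption.

```python
import json

# Shortened configuration; the last entry is a hypothetical bot added for the example.
runtime_conf = json.loads("""
{
    "abusech-feodo-ip-collector": {
        "group": "Collector",
        "module": "intelmq.bots.collectors.http.collector_http",
        "enabled": true
    },
    "Abusech-IP-Parser": {
        "group": "Parser",
        "module": "intelmq.bots.parsers.abusech.parser_ip",
        "enabled": true
    },
    "example-disabled-expert": {
        "group": "Expert",
        "module": "example.module",
        "enabled": false
    }
}
""")


def enabled_bots_by_group(conf):
    """Group the IDs of enabled bots by their group attribute."""
    groups = {}
    for bot_id, bot in conf.items():
        if bot.get("enabled", True):  # assumption: a missing flag means enabled
            groups.setdefault(bot["group"], []).append(bot_id)
    return groups


started = enabled_bots_by_group(runtime_conf)
print(started)
```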
Pipeline.conf
It defines the source and the destination queues per bot (the edges of the graph).
Structure:
"<bot ID>": {
    "source-queue": "<source pipeline name>",
    "destination-queues": [
        "<first destination pipeline name>",
        "<second destination pipeline name>",
        ...
    ]
},
Example:
"abusech-feodo-ip-collector": {
    "destination-queues": [
        "Abusech-IP-Parser-queue"
    ]
},
"Abusech-IP-Parser": {
    "source-queue": "Abusech-IP-Parser-queue"
}
We can observe the contents of the whole file in the Appendix section and through
this website:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/pipeline.conf
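Since pipeline.conf describes the edges of the graph, the edges can be derived by matching each destination queue with the bot that reads from it. The following sketch assumes the structure shown in the example above; the function `pipeline_edges` is hypothetical and is not part of IntelMQ.

```python
# The example from above, written as a Python dictionary.
pipeline_conf = {
    "abusech-feodo-ip-collector": {
        "destination-queues": ["Abusech-IP-Parser-queue"],
    },
    "Abusech-IP-Parser": {
        "source-queue": "Abusech-IP-Parser-queue",
    },
}


def pipeline_edges(conf):
    """Return (writer bot, reader bot) pairs, one per connected queue."""
    # Map each queue name to the bot that uses it as its source queue.
    reader = {
        bot["source-queue"]: bot_id
        for bot_id, bot in conf.items()
        if "source-queue" in bot
    }
    edges = []
    for bot_id, bot in conf.items():
        for queue in bot.get("destination-queues", []):
            if queue in reader:
                edges.append((bot_id, reader[queue]))
    return edges


edges = pipeline_edges(pipeline_conf)
print(edges)
```

For the two bots of the example, the only edge goes from the collector to the parser.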
Harmonization.conf
This file specifies the fields for all the message types. The harmonization library
loads this configuration to check whether the values conform to the harmonization
format. This file is maintained by the IntelMQ platform.
Structure:
{
    "<message type>": {
        "<field 1>": {
            "description": "<field 1 description>",
            "type": "<field value type>"
        },
        "<field 2>": {
            "description": "<field 2 description>",
            "type": "<field value type>"
        }
    }
}
Example:
"feed.accuracy": {
    "description": "A float between 0 and 100 that represents how accurate the data in the feed is",
    "type": "Accuracy"
},
"feed.name": {
    "description": "Name for the feed, usually found in collector bot configuration.",
    "type": "String"
},
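The kind of check that the harmonization configuration enables can be sketched as follows. This is not the IntelMQ harmonization library: the validator functions, the `VALIDATORS` table, the function `invalid_fields` and the message type name "event" are assumptions made for the example, and only the two field types shown above are covered.

```python
# Excerpt of a harmonization configuration; the message type "event" is assumed.
harmonization = {
    "event": {
        "feed.accuracy": {"description": "Accuracy of the feed", "type": "Accuracy"},
        "feed.name": {"description": "Name for the feed", "type": "String"},
    }
}


def is_accuracy(value):
    """A number between 0 and 100 (booleans are rejected)."""
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0 <= value <= 100)


def is_string(value):
    """A non-empty text value."""
    return isinstance(value, str) and value != ""


# Hypothetical mapping from type names to validator functions.
VALIDATORS = {"Accuracy": is_accuracy, "String": is_string}


def invalid_fields(message_type, message):
    """List the fields that are unknown or whose value fails its type check."""
    spec = harmonization[message_type]
    bad = []
    for field, value in message.items():
        field_spec = spec.get(field)
        if field_spec is None or not VALIDATORS[field_spec["type"]](value):
            bad.append(field)
    return bad


good = invalid_fields("event", {"feed.accuracy": 100, "feed.name": "Abuse.ch Feodo IP"})
bad = invalid_fields("event", {"feed.accuracy": 150, "unknown.field": "x"})
print(good, bad)
```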
BOTS
It contains the configuration hints for all the bots. This file can be accessed through
this URL:
https://github.com/jgfc1/ThesisRepository/blob/master/IntelMQ/BOTS
/var/ directory
events.txt
This is the file where the correlated cyber threats are written, so that we can
produce visualizations and analyse the data. We will see an example of this file in
the next chapter.
.pid files
These files each contain a number that identifies the active process of a bot.
Each bot has a distinct functionality. Let's walk through each of them:
Collector bots
Abusech-Zeus-Domainblocklist-Collector: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
PhishTank-Collector: http://data.phishtank.com/data/online-valid.csv
Malware-Domain-List-Collector: http://www.malwaredomainlist.com/hostslist/mdlcsv.php
Malc0de-Windows-Format-Collector: http://malc0de.com/bl/BOOT
Parser bots
Expert bots
Url2fqdn-expert: The bot responsible for parsing the FQDN from the URL.
GethostbyName-2-expert, GetHostByName-1-expert: The bots responsible for obtaining the IP from the FQDN.
Cymru-Whois-Expert: The bot responsible for adding network information to the events (BGP, ASN, AS Name, Country, etc.).
Output bots
File-output: The bot responsible for sending events to a file.
The files described above relate to the graph as follows: each bot (node) is defined
in the runtime.conf file, and each relationship between bots (edge) is defined in
the pipeline.conf file.
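To illustrate what the first two kinds of expert bot do, the following sketch extracts the FQDN from a URL and resolves an FQDN to an IP address using only the Python standard library. It is not the code of the IntelMQ bots: the function names, the sample URL and the field names source.url and source.fqdn are assumptions made for the example.

```python
import socket
from urllib.parse import urlparse


def url_to_fqdn(url):
    """Return the host name part of a URL (the Url2fqdn step)."""
    return urlparse(url).hostname


def fqdn_to_ip(fqdn):
    """Resolve a host name to an IPv4 address, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(fqdn)
    except OSError:
        return None


# Hypothetical event with an assumed field name for the URL.
event = {"source.url": "http://malware.example.com/path/payload.exe"}
event["source.fqdn"] = url_to_fqdn(event["source.url"])
print(event["source.fqdn"])

# The resolution step needs network access, so it is only shown as a call:
# event["source.ip"] = fqdn_to_ip(event["source.fqdn"])
```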
6.3 SCRIPTS
Once we have obtained the file output, the next step is to produce visualizations,
using charts or maps of the world, that indicate the number of attacks identified, in
order to analyse better the external threat feeds selected for this research. For that,
we have created two scripts in Python using object-oriented programming. These
scripts are stored in a remote repository (GitHub) that we can access here:
https://github.com/jgfc1/ThesisRepository and they are included in the Appendix
section as well. The scripts take the file output generated with IntelMQ as their
input.
First of all, we have a class called ‘Struct’, which provides two variables: the
taxonomy and an integer (count). There is a constructor, plus functions that
return the values of these variables (getTaxonomy and getCount).
The “Chart” class is used to generate the graph itself. The methods of this
class are:
a. getFileEvents: it returns the name of the fileEvents.
b. obtainDistinctTaxonomy: it eliminates duplicates from the list of
taxonomies.
c. countTaxonomy: this function counts the taxonomies in the list.
d. printTaxonomy: this function prints each taxonomy and its number of
occurrences.
e. getOcurrences: this function counts the times that a specific
attribute appears in the file events.txt.
f. loadData: it stores the data generated.
g. createChart: it generates the graph itself with the bubbles around it.
2. generateMap.py: this script creates a map of the world with a number of
circles of varying size. The size represents how many times each country
appears in the file output of IntelMQ. The program takes the country codes in
ISO 3166-1 alpha-2 format (e.g. IE is the country code of Ireland and ES is the
country code of Spain) that IntelMQ has identified for each cyber threat in the
file output, and compares each code with another file, iso3166-1-alpha-2.txt,
which lists the code, latitude, longitude and name of each country. After that,
it takes the latitude and longitude of each country identified and draws it on
the map of the world as a circle. The size of the circle indicates the number of
cyber threats in each country (the bigger the circle, the more attacks). Some
countries may have no attacks.
For that, we have used folium, which allows adding bubbles to a map where
each bubble has a size related to a specific value. In addition, the program
writes another output file, output.txt, which lists the number of attacks per
country in descending order. The following picture illustrates the class
diagram that the script follows to perform its functionality and produce the
map of the world with the points:
In this case, we have a class named “Country” with the attributes
name, count, latitude and longitude, plus its constructor. The class “Map”, in turn,
has the attributes “fileEvents” and “fileCodeISOCountries”, both
strings. Its methods are:
1. getFileEvents: it returns the name of the events file (fileEvents).
2. getFileCodeISOCountries: it returns the file with the names of the countries
and their longitude and latitude (geolocation).
3. obtainDistinctCountries: it returns a new list of the countries without
duplicates.
4. countCountries: it counts the countries in the list.
5. printCountriesCount: it prints the name of each
country and its number of occurrences in the command prompt.
6. getOcurrences: it counts the times that each country
appears in the file events.txt.
7. loadData: it builds a data frame (data table) with the points to
show on the map.
8. createMap: it generates the map itself with red bubbles
on it.
9. obtainLongitudeLatitude: it obtains the longitude and latitude of each
country identified using the file iso3166-1-alpha-2.txt. This file is
included in the Appendix section.
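The lookup of coordinates and the descending ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the layout of iso3166-1-alpha-2.txt is assumed to be comma-separated (code, latitude, longitude, name), and the function names load_coordinates and build_points are hypothetical.

```python
import csv
import io


def load_coordinates(text):
    """Parse rows of the assumed form: code,latitude,longitude,name."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        code, lat, lon, name = row[0], row[1], row[2], row[3]
        table[code.strip().upper()] = (float(lat), float(lon), name.strip())
    return table


def build_points(counts, coordinates):
    """Return (name, count, latitude, longitude) tuples, most frequent first.

    Country codes without an entry in the coordinate table are dropped.
    """
    points = []
    for code, count in counts.items():
        if code in coordinates:
            lat, lon, name = coordinates[code]
            points.append((name, count, lat, lon))
    points.sort(key=lambda point: point[1], reverse=True)
    return points


if __name__ == "__main__":
    # Hypothetical coordinates and counts, for illustration only.
    coords = load_coordinates("IE,53.0,-8.0,Ireland\nES,40.0,-4.0,Spain\n")
    for name, count, lat, lon in build_points({"ES": 3, "IE": 5}, coords):
        print(name, count, lat, lon)
```

Each resulting tuple could then be handed to a mapping library such as folium to draw one circle whose radius grows with the count.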
6.4 GENERAL OVERVIEW OF THE STUDY
In summary, the following picture represents the general schema that we
follow in this study. First of all, we take diverse external sources of
threat feeds. Then, we use the IntelMQ platform, which is based on graphs, in
order to correlate the external feeds that we have selected. Next, we use the
output of the platform (events.txt), which contains all the correlated threat feeds,
as the input of two Python scripts in order to perform a statistical analysis and some
data visualization. Finally, we provide an evaluation of the results and a
conclusion of the study:
7 EVALUATION AND RESULTS
7.1 REQUIREMENTS
IntelMQ and the Python scripts have specific hardware and software requirements
that must be met in order to install them and make them work properly.
In this context, we are going to use Ubuntu 16.04 LTS to install the platform.
7.2 INSTALLATION
Once the requirements have been met, we should install the necessary
dependencies by typing the following commands in the command prompt:
apt-get install python3 python3-pip
apt-get install git build-essential libffi-dev
apt-get install python3-dev
apt-get install redis-server
apt install python3-pip python3-dnspython python3-psutil python3-redis python3-requests python3-termstyle python3-tz
apt install git redis-server
sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/sebix:/intelmq/xUbuntu_18.04/ /' > /etc/apt/sources.list.d/home:sebix:intelmq.list"
wget -nv https://download.opensuse.org/repositories/home:sebix:intelmq/xUbuntu_18.04/Release.key -O Release.key
Note that the previous commands can take some time. At this point, we have
successfully downloaded and installed the files and folders that IntelMQ needs
(/etc and /var). However, extra commands are necessary to install the
graphical interface (IntelMQ Manager). For that, we have to install the following
dependencies:
After this, we should type the following lines in the command prompt:
sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/sebix:/intelmq/xUbuntu_18.04/ /' > /etc/apt/sources.list.d/home:sebix:intelmq.list"
wget -nv https://download.opensuse.org/repositories/home:sebix:intelmq/xUbuntu_18.04/Release.key -O Release.key
We will be asked for a username and a password during the installation. After this,
IntelMQ Manager is installed on our computer. At this point, we will be able to
access each directory using the command prompt.
Moreover, we can access the platform from a web browser by typing localhost:
7.3 UTILIZATION OF THE SOFTWARE
7.3.1.1 Intelmqctl
Intelmqctl is the main tool for managing the platform. We will focus on the basic
actions that this command provides (start, stop, status, restart,
reload and list):
7.3.1.1.3 List bots
• intelmqctl list bots: it shows the ids of the bots that are in the current
botnet.
7.3.1.1.5 Help
The option intelmqctl -h provides a list of the commands
that we can perform:
$ intelmqctl -h
usage: intelmqctl [-h] [-v] [--type {text,json}] [--quiet]
{list,check,clear,log,run,help,start,stop,restart,reload,status,enable,disable}
...
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--type {text,json}, -t {text,json}
choose if it should return regular text or other
machine-readable
--quiet, -q Quiet mode, useful for reloads initiated scripts like
logrotate
subcommands:
{list,check,clear,log,run,help,start,stop,restart,reload,status,enable,disable}
list Listing bots or queues
check Check installation and configuration
clear Clear a queue
log Get last log lines of a bot
run Run a bot interactively
check Check installation and configuration
help Show the help
start Start a bot or botnet
stop Stop a bot or botnet
restart Restart a bot or botnet
reload Reload a bot or botnet
status Status of a bot or botnet
enable Enable a bot
disable Disable a bot
Starting a bot:
intelmqctl start bot-id
Stopping a bot:
intelmqctl stop bot-id
Reloading a bot:
intelmqctl reload bot-id
Restarting a bot:
intelmqctl restart bot-id
Get status of a bot:
intelmqctl status bot-id
Run a bot directly for debugging purpose and temporarily leverage the logging level
to DEBUG:
intelmqctl run bot-id
Get a pdb (or ipdb if installed) live console.
intelmqctl run bot-id console
See the message that waits in the input queue.
intelmqctl run bot-id message get
See additional help for further explanation.
intelmqctl run bot-id --help
Get a list of all queues:
intelmqctl list queues
If -q is given, only queues with more than one item are listed.
Clear a queue:
intelmqctl clear queue-id
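When these commands are issued from a script rather than typed by hand, the command line can be assembled programmatically. The following is a minimal sketch that only builds the argument list for the bot actions listed above; the helper name intelmqctl_command is hypothetical and the bot id is an example.

```python
# Bot actions taken from the intelmqctl help output shown above.
BOT_ACTIONS = ("start", "stop", "restart", "reload", "status")


def intelmqctl_command(action, bot_id=None):
    """Build the argument list for an intelmqctl bot action.

    Without a bot id the action applies to the whole botnet.
    """
    if action not in BOT_ACTIONS:
        raise ValueError("unsupported action: " + action)
    argv = ["intelmqctl", action]
    if bot_id is not None:
        argv.append(bot_id)
    return argv


if __name__ == "__main__":
    for action in ("status", "reload"):
        print(" ".join(intelmqctl_command(action, "file-output")))
```

The returned list can be passed to subprocess.run on a machine where IntelMQ is installed; building a list instead of a single string avoids shell quoting problems.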
7.3.1.2 Intelmqdump
When bots fail due to programming errors or bad input, they can write the
problematic message to a .dump file. As we have explained previously, these dump
files are saved in JSON format at /opt/intelmq/var/log/[botid].dump.
In this context, intelmqdump is an interactive tool that shows these
dump files and the number of dumps per file. The following screenshot
shows an example of the output of this tool:
The number indicates how many messages had been dumped at the time we executed the
program. In this example, there are 1287 dumped messages (bad input data) for the
bot abusech-domain-parser. In particular, most of the errors occur because the
program cannot obtain, from the data that we are using, a parameter defined in the
expert bots.
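The number of dumps per file can also be obtained with a few lines of Python. This sketch rests on an assumption about the file layout: that a .dump file is a single JSON object whose top-level keys each identify one dumped message. The function name count_dumps is hypothetical.

```python
import json


def count_dumps(dump_text):
    """Return the number of dumped messages in the text of a .dump file.

    Assumes one JSON object with one top-level key per dumped message;
    an empty file counts as zero dumps.
    """
    if not dump_text.strip():
        return 0
    return len(json.loads(dump_text))


if __name__ == "__main__":
    # Hypothetical dump content with two entries.
    sample = '{"entry-1": {"message": "a"}, "entry-2": {"message": "b"}}'
    print(count_dumps(sample))
```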
7.3.1.2.1 Help
We can obtain a list of the actions that this tool can perform
by typing intelmqdump -h:
$ intelmqdump -h
usage:
intelmqdump [botid]
intelmqdump [-h|--help]
intelmqdump can inspect dumped messages, show, delete or reinject them into
the pipeline. It's an interactive tool, directly start it to get a list of
available dumps or call it with a known bot id as parameter.
positional arguments:
botid botid to inspect dumps of
optional arguments:
-h, --help show this help message and exit
$ intelmqdump
id: name (bot id) content
0: abusech-domain-parser 1287 dumps
4. About: this place is used to learn and read more about the project’s goals and
the contributions.
7.3.2.1 Configuration
This view allows us to perform operations (add, edit, clear, delete) on the nodes and edges.
Moreover, we can redraw the botnet, clear the configuration and save all the changes
that we have made:
7.3.2.1.1 Add:
We are going to add nodes (collector, parser, expert and output) to the IntelMQ
platform using IntelMQ Manager in order to create the whole botnet that we are
going to use in our study.
7.3.2.1.1.1.1 Spamhaus-Drop-Collector
"provider": "Spamhaus",
"rate_limit": 3600,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},
7.3.2.1.1.1.2 Abusech-Feodo-Ip-Collector
The only attributes that we should change are the id, description, feed,
provider and http_url. Therefore, we will provide the following table:
Generic Id abusech-feodo-ip-collector
Provider Abuse.ch
Http_url https://feodotracker.abuse.ch/blocklist/?download=ipblocklist
7.3.2.1.1.1.3 Abusech-Zeus-Baddomains-Collector
Generic Id abusech-zeus-baddomains-collector
Description Generic URL Fetcher is the bot responsible to get the report
from an URL.
Runtime Feed Abuse.ch Zeus Collector
Provider Abuse.ch
Http_url https://zeustracker.abuse.ch/blocklist.php?download=baddomains
7.3.2.1.1.1.4 Abusech-Zeus-Domainblocklist-Collector
Generic Id abusech-zeus-domainblocklist
Description Zeus Tracker
Runtime Feed Abuse.ch Zeus Domain
Provider Abuse.ch
Http_url https://zeustracker.abuse.ch/blocklist.php?do
The runtime.conf file is updated:
"abusech-zeus-domainblocklist-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Domain Block List",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Zeus Tracker",
"enabled": true,
"run_mode": "continuous"
},
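Since runtime.conf is plain JSON, the collectors configured so far can be checked with a short script. The sketch below assumes the structure shown in the entry above (a "group" key and an "http_url" inside "parameters"); the function name collector_urls is hypothetical.

```python
import json


def collector_urls(runtime_conf_text):
    """Map each collector bot id to its http_url (None if not set)."""
    config = json.loads(runtime_conf_text)
    urls = {}
    for bot_id, bot in config.items():
        if bot.get("group") == "Collector":
            urls[bot_id] = bot.get("parameters", {}).get("http_url")
    return urls


if __name__ == "__main__":
    # Reduced copy of the entry shown above, wrapped in braces to form valid JSON.
    sample = """
    {
      "abusech-zeus-domainblocklist-collector": {
        "parameters": {
          "http_url": "https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist",
          "rate_limit": 129600
        },
        "group": "Collector",
        "enabled": true
      }
    }
    """
    for bot_id, url in collector_urls(sample).items():
        print(bot_id, url)
```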
7.3.2.1.1.1.5 PhishTank-Collector
Generic Id Phishtank-collector
Description Generic URL Fetcher is the bot responsible to get the report
from an URL.
Runtime Feed Phishtank
Provider Phishtank-Collector
Http_url https://www.phishtank.com/developer_info.php
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the
report from an URL.",
"enabled": true,
"run_mode": "continuous"
},
7.3.2.1.1.1.6 Malware-Domain-List-Collector
Generic Id Malware-domain-list-collector
Http_url http://www.malwaredomainlist.com/mdl.php
7.3.2.1.1.1.7 Malc0de-Windows-Format-Collector
Generic Id Malc0de-windows-format-collector
Description Generic URL Fetcher is the bot responsible to get the report
from an URL
Runtime Feed Generic Fetcher
Provider Malc0de
Http_url http://malc0de.com/bl/BOOT
7.3.2.1.1.2 Adding parser bots:
We have to create a parser bot for each collector bot. For that, we click on the Add
button and we generate the parser bot using the graphical interface. The last step is
to create a relationship between these two nodes by clicking “add edge”.
7.3.2.1.1.2.1 Spamhaus-Drop-Parser
7.3.2.1.1.2.2 Abusech-Ip-Parser
7.3.2.1.1.2.3 Abusech-Domain-Parser
7.3.2.1.1.2.4 PhishTank-Parser
7.3.2.1.1.2.5 Malware-Domain-List-Parser
7.3.2.1.1.2.6 Malc0de-parser
The next image represents the bots that we have created so far:
7.3.2.1.1.3 Adding expert bots:
At this point, we are going to add the deduplicator-expert bot. For that, we click on
the menu located on the left and we select experts, deduplicator expert. Once we
have clicked, we accept the default configuration and we add relationships between
the parser nodes and the expert nodes that we have created:
7.3.2.1.1.3.1 Deduplicator-Expert
7.3.2.1.1.3.2 Taxonomy-Expert
7.3.2.1.1.3.3 Url2fqdn-expert
7.3.2.1.1.3.4 GethostbyName-2-expert and GethostbyName-1-expert
7.3.2.1.1.3.5 Cymru-Whois-Expert
The following picture represents the nodes that we have created so far:
The file runtime.conf is updated:
"file-output": {
"description": "File is the bot responsible to send events to a file.",
"group": "Output",
"module": "intelmq.bots.outputs.file.output",
"name": "File",
"parameters": {
"file": "/opt/intelmq/var/lib/bots/file-output/events.txt",
"hierarchical_output": false
},
"enabled": true,
"run_mode": "continuous"
},
The picture below corresponds to the graph that we have
generated so far:
The changes are also applied in the pipeline.conf file (we can see its contents in
the Appendix section).
7.3.3 Management
It allows us to manage the individual bots or the whole botnet
using the operations that we have described before: start, stop, restart and
reload. In addition, it lists all the current bots with their states. Initially, all the
7.3.4 Monitor
It shows graphically, as queues, the number of cyber threats that IntelMQ still has
to process:
7.4 OBTAINING FILE OUTPUT:
There are two ways to get the file output: from the command line or from IntelMQ Manager.
We can see the status of each bot by typing intelmqctl
status in the command line:
We can use the command prompt to watch the queues as well. In fact, the
following image shows part of the result that we can see in the command
prompt:
The file output that we are looking for is located in the directory /var/lib/bots/file-output.
Having said that, let’s run the whole botnet in order to obtain the file output and
perform some visualizations later on. For that, we click the start button on the
main management screen. Note that all the bots are shown in green, which indicates that
each bot is running:
If we click on the monitor button, we will see the queues per bot. Note that the
number indicates the number of lines that each bot still has to process in order to
perform the correlation:
This operation takes some time, because we are processing large amounts of data
from several external sources. Moreover, if we select one bot, we can observe its
log messages. The picture below shows an example of this, with the log
messages of abuse-domain-parser-queue:
As we can observe, there are no errors. After this, we obtain the file in the
folder /var/lib/bots/file-output, where we can look at each correlated cyber
threat. The file is also available on the remote repository and its URL is:
https://github.com/jgfc1/ThesisRepository/blob/master/Map%20World/events.txt
The following listing shows an example entry of the file, whose name is events.txt:
{"feed.accuracy": 100.0, "feed.name": "Abuse.ch Zeus Bad Domains", "feed.provider":
"Abuse.ch", "feed.url":
"https://zeustracker.abuse.ch/blocklist.php?download=baddomains", "time.observation":
"2018-04-15T16:37:27+00:00", "classification.taxonomy": "malicious code",
"classification.type": "c&c", "source.fqdn": "afobal.cl", "raw": "YWZvYmFsLmNs",
"malware.name": "zeus", "source.ip": "66.7.198.165", "source.asn": 33182,
"source.network": "66.7.192.0/19", "source.geolocation.cc": "US", "source.registry":
"ARIN", "source.allocated": "2006-05-18T00:00:00+00:00", "source.as_name": "DIMENOC -
HostDime.com, Inc., US"}
In the previous example, we can observe the following parameters with their specific
values:
• Feed.accuracy: It is a decimal number between 0 and 100 that represents how
accurate the information obtained from the external source is.
• Feed.name: It corresponds to the name of the feed.
• Time.observation: It corresponds to the time at which the source bot saw the
event (threat).
• Classification.taxonomy: Cyber threats can be grouped using a specific
classification that the European Union Agency for Network and Information
Security has established.
• Classification.type: once the program has classified the threat feed using its
taxonomy, it defines the type. The following table illustrates these two levels of
classification:
Taxonomy: Malicious code
Types: Virus, Trojan, Worm, Spyware, Dialler, Rootkit
Description: Software that is included or inserted in a computer system to harm or damage the final user.

Taxonomy: Information Gathering
Types: Sniffing
Description: Recording and observing network traffic.

Taxonomy: Intrusion attempts
Types: Exploiting known vulnerabilities
Description: They try to disrupt a service or compromise a system by exploiting vulnerabilities with a specific identifier.

Taxonomy: Intrusions
Types: Privileged account compromise, Unprivileged account compromise
Description: These threats consist of a successful compromise of a system or application (service). In fact, it could have been caused remotely by a new vulnerability, unauthorized

Taxonomy: Availability
Types: DoS, DDoS, Sabotage, Outage (no malice)
Description: In this case, a system is bombarded with several packets. As a result, the operations could be delayed or the system could crash.

Taxonomy: Information Content
Types: Unauthorized access to information, Unauthorized
Description: This kind of attack intercepts and accesses information during transmission (wiretapping,

Taxonomy: Fraud
Types: Unauthorized use of resources
Description: It uses resources for unauthorized purposes. For

Taxonomy: Other
Description: The incidents which are not listed in one of the previous classes should be here.
• Source.fqdn: It corresponds to the DNS name related to the host from which the
connection originated.
• Raw: It is the original line of the event, encoded in base64.
connection.
• Source.network: It gives the network of the source in CIDR notation (BGP
prefix).
• Source.geolocation.cc: It provides the country code using the ISO 3166-1 alpha-2
format.
• Source.registry: It indicates the IP registry in which a given IP address is
allocated.
• Source.allocated: Allocation date corresponding to BGP prefix.
connection created.
• Event_description.target: It gives the target (organization) of an attack.
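The fields described above can be read directly from one line of events.txt. The following sketch uses a reduced copy of the example event shown earlier and decodes its raw field; the helper name decode_raw is hypothetical.

```python
import base64
import json

# Reduced copy of the example event shown above (one line of events.txt).
EVENT_LINE = (
    '{"feed.name": "Abuse.ch Zeus Bad Domains", '
    '"classification.taxonomy": "malicious code", '
    '"classification.type": "c&c", "source.fqdn": "afobal.cl", '
    '"raw": "YWZvYmFsLmNs", "source.geolocation.cc": "US"}'
)


def decode_raw(event):
    """Return the original feed line stored base64-encoded in the 'raw' field."""
    return base64.b64decode(event["raw"]).decode("utf-8", errors="replace")


if __name__ == "__main__":
    event = json.loads(EVENT_LINE)
    print(event["classification.taxonomy"], event["source.geolocation.cc"])
    print(decode_raw(event))  # prints "afobal.cl"
```

In this example the decoded raw value is the same domain that appears in source.fqdn, because the original feed line contained only the domain name.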
The result of the program is an HTML file that contains the map of the world with
bubbles or circles of a specific size. The file is called mymap.html and we can
view its content through this link:
https://github.com/jgfc1/ThesisRepository/blob/master/Map%20World/mymap.html
The above image illustrates that North America is the region from which the most
attacks originate. In fact, the United States is the country that produces the
largest share of cyber threats in our study (56.041%), followed within the region by
Canada (2.156%). In contrast, South America generates far fewer attacks, and
Brazil (1.257%) is the country that produces most of them.
In second place, we have Europe, where the Netherlands (6.667%), Germany
(2.898%), France (2.874%), Italy (1.995%) and Poland (1.411%) produce the most
cyber threats. Other countries such as Romania (1.147%), the United Kingdom
(1.131%), Ukraine (0.848%), Spain (0.457%) and Ireland (0.387%) produce
fewer attacks in comparison with the previous ones.
In addition to this, the Russian Federation (3.362%) produces a large quantity of cyber
threats, followed in Asia by Indonesia (1.061%), China (0.968%) and India (0.877%). In
fourth place, we have Oceania, in which Australia (2.005%)
produces the major part of the attacks. South Africa (0.454%) is the
country that generates the most attacks in Africa. The next table summarizes where the
attacks come from with their respective frequencies:
Position  Country (where the attack comes from)  Count $n_i$  $f_i = n_i / N$  $\% = f_i \cdot 100$
1 United States 43205 0,56041 56,041
2 Netherlands 5140 0,06667 6,667
3 Russian Federation 2592 0,03362 3,362
4 Germany 2234 0,02898 2,898
5 France 2216 0,02874 2,874
6 Canada 1662 0,02156 2,156
7 Australia 1546 0,02005 2,005
8 Italy 1538 0,01995 1,995
9 Poland 1088 0,01411 1,411
10 Brazil 969 0,01257 1,257
11 Romania 884 0,01147 1,147
12 United Kingdom 872 0,01131 1,131
13 Indonesia 818 0,01061 1,061
14 Turkey 810 0,01051 1,051
15 China 746 0,00968 0,968
16 Bulgaria 732 0,00949 0,949
17 India 676 0,00877 0,877
18 Ukraine 654 0,00848 0,848
19 Singapore 514 0,00667 0,667
20 Hong Kong 508 0,00659 0,659
21 Chile 432 0,00560 0,560
22 Vietnam 394 0,00511 0,511
23 Taiwan 372 0,00483 0,483
24 Spain 352 0,00457 0,457
25 South Africa 350 0,00454 0,454
26 Czech Republic 322 0,00418 0,418
27 Republic of Korea 322 0,00418 0,418
28 Sweden 308 0,00400 0,400
29 Ireland 298 0,00387 0,387
30 Switzerland 272 0,00353 0,353
31 Portugal 262 0,00340 0,340
32 Belarus 246 0,00319 0,319
33 Argentina 222 0,00288 0,288
34 Hungary 212 0,00275 0,275
35 Thailand 210 0,00272 0,272
36 Lithuania 198 0,00257 0,257
37 Japan 180 0,00233 0,233
38 Israel 172 0,00223 0,223
39 Bangladesh 166 0,00215 0,215
40 Malaysia 166 0,00215 0,215
41 Georgia 136 0,00176 0,176
42 Serbia 136 0,00176 0,176
43 Finland 124 0,00161 0,161
44 Peru 118 0,00153 0,153
45 Iran 108 0,00140 0,140
46 Latvia 106 0,00137 0,137
47 Denmark 100 0,00130 0,130
48 Iceland 98 0,00127 0,127
49 New Zealand 86 0,00112 0,112
50 Greece 78 0,00101 0,101
51 Slovenia 72 0,00093 0,093
52 Kenya 62 0,00080 0,080
53 Austria 62 0,00080 0,080
54 Luxembourg 62 0,00080 0,080
55 Kazakhstan 62 0,00080 0,080
56 Norway 56 0,00073 0,073
57 Croatia 54 0,00070 0,070
58 Mongolia 52 0,00067 0,067
59 Nigeria 50 0,00065 0,065
60 Tanzania 48 0,00062 0,062
61 Colombia 46 0,00060 0,060
62 United Arab Emirates 46 0,00060 0,060
63 Slovakia 44 0,00057 0,057
64 Belgium 44 0,00057 0,057
65 Panama 44 0,00057 0,057
66 Mexico 38 0,00049 0,049
67 Macedonia 24 0,00031 0,031
68 Estonia 24 0,00031 0,031
69 Mauritius 24 0,00031 0,031
70 Cyprus 22 0,00029 0,029
71 Gambia 18 0,00023 0,023
72 Moldova 18 0,00023 0,023
73 Egypt 16 0,00021 0,021
74 Morocco 14 0,00018 0,018
75 Ecuador 14 0,00018 0,018
76 Bosnia and Herzegovina 12 0,00016 0,016
77 Costa Rica 10 0,00013 0,013
78 Uruguay 8 0,00010 0,010
79 Saudi Arabia 8 0,00010 0,010
80 Pakistan 6 0,00008 0,008
81 Uzbekistan 6 0,00008 0,008
82 Azerbaijan 6 0,00008 0,008
83 El Salvador 6 0,00008 0,008
84 Sri Lanka 6 0,00008 0,008
85 Venezuela 4 0,00005 0,005
86 Albania 4 0,00005 0,005
87 Philippines 4 0,00005 0,005
89 Paraguay 4 0,00005 0,005
90 Palestinian Territory 4 0,00005 0,005
91 Mali 4 0,00005 0,005
92 Libyan Arab Jamahiriya 4 0,00005 0,005
93 Iraq 4 0,00005 0,005
94 Seychelles 2 0,00003 0,003
95 French Polynesia 2 0,00003 0,003
96 Barbados 2 0,00003 0,003
97 Belize 2 0,00003 0,003
98 Côte d'Ivoire 2 0,00003 0,003
99 Cuba 2 0,00003 0,003
100 Dominican Republic 2 0,00003 0,003
101 Antigua and Barbuda 2 0,00003 0,003
102 Guatemala 2 0,00003 0,003
103 Honduras 2 0,00003 0,003
104 Senegal 2 0,00003 0,003
105 Nepal 2 0,00003 0,003
106 Sudan 2 0,00003 0,003
107 Puerto Rico 2 0,00003 0,003
108 Kuwait 2 0,00003 0,003
Total  $\sum_{i=1}^{k} n_i = n_1 + n_2 + \dots + n_k = N$  77096  1  100
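The two derived columns of the table follow from the count of each country and the total N: the relative frequency is f_i = n_i / N and the percentage is f_i · 100. A minimal sketch of this arithmetic, using two counts taken from the table, is shown below; the function name frequency_row is hypothetical.

```python
def frequency_row(count, total):
    """Return (f_i, percentage) for an absolute count n_i and a total N."""
    if total <= 0:
        raise ValueError("total must be positive")
    relative = count / total
    return relative, relative * 100


if __name__ == "__main__":
    N = 77096  # total number of events in the table above
    for country, n_i in (("United States", 43205), ("Netherlands", 5140)):
        f_i, percent = frequency_row(n_i, N)
        print(f"{country}: f_i = {f_i:.5f}, % = {percent:.3f}")
```

For the United States this gives 43205 / 77096 ≈ 0.56041, that is, 56.041%, which matches the first row of the table.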
7.5.2 Pie charts and tables
The other script is used to generate pie charts and their tables. For that, we need to
execute the program in this way: python3 generateChart.py <parameter>
In order to obtain distinct results, we should pass to the program the parameter
that we want from the file output (feed provider, feed name, classification taxonomy,
classification type or the target of the attacks). If we introduce a parameter that
doesn’t exist in the events.txt file, the program will give an error to indicate that we
have written a wrong parameter. Moreover, if we don’t write any parameter, the
program will give an error indicating that we should introduce one.
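The two error cases described above (missing parameter and unknown parameter) can be sketched as follows. This is an illustration and not the code of generateChart.py: the function name parse_parameter is hypothetical, and the set of available fields is assumed to have been collected from the events file beforehand.

```python
def parse_parameter(argv, available_fields):
    """Return the requested field name or raise ValueError with a clear message.

    `argv` mimics sys.argv: the script name followed by its arguments.
    `available_fields` holds the field names found in the events file.
    """
    if len(argv) < 2:
        raise ValueError("a parameter is required, e.g. classification.type")
    parameter = argv[1]
    if parameter not in available_fields:
        raise ValueError("unknown parameter: " + parameter)
    return parameter


if __name__ == "__main__":
    fields = {"feed.provider", "feed.name", "classification.taxonomy",
              "classification.type", "event_description.target"}
    print(parse_parameter(["generateChart.py", "classification.type"], fields))
    try:
        parse_parameter(["generateChart.py"], fields)
    except ValueError as error:
        print("error:", error)
```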
Feed provider  Count $n_i$  $f_i = n_i / N$  $\% = f_i \cdot 100$
Phishtank 72664 0,882 88,2
Malc0de 4578 0,056 5,6
Spamhaus 3418 0,041 4,1
Abuse.ch 1642 0,020 2,0
Malware Domain List 114 0,001 0,1
Total  $\sum_{i=1}^{k} n_i = n_1 + n_2 + \dots + n_k = N$  82416  1  100,0
The image below represents a pie chart using the previous values of each feed
provider:
Feed name  Count $n_i$  $f_i = n_i / N$  $\% = f_i \cdot 100$
Abuse.ch Feodo IP 2582 0,031 3,1
Total  $\sum_{i=1}^{k} n_i = n_1 + n_2 + \dots + n_k = N$  82416  1  100
The following chart indicates the values that we have obtained from the previous
table:
Taxonomy  Count $n_i$  $f_i = n_i / N$  $\% = f_i \cdot 100$
Fraud 72664 0,882 88,2
Malicious code 8110 0,098 9,8
Abusive content 1642 0,020 2,0
Total  $\sum_{i=1}^{k} n_i = n_1 + n_2 + \dots + n_k = N$  82416  1  100
As we have seen before, malicious code can be divided into: virus, Trojan, worm,
spyware, dialler and rootkit. The next table shows the count of some of the
types of malicious code that we have obtained. To produce it, we should write the
following line in the command prompt:
python3 generateChart.py event_description.text
Note that the major part of the malware identified corresponds to Trojans.
Position Name of malware Count
1 Trojan.Ransom 332
2 Trojan 298
3 Script.Explot 238
4 Trojan.Zbot 234
5 Win32/FirseriaInstaller.C 198
6 VBS.Trojan.Downloader 156
7 Gateway to EK 132
8 Directs to exploits 126
9 Fake av 112
10 Trojan.FakeAlert 80
11 Leads to Trojan.Banload 68
12 Leads to exploit at jolygoestobeinvester.ru 68
13 Trojan 64
14 iframe on compromised site leads to EK 58
15 exploit kit 58
16 RFI 56
17 Exploit 56
18 Compromised site directs to exploits 52
19 Compromised site (DHL malspam campaign 48
20 Leads to exploit 46
21 Trojan.FakeFlash 40
22 malware calls home 36
23 Trojan.Downloader 32
24 iFrame.Exploit 32
25 Leads to ransomware 32
26 Trojan.Extension.Exploit 30
27 Win32/Trojan.Spy 30
28 Used by malspam to lead victims to Trojan.Banload 28
29 redirects to exploit kit 28
30 Trojan.Backdoor 26
31 Spyware.Zbot 24
32 Compromised site (Natwest malspam campaign 24
33 IE exploit 24
34 trojan OnlineGames 24
35 compromised site leads to exploit kit 22
36 P2PZeus.WebInject 22
37 trojan downloader 22
38 directs to rogue 22
39 Malvertisin 22
40 trojan Banker 20
41 obfuscated script directs to exploits 20
42 VBScript.Drive-b 20
43 Leads to Trojan.Zbot 20
44 Trojan.Zeus.GameOver 18
45 exploit 18
46 Trojan.Banker 18
47 SpyEye C&C 16
48 Trojan.Zeus.GO 16
49 Ransom WindowsSecurity 16
50 Worm.Autorun 16
We can see the entire document in the remote repository on GitHub at this
address:
https://github.com/jgfc1/ThesisRepository/blob/master/Pie%20Charts/output_classification_malware.txt
7.5.2.5 Classification type
In this case, it is necessary to type the following parameter:
python3 generateChart.py classification.type
In the same way, we can classify the cyber threats depending on their type. As we can
observe in the table below, 88.2% of the cyber threats
identified are associated with phishing attacks, 5.7% are related to malware,
4.1% are associated with c&c attacks and, finally, 2.0% are related to spam.
Classification type  Count $n_i$  $f_i = n_i / N$  $\% = f_i \cdot 100$
Phishing 72664 0,882 88,2
Malware 4692 0,057 5,7
C&C 3418 0,041 4,1
Spam 1642 0,020 2,0
Total  $\sum_{i=1}^{k} n_i = n_1 + n_2 + \dots + n_k = N$  82416  1  100
7.5.2.6 Target (organization) of the attacks
The parameter that we have to include in this case is:
python3 generateChart.py event_description.target
The next table lists the organizations that have suffered the threats that
we have identified in this research. Most of them are important companies whose
services people use day by day, such as payment and bank web sites (PayPal), social
networks (Facebook), cloud platforms (Dropbox), electronic commerce (eBay), email services…
Note that the attribute “Other” may correspond to firms that are not listed, i.e. small or
medium companies or organizations.
Netflix 84
United Services Automobile Association 70
Steam 70
DHL 60
ASB Bank Limited 54
Wells Fargo 50
Bank of America Corporation 50
ABSA Bank 48
LinkedIn 44
Allegro 36
WhatsApp 34
Orange 34
American Express 32
Cartasi 28
WalMart 26
Blockchain 26
Capitec Bank 24
NatWest Bank 24
Sulake Corporation 22
Caixa 22
Cielo 18
Visa 18
Hotmail 16
PNC Bank 16
Poste Italiane 16
HSBC Group 14
Australia and New Zealand Banking Group Limited 12
US Bank 12
RuneScape 12
Mastercard 10
Citibank 10
Twitter 10
Centurylink 10
Volksbanken Raiffeisenbanken 8
TD Canada Trust 8
Discover Bank 8
MyEtherWallet 6
Capital One 6
Citizens Bank 6
Vodafone 6
TAM Fidelidade 6
Orkut 6
Accurint 6
ING Direct 6
CareerBuilder 6
GitHub 6
American Greetings 6
Lloyds Bank 4
Tesco 4
Key Bank 4
Suncorp 4
Westpac 4
Metro Bank 4
CIMB Bank 4
PagSeguro 2
Western Union 2
British Telecom 2
Sky Financial 2
Live 2
Nordea Bank 2
Nets 2
Aetna Health Plans & Dental Coverage 2
Deutsche Bank 2
Halifax 2
Compass Bank 2
ArenaNet 2
Rackspace 2
BMO Financial 2
Discover Card 2
Craigslist 2
Binance 2
US Airways 2
UniCredit 2
World of Warcraft 2
Washington Mutual 2
Wachovia 2
EPPICard 2
American Airlines 2
Groupon 2
Alliance Bank 2
TSB 2
Salesforce 2
ZML 2
Smile Bank 2
Bitfinex 2
N 72662
8 CONCLUSION
The Internet is present in daily life. In fact, much of the information related to
people, organizations or companies is stored on the Internet: bank accounts,
financial and health records… In this interconnected world, cyberattacks have
increased in recent years because attackers are finding new ways to
target networks in order to access, change, destroy, extort or interrupt digital
pieces of information over the Internet. Cyber Threat Intelligence, which is a field of
cybersecurity, provides resources over the Internet that list malicious
software, bad domains or IPs, among others, so that cyber analysts
can know which attacks are currently taking place. Nevertheless,
the way the data is provided is quite heterogeneous, because the information is stored
in different digital formats (csv files, html pages, text files…) and structures. In this
context, it is quite useful to correlate these sources in order to
extract meaning from the data. For that, we have used an open source tool, named
IntelMQ, for collecting and processing external resources. The platform is based on
a graph (botnet) of nodes (bots) with relationships between them in order to
process each threat feed.
Once we executed the whole botnet, we obtained a file called
“events.txt”, which corresponds to the output of the program and in which the cyber
threats are correlated using specific attributes with concrete values. After that, we
produced some visualizations using a couple of scripts written in Python, such
as a map of the world with points in each country indicating the number of attacks, pie
charts and tables. Looking at the results obtained, we can conclude that
the major part of the attacks originated from North America and Europe, and that
the most common malware is the Trojan (in fact, we have shown that there were
many types of Trojans). Moreover, since the major part of the data is obtained
from the external resource PhishTank, the most common threats that we have
analysed are phishing and fraud.
This thesis provides an excellent learning opportunity to expand the knowledge of
cybersecurity by using a platform that cyber analysts employ to track cyber
threats from external sources over the Internet. Cyberattack activity has been
growing over the years and there is no evidence that this tendency will stop in the
future, so experience in the field of cybersecurity will be a great asset.
As future work, more bots could be integrated into the whole botnet. For
that, collector bots, parser bots and expert bots should be created and configured. For
instance, the integration of AlienVault into IntelMQ could be a baseline to detect
more types of attacks. AlienVault is a digital security management platform that
provides unified and coordinated security monitoring and management, security
event management, intelligence against continued security threats and multiple
security features in a single console.
9 DECLARATION
I hereby certify that the material presented in this thesis, which has been submitted
at Athlone Institute of Technology (Network Management and Cloud Infrastructure),
is entirely my own work and has not been submitted for any other academic assessment.
Future students may read and use this thesis to learn about the topic that it covers
or for future research.
10 REFERENCES
[1] Briana Gammons. 6 must-know cybersecurity statistics for 2017 from
https://blog.barkly.com/cyber-security-statistics-2017 [online: accessed 10 January 2018]
[2] Symantec. (2017). Internet Security Threat Report (ISTR) Government, vol. 22.
Retrieved September 17, 2017, from
https://www.symantec.com/content/dam/symantec/docs/reports/gistr22-government-report.pdf [online: accessed 22 January 2018]
[6] Anders Flaglien, Katrin Franke and Andre Arnes. Identifying Malware using
Cross-Evidence Correlation from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.666.4180&rep=rep1&type=pdf [online: accessed 5 March 2018]
[9] European Union Agency for Network and Information Security, from
https://www.enisa.europa.eu/topics/csirt-cert-services/community-projects/existing-taxonomies [online: accessed 27 April 2018]
[10] Raywood, Dan (April 24, 2015). "HP partner with AlienVault on Cyber Threat-Sharing
Initiative". ITProPortal.com. Retrieved November 8, 2015 from
https://www.itproportal.com/2015/04/22/hp-partner-alienvault-cyber-threat-sharing-initiative/ [online: accessed 24 April 2018]
[11] FireEye. What is Cybersecurity? Protecting your cyber assets and critical data
from https://www.fireeye.com/current-threats/what-is-cyber-security.html [online: accessed 28 April 2018]
[12] FireEye. Threat Intelligence: against cyber threats, knowledge is power from
https://www.fireeye.com/solutions/cyber-threat-intelligence.html [online: accessed 29 April 2018]
[13] Gary Hayslip. Cyber Threat Intelligence [CTI] from
https://www.csoonline.com/article/3234714/data-protection/cyber-threat-intelligence-cti-part-1.html [online: accessed 1 May 2018]
[14] Cyberpunk. Automate Incident Handling Process: IntelMQ from
https://n0where.net/automate-incident-handling-process-intelmq [online: accessed 3 May 2018]
11 APPENDIX
11.1 CONFIGURATION FILES OF INTELMQ:
11.1.1 Runtime.conf
{
"abusech-domain-parser": {
"description": "Abuse.ch Domain Parser is the bot responsible to parse the report and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_domain",
"name": "Abuse.ch Domain",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"cymru-whois-expert": {
"description": "Cymru Whois (IP to ASN) is the bot responsible to add network information to the events (BGP, ASN, AS Name, Country, etc.).",
"group": "Expert",
"module": "intelmq.bots.experts.cymru_whois.expert",
"name": "Cymru Whois",
"parameters": {
"redis_cache_db": 5,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous"
},
"deduplicator-expert": {
"description": "Deduplicator is the bot responsible for detection and removal of duplicate messages. Messages get cached for <redis_cache_ttl> seconds. If found in the cache, it is assumed to be a duplicate.",
"group": "Expert",
"module": "intelmq.bots.experts.deduplicator.expert",
"name": "Deduplicator",
"parameters": {
"filter_keys": "raw,time.observation",
"filter_type": "blacklist",
"redis_cache_db": 6,
"redis_cache_host": "127.0.0.1",
"redis_cache_password": null,
"redis_cache_port": 6379,
"redis_cache_ttl": 86400
},
"enabled": true,
"run_mode": "continuous"
},
"file-output": {
"description": "File is the bot responsible to send events to a file.",
"group": "Output",
"module": "intelmq.bots.outputs.file.output",
"name": "File",
"parameters": {
"file": "/opt/intelmq/var/lib/bots/file-output/events.txt",
"hierarchical_output": false
},
"enabled": true,
"run_mode": "continuous"
},
"malc0de-parser": {
"description": "Malc0de Parser is the bot responsible to parse the IP Blacklist and either Windows Format or Bind Format reports and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.malc0de.parser",
"name": "Malc0de",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"malc0de-windows-format-collector": {
"description": "",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malc0de Windows Format",
"parameters": {
"feed": "Generic URL Fetcher is the bot responsible to get the report from an URL.",
"http_password": null,
"http_url": "https://malc0de.com/bl/BOOT",
"http_username": null,
"provider": "Malc0de",
"rate_limit": 10800,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},
"malware-domain-list-collector": {
"parameters": {
"feed": "Malware Domain List",
"http_url": "http://www.malwaredomainlist.com/mdlcsv.php",
"provider": "Malware Domain List",
"rate_limit": 3600
},
"description": "Malware Domain List Collector is the bot responsible to get the report from source of information.",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Malware Domain List",
"enabled": true,
"run_mode": "continuous"
},
"malware-domain-list-parser": {
"description": "Malware Domain List Parser is the bot responsible to parse the report and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.malwaredomainlist.parser",
"name": "Malware Domain List",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"spamhaus-drop-collector": {
"description": "",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"name": "Spamhaus Drop",
"parameters": {
"feed": "Spamhaus Drop",
"http_password": null,
"http_url": "https://www.spamhaus.org/drop/drop.txt",
"http_username": null,
"provider": "Spamhaus",
"rate_limit": 3600,
"ssl_client_certificate": null
},
"enabled": true,
"run_mode": "continuous"
},
"spamhaus-drop-parser": {
"description": "Spamhaus Drop Parser is the bot responsible to parse the DROP, EDROP, DROPv6, and ASN-DROP reports and sanitize the information.",
"group": "Parser",
"module": "intelmq.bots.parsers.spamhaus.parser_drop",
"name": "Spamhaus Drop",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"taxonomy-expert": {
"description": "Taxonomy is the bot responsible to apply the eCSIRT Taxonomy to all events.",
"group": "Expert",
"module": "intelmq.bots.experts.taxonomy.expert",
"name": "Taxonomy",
"parameters": {},
"enabled": true,
"run_mode": "continuous"
},
"abusech-feodo-ip-collector": {
"parameters": {
"feed": "Abuse.ch Feodo IP",
"provider": "Abuse.ch",
"http_url":
"https://feodotracker.abuse.ch/blocklist/?download=ipblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Abuse.ch Feodo IP",
"enabled": true,
"run_mode": "continuous"
},
"Abusech-IP-Parser": {
"parameters": {},
"name": "Abuse.ch IP",
"group": "Parser",
"module": "intelmq.bots.parsers.abusech.parser_ip",
"description": "Abuse.ch IP Parser is the bot responsible to parse the report and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},
"abusech-zeus-domainblocklist-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Domain Block List",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Zeus Tracker",
"enabled": true,
"run_mode": "continuous"
},
"abusech-zeus-baddomains-collector": {
"parameters": {
"feed": "Abuse.ch Zeus Bad Domains",
"provider": "Abuse.ch",
"http_url":
"https://zeustracker.abuse.ch/blocklist.php?download=baddomains",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the report from an URL.",
"enabled": true,
"run_mode": "continuous"
},
"PhishTank-Parser": {
"parameters": {},
"name": "PhishTank",
"group": "Parser",
"module": "intelmq.bots.parsers.phishtank.parser",
"description": "PhishTank Parser is the bot responsible to parse the report and sanitize the information.",
"enabled": true,
"run_mode": "continuous"
},
"phishtank-collector": {
"parameters": {
"feed": "Phishtank csv",
"provider": "Phishtank ",
"http_url": "http://data.phishtank.com/data/online-valid.csv",
"http_url_formatting": false,
"http_username": null,
"http_password": null,
"ssl_client_certificate": null,
"rate_limit": 129600
},
"name": "Generic URL Fetcher",
"group": "Collector",
"module": "intelmq.bots.collectors.http.collector_http",
"description": "Generic URL Fetcher is the bot responsible to get the report from an URL.",
"enabled": true,
"run_mode": "continuous"
},
"url2fqdn-expert": {
"parameters": {
"overwrite": false
},
"name": "url2fqdn",
"group": "Expert",
"module": "intelmq.bots.experts.url2fqdn.expert",
"description": "url2fqdn is the bot responsible to parsing the fqdn from the url.",
"enabled": true,
"run_mode": "continuous"
},
"gethostbyname-1-expert": {
"parameters": {},
"name": "Gethostbyname",
"group": "Expert",
"module": "intelmq.bots.experts.gethostbyname.expert",
"description": "fqdn2ip is the bot responsible to parsing the ip from the fqdn.",
"enabled": true,
"run_mode": "continuous"
},
"gethostbyname-2-expert": {
"parameters": {},
"name": "Gethostbyname",
"group": "Expert",
"module": "intelmq.bots.experts.gethostbyname.expert",
"description": "fqdn2ip is the bot responsible to parsing the ip from the fqdn.",
"enabled": true,
"run_mode": "continuous"
}
}
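A runtime configuration of this size is easier to review when the bots are grouped by their "group" field. The following is a minimal sketch of such a check, not part of the original work: it uses a shortened excerpt of the file above instead of reading /opt/intelmq/etc/runtime.conf, and the function name bots_by_group is only illustrative.

```python
import json
from collections import defaultdict

# Excerpt of the runtime configuration above, reduced to the fields used here
RUNTIME_EXCERPT = """
{
  "spamhaus-drop-collector": {
    "group": "Collector",
    "module": "intelmq.bots.collectors.http.collector_http",
    "enabled": true
  },
  "spamhaus-drop-parser": {
    "group": "Parser",
    "module": "intelmq.bots.parsers.spamhaus.parser_drop",
    "enabled": true
  },
  "deduplicator-expert": {
    "group": "Expert",
    "module": "intelmq.bots.experts.deduplicator.expert",
    "enabled": true
  },
  "file-output": {
    "group": "Output",
    "module": "intelmq.bots.outputs.file.output",
    "enabled": true
  }
}
"""


def bots_by_group(runtime):
    """Return {group: sorted list of the enabled bot ids in that group}."""
    groups = defaultdict(list)
    for bot_id, conf in runtime.items():
        if conf.get("enabled", True):
            groups[conf["group"]].append(bot_id)
    return {group: sorted(ids) for group, ids in groups.items()}


if __name__ == "__main__":
    runtime = json.loads(RUNTIME_EXCERPT)
    for group, ids in bots_by_group(runtime).items():
        print(group, ids)
```

Loading the file with json.loads also acts as a syntax check, since a description string that is broken across two lines would make the load fail.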
11.1.2 Pipeline.conf
{
"Abusech-IP-Parser": {
"source-queue": "Abusech-IP-Parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"PhishTank-Parser": {
"source-queue": "PhishTank-Parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"abusech-domain-parser": {
"source-queue": "abusech-domain-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"abusech-feodo-ip-collector": {
"destination-queues": [
"Abusech-IP-Parser-queue"
]
},
"abusech-zeus-baddomains-collector": {
"destination-queues": [
"abusech-domain-parser-queue"
]
},
"abusech-zeus-domainblocklist-collector": {
"destination-queues": [
"abusech-domain-parser-queue"
]
},
"cymru-whois-expert": {
"source-queue": "cymru-whois-expert-queue",
"destination-queues": [
"file-output-queue"
]
},
"deduplicator-expert": {
"source-queue": "deduplicator-expert-queue",
"destination-queues": [
"taxonomy-expert-queue"
]
},
"file-output": {
"source-queue": "file-output-queue"
},
"gethostbyname-1-expert": {
"source-queue": "gethostbyname-1-expert-queue",
"destination-queues": [
"cymru-whois-expert-queue"
]
},
"gethostbyname-2-expert": {
"source-queue": "gethostbyname-2-expert-queue",
"destination-queues": [
"cymru-whois-expert-queue"
]
},
"malc0de-parser": {
"source-queue": "malc0de-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"malc0de-windows-format-collector": {
"destination-queues": [
"malc0de-parser-queue"
]
},
"malware-domain-list-collector": {
"destination-queues": [
"malware-domain-list-parser-queue"
]
},
"malware-domain-list-parser": {
"source-queue": "malware-domain-list-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"phishtank-collector": {
"destination-queues": [
"PhishTank-Parser-queue"
]
},
"spamhaus-drop-collector": {
"destination-queues": [
"spamhaus-drop-parser-queue"
]
},
"spamhaus-drop-parser": {
"source-queue": "spamhaus-drop-parser-queue",
"destination-queues": [
"deduplicator-expert-queue"
]
},
"taxonomy-expert": {
"source-queue": "taxonomy-expert-queue",
"destination-queues": [
"url2fqdn-expert-queue"
]
},
"url2fqdn-expert": {
"source-queue": "url2fqdn-expert-queue",
"destination-queues": [
"gethostbyname-1-expert-queue",
"gethostbyname-2-expert-queue"
]
}
}
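The pipeline above can be read as a graph in which every destination queue is the source queue of exactly one bot. As an illustration only, the sketch below follows the queues from a collector down to the output. It works on an excerpt of the configuration (the PhishTank branch), and the function name downstream_paths is hypothetical.

```python
# Excerpt of the pipeline configuration above (PhishTank branch only)
PIPELINE_EXCERPT = {
    "phishtank-collector": {
        "destination-queues": ["PhishTank-Parser-queue"]},
    "PhishTank-Parser": {
        "source-queue": "PhishTank-Parser-queue",
        "destination-queues": ["deduplicator-expert-queue"]},
    "deduplicator-expert": {
        "source-queue": "deduplicator-expert-queue",
        "destination-queues": ["taxonomy-expert-queue"]},
    "taxonomy-expert": {
        "source-queue": "taxonomy-expert-queue",
        "destination-queues": ["url2fqdn-expert-queue"]},
    "url2fqdn-expert": {
        "source-queue": "url2fqdn-expert-queue",
        "destination-queues": ["gethostbyname-1-expert-queue",
                               "gethostbyname-2-expert-queue"]},
    "gethostbyname-1-expert": {
        "source-queue": "gethostbyname-1-expert-queue",
        "destination-queues": ["cymru-whois-expert-queue"]},
    "gethostbyname-2-expert": {
        "source-queue": "gethostbyname-2-expert-queue",
        "destination-queues": ["cymru-whois-expert-queue"]},
    "cymru-whois-expert": {
        "source-queue": "cymru-whois-expert-queue",
        "destination-queues": ["file-output-queue"]},
    "file-output": {
        "source-queue": "file-output-queue"},
}


def downstream_paths(pipeline, start):
    """Return every chain of bots from `start` to a bot without destination queues."""
    # Map each queue name to the bot that reads from it
    queue_owner = {
        conf["source-queue"]: bot
        for bot, conf in pipeline.items()
        if "source-queue" in conf
    }
    paths = []

    def walk(bot, path):
        destinations = pipeline[bot].get("destination-queues", [])
        if not destinations:
            paths.append(path)
            return
        for queue in destinations:
            next_bot = queue_owner[queue]
            if next_bot in path:
                continue  # guard against accidental loops in the configuration
            walk(next_bot, path + [next_bot])

    walk(start, [start])
    return paths


if __name__ == "__main__":
    for path in downstream_paths(PIPELINE_EXCERPT, "phishtank-collector"):
        print(" -> ".join(path))
```

For the PhishTank collector the sketch reports two paths, one through each gethostbyname expert, and both end at file-output. If every destination queue receives its own copy of a message, this fan-out after url2fqdn-expert would deliver each event to the output twice, which is worth checking when interpreting the counts.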
11.2 GENERATEMAP.PY
"""
Author: "Javier Gombao Fernandez-Calvillo"
eMail: a00248414@student.ait.ie
College: Athlone Institute of Technology (AIT)
Subject: Final Project
File: generateMap.py
The purpose of this script is to obtain the names of all the countries which
IntelMQ has detected as a source of cyber threats
Variables (lists):
nameCountries: it provides the name of the country. Examples: Ireland, Spain,
England, France...
values: it defines the number of times the country appears in the result
file given by IntelMQ
latitude: it defines the latitude value
longitude: it defines the longitude value
"""
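# Import modules
import folium
import pandas as pd
import matplotlib.pyplot as plt

# Lists used by the methods below (assumed to be defined at module level,
# as described in the header of the script)
countries = []
nameCountries = []
values = []
latitude = []
longitude = []


class Country():
    """This is a struct which defines the code of the country and its occurrences"""

    # latitude and longitude are optional so that Country(code, count) still works
    def __init__(self, country, count, latitude=None, longitude=None):
        self.country = country
        self.count = count
        self.latitude = latitude
        self.longitude = longitude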
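class Map(object):
    # Constructor and getFileEvents() are not shown in the listing; they are
    # assumed here from the call Map("events.txt", "iso3166-1-alpha-2.txt")
    def __init__(self, fileEvents, fileCodeISOCountries):
        self._fileEvents = fileEvents
        self._fileCodeISOCountries = fileCodeISOCountries

    def getFileEvents(self):
        return self._fileEvents

    """It gets the file with the name of the countries and their longitude and
    latitude (geolocation)"""
    def getFileCodeISOCountries(self):
        return self._fileCodeISOCountries

    """This function will print the code of each country and its occurrences"""
    def printCountriesCount(self, countriesCountList):
        for i in countriesCountList:
            print(i.country, i.count)

    """This function will count the occurrences that the program reads from the file
    events.txt"""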
    def getOcurrencesCountry(self):
        search_name = "source.geolocation.cc"
        country_list = []
        try:
            with open(self.getFileEvents()) as attacks:
                for attack in attacks:
                    # We check that there is a geolocation available in the line
                    if search_name in attack:
                        for attribute in attack.split(", "):
                            g = attribute.split(": ")
                            # We are looking for the attribute
                            # "source.geolocation.cc" in each line:
                            for a in g:
                                if "\"source.geolocation.cc\"" == a:
                                    # We obtain the code of the country
                                    # without its surrounding quotes
                                    country_list.append(g[1][1:-1])
            distinct_countries = self.obtainDistinctCountries(country_list)
            # We insert the countries and their occurrences in the class 'Country'
            for c in distinct_countries:
                countries.append(Country(c, self.countCountries(c, country_list)))
        except FileNotFoundError:
            print("Error: File not found.")

    """This function makes a data frame with points to show on the map"""
    def loadData(self):
        data = pd.DataFrame({
            'name': nameCountries,
            'nº attacks': values,
            'lat': latitude,
            'lon': longitude
        })
        return data

    """It generates the map itself with the bubbles on it"""
    def createMap(self):
        # Sort the dataframe's rows by reports, in descending order:
        data = self.loadData().sort_values(by='nº attacks', ascending=False)
        # In the output file we can see the country, the occurrences and the
        # values of latitude and longitude
        # (FILEOUTPUT and the folium map m are defined outside this listing)
        with open(FILEOUTPUT, "w") as file:
            file.write(str(data))
        # Save it as html
        m.save('mymap.html')

    """It obtains the longitude and latitude using the file iso3166-1-alpha-2.txt"""
    def obtainLongitudeLatitude(self):
        try:
            with open(self.getFileCodeISOCountries()) as lines:
                for line in lines:
                    attributes = line.split(",")
                    # Country code without its surrounding quotes
                    code = attributes[0][1:-1]
                    # The file stores the latitude in the second column and
                    # the longitude in the third one
                    lat = attributes[1]
                    lon = attributes[2]
                    # Country name without its quotes and the trailing newline
                    name = attributes[3].strip()[1:-1]
        except FileNotFoundError:
            print("Error: File not found.")
s = Map("events.txt", "iso3166-1-alpha-2.txt")
s.getOcurrencesCountry()
s.obtainLongitudeLatitude()
s.createMap()
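The listing above extracts the country code by splitting each line on ", " and ": ", and the construction of the folium map itself is not shown. A minimal sketch of both steps follows. It assumes that every line of events.txt is one JSON object with flat keys such as "source.geolocation.cc", which matches a File output bot with hierarchical_output set to false. The function names, the radius limits and the sample events are illustrative and are not taken from the original script.

```python
import json
import math
from collections import Counter


def count_country_codes(lines, key="source.geolocation.cc"):
    """Count how often each country code appears in JSON-encoded events."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip empty or malformed lines
        if isinstance(event, dict) and event.get(key):
            counts[event[key]] += 1
    return counts


def bubble_radius(count, max_count, min_radius=4.0, max_radius=30.0):
    """Square-root scaling keeps very large counts from dominating the map."""
    return min_radius + (max_radius - min_radius) * math.sqrt(count / max_count)


def build_map(rows, output="mymap.html"):
    """Draw one bubble per (name, count, lat, lon) row and save the map as HTML."""
    import folium  # imported here so the counting helpers work without folium

    m = folium.Map(location=[20, 0], zoom_start=2)
    if rows:
        max_count = max(count for _, count, _, _ in rows)
        for name, count, lat, lon in rows:
            folium.CircleMarker(
                location=[lat, lon],
                radius=bubble_radius(count, max_count),
                popup="{}: {}".format(name, count),
                fill=True,
            ).add_to(m)
    m.save(output)
    return m


# Hypothetical sample lines in the shape assumed for events.txt
SAMPLE_EVENTS = [
    '{"feed.name": "Phishtank csv", "source.geolocation.cc": "US"}',
    '{"feed.name": "Spamhaus Drop", "source.geolocation.cc": "IE"}',
    '{"feed.name": "Phishtank csv", "source.geolocation.cc": "US"}',
    '{"feed.name": "Malc0de"}',
]

if __name__ == "__main__":
    print(count_country_codes(SAMPLE_EVENTS).most_common())
```

Parsing each line as JSON avoids the dependence on the exact position of commas and colons in the event, and Counter replaces the separate steps of finding the distinct countries and counting each of them.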
11.3 ISO3166-1-ALPHA-2.TXT
'AF',33.93911,67.709953,'Afghanistan'
'AX',37.0625,-95.677068,'Åland Islands'
'AL',41.153332,20.168331,'Albania'
'DZ',28.033886,1.659626,'Algeria'
'AS',-14.270972,-170.132217,'American Samoa'
'AD',42.546245,1.601554,'Andorra'
'AO',-11.202692,17.873887,'Angola'
'AI',18.220554,-63.068615,'Anguilla'
'AQ',-75.250973,-0.071389,'Antarctica'
'AG',17.060816,-61.796428,'Antigua and Barbuda'
'AR',-38.416097,-63.616672,'Argentina'
'AM',40.069099,45.038189,'Armenia'
'AW',12.52111,-69.968338,'Aruba'
'AU',-25.274398,133.775136,'Australia'
'AT',47.516231,14.550072,'Austria'
'AZ',40.143105,47.576927,'Azerbaijan'
'BS',25.03428,-77.39628,'Bahamas'
'BH',25.930414,50.637772,'Bahrain'
'BD',23.684994,90.356331,'Bangladesh'
'BB',13.193887,-59.543198,'Barbados'
'BY',53.709807,27.953389,'Belarus'
'BE',50.503887,4.469936,'Belgium'
'BZ',17.189877,-88.49765,'Belize'
'BJ',9.30769,2.315834,'Benin'
'BM',32.321384,-64.75737,'Bermuda'
'BT',27.514162,90.433601,'Bhutan'
'BO',-16.290154,-63.588653,'Bolivia'
'BA',43.915886,17.679076,'Bosnia and Herzegovina'
'BW',-22.328474,24.684866,'Botswana'
'BV',-54.423199,3.413194,'Bouvet Island'
'BR',-14.235004,-51.92528,'Brazil'
'IO',-6.343194,71.876519,'British Indian Ocean Territory'
'BN',4.535277,114.727669,'Brunei Darussalam'
'BG',42.733883,25.48583,'Bulgaria'
'BF',12.238333,-1.561593,'Burkina Faso'
'BI',-3.373056,29.918886,'Burundi'
'KH',12.565679,104.990963,'Cambodia'
'CM',7.369722,12.354722,'Cameroon'
'CA',56.130366,-106.346771,'Canada'
'CV',16.002082,-24.013197,'Cape Verde'
'KY',19.513469,-80.566956,'Cayman Islands'
'CF',6.611111,20.939444,'Central African Republic'
'TD',15.454166,18.732207,'Chad'
'CL',-35.675147,-71.542969,'Chile'
'CN',35.86166,104.195397,'China'
'CX',-10.447525,105.690449,'Christmas Island'
'CC',37.0625,-95.677068,'Cocos (Keeling) Islands'
'CO',4.570868,-74.297333,'Colombia'
'KM',-11.875001,43.872219,'Comoros'
'CG',-0.228021,15.827659,'Congo'
'CD',-0.228021,15.827659,'The Democratic Republic of Congo '
'CK',-21.236736,-159.777671,'Cook Islands'
'CR',9.748917,-83.753428,'Costa Rica'
'CI',7.539989,-5.54708,'C dIvoire'
'HR',45.1,15.2,'Croatia'
'CU',21.521757,-77.781167,'Cuba'
'CY',35.126413,33.429859,'Cyprus'
'CZ',49.817492,15.472962,'Czech Republic'
'DK',56.26392,9.501785,'Denmark'
'DJ',11.825138,42.590275,'Djibouti'
'DM',15.414999,-61.370976,'Dominica'
'DO',18.735693,-70.162651,'Dominican Republic'
'EC',-1.831239,-78.183406,'Ecuador'
'EG',26.820553,30.802498,'Egypt'
'SV',13.794185,-88.89653,'El Salvador'
'GQ',1.650801,10.267895,'Equatorial Guinea'
'ER',15.179384,39.782334,'Eritrea'
'EE',58.595272,25.013607,'Estonia'
'ET',9.145,40.489673,'Ethiopia'
'FK',-51.796253,-59.523613,'Falkland Islands (Malvinas)'
'FO',61.892635,-6.911806,'Faroe Islands'
'FJ',-16.578193,179.414413,'Fiji'
'FI',61.92411,25.748151,'Finland'
'FR',46.227638,2.213749,'France'
'GF',3.933889,-53.125782,'French Guiana'
'PF',-17.679742,-149.406843,'French Polynesia'
'TF',37.0625,-95.677068,'French Southern Territories'
'GA',-0.803689,11.609444,'Gabon'
'GM',13.443182,-15.310139,'Gambia'
'GE',32.157435,-82.907123,'Georgia'
'DE',51.165691,10.451526,'Germany'
'GH',7.946527,-1.023194,'Ghana'
'GI',36.137741,-5.345374,'Gibraltar'
'GR',39.074208,21.824312,'Greece'
'GL',71.706936,-42.604303,'Greenland'
'GD',12.262776,-61.604171,'Grenada'
'GP',16.995971,-62.067641,'Guadeloupe'
'GU',13.444304,144.793731,'Guam'
'GT',15.783471,-90.230759,'Guatemala'
'GG',49.465691,-2.585278,'Guernsey'
'GN',9.945587,-9.696645,'Guinea'
'GW',11.803749,-15.180413,'Guinea-Bissau'
'GY',4.860416,-58.93018,'Guyana'
'HT',18.971187,-72.285215,'Haiti'
'HM',-53.08181,73.504158,'Heard Island and McDonald Islands'
'VA',37.0625,-95.677068,'Holy See (Vatican City State)'
'HN',15.199999,-86.241905,'Honduras'
'HK',22.396428,114.109497,'Hong Kong'
'HU',47.162494,19.503304,'Hungary'
'IS',64.963051,-19.020835,'Iceland'
'IN',20.593684,78.96288,'India'
'ID',-0.789275,113.921327,'Indonesia'
'IR',32.427908,53.688046,'Iran'
'IQ',33.223191,43.679291,'Iraq'
'IE',53.41291,-8.24389,'Ireland'
'IM',54.236107,-4.548056,'Isle of Man'
'IL',31.046051,34.851612,'Israel'
'IT',41.87194,12.56738,'Italy'
'JM',18.109581,-77.297508,'Jamaica'
'JP',36.204824,138.252924,'Japan'
'JE',49.214439,-2.13125,'Jersey'
'JO',30.585164,36.238414,'Jordan'
'KZ',48.019573,66.923684,'Kazakhstan'
'KE',-0.023559,37.906193,'Kenya'
'KI',-3.370417,-168.734039,'Kiribati'
'KP',35.907757,127.766922,'Democratic People Republic of Korea'
'KR',35.907757,127.766922,'Republic of Korea'
'KW',29.31166,47.481766,'Kuwait'
'KG',41.20438,74.766098,'Kyrgyzstan'
'LA',19.85627,102.495496,'Lao People Democratic Republic'
'LV',56.879635,24.603189,'Latvia'
'LB',33.854721,35.862285,'Lebanon'
'LS',-29.609988,28.233608,'Lesotho'
'LR',6.428055,-9.429499,'Liberia'
'LY',37.0625,-95.677068,'Libyan Arab Jamahiriya'
'LI',47.166,9.555373,'Liechtenstein'
'LT',55.169438,23.881275,'Lithuania'
'LU',49.815273,6.129583,'Luxembourg'
'MO',22.198745,113.543873,'Macao'
'MK',41.608635,21.745275,'Macedonia'
'MG',-18.766947,46.869107,'Madagascar'
'MW',-13.254308,34.301525,'Malawi'
'MY',4.210484,101.975766,'Malaysia'
'MV',3.202778,73.22068,'Maldives'
'ML',17.570692,-3.996166,'Mali'
'MT',35.937496,14.375416,'Malta'
'MH',7.131474,171.184478,'Marshall Islands'
'MQ',14.641528,-61.024174,'Martinique'
'MR',21.00789,-10.940835,'Mauritania'
'MU',-20.348404,57.552152,'Mauritius'
'YT',-12.8275,45.166244,'Mayotte'
'MX',23.634501,-102.552784,'Mexico'
'FM',7.425554,150.550812,'Micronesia'
'MD',47.411631,28.369885,'Moldova, Republic of'
'MC',43.750298,7.412841,'Monaco'
'MN',46.862496,103.846656,'Mongolia'
'ME',42.708678,19.37439,'Montenegro'
'MS',16.742498,-62.187366,'Montserrat'
'MA',31.791702,-7.09262,'Morocco'
'MZ',-18.665695,35.529562,'Mozambique'
'MM',21.913965,95.956223,'Myanmar'
'NA',-22.95764,18.49041,'Namibia'
'NR',-0.522778,166.931503,'Nauru'
'NP',28.394857,84.124008,'Nepal'
'NL',52.132633,5.291266,'Netherlands'
'AN',12.226079,-69.060087,'Netherlands Antilles'
'NC',-20.904305,165.618042,'New Caledonia'
'NZ',-40.900557,174.885971,'New Zealand'
'NI',12.865416,-85.207229,'Nicaragua'
'NE',17.607789,8.081666,'Niger'
'NG',9.081999,8.675277,'Nigeria'
'NU',-19.054445,-169.867233,'Niue'
'NF',-29.040835,167.954712,'Norfolk Island'
'MP',17.33083,145.38469,'Northern Mariana Islands'
'NO',60.472024,8.468946,'Norway'
'OM',21.512583,55.923255,'Oman'
'PK',30.375321,69.345116,'Pakistan'
'PW',7.51498,134.58252,'Palau'
'PS',42.094445,17.266614,'Palestinian Territory'
'PA',8.537981,-80.782127,'Panama'
'PG',-6.314993,143.95555,'Papua New Guinea'
'PY',-23.442503,-58.443832,'Paraguay'
'PE',-9.189967,-75.015152,'Peru'
'PH',12.879721,121.774017,'Philippines'
'PN',-24.703615,-127.439308,'Pitcairn'
'PL',51.919438,19.145136,'Poland'
'PT',39.399872,-8.224454,'Portugal'
'PR',18.220833,-66.590149,'Puerto Rico'
'QA',25.354826,51.183884,'Qatar'
'RE',-21.115141,55.536384,'Réunion'
'RO',45.943161,24.96676,'Romania'
'RU',61.52401,105.318756,'Russian Federation'
'RW',-1.940278,29.873888,'Rwanda'
'BL',37.0625,-95.677068,'Saint Bartholemy'
'SH',-24.143474,-10.030696,'Saint Helena, Ascension and Tristan da Cunha'
'KN',17.357822,-62.782998,'Saint Kitts and Nevis'
'LC',13.909444,-60.978893,'Saint Lucia'
'MF',43.589046,5.885031,'Saint Martin (French part)'
'PM',46.941936,-56.27111,'Saint Pierre and Miquelon'
'VC',12.984305,-61.287228,'Saint Vincent and the Grenadines'
'WS',-13.759029,-172.104629,'Samoa'
'SM',43.94236,12.457777,'San Marino'
'ST',0.18636,6.613081,'Sao Tome and Principe'
'SA',23.885942,45.079162,'Saudi Arabia'
'SN',14.497401,-14.452362,'Senegal'
'RS',44.016521,21.005859,'Serbia'
'SC',-4.679574,55.491977,'Seychelles'
'SL',8.460555,-11.779889,'Sierra Leone'
'SG',1.352083,103.819836,'Singapore'
'SK',48.669026,19.699024,'Slovakia'
'SI',46.151241,14.995463,'Slovenia'
'SB',-9.64571,160.156194,'Solomon Islands'
'SO',5.152149,46.199616,'Somalia'
'ZA',-30.559482,22.937506,'South Africa'
'GS',-54.429579,-36.587909,'South Georgia and the South Sandwich Islands'
'ES',40.463667,-3.74922,'Spain'
'LK',7.873054,80.771797,'Sri Lanka'
'SD',12.862807,30.217636,'Sudan'
'SR',3.919305,-56.027783,'Suriname'
'SJ',77.553604,23.670272,'Svalbard and Jan Mayen'
'SZ',-26.522503,31.465866,'Swaziland'
'SE',60.128161,18.643501,'Sweden'
'CH',46.818188,8.227512,'Switzerland'
'SY',34.802075,38.996815,'Syrian Arab Republic'
'TW',23.69781,120.960515,'Taiwan'
'TJ',38.861034,71.276093,'Tajikistan'
'TZ',-6.369028,34.888822,'Tanzania, United Republic of'
'TH',15.870032,100.992541,'Thailand'
'TL',-8.874217,125.727539,'Timor-Leste'
'TG',8.619543,0.824782,'Togo'
'TK',-8.967363,-171.855881,'Tokelau'
'TO',-21.178986,-175.198242,'Tonga'
'TT',10.691803,-61.222503,'Trinidad and Tobago'
'TN',33.886917,9.537499,'Tunisia'
'TR',38.963745,35.243322,'Turkey'
'TM',38.969719,59.556278,'Turkmenistan'
'TC',21.694025,-71.797928,'Turks and Caicos Islands'
'TV',-7.109535,177.64933,'Tuvalu'
'UG',1.373333,32.290275,'Uganda'
'UA',48.379433,31.16558,'Ukraine'
'AE',23.424076,53.847818,'United Arab Emirates'
'GB',55.378051,-3.435973,'United Kingdom'
'US',37.09024,-95.712891,'United States'
'UM',24.747346,-167.594906,'United States Minor Outlying Islands'
'UY',-32.522779,-55.765835,'Uruguay'
'UZ',41.377491,64.585262,'Uzbekistan'
'VU',-15.376706,166.959158,'Vanuatu'
'VE',6.42375,-66.58973,'Venezuela'
'VN',14.058324,108.277199,'VietNam'
'VI',18.335765,-64.896335,'Virgin Islands'
'WF',-13.768752,-177.156097,'Wallis and Futuna'
'EH',24.215527,-12.885834,'Western Sahara'
'YE',15.552727,48.516388,'Yemen'
'ZM',-13.133897,27.849332,'Zambia'
'ZW',-19.015438,29.154857,'Zimbabwe'
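Each row of this file holds the country code, the latitude, the longitude and the country name, with the text fields in single quotes. Splitting a row on "," breaks the names that contain a comma, such as 'Moldova, Republic of'. As a hedged alternative to the manual splitting, the sketch below reads the same format with the standard csv module and a single quote as quote character. The function name load_country_table and the three sample rows are only illustrative.

```python
import csv
import io

# Three rows copied from the file above, one of them with a comma in the name
SAMPLE = """'IE',53.41291,-8.24389,'Ireland'
'MD',47.411631,28.369885,'Moldova, Republic of'
'ES',40.463667,-3.74922,'Spain'
"""


def load_country_table(handle):
    """Return {code: (name, latitude, longitude)} read from a file-like object."""
    table = {}
    for row in csv.reader(handle, quotechar="'"):
        if len(row) != 4:
            continue  # skip blank or incomplete rows
        code, lat, lon, name = row
        table[code] = (name, float(lat), float(lon))
    return table


if __name__ == "__main__":
    for code, entry in load_country_table(io.StringIO(SAMPLE)).items():
        print(code, entry)
```

With the real file, the handle would come from open("iso3166-1-alpha-2.txt", newline="", encoding="utf-8") instead of io.StringIO.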
11.4 GENERATECHART.PY
"""
Author: "Javier Gombao Fernandez-Calvillo"
eMail: a00248414@student.ait.ie
College: Athlone Institute of Technology (AIT)
Subject: Final Project
File: generateChart.py
The purpose of this code is to obtain graphics in order to represent the data of
IntelMQ:
nameCountries: it provides the name of the country. Examples: Ireland, Spain,
England, France...
values: it defines the number of times the country appears in the result
file given by IntelMQ
latitude: it defines the latitude value
longitude: it defines the longitude value
"""
# Import modules
import sys
import matplotlib.pyplot as plt
import pandas as pd

# List shared by the methods below (assumed to be defined at module level)
classificationTaxonomyCount = []


class Struct():
    def __init__(self, taxonomy, count):
        self.taxonomy = taxonomy
        self.count = count

    def getTaxonomy(self):
        return self.taxonomy

    def getCount(self):
        return self.count


class generateChart(object):
    """Constructor of the function"""
    # The constructor body is not shown in the listing; it is assumed here
    # from the call generateChart("events.txt")
    def __init__(self, fileEvents):
        self._fileEvents = fileEvents

    """This function will print the name of each taxonomy and its occurrences"""
    def printTaxonomy(self, classificationTaxonomyList):
        for i in classificationTaxonomyList:
            print(i.taxonomy, i.count)

    """This function will count the occurrences that the program reads from the file
    events.txt"""
    # (the beginning of this method is not part of the listing)
                                    elif "fraud" in s2:
                                        taxonomy_list.append("fraud")
                                    elif "abusive content" in s2:
                                        taxonomy_list.append("abusive content")
                                    else:
                                        taxonomy_list.append(s2)
                            i += 1
            distinct_taxonomy = self.obtainDistinctTaxonomy(taxonomy_list)
            # We insert the taxonomies and their occurrences in the class 'Struct'
            for c in distinct_taxonomy:
                classificationTaxonomyCount.append(
                    Struct(c, self.countTaxonomy(c, taxonomy_list)))
            if not classificationTaxonomyCount:
                print("No search matches")
            else:
                data = self.loadData().sort_values(by='Count', ascending=False)
                print(str(data))
                with open(FILEOUTPUT, "w") as file:
                    file.write(str(data))
        except FileNotFoundError:
            print("Error: File not found.")
    def loadData(self):
        taxonomy = []
        count = []
        for i in classificationTaxonomyCount:
            taxonomy.append(i.taxonomy)
            count.append(i.count)
        data = pd.DataFrame({
            'Taxonomy': taxonomy,
            'Count': count
        })
        return data

    """It generates the pie chart with the share of each taxonomy"""
    def createChart(self):
        if classificationTaxonomyCount:
            x = []
            labels = []
            for i in classificationTaxonomyCount:
                x.append(i.count)
                labels.append(i.taxonomy)
            plt.pie(x, labels=labels, autopct='%1.1f%%')
            plt.axis('equal')
            plt.show()
s = generateChart("events.txt")
s.getOcurrences(str(sys.argv[1]))
s.createChart()
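Since the first part of the counting method is missing from the listing, the following sketch illustrates the idea under stated assumptions rather than reproducing the original code. It maps every raw taxonomy value to one label, counts the labels and computes the percentage that the pie chart shows for each slice. Only "fraud" and "abusive content" are taken from the listing. The lower-casing step, the function names and the sample values are assumptions.

```python
from collections import Counter

# Only these two labels are visible in the surviving part of the listing
KNOWN_TAXONOMIES = ("fraud", "abusive content")


def normalise_taxonomy(raw):
    """Map a raw taxonomy value to a known label when it contains one."""
    value = raw.strip().lower()  # lower-casing is an assumption of this sketch
    for known in KNOWN_TAXONOMIES:
        if known in value:
            return known
    return value


def taxonomy_shares(raw_values):
    """Return (label, count, percentage) tuples, most frequent label first."""
    counts = Counter(normalise_taxonomy(value) for value in raw_values)
    total = sum(counts.values())
    return [
        (label, count, round(100.0 * count / total, 1))
        for label, count in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical taxonomy values as they could be read from events.txt
    sample = ["fraud", "Fraud ", "malicious code", "fraud"]
    for label, count, share in taxonomy_shares(sample):
        print(label, count, share)
```

The counts and labels returned by taxonomy_shares can be passed to plt.pie in the same way as the x and labels lists in createChart.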
Institiúid Teicneolaíochta Bhaile Átha Luain
Ireland, Éire
Athlone
May 18