FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017
Alternative Information
Gathering on Mobile Devices
EDIN JAKUPOVIC
KTH
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY
Abstract
Searching for and gathering information about specific topics is a time-consuming but
vital practice. With continuous growth that has overtaken the share held by desktop
devices, the mobile market is becoming a more important area to consider. Due to the
portability of mobile devices, certain tasks are more difficult to perform than on a
desktop device. Searching for information online is generally slower on mobile
devices than on desktop devices, even though the majority of searches are performed
on mobile devices.
The largest challenges with searching for information online using mobile devices
are the smaller screen sizes and the time spent jumping between sources and search
results in a browser. These challenges could be solved by using an application that
focuses on the relevancy of search results, summarizes their content, and
presents them on a single screen.
The aim of this study was to find an alternative data gathering method with a
faster and simpler searching experience. This data gathering method was able to
quickly find and gather data requested through a search term by a user. The data
was then analyzed and presented to the user in a summarized form, to eliminate the
need to visit the source of the content.
A survey was performed by having a small target group of users answer a questionnaire.
The results showed that the method was quick, that results were often relevant,
and that the summaries reduced the need to visit the source page. However, while the
method has potential for future development, it is hindered by ethical issues related
to the use of web scrapers.
Abstract (translated from Swedish)
Searching for and gathering information about specific topics is a time-consuming but
necessary practice. With continuous growth that has overtaken the share held by desktop
devices, the mobile market is becoming an important area to consider. Given the
portability of handheld devices, some tasks become harder to perform than on desktop
devices. Searching for information on the Internet is generally slower on mobile
devices than on desktop devices.
The largest challenges in searching for information on the Internet with mobile
devices are the smaller screen sizes and the time spent moving between sources and
search results in a web browser. These challenges can be solved by using an
application that focuses on relevant search results, summarizes their content, and
presents them in a single view.
The purpose of this study is to find an alternative data gathering method that
creates a faster and simpler search experience. This data gathering method will be
able to quickly find and collect data requested by a user through a search term. The
data is then analyzed and presented to the user in summarized form, to eliminate the
need to visit the source of the content.
Acknowledgements
We would like to thank our advisers Fadil Galjic and Leif Lindback at the Royal
Institute of Technology. The feedback and help we received during this project
proved invaluable for this thesis.
Contents
1 Introduction
1.1 Background
1.2 Problem
1.3 Problem Statement
1.4 Purpose
1.5 Delimitations
1.6 Thesis Outline
2 Theoretical Background
2.1 Web Search engines
2.1.1 Web Indexing
2.1.2 Web Scraping
2.2 Asynchronous Programming
2.2.1 Concurrent Programming
2.2.2 Multithreaded Android Programming
2.2.3 AsyncTask
2.3 Managing Data
2.3.1 SQL
2.3.2 PHP
2.4 User Interface Design
2.4.1 Colour Theory
2.4.2 User Interactivity
2.5 Text Processing
2.5.1 Natural Language Processing
2.5.2 Automatic Summarization
2.5.3 Generic Summarization
2.6 Web Browsers
2.6.1 HTML
2.6.2 CSS
2.7 Related Work
2.7.1 Web Scraping
2.7.2 Summarization
2.7.3 Similar Applications
3 Methods
3.1 Research Strategy
3.1.1 Research Methods
3.1.2 Research Process
3.2 Data Collection
3.2.1 Literature Study
3.2.2 Interview
3.3 Design and Implementation of Prototype
3.3.1 Design of Prototype
3.3.2 Implementation of Prototype
3.3.3 Development Environment
3.4 Evaluation Methods
3.4.1 Formative Evaluation
3.4.2 Heuristic Evaluation
3.4.3 Summative Evaluation
3.5 Evaluating Performance
3.5.1 Performance Metrics
3.5.2 Methods of Evaluation
6.3.1 How Relevant were the Summaries?
6.3.2 Did the Swipe Functionality Positively Impact the Experience?
7 Discussion
7.1 Methodology and Consequences of the Study
7.1.1 Methods
7.1.2 Consequences of the Study
7.2 Problem Statement Revisited
7.2.1 Design Decisions
7.3 Ethical Aspects
7.3.1 Lost Clicks and Ad Revenue
7.3.2 Information and Copyright issues
7.3.3 Anti web scraping Industry
7.4 Sustainability
7.4.1 Effect on Environment
7.4.2 Economical Sustainability
8 Conclusions
8.1 Summary
8.2 Future research
Chapter 1
Introduction
With the exponential growth of online data[1] accessed through mobile devices, it is
becoming more difficult to search for and find desired information about a topic. Time
is often wasted sifting through data that is either irrelevant or duplicates
already collected data. Search engines today rely on each individual user to sift
through the found links in order to get to the desired information. The time and
number of page traversals it takes to find the desired data could be reduced by having
the search application do the work of finding and presenting the information.
The objective of such a system would be to reduce the bandwidth used and the time
spent searching for relevant information.
Improving on the current methods for collecting data requires the information
searched for to be presented faster, while maintaining relevance to the desired topic.
Improving how data is collected in a way that benefits the user over traditional
methods raises the question of presentation: how should the data be presented
to the user in a way that both saves time and helps them find the desired information?
This thesis presents the task of developing an information gathering method
for Android devices, which finds and presents relevant data to the user, and explores
how to apply certain methods in Android application development. The rest of this
chapter introduces the specific problems that define and motivate the focus and
purpose of this thesis.
1.1 Background
Finding and gathering data online is mostly done through search engines, such as
Google and Bing. The companies that offer these search services use programmable
bots, known as web crawlers, that traverse the World Wide Web and create indexes
for each site they gain access to.
The information gathered by the web crawler is then used to present the searcher
with links to the websites that are most relevant. Presenting the most relevant sites
first is done by analysing each page against many criteria to determine its
relevancy. Web crawlers can also be used to fetch specific data from a web page,
and are then referred to as web scrapers.
1.2 Problem
There is a need to reduce the work required by a user to gather data on a specific
subject on mobile devices. Different options must be explored concerning the gath-
ering of data without the use of traditional search engines. One option that could
be viable is using web scrapers to scrape the web for data, and analyze what content
best characterizes the desired data. The data then has to be presented to the user
in a coherent manner.
All this is needed to produce an alternative to the desktop-friendly search engines
that are more difficult to use on mobile devices. This thesis explores ways in which
web crawlers and web scrapers can be used, and discusses how to implement them in a
smart environment in the form of an Android application. Since the data collected
by the web scraper has to be processed by the application, the thesis also discusses
methods of storing and processing the data.
Another problem that arises is how to present the collected data to a user
in a clear way. As we try to improve upon search engines, there must be a
well-thought-out design plan when developing for the Android platform. The data
presented to the user must not only be simple to read and understand, but must also
summarize the content without leaving important details out. This means there is a
problem with both the technical and the aesthetic part of presenting data.
The task of improving upon existing methods for gathering data is a difficult one
for many reasons. A successful implementation of a smart information gathering
tool would need to reduce search time and bandwidth. While looking up a short
description or a wanted link is easy to do in a smart device's browser using existing
search engines, gathering data from several sources becomes more difficult the more
data one needs on the subject.
1.3 Problem Statement
In which way can a web scraper be used to collect relevant data on a subject?
How can the collected data be stored and analyzed?
In which way can an Android application use a web scraper for data gathering?
How can the collected data be presented to promote easy access to the desired information?
1.4 Purpose
This thesis aims to find a search solution for mobile devices that reduces the
bandwidth and time used for finding relevant information for the user. The experiences
from this study could also lay a foundation for other people who wish
to develop Android applications that make use of multithreading, summarization
methods and databases.
Android developers who want to gather data from the web using their software
can use this thesis to determine whether web scrapers are a viable option for
accomplishing that. Different problems also arise when developing for the Android
OS, concerning data gathering and presentation. These problems include
how to find, store and analyze data using web scrapers. A further issue
is the user-friendliness of an application.
1.5 Delimitations
An application of this type can range from hundreds of lines of code to
millions, with varying complexity. Since the returned data could come in
hundreds of different file types and extensions, the decision was made to limit
the application to gathering only raw text. Local and server-side caching was also
excluded from this project due to uncertainties regarding legal aspects. Furthermore,
the UI design of the application was kept simple, with a main focus on functionality.
1.6 Thesis Outline
In chapter 3, our research strategies and methods are presented and briefly
discussed. The chapter gives an overview of which research strategies
were chosen and why.
Chapter 4 covers the challenges and possibilities that arise when performing a
web search using a mobile device.
Chapter 6 presents the results gathered from user feedback received through
questionnaires, and the statistical results generated from the data contained
in the database.
Chapter 7 discusses the design decisions made when implementing the application
and what motivated these decisions. The problem statement is revisited
and reflected upon. Furthermore, ethical aspects of the thesis are discussed.
Chapter 8 ends the thesis with conclusions, future uses and possible future
research within the thesis topic.
Chapter 2
Theoretical Background
Web indexing refers to various methods of indexing either a set of web pages or
the whole Internet. Indexing is achieved using web crawlers that recursively visit
each link on a web page. When a search engine finds a website, it takes a snapshot
of the content of the website and saves it in a database. Once the search engine has
a website's contents, it can quickly match the website with a user's search query.
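The crawl-and-snapshot idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how a production search engine or the thesis prototype works: the three-page `WEB` dictionary stands in for real HTTP requests, and `crawl` and `search` are hypothetical names chosen for this example.

```python
# Toy "web": URL -> (page text, outgoing links). Stands in for real HTTP fetches.
WEB = {
    "a.example": ("android search tips", ["b.example", "c.example"]),
    "b.example": ("web scraping basics", ["a.example"]),
    "c.example": ("android threads", []),
}

def crawl(url, index=None):
    """Recursively visit every link reachable from url, saving a snapshot of each page."""
    if index is None:
        index = {}
    if url in index or url not in WEB:
        return index  # already visited, or a page that does not exist
    text, links = WEB[url]
    index[url] = text  # the "snapshot" a search engine would save in its database
    for link in links:
        crawl(link, index)
    return index

def search(index, query):
    """Return the URLs whose snapshot contains every word of the query."""
    words = query.lower().split()
    return sorted(url for url, text in index.items()
                  if all(word in text.split() for word in words))

index = crawl("a.example")
print(search(index, "android"))
```

Because the snapshots are stored once, each later query only scans the saved index instead of revisiting the pages. The visited check also stops the crawler from looping forever between pages that link to each other.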
Because the content and structure of websites vary, each website requires a different
solution for fetching content. The desired data is found by identifying the
elements or attributes where the data resides. Web scrapers can be combined with
web crawlers to gather information across many links.
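As a rough illustration of locating data by element and attribute, the following Python sketch uses the standard library's `html.parser` module. The page content, the `article` class name and the `ParagraphScraper` class are assumptions made up for this example. A scraper for a real site would need rules matched to that site's own markup.

```python
from html.parser import HTMLParser

class ParagraphScraper(HTMLParser):
    """Collects the text of every <p> element whose class attribute matches."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.inside = False      # True while inside a matching <p>
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == self.wanted_class:
            self.inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.paragraphs[-1] += data  # nested tags such as <b> contribute their text

PAGE = """
<html><body>
  <p class="ad">Buy now!</p>
  <p class="article">Web scrapers fetch <b>specific</b> data.</p>
</body></html>
"""

scraper = ParagraphScraper("article")
scraper.feed(PAGE)
print(scraper.paragraphs)
```

Only the paragraph carrying the wanted class is kept and the advertisement is skipped, which is the sense in which each website needs its own fetching rules.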
CPU Cores
CPUs found in mobile and desktop devices usually have several cores of execution.
A core can only perform a single instruction at a time, while maintaining a pointer
to the next instruction and a small, fast memory known as registers. By splitting up
the workload between the CPU's cores, parallelism can be achieved, which can provide a
speedup to a degree[5].
Threads
A thread of execution is a sequence of instructions which a CPU core can perform[6].
Threads are spawned by processes, which are programs running on a device. Threads
spawned by a process contain instructions that can be executed independently of
other code, and do not need to be run sequentially.
A single core can have several threads in progress at the same time, but they are not
run in parallel. By switching between different threads on a single CPU core, the
CPU can provide the illusion that the threads run simultaneously. This prevents a
long-running task from blocking other operations and thus making the program
unresponsive.
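The effect of switching between threads can be demonstrated with a small, language-neutral sketch in Python. It is not Android code. The `fetch` function and the source names are hypothetical, and `time.sleep` stands in for a blocking operation such as a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(source):
    """Stand-in for a slow, blocking operation such as a network request."""
    time.sleep(0.1)
    return f"content of {source}"

sources = ["site-a", "site-b", "site-c"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each call runs on its own thread; while one thread waits, another can run.
    results = list(pool.map(fetch, sources))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f} s")  # usually closer to 0.1 s than to the 0.3 s of a sequential loop
```

The three waits overlap, so the total time is dominated by the slowest single call rather than by the sum of all calls.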
2.2.3 AsyncTask
AsyncTask is a class in the android.os package, which provides a simple way of
performing background operations and handling the result on the UI thread[8]. The
UI thread is the main thread of an Android application, which updates the graphical
interface. Blocking the UI thread prevents the application from re-rendering the
screen, and thus gives the impression that the application is frozen. By handling
tasks on a separate thread using the AsyncTask class, the UI thread is not blocked.
Unlike thread handlers such as Executors, AsyncTasks are made for shorter, less
CPU-intensive operations.
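The pattern that AsyncTask packages can be sketched outside Android as follows. This Python example is only an analogy and does not use the Android API. The function names mirror AsyncTask's `doInBackground` and `onPostExecute` callbacks, while the queue, the `screen` list and the polling loop are assumptions made for illustration.

```python
import queue
import threading
import time

results = queue.Queue()  # channel from the worker thread back to the main thread

def do_in_background(term):
    """Runs on a worker thread: the slow part, for example a network request."""
    time.sleep(0.1)  # simulated slow work
    results.put(f"summary for '{term}'")

def on_post_execute(result, screen):
    """Runs on the main thread: the only place that touches the 'UI'."""
    screen.append(result)

screen = []  # stand-in for the user interface
worker = threading.Thread(target=do_in_background, args=("android",))
worker.start()

frames = 0
while True:
    frames += 1  # the main thread keeps "rendering" while the worker is busy
    try:
        on_post_execute(results.get(timeout=0.01), screen)
        break
    except queue.Empty:
        continue

worker.join()
print(screen, f"(main loop iterated {frames} times while waiting)")
```

The main loop keeps running while the worker sleeps, which corresponds to the UI thread staying responsive. The result is applied to the interface only on the main thread.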
2.3.1 SQL
SQL stands for Structured Query Language and is a programming language created
by IBM in the 1970s to help developers manage databases more easily[9]. Because SQL
is a query language, users create queries that hold the information needed for the
database management system (DBMS) to accomplish a specific task on the database.
While many query languages have been created, SQL became the most popular and is the
most used query language today. When a user wants to manage their database, e.g. by
adding an entry to a table, a query has to be created and handled by the DBMS.
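A short example may make the role of queries concrete. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely for convenience. The `searches` table and its columns are hypothetical, and SQLite is not necessarily the DBMS used for the prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE searches (term TEXT PRIMARY KEY, hits INTEGER)")

# INSERT query: add an entry to the table.
# The ? placeholders keep user input out of the SQL text itself.
conn.execute("INSERT INTO searches (term, hits) VALUES (?, ?)", ("android", 1))

# UPDATE query: change an existing entry.
conn.execute("UPDATE searches SET hits = hits + 1 WHERE term = ?", ("android",))

# SELECT query: read the entry back.
row = conn.execute(
    "SELECT term, hits FROM searches WHERE term = ?", ("android",)
).fetchone()
print(row)
conn.close()
```

Each statement is a query handed to the DBMS, which decides how to carry it out. The application only describes what it wants.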
2.3.2 PHP
PHP is a scripting language that is primarily used on web servers. It is often used
in web development to provide interaction between a client and data stored on a
server. PHP code can also be embedded directly into HTML documents to perform
various functions, such as generating dynamic content. PHP code that is embedded
in HTML is executed on the server and the generated HTML is sent to the client[10].
2.4 User Interface Design
Designing a mobile application poses certain challenges not found on a desktop
device. The smaller screen and touch controls require extra care to ensure the
application is easy to use.
Analogous colours: Colours that lie next to each other on the colour wheel are
often found in nature and are considered harmonious and pleasing to the eye[13].
Because analogous colours do not create a high contrast, they are commonly used for
deciding the overall colour theme of a design.
Complementary colours: Colours that lie on opposite sides of the colour
wheel create a high contrast, and are commonly used when something needs to
stand out. Unlike analogous colours, complementary colours can be quite jarring
and should thus not be used for the overall colour palette of a design.
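Both relationships can be computed by rotating a colour's hue. The Python sketch below assumes the HSL hue circle used for screen colours, where the complement lies 180 degrees away. The traditional painter's wheel places some colours differently, and the 30 degree step for analogous colours is an assumed, commonly used value rather than one taken from the thesis.

```python
import colorsys

def rotate_hue(rgb, degrees):
    """Return rgb (components in 0..1) rotated by the given angle on the hue circle."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    h = (h + degrees / 360.0) % 1.0
    return tuple(round(c, 3) for c in colorsys.hls_to_rgb(h, l, s))

def analogous(rgb, step=30):
    """Neighbouring hues, step degrees to either side (30 is an assumed choice)."""
    return rotate_hue(rgb, -step), rotate_hue(rgb, step)

def complementary(rgb):
    """The hue on the opposite side of the circle, 180 degrees away."""
    return rotate_hue(rgb, 180)

red = (1.0, 0.0, 0.0)
print(complementary(red))  # (0.0, 1.0, 1.0), i.e. cyan
print(analogous(red))
```

On this hue circle the neighbours of red are a pinkish red and an orange, which stay close to the base colour, while the complement is as far from it as the circle allows.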
2.4.2 User Interactivity
Due to their size and mobility, mobile devices are interacted with very differently
from desktop devices and thus need to be designed for accordingly. In contrast to
desktop devices, there are many ways of holding and interacting with a touchscreen.
Accounting for all kinds of devices requires the design to be responsive and simple.
Accomplishing this requires following certain principles[14].
Visibility: Everything the user needs to navigate and use the application
should be available without distractions. Navigation should always be presented
in a way that is clear and natural. The user should never have to guess
or spend large amounts of time navigating between pages.
Feedback: The design should never have the user guessing what is happening.
The state and condition of an application should always be visible, so the user
does not think the application has frozen when it is loading.
When designing a UI, the placement of objects needs extra consideration. Because the
interface is operated through touch controls, parts of the screen will be covered
by fingers.
2.5.2 Automatic Summarization
The process of summarizing texts using software is an applied method of NLP known
as automatic summarization. The goal of summarization algorithms is often to
generate a summary from a text that is not predefined[15]. There are many varied
methods of achieving this, but they all rely on identifying a list of keywords that
define the topic of the text.
How many words in the sentence were also found in the search term.
The search term is split up and a set of search-specific keywords is identified.
If a sentence contains one or more keywords from the search term, it is more
likely to be relevant.
Where the sentence was found in the text. A lot of articles and reports
follow a text structure where general topics are introduced in the beginning
and concluded at the end, while more specific subjects are discussed in the middle.
This style of writing is known as the Hourglass Model[18]. A summary consists of
more general information and does not go into the details. Sentences that are
found in the beginning and end of the original text are thus given a higher
weighted position score.
Finally, the weighted scores are combined and the 5 most relevant sentences are
selected to compose a summary. The sentences appear in the same order as
they occurred in the text.
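The scoring steps listed above can be sketched as follows. This is a simplified Python illustration that covers only the keyword and position criteria. The weights, the position formula and the sentence-splitting rule are assumptions made for the example and are not values taken from the prototype.

```python
import re

def summarize(text, search_term, max_sentences=5, w_keyword=1.0, w_position=1.0):
    """Pick the highest-scoring sentences and return them in their original order.

    The weights and the position formula are illustrative assumptions.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    keywords = set(search_term.lower().split())
    last = max(len(sentences) - 1, 1)

    scored = []
    for i, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        keyword_score = sum(1 for w in words if w in keywords)
        # 1.0 at the first and last sentence, 0.0 in the middle of the text.
        position_score = abs(2 * i / last - 1)
        scored.append((w_keyword * keyword_score + w_position * position_score, i))

    # Highest score first; ties are broken in favour of earlier sentences.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore the order in which the chosen sentences occurred in the text.
    return " ".join(sentences[i] for _, i in sorted(best, key=lambda item: item[1]))

text = ("Android is a mobile operating system. It is developed by Google. "
        "Many phones ship with it. Some tablets do too. "
        "Android apps are often written in Java.")
print(summarize(text, "android java", max_sentences=2))
```

With the sample text, the first and last sentences win because they both contain search keywords and sit at the edges of the text, and they are printed in their original order.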
Figure 2.2: Summarization Flowchart.
2.6.2 CSS
Cascading Style Sheets (CSS) is a style sheet language used to describe the
presentation of structured documents, such as HTML or XML. While HTML documents can
be styled using style attributes on each element, a style sheet makes it possible to
separate the content of a document from its presentation. The style of an HTML
element is declared under a selector, which is the part of the style sheet that
specifies which elements a rule applies to, for example by tag name.
The properties declared for the selector, such as colour, font and many more, are
then applied to each element matching the tag. By using id and class selectors, it
is possible to target specific elements that have matching id or class attributes in
the target document. In addition to specifying the colour and font of elements, CSS
is also used to design the layout of a web page or document[20].
2.7 Related Work
This section covers some of the related works found throughout the thesis work. It
presents the issues and possibilities found in similar applications.
2.7.2 Summarization
Automatic summarization techniques are used to automatically create summaries,
with little or no human intervention. Summarizers are useful for getting an
overview of a complete text in a shorter time. Automatic summarization is not
that commonly used in business, because the technology is not mature enough and
the results can vary in quality. Summarization tools can be used to summarize any
media, including text or videos, but are mostly used for text. Software that performs
summarization is often written in machine learning courses as an exercise, but
business applications are still rare.
http://smmry.com/ SMMRY is a website[22] that offers a tool for summarizing
text. The text to summarize can be provided either through a file, a URL
or by typing in the text manually. The length of the summary can be specified
as a number of sentences. The website does not offer any information regarding
the owner, but sells a service provided through its API. Five sentences seemed
to be the best compromise between length and content quality. Both the SMMRY
website and the developed application summarize content, but the application
allows users to create summaries without providing a source.
Chapter 3
Methods
This chapter covers the research strategies and methodologies that were used in this
study. Furthermore, the research process is outlined, and the data collection and
result gathering methods used to answer the problem statement are presented.
Literature Study
A literature study is a process of gathering information about a subject from
various sources, such as articles, books and research papers. The gathered
information can then be processed and summarized to help gain an understanding
of a researched topic. A literature study is performed with either
a quantitative or a qualitative method[23]. The qualitative method looks at
thoughts and opinions. This uncovers new problems and possibilities and allows
one to delve deeper into the problem. The quantitative method looks at
measured or deduced data. The initial literature study is often conducted
with a qualitative research approach, in order to find new thoughts and trends
about a researched topic. The qualitative study is usually then followed by a
quantitative study, where measurable data is evaluated and interpreted to
formulate facts. In order to assure the validity of the material, literature studies
require one to critically evaluate the information and sources, to determine the
legitimacy of the content. Without a critical analysis, the information gathered
cannot be used in summaries or integrated into one's work. A literature study
was chosen in order to gain a better understanding of the fields in
which the thesis was conducted.
Interview
Due to the involvement of many different technologies and challenges, an interview
was deemed useful in order to learn from the experiences of someone who
has worked with similar systems. The desired information was collected by
performing a general interview with some predetermined questions, conducted
with a software engineer who had work experience in many of the
technologies used. The purpose of the interview was to learn the best practices
for storing and analyzing data collected from an Android application.
Additionally, we gained insight into common problems that can occur during
the development process, and how to avoid them.
3.2 Data Collection
This section contains the methods used to gather and summarize data, and how
they were applied.
As more knowledge was obtained and applied, new topics arose which needed
to be studied. Because the technologies used, such as web scraping, were rather
new, most sources had to be fetched from online articles, documentation
and studies.
Evaluating the Content and Sources
A crucial aspect of the literature study was determining the validity and
relevance of a source. Because the concepts and technologies used in this thesis
are new and ever evolving, it was important to ensure that the sources were up to
date in addition to being trustworthy. When possible, official documentation
was used for the literature study.
3.2.2 Interview
After performing the literature study and summarizing its results, it was clear
that not all questions had been answered, or could be reliably answered online or in
books. The purpose and aim of the interview was to get a better understanding of the
development process, and of how to develop applications specifically targeting mobile
devices. The interview was held as part informal and part general interview, which
means that while there were some general questions, the interview was kept rather
open for discussion. This was done in order to get answers to some questions, and
to come up with new questions that had not previously been thought of. The interview
was held with Eric Von Knorring, who is a software engineer. The choice to interview
him was made due to his vast experience in many of the technologies
surrounding the application.
3.3.1 Design of Prototype
Due to the complexity of the application, the design process can be split into several
different parts that need to be analyzed and designed.
Web Scraping Design
The gathering of data through web scraping was independently designed to fit
the specific requirements of the application.
Database Design
Designing the database refers to the implementation of a separate system used
to store and update values, through function calls from the mobile application.
The design and choice of technology were based on previous knowledge
of database paradigms and on the requirement of handling concurrent
writes to the same database rows.
UI Design
From the literature study, baselines for creating a simple and aesthetic UI were
established and documented. Implementing a proper design required several
iterations, in order to reduce any distractions from the core purpose of the
application.
Information Design
The aspect of the application that required the most research was the presentation
of the results. Developing the solutions required to promote better access
to the desired information led to branching out into research on Natural
Language Processing.
3.3.2 Implementation of Prototype
The implementation of the prototype was done in two different parts. The first was
the front-end design of the prototype, which consisted of the different views of the
Android application. The second was the back-end design, which includes the database,
model classes and the PHP code used to query the results. During the implementation
phase, the front-end and back-end were developed in parallel. This was needed
to be able to test and evaluate parts of the prototype before continuing with the
development. From the beginning, the front-end of the prototype was very dependent
on the back-end, for example on the database, to test that certain features worked.
The parallel implementation of the front-end and back-end was used to create the
resulting prototype. The prototype was then used to generate the results needed for
the evaluation of the thesis.
3.3.3 Development Environment
This project used the IDE Android Studio to develop the application, because it is
the official development environment and is supported by Google. A benefit of using
Android Studio was the inclusion of tools required to develop Android applications,
such as emulators and built-in tools for version control and dependency management.
Development of the database and back-end was done by using the program XAMPP
to set up a local server and design the database schemas. The back-end PHP code
was written using the text editor Atom. The mobile application was developed to
run on Android version 4.1 (Jelly Bean) and newer.
Visibility of System Status: Give the user feedback on what is going on.
The heuristics apply not only to the aesthetics and design of the UI, but also to the
functionality of the underlying system.
3.4.3 Summative Evaluation
While formative evaluation methods were applied during the development process,
a summative evaluation was conducted after the application was implemented. The
evaluation was performed to measure various metrics of the application. The
summative evaluation aimed to answer how viable the application is with performance
and functionality in mind[25].
Were the performance targets met in terms of speed, RAM and data
usage?
Speed: How long it takes from entering a search term until information is
visible.
Data usage: How much network data is used from when a search is started
until the information is fully loaded.
RAM usage: How much RAM is used at the peak of RAM usage.
Relevancy: How many of the searches return relevant results, and what
share of the results are relevant to the search term.
The results need to be relevant to the search term, or else the application does not
fulfill its purpose. The application must display the result at least as quickly as a
regular search engine. The application also cannot use more network data than a
regular search. Lastly, the application cannot leak memory or use significantly more
memory than a regular search engine.
3.5.2 Methods of Evaluation
In order to obtain results for the performance metrics, methods that accurately and
reliably measure data had to be used. Furthermore, a range of devices with different
performance needed to be tested.
Speed
Measuring the speed of the application requires checking how long time it takes
from hitting the search button, until having the result fully load. Furthermore, each
major function will be measured to identify possible bottlenecks.
Obtaining the time measurements was achieved by calling a built in function for
saving the system time, before and after the desired measurement. The elapsed
time was then obtained by taking the difference between the time stamps.
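The time stamp approach described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: it assumes that the built-in function is System.nanoTime(), and the class name ElapsedTimeSketch, the method measureNanos and the simulated workload are hypothetical.

```java
import java.util.concurrent.TimeUnit;

class ElapsedTimeSketch {

    /** Runs the task and returns the elapsed wall-clock time in nanoseconds. */
    static long measureNanos(Runnable task) {
        long start = System.nanoTime(); // time stamp before the measured work
        task.run();
        long end = System.nanoTime();   // time stamp after the measured work
        return end - start;             // elapsed time is the difference
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a measured function such as a web scrape.
        Runnable simulatedWork = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long elapsedNanos = measureNanos(simulatedWork);
        // Convert to milliseconds for presentation.
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        System.out.println("Elapsed: " + elapsedMillis + " ms");
    }
}
```

System.nanoTime() is chosen here because it is monotonic and intended for measuring intervals, whereas System.currentTimeMillis() can jump if the device clock is adjusted.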
Data Usage
The network data was measured using the built-in tool Android Device
Monitor. By selecting a process, the total amount of network data could be
measured over a time period.
RAM Usage
The amount of RAM used could be measured by taking snapshots during runtime.
This was done by using the built-in monitor and measuring during peak RAM
usage.
Relevancy
The relevancy was measured through user feedback from a target group that tested
different search inputs and gave feedback on the results. The feedback was collected
through a questionnaire in which the users were prompted to answer questions
regarding the application. The feedback was received as numerical scores and
text for each question. How much of the content was relevant was measured by
statistically evaluating the data stored in the database.
Chapter 4
This chapter presents and analyzes the issues with searching on mobile devices. The
result of the literature study was used to design a plan for implementing a prototype
and test its viability.
4.1.1 Performance
Since the first web page was created in 1990, websites have become a lot larger with
the addition of various features such as images, videos, fonts, CSS and JavaScript,
to name a few. What started as a simple way of sharing information in the form
of text has evolved into fully fledged web applications with complex
features and, as a consequence, often large JavaScript files.
This increase in size and required performance has been a noticeable issue even for
desktop users, as websites have been trending towards implementing the feature sets of
web apps [26]. The issues are magnified on mobile devices, since factors such as
bandwidth and power draw are less of an issue on connected devices. Not only are
mobile devices limited by their batteries and data plans, but their network connection,
processor and RAM are also often significantly slower, which affects performance.
Searching on a mobile device is slow not so much because of the search engines
used, but rather because of the loading of the found pages and the navigation of
the results. The heavy use of JavaScript in modern websites introduces features that
are often unwanted when trying to find information quickly, such as animations and
ads that load in dynamically. Finding information on a mobile device
often takes a significant amount of time, especially if the desired information is hard
to find and requires the user to check several links.
4.1.2 Data Usage
The average web page more than doubled in size from 2012 to 2016, when it
was over 2.3 MB large [26]. The trend is moving towards larger websites, and it is
mostly due to the increased use of images, video and other media to make pages more
visually appealing. The use of JavaScript has also increased, with most websites using
one or more larger frameworks in addition to any other code. This increase in
website size has had a greater impact on mobile device users, as faster mobile network
connections, such as the 4G network, are not always available [27].
Trying to find information online while using a mobile device on a data plan can be
costly in terms of data used. Search engines provide only a sentence or two below
each result link, which makes it difficult to assess the content quality of a
website before loading it. This issue is further magnified in nations where the network
infrastructure is weaker and mobile devices are slower.
Due to the smaller screens of mobile devices, fewer links can be seen at once, which
further degrades the user experience. The links are presented in order, with the most
relevant links placed at the top, according to whatever algorithm the search
engine uses for ranking. This still requires the user either to trust the search engine
with the top link, or to manually visit sites until the information they are searching
for is found.
The desired information is always at least two clicks away, and even if it is found,
it often comes with additional undesired information. Mobile searches are mostly
made to get quick and convenient answers. Finding the desired information in a
long article or web page is time consuming, which is detrimental to the goal of most
mobile searches.
By fetching only the content that is required to extract the desired information,
less time is spent waiting for images, fonts and CSS to load, and data usage
can be reduced. This extraction of data can be done manually, or be automated
using web scrapers. Libraries made for extracting information from websites, such as
BeautifulSoup [30], PhantomJS [31] and jsoup [32], are available
in many languages, which reduces the need to implement a custom solution.
The search links only have a sentence or two of content, which does not reveal
enough about the information behind the link. When presenting the search results
on a result page, including an abstract helps the user decide on the
website's relevance and could be enough to satisfy the information need.
4.4.1 Challenges
Of the many different challenges discovered, special focus was given to the two
most difficult. The performance aspect had to be prioritized in order for the
application to be considered an alternative to regular search methods. In particular,
the time to resolution was a key performance metric to keep in mind. Displaying
relevant data was the second difficult challenge, due to the subjectivity of different
summary methods.
In the case of performance, it was discovered that mobile network speed had the
largest impact on web scraping performance. While it does not take very long to
scrape a single web page, the time it would take to scrape several pages one after
another would add up to an unacceptable amount of time. As the application
needs to present several results from many websites, this challenge had to be solved.
In the case of showing relevant data, finding and extracting the correct content
from web pages proved to be an issue. Because there were no preset rules for a
specific page, the web scraper had to be configured to work on all kinds of pages. Due to
the smaller screen sizes available on mobile devices, choosing which sentences to
present is important to make good use of screen space. Choosing the best sentences
is not easy, as the most relevant sentences can be located anywhere on the page,
and page layouts differ widely.
Other smaller challenges, such as how data should be stored and how to design a good
UI, were also discovered, but did not require as much in-depth research.
4.4.2 Possibilities
Along with discovering challenges, a number of possibilities were also found.
As web scraping can be slow when scraping several pages one after
another, solutions to this problem were researched. One of the possible solutions that
was found was the use of threads. By taking advantage of a mobile device's multiple
CPU cores (if it has more than one), threads could be used to scrape several
web pages simultaneously. This could reduce the time to resolution if implemented
effectively.
Another challenge was to decide which sentences in a web page's content to present
to the user. A solution that was found for this challenge was the use of
automatically created summaries. To be able to create these summaries, research had to be
conducted in the field of summarization methods. By using generic summarization
to give sentences a weighted score based on a scoring algorithm, more relevant
sentences could be separated from less relevant ones.
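To illustrate the idea of scoring sentences, the following sketch ranks sentences by the average frequency of their non-stop words. It is not the scoring algorithm used in the application. The word-frequency scoring, the short stop-word list and all class and method names are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SentenceScoringSketch {

    // Small illustrative stop-word list; a real list would be much longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "it", "was"));

    /** Lower-cases a sentence and returns its words without stop words. */
    static List<String> contentWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    /** Returns the n highest-scoring sentences, kept in their original order. */
    static List<String> summarize(String text, int n) {
        String[] sentences = text.split("(?<=[.!?])\\s+");

        // Count how often each content word occurs in the whole text.
        Map<String, Integer> frequency = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : contentWords(sentence)) {
                frequency.merge(word, 1, Integer::sum);
            }
        }

        // Score each sentence by the average frequency of its content words.
        double[] scores = new double[sentences.length];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            List<String> words = contentWords(sentences[i]);
            double sum = 0;
            for (String word : words) {
                sum += frequency.get(word);
            }
            scores[i] = words.isEmpty() ? 0 : sum / words.size();
            indices.add(i);
        }

        // Pick the n best sentences, then restore document order.
        indices.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        List<Integer> best = new ArrayList<>(indices.subList(0, Math.min(n, indices.size())));
        best.sort(Comparator.naturalOrder());

        List<String> summary = new ArrayList<>();
        for (int i : best) {
            summary.add(sentences[i]);
        }
        return summary;
    }

    public static void main(String[] args) {
        String text = "Web scraping extracts text from web pages. "
                + "The weather was nice. "
                + "Scraping many web pages takes time.";
        System.out.println(summarize(text, 2));
    }
}
```

Averaging over the number of content words keeps long sentences from winning purely because of their length, and restoring the original order keeps the summary readable.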
Possible solutions to the smaller problems include creating a database to store all
necessary data, and creating a UI based on the research made on colour theory and
user interactivity.
Chapter 5
Information Gathering Application: Design and Implementation
This chapter provides an overview of the created application and all of its function-
ality. Furthermore, flowcharts and diagrams are presented. The implementation of
each component of the application is described.
The main functionality of the Android application is to provide the user with a search
bar, where they can enter a search term. After entering a search, the user is
presented with a number of search results, which are ranked by relevance.
Each search result consists of a text summary and a link to the web page
from which the summary was generated. The user can then either expand
the summary to read more, visit the link, or affect the relevance ranking by swiping
left or right on the result. From this page the user can either perform a new search
or get a larger summary, which consists of all the results that were swiped right.
5.1.2 Webscraping for Information
The implementation of this search application was achieved by making use of the
already indexed web, provided by search engines such as Google. In order to gather
the relevant information from the found links, the links were web scraped to fetch
the HTML document and retrieve text from the desired elements.
5.1.4 Application Flowchart
Start Screen
When the application is started, it fills a list with stop words that will be used by
the summarization algorithm. Entering a search term switches from the MainActivity
intent to the ResultPage intent, as depicted in figure 5.2.
Before the page loads, it fetches the search term and creates an AsyncTask to perform
tasks on a background thread. Relevant links which contain information are web
scraped from a search engine and used to update the database. A thread is spawned
for each link, and data is web scraped and summarized before being collected. The
gathered result is sorted and displayed to the user. A new search can be made by
pressing the New Search button, and a summary of the relevant results is presented
by pressing the Continue button. The flowchart is depicted in figure 5.3.
The relevant summaries are fetched, and the database is called on a background
thread to update which links are relevant. A new search is made by pressing the
New Search button. The final result is depicted in figure 5.4 and the overall flowchart is
depicted in figure 5.5.
In App Views
Screenshots from the application are depicted in figures 5.6, 5.7 and 5.8.
Furthermore, the colour palette is depicted in figure 5.9.
First of all, depicted in figure 5.6, is the start screen of the application. This is
the initial view presented to the user after starting the application. This view
includes a search bar where the user can enter their desired search term and a button
to initiate a search.
Depicted in figure 5.7 is the result view that the user reaches after initiating a
search. A list of search results, which consist of summaries, is presented. Beneath
each summary there is a link to the source page from which
the summary was generated. The user can swipe left or right on a search result to
decide its relevance. A left swipe declares the result irrelevant and
a right swipe declares it relevant. The complementary colours red and green are
used to represent the two choices. The view includes a button that returns the user
to the start screen in order to make a new search. After one or more results
have been swiped as relevant, there is a button for continuing on to view the collected
summaries that were deemed relevant.
In figure 5.8, the summary view of the application is depicted. This is the view
that is reached when continuing from the result screen. This view presents the
chosen summaries from the search results on a scrollable page. There is also a New
Search button to initiate a new search.
Lastly, the chosen colours are presented in figure 5.9. The application has a theme
that consists of shades of green that are analogous on the colour wheel. The text is
black on white, which has high contrast and is easy to read. To clarify which swipe
direction indicates a relevant or irrelevant result without providing instructions, the
complementary colours red and green are used.
Figure 5.6: Application Start Screen. Figure 5.7: Application Result Screen.
5.2 Implementation
This section covers the implementation of the mobile application.
In order to display relevant data, Google was used as a source for finding rele-
vant links. This was achieved by fetching the links from a Google search query as
depicted in listing 5.1. The links were found to be children of <h3> elements with
the class name r by inspecting the HTML source code.
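// Uses org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.select.Elements.
Document doc = Jsoup.connect("https://www.google.se/search?q=" + searchTerm)
        .get();
// Result links are the anchor children of h3 elements with the class name r.
Elements searchLinks = doc.select("h3.r > a");
Listing 5.1: Code for fetching relevant links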
Links that contain relevant data could now be identified and scraped for their
content. Scraping the content of a website required the application to wait for a TCP
connection to be established before a GET request could be made. The time it
takes to establish a connection and start downloading content from the web server
is substantial, and thus fetching data from several sources sequentially was not an
option. To circumvent this issue, a thread was created for each instance where
an HTTP request was made. The handling of threads and data was achieved by
using an ExecutorService from the java.util.concurrent package.
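A minimal sketch of how an ExecutorService can run one task per link is given below. It is not the thesis code: the network request is replaced by a simulated delay so that the example runs without a connection, and the names ParallelScrapeSketch, scrapeAndSummarize and scrapeAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelScrapeSketch {

    /** Hypothetical stand-in for fetching and summarizing one page. */
    static String scrapeAndSummarize(String url) throws InterruptedException {
        Thread.sleep(200); // simulates waiting for the network
        return "summary of " + url;
    }

    /** Scrapes all links concurrently and returns the results in link order. */
    static List<String> scrapeAll(List<String> links) throws Exception {
        // One thread per link; at least one thread so an empty list is allowed.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, links.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String link : links) {
                tasks.add(() -> scrapeAndSummarize(link));
            }
            // invokeAll blocks until every task has completed.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> links = Arrays.asList(
                "https://example.com/a", "https://example.com/b", "https://example.com/c");
        long start = System.nanoTime();
        List<String> summaries = scrapeAll(links);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(summaries);
        System.out.println("Elapsed: " + elapsedMillis + " ms");
    }
}
```

Because the waiting periods overlap, the total time in this sketch is close to that of the slowest single task rather than the sum of all tasks, which is the effect the threaded design aims for.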
5.2.2 Storing and Updating of Data
In order to keep track of which results were deemed relevant, a database was used
to store the relevancy of each search result. By keeping track of which
summaries were swiped left or right, a summary's relevancy score was updated, and
the results could be presented in order with the highest relevance first. Performing a
complete search requires two connections to the database.
Using the database required a web server hosting PHP files. The PHP files
handled requests from the application and were used to perform queries. Setting up
the connection was done by including a PHP file with the configuration for
connecting to the web server as depicted in listing 5.4. The queries were performed
through PHP instead of directly from the application for security reasons: mobile
applications can be decompiled with readily available software, which would expose
a database configuration file shipped with the application.
$host = "localhost";
$dbname = "kandidat";
$username = "root";
$password = "******";
$connection = new PDO("mysql:host=$host;dbname=$dbname", $username, $password);
Listing 5.5 Query used to update the database with URLs and domains.
The web server would then perform an SQL query and update the database with the
new information. In order to prevent security issues such as SQL injection, user
input was passed to the query using built-in functions for binding parameters, as
depicted in listing 5.6.
$stmt->bindParam(':search', $_POST['searchUrl'.$x]);
$stmt->execute();
After the information was uploaded and updated, the relevance of each link was
fetched and returned from the web server.
After Choosing which Links are Relevant
Following a successful search, users were able to swipe either left or right to rank
the relevancy of a summary. To update this information, the database was queried
with the information required to identify which summaries were ranked and
what score they got. This was used to update how relevant a summary was for each
search term, which would change the order of future results based on relevancy.
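The effect of swipes on the ordering of results can be sketched with an in-memory map standing in for the database table. The scoring rule used here (plus one for a right swipe, minus one for a left swipe) and all names are assumptions for illustration. The thesis does not specify how the score is computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelevanceRankingSketch {

    // Maps "searchTerm|url" to a relevance score; stands in for the database.
    private final Map<String, Integer> scores = new HashMap<>();

    private static String key(String searchTerm, String url) {
        return searchTerm + "|" + url;
    }

    /** Assumed rule: a right swipe adds one point, a left swipe removes one. */
    void recordSwipe(String searchTerm, String url, boolean swipedRight) {
        scores.merge(key(searchTerm, url), swipedRight ? 1 : -1, Integer::sum);
    }

    /** Returns the URLs ordered by descending score for the given search term. */
    List<String> rank(String searchTerm, List<String> urls) {
        List<String> ranked = new ArrayList<>(urls);
        // Unrated URLs count as zero; the sort is stable, so ties keep their order.
        ranked.sort(Comparator.comparingInt(
                (String url) -> -scores.getOrDefault(key(searchTerm, url), 0)));
        return ranked;
    }

    public static void main(String[] args) {
        RelevanceRankingSketch ranking = new RelevanceRankingSketch();
        ranking.recordSwipe("brexit", "https://example.com/b", true);
        ranking.recordSwipe("brexit", "https://example.com/a", false);
        List<String> urls = Arrays.asList(
                "https://example.com/a", "https://example.com/b", "https://example.com/c");
        System.out.println(ranking.rank("brexit", urls));
    }
}
```

Keying the score on both the search term and the URL means that a page rated relevant for one search term does not automatically rank higher for another.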
Database Design
The database was designed to contain all the information that was necessary to
identify and score each summary. The url, domain and searchterm were used to
identify a unique summary, without having to store the actual text. Furthermore,
some additional columns such as numOfHits and noOfSearches were used to
collect user data as depicted in figure 5.10.
Chapter 6
Information Gathering Application: Evaluation
What is Brexit?: This search term was the third most searched on Google
during 2016 and the pages were on average 1.78 MB large across the tested
devices.
Theory of Relativity: This search term was chosen to test the performance
on more mobile-friendly web pages, with an average page size of 0.63 MB
across the tested devices.
Performance: Speed
The speed of the application was measured in nanoseconds for each function, from
entering a search term until the result is displayed. The time that passes from
pressing the search button until retrieving the result consists of four major functions
that gather and operate on data. The performance metrics display these individual
functions and how much of the total time they account for. The results are displayed
in milliseconds.
GoogleScrape: The function that performs a web scrape on Google to fetch
relevant links for a search term.
QueryTime: The time it takes to update the database with the new results
and receive an answer.
ThreadSearch: The function that splits the workload up into threads and gathers
the summarized data from each web scraped link.
Avg Summarize: Average time it took for a device to create a summarization.
Max Summarize: Longest time it took for a device to create a summarization.
In figure 6.1 the results for the search term What is Brexit? are displayed. The
difference between the results of the smartphones was small. On average 89.356%
of the time was spent web scraping, while creating the summary accounted for less
than 5%.
Performance: Single vs Multiple Threads
Two different implementations of the application were tested to see how long it
takes from performing a search to receiving the result. One version made use
of multiple threads to web scrape and create the summaries, while the other used
a single thread. The average time was computed by taking the average of 30
searches with caching disabled. The single-threaded version took on average 8485 ms
while the multithreaded version took on average 2381 ms. The single-threaded application
was on average 3.36 times slower as depicted in figure 6.3. The depicted time is the
sum of the functions shown in figure 6.1.
Performance: RAM
The amount of RAM used by the application was measured by performing a search
and measuring the peak RAM usage, and is depicted in figures 6.4 and 6.5. The RAM
usage was low throughout the testing and no memory leaks were discovered.
Performance: Data
The amount of network data used was measured using the IDE's built-in
network monitor tool. The average data used for a search was measured both
through the application and through a normal web browser. Browser data was
measured by searching for a search term, clicking a link and letting the page fully
load. The data was collected by measuring the average data usage for the same
links that were scraped by the application. The application data was measured by
performing a search and getting the summaries. On average the application reduced
the amount of data by 60-80%, depending on the search term and mobile device, as
depicted in figures 6.6 and 6.7.
6.2.1 App Usage
The following results were obtained from the collected data.
Chapter 7
Discussion
This chapter covers the discussions on the different methods, solutions and prob-
lems found during this project. The problem statement is revisited and discussed.
Furthermore the repercussions of the application are discussed from a sustainable
and ethical viewpoint.
7.1.1 Methods
The following methods were used during the project.
Literature study
Investigating possible solutions for the problem statement required
background knowledge in a wide set of topics. Because there was a lack of experience
with some of the technologies required to develop the Android application, a
literature study was performed to gain the required knowledge. Furthermore, presenting
the data in a way that would promote faster access to the desired information was
difficult because no obvious solutions existed. Performing the literature study
was more a necessity than a choice.
Interview
The reason for doing the interview was partly to gain insight into what best practices
exist, but also to settle some uncertainties that arose during the early stages of
designing the application. The interview did not cover many questions regarding
Android applications, but it proved invaluable for managing the project in terms of
time and scope.
Design and implementation
Some of the minor issues were UI related. While most modern web pages are
designed to work well on mobile devices, they are often harder to navigate than their
desktop counterparts. The more major issues were due to performance and
network data limits. Because most modern websites make use of images and large
JavaScript frameworks, a lot of data and processing power is used to load web pages. We
chose to apply the MVC architectural pattern and evaluate our work formatively.
The reason for choosing this design pattern was mostly previous
experience of developing with MVC. Because Android development was a new concept
that had to be learnt for this thesis, it was beneficial to apply already known
concepts. By using the MVC pattern and evaluating the work formatively, it became
easier to modify and rewrite a layer of the application without other layers being
affected. This was essential for the development cycle, where the design often had to
be changed. One issue with using MVC was that it was difficult to define
what the controller layer was. The view of an Android application acts more as a
view-controller hybrid, which reduces cohesion and reduces the amount of code that
can be reused. Other design patterns, such as MVVM and more
Android-specific patterns, were also considered.
Evaluation
Formative Evaluation: Formative evaluation was used during the design
and implementation process. This proved very useful for developing an application
that was constantly changing, as decisions to change the application's
functionality were made based on incoming new information.
7.2 Problem Statement Revisited
The following questions were the problem statements of this thesis:
In which way can a web scraper be used to collect relevant data on a subject?
How can the collected data be stored and analyzed?
In which way can an Android application use a web scraper for data gather-
ing? How can the collected data be presented to promote easy access to the
desired information?
How can the collected data be stored and analyzed? A decision was
made to store and update data on a web server. This was done in order to get
easier access to metrics, such as which search terms were popular and what was
deemed relevant. By having users update the same information, a collective
effort was made to push more relevant summaries to the top. According to the results
gathered through the questionnaire, most users found that this could improve
the search results over time. Preventing vote manipulation was
considered, but would have expanded the scope of the project too much.
By collecting various data regarding searches, it was made simple to gather
and analyze how well different aspects of the application worked.
In which way can an Android application use a web scraper for data
gathering? Gathering data in an effective manner was achieved by utilizing
threads, which reduced the time spent waiting for HTTP requests. Since
Android applications are built using Java, several web scraping libraries were
available. The web scrape was performed using an existing library
rather than our own implementation, due to the improved performance.
The initial implementation, which only used a single thread, proved to be much
slower than a search made on a regular search engine. Since most Android
devices have several cores, multithreading became a good solution for
improving the search performance, as shown by the measured results.
How can the collected data be presented to promote easy access
to the desired information? By sorting all the gathered links by their
relevancy score, the most relevant result was presented at the top of the list
of results. By having the most relevant results at the top and pushing
irrelevant results further down, the time to resolution could be shortened. An issue with
finding information online is that the text often has filler sentences or is too
long. In order to make the information easier and faster to understand, we decided
to present the user with summaries. These summaries were long enough to
satisfy the information need, but short enough to be understood quickly. In
case the summary was not detailed enough, each summary had a link
to the source page. According to the user feedback, the summaries generated
by the application were overall rather good. Most of the negative feedback
was due to the swipe functionality being considered too sensitive. Overall,
the feedback collected indicated that the application could definitely see some
usage, but would require more work in order to smooth out some minor issues.
Caching Summaries
A decision was made not to cache the summaries. Caching the summaries would
have reduced the time it takes to search for the same term from a couple of seconds to
less than a second. Still, caching was not included due to possible
copyright issues with storing website data on our server. Even though the content
that would have been saved consists of summaries, we decided not to take any risks.
Choice of Platform
There were many reasons for developing this application for the mobile platform. The
problems in the problem statement are amplified on mobile devices compared to
desktop devices, due to the shortcomings of mobile technology, such as limited battery
and network data. Mobile devices account for more than 50% of searches, and this share
is increasing. Thus it made sense to us to target a platform where the benefits would
be seen most clearly.
7.3 Ethical Aspects
During the literature study, certain issues regarding unethical behaviour of web
scraping software were discovered. This section covers the discussion about these
ethical topics.
While the website gets no revenue from the web scraper, it still has to
provide the scraper with files, which puts a load on the website's server. This creates
the ethical issue of not giving anything back to the creator of the content. If all
search engines tried to present the data of a search result better than the actual
website the data is hosted on, there would be less incentive for creators to publish
their data on ad-driven free websites. A consequence of this could be a decrease in
overall free content on the internet and a growth in content behind paywalls. To
try to help content creators, we make sure to always link to the source page when
presenting a search result.
7.4 Sustainability
This section covers the possible effects the thesis could have on sustainability if the
application is used or the ideas applied.
Chapter 8
Conclusions
This chapter concludes what was achieved throughout the thesis and what future
implications the results could hold.
8.1 Summary
This study set out to investigate how problems related to information gathering on
mobile devices could be solved using technologies such as web scraping. Based on
the results from our literature study, issues that occur when searching for
information on mobile devices were identified. These issues include long
loading times, small and difficult-to-use interfaces, large website sizes, and the
difficulty of presenting data to users in a mobile-friendly way.
With the issues in mind, an application was designed that would try to improve the
information gathering process on mobile devices. By making use of web scrapers
for gathering the information and rethinking the way information could be presented,
an application was implemented that makes use of these ideas. The results that
were gathered from the application were presented and analyzed. The user feedback
indicated that the application did achieve its goal of quickly presenting relevant
information, but was inconsistent and would require some work before it could replace
regular search engines.
The different results of the thesis were discussed, and the authors came to the con-
clusion that the technical aspects of the application could be improved to a point
where it could potentially be used by a specific target group, that has a need for
raw text gathering. But while the technical side had potential, the application was
hindered by the unethical aspects of web scraping.
If legal grounds cannot be decided because of too many differing opinions, an official
list of guidelines on how to use web scrapers should be created.
Lastly, the storage of the data has to be expanded and improved if the
application is to be made available to and used by a large number of people. By using
a large server architecture that can handle a large number of concurrent users, the
issues of storing and managing data would be minimal, as the client-to-server
interaction is small.
If all these issues were solved, there could be potential for the app to work
as a research tool. It would work by collecting and condensing data on more
scientific topics, and presenting it to the user as, for example, a summarized report.
Bibliography
[1] Growth of data online [updated June 19, 2017; cited June 19, 2017]
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html
[5] Theoretical Speedup from Parallelizing Computations [updated August 28, 2000; cited June 19, 2017]
http://www.phy.duke.edu/~rgb/brahma/brahma_old/als/als/node3.html
[6] CPU Threads [updated August 24, 2013; cited June 19, 2017]
https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/4_Threads.html
[7] MultiThreading in Android [updated May 17, 2017; cited June 19, 2017]
https://developer.android.com/reference/java/util/concurrent/ThreadPoolExecutor.html
[8] AsyncTasks in Android [updated May 17, 2017; cited June 19, 2017]
https://developer.android.com/reference/android/os/AsyncTask.html
[9] The SQL language [updated April 21, 2017; cited June 19, 2017]
https://docs.microsoft.com/en-us/sql/odbc/reference/structured-query-language-sql
[10] PHP Scripting Language [updated June 19, 2017; cited June 19, 2017]
http://php.net/manual/en/intro-whatis.php
[12] Color Wheel Image [updated May 21, 2017; cited June 19, 2017]
https://commons.wikimedia.org/wiki/File:RGV_color_wheel_1908.png
[13] Analogous Colors, Joen Wolfrom (1992), The Magical Effects Of Color, pp. 31-32
[14] Principles of Interaction Design [updated June 19, 2017; cited June 19, 2017]
http://asktog.com/atc/principles-of-interaction-design/
[17] Yihong Gong, Xin Liu (2001), Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis
[20] CSS standard [updated April 5, 2017; cited June 19, 2017]
https://www.w3.org/standards/webdesign/htmlcss#whatcss
[23] Qualitative and Quantitative Methods [updated April 4, 2017; cited June 19, 2017]
http://www.lib.vt.edu/research/methodology/quantitative-qualitative.html
[24] Nielsen Heuristics [updated June 19, 2017; cited June 19, 2017]
https://www.nngroup.com/articles/ten-usability-heuristics/
[25] Summative Evaluation [updated April 4, 2017; cited June 19, 2017]
https://cyfar.org/different-types-evaluation#Summative
[26] Web Page Size [updated May 31, 2017; cited June 19, 2017]
https://www.keycdn.com/support/the-growth-of-web-page-size/
[27] Global 4G Coverage [updated April 4, 2017; cited June 19, 2017]
https://opensignal.com/reports/2016/11/state-of-lte
[28] Mobile Shopping Behaviour [updated April 4, 2017; cited June 19, 2017]
https://www.thinkwithgoogle.com/articles/mobile-shoppers-consumer-decision-journey.html
[29] Website size [updated April 4, 2017; cited June 19, 2017]
http://httparchive.org/interesting.php?a=All&l=Apr%2015%202017
[30] BeautifulSoup Python Library [cited June 19, 2017]
https://www.crummy.com/software/BeautifulSoup/
[33] Mobile Data [updated April 4, 2017; cited June 19, 2017]
https://www.thinkwithgoogle.com/nordics/research-study/the-need-for-mobile-speed-how-mobile-latency-impacts-publisher-revenue/
[34] Search Engine Algorithms [updated February 5, 2007; cited June 19, 2017]
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
TRITA TRITA-ICT-EX-2017:61
www.kth.se