
Ebola: The Geographic & Demographic Impact of the 2014 Outbreak in West African Countries


IN4086 Data Visualization - Student Group 15
(2016-2017 Q2)
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
December 15, 2016

Jayachithra Kumar 4617312
Sneha Saha 4600916
Maria Touranakou 4623525

Introduction
This report is part of an information visualization (InfoVis) project for the IN4086 Data Visualization
course at Delft University of Technology in the Netherlands. Its primary goal is to put the
principles and concepts of data visualization introduced in class into practice.
The aim of the project is to visually encode complex datasets using existing data visualization tools
while taking human visual perception into account, so that the raw, complex data is represented in a
way that helps extract knowledge and insights that could not be directly derived or perceived from
the non-visualized data.

The subject of our project is the 2014 Ebola outbreak in West Africa, and in particular its
geographic and demographic impact on the epicenter countries of the outbreak (Guinea, Liberia and
Sierra Leone) between 2014 and 2015.

In the first part of the report, we give an overview of the global geographic spread of the 2014
Ebola outbreak and the escalation of its impact on the epicenter countries.
In the second part, we perform a visual analysis in two main directions: first, the spread of
Ebola by gender, and second, the spread of Ebola across different age groups in the
epicenter countries.

Data Analysis: Scope and Context


After deciding on the subject of our project and researching the topic on the Web, we realized that in
order to undertake a proper data analysis we would need to narrow down its scope and focus on
certain features of the Ebola impact, rather than perform a broad analysis that might end up being
too generic and inconclusive. We also had to take the time limits into account and aim for the best
result achievable with the time and resources available. Thus, we limited our data analysis by
temporal scope and thematic scope.

Under the temporal scope, we only visualize the impact of the 2014 Ebola outbreak between
2014 and 2015. When researching the Ebola data available for 2016, we found inconsistencies
across different sources, and thus we decided to exclude the 2016 data from the scope of our
design study.

Under the thematic scope, since official reports investigating the health, social and economic
impact of the Ebola Virus Disease (EVD) were already available, we decided to focus the design and
analysis of our project on the geographic and demographic impact of Ebola, while at the same time
restricting the demographic analysis to the epicenter countries rather than the global effect.

In that direction, regarding the geographic impact, we visualize the impact of Ebola on a
global scale so that the affected areas can be compared and the urgency of the outbreak is
clearly conveyed. For the demographic impact, we only look at the impact of Ebola on the epicenter
countries of West Africa and its effect on 3 different age groups and the 2 genders.

Data Analysis: Challenges and Limitations


Topic Identification: While we were still trying to identify interesting datasets for
our project, we came up with the idea of researching public health threats with global impact. After
gathering information on different global health issues, we had to deal with two main challenges:
a. we needed to find data from reliable sources, so we had to check whether health organizations
and related institutes were gathering such data and whether the data were publicly available, and
b. we had to find data of sufficient volume and variety to be able to derive some
interesting observations from them. We also had to decide how much data would be an
adequate sample for analysis and what would be a valid timeframe for the sample to be
analyzed.

Validity of Drawn Conclusions: When visualizing data about a global public health threat
like Ebola, the volume of the data as well as its accuracy and timeliness may strongly
affect the accuracy and truthfulness of the final observations. That is why we concluded
that it would be necessary to restrict the scope of our analysis and specify clear goals.
However, the dataset we assessed is still a subset (sample) of the available data, so we acknowledge
that a significantly larger sample may show the impact of Ebola in more detail and with greater
validity.

Visualization Tool: We wanted a visualization tool that would let us easily create
plots from the data and, at the same time, build an overview of different
plots in which the data are interactively connected across plots, e.g. on a mouse click.
Using Tableau as a visualization tool was quite helpful, as we could create a dashboard with
different sheets and visualize data from different datasets in order to reveal important
correlations among them.

Data Acquisition and Data Formatting: Although Ebola was a major public health threat and
multiple sources of data were available, deciding which data to acquire and filtering
them was a major challenge, because the available datasets differed in their measurements.

Data Acquisition
In our report, we use datasets acquired from the websites of major humanitarian and research
organizations such as the World Health Organization (WHO), the World DataBank, the
Humanitarian Data Exchange and WorldPop, which value open data practices and offer a high level of
reliability and variety in the data collected.
When we started working on the project, we also aspired to a more in-depth analysis.
For example, we wanted to zoom further into the regions of the epicenter countries and visualize the
impact of Ebola within a specific country, but unfortunately we could not find enough available
data to perform this task. Even when we did find some data on Ebola cases per district
in the epicenter countries, the period they covered differed from the
timeframe we had already set for our project. We therefore had to give up this
idea, even though we had spent quite some time working in that direction.

Data Formatting
In order to create data visualizations in Tableau later on, we first had to format the
acquired data into a dataset that would be functional and useful for our project scope and
analysis. For the demographic analysis, for example, we collected data from the World
Health Organization: we first downloaded the individual data points for dates from December 2014 to
December 2015, and then we merged and organized them using Microsoft Visual Basic so that our
dataset would be ready for visualization in Tableau. An example of our formatting is shown in
Figure 1.
Figure 1 shows, at the top, the individual data points for a specific date in Excel format as acquired
from the World Health Organization website, and at the bottom, the formatted dataset for Guinea after
merging and cleaning the individual datasets for different dates between December 2014 and December 2015 in
Microsoft Visual Basic.
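
The merging step can be illustrated with a short Python sketch. This is not the Visual Basic procedure we used; the snapshot layout, the field names and the numbers are hypothetical and serve only to show how per-date data points can be combined into one table per country.

```python
# Hypothetical per-date snapshots: each maps (country, gender) -> cumulative cases.
# The numbers are made up for illustration; they are not WHO figures.
snapshots = {
    "2015-01-07": {("Guinea", "female"): 12, ("Guinea", "male"): 11},
    "2014-12-31": {("Guinea", "female"): 10, ("Guinea", "male"): 9},
}


def merge_snapshots(snapshots):
    """Flatten per-date snapshots into one list of rows sorted by date."""
    rows = []
    for day in sorted(snapshots):  # ISO dates sort chronologically as strings
        for (country, gender), cases in sorted(snapshots[day].items()):
            rows.append({"date": day, "country": country,
                         "gender": gender, "cases": cases})
    return rows


def rows_for_country(rows, country):
    """Keep only the rows of one country, e.g. a Guinea-only table."""
    return [row for row in rows if row["country"] == country]


merged = merge_snapshots(snapshots)
guinea = rows_for_country(merged, "Guinea")
```

A long table of this shape (one row per date, country and gender) is the kind of layout that a tool such as Tableau can read directly.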

Framing Hypothesis Issues:


When we started working on the project, it was quite unclear what questions to ask and how to
proceed, so we decided to focus on certain aspects of the disease and made some
assumptions, based on related material from the press, to form a rough first picture of what we
were about to look for in the data.
In order to draw some initial assumptions regarding the data and form hypotheses to be tested, we
performed some rough operations on the data, such as aggregation, to get an intuition of
what to expect.

Our Tool
Tableau is an existing data visualization tool that enables powerful and interactive analysis of
complex data. Tableau has a user-friendly, drag-and-drop interface, and a lot of
documentation and training videos are available online. As soon as we decided to
use Tableau for our project, we started experimenting with the different visualization possibilities
and learning the advantages and disadvantages of alternative visual encodings. We came to
understand that, depending on the dataset and the category of data we were using, there was a
visualization that would confirm or refute our initial hypothesis in the way most faithful to
reality.

Our Approach
Our main approach to the data analysis was:
a. Research on the Web and formulation of an initial hypothesis (an intuitive, shortsighted assumption),
b. Experimentation with alternative correlations of data (interactive queries or calculations) and
observation of the resulting visualizations,
c. Final decision on the most beneficial visualization technique to confirm or reject the initial
hypothesis through data visualization.

Visualization Techniques
By focusing on the geographic and demographic data analysis of the Ebola outbreak, we had the
chance to experiment with different information visualization techniques:
a. For the geographic analysis we used a multiple-views dashboard where the user is able to
perform a dynamic (visual) query on the data and interact with it by simply selecting a range of
interest (in our case the country). In this part, we mostly focused on multiple and interactive
data representations rather than on extensive operations on the data, while
b. For the demographic analysis we complemented filtering with aggregation of
multiple datasets and normalization to a common scale, in order to confirm or reject initial
assumptions and be able to draw sound conclusions.

In short, from a perceptual point of view, part 1 promotes data exploration (exploratory
visualization) through interaction with the data and through the color saturation that represents the
escalation of the Ebola spread and impact, whereas part 2 verifies or refutes preconceived
hypotheses by showing the data distribution and its pattern of evolution over time. Both parts
serve the purpose of presenting the data in a more communicative, visual way.

Techniques
Multiple linked views on a dashboard (map and stacked bar charts).
Treemaps. We used treemaps to visualize the mortality rate of Ebola per country. The color
saturation together with the area of each rectangle highlights the impact of Ebola per country in
terms of fatality.
Bar charts. We used bar charts to compare values and study their
distribution over time, in order to uncover any pattern in the occurrence of Ebola in people of
different age groups and genders.
Stacked bar charts. Differences in color and length show the evolution of cases per
country while also highlighting the fatality rate among the total cases of Ebola, namely the
part-to-whole relationship between cases of infected people and fatal cases of Ebola.
Motion chart. Used for visualizing the evolution of the rate of affected population as well as
the effect of Ebola on different age groups.
Color. We used a different color for each country; each country has the same color in every bar
chart and in the motion chart of the demographic analysis.

Geographic Impact
For the geographic data analysis, we designed a Tableau dashboard of interlinked
multiple views that lets the user interact with the data and make better sense of the Ebola
impact on the epicenter countries compared to the global spread of Ebola, in terms of the
cumulative number of affected people (cases of people infected by Ebola plus fatal cases) and
whether new cases of Ebola were reported over the last 21 days in our data subset.
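
As an illustration of the 21-day indicator, the following Python sketch derives the number of recent cases from a cumulative series. It is only a sketch under assumptions: the source data may already provide this figure directly, and the function name, the data layout and the numbers are hypothetical.

```python
from datetime import date, timedelta


def cases_in_last_days(cumulative, as_of, days=21):
    """Return the cases reported in the `days` days up to `as_of`.

    `cumulative` maps report dates to cumulative case counts.
    """
    reported = [d for d in cumulative if d <= as_of]
    if not reported:
        return 0
    cutoff = as_of - timedelta(days=days)
    earlier = [d for d in reported if d <= cutoff]
    # With no report on or before the cutoff, count everything reported so far.
    baseline = cumulative[max(earlier)] if earlier else 0
    return cumulative[max(reported)] - baseline


# Hypothetical cumulative series, not WHO figures.
series = {
    date(2015, 1, 1): 100,
    date(2015, 1, 15): 130,
    date(2015, 1, 29): 150,
}
recent = cases_in_last_days(series, as_of=date(2015, 1, 29))
still_active = recent > 0  # the yes/no indicator shown per country
```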

Although our figures mostly show the impact of the outbreak in Guinea, Liberia and
Sierra Leone, we mapped the global impact of the outbreak so that the countries can be compared
through statistics and color density (saturation). On the right
of the multiple-views dashboard, there is a red color scale whose saturation indicates the
impact of Ebola in terms of the combined number of infected people and deaths per country.
The user is able to apply filters on the global map in order to see the geographical and cumulative
distribution of specific features of the data across the globe and the impact of Ebola per region.
In the map view, the redder a country is, the greater the impact of the Ebola outbreak, and
thus the more urgent the situation has been. At the same time, the user can judge the
urgency of the situation per country by checking whether cases of Ebola were still
detected in the selected country over the last 21 days, together with the national fatality rates.

Figure 2: By selecting filters on the right, the user can explore the geographic distribution of
metrics such as the Ebola cases, the fatality rate, etc., along with their cumulative distribution over
time. An important observation is the striking variation in the number of affected people: in some
countries the number of total cases is only 3, whereas in the epicenter countries this number
exceeds 2.8 million affected people. The redder a country is on the global map, the
more affected it is by Ebola.

Figure 2 shows the interlinked multiple views dashboard for the Ebola worldwide impact.

Figure 3: With a mouse click, the user can see the geographic impact of Ebola per country,
along with the total number of infected people and of people who died of Ebola in yellow and red
respectively, while also checking the number of affected people over the last 21 days.

Figure 3 shows the interlinked multiple views dashboard for the Ebola impact on the epicenter countries in West
Africa.
Figure 4: Although numerous countries were affected by Ebola, the disease was fatal
only in the countries shown in Figure 4. Of all these countries, Mali has the highest mortality rate.

Figure 4 shows the mortality rate for the Ebola affected countries worldwide.

Design Choices for the Dashboard of Multiple Views


Views visibility: the plots are placed side by side.
Views count: the number of views is kept small so that they are easier to perceive.
Views arrangement: the dashboard was configured manually, while the filtering is an
option provided by Tableau.
Linkage between views: actions in the map view are propagated to the bar chart of
accumulated cases of people infected by Ebola and people who died from Ebola, whereas the
second bar chart shows whether cases were recently reported (in the last 21 days). The
user can select a filter (e.g. per country) and interactively see the local cases of infection and
fatality due to the Ebola outbreak, as well as whether any cases of Ebola were reported
in the area during the last 21 days.

Design Choices for the Treemap


In the treemap, we visualized the mortality rate of Ebola per country. Each rectangle denotes one
country. The larger and darker blue the rectangle is, the higher the mortality rate in that country.
Color saturation is used to make the severity of fatality per country easy to distinguish.
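
A minimal Python sketch of the quantities behind such a treemap, assuming that the mortality rate is the share of reported cases that were fatal (deaths divided by cases) and that saturation scales linearly with the rate. Both assumptions, the function names and the totals are illustrative and do not describe Tableau's internals.

```python
def fatality_rate(deaths, cases):
    """Share of reported cases that were fatal, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def saturation(rate, max_rate):
    """Map a rate linearly onto [0, 1] for use as a color saturation."""
    return rate / max_rate if max_rate > 0 else 0.0


# Placeholder totals (deaths, cases) per country, not real figures.
totals = {"Country A": (5, 10), "Country B": (25, 100)}
rates = {name: fatality_rate(d, c) for name, (d, c) in totals.items()}
highest = max(rates.values())
shades = {name: saturation(rate, highest) for name, rate in rates.items()}
```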

Demographic Impact
In this part, in order to explore the data in more depth, instead of filtering and interacting with it,
we perform calculations on the data and visualize its distribution over time so that we can derive
meaningful insights and patterns. More specifically, in this section we analyze the spread of Ebola
in terms of age and gender distribution for the epicenter countries of Guinea, Liberia and Sierra
Leone.
The main questions we aimed to answer were:
Does Ebola show a pattern related to the age group that people belong to?
What is the infection rate among children and the elderly? Are they less or more affected?
Does Ebola affect women or men more? Is there any pattern in the occurrence of the
disease with respect to gender?
Is Ebola actually an epidemic?

Although Ebola first appeared back in the 1970s, it was not until the outbreak of 2014 that it
was considered an epidemic disease, as the number of cases (people infected) and the number
of deaths due to the disease increased significantly, alerting the global community. According to
Wikipedia, the West African Ebola virus epidemic was the most widespread outbreak of Ebola virus
disease in history, causing major loss of life and socioeconomic disruption in the region, mainly in
the countries of Guinea, Liberia, and Sierra Leone [1].

Initially, when we looked at the statistics of the affected population and computed the percentage
of affected people over the general population of the West African countries, the
percentage seemed to be relatively low. That is why we wondered why Ebola
was defined as an epidemic.
After visualizing the data and searching the Web for how an epidemic is defined, we observed that,
according to this definition, Ebola was highly contagious and
dangerous for people, as shown in Figure 5.
More specifically, according to the definition given on Wikipedia, an epidemic is the rapid
spread of infectious disease to a large number of people in a given population within a short period
of time, usually two weeks or less. An attack rate in excess of 15 cases per 100,000 people for two
consecutive weeks is considered an epidemic [2]. In Figure 5, the disease clearly
exceeds 15 cases per 100,000 for 2 consecutive weeks, so we conclude that Ebola was
indeed an epidemic outbreak in the West African countries.

Figure 5 showcases the cumulative number of Ebola cases in West African countries over 2014 - 2015.
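
The threshold check quoted above can be written down as a small Python sketch. It assumes weekly counts of new cases (not cumulative totals) and a fixed population; the function names, the default parameters and the sample numbers are hypothetical.

```python
def attack_rate_per_100k(new_cases, population):
    """New cases per 100,000 people in the given population."""
    return new_cases * 100_000 / population


def exceeds_threshold(weekly_new_cases, population, threshold=15.0, weeks=2):
    """True if the rate stays above `threshold` for `weeks` consecutive weeks."""
    run = 0
    for cases in weekly_new_cases:
        if attack_rate_per_100k(cases, population) > threshold:
            run += 1
            if run >= weeks:
                return True
        else:
            run = 0  # the streak is broken, start counting again
    return False


# Made-up weekly counts for a population of 100,000.
sustained = exceeds_threshold([10, 20, 20, 5], population=100_000)
interrupted = exceeds_threshold([20, 10, 20, 10], population=100_000)
```

Because the check needs new cases per week, a cumulative series such as the one in Figure 5 would first have to be differenced week by week.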

Design Choices for the Motion Chart


In order to visualize a trend and its evolution over time, we used a motion chart.
Axes: The X and Y axes represent the female and male population affected by Ebola. Any point in the
XY plane denotes the number of female and male people affected at a certain time.
Size of circle: The diameter of the circle varies with the size of the population affected by
Ebola. The larger a circle is, the greater the
impact of Ebola on that country, as the diameter is computed as the average of the numbers of male
and female people affected by Ebola (scaled down by 100).
Colors: Different colors are used for different countries.
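
The encoding described above can be summarized in a few lines of Python. This is a sketch of the calculation only, not of the Tableau chart; the function names and the sample counts are hypothetical.

```python
def circle_diameter(female_affected, male_affected, scale=100):
    """Average of the female and male counts, scaled down by `scale`."""
    return (female_affected + male_affected) / 2 / scale


def motion_chart_point(female_affected, male_affected):
    """One mark of the chart: x = female count, y = male count, plus diameter."""
    return {
        "x": female_affected,
        "y": male_affected,
        "diameter": circle_diameter(female_affected, male_affected),
    }


# Made-up counts for one country on one date.
point = motion_chart_point(female_affected=5000, male_affected=3000)
```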
Figure 6: We can observe a similar pattern in the increase of affected people in the
epicenter countries, and the disease seems to occur slightly more often in the female
population. As the number of affected people increases, the size of the circle also increases.
Sierra Leone has the largest circles on the motion chart, showing that the impact of Ebola
was greatest there. The cursor points to a particular date to show that a decreasing rate is
observed in all three countries around May 2015.

Figure 6 shows the increasing trend of the 2014 Ebola outbreak in the male and female population between
December 2014 and December 2015.

Figure 7: Our initial intuition from the motion chart in Figure 6 was that Ebola occurred more often
in the female population. In order to test this hypothesis, we plotted the number of people
affected by Ebola by gender. A country may seem to have more women affected by Ebola simply
because its female population is larger. To avoid drawing wrong conclusions from such a potentially
unequal gender distribution, we normalized the data over the population of each country.

Figure 7 shows the people affected by Ebola per gender, normalized with respect to the total
population of a country. The bar charts refer to the year 2015.
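
The effect of normalization can be shown with a short Python sketch. The example divides each gender's cases by that gender's population, which is one possible choice of denominator; using the total population of the country instead only changes the second argument. The function name and all numbers are hypothetical.

```python
def cases_per_100k(cases, population):
    """Cases per 100,000 people of the reference population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Made-up country with more women than men.
female_cases, female_population = 60, 600_000
male_cases, male_population = 40, 400_000

female_rate = cases_per_100k(female_cases, female_population)
male_rate = cases_per_100k(male_cases, male_population)
```

In this example the raw counts suggest that women are more affected (60 against 40 cases), while the normalized rates are identical, which is exactly the kind of misleading difference the normalization is meant to remove.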
Figure 8: For the analysis per age group, our initial hypothesis was that the number of affected
people must be higher among those who are more socially active (15-44), namely the workers
and the people who take care of the more vulnerable age groups, e.g. children (0-14) and
elderly people (45+). We plotted the population affected by Ebola, classified into 3 age groups, in
a motion chart. The chart shows that the people in the middle age group (15-44) are
the most affected.

Figure 8 visualizes the pattern of the Ebola impact over the age groups of 0-14, 15-44 and 45+ for the epicenter
countries.
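
For individual-level records, the classification into the three age groups could be sketched in Python as follows. The source data may already be grouped this way; the function name and the sample ages are hypothetical.

```python
from collections import Counter


def age_group(age):
    """Assign an age in years to one of the groups 0-14, 15-44 or 45+."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 14:
        return "0-14"
    if age <= 44:
        return "15-44"
    return "45+"


# Made-up ages of affected people.
ages = [3, 14, 15, 30, 44, 45, 70]
counts = Counter(age_group(age) for age in ages)
```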

Conclusion: Interesting Observations


From the analysis in our data visualization project, we confirmed that the 2014
outbreak was indeed one of the most fatal and widespread in history (as shown in Figures 2-8).
While the percentage of the affected population seemed relatively low compared to the
general population of the West African countries, according to the definition of an
epidemic that we used, Ebola was highly fatal and dangerous for people (as shown in Figure
5). We also confirmed our initial assumption that Ebola occurred slightly more often in women than
in men; however, the difference was not significant. The normalized bar
charts (Figure 7) showed similar distributions. The slight difference in the
occurrence of Ebola between the 2 genders could be due to the sexual transmission of the disease, or
to the fact that women are the ones who mostly take care of others at home.
At the same time, Ebola did not diverge significantly per age group. However, the percentage of
affected people was largest in the 15-44 age group. We attribute this to
the fact that this age group corresponds to the labor force: these
people interacted mostly with each other while also taking care of affected loved ones, so
they were more exposed to the danger of getting infected by Ebola.
In other words, the affected age groups of 0-14 and 45+ would be the ones taken care of by
the female population, which most likely came from the middle age group of 15-44.

After May 2015, we also observed a sudden decrease in the number of men and women affected by Ebola
(as shown in Figure 6). Through further reading, we learned that a series
of vaccinations was carried out in three phases by the World Health Organization (WHO) in the affected
African countries, which we take to explain the decrease in the affected population.

References
[1] https://en.wikipedia.org/wiki/West_African_Ebola_virus_epidemic. Accessed: 16.12.2016.
[2] https://en.wikipedia.org/wiki/Epidemic. Accessed: 16.12.2016.
[3] A. Vilanova. Information Visualization.
https://blackboard.tudelft.nl/webapps/blackboard/execute/content/file?cmd=view&content_id=_2901901_1&course_id=_56543_1. Accessed: 10.12.2016.
https://blackboard.tudelft.nl/webapps/blackboard/execute/content/file?cmd=view&content_id=_2904070_1&course_id=_56543_1. Accessed: 12.12.2016.

Dataset Sources:
[1] https://en.wikipedia.org/wiki/Epidemic
[2] http://apps.who.int/ebola/en/status-outbreak/situation-reports/ebola-situation-report-31-december-2014
[3] https://data.humdata.org/dataset
[4] http://www.worldpop.org.uk/data/data_sources/
[5] http://datatopics.worldbank.org/gender/indicators
[6] http://databank.worldbank.org/data/reports.aspx?source=global-bilateral-migration
