
Data Science: What, why and how to use it!

Manoj Chiba
March 2017: Uganda
Who is Insight2Impact (i2i):
Resource centre that aims to catalyse the provision and use of data by
private and public sector actors to improve financial inclusion for:
Evidence-based decisions;
Data-driven policies; &
Client-centric product design
Achieved through:
o Innovative use of data
o Promotion of open data:
Platform for datasets;
Capacity building; and
Knowledge sharing

What is data?
Forms & access of and to data

What is data science

Where are we?
Where are we going?

How do you use data?

o Why use data?

Data is becoming the new raw material of business
Craig Mundie

Data is [becoming] the [new] raw material of business
Craig Mundie (modified)
Context: If harnessed.
Better decisions (evidence-based decision-making, EBDM)
Better performance through understanding the levers: product, pricing, sales and
New clients
New services & better customer
INCLUSION, not exclusion
Context: Globalization
The breaking-down of country barriers;
Multichoice (DSTV)
Increased competition
Increased disruption
Opportunity: Products & services to ensure financial inclusion

What is data?

Data is [becoming] the [new] raw material of business ideas AND financial inclusion
What is data?

What many think data is

noun informal

Language that is meaningless or is made unintelligible by excessive use of abstruse technical terms; nonsense
Synonyms: gibberish, claptrap, nonsense, balderdash, blather, garbage
The often forgotten data.
The MOST often forgotten data.
Where is this data generated?
Where is this data predominantly generated?
Summary: what is data?

Almost anything [data] in raw or unorganized form, broadly categorized in the following forms:
Numbers; and/or
Text (free & hypertext); and
Images (multimedia)

Which allows for Insight and problem discovery, and ultimately Innovation for Financial Inclusion
Essentially Scraping

Web scraping:
A technique for extracting information from websites. Copying and pasting by hand is tedious, so a tool (e.g. https://import.io) helps ensure we get the information.

What cannot be scraped, specifically from web sources of data:

1. Badly formatted HTML code: older websites- question remains why has it not been
2. Authentication systems (Captcha codes)
3. Session-based systems (Cookies that keep track of what the user has been doing)
4. Access control
5. Legal barriers
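
As a rough illustration of what a scraping tool does behind the scenes, the sketch below pulls the rows of an HTML table into a Python list using only the standard library. The HTML snippet, the district names, the numbers, and the names `TableScraper` and `scrape_table` are all made up for illustration; a real page would first have to be downloaded, and only where access control and legal terms allow it.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # the row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page content; the values are invented for this example.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Bank branches</th></tr>
  <tr><td>District A</td><td>120</td></tr>
  <tr><td>District B</td><td>8</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

This only works on reasonably well-formed HTML, which is one reason badly formatted older sites appear in the list above.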

What is data science???
What is data science and who is a data scientist?
Who is a data scientist?

Basis (Technical)
o Coding Skills
o Computational tools (software)
o Basic software development
o Computer Science

Core (Analytical)
o Statistics & Math
o Modeling & Simulation
o Evaluation & Development
o Data Visualization
o Experiment design
o Research expertise

TECH-SAVVY (Business)
o Project management
o Business processes
o Change Management
o Communication Skills
o Leadership Skills


Where are we? What is the current state of Data Science
Where are we going with data science?

History of data science:

o 10-15 years ago: thoughts and ideas only
o 2015 onwards: reality, like drinking water (commoditised)
Ideas are possible: we can analyse, link, and cross-reference patterns. This is old now!
o Current state: automation! Is automation working?
Not really. Even though claims are made, these fail to understand that YOU need to make sense of what is an output
Data science is NOT about data analysis, rather an understanding of what the data is telling YOU!!! Thinking is a human capability
Explosion of technologies and services
Where are we going with data science?

The future is in:

o Deep learning (artificial intelligence):
Voice recognition, computer vision, multilingual text analysis
Google: AI algorithm analyses videos for patterns
Computers with actual knowledge of subjects
Moving beyond automatically tagging images, analysing videos and text, and analysing numbers: the next wave is learning more about their subjects (individuals) and their behaviours!
Know more about individuals than individuals know about themselves
The future is no longer about what product/service we need to get into the market, but rather about solving a problem the individual never knew they had!
Where are we going?

IDENTIFY what the problem is

Without defining the problem, you will find yourself on a wild goose chase, trying to tackle a vague phenomenon (does it really exist?)
Identify what the problem is

o Personal experience?
o Overheard someone mentioning it?
o Complaints (think of Twitter and Facebook)
o Survey
o Latest developments (regulation change, political change, new
o Given a problem statement?
Define what the problem is

Defining the problem:

o Through brain-storming
o Consensus building
o ..
o The 5 (sometimes 6 or 7) WHY technique
Illustration of the 5 Why Technique

Problem: Cannot get new clients
WHY? We have done a poor job at client presentations
WHY? No materials (resources) to do a good job
WHY? Marketing has not updated the material
WHY? Marketing does not have the information
WHY? has been sitting the IT Directors
Key questions in defining the problem

Who
o Who is causing the problem? Process/individual/government/disruptors
o Who says this is a problem?
o Who is impacted by this? Company/individual/process- does it make financial exclusion

What
o What will happen if this problem is not solved? Individuals remain financially excluded?
o What are the symptoms? The indicators
o What is the impact?

Where
o Where does this problem occur? Specific region, sector (e.g. agriculture).
o Where does this problem have an impact?

When
o When does this problem occur? During the rainy season or drought, for example?
o When did this problem first start occurring?

Why
o Why is this problem occurring? Why?
o Why are individuals financially

How
o How should the process or system work?
o How should individuals be financially included? Practically and theoretically
o How are individuals currently handling the problem?
Drilling deeper: multiple issues

Problem: Bank of Uganda wants to diversify its business. Bank of Uganda wants to extend credit to the low income market to ensure financial inclusion

Key topics we should explore to help solve this problem:

Issue #1: Is there a market for divesting this business? i.e. market interest
Issue #2: Is this a cost effective solution for the target market? i.e. cost
Issue #3: What impacts will there be for the displaced employees? i.e. impact on
Drilling deeper Multiple issues

Key to identification of issues:

o Develop a comprehensive list of all possible issues related to the

o Reduce the comprehensive list by eliminating duplicates and
combining overlapping issues
o Using consensus building, get down to a major issues list (3-4
issues that you will focus on)
SO How do YOU use data?

Use of data is specific to the problem to see what is out there

We all have GREAT ideas, BUT what does the data say? Is this really a

So you use data to:

1. Help identify that the problem exists;
2. Prove that the problem exists; and
3. Provide evidence that the problem has NOT been addressed and can be addressed by your solution (e.g. an app, a new method of credit scoring, a new service)
SO How do YOU use data?

This allows you to:

1. Build statistical models, e.g. a regression model
2. Test your model with data
   o Your model may be a new credit scoring technique
   o How will you test it? How will you know that the product/service you are proposing may actually work?
Data provides evidence that your solution is viable and works!
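
To make "build a model, then test it with data" concrete, here is a minimal sketch of a one-variable regression that is fitted on one set of observations and checked against observations it has not seen. Everything in it is assumed for illustration: the numbers are invented, `x` could stand for something like monthly mobile-money transactions and `y` for a repayment score, and the function names `fit_line` and `mean_abs_error` are hypothetical. A real credit-scoring model would need far more data and variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between predictions and observed values."""
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Invented observations: fit on the first five, hold back the last two.
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]
test_x = [6, 7]
test_y = [13.0, 15.1]

slope, intercept = fit_line(train_x, train_y)
test_error = mean_abs_error(slope, intercept, test_x, test_y)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"mean absolute error on held-back data: {test_error:.3f}")
```

Holding back part of the data is the simplest answer to "how will you test it?": if the error on unseen observations is small relative to the values being predicted, that is some evidence the idea may work.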

Very important for your current idea:

o MUST be based in data;
o MUST be addressing financial inclusion! Including the excluded!
SO How do YOU use data? Example on the use of analytics to solve a problem
Case Study: Sport and Emotional
There's something special about sharing the heartbreak of a loss or the elation over a win with a group of people
Data Generated
Leveraging the customer (fan) base
Let's understand

1. There is engagement with the page, but this increases and decreases throughout the season.
2. Many of the fans actually have forgotten who the sponsor is (it falls into the
3. There is NO sponsor engagement or
Using Predictive Analytics (within the constraints)

1. Understand when conversations peak
2. What peaks conversations
3. During what period of the season Likes and conversations attract greater
4. The mood of the conversations based on results
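
A first, purely descriptive pass at points 1 and 4 above might look like the sketch below: count posts per week to see when conversations peak, and average a sentiment score by match result to see how mood follows results. The post records, the sentiment scores, and the function names are all invented for illustration, and a real analysis would need a separate step to score sentiment from the text.

```python
from collections import Counter, defaultdict

# Invented records: (week of season, match result, sentiment in [-1, 1]).
posts = [
    (1, "win", 0.8), (1, "win", 0.6),
    (2, "loss", -0.5),
    (3, "win", 0.9), (3, "win", 0.7), (3, "win", 0.4),
    (4, "loss", -0.7), (4, "loss", -0.3),
]


def posts_per_week(records):
    """Number of posts in each week of the season."""
    return Counter(week for week, _, _ in records)


def peak_week(records):
    """Week with the most posts (the first such week if there is a tie)."""
    counts = posts_per_week(records)
    return max(counts, key=counts.get)


def mood_by_result(records):
    """Average sentiment score for each match result."""
    scores = defaultdict(list)
    for _, result, score in records:
        scores[result].append(score)
    return {result: sum(vals) / len(vals) for result, vals in scores.items()}


print("posts per week:", dict(posts_per_week(posts)))
print("peak week:", peak_week(posts))
print("mood by result:", mood_by_result(posts))
```

Summaries like these are the inputs a predictive model would later use, for example to anticipate when engagement is likely to rise.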
Timing is everything
Right offering (A): Price points, their communication channels, and
Price discrimination improved our ability to drive additional benefits for members, and led to a surge in new paid-for members (including the excluded!)
Right Time (B): Understood their favourite topics that they would engage in.
+100 hours of pure analysis
We understood when they would engage and WHY
Right Channel (C): Understanding which channel generates ENGAGEMENT for the target market- Twitter is NOT followed

A + B + C = Growth in bottom-line while including the excluded

Some take this option:
o Look at a dataset- when provided
o Look at the variables:
What is being measured
How is it being measured
Why is it being measured
Transactional data:
What type of transactional data?
o Run analysis on current data
Look for trends
Do the trends make sense?
Why are you observing what you are observing?
Define the problem from this point (follow steps 2-5)
What data will complement the current data that you have?
Where will you get this data from?
How will you access this data?
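
For the "look for trends" step, one simple check is to smooth a series with a moving average and see whether the smoothed values keep rising. The sketch below assumes a made-up list of monthly transaction counts; the figures and the function names `moving_average` and `is_upward_trend` are hypothetical, and the window of three months is an arbitrary choice.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def is_upward_trend(values, window=3):
    """True if the smoothed series never decreases."""
    smoothed = moving_average(values, window)
    return all(later >= earlier for earlier, later in zip(smoothed, smoothed[1:]))


# Invented monthly transaction counts with some month-to-month noise.
monthly_transactions = [100, 120, 90, 130, 150, 140, 170]

smoothed = moving_average(monthly_transactions, 3)
print([round(value, 1) for value in smoothed])
print("upward trend:", is_upward_trend(monthly_transactions))
```

The raw series dips twice, yet the smoothed one rises throughout, which is exactly the kind of result to question: does the trend make sense, and why are you observing it?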
SO the use of data

Once you have developed an idea, think about:

1. How will you test the idea?
2. What are the assumptions you made, when developing the idea?
3. What analysis will support the viability of the idea?
4. Is your idea practical?
1. Why do you think it is?
2. How will you communicate it?
5. Why should someone invest in your idea?
6. What data will you use?
In conclusion.

1. The aim is to ensure financial inclusion of the excluded.

2. This must be data-driven
3. You must demonstrate how your idea assists with financial inclusion
4. You must demonstrate the what, how, where of Data!
1. What: Type of data, type of analysis!
2. How: How did you test your idea?
3. Where: Where did you get the data from? Did you use other data? If not,
why not?
Nkosi Ncube
T: +27(0)11 315 9197
E: nkosi@i2ifacility.org

Dumisani Dube
T: +27(0)11 315 9197
E: dumi@i2ifacility.org

Manoj Chiba
T: +27(0)11 315 9197
E: manoj@i2ifacility.org