
Data-Driven Culture

DATA-DRIVEN and DATA-SCIENCE


Johan Himberg / Reaktor 29.2.2016

WHY

Why data-driven
Brynjolfsson et al. (2011) on data-driven decision making:
Survey data on the business practices and IT investments of 179 large, publicly traded companies.
Firms that emphasise data-driven decision making have output and productivity 5-6% higher than would be expected given their other investments and IT usage.
The relationship also appears in asset utilisation, return on equity and market value.

WHY

Data Science in business
Business acumen: what for
Operations Research: optimal decisions and actions
Probability theory: how to handle uncertainties
Analytics: insights and machine learning from data
Computer Science: how to implement all that

Data Science & analytics


BASICS

BASICS

Some dimensions
1. Business case
2. Analytical task
   1. Active - Passive system
   2. Informative - Operative aim
3. Modelling (model selection and fitting)
4. Data: structure, amount, velocity, and source

REAKTOR / JOHAN HIMBERG


FEBRUARY 2016



BUSINESS CASES


Beware of empty data-speak

A quote from my colleague Janne Sinkkonen, from a presentation at the University of Helsinki machine learning course:
"Data-speak hides the processes behind data.
What creates the data? What is done with the results?
The goal is not data analysis.
Define your goal and set-up without using the word data."

REAKTOR
2016

BUSINESS CASE

Information business
Sell audiences: Google, Facebook, media
Sell information: credit rating, car register

BUSINESS CASE

Operations
Create beneficial events
marketing: targeting, cross-sell, up-sell, conversion
finding the right product/service to sell or buy, finding a good doctor, expert, etc.
Avoid non-beneficial events
churn, people leaving, waste, credit loss, fraud, system failures
Optimize
customer value, workforce, schedules, prices, discounts, stocks, relevance for the customer, production quality, speed
Rationalise
process efficiency, lead times, handling complexity, search time
Understand: customer & product base, transactions, or processes
internally: ERP, CRM, HR, sales systems, production
externally: location, routes, weather, demographics, estates

BUSINESS CASE

Strategic
Efficiency and competition
React faster, streamlined decision making, risk awareness
Financial efficiency
Innovations
Well-informed strategic decisions
Understanding customer groups' needs for product and service development
Understanding and predicting world events, economics, demographics, …
Reacting to market fluctuations or changes in the financial environment
Internal and external image and culture
Transparency, learning as a part of company culture
Customer satisfaction, personalisation, brand

VIRTUES

Example
Netflix
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering



TASKS & RISKS


BASICS

Informative - Operative
Informative (for understanding)
Analysis results for understanding things and for management to make decisions: reports, predictions, what-if analyses, simulations, visualisations.
Operative
An automated system that makes decisions based on rules or models, or results that are directly operative even if not automated.
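To make the informative/operative distinction concrete, here is a minimal sketch of an operative decision: a rule that acts on a model's output automatically instead of handing a report to a manager. The churn probabilities, customer values, offer cost and "save rate" are all hypothetical inputs, and the expected-value rule is one common choice, not the only one.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # hypothetical output of a predictive model
    yearly_value: float       # hypothetical customer value in euros

def should_send_retention_offer(c: Customer, offer_cost: float, save_rate: float) -> bool:
    """Operative rule: act when the expected value saved exceeds the offer cost.

    save_rate is an assumed probability that the offer keeps a churning customer.
    """
    expected_saved_value = c.churn_probability * save_rate * c.yearly_value
    return expected_saved_value > offer_cost

customers = [
    Customer("A", churn_probability=0.40, yearly_value=300.0),
    Customer("B", churn_probability=0.05, yearly_value=300.0),
    Customer("C", churn_probability=0.40, yearly_value=20.0),
]

# An informative use would report these numbers; an operative use acts on them directly.
for c in customers:
    print(c.customer_id, should_send_retention_offer(c, offer_cost=10.0, save_rate=0.25))
```

Used informatively, the same numbers would end up in a report; used operatively, the rule triggers the offer without a human in the loop, which is why its targets and risks must be set according to the business goal.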


BASICS

Active - Passive
Active
You make an intervention and gather evidence in tests designed to reveal an effect.
Example: A/B testing.
Passive
Data is just collected, captured as it happens: customer transactions, sales, web browsing, tweets.
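In an active setting such as an A/B test, visitors are randomly split between two variants, so a difference in conversion rate can be attributed to the variant rather than to who happened to show up. A minimal sketch of one standard way to evaluate such a test, the two-proportion z-test; the visitor and conversion counts are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for H0: both variants have the same conversion rate.

    Returns (z, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical test: 10,000 randomly assigned visitors per variant
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The statistics are the easy part; the randomised assignment is what makes the comparison causal. The same calculation on passively collected data would only show a correlation.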


BASICS

Use cases
Passive + Informative = Descriptive: What has happened?
Customer profiles, customer segmentation, shopping cart analysis
Active + Informative = Diagnostic: Why did it happen?
Marketing impact analysis, price elasticity analysis, web design testing
Passive + Operative = Predictive: What will happen?
Churn prediction, life-time value prediction, demography prediction
Active + Operative = Prescriptive: What should I do?
Marketing impact optimisation, recommendation system in a dynamic environment
Further operative examples: up-sell/cross-sell, new customer acquisition
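Shopping cart analysis is a typical descriptive ("what has happened?") task. A minimal sketch of its simplest form, counting how often product pairs are bought together; the carts are hypothetical, and full association-rule mining (e.g. Apriori) would go further than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts (one set of products per purchase)
carts = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

pair_counts = Counter()
for cart in carts:
    # sorted() gives each pair a canonical order, e.g. ("bread", "butter")
    for pair in combinations(sorted(cart), 2):
        pair_counts[pair] += 1

# Support = share of carts that contain the pair
for pair, count in pair_counts.most_common(3):
    print(pair, "support:", count / len(carts))
```

The result describes the past; turning it into an operative up-sell/cross-sell rule is a separate step that should be tested.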



RISKS & PROBLEMS

RISKS / PROBLEMS

Issues by analytics use case
Descriptive (passive, informative)
isolated / ad hoc reports
isolated ad hoc decisions
feedback loop (report - decision - effect)
ignoring statistics
analysts as SQL monkeys
UI / visualization
Predictive (passive, operative)
what to predict: how to quantify the target
access to historical data
quantifying and understanding the risk(s)
validating prediction accuracy for the future
Diagnostic (active, informative)
statistical skills
testing and organisation
correlation vs. causality
requires lots of communication
Prescriptive (active, operative)
what to optimize?
complex software system
technical feedback loop
co-operation between human and artificial intelligence
monitoring
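One way to validate prediction accuracy for the future is to hold out data that lies strictly after the training period, mirroring how the model will be used. A minimal sketch with hypothetical monthly sales and a deliberately naive baseline "model" (the training-period average); the cutoff month is an assumption.

```python
from statistics import mean

def time_based_split(records, cutoff):
    """Split (time, value) records so that validation data lies strictly after training data."""
    train = [r for r in records if r[0] < cutoff]
    valid = [r for r in records if r[0] >= cutoff]
    return train, valid

def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical monthly sales (month index, units sold) with an upward trend
sales = [(m, 100 + 5 * m) for m in range(24)]

train, valid = time_based_split(sales, cutoff=18)

# Naive baseline: predict the training-period average for every future month
baseline = mean(v for _, v in train)
mae = mean_absolute_error([v for _, v in valid], [baseline] * len(valid))
print(f"baseline = {baseline}, MAE on later months = {mae}")
```

A random split would mix later months into training and make the baseline look better than it will perform once deployed.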

RISKS / PROBLEMS

Examples
Focusing on the wrong things
not recognising the analytics use cases
data first: long time from investment to benefits
not starting from the beef: actions and decisions
thinking only in terms of IT solutions and products
careful examination and validation of the algorithms, but not setting targets and risks according to the business target
Organisation
silos: communication through hierarchy
no access to data, internal politics
technical details decided by business people
business criteria set by technical people

RISKS / PROBLEMS

More examples
Underestimating complexity (time & scope)
both software and analytics to be built simultaneously
the time and effort needed for data wrangling
the time used for UIs and visualisations
the feedback loop
Unrealistic expectations (quality)
on analytical systems in general (they are not that intelligent); rules are needed
a product, a model, an algorithm or a data scientist solves all the problems
risks and targets cannot always be defined properly right away
there is no guarantee of accuracy in a particular case before trying

Culture that helps to handle risk


WISE - DETERMINED - CURIOUS

VIRTUES

Culture that helps to handle risk


Wise: solve the right problems with analytics!
Determined: aim at specific, concrete things
Curious: be ready to divert, seek evidence
Bayesian: understand uncertainties and risks
Truthful: don't bend results to wishes; it's data science
Courageous: act on evidence
Active and Agile: test, don't just observe; inspect - adapt - learn
Transparent and Helpful: co-operate from end to end, don't silo


VIRTUES

Aim at the right things

Netflix Prize competition (2006-2009)
Who gets the best (lowest) RMSE (root mean squared error) in predicting users' true ratings?
BUT

"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy. ... Netflix Prize objective ... is just one of the many components of an effective recommendation system ... We also need to take into account factors such as context, title popularity ... Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
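The gap between a competition metric and the business objective can be shown with a toy example: a model with lower RMSE can still put the wrong title first. The titles, ratings and the two "models" below are invented for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def top_k(titles, scores, k):
    """Return the k titles with the highest scores."""
    ranked = sorted(zip(titles, scores), key=lambda ts: ts[1], reverse=True)
    return [t for t, _ in ranked[:k]]

titles = ["t1", "t2", "t3", "t4"]
true_ratings = [5.0, 4.0, 3.0, 2.0]  # hypothetical true ratings for one member

# Model A: small errors overall, but swaps the order of the top two titles
model_a = [4.2, 4.5, 3.0, 2.0]
# Model B: larger errors, but preserves the ordering
model_b = [4.0, 3.0, 2.0, 1.0]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, "RMSE:", round(rmse(true_ratings, scores), 3), "top pick:", top_k(titles, scores, 1))
```

Model A wins on RMSE, yet model B shows the member the title they would enjoy most; for a ranking task, a ranking-aware metric is closer to the business objective.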


VIRTUES

Curiosity
Always aim at something specific, but be open-minded and curious.
Example: Röntgen and Fleming (Nobel laureates)
their most famous findings were accidental, but
they were skilled scientists doing disciplined research with some other aim
Explore occasionally from data to insights, but not aimlessly.
If you find something interesting, make a disciplined analysis, preferably a test.

Culture that helps to handle risk


BAYESIAN - TRUTHFUL

VIRTUES

Understanding probabilities
The main ingredients of data science!
Making decisions based on data analysis requires the concepts of risk and
probability.
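Expressing a decision in terms of probability, rather than a yes/no verdict, can be as simple as estimating the probability that one option is better than another. A minimal Bayesian sketch, assuming uniform Beta(1, 1) priors on two conversion rates and hypothetical conversion counts; the Monte Carlo draw count and seed are arbitrary choices.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    With a uniform Beta(1, 1) prior and binomial data, the posterior of a
    conversion rate is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

print(prob_b_beats_a(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000))
```

A statement such as "B is better with probability of about 99%" can be weighed directly against the cost of being wrong, which is the kind of risk reasoning the slide asks for.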


Culture that helps to handle risk


COURAGE

Courage
"Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call … making."
"I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math." [TV series House]

Culture that helps to handle risk


HELPFUL - TRANSPARENT - AGILE

Agile - Transparent
Doing data-driven work and data science in any organisation model boils down to:
Involve everyone along the information path
Agile development: the team decides the details
Start from
concrete actions that can be optimized,
the decisions they require, and
how to measure the effects properly
Remember the feedback loop!
Develop constantly
Develop constantly
Lecture @AaltoBIZ, Johan Himberg, 2015

Pick an aim!
[Figure: business drivers (aims 1-5) on top of three layers along the information path]
Action: optimize, decide, deploy; testing: what is the impact?
For example: automated decisions (recommendation, targeting), simulation, prescriptive and predictive modelling
Information: report, visualize, model; modelling: what are the insights, what are the actions?
For example: KPIs, profiles, segments, factors, DW dashboards; descriptive, diagnostic and predictive modelling
Data: big, small, open; local, web, meta; wrangling: what does the data mean?
For example: documentation on the meaning of the data, source integrations, Extract - Load - Transform, metadata modelling for cleansing & consistency
Think & plan from deployment to data



Data-Driven is inherently iterative and benefits from agility.
Data and processes are often not what was assumed.
Be curious, keep a backlog, inspect, adapt.
[Figure: business drivers (aims 1-5) on top of the Action, Information and Data layers, with "start from here!" at aim 1]
For example:
Business: we need optimisation for customer retention.
Solution expert: field ZPOR means revenue per unit and it is calculated based on …
Now we have transactions for 1M users for 1 year, fields a, b, c, d, e.
Customer transactions are not in the Data Warehouse; they're aggregated on a monthly level. Let's get daily data from system Z.
Marketing: we could start with a special offer by SMS.
Data Scientist: we'll set up test & control groups!
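Setting up test and control groups can be as simple as a stable random split of the customer base. A minimal sketch, assuming customers have string IDs and a 10% control share (both hypothetical); the function name `assign_group` and the campaign name "sms-offer-1" are illustrative only.

```python
import hashlib

def assign_group(customer_id: str, campaign: str, control_share: float = 0.1) -> str:
    """Deterministically assign a customer to 'test' or 'control' for one campaign.

    Hashing (campaign, customer_id) gives a stable, pseudo-random bucket, so the
    same customer always lands in the same group, and groups differ per campaign.
    """
    digest = hashlib.sha256(f"{campaign}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "control" if bucket < control_share else "test"

ids = [f"cust{i}" for i in range(10_000)]
groups = [assign_group(cid, "sms-offer-1") for cid in ids]
print(groups.count("control") / len(groups))  # roughly 0.1
```

Deterministic hashing keeps the assignment reproducible and auditable; the control customers must then actually be excluded from the SMS send, which is why the handling belongs in the marketing automation.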


THE LOOP: results
[Figure: business drivers (aims 1-5) on top of the Action, Information and Data layers]
For example:
Action: deploy the campaign, collect responses
Information: calibrate & apply the model
Data: get data for modelling, store results
Execute based on the model, collect data.
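Closing the loop means measuring what the campaign actually changed. With a held-out control group, the incremental response (uplift) is the difference in response rates between the two groups. A minimal sketch using a normal-approximation 95% interval; the counts are hypothetical and the approximation assumes reasonably large groups.

```python
import math

def uplift_with_ci(resp_test, n_test, resp_ctrl, n_ctrl, z=1.96):
    """Difference in response rate (test - control) with a normal-approximation CI."""
    p_t = resp_test / n_test
    p_c = resp_ctrl / n_ctrl
    uplift = p_t - p_c
    # Unpooled standard error of the difference between two proportions
    se = math.sqrt(p_t * (1 - p_t) / n_test + p_c * (1 - p_c) / n_ctrl)
    return uplift, (uplift - z * se, uplift + z * se)

# Hypothetical campaign: 9,000 customers got the SMS offer, 1,000 were held out
uplift, (low, high) = uplift_with_ci(resp_test=540, n_test=9_000, resp_ctrl=40, n_ctrl=1_000)
print(f"uplift = {uplift:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Storing these results per campaign is what lets the next iteration calibrate the model on measured effects rather than on raw response rates.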

Information path focused backlog
[Figure: business drivers (aims 1-5) on top of the Action, Information and Data layers, linked by testing, modelling and wrangling]
Backlog examples:
test & control group handling in marketing automation
define a new information source
better system configuration & architecture
look for a new data source for determining income in zip code areas
automation of the campaign process
involve N.N. in the process
correct documentation
new data: record information on all campaigns
automation of the campaign modelling


Don't silo

A change of culture: information (not data) is everybody's business, just as money is.

One data scientist can't excel at all of this:
PO / Technical Account Manager
Business specialist
Solution owner / process owner
Data Steward
Developer
Visualization / UX expert

Data Scientist's special role
Data scientists' main tasks are in methods, but also in the processes and machinery of
making evidence-based decisions (automated if possible)
finding out the confidence in the outcome (by active tests if possible)
getting insights based on models and data
Data scientists often act as the glue.


Culture that helps to handle risk


TECHNOLOGY

Technology

Different analytical tasks need different tools. One has to integrate different systems. Remember that you need a feedback loop!

Prefer systems
that give mass access to historical, transactional data on the individual level instead of just aggregates (avoid being blinded by averages)
from which you'll get the data, transformations, and results out to another system (avoid being a data hostage)
where you see what the analytics actually does, at least on a modular level (avoid being a method hostage). Prefer being able to see the actual implementation (open source).
Pick a product when you know the task, your needs, and the product quality.
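The point about aggregates can be illustrated with two hypothetical customer groups whose averages are identical while the individual-level data tells very different stories. The purchase amounts are invented for illustration.

```python
from statistics import mean, median

# Hypothetical individual-level yearly purchases (euros) for two customer groups
group_a = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
group_b = [10, 10, 10, 10, 10, 10, 10, 10, 10, 910]

for name, values in [("A", group_a), ("B", group_b)]:
    top_share = max(values) / sum(values)  # share of revenue from the single largest customer
    print(name, "mean:", mean(values), "median:", median(values), "top customer share:", round(top_share, 2))
```

Both groups average 100 euros, but only the individual-level data shows that group B's revenue depends on one customer, which calls for very different actions than group A.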

References

Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486

Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917

Data science skills

http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html

www.reaktor.com
