WHY
Why data-driven
Brynjolfsson et al. (2011) on data-driven decision making
Survey data on the business practices and IT investments of 179 large,
publicly traded companies
Firms that emphasise data-driven decision making
have output and productivity that is 5-6% higher than what would be
expected given their other investments and IT usage.
The relationship also appears in asset utilisation, return on equity and market
value.
BASICS
Some dimensions
1. Business case
2. Analytical task
   1. Active - Passive system
   2. Informative - Operative aim
3. Modelling (model selection and fitting)
4. Data: structure, amount, velocity, and source
A quote from my colleague Janne Sinkkonen, from a presentation at the Helsinki University machine
learning course:
Data-speak hides the processes behind data.
What creates the data? What is done with the results?
The goal is not data analysis
Define your goal and setup without using the word data.
REAKTOR
2016
BUSINESS CASE
Information business
Sell audiences
Google, Facebook, media,
Sell information
credit rating, car register,
BUSINESS CASE
Operations
Create beneficial events
marketing: targeting, cross-sell, up-sell, conversion
find right product/service to sell or buy, find a good doctor, expert etc.
Avoid non-beneficial events
churn, people leaving, waste,
credit loss, fraud,
system failures,
Optimize
customer value,
work force, schedules,
prices, discounts, stocks,
relevancy for customer,
production quality, speed
Rationalise
process efficiency, lead times, handle complexity, search time
Understand: customer & product base, transactions, or processes
internally: ERP, CRM, HR, sales systems, production,
externally: location, routes, weather, demographics, estates,
BUSINESS CASE
Strategic
Efficiency and competition
React faster, streamlined decision making, risk awareness
Financial efficiency
Innovations
Well-informed strategic decisions
Understanding customer groups' needs for product and service
development
Understanding and predicting world events, economics, demographics,
React to market fluctuation or changes in financial environment
Internal and external image and culture
Transparency, learning as a part of company culture
Customer satisfaction, personalisation, brand
VIRTUES
Example
Netflix
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific
context, in real-time.... Our business objective is to maximize member satisfaction and month-to-month
subscription retention, which correlates well with maximizing consumption of video content."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
BASICS
Informative - Operative
Informative (for understanding)
Analysis results that help people understand a situation and support management decisions:
reports, predictions, what-if analyses, simulations, visualisations,
Operative
An automated system that makes decisions based on rules or models, or
results that are acted on directly even when the decision is not automated.
BASICS
Active - Passive
Active
You make an intervention and gather evidence in tests designed to reveal an effect.
Example: A/B testing.
Passive
Data is just collected, captured as it happens: customer transactions, sales, web-browsing,
tweets
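
The A/B testing example on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_proportion_z_test` and all the counts are made up for the example, and a pooled two-proportion z-test is just one common way to read an A/B test, not necessarily the method used in the lecture.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate.

    Hypothetical helper for illustration; uses the pooled normal approximation,
    which is reasonable only when both groups are fairly large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 5000 visitors per variant, 200 vs. 250 conversions
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The point of the active setup is that the intervention (variant B) is assigned by design, so a small p-value speaks about the effect of the change itself rather than about whatever else happened to differ between the groups.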
BASICS
Use cases
[Figure: use cases placed on two axes, Passive - Active and Informative - Operative]
Descriptive: What has happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: What should I do?
Examples: customer profiles, customer segmentation, up-sell/cross-sell, churn prediction, recommendation system in a dynamic environment, demography prediction
RISKS / PROBLEMS
[Figure: typical risks for each type of analytics; axis labels shown: Active, Operative]
Descriptive
isolated / ad hoc reports
isolated ad hoc decisions
feedback loop (report - decision - effect)
ignoring statistics
analysts as sql-monkeys
UI / visualization
Predictive
Diagnostic
statistical skills
testing and organisation
correlation vs. causality
requires lots of communication
Prescriptive
what to optimize?
complex software system
technical feedback loop
co-op between human and artificial intelligence
monitoring
RISKS / PROBLEMS
Examples
Focusing on wrong things
not recognising the analytics use cases
data first: long time from investment to benefits
not starting from the substance: actions and decisions
thinking only in terms of IT solutions and products
examining and validating the algorithms carefully, but not setting targets
and risk levels according to the business goal
Organisation
silos: communication through hierarchy
no access to data, internal politics
technical details decided by business people
business criteria set by technical people
RISKS / PROBLEMS
More examples
Underestimating complexity (time & scope)
both software and analytics have to be built simultaneously
the time and effort needed with data wrangling
the time used for UIs and visualisations
the feedback loop
Unrealistic expectations (quality)
on analytical systems in general (they are not that intelligent); rules are needed
expecting that a product, a model, an algorithm, or a data scientist solves all the problems
risks and targets cannot always be defined properly right away
there is no guarantee of accuracy in a particular case before trying
VIRTUES
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific
context, in real-time.... Our business objective is to maximize member satisfaction and month-to-month
subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize
our algorithms to give the highest scores to titles that a member is most likely to play and enjoy. ... Netflix Prize
objective... is just one of the many components of an effective recommendation system... We also need to take
into account factors such as context, title popularity... Supporting all the different contexts in which we want to make
recommendations requires a range of algorithms that are tuned to the needs of those contexts."
- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
VIRTUES
Curiosity
Always aim at something specific but be open-minded and curious
Example: Röntgen and Fleming (Nobel laureates)
their most famous findings were accidental, but
they were skilled scientists doing disciplined research for some other aim
Explore occasionally from data to insights. But not aimlessly.
If you find something interesting, make a disciplined analysis, preferably a test.
VIRTUES
Understanding probabilities
The main ingredients of data science!
Making decisions based on data analysis requires the concepts of risk and
probability.
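
A small worked example of deciding under risk, as a sketch only. The scenario (contacting a customer who may churn), the function names, and every number are hypothetical and chosen purely for illustration; the idea is that a predicted probability becomes a decision only after it is combined with the costs of each outcome.

```python
def expected_costs(p_churn, contact_cost, churn_loss, save_rate):
    """Expected cost per customer of doing nothing vs. making a retention contact.

    All parameters are illustrative assumptions:
    p_churn      -- model's estimated probability that the customer leaves
    contact_cost -- cost of one retention contact
    churn_loss   -- value lost if the customer leaves
    save_rate    -- assumed share of would-be churners the contact retains
    """
    do_nothing = p_churn * churn_loss
    contact = contact_cost + p_churn * (1 - save_rate) * churn_loss
    return do_nothing, contact


def should_contact(p_churn, contact_cost=5.0, churn_loss=200.0, save_rate=0.3):
    """Contact only when it lowers the expected cost."""
    do_nothing, contact = expected_costs(p_churn, contact_cost, churn_loss, save_rate)
    return contact < do_nothing


for p in (0.05, 0.10, 0.30):
    nothing, contact = expected_costs(p, 5.0, 200.0, 0.3)
    print(f"p={p:.2f}: do nothing {nothing:.1f}, contact {contact:.1f}, "
          f"contact? {should_contact(p)}")
```

With these made-up figures the contact pays off only when the churn probability times the save rate times the loss exceeds the contact cost, so the same model output can justify different actions depending on the business parameters.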
Courage
Data driven means that progress in an activity is
compelled by data rather than by intuition or personal
experience. It is often labeled as the business jargon for
what scientists call evidence-based decision making.
"I take risks, sometimes patients die. But not taking risks
causes more patients to die, so I guess my biggest
problem is I've been cursed with the ability to do the
math."
[TV series House]
Agile - Transparent
Doing data-driven work and data science in any organisation model boils
down to
Pick an aim!
[Figure, repeated in several builds: aims 1 to 5 set against the levels Business drivers, Action, Information, and Data, with "start from here!" marked at aim 1]
Steps between the levels: testing (what is the impact), modelling (what are the actions, what are the insights), wrangling (what data means)
Activities: optimize, decide, deploy, report, visualize, model
Data: big, small, open, local, web, meta,
Examples named in the figure: automatised decisions (recommendation, targeting); simulation; prescriptive and predictive modelling; descriptive, diagnostic, and predictive modelling; documentation on the meaning of the data; source integrations
Backlog example: correct documentation
Don't silo
Business specialist
Data Steward
Developer
Visualization / UX expert
Technology
Prefer systems
from which you'll get the data, transformations, and results out to
another system (avoid being a data hostage)
where you see what the analytics actually does, at least on a modular
level (avoid being a method hostage)
Prefer being able to see the actual implementation (open source).
Pick a product when you know the task, your needs, and the product
quality.
Lecture @AaltoBIZ, Johan Himberg, 2015
References
Brynjolfsson, Erik, Hitt, Lorin M., and Kim, Heekyung Hellen. Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486
http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html
www.reaktor.com