Vous êtes sur la page 1sur 24

Agile Data Warehousing

with Production Sandboxing


Stephen Brobst
Chief Technology

Oliver Ratzesberger
Officer

Teradata Corporation

Senior Director
eBay

TERADATA
Raising Intelligence
Adastra Information

Management

Conference

2009

The Carlu. Toronto. Canada


April 23. 2009

Agile Data Warehousing


with Production Sandboxing
Stephen Brobst
Chief Technology Officer
Teradata Corporation
stephen. brobst@teradata.com

Oliver.Ratzesberger
Senior Director
eBay
oratzesberger@ebay

.com

TERADATA
Raising Intelligence

Agenda

The Business Need


Implementation
Governance

Options

eBay Case Study


Discussion

fF..RADATA
.....

The Business Need

_ eb_-'_

The
Busi ness

Need

-.

TfAADATA
.....

Sustaining Success in Data Warehousing

The biggest enemy to sustained success in data


warehousing is stagnation.
Continued delivery of value from a data warehouse
demands that organizations aggressively encourage
new and creative methods in their use of
information and analytics.
If an organization does not evolve and improve its
data warehouse, value will diminish over time.

"'-

TERADATA
...

Value of a New Analytic Capability

Initial
Deployment

Repeated
Use

Value

Time

--.

TERADATA

Value of a New Analytic Capability


When a report or analytic capability is first introduced, it
offers the potential for new insight and ways of exploiting
information.
Value initially increases as adoption of the new capability takes place .
As time goes on those once groundbreaking

insights become old news.

The value of the report or analytic capability begins to decrease over


time after the point where the organization has incorporated the insights
into its standard operating procedures.

Over time, the data warehouse will be burdened with


generating lots and lots of reports that eventually deliver
value that is less than the cost of maintaining them.

--

TFRADATA

The Requirement for Innovation

Organizations must engage in two actions to


ensure sustained success of a data warehouse:
1. Constantly innovate and develop new capabilities.
2. Prune older capabilities where the value no longer
justifies the cost.

It is not enough to encourage and sponsor


innovation ...organizations must provide an
appropriate platform and environment to
facilitate innovation in the area of information
exploitation.

--

TE RADATA

Use Case

eb._

Marketing has an outside source of data


that it wants to bring into the data
warehouse, but they are not yet sure if it
has high value or not ... so we need to
experiment with the data before signing
the purchase contract.

---

TERADATA

Rigorous Solution Methodology

C(

Strategy

Research

Opportunity

Assessment

Enterprise

Assessment

==

--

~rojecf
Manc(gemerfi
Design
Equip
Build

Ana yze

Application
Requirement
Logical
Model

--.a... ~,
-~--

( 0

Intearate

Manage

System
Architecture

II

Package
Adaptation

Data
Mapping
nfrastructur,
Education

&

User
Curriculum

--

TERADATA
10

Rigorous Solutions Methodology

eb'

Takes too long!

--

TERADATA
11

Agile Data Warehousing

_____ eb~

The Manifesto for Agile Software Development puts forth


the following principles:

Value
Value
Value
Value

individuals and interactions over processes and tools,


working software over comprehensive documentation,
customer collaboration over contract negotiation, and
response to change over following a plan.

While it is recognized that there is value in the items on


the right, the items on the left are valued even more in
the agile development methodology.

--

TEAADATA
12

Agile Data Warehousing

The underlying philosophy that drives these values


in the context of data warehousing is to put the
highest priority on satisfying end user
(knowledge worker) requirements through early
and continuous delivery of analytic capability.
This does not mean that requirements documents,
design documents, entity-relationship diagrams,
data dictionaries, etc. are not important - but it
does mean that the emphasis is much more on
delivery than process.

----

TERADATA
13

Agile Data Warehousing

Goal: Same day


availability of data into
the analytic environment.

--

TfRADATA

Agile Data Warehousing

Need a way to get data into the data warehouse


without the overhead of a full blown development
methodology:
Allow for "load and go" analytics.
Non-certified content to be used in cooperation with
content in the enterprise data warehouse (EDW).
Limited users and limited use.

---

TERADATA
15

Implementation

Options

Implementation
Options

---

TEiYtDATA
16

Option 1:
Separate Developmel!.t S_ystem

_ eb_

Deploy the "experimental" data in the


development/test environment.
Configure the development/test environment
to the full size of the production environment.
Demonstrate the value of the data and then
(assuming positive ROI) use best practices to
bring data into production OW.

--

TERADATA
17

Option 1:
Separate Development

System

Objection: Costs too much to


have development/test
environment at full size and kept
fully up-to-date with production
environment.

--

TfRADATA
11

Option 2:

Downsized Development
------------System

Deploy the "experimental" data in the


development/test environment.
Configure the development/test environment
with "sampling" from the production
environment.
Demonstrate the value of the data and then
(assuming positive ROI) use best practices to
bring data into production OW.

-"-

TERADATA

"

Option 2:
Downsized Development System

Objection: More difficult to


implement prototype with
sampled data sets and more
difficult to make ROI case.

--

TERADATA
2.

10

Option 3:

------

---

--

---

- Federated Development System


-

Deploy the "experimental" data in the


development/test environment.
Join across the development and production
environments to perform the analysis.
Demonstrate the value of the data and then
(assuming positive ROI) use best practices to
bring data into production OW.

--

TERADATA
21

Option 3:
Federated Development System

Objection: Performance sucks


when joining large data sets
across a network (even when both
systems are Teradata!).

--

TERADATA
22

11

Option 4:
Production Sandbox

Deploy the "experimental" data directly into the


production environment.
Separate Teradata "databases" for sandbox data
with joins allowed to the production data all on a
single system.
Demonstrate the value of the data and then
(assuming positive ROI) use best practices to
bring data into production DW.

----

TERAOAIA

"

Option 4:
Production Sandbox

Objection: End users manage


their own space and have create
table privileges on the production
system.

"

--

TERAOA1A

12

Option 4:
Production Sandbox

Objection: End users manage


their own space and have create
table privileges on the production
system.

TERADATA

25

1It1

Governance

Governance
-"-

TERADATA
2.

13

Controls
No "production" reporting
from the sandbox
environment.
Automated resource governors
to prevent "runaway" queries
in the sandbox area.
Data residency no more tha n
XX days.

--

TERADATA
27

eb"' r

Promoting Content into the EDW


Monitoring and "promoting" data content in the
sandbox.
Once the value is proven, use "proper"
methodologies to integrate content into the
enterprise data warehouse.

Encourage refinement of requirements as part of


the sandbox experience.

"

--

TfRADATA

14

Case Study

The eBay
Experience

--

TERADATA
2.

eBay Analytics Technology Highlights

>50 TB/day

of new, incremental data

>1 OOk

>50A10

new records/day

>50k

Active/Active

>5000

data elements

chains of logic

business users & analysts

turning over a

TB

24X7X365

every

5 seconds

Millions

Always online

99.9+%

of queries/day

Availability
Near-Real-time

30

--

TERADATA

15

eBay Analytics Core

Unica

MicroStrategy

2.SPB

L.o~llnt.rconn.ct

T.d.ta

ationalDlta

SAS

Crystal

SOL

r:

2.2PB

LinuxMPP
Secondary
Primary
Relatlona' Data

LoQIIl'lterconllt

Wid.""".
!nterconne<:r:

1000 mllu
Sun Fir. <txxx

Solarl5

Sot.ns

2.2PB
XML, nam_tvalue,

6.6PB
MPP/HPC/Grid

raw

Phoenix, A2

Dlta JnteOr8tion

Informatica

--

TfRAOATA

"

---

Design for the Unknown


>85%

of ebay analytical workload is

Unknown

Exploration

II

~; The metrics

I
j

&

is the core of an analytical company

ou know are 'chea

The metrics you

NEW

'

don't know are expensive but also high

in potential ROI
or dimensions

Desig~

can.'t be static

L.r~

or dependent on specific questions

------

II
~~1
~

--

l'

16

"We Need Data Marts!"

..,

WlltKh'Ui

Respthleols

0.1.11 iIIII!'Mu$e

~M-nC

Mfh

tiN mh

(i,t

"~ub

dat. miffs ~nt c.ttI\ls.ltnt de\l&tt1

~f'1

90
10

JI

CtRtrll~a'll"""-'JSttnJy

Cordonaedd~l. INrts (tonr.lUeIfIl

and

~ftl

y.tu(tIr1aI warttlouw(l.t
IMIfl,cL1ladJINmallylrom
SQIfLt ~~-\It".I1\whffi 1W'f'dM)

!J

(1'*It.M\y,f'ftd1

17

Data Mart Dilemma

..

_"'*~

.' ,,~

'.,

"

;~ ""':<i~~"~.:_'
~.

. ',,(,.;..:::
,

,;;,:;iii?;;

:"

'

:, -

;.-",.t-~~~

"'J!

;;'~"

'
~.~

.'-

,Iio-.. ,

i~"

~-?~

Total Cost of Ownership (TCO)


Fully loaded cost staggering $500k++
Biggest drivers are
Maintaining separate databases
weekly/daily/hourly
data transfers
Data inconsistencies
Data

redundancy

Increased complexity
Loss of lineage over time

Analytics as a Service

Massive scale

Analytical ,=,tili~yComputing

Bring your data - Perform your Analytics

From Simple Web~~~sgd ~afa uploj3d

jJ,

Combine ~~~a:dd-

dlCI

~v~36

,..

"

to fully ALL
p'rivate
Utilitydata
access
c:ode~Wltti
existing
~
;

IC:I.
I

ce.

--

TERADATA

18

dJ_

Analytics as a Service
o

-I

ma'-!

~.

........ ,-..

---

s...,.t

c-... ..

cc:::x
Dt __

-..-...,-

..
.-11II.-'" ,

III::!C:II!

::~::
.-....

CCII!
w~ __

'."':.I&.:.A.-

"'"

---s..-._.,_"

-'1. ,_

"",c::::r:
.' ---..

-~ .1
" __

...

---

-.

,. __

--

.
a._ ..-..,

L_"

,_ ..

'00-

,-,."....

----- ..--"
_n_

--

--'--

a:::I
... ....
---.....",

.. ,

t-.

....

...

37

Analytics as a Service
From a simple web based table upload:

.......
-----

'''''-''',r

-n._
-- .........
"',.........."'f~ ~

ctMfe~LW_~top~_\JI
...,:Cll'lUJ'OU
..MId.~

rtom you< C"iT!

38

.,....-"'dD.'..
h ....._
you"'to.
elf''''''
"-*"-.P,.,.,
. n..--. SCl..hf"'~"'"
n.slllhellec:or'teMg
. lobt~
ITtIot""'o::MAT[TAfU
CM:An:
1MU:

I o.a.Tt
-I & I
t"'1I'-v~

__
~r"""tdIon
o:;oMrM(J.-.d_~_-..db'f'''\1".N.U
b::IIIOIfIOI'''tonn.oru'\''DUIfI!II~
OIeOe,fotoIIIfetI1ItIrlWy
12~)OO7-Oi
~;hJNIbc*~e
~$,20Q8.t2.ot
.987$112.'
:WS81WJ,2IXJB.t2-01
~"
.,.
~W\twllU)"",,,_,
toAIon
~..,.._t-~
Bv*
~ ~__

p.on:;"~,,prv0U;40-"''*''''

twll4Jk:>8dp'w
I~")
~___
OII:h
8kId
'kI~.,..,........
~_
I..MU:PfIIM.t.IItYt()f)I(
~2:ltwtoloowlJ_noI~~.
)w~J_)f~_'_JU'QIt<I
1o
CSV~(~
__
...DI_by
nm

-,

SOL

--

TfRADATA

19

...to fully private utility access


We call them PET

(Prototyping Environment in Teradata)

More than

In most cases they are

small

75 active

right now

(100GB-5TB)

since all the main data is already in the


They are

free

EDW

to the business units

3.

Analytics as a Service: Benefits


Improved Time To Market Enable the business to do

Enable the users to

DaYS/Weeks versus Months.

agile prototyping.

"F al-I Fast" - Make It. easy

to try out new ideas.


Eliminate stray Data Marts.

II

20

Governance Rule 1

----~-

Keep the production data clean.


The data life cycle methodology is there for a reason.
e Do not "pollute" production data with data of unknown
source and validation.
Equivalent to a viral injection ...and you may not recover.

Do not inject prototype data into "core"

DW

data:

Data ingest (ETLjELT) does NOT have access to sandbox .


Not even to populate the sandbox.

Strictly and conceptually enforced on both Batch and User


accounts.

-"-

TERADATA
"

Governance Rule 2

g-

Prototypes written by experienced personnel:


PETs assigned to NAMED personnel.
Previous Experience and Training Required.

Prototype personnel are typically former DW


developers who transitioned into a business unit.
Speed of implementation.
Knowledge of DW processes and methodologies.
Knowledge of data.

.,

TERADATA

21

Governance Rule 3

Sunset dates must be applied:


Hold a post mortem.
Retire it or promote it.

The prototype must not become a "black market"


production application.
Business cannot depend on them.
DW cannot give them appropriate support.

--

TERADATA

Key Process 1

Pre-defined methods, templates, and rules for setup


and tea rdown :

..

Well-defined rules for usage.


Defined, named owners.
Pre-defined security templates.
Pre-defined Help Desk responses to add/drop

users.

--

TERAOATA

22

Key Process 2
Help Desk support is critical.
Direct access for PETpersonnel to the most senior
architectural and technical personnel.
"Bidirectional" mentoring:
Best and brightest technical resources get closer to the business ...
Business gets closer to fast and effective implementations.
It does not take long for PET personnel to become self-sufficient.

----

TERADAfA
45

Key Learning

--~~~

In-place processes enable "time-to-market"

benefits.

Put the processes and security in place first.

Failure = Learning
Do so with great effectiveness ...
Fail fast, fail early.

Most business units now maintain a permanent


sandbox.
Complex analysis and decision making within a business day!

--

Tf RAD,.\TA
4.

23

Questions?

--

TERADA1A
'7

References

[ 1] Higgins, D. Don't Just Tread Water. Teradata


Magazine. Volume 8, Number 1. 2008. pp. 23-25.
[ 2] Brobst,S., M. McIntire, and E. Rado. Agile Data
Warehousing with Integrated Sandboxing. The BI
Journal. First Quarter, 2008.
[ 3] Beck, K., M. Beedle, A. van Bennekum, A. Cockburn,
W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A.
Hunt, R. Jeffries, J. Kern, B. Marick, R. Martin, S.
Mellor, K. Schwaber, J. Sutherland, and D. Thomas. The
Manifesto for Agile Software Development. February,
2001.

--

rFR,\Df\Tt\

24

Vous aimerez peut-être aussi