Agenda
Blue Chips Overview
Evolution of etL
The new hybrid
My $1,000,000 mistake
Background on our data warehouse
SMP -vs- MPP
Embracing eLt
Gulp
New Hybrid implemented
Gulp Again
SPIL methodology invented
Lessons Learned
[Figure: evolution of etL. Layers shown across successive stages: Hand Code; then Engine and Hand Code; then Code Generators, Engine and Hand Code; ending with the New Hybrid.]
Supporting Infrastructure
2 FTP Servers
4 ETL Servers
2 Data Acquisition Servers
4 App Servers
2 Domain Controllers
1 Gateway
1 File Server (.5TB raw storage)
1 Staging Server (.75TB raw storage)
[Chart: Performance and Throughput plotted against number of CPUs (axis ticks 0 to 31). Labels: Dedicated I/O, Dedicated Memory.]
MPP Example
Because of its shared-nothing architecture, a query takes only as long as a single blade needs to go through its own data.
When a query is executed, it is replicated and run across all blades.
If 1 billion records were distributed across 100 blades, each blade would only have to query 10 million rows. A billion-row query would respond as if it were querying only 10 million records.
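The arithmetic and the replicate-then-combine pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `distribute`, `scan_blade` and `parallel_query` are hypothetical, threads stand in for blades, and rows are spread evenly by key. It is not how Teradata or any particular MPP engine is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Figures from the slide: 1 billion rows spread over 100 blades.
TOTAL_ROWS = 1_000_000_000
NUM_BLADES = 100
rows_per_blade = TOTAL_ROWS // NUM_BLADES  # each blade scans only its own share


def distribute(rows, num_blades):
    """Assign each (key, value) row to exactly one blade (hypothetical helper)."""
    blades = [[] for _ in range(num_blades)]
    for key, value in rows:
        blades[key % num_blades].append((key, value))
    return blades


def scan_blade(partition, predicate):
    """The same query runs on every blade, but only over that blade's rows."""
    return sum(value for _key, value in partition if predicate(value))


def parallel_query(blades, predicate):
    """Replicate the query to all blades, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(blades)) as pool:
        partials = list(pool.map(lambda part: scan_blade(part, predicate), blades))
    return sum(partials)


# Small stand-in data set: 1,000 rows over 10 blades (assumed for illustration).
sample_rows = [(i, i % 7) for i in range(1000)]
blades = distribute(sample_rows, 10)
total = parallel_query(blades, lambda v: v > 3)
```

Because no blade shares data with another, the elapsed time is bounded by the slowest single partition scan plus the cost of combining the partial results, which is why an even distribution of rows matters.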
Embracing eLt
[Diagram labels: Source, Data Acquisition, DB, Direct Access, SAS, Reports, Hosted Datamart.]
Gulp
The eLt strategy was a complete success.
Data transformation performance was incredibly fast.
We were meeting operational windows that our competitors could not.
As a result, we received a new data feed, which represented an additional 36 months of data.
Then the gulp! Reality hit when we realized that we did not have enough space within our database to perform the transformations on the new data feed.
We were deeply committed to the eLt strategy and had very little recourse but to spend another $1,000,000 to expand our environment.
My $1,000,000 Mistake
SPIL Methodology
SPIL (Stage Pre-Integrate Load)
[Diagram: Data Sources -> Data Staging (validated data) -> Pre-Integration (validated data) -> Load via MLOAD / UPSERT (production data) -> Teradata.]
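The three SPIL stages named above can be pictured as a small pipeline. The sketch below is an assumption-heavy illustration, not the presenter's implementation: the function names, the row fields `id` and `amount`, and the validation rule are all hypothetical, and a Python dictionary keyed by `id` stands in for the Teradata target table that MLOAD would upsert into. It assumes that pre-integration work happens before the data reaches the database, so that the load step only inserts or updates rows.

```python
def stage(raw_rows):
    """Stage: keep only rows that pass a basic validation check (assumed rule)."""
    return [
        row for row in raw_rows
        if row.get("id") is not None and row.get("amount") is not None
    ]


def pre_integrate(valid_rows):
    """Pre-integrate: shape rows into the target layout before they reach the database."""
    return [
        {"id": int(row["id"]), "amount": round(float(row["amount"]), 2)}
        for row in valid_rows
    ]


def load(target, rows):
    """Load: upsert by key, updating existing rows and inserting new ones."""
    inserted = updated = 0
    for row in rows:
        if row["id"] in target:
            updated += 1
        else:
            inserted += 1
        target[row["id"]] = row
    return inserted, updated


# Hypothetical production table with one existing row.
production = {1: {"id": 1, "amount": 10.0}}

# Hypothetical feed: one update, one new row, one row that fails validation.
raw_feed = [
    {"id": "1", "amount": "12.50"},
    {"id": "2", "amount": "7"},
    {"id": None, "amount": "3"},
]

counts = load(production, pre_integrate(stage(raw_feed)))
```

In this sketch the database side only ever sees finished rows, so the working space needed for transformation sits on the staging side rather than inside the warehouse.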