70 Everest-PGCon RT

Published by: warwithin on May 30, 2008
Copyright: Attribution Non-commercial
05/09/2014
Everest

Scaling to Petabytes

Yahoo! May 2008

Everest Architecture
• Massively Parallel (Tens of PB)
– Commodity Clusters
– Multi-tier scalability
– Distributed Columnar Storage
• Smart
– Optimized compression
– Parallel Vector Query Processing
– Query and Storage optimizations
– Query Expression and Columnar caching
• Leverage PostgreSQL
– Tools and Connectivity (ODBC)
– extensibility
– UDF & UDAF framework
• Inexpensive
– COTS

[Architecture diagram. Labels recoverable from the figure, in extraction order:
Clients: Scripts / Apps, ODBC, PgAdmin, PERL/Ruby.DBI, ADO.NET, PQLib, PostgreSQL Lib
Segment Manager: PostgreSQL Server, Everest Extensions, Distributed QP, QP, Mgmt Proxy, Segmentation Platform, LSM Proxy, Query Server, Trans Proxy, Mgmt Services, Storage Provider, Logical Storage Volume Manager, Storage Proxy, Trans Server, Storage Cache, Shared Memory, Asynchronous Communications
Node Storage Manager / Storage Server: Asynchronous Communications, Storage Provider, Storage Proxy, Storage Cache, Volume Storage Manager, Chunk Storage, Volume Metadata, Storage Services, Volume, Volume]
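The slide names a "Distributed QP" layer above multiple storage nodes but does not show how a query is split. A minimal sketch of one common pattern (scatter a query to partitions, aggregate locally, merge the partial results) follows. The function names, the CRC-based hash partitioning, and the thread pool standing in for separate nodes are all assumptions made for illustration, not details taken from Everest.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_nodes):
    """Assign each (key, value) row to a hypothetical node by hashing its key."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        node_id = zlib.crc32(key.encode("utf-8")) % num_nodes
        nodes[node_id].append((key, value))
    return nodes


def local_sum(node_rows):
    """Partial aggregate computed on one node: SUM(value) GROUP BY key."""
    partial = defaultdict(int)
    for key, value in node_rows:
        partial[key] += value
    return dict(partial)


def distributed_sum(rows, num_nodes=4):
    """Scatter rows to nodes, aggregate in parallel, then merge the partials."""
    nodes = partition_rows(rows, num_nodes)
    # Threads stand in for separate storage nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            merged[key] += total
    return dict(merged)


rows = [("us", 3), ("uk", 5), ("us", 4), ("in", 1), ("uk", 2)]
totals = distributed_sum(rows)
```

The merge step works for any partitioning scheme, because sums of partial sums equal the full sum; aggregates such as averages would need to carry both a sum and a count in each partial result.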

Performance and Scale
• Proven petabyte scale in production
– Approaching 2 PB, projected to grow > 30 PB by 2009
– Largest table: 3.5 Trillion rows (time partitioned)

• 10x Price-Performance relative to commercial systems
Data size                                 Everest (min)   Vendor A (min)   Vendor B (min)
90 TB (600 B rows)                        177             414              325
30 TB (200 B rows)                        60              95               91
HW Cost (1 PB) (cost units not stated)    250             1200             1200

[Chart: Performance comparison. Response Time (min), 0 to 500, by Data size (30 TB, 90 TB) for Vendor A, Vendor B, and Everest.]
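The "10x Price-Performance" claim can be checked against the slide's own figures. The sketch below assumes that the third figure listed for each system (250, 1200, 1200) is the hardware cost for 1 PB, and that price-performance advantage means the response-time ratio multiplied by the hardware-cost ratio. The slide does not define the metric, so both are assumptions.

```python
# Figures as read from the slide: response times in minutes per data size.
response_min = {
    "90 TB": {"Everest": 177, "Vendor A": 414, "Vendor B": 325},
    "30 TB": {"Everest": 60, "Vendor A": 95, "Vendor B": 91},
}
# Assumed reading: hardware cost for 1 PB (units not stated on the slide).
hw_cost = {"Everest": 250, "Vendor A": 1200, "Vendor B": 1200}


def price_performance_advantage(size, vendor):
    """Everest's advantage over `vendor`: speedup times cost ratio (assumed metric)."""
    speedup = response_min[size][vendor] / response_min[size]["Everest"]
    cost_ratio = hw_cost[vendor] / hw_cost["Everest"]
    return speedup * cost_ratio


advantages = {
    (size, vendor): round(price_performance_advantage(size, vendor), 1)
    for size in response_min
    for vendor in ("Vendor A", "Vendor B")
}
```

Under these assumptions the advantage comes out at roughly 11.2x and 8.8x for the 90 TB workload and 7.6x and 7.3x for the 30 TB workload, which is in line with the "10x" headline at the larger data size.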

Everest Performance Advantages
• Source of Performance and Scale
– Distributed Compressed Columnar Storage
– Highly Parallel and Asynchronous
   • Multi-threaded Query Execution as well as Storage
– Vector Query Processing
– Multi-level data partitioning and query partitioning
– Cluster-level Compressed Columnar caching
– Query expression caching
– Yahoo! specific language extensions and UDF & UDAF
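To make "compressed columnar storage" and "vector query processing" concrete, here is a minimal sketch of the two ideas. It assumes run-length encoding as the compression scheme and fixed-size batches as the unit of vector processing; the slides do not say which encodings or batch sizes Everest uses, and the column names and data are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def count_where_equal(runs, target):
    """Evaluate COUNT(*) WHERE col = target directly on the compressed runs."""
    return sum(length for value, length in runs if value == target)


def sum_in_batches(column, batch_size=4):
    """Sum a column batch by batch rather than one row at a time."""
    total = 0
    for start in range(0, len(column), batch_size):
        total += sum(column[start:start + batch_size])
    return total


# Hypothetical row-oriented input, split into one list per column.
rows = [("us", 10), ("us", 20), ("us", 5), ("uk", 7), ("uk", 3), ("in", 1)]
country_col = [country for country, _ in rows]
clicks_col = [clicks for _, clicks in rows]

country_runs = rle_encode(country_col)
us_rows = count_where_equal(country_runs, "us")
total_clicks = sum_in_batches(clicks_col)
```

Storing each column separately means a query touches only the columns it needs, and sorted or low-cardinality columns compress into a handful of runs that some predicates can be evaluated against without decompressing.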
