By
www.kellytechno.com
Hadoop Developer
Core contributor since Hadoop's infancy
Project Lead for Hadoop Distributed File System
Facebook (Hadoop, Hive, Scribe)
Yahoo! (Hadoop in Yahoo Search)
Veritas (SANPoint Direct, Veritas File System)
IBM Transarc (Andrew File System)
UW Computer Science Alumni (Condor Project)
Amazon/A9
Facebook
Google
IBM
Joost
Last.fm
New York Times
PowerSet
Veoh
Yahoo!
HDFS Architecture
Diagram: Client, NameNode, Secondary NameNode, DataNodes
1. filename (Client to NameNode)
2. BlockId, DataNodes (NameNode to Client)
3. Read data (Client from DataNodes)
Cluster Membership (between NameNode and DataNodes)
Meta-data in Memory
The entire metadata is in main memory
No demand paging of meta-data
Types of Metadata
List of files
List of Blocks for each file
List of DataNodes for each block
File attributes, e.g. creation time, replication factor
A Transaction Log
Records file creations, file deletions, etc.
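The metadata types and the transaction log listed above can be illustrated with a small sketch. This is a toy model, not the real NameNode code: the class names ToyNameNode and FileMeta, the method names, and the log record format are all hypothetical. Block locations are deliberately kept out of the log here, on the assumption that they are learned from DataNodes rather than persisted.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Per-file attributes from the slide: creation time and replication factor."""
    creation_time: float
    replication: int
    blocks: list = field(default_factory=list)  # ordered block IDs of this file


class ToyNameNode:
    """Hypothetical in-memory namespace; everything lives in plain dicts."""

    def __init__(self):
        self.files = {}            # path -> FileMeta (list of files, blocks per file)
        self.block_locations = {}  # block ID -> set of DataNode names
        self.edit_log = []         # append-only transaction log

    def create(self, path, replication=3):
        self.edit_log.append(("create", path, replication))
        self.files[path] = FileMeta(time.time(), replication)

    def add_block(self, path, block_id):
        self.edit_log.append(("add_block", path, block_id))
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = set()

    def delete(self, path):
        self.edit_log.append(("delete", path))
        meta = self.files.pop(path)
        for block_id in meta.blocks:
            self.block_locations.pop(block_id, None)

    @classmethod
    def replay(cls, edit_log):
        """Rebuild the namespace by replaying a transaction log in order.

        Creation times are reset on replay because this toy log does not
        record timestamps.
        """
        node = cls()
        for op, *args in edit_log:
            getattr(node, op)(*args)
        return node
```

Replaying the log of one instance with ToyNameNode.replay(nn.edit_log) yields a second instance with the same files and blocks, which is the role a transaction log plays after a restart.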
A Block Server
Stores data in the local file system (e.g. ext3)
Stores meta-data of a block (e.g. CRC)
Serves data and meta-data to Clients
Block Report
Periodically sends a report of all existing blocks
to the NameNode
Facilitates Pipelining of Data
Forwards data to other specified DataNodes
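As a rough illustration of the block report described above, the sketch below shows how a receiver might fold one DataNode's full report into a block-to-DataNodes map. The function names, the map layout, and the under-replication check are assumptions made for this example, not the actual HDFS protocol.

```python
from collections import defaultdict


def apply_block_report(block_locations, datanode, reported_blocks):
    """Update a block -> DataNodes map from one DataNode's full report.

    block_locations is expected to be a defaultdict(set).
    """
    reported = set(reported_blocks)
    # Forget the node for any block it no longer reports.
    for block_id, nodes in block_locations.items():
        if block_id not in reported:
            nodes.discard(datanode)
    # Record the node for every block it does report.
    for block_id in reported:
        block_locations[block_id].add(datanode)


def under_replicated(block_locations, replication=3):
    """Return the block IDs that have fewer known replicas than requested."""
    return sorted(b for b, nodes in block_locations.items() if len(nodes) < replication)


# Usage: two DataNodes report, then dn1 reports again without block 1.
locations = defaultdict(set)
apply_block_report(locations, "dn1", [1, 2])
apply_block_report(locations, "dn2", [1])
apply_block_report(locations, "dn1", [2])
```

Because each report lists all existing blocks, the receiver can treat it as the full truth for that node and drop any stale entries.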
Current Strategy
-- One replica on local node
-- Second replica on a remote rack
-- Third replica on same remote rack
-- Additional replicas are randomly placed
Clients read from nearest replica
Would like to make this policy pluggable
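The current strategy above can be sketched as a small function. This is only an illustration under stated assumptions: the name choose_replica_targets, the racks dictionary format, and the fallback to random nodes when no suitable remote rack exists are hypothetical and not taken from HDFS itself.

```python
import random


def choose_replica_targets(writer, racks, replication=3, rng=random):
    """Pick DataNodes for one block following the strategy on the slide.

    racks maps a rack name to the list of DataNode names on that rack
    (a made-up topology format); writer is the node local to the client.
    """
    node_rack = {node: rack for rack, nodes in racks.items() for node in nodes}
    targets = [writer]  # first replica on the local node
    if replication >= 2:
        remote_racks = [r for r in racks if r != node_rack[writer] and racks[r]]
        if remote_racks:
            rack = rng.choice(remote_racks)
            # second and third replicas on two distinct nodes of one remote rack
            count = min(replication - 1, 2, len(racks[rack]))
            targets += rng.sample(racks[rack], count)
    # additional replicas are placed on random nodes not chosen yet
    remaining = [n for n in node_rack if n not in targets]
    extra = min(replication - len(targets), len(remaining))
    targets += rng.sample(remaining, extra)
    return targets


# Usage with a hypothetical three-rack topology.
topology = {"r1": ["a", "b"], "r2": ["c", "d"], "r3": ["e", "f"]}
print(choose_replica_targets("a", topology, replication=3))
```

Keeping the policy in one function with the topology passed in is one way to make it swappable, in the spirit of the wish to make the policy pluggable.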
Use CRC32
File Creation
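To make the CRC32 point concrete, here is a minimal sketch that computes one CRC32 per fixed-size chunk when data is written and compares the checksums again when it is read. The 512-byte chunk size and the function names are assumptions for this example; the slide only states that CRC32 is used.

```python
import zlib

CHUNK_SIZE = 512  # bytes covered by each checksum; an assumed value for this sketch


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Compute one CRC32 per chunk of data, as a writer might at file creation."""
    return [zlib.crc32(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def first_corrupt_chunk(data, expected, chunk_size=CHUNK_SIZE):
    """Return the index of the first chunk whose CRC32 differs, or None if all match."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    if len(actual) != len(expected):
        # data is shorter or longer than what the checksums describe
        return min(len(actual), len(expected))
    return None


# Usage: checksum on write, flip one byte, verify on read.
payload = bytes(range(256)) * 5            # 1280 bytes, so three chunks
stored = chunk_checksums(payload)
damaged = bytearray(payload)
damaged[600] ^= 0xFF                       # corrupt a byte inside the second chunk
print(first_corrupt_chunk(bytes(damaged), stored))
```

A reader that finds a mismatch can then fetch the same block from another replica.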
Production cluster
Test cluster
Web Servers
Scribe Servers
Network Storage
MySQL
Statistics:
Power Management
Major operating expense
Power down CPUs when idle
Block placement based on access pattern
Move cold data to disks that need less power
Condor Green
HDFS Design: http://hadoop.apache.org/core/docs/current/hdfs_design.html
Hadoop API: http://hadoop.apache.org/core/docs/current/api/
Hive: http://hadoop.apache.org/hive/
Thank you
Presented by www.kellytechno.com