HADOOP
Big Data
Volume
Variety
Velocity
Structured
Distributed Computing
Variety
Semi-Structured
Quasi-Structured
Unstructured
Commodity
Cheap
Scalable
Flexible
Fault Tolerance
Classification by Node
Slave Node
Store data
Run task
Slave Node
Secondary
Master Node
Slave Node
HADOOP COMPONENT
Classification by Job
MapReduce
HDFS
Suitable data
Unstructured data
Immutable data
DataNodes can communicate with each other to rebalance data, move copies around,
and maintain data replication
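The replication behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function `re_replicate`, the block names and the node names are hypothetical, and the placement rule (pick the first live nodes that lack the block) is a simplification, not the placement policy HDFS actually uses. The replication factor of 3 matches the HDFS default.

```python
def re_replicate(block_locations, live_nodes, replication=3):
    """Toy model: restore lost replicas of each block on surviving nodes.

    block_locations maps a block id to the list of nodes holding a copy.
    live_nodes is the set of nodes that are still reachable.
    """
    live = sorted(live_nodes)
    result = {}
    for block, nodes in block_locations.items():
        # Replicas that survived the failure.
        holders = [n for n in nodes if n in live_nodes]
        if not holders:
            # No surviving copy: nothing to copy from, the block is lost.
            result[block] = []
            continue
        # Live nodes that do not yet hold this block can receive a copy.
        candidates = [n for n in live if n not in holders]
        missing = max(0, replication - len(holders))
        result[block] = holders + candidates[:missing]
    return result


if __name__ == "__main__":
    blocks = {"blk_1": ["n2", "n3", "n4"], "blk_2": ["n2", "n4", "n5"]}
    # Hypothetical scenario: node n3 has failed.
    print(re_replicate(blocks, {"n2", "n4", "n5"}))
```

In this model only the under-replicated block gains a new copy; blocks that still have three replicas are left untouched.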
HDFS in HADOOP V2
HDFS Federation
Fault Tolerance
MAP REDUCE
Programming model
MAP
REDUCE
merges all intermediate values associated with the same intermediate key
Fault tolerance
Scalability
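The MAP and REDUCE steps can be sketched as a word count run in a single process. This is a minimal illustration of the programming model, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions, and on a real cluster the framework performs the grouping step and distributes the work across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of input lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can run on different nodes and be re-run after a failure, which is what gives the model its scalability and fault tolerance.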
HADOOP V.1
MapReduce V1
MASTER n1 : JobTracker, NameNode (HDFS), Secondary NameNode
SLAVE n2 : TaskTracker, DataNode
SLAVE n3 : TaskTracker, DataNode
Resource Manager (YARN), NameNode (HDFS), Secondary NameNode
SLAVE n2 : Node Manager, DataNode
SLAVE n3 : Node Manager, DataNode
App Master, Container
HADOOP ECOSYSTEM
Oozie : Workflow
Pig : Scripting
R Connectors : Statistics
Zookeeper : Coordination
Java : Good for speed, control, binary data, and working with existing Java or
MapReduce libraries.
Dumbo (Python), Happy (Jython), Wukong (Ruby), mrtoolkit (Ruby) : Good for
Python/Ruby programmers who want quick results and are comfortable with the
MapReduce abstraction.
CLOUDERA
Deployment
Configuration
Service management
Security management
Extensibility APIs
FLUME
Simple
Flexible architecture
Robust
Fault tolerant
Reliability
FLUME
Source
Channel
Sink
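The Source, Channel and Sink roles listed above can be pictured with a small in-process model. This is a conceptual sketch only and does not use Flume's API or configuration format: the class `Channel` and the functions `run_source` and `run_sink` are hypothetical names, and a Python list stands in for the final destination (for example HDFS).

```python
import queue


class Channel:
    """Buffer that holds events between the source and the sink."""

    def __init__(self, capacity=100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Raises queue.Full if the channel is at capacity.
        self._events.put_nowait(event)

    def take(self):
        """Return the next event, or None when the channel is empty."""
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def run_source(lines, channel):
    """Source: wrap each incoming line in an event and put it on the channel."""
    for line in lines:
        channel.put({"body": line})


def run_sink(channel, destination):
    """Sink: drain the channel and deliver event bodies to the destination."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event["body"])
        delivered += 1
    return delivered


if __name__ == "__main__":
    channel = Channel()
    stored = []
    run_source(["log line 1", "log line 2"], channel)
    print(run_sink(channel, stored), stored)
```

The channel decouples the two ends: the source can keep accepting events while the sink is slow or temporarily unavailable, up to the channel's capacity.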
Setting multi-agent flow
Consolidation
DEMO
HADOOP 3 Node + Wordcount mapreduce + FLUME
MASTER n1 : Resource Manager (YARN), NameNode (HDFS), Secondary NameNode
SLAVE n2 : Node Manager, DataNode
SLAVE n3 : Node Manager, DataNode
SLAVE n4
FLUME
COMPARE
HADOOP
CLOUDERA - FREE
Unlimited
Unlimited
CLOUDERA ENTERPRISE
Unlimited
GUI
Diagnostic
Node Limit
CDH
Cloudera Support
High Availability
COMPARE
HADOOP
CLOUDERA - FREE
Difficult Configuration
Difficult Diagnosis
Difficult Authorization
Difficult modification
YARN bug
Ensures Hadoop ecosystem compatibility
Partner solution with Splunk
Apache Spark