Hadoop Architecture
NameNode Availability
HDFS HA Architecture
Backup
Security
NameNode manages all the data nodes and maintains all the metadata information
[Diagram: client, NameNode, Secondary NameNode, Metadata]
Checkpoint period: the number of seconds between two periodic checkpoints.
Availability of NameNode means the NameNode must always be up and running, or available, to execute any Hadoop job
In a standard HDFS configuration, the NameNode is a Single Point of Failure, i.e. once the NameNode crashes, the whole cluster becomes unavailable
Planned event:
Maintenance work such as software or hardware upgrades
Unplanned event:
NameNode crashes because of hardware failure
In case of a failover of the active NameNode, the other NameNode (Standby) takes over responsibility
[Diagram: Active NameNode and Standby NameNode, each synced with Shared Storage, with DataNodes DN1, DN2, DN3]
Active NameNode and Standby NameNode keep their state in sync with each other using shared storage
Shared Storage Implementation
1. NFS (Network File System)
2. Quorum Journal Nodes
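To make the quorum idea behind the journal-node option concrete, here is a minimal Python sketch. It assumes that an edit counts as committed only when a strict majority of journal nodes acknowledge it. The `JournalNode` class and `write_edit` function are hypothetical stand-ins for illustration, not part of any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one journal node (not the Hadoop API)."""
    name: str
    alive: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid: int, op: str) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.alive:
            return False
        self.edits.append((txid, op))
        return True


def write_edit(journals, txid, op):
    """Return True when a strict majority of journal nodes stored the edit."""
    acks = sum(1 for jn in journals if jn.append(txid, op))
    return acks > len(journals) // 2


journals = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]

journals[2].alive = False                            # one of three nodes is down
committed = write_edit(journals, 1, "mkdir /data")   # 2 of 3 acknowledge

journals[1].alive = False                            # two of three nodes are down
rejected = write_edit(journals, 2, "delete /tmp")    # only 1 of 3 acknowledges
```

With three journal nodes the sketch tolerates one failure: the first write is accepted and the second is not.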
Apache ZooKeeper:
Highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures.
Provides automatic failover for the HDFS HA architecture
Failure detection:
Each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper.
If a NameNode fails, its ZooKeeper session expires, notifying the other NameNode that a failover should be triggered.
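The session-expiry mechanism can be sketched as a small simulation. This is a toy model under stated assumptions, not the ZooKeeper client API: `Session`, `elect_active`, the node names `nn1`/`nn2`, and the 5-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical model of a coordination-service session."""
    owner: str
    last_heartbeat: float
    timeout: float = 5.0  # seconds; illustrative value only

    def expired(self, now: float) -> bool:
        return now - self.last_heartbeat > self.timeout


def elect_active(sessions, current_active, now):
    """Keep the current active node while its session is live; otherwise
    promote the first node whose session is still live."""
    live = [s.owner for s in sessions if not s.expired(now)]
    if current_active in live:
        return current_active
    return live[0] if live else None


sessions = [Session("nn1", last_heartbeat=0.0), Session("nn2", last_heartbeat=0.0)]
active = "nn1"

# Both NameNodes heartbeat at t=4; nothing changes at t=6.
sessions[0].last_heartbeat = 4.0
sessions[1].last_heartbeat = 4.0
before = elect_active(sessions, active, now=6.0)

# nn1 stops heartbeating; only nn2 refreshes its session at t=9.
sessions[1].last_heartbeat = 9.0
after = elect_active(sessions, before, now=10.0)
```

Here `before` stays `nn1`, and once the `nn1` session has been silent for longer than the timeout, `after` becomes `nn2`.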
Why Backup?
Risk of Loss of Data
distcp
hadoop distcp hdfs://<source NN> hdfs://<target NN>
Flume
Ingesting Data Using Flume
1. Principal:
An identity that needs to be verified is referred to as a principal.
2. Realm:
A realm in Kerberos refers to an authentication administrative domain. Principals are assigned to specific realms in order to demarcate boundaries and simplify administration.
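As an illustration of how a principal is tied to a realm, the sketch below splits a principal string written in the common `primary/instance@REALM` form. The `Principal` type, the `parse_principal` helper, and the example names are hypothetical and only illustrate the naming convention.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"principal needs a name and a realm: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


# Hypothetical service and user principals in the same realm.
service = parse_principal("hdfs/nn1.example.com@EXAMPLE.COM")
user = parse_principal("alice@EXAMPLE.COM")
```

Both example principals belong to the `EXAMPLE.COM` realm; the service principal also carries an instance part, while the user principal does not.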
Kerberos message flow
[Diagram: client, KDC (AS, TGS, DB), Application Service]
1. Initial user authentication request. This message is directed to the Authentication Server (AS).
3. Request from the client to the TGS for a service ticket. This packet includes the TGT obtained from the previous message.
4. Reply of the TGS to the previous request. It returns the requested service ticket.
6. Reply that the application service gives to the client to prove it really is the server the client is expecting.
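The ticket exchange can be sketched as a toy model. It is deliberately simplified: real Kerberos encrypts tickets and authenticates the user with a password-derived key, whereas this sketch only signs tickets with an HMAC and skips the user check. All keys, names, and functions here are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical long-term keys; a real KDC keeps these in its database (DB).
TGS_KEY = b"tgs-secret"
SERVICE_KEYS = {"hdfs/nn1": b"hdfs-secret"}


def seal(key: bytes, payload: dict) -> dict:
    """Attach an HMAC tag so only a holder of `key` can mint or verify the ticket."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, ticket: dict) -> dict:
    """Verify the tag and return the ticket payload, or raise PermissionError."""
    expected = hmac.new(key, ticket["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        raise PermissionError("ticket was not issued with this key")
    return json.loads(ticket["body"])


def as_exchange(user: str) -> dict:
    """The AS answers an authentication request with a TGT (user check omitted)."""
    return seal(TGS_KEY, {"user": user, "kind": "TGT"})


def tgs_exchange(tgt: dict, service: str) -> dict:
    """The TGS checks the TGT and returns a service ticket."""
    claims = unseal(TGS_KEY, tgt)
    return seal(SERVICE_KEYS[service], {"user": claims["user"], "service": service})


def service_accepts(ticket: dict, service: str) -> str:
    """The application service validates the ticket (its reply is not modelled)."""
    return unseal(SERVICE_KEYS[service], ticket)["user"]


tgt = as_exchange("alice")
ticket = tgs_exchange(tgt, "hdfs/nn1")
user = service_accepts(ticket, "hdfs/nn1")
```

The point of the sketch is the chain of trust: the service never talks to the AS, yet it can accept the client because the service ticket was minted with a key that only the KDC and the service share.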