
What is HDFS?

Hadoop Distributed File System.

One of the two main components of Hadoop:

MapReduce (distributed processing).

HDFS (distributed storage).

What is Hadoop?

A distributed computing system.


Hadoop was developed to solve problems where you have a
huge amount of data.
Hadoop is for situations where you want to run analytics that
are deep and computationally extensive.
The underlying technology was invented by Google.
Google's innovations were incorporated into Nutch, an
open source project.
From there Hadoop came into existence, and Yahoo
played a big role in developing Hadoop for enterprise
applications.

Why does Hadoop need HDFS?


Hadoop was developed to be

Scalable : Adding and removing nodes without changing data formats.

Cost Effective : Use of commodity servers.

Flexible : Schema less.

Fault Tolerant : Built-In Redundancy and Failover

HDFS matches all the requirements listed above.

Difference between HDFS and other file systems

The biggest difference is that HDFS is a virtual file system.

HDFS runs on top of the existing file system.

Why HDFS?

HDFS is a distributed file system, while NTFS and FAT are not.

HDFS stores data reliably; it has built-in redundancy and failover. NTFS and FAT
have no built-in redundancy or failover.

NTFS and FAT use small block sizes (typically 4-8 KB). HDFS supports much
larger block sizes, by default 64 MB.

NTFS and FAT are optimized for random access reads, while HDFS is
optimized for sequential reads.

There is no local caching in HDFS because file sizes are huge. A typical file in HDFS
can be 1 TB or more.
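
As a quick way to see how a file maps onto blocks, the fsck tool can list the blocks behind a file and where they live. A minimal sketch (the path /user/hduser/File.txt is just a placeholder):

hadoop fsck /user/hduser/File.txt -files -blocks -locations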

Blocks?
Advantages of blocks:

Fixed in size : easy to calculate how many fit on a disk.

A file can be bigger than any single disk in the network.

Only the needed space is used : smaller files don't occupy a full
default-sized block.

Fits well with replication to provide fault tolerance and
availability.

Behind the scenes, one HDFS block (64 MB or 128 MB) is made up of
multiple operating system blocks.

[Figure: one HDFS block mapped onto many smaller operating system blocks.]
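
For illustration, the block size can be changed cluster-wide in conf/hdfs-site.xml. A sketch assuming the Hadoop 1.x property name dfs.block.size and a 128 MB value:

<property>
  <name>dfs.block.size</name>
  <!-- 128 MB, specified in bytes -->
  <value>134217728</value>
</property>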

Components of HDFS

NameNode

Secondary NameNode

DataNode

NameNode

Hadoop works on a master-slave architecture.


The NameNode is the master node. There can be only one NameNode in a Hadoop
cluster.
What does the NameNode do?
Stores all the file system metadata for the cluster.
Oversees the health of DataNodes and coordinates access to data.
The NameNode only knows which blocks make up a file and where those blocks
are located in the cluster.
Keeps track of the cluster's storage capacity.
Makes sure each block of data meets the minimum defined replica policy
(the dfs.replication property in hdfs-site.xml; see the example below).
Single point of failure: don't use inexpensive commodity servers for the NameNode.

Requires a large amount of RAM, as it keeps all metadata in memory.

HDFS federation : http://hadoop.apache.org/docs/current/hadoop-projectdist/hadoop-hdfs/Federation.html#HDFS_Federation
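
As an example of the replica policy mentioned above, a minimal hdfs-site.xml entry might look like this (a sketch; 3 is the usual default):

<property>
  <name>dfs.replication</name>
  <!-- minimum number of replicas kept for each block -->
  <value>3</value>
</property>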

Secondary NameNode

Does it provide a high availability backup for the NameNode?

No.

The Secondary NameNode is a housekeeper for the NameNode.


To maintain interactive speed, the filesystem metadata is
stored in the NameNode's RAM.
The NameNode does not write a fresh snapshot of this metadata
to disk on every change.
Instead, modifications are continually appended to a log file called
the EditLog.
Restarting the NameNode involves replaying the EditLog to
reconstruct the final system state.

Secondary NameNode

The SecondaryNameNode periodically compacts the EditLog into a
checkpoint; the EditLog is then cleared.
A restart of the NameNode then involves loading the most recent checkpoint
and a shorter EditLog containing only events since the checkpoint.

Compaction ensures that restarts do not incur unnecessary downtime.

The duties of the SecondaryNameNode end there.

The Secondary NameNode connects to the NameNode (every hour by default) and grabs a
copy of the NameNode's metadata. It combines this information into
a fresh set of files and delivers them back to the NameNode, while keeping a
copy for itself.
Configuration : core-site.xml

fs.checkpoint.period, set to 1 hour by default.

fs.checkpoint.size, set to 64MB by default.
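
A sketch of the corresponding core-site.xml entries, using the defaults listed above (period in seconds, size in bytes):

<property>
  <name>fs.checkpoint.period</name>
  <!-- 1 hour -->
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <!-- 64 MB -->
  <value>67108864</value>
</property>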

DataNode

DataNodes are the storage servers. These are the nodes where the actual
data resides.
They are the slaves of the Hadoop master-slave architecture.
DataNodes store the blocks of data, but without the NameNode
they cannot make any sense of these blocks.
DataNodes send heartbeats to the NameNode every 3 seconds via a
TCP handshake.
Every tenth heartbeat is a Block Report, in which the DataNode tells
the NameNode about all the blocks it has.
The block reports allow the NameNode to build its metadata and ensure that the
minimum required replicas of each block exist on different nodes, in
different racks.

Rack Awareness

[Figure: a cluster with one NameNode and three racks of DataNodes, each rack behind its own switch. Rack 1: DN1-DN5, Rack 2: DN6-DN10, Rack 3: DN11-DN15.]

NameNode metadata:
File.txt = Block A: DN 1, 6, 7; Block B: DN 7, 14, 15
Rack awareness: Rack 1: DN 1-5; Rack 2: DN 6-10; Rack 3: DN 11-15

Rack Awareness

Rack Awareness is a concept where the NameNode is aware of which DataNode
resides within which rack.
For larger Hadoop installations with multiple racks, it is important to ensure that
replicas of data exist on multiple racks. This way, the loss of a switch does not
render portions of the data unavailable due to all replicas being underneath it.
The mapping has to be configured manually using a script, e.g., in bash or Python
(see the sketch after the topology data below).
To set the rack mapping script, specify the key topology.script.file.name in
conf/hdfs-site.xml. This provides a command to run to return a rack id; it must be
an executable script or program.

Topology data:

192.168.8.50 /dc1/rack1
192.168.8.70 /dc1/rack2
192.168.8.90 /dc1/rack2
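
A minimal rack-mapping script might look like the sketch below. It assumes the topology data above is stored in a flat file topology.data (the path is a placeholder); Hadoop invokes the script with one or more IP addresses or hostnames and expects one rack id per line on stdout:

#!/bin/bash
# Sketch of a topology script: map each argument (IP or hostname) to a rack id.
TOPOLOGY_FILE=/home/hduser/hadoop/conf/topology.data
while [ $# -gt 0 ]; do
  node=$1
  rack=$(awk -v n="$node" '$1 == n {print $2}' "$TOPOLOGY_FILE")
  if [ -z "$rack" ]; then
    echo "/default-rack"   # unknown nodes fall back to a default rack
  else
    echo "$rack"
  fi
  shift
done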

HDFS Web Interface

HDFS exposes a web server which is capable of performing basic status monitoring
and file browsing operations.
By default this is exposed on port 50070 on the NameNode. http://namenode:50070/
Contains overview information about the health, capacity, and usage of the cluster
(similar to the information returned by bin/hadoop dfsadmin -report).
The address and port where the web interface listens can be changed by setting
dfs.http.address in conf/hdfs-site.xml.

It must be of the form address:port. To accept requests on all addresses, use 0.0.0.0.

From this interface, you can browse HDFS itself with a basic file-browser interface.

Each DataNode exposes its file browser interface on port 50075. You can override this
by setting the dfs.datanode.http.address configuration key to a setting other than
0.0.0.0:50075.
Log files generated by the Hadoop daemons can be accessed through this interface,
which is useful for distributed debugging and troubleshooting.
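
A sketch of the relevant hdfs-site.xml entries, using the default values described above:

<property>
  <name>dfs.http.address</name>
  <!-- NameNode web interface -->
  <value>0.0.0.0:50070</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <!-- DataNode file browser -->
  <value>0.0.0.0:50075</value>
</property>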

Command Line Interface

hadoop fs -cat URI

Copies source paths to stdout.

hadoop fs -chgrp [-R] GROUP URI [URI ...]

Change group association of files.

hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]

Change the permissions of files.

hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]

Change the owner of files.

hadoop fs -copyFromLocal <localsrc> URI

Similar to the put command, except that the source is restricted to a local file reference.

hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to the get command, except that the destination is restricted to a local file reference.

hadoop fs -cp URI [URI ...] <dest>

Copy files from source to destination.

hadoop fs -put <localsrc> ... <dst>

Copy single src, or multiple srcs, from the local file system to the destination filesystem.

Command Line Interface

hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>

Copy files to the local file system.

hadoop fs -ls <args>

Lists files; for a directory, returns the list of its direct children, as in Unix.

hadoop fs -lsr <args>

Recursive version of ls. Similar to Unix ls -R.

hadoop fs -mkdir <paths>

Creates directories. The behavior is much like Unix.

hadoop fs -mv URI [URI ...] <dest>

Moves files from source to destination.

dfs -moveFromLocal <src> <dst>

Displays a "not implemented" message.

hadoop fs -rm URI [URI ...]

Delete files specified as args.

hadoop fs -rmr URI [URI ...]

Recursive version of delete.
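
Put together, a typical session might look like the sketch below (paths and file names are placeholders):

hadoop fs -mkdir /user/hduser/input
hadoop fs -put File.txt /user/hduser/input/
hadoop fs -ls /user/hduser/input
hadoop fs -cat /user/hduser/input/File.txt
hadoop fs -get /user/hduser/input/File.txt /tmp/File.txt
hadoop fs -rm /user/hduser/input/File.txt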

Write Operation

[Figure: the Hadoop client asks the NameNode where to write Result.txt, which consists of blocks C and D. The NameNode replies with a list of target DataNodes for each block: Block C => DN 3, 11, 12; Block D => DN 9, 4, 5. The existing metadata (File.txt) and the rack awareness table are unchanged.]

Write Operation

[Figure: the client sends Block C to the first DataNode in its list (DN 3). The DataNodes form a write pipeline (DN 3 -> DN 11 -> DN 12) and report back "Ready: 11, 12" along the pipeline before the block data is streamed.]

Write Operation

[Figure: once a block is written, each DataNode sends a "Block Received" report to the NameNode and a "Success" acknowledgement goes back to the client. The NameNode metadata now contains: File.txt = Block A: DN 1, 6, 7; Block B: DN 7, 14, 15; Result.txt = Block C: DN 3, 11, 12; Block D: DN 9, 4, 5.]

Read Operation

[Figure: the client asks the NameNode for File.txt. The NameNode returns the block locations: Block A => DN 1, 6, 7; Block B => DN 7, 14, 15.]

Read Operation

[Figure: the client reads Block A and Block B directly from DataNodes chosen from the lists returned by the NameNode.]

The NameNode intelligently orders the list of DataNodes for each block, taking into
account the network traffic load on each DataNode containing the block.

Deletion of data from HDFS.


When a file is deleted by a user or an application, it is not immediately removed from
HDFS.
HDFS first renames it to a file in the /trash directory.
The file can be restored quickly as long as it remains in /trash.
A file remains in /trash for a configurable amount of time.
The deletion of a file causes the blocks associated with the file to be freed.
There could be an appreciable time delay between the time a file is deleted by a user
and the time of the corresponding increase in free space in HDFS.
A user can undelete a file after deleting it as long as it remains in the /trash directory.
The /trash directory contains only the latest copy of the file that was deleted.
When the replication factor of a file is reduced, the NameNode selects excess
replicas that can be deleted.
The next Heartbeat transfers this information to the DataNode. The DataNode then
removes the corresponding blocks and the corresponding free space appears in the
cluster.
The /trash directory is just like any other directory with one special feature: HDFS
applies specified policies to automatically delete files from this directory. The current
default policy is to delete files from /trash that are more than 6 hours old. In the future,
this policy will be configurable through a well defined interface.
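
As an illustration, trash retention can be controlled with fs.trash.interval in core-site.xml (in minutes; 0 disables the trash), and a file can be undeleted by moving it back out of the trash. The trash path shown is an assumption and varies between Hadoop versions:

<property>
  <name>fs.trash.interval</name>
  <!-- keep deleted files for 6 hours -->
  <value>360</value>
</property>

hadoop fs -mv /user/hduser/.Trash/Current/user/hduser/File.txt /user/hduser/File.txt
hadoop fs -expunge    # empty the trash immediately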

Persistent Data Structures


Name Node

In conf/hdfs-site.xml, set the property dfs.name.dir (e.g., /home/hduser/tmp/dfs/name).

Under /home/hduser/tmp/dfs/name

current

image

in_use.lock

previous.checkpoint

current and previous.checkpoint have the same directory structure; both
contain files with the same names.
/home/hduser/tmp/dfs/ also has a namesecondary directory. This has the same
directory structure as name, except for the previous.checkpoint directory.
Directory structure for /home/hduser/tmp/dfs/name/current/

edits

fsimage

fstime

VERSION

Persistent Data Structures


Name Node
VERSION file is a Java properties file that contains information about the
version of HDFS that is running.

#Thu Sep 19 18:29:16 IST 2013

namespaceID=1443825132

cTime=0

storageType=NAME_NODE

layoutVersion=-19

namespaceID :

A unique identifier for the filesystem.

Created when the filesystem is first formatted.

The namenode uses it to identify new datanodes, since they will not know the namespaceID
until they have registered with the namenode.

cTime :

Marks the creation time of the namenode's storage.

Newly formatted storage will always have the value zero.

It is updated to a timestamp whenever the filesystem is upgraded.

Persistent Data Structures


Name Node
storageType :

Indicates that this storage directory contains data structures for a
namenode.

layoutVersion :

A negative integer that defines the version of HDFS's persistent
data structures.
This version number has no relation to the release number of the
Hadoop distribution.
Whenever the layout changes, the version number is decremented.

Persistent Data Structures


Name Node
fstime : the fstime file records the time that the last checkpoint was taken.
fsimage : file is a persistent checkpoint of the filesystem metadata. However,
it is not updated for every filesystem write operation, since writing out the
fsimage file, which can grow to be gigabytes in size, would be very slow.
edits : When a filesystem client performs a write operation (such as creating
or moving a file), it is first recorded in the edit log. The namenode also has
an in-memory representation of the filesystem metadata, which it updates
after the edit log has been modified.
The edits file would grow without bound. Though this state of affairs would
have no impact on the system while the namenode is running, if the
namenode were restarted, it would take a long time to apply each of the
operations in its edit log.

Persistent Data Structures


Name Node
The solution is to run the secondary namenode, whose purpose is to produce checkpoints of the primary's in-memory filesystem metadata.
The checkpointing process proceeds as follows :

The secondary asks the primary to roll its edits file, so new edits go to a new file.

The secondary retrieves fsimage and edits from the primary (using HTTP GET).

The secondary loads fsimage into memory, applies each operation from edits, then
creates a new consolidated fsimage file.
The secondary sends the new fsimage back to the primary (using HTTP POST).
The primary replaces the old fsimage with the new one from the secondary, and the old
edits file with the new one it started in step 1.

It also updates the fstime file to record the time that the checkpoint was taken.

At the end of the process, the primary has an up-to-date fsimage file and a shorter edits
file.
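
For reference, a checkpoint can also be triggered by hand. A sketch using the Hadoop 1.x command line (force makes the secondary checkpoint even if the EditLog is still small):

bin/hadoop secondarynamenode -checkpoint force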

Persistent Data Structures


Data Node
/home/hduser/tmp/dfs/data/current
dncp_block_verification.log.curr
VERSION
blk_-4815445268453121824
blk_-4815445268453121824_1014.meta
blk_-128854014309496448
blk_-128854014309496448_1013.meta
blk_4868937369333549191
blk_4868937369333549191_1009.meta
VERSION :
#Fri Sep 20 18:50:55 IST 2013
namespaceID=1443825132
storageID=DS-1242753232-127.0.0.1-50010-1379336648923
cTime=0
storageType=DATA_NODE
layoutVersion=-19

Persistent Data Structures


Data Node
The namespaceID, cTime, and layoutVersion are all the same as the values in the namenode.
The storageID is unique to the datanode and is used by the namenode to uniquely identify the
datanode.
The storageType identifies this directory as a datanode storage directory.
The other files in the datanode's current storage directory are the files with the blk_ prefix.
There are two types:

HDFS blocks themselves

metadata for a block

A block file just consists of the raw bytes of a portion of the file being stored;
the metadata file is made up of a header with version and type information, followed by a series of
checksums for sections of the block.
When the number of blocks in a directory grows to a certain size, the datanode creates a new
subdirectory in which to place new blocks and their accompanying metadata. By default it
creates a new subdirectory every time the number of blocks in a directory reaches 64
(the dfs.datanode.numblocks property).
This ensures that there is a manageable number of files per directory, which avoids the problems that
most operating systems encounter when there are a large number of files in a single directory.

Additional HDFS Tasks


REBALANCING BLOCKS
New nodes can be added to a cluster in a straightforward manner.
On the new node, the same Hadoop version and configuration as on the rest of the cluster should be installed.
conf/hadoop-site.xml
The new node should be added to the slaves file on the master server as well
Starting the DataNode daemon on the machine will cause it to contact the NameNode and join the cluster.
But the new DataNode will have no data on board initially. New files will be stored on the new DataNode in addition to the existing ones, but for
optimum usage, storage should be evenly balanced across all nodes.
This can be achieved with the automatic balancer tool included with Hadoop.
The Balancer class will intelligently balance blocks across the nodes to achieve an even distribution of blocks within a given threshold, expressed as
a percentage.
Smaller percentages make nodes more evenly balanced, but may require more time to achieve this state. Perfect balancing (0%) is unlikely to
actually be achieved.
The balancer script can be run by starting bin/start-balancer.sh in the Hadoop directory.
The script can be provided a balancing threshold percentage with the -threshold parameter; e.g., bin/start-balancer.sh -threshold 5.
The balancer will automatically terminate when it achieves its goal, or when an error occurs, or it cannot find more candidate blocks to move to
achieve better balance.
The balancer can always be terminated safely by the administrator by running bin/stop-balancer.sh.
The amount of network traffic the balancer may use is limited, with a default setting of 1 MB/s. This setting can be changed with the
dfs.balance.bandwidthPerSec parameter in the file hdfs-site.xml.
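
A sketch that puts the options above together (the 10 MB/s bandwidth value is just an example):

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <!-- allow the balancer to use up to 10 MB/s, specified in bytes -->
  <value>10485760</value>
</property>

bin/start-balancer.sh -threshold 5    # balance until every node is within 5% of the cluster average
bin/stop-balancer.sh                  # stop the balancer at any time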

Additional HDFS Tasks


DECOMMISSIONING NODES
Nodes can also be removed from a cluster while it is running, without data loss.
But if nodes are simply shut down "hard," data loss may occur as they may hold the sole copy of one or more file blocks.
Nodes must be retired on a schedule that allows HDFS to ensure that no blocks are entirely replicated within the to-be-retired set of
DataNodes.
HDFS provides a decommissioning feature which ensures that this process is performed safely. To use it, follow the steps below:

Cluster configuration : Add a key named dfs.hosts.exclude to your conf/hadoop-site.xml file. The value associated with this key
provides the full path to a file on the NameNode's local file system which contains a list of machines that are not permitted to connect
to HDFS.
Determine hosts to decommission. Each machine to be decommissioned should be added to the file identified by
dfs.hosts.exclude, one per line. This will prevent them from connecting to the NameNode.
Force configuration reload. Run the command bin/hadoop dfsadmin -refreshNodes. This will force the NameNode to reread its
configuration, including the newly-updated excludes file.
It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines which are
scheduled to remain active.
Shutdown nodes. After the decommission process has completed, the decommissioned hardware can be safely shut down for
maintenance, etc. The bin/hadoop dfsadmin -report command will describe which nodes are connected to the cluster.
Edit excludes file again. Once the machines have been decommissioned, they can be removed from the excludes file. Running
bin/hadoop dfsadmin -refreshNodes again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the
cluster after maintenance has been completed, or additional capacity is needed in the cluster again, etc.
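
A sketch of these steps (the excludes file path and hostname are placeholders):

<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hduser/hadoop/conf/excludes</value>
</property>

echo "datanode07.example.com" >> /home/hduser/hadoop/conf/excludes
bin/hadoop dfsadmin -refreshNodes    # NameNode rereads the excludes file
bin/hadoop dfsadmin -report          # watch the node's decommission status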

Conclusion.
