Hadoop Cluster Creation

(Single Node)

By
Dr. R. Ragupathy
Assistant Professor
Department of Computer Science and Engineering
Hadoop - Introduction
 Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.

 It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

 Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes.

 It can perform complete statistical analysis on huge amounts of data.
Hadoop Framework
Hadoop is supported on the GNU/Linux platform and its flavors. Therefore, first install a Linux operating system to set up the Hadoop environment.

In case you have an OS other than Linux, you can install VirtualBox and run Linux inside it.

Pre-Installation Setup

 Step 1: Install Oracle Java 8

 Step 2: SSH and its Key Generation


Installation Procedure for Oracle Java 8
Installation Procedure for Oracle Java 8

Java is the main prerequisite for Hadoop.

Step 1: Update the apt-get package index and add the Java PPA by executing the following commands:

sudo apt-get update


sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

Step 2: Check whether Java already exists by executing the following command:

java -version
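If Java is installed, this prints the installed version string. If it is not, the shell typically reports something like the following (illustrative):

java: command not found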
contd...
Installation Procedure for Oracle Java 8
If Java is not installed on your system, follow the steps given below to install it.

Step 3: Install the default Java Runtime Environment (JRE) by executing the following command:

sudo apt-get install default-jre

Step 4: Install the default Java Development Kit (JDK) by executing the following command:

sudo apt-get install default-jdk
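As an optional check (not part of the original steps), you can confirm that the JDK compiler is on the path by executing:

javac -version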

contd...
Installation Procedure for Oracle Java 8
Before Step 5, execute the following commands:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
Step 5: Install the OpenJDK 8 Java Runtime Environment (JRE) by executing the following command:
sudo apt-get install openjdk-8-jre

Step 6: Install the OpenJDK 8 Java Development Kit (JDK) by executing the following command:
sudo apt-get install openjdk-8-jdk

Step 7: Install Oracle JDK 8 by executing the following command:


sudo apt-get install oracle-java8-installer
contd...
Installation Procedure for Oracle Java 8
Step 8: Set the default Java version to be used by executing the following command:

sudo update-alternatives --config java

The following message will appear on the screen:
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status
  ------------------------------------------------------------
* 0            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       auto mode
  1            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1061       manual mode
  2            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       manual mode

Press enter to keep the current choice[*], or type selection number:
Type the number of the Java version to use as the default.
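As an optional check, you can confirm which version is now the default by executing:

java -version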
contd…
Installation Procedure for Oracle Java 8

Step 9: Create a shortcut to the installed Java by creating a symbolic link named jdk for java-8-oracle in the /usr/lib/jvm directory, by executing the following commands:

cd /usr/lib/jvm
sudo ln -s java-8-oracle jdk
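As an optional check, verify the link by executing the following command; the output should show jdk -> java-8-oracle:

ls -l /usr/lib/jvm/jdk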
SSH and its Key Generation
SSH and its Key Generation
Before installing Hadoop in the Linux environment, we need to set up SSH (Secure Shell).

SSH-based authentication is required to perform different operations on a cluster, such as starting, stopping, and distributed daemon shell operations, and also on the local machine if you want to use Hadoop with it.

For our single-node setup of Hadoop, we need SSH access to localhost.

Follow the steps given below for setting up the Linux environment.
SSH and its Key Generation
Step 1: Install SSH by executing the following command:

sudo apt-get install openssh-server
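As an optional check, confirm that the SSH server is running by executing the following command (assuming a service-based Ubuntu setup):

sudo service ssh status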

Step 2: SSH Key Generation

To authenticate different users of Hadoop, it is required to generate a public/private key pair for the Hadoop user and share the public key with the machines the user needs to access.

The following command is used to generate a key pair using SSH:

ssh-keygen -t rsa
SSH and its Key Generation
Step 3: Store the keys and passphrase by answering a few more questions. The entire key generation process looks like this:

Generating public/private rsa key pair.

Enter file in which to save the key (/home/ragupathy/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Note: Just press Enter at every prompt; the keys will be generated at /home/ragupathy/.ssh/.
SSH and its Key Generation

Your identification has been saved in /home/ragupathy/.ssh/id_rsa.
Your public key has been saved in /home/ragupathy/.ssh/id_rsa.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 ragupathy@ragupathy-Ideapad-Z560
The key's randomart image is:
+---[ RSA 2048]---+
|      .oo.       |
|     . o.E       |
|      +.o        |
|     .==.        |
|     =S=.        |
|     o+=+        |
|    .o+o.        |
|       .o        |
+-----------------+
The public key is now located in /home/ragupathy/.ssh/id_rsa.pub
The private key (identification) is now located in /home/ragupathy/.ssh/id_rsa
SSH and its Key Generation
Step 4: Enable SSH access to your local machine with the newly created key. This is done by copying the public key to user@machine-name (or IP) by executing the following command:

ssh-copy-id ragupathy@ragupathy-Ideapad-Z560
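As an optional check, confirm that key-based login works by executing the following; it should log you in without asking for a password:

ssh localhost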

Step 5: Ensure that root login with a password is disabled by finding the line PermitRootLogin without-password in /etc/ssh/sshd_config. Open the file by executing the following command:

sudo gedit /etc/ssh/sshd_config


SSH and its Key Generation

Step 6: Put the changes into effect by executing the following command:

sudo reload ssh
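Note: sudo reload ssh assumes an Upstart-based Ubuntu release; on newer systemd-based distributions, the equivalent command is:

sudo systemctl restart ssh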


Procedure to Install Hadoop 2.7.2
Hadoop Operation Modes
Hadoop can be operated in one of the three supported modes:

Local/Standalone Mode: After downloading Hadoop, by default it is configured in standalone mode and runs as a single Java process.

Pseudo-Distributed Mode: It is a distributed simulation on a single machine. Each Hadoop daemon, such as HDFS, YARN, and MapReduce, runs as a separate Java process. This mode is useful for development.

Fully Distributed Mode: This mode is fully distributed, with a minimum of two machines forming a cluster.
Hadoop Installation Procedure
Step 1: Download Hadoop 2.7.2 by executing the following command at /home/ragupathy/:

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Hadoop Installation Procedure
Step 2: Extract hadoop-2.7.2.tar.gz by executing the following
command:

tar -xzf hadoop-2.7.2.tar.gz


Hadoop Installation Procedure

Step 3: Move the hadoop-2.7.2 directory as hadoop under the user ragupathy by executing the following command:

mv hadoop-2.7.2 /home/ragupathy/hadoop
Hadoop Installation Procedure
The following is the list of files to edit to configure Hadoop:

core-site.xml: The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of read/write buffers.

hdfs-site.xml: The hdfs-site.xml file contains information such as the replication factor, the namenode path, and the datanode paths on your local file system, i.e., the places where you want to store the Hadoop infrastructure.

yarn-site.xml: This file is used to configure YARN in Hadoop.

mapred-site.xml: This file is used to specify which MapReduce framework we are using.
Hadoop Installation Procedure
Step 4: Hadoop Configuration

Configure Hadoop by modifying the following files:

~/.bashrc
/home/ragupathy/hadoop/etc/hadoop/hadoop-env.sh
/home/ragupathy/hadoop/etc/hadoop/core-site.xml
/home/ragupathy/hadoop/etc/hadoop/yarn-site.xml
/home/ragupathy/hadoop/etc/hadoop/mapred-site.xml
/home/ragupathy/hadoop/etc/hadoop/hdfs-site.xml
Hadoop Installation Procedure
Step 5: Use the following command to modify ~/.bashrc:

sudo gedit ~/.bashrc

Append the following to ~/.bashrc:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
#Java Installed path
export HADOOP_INSTALL=/home/ragupathy/hadoop
#Hadoop Installed Path
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_HOME=$HADOOP_INSTALL
#end of paste
Hadoop Installation Procedure
After saving ~/.bashrc, reload the profile by executing the following command:

source ~/.bashrc

In order to develop Hadoop programs in Java, you have to set the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.

Step 6: Open /home/ragupathy/hadoop/etc/hadoop/hadoop-env.sh using gedit, change the JAVA_HOME variable to export JAVA_HOME=/usr/lib/jvm/jdk/, and save.
Hadoop Installation Procedure
Step 7: Edit core-site.xml

Open /home/ragupathy/hadoop/etc/hadoop/core-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Hadoop Installation Procedure
Step 8: Edit yarn-site.xml

Open /home/ragupathy/hadoop/etc/hadoop/yarn-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Hadoop Installation Procedure
Step 9: Edit mapred-site.xml

By default, the /home/ragupathy/hadoop/etc/hadoop/ folder contains the mapred-site.xml.template file, which has to be copied with the name mapred-site.xml. This file is used to specify which framework is being used for MapReduce. Copy the contents by executing the following command:

cp /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml.template /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml
Hadoop Installation Procedure
Step 10: Edit mapred-site.xml

Now, open /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Hadoop Installation Procedure
Step 11: Edit hdfs-site.xml

It is used to specify the directories which will be used as the namenode and the datanode on that host.

Before editing this file, create two directories which will contain the namenode and the datanode by executing the following commands:

mkdir -p /home/ragupathy/hadoop/hdfs/namenode

mkdir -p /home/ragupathy/hadoop/hdfs/datanode
Hadoop Installation Procedure
Step 12: Open /home/ragupathy/hadoop/etc/hadoop/hdfs-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ragupathy/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ragupathy/hadoop/hdfs/datanode</value>
</property>
Verifying Hadoop Installation
Step 13: Set up the NameNode using the command "hdfs namenode -format" as follows:

cd /home/ragupathy/hadoop/
hdfs namenode -format

Note:
(i) The first step in starting up the Hadoop installation is formatting the Hadoop file system.

(ii) This needs to be done only the first time you set up a Hadoop cluster.

(iii) Do not format a running Hadoop file system; otherwise, the data currently in the cluster will be lost.
Verifying Hadoop Installation
Step 14: Start all Hadoop daemons by executing the following commands:

start-dfs.sh // Starts the Hadoop DFS daemons: NameNode, DataNode, and SecondaryNameNode

start-yarn.sh // Starts the YARN daemons: ResourceManager and NodeManager


Verifying Hadoop Installation
Step 15: Check the running Hadoop daemons by executing the following command:

jps

Output similar to the following will appear on the screen (the process IDs will differ):
5506 NameNode
5644 DataNode
6518 Jps
6097 ResourceManager
6236 NodeManager
5880 SecondaryNameNode
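As an optional smoke test (not part of the original steps), you can run the word-count example that ships with Hadoop 2.7.2; the jar path below is the standard location inside the extracted distribution:

hdfs dfs -mkdir -p /user/ragupathy/input
hdfs dfs -put /home/ragupathy/hadoop/etc/hadoop/*.xml /user/ragupathy/input
hadoop jar /home/ragupathy/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/ragupathy/input /user/ragupathy/output
hdfs dfs -cat /user/ragupathy/output/part-r-00000

You can also browse the NameNode web UI at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088. To stop all daemons, execute stop-yarn.sh followed by stop-dfs.sh.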
Thank you
