Hadoop Cluster Creation

(Single Node)

By
Dr. R. Ragupathy
Assistant Professor
Department of Computer Science and Engineering
Hadoop - Introduction
 Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.

 It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

 Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes.

 It can perform complete statistical analysis on huge amounts of data.
Hadoop Framework
Hadoop is supported on the GNU/Linux platform and its flavors. Therefore, first install a Linux operating system to set up the Hadoop environment.

In case you have an OS other than Linux, you can install VirtualBox and run Linux inside it.

Pre-Installation Setup

 Step 1: Install Oracle Java 8

 Step 2: SSH and its Key Generation


Installation Procedure for Oracle Java 8
Installation Procedure for Oracle Java 8

Java is the main prerequisite for Hadoop.

Step 1: Update the apt-get package index and add the Java PPA by executing the following commands:

sudo apt-get update


sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

Step 2: Check whether Java already exists by executing the following command:

java -version
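If Java is installed, this prints the installed version string. If it is not, the shell typically reports something like the following (illustrative):

java: command not found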
contd...
Installation Procedure for Oracle Java 8
If Java is not installed on your system, follow the steps given below to install it.

Step 3: Install the default Java Runtime Environment (JRE) by executing the following command:

sudo apt-get install default-jre

Step 4: Install the default Java Development Kit (JDK) by executing the following command:

sudo apt-get install default-jdk
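As an optional check (not part of the original steps), you can confirm that the JDK compiler is on the path by executing:

javac -version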

contd...
Installation Procedure for Oracle Java 8
Before Step 5, execute the following commands:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
Step 5: Install the OpenJDK 8 Java Runtime Environment (JRE) by executing the following command:
sudo apt-get install openjdk-8-jre

Step 6: Install the OpenJDK 8 Java Development Kit (JDK) by executing the following command:
sudo apt-get install openjdk-8-jdk

Step 7: Install Oracle JDK 8 by executing the following command:


sudo apt-get install oracle-java8-installer
contd...
Installation Procedure for Oracle Java 8
Step 8: Set the default Java version to be used by executing the following command:

sudo update-alternatives --config java

The following message will appear on the screen:
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status
  ------------------------------------------------------------
* 0            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       auto mode
  1            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1061       manual mode
  2            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       manual mode

Press enter to keep the current choice[*], or type selection number:
Type the number of the Java version to use as the default.
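As an optional check, you can confirm which version is now the default by executing:

java -version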
contd…
Installation Procedure for Oracle Java 8

Step 9: Create a shortcut to the installed Java by creating a symbolic link named jdk for java-8-oracle in the /usr/lib/jvm directory, by executing the following commands:

cd /usr/lib/jvm
sudo ln -s java-8-oracle jdk
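As an optional check, verify the link by executing the following command; the output should show jdk -> java-8-oracle:

ls -l /usr/lib/jvm/jdk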
SSH and its Key Generation
SSH and its Key Generation
Before installing Hadoop in the Linux environment, we need to set up SSH (Secure Shell).

SSH-based authentication is required to perform different operations on a cluster, such as starting, stopping, and distributed daemon shell operations, and also on the local machine if you want to use Hadoop with it.

For our single-node setup of Hadoop, we need SSH access to localhost.

Follow the steps given below for setting up the Linux environment.
SSH and its Key Generation
Step 1: Install SSH by executing the following command:

sudo apt-get install openssh-server
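As an optional check, confirm that the SSH server is running by executing the following command (assuming a service-based Ubuntu setup):

sudo service ssh status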

Step 2: SSH Key Generation

To authenticate different users of Hadoop, it is required to generate a public/private key pair for the Hadoop user and share the public key with the machines the user needs to access.

The following command is used to generate a key pair using SSH:

ssh-keygen -t rsa
SSH and its Key Generation
Step 3: Store the keys and passphrase by answering a few more questions. The entire key generation process looks like this:

Generating public/private rsa key pair.

Enter file in which to save the key (/home/ragupathy/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Note: Just press Enter at every prompt; the keys will be generated at /home/ragupathy/.ssh/.
SSH and its Key Generation

Your identification has been saved in /home/ragupathy/.ssh/id_rsa.
Your public key has been saved in /home/ragupathy/.ssh/id_rsa.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 ragupathy@ragupathy-Ideapad-Z560
The key's randomart image is:
+---[ RSA 2048]---+
|      .oo.       |
|     . o.E       |
|      +.o        |
|     .==.        |
|     =S=.        |
|     o+=+        |
|    .o+o.        |
|       .o        |
+-----------------+
The public key is now located in /home/ragupathy/.ssh/id_rsa.pub
The private key (identification) is now located in /home/ragupathy/.ssh/id_rsa
SSH and its Key Generation
Step 4: Enable SSH access to your local machine with the newly created key. This is done by copying the public key to user@machine-name (or IP) by executing the following command:

ssh-copy-id ragupathy@ragupathy-Ideapad-Z560
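As an optional check, confirm that key-based login works by executing the following; it should log you in without asking for a password:

ssh localhost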

Step 5: Ensure that root login with a password is disabled by finding the line PermitRootLogin without-password in /etc/ssh/sshd_config. Open the file by executing the following command:

sudo gedit /etc/ssh/sshd_config


SSH and its Key Generation

Step 6: Put the changes into effect by executing the following command:

sudo reload ssh
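Note: sudo reload ssh assumes an Upstart-based Ubuntu release; on newer systemd-based distributions, the equivalent command is:

sudo systemctl restart ssh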


Procedure to Install Hadoop 2.7.2
Hadoop Operation Modes
Hadoop can be operated in one of the three supported modes:

Local/Standalone Mode: After downloading Hadoop, by default it is configured in standalone mode and runs as a single Java process.

Pseudo-Distributed Mode: It is a distributed simulation on a single machine. Each Hadoop daemon, such as HDFS, YARN, and MapReduce, runs as a separate Java process. This mode is useful for development.

Fully Distributed Mode: This mode is fully distributed, with a minimum of two machines forming a cluster.
Hadoop Installation Procedure
Step 1: Download Hadoop 2.7.2 by executing the following command at /home/ragupathy/:

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Hadoop Installation Procedure
Step 2: Extract hadoop-2.7.2.tar.gz by executing the following
command:

tar -xzf hadoop-2.7.2.tar.gz


Hadoop Installation Procedure

Step 3: Move the hadoop-2.7.2 directory as hadoop under the user ragupathy by executing the following command:

mv hadoop-2.7.2 /home/ragupathy/hadoop
Hadoop Installation Procedure
The following is the list of files to edit to configure Hadoop:

core-site.xml: The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of read/write buffers.

hdfs-site.xml: The hdfs-site.xml file contains information such as the replication factor, the namenode path, and the datanode paths on your local file system, i.e., the places where you want to store the Hadoop infrastructure.

yarn-site.xml: This file is used to configure YARN in Hadoop.

mapred-site.xml: This file is used to specify which MapReduce framework we are using.
Hadoop Installation Procedure
Step 4: Hadoop Configuration

Configure Hadoop by modifying the following files:

~/.bashrc
/home/ragupathy/hadoop/etc/hadoop/hadoop-env.sh
/home/ragupathy/hadoop/etc/hadoop/core-site.xml
/home/ragupathy/hadoop/etc/hadoop/yarn-site.xml
/home/ragupathy/hadoop/etc/hadoop/mapred-site.xml
/home/ragupathy/hadoop/etc/hadoop/hdfs-site.xml
Hadoop Installation Procedure
Step 5: Use the following command to modify ~/.bashrc:

sudo gedit ~/.bashrc

Append the following to ~/.bashrc:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
#Java Installed path
export HADOOP_INSTALL=/home/ragupathy/hadoop
#Hadoop Installed Path
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_HOME=$HADOOP_INSTALL
#end of paste
Hadoop Installation Procedure
After saving ~/.bashrc, reload the profile by executing the following command:

source ~/.bashrc

In order to develop Hadoop programs in Java, you have to set the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.

Step 6: Open /home/ragupathy/hadoop/etc/hadoop/hadoop-env.sh using gedit, change the JAVA_HOME variable to export JAVA_HOME=/usr/lib/jvm/jdk/, and save.
Hadoop Installation Procedure
Step 7: Edit core-site.xml

Open /home/ragupathy/hadoop/etc/hadoop/core-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Hadoop Installation Procedure
Step 8: Edit yarn-site.xml

Open /home/ragupathy/hadoop/etc/hadoop/yarn-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Hadoop Installation Procedure
Step 9: Edit mapred-site.xml

By default, the /home/ragupathy/hadoop/etc/hadoop/ folder contains the mapred-site.xml.template file, which has to be copied with the name mapred-site.xml. This file is used to specify which framework is being used for MapReduce. Copy the contents by executing the following command:

cp /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml.template /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml
Hadoop Installation Procedure
Step 10: Edit mapred-site.xml

Now, open /home/ragupathy/hadoop/etc/hadoop/mapred-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Hadoop Installation Procedure
Step 11: Edit hdfs-site.xml

It is used to specify the directories which will be used as the namenode and the datanode on that host.

Before editing this file, create two directories which will contain the namenode and the datanode by executing the following commands:

mkdir -p /home/ragupathy/hadoop/hdfs/namenode

mkdir -p /home/ragupathy/hadoop/hdfs/datanode
Hadoop Installation Procedure
Step 12: Open /home/ragupathy/hadoop/etc/hadoop/hdfs-site.xml and enter the following content between the <configuration> </configuration> tags:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ragupathy/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ragupathy/hadoop/hdfs/datanode</value>
</property>
Verifying Hadoop Installation
Step 13: Set up the NameNode using the command "hdfs namenode -format" as follows:

cd /home/ragupathy/hadoop/
hdfs namenode -format

Note:
(i) The first step in starting up the Hadoop installation is formatting the Hadoop file system.

(ii) This needs to be done only the first time you set up a Hadoop cluster.

(iii) Do not format a running Hadoop file system; otherwise, the data currently in the cluster will be lost.
Verifying Hadoop Installation
Step 14: Start all Hadoop daemons by executing the following commands:

start-dfs.sh // Starts the Hadoop DFS daemons: NameNode, DataNode, and SecondaryNameNode

start-yarn.sh // Starts the YARN daemons: ResourceManager and NodeManager


Verifying Hadoop Installation
Step 15: Check the running Hadoop daemons by executing the following command:

jps

Output similar to the following will appear on the screen (the process IDs will differ):
5506 NameNode
5644 DataNode
6518 Jps
6097 ResourceManager
6236 NodeManager
5880 SecondaryNameNode
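As an optional smoke test (not part of the original steps), you can run the word-count example that ships with Hadoop 2.7.2; the jar path below is the standard location inside the extracted distribution:

hdfs dfs -mkdir -p /user/ragupathy/input
hdfs dfs -put /home/ragupathy/hadoop/etc/hadoop/*.xml /user/ragupathy/input
hadoop jar /home/ragupathy/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/ragupathy/input /user/ragupathy/output
hdfs dfs -cat /user/ragupathy/output/part-r-00000

You can also browse the NameNode web UI at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088. To stop all daemons, execute stop-yarn.sh followed by stop-dfs.sh.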
Thank you
