Hadoop Installation

Outline Hadoop Installation Hadoop Configuration Starting & Stopping Map Reduce Mapreduce Implementation
Big Data & Hadoop
N. Siva Ram Prasad

Professor & Head
Department of Information Technology

Bapatla Engineering College
Bapatla
November 22, 2016
Bapatla Engineering College, Bapatla, Guntur

1 Hadoop Installation
2 Hadoop Configuration
3 Starting & Stopping
4 Map Reduce
5 Mapreduce Implementation

Requirements
Necessary                             Optional
Java version 1.7 or above             Eclipse
SSH (Secure Shell)                    Internet connection
Linux OS (Ubuntu 14.04 or above)
Hadoop framework

Java 7 & Installation
Hadoop requires a working Java installation; Java 1.7 or later is
recommended.
Use either of the following commands to install Java on Linux:
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Use either of the following commands to check the Java installation:
java -version (or)
javac -version

Java PATH Setup
We need to set the Java path.
To find the Java path, run the following command:
update-alternatives --config java
Open the file /etc/environment (root privileges are needed to save it):
sudo gedit /etc/environment
Append the following statement to the file:
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Reload the file and check the value:
source /etc/environment
echo $JAVA_HOME
Open the .bashrc file located in the home directory:
gedit ~/.bashrc
Append the following statement to the file:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Run the following command:
source ~/.bashrc

Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e.
remote machines plus your local machine if you want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
Generate a public/private key pair (RSA or DSA) for the user:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Append the public key to the authorized_keys file:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Check the installation by running the following command at the
command prompt:
ssh localhost

Download & Extract Hadoop
Download Hadoop 2.7.2 (binary) from any mirror available at the
following link:
http://hadoop.apache.org/releases.html
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xvzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop

Add Hadoop configuration in .bashrc
Add the following Hadoop configuration to .bashrc in the home directory:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Run the following command:
source ~/.bashrc

Create tmp, NameNode & DataNode directories
Run the following command to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the following command to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the following commands to create the Hadoop tmp directory and
restrict it to the Hadoop user (hadoop1 here):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

Files to Configure
The following are the files we need to configure

core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml

Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the
<configuration> ... </configuration> tags in the core-site.xml file.
Add the property below to specify the location of the tmp directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file system and its
port number:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
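
A stray space or an unclosed tag in a site file is easy to miss. The following is a minimal sketch, not part of the original slides, that parses Hadoop-style configuration XML with the JDK's built-in DOM parser and lists the name/value pairs it finds. The class name SiteXmlCheck and the method readProperties are hypothetical, and the XML is passed as a string here; in practice one would read the contents of core-site.xml instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the properties found in Hadoop-style configuration XML.
public class SiteXmlCheck {

    // Returns name -> value pairs in document order; fails if the XML is malformed.
    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name")
                        .item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid configuration file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hadoop.tmp.dir</name>"
                + "<value>/app/hadoop/tmp</value></property>"
                + "<property><name>fs.default.name</name>"
                + "<value>hdfs://localhost:54310</value></property>"
                + "</configuration>";
        // prints {hadoop.tmp.dir=/app/hadoop/tmp, fs.default.name=hdfs://localhost:54310}
        System.out.println(readProperties(xml));
    }
}
```

The values are trimmed on purpose, so a value typed as " /app/hadoop/tmp " is reported without the surrounding spaces.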

Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and set it to the correct Java path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml
In this file we specify the host name and port at which the MapReduce
job tracker runs. If mapred-site.xml does not exist yet, create it by
copying mapred-site.xml.template from the same directory. Add the
following to mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>

Add properties in ... etc/hadoop/hdfs-site.xml
In the file hdfs-site.xml, add the following.
Set the replication factor:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication</description>
</property>
Specify the NameNode directory:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>

Formatting the HDFS filesystem via the NameNode
The first step in starting up your Hadoop installation is formatting
the Hadoop file system.
This needs to be done only the first time you set up Hadoop.
Do not format a running Hadoop filesystem, as you will lose all the
data currently in HDFS.
To format the filesystem, run the command:
hadoop namenode -format
Check the exit status reported in the output (0 means success, 1 means error).

Starting single-node cluster
Run the command:
start-all.sh
This starts a NameNode, SecondaryNameNode, DataNode, ResourceManager
and NodeManager on your machine.
A nifty tool for checking whether the expected Hadoop processes are
running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
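
Reading the jps listing by eye works on one machine, but the same check can be automated. The sketch below is an illustration only: the class JpsCheck and the method missingDaemons are hypothetical names, and the jps output is passed in as plain text rather than captured from a running cluster. It reports which of the five daemons named above are absent from a listing.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks a jps listing against the daemons expected on a single node.
public class JpsCheck {

    // Daemons that start-all.sh is expected to start on a single-node cluster.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode",
            "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the given jps output.
    public static Set<String> missingDaemons(String jpsOutput) {
        Set<String> running = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]); // parts[0] is the process id
            }
        }
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        // Sample listing with the DataNode line left out on purpose.
        String sample = "2598 NameNode\n3112 ResourceManager\n3523 Jps\n"
                + "2917 SecondaryNameNode\n3242 NodeManager\n";
        System.out.println(missingDaemons(sample)); // prints [DataNode]
    }
}
```

An empty result means all five daemons are listed; the Jps entry itself is ignored because it is not in the expected list.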

Stopping your single-node cluster
Run the following command to stop all the daemons running on your
machine:
stop-all.sh
The output will look like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode

Map-Reduce Framework
Map Reduce programming paradigm
It relies on two basic functions, Map and Reduce.
Map Reduce is used to manage many large-scale computations.
The framework takes care of scheduling tasks, monitoring them and
re-executing the failed tasks.
The framework tries to schedule tasks on the nodes where the data is
already present.

Map-Reduce Computation Steps
The key-value pairs from each Map task are collected by a
master controller and sorted by key. The keys are divided
among all the Reduce tasks, so all key-value pairs with the
same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine
all the values associated with that key in some way. The
manner of combination of values is determined by the code
written by the user for the Reduce function.
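
The steps above can be imitated in a few lines of plain Java without a cluster. This is a minimal single-process sketch, not Hadoop code: the class MapReduceStepsSketch and all of its method names are made up for illustration, the map step emits (word, 1) pairs, the grouping step collects values by sorted key, and the reduce step sums them. The partitionFor method shows one common way of dividing keys among Reduce tasks (a non-negative hash modulo the number of tasks), which is assumed here rather than taken from the slides.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the map, group-by-key and reduce steps.
public class MapReduceStepsSketch {

    // Map step: turn one input line into (word, 1) key-value pairs.
    public static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Grouping step: collect all values of each key; a TreeMap keeps the keys sorted.
    public static Map<String, List<Long>> groupByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            List<Long> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Assumed partitioning rule: the same key always maps to the same Reduce task.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reduce step: combine all values of one key, here by summing them.
    public static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order over the given lines.
    public static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : groupByKey(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, hadoop=1}
        System.out.println(run(Arrays.asList("big data", "big hadoop")));
    }
}
```

In a real cluster the pairs are grouped across machines and each Reduce task sees only the keys of its own partition; the sketch keeps everything in one map to show the data flow.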

Hadoop - MapReduce

Hadoop - MapReduce (Word Count) Example

MapReduce - WordCountMapper
In the WordCountMapper class we perform the following operations:
Read a line from the file
Split the line into words
Assign a count of 1 to each word

Useful commands in Eclipse
sudo apt-get install eclipse // for Eclipse installation
Ctrl + Space // for context-sensitive help
Ctrl + Shift + F // for formatting the code
Ctrl + 1 // assign to a variable or, for a class, implement the unimplemented methods
Right-click > Run As > Run Configurations > Java Application > Arguments > Apply > Run // to set command-line parameters and run the code

Mapreduce Implementation in Java using Eclipse
Step 1: Open Eclipse
Step 2: Go to File > New > Project
Select Java Project and click the Next button
Enter the project name and click the Finish button

Continue...
Step 3: Eclipse creates the project and lists it in the project pane
1 Right-click on the project > New > Class
2 Enter the name of the class and then click Finish
3 Write the Mapper program in that class
Step 4: Similarly, write the Reducer and Job Java programs

Continue...
Step 5: Importing external JAR files
1 Right-click on the project and select Properties (Alt+Enter)
2 Select Java Build Path, click on Libraries, then click on Add External JARs
3 Select the JARs from the following Hadoop library directories:
/usr/local/hadoop/share/hadoop/common/lib
/usr/local/hadoop/share/hadoop/hdfs/lib
/usr/local/hadoop/share/hadoop/httpfs/
/usr/local/hadoop/share/hadoop/mapreduce/lib
/usr/local/hadoop/share/hadoop/yarn/lib
/usr/local/hadoop/share/hadoop/tools/

WordCountMapper source code
public class WordCountMapper extends
        Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // the framework calls map() once per line of the input file
        String line = value.toString();
        // split the line into words on whitespace
        String[] words = line.trim().split("\\s+");
        // assign count(1) to each non-empty word
        for (String word : words) {
            if (!word.isEmpty()) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }
}

MapReduce - WordCountReducer
In the WordCountReducer class we perform the following operations:
Sum the list of values
Assign the sum to the corresponding word

WordCountReducer source code
public class WordCountReducer extends
        Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
        // sum the list of values
        long sum = 0;
        for (LongWritable value : values) {
            sum = sum + value.get();
        }
        // assign the sum to the corresponding word or key
        context.write(key, new LongWritable(sum));
    }
}

WordCountJob
public class WordCountJob implements Tool {
    // configuration object, handed over by ToolRunner through setConf()
    private Configuration conf;

    public Configuration getConf() {
        return conf; // return the configuration object
    }

    public void setConf(Configuration conf) {
        this.conf = conf; // store the configuration object
    }

run method
    public int run(String[] args) throws Exception {
        // create the job object from the configuration
        // (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(getConf());
        job.setJobName("Word count job");
        job.setJarByClass(this.getClass()); // main job class
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input path, args[1] the output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 0 on success, 1 on failure
        return job.waitForCompletion(true) ? 0 : 1;
    }

main method
    public static void main(String[] args) throws Exception {
        // ToolRunner calls setConf() and then run() on the job
        int status = ToolRunner.run(new Configuration(),
                new WordCountJob(), args);
        System.out.println("My Status: " + status);
    }
} // end of class WordCountJob

Imports to include
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

Continue...
Step 6: Set the input file path
1 Create a folder in the home directory
2 Copy the text files into it
3 Select the input path
Step 7: Set the input and output paths
Step a: Right-click anywhere on WordCountJob > Run As > Run Configurations
Step b: Java Application > Arguments > Apply > Run

Web interface to Hadoop cluster
1 http://localhost:50070 // NameNode interface
2 http://localhost:8088 // ResourceManager interface

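Useful commands in hadoop
http://machine_name:50070 // NameNode web interface
kill -9 PID // force-kill the process with the given PID
hadoop fs -touchz /demo/test // create an empty file in HDFS
hadoop fs -setrep -w 3 /demo/test // set the replication factor to 3 and wait until it is reached
hadoop_home/share/hadoop/mapreduceexamples
hadoop jar jarfilename mainclass parameters // run a job from a JAR file
/usr/local/hadoop/etc/hadoop/slaves
/usr/local/hadoop/etc/hadoop/masters
/etc/hostname
sudo reboot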

Useful commands in hadoop
sudo rm -rf /usr/local/hadoop/hadoop_data/
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
sudo chown -R uname:gname /usr/local/hadoop/
ssh-copy-id -i ~/.ssh/id_dsa.pub uname@hostname
ssh hostname

Thank You
