
Outline Hadoop Installation Hadoop Configuration Starting & Stopping Map Reduce Mapreduce Implementation

Big Data & Hadoop

N. Siva Ram Prasad


Professor & Head

Department of Information Technology


Bapatla Engineering College
Bapatla

November 22, 2016

Bapatla Engineering College, Bapatla, Guntur


Big Data & Hadoop
November 22, 2016 Slide: 1 / 38

1 Hadoop Installation

2 Hadoop Configuration

3 Starting & Stopping

4 Map Reduce

5 Mapreduce Implementation


Requirements

Necessary:
Java version 1.7 or above
SSH (Secure Shell)
Linux OS (Ubuntu version 14.04 or above)
Hadoop framework

Optional:
Eclipse
Internet connection


Java 7 Installation

Hadoop requires a working Java installation; Java 1.7 or later is recommended.
Use one of the following commands to install Java on Linux:
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Use one of the following commands to check the Java installation:
java -version (or)
javac -version


Java PATH Setup


We need to set the Java path.
To find the Java path, run the following command:
update-alternatives --config java
Open the file /etc/environment (root privileges are needed to edit it):
sudo gedit /etc/environment
Append the following statement to the file:
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Reload the file and check the variable:
source /etc/environment
echo $JAVA_HOME
Open the .bashrc file located in the home directory:
gedit ~/.bashrc
Append the following statement to the file:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Run the following command:
source ~/.bashrc

Installation & Configuration of SSH

Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
Generate a public/private key pair (RSA or DSA) for the user:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Append the public key to the authorized_keys file:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Check the installation by running the following command at the command prompt:
ssh localhost


Download & Extract Hadoop

Download Hadoop 2.7.2 (binary) from any mirror available at the following link:

http://hadoop.apache.org/releases.html

Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xvzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop


Add Hadoop configuration in .bashrc


Add the Hadoop configuration to the .bashrc file in the home directory:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Run the following command:
source ~/.bashrc

Create the tmp, NameNode & DataNode directories

Run the following command to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the following command to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the following commands to create the Hadoop tmp directory and hand it to the Hadoop user (hadoop1 here):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp


Files to Configure

The following are the files we need to configure


core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml


Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file.
Add the property below to specify the location of the tmp directory:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file system and its port number:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
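
Each entry in a *-site.xml file is a name/value pair inside a property element. As a rough illustration of that structure, the sketch below reads such pairs with the JDK's built-in XML parser. The class SiteXmlSketch and its methods are hypothetical names made up for this example; this is not how Hadoop itself loads its configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Hadoop): reads <property> entries
// from a *-site.xml style document using only the JDK's XML parser.
final class SiteXmlSketch {

    // Returns the name -> value pairs in document order.
    static Map<String, String> readProperties(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed configuration document", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml =
                "<configuration>"
              + "<property><name>hadoop.tmp.dir</name>"
              + "<value>/app/hadoop/tmp</value></property>"
              + "<property><name>fs.default.name</name>"
              + "<value>hdfs://localhost:54310</value></property>"
              + "</configuration>";
        readProperties(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A check like this can help spot a malformed file (for example, a missing closing tag) before the daemons are started.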


Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Uncomment the JAVA_HOME line and give the correct path for Java:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64


Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml

In this file we specify the host name and port at which the MapReduce job tracker runs. Add the following to mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>


Add properties in ... etc/hadoop/hdfs-site.xml


In the file hdfs-site.xml add the following.
Add the replication factor:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication</description>
</property>
Specify the NameNode directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>

Formatting the HDFS filesystem via the NameNode

The first step in starting up your Hadoop installation is formatting the Hadoop file system.
This is needed only the first time you set up a Hadoop installation.
Do not format a running Hadoop file system, as you will lose all the data currently in HDFS.
To format the file system, run the command:
hadoop namenode -format
Check the exit status reported at the end of the output (0 = success, 1 = error).


Starting single-node cluster


Run the command:
start-all.sh
This will start a NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager on your machine.
A nifty tool for checking whether the expected Hadoop processes are running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
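
To make the check less error-prone, the jps listing can be compared against the five daemons named above. The sketch below does this on jps output pasted in as text. The class JpsCheckSketch is a hypothetical helper written for this example, and it assumes each relevant line has the form "<pid> <process name>".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks pasted jps output for the expected Hadoop daemons.
final class JpsCheckSketch {

    // Daemons that start-all.sh is expected to bring up on a single node.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode",
            "ResourceManager", "NodeManager");

    // Returns the expected daemon names that do not appear in the jps output.
    static Set<String> missingDaemons(String jpsOutput) {
        Set<String> running = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {          // assumed form: "<pid> <process name>"
                running.add(parts[1]);
            }
        }
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "2598 NameNode", "3112 ResourceManager", "3523 Jps",
                "2917 SecondaryNameNode", "2727 DataNode", "3242 NodeManager");
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

With the sample listing from this slide the set of missing daemons is empty. A non-empty result usually means that the log of the named daemon is the place to look.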


Stopping your single-node cluster

Run the following command to stop all the daemons running on your machine:
stop-all.sh
The output will look like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode


Map-Reduce Framework

MapReduce programming paradigm

It relies on two functions, Map and Reduce.
MapReduce is used to manage many large-scale computations.
The framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.
The framework tries to schedule tasks on the nodes where the data is already present.


Map-Reduce Computation Steps

The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. The manner of combination of values is determined by the code written by the user for the Reduce function.
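
The steps above can be imitated in a single process to see how the data flows. The sketch below is plain Java and does not use Hadoop. The class MapReduceSketch and its method names are made up for this illustration, and a sorted in-memory map stands in for the master controller's sort and grouping by key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map, group-by-key and reduce steps.
// It runs in one process and does not use Hadoop.
final class MapReduceSketch {

    // Map step: emit a (word, 1) pair for every word in one line.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Grouping step: sort by key and collect all values that share a key.
    static Map<String, List<Long>> groupByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: combine the values of one key (here, by summing them).
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps over all input lines.
    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> counts = new TreeMap<>();
        groupByKey(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, hadoop=1}
        System.out.println(wordCount(Arrays.asList("big data", "big hadoop")));
    }
}
```

In a real cluster the map and reduce calls run as separate tasks on different nodes. Only the grouping guarantee (all values of a key reach the same reduce call) is the same.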


Hadoop - MapReduce


Hadoop - MapReduce (Word Count) Example


MapReduce - WordCountMapper

In the WordCountMapper class we perform the following operations:
Read a line from the file
Split the line into words
Assign a count of 1 to each word
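
The "split" step deserves a little care: splitting on a single space produces empty tokens when words are separated by more than one space, and those would be counted as words. The plain-Java sketch below (no Hadoop; TokenizeSketch is a name invented for this example) shows the difference and one possible way to avoid it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the "split line into words" step in isolation (no Hadoop).
final class TokenizeSketch {

    // Splits on runs of whitespace and drops empty tokens.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "big  data hadoop";            // two spaces after "big"
        System.out.println(line.split(" ").length);  // 4: one token is empty
        System.out.println(words(line));             // [big, data, hadoop]
    }
}
```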


Useful commands in eclipse

sudo apt-get install eclipse // install Eclipse
Ctrl + Space // context-sensitive help
Ctrl + Shift + F // format the code
Ctrl + 1 // assign to a variable, or for a class implement the unimplemented methods
Right-click > Run As > Run Configurations > Java Application > Arguments > Apply > Run // set command-line parameters and run the code


Mapreduce Implementation in Java using Eclipse

Step 1: Open Eclipse
Step 2: Go to File > New > Project
Select Java Project and click the Next button
Enter the project name and click the Finish button


Continued...

Step 3: Eclipse creates the project and lists it in the side panel
1 Right-click on the project > New > Class
2 Enter the name of the class and click Finish
3 Write the Mapper program in that class
Step 4: Similarly, write the Reducer and Job Java programs


Continued...

Step 5: Importing external JAR files
1 Right-click on the project and select Properties (Alt+Enter)
2 Select Java Build Path, click on Libraries, then click on Add External JARs
3 Select the JARs from the following Hadoop library directories:
/usr/local/hadoop/share/hadoop/common/lib
/usr/local/hadoop/share/hadoop/hdfs/lib
/usr/local/hadoop/share/hadoop/httpfs/lib
/usr/local/hadoop/share/hadoop/mapreduce/lib
/usr/local/hadoop/share/hadoop/yarn/lib
/usr/local/hadoop/share/hadoop/tools/


WordCountMapper source code

public class WordCountMapper extends
        Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // the framework calls map() once for each line of the input file
        String line = value.toString();
        // split the line into words on runs of whitespace
        String[] words = line.trim().split("\\s+");
        // assign count (1) to each non-empty word
        for (String word : words) {
            if (!word.isEmpty()) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }
}


MapReduce - WordCountReducer

In the WordCountReducer class we perform the following operations:
Sum the list of values for each word
Assign the sum to the corresponding word


WordCountReducer source code

public class WordCountReducer extends
        Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
        // sum the list of values
        long sum = 0;
        for (LongWritable value : values) {
            sum = sum + value.get();
        }
        // assign the sum to the corresponding word (key)
        context.write(key, new LongWritable(sum));
    }
}


WordCountJob

public class WordCountJob implements Tool {
    // configuration object, set by ToolRunner through setConf()
    private Configuration conf;

    @Override
    public Configuration getConf() {
        return conf; // return the configuration object
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf; // store the configuration object
    }


run method
    @Override
    public int run(String[] args) throws Exception {
        // create the job object from the configuration
        Job job = Job.getInstance(getConf());
        job.setJobName("Word count job");
        job.setJarByClass(this.getClass()); // main job class
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input path, args[1] is the output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 0 on success, 1 on failure
        return job.waitForCompletion(true) ? 0 : 1;
    }

main method

    public static void main(String[] args) throws Exception {
        int status = ToolRunner.run(new Configuration(), new WordCountJob(), args);
        System.out.println("My Status: " + status);
    }
} // end of class WordCountJob


Imports to include

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


Continued...

Step 6: Set the input file path
1 Create a folder in the home directory
2 Copy the text files into it
3 Note the path of this input folder
Step 7: Set the input and output paths
Step a: Right-click anywhere on WordCountJob > Run As > Run Configurations
Step b: Java Application > Arguments (enter the input and output paths) > Apply > Run


Web interface to Hadoop cluster

1 http://localhost:50070 // NameNode interface
2 http://localhost:8088 // ResourceManager interface


Useful commands in hadoop

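http://machine_name:50070 // NameNode web interface
http://machine_name:8042 // NodeManager web interface
http://machine_name:8088 // ResourceManager web interface
kill -9 PID // force-kill the process with the given PID
hadoop fs -touchz /demo/test // create an empty file in HDFS
hadoop fs -setrep -w 3 /demo/test // set the replication factor to 3 and wait until it is done
hadoop_home/share/hadoop/mapreduce // MapReduce examples
hadoop jar jarfilename mainclass parameters // run a job packaged in a JAR file
/usr/local/hadoop/etc/hadoop/slaves
/usr/local/hadoop/etc/hadoop/masters
/etc/hostname // host name of the machine
sudo reboot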


Useful commands in hadoop

sudo rm -rf /usr/local/hadoop/hadoop_data/
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
sudo chown -R uname:gname /usr/local/hadoop/
ssh-copy-id -i ~/.ssh/id_dsa.pub uname@hostname
ssh hostname
sudo apt-get install eclipse
Ctrl + Space // context-sensitive help
Ctrl + Shift + F // format the code
Ctrl + 1 // assign to a variable, or for a class implement the unimplemented methods
Right-click > Run As > Run Configurations > Java Application > Arguments


Thank You
