Cahya Perdana
201583373
1. Create instances on Amazon Web Services
Set up storage for each node. We recommend that each node have 29 GB.
When we reach Configure Security Group, just click Review and Launch.
When a dialog box appears asking to download the key pair, click Download Key Pair. This
key pair will be needed to connect to all the instances we have created.
To tell the master node and the slave nodes apart easily, we rename each instance
so it is easy to remember which is which.
Now we add the protocols we want to allow in the security group: SSH on port 22,
All TCP with source Anywhere, and All ICMP with source Anywhere. With these
settings we can log in to our instances remotely and even check connectivity
with ping.
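The same rules can also be added from the command line; a minimal sketch using the AWS CLI, assuming a security group named hadoop-cluster (the group name is an assumption, not from the original):
$aws ec2 authorize-security-group-ingress --group-name hadoop-cluster --protocol tcp --port 22 --cidr 0.0.0.0/0
$aws ec2 authorize-security-group-ingress --group-name hadoop-cluster --protocol tcp --port 0-65535 --cidr 0.0.0.0/0
$aws ec2 authorize-security-group-ingress --group-name hadoop-cluster --protocol icmp --port -1 --cidr 0.0.0.0/0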
2. Connect to the instances
To connect to our instances remotely, we need some tools:
PuTTY Key Generator: a tool that generates the access key we use to
connect to our instances on Amazon Web Services.
PuTTY: a tool that connects to our instances over SSH.
First we generate the key: import the key from the key pair that we
downloaded before.
Now open the PuTTY client, then set the SSH key in the SSH menu.
Now set up the hostnames of all our nodes as in this table:
Name      | Public DNS                                         | IP
Master    | ec2-54-191-198-212.us-west-2.compute.amazonaws.com | 172.31.45.31
slave2    | ec2-52-11-196-185.us-west-2.compute.amazonaws.com  | 172.31.45.30
slave1    | ec2-54-191-199-0.us-west-2.compute.amazonaws.com   | 172.31.45.29
secondary | ec2-54-191-198-251.us-west-2.compute.amazonaws.com | 172.31.45.28
Then we upload our generated key to the master node, so that the master node can
connect to the other nodes over SSH. To upload files to the master node
we use WinSCP; set up a connection in WinSCP to the master node.
After the connection to the master node is set up, we upload the key from PuTTY
Key Generator (hadoop.pem).
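SSH will typically refuse a private key whose permissions are too open, so once hadoop.pem is on the master node it helps to restrict it before use; a short sketch (the path and the target hostname are taken from above):
$chmod 400 /home/ubuntu/hadoop.pem
$ssh -i /home/ubuntu/hadoop.pem ubuntu@ec2-54-191-198-251.us-west-2.compute.amazonaws.com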
3. Download Hadoop
These instructions apply to all nodes. First we update our package
repositories so we can get the latest Java SDK.
$sudo apt-get update
$sudo add-apt-repository ppa:webupd8team/java
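The original does not show the remaining download steps, so here is a minimal sketch of how they might look: installing Java from the PPA just added and fetching a Hadoop 1.x release (the package name and the Hadoop version 1.2.1 are assumptions):
$sudo apt-get update
$sudo apt-get install oracle-java7-installer
$wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$tar -xzf hadoop-1.2.1.tar.gz
$mv hadoop-1.2.1 /home/ubuntu/hadoop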
Now we check whether our SSH setup is working by running:
$ssh ubuntu@ec2-54-191-198-251.us-west-2.compute.amazonaws.com
hadoop-env.sh
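The original does not show what gets set in hadoop-env.sh, but Hadoop 1.x needs at least JAVA_HOME defined there. A minimal sketch, assuming the Oracle Java 7 install from the PPA above (the path is an assumption):
$vi $HADOOP_CONF/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle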
core-site.xml
$vi $HADOOP_CONF/core-site.xml
Now add these properties:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-54-191-198-212.us-west-2.compute.amazonaws.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hdfstmp</value>
  </property>
</configuration>
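Because hadoop.tmp.dir points at /home/ubuntu/hdfstmp, that directory should exist on every node before the daemons start; one line creates it:
$mkdir -p /home/ubuntu/hdfstmp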
hdfs-site.xml
This file contains the configuration for the HDFS daemons on the master node,
the secondary master, and the slave nodes. With dfs.replication set to 2, each
HDFS block is stored on two nodes.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
mapred-site.xml
This file contains the configuration settings for the MapReduce daemons,
the JobTracker and the TaskTrackers:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://ec2-54-191-198-212.us-west-2.compute.amazonaws.com:8021</value>
  </property>
</configuration>
Copy all these configuration files from the master node to each of the other nodes:
$scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@ec2-54-191-198-251.us-west-2.compute.amazonaws.com:/home/ubuntu/hadoop/conf
Masters
Set up the masters file: we delete all the values in it.
Slaves
For the slaves file, add the slave nodes, as in the sketch below.
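As a sketch, assuming the hostnames from the table above, the slaves file on the master would then contain:
ec2-52-11-196-185.us-west-2.compute.amazonaws.com
ec2-54-191-199-0.us-west-2.compute.amazonaws.com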
6. Start the daemons
The first step to start our Hadoop cluster is formatting the Hadoop filesystem:
$hadoop namenode -format
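The original jumps from formatting straight to the web UI, so as a sketch of the step in between: with Hadoop 1.x, the HDFS and MapReduce daemons are started from the master node with the standard scripts, and jps confirms they are running:
$start-dfs.sh
$start-mapred.sh
$jps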
We can check the NameNode status in a browser at:
http://ec2-54-191-198-212.us-west-2.compute.amazonaws.com:50070/dfshealth.jsp
1. Create WordCount.java
$vi WordCount.java
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  // Mapper: emits (word, 1) for every token in each input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class); // the reducer doubles as a local combiner
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // args[0] is the HDFS input directory, args[1] the output directory.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
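The original stops at the source file; a short sketch of how it could be compiled and submitted to the cluster (the hadoop-core jar version, the jar name, and the input/output paths are assumptions):
$mkdir wordcount_classes
$javac -classpath /home/ubuntu/hadoop/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java
$jar -cvf wordcount.jar -C wordcount_classes/ .
$hadoop jar wordcount.jar org.myorg.WordCount /user/ubuntu/input /user/ubuntu/output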