
Hadoop Setup

Version 1.x.x

Karina Hauser, Associate Professor, Management Information Systems

Apache Hadoop Projects

HDFS
YARN (Vers. 2)
Hadoop Common
MapReduce

Apache Hadoop Related Projects

Ambari
Avro
Cassandra
Chukwa
HBase
Hive
Mahout
Pig
ZooKeeper

Apache Hadoop Subprojects

Hadoop Common: HDFS and MapReduce

Hadoop Setup Options


Three options:
Standalone (single Java process; see the example below)
Pseudo-Distributed (separate Java processes)
Fully-Distributed
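Standalone mode needs no configuration at all; as a quick illustration (a sketch, assuming the Hadoop 1.2.1 tarball unpacked to /usr/local/hadoop as described below, with its bundled examples jar):

$ cd /usr/local/hadoop
$ bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
# runs the bundled pi estimator in a single local Java process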

Prerequisites
Ubuntu Server 12.04
Ubuntu Desktop (for monitoring)
Python (to add repositories)
Oracle (Sun) Java 1.7.0_25
http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html
SSH

Check Setup
Check /etc/hosts
For pseudo-distributed mode, 127.0.0.1 must map to localhost and to the machine's hostname (here "ubuntu"); there must be NO 127.0.1.1 entry! An example file follows.
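A sketch of a suitable /etc/hosts, assuming the hostname is "ubuntu" (yours may differ):

127.0.0.1 localhost ubuntu
# remove or comment out any "127.0.1.1 ubuntu" line the Ubuntu installer added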

Hadoop Downloads
Hadoop 1.1.2: current stable version
Hadoop 2.0.5-alpha: YARN version
Differences:
YARN
HDFS Federation: multiple, redundant NameNodes acting in congress
Scalability beyond 4000 nodes

Download and Install


$ cd /usr/local
$ sudo wget http://apache.claz.org/hadoop/common/stable/hadoop-1.2.1.tar.gz
$ sudo tar -xzvf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo rm hadoop-1.2.1.tar.gz
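A quick sanity check of the unpacked distribution (hadoop version is a built-in command; the PATH is not set yet, so the full path is used):

$ /usr/local/hadoop/bin/hadoop version
# should report release 1.2.1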

User Permissions
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo chown -R hduser:hadoop hadoop

Hadoop Directories and Files


conf: configuration files directory
xxxx-env.sh: environment settings
core-site.xml and the other xxxx-site.xml files: site settings (master and node configuration)
logs: log files (except for Pig)

Hadoop Configuration
Startup: .bashrc
All Hadoop config files are in hadoop/conf
Environment: hadoop-env.sh
Site-specific:
Core: core-site.xml
HDFS: hdfs-site.xml
MapReduce: mapred-site.xml
Fully-distributed mode:
conf/masters
conf/slaves

Set Environment Variables


Changes in ~/.bashrc
# Set home directory variables
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_CONF_DIR=/usr/local/hadoop/conf

Set Java Environment Variable


Changes in .bashrc
# Set JAVA installation directory
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Activate changes with:
$ source ~/.bashrc
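To verify that the variables took effect, a minimal check:

$ echo $HADOOP_PREFIX
/usr/local/hadoop
$ hadoop version
# hadoop is now found via $PATH, so the exports are working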

Set Hadoop Environment Variables


Changes in hadoop-env.sh (in hadoop/conf)
Set the JAVA_HOME variable to the current Java version:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Create Hadoop Directories


Create temporary directory:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp

Changes in Hadoop Setup Files


Changes in core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system</description>
</property>
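Note that in each xxxx-site.xml file the <property> elements must sit inside the <configuration> root element; a skeleton of the complete file:

<?xml version="1.0"?>
<configuration>
  <!-- <property> elements from the snippets above go here -->
</configuration>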

Changes in Hadoop Setup Files


Changes in hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. Set to 1 for pseudo-distributed mode.</description>
</property>

Changes in Hadoop Setup Files


Changes in mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

Format HDFS
Existing data in HDFS will be erased!
$ bin/hadoop namenode -format

Generate SSH Key


$ ssh-keygen
Accept the defaults: no filename change and no passphrase.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
This permanently adds localhost to the list of known hosts.
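To confirm that passwordless login works before starting the daemons:

$ ssh localhost
# should log in without a password prompt; type exit to return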

Starting Hadoop Daemons


All:
$ start-all.sh
NameNode and DataNode:
$ start-dfs.sh
JobTracker and TaskTracker:
$ start-mapred.sh

Log output: $HADOOP_PREFIX/logs

Check Daemons
$ jps
1367 Jps
8695 DataNode
8609 NameNode
6318 SecondaryNameNode
2600 JobTracker
2830 TaskTracker

If the DataNode daemon is missing:
Delete the contents of /app/hadoop/tmp/dfs/data
Reformat HDFS
Restart the Hadoop daemons
(One way to do this is sketched below.)
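A sketch of those steps as commands, run as hduser with the paths configured above (stopping the daemons first is assumed):

$ stop-all.sh
$ rm -rf /app/hadoop/tmp/dfs/data/*
$ hadoop namenode -format
$ start-all.sh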

Web Interface
http://localhost:50030 JobTracker (cluster status)
http://localhost:50070 NameNode (file system)

Pig (Latin)

Pig Installation
$ cd /usr/local
$ sudo wget http://mirror.sdunix.com/apache/pig/stable/pig-0.11.1.tar.gz
$ sudo tar -xzvf pig-0.11.1.tar.gz
$ sudo mv pig-0.11.1 pig
$ sudo chown -R hduser:hadoop pig
$ sudo rm pig-0.11.1.tar.gz

Set Environment Variables


Changes in .bashrc (of hduser)
# Set home directory variables
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
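With PIG_HOME on the PATH, Pig can be started in either execution mode (both flags are standard; the grunt> prompt below is illustrative):

$ pig -x local    # local mode, runs without the Hadoop daemons
$ pig             # MapReduce mode, uses the running Hadoop cluster
grunt>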

HBase

HBase Installation
$ cd /usr/local
$ sudo wget http://apache.petsads.us/hbase/stable/hbase-0.94.10.tar.gz
$ sudo tar -xzvf hbase-0.94.10.tar.gz
$ sudo mv hbase-0.94.10 hbase
$ sudo chown -R hduser:hadoop hbase
$ sudo rm hbase-0.94.10.tar.gz

Check Setup Again


Check /etc/hosts
For pseudo-distributed mode, 127.0.0.1 must again map to localhost and to the machine's hostname (here "ubuntu-hadoop"); there must be NO 127.0.1.1 entry!

Set Environment Variables


Changes in .bashrc (of hduser)
# Set home directory variables
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_CONF_DIR=/usr/local/hbase/conf

Changes in HBase Setup Files


Changes in conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HBASE_CLASSPATH=$HADOOP_CONF_DIR

Changes in HBase Setup Files


Changes in conf/hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:54310/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

HBase Commands
Starting (Hadoop needs to be running):
$ bin/start-hbase.sh
Stopping (stop HBase before stopping Hadoop):
$ bin/stop-hbase.sh

User Interfaces
http://localhost:60010 (Master)
http://localhost:60030 (RegionServer)

Check Daemons
$ jps
1367 Jps
8695 DataNode
8609 NameNode
6318 SecondaryNameNode
2600 JobTracker
2830 TaskTracker
2251 HRegionServer
3458 HMaster
2312 HQuorumPeer

HBase Shell
Starting a shell:
$ bin/hbase shell
Help:
hbase> help
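A short smoke test of the shell (the table name 'test' and column family 'cf' are arbitrary examples):

hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'
hbase> disable 'test'
hbase> drop 'test'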

Monitoring
http://localhost:50030 JobTracker
http://localhost:50070 NameNode
http://localhost:60010 HBase Master
http://localhost:60030 RegionServer

Integrating Hadoop and HBase


Changes to conf/hadoop-env.sh
export HADOOP_CLASSPATH=/usr/local/hbase/hbase-0.94.10.jar:/usr/local/hbase/lib/zookeeper-3.4.5.jar:/usr/local/hbase/lib/protobuf-java-2.4.0a.jar:$HADOOP_CLASSPATH
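With this classpath in place, MapReduce jobs can read HBase tables. For example, the RowCounter job bundled in the HBase jar (counting the 'test' table created in the shell example above):

$ hadoop jar /usr/local/hbase/hbase-0.94.10.jar rowcounter test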

Questions?
