Prerequisite:
• System: Mac OS / Linux / Cygwin on Windows
Notice:
1. Only Ubuntu will be supported by the TA. You may
try other environments as a challenge.
2. Cygwin on Windows is not recommended, because of its
instability and unforeseen bugs.
Hadoop Setup
Single Node Setup (Usually for debug)
• Untar hadoop-*.**.*.tar.gz to your user path
About Version:
The latest stable version 1.0.1 is recommended.
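The untar step can be sketched as follows; the version number (1.0.1, per the recommendation above) and the install location ($HOME) are assumptions, so adjust them to your setup:

```shell
# Unpack the Hadoop release into the user path, if the tarball is present.
cd "$HOME"
if [ -f hadoop-1.0.1.tar.gz ]; then
    tar xzf hadoop-1.0.1.tar.gz
fi
# Point HADOOP_HOME at the unpacked tree and put its bin/ on PATH,
# so the hadoop command is available from any directory.
export HADOOP_HOME="$HOME/hadoop-1.0.1"
export PATH="$HADOOP_HOME/bin:$PATH"
```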
Execution
• Generate an SSH key pair. With an empty passphrase, no
password prompt appears at startup:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
Execution (continued)
Details About Configuration Files
Hadoop configuration is driven by two types of important
configuration files:
1. Read-only default configuration:
src/core/core-default.xml
src/hdfs/hdfs-default.xml
src/mapred/mapred-default.xml
conf/mapred-queues.xml.template.
2. Site-specific configuration:
conf/core-site.xml
conf/hdfs-site.xml
conf/mapred-site.xml
conf/mapred-queues.xml
Details About Configuration Files (continued)
conf/core-site.xml:

Parameter: fs.default.name
Value: URI of NameNode.
Notes: hdfs://hostname/
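A minimal conf/core-site.xml might look like the sketch below; the hostname "localhost" and port 9000 are illustrative assumptions for a single-node setup, not required values:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: single-node sketch; host and port are assumptions. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```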
conf/hdfs-site.xml:

Parameter: dfs.name.dir
Value: Path on the local filesystem where the NameNode stores the
namespace and transaction logs persistently.
Notes: If this is a comma-delimited list of directories, then the name
table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value: Comma-separated list of paths on the local filesystem of a
DataNode where it should store its blocks.
Notes: If this is a comma-delimited list of directories, then data will
be stored in all named directories, typically on different devices.
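As a sketch, a conf/hdfs-site.xml using comma-delimited lists might look like this; the directory paths are illustrative assumptions, chosen only to show the replication (name table) and spreading (data blocks) behavior described above:

```xml
<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: paths below are assumptions, adjust to your disks. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- Name table replicated into both directories, for redundancy. -->
    <value>/var/hadoop/name1,/var/hadoop/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- Data blocks spread across both paths, typically separate devices. -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>
</configuration>
```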
Details About Configuration Files (continued)
conf/mapred-site.xml:

Parameter: mapred.job.tracker
Value: Host or IP and port of JobTracker.
Notes: host:port pair.

Parameter: mapred.system.dir
Value: Path on the HDFS where the Map/Reduce framework stores system
files, e.g. /hadoop/mapred/system/.
Notes: This is in the default filesystem (HDFS) and must be accessible
from both the server and client machines.

Parameter: mapred.local.dir
Value: Comma-separated list of paths on the local filesystem where
temporary Map/Reduce data is written.
Notes: Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value: The maximum number of Map/Reduce tasks, which are run
simultaneously on a given TaskTracker, individually.
Notes: Defaults to 2 (2 maps and 2 reduces), but vary it depending on
your hardware.

Parameter: dfs.hosts/dfs.hosts.exclude
Value: List of permitted/excluded DataNodes.
Notes: If necessary, use these files to control the list of allowable
DataNodes.

Parameter: mapred.hosts/mapred.hosts.exclude
Value: List of permitted/excluded TaskTrackers.
Notes: If necessary, use these files to control the list of allowable
TaskTrackers.

Parameter: mapred.queue.names
Value: Comma-separated list of queues to which jobs can be submitted.
Notes: The Map/Reduce system always supports at least one queue with
the name "default"; hence, this parameter's value should always contain
the string "default". Some job schedulers supported in Hadoop, like the
Capacity Scheduler, support multiple queues. If such a scheduler is
being used, the list of configured queue names must be specified here.
Once queues are defined, users can submit jobs to a queue using the
property name mapred.job.queue.name in the job configuration. There
could be a separate configuration file for configuring properties of
these queues that is managed by the scheduler. Refer to the
documentation of the scheduler for information on the same.
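A conf/mapred-site.xml sketch for a single node follows; the hostname "localhost", port 9001, and the task maximums are illustrative assumptions (the maximums simply restate the default of 2), not values you must use:

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: host, port, and task counts are assumptions. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- host:port pair of the JobTracker. -->
    <value>localhost:9001</value>
  </property>
  <property>
    <!-- Maximum simultaneous map tasks on this TaskTracker. -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- Maximum simultaneous reduce tasks on this TaskTracker. -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```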
You can find detailed information on the official site:
http://hadoop.apache.org