
TECHtonka
Hadoop, It's what's for dinner.

Creating a Multinode Hadoop Sandbox
Posted on 27 Aug 14
One of the great things about all the Hadoop vendors is that they have made it very easy for people to obtain and start using their technology rapidly. I will say that I think Cloudera has done the best job by providing cloud-based access to their distribution via Cloudera Live. All vendors seem to have VMware- and VirtualBox-based sandbox/trial images. Having worked with Hortonworks, that is the distribution I have the most experience with, and I thought a quick initial blog post would be helpful.
While one could simply do this installation from scratch following the package instructions, it's also possible to short-circuit much of the setup as well as take advantage of the scaled-down configuration work already put into the virtual machine provided by Hortonworks. In short, the idea is to use the VM as a single master node and simply add datanodes to this master. Running this way provides an easy way to install and expand an initial Hadoop system up to about 10 nodes. As the system grows you will need to add RAM not only to the virtual host but also to the Hadoop daemons as they scale. A full script is available here. Below is a description of the process.
The general steps include:
1. The Sandbox
Download and install the Hortonworks Sandbox as your head node in your virtualization system of choice. The sandbox tends to be produced prior to the latest major release (compare the "yum list hadoop*" output). Make sure you have first enabled Ambari by running the script in root's home directory, then reboot.
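For a quick check of how far behind the sandbox is, and to find that enable-Ambari script, something like the following works (the exact script name varies between sandbox releases, so treat the file listing as a starting point rather than gospel):

# compare what the sandbox shipped with against what the HDP repo currently offers
yum list installed hadoop\*
yum list available hadoop\* --showduplicates

# the Ambari enable script lives in root's home directory on the sandbox;
# the exact name differs between sandbox versions, so list it first
ls -l /root/*.sh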
In order to make sure you are using the very latest stable release, and that the Ambari server and agent daemons have matching versions, upgrading is easiest. This includes the following:
HWXREPO="http://s3.amazonaws.com/public-repo-1.hortonworks.com"
export AMBARIREPO="$HWXREPO/ambari/centos6/1.x/updates/1.6.1/ambari.repo"
wget $AMBARIREPO -O /etc/yum.repos.d/ambari.repo
ambari-server stop
ambari-agent stop
yum clean all
yum upgrade ambari-server ambari-log4j
[ -d /etc/ambari-server/conf.save ] && (mv /etc/ambari-server/conf.save /etc/ambari-server/conf)
ambari-server upgrade
yum upgrade ambari-agent ambari-log4j
yum upgrade hdp_mon_nagios_addons
rpm -qa | grep ambari
ambari-server start
ambari-agent start
# POOF! Upgraded Ambari
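Before moving on, a quick sanity check that Ambari actually came back up after the upgrade is worth the few seconds (this assumes Ambari is still on its default port 8080):

ambari-server status
ambari-agent status
# the web UI should answer on the default port 8080 (adjust if you changed it)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080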

2. The Nodes
Install 1-N CentOS 6.5 nodes as slaves and prep them as worker nodes. These can be default installs of the OS but need to be on the same network as the Ambari server. This can be facilitated via pdsh (but that requires passwordless SSH), or better yet by creating one datanode image via a PXE boot environment or a snapshot of the virtual machine to quickly replicate 1-N nodes with these changes.
If you want to use SSH, you can do this from the head node to quickly enable passwordless SSH:
ssh-keygen -q -t rsa -f ~/.ssh/id_rsa -N ""
ssh-copy-id root@localhost

# pdsh on the head node makes fanning commands out to the workers easy
yum -y install pdsh

## loop over nodes
hostlist="hostname1 hostname2"
for i in $hostlist; do
  ssh-copy-id root@$i
done
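With the keys copied and pdsh installed, a one-liner round trip confirms the workers are reachable (using the placeholder hostnames from above):

pdsh -w hostname1,hostname2 -l root "uptime"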

You then want to make sure you make the following changes to your slave nodes. Again, this could easily be done via pdsh by pdcp'ing a script with the following content to each node and executing it.
AMBSRV="HEADNODENAME"
# or insert local repo name here
HWXREPO="http://s3.amazonaws.com/public-repo-1.hortonworks.com"
export AMBARIREPO="$HWXREPO/ambari/centos6/1.x/updates/1.6.1/ambari.repo"
wget $AMBARIREPO -O /etc/yum.repos.d/ambari.repo
# disable SELinux and iptables
sed -i 's/SELINUX=permissive/SELINUX=disabled/g;s/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
chkconfig --del iptables
iptables -F
service iptables stop
iptables -vnL
yum -y erase mysql-libs postgresql nagios ganglia ganglia-gmetad libganglia
wait
yum -y install net-snmp net-snmp-utils ntp wget
wait
service ntpd start
chkconfig --add ntpd
chkconfig --level 35 ntpd on
JDKLOC="$HWXREPO/artifacts/jdk-7u45-linux-x64.tar.gz"
wget $JDKLOC -O /tmp/jdk-7u45-linux-x64.tar.gz
mkdir -p /usr/java
tar -C /usr/java -zxvf /tmp/jdk-7u45-linux-x64.tar.gz
wait
echo "export JAVA_HOME=/usr/java/jdk1.7.0_45" > /etc/profile.d/java.sh
echo "export PATH=/usr/java/jdk1.7.0_45/bin:\$PATH" >> /etc/profile.d/java.sh
echo "export PDSH_SSH_ARGS_APPEND=\"-o StrictHostKeyChecking=no\"" > /etc/profile.d/login.sh
source /etc/profile.d/java.sh
source /etc/profile.d/login.sh
wait
sed -i 's/gpgcheck=1/gpgcheck=0/' /etc/yum.conf /etc/yum.repos.d/*
yum clean all
wait
service iptables stop
yum -y install ambari-agent
wait
# point the agent at the Ambari head node
sed -i "s/^hostname=.*/hostname=$AMBSRV/" /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent start

Push this file to the slave nodes and run it. This does NOT need to be done on the sandbox/head node.

pdcp -w host[1-5] -l root ./scriptfile.sh /root/
pdsh -w host[1-5] -l root "chmod 755 /root/scriptfile.sh; /root/scriptfile.sh"

3. Configure Services
Run the Ambari add-nodes GUI installer to add datanodes. Be sure to select manual registration and follow the on-screen prompts to install components. I recommend installing everything on all nodes and simply turning the services off and on as needed. Also, installing the client binaries on all nodes helps make sure you can do debugging from any node in the cluster.
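As a sketch of what having the clients everywhere buys you, any worker can then poke at the cluster directly with the standard client commands:

# run from any node with the HDFS/YARN/MapReduce clients installed
hdfs dfs -ls /
yarn node -list
mapred job -list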

4. Turn off select services as required.
There should now be 1-N datanodes/slaves attached to your Ambari/Sandbox head node. Here are some suggested changes.
1. Turn off large services you aren't using, like HBase, Storm, and Falcon. This will help save RAM.

2. Decommission the DataNode on this machine! No, a head node is not a datanode; if you run jobs here you will have problems.

3. HDFS replication factor: this is set to 1 in the sandbox because there is only one datanode. If you only have 1-3 datanodes then triple replication doesn't make sense. I suggest you use 1 until you get over 3 datanodes at a bare minimum. If you have the resources, just start with 10 datanodes (that's why it's called Big Data). If not, stick with a replication factor of 1, but be aware this will function as a prototype system and won't provide the natural safeguards or parallelism of normal HDFS. A quick way to check what actually landed on the cluster is sketched after this list.

4. Increase RAM for the head node: at a bare minimum Ambari requires 4096 MB. If you plan to run the sandbox as a head node, consider increasing from this minimum. Also consider giving running services room to breathe by increasing the RAM allocated in Ambari for each service. Here is a great review and script for guesstimating how to scale services for MapReduce and YARN.

5. NFS: to make your life easier you might want to enable NFS on a datanode or two.
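To confirm how many datanodes actually registered and what replication factor your files are getting (the check mentioned in item 3), a minimal sketch using the standard HDFS client commands looks like this; /tmp is just an example path:

# how many datanodes does the NameNode see?
hdfs dfsadmin -report | grep -i datanodes

# spot-check the replication factor recorded on an example path
hdfs fsck /tmp -files -blocks | grep -i repl

# force existing files under a directory down to replication factor 1 if needed
hdfs dfs -setrep -w 1 /tmp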


This entry was posted in Hadoop by oneadem12. Bookmark the permalink [http://techtonka.com/?p=223].
