
Terminology

Data explosion and its reasons
What is Big Data
Hadoop Timeline
Hadoop in Action
Hadoop Ecosystem
HDFS
MapReduce
www.techdatasolution.co.in

info@techdatasolution.co.in

Hadoop Ecosystem


Module 1 - Introduction to Hadoop


Introduction to Data Science
Hadoop Architecture
Environment Requirements
Installing Hadoop
Modes of Hadoop operation
Configuring Pseudo Mode
Why SSH setup is needed

Data Science


Hadoop Architecture

Data Access Framework: Sqoop, Hive, Pig, Flume
Data Processing Framework (MapReduce)
Data Storage Framework (HDFS)
JVM
Operating System
Warehouse/Storage

Environment Requirements


Verifying the installation of Java 6

Run the javac and java commands with the -version flag
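
The check above can be sketched as a small shell script. This is only an illustration: check_tool is a hypothetical helper name, and the version text printed depends on the Java release that is installed.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of Java or Hadoop.
# It prints the first line of "<tool> -version" if the tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        # Older releases print the version to stderr, so merge both streams.
        "$1" -version 2>&1 | head -n 1
        return 0
    fi
    echo "$1: not found on PATH"
    return 1
}

# The runtime (java) and the compiler (javac) are checked separately,
# because a JRE-only installation provides java but not javac.
check_tool java || true
check_tool javac || true
```

If either line reports "not found on PATH", the JDK is probably missing or its bin directory has not been added to PATH.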


Installing Hadoop
Step 1: Go to the Apache Hadoop download page
Step 2: Click on the Stable folder
Step 3: Download the xyz-bin.tar.gz file
Step 5: Decompress the file
Step 6: Point Hadoop to the already installed Java by setting JAVA_HOME in the hadoop-env.sh file (hadoop/conf)
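
The decompress and JAVA_HOME steps could look roughly like the sketch below. ARCHIVE, INSTALL_DIR and the two helper functions are hypothetical names chosen for illustration; xyz-bin.tar.gz is the placeholder file name from the steps above, and the Java path in the usage note is an assumption that differs from machine to machine.

```shell
#!/bin/sh
# Hypothetical names: substitute the file you actually downloaded
# and the directory where Hadoop should live.
ARCHIVE="${ARCHIVE:-xyz-bin.tar.gz}"
INSTALL_DIR="${INSTALL_DIR:-$HOME/hadoop}"

# Decompress a gzip-compressed tarball ($1) into a target directory ($2).
unpack_archive() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Append an export line for JAVA_HOME ($2) to a hadoop-env.sh file ($1).
set_java_home() {
    printf 'export JAVA_HOME=%s\n' "$2" >> "$1"
}

# Only unpack when the archive is actually present.
if [ -f "$ARCHIVE" ]; then
    unpack_archive "$ARCHIVE" "$INSTALL_DIR"
fi
```

After unpacking, a call such as set_java_home path/to/hadoop/conf/hadoop-env.sh /path/to/java would record the Java location; both paths here are placeholders.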


Modes of Hadoop
Local Standalone Mode:
Default mode
All components run in a single process
Pseudo-Distributed Mode:
A separate JVM is spawned for each component
Components communicate with each other, resembling a mini cluster
Fully Distributed Mode:
Hadoop is spread across multiple machines
Each node may be general purpose or dedicated to a single component


Configuring Pseudo-Distributed Mode


Go to the conf directory and note the following 3 files: core-site.xml, hdfs-site.xml, mapred-site.xml

Update the core-site.xml file:
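
A minimal sketch of what the updated core-site.xml might contain, assuming a Hadoop 1.x layout (the hadoop/conf directory mentioned earlier suggests this). CONF_DIR is a hypothetical scratch directory, and hdfs://localhost:9000 is an assumed, commonly used single-node address rather than a value taken from the slides.

```shell
#!/bin/sh
# CONF_DIR is a hypothetical scratch location; copy the result into
# hadoop/conf once it looks right.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"

# fs.default.name tells clients where the HDFS NameNode listens.
# The host and port below are assumptions for a single-machine setup.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```

Writing to a scratch directory first avoids overwriting a working configuration by accident.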



Update the hdfs-site.xml file:

Update the mapred-site.xml file:
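
The hdfs-site.xml and mapred-site.xml updates can be sketched the same way, again assuming Hadoop 1.x property names. write_site_file and CONF_DIR are hypothetical names, and the values (a replication factor of 1, a JobTracker at localhost:9001) are typical single-node assumptions, not settings confirmed by the slides.

```shell
#!/bin/sh
# CONF_DIR is a hypothetical scratch location for the generated files.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"

# write_site_file is a hypothetical helper: it writes a configuration
# file ($1) that holds a single property name ($2) and value ($3).
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# With a single DataNode, one copy of each block is all that can be stored.
write_site_file "$CONF_DIR/hdfs-site.xml" dfs.replication 1
# Address of the JobTracker; localhost:9001 is an assumed value.
write_site_file "$CONF_DIR/mapred-site.xml" mapred.job.tracker localhost:9001
```

Each file carries one property here to keep the sketch short; a real configuration may need more.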


Why passwordless SSH is needed

SSH is a network protocol for secure data communication between two networked computers

When a passphrase is set, SSH prompts the user for it each time a connection to the remote host is made
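
Hadoop's start-up scripts use SSH to launch daemons on each node, so a prompt at every connection would interrupt them. One common way to avoid the prompt is a key pair with an empty passphrase, sketched below. KEY_DIR is a hypothetical scratch directory used so the sketch does not touch an existing ~/.ssh; on a real node the same files would live in ~/.ssh.

```shell
#!/bin/sh
# KEY_DIR is a hypothetical scratch directory; on a real node use ~/.ssh.
KEY_DIR="${KEY_DIR:-./ssh-sketch}"
KEY="$KEY_DIR/id_rsa"

if command -v ssh-keygen >/dev/null 2>&1; then
    mkdir -p "$KEY_DIR" && chmod 700 "$KEY_DIR"
    # -N '' sets an empty passphrase, so no prompt appears at login time.
    # The key is generated only once; an existing key is left untouched.
    [ -f "$KEY" ] || ssh-keygen -q -t rsa -N '' -f "$KEY"
    # Authorizing the public key lets this key log in without a password.
    cat "$KEY.pub" >> "$KEY_DIR/authorized_keys"
    chmod 600 "$KEY_DIR/authorized_keys"
else
    echo "ssh-keygen not found; install an SSH client first"
fi
```

With the files placed in ~/.ssh, running ssh localhost should then connect without asking for a password, provided an SSH server is running on the machine.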


Revisit
Introduction to Data Science
Hadoop Architecture
Environment Requirements
Installing Hadoop
Modes of Hadoop operation
Configuring Pseudo Mode
Why SSH setup is needed

Techdata Solution
Go to Files tab


Thank You!!!
