Bienvenue sur Scribd !

Ignorer le carrousel

Cloud

Transféré par

Debankan Ganguly

0% ont trouvé ce document utile (0 vote)

119 vues11 pages

Cloud

Copyright

Formats disponibles

PDF, TXT ou lisez en ligne sur Scribd

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Signaler ce document

Cloud

Droits d'auteur :

Formats disponibles

Téléchargez comme PDF, TXT ou lisez en ligne sur Scribd

Signaler comme contenu inapproprié

0% ont trouvé ce document utile (0 vote)

119 vues11 pages

Cloud

Transféré par

Debankan Ganguly

Cloud

Droits d'auteur :

Formats disponibles

Téléchargez comme PDF, TXT ou lisez en ligne sur Scribd

Signaler comme contenu inapproprié

Passer à la page

Vous êtes sur la page 1sur 11

Rechercher à l'intérieur du document

Cloud Computing :

MapReduce - Tutorial
Prof. Soumya K Ghosh
Department of Computer Science and Engineering
IIT KHARAGPUR

1
Introduction
• MapReduce: programming model developed at Google
• Objective:
– Implement large scale search
– Text processing on massively scalable web data stored using BigTable and GFS distributed file
system
• Designed for processing and generating large volumes of data via massively parallel
computations, utilizing tens of thousands of processors at a time
• Fault tolerant: ensure progress of computation even if processors and networks fail
• Example:
– Hadoop: open source implementation of MapReduce (developed at Yahoo!)
– Available on pre-packaged AMIs on Amazon EC2 cloud platform

9/11/2017 2
MapReduce Model
• Parallel programming abstraction
• Used by many different parallel applications which carry out large-scale
computation involving thousands of processors
• Leverages a common underlying fault-tolerant implementation
• Two phases of MapReduce:
– Map operation
– Reduce operation
• A configurable number of M ‘mapper’ processors and R ‘reducer’ processors are
assigned to work on the problem
• The computation is coordinated by a single master process

9/11/2017 3
MapReduce Model Contd…
• Map phase:
– Each mapper reads approximately 1/M of the input from the global file
system, using locations given by the master
– Map operation consists of transforming one set of key-value pairs to
another:

– Each mapper writes computation results in one file per reducer

– Files are sorted by a key and stored to the local file system
– The master keeps track of the location of these files

9/11/2017 4
MapReduce Model Contd…

• Reduce phase:
– The master informs the reducers where the partial computations have been stored
on local files of respective mappers
– Reducers make remote procedure call requests to the mappers to fetch the files
– Each reducer groups the results of the map step using the same key and performs a
function f on the list of values that correspond to these key value:

– Final results are written back to the GFS file system

9/11/2017 5
MapReduce: Example
• 3 mappers; 2 reducers
• Map function:

• Reduce function:

9/11/2017 6
Problem-1

In a MapReduce framework consider the HDFS block size is 64 MB.

We have 3 files of size 64K, 65Mb and 127Mb. How many blocks will
be created by Hadoop framework?

9/11/2017 7
Problem-2

Write the pseudo-codes (for map and reduce functions) for calculating
the average of a set of integers in MapReduce.

Suppose A = (10, 20, 30, 40, 50) is a set of integers. Show the map and
reduce outputs .

9/11/2017 8
Problem-3
Compute total and average salary of organization XYZ and group by
based on gender (male or female) using MapReduce. The input is as
follows
Name, Gender, Salary
John, M, 10,000
Martha, F, 15,000
----

9/11/2017 9
Problem-4
Write the Map and Reduce functions (pseudo-codes) for the following Word
Length Categorization problem under MapReduce model.

Word Length Categorization: Given a text paragraph (containing only words),

categorize each word into following categories. Output the frequency of
occurrence of words in each category.

Categories:
tiny: 1-2 letters; small: 3-5 letters; medium: 6-9 letters; big: 10 or more letters

9/11/2017 10
11

Vous aimerez peut-être aussi

Bda - 3 Unit
Document18 pages
Bda - 3 Unit
ASMA UL HUSNA
Pas encore d'évaluation
The Map Reduce Programming
Document15 pages
The Map Reduce Programming
manjunath
Pas encore d'évaluation
By Christian Mechem and Geoff Crowley
Document11 pages
By Christian Mechem and Geoff Crowley
Christian Mechem
Pas encore d'évaluation
CCD-333 Exam Tutorial
Document20 pages
CCD-333 Exam Tutorial
Mohammed Haleem S
Pas encore d'évaluation
Dynamicmr: A Dynamic Slot Allocation Optimization Framework For Map Reduce Clusters
Document18 pages
Dynamicmr: A Dynamic Slot Allocation Optimization Framework For Map Reduce Clusters
Harikrishnan Shunmugam
Pas encore d'évaluation
HadoopMapreduce Summerization
Document24 pages
HadoopMapreduce Summerization
Atharv Chaudhari
Pas encore d'évaluation
Map Reduce
Document25 pages
Map Reduce
ahmadroheed
Pas encore d'évaluation
Survey Paper On Traditional Hadoop and Pipelined Map Reduce: Dhole Poonam B, Gunjal Baisa L
Document5 pages
Survey Paper On Traditional Hadoop and Pipelined Map Reduce: Dhole Poonam B, Gunjal Baisa L
International Journal of computational Engineering research (IJCER)
Pas encore d'évaluation
Introduction To Map Reduce
Document50 pages
Introduction To Map Reduce
KhAn Zainab
Pas encore d'évaluation
Chap 6 - MapReduce Programming
Document37 pages
Chap 6 - MapReduce Programming
Harshitha Raaj
Pas encore d'évaluation
Unit - III Advanced Analytics Technology and Tools
Document44 pages
Unit - III Advanced Analytics Technology and Tools
Diksha Chhabra
Pas encore d'évaluation
Big Data & Hadoop - Machine Learning: Ajay Kumar Assistant Professor-I Department of Computer Science & Engineering
Document37 pages
Big Data & Hadoop - Machine Learning: Ajay Kumar Assistant Professor-I Department of Computer Science & Engineering
Dank Boii
Pas encore d'évaluation
MapReduce in Cloud Computing Explained
Document10 pages
MapReduce in Cloud Computing Explained
Muhammad umar
Pas encore d'évaluation
Map Reduce: Simplified Processing On Large Clusters
Document29 pages
Map Reduce: Simplified Processing On Large Clusters
Joy Bagdi
Pas encore d'évaluation
Large-Scale Data Management: Cs525: Special Topics in Dbs
Document22 pages
Large-Scale Data Management: Cs525: Special Topics in Dbs
Pindiganti
Pas encore d'évaluation
Lecture 10 MapReduce Hadoop
Document37 pages
Lecture 10 MapReduce Hadoop
DAVID HUMBERTO GUTIERREZ MARTINEZ
Pas encore d'évaluation
AAAI2011 Tutorial Slides
Document213 pages
AAAI2011 Tutorial Slides
sleakaeu
Pas encore d'évaluation
Unit 2 Hadoop Framework
Document10 pages
Unit 2 Hadoop Framework
Aravindhan R
Pas encore d'évaluation
Parameterized Pipelined Map Reduce Based Approach For Performance Improvement of Parallel Programming Model
Document5 pages
Parameterized Pipelined Map Reduce Based Approach For Performance Improvement of Parallel Programming Model
Kavita Rahane
Pas encore d'évaluation
Term Paper Java
Document14 pages
Term Paper Java
Muskan Bharti
Pas encore d'évaluation
Sen-762 Advanced Big Data Analytics: Mapreduce
Document46 pages
Sen-762 Advanced Big Data Analytics: Mapreduce
بالیراجپوت
Pas encore d'évaluation
Map-Reduce Is A Programming Model That Is Mainly Divided Into Two
Document2 pages
Map-Reduce Is A Programming Model That Is Mainly Divided Into Two
Fahim Muntasir
Pas encore d'évaluation
Mapreduce and Hadoop Ecosystem
Document64 pages
Mapreduce and Hadoop Ecosystem
Rin Rin Nurmalasari
Pas encore d'évaluation
03 Firstmrjob Invertedindexconstruction 141206231216 Conversion Gate01 PDF
Document54 pages
03 Firstmrjob Invertedindexconstruction 141206231216 Conversion Gate01 PDF
Vignesh Srinivasan
Pas encore d'évaluation
Big Data Analytics Overview and Hadoop Distributions
Document23 pages
Big Data Analytics Overview and Hadoop Distributions
Hiran Suresh
Pas encore d'évaluation
Sem 7 - COMP - BDA
Document16 pages
Sem 7 - COMP - BDA
Raja Rajgonda
Pas encore d'évaluation
MapReduce: Major step backwards for databases
Document6 pages
MapReduce: Major step backwards for databases
Ravin Ravin
Pas encore d'évaluation
Implementation of K-Means Algorithm Using Different Map-Reduce Paradigms
Document6 pages
Implementation of K-Means Algorithm Using Different Map-Reduce Paradigms
Narayan Loke
Pas encore d'évaluation
201070046_BDA_03
Document10 pages
201070046_BDA_03
HARSH NAG
Pas encore d'évaluation
Unit 3 Bba
Document11 pages
Unit 3 Bba
rajendrameena172003
Pas encore d'évaluation
Chapter 4 - Understanding Map Reduce Fundamentals
Document45 pages
Chapter 4 - Understanding Map Reduce Fundamentals
WEGENE ARGOW
Pas encore d'évaluation
BY K.Karthikeyan: Hadoop & Map Reduce
Document7 pages
BY K.Karthikeyan: Hadoop & Map Reduce
Karthik S
Pas encore d'évaluation
Mapreduce: Simpli - Ed Data Processing On Large Clusters
Document4 pages
Mapreduce: Simpli - Ed Data Processing On Large Clusters
Ibrahim Hamza
Pas encore d'évaluation
Chapter 3MapReduce
Document30 pages
Chapter 3MapReduce
Komal
Pas encore d'évaluation
Hadoop: A Seminar Report On
Document28 pages
Hadoop: A Seminar Report On
Roshni Khairnar
Pas encore d'évaluation
Low-Latency, High-Throughput Access To Static Global Resources Within The Hadoop Framework
Document15 pages
Low-Latency, High-Throughput Access To Static Global Resources Within The Hadoop Framework
sandeep_nagar29
Pas encore d'évaluation
Distributed and Cloud Computing
Document58 pages
Distributed and Cloud Computing
18JE0254 CHIRAG JAIN
Pas encore d'évaluation
226 Unit-6
Document32 pages
226 Unit-6
shivam saxena
Pas encore d'évaluation
Intro To Apache Spark: Credits To CS 347-Stanford Course, 2015, Reynold Xin, Databricks (Spark Provider)
Document96 pages
Intro To Apache Spark: Credits To CS 347-Stanford Course, 2015, Reynold Xin, Databricks (Spark Provider)
Costi Stoian
Pas encore d'évaluation
777 1651400043 BD Module 4
Document21 pages
777 1651400043 BD Module 4
nimmy
Pas encore d'évaluation
7 Full Hadoop Performance Modeling For Job Estimation and Resource Provisioning
Document94 pages
7 Full Hadoop Performance Modeling For Job Estimation and Resource Provisioning
arunkumar
Pas encore d'évaluation
Splits Input Into Independent Chunks in Parallel Manner
Document4 pages
Splits Input Into Independent Chunks in Parallel Manner
Sayali Shivarkar
Pas encore d'évaluation
3a - MapReduce Data Flow Scheduling Combiner Partitioner PDF
Document22 pages
3a - MapReduce Data Flow Scheduling Combiner Partitioner PDF
23522020 Danendra Athallariq Harya P
Pas encore d'évaluation
Intro ToHadoop-Unit 04
Document24 pages
Intro ToHadoop-Unit 04
Asiya Khan
Pas encore d'évaluation
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Document7 pages
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Edward
Pas encore d'évaluation
A Brief On MapReduce Performance
Document6 pages
A Brief On MapReduce Performance
IJIRAE
Pas encore d'évaluation
He-Phan-Bo - Wyatt-Lloyd - L19-Big-Data - (Cuuduongthancong - Com)
Document16 pages
He-Phan-Bo - Wyatt-Lloyd - L19-Big-Data - (Cuuduongthancong - Com)
Hiếu Khổng
Pas encore d'évaluation
05-movies-data-analysis-using-mapreduce
Document20 pages
05-movies-data-analysis-using-mapreduce
mohammadkhaja.shaik
Pas encore d'évaluation
Module 3 (Part-1) - Big Data
Document46 pages
Module 3 (Part-1) - Big Data
sujith
Pas encore d'évaluation
Hadoop: A Report Writing On
Document13 pages
Hadoop: A Report Writing On
dilip kodmour
Pas encore d'évaluation
Shortnotes For Cloud
Document22 pages
Shortnotes For Cloud
Mahi Mahi
Pas encore d'évaluation
MapReduce Framework Breaks Down Big Data Processing into Map and Reduce Phases
Document11 pages
MapReduce Framework Breaks Down Big Data Processing into Map and Reduce Phases
Hasnain
Pas encore d'évaluation
Lecture 2 - Mapreduce: Cpe 458 - Parallel Programming, Spring 2009
Document26 pages
Lecture 2 - Mapreduce: Cpe 458 - Parallel Programming, Spring 2009
rhshriva
Pas encore d'évaluation
UNIT 4 Notes by ARUN JHAPATE
Document20 pages
UNIT 4 Notes by ARUN JHAPATE
Ankit “अंकित मौर्य” Mourya
Pas encore d'évaluation
Big Data Notes (All Lectures)
Document44 pages
Big Data Notes (All Lectures)
abdhatemsh
Pas encore d'évaluation
Cheat Sheet 1
Document2 pages
Cheat Sheet 1
Anusha Gupta
Pas encore d'évaluation
Unit 3 - Big Data Technologies
Document42 pages
Unit 3 - Big Data Technologies
prakash N
Pas encore d'évaluation
Hadoop: Data Processing and Modelling
D'Everand
Hadoop: Data Processing and Modelling
Garry Turkington
Pas encore d'évaluation
Hadoop Beginner's Guide
D'Everand
Hadoop Beginner's Guide
Garry Turkington
Évaluation : 4 sur 5 étoiles
4/5 (7)
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
D'Everand
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
Vibrant Publishers
Pas encore d'évaluation
Iqms Check Qual
Document1 page
Iqms Check Qual
Debankan Ganguly
Pas encore d'évaluation
Iqms Check Qual
Document1 page
Iqms Check Qual
Debankan Ganguly
Pas encore d'évaluation
Abigail Iqms Ke Liye
Document1 page
Abigail Iqms Ke Liye
Debankan Ganguly
Pas encore d'évaluation
Noc19 cs21 Assignment3 PDF
Document3 pages
Noc19 cs21 Assignment3 PDF
Debankan Ganguly
Pas encore d'évaluation
Scan Doc by CamScanner
Document1 page
Scan Doc by CamScanner
Debankan Ganguly
Pas encore d'évaluation
Intellectual Traits Depth of Understanding PDF
Document1 page
Intellectual Traits Depth of Understanding PDF
Debankan Ganguly
Pas encore d'évaluation
Intellectual Traits Depth of Understanding
Document1 page
Intellectual Traits Depth of Understanding
Debankan Ganguly
Pas encore d'évaluation
Jauys Ygshgt HSH
Document1 page
Jauys Ygshgt HSH
Debankan Ganguly
Pas encore d'évaluation
Prelim Depth of Intellectual Understanding PDF
Document1 page
Prelim Depth of Intellectual Understanding PDF
Debankan Ganguly
Pas encore d'évaluation
Luck-Kol 8 - 25 PDF
Document1 page
Luck-Kol 8 - 25 PDF
Debankan Ganguly
Pas encore d'évaluation
Sample Document Yo
Document1 page
Sample Document Yo
Debankan Ganguly
Pas encore d'évaluation
Importance of HH
Document1 page
Importance of HH
Debankan Ganguly
Pas encore d'évaluation
Troisieme XJNDJDN
Document1 page
Troisieme XJNDJDN
Debankan Ganguly
Pas encore d'évaluation
Sample Document
Document1 page
Sample Document
Debankan Ganguly
Pas encore d'évaluation
Scan Doc by CamScanner
Document1 page
Scan Doc by CamScanner
Debankan Ganguly
Pas encore d'évaluation
Importance of The GG
Document1 page
Importance of The GG
Debankan Ganguly
Pas encore d'évaluation
Timestamp
Document1 page
Timestamp
Debankan Ganguly
Pas encore d'évaluation
Cloud Computing
Document134 pages
Cloud Computing
Siva Gowtham
Pas encore d'évaluation
Monitors
Document1 page
Monitors
Debankan Ganguly
Pas encore d'évaluation
A Quantum Swarm Evolutionary Algorithm for Mining Association Rules in Large Databases
Document6 pages
A Quantum Swarm Evolutionary Algorithm for Mining Association Rules in Large Databases
Anonymous TxPyX8c
Pas encore d'évaluation
The Behavior of Crowds PDF
Document332 pages
The Behavior of Crowds PDF
Debankan Ganguly
Pas encore d'évaluation
Theory Methodology and Implementation of Robotic Intelligence and Communication
Document6 pages
Theory Methodology and Implementation of Robotic Intelligence and Communication
John Ronquillo
Pas encore d'évaluation
Literature Review On Data Normalization and Clustering
Document4 pages
Literature Review On Data Normalization and Clustering
Bhupen Acharya
Pas encore d'évaluation
SQL Simplified in 10 Pages
Document11 pages
SQL Simplified in 10 Pages
chinnu benny
Pas encore d'évaluation
DBMS Lab Ex 1& 2
Document22 pages
DBMS Lab Ex 1& 2
Charumathi Gopalakrishnan
Pas encore d'évaluation
ADB Chapter Two
Document22 pages
ADB Chapter Two
ethiopia1212 etho
Pas encore d'évaluation
The Google File System
Document29 pages
The Google File System
hEho44744
Pas encore d'évaluation
Export DDL DML Oracle SQL Developer
Document11 pages
Export DDL DML Oracle SQL Developer
Venkat
Pas encore d'évaluation
Essential SQL Queries Explained
Document19 pages
Essential SQL Queries Explained
Saloni Jain
Pas encore d'évaluation
End-to-End Hadoop Development Using OBIEE, ODI and Oracle Big Data Discovery
Document82 pages
End-to-End Hadoop Development Using OBIEE, ODI and Oracle Big Data Discovery
Bada Sainath
Pas encore d'évaluation
DataDomain EMC Networker Whitepaper Best Practices
Document35 pages
DataDomain EMC Networker Whitepaper Best Practices
Juan Alanis
Pas encore d'évaluation
Database Administration - The Complete Guide To Practices and Procedures Cap14 p2
Document10 pages
Database Administration - The Complete Guide To Practices and Procedures Cap14 p2
edgar
Pas encore d'évaluation
Hyper Table
Document12 pages
Hyper Table
Sukhvinder Singh
Pas encore d'évaluation
Basic Database Concepts
Document24 pages
Basic Database Concepts
Abdul Bari Malik
Pas encore d'évaluation
Disk Base Tips and Trick
Document5 pages
Disk Base Tips and Trick
Fahmi Rachman Waas
Pas encore d'évaluation
s50mcc PDF
Document575 pages
s50mcc PDF
Antonio Marcos Santos
Pas encore d'évaluation
LAB03-Creating An ETL Solution With SSIS
Document9 pages
LAB03-Creating An ETL Solution With SSIS
hiba_cherif
Pas encore d'évaluation
Build Entity Relationship Diagrams (ERDs) - A Complete Guide
Document35 pages
Build Entity Relationship Diagrams (ERDs) - A Complete Guide
Rihab
Pas encore d'évaluation
SAP WM Auto TR, TO, Confirmation for Fixed Bin Scenario
Document32 pages
SAP WM Auto TR, TO, Confirmation for Fixed Bin Scenario
reddiprashanth
Pas encore d'évaluation
Advantage of Auxiliary Cloud Services
Document5 pages
Advantage of Auxiliary Cloud Services
Vijay Gangarde
Pas encore d'évaluation
The Bank Management System in VB 6
Document32 pages
The Bank Management System in VB 6
sangeetha s
Pas encore d'évaluation
Bda
Document2 pages
Bda
Jigar
Pas encore d'évaluation
Infosys-Normalisation-2nd 3rd BCNF
Document31 pages
Infosys-Normalisation-2nd 3rd BCNF
Raj Kumar
Pas encore d'évaluation
IMS HALDB Administration: IBM Software Group
Document55 pages
IMS HALDB Administration: IBM Software Group
Γιώργος Μπότσιος
Pas encore d'évaluation
Check promo material BOM
Document3 pages
Check promo material BOM
Jose Grill
Pas encore d'évaluation
BIGuidebook Templates - BI Project Plan
Document15 pages
BIGuidebook Templates - BI Project Plan
wingkitcwk
Pas encore d'évaluation
007-Backup and Disaster Recovery V1.01
Document42 pages
007-Backup and Disaster Recovery V1.01
Phan An
0% (1)
Iisql
Document4 pages
Iisql
Dintita
Pas encore d'évaluation
Informatica Document
Document74 pages
Informatica Document
Babu Raja
Pas encore d'évaluation
Analisis Perkembangan Coffee Shop Sebagai Salah Sa
Document8 pages
Analisis Perkembangan Coffee Shop Sebagai Salah Sa
Muhammad Fikri
Pas encore d'évaluation
SQL Create Index Statement
Document2 pages
SQL Create Index Statement
mateigeorgescu80
Pas encore d'évaluation
Data Organization
Document20 pages
Data Organization
MDatta
Pas encore d'évaluation