
Spark: Lightning-Fast Cluster Computing by Example
Ramesh Mudunuri
Saturday, November 29, 2014

About me
Big data enthusiast
Product developer using Spark technology

What to expect

Introduction to Spark
Spark ecosystem
How it differs from Hadoop MapReduce
Where it shines
How easy it is to install and start learning
Small code demos
Where to find additional information

This is not:
A training class
A workshop
A product demo with commercial interest

What is Spark?
Apache Spark is a fast and general engine for
large-scale data processing.
A general-purpose, high-performance engine for
large-scale processing

Spark History
Started as a research project at
UC Berkeley's AMPLab in 2010
Prominent research team member: Matei Zaharia
Later, Matei co-founded the company Databricks
Now an Apache open source project


Why is Spark so special?

Speed: a faster processing engine
In-memory processing
Developer friendly
More than one language: Java, Scala and Python
We don't have to manually specify Map/Reduce operations
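To illustrate "no manual Map/Reduce", here is a minimal word-count sketch: the whole job is a chain of collection-style calls on an RDD, with no Mapper or Reducer classes to write. It assumes a recent Spark release on the classpath and runs locally via `local[*]`; the object name `WordCountSketch` and the sample lines are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object: word count as a chain of RDD transformations.
object WordCountSketch {
  // Returns word -> count for the given lines, computed by Spark.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .flatMap(_.split("\\s+")) // "map" side: split each line into words
      .filter(_.nonEmpty)       // drop empty tokens produced by leading whitespace
      .map(word => (word, 1))   // pair each word with a count of 1
      .reduceByKey(_ + _)       // "reduce" side: sum the counts per word
      .collect()                // bring the (small) result back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM using all available cores
    val sc = new SparkContext(new SparkConf().setAppName("WordCountSketch").setMaster("local[*]"))
    try {
      val counts = wordCounts(sc, Seq("spark is fast", "spark is fun"))
      counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w: $n") }
    } finally sc.stop()
  }
}
```

Spark decides how to split this into stages; the shuffle happens at `reduceByKey`, which is roughly where a hand-written MapReduce job would put its reducer.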

Why is Spark so special (continued)

Tools: a well-layered stack of easy-to-use tools
Can run in various setups
Standalone (learners' favorite)
Cluster, EC2
YARN, Mesos
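Typically only the master URL changes between these setups, not the application code. The sketch below, a hypothetical `DeployModeSketch` object, takes the master URL from the command line, otherwise keeps whatever `spark-submit --master` supplied, and falls back to `local[*]` for a laptop. The URL forms in the comments are Spark's standard ones; `HOST` is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: the same program can target different setups.
// Standard master URL forms (HOST is a placeholder):
//   local[*]           run in-process on all local cores (laptop, learning)
//   spark://HOST:7077  Spark's own standalone cluster manager
//   yarn               Hadoop YARN (cluster found via HADOOP_CONF_DIR / YARN_CONF_DIR)
//   mesos://HOST:5050  Apache Mesos (deprecated in recent Spark releases)
object DeployModeSketch {
  // A trivial distributed job: the sum of 1..n.
  def sumOneTo(sc: SparkContext, n: Int): Long =
    sc.parallelize(1 to n).map(_.toLong).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DeployModeSketch")
    args.headOption.foreach(url => conf.setMaster(url)) // an explicit URL argument wins
    conf.setIfMissing("spark.master", "local[*]")       // else keep spark-submit's --master, else run locally
    val sc = new SparkContext(conf)
    try println(s"master=${sc.master} sum=${sumOneTo(sc, 100)}")
    finally sc.stop()
  }
}
```

Using `setIfMissing` instead of an unconditional `setMaster` avoids silently overriding the cluster chosen at submit time.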

Read data from
Local file system
HDFS
HBase, Cassandra and
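For file-based sources the same call, `SparkContext.textFile`, reads both the local file system and HDFS; only the URI scheme differs. The sketch below, with a hypothetical `ReadDataSketch` object, writes a small temporary file and reads it back. The HDFS URI in the comment is a placeholder. HBase and Cassandra are usually read through Hadoop input formats or separate connector libraries, which are not shown here.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: reading text data with SparkContext.textFile.
object ReadDataSketch {
  // Counts lines at any Hadoop-supported URI (file://, hdfs://, ...).
  def lineCount(sc: SparkContext, uri: String): Long = sc.textFile(uri).count()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadDataSketch").setMaster("local[*]"))
    try {
      // Local file system: write a small temporary file and read it back.
      val tmp = Files.createTempFile("meetup", ".txt")
      Files.write(tmp, "alice\nbob\ncarol\n".getBytes(StandardCharsets.UTF_8))
      println(s"local lines: ${lineCount(sc, tmp.toUri.toString)}")

      // HDFS: same call, different scheme (placeholder host and path):
      // lineCount(sc, "hdfs://namenode:8020/data/meetup.txt")
    } finally sc.stop()
  }
}
```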

RDD (Resilient Distributed Dataset): my favorite
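An RDD is an immutable, partitioned collection that Spark can rebuild from its lineage if a partition is lost. Transformations such as `filter` and `map` are lazy and only record that lineage; actions such as `count` and `reduce` actually run a job; `cache()` asks Spark to keep computed partitions in memory for reuse. A minimal sketch with made-up numbers and a hypothetical `RddSketch` object:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example showing lazy transformations, actions and caching.
object RddSketch {
  // Returns (how many even numbers are in 1..1000, the sum of their squares).
  def run(sc: SparkContext): (Long, Long) = {
    val numbers: RDD[Int] = sc.parallelize(1 to 1000, numSlices = 4) // 4 partitions
    // Transformations are lazy: Spark only records the lineage here.
    val evenSquares: RDD[Long] = numbers
      .filter(_ % 2 == 0)
      .map(n => n.toLong * n)
      .cache() // keep computed partitions in memory once an action runs
    // Actions trigger work: the first computes (and caches), the second can reuse the cache.
    val count = evenSquares.count()
    val sum = evenSquares.reduce(_ + _)
    // The lineage Spark would replay to rebuild a lost partition.
    println(evenSquares.toDebugString)
    (count, sum)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddSketch").setMaster("local[*]"))
    try {
      val (count, sum) = run(sc)
      println(s"even numbers: $count, sum of squares: $sum")
    } finally sc.stop()
  }
}
```

Reusing a cached RDD across several actions is where in-memory processing pays off compared with re-reading input from disk for every job.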

Spark Ecosystem

Spark SQL

Spark Streaming

MLlib: machine learning

GraphX

I'm not a Spark expert, but I have some
working knowledge of:
Spark SQL
Spark MLlib
I will be showing some code demos

Under the hood (internals)

How is it different from Hadoop MapReduce?
List here

Where it shines

List some use cases here

How easy it is to install and start learning

Show how easy it is by quickly installing Spark
on a laptop
Give a checklist of parameters
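For learners who prefer a compiled project over the interactive shell, one option is an sbt build that pulls Spark from Maven Central. The versions below are assumptions (a recent Spark 3.5.x release, not the 2014-era 1.x release this talk used) and the project name is hypothetical; match them to your installed Spark and Scala. Alternatively, the prebuilt download from the Spark website ships `bin/spark-shell`, the Scala REPL used for the demos.

```scala
// build.sbt (a sketch; versions are assumptions, adjust to your install)
ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "spark-meetup-demos", // hypothetical project name
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "3.5.0", // RDDs, SparkContext
      "org.apache.spark" %% "spark-sql"   % "3.5.0", // DataFrames, Spark SQL
      "org.apache.spark" %% "spark-mllib" % "3.5.0"  // machine learning
    )
  )
```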

Code Demos
Write some interesting code snippets in the REPL
using Scala
1. Read the Meetup participants and get some counts
2. Create a table and get some counts with Spark SQL
3. MLlib example
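The three demos could look roughly like the sketch below. The participant records, their fields (`name`, `city`, `age`) and the object name are made up for illustration; the real demo read the Meetup participant list, whose format is not given here. It uses the `SparkSession` entry point of Spark 2.0 and later (a 2014-era demo would have used `SQLContext`), and k-means with k = 2 on age is only a toy choice for this tiny dataset.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical participant record; the real Meetup export format is not known here.
case class Participant(name: String, city: String, age: Int)

object MeetupDemosSketch {
  val participants = Seq(
    Participant("Ann", "CityA", 25), Participant("Bob", "CityA", 27),
    Participant("Cy", "CityB", 26), Participant("Dee", "CityB", 45),
    Participant("Eve", "CityC", 47), Participant("Fay", "CityA", 46))

  // Demo 1: plain RDD counts (participants per city).
  def countsByCity(spark: SparkSession): Map[String, Long] =
    spark.sparkContext.parallelize(participants)
      .map(p => (p.city, 1L))
      .reduceByKey(_ + _)
      .collect().toMap

  // Demo 2: register the data as a table and count with Spark SQL.
  def sqlCounts(spark: SparkSession): DataFrame = {
    import spark.implicits._
    participants.toDF().createOrReplaceTempView("participants")
    spark.sql("SELECT city, COUNT(*) AS n FROM participants GROUP BY city ORDER BY city")
  }

  // Demo 3: MLlib k-means on the age column; returns the sorted cluster centers.
  def ageClusterCenters(spark: SparkSession): Seq[Double] = {
    import spark.implicits._
    val features = new VectorAssembler()
      .setInputCols(Array("age")).setOutputCol("features")
      .transform(participants.toDF())
    val model = new KMeans().setK(2).setSeed(1L).fit(features)
    model.clusterCenters.map(_(0)).toSeq.sorted
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MeetupDemosSketch").master("local[*]").getOrCreate()
    try {
      println(countsByCity(spark))
      sqlCounts(spark).show()
      println(ageClusterCenters(spark))
    } finally spark.stop()
  }
}
```

In `spark-shell` (Spark 2.0 and later) `spark` and `sc` are already defined, so the bodies of these methods can be typed line by line.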

My live project: sample screens

1. Compare screen
2. D3 data distribution
3. MLlib screen

Where to find additional information

List
Matei Zaharia's papers
Spark documentation
Spark Summit videos
Books
Workshop
Databricks
My Twitter handle

Final note
Thank you to the hosts and participants
Share the knowledge