
* Machine learning and Neural networks - Impact on Big data

Big Data
* 3Vs
* Generating Buzz - Scientific data exponential growth
* 2012 - the year of Big Data
* 2013 - the year of Big Data analytics

* Machine Learning & Big Data

Machine learning
* Branch of AI
* Focuses on the study and construction of systems that learn from data and make predictions on unseen data
* Applications in search engines, stock market analysis, speech recognition, information retrieval, etc.

* ML History
1952 - Arthur Samuel writes the first game-playing program, which learns to play checkers
Machine learning gained momentum in the early 90s
Learning algorithms come into commercial systems (Bayesian networks)

Logistic Regression

Neural Network
Support Vector Machine
Clustering
Other popular algorithms: Random forest, Lasso, K-means, SVD, etc.

* Supervised and unsupervised learning algorithms

* Open source tools
R miner, Python, Weka, GraphLab, H2O, Octave, Mahout

* Commercial Tools
SAS, MATLAB

* Tools used for machine learning


Weka detail

scikit-learn (Python)

How does a learning algorithm work?


Linear Regression
Logistic Regression
Neural network
Adaline
Perceptron
K-Means
SVM
Lasso

Cost Function Minimization

Gradient Descent

https://www.coursera.org/course/ml
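
A minimal sketch of cost function minimization by gradient descent, assuming a one-variable linear regression with a squared-error cost. The function name, learning rate `alpha` and toy data are illustrative assumptions, not taken from the slides. Note that every step sums over all m training examples.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, epochs=1000):
    """Fit y ~ theta0 + theta1 * x by minimizing the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        # Each update needs a sum over ALL m training examples.
        grad0 = sum((theta0 + theta1 * x) - y for x, y in zip(xs, ys)) / m
        grad1 = sum(((theta0 + theta1 * x) - y) * x for x, y in zip(xs, ys)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
```

On this toy data the parameters approach theta0 = 1 and theta1 = 2. The cost of one step grows linearly with m, which is what becomes a problem for very large training sets.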

How easy is it to compute the gradient?

What if m is 3,000,000?

Solution 1
Scale up your learning algorithm for Big Data

In-memory computation of the summation is not possible

Traditional learning algorithm not working for big data


How do I scale up?

Stochastic Gradient Descent

RSofia, Shortgun-R
Java - LingPipe
Shortgun-Python

10^5 training examples and more than 10^5 features: Easy Job!!!

More time to converge


In memory computing possible
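
A minimal sketch of stochastic gradient descent, assuming the same one-variable linear regression as a stand-in model. The function name, step size and toy data are illustrative assumptions. Each update uses a single training example, so only one example has to be in memory at a time, at the price of more passes to converge.

```python
import random


def stochastic_gradient_descent(xs, ys, alpha=0.05, epochs=500, seed=0):
    """Fit y ~ theta0 + theta1 * x, updating after every single example."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in random order
        for i in indices:
            # Gradient estimated from ONE example, with no sum over m.
            error = (theta0 + theta1 * xs[i]) - ys[i]
            theta0 -= alpha * error
            theta1 -= alpha * error * xs[i]
    return theta0, theta1


# Toy data generated from y = 1 + 2x (an assumption for illustration).
sgd_xs = [0.0, 1.0, 2.0, 3.0]
sgd_ys = [1.0, 3.0, 5.0, 7.0]
sgd_theta0, sgd_theta1 = stochastic_gradient_descent(sgd_xs, sgd_ys)
```

With noisy real data the parameters keep fluctuating around the minimum rather than settling exactly, so a decaying step size is a common refinement.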


My data is even bigger and it takes a lot of time!!!


Map Reduce approach

Data split by Map Reduce: 300 000 000 training data
Computer 1, Computer 2, Computer 3, Computer 4
Combine results - Stochastic Gradient Descent

Guest Lecture on ML-Max Linn
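
A minimal sketch of the split-and-combine idea, assuming the quantity to compute is a gradient that is a plain sum over training examples. It runs on one machine and only simulates the four computers. All function names are hypothetical, and the sketch uses partial sums of a batch gradient rather than any specific scheme from the lecture.

```python
from functools import reduce


def split(data, n_parts):
    """Split the training set into n_parts shards, one per machine."""
    return [data[i::n_parts] for i in range(n_parts)]


def map_partial_gradient(shard, theta0, theta1):
    """Map step: one machine sums the gradient terms of its own shard."""
    g0 = sum((theta0 + theta1 * x) - y for x, y in shard)
    g1 = sum(((theta0 + theta1 * x) - y) * x for x, y in shard)
    return g0, g1


def reduce_gradients(a, b):
    """Reduce step: add two partial sums together."""
    return a[0] + b[0], a[1] + b[1]


def distributed_gradient(data, theta0, theta1, n_machines=4):
    shards = split(data, n_machines)
    # On a real cluster each call below would run on a different computer.
    partials = [map_partial_gradient(s, theta0, theta1) for s in shards]
    g0, g1 = reduce(reduce_gradients, partials)
    m = len(data)
    return g0 / m, g1 / m


# Toy data from y = 1 + 2x (an assumption for illustration).
data = [(float(x), 1.0 + 2.0 * x) for x in range(8)]
g0, g1 = distributed_gradient(data, 0.0, 0.0)
```

Because addition is associative, the combined result equals the gradient computed over the whole data set on a single machine.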

What if the algorithm is not additive?

Distributed Learning with Bagging
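
A minimal sketch of bagging, assuming a toy one-dimensional threshold classifier as the base learner. Each model is trained independently on its own bootstrap sample (in a distributed setting, one sample per machine), and predictions are combined by majority vote. All names and data here are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(label) for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models=5, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(train_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the models by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Toy data: label is 1 when x >= 5 (an assumption for illustration).
toy = [(x, int(x >= 5)) for x in range(10)]
models = bagging_fit(toy)
low, high = bagging_predict(models, 0), bagging_predict(models, 9)
```

No model needs to see another model's data or intermediate results, so the training does not rely on the algorithm being expressible as a sum.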


What kinds of implementation are we working on?
R and JAQL Bridge, and a similar bridge for Weka and Python

R-JAQL Bridge

HaLoop
HaLoop inherits map reduce from Hadoop and adds various modifications to support iterative map reduce tasks.
HaLoop has an API for easily writing iterative data analysis programs.
A loop control module in the HaLoop master starts new map reduce jobs and controls the exit.
If an iterative task fails, the task scheduler and task trackers facilitate recovery and allow the iterative data analysis to continue.
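
A minimal sketch of the loop control idea: a driver that keeps starting a new map/reduce pass until an exit condition holds. This is a single-machine illustration in Python with hypothetical names. It is not HaLoop's API, and it leaves out scheduling and failure recovery.

```python
def run_iterative_job(map_fn, reduce_fn, data, state,
                      max_iterations=100, tolerance=1e-9):
    """Rerun a map/reduce pass until the state stops changing."""
    for iteration in range(1, max_iterations + 1):
        mapped = [map_fn(record, state) for record in data]  # map phase
        new_state = reduce_fn(mapped, state)                 # reduce phase
        if abs(new_state - state) < tolerance:               # exit condition
            return new_state, iteration
        state = new_state
    return state, max_iterations


def residual(record, estimate):
    """Map: how far this record is from the current estimate."""
    return record - estimate


def move_halfway(residuals, estimate):
    """Reduce: move the estimate halfway towards the mean of the data."""
    return estimate + 0.5 * sum(residuals) / len(residuals)


# Toy job (an assumption for illustration): iterate towards the mean of the data.
result, iterations = run_iterative_job(residual, move_halfway,
                                       [2.0, 4.0, 6.0, 8.0], 0.0)
```

The input data stays the same across iterations and only the small state changes, which is the access pattern iterative map reduce systems are built around.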

Apache Mahout
R-Hadoop

HaLoop Details

Mahout Details

R-Hadoop

Implementations of Machine learning in Big data

Manufacturing and industry
Quality control
Six Sigma
Beer and wine flavour prediction
Natural language processing
Telecommunication line fault detection

Government
Missile targeting
Criminal behaviour prediction

Banking and finance
Loan underwriting
Credit scoring
Credit card fraud detection
Energy price prediction
Real-estate appraisal

Science and medicine
Specimen identification
Protein sequencing
Tumour and tissue diagnosis
Heart attack diagnosis
New drug effectiveness
Prediction of air and sea currents
Air and water quality

* Neural Networks - Application in Big Data

* NVIDIA
Built the largest artificial neural network, intended to simulate and learn the behavior of the human brain.
Nearly 6.5 times larger than the one developed by Google in 2012

* NUANCE
Leader in Natural Language Processing and speech recognition

* NETFLIX
Uses neural networks on a big chunk of user data generated through its websites to predict better recommendations for its users.

* Predicting India Volatility Index
Employed on big data generated by the stock market for online learning. Used to forecast the upward or downward movement of the next trading day's volatility using indicators based on India VIX (a volatility index based on the NIFTY Index Option prices).

* Sample implementation - Location Graph
Combines first-party data with a big data platform based on machine learning
Aim - Reaching the right audience at the right time
Offline machine learning process based on:
Hive - data preparation
Weka & Hadoop Mahout - machine learning activities
R & RMR - statistics analysis and data visualization

http://blog.jiwire.com/how-big-data-enables-jiwire-to-deliver-30-or-more-lift-in-campaign-performance/

Graph analytics

Commercial product
GraphLab
YarcData

Other Projects
Grappa - University of Washington project
Twitter Cassovary
Neo4j
Giraph - an open source, Hadoop-based Pregel clone developed at Facebook

http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/

Big data, crowdsourcing and machine learning tackle Parkinson's

LIONsolver
- was able to differentiate Parkinson's patients from healthy individuals
- showed the trend in symptoms of the disease over time

http://successfulworkplace.com/2013/07/31/big-data-crowdsourcing-and-machine-learning-tackle-parkinsons/

University of Cambridge researcher Anastasios Noulas - choosing the best retail location.

* Apps that rely on machine learning to work their magic:

http://gigaom.com/2012/11/03/5-trends-that-are-changing-how-we-do-big-data/

- Raised $30.6 million for its "Insight Discovery" technology
- Inked deals with General Electric, Citigroup, Merck, Anadarko, U.S. Food and Drug Administration, Centers for Disease Control and Prevention, the University of California San Francisco, Mount Sinai Hospital, Texas A&M University and Harvard Medical School.

Big Data + Machine Learning + Crowdsourcing

High Usage of Machine Learning


Random Forest, Neural Network
High predictive accuracy of more than 95%

http://www.kaggle.com/

Big Data + Machine Learning + Quantum Computing

Exponential Compression
Machine Learning
Big Data
Real time Search Of Big Data
Large Data Set
Training set
Q-App

http://www.eetimes.com/document.asp?doc_id=1319059

*Thank You
