
Machine Learning with Python: Why do they form the best combination

Posted by Ivy Pro School on April 11, 2016 at 5:57am



Machine Learning is being hailed as next-generation analytics. Machine Learning tasks can
be roughly classified as:

Getting the data
Cleaning the data
Applying ML algorithms
Presenting the processed data in visualizations
Publishing the results so that clients can consume the information with ease

Python is turning out to be the preferred tool for machine learning. Let's see
how it is used in each of the above steps.
Getting the data
Python is a leader here. Being a simple, general-purpose scripting language, it has
libraries and Application Programming Interfaces (APIs) that let it connect to a variety
of data sources: Excel/CSV/text files, databases, Hadoop file systems, etc. More often than
not we need to scrape data from the web, deal with XML and JSON formats, and parse that
information. Python does all of that with ease and is way ahead of its competitors in
this space.
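
As a minimal sketch of the parsing step, the snippet below reads the same two records from CSV, JSON and XML using only Python's standard library. The sample payloads and the helper names (`parse_csv`, `parse_json`, `parse_xml`) are made up for illustration; in a real project the text would come from a file, a database or a web response.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads standing in for real files or web responses.
CSV_TEXT = "name,score\nasha,81\nravi,74\n"
JSON_TEXT = '[{"name": "asha", "score": 81}, {"name": "ravi", "score": 74}]'
XML_TEXT = "<rows><row name='asha' score='81'/><row name='ravi' score='74'/></rows>"


def parse_csv(text):
    """Return one dict per CSV row, converting the score column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "score": int(row["score"])} for row in reader]


def parse_json(text):
    """Return the decoded JSON document (here, a list of dicts)."""
    return json.loads(text)


def parse_xml(text):
    """Return one dict per <row> element, reading values from its attributes."""
    root = ET.fromstring(text)
    return [
        {"name": row.get("name"), "score": int(row.get("score"))}
        for row in root.iter("row")
    ]


records = parse_csv(CSV_TEXT)
```

All three helpers return the same list of dictionaries, so the later steps do not need to care which format the data arrived in.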

Cleaning the Data

Packages like scipy, numpy, pandas and sframe enable us to scale up to
gigabytes of data and process it on machines with commodity hardware. With very simple
functions, we can reshape data into forms that are more amenable to further processing.
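
To give a feel for what such functions look like, here is a small sketch using pandas. The data frame, its column names and the choice to fill the missing value with the column mean are all assumptions made for this example, not a recommended cleaning recipe.

```python
import pandas as pd

# Hypothetical raw data: inconsistent text, a duplicate row and a missing value.
raw = pd.DataFrame(
    {
        "city": ["Kolkata", "kolkata ", "Delhi", "Delhi", "Delhi"],
        "month": ["Jan", "Feb", "Jan", "Jan", "Feb"],
        "sales": [100.0, None, 80.0, 80.0, 90.0],
    }
)

clean = raw.copy()
# Normalise the text column: trim whitespace and use consistent capitalisation.
clean["city"] = clean["city"].str.strip().str.title()
# Drop rows that are exact repeats of an earlier row.
clean = clean.drop_duplicates()
# Fill the missing sales figure with the mean of the remaining values.
clean["sales"] = clean["sales"].fillna(clean["sales"].mean())

# Reshape from long to wide: one row per city, one column per month.
wide = clean.pivot(index="city", columns="month", values="sales")
```

The long-to-wide `pivot` at the end is one example of reshaping data into a form that a modelling or plotting step can consume directly.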

Applying ML algorithms

With scikit-learn and graphlab, even the most sophisticated algorithms can be
implemented in a few lines of code. It is very easy to tweak parameters so that the
implementation suits one's needs.
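
The "few lines of code" claim can be illustrated with scikit-learn. The sketch below trains a random forest on the iris dataset bundled with the library; the choice of model, dataset and parameter values is an assumption made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small bundled dataset and hold out a quarter of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The constructor arguments are the "parameters to tweak":
# number of trees and maximum depth of each tree.
model = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
```

Changing `n_estimators` or `max_depth` and refitting is all it takes to try a different configuration.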

Visualization Capabilities

The matplotlib library can be used to build beautiful visualizations,
plots, 3D charts, etc.
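
A minimal matplotlib sketch follows. The monthly figures are invented for the example, and the chart is rendered off-screen into an in-memory PNG so that the snippet runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen backend, no window is opened
import matplotlib.pyplot as plt

# Made-up monthly figures, used only to have something to plot.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [100, 90, 120, 135]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales (illustrative data)")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# Write the figure to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Passing a file name to `fig.savefig` instead of the buffer would write the chart to disk.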

Publishing Capabilities

Python is again a leader. Being a general-purpose programming language, it can
integrate with other systems if the results need to be pushed downstream. Real-time
dashboards containing interactive visuals can be built with little effort given its native
widgets. With Python Kivy, we can build mobile applications with ML embedded. We can
also export results standalone in static forms (HTML/CSV etc.) if need be.
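
The static export option can be sketched with the standard library alone. The result rows and the helper names `to_csv` and `to_html_table` below are hypothetical and exist only for this example.

```python
import csv
import html
import io

# Hypothetical model output to be shared with clients.
results = [
    {"segment": "A", "predicted_churn": 0.12},
    {"segment": "B & C", "predicted_churn": 0.31},
]


def to_csv(rows):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(rows):
    """Serialise a list of dicts to a plain HTML table, escaping cell text."""
    columns = list(rows[0])
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


csv_text = to_csv(results)
html_text = to_html_table(results)
```

Either string can then be written to a file or attached to a report. Escaping the cell text keeps characters such as `&` from breaking the HTML.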

In the corporate world, SAS is still the dominant tool for statistical analysis. But with the
advent of open source software like R and Python, the trend is changing fast. Startups
want cost-effective infrastructure, and even big enterprises are slowly switching to open
source solutions as licence costs seem to be weighing them down. So, all in all, it can
be safely assumed that Python will emerge as one of the primary technologies for
implementing ML.
This article has been contributed by a machine learning and big data enthusiast.
