
Free Data Science Resources for Beginners

 MAY 26, 2017

In this guide, we’ll share 65 free data science resources that we’ve hand-picked and
annotated for beginners.

To become a data scientist, you have a formidable challenge ahead. You’ll need to
master a variety of skills, ranging from machine learning to business analytics.

However, the rewards are worth it. Organizations will prize alchemists who can
turn raw data into smarter decisions, better products, happier customers, and
ultimately more profit. Plus, you’ll get to solve interesting problems and master new,
impactful technologies.

If that sounds like a career you’d enjoy, then bookmark this page and read on because
we compiled this list just for you.

Data Science Resources

1. Foundational Skills
Programming and Data Wrangling
Statistics and Probability
2. Technical Skills
Data Collection
SQL
Data Visualization
Applied Machine Learning
3. Business Skills
Communication
Creativity and Innovation
Business Operations and Strategy
Business Analytics
4. Supplementary Skills
Natural Language Processing
Recommendation Systems
Time Series Analysis
5. Practice
Problem Solving Challenges

*Note: Advanced, Niche, or Industry-Specific Skills

Certain roles might require other skills, such as:

Deep Learning, Big Data, Optimization, Anomaly Detection, Graph and Network Models,
Quantitative Finance, Research Leadership, Project Management, Product Design,
Software Engineering, Spatial Data Analysis, etc.

In this guide, we'll only be covering the skills that are most frequently demanded
across the industry.

1. Foundational Skills

Foundational skills form the basis of true understanding, which will in turn allow you
to discover novel solutions, build more accurate models, and make better decisions.

1.1. Programming and Data Wrangling

First, you'll need to know at least one scripting language well enough to wrangle
datasets, prototype models, and perform analyses.

We strongly recommend choosing between Python and R, as they are both open-source
(free), widely adopted, and supported by active communities. They each have their
own strengths, but we recommend picking just one at the start.

Python is more common in software startups, large tech firms, and adTech.
Python tends to be more flexible because it's a general-purpose programming
language. It's also better for deep learning and processing data.
R / RStudio is popular in research, finance, and analytics. R is a statistical
programming language that has mature libraries for econometrics, statistics, and
machine learning.
We've also written a more detailed comparison of Python vs. R for data science.

If you're still on the fence, we'd recommend starting with Python due to its breadth
and flexibility (and it's a bit more beginner-friendly).

Tip: Each resource link below opens in a new tab, so you won't lose your place.

Python Resources:

Learn Python the Hard Way (Online Book)
(https://learnpythonthehardway.org/book/) - Recommended for beginners who
want a complete course in programming with Python.
LearnPython.org (Interactive Tutorial) (http://www.learnpython.org/) - Short,
interactive tutorial for those who just need a quick way to pick up Python syntax.
How to Think Like a Computer Scientist (Interactive Book)
(http://interactivepython.org/runestone/static/thinkcspy/index.html) - Interactive
"CS 101" course taught in Python that really focuses on the art of problem solving.
This goes beyond the bare minimum needed to get started, but it's such a
wonderful gem that we had to include it here.
PythonChallenge.com (Online Puzzle) (http://www.pythonchallenge.com/) - Fun
puzzle with 33 levels that you can solve with Python programming.
How to Learn Python for Data Science, The Self-Starter Way
(http://elitedatascience.com/learn-python-for-data-science) - Our guide that
covers these resources in more detail.
A Beginner's Guide to SQL, Python, and Machine Learning
machine-learning-white-paper) - We've partnered with General Assembly to bring
you a concise overview of how these core technologies power modern business.

R / RStudio Resources:
R for Data Science (Online Book) (http://r4ds.had.co.nz/introduction.html) -
Recommended for beginners who want a complete course in data science with R.
Swirl (Interactive R Package) (http://swirlstats.com/) - Very cool R package that
you can install to learn the language directly inside RStudio (the most
common interface used to run R).
Introduction to Data Science with R (Video Series)
PbHM39cwCU0PF&index=1) - For those who learn better by watching someone
else walk through the steps.

1.2. Statistics and Probability

A strong statistics foundation helps you fully understand machine learning,
conditional probability, A/B testing, and many other core skills. It also helps you "think
like a data scientist," which includes spotting biases, efficiently iterating on predictive
models, and knowing how to extract insights from data.

Plus, learning the common probability distributions (especially Gaussian,
Binomial, Uniform, Exponential, Poisson) is critical for implementing many real-world
applications, such as multi-armed bandits, market-basket analyses, and anomaly
detection programs.
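
To make those five distributions a little more concrete, here is a minimal sketch that draws samples from each one using only Python's standard library and compares the sample means to the textbook means. The parameter values, the seed, and the helper names (sample_binomial, sample_poisson, sample_means) are our own illustrative choices, and the Poisson sampler uses Knuth's multiplication method, which is only practical for small rates.

```python
import math
import random
from statistics import mean


def sample_binomial(rng, n, p):
    """Count successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_poisson(rng, lam):
    """Knuth's multiplication method (assumes a small rate `lam`)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_means(seed=0, size=20_000):
    """Draw `size` samples per distribution and return each sample mean."""
    rng = random.Random(seed)
    draws = {
        "gaussian": [rng.gauss(0.0, 1.0) for _ in range(size)],
        "binomial": [sample_binomial(rng, 10, 0.3) for _ in range(size)],
        "uniform": [rng.uniform(0.0, 1.0) for _ in range(size)],
        "exponential": [rng.expovariate(2.0) for _ in range(size)],
        "poisson": [sample_poisson(rng, 4.0) for _ in range(size)],
    }
    return {name: mean(values) for name, values in draws.items()}


# Textbook means for the parameters chosen above.
THEORETICAL_MEANS = {
    "gaussian": 0.0,        # mu
    "binomial": 10 * 0.3,   # n * p
    "uniform": 0.5,         # (a + b) / 2
    "exponential": 1 / 2.0, # 1 / rate
    "poisson": 4.0,         # lambda
}

if __name__ == "__main__":
    for name, observed in sample_means().items():
        print(f"{name:12s} sample mean {observed:.3f} vs. theory {THEORETICAL_MEANS[name]:.3f}")
```

With a large enough sample, each observed mean should land close to its theoretical value, which is a quick sanity check you can reuse when you meet a new distribution.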

Statistics and Probability (Khan Academy)
(https://www.khanacademy.org/math/statistics-probability) -
Practical introduction to statistics and probability from Khan Academy.
Recommended for getting up to speed quickly.
Harvard Stats 110: Probability (Video Series)
(https://www.youtube.com/watch?v=KbB0FjPg0mw) - Rigorous treatment of probability
theory from Harvard. Recommended for building deeper mastery.
Think Stats: Probability and Statistics for Programmers (PDF)
(http://greenteapress.com/thinkstats/thinkstats.pdf) - Excellent resource for
those with programming backgrounds. Quote: "The thesis of this book is that if
you know how to program, you can use that skill to help you understand
probability and statistics."
Crash Course on Basic Statistics (PDF)
(http://cbmm.mit.edu/sites/default/files/documents/probability_handout.pdf) -
Short PDF that covers a whirlwind review of key topics. We like this review sheet
because it has simple intuitive explanations for each concept.
How to Learn Statistics for Data Science, The Self-Starter Way
(http://elitedatascience.com/learn-statistics-for-data-science) - Our guide that
covers these resources in more detail.

2. Technical Skills

Data science is all about converting raw data into insights, predictions, software, and
so on. Therefore, you'll need to be comfortable working with data.

Core technical skills include collecting, cleaning, managing, and visualizing data, plus
the big umbrella of applied machine learning.

2.1. Data Collection

Everything hinges on the quality and quantity of your data. Just as a chemist needs
the right chemicals, you'll need relevant data.

There are 4 common ways to collect data:

1. Internal Data. This is proprietary data that your company collects through its
operations or through partnerships with other providers. This is usually
the most relevant data.
2. Searching Online. Need a labeled set of 8 million videos? There's a webpage for
that... (https://research.google.com/youtube8m/) Seriously, you'd be surprised at
what you can find out there. Online datasets allow you to prototype
before investing in proprietary data.
3. APIs. APIs allow you to programmatically (and legally) access datasets that other
companies collect. You can find anything from Twitter feeds to weather data to
financial data.
4. Web Scraping. Web crawling and scraping is a powerful tool that you must use
responsibly. It opens a whole new world, but make sure to respect terms of

API Resources:

Python: requests Quickstart Guide (Tutorial)
(http://docs.python-requests.org/en/master/user/quickstart/) - How to use the
requests library to request data from APIs.
R: httr Quickstart Guide (Tutorial)
(https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html) - How to
use the httr library to request data from APIs.
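
If you'd like to see the general shape of an API call before working through those guides, here is a minimal sketch. It uses Python's standard library instead of requests so that it runs without installing anything, and the endpoint (api.example.com), its parameters, and the canned JSON reply are all hypothetical placeholders. The fake_opener function stands in for the network so the example works offline.

```python
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, opener=urlopen):
    """Open the URL and decode its JSON body.

    `opener` defaults to urllib's urlopen; passing a stand-in makes the
    function usable without network access.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    """Stand-in for the network: returns a canned (made-up) JSON payload."""
    return io.BytesIO(b'{"city": "Boston", "temp_c": 21}')


# Hypothetical endpoint and parameters, for illustration only.
url = build_url("https://api.example.com/v1/weather", {"city": "Boston", "units": "metric"})
data = fetch_json(url, opener=fake_opener)
print(url)
print(data["temp_c"])
```

Against a real API you would drop the opener argument, and you would usually also need an API key and some error handling, both of which the provider's documentation will describe.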

Web Scraping Resources:

R: rvest (Tutorial) (https://rpubs.com/Radcliffe/superbowl) - Basic web scraping
with the rvest library.
Python Web Scraping Libraries
(http://elitedatascience.com/python-web-scraping-libraries) - Our overview of the
Python web scraping landscape.
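
To show what "scraping" means at its simplest, here is a small sketch that pulls link text and URLs out of an HTML snippet with the standard library's html.parser module. The HTML sample and the LinkCollector and extract_links names are made up for illustration. Dedicated scraping libraries offer far more convenience, so treat this as a view of the core idea and not as a recommended toolset.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


def extract_links(html):
    """Parse an HTML string and return its links in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page fragment, for illustration only.
SAMPLE_HTML = """
<ul>
  <li><a href="/datasets/weather">Weather data</a></li>
  <li><a href="/datasets/finance">Financial data</a></li>
  <li>No link here</li>
</ul>
"""

if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```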

2.2. SQL

SQL is the lingua franca for database management and querying, and you should be
able to write complex queries.

Learning SQL also gives a better understanding of relational data in general (i.e. data
in "table" format), which will improve your data analysis skills in any language.

Intro to SQL by Khan Academy (Course)
(https://www.khanacademy.org/computing/computer-programming/sql) -
Comprehensive video series that covers every important SQL topic.
sqlcourse.com (Interactive Tutorial) (http://www.sqlcourse.com/) - Great to
use for review or as a quick crash course.
SQL Fundamentals (Course) (https://www.sololearn.com/Course/SQL/) - Course
that covers the basics of SQL. Includes quizzes along the way to test your
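
As a taste of what a slightly more complex query looks like, here is a minimal sketch that joins two tables, aggregates, and sorts. It runs against an in-memory SQLite database through Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are invented for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders (customer_id, amount) VALUES
            (1, 30.0), (1, 20.0), (2, 75.0), (3, 10.0);
        """
    )
    return conn


def top_customers(conn, limit=2):
    """Return the `limit` customers with the highest total order amount."""
    query = """
        SELECT c.name, SUM(o.amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_spent DESC
        LIMIT ?
    """
    # The ? placeholder lets sqlite3 bind the value safely.
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    for name, total in top_customers(build_demo_db()):
        print(name, total)
```

The same JOIN, GROUP BY, and ORDER BY pattern carries over to most relational databases, although details such as placeholder syntax vary by driver.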

2.3. Data Visualization

Data visualization is important for exploratory analysis and for communicating your
insights, and no list of data science resources would be complete without this topic.

Raw data can be difficult to interpret, so you'll need to investigate trends and
distributions with plots and charts.

Data Visualization in Python (Video Series)
(https://www.youtube.com/watch?v=q7Bo_J8x_dw&list=PLQVvvaa0QuDfefDfXb9Yf0la1fPDKluPF) -
Tutorial on using the matplotlib library in Python.
Data Visualization in R (Video Series) (https://www.youtube.com/watch?
Tutorial on using the ggplot library in R.
Python Seaborn Tutorial (http://elitedatascience.com/python-seaborn-tutorial) -
Our tutorial for the seaborn library in Python, which we strongly recommend for

2.4. Applied Machine Learning

Machine learning is a broad umbrella term that contains many sub-tasks. In a
nutshell, it's about teaching computers how to learn patterns and models from data.

To some people, machine learning is synonymous with data science, but we consider
it a separate field that heavily overlaps with data science. There's no doubt that
machine learning is a powerful toolset, and it's the meatiest skill on this list.

Machine Learning by Andrew Ng (Video Series)
A4rycgrgOYma6zxF4BZGGPW&index=1) - This is the gold standard when it comes
to courses on the theory behind machine learning.
Elements of Statistical Learning (PDF)
(http://statweb.stanford.edu/~tibs/ElemStatLearn/) - Reference text. This is one
of the classic textbooks of the industry, but it requires a solid math background.
An Introduction to Statistical Learning in R (PDF)
(http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf) - Reference text.
Another classic textbook that has gentler math requirements.
How to Learn Machine Learning, the Self-Starter Way
(http://elitedatascience.com/learn-machine-learning) - Our beginner-friendly
overview of the machine learning landscape.
Data Science Primer (http://elitedatascience.com/primer) - Our free mini-course
on the data science and machine learning workflow.
Modern Machine Learning Algorithms: Strengths and Weaknesses
(http://elitedatascience.com/machine-learning-algorithms) - Our concise tour of
machine learning algorithms.
Python Machine Learning Tutorial
(http://elitedatascience.com/python-machine-learning-tutorial-scikit-learn) - Our
end-to-end tutorial for training your first model using Python's Scikit-Learn library.
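
To illustrate what "learning a model from data" means in the simplest case, here is a from-scratch sketch of one-variable linear regression fit by ordinary least squares, evaluated on held-out points. The numbers are invented for illustration, and the function names are our own. In practice you would reach for a library such as Scikit-Learn, but the fit / predict / evaluate loop looks much the same.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared gap between predictions and true values."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: hours studied vs. exam score.
TRAIN_X, TRAIN_Y = [1, 2, 3, 4, 5], [52, 58, 65, 70, 77]
TEST_X, TEST_Y = [6, 7], [82, 90]

if __name__ == "__main__":
    model = fit_line(TRAIN_X, TRAIN_Y)
    print("slope, intercept:", model)
    print("held-out MSE:", mean_squared_error(model, TEST_X, TEST_Y))
```

Measuring error on points the model never saw during fitting is the habit to take away here, since error on the training data alone tends to look better than it should.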

3. Business Skills

Business skills and soft skills are sometimes overlooked in data science
curricula, but they are supremely important, and employers will look out for them.

Data science is never performed in a vacuum. You'll need to anticipate business
needs, think creatively about solutions, and communicate your insights clearly.

As machine learning libraries mature and algorithms become easier to use
"out-of-the-box," businesses will value people who can work with data and work with people.
This section of our list of data science resources will help you stand out.

3.1. Communication

If a tree falls in a forest but no one is around to hear it, does it make a sound? If data
is analyzed but no one can explain the results, does it really matter?

Effective communication skills are universal, but data scientists have the
added challenge of discussing highly technical or mathematical topics.

During data scientist interviews, you'll often be asked to "explain a technical concept
to a layperson" or "describe a previous project you've worked on." Employers will
specifically look for clarity, conciseness, and organization.

The best stats you've ever seen (TED Talk)

en) - This is an iconic TED talk and a fun display of storytelling with data.
Think Fast, Talk Smart (Video) (https://www.youtube.com/watch?v=HAnw168huqA) -
This is a workshop at the Stanford Graduate School of
Business on how to overcome anxiety and speak spontaneously. Not only will this
help you for the rest of your career, but it will also allow you to stand out
during your interview.
7 Tips for Improving Communication (Video)
(https://www.youtube.com/watch?v=mPRUNGGORDo) - Simple, practical tips on how to
communicate effectively on a daily basis.
How to Win Friends and Influence People (PDF)
(http://images.kw.com/docs/2/1/2/212345/1285134779158_htwfaip.pdf), (Free
Audiobook Version)
(https://soundcloud.com/larry-amos-jr/sets/dale-carnegie-how-to-win) - This is a
book we'd recommend for anyone, data scientist or not.
While some of the verbiage is a bit dated, the teachings about interpersonal
relationships are timeless.
Practice teaching a technical concept to a friend - This will help you solidify your
understanding of the concept while getting valuable communication practice. Try
explaining an interesting machine learning algorithm, including its strengths,
weaknesses, and proper use cases.
Practice describing projects that you've completed - This will help you practice
organizing the many moving parts of data science into coherent narratives.

3.2. Creativity & Innovation

Data scientists are hired to build new products, perform complex analyses, and
invent valuable ways to use data.

In fact, they rarely solve the same problem twice. Even if you can apply the same
methods to an adjacent dataset, you'll need to be creative about feature engineering,
supplemental data, and business implications.

You'll naturally become a better creative thinker as you gain more experience, but the
following resources can help jumpstart your problem-solving and innovation skills.

Machine Intelligence and Data Products (Video)
(https://www.youtube.com/watch?v=SxxqaC5hf04) - Future-looking discussion of
data products and data science.
Machine Intelligence Landscape (Chart) (http://www.shivonzilis.com/) - Venture
capitalist's perspective on the landscape of machine intelligence applications.
The art of innovation (TED Talk)
(https://www.youtube.com/watch?v=Mtjatz9r-Vc) - Great TED talk on innovation by
Guy Kawasaki.
7 steps of creative thinking (TED Talk)
(https://www.youtube.com/watch?v=MRD-4Tz60KE) - Creative thinking tips from the
perspective of a serial artist and
Working backwards to solve a problem (TED Talk)
(https://www.youtube.com/watch?v=v34NqCbAA1c) - Chess grandmaster
Maurice Ashley on how to see the endgame and work backwards.

3.3. Business Operations and Strategy

Here's a question you should ask yourself every day: "What are some ways I can
improve this business?"

At the end of the day, companies don't hire you to analyze data... they hire you to help
them grow or become more profitable. This means that you should understand how
data can help make better decisions and build better products.

Data Driven Decisions (Video) (https://www.youtube.com/watch?v=trbOW1TDOao) -
How to take business objectives, extract testable hypotheses from them, and then
design experiments to evaluate them.
How to be data driven and build great products by DJ Patil (Video)
(https://www.youtube.com/watch?v=54t7bSXniAs) - Lecture by DJ Patil before he
became Chief Data Scientist of the USA.
Big Data: New Tricks for Econometrics by Hal Varian (PDF)
(http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf) - Hal Varian, Chief
Economist at Google, gives an excellent overview of the technology and
methodology landscape for data analysis.
How data will transform business (TED Talk)
(https://www.ted.com/talks/philip_evans_how_data_will_transform_business) -
Thought-provoking discussion of the relationship between business strategy and
technology. Explains why the two long-standing theories of business strategy
have become invalidated by the rise of big data.
Victor Cheng's Case Interview Workshop (Video Series)
v=fBwUxnTpTBo&index=1&list=PL8b_fmdDHHyCznYmSeWJrdrJN4UJhUrsh) -
Some employers like to ask consulting-style "case" questions during the interview.
This is more common for Data Scientists in business operations, strategy, or
analytics roles. This is an excellent crash course on tackling case interviews.

3.4. Business Analytics

Business analytic skills are critical for data scientists in operational roles. Python and
R will allow you to perform more complex analyses than Excel can, thanks to the
flexibility of programming languages.

After you master the technical tools, building strong domain knowledge will lead to
greater business impact.

Introduction to Business Analytics (Video)
(https://www.youtube.com/watch?v=9IIgH0hNtgk) - Short and sweet intro to how
businesses use analytics, including case studies.
Marketing Metrics and Analytics (Video)
(https://www.youtube.com/watch?v=IW-L7LTFl7A) - Introduction to common metrics
and analytics methods used in
Effective Cross-Selling using Market Basket Analysis (Tutorial)
basket-analysis/) - How to do smarter cross-selling.
An Intuitive Guide to A/B Testing (Video)
(https://www.youtube.com/watch?v=Auu9AnCozWQ) - Overview of A/B testing and
interpretation.
25 Examples of Business KPIs (Examples)
(https://www.klipfolio.com/resources/articles/what-are-business-metrics#gref) -
"What gets measured gets managed." Here are 25 examples of business Key
Performance Indicators (KPIs).
Analytics Academy by Google (Courses)
(https://analyticsacademy.withgoogle.com/) - Practical courses on digital
analytics, e-commerce analytics, and other topics.
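
Since A/B testing comes up in the list above, here is a minimal sketch of how a result might be interpreted numerically. It uses a pooled two-proportion z-test, which is one common approach and not necessarily the one used in the linked video. The visitor and conversion counts are made up, and the function name is our own.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Pooled two-proportion z-test for a difference in conversion rates.

    Returns the z statistic (positive when B converts better than A) and
    the two-sided p-value under the normal approximation.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Made-up experiment: variant A converts 200 of 1000, variant B 250 of 1000.
    z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
    print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be pure chance, but the normal approximation assumes reasonably large samples, and the test says nothing about whether the lift is big enough to matter to the business.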

4. Supplementary Skills

Supplementary skills are more situational depending on the role, but they help you
become a well-rounded data scientist. Here are data science resources for NLP,
recommender systems, and time series analysis.

4.1. Natural Language Processing (NLP)

Natural Language Processing (NLP), or Text Mining, is an exciting sub-field of
machine learning for extracting structure, grammar, and insights from text.
Famous applications include Sentiment Analysis, Article Classification, and even
teaching a Neural Network to write Shakespeare.

Stanford NLP (Video Series)
(https://www.youtube.com/watch?v=nfoudtpBV68&list=PLiNErZ5Bus8qNxNsFZFkh-9_CzZRW9iH9) -
Full course on "traditional" Natural Language Processing, including sentiment
analysis, Naive Bayes models, n-grams, etc.
CS224D: Deep Learning for Natural Language Processing (Course)
(Course materials here) (http://cs224d.stanford.edu/syllabus.html) - Introduction
to the theory behind deep learning for NLP.
Python NLP Libraries (http://elitedatascience.com/python-nlp-libraries) - Our
overview of Python libraries for NLP. Once you have basic programming skills and
a solid understanding of applied machine learning, you can actually jump straight
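
As a small taste of the n-grams mentioned in the resources above, here is a minimal sketch that tokenizes a sentence and counts its bigrams with the standard library. The regular-expression tokenizer is a deliberately crude assumption (lowercase letters and apostrophes only), and real NLP libraries handle tokenization far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(text, n):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


if __name__ == "__main__":
    counts = ngram_counts("To be, or not to be", 2)
    for bigram, count in counts.most_common():
        print(" ".join(bigram), count)
```

Counts like these are the raw material for many "traditional" text models, where word or n-gram frequencies serve as input features.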

4.2. Recommendation Systems

Recommendation Systems, or Collaborative Filters, are one of the great success
stories of data science, especially in e-Commerce.

They power many amazing websites and apps, including Amazon, Yelp, Netflix, and
Spotify. In a nutshell, recommendation systems find other users who have similar
tastes to you to make better recommendations for you. This produces a huge win-win
by improving user experience while driving up revenue.

Recommendation engine tutorial (Video Series)

0vSQg&list=PLseNcwx1RJ4WdgtrMTXndw4B4nlf4-pgS) - Introduction to
collaborative filters using Python. Does a very nice job of explaining the intuition
behind the algorithm.
Recommender Systems (Video Series)
(https://www.youtube.com/watch?v=gnlq-1Zjh2M&list=PLnnr1O8OWc6ZYcnoNWQignIiP5RRtu3aS) -
Discussion of the theory and math behind collaborative filters by Andrew Ng. More
math-heavy, and it'll be easier to follow if you have some background with Linear Algebra.
Collaborative Filtering with Python (Tutorial)
(http://www.salemmarafi.com/code/collaborative-filtering-with-python/) -
Reference tutorial that implements a music recommender system in Python.
Collaborative Filtering with R (Tutorial)
(http://www.salemmarafi.com/code/collaborative-filtering-r/) - The same tutorial
as the previous one, except in R.
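
The "find users with similar tastes" idea can be sketched in a few lines. The example below is a minimal user-based collaborative filter that scores unseen items by a similarity-weighted average of other users' ratings, using cosine similarity. The users, movies, and ratings are made up, and this is only one of several ways to build such a system, not the method used in the tutorials above.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(ratings_a) & set(ratings_b)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    if not shared or norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(ratings_a[item] * ratings_b[item] for item in shared)
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
                weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Made-up ratings on a 1-5 scale.
RATINGS = {
    "ana": {"Alien": 5, "Heat": 4},
    "ben": {"Alien": 5, "Heat": 4, "Blade Runner": 5, "Notting Hill": 1},
    "cat": {"Alien": 1, "Heat": 2, "Blade Runner": 2, "Notting Hill": 5},
}

if __name__ == "__main__":
    print(recommend(RATINGS, "ana", top_n=2))
```

Because ana's ratings look much more like ben's than cat's, ben's favorites carry more weight in her predicted scores.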

4.3. Time Series Analysis

Time Series Analysis deals with data series that are indexed by time. For example,
stock prices, precipitation amounts, and Twitter hashtags by hour would all be
considered time series. Time series analysis is commonly used in Finance,
Forecasting, and Econometrics.

While much of machine learning deals with "cross-sectional data" (data without
regard to differences in time), there are also models specifically designed to
handle time series.

Time Series (Course Material) (http://stat565.cwick.co.nz/) - Lecture slides,
homework, and R Code for the Time Series course at Oregon State University.
The Little Book of R for Time Series (Online Book)
(http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html) -
Very practical step-by-step introduction to using R for time series analysis.
Includes code and outputs for each step.
Time Series Forecasting with Python (Tutorial)
python/) - Tutorial on performing time series visualization, analysis, and
forecasting with Python.
Seasonal ARIMA with Python (Tutorial)
python/) - Introduction to ARIMA models in Python. Includes all code.
Statistical forecasting, Fuqua School of Business (Online Book)
(http://people.duke.edu/~rnau/411home.htm) - Course notes from the statistical
forecasting course taught at the Fuqua School of Business at Duke University.
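
To make the difference from cross-sectional data concrete, here is a minimal sketch of three basic time series operations that depend on the order of observations: a moving average, lagged pairs, and a naive forecast. The hourly counts are made up, and the function names are our own. The ARIMA-style models covered in the resources above are considerably more sophisticated.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive observations, in time order."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[end - window:end]) / window
        for end in range(window, len(series) + 1)
    ]


def lagged_pairs(series, lag=1):
    """Pair each observation with the one that follows it `lag` steps later."""
    if not 1 <= lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return list(zip(series[:-lag], series[lag:]))


def naive_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return moving_average(series, window)[-1]


if __name__ == "__main__":
    # Made-up hourly hashtag counts.
    hourly_mentions = [12, 15, 11, 18, 21, 19, 25]
    print("smoothed:", moving_average(hourly_mentions, 3))
    print("lag-1 pairs:", lagged_pairs(hourly_mentions)[:3])
    print("next-hour forecast:", naive_forecast(hourly_mentions, 3))
```

Shuffling the rows of a cross-sectional dataset changes nothing, while shuffling this list would change every one of these outputs, which is why time series call for their own models.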

5. Practice

Practice projects have two main purposes:

1. They help you solidify concepts and practice pulling together all the moving
pieces of data science.
2. They arm you with something tangible to show employers. If a picture is worth
1000 words, a project is worth a million...

By nature, projects are personal undertakings, and you should pick topics you're
interested in. Here are a few places to find project ideas:

6 Fun Machine Learning Projects for Beginners
(https://elitedatascience.com/machine-learning-projects-for-beginners) - Our list
of 6 fun machine learning project ideas for beginners.
Predict Titanic Survival (Kaggle Competition) (https://www.kaggle.com/c/titanic) -
Kaggle is a site that hosts data science competitions, many of which are
beginner-friendly. The Titanic Survival Prediction challenge is a classic, with detailed
tutorials for both Python and R.
Hacker Rank (Programming Challenges)
(https://www.hackerrank.com/domains/ai/machine-learning) - Short
programming challenges that are good for sharpening your skills without
committing to a longer project.

And that's a wrap! To jumpstart your journey ahead, please check out our Data
Science Primer (http://elitedatascience.com/primer).

Copyright © 2016-2018 · EliteDataScience.com · All Rights Reserved · Terms (https://elitedatascience.com/terms-of-service) · Privacy (https://elitedatascience.com/privacy-policy)