
In general, I recommend that you be able to (a) think in math and (b) code those thoughts
up. Everything else you can teach yourself on the spot. But here is a giant list, roughly in
order of increasing complexity.
Coding. Be a master of Python and/or R. There are other options but these two are
ubiquitous nowadays.
Know Thy Distributions. You should have a good intuition of what distribution is used
for what. Given some data, you should be able to do something like this for many scenarios:
Q: Is my data well-modeled by a Pareto?
A: No, the empirical histogram is not monotonically decreasing.
Q: A Gaussian of course!
A: Nope, there aren't any negative values.
Q: How about the Exponential?
A: No, there are no zeros.
Q: OK, uh, the von Mises?
A: Don't be silly, I'm pretty sure this data doesn't reside on the surface of a circle...
Q: The log-normal!
A: That sounds good. Better plot it and see...
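The back-and-forth above can be roughed out in code. This is only a sketch of those eyeball checks, not a formal goodness-of-fit test; the function names, the bin count, and the sample data are all made up for illustration.

```python
def histogram_counts(data, bins=10):
    """Count how many points fall into each of `bins` equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for x in data:
        idx = min(int((x - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    return counts


def quick_checks(data, bins=10):
    """Heuristic checks mirroring the Q&A above (rules of thumb, not tests)."""
    counts = histogram_counts(data, bins)
    return {
        # Pareto: the histogram should fall off monotonically.
        "histogram_monotone_decreasing": all(a >= b for a, b in zip(counts, counts[1:])),
        # Gaussian: the dialogue expects to see some negative values.
        "has_negatives": min(data) < 0,
        # Exponential: the dialogue expects to see zeros.
        "has_zeros": any(x == 0 for x in data),
        # Log-normal: only defined for strictly positive values.
        "all_positive": min(data) > 0,
    }


# Hypothetical data, for illustration only.
data = [1.0, 2.0, 2.5, 3.0, 3.2, 4.0, 9.0]
checks = quick_checks(data, bins=4)
for name, value in checks.items():
    print(f"{name}: {value}")
```

If only "all_positive" comes back true, the log-normal survives as a candidate, and the next step is still to plot it and see.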
Fitting. Once you've got your distributions down, you should know how to fit them to data
in slick ways. Start with maximum likelihood and go from there.
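As a small example of maximum likelihood, here is a sketch for the log-normal, where the estimates have a closed form: take logs, then use the sample mean and the (biased, divide-by-n) standard deviation of the logged data. The function names and the simulated sample are assumptions for illustration; for distributions without a closed form you would maximize the log-likelihood numerically instead.

```python
import math
import random


def fit_lognormal_mle(data):
    """Closed-form MLE of (mu, sigma) for strictly positive data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE divides by n, not n - 1
    return mu, sigma


def lognormal_loglik(data, mu, sigma):
    """Log-likelihood of the data under a log-normal with parameters (mu, sigma)."""
    return sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )


# Simulated data with known parameters, so the fit can be sanity-checked.
rng = random.Random(0)
sample = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal_mle(sample)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
print(f"log-likelihood at the fit: {lognormal_loglik(sample, mu_hat, sigma_hat):.1f}")
```

The estimates should land close to the true values of 1.0 and 0.5, and nudging either parameter away from the fit should lower the log-likelihood.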
Classical hypothesis testing. I think p-values and frequentist hypothesis testing in
general are really hard to explain & hard to understand (failing to reject null hypotheses &c),
but both are still ubiquitous.
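One way to make p-values less mysterious is to compute one by brute force. The sketch below uses a permutation test (my choice for illustration, not the only classical test) for a difference in means: if the null hypothesis is true, the group labels are interchangeable, so we shuffle them and count how often the shuffled difference is at least as large as the observed one. The data and names are hypothetical.

```python
import random


def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the observations at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
group_b = [4.1, 4.0, 4.4, 3.9, 4.6, 4.2, 4.3, 4.5]

p = permutation_p_value(group_a, group_b)
print(f"p-value: {p:.4f}")
```

A small p-value says that a difference this large would be rare if the labels did not matter. It does not say the null hypothesis is false with that probability, which is exactly the part that is hard to explain.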
Markov chains + bells + whistles.
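For a feel of the basics before the bells and whistles, here is a minimal sketch of a two-state Markov chain: simulate a path, then find the stationary distribution by repeatedly applying the transition matrix. The states and transition probabilities are invented for illustration.

```python
import random

# Hypothetical two-state weather chain; the probabilities are made up.
STATES = ["sunny", "rainy"]
P = [
    [0.9, 0.1],  # from sunny: stay sunny, turn rainy
    [0.5, 0.5],  # from rainy: turn sunny, stay rainy
]


def simulate(n_steps, start=0, seed=0):
    """Simulate the chain and return the list of visited state indices."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n_steps):
        state = 0 if rng.random() < P[state][0] else 1
        path.append(state)
    return path


def stationary(matrix, n_iter=200):
    """Approximate the stationary distribution by power iteration."""
    k = len(matrix)
    dist = [1.0 / k] * k
    for _ in range(n_iter):
        dist = [sum(dist[i] * matrix[i][j] for i in range(k)) for j in range(k)]
    return dist


path = simulate(10000)
empirical_sunny = path.count(0) / len(path)
pi = stationary(P)
print(f"fraction of time sunny in simulation: {empirical_sunny:.3f}")
print(f"stationary distribution: {[round(v, 4) for v in pi]}")
```

Solving the balance equations by hand for this matrix gives a stationary distribution of (5/6, 1/6), and both the long simulation and the power iteration should agree with that.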
Basic Bayesian thinking & modeling. Learn to think of everything as a probability
distribution instead of just a single value (if appropriate). Be able to assemble the models &
compute with them.
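To see what "a distribution instead of a single value" looks like in practice, here is a sketch of about the simplest Bayesian model there is: a Beta prior on a success probability, updated with binomial data. The prior and the counts are hypothetical, and the Beta-Binomial pairing is my choice because the update can be done by hand.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data (conjugate update)."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Hypothetical example: flat Beta(1, 1) prior, then 7 successes and 3 failures.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print(f"posterior: Beta{posterior}")
print(f"posterior mean: {beta_mean(*posterior):.3f}")
print(f"posterior sd:   {beta_variance(*posterior) ** 0.5:.3f}")
```

The answer is not just "the rate is about 0.67" but a whole Beta(8, 4) distribution, whose spread tells you how much ten observations have really pinned things down.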
Some old-school stats and probability theory. E.g. "Random variables;
transformations, conditional expectation, moment generating functions, convergence, limit
theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation,
sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Some decision theory."
Regression! First linear, then non-linear. (Gasp!)
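The linear starting point fits in a few lines. This is a sketch of ordinary least squares for a single predictor, using the textbook closed form (slope = covariance of x and y over variance of x); the function name and the toy data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 1 + 2x, so the fit is easy to verify by eye.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope}, intercept={intercept}")
```

With real, noisy data you would also want residual plots and standard errors, and for the non-linear case you would typically hand the problem to a numerical optimizer.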
Machine learning. I know you said "statistics," but really if you want to be a "data
scientist" then machine learning will be an amazingly versatile & useful toolbelt for you.
Also, machine learning is broad, so maybe that could be another Quora question. =)
Writing. Communicate your ideas clearly, succinctly, & compellingly.

There are 9 courses in it. They comprehensively cover all the important concepts and topics
under data science.
1. Data Science Toolbox
2. R Programming
3. Getting and Cleaning Data
4. Exploratory Data Analysis
5. Reproducible Research
6. Statistical Inference
7. Regression Models
8. Practical Machine Learning
9. Developing Data Products

http://www.datasciencecentral.com/profiles/blogs/data-science-without-statistics-ispossible-even-desirable

Introduction to Data Mining, by Tan, Steinbach, and Kumar

Basic statistical concepts: standard error, confidence interval estimation, significance values in testing
Simple and Multiple Regression
Logit and Probit Models
Time Series Decomposition Models
Smoothing Models
Box-Jenkins (ARIMA)
ARIMA with regression errors and ARIMAX
Bass Model for new product forecasting
Combining Forecasts and Forecast Evaluation
