up. Everything else you can teach yourself on the spot. But here is a giant list roughly in
order of increasing complexity.
Coding. Be a master of Python and/or R. There are other options but these two are
ubiquitous nowadays.
Know Thy Distributions. You should have a good intuition of what distribution is used
for what. Given some data, you should be able to do something like this for many scenarios:
Q: Is my data well-modeled by a Pareto?
A: No, the empirical histogram is not monotonically decreasing.
Q: A Gaussian of course!
A: Nope, there aren't any negative values.
Q: How about the Exponential?
A: No, there are no zeros.
Q: OK, uh, the von Mises?
A: Don't be silly, I'm pretty sure this data doesn't reside on the surface of a circle...
Q: The log-normal!
A: That sounds good. Better plot it and see...
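
The back-and-forth above can be roughed out in code. This is only a sketch: `quick_checks` and `sample_skewness` are hypothetical helper names, the data is synthetic, and the checks are crude screens, not formal goodness-of-fit tests.

```python
import numpy as np

def sample_skewness(values):
    """Third standardized moment; roughly 0 for symmetric data."""
    centered = values - values.mean()
    return float((centered ** 3).mean() / values.std() ** 3)

def quick_checks(data, bins=30):
    """Hypothetical helper: cheap checks that echo the dialogue above."""
    data = np.asarray(data, dtype=float)
    counts, _ = np.histogram(data, bins=bins)
    all_positive = bool((data > 0).all())
    return {
        # Negative values argue against positive-only families.
        "has_negatives": bool((data < 0).any()),
        "has_zeros": bool((data == 0).any()),
        # A Pareto-like shape should never rise from one bin to the next.
        "histogram_monotone_decreasing": bool(np.all(np.diff(counts) <= 0)),
        # A log-normal candidate should look roughly symmetric after a log.
        "log_skewness": sample_skewness(np.log(data)) if all_positive else None,
    }

# Synthetic stand-in for "my data" (an assumption for illustration only).
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=5000)
report = quick_checks(data)
```

None of this replaces actually plotting the histogram; it just makes the first few questions quick to answer.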
Fitting. Once you've got your distributions down, you should know how to fit them to data
in slick ways. Start with maximum likelihood and go from there.
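
As a small illustration of maximum likelihood, here is a sketch for the log-normal case, where the estimates have a closed form (the mean and standard deviation of the logged data). The function names and the synthetic sample are assumptions made for the example.

```python
import numpy as np

def lognormal_loglik(x, mu, sigma):
    """Log-likelihood of a log-normal with log-scale parameters mu, sigma."""
    log_x = np.log(x)
    return float(np.sum(
        -log_x
        - np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi)
        - (log_x - mu) ** 2 / (2.0 * sigma ** 2)
    ))

def fit_lognormal_mle(x):
    """Closed-form MLE: mean and (ddof=0) standard deviation of log(x)."""
    log_x = np.log(x)
    return float(log_x.mean()), float(log_x.std())

# Synthetic sample with known parameters (illustrative assumption).
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=5000)

mu_hat, sigma_hat = fit_lognormal_mle(sample)
best_loglik = lognormal_loglik(sample, mu_hat, sigma_hat)
```

For distributions without a closed form, the same log-likelihood function would typically be handed to a numerical optimizer instead.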
Classical hypothesis testing. I think p-values and frequentist hypothesis testing in
general are really hard to explain & hard to understand (failing to reject null hypotheses &c),
but both are still ubiquitous.
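
To make the p-value idea a bit more concrete, here is a sketch of a two-sample permutation test. It is swapped in here (instead of, say, a t-test) because it needs only NumPy and shows the logic directly: shuffle the group labels many times and see how often the shuffled difference is at least as large as the observed one. The function name and the synthetic groups are assumptions.

```python
import numpy as np

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (count + 1) / (n_perm + 1)

# Two synthetic groups whose true means differ by 0.5 (illustrative).
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, 200)
group_b = rng.normal(0.5, 1.0, 200)
p_value = permutation_test(group_a, group_b)
```

A small p-value here says "a difference this large would be rare if the labels did not matter," which is not the same as "the null hypothesis is false with high probability."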
Markov chains + bells + whistles.
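
For a first taste, a minimal sketch: a two-state chain and its stationary distribution, found by repeatedly applying the transition matrix. The matrix values and the function name are made up for illustration.

```python
import numpy as np

# Rows are the current state, columns the next state; each row sums to 1.
P = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
])

def stationary_distribution(P, n_steps=1000):
    """Power iteration: push a starting distribution through P repeatedly."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_steps):
        pi = pi @ P
    return pi

pi = stationary_distribution(P)
```

For this particular matrix the result satisfies `pi @ P == pi` up to rounding, which is the defining property of a stationary distribution.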
Basic Bayesian thinking & modeling. Learn to think of everything as a probability
distribution instead of just a single value (if appropriate). Be able to assemble the models &
compute with them.
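
One small example of treating a quantity as a distribution, not a single value: a Beta-Binomial update for an unknown success rate. The flat Beta(1, 1) prior and the 7-out-of-10 data are assumptions chosen for illustration.

```python
import numpy as np

# Prior belief about a success probability: Beta(1, 1), i.e. flat.
prior_a, prior_b = 1.0, 1.0

# Hypothetical observed data: 7 successes in 10 trials.
successes, trials = 7, 10

# Conjugate update: add successes to a, failures to b.
post_a = prior_a + successes
post_b = prior_b + (trials - successes)
post_mean = post_a / (post_a + post_b)

# Approximate a 95% credible interval by sampling from the posterior.
rng = np.random.default_rng(2)
draws = rng.beta(post_a, post_b, size=100_000)
lo, hi = np.quantile(draws, [0.025, 0.975])
```

The point estimate 7/10 is still in there, but the interval makes it plain how little ten trials actually pin down.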
Some old-school stats and probability theory. E.g. "Random variables;
transformations, conditional expectation, moment generating functions, convergence, limit
theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation,
sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Some decision theory."
Regression! First linear, then non-linear. (Gasp!)
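
A sketch of both steps on synthetic data: a straight-line fit with `np.polyfit`, then a non-linear fit of y = a * exp(b * x) using a few hand-rolled Gauss-Newton iterations. The model, the noise level, and the name `fit_exponential` are assumptions for the example; in practice a library optimizer would usually do this job.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 2.0, 50)

# Linear: y = 3x + 1 plus noise, fitted by ordinary least squares.
y_lin = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
slope, intercept = np.polyfit(x, y_lin, 1)

# Non-linear: y = a * exp(b * x) plus noise.
y_exp = 2.0 * np.exp(1.3 * x) + rng.normal(0.0, 0.1, x.size)

def fit_exponential(x, y, n_iter=20):
    """Gauss-Newton for y = a * exp(b * x); assumes all y > 0."""
    # Starting point from a straight-line fit to log(y).
    b, log_a = np.polyfit(x, np.log(y), 1)
    a = np.exp(log_a)
    for _ in range(n_iter):
        pred = a * np.exp(b * x)
        # Jacobian of the prediction with respect to (a, b).
        jac = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step, *_ = np.linalg.lstsq(jac, y - pred, rcond=None)
        a, b = a + step[0], b + step[1]
    return float(a), float(b)

a_hat, b_hat = fit_exponential(x, y_exp)
```

The log-linear starting point matters: Gauss-Newton can wander off if it starts far from a sensible answer.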
Machine learning. I know you said "statistics," but really if you want to be a "data
scientist" then machine learning will be an amazingly versatile & useful toolbelt for you.
Also, machine learning is broad, so maybe that could be another Quora question. =)
Writing. Communicate your ideas clearly, succinctly, & compellingly.
There are 9 courses in it. They comprehensively cover all the important concepts and topics
under data science.
http://www.datasciencecentral.com/profiles/blogs/data-science-without-statistics-is-possible-even-desirable