Books
1. Lander, J. (2013). R for Everyone: Advanced Analytics and Graphics. New Jersey: Addison-Wesley.
2. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer-Verlag.
website: http://www-bcf.usc.edu/~gareth/ISL/
Internet Websites
http://www.r-bloggers.com/
https://stat.ethz.ch/mailman/listinfo/r-help
http://stackoverflow.com/questions/tagged/r
http://blog.revolutionanalytics.com/r/
http://chance.amstat.org/
http://www.statslife.org.uk/significance
Journals
The R Journal (web: https://journal.r-project.org/)
Journal of Statistical Software (web: http://www.jstatsoft.org/index)
Data Analytics
Analytics is the scientific process of transforming data into insight for making better decisions.
Analytics therefore supports data-driven, or fact-based, decision-making.
By using analytics, managers can improve their decisions over time.
Figure: analytics questions arranged by time frame, moving from Information to Insight.
Information: reporting; what is happening now? (alerts); what will happen? (extrapolation).
Insight: modeling and experimental design; recommendation; what's the best/worst that can happen? (prediction, optimization, simulation).
Davenport, T. H., Harris, J. G., & Morison, R. (2010). Analytics at Work: Smarter Decisions, Better Results. Harvard Business Review Press.
Descriptive Analytics
Descriptive analytics consists of the set of techniques that describe what
has happened in the past.
Examples: Data Queries, Reports, Descriptive Statistics, Data
Visualization, etc.
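As a minimal sketch of descriptive statistics, the Python snippet below summarizes a small list of past sales figures with the standard `statistics` module. The sales numbers and the helper name `describe` are hypothetical, chosen only for illustration; the summary describes what has already happened and makes no prediction.

```python
import statistics

# Hypothetical monthly sales figures (illustrative numbers, not from the source).
monthly_sales = [12.0, 15.5, 9.8, 20.1, 15.5, 11.2]


def describe(values):
    """Return a small numeric summary of data that has already been observed."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(monthly_sales)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```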
Predictive Analytics
Predictive analytics comprises the set of techniques that use
models constructed from past data to predict the future or to study
the impact of one variable on another.
Examples: Linear Regression, Time Series Analysis, etc.
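To make the predictive idea concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares on past data and then used to forecast an unseen value. The spend and sales numbers and the function name are hypothetical; a real analysis (for example with R's `lm`) would also report standard errors and check the fit.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


# Hypothetical past observations: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_linear_regression(spend, sales)
forecast = b0 + b1 * 6.0  # predict sales at a spend level not yet observed
print(f"sales ~ {b0:.2f} + {b1:.2f} * spend; forecast at 6.0: {forecast:.2f}")
```

The model is built only from past observations; the forecast is an extrapolation and is only as trustworthy as the linear assumption.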
Prescriptive Analytics
Prescriptive analytics recommends the best course of action to take; that is, the
output of a prescriptive analytics model is the best solution.
A common example is portfolio models in finance, which determine
the mix of investments that yields the highest expected return while
limiting exposure to risk.
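A minimal sketch of the portfolio example follows, assuming just two assets with hypothetical expected returns, a hypothetical covariance matrix, and a variance cap as the risk limit. It uses a simple grid search over the investment mix instead of the quadratic-programming solvers usually applied to this problem; the output is a recommended action (how much to put in each asset), which is what distinguishes prescriptive from predictive analytics.

```python
# Hypothetical inputs (assumptions for illustration, not from the source):
# expected annual returns and the return covariance matrix of two assets.
EXPECTED_RETURN = (0.08, 0.12)
COVARIANCE = ((0.04, 0.01),
              (0.01, 0.09))
MAX_VARIANCE = 0.04  # risk limit: portfolio variance may not exceed this


def portfolio_stats(w):
    """Expected return and variance with weight w in asset 0 and 1 - w in asset 1."""
    weights = (w, 1.0 - w)
    ret = sum(wi * ri for wi, ri in zip(weights, EXPECTED_RETURN))
    var = sum(weights[i] * weights[j] * COVARIANCE[i][j]
              for i in range(2) for j in range(2))
    return ret, var


def best_mix(steps=100):
    """Grid-search w in [0, 1] for the highest return whose variance meets the limit."""
    best = None
    for k in range(steps + 1):
        w = k / steps
        ret, var = portfolio_stats(w)
        if var <= MAX_VARIANCE and (best is None or ret > best[1]):
            best = (w, ret, var)
    return best


w, ret, var = best_mix()
print(f"Put {w:.0%} in asset 0 and {1 - w:.0%} in asset 1: "
      f"expected return {ret:.4f}, variance {var:.4f}")
```

With these made-up numbers the riskier asset has the higher return, so the search shifts as much as the risk limit allows toward it; a finer grid or a proper optimizer would locate the boundary more precisely.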
Data Set 1
Consider an Advertising data set consisting of the sales of a particular
product in different markets.
The data set also provides the advertising budgets for three different
media: TV, radio, and newspaper.
The goal is to find out which media generate the biggest boost in sales.
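One way to approach this question is a multiple linear regression of sales on the three budgets. The sketch below fits such a model with the normal equations in plain Python. The budgets are hypothetical and the sales are generated from a made-up, noise-free rule, so the fit should simply recover that rule; this is not the real Advertising data. Comparing raw coefficients is only meaningful when all budgets are in the same units, and with real, noisy data one would also examine standard errors before ranking the media.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b_1, ..., b_p] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical budgets per market (TV, radio, newspaper); NOT the real Advertising data.
budgets = [(10, 5, 2), (20, 3, 8), (15, 9, 1), (30, 2, 6), (25, 7, 9), (5, 8, 4)]
# Hypothetical, noise-free sales rule, so the fit should recover these coefficients.
sales = [3.0 + 0.05 * tv + 0.2 * radio + 0.0 * paper for tv, radio, paper in budgets]

coef = fit_ols(budgets, sales)
effects = dict(zip(("TV", "radio", "newspaper"), coef[1:]))
best_medium = max(effects, key=effects.get)
print({name: round(b, 4) for name, b in effects.items()}, "->", best_medium)
```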
Data Set 2
Consider the Default data set, which provides information about
customers on the following variables:
Whether a customer has defaulted on his/her credit card payment.
Annual income
Annual Credit Card Information
Student Status
Data Set 3
Consider a customer database containing a large number
of measurements (e.g., household income, occupation, distance from the
nearest urban area, and so forth) for a large number of people.
The goal is to perform market segmentation by identifying subgroups of
people who might be more receptive to a particular form of
advertising, or more likely to purchase a particular product.
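Market segmentation is usually framed as clustering. The slides do not name a specific method; as one common choice, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The customer measurements are hypothetical, and using the first k customers as initial centers is a simplifying assumption. In practice features on different scales (income versus distance) would normally be standardized first, and several random starts compared.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points serve as initial centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each customer joins the segment with the nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # no customer changed segment, so stop
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical customers: (household income in $1000s, distance to nearest urban area in km).
customers = [(35, 2), (80, 25), (38, 3), (85, 30), (33, 1), (78, 28)]
labels, centers = kmeans(customers, k=2)
print(labels, centers)
```

No response variable is used anywhere: the segments emerge only from similarities among the measurements.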
Supervised Learning
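Supervised learning is the setting in which both the predictor(s), $X$, and the
response, $Y$, are observed.
The main purpose is either to predict $Y$ based on $X$ or to understand the
relationship between $X$ and $Y$.
Supervised learning problems can be further divided into regression
and classification problems based on the nature of $Y$: a quantitative $Y$ gives a regression problem, and a qualitative (categorical) $Y$ gives a classification problem.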
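The distinction between regression and classification can be illustrated with k-nearest neighbours, a method chosen here for illustration rather than taken from the slides. In this minimal sketch the same neighbour search serves both tasks: a numeric $Y$ is averaged (regression) and a label $Y$ is decided by majority vote (classification). All data values are hypothetical, and `knn_predict` is an illustrative helper name.

```python
import math
from collections import Counter
from statistics import mean


def knn_predict(train_x, train_y, query, k=3):
    """Predict Y at `query` from the k nearest training points.

    Quantitative Y (numbers): regression, so average the neighbours' values.
    Qualitative Y (labels): classification, so take a majority vote.
    """
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    neighbour_ys = [y for _, y in ranked[:k]]
    if all(isinstance(y, (int, float)) for y in neighbour_ys):
        return mean(neighbour_ys)                      # regression
    return Counter(neighbour_ys).most_common(1)[0][0]  # classification


# Hypothetical observations of two predictors.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_amount = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]     # quantitative response
y_default = ["No", "No", "No", "Yes", "Yes", "Yes"]  # qualitative response

print(knn_predict(X, y_amount, (2, 2)), knn_predict(X, y_default, (2, 2)))
print(knn_predict(X, y_amount, (8, 8)), knn_predict(X, y_default, (8, 8)))
```

An odd k avoids ties in the two-class vote; with more classes, `Counter.most_common` breaks ties by first occurrence, which may not be what an analyst wants.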
Unsupervised Learning
A set of statistical tools intended for the setting in which we have only
a set of features $X_1, X_2, \ldots, X_p$ measured on $n$ observations.
We are not interested in prediction, because we do not have an
associated response variable $Y$.
The goal is to discover interesting things about the measurements on
$X_1, X_2, \ldots, X_p$.
Is there an informative way to visualize the data?
Can we discover subgroups among the variables or among the
observations?
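For the visualization question, one widely used unsupervised tool is principal component analysis (PCA), which the slides do not name. This minimal sketch finds the first principal component, the direction of greatest variance, by power iteration on the sample covariance matrix, and returns each observation's score along it. The data are hypothetical, and the all-ones starting vector is an assumption: it must not be orthogonal to the leading eigenvector, and the data must not be constant.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix via power iteration,
    plus each observation's score (its coordinate along that direction)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # assumed starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so v stays a unit vector
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical measurements on n = 4 observations of p = 2 features.
data = [(2, 1), (4, 3), (6, 4), (8, 8)]
direction, scores = first_principal_component(data)
print(direction, scores)
```

With many features, plotting the scores on the first two components gives a two-dimensional view of the observations; here a single component is enough to show the idea.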