
The BIG Picture

Stat 133 by Gaston Sanchez


Creative Commons Attribution Share-Alike 4.0 International CC BY-SA
Concepts in Computing with Data?

Computing with data refers to
activities in which data is acquired,
managed, and processed for a great
variety of purposes: organization,
visualization, summaries, analysis,
etc.
John Chambers

Computing with Data (CwD)

CwD means everything and nothing at the same time
Data Analysis
Data Manipulation
Introduction to Programming

Concepts in Computing with Data?

Computational
Data Analysis

My vision of the Data Analysis Cycle

Data Preparation
Actual Analysis
Reporting

Data Analysis is the process
by which data becomes
understanding, knowledge
and insight
Hadley Wickham

http://www.phdcomics.com/comics/archive.php?comicid=462

Data
Acquisition
Storage
Cleaning
Processing
Tidying
Reshaping
Wrangling

Analysis
Exploration
Description
Visualization
Hypothesis Tests
Simulation
Model Fitting

Reports
Document(s)
Article(s)
Book(s)
Poster(s)
Blog post(s)
Dissertation
News

Communication
Oral
Print
Web
Audio
Video
Other

Cartoon view of the Data Analysis Cycle (DAC)

Data → Analysis → Report → Communication

Starting Point

Starting Point

We usually start with a research question:


- Why are we losing money?
- Who uses/buys our products?
- How can we make more money?
- How do we compare with competitors?
- When did something happen?

Questions

What?
When?
Where?
Why?
How?

Data Collection

Designing Experiments
Designing Surveys
Sampling Design
Recording & Gathering

We will assume that ...

Data is already in digital form


It has been collected
It already lives in some files/directories
No worries about transcribing data
or setting up a database

NBA Data
2016-2017
Regular Season

I'll be making extensive use
of NBA data throughout the
course
(I hope you don't get bored/tired of it)

http://www.basketball-reference.com/

Research Question #1

The more points scored,
the higher the salary?

http://www.basketball-reference.com/teams/GSW/2017.html
Research Question #1

The more points scored,
the higher the salary?

Analyst/Scientist
How do analysts and
scientists think about
data?

Statistical Perspective

X: Points scored
Y: Salary

Both are quantitative variables

Analyst/Scientist
Statistical Perspective

Y = f(X) + e
Salary = f(Points) + e

Theoretical
Model

Analyst/Scientist
Statistical Perspective

Y = b0 + b1 X + e
Salary = b0 + b1 Points + e

Linear
Model

Analyst/Scientist
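The slides do not say how b0 and b1 are estimated. One common choice is ordinary least squares, which gives b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. Below is a minimal Python sketch of that computation. The helper `fit_line` is a hypothetical name, and the five (points, salary) pairs are made-up numbers, not data from basketball-reference.com.

```python
# Ordinary least squares for: Salary = b0 + b1 * Points + e
# NOTE: the data below are HYPOTHETICAL, for illustration only.

def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: covariation of x and y divided by variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical season points and salaries (millions of dollars).
points = [200, 500, 800, 1100, 1400]
salary = [1.5, 4.0, 7.5, 9.0, 13.0]

b0, b1 = fit_line(points, salary)
print(f"Salary = {b0:.3f} + {b1:.5f} * Points")
```

A positive b1 would be consistent with "the more points scored, the higher the salary" for these toy numbers. It would not show that scoring causes higher pay. In R, `lm(salary ~ points)` fits the same model.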
What about the
storage of Data?

Data Technologies
Data Sets

File Format
Encoding
Metadata
Location
Size

Importing & getting
access to Data?

Importing/Accessing Data

Data Sets

Software & Languages

OS (Operating System)

Analyst/Scientist
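Here is a minimal Python sketch of importing a data set while tracking the properties from the Data Technologies slide: file format (CSV), encoding (UTF-8), location (a temporary folder), size, and metadata (here just the header row of column names). The file name `players.csv` and its contents are hypothetical. With real data, you would open the file you actually downloaded.

```python
import csv
import os
import tempfile

# Hypothetical data in CSV format, with a header row naming the columns.
CSV_TEXT = "player,points,salary\nPlayer A,1400,13.0\nPlayer B,800,7.5\n"

with tempfile.TemporaryDirectory() as folder:           # location
    path = os.path.join(folder, "players.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:  # encoding
        f.write(CSV_TEXT)

    size = os.path.getsize(path)                         # size in bytes

    # Import: each row becomes a dict keyed by the header's column names.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))

print(size, "bytes,", len(rows), "rows")

# csv returns every value as text; converting types is part of importing.
points = [int(r["points"]) for r in rows]
salary = [float(r["salary"]) for r in rows]
```

In R, the corresponding step would typically be `read.csv()`.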
How computers treat
data?

How computers treat data

Data Sets

Software & Languages

Binary system

Analyst/Scientist
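To make "binary system" concrete: whatever a computer stores ends up as bits. The short Python sketch below shows an integer written in base 2 and a piece of text stored as bytes. UTF-8 is our choice of encoding for the example; it is one of several.

```python
# Integers are stored as bits (base 2).
n = 133
bits = format(n, "b")        # the binary digits, as a string
print(n, "->", bits)         # 133 -> 10000101
back = int(bits, 2)          # reading the digits as base 2 recovers 133

# Text is stored as bytes, according to a character encoding.
word = "café"
raw = word.encode("utf-8")   # 'é' takes two bytes in UTF-8
print(len(word), "characters ->", len(raw), "bytes")
print(" ".join(format(b, "08b") for b in raw))  # each byte as 8 bits
```

The same characters can give different bytes under a different encoding. This is one reason the encoding of a data file matters when importing it.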
How do programming
languages handle
data?

How Programming Languages Handle Data

Data Sets

Software & Languages

OS (Operating System)

Analyst/Scientist

Data for programming languages:
Data Types?
Data Structures?
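This Python sketch illustrates the two questions above. Data types describe what kind of value a single datum is. Data structures describe how many values are organized together. The course itself uses R, whose vocabulary differs (vectors, factors, lists, data frames), so treat this only as an analogy. The player names and numbers are hypothetical.

```python
# Data types: the kind of value a single piece of data is.
points = 1400          # int
salary = 13.0          # float (millions of dollars, hypothetical)
name = "Player A"      # str (hypothetical player)
starter = True         # bool

# Data structures: ways of organizing many values.
season_points = [200, 500, 800]            # list: an ordered sequence
player = {"name": name, "points": points}  # dict: values looked up by key

# A small table stored as a list of rows, each row a dict with the same keys.
table = [
    {"name": "Player A", "points": 1400, "salary": 13.0},
    {"name": "Player B", "points": 800, "salary": 7.5},
]

# Print the type name of each object.
for value in (points, salary, name, starter, season_points, player, table):
    print(type(value).__name__)

# Pulling one column out of the table.
table_points = [row["points"] for row in table]
```

Storing a table as a list of rows is only one choice. Storing it column by column, which is closer to how an R data frame organizes data, is another.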
And what about all the
instructions, analysis,
reports?

Instructions, Scripts & Programs

Data Sets

Software & Languages

Code, scripts, reports

Analyst/Scientist
Last but not least ...

Displayed/Reported Data

The
Data Computing
Diagram (DCD)

Data Sets

Software & Languages

Code, Scripts, Programs

OS

Computers

Analyst/Scientist
Data Sets

Software & Languages

Code, Scripts, Programs

OS

collaboration ↔ Analyst/Scientist ↔ collaboration
Next Week

Install Software

R
RStudio
