
# Program L1

• Introduction
  - What is statistics?
  - Basic statistical concepts
  - What is this course about?
• Study design
  - Types of studies
  - Types of data
  - Random sampling
  - Drop-outs
• Descriptive statistics
  - Frequency tables
  - Graphs

## What is statistics?

• Making sense of numerical information
  - Interpret quantitative data
  - Summarize information
• Sampling
  - Draw samples from a population (target group) too large to investigate in total
• Dealing with uncertainty
• Analyzing relationships
• Making prognoses

[Figure: histogram with "Bills" on the x-axis and frequency on the y-axis.]

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

[Figure: news headline ”INFLUENZA VACCINE DOES NOT WORK”, followed by the text: A new study confirms yet again what many prior studies have shown, that flu vaccination is largely ineffective.]

## Statistics is the science of learning from data (numerical facts)

1. The science of collecting, organizing, and interpreting data.
2. The science of dealing with randomness.

Source: www.statistics2013.org

Måns Thulin
Quantitative Methods Fall 2015 1
## Why statistics?

With statistics you can show anything.
Without statistics you can’t show a thing.

Statistics is used to draw conclusions.
Other sciences are used to interpret the conclusions.

## A simple example

• Target population: Supermarket customers
• Variable of interest: Age of customers

Do customers before and after 3 pm, respectively, differ in age?

• Population (target population)
  all objects (e.g. individuals) with a certain set of defined properties in common,
  e.g. customers shopping at supermarkets, youth in a certain city, or parents with children at day care

• Variable
  characteristic of the object, the property to be measured, represented as a number.
  X, Y, Z.
  E.g. X=age, Y=drug use (yes/no), Z=income

• Examine the whole population or a sample?
  - Every individual (census) or a limited number?

## Population and sample

[Figure: a population of N individuals, from which a sample of n individuals is drawn.]

## Individuals, variables and values

Each individual (case of observation) has a value for each variable:

| Variable | Value |
|---|---|
| Gender | Male, Female |
| Age | No. of years |
| Education | Elementary, High school, University |

## To think about

• Investigate one or more supermarkets?
  - How to choose supermarkets?
• Investigate all customers or a sample at the chosen supermarkets?
  - How large samples should we choose?
• How to measure age?
• How to present the results?

## What do we need to learn?

• Study planning
  - What to think about before we start our study
• Sampling theory
  - Different ways to draw a sample
• Presentation of the result
  - Graphical or numerical presentation

## More to think about

| | Before 3 pm | After 3 pm |
|---|---|---|
| No. customers | 25 | 25 |
| Average age | 49 yrs | 38 yrs |
| No. retired | 11 | 4 |

• Average age in our sample differs (before/after 3 pm)
  - But can we be sure that the average age in the population would differ?

To draw conclusions about the population based on the result from one sample
- What is the probability that our result was caused by chance?

## Study design (L1)

Plan your study in the best possible way to be able to present relevant results.

Drop-outs

## Descriptive statistics (L1-L2)

To arrange, summarize and present data to enable meaningful interpretations.

Numerical:

| Age | Income |
|---|---|
| 55 | 35000 |
| 42 | 28000 |
| . | . |
| . | . |

Graphical:

[Figure: histogram with "Bills" on the x-axis and frequency on the y-axis.]

## Data collection using surveys (L3)

## Data handling and analysis

(Home assignment and exercises)

SPSS

## Random variation (L3 & L4)

- Randomness & random variables
- Density curves
- Normal distribution


## Statistical inference (L4-L12)

To draw conclusions about the population based on the information from a sample

- Confidence intervals

## Relationships (L12-L14)

[Figure: scatter plot of weight (kg) against age (years).]

- Correlation
- Regression

## Learning outcomes

After this course you should:
• have obtained practice in applying analytical methods for the social and the behavioral sciences
• be able to use statistical software packages for the analysis of statistical data (SPSS)
• be able to interpret the results of a statistical analysis
• be aware of limitations and possible sources of errors in the analysis
• be able to present the results of a statistical analysis both orally and in writing

## We have a lot to offer

The statistics department is very fortunate to have enough resources to give you relatively many lectures.
This means that there is a lot on the schedule, which we believe is positive.
All lectures, supplemental lectures, and part exams are voluntary. But if you follow everything we offer, it will be easier for you to learn the content of the course.
The only mandatory part, besides the final exam, is the home assignment.

## Home assignment

Work in groups of 2-3 persons.
A number of tasks to solve by using SPSS.
Computer labs booked Monday Nov 24: if you need help (get started before that).
Other days: The availability of the computer labs can be checked online before you go (links in Studentportalen).
Write your results in the form of an electronic report and hand it in via Studentportalen. Deadline: Dec 10.
If you don’t pass, I’ll let you know which tasks to correct and you’ll hand in a new version Dec 17.

## Exams

Part exams 1, 2 and 3 (not mandatory):
To be performed online
• 80%: 1 credit point for final exam
• 90%: 2 credit points for final exam

Final exam:
• 50%: Pass (G)
• 80%: Pass with distinction (VG)
Handouts allowed

## Lecture notes

Lecture notes will be available at Studentportalen the day before each lecture.
Any errors that are discovered during lectures will be corrected in the lecture notes afterwards.
Check the dates on which the documents are posted. If the date is the same as or after the lecture, updates have been made.

## Experimental and observational studies

• Experimental study
  - a study of what happens to a certain variable when you interfere with the process in a planned way, e.g. through providing a medical or psychological treatment, a particular diet, educational activities, etc.

• Observational study
  - a study of the natural process, without interference. E.g. mapping the course of a disease, or people’s political preferences, etc.

## Type of study

| Retrospective (back in time) | Cross-sectional (snapshot) | Prospective (looking ahead) |
|---|---|---|
| Historical data, e.g. about smoking habits, are collected to investigate the relationship to a phenomenon today, e.g. lung cancer. | E.g. survey studies where data is collected at a certain point of time. Popular votes (polls) on political preferences, etc. | E.g. people treated with cognitive behavioural therapy (CBT) are followed for a period of time. |

## Control group

It can be valuable to compare a group of individuals with a so-called control group, another group of individuals that lack some kind of risk/treatment factor (e.g. smokers compared to non-smokers, CBT treated compared to non-treated, etc.)

• Control group
  - matched controls
  - placebo (medical studies only)
  - active treatment: same (different ”doses”) or other (known effective)
  - historical controls

[Figure: parallel groups (Gr. 1 receives treatment A, Gr. 2 receives treatment B) compared with a ”cross-over” design, in which each individual receives both treatments in sequence: individual 1 gets A then B, individual 2 gets B then A, individual 3 gets A then B.]

1. Cross-sectional data
   - one measurement at one point of time
   - snapshot (no time aspect)
2. Longitudinal data
   - repeated measurements for one individual

• Different statistical methods are suitable for different types of data

## Sample on voluntary basis

Individuals choose to participate by responding to some kind of general appeal (often electronically).
The risk of bias is high! Bias is a systematic error that can have a negative effect on the results.
People with strong opinions are most likely to respond, which leads to biased results (not generalizable).

Not random!

## Voluntary response sample - example

[Figure: responses to the question "To what extent would you like to be a house wife?" Green: Not at all. Blue: Very much.]

## Sampling designs

Waiting room survey
Visitors at e.g. a pharmacy or a bank (in some kind of waiting room) are asked if they want to participate.
The probability to get selected increases if you visit the place often.
It can be troublesome to interpret the results, since the probability of being selected for the survey is unknown and varies for different people.
A large risk of bias (systematic errors in the results)!

Not random!

## Random sampling designs

• Simple random sample
• Stratified random sample
• Cluster sample
(etc.)

A simple random sample (SRS) consists of n individuals from the population, where every individual has the same probability to be selected.

Every individual is numbered, and n numbers are randomly chosen (using software).

(+) Bias is avoided in the choice
(+) Every possible sample has an equal chance to be chosen.
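The numbering procedure described for a simple random sample can be sketched in a few lines of Python. This is only an illustration: the course itself uses SPSS, and the population size of 500, the sample size of 25, the seed and the function name `simple_random_sample` are all assumptions made for the example.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Number the individuals 1..population_size and draw n of them at random."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no individual is picked twice
    # and every possible set of n individuals is equally likely.
    return rng.sample(range(1, population_size + 1), n)


# Hypothetical population of 500 numbered customers, sample of 25.
sample = simple_random_sample(population_size=500, n=25, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the draw reproducible; leaving it as `None` gives a new sample on each run.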

## Stratified random sample

A stratified random sample consists of a number of simple random samples from different strata (groups of similar individuals).

The strata are known before the sample is taken.
E.g. cities, districts, sex, age groups, educational level, etc.

If the sample is stratified, the strata should always be evaluated separately (in addition to the full sample).

(+) Can produce more exact information than SRS

## Cluster sample

A cluster sample consists of every individual in a random sample of groups (clusters).

The population is divided into groups, and one or several groups are randomly chosen. Every individual in the chosen groups is included in the sample.

(+) A good solution when difficult to choose individuals, or when there …
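The difference between the two designs can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the three groups, their sizes, the seed and the function names `stratified_sample` and `cluster_sample` are invented for the example, and taking the same number of individuals from every stratum is just one possible allocation.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(2015)

# Hypothetical population of 200 numbered individuals in three groups.
groups = {
    "city_A": list(range(1, 101)),
    "city_B": list(range(101, 161)),
    "city_C": list(range(161, 201)),
}

strat = stratified_sample(groups, n_per_stratum=10, rng=rng)  # 10 from each city
clust = cluster_sample(groups, n_clusters=1, rng=rng)         # everyone in one city
```

In the stratified sample every group is represented, while the cluster sample contains only the chosen group, but all of it.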

## Drop-outs

Individuals in the sample that don’t participate (by refusal or no possibility to participate) are referred to as drop-outs.

If the drop-out rate is large it might have a large impact on the results. The more drop-outs, the larger is the risk that we make false generalizations to the target population.

If single questions are not answered, or some measurement values are missing, it is instead referred to as missing data. If the rate of missing data is large it might also have a large impact on the results. But we can at least make use of the data that we do have for these individuals.

It is important to try to minimize the drop-out rate, e.g. through sending reminders, or using well designed surveys that make it easy to respond.

It is also important to try to ease the effects of the drop-out rate that you end up with, e.g. by investigating if the drop-outs consist of individuals with a certain property, or if the drop-outs can be assumed to be at random and thereby don’t have a large effect on the results.

## Descriptive statistics

To arrange, summarize and present data to enable meaningful interpretations.

Numerical:

| Age | Income |
|---|---|
| 55 | 35000 |
| 42 | 28000 |
| . | . |
| . | . |

Graphical:

[Figure: histogram with "Bills" on the x-axis and frequency on the y-axis.]

## The role of descriptive statistics

• Compile the information of interest in a table.
• Visualize the table graphically.
• Summarize the information.

The table, figure, and summarizing measures that are informative depend on the type of the variable.

• Categorical/qualitative variable
  places an individual into one of two or more groups or categories.
  E.g. X=gender, Y=marital status, Z=political preference

• Numerical/quantitative variable
  takes numerical values for which arithmetic operations make sense.
  E.g. X=age, Y=length of life, Z=income

In a statistics program data is stored with individuals as rows and variables as columns.
This is the standard structure when working with data.
Common programs are SPSS, Stata, SAS, Minitab.

## Frequency table for a categorical variable

Question: Customers retired or not
Variable: Retirement (retired or not), without any missing data

| Retirement | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Not retired | 29 | 72.5 | 72.5 | 72.5 |
| Retired | 11 | 27.5 | 27.5 | 100.0 |
| Total | 40 | 100.0 | 100.0 | |

The first column lists the variable values, and Frequency is the no. of individuals.
Percent is also called relative frequency.

## Graphs display distributions

Graphs are a good tool to use to display the distribution of a variable.

A variable’s distribution tells us what values the variable takes and how often it takes these values.

## Graphs for categorical variables

• Bar chart
• Pie chart
• Time plot

[Figures: a bar chart, a pie chart and a time plot, each showing the number of new products introduced in USA during the years 1989-1994.]

## Bar chart

In a bar chart, each variable value is represented by a bar.
The bar height is proportional to the frequency.
The bars represent categories, not numerical values.

## Pie chart

In a pie chart each variable value is represented by a sector proportional to its frequency.

Add numbers (e.g. percentages) to the sectors.

## Bar charts are generally more informative than pie charts

## Time plot

[Figure: time plot of the number of new products introduced in USA during the years 1989-1994.]

• Suitable when the categories are points of time
• A good tool when you want to illustrate a trend

## Cross Tables

Simultaneous frequency table for two (or more) categorical variables.

Retirement * Time of day Crosstabulation (Count)

| Retirement | Before 3 pm | After 3 pm | Total |
|---|---|---|---|
| Not retired | 11 | 18 | 29 |
| Retired | 9 | 2 | 11 |
| Total | 20 | 20 | 40 |

The marginal distributions – the frequencies of each variable separately (the Total row and the Total column)
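How the totals of a cross table arise from its cells can be sketched in Python. The counts are taken from the crosstabulation of retirement by time of day; the variable names and the dictionary layout are assumptions made for this illustration.

```python
# Cell counts for each (retirement, time of day) pair.
joint = {
    ("Not retired", "Before 3 pm"): 11,
    ("Not retired", "After 3 pm"): 18,
    ("Retired", "Before 3 pm"): 9,
    ("Retired", "After 3 pm"): 2,
}

# Marginal distributions: sum the cells over the other variable.
retirement_margin = {}
time_margin = {}
for (retirement, time_of_day), count in joint.items():
    retirement_margin[retirement] = retirement_margin.get(retirement, 0) + count
    time_margin[time_of_day] = time_margin.get(time_of_day, 0) + count

total = sum(joint.values())

print(retirement_margin)
print(time_margin)
print(total)
```

The two margin dictionaries correspond to the Total column and the Total row of the table.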

## Cross Tables

Simultaneous frequency table for two (or more) categorical variables.

Retirement * Time of day Crosstabulation (Count)

| Retirement | Before 3 pm | After 3 pm | Total |
|---|---|---|---|
| Not retired | 11 | 18 | 29 |
| Retired | 9 | 2 | 11 |
| Total | 20 | 20 | 40 |

The joint distribution – the frequencies of each pair of variable values (the inner cells of the table)

## Cross Tables – bar chart

To visualize a cross table with a bar chart, begin by deciding the following:

• Choose the main variable.
  Interested in the distribution of retirement status within each time of day, or the distribution over time of day for retired and not retired respectively?
• Interested in absolute numbers or relative?
• How to divide the bars? To stack or group?

## Choosing the main variable

When time of day is the main variable we get information on the number of retired/not retired people for each time of day. All relative frequencies within each time of day sum to 100%.

When retirement is the main variable we get the distribution of time of day within each retirement group. The relative frequencies within each retirement group sum to 100%.
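The effect of the choice of main variable on the relative frequencies can be worked through in Python with the counts from the cross table of retirement by time of day. The function names and the nested-dictionary layout are assumptions made for this sketch.

```python
# Counts from the cross table: retirement status by time of day.
counts = {
    "Not retired": {"Before 3 pm": 11, "After 3 pm": 18},
    "Retired": {"Before 3 pm": 9, "After 3 pm": 2},
}


def percent_within_time(counts):
    """Time of day as main variable: percentages sum to 100 within each time of day."""
    times = list(next(iter(counts.values())))
    result = {}
    for t in times:
        column_total = sum(row[t] for row in counts.values())
        result[t] = {r: 100 * row[t] / column_total for r, row in counts.items()}
    return result


def percent_within_retirement(counts):
    """Retirement as main variable: percentages sum to 100 within each retirement group."""
    result = {}
    for r, row in counts.items():
        row_total = sum(row.values())
        result[r] = {t: 100 * c / row_total for t, c in row.items()}
    return result


by_time = percent_within_time(counts)
by_retirement = percent_within_retirement(counts)

print(by_time["Before 3 pm"])      # share of retired/not retired before 3 pm
print(by_retirement["Retired"])    # share of retired customers before/after 3 pm
```

With time of day as main variable, 9 of the 20 customers before 3 pm are retired (45%). With retirement as main variable, 9 of the 11 retired customers shop before 3 pm (about 82%). The same cell gives different percentages depending on which total it is divided by.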

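[Figure: bar charts of the cross table shown with absolute frequencies (counts) and with relative frequencies (percent).]

[Figure: the same data shown with grouped bars and with stacked bars.]

• Grouped bars
• Stacked bars (harder to interpret)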

To do
Studentportalen: