
INTRODUCTION TO STATISTICS AND DATA ANALYSIS
Detailed introductory part of statistics

L2
Statistics

• The science of collecting, organizing, presenting,
analyzing, and interpreting data to assist in
making more effective decisions
• Statistical analysis – used to manipulate,
summarize, and investigate data so that useful
decision-making information results.
Types of statistics

• Descriptive statistics – Methods of organizing,
summarizing, and presenting data in an informative
way
• Inferential statistics – The methods used to
determine something about a population on the basis
of a sample
– Population – The entire set of individuals or objects
of interest or the measurements obtained from all
individuals or objects of interest
– Sample – A portion, or part, of the population of
interest
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Summarize data
– e.g., mean
Inferential Statistics
Inference is the process of drawing conclusions or
making decisions about a population based on
sample results
• Define the objective of the experiment and the
population of interest
• Determine the design of the experiment and
the sampling plan to be used
• Collect and analyze the data
• Make inferences about the population from
information in the sample
• Determine the goodness or reliability of the
inference.
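The steps above can be sketched in Python. This is only an illustration under stated assumptions: the population is simulated (made-up scores, not data from these slides), and the standard error of the sample mean is used as one common way to express how reliable the inference is.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated test scores (made-up data).
population = [random.gauss(76, 15) for _ in range(10_000)]

# Collect the data: a simple random sample of 100 scores, without replacement.
sample = random.sample(population, 100)

# Make an inference: the sample mean estimates the population mean.
sample_mean = statistics.mean(sample)

# Judge reliability: standard error of the sample mean, s / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
```

In practice the population mean is unknown; it is printed here only so the estimate can be compared with the value it is trying to recover.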
Sampling methods
Sampling methods can be:
• random (each member of the population has an equal
chance of being selected)
• nonrandom

• The actual process of sampling causes sampling errors.
– For example, the sample may not be large enough or
representative of the population.
• Factors not related to the sampling process cause
nonsampling errors.
– For example, a defective counting device can cause a
nonsampling error.
Random sampling methods
1. simple random sample (each sample of the same
size has an equal chance of being selected)
2. stratified sample (divide the population into
groups called strata and then take a sample from
each stratum)
3. cluster sample (divide the whole population into
clusters, or groups, randomly select some of the
clusters, and collect data from the selected clusters)
4. systematic sample (randomly select a starting
point and take every n-th piece of data from a
listing of the population)
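A minimal Python sketch of the four methods, assuming a hypothetical population of 100 numbered IDs. The strata, clusters, and sample sizes are arbitrary choices made for illustration, and the cluster step follows the usual textbook definition of selecting whole clusters at random.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
population = list(range(1, 101))  # hypothetical population: IDs 1..100

# 1. Simple random sample: every subset of size 10 is equally likely.
simple = random.sample(population, 10)

# 2. Stratified sample: split into strata, then sample from each stratum.
strata = [population[i:i + 25] for i in range(0, 100, 25)]  # 4 strata of 25
stratified = [x for stratum in strata for x in random.sample(stratum, 3)]

# 3. Cluster sample: split into clusters, then pick whole clusters at random.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # 10 clusters of 10
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [x for c in chosen_clusters for x in c]

# 4. Systematic sample: random starting point, then every n-th item.
n = 10
start = random.randrange(n)
systematic = population[start::n]

print(simple, stratified, cluster_sample, systematic, sep="\n")
```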
Experimental Design

• Experimental design is a way to carefully plan
experiments in advance so that your results are
both objective and valid.
Experimental design should:
• Describe how participants are allocated to
experimental groups.
• Minimize or eliminate confounding variables, which
can offer alternative explanations for the
experimental results.
• Allow you to make inferences about the relationship
between independent variables and dependent
variables.
• Reduce variability, to make it easier for you to find
differences in treatment outcomes.
Design of experiments involves:
• The systematic collection of data
• A focus on the design itself, rather than the results
• Planning changes to independent (input) variables
and the effect on dependent variables or response
variables
• Ensuring results are valid, easily interpreted, and
definitive.
Principles of Experimental Design

• Randomization: the assignment of study
components by a completely random method, like
simple random sampling. Randomization helps
eliminate bias from the results.
• Replication: the experiment must be replicable
by other researchers. This is usually supported by
reporting statistics like the standard error of the
sample mean or confidence intervals.
• Blocking: controlling sources of variation in the
experimental results.
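Randomization and blocking can be combined, as in the short Python sketch below. The participants, the age-group blocking factor, and the treatment/control labels are all hypothetical and serve only to illustrate the idea.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical participants, each tagged with a blocking factor (age group).
participants = [
    ("P1", "young"), ("P2", "young"), ("P3", "young"), ("P4", "young"),
    ("P5", "old"), ("P6", "old"), ("P7", "old"), ("P8", "old"),
]

# Blocking: group participants by the source of variation to control.
blocks = {}
for name, age_group in participants:
    blocks.setdefault(age_group, []).append(name)

# Randomization: within each block, shuffle and split evenly
# between the treatment group and the control group.
assignment = {}
for age_group, names in blocks.items():
    random.shuffle(names)
    half = len(names) // 2
    for name in names[:half]:
        assignment[name] = "treatment"
    for name in names[half:]:
        assignment[name] = "control"

print(assignment)
```

Because the split happens inside each block, both groups end up with the same number of young and old participants, so age cannot explain a difference between the groups.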
Recap

• Types Of statistics
• Types of Sampling
• Experimental Design
NEXT Lecture

• Measures of Location
– Sample Mean
– Median
– Mode
Measures of Location:
The Sample Mean and
Median

L3
Measures of Location

• A fundamental task in many statistical analyses is
to estimate a location parameter for the
distribution; i.e., to find a typical or central value
that best describes the data
86 97 84

73 63 88

97 100 95
What is the MEAN?
How do we find it?

The mean is the numerical
average of the data set.
The mean is found by adding
all the values in the set, then
dividing the sum by the
number of values.
86 + 97 + 84 + 73 + 63 + 88 + 97 + 100 + 95 = 783
783 ÷ 9 = 87
The mean is 87.
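The same calculation as a short Python check, using the nine scores from the slide. The variable names are chosen only for this illustration.

```python
# The nine scores shown on the slide.
scores = [86, 97, 84, 73, 63, 88, 97, 100, 95]

total = sum(scores)          # add all the values
count = len(scores)          # number of values
mean = total / count         # divide the sum by the number of values

print(total, count, mean)    # 783 9 87.0
```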
What is the MEDIAN?
How do we find it?

The MEDIAN is the number that is in
the middle of a set of data.
1. Arrange the numbers in the set
in order from least to greatest.
2. Then find the number that is in
the middle.
63 73 84 86 88 95 97 97 100

The median is 88.

Half the numbers are less than the median.
Half the numbers are greater than the median.
Median
Sounds like
MEDIUM
Think middle when you hear
median.
How do we find the MEDIAN
when two numbers are in the
middle?

1. Add the two numbers.
2. Then divide by 2.
63 73 84 88 95 97 97 100

88 + 95 = 183
183 ÷ 2 = 91.5
The median is 91.5.
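Both cases, one middle number or two, can be handled by one small function. The sketch below is one possible way to write it; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # arrange from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: add the two middle values and divide by 2
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([63, 73, 84, 86, 88, 95, 97, 97, 100]))  # 88
print(median([63, 73, 84, 88, 95, 97, 97, 100]))      # 91.5
```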
What is the MODE?
How do we find it?

The MODE is the piece of data that
occurs most frequently in the data
set.
A set of data can have:
One mode
More than one mode
No mode
63 73 84 86 88 95 97 97 100

The value 97 appears twice.
All other numbers appear just once.

97 is the MODE
A Hint for remembering the MODE…
The first two letters give you a hint…
MOde
Most Often
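A possible Python sketch that covers all three cases (one mode, more than one mode, no mode). The function name `modes` is an assumption made for this illustration, and the second and third lists are illustrative inputs.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or [] if there is no mode."""
    if not values:
        return []
    counts = Counter(values)              # how often each value occurs
    highest = max(counts.values())
    if highest == 1:                      # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([63, 73, 84, 86, 88, 95, 97, 97, 100]))  # [97]
print(modes([18, 7, 10, 7, 18]))                     # [7, 18]
print(modes([9, 11, 16, 6, 7, 17, 18]))              # []
```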
A 9, 11, 16, 6, 7, 17, 18
B 18, 7, 10, 7, 18
C 9, 11, 16, 8, 16

A 9, 11, 16, 6, 7, 17, 18
B 18, 7, 10, 7, 18
C 13, 12, 12, 11, 12

A 9, 11, 16, 8, 16
B 9, 11, 16, 6, 7, 17, 18
C 18, 7, 10, 7, 18
This one requires
more work than the
others.
Right in the
MIDDLE.

This one is the easiest
to find: just LOOK.
Find the….
Find the….
Recap

• Measures of Location
– Sample Mean
– Median
– Mode
NEXT Lecture

• Measures of Variability
– Range
– Variance
– Standard Deviation
Measures of Variability:
The variance, Range
and Standard
Deviation
L4
What is the RANGE?
How do we find it?

The RANGE is the difference
between the lowest and highest
values.
63 73 84 86 88 95 97 97

97 - 63 = 34
34 is the RANGE, or spread, of this set of data.
99 48 86 84 97 71 88
In order: 48 71 84 86 88 97 99
99 - 48 = 51

17 48 46 33 15 67 85
In order: 15 17 33 46 48 67 85
85 - 15 = 70

267 119 357 329 401 227 483
In order: 119 227 267 329 357 401 483
483 - 119 = 364
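The three range examples can be checked with a short Python function. The name `value_range` is chosen here to avoid clashing with Python's built-in `range`.

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

print(value_range([99, 48, 86, 84, 97, 71, 88]))          # 51
print(value_range([17, 48, 46, 33, 15, 67, 85]))          # 70
print(value_range([267, 119, 357, 329, 401, 227, 483]))   # 364
```

Sorting is not needed in code, since `max` and `min` find the extremes directly; ordering the list by hand just makes them easy to spot.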
Find the….
Find the….
Find the….
Variance

• In a population, variance is the average squared
deviation from the population mean, as defined by
the following formula:

$\sigma^2 = \sum_{i=1}^{N} (X_i - \mu)^2 / N$

• where $\sigma^2$ is the population variance, $\mu$ is the
population mean, $X_i$ is the $i$th element from the
population, and $N$ is the number of elements in the
population.
Steps for Variance

1. Find the mean of the data.
Hint – the mean is the average, so add up the values and divide
by the number of items.
2. Subtract the mean from each value – the
result is called the deviation from the mean.
3. Square each deviation from the mean.
4. Find the sum of the squares.
5. Divide the total by the number of items.
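The five steps map directly onto a few lines of Python. This is a sketch of the population variance (dividing by the number of items, as in the steps); the function name and the example data are illustrative and not taken from the slides.

```python
import statistics

def population_variance(values):
    n = len(values)
    mean = sum(values) / n                      # step 1: find the mean
    deviations = [x - mean for x in values]     # step 2: deviation from the mean
    squares = [d ** 2 for d in deviations]      # step 3: square each deviation
    total = sum(squares)                        # step 4: sum of the squares
    return total / n                            # step 5: divide by the number of items

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative data, mean is 5
print(population_variance(data))                # 4.0
print(statistics.pvariance(data))               # standard-library cross-check
```

The standard library's `statistics.pvariance` also divides by the number of items, whereas `statistics.variance` divides by one less and is meant for samples.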
Standard Deviation

• The standard deviation is the square root of the
variance. Thus, the standard deviation of a
population is:

$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} (X_i - \mu)^2 / N}$

• where $\sigma$ is the population standard deviation, $\mu$ is
the population mean, $X_i$ is the $i$th element from
the population, and $N$ is the number of elements in
the population.
Steps for Standard Deviation

1. Find the variance.
a) Find the mean of the data.
b) Subtract the mean from each value.
c) Square each deviation from the mean.
d) Find the sum of the squares.
e) Divide the total by the number of items.
2. Take the square root of the variance.
Find the variance and standard deviation
The math test scores of five students are: 92, 88, 80, 68, and 52.
1) Find the mean: (92 + 88 + 80 + 68 + 52) / 5 = 76.
2) Find the deviation from the mean:
92 - 76 = 16
88 - 76 = 12
80 - 76 = 4
68 - 76 = -8
52 - 76 = -24
3) Square the deviation from the mean:
$(16)^2 = 256$
$(12)^2 = 144$
$(4)^2 = 16$
$(-8)^2 = 64$
$(-24)^2 = 576$
4) Find the sum of the squares of the
deviation from the mean:
256+144+16+64+576= 1056
5) Divide by the number of data items
to find the variance:
1056/5 = 211.2
6) Find the square root of the variance:
$\sqrt{211.2} \approx 14.53$

Thus the standard deviation of the
test scores is 14.53.
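The worked example can be reproduced step by step in Python. The variable names are chosen for illustration; the numbers are the five test scores used above.

```python
import math

scores = [92, 88, 80, 68, 52]

mean = sum(scores) / len(scores)            # 1) 76.0
deviations = [x - mean for x in scores]     # 2) [16.0, 12.0, 4.0, -8.0, -24.0]
squares = [d ** 2 for d in deviations]      # 3) [256.0, 144.0, 16.0, 64.0, 576.0]
sum_of_squares = sum(squares)               # 4) 1056.0
variance = sum_of_squares / len(scores)     # 5) 211.2
std_dev = math.sqrt(variance)               # 6) about 14.53

print(mean, sum_of_squares, variance, round(std_dev, 2))
```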
Standard Deviation

A different math class took the same
test with these five test scores:
92, 92, 92, 52, 52.

Find the standard deviation for this
class.
Hint:
1. Find the mean of the data.
2. Subtract the mean from each value –
called the deviation from the mean.
3. Square each deviation of the mean.
4. Find the sum of the squares.
5. Divide the total by the number of items –
result is the variance.
6. Take the square root of the variance –
result is the standard deviation.
The math test scores of five students are:
92, 92, 92, 52, and 52.
1) Find the mean: (92 + 92 + 92 + 52 + 52) / 5 = 76
2) Find the deviation from the mean:
92 - 76 = 16    92 - 76 = 16    92 - 76 = 16
52 - 76 = -24   52 - 76 = -24
3) Square the deviation from the mean:
$(16)^2 = 256$    $(16)^2 = 256$    $(16)^2 = 256$
$(-24)^2 = 576$   $(-24)^2 = 576$
4) Find the sum of the squares:
256 + 256 + 256 + 576 + 576 = 1920
5) Divide the sum of the squares by
the number of items:
1920 / 5 = 384 (the variance)
6) Find the square root of the variance:
$\sqrt{384} \approx 19.6$
Thus the standard deviation of the second
set of test scores is 19.6.
Analyzing the data:

Consider both sets of scores. Both
classes have the same mean, 76.
However, the classes do not have
the same scores. Thus we use the
standard deviation to show the
variation in the scores. With a
standard deviation of 14.53 for the
first class and 19.6 for the second
class, what does this tell us?
Analyzing the data:

Class A: 92,88,80,68,52
Class B: 92,92,92,52,52

With a standard deviation of 14.53 for the
first class and 19.6 for the second class, the
scores from the second class are more
spread out than the scores in the first
class.
Analyzing the data:
Class A: 92,88,80,68,52
Class B: 92,92,92,52,52

Class C: 77,76,76,76,75
Estimate the standard deviation for Class C.
a) Standard deviation will be less than 14.53.
b) Standard deviation will be greater than 19.6.
c) Standard deviation will be between 14.53
and 19.6.
d) Can not make an estimate of the standard
deviation.


Answer: A
The scores in Class C have the same mean of
76 as the other two classes. However, the
scores in Class C are all much closer to the
mean than those in the other classes, so the
standard deviation will be smaller than for the
other classes.
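The estimate can be confirmed with Python's standard library. This sketch uses `statistics.pstdev`, which divides by the number of items as in the steps used for Classes A and B; the dictionary layout is chosen only for illustration.

```python
import statistics

classes = {
    "A": [92, 88, 80, 68, 52],
    "B": [92, 92, 92, 52, 52],
    "C": [77, 76, 76, 76, 75],
}

# Same mean for every class, very different spread.
for name, scores in classes.items():
    mean = statistics.mean(scores)
    std_dev = statistics.pstdev(scores)   # population standard deviation
    print(f"Class {name}: mean {mean}, standard deviation {std_dev:.2f}")
```

Class C comes out at roughly 0.63, far below 14.53, which agrees with answer A.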
Summary:

As we have seen, standard deviation
measures the dispersion of data.

The greater the value of the standard
deviation, the further the data tend to be
dispersed from the mean.
Recap

• Measures of Variability
– Range
– Variance
– Standard Deviation
NEXT Lecture

– Statistical Data
– Types of Data
– Types of Variables
