Basic Statistics 9 19 2013

Basic statistics
Hui Bian
Office for Faculty Excellence
My contact information:
Hui Bian, Statistics & Research Consultant
Office for Faculty Excellence, 1001 Joyner Library, Room 1006
Email: bianh@ecu.edu
Website: http://core.ecu.edu/ofe/StatisticsResearch/
2
Statistics: a branch of mathematics used to summarize, analyze, and interpret a group of numbers and observations.
* It is a tool.
* It cannot replace your research design, your research questions, or the theory or model you want to use.
3
Population: any group of interest or any
group that researchers want to learn more
about.
Population parameters (unknown to us):
characteristics of population
Sample: a group of individuals or data drawn from the population of interest.
Sample statistics: characteristics of sample

4
We are much more interested in the
population from which the sample was drawn.
Example: 30 GPAs as a representative
sample drawn from the population of GPAs
of the freshmen currently in attendance at a
certain university or the population of
freshmen attending colleges similar to a
certain university.
6
Two types of statistics
Descriptive statistics
Inferential statistics
7
Descriptive statistics:
are procedures used to
summarize, organize, and make
sense of a set of scores or
observations.

8
Inferential statistics:
are procedures used that allow
researchers to infer or generalize
observations made with samples to
the larger population from which
they were selected.

9
Use descriptive statistics to describe, summarize, and organize a set of measurements.
Use descriptive statistics to communicate
with other researchers and the public.
Descriptive statistics: Central tendency
and Dispersion
10
Measures of Central tendency: we use
statistical measures to locate a single
score that is most representative of all
scores in a distribution.
Mean
Median
Mode

11
Descriptive statistics
The notations used to represent
population parameters and sample
statistics are different.
For example
Population size : N
Sample size : n
12
Mean
$\bar{x}$ (or M) for sample mean and $\mu$ for population mean
$\bar{x} = \frac{\sum x}{n}$
$\sum x$ means sum of all individual scores of x
n means number of scores
13
Example 1: we want to know how 25
students performed in math tests.
Data are in the next slide.

14

15
Score (X) Frequency (f) fX
60 1 60
65 2 130
70 3 210
75 4 300
80 5 400
85 4 340
90 3 270
95 2 190
100 1 100
Sum 25 2000
How to calculate mean for those 25 scores?
$\bar{x} = \frac{\sum fX}{n} = \frac{2000}{25} = 80.00$
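The same calculation can be sketched in Python. This is only an illustration of the frequency-table arithmetic; the variable names are my own, and the counts are copied from the Example 1 table.

```python
# Frequency table from Example 1: score -> number of students with that score
freq = {60: 1, 65: 2, 70: 3, 75: 4, 80: 5, 85: 4, 90: 3, 95: 2, 100: 1}

n = sum(freq.values())                               # number of scores
total = sum(score * f for score, f in freq.items())  # sum of f * X
mean = total / n

print(n, total, mean)  # 25 2000 80.0
```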

16
Distribution of Example 1
17
Mean = 80
Median
Data: 2, 3, 4, 5, 7, 10, 80. Mean of those
scores is 15.86.
80 is an outlier.
Mean fails to reflect most of the data. We
use median instead of mean to remove the
influence of an outlier.
Median is the middle value in a distribution
of data listed in a numeric order.

18
Median
Position of median = $\frac{n + 1}{2}$

For odd numbered sample size:
3,6,5,3,8,6,7. First place each score
in numeric order: 3,3,5,6,6,7,8.
Position 4. median = 6

19
Median
For even-numbered sample size:
3,6,5,3,8,6. First place each score in numeric order: 3,3,5,6,6,8. Position 3.5.
Median = $\frac{5 + 6}{2}$ = 5.5
Example 2: we want to know average
salary of 36 cases.

20
Salary Frequency
$20k 1
$25k 2
$30k 3
$35k 4
$40k 5
$45k 6
$50k 5
$55k 4
$200k 3
$205k 2
$210k 1
Total 36
21
Median = ?
Position of median = (36 + 1)/2 = 18.5
Which number is at position 18.5?
Scores at positions 16 through 21 are all $45k, so Median = $45k
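A short Python sketch can confirm the three median examples. It uses the standard-library `statistics` module; the variable names are illustrative, and salaries are entered in thousands of dollars.

```python
import statistics

# Odd and even sample sizes from the median slides
odd_scores = [3, 6, 5, 3, 8, 6, 7]
even_scores = [3, 6, 5, 3, 8, 6]
print(statistics.median(odd_scores))   # 6
print(statistics.median(even_scores))  # 5.5

# Example 2: salary (in $k) -> frequency
salary_freq = {20: 1, 25: 2, 30: 3, 35: 4, 40: 5, 45: 6,
               50: 5, 55: 4, 200: 3, 205: 2, 210: 1}

# Expand the frequency table into one value per case
salaries = [s for s, f in salary_freq.items() for _ in range(f)]

salary_median = statistics.median(salaries)
salary_mean = statistics.mean(salaries)
print(len(salaries), salary_median, round(salary_mean, 2))  # 36 45.0 68.33
```

The mean ($68.33k) is pulled upward by the six salaries of $200k or more, while the median stays at $45k.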

22
Mode
The value in a data set that occurs
most often or most frequently.
Example: 2,3,3,3,4,4,4,4,7,7,8,8,8.
Mode = 4
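As a small illustration, the mode can be found by counting how often each value occurs. The sketch below uses `collections.Counter` from the standard library; the variable names are my own.

```python
from collections import Counter

data = [2, 3, 3, 3, 4, 4, 4, 4, 7, 7, 8, 8, 8]

# most_common(1) returns the single (value, count) pair with the highest count
mode_value, mode_count = Counter(data).most_common(1)[0]
print(mode_value, mode_count)  # 4 4
```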
23
Dispersion (Variability): a measure of the
spread of scores in a distribution.
24
Compare different distributions
25
Compare different distributions

26
Two sets of data have the same
sample size, mean, and median.
But they are different in terms of
variability.
27
Dispersion
Range
Variance
Standard deviation

28
Range
It is the difference between the
largest value and smallest
value.
It is informative for data
without outliers.

29
Variance
It measures the average squared
distance that scores deviate from
their mean.
Sample variance ($s^2$), population variance ($\sigma^2$)
30
How to calculate variance?
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{SS}{n - 1}$: SS means sum of squares.
n - 1 means degrees of freedom: the number of scores in a sample that are free to vary.

31
Example: five scores: 5, 10, 7, 8, 15
Mean = 9
Let's calculate variance
SS = $(5-9)^2 + (10-9)^2 + (7-9)^2 + (8-9)^2 + (15-9)^2$ = 58
Sample variance = 58/(5-1) = 14.5
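The steps of this example can be reproduced in a few lines of Python. This is a sketch with illustrative variable names; `statistics.variance` is the standard-library sample variance, which divides by n - 1.

```python
import statistics

scores = [5, 10, 7, 8, 15]

n = len(scores)
mean = sum(scores) / n                     # 9.0
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares
sample_variance = ss / (n - 1)             # divide by degrees of freedom

print(mean, ss, sample_variance)     # 9.0 58.0 14.5
print(statistics.variance(scores))   # 14.5
```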

32
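Degrees of freedom
Example 1. We have five scores: 1, 2, 3, and two unknown scores: x and y. The mean of the five values is equal to 3, so x + y = 9. Once x is chosen, y is fixed: only one of the two unknown scores is free to vary.
Example 2. We have five scores: 1, 2, and three unknown scores: x, y, and z. The mean of the five values is equal to 3, so x + y + z = 12. Once x and y are chosen, z is fixed: two of the three unknown scores are free to vary.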

33
Standard deviation ($s$, $\sigma$)
It is the square root of variance.
It is approximately the average distance that scores deviate from their mean.
$s = \sqrt{\frac{SS}{n - 1}}$

34
Example 3: calculate standard deviation
35
Scores (x)   Frequency (f)   Deviation (d)            $d^2$     $f d^2$ (SS)
100          6               100 - 115.5 = -15.5      240.25    6 * 240.25
110          12              110 - 115.5 = -5.5       30.25     12 * 30.25
120          16              120 - 115.5 = 4.5        20.25     16 * 20.25
130          6               130 - 115.5 = 14.5       210.25    6 * 210.25
Sum          40                                                 3390.0
$\bar{x} = 115.5$
$s = \sqrt{\frac{3390}{40 - 1}} = 9.32$
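Example 3 can be checked with a short Python sketch that works directly from the frequency table. The variable names are illustrative only.

```python
import math

# Example 3: score -> frequency
freq = {100: 6, 110: 12, 120: 16, 130: 6}

n = sum(freq.values())                                   # 40
mean = sum(x * f for x, f in freq.items()) / n           # 115.5
ss = sum(f * (x - mean) ** 2 for x, f in freq.items())   # sum of f * d^2
sd = math.sqrt(ss / (n - 1))                             # sample standard deviation

print(n, mean, ss, round(sd, 2))  # 40 115.5 3390.0 9.32
```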
Summary:
When individual scores are close to
mean, the standard deviation (SD)
is smaller.

36
Summary
When individual scores are spread
out far from the mean, the
standard deviation is larger.
SD is always positive
It is typically reported with mean.

37
Choosing proper measure of central
tendency depends on:
the type of distribution
the scale of measurement
38
Mean describes data that are normally distributed and measured on an interval or ratio scale.
Median is used when the data are not normally distributed.
39
Normal distribution
Probability: the frequency of times an
outcome is likely to occur divided by
the total number of possible
outcomes.
It varies between 0 and 1.
Example (next slide)
40
Normal distribution

41
Fail Pass Total
Male 3 2 5
Female 1 4 5
Total 4 6 10
1. What is the probability of Fail? 4/10 =.4
2. What is the probability of Pass? 6/10 = .6
3. What is the probability of Fail among males? 3/5 = .6
4. What is the probability of Pass among females? 4/5 = .8
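The four probabilities above can be reproduced from the table counts. The sketch below is only an illustration; the dictionary layout and variable names are my own.

```python
# Counts from the Fail/Pass table
counts = {
    "Male": {"Fail": 3, "Pass": 2},
    "Female": {"Fail": 1, "Pass": 4},
}

total = sum(sum(row.values()) for row in counts.values())  # 10 students

# Probabilities over all students
p_fail = sum(row["Fail"] for row in counts.values()) / total
p_pass = sum(row["Pass"] for row in counts.values()) / total

# Probabilities within one group (divide by that group's total)
p_fail_given_male = counts["Male"]["Fail"] / sum(counts["Male"].values())
p_pass_given_female = counts["Female"]["Pass"] / sum(counts["Female"].values())

print(p_fail, p_pass, p_fail_given_male, p_pass_given_female)  # 0.4 0.6 0.6 0.8
```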
Normal distribution/Normal curve
Data are symmetrically distributed
around mean, median, and mode.
Also called the symmetrical, Gaussian,
or bell-shaped distribution.

42
Normal curve
43
Normal curve
44
Characteristics of normal distribution
The normal distribution is
mathematically defined.
The normal distribution is theoretical.
The mean, median, and mode are all
the same value at the center of the
distribution.
45
The normal distribution is symmetrical.
The form of a normal distribution is
determined by its mean and standard
deviation.
Standard deviation can be any positive
value.

46
The total area under the curve is equal
to 1.
The tails of the normal distribution always approach the x axis but never touch it.
47
Normal distribution/Normal curve
We use normal distribution to locate
probabilities for scores.
The area under the curve can be used
to determine the probabilities at
different points.

48
49
Proportions of area under the normal curve
Normal distribution: the standard deviation indicates precisely how the scores are distributed. Empirical rule:
About 68% of all scores lie within one standard deviation of the mean. In other words, roughly two thirds of the scores lie within one standard deviation on either side of the mean.
50
Normal distribution
About 95% of all scores lie within two standard deviations of the mean (Normal scores: close to the mean).
About 99.7% of all scores lie within three standard deviations of the mean.

51
In other words, we have a 95% chance of selecting a score that is within 2 standard deviations of the mean.
Less than 5% of scores are far from the mean (NOT normal scores).
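The 68%, 95%, and 99.7% figures of the empirical rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with mean 0 and standard deviation 1; because the proportions are the same for any normal distribution, that choice is only a convenience.

```python
from statistics import NormalDist

normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean
areas = {k: normal.cdf(k) - normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```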
52
Standard normal distribution or Z
distribution
A normal distribution with mean = 0,
and standard deviation = 1.
A Z score is a value on the x-axis of a
standard normal distribution

53
Standard normal distribution or Z
distribution

54
z transformation
$z = \frac{X - M}{SD}$
X means individual value, M is the mean, and SD is the standard deviation.
In SPSS, go to Analyze > Descriptive Statistics > Descriptives to get z scores.
Normal table/z table

56
How to use z table?
Example: a sample of scores is approximately normally distributed with mean 8 and standard deviation 2. What is the probability of a score lower than 6?

57
How to use z table?
Transform a raw score 6 into a z score
z = (6 - 8)/2 = -1
Check the normal table: the area between the mean and z = -1 is about 0.34, so p (probability) = 0.5 - 0.34 = 0.16
The probability of obtaining a score less than 6 is 16%
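Instead of a printed z table, the same lookup can be sketched with `statistics.NormalDist` from the Python standard library. The variable names are illustrative; the mean, standard deviation, and raw score come from the example.

```python
from statistics import NormalDist

mean, sd = 8, 2
raw_score = 6

# z transformation: how many standard deviations the score is from the mean
z = (raw_score - mean) / sd

# Area under the standard normal curve to the left of z
p_below = NormalDist().cdf(z)

print(z, round(p_below, 4))  # -1.0 0.1587
```

The table-based answer of 0.16 is this value rounded to two decimals.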
58
59
Descriptive statistics in SPSS
Frequencies
Descriptives
Explore

60
Exercise: use 2011 YRBSS data
Use Explore function to get descriptive
statistics for Q6 (height)
Analyze > Descriptive Statistics >
Explore

61
62
SPSS output
63
SPSS output: Normal Quantile-Quantile (Q-Q) plot
64
The goal of statistics is to make
inferences about a population based
upon information obtained in a sample.
Hypothesis testing is the method we use
to test a claim or hypothesis about a
parameter in a population using
observed data.
65
Steps of hypothesis testing
State a hypothesis
Set the criterion
Compare what we observe with
what we expect.
Make a decision

66
Five elements in significance test
Assumptions
Hypotheses
Test statistics
P-value
conclusion
67
Assumptions
Type of data
Form of the population distribution
Method of sampling
Sample size
68
State a hypothesis
Null hypothesis ($H_0$): in hypothesis testing, we start by assuming the null hypothesis is true.
Alternative hypothesis: directly contradicts the null hypothesis.
Hypothesis testing is all about testing the null hypothesis.

69
Null hypothesis ($H_0$): population values are NOT different from each other.
Example: $H_0$: There is NO difference in blood pressure between the treatment group and the control group among patients.
Example: $H_0$: $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
Example: $H_0$: two samples are drawn from the same population.
70
Alternative hypothesis ($H_1$): population values are different from each other.
Example: $H_1$: There is a difference in blood pressure between the treatment group and the control group among patients (nondirectional, two-tailed test).
Example: or $H_1$: The blood pressure of the treatment group is lower than the blood pressure of the control group (directional, one-tailed test).

71
Or two samples are drawn from different populations.
Or $H_1$: $\mu_1 - \mu_2 \neq 0$
Or $H_1$: $\mu_1 - \mu_2 > 0$
Or $H_1$: $\mu_1 - \mu_2 < 0$

72
The difference in blood pressure between
treatment and control group is because of
random error or chance (not statistically
different).
Or the difference is large enough to conclude
that blood pressure values are statistically
different between two groups or because of
treatment effect.
73
Set the criterion: set the level of significance (a prespecified cutoff point)
Typically set at 0.05 ($\alpha$ level) or 0.01.
The smaller the $\alpha$ level, the stronger the evidence must be to reject $H_0$.

74
P-value
The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the sample data when the null hypothesis is true.
If the p-value is less than 5%, we reject the null hypothesis (why?).

75
Compute the test statistic
Test statistic: such as t, F values (obtained value depends on tests used in data analysis): measures the extent of apparent departure from $H_0$.
76
Example: we want to know whether there is a difference in height between genders.
Two variables: Q2 (gender) and Q6 (height)
We want to compare two means
$H_0$: $\mu_1$ (female) $=$ $\mu_2$ (male)
$H_1$: $\mu_1$ (female) $\neq$ $\mu_2$ (male)

77
We use independent-samples t test to
get t statistic.
78
The value of test statistic is used to
make a decision regarding the null
hypothesis: compare test statistic to
the critical value.

79
80
Obtain critical value: it depends on the degrees of freedom.
A cut-off value
We need to look at a t table, for example, to obtain the critical value.
If the absolute value of the test statistic is less than the critical value, then you fail to reject the null hypothesis.
Make a decision
Compare test statistic to critical value
p value: the probability of obtaining a test statistic at least as extreme as the one observed, given that the null hypothesis is true.
Significance: when p < .05, we reject the null hypothesis and say the result is statistically significant.

81
Make a decision: use t test example
t value is -98.11
df = 13997
Look at t table to get critical value
It is equal to 1.96
t value < -1.96
Reject Null hypothesis
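The decision rule can be sketched in Python. One assumption to note: with df = 13997 the t distribution is practically identical to the standard normal, so this sketch takes the two-tailed critical value from `statistics.NormalDist` rather than from a t table. The variable names are my own; the t value is the one reported on this slide.

```python
from statistics import NormalDist

alpha = 0.05
t_value = -98.11  # t statistic reported on the slide

# Two-tailed critical value; normal approximation assumed because df is very large
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

# Reject the null hypothesis when the statistic falls in either tail
reject_null = abs(t_value) > critical_value

print(round(critical_value, 2), reject_null)  # 1.96 True
```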
82
Critical region
83
(Neutens & Rubinson, 1997)
Types of errors
Type I error ($\alpha$): the probability of rejecting a null hypothesis that is actually true.
Type II error ($\beta$): the probability of retaining a null hypothesis that is actually false.
84
Truth in the population    Fail to reject the Null       Reject the Null
Null is true               Correct ($1 - \alpha$)        Type I error ($\alpha$)
Null is false              Type II error ($\beta$)       Correct ($1 - \beta$, power)
Type I and Type II errors are inversely related, which means the smaller the $\alpha$ level, the larger the Type II error.
To keep both errors low, a large sample size is important.

85
Power: the probability of rejecting $H_0$ when it is in fact false.
Power = $1 - \beta$ ($\beta$ is the probability of a Type II error)
Statistical power is the ability to
detect a true effect.
86
Power
If statistical power is high, the
probability of making a Type II
error, or concluding there is no
effect when, in fact, there is one,
goes down.

87
Power
For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p value of less than 5% in a statistical test (i.e., a statistically significant treatment effect) if there really was an important difference between treatments.
88
Effect size
Example: how much of an effect/a
difference the intervention had/made
(magnitude of intervention effect).
We want to know if the intervention
effect is large or small, meaningful or
trivial.

89
Effect size
Mean difference
Correlation coefficient
Odds ratio
$R^2$

90
The relationship between effect size and
power and sample size
When effect size increases, power increases.
When sample size increases, power increases.
Example: Cohen's d = $\frac{M_1 - M_2}{s_{pooled}}$
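As an illustration of Cohen's d, the sketch below computes it for two small groups. The data are hypothetical and not from the slides. The slides do not spell out the pooled standard deviation, so the usual definition is assumed here: the square root of the two sample variances weighted by their degrees of freedom.

```python
import math
import statistics

# Hypothetical scores for two groups (made up for illustration)
group1 = [5, 7, 8, 6, 9]
group2 = [4, 5, 6, 5, 5]

def cohens_d(a, b):
    """Cohen's d = (M1 - M2) / pooled SD, using sample variances (n - 1)."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Assumed pooled-variance definition: weight each variance by its df
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

d = cohens_d(group1, group2)
print(round(d, 2))  # 1.63
```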

91
92
Test means: t tests and Analysis of
variance
T tests
one sample t test
Independent-samples t test
Paired-samples t test
Analysis of variance (ANOVA)
One-way/two-way between subject
design
One-way/two-way within subject
design
Mixed design

93
Correlation
Linear regression
Non-parametric tests
Chi-Square tests
Sign test
Wilcoxon signed-rank test
Mann-Whitney U test
Kruskal-Wallis H test
Friedman test
94
SPSS demonstration
98
T-test for two independent sample
means
Example: we want to know if there is a
gender (Q2) difference in height.
$H_0$: $\mu_1$ (Female) $=$ $\mu_2$ (Male);
$H_1$: $\mu_1$ (Female) $\neq$ $\mu_2$ (Male)
99
Think about the following.
The mean difference between two groups can be due to chance: sampling and measurement error.
Tests and measuring instruments used to collect data are not perfect.

100
Calculate t value: SPSS can do that for us.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
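For readers who want to see the arithmetic behind the t value, here is a minimal Python sketch. It assumes the unpooled form of the statistic: the difference between the two sample means divided by the square root of the sum of each sample variance over its sample size. The data and the function name are hypothetical; SPSS also reports a pooled-variance version, which this sketch does not compute.

```python
import math
import statistics

def t_statistic(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), unpooled form (assumed)."""
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    standard_error = math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    return (mean1 - mean2) / standard_error

# Hypothetical scores for two independent groups (made up for illustration)
females = [5, 7, 8, 6, 9]
males = [4, 5, 6, 5, 5]

t_value = t_statistic(females, males)
print(round(t_value, 3))  # 2.582
```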

101
T-test for two independent sample means

102
t = -98.15 < $t_{critical}$ = -1.96 (critical value), p < .05. There is a significant difference in height between females and males. When the sample size is greater than 120, $t_{critical}$ is approximately 1.96 (two-tailed) at $\alpha$ = 0.05.
Confidence interval (CI) for a Mean
A CI for a parameter is a range of
numbers within which the parameter is
believed to fall.
103
104
Standard error
is the standard deviation of a sampling distribution of sample means. It is the typical distance that sample mean values deviate from the value of the population mean.
105
How to calculate CI?
Compute sample mean and standard error.
Choose the level of confidence interval
and find the critical value at the level of
confidence.
Compute the estimation formula to find
the confidence interval

106
95% CI = $\bar{x} \pm$ critical value at 95% level ($\alpha$ = .05) * standard error
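The three steps for a confidence interval of a mean can be sketched in Python. Several things here are assumptions: the heights are hypothetical, the function name is my own, and the critical value comes from the standard normal distribution, which is reasonable only for large samples (a t table would be used for small ones).

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) as mean +/- critical value * standard error."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # Large-sample assumption: normal critical value (about 1.96 for 95%)
    critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical heights in meters, n = 150 (made up for illustration)
heights = [1.60, 1.65, 1.70, 1.75, 1.80] * 30

lower, upper = confidence_interval(heights)
print(round(lower, 3), round(upper, 3))
```

Asking for a lower confidence level, for example `confidence_interval(heights, 0.90)`, gives a narrower interval around the same sample mean.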
Example 4: two-independent sample
t-test, gender differences in height.
We use YRBSS 2011 data.
Q2 (Gender) and Q6 (height)
107
Two independent-samples t test: go to Analyze > Compare Means > Independent-Samples T Test

108
Click Options > Choose 95%

109
Output

110
Output
111
In this case, critical value = 1.96
(check t distribution table, df = )
What do we learn from the 95% CI of
mean difference?
112
References
Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences. Upper Saddle River, NJ: Prentice Hall, Inc.
Neutens, J. J., & Rubinson, L. (1997). Research techniques for the health sciences. Needham Heights, MA: Allyn & Bacon.
Privitera, G. J. (2012). Statistics for the behavioral sciences. Thousand Oaks, CA: SAGE Publications, Inc.
