Pearson's Chi-Square: used to determine whether a statistically significant association exists between a categorical dependent variable and a categorical independent variable.
t-test: used to determine whether means are statistically different:
• between a sample and a population
• between two (independent) samples
• between paired samples
ANOVA: used to determine whether a dependent variable has a significant dependence on one or more independent variables.
MANOVA (Multivariate ANOVA): as with ANOVA, but for more than one dependent variable.
                    | Population | Sample
Mean                | μx         | x̄
Standard Deviation  | σx         | sx
Variance            | σx²        | sx²
Probability Density Functions (PDF)
PDF: a function, often denoted p(x), that defines the probability distribution of a random variable. When the PDF is plotted, the area under the curve over an interval gives the probability that the variable will fall within that interval.
An especially useful PDF is the Gaussian, or normal, distribution:

p(x) = (1 / (σx √(2π))) exp(−(x − μx)² / (2σx²))
Note that, for example, the interval μx ± 1σx represents 68.26% of the population.
Probability Density Functions (PDF):
Properties of the Normal Distribution
• p(x) > 0 (positive)
• ∫ p(x) dx = 1 (unit area)
• The probability of a sample occurring within a given range is the area under the curve.

The probability of a measurement being within:
±1σ = 68.26%   ±2σ = 95.44% (95% confidence)
±3σ = 99.73%   ±4σ = 99.98%
±5σ = 99.9999%   ±6σ = 99.9999998%
To Provide a "Generic" PDF
Define a non-dimensional variable:

z = (x − μx) / σx

For example, z = 2.41 corresponds to a point 2.41 standard deviations above the mean.
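The z variable and the ±kσ probabilities above can be checked numerically. This is a sketch (not from the slides) using only the Python standard library; the function names and the example measurement are my own choices, and the ±kσ probabilities follow from the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable.

```python
import math

def z_score(x, mu, sigma):
    """Non-dimensional distance of x from the mean, in standard deviations."""
    return (x - mu) / sigma

def prob_within(k):
    """P(|Z| <= k) for a standard normal variable: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

# Reproduce the slide's table of probabilities:
for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100.0 * prob_within(k):.2f}%")

# Example: a 552 degF reading against the combined student data
# (mu = 540.4 degF, sigma = 9.91 degF) gives z of about 1.17.
z = z_score(552.0, 540.4, 9.91)
```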
p-value: the probability of observing an effect at least as large as the one seen, given that the null hypothesis is true (a high "p" means the null hypothesis cannot be rejected).
α: the level of significance (a "threshold" for accepting or rejecting the null hypothesis).
Error Types Explanation
Type 1 Rejecting the null hypothesis when it is true is called a
Type 1 error. Depending on the confidence level, there is
always a small probability that the sample mean falls in
the “reject” region of the distribution curve.
Type 2 Accepting the null hypothesis when it is false is called a
Type 2 error. Again, depending on the confidence level,
there is always a probability of accepting the null
hypothesis even when the means are different.
MEEN 404 – ENGINEERING LABORATORY
Example of Statistics
Example: Students Measure a Steady-State Temperature

[Figure: measured temperature (°F) vs. sample number for all 162 measurements, with Tavg and Tavg ± σ marked]

• Combined data: 162 measurements from 16 students using different TCs
• Average = 540.37°F
• Std dev = ±9.91°F (1.8%)
• Reasons for random errors: calibration, mounting, temperature gradients, extension wire, DAS noise, analog-to-digital conversion, mV-to-temperature conversion, other
Does the data fit a "normal" distribution?

[Figure: histogram of number of measurements vs. temperature (°F), Tavg = 540.37°F]

• The data appears to have the characteristics of a "normal" distribution.
Example from Student No. 1

# | T (°F)
1 | 540
2 | 548
3 | 533
4 | 548
5 | 533
6 | 533
7 | 524
8 | 540
9 | 552

[Figure: Student 01's ten measurements vs. sample number, with Tavg + s and Tavg − s marked]

• Average = 539.1°F
• Sample deviation = s = ±8.61°F (±1.6%)
Example from Student No. 9

[Figure: Student 09's measurements (samples 81–90) vs. sample number, with Tavg ± s marked]

• Average = 536.2°F
• Sample deviation = ±8.93°F (±1.7%)
Example from Student No. 15

[Figure: Student 15's measurements (samples 141–150) vs. sample number]

• Average = 545.1°F
• Sample deviation = ±11.5°F (±2.1%)
Summary

Student | Number of Measurements | Average (°F) | Standard or Sample Deviation (°F) | Standard or Sample Deviation (%)
All     | 162 | 540.4 | ±9.91 | ±1.8
01      | 10  | 539.1 | ±8.61 | ±1.6
09      | 10  | 536.2 | ±8.93 | ±1.7
15      | 10  | 545.1 | ±11.5 | ±2.1

• Data from "all" may be considered the total population.
• Each individual student's ten measurements are a "sample."
• Note: each individual student may consider their measurements to be "correct." Of course, with knowledge of "all," we know that the samples are approximations for the full population.
Small Sample Sizes: Using the "t-test"
Is the mean from a sample equivalent to the mean from the (whole) population?
This question is most important for small sample sizes (say, less than 30).

[Figure: Student's t-value for 95% confidence (p = 0.05, "two tails") vs. number of samples, 0 to 30]
Confidence Intervals:
An interval around the likely population mean.

p-value:
The probability that the means are not different (supporting the null hypothesis). Note: 0 < p < 1, so a p-value of 0.99 means that in 99 cases out of 100 the result would support the null hypothesis. Conversely, a p-value of 0.01 means that only 1 case out of 100 would support the null hypothesis (i.e., the means are significantly different).
Using the “t-test”
Types of "t-tests":
One-sample t-test: used to compare a sample mean
with a known population mean
Independent samples t-test: used to compare two
(unpaired) means from two independent groups
(either with equal population variances or unequal
population variances)
Paired samples t-test:
(a) used to compare two means that are repeated measures
for the same participants
(b) used to compare paired samples
Using the “t-test”
Type of Test t-test Degrees of *Typically,
Freedom whether the
One-sample x-μ population
df = n - 1 variances are
sx equal or not is
n not known. The
Independent conservative
samples
x1 - x2 approach is to
(equal population 1 1 df = n1 + n2 - 2 assume unequal
variances* – not
Sp population
n1 n2 variances
equal sample (Si2)
variances) (Sp later) unless there is
contrary
information.
Independent
samples x1 - x2
(unequal S12 S22 (see later page)
population
n1 n2
variances*)
Using the "t-test"
Independent samples with equal variances:

Sp = √[ ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2) ]

Independent samples with unequal variances (degrees of freedom):

df = (S1²/n1 + S2²/n2)² / [ ((S1²/n1)² / (n1 − 1)) + ((S2²/n2)² / (n2 − 1)) ]
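The pooled deviation Sp and the equal-variance t statistic can be sketched directly from these formulas. This is a minimal illustration, not from the slides; the two small data sets are made-up numbers chosen purely so the code can run, and the function names are my own.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def svar(xs):
    """Sample variance, with the (n - 1) divisor."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def equal_variance_t(x1, x2):
    """Independent-samples t with pooled Sp, per the formulas above."""
    n1, n2 = len(x1), len(x2)
    sp = math.sqrt(((n1 - 1) * svar(x1) + (n2 - 1) * svar(x2)) / (n1 + n2 - 2))
    t = (mean(x1) - mean(x2)) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2          # t statistic and df = n1 + n2 - 2

# Made-up example data (two groups of five measurements):
a = [5.1, 4.8, 5.3, 5.0, 4.9]
b = [4.6, 4.7, 4.5, 4.9, 4.4]
t, df = equal_variance_t(a, b)
```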
Using the "t-test"

Type of Test | t-test statistic | Degrees of Freedom
Paired groups (for example, data "before" and "after" using the same subjects) | t = Δx̄ / (sd/√n) | df = n − 1

where

Δx̄ = (1/n) Σ Δxi,   Δxi = x1i − x2i
sd = √[ (1/(n − 1)) Σ (Δxi − Δx̄)² ]
Using the “t-test”
More on the “paired t-test”:
Also called the “correlated groups” t-test.
This is used when you have two samples that consist of "paired" items.
Two major situations for this:
• You have two measures on the same subjects (for example,
“before” and “after”)
• You have two separate samples but the subjects are individually
matched (but not the same subjects)
Using the “t-test”
Examples of “paired t-test” –
Example of “before” and “after” pairing:
Subject | Before Treatment | After Treatment
1 | 50 | 55
2 | 52 | 57
3 | 44 | 49
4 | 42 | 47
5 | 49 | 54
This result may be stated as: the sample mean of "79.12" is not statistically different from the population mean of "75."
Example 1B: Math Scores
A teacher wants to have 90% confidence that the average
from six (6) students can be statistically valid for the full
class. She wants the average of the full class to be at
least a score of 70.
The mean from the six (6) students is 79.17 and the
sample standard deviation is ±13.17.
Note: although 79.17 is greater than the target “70” – is
this just the result of six (6) “smarter” (random)
students or do the statistics support this as a
representative mean for the full class?
Need to compute the t-value and compare it to the tabulated t-value for a df of 5 at 90% confidence (α = 0.10).
Example 1B: Math Scores
This is a one-tailed test since we are interested in a
“greater than” comparison.
Obtaining the "t-value" for the sample in the example:

t(sample) = (x̄ − μ) / (sx/√n)

t(sample) = (79.17 − 70) / (13.17/√6) = 9.17/5.38 = 1.71
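The Example 1B calculation can be sketched as a one-sample t-test computed from the summary statistics alone. The critical value 1.476 is the standard tabulated one-tailed t for 90% confidence with df = 5; the function name is my own.

```python
import math

def one_sample_t(xbar, mu, s, n):
    """One-sample t statistic: t = (xbar - mu) / (s / sqrt(n))."""
    return (xbar - mu) / (s / math.sqrt(n))

t = one_sample_t(79.17, 70.0, 13.17, 6)   # about 1.71, as on the slide
t_crit = 1.476                            # tabulated t(90%, df = 5), one-tailed
reject_null = t > t_crit
print(f"t = {t:.2f}, critical t = {t_crit}, reject H0: {reject_null}")
```

Since 1.71 exceeds 1.476, the six students' average of 79.17 is statistically above the target of 70 at 90% confidence.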
Group | Scores
A | 26 26 26 21 18 21 19 25 23 29 22 22 24 23 22
B | 18 20 20 26 17 23 20 16 21 18 21 29 20 25 19
Example 2: Independent Samples
Now, the question can be posed in two ways:
1. (with “direction”) performance will be better with type
A music
2. (with no “direction”) performance will be different
t = (x̄1 − x̄2) / √(S1²/n1 + S2²/n2) = (23.13 − 20.87) / √(8.55/15 + 12.55/15) = 1.905
Example 2: Independent Samples
For this case (the populations have unequal variances),

df = (S1²/n1 + S2²/n2)² / [ ((S1²/n1)²/(n1 − 1)) + ((S2²/n2)²/(n2 − 1)) ]
   = (8.55/15 + 12.55/15)² / [ (8.55/15)²/14 + (12.55/15)²/14 ]
   = 27.03
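Example 2 can be sketched end-to-end with only the standard library: an unequal-variance (Welch) t-test on the two groups of scores from the table above. The helper names are mine; the data and formulas are the slide's.

```python
import math

A = [26, 26, 26, 21, 18, 21, 19, 25, 23, 29, 22, 22, 24, 23, 22]
B = [18, 20, 20, 26, 17, 23, 20, 16, 21, 18, 21, 29, 20, 25, 19]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance with the (n - 1) divisor: S^2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(x1, x2):
    """t statistic and Welch-Satterthwaite df for unequal variances."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = sample_var(x1), sample_var(x2)
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 / n1 + v2 / n2)
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

t, df = welch_t(A, B)
# t is about 1.91 (the slide's 1.905 comes from rounded means and variances)
# and df is about 27.0, matching the 27.03 computed above.
```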
i   |  1   2   3   4   5   6   7   8
x1i | 44  62  59  29  78  79  92  38
x2i | 46  58  56  26  72  80  90  35
Δxi | −2   4   3   3   6  −1   2   3

Δx̄ = (1/8)(−2 + 4 + 3 + 3 + 6 − 1 + 2 + 3) = 2.25

sd = √[ (1/(8 − 1)) ((−2 − 2.25)² + (4 − 2.25)² + … + (3 − 2.25)²) ] = 2.60
Example 3: Paired Samples
Now, the t test statistic:
t = Δx̄ / (sd/√n) = 2.25 / (2.60/√8) = 2.45

The tabulated t-value for 95% confidence and df = 7 (two tails):

t(table) = 2.365

Since the t test statistic is greater than the t(table) value, the null hypothesis may be rejected. In other words, there is statistical evidence that the operators are different.
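The paired t-test of Example 3 can be sketched from the raw pairs with only the standard library; the function name is my own, the data and formula are the slide's.

```python
import math

x1 = [44, 62, 59, 29, 78, 79, 92, 38]
x2 = [46, 58, 56, 26, 72, 80, 90, 35]

def paired_t(a, b):
    """Paired-samples t: mean difference over its standard error."""
    n = len(a)
    d = [ai - bi for ai, bi in zip(a, b)]          # differences, x1i - x2i
    d_bar = sum(d) / n                             # mean difference (2.25)
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))  # ~2.60
    return d_bar / (s_d / math.sqrt(n))

t = paired_t(x1, x2)       # about 2.45
t_table = 2.365            # 95% confidence, df = 7, two tails (standard table)
# t > t_table, so the null hypothesis may be rejected.
```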
Another use of the “t” statistic:
Recall that for a large number of samples, an individual measurement was known with 95% confidence to be within ±1.96σ of the population mean.
For a few samples, however, the measurements would most likely come from near the center portion of the distribution. These measurements would appear to indicate a small standard deviation, which would not represent the whole population.
Another use of the “t” statistic:
For small sample sizes, therefore, the following expression is a better estimate for the 95% confidence limits:

± t(95%, df) · sx

where t(95%, df) is the t-statistic for 95% confidence with degrees of freedom (df), and sx is the sample standard deviation.
Recall that for n > 30, the t-statistic remains at about "2" for 95% confidence, and this agrees with the "1.96" used for cases with a "large" number of samples.
Another use of the “t” statistic:
Recall, the "t" statistic provides an estimate for how close the sample mean (x̄) is to the population mean (μ):

t = (x̄ − μ) / (s/√n)   or   μ = x̄ ± t·s/√n
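This expression can be sketched as a confidence interval on the population mean using Student No. 1's sample from earlier (mean 539.1°F, s = 8.61°F, n = 10). The value 2.262 is the standard tabulated two-tailed t for 95% confidence with df = 9; the variable names are my own.

```python
import math

x_bar, s, n = 539.1, 8.61, 10
t_95 = 2.262                            # tabulated t(95%, df = n - 1 = 9)
half_width = t_95 * s / math.sqrt(n)    # about 6.2 degF
lo, hi = x_bar - half_width, x_bar + half_width
# mu is estimated to lie between about 533 and 545 degF with 95% confidence;
# the full-population mean of 540.4 degF falls inside this interval.
```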
[Figure: histogram of frequency vs. temperature (°F) for the full data set of 144 values; Tmean = 75.00°F, sT = 5.15°F, with −2sT and +2sT marked]
Example of Small Sample Size
This figure shows the effect of sample size on the mean
temperature. These values were obtained by a random
selection from the full distribution. Note that the scatter is
somewhat greater for the small sample sizes.
[Figure: mean temperature (°F) vs. sample size for random selections from the 144 values; the scatter about Tmean(144) = 75.00°F narrows as the sample size grows]
Example of Small Sample Size
This figure shows the sample standard deviation as a function of the sample size. Again, for small sample sizes the scatter is greater, and as the sample size increases the scatter decreases. Generally, n > 30 is not considered a small sample.

[Figure: sample standard deviation sT (°F) vs. sample size; the scatter about sT(144) = 5.15°F decreases as the sample size grows]
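The trend in these two figures can be reproduced in a short simulation (not from the slides): repeatedly draw samples from a normal population with mean 75 and standard deviation 5.15 and watch the scatter of the sample standard deviation shrink as the sample size grows. The seed, sample sizes, and trial count are arbitrary choices of mine.

```python
import random
import statistics

random.seed(42)

def stdev_spread(sample_size, trials=200):
    """Spread (max - min) of the sample standard deviation across trials."""
    devs = [
        statistics.stdev(random.gauss(75.0, 5.15) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return max(devs) - min(devs)

small = stdev_spread(5)    # n = 5: widely scattered estimates of 5.15
large = stdev_spread(50)   # n = 50: estimates cluster much closer to 5.15
```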
Some Issues with the “t-test”
Although the “t-test” works well and is fairly easy to use,
it does have limitations:
It is limited to two levels (such as method A and B; before and after; …)
It is not possible to consider more than one
independent variable
It is not convenient for categorical variables (different
types of insulation, different steel alloys, … )
These results are consistent with the earlier "hand" calculations. The conclusion is that the 79.12 mean is not statistically different from the population mean of 75. Note that the probability (0.168) is higher than 0.05 (95% confidence). This means that the null hypothesis cannot be rejected.
End: Application of Statistics