Pearson's Chi-Square: used to determine whether a statistically significant association exists between a categorical dependent variable and a categorical independent variable.
t-test: used to determine whether means are statistically different:
• between a sample and a population
• between two (independent) samples
• between paired samples
ANOVA: used to determine whether a dependent variable has a significant dependence on one or more independent variables.
MANOVA (Multivariate ANOVA): as with ANOVA, but for more than one dependent variable.
                    | Population | Sample
Mean                | μx         | x̄
Standard Deviation  | σx         | sx
Variance            | σx²        | sx²
Probability Density Functions (PDF)
PDF: a function, often denoted p(x), that defines the probability distribution of a random variable. When the PDF is plotted, the area under the curve over an interval gives the probability that the variable will fall within that interval.
An especially useful PDF is the Gaussian, or normal, distribution:

p(x) = (1 / (σx √(2π))) exp(−(x − μx)² / (2σx²))
Note that, for example, the interval μx ± 1σx represents 68.26% of the population.
Probability Density Functions (PDF):
Properties of the Normal Distribution
• p(x) > 0 (positive)
• ∫ p(x) dx = 1 (unit area)
• The probability of a sample occurring within a given range is the area under the curve.

The probability of a measurement being within:
±1σ = 68.26%   ±2σ = 95.44% (95% confidence)
±3σ = 99.73%   ±4σ = 99.98%
±5σ = 99.9999%   ±6σ = 99.9999998%
To Provide a "Generic" PDF
Define a non-dimensional variable:

z = (x − μx) / σx

For example, z = 2.41 corresponds to a point 2.41 standard deviations above the mean.
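The z variable and the ±kσ probabilities above can be checked numerically. This is a sketch (not from the slides) using only the Python standard library; the function names and the example measurement are my own choices, and the ±kσ probabilities follow from the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable.

```python
import math

def z_score(x, mu, sigma):
    """Non-dimensional distance of x from the mean, in standard deviations."""
    return (x - mu) / sigma

def prob_within(k):
    """P(|Z| <= k) for a standard normal variable: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

# Reproduce the slide's table of probabilities:
for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100.0 * prob_within(k):.2f}%")

# Example: a 552 degF reading against the combined student data
# (mu = 540.4 degF, sigma = 9.91 degF) gives z of about 1.17.
z = z_score(552.0, 540.4, 9.91)
```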
p-value: the probability of observing an effect at least as large as the one seen, given that the null hypothesis is true (a high "p" means the null hypothesis cannot be rejected).
α: the level of significance (a "threshold" for accepting or rejecting the null hypothesis).
Error Types Explanation
Type 1 Rejecting the null hypothesis when it is true is called a
Type 1 error. Depending on the confidence level, there is
always a small probability that the sample mean falls in
the “reject” region of the distribution curve.
Type 2 Accepting the null hypothesis when it is false is called a
Type 2 error. Again, depending on the confidence level,
there is always a probability of accepting the null
hypothesis even when the means are different.
MEEN 404 – ENGINEERING LABORATORY
Example of Statistics
Example: Students Measure a Steady-State Temperature

[Figure: measured temperature (°F) vs. sample number for all 162 measurements, with Tavg and Tavg ± σ marked]

• Combined data: 162 measurements from 16 students using different TCs
• Average = 540.37°F
• Std dev = ±9.91°F (1.8%)
• Reasons for random errors: calibration, mounting, temperature gradients, extension wire, DAS noise, analog-to-digital conversion, mV-to-temperature conversion, other
Does the data fit a "normal" distribution?

[Figure: histogram of number of measurements vs. temperature (°F), Tavg = 540.37°F]

• The data appears to have the characteristics of a "normal" distribution.
Example from Student No. 1

# | T (°F)
1 | 540
2 | 548
3 | 533
4 | 548
5 | 533
6 | 533
7 | 524
8 | 540
9 | 552

[Figure: Student 01's ten measurements vs. sample number, with Tavg + s and Tavg − s marked]

• Average = 539.1°F
• Sample deviation = s = ±8.61°F (±1.6%)
Example from Student No. 9

[Figure: Student 09's measurements (samples 81–90) vs. sample number, with Tavg ± s marked]

• Average = 536.2°F
• Sample deviation = ±8.93°F (±1.7%)
Example from Student No. 15

[Figure: Student 15's measurements (samples 141–150) vs. sample number]

• Average = 545.1°F
• Sample deviation = ±11.5°F (±2.1%)
Summary

Student | Number of Measurements | Average (°F) | Standard or Sample Deviation (°F) | Standard or Sample Deviation (%)
All     | 162 | 540.4 | ±9.91 | ±1.8
01      | 10  | 539.1 | ±8.61 | ±1.6
09      | 10  | 536.2 | ±8.93 | ±1.7
15      | 10  | 545.1 | ±11.5 | ±2.1

• Data from "all" may be considered the total population.
• Each individual student's ten measurements are a "sample."
• Note: each individual student may consider their measurements to be "correct." Of course, with knowledge of "all," we know that the samples are approximations for the full population.
Small Sample Sizes: Using the "t-test"
Is the mean from a sample equivalent to the mean from the (whole) population?
This question is most important for small sample sizes (say, less than 30).

[Figure: Student's t-value for 95% confidence (p = 0.05, "two tails") vs. number of samples, 0 to 30]
Confidence Intervals:
An interval around the likely population mean.

p-value:
The probability that the means are not different (supporting the null hypothesis). Note: 0 < p < 1, so a p-value of 0.99 means that in 99 cases out of 100 the result would support the null hypothesis. Conversely, a p-value of 0.01 means that only 1 case out of 100 would support the null hypothesis (i.e., the means are significantly different).
Using the “t-test”
Types of "t-tests":
One-sample t-test: used to compare a sample mean
with a known population mean
Independent samples t-test: used to compare two
(unpaired) means from two independent groups
(either with equal population variances or unequal
population variances)
Paired samples t-test:
(a) used to compare two means that are repeated measures
for the same participants
(b) used to compare paired samples
Using the “t-test”
Type of Test t-test Degrees of *Typically,
Freedom whether the
One-sample x-μ population
df = n - 1 variances are
sx equal or not is
n not known. The
Independent conservative
samples
x1 - x2 approach is to
(equal population 1 1 df = n1 + n2 - 2 assume unequal
variances* – not
Sp population
n1 n2 variances
equal sample (Si2)
variances) (Sp later) unless there is
contrary
information.
Independent
samples x1 - x2
(unequal S12 S22 (see later page)
population
n1 n2
variances*)
Using the "t-test"
Independent samples with equal variances:

Sp = √[ ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2) ]

Independent samples with unequal variances (degrees of freedom):

df = (S1²/n1 + S2²/n2)² / [ ((S1²/n1)² / (n1 − 1)) + ((S2²/n2)² / (n2 − 1)) ]
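The pooled deviation Sp and the equal-variance t statistic can be sketched directly from these formulas. This is a minimal illustration, not from the slides; the two small data sets are made-up numbers chosen purely so the code can run, and the function names are my own.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def svar(xs):
    """Sample variance, with the (n - 1) divisor."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def equal_variance_t(x1, x2):
    """Independent-samples t with pooled Sp, per the formulas above."""
    n1, n2 = len(x1), len(x2)
    sp = math.sqrt(((n1 - 1) * svar(x1) + (n2 - 1) * svar(x2)) / (n1 + n2 - 2))
    t = (mean(x1) - mean(x2)) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2          # t statistic and df = n1 + n2 - 2

# Made-up example data (two groups of five measurements):
a = [5.1, 4.8, 5.3, 5.0, 4.9]
b = [4.6, 4.7, 4.5, 4.9, 4.4]
t, df = equal_variance_t(a, b)
```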
Using the "t-test"

Type of Test | t-test statistic | Degrees of Freedom
Paired groups (for example, data "before" and "after" using the same subjects) | t = Δx̄ / (sd/√n) | df = n − 1

where

Δx̄ = (1/n) Σ Δxi,   Δxi = x1i − x2i
sd = √[ (1/(n − 1)) Σ (Δxi − Δx̄)² ]
Using the “t-test”
More on the “paired t-test”:
Also called the “correlated groups” t-test.
This is used when you have two samples that consist of "paired" items.
Two major situations for this:
• You have two measures on the same subjects (for example,
“before” and “after”)
• You have two separate samples but the subjects are individually
matched (but not the same subjects)
Using the “t-test”
Examples of “paired t-test” –
Example of “before” and “after” pairing:
Subject | Before Treatment | After Treatment
1 | 50 | 55
2 | 52 | 57
3 | 44 | 49
4 | 42 | 47
5 | 49 | 54
This result may be stated as: the sample mean of "79.12" is not statistically different from the population mean of "75."
Example 1B: Math Scores
A teacher wants to have 90% confidence that the average
from six (6) students can be statistically valid for the full
class. She wants the average of the full class to be at
least a score of 70.
The mean from the six (6) students is 79.17 and the
sample standard deviation is ±13.17.
Note: although 79.17 is greater than the target “70” – is
this just the result of six (6) “smarter” (random)
students or do the statistics support this as a
representative mean for the full class?
Need to compute the t-value and compare it to the tabulated t-value for a df of 5 at 90% confidence (α = 0.10).
Example 1B: Math Scores
This is a one-tailed test since we are interested in a
“greater than” comparison.
Obtaining the "t-value" for the sample in the example:

t(sample) = (x̄ − μ) / (sx/√n)

t(sample) = (79.17 − 70) / (13.17/√6) = 9.17/5.38 = 1.71
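The Example 1B calculation can be sketched as a one-sample t-test computed from the summary statistics alone. The critical value 1.476 is the standard tabulated one-tailed t for 90% confidence with df = 5; the function name is my own.

```python
import math

def one_sample_t(xbar, mu, s, n):
    """One-sample t statistic: t = (xbar - mu) / (s / sqrt(n))."""
    return (xbar - mu) / (s / math.sqrt(n))

t = one_sample_t(79.17, 70.0, 13.17, 6)   # about 1.71, as on the slide
t_crit = 1.476                            # tabulated t(90%, df = 5), one-tailed
reject_null = t > t_crit
print(f"t = {t:.2f}, critical t = {t_crit}, reject H0: {reject_null}")
```

Since 1.71 exceeds 1.476, the six students' average of 79.17 is statistically above the target of 70 at 90% confidence.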
Group | Scores
A | 26 26 26 21 18 21 19 25 23 29 22 22 24 23 22
B | 18 20 20 26 17 23 20 16 21 18 21 29 20 25 19
Example 2: Independent Samples
Now, the question can be posed in two ways:
1. (with “direction”) performance will be better with type
A music
2. (with no “direction”) performance will be different
t = (x̄1 − x̄2) / √(S1²/n1 + S2²/n2) = (23.13 − 20.87) / √(8.55/15 + 12.55/15) = 1.905
Example 2: Independent Samples
For this case (the populations have unequal variances),

df = (S1²/n1 + S2²/n2)² / [ ((S1²/n1)²/(n1 − 1)) + ((S2²/n2)²/(n2 − 1)) ]
   = (8.55/15 + 12.55/15)² / [ (8.55/15)²/14 + (12.55/15)²/14 ]
   = 27.03
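Example 2 can be sketched end-to-end with only the standard library: an unequal-variance (Welch) t-test on the two groups of scores from the table above. The helper names are mine; the data and formulas are the slide's.

```python
import math

A = [26, 26, 26, 21, 18, 21, 19, 25, 23, 29, 22, 22, 24, 23, 22]
B = [18, 20, 20, 26, 17, 23, 20, 16, 21, 18, 21, 29, 20, 25, 19]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance with the (n - 1) divisor: S^2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(x1, x2):
    """t statistic and Welch-Satterthwaite df for unequal variances."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = sample_var(x1), sample_var(x2)
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 / n1 + v2 / n2)
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

t, df = welch_t(A, B)
# t is about 1.91 (the slide's 1.905 comes from rounded means and variances)
# and df is about 27.0, matching the 27.03 computed above.
```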
i   |  1   2   3   4   5   6   7   8
x1i | 44  62  59  29  78  79  92  38
x2i | 46  58  56  26  72  80  90  35
Δxi | −2   4   3   3   6  −1   2   3

Δx̄ = (1/8)(−2 + 4 + 3 + 3 + 6 − 1 + 2 + 3) = 2.25

sd = √[ (1/(8 − 1)) ((−2 − 2.25)² + (4 − 2.25)² + … + (3 − 2.25)²) ] = 2.60
Example 3: Paired Samples
Now, the t test statistic:
t = Δx̄ / (sd/√n) = 2.25 / (2.60/√8) = 2.45

The tabulated t-value for 95% confidence and df = 7 (two tails):

t(table) = 2.365

Since the t test statistic is greater than the t(table) value, the null hypothesis may be rejected. In other words, there is statistical evidence that the operators are different.
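The paired t-test of Example 3 can be sketched from the raw pairs with only the standard library; the function name is my own, the data and formula are the slide's.

```python
import math

x1 = [44, 62, 59, 29, 78, 79, 92, 38]
x2 = [46, 58, 56, 26, 72, 80, 90, 35]

def paired_t(a, b):
    """Paired-samples t: mean difference over its standard error."""
    n = len(a)
    d = [ai - bi for ai, bi in zip(a, b)]          # differences, x1i - x2i
    d_bar = sum(d) / n                             # mean difference (2.25)
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))  # ~2.60
    return d_bar / (s_d / math.sqrt(n))

t = paired_t(x1, x2)       # about 2.45
t_table = 2.365            # 95% confidence, df = 7, two tails (standard table)
# t > t_table, so the null hypothesis may be rejected.
```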
Another use of the “t” statistic:
Recall that for a large number of samples, an individual measurement was known with 95% confidence to be within ±1.96σ of the population mean.
For a few samples, however, the measurements would most likely come from near the center portion of the distribution. These measurements would appear to indicate a small standard deviation, which would not represent the whole population.
Another use of the “t” statistic:
For small sample sizes, therefore, the following expression is a better estimate for the 95% confidence limits:

± t(95%, df) · sx

where t(95%, df) is the t-statistic for 95% confidence with degrees of freedom (df), and sx is the sample standard deviation.
Recall that for n > 30, the t-statistic remains at about "2" for 95% confidence, and this agrees with the "1.96" used for cases with a "large" number of samples.
Another use of the “t” statistic:
Recall, the "t" statistic provides an estimate for how close the sample mean (x̄) is to the population mean (μ):

t = (x̄ − μ) / (s/√n)   or   μ = x̄ ± t·s/√n
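This expression can be sketched as a confidence interval on the population mean using Student No. 1's sample from earlier (mean 539.1°F, s = 8.61°F, n = 10). The value 2.262 is the standard tabulated two-tailed t for 95% confidence with df = 9; the variable names are my own.

```python
import math

x_bar, s, n = 539.1, 8.61, 10
t_95 = 2.262                            # tabulated t(95%, df = n - 1 = 9)
half_width = t_95 * s / math.sqrt(n)    # about 6.2 degF
lo, hi = x_bar - half_width, x_bar + half_width
# mu is estimated to lie between about 533 and 545 degF with 95% confidence;
# the full-population mean of 540.4 degF falls inside this interval.
```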
[Figure: histogram of frequency vs. temperature (°F) for the full data set of 144 values; Tmean = 75.00°F, sT = 5.15°F, with −2sT and +2sT marked]
Example of Small Sample Size
This figure shows the effect of sample size on the mean
temperature. These values were obtained by a random
selection from the full distribution. Note that the scatter is
somewhat greater for the small sample sizes.
[Figure: mean temperature (°F) vs. sample size for random selections from the 144 values; the scatter about Tmean(144) = 75.00°F narrows as the sample size grows]
Example of Small Sample Size
This figure shows the sample standard deviation as a function of the sample size. Again, for small sample sizes the scatter is greater, and as the sample size increases the scatter decreases. Generally, n > 30 is not considered a small sample.

[Figure: sample standard deviation sT (°F) vs. sample size; the scatter about sT(144) = 5.15°F decreases as the sample size grows]
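The trend in these two figures can be reproduced in a short simulation (not from the slides): repeatedly draw samples from a normal population with mean 75 and standard deviation 5.15 and watch the scatter of the sample standard deviation shrink as the sample size grows. The seed, sample sizes, and trial count are arbitrary choices of mine.

```python
import random
import statistics

random.seed(42)

def stdev_spread(sample_size, trials=200):
    """Spread (max - min) of the sample standard deviation across trials."""
    devs = [
        statistics.stdev(random.gauss(75.0, 5.15) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return max(devs) - min(devs)

small = stdev_spread(5)    # n = 5: widely scattered estimates of 5.15
large = stdev_spread(50)   # n = 50: estimates cluster much closer to 5.15
```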
Some Issues with the “t-test”
Although the “t-test” works well and is fairly easy to use,
it does have limitations:
It is limited to two levels (such as method A and B; before and after; …)
It is not possible to consider more than one
independent variable
It is not convenient for categorical variables (different
types of insulation, different steel alloys, … )
These results are consistent with the earlier "hand" calculations. The conclusion is that the 79.12 mean is not statistically different from the population mean of 75. Note that the probability (0.168) is higher than 0.05 (95% confidence). This means that the null hypothesis cannot be rejected.
End: Application of Statistics