1. If α is the significance level and all other symbols have their conventional meanings, consider
the equation below and discuss its application:
$$n = \left( \frac{z_{\alpha/2}\,\sigma}{\bar{x} - \mu_0} \right)^2$$
This equation estimates the minimum sample size n needed for a two-tailed Z-test to conclude
that the difference between the sample mean and the setpoint µ0 is significant (at a level
of α), given the population standard deviation σ.
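As a rough numerical illustration of this formula, the MATLAB sketch below evaluates it for σ = 0.5 and an observed difference of 0.05, the values that appear in the worked example that follows. The choice α = 0.05 and all variable names are assumptions made for illustration. The critical value is computed with the base-MATLAB function erfinv, so no toolbox is required.

```matlab
% Minimum sample size for a two-tailed Z-test (illustrative sketch)
alpha = 0.05;   % significance level (assumed)
sigma = 0.5;    % known population standard deviation
delta = 0.05;   % observed difference |xbar - mu0|

% Two-tailed critical value z_{alpha/2}; equivalent to norminv(1 - alpha/2)
z = sqrt(2) * erfinv(1 - alpha);

% n = (z * sigma / delta)^2, rounded up to a whole number of samples
n_min = ceil((z * sigma / delta)^2);

fprintf('z = %.3f, minimum n = %d\n', z, n_min);
```

With these inputs the formula gives a minimum of 385 samples, which is consistent with a sample of 50 being too small and a sample of 500 being large enough to detect a difference of 0.05.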
(2) The test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{20.05 - 20}{0.5/\sqrt{50}} = \frac{0.05\sqrt{50}}{0.5} = 0.707$$
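$$\mu = \bar{X} \pm \frac{z_{\alpha/2}}{\sqrt{n}}\,\sigma = 20.05 \pm \frac{1.96}{\sqrt{n}} \times 0.5 =
\begin{cases}
20.05 \pm \dfrac{1.96}{\sqrt{50}} \times 0.5 = 20.05 \pm 0.139 & (19.911,\ 20.189) \\
20.05 \pm \dfrac{1.96}{\sqrt{500}} \times 0.5 = 20.05 \pm 0.044 & (20.006,\ 20.094)
\end{cases}$$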
For Sample 2, you can be more confident that the population mean lies in a much narrower
range, which does not include 20. Statistically, this means the population mean has a very low
probability of being 20 (and thus a high probability of being different from 20).
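The calculations above can be reproduced with a short MATLAB sketch. It assumes, as the worked numbers do, that both samples have a mean of 20.05, that σ = 0.5 is known, and that α = 0.05 (so z = 1.96). Variable names are illustrative only.

```matlab
% Z statistic and 95% confidence interval for two sample sizes (sketch)
xbar  = 20.05;  % sample mean
mu0   = 20;     % reference value
sigma = 0.5;    % known population standard deviation
z     = 1.96;   % z_{alpha/2} for alpha = 0.05

for n = [50 500]
    se   = sigma / sqrt(n);          % standard error of the mean
    Z    = (xbar - mu0) / se;        % test statistic
    half = z * se;                   % half-width of the confidence interval
    ci   = [xbar - half, xbar + half];
    fprintf('n = %3d: Z = %.3f, CI = (%.3f, %.3f), reject H0: %d\n', ...
            n, Z, ci(1), ci(2), abs(Z) > z);
end
```

For n = 50 the interval contains 20 and the test statistic stays below 1.96. For n = 500 the interval excludes 20 and the test statistic exceeds 1.96, so the two views of the test agree.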
All of the above discussion concerns the “statistical significance of difference”, i.e., whether
the data show a detectable “difference”, or whether the real mean truly differs from the
reference it is compared with. However, does this difference really matter? For example, for
Sample 2, we know that the population mean probably lies between 20.006 and 20.094. The
maximum difference from the standard is 0.094 (a larger one is very unlikely). Then,
does this 0.094 difference really matter? We call this the “physical significance of difference”.
One may well accept a batch of layers that is 0.094 micron (or 0.5%) different from
the standard, in which case the difference is not physically significant. In industry, “tolerance”
is used as the criterion. For example, a factory produces a batch of tubes with an inner
diameter of 1 mm and a tolerance of 0.1 mm. That means any tube whose inner diameter is
between 0.9 mm and 1.1 mm is acceptable. Statistical tools can only tell you, from the sample
information, whether your product’s specification really differs from the standard; only engineers
can tell you whether this difference is physically important.
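The distinction can be sketched in MATLAB by checking the same confidence interval against two different criteria. The tolerance value used here is hypothetical (the text does not give one for the layers), and the variable names are illustrative.

```matlab
% Statistical vs. physical significance (sketch; tolerance value is hypothetical)
mu0 = 20;                 % reference value
ci  = [20.006, 20.094];   % 95% confidence interval for Sample 2
tol = 0.1;                % hypothetical engineering tolerance, same units as mu0

stat_sig = (mu0 < ci(1)) || (mu0 > ci(2));   % reference lies outside the interval
max_dev  = max(abs(ci - mu0));               % largest plausible deviation
phys_sig = max_dev > tol;                    % deviation exceeds the tolerance

fprintf('max deviation = %.3f (%.2f%%)\n', max_dev, 100 * max_dev / mu0);
fprintf('statistically significant: %d, physically significant: %d\n', ...
        stat_sig, phys_sig);
```

Under this assumed tolerance the difference is statistically significant but not physically significant. A tighter tolerance, chosen by the engineer, could reverse the second answer without changing the first.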
3. Consider “correlation” versus a “cause-and-effect” relationship: in studying the recovery
time from disease A, we looked at the records of 2000 patients from all over the world,
categorized them into three groups according to their IQs (namely, low, average, and high IQ
groups), and performed an ANOVA test. The results show that people with higher IQs recovered
significantly faster. What does this result imply?
This result implies that people’s IQ (independent variable) is correlated with the recovery
time from disease A (dependent variable). It does NOT necessarily mean that IQ has
an effect on the recovery time. The real determining factor of the recovery time could be
some other variable, for example, diet. It is possible that diet influences IQ and
influences the recovery time as well, so that people on a certain diet have both a higher
IQ and a shorter recovery time from the disease; IQ then appears to be
correlated with the recovery time. Further experiments are needed to verify any causal conclusion.
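A small simulation can make the confounding argument concrete. The MATLAB sketch below uses entirely synthetic data: the coefficients, noise levels, and the "diet score" variable are invented for illustration and are not taken from any patient records. Diet drives both IQ and recovery time, while IQ has no direct effect on recovery.

```matlab
% Synthetic illustration of a confounder (all numbers are invented)
rng(0);                                  % fixed seed for reproducibility
n    = 2000;                             % number of simulated patients
diet = randn(n, 1);                      % hidden variable (diet quality score)

iq       = 100 + 10 * diet + 10 * randn(n, 1);   % diet raises IQ
recovery = 30  -  5 * diet +  5 * randn(n, 1);   % diet shortens recovery (days)

% IQ and recovery time correlate although neither depends on the other
R = corrcoef(iq, recovery);
r_raw = R(1, 2);

% Remove the linear effect of diet from both, then correlate the residuals
X = [ones(n, 1), diet];
res_iq  = iq       - X * (X \ iq);
res_rec = recovery - X * (X \ recovery);
R = corrcoef(res_iq, res_rec);
r_partial = R(1, 2);

fprintf('raw r = %.2f, r after controlling for diet = %.2f\n', r_raw, r_partial);
```

The raw correlation between IQ and recovery time is clearly negative, but it largely disappears once diet is controlled for. In real data the confounder is usually unknown, which is why a controlled experiment is needed.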
4. One wants to study the effect of cover design on the sales of a book (same contents and
price). Three different covers (namely, A, B, and C) are designed, and the books are
placed in the checkout line of a supermarket for sale. After a few weeks, the sales
records are as given in the Excel file “Tutorial_3_Data.xlsx” (an empty cell means no available data).
Use MATLAB and Excel to determine whether the design of the cover is correlated with
the sales of the book.
MATLAB code:
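clear;
% Read each cover's sales column from the spreadsheet
A = xlsread('Tutorial_3_Data.xlsx', '$B:$B');
B = xlsread('Tutorial_3_Data.xlsx', '$C:$C');
C = xlsread('Tutorial_3_Data.xlsx', '$D:$D');
% Drop NaN entries left by empty cells inside a column
A = A(~isnan(A)); B = B(~isnan(B)); C = C(~isnan(C));
% Always check how many data points there are in each group, for grouping later
na = numel(A);
nb = numel(B);
nc = numel(C);
% Assumed continuation: pool the data, label each value by cover, run one-way ANOVA
sales = [A; B; C];
cover = [repmat({'A'}, na, 1); repmat({'B'}, nb, 1); repmat({'C'}, nc, 1)];
p = anova1(sales, cover);  % requires the Statistics and Machine Learning Toolbox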
Results:
P-value = 0.0012, which is well below the common significance level of 0.05. We therefore
reject H0 (all means are equal) in favor of H1 (at least one mean differs), i.e., sales are
correlated with the design of the cover.
The Excel analysis yields the same P-value and thus the same conclusion. A further
walkthrough can be found in the video.