
7: Sampling Distribution

• Sampling error is the difference between an estimate and the corresponding population parameter:
  Sampling error = x̄ − μ
• Bias is the difference between the expected value of the estimator and the corresponding parameter:
  Bias = E(x̄) − μ
• Standard Error of the Mean:
  σ_x̄ = σ/√n
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of x̄ is also normally distributed, with:
  μ_x̄ = μ and σ_x̄ = σ/√n

Confidence Intervals for the Population Proportion, π

• The distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation
  σ_p = √(π(1 − π)/n)
• We estimate this with sample data: √(p(1 − p)/n)
• Upper and lower confidence limits for the population proportion are calculated with the formula:
  p ± z_{α/2} · √(p(1 − p)/n)
  where
  + z_{α/2} is the standard normal value for the level of confidence desired
  + p is the sample proportion
  + n is the sample size
• Note: must have X = np > 5 and n − X = n(1 − p) > 5

8: Confidence Intervals

Confidence Interval for μ (σ² Known)
• Assumptions:
  + Population variance σ² is known
  + Population is normally distributed
  + If the population is not normal, use a large sample
• x̄ − z_{α/2} · σ/√n ≤ μ ≤ x̄ + z_{α/2} · σ/√n

Confidence Interval for μ (σ Unknown)
• Assumptions:
  + Population standard deviation is unknown
  + Population is normally distributed
  + If the population is not normal, use a large sample
• x̄ − t_{n−1,α/2} · S/√n ≤ μ ≤ x̄ + t_{n−1,α/2} · S/√n

Confidence Interval for a Population Variance σ²
• If the population is normal, then (n − 1)s²/σ² follows the chi-square distribution (χ²) with degrees of freedom d.f. = n − 1.

Sampling Error
• The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 − α).
• The margin of error is also called sampling error.
• The margin of error can be reduced if:
  + the population standard deviation can be reduced (σ↓)
  + the sample size is increased (n↑)
  + the confidence level is decreased, (1 − α)↓
• Required sample size for estimating the mean:
  n = z²_{α/2} · σ² / e²
• Required sample size for estimating the proportion:
  n = z² · π(1 − π) / e²
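As a minimal sketch of the interval and sample-size formulas above (the sample values, σ = 15, e = 2, and e = 0.03 are invented illustration numbers, and SciPy is assumed available):

```python
# Sketch of the t interval and the sample-size formulas; data are made up.
import math
from statistics import NormalDist, mean, stdev
from scipy import stats

# t interval for mu (sigma unknown): xbar +/- t_{n-1,alpha/2} * S/sqrt(n)
sample = [98.2, 101.5, 99.7, 100.9, 97.8, 102.3, 100.1, 99.4]
n = len(sample)
xbar, s = mean(sample), stdev(sample)
t_crit = stats.t.ppf(0.975, df=n - 1)              # t_{n-1, alpha/2} for 95%
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)

# Required sample size n = (z * sigma / e)^2 for the mean, rounded up
z = NormalDist().inv_cdf(0.975)                    # z_{alpha/2} for 95%
n_mean = math.ceil((z * 15 / 2) ** 2)              # sigma = 15, e = 2
# and n = z^2 * pi * (1 - pi) / e^2 for a proportion (worst case pi = 0.5)
n_prop = math.ceil(z ** 2 * 0.5 * 0.5 / 0.03 ** 2)
```

Rounding n up (never down) guarantees the margin of error is no larger than e.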
9: One-sample Hypothesis Tests

• Decision Rule: If the test statistic falls in the rejection region, reject H0; otherwise do not reject H0.
• P-Values:
  + P > 0.10: No evidence against the null hypothesis
  + 0.05 < P < 0.10: Weak evidence against the null hypothesis
  + 0.01 < P < 0.05: Moderate evidence against the null hypothesis
  + P < 0.01: Strong evidence against the null hypothesis

Errors in Hypothesis Test Decision Making

t Test of Hypothesis for the Mean (σ Unknown)
• Test statistic: t = (x̄ − μ0)/(S/√n), with d.f. = n − 1

Confidence Level
• The confidence coefficient (1 − α) is the probability of not rejecting H0 when it is true.
• The confidence level of a hypothesis test is (1 − α)·100%.
• The power of a statistical test (1 − β) is the probability of rejecting H0 when it is false.
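The one-sample t test can be sketched as follows (the sample is invented illustration data, and SciPy is assumed available):

```python
# Sketch: one-sample t test of H0: mu = mu0 with sigma unknown.
import math
from statistics import mean, stdev
from scipy import stats

sample = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]  # made-up data
mu0 = 100                                                # hypothesized mean

n = len(sample)
xbar, s = mean(sample), stdev(sample)
t_calc = (xbar - mu0) / (s / math.sqrt(n))        # t = (xbar - mu0)/(S/sqrt(n))
p_value = 2 * stats.t.sf(abs(t_calc), df=n - 1)   # two-tailed p, d.f. = n - 1

# Cross-check against SciPy's built-in implementation
t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
```

Here P > 0.10, so by the guide above there is no evidence against the null hypothesis for this sample.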
Hypothesis Tests for Proportions

Type II Error

10: Two-sample Hypothesis Tests

Hypothesis tests for μ1 − μ2
• Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 − μ2.
• The point estimate for the difference is x̄1 − x̄2.

Hypothesis tests for μ1 − μ2 with σ1 and σ2 known

Hypothesis tests for μ1 − μ2 with σ1 and σ2 unknown and assumed equal
• Assumptions:
  + Samples are randomly and independently drawn
  + Populations are normally distributed or both sample sizes are at least 30
  + Population variances are unknown but assumed equal

Hypothesis tests for μ1 − μ2 with σ1 and σ2 unknown, not assumed equal
• Assumptions:
  + Population variances are unknown and cannot be assumed to be equal

Related Populations: The Paired Difference Test

Two Population Proportions
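The two unknown-variance cases map directly onto SciPy's `equal_var` flag; a minimal sketch (both samples are invented illustration data):

```python
# Sketch: two-sample t tests of H0: mu1 = mu2 on made-up data.
from scipy import stats

x1 = [23.1, 25.4, 24.8, 22.9, 26.0, 24.2]
x2 = [21.0, 22.3, 20.8, 23.1, 21.9, 22.5]

# Pooled-variance test: variances unknown but assumed equal
t_pooled, p_pooled = stats.ttest_ind(x1, x2, equal_var=True)

# Welch's test: variances unknown, not assumed equal
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)
```

The two statistics differ only in how the standard error and degrees of freedom are computed; with roughly equal sample variances they give similar results.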
Testing for the Ratio of Two Population Variances

11: Analysis of Variance (ANOVA)

• Each possible value of a factor or combination of factors is a treatment.
• A one-factor ANOVA compares the means of c treatments (groups).
• Sample sizes within each treatment do not need to be equal.
• The total number of observations: n = n1 + n2 + … + nc

The F Distribution

• Test if each factor has a significant effect on Y:
  H0: m1 = m2 = m3 = … = mc
  H1: Not all the means are equal
• If we cannot reject H0, we conclude that observations within each treatment have the same mean m.

One-factor ANOVA as a Linear Model
• yij comes from a population with a common mean (m) plus a treatment effect (Aj) plus random error (eij):
  yij = m + Aj + eij  (j = 1, 2, …, c and i = 1, 2, …, n)
• Random error is assumed to be normally distributed with zero mean and the same variance.
• Testing hypotheses:
  H0: A1 = A2 = … = Ac = 0
  H1: Not all Aj are zero
• If H0 is true, then the ANOVA model is yij = m + eij: the same mean in all groups, or no factor effect.

ANOVA Assumptions
• Observations on Y are independent
• Populations being sampled are normal
• Populations being sampled have equal variances

Decomposition of Variation
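A minimal sketch of the one-factor test of H0: m1 = m2 = m3 (the three treatment groups are invented illustration data, and SciPy is assumed available):

```python
# Sketch: one-factor ANOVA on three made-up treatment groups.
from scipy import stats

g1 = [12, 14, 11, 13]
g2 = [15, 17, 16, 18]
g3 = [12, 13, 12, 14]

# f_oneway returns F = MSA/MSE and its p-value;
# a small p-value leads to rejecting H0: m1 = m2 = m3.
F, p = stats.f_oneway(g1, g2, g3)
```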
One-factor ANOVA: Group and Grand Means
• The mean of each group (group mean): ȳj = (1/nj)·Σi yij
• The overall sample mean (grand mean): ȳ = (1/n)·Σj Σi yij

Partition of Deviations
• For a given observation yij, the following relationship holds:
  (yij − ȳ) = (ȳj − ȳ) + (yij − ȳj)

Partitioned Sum of Squares
• SST = SSA + SSE

Hypothesis Testing
• SSA and SSE are used to test the hypothesis of equal means by dividing each sum of squares by its degrees of freedom.
• These ratios are called Mean Squares (MSA and MSE).

F Statistic
• The F statistic is the ratio of the variance due to treatments (MSA) to the variance due to error (MSE):
  F = MSA/MSE
• When F is near zero, there is little difference among treatments and we would not reject H0.
• Decision Rule: Reject H0 if F > Fα; otherwise do not reject H0.

Tukey's Test
• Do after rejection of equal means in ANOVA.
• Tells which population means are significantly different, e.g.: μ1 = μ2 ≠ μ3.
• Tukey's studentized range test is a multiple comparison test.
• Tukey's is a two-tailed test for equality of paired means from c groups compared simultaneously.
• For c groups, there are c(c − 1)/2 distinct pairs of means to be compared.
• The hypotheses are: H0: μj = μk vs. H1: μj ≠ μk
• Decision Rule: Reject H0 if Tcalc > Tc,n−c, where Tc,n−c is a critical value of the Tukey test statistic Tcalc for the desired level of significance.
• Tables give the 5% critical values of the Tukey test statistic.

ANOVA Assumption
• ANOVA assumes that observations on the response variable are from normally distributed populations that have the same variance.
• The one-factor ANOVA test is only slightly affected by inequality of variance when group sizes are equal.
• One can test this assumption of homogeneous variances by using Hartley's Fmax Test.
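Tukey's pairwise comparisons can be sketched with SciPy's built-in implementation (requires SciPy ≥ 1.8; the groups are invented illustration data):

```python
# Sketch: Tukey's studentized range test after a significant ANOVA F test.
from scipy import stats

g1 = [12, 14, 11, 13]
g2 = [15, 17, 16, 18]
g3 = [12, 13, 12, 14]

res = stats.tukey_hsd(g1, g2, g3)   # all c(c-1)/2 pairwise comparisons
p_12 = res.pvalue[0, 1]             # H0: mu1 = mu2
p_13 = res.pvalue[0, 2]             # H0: mu1 = mu3
```

Here group 2's mean differs from group 1's, while groups 1 and 3 are indistinguishable, i.e. a μ1 = μ3 ≠ μ2 pattern.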
Hartley's Test
• Hypotheses:
  H0: σ1² = σ2² = … = σc²
  H1: Not all the variances are equal
• The test statistic is the ratio of the largest sample variance to the smallest sample variance:
  Hcalc = s²max / s²min
• The decision rule: Reject H0 if Hcalc > Hcritical
• Hcritical can be found in tables of Hartley's test statistic, using df1 = c, df2 = n/c − 1

Levene's Test
• Levene's test is a more robust alternative to Hartley's F test.
• Levene's test does not assume a normal distribution.
• It is based on the distances of the observations from their sample medians rather than their sample means.

Two-factor ANOVA Without Replication
• Two factors A and B may affect Y.
• Factor A has r levels, factor B has c levels.
• All levels of both factors occur, and each cell contains one observation.
• Total Sum of Squares can now be split into three parts:
  SST = SSA + SSB + SSE
  + SST = Total sum of squared deviations about the mean
  + SSA = Between-rows sum of squares (effects of factor A)
  + SSB = Between-columns sum of squares (effects of factor B)
  + SSE = Error sum of squares

Linear Model of Two-factor ANOVA with Replication
• SST = SSA + SSB + SSI + SSE
  + SST = Total sum of squared deviations about the mean
  + SSA = Between-rows sum of squares (effects of factor A)
  + SSB = Between-columns sum of squares (effects of factor B)
  + SSI = Interaction sum of squares (effects of AB)
  + SSE = Error sum of squares
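Levene's test is available in SciPy; a minimal sketch on invented groups (`center='median'` uses distances from the group medians, as described above):

```python
# Sketch: Levene's test of equal variances across made-up groups.
from scipy import stats

g1 = [12, 14, 11, 13]
g2 = [15, 17, 16, 18]
g3 = [12, 13, 12, 14]

# Large p-value -> no evidence against H0 of equal variances
W, p = stats.levene(g1, g2, g3, center='median')
```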
12: Simple Linear Regression

• Regression analysis is used to:
  + Predict the value of a dependent variable based on the value of independent variable(s)
  + Explain the impact of changes in an independent variable on the dependent variable

Linear Regression Model
  yi = β0 + β1·xi + εi
  where β0 and β1 are the population model coefficients and ε is a random error term.

Least Squares Estimators
• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between y and ŷ:
  min SSE = min Σ ei² = min Σ (yi − ŷi)² = min Σ [yi − (b0 + b1·xi)]²
• Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE.

Measures of Variation

Coefficient of Determination, R²
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
• The coefficient of determination is also called R-squared and is denoted as R²:
  R² = SSR/SST = 1 − SSE/SST, with 0 ≤ R² ≤ 1
Simple Linear Regression Equation

Standard Error of the Estimate
• se = √(se²) is called the standard error of the estimate.
• se is a measure of the variation of observed y values from the regression line.

Comparing Standard Errors of the Slope

Inference about the Slope: t Test

F-Test for Significance

Correlation and R²
• The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:
  R² = r²xy

Inferences About the Regression Model
• The variance of the regression slope coefficient (b1) is estimated by
  s²b1 = se² / Σ(xi − x̄)² = se² / ((n − 1)·s²x)

Estimation of Model Error Variance
• An estimator for the variance of the population model error is
  σ̂² = se² = Σ ei² / (n − 2) = SSE / (n − 2)
• Division by n − 2 instead of n − 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one.
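The least squares fit, R², and the slope t test can be sketched together (the (x, y) pairs are invented illustration data, and SciPy is assumed available):

```python
# Sketch: simple linear regression y = b0 + b1*x on made-up data.
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

res = stats.linregress(x, y)
b0, b1 = res.intercept, res.slope        # least squares estimates
r_squared = res.rvalue ** 2              # R^2 = r_xy^2 for simple regression
t_slope = res.slope / res.stderr         # t statistic for H0: beta1 = 0
```

`res.stderr` is the estimated standard error of b1, i.e. the square root of s²b1 from the formula above.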