Académique Documents
Professionnel Documents
Culture Documents
Stat Skills
Chi-Square Distribution, F
Distribution, ANOVA, Correlation
and Covariance
x -2 23 48 73 98
P(X=x) 0.977 0.008 0.008 0.006 0.001
x -2 23 48 73 98
Frequency 965 10 9 9 7
x Observed Expected
Frequency Frequency
-2 965 977
23 10 8
48 9 8
73 9 6
98 7 1
(𝑂−𝐸)2
𝑋2 = σ ,
where O is the observed frequency and E the
𝐸
expected frequency.
𝑋 2 = 38.272
Is this high?
To find this, we need to look at the 𝜒 2 distribution.
x Observed Expected
Frequency Frequency
-2 965 977
23 10 8
48 9 8
73 9 6
98 7 1
ν=4
2
𝜒5% 4 = 9.488. This means the critical region is X2 > 9.488.
2
𝜒1% 5 = 15.086. This means the critical region is X2 > 15.086.
X2 = 10.47
X2 = 10.46, which is less than the critical value of 15.086. It is NOT in the
critical region.
Critical
Region
𝟏𝟓. 𝟎𝟖𝟔
There is not enough evidence to reject the null hypothesis that the
distribution is Poisson.
USDA intends to update the nutritional guideline for ideal diet. 90 adults
are shown 3 different diagrams (D1, D2, D3) explaining the proper
proportion of food groups, and asked to identify the easiest to follow.
The group’s picks are given below.
D1: 23
D2: 39
D3: 28
𝜈=2
2
𝜒5% 2 = 5.991. This means the critical region is X2 > 5.991.
2
𝜒1% 4 = 13.277. This means the critical region is X2 > 13.277.
Critical
Region
Since the p-value is greater than the threshold of 1%, we accept the null hypothesis
that the Croupiers are independent.
EXPECTED
Pain Tranquilizers/ Stimulants TOTAL
ν=4
relievers Sedatives
21-34 40.94 66.58 31.49 139
35-55 29.74 48.38 22.88 101
>55 20.32 33.05 15.63 69
TOTAL 91 148 70 309
2
𝜒1% 4 = 13.277. This means the critical region is X2 > 13.277.
Critical
Region
𝟏𝟑. 𝟐𝟕𝟕
2
(𝑛 − 1)𝑠
∴ 𝜒2 =
𝜎2
σ(𝑥𝑖 −𝑥)ҧ 2
Recall 𝜒2 =
𝜎2
Data: 2.69, 2.66, 2.64, 2.59, 2.62, 2.63, 2.69, 2.66, 2.63, 2.65, 2.57, 2.63,
2.70, 2.71, 2.64, 2.65, 2.59, 2.66, 2.62, 2.57
2
𝜒0.01,19 = 36.191
The company wants to know if the variance from each sample comes from the
same population variance (population variances are equal) or from different
population variances (population variances are unequal).
𝑠12 0.11378
Ratio of sample variances, 𝐹 = = = 5.62
𝑠22 0.02023
1
𝐹0.025,9,11 = 3.5879; 𝐹0.975,9,11 = = 0.2787
𝐹0.025,9,11
3+2+1+5+3+4+5+6+7
𝑋ധ = =4
9
When there are m groups and n members in each group, the degrees of freedom are 𝒎𝒏 − 𝟏,
since we can calculate one member knowing the overall mean.
How much of this variation is coming from within the groups and how much from between the
groups?
3+2+1+5+3+4+5+6+7
𝑋ധ = =4
9
𝑇𝑜𝑡𝑎𝑙 𝑆𝑢𝑚 𝑜𝑓 𝑆𝑞𝑢𝑎𝑟𝑒𝑠 𝐵𝑒𝑡𝑤𝑒𝑒𝑛, 𝑆𝑆𝐵 = 3(2 − 4)2 + 3(4 − 4)2 +3(6 − 4)2 = 𝟐𝟒
When there are m groups, the degrees of freedom are 𝒎 − 𝟏.
3+2+1+5+3+4+5+6+7
𝑋ധ = =4
9
Given that mean of group 3 is highest and that of group 1 lowest, can we conclude that the
pills given to group 3 had a larger impact or is it just variation within the group?
Let us have a null hypothesis that the population means of the 3 groups from which the
samples were taken have the same mean, i.e., the pills do not have an impact on the
performance in the exam. 𝜇1 = 𝜇2 = 𝜇3 . Let us also have a significance level, 𝛼 = 0.10.
What is the alternate hypothesis?
The pills have an impact on performance.
If numerator is much bigger than the denominator, it means variation between means has
bigger impact than variation within, thus rejecting the null hypothesis.
Fc, the critical F-statistic, therefore, is 3.46330. 12 is way higher than this and hence we reject
the null hypothesis. That means the pills do have an impact on the performance.
134
𝑋ധ = = 4.96 𝑀𝑊
27
𝑆𝑆𝐵 26.96
𝑑𝑓𝑆𝑆𝐵
𝐹 − 𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐 = = 2 = 8.99
𝑆𝑆𝑊 36
𝑑𝑓𝑆𝑆𝑊 24
SUMMARY
Groups Count Sum Average Variance
Group1 9 32 3.55556 1.027778
Group2 9 50 5.55556 2.777778
Group3 9 52 5.77778 0.694444
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 26.963 2 13.4815 8.987654 0.0012 2.5383
Within Groups 36 24 1.5
Total 62.963 26
The best place for students to learn Applied Engineering 87 http://www.insofe.edu.in
Tree Diagram Taxonomy of Inferential Techniques
Confidence Intervals Hypothesis Tests
(Estimation)
Variable
1 Independent
Two-way
ANOVA
Means
One-way
ANOVA
z for p1-p2 𝜒2
𝜒2 z for p1-p2
z z
F for 𝜎1 2 − 𝜎2 2
z t
z t Paired t-test
t for Diff
See: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/cars.html
This method of fitting the line of best fit is called least squares
regression.
y = 3.93 x – 17.58
σ(𝑥−𝑥)(𝑦−
ҧ ത
𝑦)
Covariance 𝑠𝑥𝑦 =
𝑛−1
The best place for students to learn Applied Engineering 100 http://www.insofe.edu.in
Covariance: Speed vs Stopping distance
The best place for students to learn Applied Engineering 101 http://www.insofe.edu.in
Covariance and Correlation
σ(𝑥 − 𝑥)(𝑦
ҧ − 𝑦)
ത
𝑠𝑥𝑦 =
𝑛−1
• The value of covariance itself doesn’t say much. It only shows
whether the variables are moving together (positive value) or
opposite to each other (negative value).
• To know the strength of how the variables move together,
covariance is standardized to the dimensionless quantity,
correlation.
𝑠𝑥𝑦
𝑟=
𝑠𝑥 𝑠𝑦
The best place for students to learn Applied Engineering 102 http://www.insofe.edu.in
Correlation Coefficient
Correlation coefficient, r, is a number between -1 and 1 and tells us
how well a regression line fits the data.
𝑏𝑠𝑥
𝑟=
𝑠𝑦
The best place for students to learn Applied Engineering 104 http://www.insofe.edu.in
Correlation: Speed vs Stopping distance
y = 3.93 x – 17.58
The best place for students to learn Applied Engineering 105 http://www.insofe.edu.in
THE STORY
The best place for students to learn Applied Engineering 106 http://www.insofe.edu.in
We started by looking at various types of data
(Categorical/Numerical) and we sought methods to describe the
central tendency of the data. We discussed Mean, Median and
Mode.
The best place for students to learn Applied Engineering 107 http://www.insofe.edu.in
We moved on to basic Probability theory and described rules of
probabilities using Venn Diagrams. We talked about conditional
probabilities and that led to Bayes Theorem.
The best place for students to learn Applied Engineering 108 http://www.insofe.edu.in
Probability Distributions
From: http://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/
The best place for students to learn Applied Engineering 109 http://www.insofe.edu.in
From: http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
The best place for students to learn Applied Engineering 110 http://www.insofe.edu.in
Then we saw how the Sampling Distributions of Means tend to
normal distribution irrespective of how the population is
distributed and learned how to describe populations based on
available sample data.
The best place for students to learn Applied Engineering 111 http://www.insofe.edu.in
We then studied various statistical tests to test hypotheses.
The best place for students to learn Applied Engineering 112 http://www.insofe.edu.in
HYDERABAD BENGALURU
2nd Floor, Jyothi Imperial, Vamsiram Builders, Old L77, 15th Cross Road, 3rd Main Road, Sector 6,
Mumbai Highway, Gachibowli, Hyderabad - 500 032 HSR Layout, Bengaluru – 560 102
+91-9701685511 (Individuals) +91-9502334561 (Individuals)
+91-9618483483 (Corporates) +91-9502799088 (Corporates)
Social Media
Web: http://www.insofe.edu.in
Facebook: https://www.facebook.com/insofe
Twitter: https://twitter.com/Insofeedu
YouTube: http://www.youtube.com/InsofeVideos
SlideShare: http://www.slideshare.net/INSOFE
LinkedIn: http://www.linkedin.com/company/international-school-of-engineering
This presentation may contain references to findings of various reports available in the public domain. INSOFE makes no representation as to their accuracy or that the organization
subscribes to those findings.
The best place for students to learn Applied Engineering 113 http://www.insofe.edu.in