Stat Project

Ateneo de Manila University
STAT 109:
Introduction to Statistical Theory
FINAL PROJECT
Reina Krizel J. Adriano, Celina Ysabel H. Gacias

III - BS/M Applied Mathematics
major in Mathematical Finance
Submitted to
Sir Anthony Zosa
21 October 2014
TABLE OF CONTENTS
A. HYPOTHESIS TESTING: ONE-SAMPLE CASES
Normal Distribution: Variance Known
Test of Proportion
Median Test
Inferences Concerning $\sigma^2$
Inferences Concerning $\mu$ when $\sigma^2$ is Unknown
B. HYPOTHESIS TESTING: TWO-SAMPLE CASES
Inferences about $\mu_x - \mu_y$ from Samples with Equal $\sigma^2$
Testing in Matched-Pair Experiments
Testing Binomial Data
Test of Variances
Inferences about $\mu_x - \mu_y$ from Samples with $\sigma_x^2 \neq \sigma_y^2$
C. REGRESSION ANALYSIS
Linear Regression
A. HYPOTHESIS TESTING: ONE-SAMPLE CASES
I. Gaussian Distribution: Variance Known

Toluene, one of the Volatile Organic Compounds (VOCs), can be very detrimental
to a person's health. Three Chemistry-Materials Science Engineering students
took six samples of toluene in an urban area. The samples were drawn at a
two-storey house 250 meters away from the roadside in Pasay City, Manila.
Using their data, we test whether the toluene levels in the area have become
hazardous, given that toluene levels are said to be dangerous at 5.6 μg/m³.
Sample    Toluene Concentration (μg/m³)
1         9.9
2         15.7
3         6.9
4         5.3
5         11.6
6         12.9
Table 1. VOC concentrations (in μg/m³) of toluene from urban areas. From
Montenoso, J., Nery, A., & Tan, N. (2014).
$H_0: \mu = \mu_0 = 5.6$ vs. $H_a: \mu > \mu_0 = 5.6$

Let's use $\alpha = 0.05$.
Using the test statistic, with the sample standard deviation standing in for
$\sigma$, we get
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{10.38333333 - 5.6}{3.854564394/\sqrt{6}} = 3.039701$$
Testing against $Z_{0.05} = 1.645$

Since $Z > Z_{0.05}$, we reject $H_0$. The sample mean is significantly greater
than 5.6 μg/m³, so the area has some rather dangerous levels of toluene.
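The one-sample Z computation can be reproduced with a minimal sketch like the one below. It assumes, as the worked figures suggest, that the sample standard deviation is used in place of a known $\sigma$ and that the alternative is one-sided ($\mu > 5.6$). The critical value is hard-coded from the normal table, and the variable names are illustrative only.

```python
import math
import statistics

# Toluene concentrations from Table 1
toluene = [9.9, 15.7, 6.9, 5.3, 11.6, 12.9]
mu_0 = 5.6        # hazard threshold under H0
z_crit = 1.645    # upper 5% point of the standard normal (table value)

x_bar = statistics.mean(toluene)
# Assumption: the sample SD stands in for the "known" sigma
sigma = statistics.stdev(toluene)
z = (x_bar - mu_0) / (sigma / math.sqrt(len(toluene)))

# One-sided test: reject H0 when Z exceeds the upper critical value
reject_h0 = z > z_crit
print(f"mean={x_bar:.4f}, sd={sigma:.4f}, Z={z:.4f}, reject H0: {reject_h0}")
```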
II. Test of Proportion

In a study of coffee shop preference conducted by Atenean QMT109 students,
457 students from four universities (ADMU, DLSU, UP, and UST) were surveyed.
Of these, 316 respondents said that Starbucks is their favorite coffee shop.
Of the 185 Ateneans surveyed, 125 said they would buy coffee from Starbucks
too. Given these data, we test whether the study is significant, that is,
whether the proportion of the total population that likes Starbucks differs
from 50%, using $\alpha = 0.05$.
                                 ADMU only    ADMU, DLSU, UP, UST
Students who chose Starbucks     125          316
Total number of respondents      185          457
Proportion choosing Starbucks    67.5676%     69.1466%
Table 2. Number of respondents from different schools in favor of Starbucks.
From Buying Behavior and Preferences of College Students towards Coffee Shops
(2014).
$H_0: p = 0.50$ vs. $H_a: p \neq 0.50$

First, we check that $np_0 \geq 10$ and $n(1 - p_0) \geq 10$:
457(0.50) = 228.5 ≥ 10
457(1 - 0.50) = 228.5 ≥ 10
Using the test statistic with the overall sample proportion
$\hat{p} = 316/457 = 0.691466$,
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.691466 - 0.50}{\sqrt{0.50(1 - 0.50)/457}} = 8.1862$$
Testing against $Z_{0.025} = 1.96$

$|Z| > Z_{0.025}$, which means Z is in the rejection region. Therefore, we
reject $H_0$ and conclude that the proportion of the total population that
likes Starbucks is not 50%.
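As a check on the arithmetic, the proportion test can be sketched as follows. This is a minimal example that takes the pooled counts from Table 2 (316 of 457) as the sample and 0.50 as the hypothesized proportion; the two-sided critical value 1.96 is hard-coded from the normal table.

```python
import math

n = 457            # respondents across ADMU, DLSU, UP, and UST (Table 2)
successes = 316    # respondents who chose Starbucks
p_0 = 0.50         # hypothesized proportion under H0
z_crit = 1.96      # two-sided 5% critical value (table value)

# Large-sample condition for the normal approximation
large_sample_ok = n * p_0 >= 10 and n * (1 - p_0) >= 10

p_hat = successes / n
z = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Two-sided test: reject H0 when |Z| exceeds the critical value
reject_h0 = abs(z) > z_crit
print(f"p_hat={p_hat:.6f}, Z={z:.4f}, reject H0: {reject_h0}")
```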
III. Median Test

The years 1970-1985 marked the Silver Age of Comics, and it was at this
time that comic book company DC cancelled all of its superhero titles save for
those that featured Superman and Batman and began to experiment with
minor characters in other genres, such as horror, time travel, and sci-fi.
Toward the end of the Silver Age and the coming of the Bronze Age, however,
DC began to revert to publishing dominantly superhero-centric titles. Here, we
are to test whether or not the median of the total sales per year of Batman
comics is 168,164 as in 1977, total sales per year at the middle of the Silver Age.
Year    Total Paid Circulation
1970    293,897
1971    244,488
1972    185,283
1973    200,574
1974    193,223
1975    154,000
1976    178,000
1977    168,164
1978    125,421
1979    166,640
1980    129,299
1981    110,997
1982    108,234
1983    97,741
1984    89,217
1985    75,303
Table 3. Batman comic books sales per year from 1970 to 1985.
$H_0$: Median = 168,164 (or $p = 0.50$) vs. $H_a$: Median ≠ 168,164 (or $p \neq 0.50$)
(where $p$ is the proportion of years with sales above the hypothesized median)

Let's use $\alpha = 0.05$.
In this sample of data from 16 years, there were 6 years where the total
paid circulation was higher than 168,164, 9 years where it was lower than
168,164, and 1 year where it was exactly 168,164.
For the n = 16 - 1 = 15 years with sales differing from 168,164:
$$\mu = 0.50n = (0.50)(15) = 7.5$$
$$\sigma = \sqrt{0.25n} = \sqrt{(0.25)(15)} = 1.9365$$
With x = 6 as the number of plus signs, the test statistic becomes:
$$Z = \frac{x - \mu}{\sigma} = \frac{6 - 7.5}{1.9365} = -0.7746$$
$$z_{\alpha/2} = z_{0.025} = 1.96$$
Since $-z_{\alpha/2} < Z < z_{\alpha/2}$, we do not reject $H_0$. There is no
evidence to show that the median of the yearly Batman comic book sales is not
168,164.
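The sign test above reduces to counting plus and minus signs, which a short sketch can make explicit. It uses the large-sample normal approximation without a continuity correction, drops the year that ties with the hypothesized median, and hard-codes 1.96 as the critical value; all names are illustrative.

```python
import math

# Total paid circulation per year (Table 3)
circulation = {
    1970: 293897, 1971: 244488, 1972: 185283, 1973: 200574,
    1974: 193223, 1975: 154000, 1976: 178000, 1977: 168164,
    1978: 125421, 1979: 166640, 1980: 129299, 1981: 110997,
    1982: 108234, 1983: 97741, 1984: 89217, 1985: 75303,
}
median_0 = 168164   # hypothesized median under H0
z_crit = 1.96       # two-sided 5% critical value (table value)

plus = sum(v > median_0 for v in circulation.values())    # years above
minus = sum(v < median_0 for v in circulation.values())   # years below
n = plus + minus                                           # ties are dropped

mu = 0.5 * n                    # mean of the plus count under H0
sigma = math.sqrt(0.25 * n)     # SD of the plus count under H0
z = (plus - mu) / sigma

reject_h0 = abs(z) > z_crit
print(f"plus={plus}, minus={minus}, Z={z:.4f}, reject H0: {reject_h0}")
```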
IV. Inferences Concerning $\sigma^2$

The National Statistical Coordination Board (NSCB) has been surveying the
natural resources found in the Philippines for a certain period of time. The
percentage of arable land is a potentially good indicator of national
development, given that the agricultural sector makes a significant
contribution to national income. Here, we test the variability of this
percentage over the years 2000 to 2012, using $\alpha = 0.05$.
Year    Arable Land (% of Land Area)
2000    16.88299
2001    16.7153
2002    16.55096
2003    16.43358
2004    16.76896
2005    16.76896
2006    17.10434
2007    17.43972
2008    17.77509
2009    18.11047
2010    18.11047
2011    18.11047
2012    18.59677
Table 5. Arable land area as a percentage of total land area in the
Philippines. Data from National Statistical Coordination Board.
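$H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 \neq \sigma_0^2$
With $n = 13$ years, sample variance $S^2 = 0.525865124$, and taking
$\sigma_0^2 = 0.485413961$ as the hypothesized variance, the test statistic is
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(13-1)(0.525865124)}{0.485413961} = 13.0000$$
From the chi-square table (subscripts denote lower-tail probabilities),
$\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,12} = 4.404$ and
$\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.975,\,12} = 23.337$.
Since $\chi^2_{0.025,\,12} < \chi^2 < \chi^2_{0.975,\,12}$, we do not reject
$H_0$. There is no statistically significant evidence that the variance of the
percentage of land area that is arable land differs from $\sigma_0^2$ over
Years 2000-2012.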
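A minimal sketch of the chi-square test for a variance, computed directly from the Table 5 data. The hypothesized variance 0.485413961 is taken from the worked computation in this section, and the two critical values for 12 degrees of freedom are hard-coded from a chi-square table rather than computed.

```python
import statistics

# Arable land (% of land area), 2000-2012 (Table 5)
arable = [
    16.88299, 16.7153, 16.55096, 16.43358, 16.76896, 16.76896, 17.10434,
    17.43972, 17.77509, 18.11047, 18.11047, 18.11047, 18.59677,
]
sigma0_sq = 0.485413961    # hypothesized variance used in this section
chi2_lower = 4.404         # lower 2.5% point, 12 df (table value)
chi2_upper = 23.337        # upper 2.5% point, 12 df (table value)

n = len(arable)
s_sq = statistics.variance(arable)      # sample variance, divisor n - 1
chi2 = (n - 1) * s_sq / sigma0_sq

# Two-sided test: reject H0 when the statistic falls in either tail
reject_h0 = chi2 < chi2_lower or chi2 > chi2_upper
print(f"n={n}, S^2={s_sq:.6f}, chi2={chi2:.4f}, reject H0: {reject_h0}")
```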
V. Inferences Concerning $\mu$ when $\sigma^2$ is Unknown

The National Statistics Office (NSO) was able to gather data on the crop
production index from 1985 to 2012. The crop production index (CPI) is
another good measure of economic activity. We want to test the quality of the
indices by seeing if they match up to 100, which is said to be the optimal CPI.
63.86     68.28     66.39     65.79     67.93     75.73     74.74
76.75     77.25     79.19     79.14     84.28     83.81     75.51
81.63     84.63     87.87     90.61     93.96     97.95     100.45
101.6     108.75    113.03    112.33    110.88    113.35    117.86
Table 6. Crop Production Index from 1985 to 2012. From World Development
Indicators, National Statistics Office.
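$H_0: \mu = 100$ vs. $H_a: \mu \neq 100$

Let's use $\alpha = 0.05$.
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{87.62679 - 100}{16.64756/\sqrt{28}} = -3.93288$$
Comparing this with the value in the t-table, $t_{0.025,27} = 2.0518$

Since $t < -t_{0.025,27}$, we reject $H_0$. The mean CPI differs significantly
from 100, and the sample mean of 87.63 falls below it, so there is no evidence
to suggest that the Philippines' CPI has been consistently satisfactory.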
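The one-sample t statistic can be recomputed from the 28 index values in Table 6 with a sketch like the following. The critical value for 27 degrees of freedom is hard-coded from the t-table, and the variable names are illustrative.

```python
import math
import statistics

# Crop Production Index values from Table 6
cpi = [
    63.86, 68.28, 66.39, 65.79, 67.93, 75.73, 74.74,
    76.75, 77.25, 79.19, 79.14, 84.28, 83.81, 75.51,
    81.63, 84.63, 87.87, 90.61, 93.96, 97.95, 100.45,
    101.6, 108.75, 113.03, 112.33, 110.88, 113.35, 117.86,
]
mu_0 = 100.0       # hypothesized mean under H0
t_crit = 2.0518    # t_{0.025, 27} (table value)

n = len(cpi)
x_bar = statistics.mean(cpi)
s = statistics.stdev(cpi)               # sample SD, divisor n - 1
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-sided test: reject H0 when |t| exceeds the critical value
reject_h0 = abs(t) > t_crit
print(f"mean={x_bar:.5f}, sd={s:.5f}, t={t:.5f}, reject H0: {reject_h0}")
```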
B. HYPOTHESIS TESTING: TWO-SAMPLE CASES
I. Inferences about $\mu_x - \mu_y$ from Samples with Equal $\sigma^2$

Along with the Philippine Health Statistics (PHS) and the Department of
Health (DOH), the NSCB conducted a survey for the Child Health Index per
region in 2003 and 2006. The infant mortality rate is the number of deaths of
infants under 12 months of age in a given period per 1,000 live births.
Region      2003 Infant Mortality Rate    2006 Infant Mortality Rate
NCR         19.7                          18.7
CAR         8.7                           9.2
Reg I       16.2                          15.7
Reg II      9.9                           8.3
Reg III     10.9                          10.6
Reg IVA     15.2                          14.5
Reg IVB     14.3                          13.9
Reg V       12.9                          11.9
Reg VI      15.2                          13.4
Reg VII     12                            12
Reg VIII    15.3                          13
Reg IX      10                            11.5
Reg X       10.4                          9.8
Reg XI      8.3                           7.4
Reg XII     9.2                           7.8
CARAGA      9.3                           7.4
ARMM        4.7                           4.4
Table 7. Infant Mortality Rate per region for Years 2003 and 2006. From
NSCB website.
$H_0: \mu_x = \mu_y$ vs. $H_a: \mu_x \neq \mu_y$
Let's use $\alpha = 0.05$.
Using the test statistic:
$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}} = \frac{11.89411765 - 11.14705882}{3.643194703\sqrt{2/17}} = 0.597835748,$$
where
$$S_p = \sqrt{\frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}} = \sqrt{\frac{(17-1)13.71183824 + (17-1)12.83389706}{17 + 17 - 2}} = 3.643194703$$
Testing against $t_{\alpha/2,\,n+m-2} = t_{0.025,32} = 2.0370$

Since $-t_{0.025,32} < t < t_{0.025,32}$, we do not reject $H_0$. There is no
statistically significant difference between the mean infant mortality rates
of 2003 and 2006, so the data do not show that the Philippines has progressed
in terms of the provision of better infant and newborn healthcare.
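The pooled-variance computation is easy to get wrong by hand, so here is a minimal sketch that recomputes it from Table 7. Like the analysis in this section, it treats the 2003 and 2006 rates as two independent samples with equal variances; the critical value for 32 degrees of freedom is hard-coded from the t-table.

```python
import math
import statistics

# Infant mortality rates per region (Table 7), in the table's row order
imr_2003 = [19.7, 8.7, 16.2, 9.9, 10.9, 15.2, 14.3, 12.9, 15.2,
            12, 15.3, 10, 10.4, 8.3, 9.2, 9.3, 4.7]
imr_2006 = [18.7, 9.2, 15.7, 8.3, 10.6, 14.5, 13.9, 11.9, 13.4,
            12, 13, 11.5, 9.8, 7.4, 7.8, 7.4, 4.4]
t_crit = 2.0370    # t_{0.025, 32} (table value)

n, m = len(imr_2003), len(imr_2006)
var_x = statistics.variance(imr_2003)
var_y = statistics.variance(imr_2006)

# Pooled standard deviation under the equal-variance assumption
s_p = math.sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))
t = (statistics.mean(imr_2003) - statistics.mean(imr_2006)) / (
    s_p * math.sqrt(1 / n + 1 / m)
)

reject_h0 = abs(t) > t_crit
print(f"Sp={s_p:.6f}, t={t:.6f}, reject H0: {reject_h0}")
```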
II. Testing in Matched-Pair Experiments

The National Statistical Coordination Board reported the Enrolment Rates in
Elementary and Secondary Education for 2006. The elementary rates, based on
the school-age population adopted by the Department of Education (DepEd),
cover those aged 6-11 years, while the secondary rates cover those aged 12-15
years. We test whether there is a significant difference between the two
sample rates.
Region      Elementary    Secondary    Difference
NCR         73.2          55.7         17.5
CAR         72.7          39.5         33.2
Reg I       76.5          52.5         24
Reg II      74            46.2         27.8
Reg III     78.8          49.9         28.9
Reg IVA     79.1          51.7         27.4
Reg IVB     80.8          47.4         33.4
Reg V       81.2          45.9         35.3
Reg VI      71.7          45.2         26.5
Reg VII     73.4          39.7         33.7
Reg VIII    76.4          42.6         33.8
Reg IX      75.7          40.9         34.8
Reg X       74.5          37.3         37.2
Reg XI      70.7          38.3         32.4
Reg XII     72.2          39.6         32.6
CARAGA      75.8          40.6         35.2
ARMM        84.8          28.7         56.1
Table 8. Enrolment rates in elementary and secondary education in the
Philippines in 2006 and the differences between the rates. Source: Basic
Education Information System, DepEd.
$H_0: \mu_D = 0$ vs. $H_a: \mu_D \neq 0$
Using the test statistic:
$$T = \frac{\bar{D} - d_0}{S_D/\sqrt{n}} = \frac{32.34117647 - 0}{7.896997754/\sqrt{17}} = 16.88567$$
Testing against $t_{\alpha/2,\,n-1} = t_{0.025,16} = 2.1199$
Since $T > t_{0.025,16}$, we reject $H_0$. The difference between elementary
and secondary enrolment rates is statistically significant, which is
consistent with there having been a considerable number of out-of-school
youth in 2006.
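A short sketch of the matched-pair test, recomputing the differences from the two rate columns of Table 8. With 17 regions the test has 16 degrees of freedom, so the critical value $t_{0.025,16} = 2.1199$ is hard-coded from the t-table; names are illustrative.

```python
import math
import statistics

# Enrolment rates per region (Table 8), in the table's row order
elementary = [73.2, 72.7, 76.5, 74, 78.8, 79.1, 80.8, 81.2, 71.7,
              73.4, 76.4, 75.7, 74.5, 70.7, 72.2, 75.8, 84.8]
secondary = [55.7, 39.5, 52.5, 46.2, 49.9, 51.7, 47.4, 45.9, 45.2,
             39.7, 42.6, 40.9, 37.3, 38.3, 39.6, 40.6, 28.7]
d_0 = 0.0          # hypothesized mean difference under H0
t_crit = 2.1199    # t_{0.025, 16} (table value)

# Paired differences, one per region
diffs = [e - s for e, s in zip(elementary, secondary)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t = (d_bar - d_0) / (s_d / math.sqrt(n))

reject_h0 = abs(t) > t_crit
print(f"d_bar={d_bar:.6f}, S_D={s_d:.6f}, T={t:.5f}, reject H0: {reject_h0}")
```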
III. Testing Binomial Data

The expected (optimistic) customer demand on both peak days and non-peak days
was derived by students conducting research for The Zeitgeist, a proposed bar
in the Metro. Using an interval of 95%, they were able to obtain the
following data from firsthand observations of bar activities, as well as
interviews with various establishments in the Bonifacio Global City area,
such as Cantinetta, Draft and Barcino.
                                                        Peak Day    Non-Peak Day
Average Number of Visitors who Purchase Drinks/Food     261         108
Average Number of Visitors in Total                     300         120
Proportion who Purchase                                 0.87        0.90
Table 9. Expected (optimistic) forecast for customers of The Zeitgeist on
peak days and non-peak days. From Calo, R.P., Ortega, A.L., Sitosta, A.T.,
Suarez, M.C., Yu, C.D. (2014). The Zeitgeist. p.12-13.
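We want to test whether the proportions of visitors who purchase drinks or
food are equal on peak and non-peak days, or whether the proportion on a peak
day is greater than that on a non-peak day.
$H_0: p_x = p_y$ vs. $H_a: p_x > p_y$
Let's use $\alpha = 0.05$.
Test statistic:
$$Z = \frac{\frac{x}{n} - \frac{y}{m}}{\sqrt{\frac{p_e(1-p_e)}{n} + \frac{p_e(1-p_e)}{m}}} = \frac{\frac{261}{300} - \frac{108}{120}}{\sqrt{\frac{0.878571(1-0.878571)}{300} + \frac{0.878571(1-0.878571)}{120}}} = \frac{-0.03}{\sqrt{0.000355612 + 0.000889031}} = -0.850352,$$
where
$$p_e = \frac{x+y}{n+m} = \frac{261 + 108}{300 + 120} = 0.878571 \approx 0.879$$
Testing against $z_{\alpha} = z_{0.05} = 1.645$
Since $Z < z_{0.05}$, we do not reject $H_0$. There is no evidence that the
proportion of purchasing customers on a peak day is greater than that on a
non-peak day.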
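The two-proportion Z test can be sketched as below, using the counts in Table 9 and the pooled estimate $p_e$ without rounding. The one-sided critical value 1.645 is hard-coded from the normal table, and the variable names are illustrative.

```python
import math

# Counts from Table 9: purchasers and total visitors
x, n = 261, 300    # peak day
y, m = 108, 120    # non-peak day
z_crit = 1.645     # upper 5% point of the standard normal (table value)

p_x = x / n
p_y = y / m
p_e = (x + y) / (n + m)     # pooled proportion under H0: p_x = p_y

se = math.sqrt(p_e * (1 - p_e) / n + p_e * (1 - p_e) / m)
z = (p_x - p_y) / se

# One-sided test (Ha: p_x > p_y): reject H0 only for large positive Z
reject_h0 = z > z_crit
print(f"p_e={p_e:.6f}, Z={z:.6f}, reject H0: {reject_h0}")
```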
IV. Test of Variances

With regard to labor statistics, separation and accession rates are adequate
indicators of labor market activity. Separation refers to terminations of
employment, usually through layoffs or resignations, while accession refers
to additions to employment, whether through the replacement of workers or
the expansion of business firms. The Employment and Manpower Statistics
Division of the Department of Labor and Employment (DOLE) conducted a survey
on the separation and accession rates of labor turnover in the National
Capital Region.
Year - Quarter          Separation Rate    Accession Rate
2014 - Q1               10.06              9.47
2013 - Q1               7.49               7.51
2013 - Q2               8.77               6.28
2013 - Q3               8.37               5.99
2013 - Q4               8.86               5.64
2012 - Q1               8.10               7.47
2012 - Q2               8.93               8.08
2012 - Q3               8.43               6.08
2012 - Q4               8.14               5.67
Sample Variance, S^2    0.513194444        1.70985
Table 10. Labor Turnover Rates by Year and Quarter from 2012-2014.
From LabStat Updates, Vol. 18, No. 21,
https://www.psa.gov.ph/sites/default/files/vol18_21.pdf
We want to test for the homogeneity of variance between the accession and
separation rates using $\alpha = 0.05$, with $x$ denoting the separation
rates and $y$ the accession rates.
Hence,
$H_0: \sigma_x^2 = \sigma_y^2$ vs. $H_a: \sigma_x^2 \neq \sigma_y^2$
Using the test statistic:
$$f = \frac{S_y^2}{S_x^2} = \frac{1.70985}{0.513194444} = 3.331778078$$
$$F_{\alpha/2,\,n-1,\,m-1} = F_{0.025,8,8} = 0.226$$
$$F_{1-\alpha/2,\,n-1,\,m-1} = F_{0.975,8,8} = 4.43$$
Comparing the F-test values obtained from the distribution table with the
test statistic, we get $F_{\alpha/2,\,n-1,\,m-1} \leq f \leq F_{1-\alpha/2,\,n-1,\,m-1}$.
Since $f$ lies between the two critical values, we do not reject $H_0$. There
is no evidence against homogeneity of variance between the accession and
separation rates.
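A minimal sketch of the variance-ratio test, recomputing both sample variances from Table 10. The lower and upper critical values for (8, 8) degrees of freedom are hard-coded from the F-table, and the ratio is taken as accession over separation, matching the statistic in this section.

```python
import statistics

# Labor turnover rates by quarter (Table 10), in the table's row order
separation = [10.06, 7.49, 8.77, 8.37, 8.86, 8.10, 8.93, 8.43, 8.14]
accession = [9.47, 7.51, 6.28, 5.99, 5.64, 7.47, 8.08, 6.08, 5.67]
f_lower = 0.226    # lower 2.5% point of F(8, 8) (table value)
f_upper = 4.43     # upper 2.5% point of F(8, 8) (table value)

var_x = statistics.variance(separation)
var_y = statistics.variance(accession)
f = var_y / var_x

# Two-sided test: reject H0 only when f falls outside [f_lower, f_upper]
reject_h0 = f < f_lower or f > f_upper
print(f"S_x^2={var_x:.6f}, S_y^2={var_y:.5f}, f={f:.6f}, reject H0: {reject_h0}")
```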
V. Inferences about $\mu_x - \mu_y$ from Samples with $\sigma_x^2 \neq \sigma_y^2$

Three Materials Science Engineering graduates took samples of Volatile
Organic Compounds (VOCs) in different concentrations. VOCs are usually
pollutants in the form of toluene, xylene, and ethylbenzene. They gathered
samples of n-Heptane at sites in Metro Manila where these are known to be
present in high amounts. We test whether the difference in the amounts of
n-Heptane in the samples they drew from the roadside at the MRT train station
in EDSA, Pasay City, and from Luneta Park is statistically significant.
From roadside    2.5    2.5    2.8    5.9    2.5    2.8
From park        0.9    0.4    0.4    0.3    0.1    0.3
Table 11. VOC concentrations (in μg/m³) of n-Heptane from urban areas.
From Montenoso, J., Nery, A., & Tan, N. (2014).
First, we determine whether we may assume homogeneity of variance for the two
samples.
$H_0: \sigma_x^2 = \sigma_y^2$ vs. $H_a: \sigma_x^2 \neq \sigma_y^2$
$$f = \frac{S_y^2}{S_x^2} = \frac{(0.268)^2}{(1.347)^2} = 0.03968$$
$$F_{\alpha/2,\,m-1,\,n-1} = F_{0.025,5,5} = 0.140$$

Comparing this with the value in the F-table, we get $f < F_{0.025,5,5}$.
Therefore, we reject $H_0$. We cannot assume homogeneity of variance for the
two samples.
Thus, we proceed to the t-test for two samples with unknown and unequal
variances:
$H_0: \mu_x = \mu_y$ vs. $H_a: \mu_x \neq \mu_y$
Test statistic:
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}} = \frac{2.767}{\sqrt{\frac{(1.347)^2}{6} + \frac{(0.268)^2}{6}}} = 4.9350$$
Degrees of freedom:
$$df = \left\lfloor \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{m}\right)^2}{\frac{(s_x^2/n)^2}{n-1} + \frac{(s_y^2/m)^2}{m-1}} \right\rfloor = \left\lfloor \frac{0.09883}{0.01832} \right\rfloor = 5$$
Testing against $t_{\alpha/2,\,df} = t_{0.025,5} = 2.5706$

Comparing the test statistic with the value obtained from the t-table, we get
$T > t_{\alpha/2,\,df}$. Therefore, we reject $H_0$. There is a statistically
significant difference in the VOC concentrations observed at roadsides and at
parks.
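The unequal-variance t statistic and its approximate degrees of freedom can be sketched as follows from the Table 11 samples. Because the code works with unrounded variances, its statistic can differ in the third decimal place from a hand computation that uses rounded standard deviations. The critical value for 5 degrees of freedom is hard-coded from the t-table.

```python
import math
import statistics

# n-Heptane concentrations (Table 11)
roadside = [2.5, 2.5, 2.8, 5.9, 2.5, 2.8]
park = [0.9, 0.4, 0.4, 0.3, 0.1, 0.3]
t_crit = 2.5706    # t_{0.025, 5} (table value)

n, m = len(roadside), len(park)
vx_n = statistics.variance(roadside) / n    # s_x^2 / n
vy_m = statistics.variance(park) / m        # s_y^2 / m

t = (statistics.mean(roadside) - statistics.mean(park)) / math.sqrt(vx_n + vy_m)

# Approximate degrees of freedom for unequal variances, rounded down
df_raw = (vx_n + vy_m) ** 2 / (vx_n ** 2 / (n - 1) + vy_m ** 2 / (m - 1))
df = math.floor(df_raw)

reject_h0 = abs(t) > t_crit
print(f"T={t:.4f}, df={df} (from {df_raw:.4f}), reject H0: {reject_h0}")
```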
C. REGRESSION ANALYSIS
I. Linear Regression
Population in largest city refers to the urban population of a country's
largest metropolitan area, which, in the case of the Philippines, is Metro
Manila. Detailed below are the number of years since 1990 matched with the
population in the largest city. We fit a regression line of the form
y = a + bx to the data in order to estimate populations in the next few
years.
Year    Years away from 1990 (base year)    Population in Largest City
1990    0                                   7972799
1991    1                                   8239973
1992    2                                   8516488
1993    3                                   8801482
1994    4                                   9096427
1995    5                                   9401255
1996    6                                   9537615
1997    7                                   9638964
1998    8                                   9741532
1999    9                                   9845191
2000    10                                  9961912
2001    11                                  10139607
2002    12                                  10320724
2003    13                                  10505075
2004    14                                  10692980
2005    15                                  10883716
2006    16                                  11078124
2007    17                                  11276004
2008    18                                  11477699
2009    19                                  11682432
2010    20                                  11891107
2011    21                                  12103509
2012    22                                  12319706
2013    23                                  12539764
Table 12. The Philippines' population in largest city for each year from
1990 until 2013. Data from the National Statistics Office (NSO) Website.
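Obtaining the data's sufficient statistics, with $n = 24$:

$$\sum_{i=1}^{n} x_i = 276, \quad \sum_{i=1}^{n} y_i = 247664085, \quad \sum_{i=1}^{n} x_i y_i = 3059784486,$$
$$\sum_{i=1}^{n} x_i^2 = 4324, \quad \sum_{i=1}^{n} y_i^2 = 2.59501331327831 \times 10^{15}$$
From the form y = a + bx, we compute the slope and the y-intercept:
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{(24)(3059784486) - (276)(247664085)}{(24)(4324) - (276)^2} = 184041.3117$$
$$a = \bar{y} - b\bar{x} = 10319336.875 - (184041.3117)(11.5) = 8202861.790$$

Thus, the fitted regression line for the given data on the Philippines'
population in largest city would be: y = 8202861.790 + 184041.3117x.
If we want to compute r, we use Pearson's formula:
$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)}}$$
$$= \frac{(24)(3059784486) - (276)(247664085)}{\sqrt{\left((24)(4324) - (276)^2\right)\left((24)(2.59501331327831 \times 10^{15}) - (247664085)^2\right)}} \approx 0.9958$$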
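The sufficient statistics and the least-squares fit can be recomputed directly from Table 12 with a sketch like the one below. It uses exact integer sums for the cross-products, and the helper `predict` is an illustrative name added here for estimating later years, not part of the original analysis.

```python
import math

# Population in largest city, 1990-2013 (Table 12)
population = [
    7972799, 8239973, 8516488, 8801482, 9096427, 9401255,
    9537615, 9638964, 9741532, 9845191, 9961912, 10139607,
    10320724, 10505075, 10692980, 10883716, 11078124, 11276004,
    11477699, 11682432, 11891107, 12103509, 12319706, 12539764,
]
years = list(range(len(population)))    # x = years away from 1990

n = len(population)
sum_x = sum(years)
sum_y = sum(population)
sum_xy = sum(x * y for x, y in zip(years, population))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in population)

# Numerator and denominator terms shared by the slope and Pearson's r
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

b = sxy / sxx                           # slope
a = sum_y / n - b * (sum_x / n)         # y-intercept
r = sxy / math.sqrt(sxx * syy)          # Pearson correlation


def predict(x):
    """Fitted population x years after 1990 (hypothetical helper)."""
    return a + b * x


print(f"sum_y={sum_y}, b={b:.4f}, a={a:.3f}, r={r:.4f}")
print(f"fitted value for 2015 (x=25): {predict(25):.0f}")
```

Extrapolating with `predict` assumes the linear trend continues beyond 2013, which the data alone cannot confirm.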
References
Acosta, J.I., Balolong, R.T., Chiong, M.M., et al. (2014). Buying Behavior and
Preferences of College Students towards Coffee Shops.
Chan, Z.E., Decangchon, F.M., Duque, M.W., et al. (2014). A Statistical Study of
the Ice Cream Market in Selected High Schools and Colleges in Metro
Manila.
Calo, R.P., Ortega, A.L., Sitosta, A.T., Suarez, M.C., Yu, C.D. (2014). The Zeitgeist.
p.12-13.
Cruz, K., Discar, G., Padaen, R., Que, L. & Yu, M. (2014). Consumer Behavior in
the Philippine Shampoo Industry.
Monteroso, J., Andrea, N., & Tan, N. (2014). Volatile Organic Compounds in
Urban and Industrial Areas in the Philippines.
Batman Annual Sales Figures. Comichron.net. Retrieved from
http://www.comichron.com/titlespotlights/batman.html
Child Health Indicators. National Statistical Coordination Board. Retrieved
from http://www.nscb.gov.ph/stattables/.
Enrolment Rates. National Statistical Coordination Board. Retrieved from
http://www.nscb.gov.ph/stattables/.
Labor Turnover Statistics. LabStat Updates, Vol. 18, No. 21. Retrieved from
https://www.psa.gov.ph/sites/default/files/vol18_21.pdf
World Development Indicators. National Statistics Office. Retrieved from
http://web0.psa.gov.ph/statistics/quickstat