
Ateneo de Manila University

STAT 109:
Introduction to Statistical Theory

FINAL PROJECT

Reina Krizel J. Adriano, Celina Ysabel H. Gacias


III - BS/M Applied Mathematics
major in Mathematical Finance
Submitted to
Sir Anthony Zosa
21 October 2014

TABLE OF CONTENTS

A. HYPOTHESIS TESTING: ONE-SAMPLE CASES
Normal Distribution: Variance Known
Test of Proportion
Median Test
Inferences Concerning σ²
Inferences Concerning μ when σ² is Unknown

B. HYPOTHESIS TESTING: TWO-SAMPLE CASES
Inferences about μx − μy from Samples with Equal σ²
Testing in Matched-Pair Experiments
Testing Binomial Data
Test of Variances
Inferences about μx − μy from Samples with σ²x ≠ σ²y

C. REGRESSION ANALYSIS
Linear Regression

A. HYPOTHESIS TESTING: ONE-SAMPLE CASES
Normal Distribution: Variance Known
Test of Proportion
Median Test
Inferences Concerning σ²
Inferences Concerning μ when σ² is Unknown

I. Gaussian Distribution: Variance Known


Toluene can be very detrimental to a person's health, as it is one of the
Volatile Organic Compounds (VOCs). Three Chemistry-Materials Science
Engineering students took six samples of toluene in an urban area. The
samples were drawn at a two-storey house 250 meters from the roadside in
Pasay City, Manila. Using their data, we test whether the toluene levels in
the area have become hazardous, given that toluene levels are said to be
dangerous at 5.6 μg/m³.
Sample    Toluene Concentration (μg/m³)
1         9.9
2         15.7
3         6.9
4         5.3
5         11.6
6         12.9

Table 1. VOC concentrations of toluene (in μg/m³) from urban areas. From
Montenoso, J., Nery, A., & Tan, N. (2014).

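$H_0: \mu = \mu_0 = 5.6$ vs. $H_a: \mu < \mu_0 = 5.6$

Let us use $\alpha = 0.05$.
Using the test statistic, with the sample standard deviation taken as $\sigma$, we get

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{10.38333333 - 5.6}{3.854564394/\sqrt{6}} = 3.039701$$

Testing against $-z_{0.05} = -1.645$:

Since $Z > -z_{0.05}$, we do not reject $H_0$. There is no evidence that the mean
toluene level is below the dangerous threshold of 5.6 μg/m³; the sample mean of
10.38 μg/m³ lies well above it, so the area appears to have rather dangerous
levels of toluene.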
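
The computation above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `z_statistic` is ours, the critical value is hard-coded from the standard normal table, and we assume (as the figures above imply) that the sample standard deviation stands in for the known σ.

```python
import math
import statistics

# Toluene concentrations from Table 1 (six samples).
toluene = [9.9, 15.7, 6.9, 5.3, 11.6, 12.9]
mu_0 = 5.6       # hypothesized mean (the hazard threshold)
z_crit = 1.645   # z_0.05, hard-coded from the standard normal table


def z_statistic(sample, mu0, sigma):
    """Return Z = (sample mean - mu0) / (sigma / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


# Assumption: the sample standard deviation is used in place of sigma.
sigma = statistics.stdev(toluene)
z = z_statistic(toluene, mu_0, sigma)

# Lower-tailed test (H_a: mu < mu_0): reject H_0 only when Z < -z_crit.
reject_h0 = z < -z_crit
print(f"Z = {z:.6f}, reject H0: {reject_h0}")
```

Because the alternative is one-sided to the left, a large positive Z can never lead to rejection here.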

II. Test of Proportion


In a study of coffee shop preference conducted by Atenean QMT109 students,
457 students from four universities (ADMU, DLSU, UP, and UST) were surveyed.
Of these, 316 respondents said that Starbucks is their favorite coffee shop.
Among the 185 Ateneans surveyed, 125 said they would buy coffee from
Starbucks as well. Given the following data, we test whether the proportion
of the total population that likes Starbucks differs from 50%, using
α = 0.05.
                                  ADMU only    ADMU, DLSU, UP, UST
Students who chose Starbucks      125          316
Total number of respondents       185          457
Proportion choosing Starbucks     67.5676%     69.1466%

Table 2. Number of respondents from different schools in favor of Starbucks. From
Buying Behavior and Preferences of College Students towards Coffee Shops (2014).

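$H_0: p = 0.50$ vs. $H_a: p \neq 0.50$

First, we check that $np_0 \geq 10$ and $n(1 - p_0) \geq 10$:
$457(0.50) = 228.5 \geq 10$
$457(1 - 0.50) = 228.5 \geq 10$
Using the test statistic with the sample proportion for all four universities, $\hat{p} = 316/457 = 0.691466$,

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.691466 - 0.50}{\sqrt{\dfrac{0.50(1 - 0.50)}{457}}} = 8.1862$$

Testing against $z_{0.025} = 1.96$:

$|Z| > z_{0.025}$, which means $Z$ is in the rejection region. Therefore, we reject $H_0$,
and we conclude that the proportion of the total population that likes Starbucks
is not 50%; the sample proportion of 69.15% suggests that it is higher.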
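
As a rough check, the one-sample proportion test can be sketched as follows. The sketch assumes that the sample proportion is the share for all four universities (316 of 457) and that the standard error is evaluated at the hypothesized p0 = 0.50; the function name `proportion_z` is illustrative only.

```python
import math

successes = 316   # respondents who chose Starbucks (all four universities)
n = 457           # total respondents
p_0 = 0.50        # hypothesized proportion
z_crit = 1.96     # z_0.025, hard-coded from the standard normal table


def proportion_z(successes, n, p0):
    """Return Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Large-sample condition: n*p0 and n*(1 - p0) should both be at least 10.
large_sample_ok = n * p_0 >= 10 and n * (1 - p_0) >= 10

z = proportion_z(successes, n, p_0)
reject_h0 = abs(z) > z_crit   # two-sided test
print(f"Z = {z:.4f}, reject H0: {reject_h0}")
```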

III. Median Test


The years 1970-1985 marked the Silver Age of Comics, and it was at this
time that comic book company DC cancelled all of its superhero titles save for
those that featured Superman and Batman and began to experiment with
minor characters in other genres, such as horror, time travel, and sci-fi.
Toward the end of the Silver Age and the coming of the Bronze Age, however,
DC began to revert to publishing dominantly superhero-centric titles. Here, we
are to test whether or not the median of the total sales per year of Batman
comics is 168,164 as in 1977, total sales per year at the middle of the Silver Age.

Year    Total Paid Circulation
1970    293,897
1971    244,488
1972    185,283
1973    200,574
1974    193,223
1975    154,000
1976    178,000
1977    168,164
1978    125,421
1979    166,640
1980    129,299
1981    110,997
1982    108,234
1983    97,741
1984    89,217
1985    75,303

Table 3. Batman comic books sales per year from 1970 to 1985.

$H_0: \text{Median} = 168{,}164$ or $p = 0.50$ vs. $H_a: \text{Median} \neq 168{,}164$ or $p \neq 0.50$

(where $p$ is the proportion of yearly sales figures above the hypothesized median)

Let us use $\alpha = 0.05$.

In this sample of data from 16 years, there were 6 years where the total
paid circulation was higher than 168,164, 9 years where it was lower than
168,164, and 1 year where it was exactly 168,164.
For the $n = 16 - 1 = 15$ years with sales differing from 168,164:

$$\mu = 0.50n = (0.50)(15) = 7.5$$

$$\sigma = \sqrt{0.25n} = \sqrt{(0.25)(15)} = 1.9365$$

With $x = 6$ as the number of plus signs, the test statistic becomes:

$$Z = \frac{x - \mu}{\sigma} = \frac{6 - 7.5}{1.9365} = -0.7746$$

$$z_{\alpha/2} = z_{0.025} = 1.96$$

Since $-z_{\alpha/2} < Z < z_{\alpha/2}$, we do not reject $H_0$. There is no evidence to show that
the median of the yearly Batman comic book sales is not 168,164.
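
The sign test above can be reproduced with a short sketch. It assumes the normal approximation to the binomial count of plus signs, with mean 0.5n and standard deviation sqrt(0.25n), and drops the year that ties with the hypothesized median; the variable names are ours.

```python
import math

# Total paid circulation per year, 1970-1985 (Table 3).
sales = [293897, 244488, 185283, 200574, 193223, 154000, 178000, 168164,
         125421, 166640, 129299, 110997, 108234, 97741, 89217, 75303]
median_0 = 168164   # hypothesized median
z_crit = 1.96       # z_0.025, hard-coded from the standard normal table

plus = sum(1 for s in sales if s > median_0)    # years above the median
minus = sum(1 for s in sales if s < median_0)   # years below the median
n = plus + minus                                # ties are discarded

mu = 0.50 * n
sigma = math.sqrt(0.25 * n)
z = (plus - mu) / sigma

reject_h0 = abs(z) > z_crit   # two-sided test
print(f"plus = {plus}, minus = {minus}, Z = {z:.4f}, reject H0: {reject_h0}")
```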

IV. Inferences Concerning σ²

The National Statistical Coordination Board (NSCB) has been surveying the
natural resources found in the Philippines for a certain period of time. The
percentage of arable land is a potentially good indicator of national
development, given that the agricultural sector makes a significant
contribution to national income. Here, we test the variability of this
percentage over the years from 2000 to 2012, using α = 0.05.
Year    Arable Land (% of Land Area)
2000    16.88299
2001    16.7153
2002    16.55096
2003    16.43358
2004    16.76896
2005    16.76896
2006    17.10434
2007    17.43972
2008    17.77509
2009    18.11047
2010    18.11047
2011    18.11047
2012    18.59677

Table 5. Arable land area as a percentage of total land area in the
Philippines. Data from National Statistical Coordination Board.

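$H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 \neq \sigma_0^2$, with $\sigma_0^2 = 0.485413961$

Using the test statistic, with $n = 13$ yearly observations:

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(13-1)(0.525865124)}{0.485413961} = 13.0000$$

From the chi-square table, the critical values with 12 degrees of freedom are
$\chi^2_{0.975,\,12} = 4.404$ and $\chi^2_{0.025,\,12} = 23.337$ (subscripts denote upper-tail areas).

Since $\chi^2_{0.975,\,12} < \chi^2 < \chi^2_{0.025,\,12}$, we do not reject $H_0$. There is no statistically
significant evidence that the variance of the percentage of land area that is
arable land differs from $\sigma_0^2$ over the years 2000-2012.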
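
A minimal sketch of the chi-square test for a variance follows. It assumes n = 13 yearly observations (so n − 1 = 12 degrees of freedom), takes the hypothesized variance 0.485413961 from the computation above, and hard-codes the two critical values from a chi-square table; none of the names below come from a statistics package.

```python
import statistics

# Arable land (% of land area), 2000-2012 (Table 5).
arable = [16.88299, 16.7153, 16.55096, 16.43358, 16.76896, 16.76896,
          17.10434, 17.43972, 17.77509, 18.11047, 18.11047, 18.11047,
          18.59677]
sigma0_sq = 0.485413961   # hypothesized variance used in the text

# Critical values for 12 degrees of freedom, hard-coded from a table.
chi_lower = 4.404    # upper-tail area 0.975
chi_upper = 23.337   # upper-tail area 0.025

n = len(arable)
s_sq = statistics.variance(arable)        # sample variance (divides by n - 1)
chi_sq = (n - 1) * s_sq / sigma0_sq

# Two-sided test: reject when the statistic falls outside the interval.
reject_h0 = chi_sq < chi_lower or chi_sq > chi_upper
print(f"n = {n}, S^2 = {s_sq:.6f}, chi-square = {chi_sq:.4f}, "
      f"reject H0: {reject_h0}")
```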

V. Inferences Concerning μ when σ² is Unknown


The National Statistics Office (NSO) was able to gather data on the crop
production index from 1985 to 2012. The crop production index (CPI) is
another good measure of economic activity. We want to test the quality of the
indices by seeing if they match up to 100, which is said to be the optimal CPI.
63.86     68.28     66.39     65.79     67.93     75.73     74.74
76.75     77.25     79.19     79.14     84.28     83.81     75.51
81.63     84.63     87.87     90.61     93.96     97.95     100.45
101.6     108.75    113.03    112.33    110.88    113.35    117.86

Table 6. Crop Production Index from 1985 to 2012. From World Development
Indicators, National Statistics Office.

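$H_0: \mu = 100$ vs. $H_a: \mu \neq 100$

Let us use $\alpha = 0.05$.
Using the test statistic,

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{87.62679 - 100}{16.64756/\sqrt{28}} = -3.93288$$

Comparing this with the value in the t-table, $t_{0.025,27} = 2.0518$:

Since $t < -t_{0.025,27}$, we reject $H_0$. The mean CPI from 1985 to 2012 differs
significantly from 100; with a sample mean of 87.63, there is no evidence to
suggest that the Philippines' CPI has been consistently satisfactory.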
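
The one-sample t statistic can be checked with the sketch below. It uses only the 28 index values from Table 6 and a critical value hard-coded from the t-table; the variable names are illustrative.

```python
import math
import statistics

# Crop Production Index, 1985-2012 (Table 6), in reading order.
cpi = [63.86, 68.28, 66.39, 65.79, 67.93, 75.73, 74.74,
       76.75, 77.25, 79.19, 79.14, 84.28, 83.81, 75.51,
       81.63, 84.63, 87.87, 90.61, 93.96, 97.95, 100.45,
       101.6, 108.75, 113.03, 112.33, 110.88, 113.35, 117.86]
mu_0 = 100        # hypothesized mean
t_crit = 2.0518   # t_0.025 with 27 degrees of freedom, from the t-table

n = len(cpi)
x_bar = statistics.mean(cpi)
s = statistics.stdev(cpi)                 # sample standard deviation
t = (x_bar - mu_0) / (s / math.sqrt(n))

reject_h0 = abs(t) > t_crit   # two-sided test
print(f"mean = {x_bar:.5f}, S = {s:.5f}, t = {t:.5f}, reject H0: {reject_h0}")
```

For a two-sided alternative the comparison is on |t|, so a strongly negative statistic leads to rejection just as a strongly positive one would.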

B. HYPOTHESIS TESTING: TWO-SAMPLE CASES
Inferences about μx − μy from Samples with Equal σ²
Testing in Matched-Pair Experiments
Testing Binomial Data
Test of Variances
Inferences about μx − μy from Samples with σ²x ≠ σ²y

I. Inferences about μx − μy from Samples with Equal σ²


Together with the Philippine Health Statistics (PHS) and the Department of
Health (DOH), the NSCB conducted a survey for the Child Health Index per
region in 2003 and 2006. The infant mortality rate is the number of infant
deaths occurring before 12 months of life in a given period per 1,000 live
births.
Region      2003 Infant Mortality Rate    2006 Infant Mortality Rate
NCR         19.7                          18.7
CAR         8.7                           9.2
Reg I       16.2                          15.7
Reg II      9.9                           8.3
Reg III     10.9                          10.6
Reg IVA     15.2                          14.5
Reg IVB     14.3                          13.9
Reg V       12.9                          11.9
Reg VI      15.2                          13.4
Reg VII     12                            12
Reg VIII    15.3                          13
Reg IX      10                            11.5
Reg X       10.4                          9.8
Reg XI      8.3                           7.4
Reg XII     9.2                           7.8
CARAGA      9.3                           7.4
ARMM        4.7                           4.4

Table 7. Infant Mortality Rate per region for Years 2003 and 2006. From
NSCB website.

$H_0: \mu_x = \mu_y$ vs. $H_a: \mu_x \neq \mu_y$
Let us use $\alpha = 0.05$.
Using the test statistic:

$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} = \frac{11.89411765 - 11.14705882}{3.643194703\sqrt{2/17}} = 0.597835748,$$

where

$$S_p = \sqrt{\frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}} = \sqrt{\frac{(17-1)13.71183824 + (17-1)12.83389706}{17 + 17 - 2}} = 3.643194703$$

Testing against $t_{\alpha/2,\,n+m-2} = t_{0.025,32} = 2.0370$:

Since $-t_{0.025,32} < t < t_{0.025,32}$, we do not reject $H_0$. There is no statistically
significant difference between the mean infant mortality rates of 2003 and
2006. The data therefore show no evidence that the Philippines progressed in
the provision of better infant and newborn healthcare over this period.
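
The pooled-variance computation can be sketched as below. The sketch treats the two years as independent samples with a common variance, as in the test above; `pooled_t` is an illustrative helper, not a library function.

```python
import math
import statistics

# Infant mortality rates per region (Table 7), NCR through ARMM.
rate_2003 = [19.7, 8.7, 16.2, 9.9, 10.9, 15.2, 14.3, 12.9, 15.2,
             12, 15.3, 10, 10.4, 8.3, 9.2, 9.3, 4.7]
rate_2006 = [18.7, 9.2, 15.7, 8.3, 10.6, 14.5, 13.9, 11.9, 13.4,
             12, 13, 11.5, 9.8, 7.4, 7.8, 7.4, 4.4]
t_crit = 2.0370   # t_0.025 with 32 degrees of freedom, from the t-table


def pooled_t(x, y):
    """Return (t, S_p) for the equal-variance two-sample t-test."""
    n, m = len(x), len(y)
    pooled_var = ((n - 1) * statistics.variance(x)
                  + (m - 1) * statistics.variance(y)) / (n + m - 2)
    s_p = math.sqrt(pooled_var)
    t = (statistics.mean(x) - statistics.mean(y)) / (s_p * math.sqrt(1 / n + 1 / m))
    return t, s_p


t_stat, s_p = pooled_t(rate_2003, rate_2006)
reject_h0 = abs(t_stat) > t_crit   # two-sided test
print(f"S_p = {s_p:.6f}, t = {t_stat:.6f}, reject H0: {reject_h0}")
```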

II. Testing in Matched-Pair Experiments


The National Statistical Coordination Board conducted a survey of the
enrolment rates in elementary and secondary education in 2006. Elementary
rates were based on the school-age population adopted by the Department of
Education (DepEd), covering those aged 6-11 years, while the secondary rates
covered those aged 12-15 years. We test whether there is a significant
difference between the two rates.
Region      Elementary    Secondary    Difference
NCR         73.2          55.7         17.5
CAR         72.7          39.5         33.2
Reg I       76.5          52.5         24
Reg II      74            46.2         27.8
Reg III     78.8          49.9         28.9
Reg IVA     79.1          51.7         27.4
Reg IVB     80.8          47.4         33.4
Reg V       81.2          45.9         35.3
Reg VI      71.7          45.2         26.5
Reg VII     73.4          39.7         33.7
Reg VIII    76.4          42.6         33.8
Reg IX      75.7          40.9         34.8
Reg X       74.5          37.3         37.2
Reg XI      70.7          38.3         32.4
Reg XII     72.2          39.6         32.6
CARAGA      75.8          40.6         35.2
ARMM        84.8          28.7         56.1

Table 8. Enrolment rates in elementary and secondary education in the
Philippines in 2006 and the differences between the rates. Source: Basic
Education Information System, DepEd.

$H_0: \mu_D = 0$ vs. $H_a: \mu_D \neq 0$

Using the test statistic:

$$T = \frac{\bar{D} - d_0}{S_D/\sqrt{n}} = \frac{32.34117647 - 0}{7.896997754/\sqrt{17}} = 16.88567$$

Testing against $t_{\alpha/2,\,n-1} = t_{0.025,16} = 2.1199$:

Since $T > t_{0.025,16}$, we reject $H_0$. The difference between elementary and
secondary enrolment rates is statistically significant, which is consistent
with a considerable number of out-of-school youth of secondary age in 2006.
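
A short sketch of the matched-pair computation follows. It recomputes the differences from the two rate columns of Table 8 and assumes n − 1 = 16 degrees of freedom for the 17 regions; the critical value is hard-coded from the t-table.

```python
import math
import statistics

# Enrolment rates per region in 2006 (Table 8), NCR through ARMM.
elementary = [73.2, 72.7, 76.5, 74, 78.8, 79.1, 80.8, 81.2, 71.7,
              73.4, 76.4, 75.7, 74.5, 70.7, 72.2, 75.8, 84.8]
secondary = [55.7, 39.5, 52.5, 46.2, 49.9, 51.7, 47.4, 45.9, 45.2,
             39.7, 42.6, 40.9, 37.3, 38.3, 39.6, 40.6, 28.7]
d_0 = 0           # hypothesized mean difference
t_crit = 2.1199   # t_0.025 with 16 degrees of freedom, from the t-table

# Paired differences, one per region.
diffs = [e - s for e, s in zip(elementary, secondary)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

t_stat = (d_bar - d_0) / (s_d / math.sqrt(n))
reject_h0 = abs(t_stat) > t_crit   # two-sided test
print(f"mean diff = {d_bar:.5f}, S_D = {s_d:.5f}, T = {t_stat:.5f}, "
      f"reject H0: {reject_h0}")
```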

III. Testing Binomial Data


The expected (optimistic) customer demand on both peak days and non-peak
days was derived by students conducting research for The Zeitgeist, a
proposed bar in the Metro. Using a 95% confidence interval, they obtained the
following data from firsthand observations of bar activities, as well as
interviews with various establishments in the Bonifacio Global City area,
such as Cantinetta, Draft and Barcino.
                                       Peak Day    Non-Peak Day
Average Number of Visitors who
Purchase Drinks/Food                   261         108
Average Number of Visitors in Total    300         120
Proportion of Visitors who Purchase    0.87        0.90

Table 9. Expected (optimistic) forecast for customers of The Zeitgeist on
peak days and non-peak days. From Calo, R.P., Ortega, A.L., Sitosta, A.T.,
Suarez, M.C., Yu, C.D. (2014). The Zeitgeist. p.12-13.

We want to test whether the proportions of visitors who purchase drinks or
food are equal on the two kinds of days, or whether the proportion on a peak
day is greater than that on a non-peak day.

$H_0: p_x = p_y$ vs. $H_a: p_x > p_y$
Let us use $\alpha = 0.05$.

Test statistic:

$$Z = \frac{\dfrac{x}{n} - \dfrac{y}{m}}{\sqrt{\dfrac{p_e(1-p_e)}{n} + \dfrac{p_e(1-p_e)}{m}}} = \frac{\dfrac{261}{300} - \dfrac{108}{120}}{\sqrt{\dfrac{0.879(1-0.879)}{300} + \dfrac{0.879(1-0.879)}{120}}} = -0.850352277,$$

where $p_e = \dfrac{x+y}{n+m} = \dfrac{261 + 108}{300 + 120} = 0.879$ (rounded; the value of $Z$ uses the unrounded $p_e$).

Testing against $z_{\alpha} = z_{0.05} = 1.645$:

Since $Z < z_{0.05}$, we do not reject $H_0$. There is no evidence that the proportion
of purchasing customers on a peak day is greater than that on a non-peak day.
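
The two-sample test for binomial data can be sketched as follows, assuming the pooled estimate p_e = (x + y)/(n + m) is used in both variance terms. The helper `two_proportion_z` is an illustrative name.

```python
import math

x, n = 261, 300   # peak day: purchasing visitors, total visitors
y, m = 108, 120   # non-peak day: purchasing visitors, total visitors
z_crit = 1.645    # z_0.05, hard-coded from the standard normal table


def two_proportion_z(x, n, y, m):
    """Return (Z, p_e) using the pooled proportion p_e = (x + y) / (n + m)."""
    p_e = (x + y) / (n + m)
    standard_error = math.sqrt(p_e * (1 - p_e) / n + p_e * (1 - p_e) / m)
    return (x / n - y / m) / standard_error, p_e


z, p_e = two_proportion_z(x, n, y, m)

# Upper-tailed test (H_a: p_x > p_y): reject H_0 only when Z > z_crit.
reject_h0 = z > z_crit
print(f"p_e = {p_e:.6f}, Z = {z:.6f}, reject H0: {reject_h0}")
```

With an upper-tailed alternative, a negative Z (peak-day proportion below the non-peak proportion) cannot fall in the rejection region.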

IV. Test of Variances


With regard to labor statistics, separation and accession rates are
adequate indicators of labor market activity. Separation refers to
terminations of employment, usually through layoffs or resignations, while
accession arises from the replacement of workers or the expansion of
business firms. The Employment and Manpower Statistics Division of the
Department of Labor and Employment (DOLE) conducted a survey of the
separation and accession rates of labor turnover in the National Capital
Region.
Year - Quarter          Separation Rate    Accession Rate
2014 - Q1               10.06              9.47
2013 - Q1               7.49               7.51
2013 - Q2               8.77               6.28
2013 - Q3               8.37               5.99
2013 - Q4               8.86               5.64
2012 - Q1               8.10               7.47
2012 - Q2               8.93               8.08
2012 - Q3               8.43               6.08
2012 - Q4               8.14               5.67
Sample Variance, S²     0.513194444        1.70985

Table 10. Labor Turnover Rates by Year and Quarter from 2012-2014.
From LabStat Updates, Vol. 18, No. 21,
https://www.psa.gov.ph/sites/default/files/vol18_21.pdf

We want to test for the homogeneity of variance between the separation
rates ($x$) and the accession rates ($y$) using $\alpha = 0.05$.
Hence,

$H_0: \sigma_x^2 = \sigma_y^2$ vs. $H_a: \sigma_x^2 \neq \sigma_y^2$

Using the test statistic:

$$f = \frac{S_y^2}{S_x^2} = \frac{1.70985}{0.513194444} = 3.331778078$$

$$F_{\alpha/2,\,n-1,\,m-1} = F_{0.025,8,8} = 0.226$$

$$F_{1-\alpha/2,\,n-1,\,m-1} = F_{0.975,8,8} = 4.43$$

(subscripts here denote lower-tail areas). Comparing the F values obtained from
the distribution table with the test statistic, we get
$F_{\alpha/2,\,n-1,\,m-1} \leq f \leq F_{1-\alpha/2,\,n-1,\,m-1}$. Since $f$ lies
inside the acceptance region, we do not reject $H_0$. We may assume homogeneity of
variance between the accession and separation rates.
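
The variance-ratio test can be sketched as below. The sample variances are recomputed from the rates in Table 10, and the two critical values are hard-coded from the F table for (8, 8) degrees of freedom; the decision rule assumes a two-sided test that rejects only when f falls outside the interval between them.

```python
import statistics

# Labor turnover rates by quarter (Table 10), in table order.
separation = [10.06, 7.49, 8.77, 8.37, 8.86, 8.10, 8.93, 8.43, 8.14]
accession = [9.47, 7.51, 6.28, 5.99, 5.64, 7.47, 8.08, 6.08, 5.67]

# Critical values for (8, 8) degrees of freedom, hard-coded from the F table.
f_lower = 0.226   # lower-tail area 0.025
f_upper = 4.43    # lower-tail area 0.975

s_x_sq = statistics.variance(separation)   # sample variance of x
s_y_sq = statistics.variance(accession)    # sample variance of y
f = s_y_sq / s_x_sq

# Two-sided test: reject H_0 only when f lies outside [f_lower, f_upper].
reject_h0 = f < f_lower or f > f_upper
print(f"S_x^2 = {s_x_sq:.6f}, S_y^2 = {s_y_sq:.5f}, f = {f:.6f}, "
      f"reject H0: {reject_h0}")
```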

V. Inferences about μx − μy from Samples with σ²x ≠ σ²y

Three Materials Science Engineering graduates took samples of Volatile
Organic Compounds (VOCs) in different concentrations. VOCs are usually
pollutants such as toluene, xylene, and ethylbenzene. They gathered samples
of n-Heptane at sites in Metro Manila where high amounts are known to occur.
We test whether the difference in the amounts of n-Heptane in the samples
drawn from the roadside at the MRT train station on EDSA, Pasay City, and
from Luneta Park is statistically significant.
From roadside    2.5    2.5    2.8    5.9    2.5    2.8
From park        0.9    0.4    0.4    0.3    0.1    0.3

Table 11. VOC concentrations of n-Heptane (in μg/m³) from urban areas.
From Montenoso, J., Nery, A., & Tan, N. (2014).

First, we determine whether we may assume homogeneity of variance for the
two samples.

$H_0: \sigma_x^2 = \sigma_y^2$ vs. $H_a: \sigma_x^2 \neq \sigma_y^2$

Using the test statistic,

$$f = \frac{S_y^2}{S_x^2} = \frac{(0.268)^2}{(1.347)^2} = 0.03968$$

$$F_{\alpha/2,\,m-1,\,n-1} = F_{0.025,5,5} = 0.140$$

Comparing this with the value in the F table, we get $f < F_{0.025,5,5}$. Therefore,
we reject $H_0$. We cannot assume homogeneity of variance for the two
samples.
Thus, we proceed to the t-test for two samples with unknown and unequal
variances:

$H_0: \mu_x = \mu_y$ vs. $H_a: \mu_x \neq \mu_y$

Test statistic:

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}} = \frac{2.767}{\sqrt{\dfrac{(1.347)^2}{6} + \dfrac{(0.268)^2}{6}}} = 4.9350$$

Degrees of freedom:

$$df = \left\lfloor \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{(s_x^2/n)^2}{n-1} + \dfrac{(s_y^2/m)^2}{m-1}} \right\rfloor = \left\lfloor \frac{0.09883}{0.01832} \right\rfloor = 5$$

Testing against $t_{\alpha/2,\,df} = t_{0.025,5} = 2.5706$:

Comparing the test statistic with the value obtained from the t-table, we get
$T > t_{\alpha/2,\,df}$. Therefore, we reject $H_0$. There is a statistically
significant difference in the n-Heptane concentrations observed at the
roadside and at the park.
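
The unequal-variance test can be sketched as follows. The sketch works from the unrounded sample variances, so its T differs slightly in the third decimal from the value obtained with the rounded standard deviations 1.347 and 0.268. The degrees of freedom are rounded down, as in the text, and `welch_t` is an illustrative helper name.

```python
import math
import statistics

# n-Heptane concentrations (Table 11).
roadside = [2.5, 2.5, 2.8, 5.9, 2.5, 2.8]
park = [0.9, 0.4, 0.4, 0.3, 0.1, 0.3]
t_crit = 2.5706   # t_0.025 with 5 degrees of freedom, from the t-table


def welch_t(x, y):
    """Return (T, df) for the two-sample t-test with unequal variances."""
    n, m = len(x), len(y)
    vx = statistics.variance(x) / n   # S_x^2 / n
    vy = statistics.variance(y) / m   # S_y^2 / m
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, math.floor(df)          # df rounded down


t_stat, df = welch_t(roadside, park)
reject_h0 = abs(t_stat) > t_crit   # two-sided test
print(f"T = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```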

C. REGRESSION ANALYSIS

I. Linear Regression
Population in largest city refers to the urban population of a country's
largest metropolitan area, which, in the case of the Philippines, is Metro
Manila. Detailed below is the number of years since 1990 matched with the
population in the largest city. We fit a regression line of the form
y = a + bx to the data in order to estimate the population in the next few
years.
Year    Years away from 1990 (base year)    Population in Largest City
1990    0                                   7972799
1991    1                                   8239973
1992    2                                   8516488
1993    3                                   8801482
1994    4                                   9096427
1995    5                                   9401255
1996    6                                   9537615
1997    7                                   9638964
1998    8                                   9741532
1999    9                                   9845191
2000    10                                  9961912
2001    11                                  10139607
2002    12                                  10320724
2003    13                                  10505075
2004    14                                  10692980
2005    15                                  10883716
2006    16                                  11078124
2007    17                                  11276004
2008    18                                  11477699
2009    19                                  11682432
2010    20                                  11891107
2011    21                                  12103509
2012    22                                  12319706
2013    23                                  12539764

Table 12. The Philippines' population in largest city for each year from
1990 until 2013. Data from the National Statistics Office (NSO) Website.

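Obtaining the data's sufficient statistics (with $n = 24$):

$$\sum_{i=1}^{n} x_i = 276 \qquad \sum_{i=1}^{n} y_i = 247664085 \qquad \sum_{i=1}^{n} x_i y_i = 3059784486$$

$$\sum_{i=1}^{n} x_i^2 = 4324 \qquad \sum_{i=1}^{n} y_i^2 = 2.595013313278310 \times 10^{15}$$

From the form $y = a + bx$, we compute the slope and the y-intercept:

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{(24)(3059784486) - (276)(247664085)}{(24)(4324) - (276)^2} = 184041.3117$$

$$a = \bar{y} - b\bar{x} = 10319336.875 - (184041.3117)(11.5) = 8202861.790$$

Thus, the fitted regression line for the given data on the Philippines'
population in largest city would be: $y = 8202861.790 + 184041.3117x$.
If we want to compute $r$, we use Pearson's formula:

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)}}$$

$$= \frac{(24)(3059784486) - (276)(247664085)}{\sqrt{\left((24)(4324) - (276)^2\right)\left((24)(2.595013313278310 \times 10^{15}) - (247664085)^2\right)}} = 0.9958$$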
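
The least-squares fit can be sketched as below. The sketch recomputes every sum directly from Table 12, taking x as the number of years since 1990, and applies the same closed-form expressions for b, a and r; `fit_line` is an illustrative helper rather than a library routine.

```python
import math

# Population in largest city, 1990-2013 (Table 12).
population = [7972799, 8239973, 8516488, 8801482, 9096427, 9401255,
              9537615, 9638964, 9741532, 9845191, 9961912, 10139607,
              10320724, 10505075, 10692980, 10883716, 11078124, 11276004,
              11477699, 11682432, 11891107, 12103509, 12319706, 12539764]
years = list(range(len(population)))   # x = years away from 1990


def fit_line(x, y):
    """Return (a, b, r) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    b = numerator / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    r = numerator / math.sqrt((n * sum_xx - sum_x ** 2)
                              * (n * sum_yy - sum_y ** 2))
    return a, b, r


intercept, slope, r = fit_line(years, population)
print(f"y = {intercept:.3f} + {slope:.4f}x, r = {r:.4f}")

# Example use: estimate for 2014, i.e. x = 24 (an extrapolation).
estimate_2014 = intercept + slope * 24
```

Any estimate beyond 2013 is an extrapolation and rests on the assumption that the linear trend continues.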

References
Acosta, J.I., Balolong, R.T., Chiong, M.M., et al. (2014). Buying Behavior and
Preferences of College Students towards Coffee Shops.
Chan, Z.E., Decangchon, F.M., Duque, M.W., et al. (2014). A Statistical Study of
the Ice Cream Market in Selected High Schools and Colleges in Metro
Manila.
Calo, R.P., Ortega, A.L., Sitosta, A.T., Suarez, M.C., Yu, C.D. (2014). The Zeitgeist.
p.12-13.
Cruz, K., Discar, G., Padaen, R., Que, L. & Yu, M. (2014). Consumer Behavior in
the Philippine Shampoo Industry.
Monteroso, J., Andrea, N., & Tan, N. (2014). Volatile Organic Compounds in
Urban and Industrial Areas in the Philippines.
Batman Annual Sales Figures. Comichron.net. Retrieved from
http://www.comichron.com/titlespotlights/batman.html
Child Health Indicators. National Statistical Coordination Board. Retrieved
from http://www.nscb.gov.ph/stattables/.
Enrolment Rates. National Statistical Coordination Board. Retrieved from
http://www.nscb.gov.ph/stattables/.
Labor Turnover Statistics. LabStat Updates, Vol. 18, No. 21. Retrieved from
https://www.psa.gov.ph/sites/default/files/vol18_21.pdf
World Development Indicators. National Statistics Office. Retrieved from
http://web0.psa.gov.ph/statistics/quickstat
