
Republic of the Philippines
CAVITE STATE UNIVERSITY
Cavite City Campus
Brgy. 8, Pulo II, Dalahican, Cavite City

CvSU Vision
The premier university in historic Cavite recognized for excellence in the development of
globally competitive and morally upright individuals.

CvSU Mission
Cavite State University shall provide excellent, equitable and relevant educational
opportunities in the arts, science and technology through quality instruction and relevant
research and development activities. It shall produce professional, skilled and morally
upright individuals for global competitiveness.

CHAPTER 4
DATA MANAGEMENT
Objectives:
After the completion of the chapter, students should be able to:
• Use a variety of statistical tools to process and manage numerical data;
• Use methods of linear regression and correlation to predict the value of a variable given certain
conditions; and
• Advocate the use of statistical data in making important decisions.

EVALUATION REQUIREMENTS:
• Problem Sets and Exercises
• Quiz
• Quantitative Research Proposal (FINAL PROJECT)
SAMPLE: You want the university to offer an online enrolment system to improve the enrolment
process. CSG asks your team to present hard data that will convince the administration. Prepare a
proposal on how you will do this task.

Statistical tools derived from mathematics are useful in processing and managing numerical data
in order to describe a phenomenon and predict values.

4.1 BASIC CONCEPTS AND TERMS


DEFINITION OF STATISTICS
It is a branch of science which deals with the collection, presentation, analysis and interpretation of data.

NATURE OF STATISTICS
General Uses of Statistics
a. Statistics aids in decision making
• provides comparison
• explains action that has taken place
• justifies a claim or assertion
• predicts future outcome
• estimates unknown quantities
b. Statistics summarizes data for public use

FIELDS OF STATISTICS
a. Statistical Methods or Applied Statistics – refers to procedures and techniques used in the
collection, presentation, analysis and interpretation of data.
• Descriptive Statistics
- methods concerned with the collection, description and analysis of a set of data
without drawing conclusions or inferences about a larger set.
- the main concern is simply to describe the set of data.
• Inferential Statistics
- methods concerned with making predictions or inferences about a larger set of data
using only the information gathered from a subset of this larger set.
- the main concern is not merely to describe but to predict and make inferences based
on the information gathered.

b. Statistical Theory or Mathematical Statistics – deals with the development and exposition of
theories that serve as bases of statistical methods.

POPULATION AND SAMPLE


• A population is a collection of all the elements under consideration in a statistical study.
• A sample is a part or subset of the population from which the information is collected.
• A parameter is a numerical characteristic of a population.
• A statistic is a numerical characteristic of the sample.

Steps in Statistical Inquiry


1. Define the problem.
2. Formulate the research design.
3. Collect data.
4. Code and analyze the collected data.
5. Interpret the results.

VARIABLE AND MEASUREMENT


• A variable is a characteristic or attribute of persons or objects which can assume different values or
labels for different persons or objects under consideration.
• Measurement is the process of determining the value or label of a particular variable for a particular
experimental unit.
• An experimental unit is the individual or object on which a variable is measured.

CLASSIFICATION OF VARIABLE
1. Discrete vs. Continuous
Discrete – a variable which can assume a finite or countable number of values; usually measured by
counting or enumeration.
Continuous – a variable which can assume infinitely many values corresponding to the points on a line interval.
2. Qualitative vs. Quantitative
Qualitative – a variable that yields a categorical response.
Example: Occupation, Marital Status
Quantitative – a variable that takes on numerical values representing an amount or quantity.
Example: Weight, Height, Age, Number of cars

LEVEL OF MEASUREMENT
1. Nominal Level – the nominal level or classificatory scale is the weakest level of measurement where
numbers or symbols are used simply for categorizing subjects into different groups.
Examples: Sex: M-Male F-Female
Marital Status: 1-Single 2-Married 3-Widowed 4-Separated
2. Ordinal Level – the ordinal level of measurement contains the properties of the nominal level, and in
addition, the numbers assigned to categories of any variables may be ranked or ordered in some
low-to-high manner.
Examples: Teaching Ratings 1-poor 2-fair 3-good 4-excellent
Year Level 1-1st year 2-2nd year 3-3rd year 4-4th year
3. Interval Level – the interval level is that which the distances between any two numbers on the scale
are of known sizes.
Example: IQ level, Temperature
4. Ratio Level – the ratio level of measurement contains all the properties of the interval level, and in
addition, it has a “true zero” point.
Example: Number of correct answers in exam.

CLASSIFICATION OF DATA

GNED 03: Mathematics in the Modern World | A. B. Aguilar

1. Primary vs. Secondary


a. Primary Source – data measured by the researcher/agency that published it.
b. Secondary Source – any republication of data by another agency.
Example: The publications of the National Statistics Office (NSO) are primary sources, and
all subsequent publications of those data by other agencies are secondary sources.
2. External vs. Internal
a. Internal Data – information that relates to the operations and functions of the organization
collecting the data.
b. External Data – information that relates to some activity outside the organization collecting
the data.
Example: The sales data of SM is internal data for SM but external data for any other
organization such as Robinson’s.

EXERCISE 4.1 ______________


A. Identify each item as discrete or continuous.
_______________1.Student enrolment in Cavite State University – Cavite City Campus
_______________2.Weight of the students
_______________3.Student number
_______________4.Amount of time spent surfing the internet per week.
_______________5.Number of persons in a family
B. Determine whether the data are qualitative or quantitative.
_______________1. The colors of automobiles on a used car lot.
_______________2. The numbers on the shirts of a girl’s soccer team.
_______________3. The seats in a movie theater.
_______________4. A list of house numbers on your street.
_______________5. The ages of a sample of 350 employees of a large hospital.
C. Identify the data set’s level of measurement (nominal, ordinal, interval, ratio).
_______________1. Hair color of women on a high school tennis team.
_______________2. Number of milligrams of tar in 28 cigarettes.
_______________3. Temperatures of 22 selected refrigerators.
_______________4. The ratings of a movie ranging from “poor” to “good” to “excellent”.
_______________5. List of zip codes for Chicago.
D. Identify the population, variable of interest, and type of variable of the following:
1. From all students registered this semester, the Mathematics Department would like to know how
many students like mathematics.
Population: _________________________________________________________________________
Variable: ___________________________________________________________________________
Type of Variable: ____________________________________________________________________

2. A study to be conducted by an NGO would determine the Filipinos’ awareness about the war
against IRAQ.
Population: _________________________________________________________________________
Variable: ___________________________________________________________________________
Type of Variable: ____________________________________________________________________

4.2 DATA COLLECTION AND PRESENTATION


GENERAL CLASSIFICATION OF COLLECTING DATA
• Census or complete enumeration is the process of gathering information from every unit in the
population.
- not always possible to get timely, accurate and economical data
- costly, especially if the number of units in the population is too large
• Survey sampling is the process of obtaining information from the units in the selected sample.


SLOVIN’S FORMULA
$$n = \frac{N}{1 + Ne^{2}}$$
Where:
n = sample size
N = population size
e = margin of error (0.05 or 0.01)

Example:
1. Solve for the sample size of 350 patients from Cavite Medical Center.

2. Solve for the sample size of 4,565 students of CvSU – Rosario.
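
Slovin's formula can be checked with a short script. The sketch below is only illustrative: the function name `slovin_sample_size` is hypothetical, the examples do not state a margin of error so e = 0.05 is assumed, and rounding up to the next whole respondent is an assumed convention.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Assumed e = 0.05 for both examples.
print(slovin_sample_size(350))    # 350 / 1.875 = 186.67, rounded up to 187
print(slovin_sample_size(4565))   # 4565 / 12.4125 = 367.77, rounded up to 368
```

Passing `margin_of_error=0.01` instead gives the stricter sample size for the same population.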

EXERCISE 4.2.1: _______________


Solve for the sample size of the following using Slovin’s formula:
1. 6,666

2. 12,345

3. 1000

4. 1203


PROBABILITY AND NON-PROBABILITY SAMPLING


• A sampling procedure that gives every element of the population a nonzero chance of being
selected in the sample is called probability sampling. Otherwise, the sampling procedure is called
non-probability sampling.
• The target population is the population from which information is desired.
• The sampled population is the collection of elements from which the sample is actually taken.
• The population frame is a listing of all individual units in the population.

METHODS OF NON-PROBABILITY SAMPLING


1. Purposive sampling – sets out to make a sample agree with the profile of the population based on
some pre-selected characteristic.
2. Quota sampling – selects a specified number (quota) of sampling units possessing certain
characteristics.
3. Convenience sampling – selects sampling units that come to hand or are convenient to get
information from.
4. Judgment sampling – selects sample in accordance with an expert’s judgment.

METHODS OF PROBABILITY SAMPLING


1. Simple random sampling – is a method of selecting n units out of the N units in the population in such
a way that every distinct sample of size n has an equal chance of being drawn.
2. Stratified random sampling – the population of N units is first divided into subpopulations called
strata. Then a simple random sample is drawn from each stratum, the selection being made
independently in different strata.
3. Systematic sampling – is a method of selecting a sample by taking every kth unit from an ordered
population, the first unit being selected at random.
4. Cluster sampling – is a method where a sample of distinct groups, or cluster, of elements is selected
and then a census of every element in the selected cluster is taken.
5. Multistage sampling – the population is divided into a hierarchy of sampling units corresponding to
the different sampling stages. In the first stage of sampling, the population is divided into primary
stage units (PSUs), then a sample of PSUs is drawn. In the second stage, each selected PSU is divided
into second-stage units (SSUs), then a sample of SSUs is drawn.
6. Sequential sampling – units are drawn one by one in a sequence without prior fixing of the total
number of observations and the results of the drawing at any stage are used to decide whether to
terminate sampling or not.
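
Three of the probability sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey tool: the population frame of 100 numbered units, the two strata and the function names are all hypothetical, and an equal number of units per stratum is an assumed allocation.

```python
import random


def simple_random_sample(units, n, rng):
    """Every distinct sample of size n is equally likely."""
    return rng.sample(units, n)


def systematic_sample(units, k, rng):
    """Random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return units[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Independent simple random sample inside each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(2024)            # fixed seed so the run is repeatable
frame = list(range(1, 101))          # hypothetical population frame of 100 units

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strata = {"first_year": frame[:60], "second_year": frame[60:]}  # hypothetical strata
strat = stratified_sample(strata, 5, rng)

print(srs)
print(sys_sample)
print(strat)
```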

DATA COLLECTION METHODS


1. Survey method – questions are asked to obtain information, either through self-administered
questionnaire or personal interview.

Self-administered Questionnaire
• It can be administered to a large number of people simultaneously.
• Respondents may feel free to express views and are less pressured to answer immediately.
• It is more appropriate for obtaining objective information.

Personal Interview
• It is administered to a person or group one at a time.
• Respondents may feel more cautious, particularly in answering sensitive questions, for fear of disapproval.
• It is more appropriate for obtaining information about complex, emotionally laden topics or for
probing sentiments underlying an expressed opinion.

2. Observation method – makes possible the recording of behavior but only at the time of occurrence.


3. Experimental method – a method designed for collecting data under controlled conditions. An
experiment is an operation where there is actual human interference with the conditions
that can affect the variable under study.
4. Use of existing studies – e.g., census, health statistics, and weather bureau reports.
Two types:
• Documentary sources – published or written reports, periodicals, unpublished documents,
etc.
• Field sources – researchers who have done studies on the area of interest are asked
personally or directly for information needed.
5. Registration method – e.g., car registration, student registration and hospital admission.

EXERCISE 4.2.2 ______________


A. Identify which data collection method is best used on the following statements:
_______1. Tracer Study on BSBM graduates of CvSU – CCC from 2011-2016
_______2. The role of Brgy Officials in maintaining peace and order in the community.
_______3. The effects of entertainment media to the academic performance of senior high school
students.
_______4. Grading the demonstration teaching of pre-service teachers at CNHS
_______5. Testing the new vaccine for Parvo virus on puppies.
B. Identify the sampling technique used (random, cluster, stratified, convenience, systematic).
_______________1. Every fifth person boarding a plane is searched thoroughly.
_______________2. At a local community College, five math classes are randomly selected out of 20
and all of the students from each class are interviewed.
_______________3. A researcher randomly selects and interviews fifty male and fifty female teachers.
_______________4. Based on 12,500 responses from 42,000 surveys sent to its alumni, a major
university estimated that the annual salary of its alumni was 92,500.
_______________5. A community college student interviews everyone in a biology class to determine
the percentage of students that own a car.

TABULAR AND GRAPHICAL PRESENTATION OF DATA


Textual Presentation – data incorporated into a paragraph of text.

Advantages
• It gives emphasis to significant figures and comparisons.
• It is the simplest and most appropriate approach when there are only a few numbers to be presented.

Disadvantages
• When a large mass of quantitative data is included in a text or paragraph, the presentation
becomes almost incomprehensible.
• Paragraphs can be tiresome to read, especially if the same words are repeated many times.

Tabular Presentation – the systematic organization of data in rows and columns.


Advantages
• More concise than textual presentation
• Easier to understand
• Facilitates comparisons and analysis of relationships among different categories
• Presents data in greater detail than a graph
Parts of a Formal Statistical Table
1. Heading – consists of a table number, title, and a head note.


2. Box Head – the portion of the table that contains the column heads which describe the data in each
column.
3. Stub – the portion of the table usually comprising the first column on the left. Each row caption is a
descriptive title of the data on the given line.
4. Field – main part of the table; contains the substance or the figures of one’s data.
5. Source note – an exact citation of the source of data presented in the table (should always be
placed when the figures are not original).
6. Foot note – any statement or note inserted at the bottom of the table.

Table 4.4 – CRIME VOLUME AND RATE BY TYPE: 1991 – 1993
(Rate per 100,000 population)

                          1991               1992               1993
Type                 Crime    Crime     Crime    Crime     Crime    Crime
                     Volume   Rate      Volume   Rate      Volume   Rate
Total               121,326    195     104,719    164      96,686    148
Index crimes         77,261    124      67,354    106      58,684     90
  Murder              8,707     14       8,293     13       7,758     12
  Homicide            8,069     13       7,912     12       7,123     11
  Physical Injury    29,862     35      20,462     32      18,722     29
  Robbery            13,817     22      11,164     18       9,856     15
  Theft              22,780     37      17,374     27      12,940     20
  Rape                2,026      3       2,149      3       2,285      4
Nonindex crimes      44,065     71      37,365     59      38,002     58

Source: Philippine National Police

In this table, the heading is the table number, title and head note; the boxhead is the block of
column heads (year, Crime Volume, Crime Rate); the stub is the Type column; and the field is the body of figures.

Graphical Presentation – a graph or chart is a device for showing numerical values or relationships in
pictorial form.
Advantages:
• Main features and implications of a body of data can be grasped at a glance.
• Can attract attention and hold the reader’s interest.
• Simplifies concepts that would otherwise have been expressed in so many words.
• Can readily clarify data; frequently brings out hidden facts and relationships.

Quality of a Good Graph


1. Accuracy
2. Clarity
3. Simplicity
4. Appearance

Common Types of Graph


1. Line chart – graphical presentation of data especially useful for showing trends over a period
of time.
2. Pie chart – a circular graph that is useful in showing how a total quantity is distributed among
a group of categories.
3. Bar chart – consists of a series of rectangular bars. If the bars are arranged horizontally, the
length of each bar represents the quantity or frequency for each category; if the
bars are arranged vertically, the height of each bar represents the quantity.
4. Pictorial unit chart – a pictorial chart in which each symbol represents a definite and uniform
value.


4.3 MEASURES OF CENTRAL TENDENCY AND LOCATION


MEASURES OF CENTRAL TENDENCY: UNGROUPED DATA
• A measure of central tendency is any single value that is used to identify the “center” or the typical
value of a data set. It is often referred to as an average.
a. Mean – this is obtained by summing up all the observations and dividing the sum by the number of
observations. We call this the simple mean.
Formula: $\bar{x} = \frac{\sum x}{n}$
Where: $\bar{x}$ = mean
$x$ = value of the particular item
$n$ = number of items in the sample
Example:
A sample of 10 students was taken and was asked how much time they travel from their respective places
of residences to the school. The results are listed below. Compute the mean.
Student Travel time
A 30 min
B 15
C 35
D 20
E 25
F 45
G 10
H 25
I 30
J 15
b. Median – It is the middle value after arranging the set of observations in ascending or descending
order. If the number of observations is odd, the median is the middle value; if the number
of observations is even, the median is the average of the two middle observations.
Formula (positions refer to the ordered observations $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$):
ODD: $\text{Median} = x_{\left(\frac{n+1}{2}\right)}$
EVEN: $\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$
Example:
A sample of 10 students was taken and was asked how much time they travel from their respective
places of residence to the school. The results are listed below. Compute the median.
Student Travel time
A 30 min
B 15
C 35
D 20
E 25
F 45
G 10
H 25
I 30
J 15

c. Mode – it is the observation that appears most often. Mode is the least preferred measure of central
location.
Example: Find the mode.
Observations                                  Mode
3 8 6 7 9 9 3 3 10                            3 - unimodal
10 15 15 20 25 25 30 35 45                    15 & 25 - bimodal
10 15 15 20 25 25 30 30 35 45                 15, 25 & 30 - trimodal
3 8 6 6 7 7 9 9 3 6 3 10 7 9                  3, 6, 7 & 9 - multimodal
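
The three ungrouped measures can be cross-checked with Python's standard `statistics` module. The sketch below uses the travel-time sample from the examples; the variable name `travel_times` is only illustrative.

```python
from statistics import mean, median, multimode

# Travel times (in minutes) of students A to J.
travel_times = [30, 15, 35, 20, 25, 45, 10, 25, 30, 15]

print(mean(travel_times))               # 250 / 10 = 25
print(median(travel_times))             # average of the two middle values: 25.0
print(sorted(multimode(travel_times)))  # [15, 25, 30], so the sample is trimodal
```

`multimode` returns every value that shares the highest frequency, which makes unimodal, bimodal and multimodal data easy to tell apart.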

MEASURES OF CENTRAL TENDENCY: GROUPED DATA


a. Mean
Formula: $\bar{x} = \frac{\sum fx}{n}$
Where: $\bar{x}$ = mean
$f$ = frequency
$x$ = class mark (midpoint of the class)
$n$ = number of observations
Example:
Final grades of Stat 101 students arranged in an array. Solve for the mean.
50 50 50 50 50 50 51 52 53 53 57
59 59 60 60 60 62 62 62 62 63 65
66 66 68 68 68 68 68 69 69 69 69
69 70 71 71 71 71 72 72 72 72 72
73 73 73 73 74 74 74 75 75 75 75
75 76 76 76 76 77 77 77 77 78 79
79 79 79 79 80 80 80 81 81 81 81
82 82 82 82 82 82 83 83 84 84 84
84 84 84 84 85 85 86 86 87 87 87
87 87 87 88 89 89 91 92 94 94 96

Solution:

$K = 1 + 3.322 \log 110 = 7.78 \approx 8$, $R = 96 - 50 = 46$, $C = 46 \div 8 = 5.75 \approx 6$

Class      Frequency (f)   CM (x)   fx
50 – 55    10              52.5     525
56 – 61    6               58.5     351
62 – 67    8               64.5     516
68 – 73    25              70.5     1,762.5
74 – 79    22              76.5     1,683
80 – 85    23              82.5     1,897.5
86 – 91    12              88.5     1,062
92 – 97    4               94.5     378
           N = 110                  ∑fx = 8,175

$$\bar{x} = \frac{\sum fx}{n} = \frac{8175}{110} = 74.32$$
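
The grouped mean can be reproduced from the frequency distribution table alone. In this sketch the list `classes` (lower limit, upper limit, frequency) is a hypothetical representation of the table, and the class mark is taken as the midpoint of the class limits.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (50, 55, 10), (56, 61, 6), (62, 67, 8), (68, 73, 25),
    (74, 79, 22), (80, 85, 23), (86, 91, 12), (92, 97, 4),
]

n = sum(f for _, _, f in classes)
# Class mark x = midpoint of the class; accumulate f * x over all classes.
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
grouped_mean = total_fx / n

print(n, total_fx, round(grouped_mean, 2))   # 110 8175.0 74.32
```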

b. Median
Formula: $\tilde{x} = LCB_{md} + \left[\frac{\frac{n}{2} - {<}cf_p}{f_{md}}\right] i$

Where: $LCB_{md}$ = lower class boundary of the median class
$n$ = number of observations
${<}cf_p$ = sum of the frequencies before the median class
$f_{md}$ = frequency of the median class
$i$ = class interval/size

Example:
Final grades of Stat 101 students arranged in an array. Solve for the median.
Solution:
1. Determine the median class by dividing the total number of observations by 2.
$$\frac{n}{2} = \frac{110}{2} = 55$$


2. Go over the entries in the less than cumulative frequency (<cf) column. The first class whose
cumulative frequency is greater than or equal to the result of step 1 is the median class.

               Class      Frequency   LCB     <cf
               50 – 55    10          49.5    10
               56 – 61    6           55.5    16
               62 – 67    8           61.5    24
               68 – 73    25          67.5    49
Median class   74 – 79    22          73.5    71
               80 – 85    23          79.5    94
               86 – 91    12          85.5    106
               92 – 97    4           91.5    110
                          N = 110

$$\tilde{x} = LCB_{md} + \left[\frac{\frac{n}{2} - {<}cf_p}{f_{md}}\right] i = 73.5 + \left[\frac{\frac{110}{2} - 49}{22}\right] 6 = 75.14$$
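
The search for the median class is a walk down the cumulative frequency column, which the sketch below makes explicit. The variable names are illustrative, and treating a cumulative frequency exactly equal to n/2 as already reaching the median class is an assumption about how ties are handled.

```python
from itertools import accumulate

freqs = [10, 6, 8, 25, 22, 23, 12, 4]
lower_boundaries = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5, 85.5, 91.5]
width = 6                                  # class interval/size

n = sum(freqs)
cum = list(accumulate(freqs))              # <cf column: [10, 16, 24, 49, 71, 94, 106, 110]
half = n / 2                               # step 1: n / 2 = 55

# Step 2: first class whose cumulative frequency reaches n / 2 (assumed tie rule: >=).
idx = next(i for i, cf in enumerate(cum) if cf >= half)
cf_before = cum[idx - 1] if idx > 0 else 0

grouped_median = lower_boundaries[idx] + (half - cf_before) / freqs[idx] * width
print(idx, round(grouped_median, 2))       # 4 75.14  (index 4 is the class 74 – 79)
```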

c. Mode
Formula: $\hat{x} = LCB_m + \left(\frac{d_1}{d_1 + d_2}\right) i$
Where: $\hat{x}$ = mode
$LCB_m$ = LCB of the modal class
$d_1$ = difference between the frequency of the modal class ($f_m$) and the frequency of the class before it
$d_2$ = difference between the frequency of the modal class ($f_m$) and the frequency of the class after it
$i$ = class interval/size

Example:
Final grades of Stat 101 students arranged in an array. Solve for the mode.

Solution:
1. Determine the modal class by identifying the class that contains the highest frequency.

              Class      Frequency   LCB     <cf
              50 – 55    10          49.5    10
              56 – 61    6           55.5    16
              62 – 67    8           61.5    24
Modal class   68 – 73    25          67.5    49
              74 – 79    22          73.5    71
              80 – 85    23          79.5    94
              86 – 91    12          85.5    106
              92 – 97    4           91.5    110
                         N = 110

2. Compute the differences, $d_1 = 25 - 8 = 17$ and $d_2 = 25 - 22 = 3$, then substitute.
$$\hat{x} = LCB_m + \left(\frac{d_1}{d_1 + d_2}\right) i = 67.5 + \left(\frac{17}{17 + 3}\right) 6 = 72.60$$


EXERCISE 4.3.1 _________________


1. The owner of a newly opened Internet café recorded the number of customers who are coming in
to his Internet café. Below is a tabulation of the number of customers for 10 days. Calculate the
mean, median and mode.
Days No. of Customers
1st 8
2nd 5
3rd 9
4th 12
5th 12
6th 10
7th 15
8th 15
9th 15
10th 14

2. Complete the Frequency Distribution Table to find the mean, median and mode of the data set
given:
Class F CM (x) fx LCB <CF

10-19 3

20-29 1

30-39 3

40-49 2

50-59 9

60-69 8

70-79 35

80-89 30

90-99 9

MEASURES OF LOCATION: UNGROUPED DATA


• These are values below which a specified fraction or percentage of the observations in a given set
must fall.
• Measures of location are the quartiles, deciles and percentiles.
• Quartiles divide the set of observations into 4 equal parts, deciles into 10 and percentiles into 100.
At some points, the three measures coincide, as illustrated below.

Percentile (P)    10   20   25   30   40   50   60   70   75   80   90  100
Decile (D)         1    2         3    4    5    6    7         8    9   10
Quartile (Q)                 1              2              3              4

a. Percentile – to compute for the $i$th percentile: $P_i$ is the value of the $\left[\frac{i(n+1)}{100}\right]$th observation in the
array.
Where: $P_i$ = percentile location
$i$ = percentile of interest
$n$ = number of observations

The following guidelines will help us identify the quantile location:

1. If $P_i$ is a whole number, the percentile is the $P_i$th value in the ordered set of observations.
2. If $P_i$ is not a whole number, let $P$ be its whole-number part. The percentile lies between the $P$th and
$(P+1)$th observations: take the difference between these two observations, multiply it by the decimal
portion of $P_i$, and add the result to the $P$th observation.

Example:
Below is the list of the daily wages of 20 workers of XYZ Construction Company. Compute for $P_{87}$.
200 200 265 285 290 300 300 315 330 350
375 450 450 500 550 550 600 615 630 650
Solution:
$$P_{87} = \left[\frac{i(n+1)}{100}\right] = \left[\frac{87(20+1)}{100}\right] = 18.27\text{th location}$$
The 18th observation is 615 and the 19th is 630, so
$$P_{87} = 615 + 0.27(630 - 615) = 619.05 \text{ or } 619$$

b. Decile – to compute for the $i$th decile: $D_i$ is the value of the $\left[\frac{i(n+1)}{10}\right]$th observation in the array.
Where: $D_i$ = decile location
$i$ = decile of interest
$n$ = number of observations
Example:
Below is the list of the daily wages of 20 workers of XYZ Construction Company. Compute for $D_7$.
200 200 265 285 290 300 300 315 330 350
375 450 450 500 550 550 600 615 630 650
Solution:
$$D_7 = \left[\frac{i(n+1)}{10}\right] = \left[\frac{7(20+1)}{10}\right] = 14.70\text{th location}$$
The 14th observation is 500 and the 15th is 550, so
$$D_7 = 500 + 0.7(550 - 500) = 535$$
c. Quartile – to compute for the $i$th quartile: $Q_i$ is the value of the $\left[\frac{i(n+1)}{4}\right]$th observation in the array.
Where: $Q_i$ = quartile location
$i$ = quartile of interest
$n$ = number of observations
Example:
Below is the list of the daily wages of 20 workers of XYZ Construction Company. Compute for $Q_3$.
200 200 265 285 290 300 300 315 330 350
375 450 450 500 550 550 600 615 630 650
Solution:
$$Q_3 = \left[\frac{i(n+1)}{4}\right] = \left[\frac{3(20+1)}{4}\right] = 15.75\text{th location}$$
The 15th and 16th observations are both 550, so
$$Q_3 = 550 + 0.75(550 - 550) = 550$$
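
The location rule i(n + 1)/parts and the interpolation guideline can be combined into one small function. This is a sketch under stated assumptions: the name `quantile_by_location` is hypothetical, and returning the smallest or largest observation when the location falls outside the data is an assumed convention that the rule itself does not cover.

```python
def quantile_by_location(data, i, parts):
    """Value at location i(n + 1)/parts, interpolating between neighbours."""
    ordered = sorted(data)
    n = len(ordered)
    location = i * (n + 1) / parts
    whole = int(location)            # whole-number part of the location
    fraction = location - whole      # decimal portion of the location
    if whole < 1:                    # assumed: clamp to the smallest observation
        return ordered[0]
    if whole >= n:                   # assumed: clamp to the largest observation
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


wages = [200, 200, 265, 285, 290, 300, 300, 315, 330, 350,
         375, 450, 450, 500, 550, 550, 600, 615, 630, 650]

print(round(quantile_by_location(wages, 87, 100), 2))  # P87: 619.05
print(round(quantile_by_location(wages, 7, 10), 2))    # D7: 535.0
print(round(quantile_by_location(wages, 3, 4), 2))     # Q3: 550.0
```

Statistical software often uses other interpolation conventions, so results from a library function may differ slightly from this (n + 1) rule.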


MEASURES OF LOCATION: GROUPED DATA


a. Quartiles – the formula for quartiles is patterned after the median formula.
Formula: $Q_k = LCB_{Q_k} + \left[\frac{\frac{kn}{4} - {<}cf_p}{f_{Q_k}}\right] i$
Where: $LCB_{Q_k}$ = lower class boundary of the quartile class
$n$ = number of observations
${<}cf_p$ = sum of the frequencies before the quartile class
$f_{Q_k}$ = frequency of the quartile class
$i$ = class interval/size
Example:
Final grades of Stat 101 students arranged in an array. Solve for $Q_1$.
Class Frequency LCB <cf
50 – 55 10 49.5 10
56 – 61 6 55.5 16
62 – 67 8 61.5 24
68 – 73 25 67.5 49
74 – 79 22 73.5 71
80 – 85 23 79.5 94
86 – 91 12 85.5 106
92 – 97 4 91.5 110
N= 110

Solution:
1. Determine the quartile class by computing $\frac{kn}{4}$; for $Q_1$, $k = 1$.
$$\frac{n}{4} = \frac{110}{4} = 27.5$$
2. Go over the entries in the less than cumulative frequency column. The first class whose cumulative
frequency is greater than or equal to $\frac{n}{4}$ is the quartile 1 class. Here it is 68 – 73, with <cf = 49.
3. Substitute the values of the quartile 1 class into the formula.
$$Q_1 = LCB_{Q_1} + \left[\frac{\frac{n}{4} - {<}cf_p}{f_{Q_1}}\right] i = 67.5 + \left[\frac{\frac{110}{4} - 24}{25}\right] 6 = 68.34$$
b. Deciles
Formula: $D_k = LCB_{D_k} + \left[\frac{\frac{kn}{10} - {<}cf_p}{f_{D_k}}\right] i$
Where: $LCB_{D_k}$ = lower class boundary of the decile class
$n$ = number of observations
${<}cf_p$ = sum of the frequencies before the decile class
$f_{D_k}$ = frequency of the decile class
$i$ = class interval/size


Example:
Final grades of Stat 101 students arranged in an array. Solve for $D_8$.

Solution:
1. Determine the decile class by computing $\frac{kn}{10}$.
$$\frac{kn}{10} = \frac{8 \times 110}{10} = 88$$
2. Go over the entries in the less than cumulative frequency column. The first class whose cumulative
frequency is greater than or equal to $\frac{kn}{10}$ is the decile 8 class.

Class      Frequency   LCB     <cf
50 – 55    10          49.5    10
56 – 61    6           55.5    16
62 – 67    8           61.5    24
68 – 73    25          67.5    49
74 – 79    22          73.5    71
80 – 85    23          79.5    94
86 – 91    12          85.5    106
92 – 97    4           91.5    110
           N = 110

$$D_8 = LCB_{D_8} + \left[\frac{\frac{kn}{10} - {<}cf_p}{f_{D_8}}\right] i$$

c. Percentile
Formula: $P_k = LCB_{P_k} + \left[\frac{\frac{kn}{100} - {<}cf_p}{f_{P_k}}\right] i$
Where: $LCB_{P_k}$ = lower class boundary of the percentile class
$n$ = number of observations
${<}cf_p$ = sum of the frequencies before the percentile class
$f_{P_k}$ = frequency of the percentile class
$i$ = class interval/size
Example:
Final grades of Stat 101 students arranged in an array. Solve for $P_{57}$.

Solution:
1. Determine the percentile class by computing $\frac{kn}{100}$.
$$\frac{kn}{100} = \frac{57 \times 110}{100} = 62.7$$
2. Go over the entries in the less than cumulative frequency column. The first class whose cumulative
frequency is greater than or equal to $\frac{kn}{100}$ is the percentile 57 class.

Class      Frequency   LCB     <cf
50 – 55    10          49.5    10
56 – 61    6           55.5    16
62 – 67    8           61.5    24
68 – 73    25          67.5    49
74 – 79    22          73.5    71
80 – 85    23          79.5    94
86 – 91    12          85.5    106
92 – 97    4           91.5    110
           N = 110

$$P_{57} = LCB_{P_{57}} + \left[\frac{\frac{kn}{100} - {<}cf_p}{f_{P_{57}}}\right] i$$
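
Because the quartile, decile and percentile formulas for grouped data differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below is illustrative: `grouped_quantile` is a hypothetical name, and a class whose cumulative frequency exactly equals kn/parts is assumed to be the quantile class.

```python
from itertools import accumulate


def grouped_quantile(lower_boundaries, freqs, width, k, parts):
    """LCB + ((k*n/parts - <cf before class) / class frequency) * width."""
    n = sum(freqs)
    target = k * n / parts
    cum = list(accumulate(freqs))          # less than cumulative frequencies
    # First class whose cumulative frequency reaches the target (assumed tie rule: >=).
    idx = next(i for i, cf in enumerate(cum) if cf >= target)
    cf_before = cum[idx - 1] if idx > 0 else 0
    return lower_boundaries[idx] + (target - cf_before) / freqs[idx] * width


lower_boundaries = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5, 85.5, 91.5]
freqs = [10, 6, 8, 25, 22, 23, 12, 4]

q1 = grouped_quantile(lower_boundaries, freqs, 6, k=1, parts=4)
print(round(q1, 2))   # 67.5 + (27.5 - 24) / 25 * 6 = 68.34
```

Calling the same function with `k=8, parts=10` or `k=57, parts=100` is one way to check hand computations of the decile and percentile examples.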


EXERCISE 4.3.2 ___________


A. The owner of a newly opened Internet café recorded the number of customers who are coming
in to his Internet café. Below is a tabulation of the number of customers for 10 days.

Days No. of Customers


1st 8
2nd 5
3rd 9
4th 12
5th 12
6th 10
7th 15
8th 15
9th 15
10th 14
Calculate the following:
Q1 D8

Q3 P45

D3 P89

B. Complete the Frequency Distribution Table to find the Q3, D6 and P94 of the data set given:
Class F LCB <CF
10-19 3
20-29 1
30-39 3
40-49 2
50-59 9
60-69 8
70-79 35
80-89 30
90-99 9


4.4 MEASURES OF ABSOLUTE DISPERSION


MEASURES OF DISPERSION
• A measure of dispersion indicates the extent to which individual items in a series are scattered about an average.
Some Uses for Measuring Dispersion:
• To determine the extent of the scatter so that steps may be taken to control the existing variation.
• Used as a measure of reliability of the average value
General Classifications of Measures of Dispersion:
1. Measures of Absolute Dispersion
2. Measures of Relative Dispersion

MEASURES OF ABSOLUTE DISPERSION: UNGROUPED DATA


• Expressed in the units of the original observations.
• They cannot be used to compare variations of two data sets when the averages of these data sets
differ a lot in value or when the observations differ in units of measurement.

1. Range – it is the difference between the largest and smallest values.


Range = maximum – minimum
Example:
a. The IQ’s of 5 members of a certain family are 108,112,127,116 and 113. Find the range.
Range = maximum – minimum
Range = 127 -108 = 19
2. Mean Absolute Deviation or Average Deviation
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
3. Standard Deviation – is the most frequently used measure of dispersion.
Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Where: $s$ = sample standard deviation
$x$ = observation
$\bar{x}$ = sample mean
$n$ = number of observations
Steps in Calculating the Standard Deviation
1. Compute the mean
2. Compute the deviations by subtracting the mean from each of the observations
3. Square the deviations
4. Take the sum of the squared deviations
5. Divide the sum by n – 1 to get the sample variance
6. Take the square root of the sample variance

Example:
Below is the list of the scores of two groups of students in a grammar quiz.
Group A Group B
13 10
14 10
15 15
16 18
19 18
20 19
25 26
30 36

Solution:
1. Compute the mean
$$\bar{x}_A = \frac{\sum x}{n} = \frac{152}{8} = 19 \qquad \bar{x}_B = \frac{\sum x}{n} = \frac{152}{8} = 19$$


2. Compute the deviations by subtracting the mean from each of the observations, and then
square the deviations.
Group A   $x - \bar{x}$   $(x - \bar{x})^2$      Group B   $x - \bar{x}$   $(x - \bar{x})^2$
13        -6      36             10        -9      81
14        -5      25             10        -9      81
15        -4      16             15        -4      16
16        -3      9              18        -1      1
19        0       0              18        -1      1
20        1       1              19        0       0
25        6       36             26        7       49
30        11      121            36        17      289
Sum               244            Sum               518

3. Take the sum of the squared deviations, then divide the sum by n – 1, then take the square root
of the sample variance.
$$s_A = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{244}{8 - 1}} = 5.90 \qquad s_B = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{518}{8 - 1}} = 8.60$$
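
The mean absolute deviation and the sample standard deviation of the two groups can be verified with the standard `statistics` module. This is a minimal sketch: `mean_absolute_deviation` is a hypothetical helper written for this illustration, and `stdev` divides by n – 1, as the sample formula requires.

```python
from statistics import mean, stdev

group_a = [13, 14, 15, 16, 19, 20, 25, 30]
group_b = [10, 10, 15, 18, 18, 19, 26, 36]


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


for name, scores in (("A", group_a), ("B", group_b)):
    # stdev() is the sample standard deviation (divisor n - 1).
    print(name, mean(scores),
          round(mean_absolute_deviation(scores), 2),
          round(stdev(scores), 2))
```

Both groups have the same mean, so the difference in spread shows up only in the dispersion measures.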

MEASURES OF ABSOLUTE DISPERSION: GROUPED DATA


Mean Deviation
$$MD = \frac{\sum f|x - \bar{x}|}{n}$$
Standard Deviation – is the most frequently used measure of dispersion.
Formula: $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$
Where: $s$ = sample standard deviation
$f$ = frequency
$x$ = class mark
$\bar{x}$ = sample mean
$n$ = number of observations
Steps in Calculating the Standard Deviation
1. Compute the mean
2. Compute the deviations by subtracting the mean from each class mark
3. Square the deviations
4. Multiply each squared deviation by its corresponding frequency
5. Take the sum of the products of the squared deviations and the frequencies
6. Divide the sum by n – 1 to get the sample variance
7. Take the square root of the sample variance

Example:
Final grades of students in Stat 101 arranged in FDT. Solve for the Standard deviation.
Class      Frequency (f)   CM (x)   fx   x − x̄   (x − x̄)²   f(x − x̄)²
50 – 55    10
56 – 61    6
62 – 67    8
68 – 73    25
74 – 79    22
80 – 85    23
86 – 91    12
92 – 97    4
N = 110

$s = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{n - 1}}$          $MD = \frac{\sum f\,|x - \bar{x}|}{N}$
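
As a way to check a hand-completed table, the grouped-data steps can be sketched in Python. This is a minimal sketch, not the module's own solution: the variable names are illustrative, and it assumes each class mark is the midpoint of the class limits and that the mean is computed as ∑fx / N.

```python
import math

# Class limits and frequencies copied from the Stat 101 table.
classes = [(50, 55), (56, 61), (62, 67), (68, 73),
           (74, 79), (80, 85), (86, 91), (92, 97)]
freqs = [10, 6, 8, 25, 22, 23, 12, 4]

# Assumption: class mark (CM) = midpoint of the lower and upper class limits.
marks = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)                                        # total frequency, N = 110
mean = sum(f * x for f, x in zip(freqs, marks)) / n   # mean = sum(fx) / N

# Totals of the last table column and of f|x - mean|.
sum_f_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
sum_f_abs = sum(f * abs(x - mean) for f, x in zip(freqs, marks))

s = math.sqrt(sum_f_sq / (n - 1))   # grouped sample standard deviation
md = sum_f_abs / n                  # grouped mean deviation

print(f"N = {n}, mean = {mean:.2f}, s = {s:.2f}, MD = {md:.2f}")
```

Each list comprehension corresponds to one column of the table, so the intermediate lists can be printed row by row to compare against manual work.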


EXERCISE 4.4 ____________


A pediatrician has clinic hours in two leading hospitals. His clinic schedule in Alabang is 10:00 am to 12:00 pm,
MWF. His clinic schedule in Makati is 2:00 to 4:00 pm, TTh. The logbook of his secretaries shows the
number of patients who visited him for the last two weeks.
Hospital in Alabang Hospital in Makati
4,800 4,200
4,200 3,600
4,200 3,600
3,000 3,000
2,400 4,800

Complete the Frequency Distribution Table to find the standard deviation of the data set given:
Class F CM (x) 𝑓𝑥 𝑥 − 𝑥̅ (𝑥 − 𝑥̅ )2 𝑓(𝑥 − 𝑥̅ )2
10-19 3
20-29 1
30-39 3
40-49 2
50-59 9
60-69 8
70-79 35
80-89 30
90-99 9


4.5 MEASURES OF RELATIVE DISPERSION


NORMAL DISTRIBUTION

Properties of a Normal Distribution

a. The mean, median, and mode are all equal and are located at the center of the distribution.
b. The distribution is symmetric. The distribution depicts a bell-shaped curve where the left area is a
mirror image of the right area.
c. The total area under the normal curve is 1 or 100%.
d. The distribution is asymptotic: the tails of the curve approach the horizontal axis but never
touch it.
e. The mean determines the location of the distribution, and the standard deviation determines
its dispersion.

The graph of a normal distribution is shown below:

[Figure: bell-shaped normal curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]

The mean and the standard deviation determine the shape of the distribution.
As previously stated, there are infinitely many normal curves, one for each combination of mean and
standard deviation. This may suggest that we need a different table for each particular mean and
standard deviation. We do not. Instead, we standardize a given observation so that a single table
can be used. The standardized score may also be termed the z-value, z statistic, standard deviate,
standard normal value, or just normal value. The formula is shown below.
$z = \frac{x - \mu}{\sigma}$
Where: z = normal value
       x = value of any particular observation
       μ = mean of the distribution
       σ = standard deviation of the distribution
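
The standardization formula translates directly into code. In the sketch below, the function name `z_score` and the numbers (a mean of 50 and a standard deviation of 10) are hypothetical and chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Standardize an observation x from a distribution with mean mu and sd sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical distribution: mean 50, standard deviation 10.
mu, sigma = 50.0, 10.0
for x in (35.0, 50.0, 62.5):
    print(f"x = {x:5.1f}  ->  z = {z_score(x, mu, sigma):+.2f}")
```

An observation below the mean gets a negative z-value, the mean itself gets z = 0, and an observation above the mean gets a positive z-value.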


The different rules presented by examples can be summarized as follows. In each rule, "area" means
the table area between 0 and the given z-value.

Z-values                                          Rule
1. One z-value is positive and the other          Add the areas of the corresponding z-values.
   is negative
2. Both z-values are positive or both             In either case, subtract the smaller area from
   z-values are negative                          the bigger area.
3. To the right of a positive z-value or to       Subtract the area from 0.5.
   the left of a negative z-value
4. To the right of a negative z-value or to       Add the area to 0.5.
   the left of a positive z-value

Examples:
Find the area under the normal distribution curve for each of the following z-values:
1. 0 < z < 1.63
2. 0 > z > −2.44
3. z < 2.44
4. z > −1.63
5. z > 1.63
6. z < −2.44
7. −2.44 < z < −1.05
8. −1.05 < z < 1.63
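
Answers obtained from the z-table can be cross-checked in software. The sketch below is not the table method: it uses the cumulative area to the left of z from `statistics.NormalDist` (Python 3.8 or later) instead of the 0-to-z areas, and the helper names are illustrative. Results may differ from a four-decimal table in the last digit because of rounding.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # standard normal: mean 0, sd 1


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return std_normal.cdf(z)


def area_right(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - std_normal.cdf(z)


def area_between(z_low, z_high):
    """Area under the standard normal curve for z_low < z < z_high."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# The eight example problems, keyed by their item numbers.
areas = {
    "1. 0 < z < 1.63": area_between(0, 1.63),
    "2. -2.44 < z < 0": area_between(-2.44, 0),
    "3. z < 2.44": area_left(2.44),
    "4. z > -1.63": area_right(-1.63),
    "5. z > 1.63": area_right(1.63),
    "6. z < -2.44": area_left(-2.44),
    "7. -2.44 < z < -1.05": area_between(-2.44, -1.05),
    "8. -1.05 < z < 1.63": area_between(-1.05, 1.63),
}
for label, area in areas.items():
    print(f"{label:24s} area = {area:.4f}")
```

Because the cumulative area already includes the left half of the curve, the "add to 0.5" and "subtract from 0.5" rules are handled implicitly by `cdf`.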


EXERCISE 4.5 _____________


Sketch the normal distribution for the given problem. Show your solutions.
A data set follows a normal distribution with a mean of 40 and a standard deviation of 4.75.
What is the area under the normal curve:
a. Between 34.06 and 46.08?
b. Between 28.6 and 35.11?
c. Greater than 49.5?
d. Less than 44.04?


4.6 HYPOTHESIS TESTING


CONCEPT OF HYPOTHESIS TESTING
• Hypothesis – is a statement about the population developed for the purpose of testing.
• Hypothesis testing – is a procedure consisting of pertinent steps whose major objective is to
make a decision based on the gathered data.

NULL AND ALTERNATIVE HYPOTHESES


The concept of hypothesis in statistical inference is classified into two:
1. Null hypothesis – denoted by H0, refers to the statement about the absence of any effect claimed for a
certain action. This hypothesis also asserts the absence of difference between the observed and the expected
values. The null hypothesis should be stated by saying “There is no significant difference…”, “There is no
relationship…”, or “There is no change…”
2. Alternative hypothesis – denoted by Ha, refers to the assertion contradicting the null hypothesis.
Thus, if the null hypothesis is true, then the alternative hypothesis must be false. To
state the alternative hypothesis of our null, we may say, “There is a significant relationship between…”
The alternative hypothesis tells us whether the test is one-tailed or two-tailed.

ONE-TAILED AND TWO-TAILED TEST


1. One-tailed test – is the test where the area of rejection lies on only one side of the distribution
(either the left tail or the right tail). The one-tailed test is used if the alternative hypothesis
is directional.
Example:
A teacher employed two different teaching strategies in presenting her lesson: the lecture method and
the discussion method. After the presentation, a 30-point quiz was given. The mean score of the
students taught with the discussion method was found to be 25 with a
standard deviation of 3. The mean score of the students taught with the lecture method was
found to be 19 with a standard deviation of 3.2. At the 0.01 significance level, can we conclude
that the discussion method is more effective than the lecture method?
H0: The discussion method is as effective as the lecture method.
Ha: The discussion method is more effective than the lecture method.

2. Two-tailed test – is the test where the areas of rejection are on both sides of the distribution.
The two-tailed test is used if the alternative hypothesis is non-directional.
Example:
A test was administered to two groups of students: the HRM student group and the tourism
student group. At the 0.05 significance level, is there a difference between the scores obtained by the
two groups of students?
H0: There is no significant difference between the scores obtained by the two groups of students.
Ha: There is a significant difference between the scores obtained by the two groups of students.

LEVEL OF SIGNIFICANCE
• It is the probability of rejecting a true null hypothesis.
• If the null hypothesis is true and is rejected, it is called a TYPE I ERROR. If the null hypothesis is
false and is accepted, it is called a TYPE II ERROR.

                     Decision
Null Hypothesis      Reject H0           Accept H0
H0 is true           Type I Error        Correct Decision
H0 is false          Correct Decision    Type II Error


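CRITICAL VALUE
• The value that separates the area of rejection from the area of acceptance.

[Figure: two-tailed test with critical values -1.701 and 1.701. The region of acceptance lies between the two critical values; the regions of rejection lie in the two tails.]
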
STEPS IN HYPOTHESIS TESTING
1. State the null hypothesis (H0) and the alternative hypothesis (Ha).
2. Set the desired level of significance.
3. Determine the appropriate test statistic and establish the critical region.
4. Compute the test statistic as a basis for decision.
5. Formulate the decision.
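
The five steps can be sketched in Python for one simple case, a two-tailed z-test of a single mean. Everything numeric here is hypothetical (the hypothesized mean, sample size, sample mean, and population standard deviation are invented for illustration), and the sketch assumes the population standard deviation is known so that a z statistic applies.

```python
import math
from statistics import NormalDist

# Step 1: state the hypotheses.
#   H0: mu = mu_0      Ha: mu != mu_0 (non-directional, so two-tailed)
mu_0 = 8.0            # hypothetical hypothesized mean

# Step 2: set the desired level of significance.
alpha = 0.05

# Step 3: determine the test statistic and establish the critical region.
# Assumption: sigma is known, so the z statistic is used.
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05

# Step 4: compute the test statistic from the (hypothetical) sample.
n = 50                # sample size
sample_mean = 7.9     # sample mean
sigma = 0.4           # population standard deviation, assumed known
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 5: formulate the decision.
reject_h0 = abs(z) > critical
print(f"z = {z:.3f}, critical value = ±{critical:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For a one-tailed test, only step 3 and step 5 would change: the whole of alpha is placed in one tail, and z is compared with a single critical value on that side.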

Examples:

For each of the problems below, do the following:
• Define the variable that you are going to use to represent the information.
• Formulate the appropriate null hypothesis (H0) and the appropriate alternative hypothesis (Ha).

1. The soft drink dispenser of a fast food center was just readjusted. The manager, wanting to know if
the dispenser is really in good condition, got a sample of 50 cups filled by the dispenser. She would
only classify the dispenser as “in good condition” (and therefore not needing to be readjusted again) if
the average fill per cup of the dispenser is 8 ounces.
Solution:
• Variable: The variable that will represent the information is
  X = fill per cup of the dispenser.

• Hypotheses: H0: μ = 8 ounces (The dispenser is “in good condition”.)
              Ha: μ ≠ 8 ounces (The dispenser is not “in good condition”.)

2. Jenny suspects that male CvSU-CCC students spend less time studying compared to their female
counterparts. She decided to conduct a study of the time that male and female CvSU-CCC students
spend doing their school work.
Solution:
• Variable: The variables that will represent the information are
  X = time spent by a male CvSU-CCC student in doing school work.
  Y = time spent by a female CvSU-CCC student in doing school work.

• Hypotheses: H0: μx = μy (The average time spent by male CvSU-CCC students in doing
              school work is the same as that of female CvSU-CCC students.)
              Ha: μx < μy (The average time spent by male CvSU-CCC students in doing
              school work is less than that of female CvSU-CCC students.)


4.7 STATISTICAL TESTS


TEST OF RELATIONSHIP
1. Pearson Product Moment Correlation (Pearson R)
FUNCTION: Parametric. It is used to test the relationship between two variables measured at the
interval or ratio level.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: Test Scores, Grades, IQ, Academic performance, Attendance, Budget
RESEARCH PROBLEM: Is there a significant relationship between the level of academic
motivation and academic performance of the participants?

2. Spearman Rank-Order Correlation (Spearman’s Rho)
FUNCTION: Non-parametric. Used to determine if there is a correlation or relationship
between two variables of ordinal type.
LEVEL OF MEASUREMENT: Ordinal
SAMPLE DATA: Percentile, class ranking, social status
RESEARCH PROBLEM: Is there a significant relationship between the student’s ranking in
Mathematics and Science subjects?

3. Chi-Square Test of Independence
FUNCTION: Non-parametric. Used to determine if there is a relationship or
association between variables of nominal type.
LEVEL OF MEASUREMENT: Nominal
SAMPLE DATA: Gender / Sex, School location, Number of responses (Frequency)
RESEARCH PROBLEM: Is sex related to color preference?; Is there a relationship between the
type of school attended and students’ gender?
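
To make the first two tests of relationship concrete, the sketch below computes the two correlation coefficients by hand. It is an illustration under assumptions, not the module's own procedure: the function names and the six pairs of scores are hypothetical, Pearson's r is computed from deviations about the means, and Spearman's rho is computed as Pearson's r applied to the ranks (tied values share an average rank). Testing whether a coefficient is significant is a further step that the sketch does not cover.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1      # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical scores for six students (not data from the module).
motivation = [3, 5, 2, 8, 7, 10]
performance = [70, 78, 65, 90, 85, 88]

print(f"Pearson r    = {pearson_r(motivation, performance):.3f}")
print(f"Spearman rho = {spearman_rho(motivation, performance):.3f}")
```

Both coefficients range from -1 to 1. Pearson's r uses the actual values, so it suits interval or ratio data, while Spearman's rho uses only the ordering, which is why it suits ordinal data.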

TEST OF DIFFERENCE
1. Z – Test of One Population Mean
FUNCTION: Parametric. Used to determine if a given sample mean was drawn from the
population with known parameters.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: SATT Scores, Average, Ratings, IQ, Budget, Gross Income
RESEARCH PROBLEM: Does the group of teenagers in Makati represent Metro Manila teenagers?;
Is there enough evidence to contradict the rental company’s claim that the mean time to
rent a car on their website is 60 seconds if the mean rental time of a random sample of 36
customers was 75 seconds?; Is there a significant difference between the mean score of the
2018 LET passers from CvSU and the mean score of all LET passers from CvSU?

2. Z – Test of Independent Proportions


FUNCTION: Non-parametric. Used to determine if there is a significant difference between
two independent or two different groups on situations that call for two types of responses.
LEVEL OF MEASUREMENT: Nominal
SAMPLE DATA: Gender/Sex, Public/Private School, Married/Single, Number of responses
RESEARCH PROBLEM: Is there a significant difference between the proportions of students and
teachers who are in favor of Duterte’s war on drugs?

3. Z – Test of Dependent Proportions


FUNCTION: Non-parametric. Used to determine if there is a significant difference between
pairs of observations from a single group.
LEVEL OF MEASUREMENT: Nominal
SAMPLE DATA: Gender/Sex, Public/Private School, Married/Single, Number of responses


RESEARCH PROBLEM: Is there a significant difference between the proportions of students who are in
favor of Duterte’s war on drugs before and after the forum?; Is there a significant difference between
voters’ choice of candidate before and after the political debate?

4. T – Test of Independent Means


FUNCTION: Parametric. Used to determine if there is a significant difference between two
different or two independent groups in terms of means.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: SATT Scores, Average, Ratings, IQ, Budget, Gross Income
RESEARCH PROBLEM: Is there a significant difference between the academic performance
in Mathematics of K-12 and Non-K-12 graduates?; Is there a significant difference between
the perception of the teachers and students on the use of an on-line learning management
system?

5. T – Test of Dependent Means (Paired T-Test)


FUNCTION: Parametric. Used to determine if there is a significant difference between two
groups or two sets of correlated scores; usually used when the participants have undergone a treatment.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: Pre-test and Post-test Scores; Mean weight before and after intensive training;
SATT Scores, Average, Ratings, IQ, Budget, Gross Income
RESEARCH PROBLEM: Is there a significant difference between the diagnostic and summative
exam scores of the students after undergoing an intervention program?; Is there a significant
difference in the English proficiency level of the participants before and after attending
Speech Communication courses?

6. Chi – Square Test of Goodness of Fit
FUNCTION: Non-parametric. Used to determine if there is a significant difference between
the observed distribution and the expected distribution.
LEVEL OF MEASUREMENT: Nominal
SAMPLE DATA: Gender/Sex, Public/Private School, Married/Single, Number of responses
RESEARCH PROBLEM: Is there a significant difference between the observed distribution and
the expected distribution of teachers’ responses on the issue of Duterte’s forming an alliance
with China and Russia?; Is there a significant difference between the observed and the
expected distribution of male and female enrollees in CvSU – CCC?

7. One – Way Analysis of Variance (ANOVA I)


FUNCTION: Parametric. Used to determine if there is a significant difference between two or
more groups in terms of means.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: Average, Ratings, IQ, Budget, Gross Income, Speed
RESEARCH PROBLEM: Is there a significant difference between three models of photocopy
machines in terms of the average number of photocopies each can produce in a week?; Is there a
significant difference in the Mathematics anxiety level of the participants in terms of their
learning styles?

8. Two – Way Analysis of Variance (ANOVA II)


FUNCTION: Used to determine if there is a significant difference in terms of means between
two or more groups that have two or more independent variables.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: Average, Ratings, IQ, Budget, Gross Income, Speed
RESEARCH PROBLEM: Is there a significant difference between the mean scores of the
students taught using modularized instruction, cooperative learning, and the lecture method in
terms of medium of instruction (English and Filipino)?; Is there a significant difference in
the students’ ratings of music and movies in terms of genre?
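
As one concrete instance from the list above, the T-Test of Dependent Means (item 5) can be sketched in Python. The pre-test and post-test scores are hypothetical, and the formula used, t = d̄ / (s_d / √n) with n − 1 degrees of freedom, is the usual paired t statistic rather than a formula given in this module. The sketch stops at the test statistic; the critical value would still be read from a t-table.

```python
import math
import statistics

# Hypothetical pre-test and post-test scores of the same eight students.
pre = [12, 15, 11, 18, 14, 16, 13, 17]
post = [15, 17, 14, 18, 17, 19, 15, 20]

# Work with the paired differences (post minus pre) for each student.
diffs = [after - before for before, after in zip(pre, post)]

n = len(diffs)
mean_d = statistics.mean(diffs)     # mean of the differences
sd_d = statistics.stdev(diffs)      # sample sd of the differences (divides by n - 1)

t = mean_d / (sd_d / math.sqrt(n))  # paired t statistic
df = n - 1                          # degrees of freedom

print(f"mean difference = {mean_d:.3f}, t = {t:.3f}, df = {df}")
```

Pairing matters here: the same students are measured twice, so the test is run on the eight differences rather than on two independent sets of scores.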
