
Descriptive Statistics

I. INTRODUCTION TO STATISTICS
Do you know?
• Statistics is part of our life. You will find statistics
everywhere!

• You find them in newspapers, at the back of the
things you buy, flashed on TV, etc.
What is Statistics?

• Statistics is the branch of science that
deals with the
• (1) collection,
• (2) presentation,
• (3) organization,
• (4) analysis, and
• (5) interpretation of data
Why is it important?
• Information empowers us to make intelligent
choices

• For example, you are enrolled in a class that
requires you to take 4 long exams. You have already
taken 3 of them and are wondering what your
class standing is. The deadline for dropping is fast
approaching. What would you do to find out whether your
class standing is passing (or whether to consider dropping)?
Roles of Statistics

• Sports
• Statistics can help us prepare for the next
rival/game

• In Business
• Statistics can help us in marketing etc.

• Economics
• Statistics can help us understand economic growth
Roles of statistics
• Governance
• Statistics can help us in decision and policy making (e.g.,
police ratios in cities)

• Tourism
• Statistics can help us understand tourist preferences

• Engineering
• Statistics can help us determine the endurance of
machinery in different extreme environments before
failure or malfunction
Statistical Inquiry
• It is a designed research that provides
information needed to solve a research
problem.

• It has 6 basic steps


Step 1: Identify the problem

Step 2: Plan the study

Step 3: Collect the data

Step 4: Explore the data

Step 5: Analyze data and interpret the results

Step 6: Present the results


BASIC CONCEPTS
Population and Sample
• The POPULATION is the group of
study/interest. It is the group that we
would like to understand, characterize, or
describe.

• The Sample is a subset of the population


Population and Sample
• All human beings
• Filipinos

• All teenagers
• 16- to 19-year-old teenagers

• All UP Diliman students


• Stat 101 students
Variables
• In research, we are interested in studying
characteristics or attributes of the population.

• The VARIABLE is a characteristic or attribute of
the elements in a collection that can have
different values for the different elements.

• We denote variables by capital letters.
Commonly used letters are X, Y, and Z.
Observation
• An OBSERVATION is a realized value of a
variable
Example
Variable                         Possible Observation
Sex of a student                 Male, Female
Class standing of a student      Passing, Failing
Weekly allowance of a person     1,000 pesos
Height of a volleyball player    6 ft.
Example
• The Office of Admissions is studying the GWAs of
UP graduates from 2010 to 2013.

• Population: collection of all graduates of the university
from 2010 to 2013

• Variable of interest: GWA

• Possible sample: collection of all Business
Administration graduates of the university from 2010
to 2013
Examples
• The Department of Health is interested in determining
the percentage of children below 15 years old infected by
the dengue virus in Quezon City in 2012.

• Population: Set of all children below 15 years old in
Quezon City in 2012

• Variable of interest: Whether or not the child has ever
been infected by the dengue virus

• Possible sample: Set of all children below 15 years old
in Barangay UP Campus in Quezon City in 2012
Parameter and Statistic
• The PARAMETER is a summary measure
describing a specific characteristic of the
population

• The STATISTIC is a summary measure describing
a specific characteristic of the sample

• Remember,
• Parameter : Population
• Statistic : Sample
Examples
• Proportions

• Percentages

• Means

• Standard deviations

• Correlation
Example
• Mr. Donaldo Chan, a candidate for vice-mayor in Orion,
Bataan, wants to find out if there is a need to intensify
his campaign efforts against his opponents. He requested
the service of a group of students to interview 1,000 of
the 3,000 registered voters of Orion, Bataan. The survey
results showed that 75% of the 1,000 voters in the
sample will vote for him as vice-mayor.

• Identify the population and the sample


• Identify the variable of interest
• Identify the parameter and statistic
Example
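• Population: the 3,000 registered voters of Orion,
Bataan
• Sample: the 1,000 registered voters who were interviewed

• Variable of interest: Whether or not the voter
will vote for Mr. Chan

• Parameter: the percentage of all 3,000 registered voters who
will vote for him (unknown)

• Statistic: 75% of the 1,000 voters in the sample will vote for him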
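The arithmetic behind this example can be checked with a short sketch. The three input figures come from the example; the variable names are only illustrative.

```python
# Figures taken from the example: 3,000 registered voters, 1,000 interviewed,
# and 75% of the interviewed voters favoring Mr. Chan.
population_size = 3000
sample_size = 1000
sample_proportion = 0.75  # the statistic (computed from the sample)

# Number of sampled voters who said they will vote for Mr. Chan.
supporters_in_sample = round(sample_proportion * sample_size)

# Share of the population that was actually interviewed.
sampling_fraction = sample_size / population_size

print(supporters_in_sample)
print(round(sampling_fraction, 3))
```

The population percentage itself cannot be computed this way, because the other 2,000 voters were not interviewed.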


Fields of Statistics
• DESCRIPTIVE statistics (Describe)
• Utilizes data without any conclusion about a larger
group
• organizes, summarizes, and presents the data on
hand.

• INFERENTIAL statistics (Infer)
• Analyzes data in a way that leads to a
generalization/prediction about the population from
which the sample came.
Descriptive Statistics
• Uses techniques that are useful in summarizing
and organizing data.

• In descriptive statistics, we use tables and charts,
and compute summary measures like means,
proportions, and percentages.

• Example,
• A basketball player wants to know his average
points in his past 5 games
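As a minimal sketch of this kind of summary, the snippet below averages five game scores. The scores are made up for illustration; the slides do not give any.

```python
from statistics import mean

# Hypothetical points scored in the player's past 5 games (not from the slides).
points = [18, 22, 15, 30, 25]

# The mean describes only these 5 games; no conclusion about other games is drawn.
average_points = mean(points)

print(average_points)
```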
Inferential Statistics
• We do not simply describe sample data. Rather,
we use the sample data to form conclusions
about the population.

• Example,
• A tire manufacturer wishes to estimate the
lifetime of their tires by testing a sample of 100
tires.
• Let's do the exercises on page 11 of ACS
Why use statistics?
• What you understand from gathered statistics is
vital for decision making!
• Comparisons (which is better or worse?)
• Explanation (how could it happen?)
• Justification (why did it happen?)
• Prediction (what would happen?)
• Estimation (what could it be?)
Comparisons (which is better or worse?)
Justification (why did it happen?)
• 8 out of 10 kids love this? Is it true? How come?
Prediction (what would happen?)
II. COLLECTION OF DATA
Data
• It is a collection of observations.

• The analysis of collected data usually unveils
valuable information that is effective in solving
practical problems. It helps us in decision
making.

• Examples: Census data, Class standing data,
Grades, etc.
Measurement
• Measurement is the process of determining the
value or label of the variable based on what has
been observed.

• It has 4 levels:
• Nominal
• Ordinal
• Interval
• Ratio
Nominal level
• The Nominal level of measurement has the following property:
• A. The numbers in the system are used to classify a person/object
into distinct, non-overlapping, and exhaustive categories

• The nominal level is the weakest of all levels because we use
symbols or numbers for the sole purpose of classifying an
individual/object into two or more categories.

• The magnitudes of the numbers and the differences between
numbers have no meaning. There is also no absolute zero
point.
Nominal data examples
• SEX: 1-male and 2-female / 1-female and 2-male

• RELIGION: 1-Catholic, 2-Protestant, 3-Muslim, 4-Iglesia ni
Kristo, 5-Others.

• Major island groups of residence with categories: 1-Luzon,
2-Mindanao, 3-Visayas

• Type of movie with categories: 1-romance, 2-adventure,
3-horror, 4-others
Analysis of Nominal level data

• The type of analysis that we can perform on nominal level
data is limited.
• We cannot organize data meaningfully by arranging
the observations by magnitude.
• We cannot interpret the differences between numbers
or the ratio of two numbers.
• We can only count the number of observations with
the same value or label, and compute proportions
and percentages
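Counting and computing percentages for nominal data might look like the sketch below. It reuses the movie-type codes from the examples; the list of responses is hypothetical.

```python
from collections import Counter

# Codes from the "type of movie" example; the responses themselves are made up.
movie_labels = {1: "romance", 2: "adventure", 3: "horror", 4: "others"}
responses = [1, 3, 2, 1, 4, 1, 2, 3, 1, 2]

# For nominal data we only count how often each label occurs ...
counts = Counter(responses)
total = len(responses)

# ... and turn the counts into proportions or percentages.
proportions = {code: counts[code] / total for code in movie_labels}

for code, label in movie_labels.items():
    print(f"{label}: count={counts[code]}, percentage={100 * proportions[code]:.0f}%")
```

Averaging the codes (for example, taking the mean of `responses`) would run without error but would have no meaning, since the numbers are only labels.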
Ordinal level
• The Ordinal level of measurement has the following properties:
• A. The numbers in the system are used to classify a
person/object into distinct, non-overlapping, and
exhaustive categories
• B. The system arranges the categories according to
magnitude

• At the ordinal level, we can use the numbers in the scale to
classify an observation into a category and to rank the categories.
In short, we are able to know the magnitude and order.
Examples of Ordinal level data
• SHIRT SIZE: 1-small, 2-medium, 3-large, 4-extra large

• Ranking of a Student in class: 1st, 2nd, 3rd, and so on.

• Faculty rank of a teacher: 1-professor, 2-associate
professor, 3-assistant professor, 4-instructor

• Performance ranking: 1-excellent, 2-very good, 3-good,
4-satisfactory, 5-poor
Analysis of Ordinal level data
• We cannot interpret the distance between
categories or observed values.
• We cannot perform arithmetic operations
• We cannot add or subtract because the differences
are meaningless
• We cannot multiply or divide because the ratios are
also meaningless
Interval level
• The Interval level of measurement has the following properties:
• A. The numbers in the system are used to classify a
person/object into distinct, non-overlapping, and
exhaustive categories
• B. The system arranges the categories according to
magnitude
• C. The system has a fixed unit of measurement
representing a set size throughout the scale

• The interval level has a zero point, but it is not an absolute zero.
The zero value in this level has an arbitrary interpretation
and does not mean the absence of the property we are
measuring.
Examples of Interval level data
• Temperature in degrees Celsius (centigrade)
• First, equal values represent the same temperature
• Second, the scale is arranged in ascending order so
that higher readings represent hotter temperatures
• Third, we can compare the differences between two
temperature readings

• IQ scores

• Calendar dates
Analysis of Interval level data
• Ratios of two observed values have no meaning since we
have no true zero point.
• It is not right for us to conclude that 30 degrees Celsius
is twice as hot as 15 degrees Celsius just because their
ratio is equal to 2.
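One way to see why the ratio is not meaningful is to convert both readings to kelvin, a temperature scale that does have an absolute zero. The sketch below is an illustration added here, not part of the original slides; the function name is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (0 K is absolute zero)."""
    return celsius + 273.15

hot_c, cool_c = 30.0, 15.0

# On the Celsius (interval) scale the ratio looks like 2 ...
ratio_celsius = hot_c / cool_c

# ... but on the kelvin (ratio) scale the same two temperatures differ by only about 5%.
ratio_kelvin = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

# Differences, in contrast, are the same size on both scales.
difference_celsius = hot_c - cool_c
difference_kelvin = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cool_c)

print(ratio_celsius, round(ratio_kelvin, 3))
print(difference_celsius, round(difference_kelvin, 2))
```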
Ratio level
• The Ratio level of measurement has the following properties:
• A. The numbers in the system are used to classify a
person/object into distinct, non-overlapping, and
exhaustive categories
• B. The system arranges the categories according to
magnitude
• C. The system has a fixed unit of measurement
representing a set size throughout the scale
• D. The system has an absolute zero
Examples of Ratio level data
• Allowance of a student (in pesos)
• First, 100 pesos is different from 101 pesos. You can categorize
distinctly.
• Second, 100 pesos is different and less than 101 pesos. You can
see the magnitude and order.
• Third, 100 pesos is 1 peso less than 101 pesos. You
can see exactly how much the two differ (by 1 peso).
• Fourth, 0 pesos means no allowance

• Distance travelled by an airplane (in km)


Analysis of Ratio level data
• It is possible to perform any arithmetic operation on the
collected data:
• The first property allows us to count how many observations
belong in each category
• The second property allows us to organize the data meaningfully
by arranging the observations according to magnitude
• The third property allows us to compute and interpret differences
• The fourth property allows us to compute and interpret the ratio
between two measures
Summary
• Nominal: =, ≠

• Ordinal: =, ≠, <, >

• Interval: =, ≠, <, >, +, −

• Ratio: =, ≠, <, >, +, −, ×, ÷


Exercise
• Let us answer the exercise on page 27
Data Collection Methods
• There are 4 widely used methods for collecting
data:
• A. Use of available documented data in
published or unpublished studies
• B. Surveys
• C. Experiments
• D. Observations
Use of documented data
• A researcher can obtain documented data from
previous studies (Individual, private,
government, non-government agencies etc.):
• NSO
• NSCB
• DoH
• SWS
Two types of documented data
• Primary data are data documented by the primary
source. The data collectors themselves documented this
data.
• Central bank is a primary source of data on banking
and finance

• Secondary data are data documented by a secondary
source. An individual/agency, other than the data
collectors, documented this data:
• The UN compiles data for its yearbook, which were
originally gathered by government statistical agencies
from various countries.
Surveys
• The Survey is a method of collecting data on the variable
of interest by asking people questions

• When the data come from asking all the people in the
population, the study is called a Census.

• When the data come from asking a sample of people
selected from a well-defined population, the study is
called a Sample Survey.
Major types of Surveys
• Self-administered questionnaire

• Phone interview

• On-line survey

• Personal interview
Examples
• Election surveys by SWS and Pulse Asia

• Census of population and housing

• The Family income and expenditure survey

• Surveys for individual thesis


Experiment
• The Experiment is a method of collecting data
where there is direct human intervention on the
conditions that may affect the values of the
variable of interest.

• The Before and after idea

• Example: The Plant experiment


Observation
• Observation is a method of collecting data on the
phenomenon of interest by recording the
observations made about the phenomenon as it
actually happens

• Example: observing one's reaction to a certain
stimulus
Exercise
• Let's do the exercise on page 42 (number 1 only)
III. SAMPLING AND SAMPLING TECHNIQUES
What is Sampling?
• The act or process of selecting an appropriate
sample from a population
Why do we get a sample?
• At times, it is too expensive and too time-consuming
to conduct a census, especially if the number of
elements in the population is large.
Why do we sample?
• Sampling is more economical

• Studies based on a sample require less time to
accomplish

• Sampling is sometimes the only feasible method


Target and Sampled Population
• Target Population: the population we want to
study

• Sampled Population: the population from which
we actually select the sample

• *Ideally, the target and sampled populations
should be the same
Example
• You want to study all the college students in the
country
• Since it is very expensive to travel, you choose
your sample from all the colleges in Luzon
• Your target population and sampled
population are different since there are college
students who study in Visayas and Mindanao
Elementary and Sampling unit
• Elementary unit / Element is a member of the
population whose measurement on the variable
of interest is what we wish to examine

• The Sampling unit is a unit of the population that
we select in our sample

• The Sampling frame is a list or map showing all
the sampling units in the population
Error
• Sampling error
• The error attributed to the variation among
the values of a statistic computed from
the different possible samples of size n

• Non-sampling error
• Error from other sources beyond sampling
fluctuations
Example of sampling error
• Population = { 1 , 2 , 3 }
• Average = 2

Possible samples of size 2:


Sample Average
(1,2) 1.5
(2,3) 2.5
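The example can be reproduced by listing every possible sample and its average. This sketch assumes sampling without replacement and ignores the order of selection, so it also lists the sample (1,3), which does not appear in the table above.

```python
from itertools import combinations
from statistics import mean

population = [1, 2, 3]
population_mean = mean(population)  # the parameter

# Every distinct sample of size 2, drawn without replacement.
samples = list(combinations(population, 2))

# The statistic (sample average) changes from sample to sample.
sample_means = [mean(s) for s in samples]

# Sampling error here: how far each sample average falls from the population average.
sampling_errors = [m - population_mean for m in sample_means]

for s, m, e in zip(samples, sample_means, sampling_errors):
    print(s, m, e)
```

Each sample gives a different estimate of the population average of 2 even though nothing went wrong in collecting the data; that variation is the sampling error.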
Sampling Techniques
• A sampling procedure that gives every element of
the population a (known) nonzero chance of being
selected in the sample is called probability
sampling.

• Otherwise, the sampling procedure is called
non-probability sampling.

• *As much as possible, probability sampling should be
used since there is no way to assess the reliability of
the inferences under non-probability sampling
Methods of non-probability sampling
• Convenience Sampling

• Judgment or Purposive Sampling

• Quota Sampling
Convenience Sampling
• Selects sampling units that come to hand or
are convenient to get
information from
• Example: Friends, Classmates, Relatives,
people you meet in a restaurant
Judgment Sampling
• In Judgment or purposive sampling, the
researcher chooses a sample that agrees with
his/her subjective judgment of a representative
sample.

• Example: For a study that aims to predict the
senatorial winners in an election, a researcher
may include in the sample the people who have
voted for the actual winners in the past elections
Quota Sampling
• In quota sampling, the researcher selects a
specified number (quota) of sampling units
possessing certain characteristics

• Example: A researcher wants to study a
population of students and believes that males
behave differently from females. He/She can
select any 50 male and 50 female students to
make up a sample of size 100
Methods of probability sampling
• Simple Random Sampling

• Stratified Random Sampling

• Cluster Sampling

• Systematic Sampling

• Multi-stage sampling
Simple Random Sampling
• We select n units (sample size) out of N units
(population size) in the population in such a way
that every distinct sample has an equal chance of
being drawn.

• SRS may be done with replacement (SRSWR), or
without replacement (SRSWOR)
Simple Random Sampling
• 1) List the elements and number them from 1 to N

• 2) Select n numbers from 1 to N, using a randomization
mechanism (Table of Random Numbers, ACS Table A.1)
• Select your starting row (e.g., row 3)
• Select your starting column (e.g., column 12). Use this and
the following column(s) to match the number of digits in N
• If the number is between 1 and N, then include the element
corresponding to that number in the sample. Otherwise,
discard it.
• Move downward in the same set of columns until you get n
random numbers
Example: N = 552, n = 10, starting row r = 3, starting column c = 12
Final sample
• The elements corresponding to the selected random numbers
• 150
• 505
• 456
• 068
• 149
• 034
• 546
• 471
• 209
• 458
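A pseudorandom number generator can stand in for the printed table of random numbers. The sketch below uses N = 552 and n = 10 from the example; the seed is arbitrary, and the numbers it selects will differ from the table-based sample listed above.

```python
import random

N = 552  # population size, from the example
n = 10   # sample size, from the example

# A seeded generator replaces the random-number table (the seed is arbitrary).
rng = random.Random(2024)

# SRSWOR: without replacement, so no element can be selected twice.
srswor = rng.sample(range(1, N + 1), n)

# SRSWR: with replacement, so the same element may appear more than once.
srswr = rng.choices(range(1, N + 1), k=n)

print(sorted(srswor))
print(sorted(srswr))
```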
Simple Random Sampling
• Advantages
• Its design is simple and easy to understand
• Estimation methods are simple and easy

• Disadvantages
• Needs a list of all elements in the population

• When to use
• If the elements are not so spread out
geographically
Stratified Sampling
• The population of N units is first divided into non-overlapping
subpopulations called strata, and a random sample is drawn
independently from each stratum. The sample consists of all
the elements selected from the different strata.

• Strata should be:
• Homogeneous: units in the same stratum should be
similar with respect to the variable of interest
• Mutually exclusive: each unit in the population should not
fall into more than one stratum
• Collectively exhaustive: each and every unit in the population
must have a stratum to fall into
Stratified Sampling
• 1) Divide the population into non-overlapping strata

• 2) Obtain a simple random sample from each stratum

• 3) The sample shall consist of the selected samples in all the
strata
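The three steps above can be sketched as follows. The strata, their contents, and the per-stratum sample sizes are all hypothetical; the slides do not say how many units to draw from each stratum.

```python
import random

rng = random.Random(7)  # arbitrary seed, for reproducibility only

# Step 1: non-overlapping strata (hypothetical units, grouped by type of fruit).
strata = {
    "orange": [f"orange_{i}" for i in range(1, 9)],
    "apple": [f"apple_{i}" for i in range(1, 7)],
    "grapes": [f"grapes_{i}" for i in range(1, 11)],
    "cherries": [f"cherries_{i}" for i in range(1, 5)],
}

# Assumed sample size per stratum (an illustrative allocation, not from the slides).
sample_sizes = {"orange": 2, "apple": 2, "grapes": 3, "cherries": 1}

# Step 2: an independent simple random sample from each stratum.
stratum_samples = {
    name: rng.sample(units, sample_sizes[name]) for name, units in strata.items()
}

# Step 3: the overall sample is the union of the stratum samples.
sample = [unit for units in stratum_samples.values() for unit in units]

print(sample)
```

Every stratum contributes at least one unit, which is what allows separate analysis for each subpopulation.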
Example
• Population
• Fruits

• Variable of interest
• Sugar content of fruits

• Stratification variable
• Type of fruit
• Let the color orange denote an orange
• Let the color green denote an apple
• Let the color purple denote grapes
• Let the color red denote cherries
Example
[Figures: Stratification Variable; After Stratification; Sample using stratified sampling]
Stratified Sampling
• Advantages
• Estimates are more reliable compared to SRS of the
same sample size

• Disadvantages
• It needs a list of all elements of the population,
including their values on the stratification variable

• When to use
• If we want to perform separate analysis for certain
subpopulations
Systematic Sampling
• A sampling method wherein the selection of the first element
is at random and the selection of the other elements in the
sample is systematic by subsequently taking every kth element
from the random start.

• Here, k is called the sampling interval


Systematic Sampling
• 1) Assign a unique number from 1 to N to each element of the
population

• 2) Determine sampling interval k (k=N/n)

• 3) Select a number from 1 to N using a random mechanism.
Denote the selected number by r

• 4) The other elements of the sample are those assigned to the
numbers r + k, r + 2k, r + 3k, and so on until you get a sample
of size n
Example
• Suppose we want to conduct a survey on Senior math majors
about their feelings towards math.

• N = 14 Math seniors in the list we have from the college
secretary

• n = 7
Example
N = 14, n = 7, k = 14/7 = 2, r = 3
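The selection for this example can be sketched as below. Two assumptions are made that the slides do not state: N is evenly divisible by n, and when r + ik goes past N the count wraps around to the start of the list, which is needed here to reach 7 elements from a start of r = 3. The function name is hypothetical.

```python
def systematic_sample(N, n, r):
    """Return n element numbers: r, r + k, r + 2k, ..., wrapping past N (assumed)."""
    k = N // n  # sampling interval; assumes N is divisible by n
    # Positions are shifted to 0-based for the modulo step, then back to 1-based.
    return [((r - 1 + i * k) % N) + 1 for i in range(n)]

# Values from the example: N = 14, n = 7, so k = 2, with random start r = 3.
selected = systematic_sample(N=14, n=7, r=3)
print(selected)  # [3, 5, 7, 9, 11, 13, 1]
```

In practice r would come from a random mechanism; it is fixed at 3 here only to match the example.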
Systematic Sampling
• Advantages
• Identifying the units in the sample is easy
• The sample is distributed evenly over the entire population

• Disadvantages
• It requires information on the arrangement of the elements in the
sampling frame

• When to use
• If there is no available list of elements in the population
Cluster Sampling
• Method wherein we divide the population into
non-overlapping groups or clusters consisting of one or more
elements, and then select a sample of clusters.

• Clusters may be of equal or unequal size.

• The sample will consist of all the elements in the selected
clusters
Cluster Sampling
• 1) Divide the population into non-overlapping clusters

• 2) Select a sample of clusters using simple random sampling

• 3) The sample consists of all the elements in the selected
clusters
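The steps above can be sketched as follows, assuming 40 elements grouped into 5 clusters of 8 and a sample of 2 clusters. The cluster membership and the number of clusters selected are made up for illustration.

```python
import random

rng = random.Random(11)  # arbitrary seed, for reproducibility only

# Step 1: non-overlapping clusters (hypothetical: elements 1-8, 9-16, ..., 33-40).
clusters = {c: list(range((c - 1) * 8 + 1, c * 8 + 1)) for c in range(1, 6)}

# Step 2: a simple random sample of clusters, not of individual elements.
selected_clusters = sorted(rng.sample(sorted(clusters), 2))

# Step 3: every element of each selected cluster enters the sample.
sample = [element for c in selected_clusters for element in clusters[c]]

print(selected_clusters)
print(sample)
```

Only the list of clusters is needed to make the selection; the elements are listed only inside the clusters that were chosen.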
Example
• Suppose we want to conduct an opinion poll survey of
households in Quezon City

• Clustering Variable: barangay


Example
[Figures: a population of 40 elements numbered 1 to 40 (Clustering Variable); the elements grouped into clusters 1 to 5 (After Clustering); clusters 1 and 4 selected (Sample from Cluster Sampling)]
Cluster Sampling
• Advantages
• The design needs only a list of clusters and not a list of elements
• Transportation and listing costs are lower

• Disadvantages
• Estimates are usually less reliable when compared to other
sampling designs

• When to use
• If there is no available list of elements
Multistage Sampling
• The population is divided into a hierarchy of sampling units
corresponding to different sampling stages.

• In the first stage of sampling, the population is divided into
primary stage units (PSUs), then a sample of PSUs is drawn. In the
second stage, each PSU selected is divided into second-stage
units (SSUs) and a sample of SSUs is drawn.

• The process may go on, by sampling sub-units at each stage
instead of enumerating all units completely.
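A two-stage version of this process can be sketched as below. The choice of barangays as PSUs and households as SSUs, the counts, and the use of simple random sampling at both stages are assumptions made for illustration.

```python
import random

rng = random.Random(5)  # arbitrary seed, for reproducibility only

# Hypothetical population: 10 barangays (PSUs), each with 20 households (SSUs).
psus = {
    f"barangay_{b}": [f"b{b}_household_{h}" for h in range(1, 21)]
    for b in range(1, 11)
}

# Stage 1: draw a sample of PSUs.
selected_psus = rng.sample(sorted(psus), 3)

# Stage 2: within each selected PSU, draw a sample of SSUs
# instead of enumerating every household in it.
sample = {psu: rng.sample(psus[psu], 5) for psu in selected_psus}

for psu, households in sample.items():
    print(psu, households)
```

Unlike cluster sampling, only some of the units inside each selected PSU are taken.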
Multistage Sampling
• Advantages
• Reduced costs

• Disadvantages
• Difficult estimation procedures

• When to use
• If the geographic coverage of the population of interest is wide
