
NATURE OF STATISTICS

At the end of the lesson, you should be able to:


1. Demonstrate knowledge of statistical terms.
2. Differentiate between the two branches of statistics.
3. Identify types of data.
4. Identify the measurement level for each variable.
5. Identify the four basic sampling techniques.
6. Explain the difference between an observational and an experimental study.

The origin of statistics can be traced to two areas of interest that have very little in common: government (political
science) and games of chance.
Governments have long used censuses to count persons and property. The problem of describing, summarizing, and
analyzing census data has led to the development of methods, which, until recently, constituted about all there was to the subject
of statistics.

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Collection refers to the gathering of information or data.
Organization or presentation involves summarizing data or information in textual, graphical or tabular form.
Analysis involves describing the data by using statistical methods and procedures.
Interpretation refers to the process of making conclusions based on the analyzed data.

Students study statistics for several reasons:


1. Like professional people, you must be able to read and understand the various statistical studies performed in your
fields. To have this understanding, you must be knowledgeable about the vocabulary, symbols, concepts, and statistical
procedures used in these studies.
2. You may be called on to conduct research in your field, since statistical procedures are basic to research. To accomplish
this, you must be able to design experiments; collect, organize, analyze, and summarize data; and possibly make reliable
predictions or forecasts for future use. You must also be able to communicate the results of the study in your own
words.
3. You can also use the knowledge gained from studying statistics to become better consumers and citizens. For example,
you can make intelligent decisions about what products to purchase based on consumer studies, about government
spending based on utilization studies, and so on.

Divisions of Statistics
The study of statistics is divided into two categories: descriptive statistics and inferential statistics.
Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
In descriptive statistics the statistician tries to describe a situation. Consider the national census conducted by the
Philippine government every 5 years. Results of this census give you the average age, income, and other characteristics of the
population. To obtain this information, the PSA must have some means to collect relevant data. Once data are collected, then
they must organize and summarize them. Finally, these data are presented in some meaningful form, such as charts, graphs, or
tables.
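As a minimal sketch of descriptive statistics, the short Python program below summarizes a small data set. The ages are hypothetical values made up for illustration (they are not census figures), and the names ages and summary are chosen only for this example.

```python
import statistics

# Hypothetical ages of ten survey respondents (illustrative data only).
ages = [18, 21, 21, 25, 30, 34, 34, 34, 41, 52]

# Descriptive statistics only describe the data at hand;
# no conclusion is drawn about a larger population.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "minimum": min(ages),
    "maximum": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The same summary could then be presented in a table or a graph, which is the presentation step of descriptive statistics.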

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests,
determining relationships among variables, and making predictions.
Here, the statistician tries to make inferences from samples to populations. Inferential statistics uses probability, i.e.,
the chance of an event occurring. You may be familiar with the concepts of probability through various forms of gambling. If
you play cards, dice, bingo, and lotteries, you win or lose according to the laws of probability. Probability theory is also used in
the insurance industry and other areas.
Example. Determine whether the following statements use the area of descriptive statistics or statistical inference.
1. A bowler wants to find his bowling average for the past 12 games.
2. A manager would like to predict, based on previous years’ sales, the sales performance of a company for the next five
years.
3. A politician would like to estimate, based on an opinion poll, his chance for winning in the upcoming senatorial
election.
4. A teacher wishes to determine the percentage of students who passed the examination.
5. A student wishes to determine his average monthly expenditure on school supplies for the past five months.
6. A school administrator forecasts future expansion of a school.
7. An engineer calculates the average height of the buildings along Taft Avenue.
8. A psychologist investigates if there is a significant relationship between mental age and chronological age.
9. A sports journalist determines the most popular basketball player for this year.
10. A researcher studies the effectiveness of a new fertilizer in increasing food production.

Definitions of Some Basic Statistical Terms


The following are terms commonly used in statistics:
A population consists of all subjects (human or otherwise) that are being studied.

A sample is a group of subjects selected from a population.

A parameter is any numerical or nominal characteristic of a population. It is a value or measurement obtained from a population.

A statistic is an estimate of a parameter. It is any value or measurement obtained from a sample.
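The difference between a parameter and a statistic can be sketched in Python as follows. The population here is hypothetical (randomly generated monthly allowances in pesos), and the names population_mean and sample_mean are assumptions made for this illustration only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the run can be repeated

# Hypothetical population: monthly allowances (in pesos) of 500 students.
population = [rng.randint(1000, 5000) for _ in range(500)]

# Parameter: a value computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: a value computed from a sample of only 50 members.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", round(population_mean, 2))
print("Statistic (sample mean):", round(sample_mean, 2))
```

The sample mean usually differs somewhat from the population mean; it serves as an estimate of the parameter when the whole population cannot be measured.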

A constant is a characteristic that makes each member of a group similar.

A variable is a characteristic or attribute that can assume different values. Variables can be classified as qualitative or
quantitative.
Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or
attribute. For example, if subjects are classified according to gender (male or female), then the variable gender is qualitative.
Other examples of qualitative variables are religious preference and geographic locations.
Quantitative variables are numerical and can be ordered or ranked. For example, the variable age is numerical, and
people can be ranked in order according to the value of their ages. Other examples of quantitative variables are heights, weights,
and body temperatures.

Quantitative variables can be further classified into two groups: discrete and continuous.

Discrete variables assume values that can be counted. Examples of discrete variables are the number of children in a
family, the number of students in a classroom, and the number of calls received by a switchboard operator each day for a month.

Continuous variables can assume an infinite number of values between any two specific values. They are obtained by
measuring. They often include fractions and decimals.

Data are the values (measurements or observations) that the variables can assume. A collection of data values forms a data set.
Each value in the data set is called a data value or a datum.

Examples. State whether each variable is qualitative or quantitative.


1. number of years of service in a company
2. outcome in tossing a coin
3. monthly salary of an employee
4. employee’s identification number
5. hourly output of a machine
6. address
7. type of computer program
8. speed of train
9. position in an organization
10. amount of money a college student spends on textbooks
Example. Identify each of the following variables as discrete or continuous.
1. weight of a body
2. length of a rod
3. number of chairs in a room
4. dimensions of a table
5. number of possible outcomes in throwing a die
6. number of passengers in an airplane
7. amount of sales in a business firm
8. speed of light
9. area of a piece of land
10. lifetime of television tubes and batteries

Levels of Measurement
Another common way to classify data is to use four levels of measurement. The level of measurement of data
determines the algebraic operations that can be performed and the statistical tools that can be applied to the data set.

Level 1. Nominal
This is the most primitive level of measurement. The nominal level of measurement classifies data into mutually
exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data. Gender, nationality
and civil status are of nominal scale.

Level 2. Ordinal
In the ordinal level of measurement, data are arranged in some specified order or rank. When objects are measured in
this level, we can say that one is greater than the other, but we cannot tell how much more of the characteristic one has than the
other. The rankings of contestants in a beauty contest, of siblings in a family, and of honor students in a class are of ordinal scale.

Level 3. Interval
If data are measured in the interval level, we can say not only that one object is greater or less than another, but we can
also specify the amount of difference. Scores in an examination and temperature (in °C) are of the interval level of
measurement.

Level 4. Ratio
The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In
addition, true ratios exist when the same variable is measured on two different members of the population.
Statisticians do not completely agree on how some data should be classified among these four levels. The ratio level of
measurement is like the interval level; the only difference is that the ratio level always starts from an absolute or true zero point.
In addition, measurements at the ratio level are always expressed in units of measure. Examples are height, width, and area.

Exercise. Identify the level of measurement for each of the following.


1. religion
2. IQ scores
3. speed of a car
4. academic rank in high school
5. number of books in the library
6. address
7. size of a t-shirt (S, M, L, XL)
8. land area
9. degree program
10. number of hours spent in studying
Data Collection and Sampling Techniques
Data are needed whenever we undertake studies or research. They have been used to solve particular problems or to
provide a basis from which certain decisions are generated.

Types of Data
1. Primary data are information collected from an original source, which is first-hand in nature. Examples are data
collected from interviews and surveys.

2. Secondary data are information collected from published or unpublished sources like books, newspapers, and theses.

Methods of Data Collection


1. The Direct or Interview Method
In this method, the researcher has direct contact with the interviewee. The researcher obtains the information
needed by asking the interviewee questions. This method gives precise and consistent information
because clarifications can be made. The interviewer can repeat or rephrase a question that the respondent has not fully
understood until it suits the interviewee’s level. However, this method is time-consuming and expensive, and it has limited
field coverage.

2. The Indirect or Questionnaire Method


This method makes use of a written questionnaire. The researcher distributes the questionnaire to the
respondents either by personal delivery or by mail. Using this method, the researcher can save a lot of time and money
in gathering the information needed because questionnaires can be given to a large number of respondents at the same
time. However, the researcher cannot expect that all distributed questionnaires will be retrieved because some
respondents simply ignore the questionnaires. In addition, clarification cannot be made if the respondent does not
understand the question.

3. The Registration Method


This method of collecting data is governed by laws. For example, births and deaths are registered with the
PSA for records and future use. The number of registered vehicles can be found at the LTO. The list of registered voters in
the Philippines can be found at the COMELEC.

4. The Experimental Method


This method is usually used to find out cause and effect relationships. Scientific researchers often use this
method. For example, agriculturists would like to know the effect of a new brand of fertilizer on the growth of plants.
The new kind of fertilizer will be applied to ten sets of plants, while another ten sets of plants will be given another
fertilizer. The growth of the plants will then be compared to determine which fertilizer is better.

Sampling
One of the most important parts of research work that needs preparation and planning is the sampling method. Any
sampling procedure that produces inferences that consistently overestimate or underestimate a characteristic of the population
is biased. Sampling is a process of selecting
units, like people, organizations or objects, from a population of interest in order to study and fairly generalize the results back to
the population where the sample was taken.

Determining the sample size


In research, we seldom use the entire population because of the cost and time involved. Instead, a sample, which is a
small representative part of the population, is used. The characteristics of the entire population are described using the
characteristics observed from the sample.
To determine the sample size from a given population size, Slovin’s formula is used.
\[ n = \frac{N}{1 + N e^{2}} \]

where n = sample size
      N = population size
      e = margin of error
Example
A group of researchers will conduct a survey to find out the opinion of residents of a particular community regarding
the oil price hike. If there are 10 000 residents in the community and the researchers plan to use a sample using a 10% margin of
error, what should the sample size be? If the researchers would like to use a 5% margin of error, what should the sample size be?
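The computation in this example can be sketched in Python using Slovin's formula, n = N / (1 + N e^2). The function name slovin_sample_size is hypothetical, and rounding the result up to the next whole number is an assumption of this sketch (a fraction of a respondent cannot be surveyed).

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e**2), rounded up to a whole subject."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a decimal between 0 and 1")
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    # Assumption of this sketch: always round up to the next whole number.
    return math.ceil(raw)


# N = 10 000 residents with a 10% margin of error: 10000 / 101 = 99.01
print(slovin_sample_size(10_000, 0.10))  # 100

# N = 10 000 residents with a 5% margin of error: 10000 / 26 = 384.62
print(slovin_sample_size(10_000, 0.05))  # 385
```

Note that halving the margin of error from 10% to 5% nearly quadruples the required sample size.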

Sampling techniques
As soon as we have chosen the method of collecting data and the sample size to be used in the study, the next step is to
choose the sampling technique to be employed.
Sampling technique is a procedure used to determine the individuals or members of a sample.
Probability sampling is a sampling technique wherein each member or element of the population has an equal chance
of being selected as a member of the sample.
There are several probability sampling techniques, namely simple random sampling, stratified random sampling, systematic
sampling, cluster sampling, and multi-stage sampling.

1. Simple Random Sampling


This is the simplest form of random sampling. It is the basic sampling technique where a group of subjects (sample) is
selected for study from a larger group (population). Each individual is chosen entirely by chance and each member of the
population has an equal chance of being included in the sample. The most common techniques for selecting a simple random
sample are drawing strips of paper (lottery method), using a printed table of random numbers, and using random numbers
generated by computer programs or scientific calculators.
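A computer-generated simple random sample can be sketched in Python as follows. The population of 100 numbered members and the sample size of 10 are hypothetical, and the seed is fixed only so that the run can be repeated.

```python
import random

rng = random.Random(2024)  # fixed seed for a repeatable illustration

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

# random.sample draws without replacement, so no member is picked twice
# and every member has the same chance of being included.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

This is the computer equivalent of the lottery method: every numbered strip of paper is in the box, and ten are drawn by chance.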

2. Systematic Random Sampling


If we are to select the members of the sample from a large population, the simple random technique is a long and
difficult process. An easier method is to use the systematic sampling technique. To draw the members of the sample using
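this method, we have to select a random starting point, and then draw successive elements from the population at a fixed
interval. In other words, we pick every kth element of the population as a member of the sample, where k is the sampling
interval (the population size divided by the sample size).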
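Systematic sampling can be sketched in Python as follows. The function name systematic_sample is hypothetical, and the sketch assumes that the sampling interval is the population size divided by the sample size (dropping any remainder) and that the random start falls within the first interval.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Pick a random start, then take elements at a fixed interval."""
    if rng is None:
        rng = random.Random()
    # Assumed interval: population size divided by sample size.
    interval = len(population) // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randrange(interval)  # random start within the first interval
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical population: households numbered 1 to 100.
households = list(range(1, 101))
chosen = systematic_sample(households, 10, random.Random(7))
print(chosen)
```

With 100 households and a sample of 10, the interval is 10, so the chosen households are always exactly 10 apart.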

3. Stratified Random Sampling


When we use this method, we divide the population into groups (called strata) according to some
characteristic that is important to the study, and then the members of the sample are drawn or selected proportionally from
each group.
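Proportional stratified sampling can be sketched in Python as follows. The strata (year levels with 500, 300, and 200 students) are hypothetical, as is the function name stratified_sample. The sketch rounds each stratum's share to the nearest whole number, so in general the total may differ slightly from the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, rng):
    """Draw from each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, rounded to the nearest whole member.
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical strata: students grouped by year level.
strata = {
    "First Year": [f"F{i}" for i in range(500)],
    "Second Year": [f"S{i}" for i in range(300)],
    "Third Year": [f"T{i}" for i in range(200)],
}

result = stratified_sample(strata, 100, random.Random(1))
for name, members in result.items():
    print(name, len(members))
```

Here the strata make up 50%, 30%, and 20% of the population, so a sample of 100 takes 50, 30, and 20 members from them, respectively.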

4. Cluster Sampling
Sometimes the population is so large that the use of simple random sampling will prove tedious and difficult. Under
this condition we can use cluster sampling. Here the population is divided into groups called clusters by some means such as
geographic area or schools in a large school district, etc. Then the researcher randomly selects some of these clusters and
uses all members of the selected clusters as the subjects of the sample.
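Cluster sampling can be sketched in Python as follows. The eight schools with five students each are hypothetical, and the function name cluster_sample is an assumption of this illustration.

```python
import random


def cluster_sample(clusters, number_of_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen_names = rng.sample(sorted(clusters), number_of_clusters)
    return {name: list(clusters[name]) for name in chosen_names}


# Hypothetical clusters: 8 schools with 5 students each.
schools = {
    f"School {i}": [f"Student {i}-{j}" for j in range(1, 6)]
    for i in range(1, 9)
}

selected = cluster_sample(schools, 3, random.Random(5))
for school, students in selected.items():
    print(school, students)
```

Unlike stratified sampling, only some groups are chosen, but every member of a chosen group is included in the sample.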

Example. Discuss the following problems.


1. An English teacher has 15 complimentary tickets to a stage play. She is planning to distribute those tickets to her class
of size 50. What sampling method will the teacher apply so as to distribute those tickets without being accused of
favoritism?

2. It is known that high-income groups have different consumption patterns from the average and low-income groups. If a
researcher will conduct a survey on consumption patterns, what sampling method would you recommend so that each type of
income earner would be properly represented?

3. Every fifth housing unit from the gate of a certain village has a floor area of 100 square meters. If an enumerator was
asked to visit households living in the same size of housing unit in the area, what sampling design do you think is
applicable?

4. There are 20 boxes, each box with 5 dolls, to be inspected by a quality control employee of a company. If he wants to
select 5% of the total dolls manufactured and with the assurance that each box would be inspected, what sampling
design would you advise?

Design of Experiments
Observational and Experimental Studies
In an observational study, the researcher merely observes what is happening or what has happened in the past and tries
to draw conclusions based on these observations.
For example, data were collected on the ages and incomes of motorcycle owners for the years 1980 and 1998 and then
compared. The findings showed considerable differences in the ages and incomes of motorcycle owners for the two years. In this
study, the researcher merely observed what had happened to the motorcycle owners over a period of time. There was no type of
research intervention.

In an experimental study, the researcher manipulates one of the variables and tries to determine how the manipulation
influences other variables.
For example, a study conducted at Virginia Polytechnic Institute and presented in Psychology Today divided female
undergraduate students into two groups and had the students perform as many sit-ups as possible in 90 sec. The first group was
told only to “Do your best,” while the second group was told to try to increase the actual number of sit-ups done each day by
10%. After four days, the subjects in the group who were given the vague instructions to “Do your best” averaged 43 sit-ups,
while the group that was given the more specific instructions to increase the number of sit-ups by 10% averaged 56 sit-ups by the
last day’s session. The conclusion then was that athletes who were given specific goals performed better than those who were not
given specific goals.

Sometimes when random assignment is not possible, researchers use intact groups. These types of studies are done quite
often in education where already intact groups are available in the form of existing classrooms. When these groups are used, the
study is said to be a quasi-experimental study. The treatments, though, should be assigned at random.

The independent variable in an experimental study is the one that is being manipulated by the researcher. The
independent variable is also called the explanatory variable. The resultant variable is called the dependent variable or the
outcome variable. The outcome variable is the variable that is studied to see if it has changed significantly due to the
manipulation of the independent variable.

A confounding variable is one that influences the dependent or outcome variable but was not separated from the
independent variable.

Methods of Data Presentation


Data can be classified as grouped or ungrouped data.

Ungrouped data are data that are not organized, or if arranged, could only be from highest to lowest or lowest to
highest.

Grouped data are data that are organized and arranged into different classes or categories.

Data must be presented in an organized and systematic way so that significant characteristics can be easily seen. Data
can be presented in three forms: textual, tabular, and graphical.

A. Textual method
Ungrouped data can be presented in textual form, as in paragraph form. This involves enumerating the important
characteristics, emphasizing significant figures, and identifying important features of the data.
Example: During the first trimester of SY 2006-2007, 1647 students enrolled in the computer science programs of
FEU-EAC. There were 28 students enrolled in the ACT program, 58 in the BSIM program, 361 in the BSCS program,
526 in the BSCpE program, and 674 in the BSIT program.

B. Tabular method
Sometimes we could hardly grasp information from a textual presentation of data. Thus, we may present data using
tables. By organizing the data in tables, important features about the data can be readily understood and comparisons are easily
made.
A table has the following parts:
Table Heading : It consists of the table number and the title.
Column Header : It describes the data in each column.
Row Classifier : It shows the classes or categories.
Body : This is the main part of the table.
Source Note : This is placed below the table when the data written are not original.

Example: Table 1. Distribution of FEU-EAC Students

                         Degree Programs
Year Level      ACT   BSIM   BSCS   BSCpE   BSIT   Total
First Year       19     16    168     225    343     771
Second Year       6     24    102     151    196     479
Third Year        3     12     63     101     89     268
Fourth Year       0      6     21      36     41     104
Terminal          0      0      7      13      5      25
Total            28     58    361     526    674    1647
Source: FEU-EAC SRO
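The row and column totals of Table 1 can be recomputed with a short Python sketch. The counts are copied from the table; the variable names enrollment, row_totals, and column_totals are chosen only for this illustration.

```python
# Enrollment counts from Table 1, one list per year level,
# in the order ACT, BSIM, BSCS, BSCpE, BSIT.
programs = ["ACT", "BSIM", "BSCS", "BSCpE", "BSIT"]
enrollment = {
    "First Year": [19, 16, 168, 225, 343],
    "Second Year": [6, 24, 102, 151, 196],
    "Third Year": [3, 12, 63, 101, 89],
    "Fourth Year": [0, 6, 21, 36, 41],
    "Terminal": [0, 0, 7, 13, 5],
}

# Total per year level (the last column of the table).
row_totals = {level: sum(counts) for level, counts in enrollment.items()}

# Total per degree program (the last row of the table).
column_totals = {
    program: sum(counts[i] for counts in enrollment.values())
    for i, program in enumerate(programs)
}

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print("Grand total:", grand_total)
```

The computed totals agree with the Total row and Total column of the table, including the grand total of 1647 students.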

C. Graphical Method
Some readers find graphical presentation of data easier to comprehend than when data are presented in tabular form. A
graph adds life and beauty to one’s work, but more than this, it helps facilitate comparison and interpretation without going
through the numerical data.
