
CHAPTER I

INTRODUCTION TO STATISTICS
1. Origin and Development of Statistics
2. Definition
3. Uses
4. Fields
5. Constants and Variables
6. Data and Information
7. Population and Sample
8. Census and Sampling Techniques
History of Statistics - Timeline

Time | Contributor | Contribution
Ancient Greece | Philosophers | Ideas, no quantitative analyses
17th Century | Graunt, Petty | studied affairs of state, vital statistics of populations
17th Century | Pascal, Bernoulli | studied probability through games of chance, gambling
18th Century | Laplace, Gauss | normal curve, regression through study of astronomy
19th Century | Quetelet | astronomer who first applied statistical analyses to human biology
19th Century | Galton | studied genetic variation in humans (used regression and correlation)
20th Century (early) | Pearson | studied natural selection using correlation, formed first academic department of statistics, Biometrika journal, helped develop the Chi Square analysis
20th Century (early) | Gosset (Student) | studied process of brewing, alerted the statistics community about problems with small sample sizes, developed Student's t-test
20th Century (early) | Fisher | evolutionary biologist who developed ANOVA and stressed the importance of experimental design
20th Century (later) | Wilcoxon | biochemist who studied pesticides, developed the non-parametric equivalent of the two-sample test
20th Century (later) | Kruskal, Wallis | economists who developed the non-parametric equivalent of the ANOVA
20th Century (later) | Spearman | psychologist who developed a non-parametric equivalent of the correlation coefficient
20th Century (later) | Kendall | statistician who developed another non-parametric equivalent of the correlation coefficient
20th Century (later) | Tukey | statistician who developed a multiple comparisons procedure
20th Century (later) | Dunnett | biochemist who studied pesticides, developed a multiple comparisons procedure for control groups
20th Century (later) | Keuls | agronomist who developed a multiple comparisons procedure
20th Century (later) | Computer Technology | provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques
Statistics:
• The art or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
• Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data for the purpose of assisting in making a more effective decision.
• Art, on the other hand, refers to the skill of handling facts so as to achieve a given objective. It is concerned with ways and means of presenting and handling data, making inferences logically, and drawing relevant conclusions.
Uses:
Misuses:
• Bad Samples
• Small Samples
• Misleading Graphs
• Pictographs
• Distorted Percentages
• Loaded Questions
• Order of Questions
• Refusals
• Correlation & Causality
• Self Interest Study
• Precise Numbers
• Partial Pictures
• Deliberate Distortions
Fields of Statistics:
Descriptive Statistics: collection, presentation, and description of sample data.
Concerned with:
• Percentage distribution of dependents
• Average or typical characteristics of the group
• Homogeneity and heterogeneity of characteristics
• Degree of relationships of group characteristics

Tools commonly used:
• Measures of location, variability and tendencies
Inferential Statistics: making decisions and drawing conclusions from the data collected.
Concerned with:
• Testing the significant difference and independence between two or more variables.
• An assertion or hypothesis about the population is made, and it is accepted or rejected depending on the result of a test based on the available samples.

Tools commonly used:
• Normal distribution
• Sampling distribution
• Probability
• Estimation
• Hypothesis testing
Constant and Variable

Constants - the fundamental quantities that do not change in value.

Variables – the quantities whose values can vary from one entity to another.

"Age" is a variable. It can take on many different values, such as 18, 49, 72, and so on.

"Gender" is a variable. It can take on two different values, either male or female.

"Place" (in a race) is another variable. It can take on values such as 1st place, 2nd place, 3rd place, and so on.
VARIABLES
• Dependent
• Independent

QUALITATIVE
• Dichotomous
• Trichotomous
• Multinomous

QUANTITATIVE
• Discrete
• Continuous
DATA

SOURCES
• Primary
• Secondary

METHODS
• Interview
• Questionnaire
• Registration
• Observation
• Experimentation

SCALES OF MEASUREMENTS
• Nominal
• Ordinal
• Interval
• Ratio

PRESENTATION
• Textual
• Tabular
• Graphical or Chart: Line Graph, Bar Graph, Pie Graph, Pictograph, Map, Scatter Point Diagram
Kinds of Variables

Quantitative Variable
A variable measured numerically.

1. Discrete Variable - a quantitative variable with a finite or countable number of values.

For example, imagine you rolled a six-sided die four times and measured how many times you rolled an even number. What are your possible outcomes? {0, 1, 2, 3, 4}

2. Continuous Variable - a quantitative variable that can take any value within a range, so it has an infinite number of possible values.

For example, temperature can take on an infinite number of values, such as 80 degrees, or 80.01 degrees, or 80.0050592359 degrees.
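
The die example for a discrete variable can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the helper name count_evens is made up for illustration. It lists every possible sequence of four rolls and collects the distinct counts of even faces.

```python
from itertools import product

def count_evens(rolls):
    """Return how many faces in a sequence of die rolls are even."""
    return sum(1 for face in rolls if face % 2 == 0)

# Enumerate every possible sequence of four rolls of a six-sided die.
all_sequences = product(range(1, 7), repeat=4)

# Collect the distinct values the count of even rolls can take.
possible_outcomes = sorted({count_evens(seq) for seq in all_sequences})

print(possible_outcomes)  # [0, 1, 2, 3, 4]
```

The variable has only five possible values, which is what makes it discrete.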
Qualitative / Categorical Variable
A variable based on some characteristic.

1. Dichotomous - a qualitative variable that can take one of two values.

"Male" or "Female"

2. Trichotomous - a qualitative variable that can take one of three values.

"For", "Against" or "Undecided"

3. Multinomous - a qualitative variable that can take one of many values.

"Always", "Often", "Seldom" or "Never"


Independent and Dependent Variables

Independent Variable - any variable that is being manipulated.

Dependent Variable - any variable that is being measured.
Imagine that researchers want to test the effectiveness of a new weight loss medication.

They split participants into three groups: one group gets a 0 mg dosage (control), one group gets a 50 mg dosage, and the last group gets a 100 mg dosage. After six months, the participants' weights are measured.

What are the independent and dependent variables in this experiment?

The independent variable would be dosage, because dosage is being manipulated.
The dependent variable would be weight, because weight is being measured.
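
As a rough illustration of how the two kinds of variables appear in data, here is a short Python sketch. The weights below are invented purely for illustration and are not results from any study; the function name mean_by_group is also hypothetical. The dosage is the value the researchers set for each participant, and the weight is what they record afterwards.

```python
from statistics import mean

# Hypothetical records: (dosage in mg, weight in kg after six months).
# These numbers are made up for illustration only.
records = [
    (0, 90.0), (0, 92.0),
    (50, 86.0), (50, 88.0),
    (100, 82.0), (100, 84.0),
]

def mean_by_group(rows):
    """Average the dependent variable (weight) for each level of the
    independent variable (dosage)."""
    groups = {}
    for dosage, weight in rows:
        groups.setdefault(dosage, []).append(weight)
    return {dosage: mean(weights) for dosage, weights in groups.items()}

for dosage, average_weight in mean_by_group(records).items():
    print(f"{dosage} mg: mean weight {average_weight} kg")
```

Grouping by dosage and summarizing weight mirrors the roles of the two variables: the grouping key is manipulated, and the summarized quantity is measured.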
Data and Information

Data – refers to facts about things, such as the status in life of people, the defectiveness of objects, or the effect of an event on society.

Information – a set of data that have been processed and presented in a form suitable for human interpretation, for the purpose of revealing trends or patterns about the population.
Sources of Data

1. Primary source – firsthand information, usually obtained through personal interview or actual observation.

2. Secondary source – information taken from works, reports, or readings.
Measurement Scales

1. Nominal data (also known as qualitative/categorical data) is data that is split into categories.

For example: what kind of data would you collect for the variable "Color"? You would end up with information such as "red", "green", "blue", and so on. This qualitative information is called nominal data.
2. Ordinal data is data where order matters, but distance between values does not.

For example: imagine three people in a race. One finishes in 1st place, one in 2nd place, and the last in 3rd place. This data can be placed in order, but we can't necessarily measure the distance between values (maybe 1st place finished four seconds ahead of 2nd place, and 2nd place finished nineteen seconds ahead of 3rd place).
3. Interval data is data where order matters, distances between values are equal and meaningful, and a natural zero is not present.

For example: temperature (in Fahrenheit or Celsius) is interval data. The difference between 10 degrees and 20 degrees is 10 degrees. The difference between 80 degrees and 90 degrees is 10 degrees. The scale at any given point is constant, while a measurement of 0 degrees does not reflect a true "lack of temperature".
4. Ratio data is data where order matters, distances between values are equal and meaningful, and a natural, meaningful zero is present.

For example: mass is ratio data. The difference between 140 grams and 155 grams is 15 grams. The difference between 280 grams and 295 grams is 15 grams. The scale at any given point is constant, and a measurement of 0 reflects a complete lack of mass.
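
One way to see the practical difference between interval and ratio data is to change the unit of measurement. The Python sketch below is an illustration added here, not part of the original slides, and the function names are made up. It shows that equal temperature differences stay equal after converting Celsius to Fahrenheit while ratios of temperatures do not, whereas ratios of masses are the same in grams and in kilograms.

```python
import math

def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32

def grams_to_kilograms(grams):
    """Convert a mass in grams to kilograms."""
    return grams / 1000

# Interval data: equal differences remain equal after a unit change.
diff_low_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
diff_high_f = celsius_to_fahrenheit(90) - celsius_to_fahrenheit(80)

# Ratios of temperatures change with the unit, because 0 degrees
# is not a true zero on either scale.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio data: 0 grams is a true zero, so ratios do not depend on the unit.
mass_ratio_g = 280 / 140
mass_ratio_kg = grams_to_kilograms(280) / grams_to_kilograms(140)

print(math.isclose(diff_low_f, diff_high_f))      # True
print(math.isclose(temp_ratio_c, temp_ratio_f))   # False
print(math.isclose(mass_ratio_g, mass_ratio_kg))  # True
```

This is why a statement such as "twice as heavy" is meaningful, while "twice as hot" in Celsius or Fahrenheit is not.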
Methods of Collecting Data
• Direct or Interview Method – a person-to-person interaction between interviewer and interviewee, either tape recorded or written, to obtain exact information.

  • Advantage: Precise and consistent answers can be obtained by modifying or rephrasing the questions, especially for illiterate respondents or children under study.

  • Disadvantage: It consumes time, money, and effort, and is applicable only to small populations (except when conducting a census).
• Indirect or Questionnaire Method – written responses are obtained by distributing questionnaires.

  • Advantage: Less time, money, and effort are consumed.

  • Disadvantage: Many responses may not be consistent due to poor construction of the questionnaire. The meaning of the questions may vary from one respondent to another. Inconsistent responses can no longer be modified, thus reducing the number of valid respondents.
• Registration Method – enforced by both private and public organizations for recording purposes.

  • Advantage: Organized data from an institution can serve as a ready reference for future study or for personal claims on people's records.

  • Disadvantage: Problems arise only when an agency does not have a Management Information System, and if the system or process of registration is not implemented well.
• Observation Method – a scientific method of investigation that makes use of all the senses to measure or obtain outcomes or responses from the object of study.

  • Advantage: Usually applied to respondents that cannot be asked or need not speak, especially when the behaviors of persons, the culture of an organization, or the performance outcomes of employees or students are to be considered.

  • Disadvantage: Subjectivity of the information sought cannot be avoided.
• Experimentation – used when the objective is to determine the cause and effect of a certain phenomenon under some controlled conditions.

  • Advantage: There is objectivity of information, since a scientific method of inquiry is used. An equal number of respondents with relatively similar characteristics are examined to obtain the different effects of something applied to the experimental group.

  • Disadvantage: It is difficult to find respondents with nearly identical characteristics. The whole method must be repeated if the desired outcome is not reached.
Population and Sample

Population: A collection, or set, of individuals or objects or events whose properties are to be analyzed.

Two kinds of populations: finite or infinite.

Sample: A subset of the population. A selected group of information taken from the population.
Let’s say you want to find the average GPA of a student at
your university. Your university has 20,000 students, and you
randomly select 100 students and ask them their GPAs.

Which is your population and which is your sample?



Your population is the group you're interested in studying (the 20,000 students), and your sample is a small group or a subset (100 students) you've taken from the population.
Slovin’s Formula
Used to calculate the sample size (n) given the population size (N) and a margin of error (e).
It is a formula used with random sampling to estimate the sample size.

It is computed as

$$n = \frac{N}{1 + N e^{2}}$$

where:
n = sample size
N = total population
e = margin of error
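
Slovin's formula can be sketched in a few lines of Python. The function name slovin_sample_size is hypothetical, the 5% margin of error is an assumed value chosen only for illustration, and rounding the result up to a whole respondent is an assumption, since the formula itself does not specify a rounding rule.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Estimate sample size n = N / (1 + N * e**2).

    The result is rounded up to the next whole number, which is an
    assumption made here so the sample is never smaller than the estimate.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    estimate = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(estimate)

# N = 20,000 students with an assumed margin of error of 5%:
# 20000 / (1 + 20000 * 0.0025) = 20000 / 51, about 392.16, rounded up.
print(slovin_sample_size(20_000, 0.05))  # 393
```

A smaller margin of error gives a larger required sample, because e appears squared in the denominator.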
Let’s say you want to find the average
GPA of a student at your university. Your
university has 20,000 students, and you
select 100 students and ask them their
GPAs.

What are N and n in this example?



N, the size of your population, is 20,000.

n, the size of your sample, is 100.
Census and Sampling Techniques

Census (Complete Enumeration)

Collection of data from a whole population rather than just a sample.
Example:
Doing a survey of travel time by MAEd students

• Asking everyone at school is a census (of the school).

• But asking only 50 randomly chosen people is a sample.
Simple Random Sampling

Every member of the population (N) has an equal chance of being selected for your sample (n).

• Often considered the best sampling method, because a sufficiently large random sample is very likely to be representative of your population. However, it is rarely used in practice because it is often impractical.
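
A minimal Python sketch of simple random sampling follows, using the university example of 20,000 students and a sample of 100. The student ID numbers and the function name simple_random_sample are made up for illustration. It assumes a complete list of the population is available.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely
    to be chosen. The seed is optional and only makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 to 20,000.
student_ids = list(range(1, 20_001))

sample = simple_random_sample(student_ids, 100, seed=42)
print(len(sample))  # 100
```

Needing the full list of members up front is one reason the method can be impractical for large or hard-to-reach populations.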
Stratified Sampling

With this method, the population (N) is split into non-overlapping groups ("strata"), then simple random sampling is done on each group to form a sample (n).

Example: Splitting a population of students into men and women, then sampling from each of the two groups. This may allow us to collect the same amount of information as simple random sampling, but with fewer people.
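
Stratified sampling can be sketched in Python as follows. The population of 200 students, the equal allocation of 10 per stratum, and the function name stratified_sample are all assumptions made for illustration; other allocations, such as proportional to stratum size, are also possible.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take a simple random
    sample of n_per_stratum members from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical population: 200 students labeled as (id, group).
students = [(i, "men" if i % 2 else "women") for i in range(1, 201)]

chosen = stratified_sample(students, lambda s: s[1], 10, seed=7)
print(Counter(group for _, group in chosen))
```

Each stratum contributes exactly the requested number of members, so no group can be left out of the sample by chance.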
Systematic Sampling
In this method, every kth individual from the population (N) is placed in the sample (n).

• For example, if you add every 7th individual to walk out of a supermarket to your sample, you are performing systematic sampling.
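
The supermarket example can be sketched in Python. The step is called k here to keep it distinct from the sample size n. The list of 28 shoppers and the function name systematic_sample are made up, and choosing the starting position at random when none is given is an assumption, not something the slides specify.

```python
import random

def systematic_sample(population, k, start=None, seed=None):
    """Take every k-th member of an ordered population.

    start is the zero-based position of the first member taken. If it
    is omitted, a random start within the first k members is used
    (an assumption made for this sketch).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if start is None:
        start = random.Random(seed).randrange(k)
    return population[start::k]

# Hypothetical shoppers numbered in the order they leave the store.
shoppers = list(range(1, 29))

# Starting at position 6 selects the 7th shopper, then every 7th after.
print(systematic_sample(shoppers, 7, start=6))  # [7, 14, 21, 28]
```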
Convenience Sampling

Easily obtained individuals from the population (N) are placed in the sample (n).

• Pick the easiest way of getting your sample. A closely related type is voluntary response sampling, in which individuals choose to be part of the sample. This can be a problem, because there may be a difference between people who choose to participate and people who don't.
