
Notes: Overview of Statistics

Overview

Statistics = The science of collecting, organizing, summarizing, and analyzing
information to draw conclusions or answer questions. In addition, statistics is
about providing a measure of confidence in any conclusions.

I have always been fascinated by statistics. It is a scientific way to predict the
future, or simply to predict the unknown. Even descriptive statistics, which studies
what happened, is usually a step toward taking those results and predicting the
future. For example, if a college sees its enrollment increase over the last four
years, then perhaps the decision is made to build a dormitory. Descriptive
statistics is almost never used only for its own sake; used that way, it becomes
trivia. For example, the census would be useless trivia if the information were
not used to make decisions about education or the economy.

Population = An entire group of individuals to be studied.

Individual = A person or object that is a member of the population being studied.



Sample = A subset of the population that is being studied.

The population is whatever collection of elements is being studied. The
population could be a state, county, city, or even a classroom; it does not have
to be large. The sample is a subset of this population that, ideally, represents
the population with respect to whatever is being studied.

Statistic = A numerical summary of a sample.

Parameter = A numerical summary of a population.

Most items that are studied in a statistics course are statistics (lowercase s). If we
truly have access to the population information, such as a database, then the
answer is often already known. The sample allows us to make educated decisions
without having all the facts.
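To make the statistic/parameter distinction concrete, here is a minimal Python sketch. The population (100 randomly generated ages) and the sample size of 10 are hypothetical, chosen only for illustration; in real studies the population values are usually unknown, which is exactly why the sample statistic is used.

```python
import random
import statistics

# Hypothetical population: ages of all 100 students in a small program.
# The values are randomly generated for illustration only.
rng = random.Random(42)
population = [rng.randint(18, 45) for _ in range(100)]

# Parameter: a numerical summary of the entire population.
parameter = statistics.mean(population)

# Statistic: the same summary computed from a simple random sample of 10.
sample = rng.sample(population, 10)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

Changing the seed changes the sample, and therefore the statistic, while the parameter stays fixed.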

The Process of Statistics

1. Identify the research objective
2. Collect the data needed to answer the question(s) posed in (1)
3. Describe the data
4. Perform inference

Sometimes it is hard to correctly identify the research objective. As a simple
example, I decided to study the fears of my students one semester. I asked them
to write down on a piece of paper what they were afraid of. I expected two or
three common answers, such as snakes, death, and spiders, from which I could
build a bar graph and do some analysis. I never expected that every student would
name a different fear. A broad research objective can give poor results. I took a
graduate course in research that demonstrated how to narrow down the research
objective without making it too specific. It is not as easy as it might seem.

Descriptive statistics - Consists of organizing and summarizing data. Descriptive
statistics describe data through numerical summaries, tables, and graphs.
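As a small illustration of numerical summaries and tables, the Python sketch below computes a few common summaries and a text frequency table (the raw material for a bar graph). The data and the helper name `summarize` are hypothetical.

```python
import statistics
from collections import Counter

def summarize(data):
    """Return common numerical summaries of a list of numbers."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

# Hypothetical data: number of courses each of 12 students is taking.
courses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 3, 6, 4]

summary = summarize(courses)
print({key: round(value, 2) for key, value in summary.items()})

# A frequency table: the raw material for a bar graph.
table = Counter(courses)
for value in sorted(table):
    print(value, "#" * table[value])
```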

Inferential Statistics - Uses methods that take a result from a sample, extend it to
the population, and measure the reliability of the result.
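One common way inference measures the reliability of a result is a confidence interval. The sketch below is an illustration under stated assumptions, not a method taken from these notes: it uses a normal (z) approximation, which is reasonable only for fairly large simple random samples, and the commute-time data and the function name `mean_confidence_interval` are hypothetical. It needs Python 3.8+ for `statistics.NormalDist`.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal (z) approximation, which is reasonable only for a fairly
    large simple random sample; small samples call for a t-based interval.
    """
    n = len(sample)
    xbar = statistics.mean(sample)          # the sample statistic
    s = statistics.stdev(sample)            # sample standard deviation
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / math.sqrt(n)           # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample: commute times (minutes) for 40 randomly chosen students.
rng = random.Random(1)
commutes = [rng.gauss(25, 6) for _ in range(40)]

low, high = mean_confidence_interval(commutes)
print(f"95% CI for the population mean: ({low:.1f}, {high:.1f})")
```

The sample result (the mean) is extended to the population as a range, and the confidence level expresses how reliable that range is; asking for more confidence gives a wider interval.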

Descriptive Statistics

Descriptive statistics describes what happened, while inferential statistics
allows a prediction about the future or uses a sample to identify a trend in the
population. Descriptive statistics is often used in conjunction with inferential
statistics.

Variables = The characteristics of the individuals within the population (weight,
age, etc.).

Qualitative or Categorical Variables - Allow for classification of individuals based
on some attribute or characteristic.

Quantitative Variables - Provide numerical measures of individuals. Arithmetic
operations such as addition and subtraction can be performed on them and provide
meaningful results.

Discrete Variable - A quantitative variable that has either a finite number of
possible values or a countable number of possible values (typically no decimals).

Continuous Variable - A quantitative variable that has an infinite number of
possible values that are not countable (values can include decimals).

Sometimes the difference between discrete and continuous blurs. In class, I often
ask a student how old they are. I may get an answer such as 20 years old. I tell
them that they are lying and often get funny looks. I then ask the class how I know
they are lying. Sometimes a student sees the point: age is continuous, but we treat
it as discrete. A very humorous example comes up when the statistic is mentioned
that the average number of children per family in the United States is 2.01. I
have heard the question, "How can someone have 0.01 children?" Such students are
taking too literally the idea that, because the number of children is discrete,
every summary of it must be a whole number.
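Both points can be seen in a few lines of Python. The family counts and the exact age below are hypothetical values chosen for illustration.

```python
import math
import statistics

# Hypothetical survey: number of children in eight families (discrete data).
children = [2, 1, 3, 2, 2, 0, 4, 1]
average = statistics.mean(children)
print(average)        # 1.875, though no family has 1.875 children

# Age is continuous, but it is usually reported in whole years.
exact_age = 20.7      # hypothetical exact age in years
reported_age = math.floor(exact_age)
print(reported_age)   # 20
```

Each individual value is a whole number, yet their average need not be; and the reported age of 20 hides the continuous value underneath.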

Individuals - A person or object that is a member of the population being studied.

Variables - The characteristics of the individuals within the population.

Data - Describes characteristics of an individual.



Levels of Measurement of a Variable

Nominal level of measurement: name, label, or categorize (hair color).
Ordinal level of measurement: properties of nominal, but the values can be
arranged by rank or order (car size).
Interval level of measurement: properties of ordinal, and the differences in
the values of the variable have meaning (year of birth). Addition and
subtraction are meaningful.
Ratio level of measurement: properties of interval, and the ratios of
the values of the variable have meaning. Zero in this measurement means
absence of the quantity. Multiplication and division can be performed at
this level (value of a car).
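Because each level keeps all the properties of the levels below it, the levels can be modeled as an ordered scale. The Python sketch below is one hypothetical way to encode that idea; the names `Level`, `MINIMUM_LEVEL`, and `is_meaningful` are illustrative assumptions, and the examples are the ones from the notes.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered so that each level has all the
    properties of the levels below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest level at which each kind of operation becomes meaningful.
MINIMUM_LEVEL = {
    "categorize": Level.NOMINAL,   # equal / not equal
    "order": Level.ORDINAL,        # less than / greater than
    "difference": Level.INTERVAL,  # addition and subtraction
    "ratio": Level.RATIO,          # multiplication and division
}

def is_meaningful(operation, level):
    """True if the operation makes sense for a variable at this level."""
    return level >= MINIMUM_LEVEL[operation]

# The examples from the notes.
examples = {
    "hair color": Level.NOMINAL,
    "car size": Level.ORDINAL,
    "year of birth": Level.INTERVAL,
    "value of a car": Level.RATIO,
}

for variable, level in examples.items():
    allowed = [op for op in MINIMUM_LEVEL if is_meaningful(op, level)]
    print(f"{variable} ({level.name.lower()}): {', '.join(allowed)}")
```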

Observational Studies Versus Designed Experiments

Explanatory Variable - The variable that explains what is happening.

Response Variable - The variable whose value is affected by (responds to) the explanatory variable.



Consider ten babies that are raised in a lab. Everything in their entire lives is
controlled. If these individuals do poorly in their studies, then the factors that
explain their results can be easily listed. How about ten individuals NOT raised
in a lab? There might be 1000 factors that determine their success or failure. In
mathematics, I have seen administrators look for that magic explanatory
variable that explains why students do not get through developmental
mathematics. Over the last twelve years, a student might have had at least ten
different teachers. Could the combination of that instruction shape what they
know? Could ten generations of their family determine their aptitude in a
subject? Could their diet be a factor?

Typically, in business the focus is on the variables that most likely explain
the observed results. For example, rising oil prices might be explained mainly by
market control of prices, competition, and international politics.

In statistics, we can analyze the connection between any two variables, but the
analyst's knowledge of the subject area must be an important component.

Observational Study - Measures the value of the response variable without
attempting to influence the value of either the response or explanatory variables.
That is, in an observational study, the researcher observes the behavior of the
individuals without trying to influence the outcome of the study.

Designed Experiment - Applies a treatment to individuals (referred to as
experimental units or subjects) and attempts to isolate the effects of the
treatment on a response variable.

Observational studies attempt to measure values without influencing them. This
can be difficult to accomplish in many areas, because an individual being studied
may change their behavior when observed. For example, when administrators come in
to observe an instructor's classroom technique, the instructor will typically
change that technique because of the observation.

Confounding - Occurs in a study when the effects of two or more explanatory
variables are not separated. Therefore, any relation that may exist between an
explanatory variable and the response variable may be due to some other
variable or variables not accounted for in the study.

Lurking Variable - An explanatory variable that was not considered in a study, but
that affects the value of the response variable in the study.

Note: Observational studies do not allow a researcher to claim causation, only association.

Because an observational study cannot control every aspect of the individual, no
direct claim can be made about what causes the result. That does not mean there
is no direct connection between the two variables; the outcome could instead be
driven by any of the thousands of other variables that were not controlled.

Various Types of Observational Studies

Cross-sectional Studies - Collect information about individuals at a specific
point in time or over a very short period of time.
Case-control Studies - These studies are retrospective, meaning that they
require individuals to look back in time or require the researcher to look at
existing records.
Cohort Studies - Identify a group of individuals and study them over an extended
period.
Census - A list of all individuals in a population along with certain
characteristics of each individual.

A census is expensive and difficult to accomplish; in many cases it is
impossible. A census implies that EVERY member of the population is included. If
the population is small, then a true census might be possible, but typically it
is not. Usually, enough of the population is surveyed that the result can be
considered a census. Once a census is completed, there is no question about the
topic being studied, since it is so thorough.

Simple Random Sampling

Random Sampling - The process of using chance to select individuals from a
population to be included in the sample.

Note: If convenience is used to obtain a sample, the results of the survey are meaningless.

Four Basic Sampling Techniques

1. Simple Random Sampling
2. Stratified Sampling
3. Systematic Sampling
4. Cluster Sampling

Simple random sample - A sample of size n from a population of size N is
obtained through simple random sampling if every possible sample of size n is
equally likely to occur.

Sample Without Replacement - An individual who is selected is removed from
the population and cannot be chosen again.

Sample with Replacement - A selected individual is placed back into the
population and could be chosen a second time.
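In Python's standard library, `random.sample` draws a simple random sample without replacement (every subset of the requested size is equally likely), while `random.choices` samples with replacement. The frame of 30 student names below is hypothetical.

```python
import random

rng = random.Random(2024)

# Hypothetical frame: a list of N = 30 students.
frame = [f"student{i:02d}" for i in range(1, 31)]

# Simple random sample without replacement: random.sample gives every
# subset of size 5 the same chance, and no one is picked twice.
without_replacement = rng.sample(frame, k=5)

# Sampling with replacement: random.choices returns each selected
# individual to the pool, so the same person may appear more than once.
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```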

For purposes of this class, all samples are assumed to be simple random samples
drawn from the population. If the sampling is not done correctly, then the results
are invalid and may not represent what is being studied. Even if a problem does
not say so, this requirement is assumed to hold.

Frame - A list of all the individuals within the population.

Stratified Sample - Obtained by separating the population into nonoverlapping
groups called strata and then obtaining a simple random sample from each
stratum. The individuals within each stratum should be homogeneous (or similar)
in some way. Needs a frame.

Example: strata of Democrats, Republicans, and Independents
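A minimal Python sketch of stratified sampling, assuming a hypothetical frame of 60 voters (20 per party) and an equal number drawn from each stratum. Equal allocation is an assumption made here for simplicity; drawing in proportion to each stratum's size is another common choice. The function name `stratified_sample` is illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into nonoverlapping strata, then take a simple
    random sample of n_per_stratum individuals from each stratum."""
    strata = defaultdict(list)
    for individual in frame:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical frame: 60 registered voters, 20 in each party.
parties = ["Democrat", "Republican", "Independent"]
frame = [(f"voter{i:02d}", parties[i % 3]) for i in range(60)]

rng = random.Random(3)
sample = stratified_sample(frame, stratum_of=lambda voter: voter[1],
                           n_per_stratum=4, rng=rng)
print(sample)
```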

Systematic Sample - Is obtained by selecting every kth individual from the
population. The first individual selected corresponds to a random number
between 1 and k. Doesn't require a frame.

Steps in Systematic Sampling

1. If possible, approximate the population size, N.
2. Determine the sample size desired, n.
3. Compute N/n and round down to the nearest integer. This value is k.
4. Randomly select a number between 1 and k. Call this number p.
5. The sample will consist of the following individuals: p, p + k, p + 2k, …,
p + (n - 1)k

Example: Suppose the population size is 1000 and the desired sample size is 30.

Step 1: N = 1000 (given in the problem)
Step 2: n = 30 (given in the problem)
Step 3: N/n = 1000/30 = 33.33, which rounds down to 33, so k = 33
Step 4: Let's say the random number picked was 5, so p = 5
Step 5: The sample will consist of: p, p + k, p + 2k, …, p + (n - 1)k
        = 5, 5 + 33, 5 + 2(33), …, 5 + (30 - 1)(33)
        = 5, 38, 71, …, 962
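The steps in systematic sampling translate almost line for line into Python. In this sketch the function name `systematic_sample_positions` is hypothetical, and positions are 1-based as in the notes (subtract 1 to index a Python list). Passing `start=5` reproduces the worked example; omitting it picks p at random.

```python
import random

def systematic_sample_positions(N, n, start=None, rng=random):
    """Positions of a systematic sample, following steps 3 to 5."""
    k = N // n                        # Step 3: N/n rounded down
    if start is None:
        start = rng.randint(1, k)     # Step 4: random p with 1 <= p <= k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Step 5: p, p + k, p + 2k, ..., p + (n - 1)k
    return [start + i * k for i in range(n)]

# Reproduce the worked example (N = 1000, n = 30, p = 5).
positions = systematic_sample_positions(1000, 30, start=5)
print(positions[:3], positions[-1])   # [5, 38, 71] 962
```

Because p is at most k, the last position p + (n - 1)k is at most nk, which never exceeds N.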

Cluster Sample - Is obtained by selecting all individuals within a randomly
selected collection or group of individuals. Doesn't need a complete frame, just a
frame of the groupings.
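A minimal Python sketch of cluster sampling, assuming hypothetical clusters (five course sections of three students each). Only the list of cluster names is needed to make the random choice; every individual in each chosen cluster is then included. The function name `cluster_sample` is illustrative.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them.

    The random choice uses only the cluster names (a frame of the
    groupings); members are listed only for the chosen clusters.
    """
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

# Hypothetical clusters: five course sections of three students each.
clusters = {
    "Section A": ["a1", "a2", "a3"],
    "Section B": ["b1", "b2", "b3"],
    "Section C": ["c1", "c2", "c3"],
    "Section D": ["d1", "d2", "d3"],
    "Section E": ["e1", "e2", "e3"],
}

rng = random.Random(5)
sample = cluster_sample(clusters, num_clusters=2, rng=rng)
print(sample)
```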

Homogeneous - Similar

Heterogeneous - Dissimilar

Convenience Sample - A sample in which the individuals are easily obtained.

Most popular type: self-selected samples, in which individuals decide for
themselves whether to participate (voluntary response samples).

Bias - Occurs when the results of the sample are not representative of the population.
Three sources of bias in sampling:

Sampling bias
Nonresponse bias
Response bias

Bias is sometimes easy to see after the fact, but it can be difficult to spot
before and during the study. Sometimes it takes an outside party to ask the
questions that point out what was overlooked.

Sampling Bias - Means that the technique used to obtain the sample's individuals
tends to favor one part of the population over another. It can also result from
undercoverage, which occurs when the proportion of one segment of the
population is lower in a sample than it is in the population.

Nonresponse Bias - Exists when individuals selected to be in the sample who do
not respond to the survey have different opinions from those who do.

Ways to Avoid: callbacks, rewards

Response Bias - Exists when the answers on a survey do not reflect the true
feelings of the respondent.

Ways this happens: interviewer error, misrepresented answers, wording of the
questions, ordering of questions or words, type of question (open or closed), and
data-entry error.

Nonsampling errors result from undercoverage, nonresponse bias,
response bias, or data-entry error.
Sampling errors result from using a sample to estimate information about a
population.
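Sampling error is easy to see by simulation: different random samples from the same population give different statistics, and each one misses the parameter by a different amount. In this Python sketch the population of 5,000 exam scores is hypothetical, generated only for illustration.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population: 5,000 exam scores.
population = [rng.gauss(70, 10) for _ in range(5000)]
parameter = statistics.mean(population)

# Draw five simple random samples of 50 and record each sampling error.
errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    statistic = statistics.mean(sample)
    errors.append(statistic - parameter)

for error in errors:
    print(f"sampling error: {error:+.2f}")
```

None of these errors comes from a mistake in collecting the data; they arise only because a sample, not the whole population, was measured.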

This course is about analyzing the results of a sample, not about the details of
how a study should be performed. If you are interested in that area, then a
research class will give detailed information. Still, some basic definitions
should be covered to give the student an overview.

Basic Definitions

Experiment - A controlled study conducted to determine the effect that varying
one or more explanatory variables (factors) has on a response variable.

Treatment - Any combination of the values of each factor.

Experimental Unit - A person, object, or some other well-defined item to which a
treatment is applied (aka subject).

Control Group - Serves as a baseline treatment against which other treatments can
be compared.

Placebo - An innocuous medication, such as a sugar tablet, that looks, tastes, and
smells like the experimental medication.

Blinding - Refers to nondisclosure of the treatment an experimental unit is
receiving.

Single-Blind Experiment - One in which the experimental unit (or subject) does
not know which treatment he or she is receiving.

Double-blind - Means that neither the experimental unit nor the experimenter
knows what treatment is being administered to the experimental unit.

Completely Randomized Design - One in which each experimental unit is
randomly assigned to a treatment.
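A completely randomized design can be sketched in Python by shuffling the experimental units and then dealing out treatments in turn, which also keeps the group sizes as equal as possible (an assumption of this sketch; the definition only requires random assignment). The subjects, treatments, and the function name `completely_randomized_design` are hypothetical.

```python
import random

def completely_randomized_design(units, treatments, rng):
    """Randomly assign each experimental unit to a treatment.

    Shuffling and then dealing treatments in turn keeps the group sizes
    as equal as possible.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

# Hypothetical experiment: 12 subjects, a placebo control and two treatments.
subjects = [f"subject{i:02d}" for i in range(1, 13)]
treatments = ["placebo", "drug A", "drug B"]

assignment = completely_randomized_design(subjects, treatments, random.Random(8))
for subject in subjects:
    print(subject, assignment[subject])
```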

Matched-Pairs Design - An experimental design in which the experimental units
are paired up. The pairs are matched so that they are somehow related (the same
person before and after a treatment, husband and wife, the same geographical
location).

Blocking - Grouping similar (homogeneous) experimental units together and then
randomizing the experimental units within each group to a treatment. Each group
of homogeneous individuals is called a block.

Randomized Block Design - Used when the experimental units are divided into
homogeneous groups called blocks. Within each block, the experimental units are
randomly assigned to treatments.
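The only change from a completely randomized design is that the randomization happens separately inside each block. In this Python sketch the blocks (morning and evening class sections), the two teaching methods, and the function name `randomized_block_design` are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, rng):
    """Within each block, randomly assign the units to treatments.

    Returns a mapping unit -> (block, treatment).
    """
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)          # randomize separately inside each block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: students grouped by class section (morning vs. evening).
blocks = {
    "morning": ["m1", "m2", "m3", "m4"],
    "evening": ["e1", "e2", "e3", "e4"],
}
treatments = ["new method", "traditional method"]

assignment = randomized_block_design(blocks, treatments, random.Random(9))
for unit, (block, treatment) in assignment.items():
    print(block, unit, treatment)
```

Each block receives every treatment equally often, so differences between morning and evening students cannot be confused with differences between the methods.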

It is interesting that many individuals who have taken multiple statistics courses
still do not follow the basic ideas. One experiment done recently at a local
college, which examined how well various antibacterial soaps reduced bacteria, did
not have a control group. Without a control group, the results cannot be compared
against a baseline to judge the soaps' effectiveness. It was an interesting
experiment, but without the control group the results could not be interpreted.
Interestingly enough, one of the products increased the number of bacteria, which
could mean either a very poor product or an improperly performed experiment.
