
INTRODUCTION TO

STATISTICS

Lecturer: LE HONG VAN


Foreign Trade University – HCM Campus
Email: lehongvan.cs2@ftu.edu.vn
SYLLABUS
 Chapter 1: Introduction to Statistics
 Chapter 2: Data collection and Summarizing
 Chapter 3: Descriptive Statistics
 Chapter 4-5: Inferential Statistics
 Chapter 6: Correlation and Regression
 Chapter 7: Time-series analysis and Forecasting
 Chapter 8: Indexes
 Textbook
Business Statistics, 8th edition (David F. Groebner)
 References:
- Statistics for Business and Economics, 11th edition, 2003 (Anderson, Sweeney, Williams)
- Handouts and Tutorials
STUDYING METHOD

 Statistical thinking
 Doing exercises
 Group presentation
 Self-study
GRADING BREAKDOWN
COMPONENT        MARK (%)   FORM OF ASSESSMENT
ATTENDANCE       10%        ATTENDANCE CHECK
MID-TERM TEST    30%        WRITTEN TEST + GROUP PROJECTS
FINAL EXAM       60%        WRITTEN TEST (MULTIPLE CHOICE + PROBLEMS)
PLUS MARK
CLASSROOM ETIQUETTE
Do not lay your head on the desk or fall asleep.

Do not do your nails, apply makeup, or do work for other classes.

Keep your mobile phones silent.

Bring good questions. We are not mind readers; please ask questions if you do not understand.
CLASSROOM ETIQUETTE
Food and beverages should not be consumed in the classroom.

Do not use laptops for personal matters.

If you arrive late to class or are returning after an absence, please sit down quietly without making a production.

TREAT OTHERS THE WAY YOU WANT TO BE TREATED!


PRESENTATION (choose one of 4 projects)
 Project #1: Survey
 Project #2: Inference
 Project #3: Correlation and Regression
 Project #4: Time-series analysis and Forecasting
In detail…
I. What is statistics?
- In a very general way:
Statistics: numerical information
- Furthermore:
Statistics: statistical methods used to
  - collect
  - describe
  - summarize
  - present
  - analyze
data
In more detail, statistics covers several major tasks:

 Making sense of numerical information


 Dealing with uncertainty
 Sampling
 Analyzing relationships
 Forecasting
 Decision making in an uncertain environment
WHO USES STATISTICS?

Areas where STATISTICS are used:
- Business: Economics, Engineering, Marketing, Computer Science
- Physical Sciences: Astronomy, Chemistry, Physics
- Health & Medicine: Genetics, Clinical Trials, Epidemiology, Pharmacology
- Environment: Agriculture, Ecology, Forestry, Animal Populations
- Government: Census, Law, National Defense
Source: American Statistical Association
Applications in
Business and Economics
Accounting
Public accounting firms use statistical
sampling procedures when conducting audits
for their clients.

Economics
Economists use statistical
information in making forecasts
about the future of the economy
or some aspects of it.
Applications in
Business and Economics

Marketing
Electronic point-of-sale scanners at retail
checkout counters are used to collect data
for a variety of marketing research
applications.
Production
A variety of statistical quality
control charts are used to monitor
the output of a production
process.
Applications in
Business and Economics

 Finance
Financial advisors use price-earnings ratios and dividend
yields to guide their investment recommendations.
II/ Definitions
1/ Population is the WHOLE set of all items or
individuals of interest
2/ Sample is an observed subset of population values
3/ Variable is a characteristic that changes or varies over
time and/or across the different individuals or objects under
consideration
Population vs. Sample

Population: a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample: b c g i n o r u y
III/ Descriptive statistics and Inferential
statistics

Statistics
- Descriptive Statistics
- Inferential Statistics
1/ Descriptive statistics
 Descriptive statistics: Methods used to summarize
and describe the main features of the whole population
in quantitative terms.
 Tabular, graphical, and numerical methods (mean,
median, variance, standard deviation…)
 Used when we can enumerate the whole population
Descriptive Statistics

- Collect data
e.g., Survey, Observation,
Experiments

- Present data
e.g., Charts and graphs

- Characterize data x i

e.g., Calculate mean = n
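
The "characterize data" step can be illustrated with a minimal Python sketch. The list `scores` is hypothetical data invented for this example, and the variance and standard deviation are computed as population quantities (dividing by n), on the assumption that the whole population has been enumerated.

```python
import statistics

# Hypothetical data: number of correct answers for 8 students (illustrative only).
scores = [6, 7, 7, 8, 9, 5, 7, 7]

n = len(scores)
mean = sum(scores) / n                    # mean = (sum of x_i) / n
median = statistics.median(scores)        # middle value of the sorted data
variance = statistics.pvariance(scores)   # population variance (divides by n)
std_dev = statistics.pstdev(scores)       # population standard deviation

print(f"n = {n}, mean = {mean}, median = {median}")
print(f"variance = {variance}, standard deviation = {std_dev:.3f}")
```

If the data were only a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would usually be the more appropriate choice.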


2/ Inferential Statistics
 Inferential statistics: Procedures used to draw
conclusions or inferences about the characteristics of a
population from information obtained from the sample.
 Making estimates, testing hypotheses…
 Used when we cannot enumerate the whole population
Inferential Statistics
Drawing conclusions and/or making decisions
concerning a population based on sample results.
 Estimation
 e.g., Estimate the population mean weight
using the sample mean weight
 Hypothesis Testing
 e.g., Use sample evidence to test the claim
that the population mean weight is 120
pounds
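
As a rough sketch of both ideas, the Python code below estimates a population mean weight from a sample and then checks the claim of 120 pounds. The `weights` list is hypothetical, the one-sample t procedure is only one possible method (it assumes a roughly normal population), and the critical value 2.262 is taken from a standard t table for 9 degrees of freedom at the two-sided 5% level.

```python
import math
import statistics

# Hypothetical sample of 10 weights in pounds (illustrative values, not real data).
weights = [118, 125, 121, 130, 117, 122, 128, 119, 124, 126]

n = len(weights)
sample_mean = statistics.mean(weights)    # point estimate of the population mean
sample_sd = statistics.stdev(weights)     # sample standard deviation (divides by n - 1)
std_error = sample_sd / math.sqrt(n)      # standard error of the sample mean

# Assumption: two-sided 5% level with n - 1 = 9 degrees of freedom;
# 2.262 is the matching critical value from a standard t table.
t_critical = 2.262

# Estimation: 95% confidence interval for the population mean.
ci_low = sample_mean - t_critical * std_error
ci_high = sample_mean + t_critical * std_error

# Hypothesis testing: is the sample consistent with a population mean of 120 pounds?
claimed_mean = 120
t_stat = (sample_mean - claimed_mean) / std_error
reject_claim = abs(t_stat) > t_critical

print(f"sample mean = {sample_mean}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"t = {t_stat:.3f}, reject claim of 120 lb: {reject_claim}")
```

With this made-up sample the interval contains 120, so the claim is not rejected; a different sample could lead to the opposite conclusion.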
IV. Quantitative and qualitative data
Data can be classified as being qualitative or quantitative.

Depending on whether the data are qualitative or quantitative, we choose the most appropriate statistical methods.

In general, more statistical analyses are available for quantitative data.
Qualitative Data
 Labels or names used to identify an attribute of each
element.
 Often referred to as categorical data
 Measured on a nominal or ordinal scale
 Usually nonnumeric data
 Therefore, the appropriate statistical analyses are rather limited in comparison with those for quantitative data
Examples
 Eye colors:
1.Brown 2.Black 3.Blue 4.Green
 Marital status:
1. Single
2. Married
3. Divorced
4. Widowed
Quantitative Data
 Quantitative data are data in numeric form. They indicate how many or how much.

There are two types of quantitative data:

Discrete data:
- can be measured precisely
- only a finite number of values is possible
- Example:

Continuous data:
- cannot be measured precisely
- an infinite number of values is possible
- Example:
Quantitative Data
E.g.
(i) The number of students in a class
(ii) The number of correct answers in a test
(iii) People’s height, weight; students’ GPA
V. Scales of Measurement
 Scales of measurement include:
- Nominal
- Ordinal
- Interval
- Ratio

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Levels of measurement

Data type                                       Scale                  Level           Analysis
Measurements                                    Ratio/Interval Scale   Highest Level   Complete Analysis
Rankings, Ordered Categories                    Ordinal Scale          Higher Level    Mid-level Analysis
Categorical Codes, ID Numbers, Category Names   Nominal Scale          Lowest Level    Basic Analysis
Scales of Measurement
 Nominal

Data are labels or names used to identify an attribute of the element.

A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.

Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
Example
Which fuel do you use at home?

1. Firewood
2. Coal
3. Oil
4. Gas
Scales of Measurement
 Ordinal

The data have the properties of nominal data and the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.

Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
Example
Please rank the following kinds of fuel in order of preference, starting with your favorite:

( ) Firewood
( ) Coal
( ) Oil
( ) Gas
Scales of Measurement
 Interval

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

Interval data are always numeric.

There is no true zero: a value of zero does not indicate that nothing exists for the variable.
Scales of Measurement
 Interval

The ratio of two interval-scale values is not meaningful because the scale has no true zero.

Example: Melissa has an SAT score of 800, while Kevin has an SAT score of 400. Melissa scored 400 points more than Kevin, but it is not meaningful to say that she scored twice as well.
Example
Please rate the customer service at a restaurant:

Not friendly   -3   -2   -1   +1   +2   +3   Friendly


Scales of Measurement
 Ratio

The data have all the properties of interval data, and the ratio of two values is meaningful.

This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Variables such as distance, height, weight, and time use the ratio scale.
Example
Melissa’s college record shows 36 credit hours earned, while
Kevin’s record shows 72 credit hours earned. Kevin has
twice as many credit hours earned as Melissa.
Example
Assume that you spend VND 100,000 on fuel for your family.
Please allocate this amount among the kinds of fuel you use:

1. Firewood.................VND
2. Coal.........................VND
3. Oil............................VND
4. Gas..........................VND
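
Because each scale keeps the properties of the one below it, the four scales can be treated as an ordered hierarchy. The Python sketch below is one hypothetical way to encode this; the `Scale` enum, the `meaningful_operations` helper, and the pairing of summaries with scales (mode, median, mean) follow a common textbook convention and are assumptions of this example, not part of the slides.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def meaningful_operations(scale):
    """Return the operations that are meaningful at the given scale."""
    ops = ["count / mode"]                  # labels can always be counted
    if scale >= Scale.ORDINAL:
        ops.append("rank / median")         # order is meaningful
    if scale >= Scale.INTERVAL:
        ops.append("difference / mean")     # fixed unit of measure
    if scale >= Scale.RATIO:
        ops.append("ratio")                 # true zero exists
    return ops


# Variables taken from the examples in the text.
variables = {
    "school": Scale.NOMINAL,
    "class standing": Scale.ORDINAL,
    "SAT score": Scale.INTERVAL,
    "credit hours earned": Scale.RATIO,
}

for name, scale in variables.items():
    print(f"{name}: {scale.name.lower()} -> {', '.join(meaningful_operations(scale))}")
```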
Example: the following survey was given to FTU students. Classify each item as quantitative or qualitative, and state its scale of measurement.
1. Full name: ..........................................
2. Sex: Male / Female
3. Age:
4. Year of study: 1st / 2nd / 3rd / 4th
5. a/ Do you have a part-time job? Yes / No
b/ If yes, how many hours per week? ...........
c/ How well does your part-time job fit your field of study?
Very suitable   5   4   3   2   1   Not at all
DATA COLLECTION
 Methods of Data Collection:
 Census
 Sample survey
 Experiment
 Observational study
 Census. A census is a study that obtains data from every member of a
population. In most studies, a census is not practical, because of the cost
and/or time required.
 Sample survey. A sample survey is a study that obtains data from a
subset of a population, in order to estimate population attributes.
 Experiment. An experiment is a controlled study in which the
researcher attempts to understand cause-and-effect relationships. The
study is "controlled" in the sense that the researcher controls (1) how
subjects are assigned to groups and (2) which treatments each group
receives.
 Observational study. Like experiments, observational studies
attempt to understand cause-and-effect relationships. However, unlike
experiments, the researcher is not able to control (1) how subjects are
assigned to groups and/or (2) which treatments each group receives.
Survey Design Steps
 Define the issue
 what are the purpose and objectives of the survey?

 Define the population of interest


 Formulate survey questions
 make questions clear and unambiguous
 use universally-accepted definitions
 limit the number of questions
Survey Design Steps
 Pre-test the survey
 pilot test with a small group of participants
 assess clarity and length

 Determine the sample size and sampling method


 Select the sample and administer the survey
Types of Questions
 Closed-end Questions
◦ Select from a short list of defined choices
Example: Major: __business __liberal arts
__science __other
 Open-end Questions
◦ Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?

 Demographic Questions
◦ Questions about the respondents’ personal characteristics
Example: Gender: __Female __ Male
Populations and Samples
 A Population is the set of all items or individuals of interest
◦ Examples: All likely voters in the next election
All parts produced today
All sales receipts for November

 A Sample is a subset of the population


◦ Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Every 100th receipt selected for audit
Why Sample?

 Less time consuming than a census

 Less costly to administer than a census

 It is possible to obtain statistical results of a sufficiently high precision based on samples.
Non-probability samples
Voluntary sample
• A voluntary sample is made up of people
who self-select into the survey

Convenience sample
• A convenience sample is made up of
people who are easy to reach
Statistical Sampling
 Items of the sample are chosen based on known or calculable
probabilities

Probability Samples
- Simple Random
- Stratified
- Systematic
- Cluster
Simple Random Samples
 Every individual or item from the population has an equal
chance of being selected
 Selection may be with replacement or without replacement
 Samples can be obtained from a table of random numbers
or computer random number generators
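
A simple random sample can be drawn with a computer random number generator, as in the minimal Python sketch below. The 26-letter population and the sample size of 9 are illustrative choices, and the seed is fixed only so that the run is repeatable.

```python
import random

# Illustrative population: the 26 letters a..z.
population = list("abcdefghijklmnopqrstuvwxyz")

rng = random.Random(42)  # fixed seed, used here only for reproducibility

# Without replacement: no individual can be selected more than once.
without_replacement = rng.sample(population, k=9)

# With replacement: the same individual may be selected more than once.
with_replacement = rng.choices(population, k=9)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Under `rng.sample`, every possible 9-letter subset has the same chance of being chosen, which is the stronger property usually required of a simple random sample.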
Stratified Samples
 Population divided into subgroups (called strata) according to
some common characteristic
 Simple random sample selected from each subgroup
 Samples from subgroups are combined into one

[Figure: population divided into 4 strata; sample]
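
The stratified procedure can be sketched in Python as follows. The population of 40 students in 4 year groups and the helper `stratified_sample` are hypothetical, and taking the same number from each stratum is just one possible allocation.

```python
import random


def stratified_sample(items, stratum_of, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from every stratum."""
    strata = {}
    for item in items:
        # Group the population into strata by a common characteristic.
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        # Simple random sample within each stratum, then combine into one.
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical population: 10 students in each of 4 year groups (the strata).
population = [(f"Y{year}-{i}", year) for year in (1, 2, 3, 4) for i in range(10)]

rng = random.Random(0)  # fixed seed for reproducibility
sample = stratified_sample(population, lambda student: student[1], 3, rng)
print(sample)
```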
Systematic Samples
 Decide on sample size: n
 Divide frame of N individuals into groups of k individuals:
k=N/n
 Randomly select one individual from the 1st group
 Select every kth individual thereafter

[Figure: N = 64, n = 8, k = 8, with the first group marked]
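
The steps above can be sketched in Python using the same numbers (N = 64, n = 8, k = 8). The function name `systematic_sample` and the numbered frame are assumptions made for the illustration, and the sketch assumes that N is a multiple of n.

```python
import random


def systematic_sample(frame, n, rng):
    """Select every k-th individual after a random start in the first group."""
    k = len(frame) // n            # group size, k = N / n
    start = rng.randrange(k)       # random position within the first group
    return [frame[start + j * k] for j in range(n)]


# Frame of N = 64 individuals, numbered 1..64.
frame = list(range(1, 65))

sample = systematic_sample(frame, 8, random.Random(1))  # fixed seed for reproducibility
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined.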
Cluster Samples
 Population is divided into several “clusters,” each
representative of the population
 A simple random sample of clusters is selected
 All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique

[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
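
A minimal Python sketch of cluster sampling is given below. The 16 clusters of 5 items each and the choice of 4 clusters are hypothetical, and the sketch uses every item in the selected clusters rather than sub-sampling within them.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: 16 clusters, each containing 5 items.
clusters = {c: [f"cluster{c}-item{i}" for i in range(5)] for c in range(16)}

# Step 1: take a simple random sample of clusters.
chosen = rng.sample(sorted(clusters), k=4)

# Step 2: use all items in the selected clusters.
sample = [item for c in chosen for item in clusters[c]]

print("selected clusters:", chosen)
print("sample size:", len(sample))
```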
BIAS IN SURVEY SAMPLING
 Bias often occurs when the survey sample does not accurately
represent the population
 Two causes of bias:
 Selection bias
 Response bias
Selection bias
 Results from an unrepresentative sample
 3 types of selection bias
 Undercoverage
 Non-response
 Voluntary response
 To improve survey quality: use random sampling
Response bias
 Results from problems in the measurement process
 Two common causes:
 Leading question
 Social desirability
Learn to View Statistics with a
Critical Eye
 There are three kinds of lies…..
 Lies
 Damn Lies
 Statistics
 You need to make statistics work for you, not lie for
you!
Alert
“Statistics don’t lie, statisticians do.”
Exercise 1
Classify the variable implied by each of these 10 items as quantitative or
qualitative, and state its scale of measurement
1. Age of household head
2. Sex of household head
3. Number of people in household
4. Use of electric heating (yes/no)
5. Number of large appliances used daily
6. Average number of hours heating is on
7. Average number of heating days
8. Household income
9. Average monthly electric bill
10. Ranking of this electric company among 4 electricity suppliers
Problem
An auto analyst is conducting a satisfaction survey, sampling from a list of
10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM
buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a
sample of 400 car buyers, by randomly sampling 100 buyers of each brand.
Is this an example of a simple random sample?
 (A) Yes, because each buyer in the sample was randomly sampled.
 (B) Yes, because each buyer in the sample had an equal chance of being
sampled.
 (C) Yes, because car buyers of every brand were equally represented in
the sample.
 (D) No, because every possible 400-buyer sample did not have an equal
chance of being chosen.
 (E) No, because the population consisted of purchasers of four different
brands of car.
Problem
Which of the following statements are true?
I. Random sampling is a good way to reduce response bias.
II. To guard against bias from undercoverage, use a convenience
sample.
III. Increasing the sample size tends to reduce survey bias.
IV. To guard against nonresponse bias, use a mail-in survey.
 (A) I only
 (B) II only
 (C) III only
 (D) IV only
 (E) None of the above.
