
1-1

Statistics and Econometrics

By:
Prof. Tri Widodo, Ph.D
1-2
What Is Meant by Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
First PhD class: let's cross the street and do your economics!
1-3
Who Uses Statistics?
Statistical techniques are used extensively in marketing, accounting, and quality control, and by consumers, professional sports people, hospital administrators, educators, politicians, physicians, and many others.
1-4
Types of Statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
Example: election results.
1-5
Types of Statistics
Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.
A Population is a collection of all possible individuals, objects, or measurements of interest.
A Sample is a portion, or part, of the population of interest.
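To make the population/sample distinction concrete, here is a minimal Python simulation. The population, its size, the 30% share, and the sample size are all hypothetical assumptions chosen for illustration: the population proportion is a parameter, and the proportion observed in a random sample is a statistic used to estimate it.

```python
import random

# Hypothetical population of 10,000 individuals: 1 = has the attribute, 0 = does not.
# Exactly 30% have the attribute, so the population parameter is p = 0.30.
population = [1] * 3_000 + [0] * 7_000
p = sum(population) / len(population)

# Draw a simple random sample without replacement (fixed seed for reproducibility).
rng = random.Random(42)
sample = rng.sample(population, k=400)

# The sample proportion is a statistic used to estimate the parameter p.
p_hat = sum(sample) / len(sample)
print(f"parameter p = {p:.2f}, statistic p_hat = {p_hat:.3f}")
```

Changing the seed gives a different p_hat; this sampling variability is what inferential methods such as confidence intervals and hypothesis tests have to account for.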
1-6
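Example: A sensational research finding on virginity: 93% of female university students in Yogya are no longer virgins.
[Diagram: Population → Sampling → Sample. Parameters describe the population and statistics describe the sample; inferential statistics uses sample statistics to draw conclusions about population parameters, for example through hypothesis testing of H_0 against H_1.]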
1-7
Summary of Types of Variables
Data may be:
Qualitative or attribute (e.g., type of car owned)
Quantitative or numerical, which is either
  discrete (e.g., number of children) or
  continuous (e.g., time taken for an exam).
1-8
Describing Data: Frequency Distributions and Graphic Presentation
Organize data into a frequency distribution.
Portray a frequency distribution in a histogram, frequency polygon, and cumulative frequency polygon.
Present data using such graphic techniques as line charts, bar charts, and pie charts.
1-9
Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are histograms, frequency polygons, and cumulative frequency distributions.
A histogram is a graph in which the class midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
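The following minimal Python sketch organizes raw data into a frequency distribution with class midpoints and cumulative frequencies, then prints a text histogram in which bar length stands in for bar height. The scores, the starting limit of 60, and the class width of 10 are hypothetical choices; each class here includes its lower limit and excludes its upper limit, which is one common convention among several.

```python
# Hypothetical raw data (not from the slides), e.g. exam scores.
scores = [62, 67, 71, 74, 75, 78, 81, 83, 84, 86, 88, 91, 93, 97]

def frequency_distribution(values, lower, width, n_classes):
    """Rows of (lower limit, upper limit, midpoint, frequency, cumulative frequency).

    Each class includes its lower limit and excludes its upper limit.
    """
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)
        cumulative += freq
        rows.append((lo, hi, (lo + hi) / 2, freq, cumulative))
    return rows

table = frequency_distribution(scores, lower=60, width=10, n_classes=4)
for lo, hi, mid, freq, cum in table:
    # Text histogram: the bar length equals the class frequency.
    print(f"{lo}-{hi}  midpoint {mid:.1f}  f = {freq}  cf = {cum}  {'#' * freq}")
```

Plotting the midpoints against f gives the frequency polygon; plotting the upper class limits against cf gives the cumulative frequency polygon.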
1-10
Describing Data: Numerical Measures
Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.
Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
1-11
Characteristics of the Mean
The arithmetic mean is the most widely used measure of location and shows the central value of the data. It is calculated by summing the values and dividing by the number of values.
The major characteristics of the mean are:
It requires the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
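A minimal Python sketch of the arithmetic mean, using a small hypothetical data set, which also checks the last property listed above: the deviations from the mean sum to zero.

```python
def arithmetic_mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)

data = [2, 4, 4, 5, 10]  # hypothetical observations
mean = arithmetic_mean(data)
deviations = [x - mean for x in data]

print(mean)             # 5.0
print(sum(deviations))  # 0.0: deviations from the mean always sum to zero
```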
1-12
The Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array.
The median is found at the (n+1)/2 ranked observation. For an even set of values this position falls halfway between the two middle values, so the median is the arithmetic average of the two middle numbers.
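A minimal Python sketch of the median rule above for ungrouped data; the two small lists are hypothetical examples covering the odd and even cases.

```python
def median(values):
    """Median of ungrouped data: the middle value of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th ranked observation, i.e. index n // 2.
        return ordered[mid]
    # Even n: (n+1)/2 falls halfway between ranks n/2 and n/2 + 1,
    # so average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3, 9, 5]))      # 5
print(median([7, 1, 3, 9, 5, 11]))  # 6.0
```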
1-13
The Mode
The mode is another measure of location and represents the value of the observation that appears most frequently.
Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and so on.
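A minimal Python sketch that returns every mode and labels the data as unimodal, bimodal, or trimodal. The example list is hypothetical. Note one edge case: if every value occurs exactly once, this function returns all of them, whereas many textbooks would say such data have no mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

def modality(values):
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    found = modes(values)
    return labels.get(len(found), f"{len(found)} modes")

print(modes([1, 2, 2, 3, 3, 4]), modality([1, 2, 2, 3, 3, 4]))  # [2, 3] bimodal
```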
1-14
Measures of Dispersion
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
Range = Largest value - Smallest value
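A minimal Python sketch of the range and the mean deviation for a hypothetical data set. The slide names the mean deviation without giving its formula; the sketch assumes the usual textbook definition MD = Σ|X - X̄| / n, the average absolute distance of the values from their mean.

```python
data = [2, 4, 4, 5, 10]  # hypothetical observations

# Range = largest value - smallest value
data_range = max(data) - min(data)

# Mean deviation (assumed definition): average absolute distance from the mean
mean = sum(data) / len(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(data_range, mean_deviation)  # 8 2.0
```

The range uses only the two extreme values, while the mean deviation uses every observation, which is one reason the range is so sensitive to a single outlier such as the 10 here.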
1-15
Sample Variance and Standard Deviation
Sample variance (s²):
$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$
Sample standard deviation (s):
$$ s = \sqrt{s^2} $$
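A minimal Python sketch of the two formulas above, applied to a hypothetical sample and cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 5, 10]  # hypothetical sample
n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
# Sample standard deviation: square root of the sample variance
s = math.sqrt(s2)

print(s2, s)  # 9.0 3.0
# The standard library uses the same n - 1 divisor.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```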
1-16
Chapter Four
Describing Data: Displaying and Exploring Data
Develop and interpret a stem-and-leaf display.
Develop and interpret a dot plot.
Compute and interpret quartiles, deciles, and percentiles.
Construct and interpret box plots.
Compute and understand the coefficient of variation and the coefficient of skewness.
Draw and interpret a scatter diagram.
Set up and interpret a contingency table.
1-17
Dot Plot
Dot plots:
Report the details of each observation.
Are useful for comparing two or more data sets.
1-18
Stem-and-Leaf Displays
Stem-and-leaf display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.
Note: an advantage of the stem-and-leaf display over a frequency distribution is that we do not lose the identity of each observation.
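A minimal Python sketch of a stem-and-leaf display, assuming non-negative integer data where the stem is the tens-and-above part and the leaf is the units digit. The data are hypothetical; empty stems are kept so that gaps in the data stay visible.

```python
def stem_and_leaf(values):
    """Stem = leading digits (tens and above), leaf = trailing (units) digit.

    Assumes non-negative integers; empty stems are kept so gaps stay visible.
    """
    ordered = sorted(values)
    display = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for v in ordered:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return display

data = [45, 62, 67, 71, 74, 75, 78, 81, 83, 84, 86, 88, 91, 93, 97]  # hypothetical
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every value can be rebuilt as stem × 10 + leaf, the display keeps the identity of each observation, which is the advantage noted above.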
1-19
Box Plots
A box plot is a graphical display, based on quartiles, that helps to picture a set of data.
Five pieces of data are needed to construct a box plot: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
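A minimal Python sketch that computes the five numbers needed for a box plot. The data are hypothetical. Quartile conventions differ between textbooks and software; this sketch assumes the `"exclusive"` method of `statistics.quantiles`, which places Q1 at the (n+1)/4 ranked position, in line with the (n+1)/2 rule used for the median.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum used to draw a box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

data = [13, 15, 16, 18, 20, 21, 22, 24, 25, 27, 30]  # hypothetical
print(five_number_summary(data))  # (13, 16.0, 21.0, 25.0, 30)
```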
1-20
[Figure: example box plot marking Min, Q1, Median, Q3, and Max on a horizontal scale from 12 to 32]
1-21
Skewness
Skewness is the measurement of the lack of symmetry of the distribution.
The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:
$$ sk = \frac{3(\bar{X} - \text{Median})}{s} $$
A value of 0 indicates a symmetric distribution.
Some software packages use a different formula, which results in a wider range for the coefficient.
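A minimal Python sketch of the coefficient of skewness formula above, using the sample standard deviation for s. The data set is hypothetical: its single large value pulls the mean above the median, which yields a positive coefficient.

```python
import statistics

def pearson_skewness(values):
    """sk = 3 * (mean - median) / s, with s the sample standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

data = [2, 4, 4, 5, 10]  # hypothetical; mean 5, median 4, s = 3
print(pearson_skewness(data))  # 1.0 (positively skewed)
```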
1-22
Scatter Diagram
Scatter diagram: A technique used to show the relationship between variables.
Variables must be at least interval scaled.
The relationship can be positive (direct) or negative (inverse).
Example
The twelve days of stock prices and the overall market index on each day are given as follows:
1-23
Estimation and Confidence Intervals
Construct a confidence interval for the population proportion.
1-24
Point and Interval Estimates
A point estimate is a single value (statistic) used to estimate a population value (parameter).
A confidence interval is a range of values within which the population parameter is expected to occur.
An interval estimate states the range within which a population parameter probably lies.
The two confidence intervals that are used extensively are the 95% and the 99%.
1-25
Point and Interval Estimates
If the population standard deviation is unknown, the underlying population is approximately normal, and the sample size is less than 30, we use the t distribution:
$$ \bar{X} \pm t \frac{s}{\sqrt{n}} $$
The value of t for a given confidence level depends upon its degrees of freedom (n - 1 when estimating a single mean).
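A minimal Python sketch of the small-sample t interval above. To stay within the standard library, it hard-codes a few two-sided 95% critical values from a standard t table (df = 4, 9, 19, 29) instead of computing them; the table dictionary and the sample are hypothetical illustrations, and a full implementation would look t up for any degrees of freedom.

```python
import math

# Two-sided 95% critical values of t from a standard t table, keyed by degrees of freedom.
T_CRITICAL_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def t_confidence_interval_95(values):
    """95% CI for the mean: X-bar +/- t * s / sqrt(n), with n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    t = T_CRITICAL_95[n - 1]  # KeyError if n - 1 is not in this small table
    margin = t * s / math.sqrt(n)
    return mean - margin, mean + margin

sample = [12, 14, 15, 15, 16, 17, 18, 18, 19, 16]  # hypothetical, n = 10
low, high = t_confidence_interval_95(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (14.492, 17.508)
```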
1-26
Constructing General Confidence Intervals for the Population Mean
Confidence interval for the mean:
$$ \bar{X} \pm z \frac{s}{\sqrt{n}} $$
95% CI for the population mean:
$$ \bar{X} \pm 1.96 \frac{s}{\sqrt{n}} $$
99% CI for the population mean:
$$ \bar{X} \pm 2.58 \frac{s}{\sqrt{n}} $$
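A minimal Python sketch of the z-based interval above. Rather than typing 1.96 and 2.58, it derives z from the standard normal distribution via `statistics.NormalDist().inv_cdf`; the sample summary (X̄ = 50, s = 8, n = 64) is hypothetical and chosen so that s/√n = 1.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, s, n, level=0.95):
    """Large-sample CI for the mean: X-bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample summary: X-bar = 50, s = 8, n = 64 (so s / sqrt(n) = 1)
print(z_confidence_interval(50, 8, 64, 0.95))
print(z_confidence_interval(50, 8, 64, 0.99))
```

The 99% interval is wider than the 95% interval for the same data: greater confidence is bought with less precision.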
1-27
[Diagram: Econometrics draws on Statistics, Economics, and Mathematics]
1-28
Linear Regression and Correlation
Draw a scatter diagram.
Understand and interpret the terms dependent variable and independent
variable.
Calculate and interpret the coefficient of correlation, the coefficient of
determination, and the standard error of estimate.
Conduct a test of hypothesis to determine if the population coefficient of
correlation is different from zero.
Calculate the least squares regression line and interpret the slope and intercept
values.
Construct and interpret a confidence interval and prediction interval for the
dependent variable.
Set up and interpret an ANOVA table.
1-29
Correlation Analysis
Correlation analysis is a group of statistical techniques to measure the association between two variables.
A scatter diagram is a chart that portrays the relationship between two variables.
[Scatter diagram: Advertising Minutes and $ Sales; horizontal axis Advertising Minutes (70 to 190), vertical axis Sales ($ thousands, 0 to 30)]
The dependent variable is the variable being predicted or estimated.
The independent variable provides the basis for estimation. It is the predictor variable.
1-30
The Coefficient of Correlation, r
The coefficient of correlation (r) is a measure of the strength of the relationship between two variables. It is also called Pearson's r and Pearson's product moment correlation coefficient.
It requires interval- or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
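A minimal Python sketch of Pearson's r, assuming the usual product moment definition r = Σ(X - X̄)(Y - Ȳ) / √(Σ(X - X̄)² Σ(Y - Ȳ)²). The three Y series are hypothetical and illustrate perfect positive, perfect negative, and imperfect direct correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2 * v + 1 for v in x]), 4))  # 1.0: perfect positive
print(round(pearson_r(x, [10 - v for v in x]), 4))     # -1.0: perfect negative
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 4))         # 0.7746: direct, not perfect
```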
1-31
[Figure: Perfect Negative Correlation, a scatter of Y against X with both axes running from 0 to 10]
1-32
[Figure: Perfect Positive Correlation, a scatter of Y against X with both axes running from 0 to 10]
1-33
[Figure: Zero Correlation, a scatter of Y against X with both axes running from 0 to 10]
1-34
[Figure: Strong Positive Correlation, a scatter of Y against X with both axes running from 0 to 10]
1-35
Coefficient of Determination
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of the relationship between the variables.
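A minimal Python sketch showing that r² loses the direction of the relationship: two hypothetical data sets with the same pattern, one direct and one inverse, have opposite signs of r but the same r². The `pearson_r` helper repeats the product moment formula so the block runs on its own.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]       # hypothetical, direct relationship
y_down = [-v for v in y_up]  # same pattern, inverse relationship

for y in (y_up, y_down):
    r = pearson_r(x, y)
    print(f"r = {r:+.4f}, r^2 = {r * r:.2f}")
# r = +0.7746, r^2 = 0.60
# r = -0.7746, r^2 = 0.60
```

In both cases about 60% of the variation in Y is accounted for by X, but only the sign of r tells the two situations apart.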
1-36
Regression Analysis
In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).
The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the equation; that is, the term $\sum (Y - \hat{Y})^2$ is minimized.
1-37
Regression Analysis
The regression equation is $\hat{Y} = a + bX$, where
$\hat{Y}$ is the average predicted value of Y for any X,
a is the Y-intercept, that is, the estimated Y value when X = 0, and
b is the slope of the line, or the average change in $\hat{Y}$ for each change of one unit in X.
The least squares principle is used to obtain a and b.
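A minimal Python sketch of the least squares line, assuming the standard closed-form solution b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄, which minimizes Σ(Y - Ŷ)². The X and Y values are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for Y-hat = a + bX minimizing sum((Y - Y-hat)**2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
a, b = least_squares(x, y)
print(f"Y-hat = {a:.2f} + {b:.2f}X")       # Y-hat = 2.20 + 0.60X
print(f"Y-hat at X = 6: {a + b * 6:.2f}")  # Y-hat at X = 6: 5.80
```

Here b = 0.60 means the predicted Y rises by 0.6 for each one-unit increase in X, and a = 2.20 is the predicted Y when X = 0.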
1-38
Thank you
