Introduction
Difference between chi-square and regression: the chi-square test of independence is used to determine whether a statistical relationship exists between two variables. The chi-square test tells us whether there is such a relationship, but it does not tell us what that relationship is. Regression and correlation analyses, by contrast, determine both the nature and the strength of a relationship between two variables.
Regression analysis is a body of statistical methods
dealing with the formulation of mathematical models that
depict relationships among variables, and the use of these
modeled relationships for the purpose of prediction and other
statistical inferences.
The word regression was first used in its present technical context by Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents.
26/05/12
Models
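[Figure: an observation $y_i$ at $x = x_i$ lies off the true line, whose height at $x_i$ is $\alpha + \beta x_i$]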
Statistical Model
$$Y_i = \alpha + \beta x_i + e_i, \quad i = 1, \ldots, n$$
where:
a) $x_1, x_2, \ldots, x_n$ are the set values of the controlled variable $x$ that the experimenter has selected for the study.
b) $e_1, e_2, \ldots, e_n$ are the unknown error components that are superimposed on the true linear relation. These are unobservable random variables, which we assume are independently and normally distributed with a mean of zero and an unknown variance $\sigma^2$.
c) The parameters $\alpha$ and $\beta$, which together locate the straight line, are unknown.
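To make the roles of $x_i$, $\alpha$, $\beta$ and $e_i$ concrete, here is a minimal Python sketch (standard library only) that generates responses from this model. The helper name `simulate_linear_model` and the values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$ are hypothetical illustration choices; in a real study the parameters are unknown.

```python
import random


def simulate_linear_model(xs, alpha, beta, sigma, seed=None):
    """Generate Y_i = alpha + beta * x_i + e_i for each fixed x_i.

    The errors e_i are drawn independently from N(0, sigma^2), matching
    assumption (b); the x_i are treated as fixed, matching assumption (a).
    """
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical design points chosen by the experimenter.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
# Hypothetical "true" parameters; an analyst would not know these.
ys = simulate_linear_model(xs, alpha=2.0, beta=0.5, sigma=1.0, seed=42)
for x, y in zip(xs, ys):
    print(x, round(y, 3))
```

Setting `sigma=0.0` removes the error term, so every simulated point falls exactly on the line $\alpha + \beta x_i$.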
Basic Notations
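$$S_x^2 = \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2$$
$$S_y^2 = \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n y_i^2 - n\bar{y}^2$$
$$S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$.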
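The two forms of each sum of squares give the same number; the deviation form is usually more stable numerically, while the shortcut form is handy for hand calculation. A minimal Python sketch (standard library only); the function names and the three-point data set are hypothetical:

```python
import math


def basic_notations(xs, ys):
    """Return (S_x^2, S_y^2, S_xy) from the deviation (definition) form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def basic_notations_shortcut(xs, ys):
    """Same quantities via the computing form: sum of squares minus n * mean^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return s_xx, s_yy, s_xy


# Both forms agree (up to floating-point rounding) on any data set.
xs, ys = [1, 2, 3], [2, 4, 7]
print(basic_notations(xs, ys))
print(basic_notations_shortcut(xs, ys))
```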
Example
Zippy Cola is studying the effect of its
latest advertising campaign. People
chosen at random were called and asked
how many cans of Zippy Cola they had
bought in the past week and how many
Zippy Cola advertisements they had either
read or seen in the past week.
X (number of ads)     3   7   4   2   0   4   1   2
Y (cans purchased)   11  18   9   4   7   6   3   8
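Applying the basic notations to the Zippy Cola data is a short computation. A minimal Python sketch (standard library only); the variable names are illustrative:

```python
import math

# Zippy Cola sample: X = ads seen or read, Y = cans purchased (n = 8)
ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cans) / n

# Basic notations via the computing form: sum of squares/products minus n * means
s_xx = sum(x * x for x in ads) - n * x_bar ** 2
s_yy = sum(y * y for y in cans) - n * y_bar ** 2
s_xy = sum(x * y for x, y in zip(ads, cans)) - n * x_bar * y_bar

print(f"n = {n}, x_bar = {x_bar}, y_bar = {y_bar}")
print(f"S_x^2 = {s_xx}, S_y^2 = {s_yy}, S_xy = {s_xy}")
```

For these data $\bar{x} = 2.875$, $\bar{y} = 8.25$, $S_x^2 = 32.875$, $S_y^2 = 155.5$ and $S_{xy} = 56.25$; every later quantity in the section can be built from these sums.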
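a) $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
b) $\hat{\beta} = \dfrac{S_{xy}}{S_x^2}$
c) $SSE = S_y^2 - \hat{\beta}^2 S_x^2 = \sum_{i=1}^n (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$
d) $s^2 = \dfrac{SSE}{n-2}$, with $E(s^2) = \sigma^2$ and $E(s) \neq \sigma$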
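A minimal sketch of the estimators above in Python (standard library only). The function name `fit_simple_regression` is hypothetical; the example reuses the Zippy Cola data and checks that the shortcut $SSE = S_y^2 - \hat\beta^2 S_x^2$ matches the sum of squared residuals.

```python
import math


def fit_simple_regression(xs, ys):
    """Least-squares fit of y = alpha + beta * x.

    Returns (alpha_hat, beta_hat, sse, s2) using
    beta_hat = S_xy / S_x^2, alpha_hat = y_bar - beta_hat * x_bar,
    SSE = S_y^2 - beta_hat^2 * S_x^2 and s^2 = SSE / (n - 2).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = s_yy - beta_hat ** 2 * s_xx
    s2 = sse / (n - 2)
    return alpha_hat, beta_hat, sse, s2


ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
alpha_hat, beta_hat, sse, s2 = fit_simple_regression(ads, cans)

# Cross-check: SSE also equals the sum of squared residuals.
sse_direct = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(ads, cans))
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"SSE = {sse:.4f} (direct: {sse_direct:.4f}), s^2 = {s2:.4f}")
```

For these data the fitted line is roughly $\hat{y} = 3.33 + 1.71x$: about 1.7 more cans per additional ad seen.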
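Confidence intervals, where $t_{(1-CI)/2}$ is the upper $(1-CI)/2$ point of the $t$ distribution with $n - 2$ d.f.:
$$\hat{\alpha} \pm t_{(1-CI)/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_x^2}}$$
$$\hat{\beta} \pm t_{(1-CI)/2}\, \frac{s}{S_x}$$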
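A minimal Python sketch of these two intervals for the Zippy Cola data (standard library only). The function name is hypothetical, and because Python's standard library has no $t$ quantile function the critical value is passed in; 2.447 is the tabulated two-sided 95% value for 6 d.f.

```python
import math


def confidence_intervals(xs, ys, t_crit):
    """Confidence intervals for alpha and beta in y = alpha + beta * x.

    t_crit is the t critical value t_{(1-CI)/2} with n - 2 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    s = math.sqrt((s_yy - beta_hat ** 2 * s_xx) / (n - 2))

    se_alpha = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    se_beta = s / math.sqrt(s_xx)  # S_x is the square root of S_x^2
    ci_alpha = (alpha_hat - t_crit * se_alpha, alpha_hat + t_crit * se_alpha)
    ci_beta = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
    return ci_alpha, ci_beta


ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
# 95% interval, n - 2 = 6 d.f.: t_{0.025} = 2.447 (from a t table)
ci_alpha, ci_beta = confidence_intervals(ads, cans, t_crit=2.447)
print("95% CI for alpha:", [round(v, 3) for v in ci_alpha])
print("95% CI for beta: ", [round(v, 3) for v in ci_beta])
```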
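Inference about $\alpha$
$H_0: \alpha = \alpha_0$ vs $H_1: \alpha \neq \alpha_0$
$$t = \frac{\hat{\alpha} - \alpha_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_x^2}}}, \quad d.f. = n - 2$$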
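A minimal sketch of this test in Python for the Zippy Cola data, taking the hypothetical null value $\alpha_0 = 0$ (zero expected purchases when no ads are seen). The function name is illustrative, and the critical value 2.447 (two-sided 5%, 6 d.f.) is taken from a $t$ table.

```python
import math


def t_test_intercept(xs, ys, alpha_0=0.0):
    """t statistic for H0: alpha = alpha_0 vs H1: alpha != alpha_0 (d.f. = n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    s = math.sqrt((s_yy - beta_hat ** 2 * s_xx) / (n - 2))
    t = (alpha_hat - alpha_0) / (s * math.sqrt(1 / n + x_bar ** 2 / s_xx))
    return t, n - 2


ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
t, df = t_test_intercept(ads, cans, alpha_0=0.0)
t_crit = 2.447  # two-sided 5% level, 6 d.f., from a t table
print(f"t = {t:.3f} with {df} d.f.; reject H0: {abs(t) > t_crit}")
```

Here $|t| \approx 1.73 < 2.447$, so at the 5% level these eight observations do not show an intercept different from zero.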
The analysis of variance is based on the decomposition
$$S_y^2 = \hat{\beta}^2 S_x^2 + SSE$$
Total SS of $y$ = SS explained by the linear relation + residual SS (unexplained)
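A quick numerical check of this decomposition on the Zippy Cola data, as a minimal Python sketch (standard library only; variable names illustrative). The residual sum of squares is computed from the fitted line rather than from the identity, so the check is not circular.

```python
import math

ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(ads)
x_bar, y_bar = sum(ads) / n, sum(cans) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cans)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cans))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

explained = beta_hat ** 2 * s_xx  # SS explained by the linear relation
# Residual SS computed directly from the fitted line, not from the identity
residual = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(ads, cans))
print(f"total = {s_yy:.4f}, explained = {explained:.4f}, residual = {residual:.4f}")
print("decomposition holds:", math.isclose(s_yy, explained + residual))
```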
Source       Sum of Squares   d.f.    Mean Squares          F
Regression   SSR              1       MSR = SSR/1           MSR/MSE
Error        SSE              n - 2   MSE = SSE/(n - 2)
Total        SST              n - 1
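A minimal Python sketch that fills in this table for the Zippy Cola data (standard library only; `anova_table` is a hypothetical helper). It uses $SSR = \hat\beta^2 S_x^2 = S_{xy}^2 / S_x^2$ and $SST = S_y^2$.

```python
def anova_table(xs, ys):
    """Return the simple-regression ANOVA quantities as a dict."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)  # total SS of y (S_y^2)
    ssr = s_xy ** 2 / s_xx  # equals beta_hat^2 * S_x^2
    sse = sst - ssr
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr, "MSE": mse,
            "F": msr / mse, "df": (1, n - 2, n - 1)}


ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
table = anova_table(ads, cans)
df_reg, df_err, df_tot = table["df"]
print(f"{'Source':<12}{'SS':>10}{'d.f.':>6}{'MS':>10}{'F':>8}")
print(f"{'Regression':<12}{table['SSR']:>10.3f}{df_reg:>6}{table['MSR']:>10.3f}{table['F']:>8.3f}")
print(f"{'Error':<12}{table['SSE']:>10.3f}{df_err:>6}{table['MSE']:>10.3f}")
print(f"{'Total':<12}{table['SST']:>10.3f}{df_tot:>6}")
```

The F ratio is then referred to an F distribution with 1 and $n - 2$ d.f.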
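$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$0 \le R^2 \le 1$, or equivalently $0\% \le R^2 \le 100\%$
$R^2 = 1$: perfectly fitted regression line
$R^2 = 0$: the regression model does not fit (it explains none of the variation in $y$)
The sample correlation coefficient is
$$r = \frac{S_{xy}}{\sqrt{S_x^2 \cdot S_y^2}}$$
and in simple linear regression $R^2 = r^2$.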
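For the Zippy Cola data, a minimal Python sketch (standard library only; variable names illustrative) that computes $R^2$ from the ANOVA sums and confirms that, in simple linear regression, it equals the square of $r$.

```python
import math

ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(ads)
x_bar, y_bar = sum(ads) / n, sum(cans) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cans)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cans))

ssr = s_xy ** 2 / s_xx  # SS explained by the regression
sst = s_yy  # total SS of y
r_squared = ssr / sst
r = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation coefficient

print(f"R^2 = {r_squared:.4f} ({r_squared:.1%}), r = {r:.4f}, r^2 = {r * r:.4f}")
```

About 62% of the variation in cans purchased is explained by the number of ads.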
Exercise
PUSKESMAS PANCORAN MAS (a community health center) wants to know the relationship between age and the blood pressure of its patients. A sample of 10 patients was taken, with the following results:

Age               38   36   72   42   68   63   49   56   60   55
Blood pressure   115  118  160  140  152  149  145  147  155  150
Assumptions
a) Absence of multicollinearity
b) No outliers
c) Independence of errors: this assumes a between-subjects design. Other forms of the model are needed if the design is within-subjects.
Background
Logits (log odds, $\ln[p/(1-p)]$) are continuous, like z scores:
p = 0.50, then logit = 0
p = 0.70, then logit ≈ 0.85
p = 0.30, then logit ≈ -0.85
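A minimal Python sketch (standard library only; `logit` is a hypothetical helper name) that computes the logit for the three probabilities listed. The logit is 0 at $p = 0.5$ and symmetric about it, much as z scores are symmetric about 0.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


for p in (0.30, 0.50, 0.70):
    print(f"p = {p:.2f} -> logit = {logit(p):+.3f}")
```

The exact value for $p = 0.70$ is $\ln(7/3) \approx 0.847$.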
$$Y \mid X = B_0 + B_1 X_1$$
$$E(Y \mid X) = B_0 + B_1 X_1$$
An expected value is a mean, so for a 0/1 outcome $Y$:
$$E(Y \mid X) = P(Y = 1 \mid X)$$
$$Y_i = \frac{e^u}{1 + e^u}, \qquad u = A + B_1 X_1 + B_2 X_2 + \cdots + B_K X_K$$
$$Y_i = \frac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}$$
The logistic function
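A minimal Python sketch of the logistic function (standard library only). The helper names are hypothetical; the coefficients $b_0 = -4.00$, $b_1 = 0.05$ are borrowed from one of the illustrative curves in this section. The two-branch form computes the same value as $e^u/(1+e^u)$ but avoids overflow in `math.exp` for large $u$.

```python
import math


def logistic(u):
    """e^u / (1 + e^u), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def predicted_probability(coefs, xs):
    """P(Y = 1 | X) for u = b0 + b1*X1 + ... + bK*XK.

    coefs = [b0, b1, ..., bK]; xs = [X1, ..., XK].
    """
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one coefficient per predictor plus an intercept")
    u = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
    return logistic(u)


# Hypothetical single-predictor model with b0 = -4.00, b1 = 0.05
for x in (40, 80, 120):
    print(x, round(predicted_probability([-4.0, 0.05], [x]), 3))
```

With these coefficients the probability passes 0.5 exactly where $b_0 + b_1 X_1 = 0$, here at $X_1 = 80$.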
Logistic Function: constant regression constant, different slopes
[Figure: probability (0.0 to 1.0) against X (30 to 100), with b0 = -4.00 in every curve; v2: b1 = 0.05 (middle), v3: b1 = 0.15 (top), v4: b1 = 0.025 (bottom)]
Logistic Function: constant slope, different regression constants
[Figure: probability (0.0 to 1.0) against X (30 to 100), with b1 = 0.05 in every curve; v2: b0 = -3.00 (top), v3: b0 = -4.00 (middle), v4: b0 = -5.00 (bottom)]
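The ordering of the curves in both panels can be checked numerically. A minimal Python sketch (standard library only; helper names hypothetical) evaluates the logistic function at X = 30, 40, ..., 100 for each coefficient pair listed in the two panels.

```python
import math


def logistic(u):
    """e^u / (1 + e^u), stable for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


XS = list(range(30, 101, 10))  # the X axis range shown in the panels


def curve(b0, b1):
    """Probabilities along the X grid for one (b0, b1) pair."""
    return [logistic(b0 + b1 * x) for x in XS]


# Panel 1: same constant b0 = -4.00, different slopes (top to bottom)
panel_slopes = {b1: curve(-4.0, b1) for b1 in (0.15, 0.05, 0.025)}
# Panel 2: same slope b1 = 0.05, different constants (top to bottom)
panel_constants = {b0: curve(b0, 0.05) for b0 in (-3.0, -4.0, -5.0)}

for title, panel in (("b0 = -4.00, varying b1", panel_slopes),
                     ("b1 = 0.05, varying b0", panel_constants)):
    print(title)
    for key, probs in panel.items():
        print(f"  {key:>6}: " + " ".join(f"{p:.2f}" for p in probs))
```

With $b_0$ fixed, a larger positive slope raises the curve at every positive X and makes the S-shape steeper; with $b_1$ fixed, changing $b_0$ moves the point where the probability reaches 0.5 ($X = -b_0/b_1$) without changing the steepness.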
The Logit
$$\frac{P(Y = 1 \mid X_i)}{1 - P(Y = 1 \mid X_i)} = \frac{\pi}{1 - \pi} = \exp(b_0 + b_1 X_{1i}), \qquad \pi = P(Y = 1 \mid X_i)$$
The Logit
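$$\ln\left[\frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right] = \ln\left(\frac{\pi}{1 - \pi}\right) = b_0 + b_1 X_1$$
For a single predictor

$$\ln\left(\frac{\pi}{1 - \pi}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
For multiple predictors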
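A minimal Python sketch (standard library only) showing that the logit of the logistic probability returns the linear predictor, and that the odds $\pi/(1-\pi)$ equal $\exp(b_0 + b_1X_1 + b_2X_2)$. The coefficients and predictor values are hypothetical.

```python
import math


def logistic(u):
    """Inverse of the logit: converts log odds u back to a probability."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def log_odds(p):
    """The logit ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical two-predictor model: b0, b1, b2 and one case (X1, X2)
b = [-4.0, 0.05, 0.30]
x = [60.0, 2.0]
u = b[0] + b[1] * x[0] + b[2] * x[1]  # linear predictor b0 + b1*X1 + b2*X2
p = logistic(u)

print(f"u = {u:.3f}, p = {p:.4f}")
print(f"odds p/(1-p) = {p / (1 - p):.4f} vs exp(u) = {math.exp(u):.4f}")
print(f"logit(p) = {log_odds(p):.4f} (recovers u)")
```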
The Logit
Conversion
THANK YOU
GOOD LUCK