MULTIPLE REGRESSION AND PARTIAL CORRELATION
OBJECTIVES:
1. Build a model using two or more explanatory variables.
2. Test a hypothesis to assess the validity of a multiple regression model.
3. Carry out individual hypothesis tests to discard the variables that are not significant in a multiple regression model.
4. Compute and interpret multiple measures of association.

This chapter must cover:
1. The multiple regression model.
2. Derivation of the best multiple regression equation.
3. Inferences about population parameters.

THE MULTIPLE REGRESSION MODEL
Multiple regression and multiple correlation analysis consist of estimating a dependent variable from two or more independent variables. The generic model is:
$$ Y = f(X_1, X_2, X_3, \ldots) $$

where Y is the dependent variable and X_1, X_2, X_3, ... are the independent variables.
MULTIPLE REGRESSION EQUATION
The symbolic form of the linear equation with two independent variables is:

$$ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 $$

Where:
$\hat{Y}$: estimated value of the dependent variable.
$b_0$: intercept with the Y axis.
$X_1, X_2$: values of the two independent variables.
$b_1, b_2$: slopes associated with X1 and X2, respectively.
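As an illustration of how b_0, b_1 and b_2 can be obtained, the following minimal Python sketch fits the two-predictor equation by ordinary least squares with NumPy. The function name fit_two_predictors and the small data set are hypothetical and are not part of the original text; the data were chosen to lie exactly on a known plane so that the recovered coefficients are easy to check.

```python
import numpy as np

def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) minimizing the sum of squared residuals."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then X1 and X2.
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return tuple(coeffs)

# Hypothetical data lying exactly on the plane Y = 1 + 2*X1 + 3*X2.
x1_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
x2_demo = [1.0, 0.0, 2.0, 1.0, 3.0]
y_demo = [1 + 2 * a + 3 * b for a, b in zip(x1_demo, x2_demo)]
b0, b1, b2 = fit_two_predictors(x1_demo, x2_demo, y_demo)
```

With these made-up values the fit should recover an intercept close to 1 and slopes close to 2 and 3.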
APPLICATION PROBLEM 1:

Carry out a complete multiple linear regression analysis of the following data and interpret the results.
The variables are:
X1: cholesterol dose (mg per day)
X2: mean total blood cholesterol (mg)
X3: initial weight (kg)
X4: ratio of final weight to initial weight
X5: mean food intake per kg of initial weight (g per day)
Y: degree of arteriosclerosis

Obs   X1    X2    X3     X4     X5   Y
1     30    424   2.46   0.90   18   2
2     30    313   2.39   0.91   10   0
3     35    243   2.75   0.95   30   2
4     35    365   2.19   0.95   21   2
5     43    396   2.67   1.00   39   3
6     43    356   2.74   0.79   19   2
7     44    346   2.55   1.26   56   3
8     44    156   2.58   0.95   28   0
9     44    278   2.49   1.10   42   4
10    44    349   2.52   0.88   21   1
11    44    141   2.36   1.29   56   1
12    44    245   2.36   0.97   24   1
13    45    297   2.56   1.11   45   3
14    45    310   2.62   0.94   20   2
15    45    151   3.39   0.96   35   3
16    45    370   3.57   0.88   15   4
17    45    379   1.98   1.47   64   4
18    45    463   2.06   1.05   31   3
19    45    316   2.45   1.32   60   4
20    45    280   2.25   1.08   36   4
21    44    395   2.15   1.01   27   1
22    49    139   2.20   1.36   59   0
23    49    245   2.05   1.13   37   4
24    49    373   2.15   0.88   25   1
25    51    224   2.15   1.18   54   3
26    51    677   2.10   1.16   33   4
27    51    424   2.10   1.40   59   4
28    51    150   2.10   1.05   30   0
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       0.732(a)   0.536      0.431               1.088
a. Predictors: (Constant), X5, X2, X3, X1, X4

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   30.083           5    6.017         5.086   0.003(a)
Residual     26.025           22   1.183
Total        56.107           27
a. Predictors: (Constant), X5, X2, X3, X1, X4
b. Dependent Variable: Y

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t        Sig.
(Constant)   -7.106   4.357                                            -1.631   0.117
X1           0.026    0.043                0.101                       0.603    0.553
X2           0.007    0.002                0.610                       3.854    0.001
X3           1.613    0.682                0.419                       2.365    0.027
X4           0.253    3.845                0.031                       0.066    0.948
X5           0.049    0.043                0.533                       1.138    0.267
a. Dependent Variable: Y
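The entries of the Model Summary and ANOVA tables are linked by simple formulas, which gives a quick consistency check on the output. The sketch below recomputes them from the sums of squares with n = 28 observations and k = 5 predictors. The function name anova_summary is illustrative, and small discrepancies in the last digit are to be expected because the inputs are already rounded.

```python
import math

def anova_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Derive fit statistics from the sums of squares of an ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_square = ss_regression / ss_total
    # Adjusted R^2 penalizes R^2 for the number of predictors used.
    adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
    return {
        "R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": adj_r_square,
        "Std. Error of the Estimate": math.sqrt(ms_residual),
        "F": ms_regression / ms_residual,
    }

# Sums of squares taken from the ANOVA table of the five-predictor model.
full_model = anova_summary(30.083, 26.025, n_obs=28, n_predictors=5)
```

The values in full_model can be compared with the printed R, R Square, Adjusted R Square, standard error and F.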
The results indicate that the general model for the data is:

$$ \hat{Y} = -7.106 + 0.026 X_1 + 0.007 X_2 + 1.613 X_3 + 0.253 X_4 + 0.049 X_5 $$
To find out which independent variables individually explain the degree of arteriosclerosis to a significant extent, a null hypothesis must be tested for each coefficient separately:

$$ H_0: b_j = 0, \quad j = 1, \ldots, 5 $$

(The joint hypothesis b_1 = b_2 = b_3 = b_4 = b_5 = 0 is the one assessed by the F test of the ANOVA table.) Looking at the t values and the significance of each variable in the preceding Coefficients table, the only significant variables are X2 and X3. The model that should be kept is therefore:

$$ Y = b_0 + b_2 X_2 + b_3 X_3 + e $$
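The screening rule applied here amounts to keeping the predictors whose reported significance falls below a chosen level. A minimal sketch follows, assuming a 5% level (the text does not state one) and using the Sig. values of the full-model Coefficients table; the helper name is hypothetical.

```python
ALPHA = 0.05  # assumed significance level; the source does not state one

# Sig. column of the full-model Coefficients table (predictors only).
p_values = {"X1": 0.553, "X2": 0.001, "X3": 0.027, "X4": 0.948, "X5": 0.267}

def significant_predictors(p_by_name, alpha=ALPHA):
    """Return the predictors whose reported p-value is below alpha."""
    return [name for name, p in p_by_name.items() if p < alpha]

retained = significant_predictors(p_values)
```

Screening on the full-model t tests is only a first pass, since each p-value depends on which other variables are in the model.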
The regression is therefore re-estimated using only these two variables.

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t        Sig.
(Constant)   -1.216   1.954                                            -0.622   0.539
X2           0.006    0.002                0.465                       2.581    0.016
X3           0.719    0.694                0.187                       1.037    0.310
a. Dependent Variable: Y

According to these results, the regression equation is:

$$ \hat{Y} = -1.216 + 0.006 X_2 + 0.719 X_3 $$

The ANOVA for this model is:

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   12.238           2    6.119         3.487   0.046(a)
Residual     43.869           25   1.755
Total        56.107           27
a. Predictors: (Constant), X3, X2
b. Dependent Variable: Y

Note that in this reduced model X3 is no longer significant (Sig. = 0.310).
USING THE STEPWISE METHOD TO DETERMINE THE BEST MULTIPLE REGRESSION EQUATION

ANOVA(d)
Model             Sum of Squares   df   Mean Square   F       Sig.
1   Regression    10.352           1    10.352        5.882   0.023(a)
    Residual      45.755           26   1.760
    Total         56.107           27
2   Regression    21.922           2    10.961        8.016   0.002(b)
    Residual      34.186           25   1.367
    Total         56.107           27
3   Regression    29.652           3    9.884         8.967   0.000(c)
    Residual      26.455           24   1.102
    Total         56.107           27
a. Predictors: (Constant), X2
b. Predictors: (Constant), X2, X5
c. Predictors: (Constant), X2, X5, X3
d. Dependent Variable: Y

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B        Std. Error           Beta                        t        Sig.
1   (Constant)    0.665    0.727                                            0.915    0.369
    X2            0.005    0.002                0.430                       2.425    0.023
2   (Constant)    -1.161   0.897                                            -1.294   0.208
    X2            0.006    0.002                0.512                       3.226    0.003
    X5            0.043    0.015                0.462                       2.909    0.008
3   (Constant)    -5.816   1.934                                            -3.008   0.006
    X2            0.008    0.002                0.614                       4.161    0.000
    X5            0.056    0.014                0.607                       3.975    0.001
    X3            1.560    0.589                0.405                       2.648    0.014
a. Dependent Variable: Y
Excluded Variables(d)
Model      Beta In     t        Sig.    Partial Correlation   Tolerance (Collinearity Statistics)
1   X1     0.265(a)    1.535    0.137   0.294                 0.998
    X3     0.187(a)    1.037    0.310   0.203                 0.964
    X4     0.373(a)    2.266    0.032   0.413                 0.998
    X5     0.462(a)    2.909    0.008   0.503                 0.968
2   X1     0.061(b)    0.338    0.738   0.069                 0.768
    X3     0.405(b)    2.648    0.014   0.476                 0.840
    X4     -0.468(b)   -1.021   0.317   -0.204                0.116
3   X1     0.100(c)    0.613    0.546   0.127                 0.762
    X4     0.005(c)    0.010    0.992   0.002                 0.094
a. Predictors in the Model: (Constant), X2
b. Predictors in the Model: (Constant), X2, X5
c. Predictors in the Model: (Constant), X2, X5, X3
d. Dependent Variable: Y

Consequently, the model that best explains the data is:

$$ \hat{Y} = -5.816 + 0.008 X_2 + 0.056 X_5 + 1.560 X_3 $$
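The idea behind the procedure can be sketched in a few lines of Python. The code below is a simplified forward selection, not the exact stepwise routine of the statistical package used above: it only adds variables (it never removes one already entered) and it uses a fixed F-to-enter threshold of 4.0 instead of a significance level. Both the threshold and the function names are assumptions made for illustration, and the demonstration data are artificial. The sketch also assumes more observations than predictors and a fit that is not perfect.

```python
import numpy as np

def residual_ss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return float(residuals @ residuals)

def forward_selection(X, y, f_to_enter=4.0):
    """Add one predictor at a time while the best partial F exceeds f_to_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_obs, n_vars = X.shape
    selected = []
    sse_current = residual_ss([], y)  # intercept-only model
    while len(selected) < n_vars:
        best = None
        for j in range(n_vars):
            if j in selected:
                continue
            cols = [X[:, k] for k in selected + [j]]
            sse_new = residual_ss(cols, y)
            df_resid = n_obs - len(cols) - 1
            # Partial F: drop in residual SS relative to the new error variance.
            f_stat = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, sse_new)
        if best[1] < f_to_enter:
            break
        selected.append(best[0])
        sse_current = best[2]
    return selected

# Artificial, mutually orthogonal predictors; y depends on columns 0 and 2 only.
c0 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
c1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
c2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
noise = 0.5 * c0 * c1 * c2
X_demo = np.column_stack([c0, c1, c2])
y_demo = 3 * c0 + 2 * c2 + noise
selected_demo = forward_selection(X_demo, y_demo)
```

In the artificial example the response depends on the first and third columns only, so those are the ones expected to enter. Passing the five predictor columns of Problem 1 as X and the degree of arteriosclerosis as y would apply the same procedure to the data of this section.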
ANOTHER FORM OF ANALYSIS

We first determine the simple linear regression of Y on each of the independent variables.

Model Summary (one simple regression per predictor)
Predictor   R       R Square   Adjusted R Square   Std. Error of the Estimate
X1          0.245   0.060      0.024               1.424
X2          0.430   0.185      0.153               1.327
X3          0.099   0.010      -0.028              1.462

SIMPLE CORRELATIONS   VALUE
r_yx1                 0.245
r_yx2                 0.430
r_yx3                 0.099
r_yx4                 0.353
r_yx5                 0.370

According to these results, the largest simple correlation is that of X2, so X2 is the first variable to enter the model. The regression equation is then:

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B       Std. Error            Beta                        t       Sig.
(Constant)   0.665   0.727                                             0.915   0.369
X2           0.005   0.002                 0.430                       2.425   0.023
a. Dependent Variable: Y

$$ \hat{Y} = 0.665 + 0.005 X_2 $$
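Two facts used in this first step can be checked numerically: the first variable to enter is the one with the largest simple correlation in absolute value, and in a simple regression the t statistic of the slope follows from r and n as t = r sqrt(n - 2) / sqrt(1 - r^2). The sketch below uses the correlations listed above with n = 28. The names are illustrative, and since r is rounded to three decimals the result only approximates the reported t = 2.425.

```python
import math

# Simple correlations of Y with each predictor, as listed in the text.
simple_r = {"X1": 0.245, "X2": 0.430, "X3": 0.099, "X4": 0.353, "X5": 0.370}

def t_from_correlation(r, n_obs):
    """t statistic for the slope of a simple regression with correlation r."""
    return r * math.sqrt(n_obs - 2) / math.sqrt(1 - r * r)

# The predictor with the largest |r| is the first candidate to enter.
first_to_enter = max(simple_r, key=lambda name: abs(simple_r[name]))
t_first = t_from_correlation(simple_r[first_to_enter], n_obs=28)
```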
Next we evaluate the partial correlations of Y with the remaining variables, holding X2 fixed.

Partial correlations with Y (control variable: X2)
Variable   Correlation   Significance (2-tailed)   df
X1         0.294         0.137                     25
X3         0.203         0.310                     25
X4         0.413         0.032                     25
X5         0.503         0.008                     25
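A first-order partial correlation can be computed from simple correlations alone: r_{yx.z} = (r_{yx} - r_{yz} r_{xz}) / sqrt((1 - r_{yz}^2)(1 - r_{xz}^2)). The sketch below applies this to X5 with X2 held fixed. One input is an assumption: the correlation between X5 and X2 is not printed in the source, so its magnitude is recovered from the Tolerance of X5 in the stepwise Excluded Variables table (0.968 = 1 - r^2) and its sign is assumed to be negative. Function names are illustrative.

```python
import math

def first_order_partial(r_yx, r_yz, r_xz):
    """Correlation between Y and X after removing the linear effect of Z."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz**2) * (1 - r_xz**2))

def t_from_partial(r_partial, df):
    """t statistic testing a partial correlation with df degrees of freedom."""
    return r_partial * math.sqrt(df) / math.sqrt(1 - r_partial**2)

r_y_x5 = 0.370  # simple correlation of Y with X5 (from the text)
r_y_x2 = 0.430  # simple correlation of Y with X2 (from the text)
# ASSUMPTION: magnitude derived from Tolerance = 0.968; negative sign assumed.
r_x5_x2 = -math.sqrt(1 - 0.968)

partial_y_x5 = first_order_partial(r_y_x5, r_y_x2, r_x5_x2)
t_x5 = t_from_partial(partial_y_x5, df=25)
```

Under that assumption the result is close to the tabulated partial correlation of 0.503 and to the t value of 2.909 with which X5 enters the model.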
According to these results, the largest partial correlation corresponds to X5, so the new model is:

$$ Y = b_0 + b_2 X_2 + b_5 X_5 + e $$

Fitting this model to the data gives:

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   21.922           2    10.961        8.016   0.002(a)
Residual     34.186           25   1.367
Total        56.107           27
a. Predictors: (Constant), X5, X2
b. Dependent Variable: Y

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t        Sig.
(Constant)   -1.161   0.897                                            -1.294   0.208
X2           0.006    0.002                0.512                       3.226    0.003
X5           0.043    0.015                0.462                       2.909    0.008
a. Dependent Variable: Y

so the fitted equation is:

$$ \hat{Y} = -1.161 + 0.006 X_2 + 0.043 X_5 $$
Now we hold X2 and X5 fixed and find the partial correlations of the remaining variables.

Partial correlations with Y (control variables: X2 and X5)
Variable   Correlation   Significance (2-tailed)   df
X1         0.069         0.738                     24
X3         0.476         0.014                     24
X4         -0.204        0.317                     24
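With two or more control variables it is often easier to compute a partial correlation from residuals: regress Y and the candidate variable on the control variables, then correlate the two sets of residuals. The following NumPy sketch does this. The function name and the demonstration vectors are hypothetical; the vectors are constructed to be mutually orthogonal so that the answer is known in advance.

```python
import numpy as np

def partial_correlation(x, y, controls):
    """Correlate the residuals of x and y after regressing each on the controls."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack(
        [np.ones(len(y))] + [np.asarray(c, dtype=float) for c in controls]
    )
    # Residuals: what is left of each variable once the controls are removed.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(res_x @ res_y / np.sqrt((res_x @ res_x) * (res_y @ res_y)))

# Artificial vectors: z1, z2, e and w are mutually orthogonal and sum to zero.
z1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
z2 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
e = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
w = z1 * z2
y_demo = 2 * z1 + z2 + e
x_same = 3 * z1 - z2 + e       # shares e with y once z1 and z2 are removed
x_opposite = 3 * z1 - z2 - e   # residual is the mirror image of y's residual
x_unrelated = z1 + w           # residual is orthogonal to y's residual

r_same = partial_correlation(x_same, y_demo, [z1, z2])
r_opposite = partial_correlation(x_opposite, y_demo, [z1, z2])
r_unrelated = partial_correlation(x_unrelated, y_demo, [z1, z2])
```

Calling partial_correlation with the X3 column, the Y column and the X2 and X5 columns of Problem 1 as controls would apply the same computation to the second-order case discussed here.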
The partial correlations indicate that the variable with the largest partial correlation, and therefore the next one to enter the model, is X3. The new model is:

$$ Y = b_0 + b_2 X_2 + b_5 X_5 + b_3 X_3 + e $$

Fitting this model to the data gives the following ANOVA and coefficients:

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   29.652           3    9.884         8.967   0.000(a)
Residual     26.455           24   1.102
Total        56.107           27
a. Predictors: (Constant), X3, X2, X5
b. Dependent Variable: Y

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t        Sig.
(Constant)   -5.816   1.934                                            -3.008   0.006
X2           0.008    0.002                0.614                       4.161    0.000
X5           0.056    0.014                0.607                       3.975    0.001
X3           1.560    0.589                0.405                       2.648    0.014
a. Dependent Variable: Y

The fitted model is therefore:

$$ \hat{Y} = -5.816 + 0.008 X_2 + 0.056 X_5 + 1.560 X_3 $$

Now we fix the three most important variables (X2, X5 and X3) and find the partial correlations for X1 and X4.

Partial correlations with Y (control variables: X2, X5 and X3)
Variable   Correlation   Significance (2-tailed)   df
X1         0.127         0.546                     23
X4         0.002         0.992                     23

Both partial correlations are very low and neither is significant, so no further variable enters and the optimal model is:

$$ \hat{Y} = -5.816 + 0.008 X_2 + 0.056 X_5 + 1.560 X_3 $$
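The two routes followed in this problem agree because the partial correlation of a candidate variable and the t statistic it would have on entering the model are tied by r = t / sqrt(t^2 + df), where df is the residual degrees of freedom after entry. The sketch below checks this relation against values printed in the Excluded Variables table of the stepwise output. The helper name is illustrative and the inputs are rounded, so agreement is only to about three decimals.

```python
import math

def partial_from_t(t_value, df_residual):
    """Partial correlation implied by the t statistic a candidate has on entry."""
    return t_value / math.sqrt(t_value**2 + df_residual)

# (candidate | variables already in model, t, residual df after entry, reported r)
excluded = [
    ("X5 | X2", 2.909, 25, 0.503),
    ("X3 | X2, X5", 2.648, 24, 0.476),
    ("X4 | X2, X5", -1.021, 24, -0.204),
    ("X1 | X2, X5, X3", 0.613, 23, 0.127),
    ("X4 | X2, X5, X3", 0.010, 23, 0.002),
]

# Largest absolute difference between the implied and the reported values.
max_gap = max(abs(partial_from_t(t, df) - r) for _, t, df, r in excluded)
```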
APPLICATION PROBLEM 2:

Solar heat flux is measured as part of a solar thermal energy test. The aim is to see how heat flux can be estimated from other variables: insolation, the position of the focal points in the east, south and north directions, and the time of day. (Data from D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis. John Wiley & Sons. p. 486). The data are as follows (Exh_regr.Mtw):

Order   Heat flux (Y)   Insolation (X1)   East (X2)   South (X3)   North (X4)   Time (X5)
1       271.8           783.35            33.53       40.55        16.66        13.2
2       264             748.45            36.5        36.19        16.46        14.11
3       238.8           684.45            34.66       37.31        17.66        15.68
4       230.7           827.8             33.13       32.52        17.5         10.53
5       251.6           860.45            35.75       33.71        16.4         11
6       257.9           875.15            34.46       34.14        16.28        11.31
7       263.9           909.45            34.6        34.85        16.06        11.96
8       266.5           905.55            35.38       35.89        15.93        12.58
9       229.1           756               35.85       33.53        16.6         10.66
10      239.3           769.35            35.68       33.79        16.41        10.85
11      258             793.5             35.35       34.72        16.17        11.41
12      257.6           801.65            35.04       35.22        15.92        11.91
13      267.3           819.65            34.07       36.5         16.04        12.85
14      267             808.55            32.2        37.6         16.19        13.58
15      259.6           774.95            34.32       37.89        16.62        14.21
16      240.4           711.85            31.08       37.71        17.37        15.56
17      227.2           694.85            35.73       37           18.12        15.83
18      196             638.1             34.11       36.76        18.53        16.41
19      278.7           774.55            34.79       34.62        15.54        13.1
20      272.3           757.9             35.77       35.4         15.7         13.63
21      267.4           753.35            36.44       35.96        16.45        14.51
22      254.5           704.7             37.82       36.26        17.62        15.38
23      224.7           666.8             35.07       36.34        18.12        16.1
24      181.5           568.55            35.26       35.9         19.05        16.73
25      227.5           653.1             35.56       31.84        16.51        10.58
26      253.6           704.05            35.73       33.16        16.02        11.28
27      263             709.6             36.46       33.83        15.89        11.91
28      265.8           726.9             36.26       34.89        15.83        12.65
29      263.8           697.15            37.2        36.27        16.71        14.06
