Tutorial:

Regression 102
This is the second entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier and leverage an advanced technique, stepwise regression, to help us find an optimal set of explanatory variables for the model.

Again, we will use a sample data set gathered from 20 different salespersons. The regression model attempts to explain and predict weekly sales for each salesperson (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.

Data Preparation
Similar to what we did in an earlier tutorial, we organize our sample data by placing the values of each variable in a separate column and each observation in a separate row.

Next, we introduce the mask. The mask is a Boolean array (0, 1) that determines which variables are included in (or excluded from) the analysis.

Initially, at the top of the table, let's insert the mask cells array, each with a value of 1 (i.e. included). The array is shown highlighted below.

In this example, we have 20 observations and two independent (explanatory) variables. The response or dependent variable is the weekly sales.
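For readers who want to see the mask idea outside the spreadsheet, here is a minimal Python sketch. It assumes NumPy is available; the function name `apply_mask` and the three rows of numbers are made up for illustration and are not the tutorial's data or NumXL's implementation.

```python
import numpy as np

def apply_mask(X, mask):
    """Keep only the columns of X whose mask entry is 1 (included)."""
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if mask.shape[0] != X.shape[1]:
        raise ValueError("mask needs one entry per explanatory variable")
    return X[:, mask]

# Hypothetical values, for illustration only.
# Columns: intelligence (IQ), extroversion.
X = np.array([[110.0, 15.0],
              [ 95.0, 22.0],
              [120.0, 18.0]])

both = apply_mask(X, [1, 1])        # both variables included
only_extro = apply_mask(X, [0, 1])  # IQ excluded, extroversion kept
```

The data columns never move; only the 0/1 flags change, which is the same convenience the mask cells give in the worksheet.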

Process
Now we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).

Regression 102 Tutorial

Spider Financial Corp, 2013

The Regression wizard appears.

Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the top of the data table (Boolean array).

Notes:
1. The cells range may optionally include the heading (label) cell, which is used in the output tables wherever they reference those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.

Please note that once we select the X and Y cells ranges, the Options, Forecast and Missing Values tabs become available (enabled).

Next, select the Options tab.

Initially, the tab is set to the following values:
- The regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. To fix the intercept at a given value (e.g. zero), enter it there.
- The significance level (aka α) is set to 5%.
- In the Output section, the most common regression analyses are selected.
- Leave Auto Modeling unchecked. We will discuss this functionality in a later issue.

Now, click on the Missing Values tab.

In this tab, you can select an approach to handle missing values in the data set (X and Y). By default, an observation is excluded from the analysis if any of its X or Y values is missing. This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables.
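The default treatment on the Missing Values tab (dropping any observation with a missing X or Y value) is often called listwise deletion. A rough Python equivalent is sketched below, assuming NumPy and assuming missing entries are encoded as NaN; the helper name `drop_incomplete_rows` and the sample values are hypothetical.

```python
import numpy as np

def drop_incomplete_rows(X, y):
    """Remove every observation that has a missing (NaN) value in X or in y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[complete], y[complete]

# Hypothetical values: row 2 is missing an X value, row 3 is missing y.
X_raw = [[110.0, 15.0],
         [np.nan, 22.0],
         [120.0, 18.0]]
y_raw = [2500.0, 3000.0, np.nan]

X_clean, y_clean = drop_incomplete_rows(X_raw, y_raw)
```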

Analysis
Aside from the Variables (X) Mask settings, everything is exactly the same as in the prior tutorial, so what's our next step?

The mask determines which variables are included in the regression analysis, so let's take another look at the Coefficients table.

First, let's exclude the Intelligence input variable from the analysis. This is done simply by flipping the mask value for this variable to zero.

Now, if you have the Calculation option set to manual, force a recalculation. Otherwise, the spreadsheet recalculates automatically.

Checking the output tables, we find the following:
- R-square dropped by 6%.
- Adjusted R-square dropped by 1.5%.
- Standard error increased by $3.
- AIC dropped by one (1).
- The ANOVA table shows the regression is significant.
- Residual diagnosis checks out for all tests.
- In the regression coefficients table, the intercept and the coefficient of the Extroversion variable are both statistically significant.
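The goodness-of-fit figures compared above can be reproduced in outline with ordinary least squares. The sketch below assumes NumPy, uses randomly generated stand-in data (not the tutorial's 20 observations), and computes AIC as n·ln(SSE/n) + 2k, which is one common Gaussian form; NumXL's exact AIC formula is not stated in this tutorial, so treat that line as an assumption.

```python
import numpy as np

def fit_stats(X, y):
    """OLS with an intercept; returns R^2, adjusted R^2, standard error, AIC."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    k = p + 1                                 # estimated parameters
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    std_err = (sse / (n - k)) ** 0.5
    aic = float(n * np.log(sse / n) + 2 * k)  # assumed AIC form, see lead-in
    return {"r2": r2, "adj_r2": adj_r2, "std_err": std_err, "aic": aic}

# Synthetic stand-in data, for illustration only.
rng = np.random.default_rng(0)
n = 20
iq = rng.normal(100.0, 10.0, n)
extro = rng.normal(20.0, 5.0, n)
sales = 1000.0 + 5.0 * iq + 80.0 * extro + rng.normal(0.0, 300.0, n)

full = fit_stats(np.column_stack([iq, extro]), sales)  # mask = (1, 1)
reduced = fit_stats(extro[:, None], sales)             # mask = (0, 1)
```

Dropping a variable can never raise R-square, which is why adjusted R-square and AIC, both of which penalize extra parameters, are the more useful figures when comparing the two models.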

This model has fewer explanatory variables (one instead of two) and explains the variation in the values of the response variable nearly as well as the model with two (2) explanatory variables.

Now, let's plot the estimated values against the actual.


[Figure: estimated vs. actual weekly sales ($ Sales/Week) for observations 1 to 20; vertical axis from $1,500 to $4,500.]

The shaded area represents the 95% confidence interval for the estimates of the regression model.

So far, we have demonstrated that dropping a variable from the analysis is as easy as flipping a switch; no more copying data and cluttering your spreadsheet with tons of output tables. This is nice, but you might be wondering: if I had more explanatory variables (say 10), what is the optimal set of variables? Should I try every single subset?

NumXL supports an interesting functionality, stepwise regression, to help you select this optimal set. Let's demonstrate how you would use it.

(1) In the Mask cells range, turn on or off the variables that you wish the stepwise regression to consider. For this demonstration, we will turn them all on.

(2) Locate and click on the regression icon in the NumXL tab.

(3) The Regression Wizard pops up.
(4) In the General tab, select the input cells range and the mask cells range.
(5) Under the Options tab, check the Stepwise Regression box.

(6) Leave the 3 different methods checked.
(7) Click OK.
(8) The output tables are generated.

The stepwise regression generates one additional table next to the coefficients table.

Let's take a closer look at this new table. The stepwise regression carries out a series of partial F-tests to include (or drop) variables from the regression model.
- Forward selection: we start with only an intercept and examine adding one variable at a time.
- Backward elimination: we start from the full model with all variables in, and consider dropping one regressor at a time.
- Bidirectional elimination is a hybrid of the two methods.
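To make the forward-selection idea concrete, here is a minimal Python sketch, assuming NumPy. It is not NumXL's implementation: the function names are hypothetical, and instead of a p-value at the chosen significance level it uses a fixed F-to-enter threshold of 4.0, a classic textbook simplification. The data are synthetic.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of an OLS fit on an intercept plus the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Forward selection by partial F-test; returns a 0/1 inclusion mask."""
    n, p = X.shape
    selected = []
    current = sse(X, y, selected)            # intercept-only model
    while len(selected) < p:
        best_j, best_f, best_sse = None, f_to_enter, None
        for j in range(p):
            if j in selected:
                continue
            trial = sse(X, y, selected + [j])
            df_resid = n - (len(selected) + 2)   # n minus parameters of larger model
            # Partial F for adding one variable: drop in SSE over residual mean square.
            f_stat = (current - trial) / (trial / df_resid)
            if f_stat > best_f:
                best_j, best_f, best_sse = j, f_stat, trial
        if best_j is None:                   # no candidate clears the threshold
            break
        selected.append(best_j)
        current = best_sse
    return [1 if j in selected else 0 for j in range(p)]

# Synthetic data: column 0 drives y strongly, column 1 is unrelated noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 2))
y_demo = 5.0 + 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=20)

mask = forward_select(X_demo, y_demo)
```

Backward elimination would run the same test in reverse, starting from the full model and dropping the regressor with the smallest partial F while it stays below a chosen threshold.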

Each column of the table displays the mask for the optimal model found by one method. One (1) stands for inclusion and zero (0) for exclusion.

At the bottom of the table, we compute the regression statistics for each model for comparison. In this case, the three methods came back with the same set of variables, so no comparison is needed.

Please note that, given the same set of input variables and responses, the mask is enough to differentiate one model from the others: it simply lists which variables are included and which are excluded.

Conclusion
So far, we have created a regression model, examined its significance, verified that it satisfies the underlying assumptions, and found the optimal subset of variables for the model.

For many, this is the end of the analysis, and they would probably start using the model for forecasting.

Before we can use the model for forecasting, there are two more questions we ought to answer:
(1) Do we have any observation that exerts a significant influence (e.g. outliers) on the regression model?
(2) Is the regression model stable over the sample data?

These questions will be covered in the third entry in our regression tutorial series. Please read on.
