Regression 102
This is the second entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier and leverage an advanced technique, stepwise regression, to help us find an optimal set of explanatory variables for the model.
Again, we will use a sample data set gathered from 20 different salespersons. The regression model attempts to explain and predict weekly sales for each salesperson (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.
Data Preparation
Similar to what we did in an earlier tutorial, we organize our sample data by placing the value of each variable in a separate column and each observation in a separate row.
Next, we introduce the mask. The mask is a Boolean array (0, 1) that chooses which variables are included in (or excluded from) the analysis.
Initially, at the top of the table, let's insert the mask cells array, each with a value of 1 (i.e. included). The array is shown highlighted below.
In this example, we have 20 observations and two independent (explanatory) variables. The response or dependent variable is the weekly sales.
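As a rough illustration of what the mask does, the short Python sketch below keeps only the columns whose mask entry is 1. The column labels and the numbers are made up for illustration; they are not the tutorial's sample data, and the sketch does not claim to reflect how NumXL handles the mask internally.

```python
# Hypothetical column labels for the two explanatory variables.
variables = ["Intelligence", "Extroversion"]

# One 0/1 flag per explanatory variable: 1 = included, 0 = excluded.
mask = [1, 1]

# Made-up observations (one row per salesperson), for illustration only.
rows = [
    [100.0, 20.0],
    [110.0, 15.0],
    [95.0, 25.0],
]


def included_names(names, mask):
    """Return the names of the variables whose mask flag is 1."""
    return [name for name, flag in zip(names, mask) if flag == 1]


def apply_mask(rows, mask):
    """Return a copy of the data that keeps only the masked-in columns."""
    return [[value for value, flag in zip(row, mask) if flag == 1] for row in rows]


print(included_names(variables, mask))   # both variables are switched on
print(apply_mask(rows, [0, 1]))          # first column switched off
```

The data itself is never copied or rearranged by hand: changing a single flag changes which columns reach the regression.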
Process
Now, we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).
Regression 102 Tutorial
Spider Financial Corp, 2013
The Regression wizard appears.
Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the top of the data table (Boolean array).
Notes:
1. The cells range includes (optional) the heading (Label) cell, which would be used in the output tables where it references those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.
Please note that once we select the X and Y cells range, the Options, Forecast and Missing Values tabs become available (enabled).
Next, select the Options tab.
Now, click on the Missing Values tab.
Analysis
Aside from the Variables (X) Mask settings, everything is exactly the same as what we did in the prior tutorial, so what's our next step?
The Mask variable determines which variables are included in the regression analysis, so let's take another look at the Coefficients table.
First, let's exclude the Intelligence input variable from the analysis. This is done simply by flipping the mask value in this variable's cell to zero.
Now, if you have the Calculation option set to manual, force recalculation. Otherwise, the spreadsheet recalculates automatically.
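For readers who want to see the same idea outside a spreadsheet, here is a minimal Python sketch of refitting an ordinary least squares model after a mask flag is flipped. Everything in it is an assumption made for illustration: the data is made up (it is not the tutorial's sales data), the function names `solve` and `fit_ols` are hypothetical, and the fit uses plain normal equations rather than whatever NumXL does internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y, mask):
    """Fit y on an intercept plus the masked-in columns; returns [intercept, slopes...]."""
    design = [[1.0] + [v for v, flag in zip(row, mask) if flag == 1] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: two explanatory columns and a response built as 10 + 2*x1 + 3*x2.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
y = [10.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]

full = fit_ols(rows, y, [1, 1])      # both variables included
reduced = fit_ols(rows, y, [0, 1])   # first variable masked out
print(len(full), len(reduced))
```

The reduced fit returns one coefficient fewer, which mirrors the Coefficients table shrinking by one row once the mask flag is set to zero.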
[Figure: estimated weekly sales ($ Sales/Week) for observations 1 to 20; vertical axis from $1,500 to $4,500.]
The shaded area represents the 95% confidence interval for the estimates of the regression model.
So far, we have demonstrated that dropping a variable from the analysis is as easy as flipping a switch; no more copying data and cluttering your spreadsheet with tons of output tables. This is nice, but you might be wondering: if I had more explanatory variables (say 10), what is the optimal set of variables? Should I try every single subset?
NumXL supports an interesting functionality, stepwise regression, to help you select this optimal set. Let's demonstrate how you would use it.
(1) In the Mask cells range, turn on or off the variables that you wish the stepwise regression to consider. For this demonstration, we will turn them all on.
Let's take a closer look at this new table.
The stepwise regression carries out a series of partial F-tests to include (or drop) variables from the regression model.
Forward selection: we start with an intercept only, and examine adding one additional variable at a time.
Backward elimination: we start from the full model with all variables in, and consider dropping one regressor at a time.
Bidirectional elimination is a hybrid of the two methods.
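To make the forward-selection idea concrete, the sketch below runs it on a small made-up data set. It is only an illustration under stated assumptions: the data is synthetic, the helper names (`solve`, `rss`, `forward_select`) are hypothetical, and the fixed F-to-enter cutoff of 4.0 is an assumed rule of thumb, since the tutorial does not state which threshold or p-value NumXL applies. At each step, the partial F statistic for a candidate variable is the drop in the residual sum of squares divided by the residual mean square of the larger model.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(rows, y, cols):
    """Residual sum of squares of an OLS fit on an intercept plus the given columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y))


def forward_select(rows, y, f_to_enter=4.0):
    """Add one variable at a time while the best partial F reaches f_to_enter (assumed cutoff)."""
    n = len(y)
    selected, remaining = [], list(range(len(rows[0])))
    current = rss(rows, y, selected)  # intercept-only model
    while remaining:
        scores = []
        for c in remaining:
            new = rss(rows, y, selected + [c])
            df = n - (len(selected) + 2)  # residual degrees of freedom of the larger model
            scores.append(((current - new) / (new / df), c, new))
        best_f, best_c, best_rss = max(scores)
        if best_f < f_to_enter:
            break
        selected.append(best_c)
        remaining.remove(best_c)
        current = best_rss
    return sorted(selected)


# Synthetic data: y depends on column 0 plus small noise; column 1 is unrelated.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2]
rows = [[a, b] for a, b in zip(x0, x1)]
y = [5.0 + 2.0 * a + e for a, e in zip(x0, noise)]

print(forward_select(rows, y))  # prints [0]: only column 0 passes the cutoff
```

Backward elimination would run the same test in the opposite direction, starting from all columns and dropping the one with the smallest partial F while it stays below a chosen cutoff.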
Conclusion
So far, we have created a regression model, examined its significance, verified that it satisfies underlying assumptions, and found the optimal subset of variables of the model.
For many, this is the end of analysis, and they would probably start using it for forecasting.
Before we can use the model for forecasting, there are two more questions we ought to answer:
(1) Do we have any observation that exerts a significant influence (e.g. outliers) on the regression model?
(2) Is the regression model stable over the sample data?
This will be covered in the 3rd entry in our regression tutorial series. Please, read on.