Wiley-Blackwell and The Royal Geographical Society (with the Institute of British Geographers) are
collaborating with JSTOR to digitize, preserve and extend access to Transactions of the Institute of British
Geographers.
Moving out of the linear rut: the possibilities of Generalized Additive Models
KELVYN JONES and SIMON ALMOND
Reader in Geography, and Head of Systems Development, Department of Geography, University of Portsmouth, Portsmouth, Hants PO1 3HE
ABSTRACT
Generalized Additive Models are statistical models in which the usual linear relationships between the response and predictor variables are replaced by non-linear 'smooths'. This paper discusses the context in which this technique has been developed, and provides two illustrations. The first illustration is an analysis of precipitation in California which requires a normal-theory model. The second is the prediction of land-use from satellite imagery; the discrete nature of the response necessitates the fitting of a GAM with a logit link and a binomial random term.
KEYWORDS: Statistical modelling, Exploratory data analysis, Non-linearity, Smoothing, Remote sensing
In his polemical and entertaining essay on statistical inference, Gould (1969) accuses geographers of being stuck in a linear rut. This article is centrally concerned with getting out of that rut by replacing linear equations with 'smooth' functions in what are called Generalized Additive Models (Hastie and Tibshirani, 1986, 1990a).1 In the belief that the world may be naughty but not malicious, the response variable is postulated in such models to be the outcome of the addition of non-linear terms for each predictor variable. This introduction to the technique has three parts. After dealing with the statistical context in which the GAM has been developed, simple linear models are contrasted with their additive counterparts, which are illustrated by the analysis of precipitation data for California. Generalized linear models are then outlined and generalized further to take a non-linear but additive form, which is illustrated by a model designed to predict the presence/absence of woodland from satellite imagery. This paper does not attempt to provide rigorous proofs and derivations, but aims to introduce the data-analytic potentialities of the GAM in an informal manner, while being mindful of its assumptions and limitations.

The GAM has been developed in a statistical environment that has seen profound changes since Gould's paper. These changes involve modifications of outlook as well as substantive innovation, both aspects being enabled by improved technology. It is particularly important that these developments are appreciated if the growing rapprochement between qualitative and quantitative approaches (for example, Bryman, 1988; Pratt, 1989) is not to be made on the basis of simplistic, outdated and mistaken ideas of what constitutes a quantitative, statistical approach.

In terms of technological changes, developments in computing power and graphics devices are having a major impact on statistical practice. The majority of techniques to be found in the textbooks of quantitative geography were developed in the pre-computer era of the early part of this century. As a result, most are based on demanding and unrealistic assumptions (for example, linearity and normality) that were deliberately chosen to ease the computational and mathematical burden. Moreover, if the calculations had taken hours or days, it was very difficult to take a highly sceptical view of the results that would entail further computation. For example, Fisher (Box, 1978) in deriving the exact distributions
the GAM and the usual linear model. The general form of the model remains unchanged: the intercept ($\beta_0$) and random term ($e$) are the same;3 the relationship between the response and each predictor is still additive, but the nature of the relationships is no longer linear in the parameters. Instead of simple linear relations, the functional form between each predictor and the response may be different and complex, with asymmetry and 'bends'.4 The exact form of these relationships is not pre-determined, but is derived during empirical analysis. That is why the term 'smooth' is left deliberately vague. It simply means less smooth than a straight line, but smoother than joining up the points in a scatter-plot of the response and the predictor.

Clearly, the more flexible specification of equation (2) includes the linear model as a special and limiting case. The linear model, however, has one overwhelming advantage in terms of ease of computation: there is an exact ordinary-least-squares (OLS) solution whereby parameter estimates can be derived by minimizing the sum of the squared residuals, that is, the differences between the actual and fitted values of the response variable. No such exact solution is available for GAMs, and the equation has to be estimated in an iterative manner, often with a large increase in the burden of computation. This iterative estimation process is based on two building blocks, partial residuals and scatter-plot smoothers, which will now be discussed in turn.

where the ^ signifies an estimated value. The partial residuals are simply the estimated residuals plus the product of the estimated slope and the predictor variable. The partial residuals, when plotted against their associated predictor variable, essentially show the relationship between the response and the predictor after the effects of other predictors in the model (in this case, $X_2$) have been 'removed'. Thus, the cloud of points in a partial residual plot shows the scatter through which the partial linear regression has been fitted. Partial residual plots represent multivariate relationships as a set of bivariate scatter-plots. It is these partial residuals that are 'smoothed' in the estimation of the GAM equations.

Smoothers
The second building block of the GAM estimation process is the use of scatter-plot smoothers, which can operate on the bivariate relationship of the partial-residual plot.5 There is a very wide choice of smoothers (Titterington, 1985) and geographers have previously used a number of different types in their work. Localized fitting is characterized by the smoothed estimate of a particular observation being derived on the basis of 'nearby' values. The classic example of this is running means (Gregory, 1963, pp. 241-3), in which the arithmetic average is estimated for successive, fixed-length sequences. Another example is 'running medians', which are preferred by some due to their 'resistance' to outlying,
unusually small or large, values (Cox and Jones, 1981; Wrigley and Dunn, 1986). Another approach is to use all the observations in deriving the smooth for any one observation, that is, 'global' fitting. In geography, the most commonly used global methods are polynomial expansions, which form the core of trend-surface models (Unwin, 1975). After an exhaustive survey of many different types of smoother, Buja et al. (1989, p. 549) conclude:

'in our experience with relatively noisy data, in most cases the choice is not too important in that differences between smoothers are small relative to the difference between the smoother and a parametric (say, linear) fit.'

In practice, and certainly in the development of software, it is smooths based on cubic splines (Eubank, 1988; Silverman, 1985) that have mainly been developed. Indeed, they are used in the GAMFIT software that is used later to calibrate some illustrative models. The preference for cubic splines is motivated by a number of arguments. They are fast and efficient to compute and, being symmetrical linear smoothers, they are analytically tractable, so that an inferential and theoretical framework can be developed. Moreover, they represent a compromise between local and global fits in which the degree of smoothing is data-dependent. In areas of the scatter-plot where data density is high, the smooth will reflect the local pattern, but if data are sparse, the spline will be linear in that area. As Goodall (1990, p. 173) concludes, 'splines are a particularly versatile choice'.
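The contrast between the two localized smoothers mentioned above, running means and running medians, can be sketched in a few lines of Python. This minimal illustration assumes responses already ordered by an equally spaced predictor, centred windows of odd length, and windows truncated at the ends of the series (end rules differ between implementations); the function name is hypothetical.

```python
from statistics import mean, median


def running_smooth(y, span, summary):
    """Centred running smoother: apply `summary` to a window of `span` values.

    `y` must already be ordered by the predictor. Near the ends of the series
    the window is truncated to the points that are available.
    """
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd integer")
    half = span // 2
    return [summary(y[max(0, i - half): i + half + 1]) for i in range(len(y))]


# Responses ordered by an equally spaced predictor, with one outlying value.
y = [2.0, 3.0, 4.0, 40.0, 6.0, 7.0, 8.0]
running_means = running_smooth(y, 3, mean)      # pulled towards the outlier
running_medians = running_smooth(y, 3, median)  # resistant to the outlier
print(running_means)
print(running_medians)
```

The single outlying value drags three successive running means up to around 16, whereas the running medians stay close to the general trend, which is the 'resistance' referred to in the text.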
The aim in a good smooth is to capture fully the underlying trend while not paying too much attention to the surrounding noise. Severe smoothing potentially leads to bias, with the derived trend being off-target, while insufficient smoothing potentially leads to a large variance and 'jerkiness' around the true smooth. These concepts are shown graphically in Figure 2, which shows (a) the true underlying curved relationship; (b) a particular empirical, stochastic realization of that relationship; (c) a biased 'over' smooth which misses the underlying trend; (d) an 'under' smooth in which there is too much imprecision. Technically, therefore, a good smooth is a compromise between bias (accuracy) and imprecision. What is therefore required in any smoother is an ability to 'tune' the degree of smoothness imposed on the data, and a means of assessing an appropriate choice.
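One way of making this 'tuning' concrete is through the smoothing matrix described in note 6: a linear smoother maps the observed values to fitted values through an n by n matrix, and the trace of that matrix measures the degrees of freedom consumed. The following is a minimal Python sketch with NumPy, using a centred running mean as a simple stand-in for the cubic splines of GAMFIT; the function names are hypothetical.

```python
import numpy as np


def running_mean_matrix(n, span):
    """Smoothing matrix S of a centred running mean, so that fitted = S @ y.

    Rows correspond to observations ordered by the predictor; windows are
    truncated at the ends of the series. `span` should be a positive odd integer.
    """
    half = span // 2
    S = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        S[i, lo:hi] = 1.0 / (hi - lo)  # equal weights within the window
    return S


def effective_df(S):
    """Degrees of freedom consumed by a linear smoother: the trace of S."""
    return float(np.trace(S))


n = 20
for span in (1, 3, 7, 19):
    S = running_mean_matrix(n, span)
    print(span, round(effective_df(S), 2))
```

Widening the window lowers the trace: a one-point window reproduces the data and consumes all 20 degrees of freedom, while wider windows give smoother fits that consume fewer. Because the trace is a sum of fractional weights, the effective degrees of freedom need not be an integer.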
FIGURE 1. Scatter and partial-residual plots for equation (3). (a) Bivariate scatter plots, (b) partial residual plots

A very useful concept in specifying the required smoothness is the degrees of freedom consumed by the smooth. The smoothest possible relationship between a set of points is a straight line; more 'rough' but still very smooth is a quadratic polynomial; while the roughest of all is a running mean based on a single point, which merely joins up all the original data. In a fitted model, the overall mean or intercept consumes one degree of freedom, while the linear equation of the straight line consumes an extra degree of freedom for the slope term. The quadratic consumes three (intercept and two slopes), while the one-point running mean consumes them all. Thus, the degrees of freedom is the required 'tuning' parameter: if it is set low, very smooth results are obtained; if it is set high, the smooth will be very rough and complex. In the GAMFIT software, with the smoothing performed by cubic splines, it is convenient to control the amount of smoothing by selecting a reasonable approximate degrees of freedom for each predictor in relation to the number of observations involved.6 In subsequent analysis a higher or lower value can be chosen, and the resultant change in the residual sum of squares can be inspected to reveal the appropriateness of the original choice in terms of goodness of fit. The choice of the smoothing parameter is therefore determined empirically from the data; high degrees of freedom, weak smoothing and complex fitted curves are justified only if they result in a good fit, with a small residual sum of squares.

'Backfitting' algorithm
In conventional linear-model practice, partial residuals are usually derived from a single overall fit, and the plots are inspected for non-linearities which are then accommodated in an improved linear model by a transformation of one or more of the predictor variables (Jones, 1984; Wrigley and Dunn, 1986). Unfortunately, such an approach may fail because non-linearity in one part of the model can mask non-linearities elsewhere. This problem is overcome in GAMs by employing an iterative scheme that effectively deals with all the variables simultaneously. In this scheme, the smoothers are combined with the partial residuals to form a general 'backfitting' algorithm. Schematically, this algorithm has the following form for the GAM of equation (2):7
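The logic of such a scheme can be sketched in Python with NumPy for two predictors. This is a minimal illustration under stated assumptions, not the GAMFIT implementation: the function names are hypothetical, a centred running mean stands in for the cubic-spline smoother, and a fixed number of sweeps replaces a formal convergence check. Each sweep forms the partial residuals for one predictor by removing the intercept and the current smooths of all the other predictors, smooths them against that predictor, and centres the result.

```python
import numpy as np


def running_mean_smooth(x, r, span=5):
    """Smooth values r against predictor x with a centred running mean.

    Points are ordered by x, averaged over windows of `span` neighbours
    (truncated at the ends), and returned in the original order.
    """
    order = np.argsort(x)
    r_sorted = r[order]
    half = span // 2
    smoothed = np.array([r_sorted[max(0, i - half): i + half + 1].mean()
                         for i in range(len(r))])
    fitted = np.empty(len(r))
    fitted[order] = smoothed
    return fitted


def backfit(X, y, span=5, n_sweeps=20):
    """Backfitting sketch for y = alpha + f_1(x_1) + ... + f_p(x_p) + e."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))  # current estimate of each smooth, one column per predictor
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residuals: remove the intercept and every other current smooth.
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smooth(X[:, j], partial, span)
            f[:, j] -= f[:, j].mean()  # centre so that alpha stays the overall level
    return alpha, f


# Simulated data: a quadratic effect of x1 and a sine-shaped effect of x2.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = 1.0 + X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(0.0, 0.2, 200)
alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
print(round(float(alpha), 2), round(float(np.mean((y - fitted) ** 2)), 3))
```

With these simulated data the estimated smooths should approximately recover the quadratic and sine shapes, up to the constant absorbed by the intercept. Swapping in a different smoother only requires replacing `running_mean_smooth`.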
Term               Model A   Model B   Model C
Intercept          1         1         1
Altitude           1         3         3
Latitude           1         1         3
Distance           1         1         3
Rain-shad          1         1         1
Total parameters   5         7         11
Residual SS        545       412       261
Residual DF        23        21        17
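Nested models such as these are commonly compared with an F-ratio on the change in residual sum of squares relative to the change in degrees of freedom. The following is a minimal Python sketch, assuming the conventional nested-model F statistic and the residual SS and DF values in the table; it is not necessarily the test reported in the paper, and because the degrees of freedom of smooth terms are approximate (see note 6), any such F-ratio is itself only approximate. The function name is hypothetical.

```python
def nested_f_ratio(rss_reduced, df_reduced, rss_full, df_full):
    """F-ratio comparing a reduced model with a fuller model that contains it.

    rss_* are residual sums of squares and df_* residual degrees of freedom.
    """
    extra_df = df_reduced - df_full
    if extra_df <= 0:
        raise ValueError("the fuller model must use more degrees of freedom")
    mean_improvement = (rss_reduced - rss_full) / extra_df
    error_mean_square = rss_full / df_full
    return mean_improvement / error_mean_square


# Residual SS and residual DF for Models A, B and C, as given in the table.
models = {"A": (545.0, 23), "B": (412.0, 21), "C": (261.0, 17)}

f_b_vs_a = nested_f_ratio(*models["A"], *models["B"])
f_c_vs_b = nested_f_ratio(*models["B"], *models["C"])
print(round(f_b_vs_a, 2), round(f_c_vs_b, 2))
```

Converting such ratios into probabilities would need an F distribution with the corresponding degrees of freedom (for example, `scipy.stats.f.sf` in SciPy); that step is left out to keep the sketch dependency-free.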
altitude is more marked. However, the partial-residual plot (d) reveals that the most pronounced 'non-linearity', between 2000 and 5000 feet, is based on only a few observations. At lower altitudes, below 2000 feet, with more observations reflected in narrower standard-error bands, the smooth appears linear. While the confirmatory tests indicate non-linearity, the plots, in providing detail as well as overall pattern, suggest that the evidence for non-linearity must be treated with caution, as the non-linearity is based on few observations. With a model of eleven parameters being fitted to only 28 observations, we are in danger of producing a 'noisy overfit', with every 'aberration' or outlier in the data being accommodated in the smooth fit. We certainly need to examine the evidence for mid-altitude stations carefully; we will return to this matter later.

(c) GENERALIZED MODELS

Generalized linear models
The response in the Californian precipitation example is a continuous variable, and therefore it is reasonable to choose a Gaussian distribution for the random term. However, much geographical research is concerned with counts, proportions and categorical outcomes, for which the choice of a normal-theory model would be unwarranted. The generalized linear
model, however, deals with all these different types of response variables in a unified and flexible manner (Dobson, 1990; McCullagh and Nelder, 1989). This flexibility derives from being able to specify separately the three components that constitute the GLM, which in the two-predictor case is given as:

or in verbal terms:

TABLE II. Components of generalized linear models
Linear model     Link     Random term
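The three components listed in Table II can be illustrated for the binomial case used later for the woodland data. This is a minimal Python sketch, assuming a logit link and a binomial random term; the function names and the coefficient and predictor values are hypothetical, chosen only to show how the pieces fit together. In a GAM the linear predictor would be replaced by a sum of smooth functions of the predictors, with the link and random term unchanged.

```python
import math


def linear_predictor(b0, b1, b2, x1, x2):
    """Systematic component for two predictors: eta = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2


def logit(p):
    """Logit link: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_variance(p, n=1):
    """Random component: variance of a binomial count with n trials and probability p."""
    return n * p * (1.0 - p)


# Hypothetical coefficients and predictor values, for illustration only.
eta = linear_predictor(b0=-1.0, b1=0.5, b2=0.25, x1=2.0, x2=4.0)
p = inverse_logit(eta)
print(eta, round(p, 3), round(binomial_variance(p), 3))
```

Keeping the link and its inverse as separate functions mirrors the separation of components in the GLM: changing to, say, a Poisson model for counts would alter the link and variance functions but leave the linear predictor untouched.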
FIGURE 4. Smooths for GAM model fitted to woodland data (df: 5 for each Band). Panels include Band 2 - Red and Band 3 - Infra-red, each plotted against DN.

approximate guide to the goodness of fit is the pseudo R-squared, or rho-square as it is known (Wrigley, 1985, pp. 49-50). This can be derived as:

$$\rho^2 = 1 - \frac{\text{deviance from postulated model}}{\text{deviance from null, intercept-only model}}$$

which in this case gives a figure of 0.52. While theoretically this statistic can vary between 0 and 1, the latter value can only be achieved by the rather unlikely outcome of all the predicted probabilities being correct at either 0 or 1. Consequently, McFadden (1974) has suggested that a value between 0.2 and 0.4 represents a 'good fit', while Fotheringham and Knudsen (1987) state that it is rare to see values that exceed 0.4. In terms of statistical fit, the three intensity variables therefore perform exceedingly well in predicting, or more correctly, fitting a model for woodland. The apparent success of this GAM approach justifies further developments. Attempts are currently being made to improve the model by including further predictors such as aspect (derived from a digital terrain model), and by letting the relationships between response and the predictors vary spatially in a GAM development of an expansion model (Miles et al., 1992). It is also intended to evaluate the procedure as a predictive device for different species of woodland, and to cross-validate the present model by predicting woodland presence/absence in other areas with known ground truth.
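For a presence/absence response the rho-square calculation needs only two deviances. The following is a minimal Python sketch, assuming ungrouped 0/1 outcomes, for which the deviance is minus twice the binomial log-likelihood; the outcomes and fitted probabilities are invented for illustration and are not the woodland results, and the function names are hypothetical. With grouped data, as in note 12, the grouped-data deviances would be needed instead.

```python
import math


def binary_deviance(y, p):
    """Deviance of 0/1 outcomes y given fitted probabilities p, each in (0, 1).

    For ungrouped binary data the saturated log-likelihood is zero, so the
    deviance is simply -2 times the binomial log-likelihood.
    """
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return -2.0 * loglik


def rho_square(y, p_model):
    """Pseudo R-squared: 1 - deviance(postulated model) / deviance(intercept-only model)."""
    p_null = sum(y) / len(y)  # the intercept-only model predicts the overall proportion
    d_null = binary_deviance(y, [p_null] * len(y))
    d_model = binary_deviance(y, p_model)
    return 1.0 - d_model / d_null


# Invented presence (1) / absence (0) outcomes and fitted probabilities.
y_obs = [1, 1, 1, 0, 0, 0]
p_fit = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
rho = rho_square(y_obs, p_fit)
print(round(rho, 3))
```

Fitted probabilities equal to the overall proportion give a value of exactly zero, and only probabilities of exactly 0 and 1 for every observation would give a value of 1, which is why values in the range 0.2 to 0.4 are already regarded as a good fit.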
FIGURE 5. Ground truth and predicted probabilities for woodland, Forest of Bere. (a) Ground truth: woodland presence/absence; (b) predicted probabilities for woodland (legend classes 0.00-0.25, 0.25-0.50, 0.50-0.75, 0.75-1.00)
non-linearities where there are none. Hastie and Tibshirani (1990a, pp. 164, 286) give some graphical examples of this problem.13 Such cases of extreme bias can be found from inspection of the partial residual plots; the smooth will have wide standard-error bands, and the residuals will track the fitted functions in the 'pure' local regions of zeroes or ones.

More generally, while the GAM relaxes the assumption of linearity, other assumptions associated with the linear model, such as independence of the random term and no collinearity (or concurvity, as it becomes in the GAM), still need to be fulfilled (Poole and O'Farrell, 1971). Moreover, apparent non-linearity may be the result of other forms of model mis-specification. Returning to the graphs for Californian precipitation in Figure 3, it can be seen on closer inspection that two points between 1000 and 4000 feet are influential in producing the curve for altitude. It just so happens that both these points represent stations that are the furthest inland and in rain-shadow. A model with an additional variable for the rain-shadow/distance interaction may therefore be preferred to a non-linear model. While Hastie and Tibshirani (1986, p. 297), in their early paper on the GAM, extol the virtues of a technique that is 'completely automatic, that is no detective work', it must be stressed that the results should be subjected to rigorous scrutiny, for it is possible that exploratory procedures can capitalize on a chance result and lead the researcher astray.14 The appropriate strategy for guarding against such problems is cross-validation, where exploratory work is undertaken on part of the data, and the resultant model is then subjected to rigorous confirmatory analysis on the remaining part.

This paper has sought to demonstrate that the GAM provides an effective and highly flexible means of calibrating non-linear models within a unified statistical framework. It remains to be stressed, however, that these methods are currently undergoing active development. Most importantly, the GAM has recently been extended to other data structures and to other research designs. These include situations where the response is ordered over time (Cleveland et al., 1991); where the response is an ordered series of categories, the so-called proportional-odds model (Hastie et al., 1989); where the observations consist of cases and matched comparisons (Breslow, 1987); and survival data, where the time to the occurrence of the response event is very skewed and subject to censoring, so that for some units the event has not occurred by the end of the observational period (Hastie and Tibshirani, 1990b).

ACKNOWLEDGEMENTS
Both authors thank two referees for their comments, and Steve Frampton for his considerable help in processing the data; KJ acknowledges the support of a Nuffield Social Science Fellowship, 1991-2.

NOTES
1. The use of the acronym GAM is unrelated to Openshaw's (1990) use, in which GAM stands for geographical analysis machine.
2. The phrase 'random term' is preferred to other epithets such as errors, disturbances or nuisances, because these terms are not 'wrong' and are often of substantive importance. In this context the word 'random' means allowed to vary, and in this model the values of the response are allowed to vary around the intercept ($\beta_0$) after allowing for the effects associated with the two predictor variables.
3. Other assumptions of the linear model are usually carried over to the GAM; thus the latter assumes that all relevant predictors have been included, and that the errors are independent. Any empirical analysis using the GAM must therefore be mindful of breaking these assumptions.
4. The GAM is 'intrinsically non-linear', for it is specified to be non-linear in the parameters; this contrasts with the usual approach of transformation of the response and/or the predictors (Johnston, 1978, pp. 38-41), in which the models remain 'intrinsically linear'. The choice of a particular transformation (logarithms, square roots etc.) is a matter of guesswork, although such devices as the 'ladder of powers' (Jones, 1984) are a help. The transformation must be chosen before model calibration; in contrast, in the GAM approach, the data are used to reveal an appropriate form.
5. An alternative is to fit a general smooth surface through the multi-dimensional space of the predictor variables in place of the additive smooths of one variable at a time. While this may be reasonable for two predictors, the approach rapidly gets unwieldy and unstable. Unless the data are exceedingly plentiful and well distributed, the smooth in any part of the space will be based on few observations and will therefore be unstable. Alternatively, the procedure will have to look further and further for 'neighbours', so that the estimated smooth may be severely biased. Moreover, in such models it is very difficult to assess the separate influence of a particular predictor variable.
6. The program estimates how many degrees of freedom have actually been consumed in the fitting process. As with all linear smoothers, smooths based on cubic splines can be written as a function of the actual values. The n by n matrix that 'maps' the smooth on to the actual values is known as the smoothing matrix; it is the trace of this matrix (the sum of its eigenvalues) that is used
to estimate the degrees of freedom that have been consumed in the fit. An important consequence is that the effective degrees of freedom consumed during the fitting process is continuous, and not restricted to integer numbers. The details are given in Hastie and Tibshirani (1990a, Appendix B).
7. The scheme is intended to convey conceptually what is happening during model calibration. In practice a more efficient algorithm is used; the details are given in Hastie and Tibshirani (1990a, p. 125).
8. For the class of smoothers which includes the cubic spline, Buja et al. (1989) have proved that the backfitting algorithm will converge.
9. Details on software are given in Appendix C of Hastie and Tibshirani (1990a); Nelder (1989) provides appropriate code for the software package GLIM, with amendments in Stasinopoulos (1990).
10. The original analysis used 29 rainfall stations, but the two most northern ones were omitted in the present analysis. These were obvious outliers, reflecting a different climatic regime in the extreme north of the state.
11. For an explanation of non-integer degrees of freedom see note 6.
12. GAMFIT, when calibrating the model, does not work on the individual pixels but on all the unique combinations of the predictor variables. Thus, it is the logit of the proportion of pixels that are woodland at each unique combination of the predictor variables that is actually modelled in the program.
13. This 'blowing-up' does occur where there are few pixels at the extremes of the graphs in Figure 4, especially at high values of the Green band; they are not, however, associated with any marked non-linearities.
14. By 1990, the message was 'we need to use our flexible models with some caution' (Hastie and Tibshirani, 1990a, p. 287).

REFERENCES
ALMOND, S. and JONES, K. (1992) 'Predicting land-use from satellite imagery: a non-parametric logit approach based on additive smooths', mimeo (Department of Geography, University of Portsmouth)
BERK, R. A. (1990) 'A primer on robust regression' in FOX, J. and LONG, J. S. (eds) Modern methods of data analysis (Sage, Newbury Park, California), pp. 292-324
BOLLEN, K. A. and JACKMAN, R. W. (1990) 'Regression diagnostics: an expository treatment of outliers and influential cases' in FOX, J. and LONG, J. S. (eds) Modern methods of data analysis (Sage, Newbury Park, California), pp. 257-91
BOX, J. F. (1978) R. A. Fisher: the life of a scientist (Wiley, New York)
BRESLOW, N. S. (1987) 'Statistical design and analysis of epidemiological studies' (Department of Biostatistics Technical Report 81, University of Washington)
BRESLOW, N. S. and DAY, N. E. (1980) Statistical methods in cancer research, volume I, the analysis of case-control studies (International Agency for Research on Cancer, Lyon)
BRESLOW, N. S. and DAY, N. E. (1987) Statistical methods in cancer research, volume II, the design and analysis of cohort studies (International Agency for Research on Cancer, Lyon)
BRYMAN, A. (1988) Quantity and quality in social research (Unwin Hyman, London)
BUJA, A., HASTIE, T. and TIBSHIRANI, R. (1989) 'Linear smoothers and additive models', Ann. Statist. 17: 453-555
CLEVELAND, R. B., CLEVELAND, W. S., McRAE, J. E. and TERPENNING, I. (1991) 'STL: a seasonal trend decomposition procedure based on LOESS', J. Offic. Statist. 6: 3-73
CLIFF, A. D. and ORD, J. K. (1975) 'The comparison of means when samples consist of spatially autocorrelated observations', Environ. Plann. A 7: 725-34
COX, N. J. and JONES, K. (1981) 'Exploratory data analysis' in WRIGLEY, N. and BENNETT, R. J. (eds) Quantitative geography: a British view (Routledge and Kegan Paul, London), pp. 135-43
DOBSON, A. J. (1990) An introduction to generalized linear models (Chapman and Hall, London)
DUNN, R. (1989) 'Building regression models: the importance of graphics', J. Geogr. Higher Educ. 13: 15-30
ERDAS (1990) PC ERDAS 7.4 systems manuals (ERDAS Inc., Atlanta)
ESRI (1990) PC ARC/INFO 3.4D manual set (ESRI Institute, Redlands, California)
EUBANK, R. L. (1988) Smoothing splines and nonparametric regression (Marcel Dekker, New York)
FOTHERINGHAM, A. S. and KNUDSEN, C. (1987) 'Goodness-of-fit statistics', Concepts and Techniques in Modern Geography 46 (Environmental Publications, Norwich)
GOODALL, C. (1990) 'A survey of smoothing techniques' in FOX, J. and LONG, J. S. (eds) Modern methods of data analysis (Sage, Newbury Park, California), pp. 126-76
GOULD, P. R. (1969) 'Is Statistix inferens the geographical name for a wild goose?', Econ. Geogr. 46: 439-48
GREGORY, S. (1963) Statistical methods and the geographer (Longman, London)
HAINING, R. (1990) Spatial data analysis in the social and environmental sciences (Cambridge University Press, Cambridge)
HASTIE, T. and TIBSHIRANI, R. (1986) 'Generalized additive models', Statist. Sci. 1: 297-318
HASTIE, T. and TIBSHIRANI, R. (1990a) Generalized additive models (Chapman and Hall, London)
HASTIE, T. and TIBSHIRANI, R. (1990b) 'Exploring the nature of covariate effects in the proportional hazards model', Biometrics 46: 1005-16
HASTIE, T., BOTHA, J. L. and SCHNITZLER, C. M. (1989) 'Regression with an ordered categorical response', Statist. Medic. 8: 785-94
HEALY, M. J. R. (1988) GLIM: an introduction (Clarendon Press, Oxford)
JOHNSTON, R. J. (1978) Multivariate statistical analysis in geography (Longman, London)
JONES, K. (1984) 'Graphical methods for exploring relationships' in BAHRENBERG, G., FISCHER, M. M. and NIJKAMP, P. (eds) Recent developments in spatial data analysis (Gower, Aldershot), pp. 215-27
JONES, K. (1990) 'What's hiding behind statistical maps?', Bull. Soc. Univers. Cartog. 24: 23-30
JONES, K. (1991a) 'Specifying and estimating multilevel models for geographical research', Trans. Inst. Brit. Geogr. NS 16: 148-59
JONES, K. (1991b) 'Multilevel models for geographical research', Concepts and Techniques in Modern Geography 54 (Environmental Publications, Norwich)
JONES, K., JOHNSTON, R. J. and PATTIE, C. J. (1992) 'People, places, regions: exploring the use of multilevel modelling in the analysis of electoral data', Brit. J. Polit. Sci. (in press)
JONES, K. and MOON, G. (1987) Health, disease and society (Routledge, London)
KORK (1990) Kork Digital Mapping System, KDMS reference guide (Kork Systems Inc., Bangor, ME)
MARSH, C. (1988) Exploring data (Polity Press, Oxford)
McCULLAGH, P. and NELDER, J. A. (1989) Generalized linear models, 2nd edition (Chapman and Hall, London)
McFADDEN, D. (1974) 'Conditional logit analysis of qualitative choice behaviour' in ZAREMBKA, P. (ed.) Frontiers in econometrics (Academic Press, New York), pp. 105-42
MILES, M., STOW, D. A. and JONES, J. P. (1992) 'Incorporating the expansion method into remote sensing-based water quality analyses' in CASETTI, E. and JONES, J. P. (eds) Applications of the expansion method (Routledge, London), pp. 279-96
NELDER, J. A. (1989) 'Generalized additive models and GLM's', GLIM Newsletter 18: 4-13
NELDER, J. A. and WEDDERBURN, R. W. M. (1972) 'Generalized linear models', J. Roy. Statist. Soc. Series A 135: 370-84
NOREEN, E. W. (1988) Computer-intensive methods for testing hypotheses (Wiley, New York)
OPENSHAW, S. (1990) 'Automating the search for cancer clusters: a review of problems, progress and opportunities' in THOMAS, R. W. (ed.) Spatial epidemiology (Pion, London), pp. 48-78
O'BRIEN, L. G. (1983) 'Generalized linear modelling using the GLIM system', Area 15: 327-36
O'BRIEN, L. G. and WRIGLEY, N. (1984) 'A GLM approach to categorical data analysis: theory and applications in geography and regional science' in BAHRENBERG, G., FISCHER, M. M. and NIJKAMP, P. (eds) Recent developments in spatial data analysis (Gower, Aldershot), pp. 231-51
POOLE, M. A. and O'FARRELL, P. N. (1971) 'The assumptions of the linear regression model', Trans. Inst. Brit. Geogr. 52: 145-58
PRATT, G. (1989) 'Quantitative techniques and humanistic-historical materialist perspectives' in KOBAYASHI, A. and MACKENZIE, S. (eds) Remaking human geography (Unwin Hyman, Boston), pp. 101-15
SILVERMAN, B. W. (1985) 'Some aspects of the spline smoothing approach to nonparametric regression curve fitting', J. Roy. Statist. Soc. Series B 47: 1-52
STASINOPOULOS, M. (1990) 'Correction to "Generalised Additive Models and GLMS"', GLIM Newsletter 20: 4
TAYLOR, P. J. (1980) 'A pedagogic application of multiple regression analysis: precipitation in California', Geography 65: 203-12
TITTERINGTON, D. M. (1985) 'Common structure of smoothing techniques in statistics', Internat. Statist. Rev. 53: 141-70
TUKEY, J. W. and WILK, M. B. (1966) 'Data analysis and statistics: an expository overview', Amer. Feder. Inform. Process. Soc. Confer. Proc. 29: 695-709
UNWIN, D. J. (1975) 'An introduction to trend surface analysis', Concepts and Techniques in Modern Geography 5 (Geo Abstracts Publications, Norwich)
WILLIAMSON, H. D. (1992) 'Developing a methodology for estimating grassland variables with remotely sensed data', Area 24: 36-44
WRIGLEY, N. (1983) 'Quantitative methods: on data and diagnostics', Prog. Hum. Geogr. 7: 567-77
WRIGLEY, N. (1984) 'Quantitative methods: diagnostics revisited', Prog. Hum. Geogr. 8: 525-35
WRIGLEY, N. (1985) Categorical data analysis for geographers and environmental scientists (Longman, London)
WRIGLEY, N. and DUNN, R. (1986) 'Graphical diagnostics for logistic oil exploration models', Mathematical Geology 18: 355-74