GIS and Geostatistics: Essential Partners For Spatial Analysis

Environmental and Ecological Statistics 8, 361-377, 2001

P. A. BURROUGH
Utrecht Centre for Environment and Landscape Dynamics (UCEL), Faculty of Geographical Sciences, Utrecht University, Post Box 80.115, 3508 TC Utrecht, The Netherlands
E-mail: p.burrough@geog.uu.nl

Received June 1999; Revised May 2001

Initially, geographical information systems (GIS) concentrated on two issues: automated map making, and facilitating the comparison of data on thematic maps. The first required high quality graphics, vector data models and powerful data bases; the second is based on grid cells that can be manipulated by suites of mathematical operators collectively termed ``map algebra''. Both kinds of GIS are widely available and are taught in many universities and technical colleges. After more than 20 years of development, most standard GIS provide both kinds of functionality and good quality graphic display, but until recently they have not included the methods of statistics and geostatistics as tools for spatial analysis. Recently, standard statistical packages have been linked to GIS for both exploratory data analysis and statistical analysis and hypothesis testing. Standard statistical packages include methods for the analysis of random samples of cases or objects that are not necessarily co-located in space: if the results of statistical analysis display a spatial pattern then that is because the underlying data also share that pattern. Geostatistics addresses the need to make predictions of sampled attributes (i.e., maps) at unsampled locations from sparse, often expensive data. To make up for lack of hard data geostatistics has concentrated on the development of powerful methods based on stochastic theory. Though there have been recent moves to incorporate ancillary data in geostatistical analyses, insufficient attention has been paid to using modern methods of data display for the visualization of results.
GIS can serve geostatistics by aiding geo-registration of data, facilitating spatial exploratory data analysis, providing a spatial context for interpolation and conditional simulation, as well as providing easy-to-use and effective tools for data display and visualization. The value of geostatistics for GIS lies in the provision of reliable interpolation methods with known errors, methods of upscaling and generalization, and for supplying multiple realizations of spatial patterns that can be used in environmental modeling. These stochastic methods are improving understanding of how errors in models of spatial processes accrue from errors in data or incompleteness in the structure of the models. New developments in GIS, based on ideas taken from map algebra, cellular automata and image analysis are providing high level programming languages for modeling dynamic processes such as erosion or the development of alluvial fans and deltas. Research has demonstrated that these models need stochastic inputs to yield realistic results. Non-stochastic tools such as fuzzy subsets have been shown to be useful for spatial analysis when probabilistic approaches are inappropriate or impossible. The conclusion is that in spite of differences in history and approach, the linkage of GIS, statistics and geostatistics provides a powerful, and complementary suite of tools for spatial analysis in the agricultural, earth and environmental sciences.
1352-8505 © 2001 Kluwer Academic Publishers
Keywords: geographic information systems, geostatistics, statistical methods, spatial analysis, environmental modeling, map algebra, fuzzy sets
1. Introduction: GIS, statistics and geostatistics

Geographical information systems, in the sense of computer tools for handling spatial data (Burrough and McDonnell, 1998), have been used since the late 1960s (Coppock and Rhind, 1991). Their initial development was mainly in North America, stimulated by the need to map, plan and manage large areas of terrain, but major contributions came also from Britain and other European countries, and from Japan and Australasia. Initially there were two different kinds of GIS. The first kind, dominated by cartographers, aimed at automating the map making process: ultimately this was to replace the paper map by the much more flexible electronic database. Initially, the essential ingredients of this approach were geometrical accuracy and elegant hard copy output. The second approach, pioneered by the Harvard Laboratory for Computer Graphics, focused on spatial analysis, in particular the overlaying of different thematic maps so that relations and conflicts in land use could be resolved. Whereas the first approach was an automated version of the cartographer's eye, arm and hand, and insisted on full cartographic design standards, the Harvard approach concentrated on the clever combination of data linked to a gridded division of space. As the computer output devices of the time were limited to line printers having a unit cell measuring 1/6 × 1/10 inch, differences in values could only be indicated by overprinting different alphanumeric characters, so the gridded (or raster) maps were not at all pretty. GIS anno 1980 consisted of two opposing camps: the one with expensive, beautiful, but essentially dumb products that were the electronic equivalent of paper maps; the other, a sort of mapping spreadsheet, in which spatial analysis could be carried out with great mathematical flexibility, but with ugly results and huge demands on the then limited computer memories.
Developments in computer technology and the analysis of remotely sensed images have reinforced the gridded approach for environmental study. Initially, however, the differences in budgets and apparatus between the remote sensing professionals and environmental scientists ensured that raster GIS and the classification and display of remotely sensed images remained separate areas of development. Technical advances since the 1980s have ensured that the division of GIS practitioners into two opposing camps has largely disappeared, and the input of gridded maps and remotely sensed images to GIS has now become standard practice. True, there are still arguments today as to whether the raster (gridded) or the vector (point, line, polygon) approach is better, but the discussion now focuses on the correct choice of spatial paradigm for a given application, and not on the limitations of the approaches per se (Burrough and McDonnell, 1998). Today, most commercial GIS provide facilities for working with raster or vector data, either individually or in combination. They also provide database facilities for storing, retrieving and modifying the attributes of the spatial entities that have been recognized for the given application, and many also include their own internal programming languages which allow the user to treat the spatial data as inputs to a virtually unlimited range of environmental models (Burrough, 1996).
In brief, GIS are sets of computer tools for the storage, retrieval, analysis and display of spatial data. GIS may also be required to supply data to numerical models of environmental processes (e.g., air quality, water quality and quantity, plant-soil-environment responses, etc.) and display the results of these models as cartographically acceptable screen or hard copy images. By convention, GIS analyses are almost exclusively deterministic and data are assumed to be exact. With the exception of specialists (e.g., Heuvelink and Lemmens, 2000), the GIS community has shown little regard for issues of uncertainty and spatio-temporal variability beyond geometric precision. This is not because of computational problems, but because market forces have determined that many GIS applications need not address these issues.
1.1 GIS and statistics

Statistical theory and practice for describing the average properties of samples, and for hypothesis testing, are well known in environmental science. Conventionally, the geographical location of the individual observations is not taken into account, but if these methods are used for attributes of spatially located objects then one may be able to set up and test hypotheses as to whether geographically separate, but eponymous objects (e.g., instances of soil series, land use classes) really share the same sets of attributes. Statistical spatial data analysis (SSDA) (Wise et al., 2001) treats the objects in the spatial data base (points, lines, areas, pixels) as though they and their attributes were samples from a larger population. As Wise et al. (2001) point out, two main approaches have been developed: exploratory spatial data analysis (ESDA) and confirmatory spatial data analysis (CSDA). ESDA is a spatial extension of Tukey's (1977) methods for robust and visual analysis of data: the accent is on descriptive univariate and multivariate statistics (means, deviations, ranges, correlations, principal components) in which one searches for outliers or oddities in the value patterns of the spatial objects under consideration. In CSDA, attention is focused on building empirical regression models and/or the testing of hypotheses. Several standard statistical packages (SPSS, S-plus, etc.) include a wide range of methods for EDA and CDA, though they may not include all the hyper data links envisaged by the developers of ESDA (e.g., Wise et al., 2001). Nevertheless, today it is comparatively easy to link a statistical analysis of tabular attribute data to a set of geographical objects in a GIS like ARC-VIEW, either via a DBase file (e.g., using SPSS) or embedded links (using S-plus). As an example of simple descriptive statistical analysis linked to GIS, consider Fig. 1, which shows a soil map with three soil types and 126 sample locations.
Figure 1. Left: Soil profile classes at sample sites (dot is unit Cr, small circle with dot is unit Ct and large circle with dot is unit Ia). Right: Soil thickness at sample sites (dot is 0-40 cm, small flag is 40-80 cm, and large flag is > 80 cm).

In the study area the soil is usually less than 100 cm thick over bedrock. In a GIS analysis we might want to test the hypothesis that there is no significant difference in soil thickness between the three soil types so that the map pattern may be simplified without loss of information. Visual inspection of the right hand figure suggests that the different soil types do have different soil thickness, and this is easily confirmed by extracting the observed thickness data for each site and carrying out an ANOVA for all soil types. As Table 1 shows, the mean soil thickness per soil type does differ significantly; the analysis returns an F-value of 22.67 with p < 0.001. A post-hoc Scheffé test suggests that all three soil types have significantly different means (Table 2) so there is little point in simplifying the soil map. As another example of straightforward statistical analysis using a linked statistics package, Fig. 2 presents the results of carrying out a multivariate discriminant analysis on all the 20 attributes of the soil collected at each of the 126 sample sites. This shows that though the centroids of the three soil types clearly differ in multivariate space, there is considerable overlap.
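The ANOVA step described above can be sketched in a few lines. This is a minimal illustration only: the function name `one_way_anova_f` and the thickness values are hypothetical (they are not the 126 observations of the study), and a statistics package linked to the GIS would normally also report the p-value and post-hoc comparisons.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group size times the squared offset
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical soil thickness readings (cm) per soil type; NOT the paper's data.
thickness = {
    "Ct": [48.0, 55.0, 50.0, 52.0],
    "Cr": [66.0, 70.0, 64.0, 68.0],
    "Ia": [30.0, 35.0, 33.0, 34.0],
}
f_value, df_b, df_w = one_way_anova_f(list(thickness.values()))
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F relative to the F distribution with the stated degrees of freedom indicates that at least one soil type differs in mean thickness, which is the question the map simplification hinges on.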
1.2 GIS and geostatistics

As noted, the standard GIS approach to recording and analyzing the attributes of predefined objects implies no spatial variation within an object, and all change occurs at object boundaries. In many applications (hydrology, oceanography, earth sciences, soil science to name but a few), this approach is not always sensible and it is better to consider the variation of the attribute in terms of a continuous, but noisy surface. This surface is often constructed by interpolation from sets of point data. Though there are many methods for interpolation (see Burrough and McDonnell, 1998), most of these treat the data as if they can be modeled by a smooth, differentiable surface and no attention is paid to the uncertainty of the results. The methods of geostatistics (Matheron, 1965; Journel, 1996; Goovaerts, 1997) use the stochastic theory of spatial correlation both for interpolation and for apportioning uncertainty.

Table 1. Descriptive statistics of soil thickness (cm) for each soil type.
Soil type   N     Mean    Std. error
Ct          36    51.15   4.20
Cr          31    67.02   4.48
Ia          59    32.76   2.79
Total       126   46.45   2.42

Table 2. Post hoc Scheffé test indicates that all three soil types have significantly different thicknesses.
                   Subset for alpha = 0.05
Soil type   N      1         2         3
3           59     32.7617
1           36               51.1528
2           31                         67.0232
Sig.               1.000     1.000     1.000

Figure 2. Plot of discriminant functions for all 126 soil observations compared with map classes.

Although still unfamiliar to many GIS users, in terms of technical development, the methods of geostatistics are of similar age to GIS, but have different roots. Whereas GIS was seen as a way to automate the creation of exact, deterministic models of the world in a dominantly cartographic context, geostatistics is about making predictions under conditions of uncertainty and limited information. The path of geostatistics from its founders Krige and Matheron in the 1960s and 1970s to present day exponents such as Journel, Goovaerts and others emphasizes the role of chance in spatial prediction. Where GIS ignores statistical variation, geostatistics uses the understanding of statistical variation as an important source of information for improving predictions of an attribute at unsampled points, given a limited set of measurements. Geostatistics is therefore a very useful ``add on'' or extension to the GIS toolkit for spatial analysis.

A central aspect of geostatistics is the use of spatial autocovariance structures, often represented by the (semi)variogram, or its cousin the autocovariogram, which differentiate different kinds of spatial variation. The semivariance indicates the degree of similarity of values of a regionalized variable $Z$ over a given sample spacing or lag, $h$. Semivariograms (Fig. 3) are graphs of the semivariance $\gamma(h)$ against sample spacing or lag, $h$: they are defined as:

$$\gamma(h) = \frac{1}{2} \mathrm{Var}\{Z(x_i) - Z(x_i + h)\}$$

and estimated from sampled data by:

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2$$

where $n$ is the number of pairs of samples, and $z(x_i)$, $z(x_i + h)$ are measurements separated by a distance $h$. In practice, $\hat{\gamma}(h)$ is estimated from sets of point samples which can be extracted from the GIS data base. Because experimentally derived semivariances do not always follow a smooth increase with sample spacing, a theoretical variogram model is fitted to the data (Burrough and McDonnell, 1998; Deutsch and Journel, 1998; Goovaerts, 1997). The interpolation weights for predicting the value of attribute $\hat{z}$ at unsampled locations $x$ are derived with the help of this fitted model and the method is known as ordinary point kriging (OPK) after its first exponent. Predictions can also be computed for units of land (blocks) larger than those sampled, thereby smoothing out local variations; this is known as block kriging. Much practical geostatistics is concerned with the estimation and fitting of variograms to experimental data (Pannatier, 1996) followed by interpolation or conditional simulation of gridded surfaces (Pebesma and Wesseling, 1998). Besides interpolation, kriging provides information on interpolation errors. Knowledge of the spatial correlation structures may also be used to generate sets of equiprobable realizations (simulations) of the attribute $z$ that can be of great value for studying error propagation through spatial models that may be linked to the GIS.

Figure 3. Example of a semivariogram fitted to experimental data. The numbers indicate the numbers of pairs of points used at each lag.

For many users of GIS, kriging is no more than an alternative method of interpolation (see Burrough and McDonnell, 1998 for references). Indeed, many statisticians and geographers use other methods for statistical spatial analysis (cf. Bailey and Gatrell, 1995; Cressie, 1991). The general lack of appreciation of geostatistics by the GIS community during the seminal years from the mid-1970s to the mid-1990s was due to many factors, including the publication of Matheron's original treatise in French (Matheron, 1965), which is therefore inaccessible to most native English speakers. Until the mid-1990s, the high prices charged for geostatistics software packages and their almost exclusive use by mining corporations made it difficult to teach geostatistics in many universities. Of course, a contributing factor to the lack of interest in geostatistics by the GIS practitioner is its grounding in mathematical statistics, which clearly baffles those of us who have little feeling for the statistical treatment of sampling, variance analysis and correlation and regression.
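For readers who prefer code to notation, the semivariance estimator of this section can be sketched as follows. This is an illustrative sketch, not the procedure of any particular package: the function names, the choice of lag classes of equal width, and the transect data are assumptions made here, and the spherical function is only one of several theoretical models that could be fitted to the experimental values.

```python
import math

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, number of pairs) per non-empty lag class.

    Lag class k collects all pairs whose separation h satisfies
    k * lag_width <= h < (k + 1) * lag_width.
    """
    sq_diff = [0.0] * n_lags   # sum of squared differences per class
    dist = [0.0] * n_lags      # sum of pair separations per class
    pairs = [0] * n_lags       # number of pairs per class
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                sq_diff[k] += (values[i] - values[j]) ** 2
                dist[k] += h
                pairs[k] += 1
    # Semivariance = sum of squared differences / (2 * number of pairs).
    return [(dist[k] / pairs[k], sq_diff[k] / (2 * pairs[k]), pairs[k])
            for k in range(n_lags) if pairs[k] > 0]

def spherical_model(h, nugget, partial_sill, a):
    """Spherical variogram model with range a; gamma(0) is 0 by definition."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    return nugget + partial_sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

# Hypothetical transect: five samples at unit spacing (not data from the paper).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0, 11.0]
for h, gamma, n in empirical_semivariogram(coords, values, lag_width=1.0, n_lags=3):
    print(f"lag {h:.1f}: semivariance {gamma:.3f} from {n} pairs")
```

Reporting the number of pairs per lag, as in Fig. 3, matters because semivariances computed from few pairs are unreliable and should carry little weight when a model is fitted.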
2. The mutual benefits of linking GIS, statistics and geostatistics

In this Section I present some examples of the ways in which GIS, statistics and geostatistics complement each other in spatial analysis.
2.1 The value of GIS for geostatistics

Besides acting as a spatial database, GIS provides several benefits to statisticians and geostatisticians that are largely concerned with the correct geometric registration of sample data, prior data analysis, the linking of hard and soft data, and the presentation of results.

Geo-registration. As with all work on spatial data, spatial analysis must be carried out on data that have been collected with reference to a properly defined coordinate system. GIS can provide the means to register the locations of samples directly (via GPS or other methods), or to convert local coordinates to standard coordinates. The use of standard coordinates ensures that data collected at different times can be properly combined and overlaid on conventional maps. The use of standard coordinate systems is particularly important when international databases are created from different sources, such as occurs in Europe, for example.

Exploratory spatial data analysis. As already noted, ESDA is a useful toolkit for examining data prior to analysis. For geostatisticians, the presence and location of spatial outliers, or other irregularities in the data, may have important consequences for the fitting of variograms, or for determining whether data should be transformed to logarithms. GIS often provide search engines that can be linked to statistical packages to determine whether any given data set contains anomalies or unexpected structure. The underlying reasons for such anomalies may sometimes be easily seen when these data are displayed on a map together with other information. Not all users of ESDA in GIS use conventional geostatistics, however, and other measures of spatial autocorrelation such as Moran's I statistic are often used (Pereira et al., 1998).

Spatial context and the use of external information. Increasingly, the suite of geostatistical methods currently available allows the user to incorporate external information that can be used to modify, and possibly improve, the predictions or simulations required. Geostatisticians term the external information ``secondary'', because they believe that the ``hard data'' measured at the sample locations are most important. But GIS practitioners might prefer to call the ``primary data'' that which separates a landscape into its main components (different soils, or rock types, or land cover classes), regarding the sampled data as merely filling in the details that were not apparent at the smaller map scale.
In any case, GIS makes it possible to incorporate data from other aspects of the environment with the geostatistical study of autocorrelation structures, so that differentiated knowledge of different patterns of variation can be used to best effect. For example, in the c. 5 × 2 km study area used in Principles of Geographical Information Systems (Burrough and McDonnell, 1998) the distribution of heavy metals (zinc) in the top soils of the river alluvium was clearly influenced by flooding regime, which in turn is affected by factors such as distance from the river and the relative elevation of the floodplain. Fig. 4 shows how the extra information may be used in several ways. Stratified kriging involves dividing the original set of 155 soil samples into classes based on flooding frequency (a simple ``point-in-polygon'' search in GIS) to yield three strata. Variograms were estimated for each stratum and these were interpolated to yield a single map (Fig. 4b). In a second approach, a multiple regression model was computed from the triplets of zinc level, elevation and distance to river measured at all data points (Fig. 4c). A third approach, known as ``universal kriging'', directly incorporates the trend in the estimation of the interpolation weights and Fig. 4d illustrates how both stratification and trends may be combined. The results clearly show the differences in the patterns obtained with and without the ancillary data. The single, or combined incorporation of external information through stratification and strata-specific trends yielded maps with good levels of prediction and a spatial resolution that was better than could have been obtained from ordinary point kriging alone. Other examples are given in Goovaerts (1997, 1999).

Figure 4. Results of interpolating the ln(Zinc) levels of topsoils (0-10 cm) in a frequently flooded part of the Maas floodplain, Limburg, NL. a: ordinary point kriging, b: OPK within different flooding strata, c: using a regression model based on elevation and distance from the river, d: universal kriging with a single trend, e: universal kriging with stratification and different trends for each stratum.

Display and visualization: 2D, 3D, plus time. Who is the recipient of a geostatistical interpolation? If a geostatistician, or statistician, then simple maps and tables of numbers may suffice, but environmental managers need to see how the results relate to other aspects of the terrain. Today it is easy to import the results of a kriging interpolation into a GIS and display the results in conjunction with a scanned topographic map, or display them in 3D over a digital elevation model (DEM) of the landscape from which the samples were taken (Fig. 5). Such presentation invites visual interpretation, the re-evaluation of results and the discovery of more information, and therefore is an essential part of the spatial analysis process.
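The ordinary point kriging step that underlies maps such as Fig. 4a, and whose output is what gets imported into the GIS for display, can be sketched as below. This is a minimal sketch under stated assumptions: it uses NumPy, a spherical variogram with made-up parameters, all data points as neighbours, and hypothetical sample values; the names `spherical` and `ordinary_kriging` are illustrative and do not come from the paper or from any geostatistical package.

```python
import numpy as np

def spherical(h, nugget, partial_sill, a):
    """Spherical variogram model evaluated on an array of distances."""
    h = np.asarray(h, dtype=float)
    rising = nugget + partial_sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    gamma = np.where(h < a, rising, nugget + partial_sill)
    return np.where(h == 0, 0.0, gamma)  # gamma(0) = 0 by definition

def ordinary_kriging(coords, values, target, variogram):
    """Predict at one target point; return (prediction, kriging variance, weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between all pairs of data points.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kriging system: semivariance matrix bordered by ones, so that the
    # Lagrange multiplier forces the weights to sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = weights @ values
    variance = weights @ b[:n] + lagrange
    return prediction, variance, weights

def model(h):
    # Assumed variogram parameters, for illustration only.
    return spherical(h, nugget=0.0, partial_sill=1.0, a=10.0)

# Hypothetical samples at the corners of a 2 x 2 square; target at the centre.
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
values = [5.0, 6.0, 7.0, 8.0]
z_hat, var, w = ordinary_kriging(coords, values, (1.0, 1.0), model)
print(f"prediction {z_hat:.3f}, kriging variance {var:.3f}")
```

Evaluating the function over every cell of a grid yields two rasters, the prediction and its variance, both of which can be displayed or overlaid in the GIS.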
Figure 5. 3-Dimensional display of interpolation results obtained from stratified kriging on a digital elevation model with shading and transparency floated above a scanned topographic map. Dark gray zones indicate heavy metal concentrations.
2.2 The value of geostatistics for GIS

Besides providing powerful means of interpolating point data to areas, there are many useful ways in which statistics and geostatistics can bring major improvements to the understanding of uncertainty and error in GIS-based spatial analyses. This is particularly so for most kinds of GIS-based environmental modeling where a priori we are dealing with incomplete data and uncertainty. Indeed, to pretend, as the standard GIS paradigms do, that all data are exactly known, and exactly located, is not to recognize reality. Geostatistics provides at least the following attractive options for environmental GIS and environmental decision support systems: interpolation from point data and estimates of error bounds, estimates of error propagation and uncertainty ranges for spatial and temporal modeling, and data reduction and generalization.

Interpolation errors. Although surfaces interpolated by kriging are smooth, all forms of kriging yield estimates of the estimation uncertainty or kriging error. Such values can be mapped to provide error surfaces which can be combined with other information. Kriging errors depend on the form of the variogram and the disposition of observations: the more data surrounding an unsampled location, and the stronger the autocorrelation structure, the lower the estimation variance.

Error propagation in spatial models. When data from interpolated surfaces are used as inputs to numerical models, the error surfaces associated with kriging interpolation may be used to understand the propagation of errors through spatial models. Heuvelink (1998) gives both theory and examples of using Taylor series expansion on interpolated data to compute error propagation through cartographic models; see also Burrough and McDonnell (1998). An increasingly popular alternative to the Taylor expansion method is to use methods of conditional simulation (Pebesma and Wesseling, 1998) to provide sets of multiple realizations of data surfaces for inputs to numerical models like the 3D groundwater model ``MODFLOW'', so that error propagation and model sensitivity can be followed using Monte Carlo methods (e.g., Bierkens, 1994; Gómez-Hernández and Journel, 1992).

Monte Carlo techniques using conditional simulation may also be useful for comparing data collected at different times and locations within the same area. Recent work on the redistribution of 137Cs fallout from the Chernobyl nuclear disaster in 1986 has shown that the normal decay of radiocaesium levels and uptake rates in cow's milk can be temporally reversed if the cows are grazing on recently flooded, poorly drained peat soils (Burrough and McDonnell, 1998; Burrough et al., 1999a). The data for these studies consisted of radionuclide determinations made on bulked soil samples taken in 1988 and 1993. Unfortunately, the samples were collected at different sites in the two years, so it was difficult to use the raw data to test the hypothesis that the flood events had really enhanced radiocaesium levels near the rivers.
However, by computing the variograms for the data sets from both years and using these to compute sets of conditional simulations of the normalized differences of radiocaesium in the topsoil between the two sampling times and at all sampled sites, it was possible to establish a clear relation between the incidence of flooding and flood-induced enhancement of radiocaesium which could enter the food chain (Burrough et al., 1999a). Fig. 6 shows clearly that although there seem to be systematic differences between the two years (mean values for 1993 exceed those for 1988 by 0.5-1.0 standard errors), sites within 1.5 km of a flooding river are not only more variable, but many have higher levels of radiocaesium.

Data reduction and spatial generalization. In some applications there may be too much data, which may need to be reduced to manageable proportions or common coordinates. An example is the need to compare the yields of different crops over several years on the same plot when yields have been recorded using data loggers and GPS. For example, Burrough and Swindell (1997) report the collection of annual yield data for three successive crops on a 5 ha field at the experimental farm of the Royal College of Agriculture, Cirencester, UK. Data were collected on wheat, barley and oilseed rape in successive years by a combine harvester fitted with a data logger whose location was pinpointed by locally referenced GPS. The spatial resolution of the sample was approximately 4 m (the width of the harvester) × 2.5 m (along the cut), and each survey yielded some 2000 samples or more. Because of locational noise in the GPS and errors in the amount of crop cut each 2.5 m by the harvester, it was not possible to relate the yields of the three crops directly to location in the field nor to investigate links between crop yields and soil conditions.
Figure 6. Plots of conditional simulations for the 1988-1993 normalized differences of 137Cs at data points, with distance to rivers that flood.

To generalize and smooth the data, for each year an isotropic variogram was computed: the data were then interpolated to a common grid of 2.5 m resolution using block kriging with units of 25 × 25 m. Each annual map was normalized to give a map showing relative yield; these three maps were then combined to give a three year, normalized average. Comparison of the normalized average yield map with a computer enhanced, scanned aerial image of the site (Fig. 7) demonstrates clear relations between site conditions and normalized crop yields that otherwise were not apparent.
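The normalize-then-average step can be illustrated with a short sketch. The paper does not say which normalization was used, so z-scores (zero mean, unit standard deviation per map) are assumed here; the function names and the 2 × 2 yield grids are hypothetical and stand in for the block-kriged maps.

```python
from statistics import mean, pstdev

def standardize(grid):
    """Rescale a grid to zero mean and unit standard deviation (z-scores)."""
    cells = [v for row in grid for v in row]
    m, s = mean(cells), pstdev(cells)
    return [[(v - m) / s for v in row] for row in grid]

def average_maps(grids):
    """Cell-by-cell mean of several grids that share the same shape."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [[mean(g[r][c] for g in grids) for c in range(n_cols)]
            for r in range(n_rows)]

# Hypothetical yields (t/ha) on a 2 x 2 grid; not the Cirencester data.
wheat = [[8.0, 6.0], [4.0, 2.0]]
barley = [[4.0, 3.0], [2.0, 1.0]]
rape = [[3.5, 3.0], [2.5, 2.0]]
relative = average_maps([standardize(g) for g in (wheat, barley, rape)])
print(relative)
```

Standardizing first removes the differences in absolute yield between crops, so that the average reflects where in the field yields are consistently high or low rather than which crop happens to yield most.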
Figure 7. Comparison between aerial photo image of field A and, displayed on its right, the average, standardized crop yields as interpolated using block kriging.
Geostatistics and remote sensing. The application of geostatistical methods in the analysis of remotely sensed images is a topic in itself. Here I refer the reader to the January 1999 issue of Photogrammetric Engineering and Remote Sensing for a recent compilation of research. Remote sensing applications of geostatistics have less to do with interpolation from sparse data (the images are complete unless masked by cloud cover, in which case interpolation could be used to fill in the gaps) than with the description and analysis of gridded, stochastic surfaces and the simulation of multiscale data sets.
3. Stochastic inputs to the modeling of spatial processes

As already indicated, geostatistical methods of conditional simulation are useful for following the propagation of errors through spatial models that may be linked to, or run from GIS. Recent research in the modeling of dynamic spatial processes (van Deursen, 1995; Takeyama and Couclelis, 1997; Wesseling et al., 1996) indicates the value of including an understanding of errors and roughness in many models of dynamic spatial processes, particularly when processes are non-linear.

Stability of the topology of drainage nets. The automatic derivation of surface topology from a gridded digital elevation model is now a standard operation in GIS that are used for hydrological projects (Fig. 8a). The usual procedure is to use thin plate splines to interpolate a DEM (digital elevation model) from digitized contours to a fine grid so that the resulting topological net is free from discontinuities (Mitasova and Hofierka, 1993). Unfortunately, although smooth interpolators guarantee continuity in surface topology, they also constrain the topology to a single set of drainage lines, which may result in serious artefacts in hydrological derivatives such as wetness indices (see Burrough and McDonnell, 1998 for definitions). Simple methods for deriving drainage nets from gridded surfaces, such as the D8 algorithm, produce a unique solution in which the main stream line is only one cell wide (e.g., Fig. 8a). Large differences in the size of the upstream contributing catchment area between a cell on the main drainage line and its off-line neighbor may arise. This is counter-intuitive, because we expect cells close to each other to have similar conditions and contributing areas, especially in the bottoms of valleys. A better idea of surface water drainage may be obtained by considering the average properties of a suite of possible drainage nets that are obtained when surface roughness is added to the DEM. The roughness can easily be modeled by a small Gaussian noise which is added to each cell (a standard deviation equal to 0.1% of the maximum relief difference in the area is enough as a first approximation); the result yields one possible realization of the net. Repeating the procedure 100-1000 times with different random values for roughness creates an average probability density map of the cumulative contributing area (Fig. 8b) which appears to be more realistic than the single deterministic solution. Note that one cannot compute Fig. 8b by passing a moving window smoothing function over Fig. 8a. The effects of small errors on the derived flow paths may be effectively demonstrated by displaying the whole set as a movie, when the amplitudes and locations of the swings of drainage paths resulting from the minor errors will become very apparent. Though this example uses spatially uncorrelated noise for each realization of the DEM surface, one could of course examine the effects of spatially correlated noise on the model by first creating a set of conditional simulations based on a known or assumed variogram. Repeating the analysis for multiple realizations and displaying these using dynamic visualization enhances understanding of the results.

Figure 8. a: Single realization of a drainage network derived from a smooth DEM; b: average image computed from 100 realizations derived from the initial DEM plus 10 cm root mean square (RMS) error.

Adding stochasticity to make a deterministic process model work properly. In certain situations it appears to be necessary to add roughness to a surface so that a well-known deterministic process can be modeled effectively, and this is illustrated using the example of the creation of an alluvial fan. If a hillside is modeled as a smooth inclined plane, then the topology consists merely of a set of parallel lines that run from top to bottom, much like the way rain falling on the windscreen of a stationary car runs off in parallel streams.
These streams can be ``forced'' to merge if the initial surface is roughened (e.g., Liverpool and Edwards, 1995). In the case of the alluvial fan, each ``event'' by which material falls down the slope and is added to the fan modifies the surface roughness in a way that is very difficult to predict, but which must not be ignored. So the initial roughness is modified by feedback from the sedimentation process so that for each cycle there is a new surface for the flow and deposition. If the deposits are sufficiently large, the surface topology changes with each cycle. The need for initial roughness which is modified but maintained during the development of the delta is a nice example of how a better understanding of the physical process may arise by linking geostatistics with interactive dynamic modeling. Ongoing research in Utrecht and elsewhere is beginning to demonstrate the value of conditional simulation in dynamic, as well as static, models of landscape change (see Karssenberg et al., in press).
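The Monte Carlo procedure described above, and the inclined-plane example, can be illustrated with a minimal sketch. It is not the implementation used for Fig. 8: the routing rule (steepest descent to one of eight neighbours, often called D8), the function names `contributing_area` and `simulate_areas`, the synthetic 20 x 20 plane and the noise level are all assumptions made for illustration. The default noise fraction follows the 0.1% suggested in the text, but the demonstration uses a larger value (3% of the relief) because on a perfectly regular plane the noise must be comparable to the drop between adjacent cells before flow paths start to merge. No pit filling is attempted.

```python
import math
import numpy as np

# D8 neighbourhood: (row offset, column offset, distance in cell units).
D8 = [(dr, dc, math.hypot(dr, dc))
      for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def contributing_area(dem):
    """Cumulative contributing area (in cells) by steepest-descent routing."""
    rows, cols = dem.shape
    area = np.ones((rows, cols))
    # Visit cells from highest to lowest, so that each cell's upstream
    # total is complete before it is passed on downslope.
    for flat in np.argsort(dem, axis=None)[::-1]:
        r, c = divmod(int(flat), cols)
        best_slope, target = 0.0, None
        for dr, dc, dist in D8:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                slope = (dem[r, c] - dem[rr, cc]) / dist
                if slope > best_slope:
                    best_slope, target = slope, (rr, cc)
        if target is not None:  # cells without a lower neighbour keep their total
            area[target] += area[r, c]
    return area


def simulate_areas(dem, n_realizations=100, noise_fraction=0.001, seed=0):
    """One contributing-area map per noisy realization of the DEM.

    Spatially uncorrelated Gaussian noise is added to every cell; its
    standard deviation is noise_fraction times the maximum relief.
    """
    rng = np.random.default_rng(seed)
    sd = noise_fraction * (dem.max() - dem.min())
    return np.stack([
        contributing_area(dem + rng.normal(0.0, sd, dem.shape))
        for _ in range(n_realizations)
    ])


# Synthetic smooth inclined plane: elevation falls by one unit per row.
rows, cols = 20, 20
plane = np.repeat(np.arange(rows - 1, -1, -1, dtype=float)[:, None], cols, axis=1)

smooth = contributing_area(plane)          # single deterministic solution
rough = simulate_areas(plane, n_realizations=100, noise_fraction=0.03, seed=42)
mean_area = rough.mean(axis=0)             # average over all realizations
```

On the smooth plane every column drains straight down, so each cell in the bottom row collects exactly one column of cells (the parallel streams of the windscreen analogy). In the noisy realizations streams merge and individual cells collect much larger areas. To study spatially correlated roughness, the call to `rng.normal` would be replaced by realizations drawn from a conditional simulation.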
4. Non-stochastic tools for analyzing uncertainty in spatial data: fuzzy subsets

In many situations we know there is uncertainty, but we do not know, nor can we construct, the probability distributions. We may also be uncertain how to define the geographical objects in the data base (Burrough and Frank, 1996). The development of fuzzy subsets in environmental science is increasingly being seen not as a replacement for statistics and geostatistics, but as a complementary suite of methods for operating in uncertain conditions. The main uses of fuzzy subsets in GIS are for the selection and retrieval of data under conditions of uncertainty (e.g., Burrough and McDonnell, 1998; Canters, 1997), and in creating multivariate classes that overlap (fuzzy k-means) (Burrough et al., 1999b). Data retrieval using fuzzy subsets has been demonstrated to be less error prone than conventional Boolean SQL methods (Heuvelink and Burrough, 1993). Fuzzy memberships can be interpolated using kriging (de Gruijter et al., 1997; Burrough and McDonnell, 1998) and the application of fuzzy k-means to derivatives of digital elevation models provides convincing and objective methods for classifying terrain (Burrough et al., 2000, 2001). Fuzzy subsets can also be used to address issues of the crispness of spatial boundaries (e.g., Lagacherie et al., 1996) or the intervisibility across 3D surfaces (Fisher, 1995). Fuzzy subsets may also be used to define sensible ways to select point data for kriging.
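The contrast between Boolean and fuzzy retrieval can be sketched in a few lines. Everything specific here is an assumption made for illustration: the bell-shaped membership function (1 at the ideal value, 0.5 at one `width` away), the use of the minimum as the fuzzy intersection, the function names, and the five hypothetical sites with clay content and slope values.

```python
import numpy as np


def fuzzy_membership(z, ideal, width):
    """Bell-shaped membership: 1 at `ideal`, 0.5 at a distance of `width`."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + ((z - ideal) / width) ** 2)


def boolean_membership(z, ideal, width):
    """Crisp membership: 1 inside [ideal - width, ideal + width], else 0."""
    z = np.asarray(z, dtype=float)
    return (np.abs(z - ideal) <= width).astype(float)


# Hypothetical attribute values at five sites.
clay = np.array([30.0, 39.9, 40.1, 55.0, 35.0])   # clay content, %
slope = np.array([2.0, 3.0, 3.0, 1.0, 9.0])       # slope, %

# Query: "clay close to 30% AND slope close to 2%".
# The minimum of the two memberships serves as the intersection in both cases.
bool_sel = np.minimum(boolean_membership(clay, 30, 10),
                      boolean_membership(slope, 2, 3))
fuzzy_sel = np.minimum(fuzzy_membership(clay, 30, 10),
                       fuzzy_membership(slope, 2, 3))
```

Sites 2 and 3 differ by only 0.2% clay, which could easily be measurement error. The Boolean query accepts one and rejects the other, whereas their fuzzy memberships are almost identical (both close to 0.5), so a small error in the data produces only a small change in the result.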
5. Conclusions
This review has demonstrated that GIS, statistics and geostatistics have much to give to each other, particularly when GIS are used for environmental analysis. Geostatistics benefits from having standard methods of geographical registration, data storage, retrieval and display, while GIS benefit from being able to incorporate proven methods for testing hypotheses, for handling and understanding errors in data, and for illustrating their effects on the outcomes of models used for environmental management. In some situations, geostatistics may be supplemented by non-probabilistic methods of handling uncertainty, such as those provided by fuzzy subsets.
References
Bailey, T.C. and Gatrell, A.C. (1995) Interactive Spatial Data Analysis, Longman, Harlow, 413 pp.
Bierkens, M.F.P. (1994) Complex Confining Layers: A Stochastic Analysis of Hydraulic Properties at Various Scales, Royal Dutch Geographical Association (KNAW)/Faculty of Geographical Sciences, University of Utrecht, Utrecht, NL.
Burrough, P.A. (1996) Opportunities and limitations of GIS-based modeling of solute transport at the regional scale. In: Application of GIS to the Modeling of Non-Point Source Pollutants in the Vadose Zone, SSSA Special Publication 48, Soil Science Society of America, Madison, 19–37.
Burrough, P.A. and Frank, A. (1996) (eds), Geographic Objects with Indeterminate Boundaries, GISDATA Series 2, Taylor and Francis, London.
Burrough, P.A., van Gaans, P.F.M., and MacMillan, R.A. (2000) High-resolution landform classification using fuzzy k-means. Journal of Fuzzy Sets and Systems, 113, 37–52.
Burrough, P.A., van Gaans, P.F.M., Wilson, J., and Hansen, A.J. (2001) Fuzzy k-means classification of topo-climatic data as an aid to forest mapping in the Greater Yellowstone Area, USA. Landscape Ecology, 16, 523–46.
Burrough, P.A. and McDonnell, R.A. (1998) Principles of Geographical Information Systems, Oxford, Oxford University Press, 330 pp.
Burrough, P.A. and Swindell, J. (1997) Optimal mapping of site-specific multivariate soil properties. In Precision Agriculture: Spatial and Temporal Variability of Environmental Quality, J. Lake, G. Bock, and J. Goode (eds), Proc: CIBA Foundation Symposium 210, John Wiley and Sons, Chichester, pp. 208–20.
Burrough, P.A., van der Perk, M., Howard, B., Prister, B., Sansone, U., and Voitsekhovitch, O.V. (1999a) Environmental mobility of Radiocaesium in the Pripyat Catchment, Ukraine/Belarus. Water, Air and Soil Pollution, 110, 35–55.
Canters, F. (1997) Evaluating the uncertainty of area estimates derived from fuzzy land-cover classification. Photogrammetric Engineering and Remote Sensing, 63, 403–14.
Coppock, J.T. and Rhind, D.W. (1991) The history of GIS. In: Geographical Information Systems, Vol. 1, Principles, D.J. Maguire, M.F. Goodchild, and D.W. Rhind (eds), Longman Scientific and Technical, New York, pp. 21–43.
Cressie, N. (1991) Statistics for Spatial Data, Wiley, New York, 900 pp.
De Gruijter, J.J., de Walvoort, D., and van Gaans, P. (1997) Continuous soil maps: a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma, 77, 169–95.
Deutsch, C. and Journel, A.G. (1998) GSLIB Geostatistical Handbook, 2nd edition, Oxford.
Fisher, P.F. (1995) An exploration of probable viewsheds in landscape planning. Environment and Planning B: Planning and Design, 22, 527–46.
Gómez-Hernández, J.J. and Journel, A.G. (1992) Joint sequential simulation of multigaussian fields. In: A. Soares (ed), Proc. Fourth Geostatistics Congress, Troia, Portugal. Quantitative Geology and Geostatistics, (5), 85–94, Dordrecht, Kluwer Academic Publishers.
Goovaerts, P. (1997) Geostatistics for Natural Resources Evaluation, Oxford University Press, 483 pp.
Goovaerts, P. (1999) Using elevation to aid the geostatistical mapping of rainfall erosivity. CATENA, 34, 227–42.
Heuvelink, G.B.M. (1998) Error Propagation in Environmental Modeling, Taylor and Francis, London, 127 pp.
Heuvelink, G.B.M. and Burrough, P.A. (1993) Error propagation in cartographic modeling using Boolean logic and continuous classification. Int. J. Geographical Information Systems, 7, 231–46.
Heuvelink, G.B.M. and Lemmens, T. (2000) (eds), Accuracy 2000. Proceedings of the 4th International Meeting on Accuracy in Spatial Data, Amsterdam, July, Delft University Press, Delft.
Karssenberg, D.J., Torqvist, T., and Bridges, J. (2001) Conditioning a process-based model of sedimentary architecture to well data. Journal of Sedimentary Research, 71(6).
Lagacherie, P., Andrieux, P., and Bouzigues, R. (1996) Fuzziness and uncertainty of soil boundaries: from reality to coding in GIS. In: P.A. Burrough and A.U. Frank (eds), Geographical Objects with Indeterminate Boundaries, Taylor and Francis, London, pp. 275–86.
Liverpool, T. and Edwards, S. (1995) Modeling meandering rivers. Physical Review Letters, 75, 3016.
Matheron, G. (1965) La Theorie des Variables Regionalisee et ses Applications, Masson, Paris.
Mitasova, H. and Hofierka, J. (1993) Interpolation by regularized spline with tension: Application to terrain modeling and surface geometry analysis. Mathematical Geology, 25, 657–69.
Pannatier, Y. (1996) Variowin. Software for spatial data analysis in 2D. Statistics and Computing, Springer Verlag, Berlin, 91 pp.
Pebesma, E. and Wesseling, C.G. (1998) GSTAT: A program for geostatistical modeling, prediction and simulation. Computers and Geosciences, 24, 17–31.
Pereira, J.M.C., Carreiras, J.M.B., and Perestrello de Vasconcelos, M.J. (1998) Exploratory data analysis of the spatial distribution of wildfires in Portugal 1980–1989. Geographical Systems, 5, 355–90.
Takeyama, M. and Couclelis, H.M. (1997) Map dynamics: integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11, 73–92.
Tukey, J.W. (1977) Exploratory data analysis, Addison-Wesley, Reading, Massachusetts.
Van Deursen, W.P.A. and Wesseling, C.G. (1995) PCRaster, Department of Physical Geography, Utrecht University.
Wesseling, C.G., Karssenberg, D., Burrough, P.A., and van Deursen, W.P.A. (1996) Integrating dynamic environmental models in GIS: The development of a dynamic modeling language. Transactions in GIS, 1, 40–8.
Wise, S., Haining, R., and Ma, J. (2001) Providing spatial statistical data analysis functionality for the GIS user: The SAGE project. International Journal of Geographical Information Science, 15, 239–254.
Biographical sketch
Peter A. Burrough has been Professor of Physical Geography and Geographical Information Systems in the Faculty of Geographical Sciences, University of Utrecht, since 1984. Dr. Burrough is also the Director of the Utrecht Centre for Environment and Landscape Dynamics (UCEL) and Chairman of the Interfaculty Center for Hydrology, Utrecht (ICHU). He is a member of the advisory committee on Earth Sciences, Physical Geography and Geology for the Dutch National Science Foundation (NWO), and a member of the Scientific Board for the ``Fonds voor Wetenschappelijk Onderzoek'' (FWO) for Vlaanderen, Belgium.
