
International Journal of Remote Sensing

ISSN: 0143-1161 (Print) 1366-5901 (Online) Journal homepage: https://www.tandfonline.com/loi/tres20

Application of binary logistic regression analysis
and its validation for landslide susceptibility
mapping in part of Garhwal Himalaya, India

J. Mathew , V. K. Jha & G. S. Rawat

To cite this article: J. Mathew , V. K. Jha & G. S. Rawat (2007) Application of binary
logistic regression analysis and its validation for landslide susceptibility mapping in part of
Garhwal Himalaya, India, International Journal of Remote Sensing, 28:10, 2257-2275, DOI:
10.1080/01431160600928583

To link to this article: https://doi.org/10.1080/01431160600928583

Published online: 09 Jul 2007.

Vol. 28, No. 10, 20 May 2007, 2257–2275


J. MATHEW*†, V. K. JHA† and G. S. RAWAT‡

†Regional Remote Sensing Service Centre, ISRO, Dehradun, Uttaranchal, India
‡Department of Geology, HNB Garhwal University, Srinagar, Uttaranchal, India

(Received 23 March 2005; in final form 19 July 2005)

Landslides cause heavy damage to property and infrastructure, in addition to
being responsible for the loss of human lives, in many parts of the Himalaya. It is
possible to take appropriate management measures to reduce the risk from
potential landslide hazard with the help of landslide hazard zonation (LHZ)
maps. The present work is an attempt to utilize binary logistic regression analysis
for the preparation of a landslide susceptibility map for a part of Garhwal
Himalaya, India, which is highly prone to landslides, by taking the geological,
geomorphological and topographical parameters into consideration. Remote
sensing and the geographic information system (GIS) were found to be very
useful in the input database preparation, data integration and analysis stages.
The coefficients of the predictor variables are estimated using binary logistic
regression analysis and are used to calculate the landslide susceptibility for the
entire study area within a GIS environment. The receiver operator characteristic
curve analysis gives 88.7% accuracy for the developed model.

1. Introduction
Every year landslides cause heavy damage to property and infrastructure, and the loss
of human lives, in the Himalaya region. They are influenced by a variety of controlling
factors, such as geology and topography, and triggering factors such as earthquakes
and heavy rainfall (NRSA 2000). A landslide susceptibility mapping effort would
identify the areas prone to the occurrence of landslides, depending upon the geological
and topographical conditions as well as the existing landslide scenario. Such landslide
hazard zonation (LHZ) maps (Anbalagan 1992) can be used for substantially
reducing the risk from potential landslides by adopting appropriate management
measures. The disaster management agency of the state, the state administration and
the border roads organization can utilize the LHZ maps for disaster mitigation,
landslide management and also while planning for any new developmental activities.
The reliability of such landslide susceptibility maps will be controlled by the quality of
data and the methodology or the model used in preparing the maps (Ayalew and
Yamagishi 2005). The geographical information system (GIS) and remote sensing are
widely used in the database generation stage and modelling stage of landslide
susceptibility mapping (van Westen 1993).
The methods available for preparing LHZ maps can be divided into two
categories: qualitative and quantitative (Ayalew and Yamagishi 2005). Qualitative

*Corresponding author. Email: john_isro@yahoo.com



methods involve the process of ranking the area into different zones of hazard based
on expert opinions. Many researchers, like Keinholz (1978) and Rupke et al. (1988)
used geomorphological and qualitative methods for assessing landslide hazard. Gee
(1992) suggested the term 'blind weighting method' for the qualitative/semi-quantitative
approach in the LHZ process involving the assignment of weights to the
contributing factors and thereby arriving at a cumulative influence factor. Pachauri
and Pant (1992) demonstrated a weighted landslide hazard mapping procedure in
the Aglar catchment of Himalaya. Gupta et al. (1999) and Saha et al. (2002) used a
parameter-weighting method for landslide hazard zonation mapping in part of the
Bhagirathi valley of the Garhwal Himalaya. Another semi-quantitative method is
the analytic hierarchy process (AHP) of Saaty (1980) for deriving the weights for the
parameters and parameter classes, as demonstrated by NRSA (2001) for preparing
landslide hazard zonation maps in the Himalaya. Qualitative and semi-quantitative
methods give ample scope for incorporating expert knowledge in the hazard
zonation mapping process, which is essential to produce reliable LHZ maps. These
methods involve some degree of subjectivity. Subjectivity in itself is not a
reason for inferiority; it mainly implies poor reproducibility of the output in
repeated processes. An experienced geologist/geomorphologist can prepare a highly
reliable LHZ map even if the analysis is subjective, while an oversimplified objective
analysis may produce an unreliable LHZ map (van Westen 1993). In fact, the expert
knowledge is essential to identify the contributing factors in the first place, whether
it is for subjective analysis or for quantitative analysis. Qualitative methods are
usually applicable for regional studies (Guzzetti et al. 1999, Soeters and van Westen
1996).
Quantitative methods for landslide susceptibility mapping include statistical and
process based or geotechnical methods as well as the neural network approach
(Begueria and Lorente 2002). Statistical methods are based on the mathematical
relationship between observed landslides and the factors considered as controlling
parameters (Guzzetti et al. 1999). These are data-driven and their conclusions do not
imply a cause–effect relationship (Begueria and Lorente 2002). Statistical methods
reduce the subjectivity and ensure better reproducibility of the hazard zonation
processes and their outputs (van Westen 1993). If the controlling factors are
satisfactorily identified, statistical methods can successfully model the landslide
hazard in a given area. The model developed for one area may not exactly give the
same type of result in a different area. This is a drawback of statistical procedures in
landslide hazard zonation studies (Begueria and Lorente 2002). The statistical
methods involve both bivariate as well as multivariate techniques for LHZ mapping.
Several bivariate methods, like the information value method (Yin and Yan 1988),
and weights of evidence model (Panickar 2000) were used in landslide hazard
analysis. The commonly used multivariate statistical methods in landslide hazard
assessment are linear regression, discriminant analysis and logistic regression.
Carrara et al. (2003) used discriminant analysis to classify the slopes in Italy. Chung
et al. (1995) used multivariate statistical method for landslide susceptibility
mapping. Discriminant analysis and logistic regression are suitable when the
predicted variable is binary. This is the case for landslides, where the
dependent variable is the presence or absence of a landslide.
Logistic regression is better than discriminant analysis when the predictor
variables are categorical, continuous or a combination of both. It is also robust to
the violation of multinormality assumption (Johnson 1998, Begueria and Lorente
2002). This means that, if the distributions are not highly skewed and not
multimodal, then the estimates of the regression parameters are not adversely
influenced. Violation of multinormality could invalidate the results of parametric
tests, but usually only to the extent that the specification of the associated
probability level is not exact (Davis, 2006, personal communication). Logistic
regression requires fewer theoretical assumptions than discriminant analysis
(Ayalew and Yamagishi 2005). Moreover, logistic regression has similarity to linear
regression and is related through an appropriate link function. Just like ordinary
regression, logistic regression also has straightforward statistical tests and the ability
to incorporate non-linear effect and a wide variety of diagnostics (Lee 2005, Hair
et al. 1998).
The landslides in the Himalayan terrain have varying controlling factors. At some
localities, the lithology seems to be the main controlling factor, but at some other
locations the proximity to regional geological discontinuities plays a crucial role in
creating slope instability. Sometimes a combination of various factors becomes
responsible for landsliding. Thus it is very difficult to develop a generalized regional
model to successfully predict the landslide hazard across the Himalaya. This calls for
the development of models applicable to limited geographical extent, based on the
existing conditions of landslide phenomenon. For this reason and also because of
the advantages of logistic regression over other multivariate statistical procedures as
discussed earlier, in the present study a binary logistic regression analysis has been
used. Many researchers demonstrated the application of logistic regression analysis
for LHZ mapping, including Lee (2005), Ohlmacher and Davis (2003), Dai and Lee
(2002), Davis and Ohlmacher (2002), Dai et al. (2001) and Gorsevski et al. (2000).
Süzen (2002) used this method for evaluating slope stability in Asarsusu catchment
in Turkey.

2. Study area and data used


The present study area is a part of Garhwal Himalaya and falls in the catchment of
river Bhagirathi, which is a tributary of River Ganges. Bhagirathi River originates
from Gaumukh, which is a holy place, and the study area selected is along the
corridor of the road leading to Gaumukh. The importance of this route is that every
year thousands of pilgrims visit Gaumukh via this road and, because of severe
landsliding, it is often badly damaged and pilgrims are stranded for days. In
addition to this, human lives are also lost. Therefore, an appropriate model for
landslide hazard mapping in this sector is essential as the LHZ map can be used to
take necessary management measures to reduce the damage from future landslide
events. The study area is in the Uttarkashi district of Uttaranchal and covers an area
of about 100 km². The location map is given in figure 1.
The Indian Remote Sensing (IRS) 1D satellite data (LISS III and PAN) of 2
January 2000 was used in the study. The LISS III sensor has three VNIR bands at
23.5 m resolution and one SWIR band at 70.5 m resolution. The PAN data has a
spatial resolution of 5.8 m. These data are useful for mapping many of the input
parameters such as land use/land cover, geomorphology, geology and lineaments.
For the preparation of a landslide inventory map, the LISS III data was fused with
PAN data for better spatial resolution. In addition to these, the Survey Of India
(SOI) toposheets were also used. Several workers (Lakhera et al. 1992, Gupta et al.
1999 and NRSA 2001, to name a few) used a similar dataset for LHZ studies under
similar geological/topographical conditions. Fieldwork was carried out in 2000. The
mapping scale was kept at 1 : 50,000.

Figure 1. Location map of the study area.

3. Input parameters
The parameters that are influential to the slope stability and which were considered
in the present study are described below.
3.1 Geology
The geology of the Bhagirathi valley has been studied by many workers, including
Griesbach (1891), Auden (1949), Dhoundiyal and Ali (1967), Jain (1972) and
Agarwal and Gopendra Kumar (1973). The geology of the Lesser Himalaya has
been described by Rupke (1974) and Valdiya (1980). GSI (1989) gives a detailed
account of the geology and tectonics of the Himalaya. The study area falls partly
under the Garhwal Lesser Himalaya and partly under Higher Himalaya. The main
central thrust (MCT), which passes through the study area, separates the two. In the
Uttarkashi district, four stratigraphic units have been recognized by Agarwal and
Gopendra Kumar (1973): the Central Crystallines, Martoli Formation, Dudatoli
Group and Garhwal Group. The main lithological units in the study area are
quartzites, amphibolites, epidiorites, granitic gneisses, migmatites and schists. In
addition to these, minor amounts of metavolcanic rocks are also present. The
quartzites vary in colour from white/buff to purple and are medium to coarse
grained. Graded bedding and jointing have been shown by these rocks at many
places. The amphibolites are observed immediately after MCT. Further north, mica
schists have been found along the road section. Following the geological map of
Agarwal and Gopendra Kumar (1973), a lithological map has been prepared
(figure 2).
During the fieldwork it was observed that landslides are common in the area
dominated by gneisses, migmatites and schists. The stability of slopes
is governed by the type of rock and the structures present. Discontinuities like
bedding planes, faults and fractures reduce the strength of rock masses. In the
present case, the presence of weak planes in gneisses and schists makes the rocks
weak, thereby creating slope instability. The amphibolite and epidiorite areas also
have landslides but they are less frequent. In these areas the high amount of
weathering and the presence of clay-rich soils seem to be responsible for making the
slopes unstable. Very few slides are found in the quartzitic areas and they are
associated with the presence of joints and local intercalations of quartzites with
slate/chlorite schist.

3.2 Geomorphology
The study area has highly uneven topography with the Bhagirathi River forming the
major valley. The Bhagirathi River flows through the study area predominantly in the
ENE–WSW direction. It has a constricted V-shaped valley together with high
run-off and steep gradient, indicating the youthful geomorphological nature of the
region (Gupta et al. 1999). The elevation varies from about 1200 to 3000 m.
The selected study area has more or less uniform morphology, dominated by hills
and valleys. As the valley side slopes are generally steep and as they fall directly into
the Bhagirathi River, no piedmont zone is observed in the study area. The only
criterion by which further classification can be attempted is dissection. The joints,
fractures, drainage system and discontinuity planes dissect the area. As dissection
increases, the chances of weathering and soil formation processes also increase,
thereby increasing the instability of the slopes. Based on the intensity of dissection,
the area is classified into denudational hills of high dissection, moderate dissection
and low dissection (NRSA, 2001). As this is an active tectonic terrain undergoing
continuous uplift, former levels of the River Bhagirathi occur as
terraces and are mapped at many places in the study area.
Figure 2. Lithological map of the study area (prepared after Agarwal and Gopendra
Kumar, 1973).

Landslides observed in the area are distributed across the highly dissected and
moderately dissected denudational hill units. The geomorphological map (figure 3)
has been prepared with the help of the false colour composite (FCC) of satellite data
and is considered in the present study as a predictor variable.

3.3 Lineaments
The traces of faults/fractures which appear as linear to curvilinear features on the
satellite image are considered as important lineaments with respect to landslides.
Figure 3. Geomorphological map of the area.

These are very important in slope stability studies, as faults and fractures tend to
destabilize the area through deterioration of the strength of the rocks and also by
accelerating the weathering process. The surface trace of the MCT is the most
important lineament in the study area. All the geological lineaments have been
mapped from the satellite data with the help of appropriate digital image processing
techniques. The LISS III bands have been decorrelation-stretched (PCI 1996) and an
FCC has been prepared by combining edge-enhanced PAN (in red) with hue and
saturation components from the IHS transformation (Jensen 1996) of the three LISS
III-VNIR bands (in green and blue; Philip et al. 2003) and the lineaments have been
visually interpreted on this FCC. These lineaments are classified as major (if the
length is greater than 2 km) and minor (if the length is less than 2 km). Buffering has been
done for the lineaments with a 300 m buffer distance for major lineaments and 100 m
for minor lineaments. This distance is based on the assessment of proximity of
existing landslides from the lineaments in the study area. By and large, it has been
observed that the presence of significant lineaments is one of the most important
factors governing the stability of slopes in the study area. Thus the lineament buffer
is taken as one of the independent variables.
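
The buffer rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each lineament is a single straight segment given by two end points in metres; the function names are hypothetical, and the treatment of a lineament of exactly 2 km (taken here as minor) is an assumption, since the text does not specify it.

```python
import math

MAJOR_LENGTH_M = 2000.0                       # major/minor threshold from the text (2 km)
BUFFER_M = {"major": 300.0, "minor": 100.0}   # buffer distances from the text


def segment_length(p, q):
    """Length of the straight segment p-q (coordinates in metres)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def point_segment_distance(pt, p, q):
    """Shortest distance from point pt to the segment p-q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(pt[0] - p[0], pt[1] - p[1])
    # Project pt onto the line and clamp the projection to the segment.
    t = ((pt[0] - p[0]) * dx + (pt[1] - p[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(pt[0] - (p[0] + t * dx), pt[1] - (p[1] + t * dy))


def within_lineament_buffer(cell_centre, lineament):
    """True if the cell centre lies inside the buffer of the lineament."""
    p, q = lineament
    # Assumption: a lineament of exactly 2 km is treated as minor.
    kind = "major" if segment_length(p, q) > MAJOR_LENGTH_M else "minor"
    return point_segment_distance(cell_centre, p, q) <= BUFFER_M[kind]


major = ((0.0, 0.0), (3000.0, 0.0))   # 3 km long, so 300 m buffer
minor = ((0.0, 0.0), (1000.0, 0.0))   # 1 km long, so 100 m buffer
print(within_lineament_buffer((1500.0, 250.0), major))
print(within_lineament_buffer((500.0, 150.0), minor))
```

In a GIS the same result would normally come from a vector buffer or a raster distance operation; the sketch only makes the length-dependent buffer distance explicit.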

3.4 Drainage
The presence of streams adversely affects slope stability by toe erosion or by saturating
the slope material or by both (Gokceoglu and Aksoy 1996). In order to account for
these, the drainage density (Süzen and Doyuran 2004, Panickar 2000) was
considered as an influencing parameter in the present study.
The drainage shows sub-parallel to sub-dendritic pattern. The drainage map has
been prepared from the toposheet and updated with the help of the merged LISS III
and PAN satellite data. Using a 25 × 25 m grid, the drainage density values have
been calculated and are found to vary between zero and 95 km per km². These values
are classified into six classes (<5, 5–15, 15–25, 25–35, 35–50, >50) to obtain a
classified drainage density map.
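
As a rough illustration of the density calculation and the six classes, the following sketch converts a stream length within one grid cell to km per km² and assigns the class. The function names are hypothetical, and the handling of values that fall exactly on a class boundary (assigned here to the upper class) is an assumption, because the text gives only the class ranges.

```python
from bisect import bisect_right

CELL_SIZE_M = 25.0
CLASS_BREAKS = [5, 15, 25, 35, 50]                     # km per km2, from the text
CLASS_LABELS = ["<5", "5-15", "15-25", "25-35", "35-50", ">50"]


def drainage_density(stream_length_m, area_m2=CELL_SIZE_M ** 2):
    """Drainage density in km per km2 for a stream length inside an area.

    (length / 1000) km divided by (area / 1e6) km2 equals length * 1000 / area.
    """
    return stream_length_m * 1000.0 / area_m2


def density_class(density):
    """Class label for a density value (boundary values go to the upper class)."""
    return CLASS_LABELS[bisect_right(CLASS_BREAKS, density)]


# A single stream crossing one 25 x 25 m cell edge to edge (25 m of channel).
d = drainage_density(25.0)
print(d, density_class(d))
```

A 25 m channel in a 625 m² cell gives 40 km per km², which shows why values of several tens of km per km² arise at this cell size.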

3.5 Land use/land cover


The presence of vegetation usually makes the slope stable by better bonding of the
slope material. Thus slopes with dense vegetation are expected to be less prone to the
occurrence of shallow landslides than barren slopes, all other factors being constant.
Improper land use practices may also contribute to slope destabilization. In order to
assess the contribution of different land use/land cover classes to slope destabilization,
this factor has also been taken into account.
The land use/land cover map of the area has been prepared with the help of IRS
LISS III data. The cell size is kept at 25 × 25 m. Supervised classification using the
maximum likelihood estimation method has been used for classifying the LISS III
data with appropriate training sets for different classes. Post-classification
contextual refinement has been applied on the classified data to get the final land
use/land cover map. Dense forest, open forest, scrub, barren land, agriculture and
snow-covered areas are the land use/land cover classes in the study area.

3.6 DEM and derivatives


Topographic parameters like slope and slope aspect play a crucial role in governing
the stability of terrain in the Himalayan region. As the slope increases, the chance of
failure also increases. At many places in the study area, slopes in the range of 35–60°
are found to be prone to slope failure. This may be because colluvial accumulations
are found around slopes of 35°. Slope aspect is another factor influencing the
stability. In the Himalaya, it has generally been found that the southern slopes are
drier and are devoid of much vegetative cover, whereas the northern slopes
are moister and support better growth of vegetation. So, in addition to slope, slope
aspect has also been considered as an independent variable in building the model.
A digital elevation model with 25 × 25 m grid size has been created using the
elevation contours. The elevation in the area varies from 1200 to 3000 m. Using the
DEM as input, the slope and aspect have been derived. The slope values vary from 1
to 70°. These have been classified into five classes (<5, 5–15, 15–35, 35–60 and
>60°) to obtain a classified slope map and the aspect values have also been grouped
into eight classes (N, NE, E, SE, S, SW, W and NW) to create a classified aspect
map.
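
The derivation and classification of slope and aspect can be sketched as follows. The text does not state which algorithm the GIS used, so the central-difference slope estimate, the 45° aspect sectors centred on the compass directions, and the assignment of boundary values to the upper class are all assumptions made for illustration; the function names are hypothetical.

```python
import math
from bisect import bisect_right

CELL_SIZE_M = 25.0
SLOPE_BREAKS = [5, 15, 35, 60]                       # degrees, from the text
SLOPE_LABELS = ["<5", "5-15", "15-35", "35-60", ">60"]
ASPECT_LABELS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def slope_degrees(dem, row, col, cell=CELL_SIZE_M):
    """Slope at an interior DEM cell from central differences (an assumption)."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def slope_class(slope_deg):
    """Slope class label (boundary values go to the upper class)."""
    return SLOPE_LABELS[bisect_right(SLOPE_BREAKS, slope_deg)]


def aspect_class(aspect_deg):
    """Compass class for an aspect in degrees clockwise from north.

    Each class is a 45 degree sector centred on its direction, so N covers
    337.5 to 22.5 degrees.
    """
    return ASPECT_LABELS[int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8]


# Toy 3 x 3 DEM (elevations in metres) rising 25 m per 25 m cell eastwards.
dem = [[1200.0, 1225.0, 1250.0]] * 3
s = slope_degrees(dem, 1, 1)
print(round(s, 1), slope_class(s), aspect_class(200.0))
```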

3.7 Landslide location map


The landslides of the study area have been mapped using PAN and LISS III merged
data and SOI toposheet and have been verified in the field. It was difficult to identify
all the old slides directly on the satellite image, but the locations were confirmed
through interaction with local residents during field checks. Twenty-five slides were
mapped, varying in area from 0.08 to 15 ha. Most of the slides are close to the River
Bhagirathi. A list of all the variable classes is given in table 1.

4. Methodology
In the present study, a binary logistic regression model has been used for preparing
the LHZ map. Logistic regression is a mathematical modelling approach that can be
used to describe the relationship of several independent variables to a dichotomous
dependent variable (Kleinbaum 1994), such as landslide (S). The logistic model is
based on the logistic function f(z), and is given as:
$$f(z) = \frac{1}{1 + e^{-z}} \qquad (1)$$
where z varies from −∞ to +∞. The range of f(z) is between 0 and 1, regardless of
the value of z. Because of this property of the logistic function, the logistic model
can be used to estimate the probability of any particular grid cell
undergoing slope failure.
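
The bounded behaviour of the logistic function in equation (1) can be checked numerically with a short sketch. The function name is hypothetical, and the two-branch form is only a numerical convenience to avoid overflow for large negative z; it is algebraically identical to equation (1).

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # safe: z < 0, so exp(z) < 1
    return ez / (1.0 + ez)


for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, logistic(z))
```

Whatever the value of z, the result stays between 0 and 1, with f(0) = 0.5 and f(z) + f(−z) = 1.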
The logistic model can be developed from the logistic function by representing z
as a linear sum of some constant and the products of independent variables and their
corresponding coefficients, i.e. $z = a + \sum b_i X_i$, where $a$ and $b_i$ are constants
representing unknown parameters. In other words, z is an index that combines
the independent variables ($X_i$). Thus
$$f(z) = \frac{1}{1 + e^{-(a + \sum b_i X_i)}} \qquad (2)$$

The probability that any cell will be susceptible to landslide (S), given the presence
of the independent variables, can be represented as the conditional probability,
$$P(S = 1 \mid X_1, \ldots, X_n) = \frac{1}{1 + e^{-(a + \sum b_i X_i)}} \qquad (3)$$

where i varies from 1 to n, for n independent variables.


By fitting this model to our observations of $X_i$ for a group of cells for which we
know the status of S also as 0 (landslide absent) or 1 (landslide present), we can
estimate the values of $a$ and $b_i$. Using these estimates, we can calculate the
probability of any cell undergoing slope failure for observed values of $X_i$.
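
The estimation step can be illustrated with a minimal sketch for a single 0/1 predictor. The data are an invented toy sample, not values from the study, and plain gradient ascent on the log-likelihood is used here only for transparency; it is an assumption and not necessarily the numerical procedure used by SPSS. For one binary predictor the maximum likelihood estimates have a closed form (a is the log-odds of sliding when X = 0, and a + b the log-odds when X = 1), which gives a check on the result.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, rate=0.5, steps=5000):
    """Estimate a and b in P(S=1|X) = logistic(a + b*X) by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = y - logistic(a + b * x)   # observed minus predicted
            grad_a += residual
            grad_b += residual * x
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b


# Invented toy sample: 10 cells with X = 0 (2 slid) and 10 cells with X = 1 (7 slid).
xs = [0] * 10 + [1] * 10
ys = [1] * 2 + [0] * 8 + [1] * 7 + [0] * 3

a_hat, b_hat = fit_logistic(xs, ys)
print(round(a_hat, 3), round(b_hat, 3))
print(round(logistic(a_hat), 3), round(logistic(a_hat + b_hat), 3))
```

The fitted probabilities reproduce the observed proportions of slid cells in each group (0.2 and 0.7), which is what maximum likelihood estimation implies for a single binary predictor.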
Table 1. List of predictor variables and their categories.

Variable               Class
Lithology              Quartzite
                       Amphibolite
                       Epidiorite
                       Metavolcanic
                       Granitic gneisses/Migmatites/Schists
Geomorphology          Denudational Hill (High Dissection)
                       Denudational Hill (Moderate Dissection)
                       Denudational Hill (Low Dissection)
                       Terrace
Land use/Land cover    Scrub
                       Agriculture
                       Open Forest
                       Dense Forest
                       Barren land
                       Snow
Lineament              Within Lineament buffer
                       Outside Lineament buffer
Slope                  Slope 0 to 5 degrees
                       Slope 5 to 15 degrees
                       Slope 15 to 35 degrees
                       Slope 35 to 60 degrees
                       Slope >60 degrees
Slope aspect           Aspect North
                       Aspect Northeast
                       Aspect East
                       Aspect Southeast
                       Aspect South
                       Aspect Southwest
                       Aspect West
                       Aspect Northwest
Drainage density       <5 km/km²
                       5–15 km/km²
                       15–25 km/km²
                       25–35 km/km²
                       35–50 km/km²
                       >50 km/km²

The methodology proposed by Süzen (2002) has been used in the present study. It
is assumed that the predictor variables are mutually independent. All the input
parameters were taken in raster format with 25 × 25 m grid size. There are 1280 cells
with landslide occurrence in the study area. This accounts for 0.8% of the total study
area. It is difficult to determine the optimum area percentage of the total landslide
samples for building the model. If the model predicts the majority of the observed
landslide pixels correctly, then it will be a successful model. The percentage of the
area under landslides in the study area, although not very large, is sufficient to build
the model. Had there been more slide cells, part of the sample could have been used
for building the model and part could have been reserved for validation. This
freedom is not there in the present study area due to the limited number of
landslides. The landslide pixels, together with a randomly selected set of 1280 cells
from the non-slide area, and their values for the independent variables and
dependent variable have been put together to create the input table for logistic
regression modelling. By using the statistical analysis software SPSS, the model
coefficients were estimated, which were then used to calculate the probability of
hazard for all the pixels in the study area, within a geographic information system.
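
The construction of the input table (all slide cells plus an equal number of randomly selected non-slide cells) can be sketched as below. The cell identifiers are hypothetical, and the total of 160 000 cells is not stated in the text; it is inferred here from 1280 slide cells being 0.8% of the area. The fixed seed is only there to make the sketch repeatable.

```python
import random


def balanced_sample(slide_cells, non_slide_cells, seed=0):
    """Return (cell, label) rows: every slide cell and as many random non-slide cells."""
    rng = random.Random(seed)
    controls = rng.sample(non_slide_cells, k=len(slide_cells))
    rows = [(cell, 1) for cell in slide_cells]       # landslide present
    rows += [(cell, 0) for cell in controls]         # landslide absent
    return rows


# Hypothetical cell ids: 1280 slide cells out of an assumed 160 000 cells in total.
slide_cells = list(range(1280))
non_slide_cells = list(range(1280, 160000))

table = balanced_sample(slide_cells, non_slide_cells)
print(len(table), sum(label for _, label in table))
```

In practice each row would also carry the values of the predictor categories for that cell before being passed to the statistical software.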
The binary logistic regression procedure offers several methods for stepwise
selection of the best predictors to include in the model. In the present study a
forward stepwise analysis has been done. This method starts with a model that does
not include any predictor variables. At each step, the predictor with the largest score
statistic whose significance value is less than the specified value (0.05, for the 95%
confidence level) is added to the model. The variables left out of the analysis at
the last step all have significance values larger than 0.05, so no more are added.

5. Results
The model building process started with 36 predictor variable categories (table 1)
from the seven variables (lithology, geomorphology, drainage density, slope, slope
aspect, land use/land cover and lineament buffer) discussed in one of the earlier
sections. The final model retained only 11 of these 36 categories (table 2). All these
retained categories have estimated coefficients (b) statistically different from 0.
The null hypothesis used to test this is that the coefficient is 0. The statistical test
used the Wald chi-square value [(b/standard error)²] and the 95% confidence
level for the corresponding degrees of freedom (d.f.). So the variables with
estimated coefficients having a significance value (Sig.) of less than 0.05 were found
to be significantly different from zero or, in other words, these have been accepted as
influential predictor variables.
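
The Wald test described above can be reproduced approximately from the printed coefficients and standard errors. The sketch below uses the fact that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(W/2)); the function name is hypothetical. Small differences from the tabulated Wald values are to be expected, presumably because table 2 prints b and S.E. rounded to three decimals.

```python
import math


def wald_test(b, se):
    """Wald chi-square (1 d.f.) and its significance for a coefficient b."""
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))   # P(chi-square with 1 d.f. > wald)
    return wald, p_value


# Two rows of table 2: slope 35-60 degrees, and denudational hill (moderate dissection).
for name, b, se in [("slope 35-60", 0.377, 0.121),
                    ("moderate dissection", -0.275, 0.126)]:
    wald, p = wald_test(b, se)
    print(name, round(wald, 2), round(p, 3))
```

Both significance values fall below 0.05 (about 0.002 and 0.029), in agreement with the Sig. column of table 2.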
Using these estimated coefficients, the landslide hazard values have been
calculated for the entire study area within a GIS and are in the range between 0
and 0.99. As discussed earlier, these values can be considered as the probabilities of
the grids undergoing slope failure, given the presence of the independent variables
considered in the model. The classification summary is given in table 3. The
predicted probability values and the frequency of the observed actual state of the pixels
used in the model are given in figure 4. In this figure, each symbol (1 for the presence of
landslide or 0 for the absence of landslide) represents 12.5 cases and it can be seen
that, by taking a cut-off probability of 0.5, the cells with values greater than the
cut-off value (0.5) are dominated by the presence of the binary dependent variable,
i.e. landslide (represented by 1). There are equal numbers of cells with and without
landslides at the cut-off probability, so if we take a cut-off probability of 0.5, then
those pixels with a value greater than this will have a high probability of slope
failure. The predicted probability values have then been grouped into four hazard
classes. The ranges of values for the hazard classes are 0–0.3 for low hazard,
0.3–0.5 for medium hazard, 0.5–0.8 for high hazard and >0.8 for very high hazard.
The low hazard class constitutes the maximum area (51.75% of the total area). The
medium hazard class covers 23.74% of the study area, the high hazard class occupies
10.66% and the very high hazard class accounts for 13.85% of the total study area. The
final landslide hazard zonation map is given in figure 5. The locations of the
centroids of the existing landslides are also shown on this map.

Table 2. Variables retained in the logistic regression model and their coefficients (SPSS
output).

Parameter                                   b        S.E.    Wald      df   Sig.    Exp(b)
Slope 35° to 60°                            0.377    0.121     9.746   1    0.002    1.458
Aspect South                                1.024    0.149    47.382   1    0.000    2.784
Aspect Northwest                            1.132    0.171    43.855   1    0.000    3.102
Aspect East                                −0.676    0.183    13.685   1    0.000    0.509
Amphibolite                                −2.438    0.503    23.502   1    0.000    0.087
Gneisses/Migmatites/Schists                 0.715    0.133    28.876   1    0.000    2.044
Denudational Hill (high dissection)        −1.709    0.256    44.547   1    0.000    0.181
Denudational Hill (moderate dissection)    −0.275    0.126     4.751   1    0.029    0.760
Open Forest                                −1.491    0.121   151.860   1    0.000    0.225
Dense Forest                               −2.582    0.218   140.834   1    0.000    0.076
Within Lineament buffer                     1.891    0.123   237.570   1    0.000    6.626
a                                           2.857    0.481    35.218   1    0.000   17.410

S.E., standard error of estimate; Wald, Wald chi-square value; df, degrees of freedom; Sig., significance.

Table 3. Classification summary of the logistic regression model.

                        Predicted
Observed                Landslide absent   Landslide present   Percentage correct
Landslide absent        1043               237                 81.5
Landslide present       258                1022                79.8
Overall percentage                                             80.7

The cut-off value is 0.500.

Figure 4. The predicted probabilities of observed groups of pixels.
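
How the estimated coefficients translate into a hazard class for one cell can be sketched as follows. The sketch assumes that each retained category enters the model as a 0/1 indicator variable, reads the coefficients of table 2 with the signs implied by their Exp(b) values (a coefficient with Exp(b) below 1 is negative), and assigns boundary probabilities to the lower class; these are assumptions, and the dictionary keys and the example cell are hypothetical.

```python
import math

INTERCEPT = 2.857   # the constant a in table 2
COEFFICIENTS = {    # b values from table 2
    "slope_35_60": 0.377,
    "aspect_south": 1.024,
    "aspect_northwest": 1.132,
    "aspect_east": -0.676,
    "amphibolite": -2.438,
    "gneiss_migmatite_schist": 0.715,
    "denudational_hill_high": -1.709,
    "denudational_hill_moderate": -0.275,
    "open_forest": -1.491,
    "dense_forest": -2.582,
    "within_lineament_buffer": 1.891,
}


def landslide_probability(active_categories):
    """Equation (3) with X_i = 1 for the listed categories and 0 otherwise."""
    z = INTERCEPT + sum(COEFFICIENTS[name] for name in active_categories)
    return 1.0 / (1.0 + math.exp(-z))


def hazard_class(p):
    """Hazard class from the probability ranges given in the text."""
    if p > 0.8:
        return "very high"
    if p > 0.5:
        return "high"
    if p > 0.3:
        return "medium"
    return "low"


# Hypothetical cell: open forest on a highly dissected denudational hill,
# with none of the other retained categories present.
cell = ["open_forest", "denudational_hill_high"]
p = landslide_probability(cell)
print(round(p, 3), hazard_class(p))
```

For this hypothetical cell z = 2.857 − 1.491 − 1.709 = −0.343, which corresponds to a probability of about 0.415 and hence the medium hazard class.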

6. Validation of the model


The success of the developed model can be evaluated with respect to the occurrence
of future landslides in the area or could be applied in an adjacent area of similar
geological/topographical conditions to find out its reliability. At the same time it is
possible to assess the validity of the model by comparing the calculated probability
values for different cells and their actual present condition. This is achieved by the
help of receiver operator characteristic (ROC) curve analysis (Zweig and Campbell
1993). The ROC curve is the plot of the probability of true positive identified
landslides vs the probability of false positive identified landslides, as the cut-off
probability varies (Gorsevski et al. 2000). Equivalently, it is a representation of the
trade-off between sensitivity (Sn) and specificity (Sp). Sensitivity is the probability
that a slid cell is correctly classified, and is plotted on the y-axis in an ROC curve;
1 – sensitivity is the false negative rate. Specificity is the probability that a non-slid
cell is correctly classified; 1 – specificity is the false positive rate and is taken along
the x-axis of the curve. The area under the curve represents the probability that the
model-calculated landslide susceptibility value for a randomly chosen slid cell will
exceed the result for a randomly chosen non-slid cell. Thus, the area under the ROC
curve can be used as a measure of the accuracy of the model (Hanley and McNeil
1982).
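
The probabilistic interpretation of the area under the ROC curve can be made concrete with a small sketch that compares every slid cell with every non-slid cell. The scores are an invented toy example, not values from the study; the function name is hypothetical, and counting ties as one half is the usual convention rather than something stated in the text.

```python
def auc_by_pairs(slid_scores, non_slid_scores):
    """Fraction of (slid, non-slid) pairs in which the slid cell scores higher."""
    wins = 0.0
    for s in slid_scores:
        for n in non_slid_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5          # ties count as half a win
    return wins / (len(slid_scores) * len(non_slid_scores))


# Invented susceptibility values for three slid and two non-slid cells.
slid = [0.9, 0.6, 0.4]
non_slid = [0.7, 0.3]
print(auc_by_pairs(slid, non_slid))
```

Here four of the six pairs are ordered correctly, giving an area of about 0.67; a model that always ranks slid cells above non-slid cells would give 1, and random scores about 0.5.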
The ROC curve analysis was done using the same dataset, which was used for
developing the logistic regression model, together with their calculated probability
values, to assess the number of true positives and false positives with varying cut-off
probability values. The ROC curve for the model developed is given in figure 6. The
area under the curve is 0.887 (table 4), which gives an accuracy of 88.7% for the
model developed using logistic regression. The asymptotic significance is less than
0.05, which means that using the model to predict the landslide is better than
guessing. By analysing the coordinates of the ROC curve and the corresponding
probability, it can be seen that, for a cut-off probability of 0.5, the sensitivity is 0.798
and 1 – specificity is 0.185. In other words, at this cut-off probability, 79.8% of the
landslides will be correctly identified and 18.5% of the non-landslide area will be
incorrectly classified as landslide area. This is also evident from the classification
summary table (table 3).
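Reading a single point off the ROC curve amounts to fixing one cut-off and counting the four classification outcomes. The sketch below shows this for a cut-off of 0.5. The ten cells and their probabilities are hypothetical and do not reproduce table 3, and treating a probability equal to the cut-off as a predicted landslide is an assumption.

```python
def rates_at_cutoff(labels, scores, cutoff=0.5):
    """Return (sensitivity, false positive rate) for one cut-off probability."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted_slide = s >= cutoff  # assumption: ties count as landslide
        if y == 1 and predicted_slide:
            tp += 1      # slid cell correctly identified
        elif y == 1:
            fn += 1      # slid cell missed
        elif predicted_slide:
            fp += 1      # non-slid cell wrongly flagged
        else:
            tn += 1      # non-slid cell correctly identified
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity, false_positive_rate


# Hypothetical cells: 1 = slid, 0 = non-slid, with made-up model probabilities.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.55, 0.4, 0.6, 0.3, 0.2, 0.45, 0.1]

sensitivity, false_positive_rate = rates_at_cutoff(labels, scores, cutoff=0.5)
print(sensitivity, false_positive_rate)
```

Repeating the call for a range of cut-off values and plotting the resulting pairs gives the full curve.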

7. Discussion and conclusion


Figure 5. Landslide hazard zonation map.

The present study is an attempt to use a logistic regression model to assess the
landslide susceptibility of part of the Himalaya, which is highly prone to the
occurrence of landslides. A remote sensing- and GIS-based approach for data
preparation and integration has helped to reduce the time involved in the
preparation of input themes and has also enabled the use of a complex model for
landslide susceptibility calculation.
Figure 6. Receiver operator characteristic curve.

The meaning of a logistic regression coefficient is not as straightforward as that
of a linear regression coefficient. It is easier to interpret the exponential values of the
coefficients [exp(bi)]. Each represents the factor by which the odds of the event (S) change for
a unit change in the value of the respective predictor variable, all other things being
equal. For example, the estimated coefficient of the parameter slope 35–60° is 0.377
and its exp value is 1.458. So if there is a cell with 0.5 probability of slide which is not
in the 35–60° slope category, then the corresponding odds of the slide are 1 [O(S) = p/
(1 − p)] for that cell. Since the odds ratio for the 35–60° slope category is 1.458, the
odds of 1 become 1.458 if the cell falls into this slope category. The probability of
the slide then becomes 0.593 [p = O(S)/(1 + O(S))], which is roughly 19% higher than
the initial probability. Other coefficients can be interpreted in the same way. This
change is larger for initial probabilities close to 0.5 and smaller for probabilities
towards 1 or 0.
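The arithmetic of this example can be reproduced in a few lines. The sketch below converts a probability to odds, multiplies by exp of the coefficient, and converts back. The function name is hypothetical; the coefficient 0.377 and the starting probability 0.5 are taken from the text.

```python
import math


def shifted_probability(p, coefficient):
    """Probability after the odds p/(1 - p) are multiplied by exp(coefficient)."""
    odds = p / (1.0 - p)
    new_odds = odds * math.exp(coefficient)
    return new_odds / (1.0 + new_odds)


slope_coefficient = 0.377  # coefficient of the 35-60 degree slope category

print(round(math.exp(slope_coefficient), 3))  # odds ratio, about 1.458

# Change in probability for several starting probabilities.
for p in (0.1, 0.5, 0.9):
    new_p = shifted_probability(p, slope_coefficient)
    print(p, round(new_p, 3), round(new_p - p, 3))
```

For a starting probability of 0.5 the result is about 0.593, and the increase is smaller for the starting values 0.1 and 0.9, in line with the remark that the effect weakens towards 0 and 1.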
By applying the logic mentioned above, the predictor variables retained by the
model and their coefficients can be seen to be well in agreement with what has been
observed in the field. The areas occupied by Higher Himalayan Crystallines
(gneisses/migmatites/schists) are the most susceptible areas to landslide (figures 2
and 5). In fact, this can be clearly seen as one traverses along the road to Gaumukh;
after the MCT, there are widespread occurrences of landslides in these lithologies.
The amphibolite-covered area has a smaller influence in creating slope instability.
Proximity to geologically important lineaments turns out to be the most critical
parameter in creating slope instability, as is evident from the exp(b) value of this class.
The presence of lineaments greatly reduces the strength of rock mass units and they
contribute positively to making the slopes unstable. It is observed widely in the
Himalaya that, close to regional geological lineaments, the landsliding phenomenon
is very severe.

Table 4. Area under the ROC curve.

                                              Asymptotic 95% confidence interval
Area     Std. error(a)   Asymptotic sig.(b)   Lower bound   Upper bound
0.887    0.006           0.000                0.874         0.899

The test result variable, predicted probability, has at least one tie between the positive actual
state group and the negative actual state group. Statistics may be biased.
(a) Under the nonparametric assumption.
(b) Null hypothesis: true area = 0.5.

Amongst the topographic attributes, the 35–60° slope category and the slope
aspects of south, northwest and east are found to be significant contributors with
respect to landslides. The 35–60° slope category is the most unstable one in the study
area. This can be explained by the fact that most of the colluvial accumulations are
found around 35° slope angle. Slopes gentler than 30° are found to be stable in the
area. It has also been observed in the field that steep slopes (>60°) stand out with
rock exposures and are mostly stable provided they are devoid of any geological
discontinuities. The southern, northwestern and eastern aspects receive maximum
sunlight in the area because of the predominant NEE–SWW orientation of the main
valley and consequent orientations of tributary valleys, for the major portion of the
study area. This creates drier slopes in these aspects, which as a result have less
vegetation growth. Thus these aspects tend to be unstable.
Considering the various land use/land cover categories, the open forest area is
more susceptible to landsliding as compared with dense forest areas. The next factor
retained in the model is geomorphology with the highly dissected and moderately
dissected denudational hill units contributing towards slope instability (figures 3 and
5). The highly dissected denudational hills are mostly in the quartzitic areas that do
not support much landsliding and hence in the present model this unit shows a
smaller contribution than moderately dissected denudational hills.
Thus the factors retained by the model satisfactorily explain the slope stability
condition in the study area. The validation of the model by ROC method shows an
overall accuracy of 88.7%, indicating that the model with its retained independent
variables is successful in predicting the probability of landslide hazard for the study
area based on the sample points collected.
Although a statistical procedure reduces the subjectivity in the analysis, there is a
certain amount of subjectivity involved in the decision of class boundaries of many
input variables and this is reflected in the final hazard map. Also, instead of using
many categorical data, it would be more appropriate to use continuous variables
(Süzen 2005, personal communication). Moreover, the use of a variable to account
for the influence of human interference may further refine the model. Logistic
regression is a successful method for estimating the probability of occurrence of a
dichotomous dependent variable like landslide, even when many of the independent
variables are categorical, in a landslide-prone area like the Himalaya. Planners,
administrators and managers can utilize the outcome of such LHZ studies.

Acknowledgements
J.M. and V.K.J. thank Dr V. Jayaraman, Director, NNRMS/EOS, Department of
Space, for his support and encouragement. J.M. thankfully acknowledges the
suggestions and help extended by Professor John C. Davis, Department of
Petroleum Engineering, Leoben, Austria and Professor M. L. Süzen, METU,
Turkey, in carrying out logistic regression analysis. J.M. extends his thanks to Dr G.
Philip, Wadia Institute of Himalayan Geology, Dehradun, for his guidance during
the fieldwork. Mr Neeraj Kumar Sharma is thanked for suggesting some time-
saving approaches in preparing the database for statistical analysis. The authors
thank the anonymous reviewers for their critical evaluation and constructive
suggestions in improving the document.

References
AGARWAL, N.C. and KUMAR, GOPENDRA, 1973, Geology of the Upper Bhagirathi and
Yamuna Valleys, Uttarkashi District, Kumaun Himalaya. In
Himalayan Geology, A.G. Jhingran and K.S. Valdiya (Eds), 3, pp. 1–23 (Delhi:
Hindustan Publishing, 1973).
ANBALAGAN, R., 1992, Landslide hazard evaluation and zonation mapping in mountainous
terrain. Engineering Geology, 32, pp. 269–277.
AUDEN, J.B., 1949, Record of the Geological Survey of India, No. 78 (Calcutta: GSI).
AYALEW, L. and YAMAGISHI, H., 2005, The Application of GIS-based logistic regression for
landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan.
Geomorphology, 65, pp. 15–31.
BEGUERIA, S. and LORENTE, A., 2002, Landslide hazard mapping by multivariate statistics:
comparison of methods and case study in the Spanish Pyrenees. Technical report,
Instituto Pirenaico de Ecologia, Zaragoza, Spain.
CARRARA, A., CROSTA, G. and FRATTINI, P., 2003, Geomorphological and historical data in
assessing landslide hazard. Earth Surface Processes and Landforms, 28, pp.
1125–1142.
CHUNG, C.F., FABBRI, A.G. and VAN WESTEN, C.J., 1995, Multivariate regression analysis for
landslide hazard zonation. In Geographical Information Systems in Assessing Natural
hazard, A. Carrara and F. Guzzetti (Eds), pp. 107–134 (Dordrecht: Kluwer Academic,
1995).
DAI, F.C. and LEE, C.F., 2002, Landslide characteristics and slope instability modelling using
GIS, Lantau Island, Hong Kong. Geomorphology, 42, pp. 213–238.
DAI, F.C., LEE, C.F., LI, J. and XU, Z.W., 2001, Assessment of landslide susceptibility on the
natural terrain of Lantau Island, Hong Kong. Environmental Geology, 40, pp. 381–391.
DAVIS, J.C. and OHLMACHER, G.C., 2002, Landslide hazard prediction using generalized
logistic regression. In Proceedings of IAMG 2002, Berlin, 15–20 September 2002.
DHONDIYAL, D.P. and ALI, K.N., 1967, Record of the Geological Survey of India, No. 99
(Calcutta: GSI).
GEE, M.D., 1992, Classification of landslide hazard zonation methods and a test of predictive
capability. In Proceedings of the 6th International Symposium on Landslides, Vol. 2,
Christchurch, pp. 947–952.
GOKCEOGLU, C. and AKSOY, H., 1996, Landslide susceptibility mapping of the slopes in the
residual soils of the Mengen region (Turkey) by deterministic stability analyses and
image process techniques. Engineering Geology, 44, pp. 147–161.
GORSEVSKI, P.V., GESSLER, P. and FOLTZ, R.B., 2000, Spatial prediction of landslide hazard
using logistic regression and GIS. In Proceedings of 4th International Conference on
Integrating GIS and Environmental Modelling: Problems, Prospects and Research
Needs, Banff, Alberta, 2–8 September 2000.
GRIESBACH, C.L., 1891, Memoirs of the Geological Survey of India, No. 23 (Calcutta: GSI).
GSI 1989, Geology and Tectonics of the Himalaya, GSI Special Publication no. 26 (Calcutta: GSI).
GUPTA, R.P., SAHA, A.K., ARORA, M.K. and KUMAR, A., 1999, Landslide Hazard Zonation
in part of Bhagirathi Valley, Garhwal Himalaya, using integrated remote sensing –
GIS. Himalayan Geology, 20, pp. 71–85.
GUZZETTI, F., CARRARA, A., CARDINALI, M. and REICHENBACH, P., 1999, Landslide hazard
evaluation: a review of current techniques and their application in a multi-scale study,
Central Italy. Geomorphology, 31, pp. 181–216.
HAIR, J.F., ANDERSON, R.E., TATHAM, R.L. and BLACK, W.C., 1998, Multivariate Data
Analysis (London: Prentice Hall).
HANLEY, J.A. and MCNEIL, B.J., 1982, The meaning and use of the area under a receiver
operator characteristic (ROC) curve. Radiology, 143(1), pp. 29–36.
JAIN, A.K., 1972, Overthrusting and emplacement of basic rocks in Lesser Himalaya,
Garhwal, UP. Journal of Geological Society of India, 13(2), pp. 226–237.
JENSEN, J.R., 1996, Introductory Digital Image Processing: a Remote Sensing Perspective
(Upper Saddle River, NJ: Prentice Hall).
JOHNSON, D.E., 1998, Applied Multivariate Methods for Data Analysis (Belmont: Duxbury
Press).
KEINHOLZ, H., 1978, Maps of geomorphology and natural hazards of Grindelwald,
Switzerland, scale 1 : 10 000. Arctic and Alpine Research, 10, pp. 169–184.
KLEINBAUM, D.G., 1994, Logistic Regression: a Self-learning Text (New York: Springer).
LAKHERA, R.C., ROY, A.K., PRUSTY, B.G. and MITTAL, S.K., 1992, Landslide hazard
zonation studies in parts of Garhwal Himalayas using remote sensing and GIS
techniques. In Proceedings of the National Symposium on Remote Sensing for
Sustainable Development, B. Sahai, T.S. Kachhwaha, K.V. Ravindran, A.K.
Roy, N.D. Sharma and P.K. Sharma (Eds), pp. 227–231 (Lucknow: ISRS,
Dehradun and RSAC, UP, 1992).
LEE, S., 2005, Application of logistic regression model and its validation for landslide
susceptibility mapping using GIS and remote sensing data. International Journal of
Remote Sensing, 26, pp. 1477–1491.
NRSA 2001, Landslide hazard zonation mapping along the corridors of the pilgrimage routes
in Uttaranchal Himalaya. Technical Document, NRSA, Department of Space, India.
NRSA 2000, Methodology Manual for Landslide Hazard Zonation Mapping. Technical
Document, NRSA, Department of Space, India.
OHLMACHER, C.G. and DAVIS, J.C., 2003, Using multiple logistic regression and GIS
technology to predict landslide hazard in northeast Kansas, USA. Engineering
Geology, 69, pp. 331–343.
PACHAURI, A.K. and PANT, M., 1992, Landslide hazard mapping based on geological
attributes. Engineering Geology, 32, pp. 81–100.
PANICKAR, S.V., 2000, Landslide hazard zonation in Mussoorie Hills. PhD Thesis,
Department of Earth Sciences, IIT Bombay (unpublished).
PCI 1996, Software Users’ Manual (Ontario: PCI).
PHILIP, G., RAVINDRAN, K.V. and MATHEW, J., 2003, Mapping the Nidar Ophiolites of the
Indus Suture Zone, Northwestern Trans-Himalaya, using IRS 1C/1D data.
International Journal of Remote Sensing, 24, pp. 4979–4994.
RUPKE, J., 1974, Stratigraphic and structural evolution of the Kumaon Lesser Himalaya.
Sedimentary Geology, 11, pp. 81–265.
RUPKE, J., CAMMERAAT, E., SEIJMONSBERGEN, A.C. and VAN WESTEN, C.J., 1988,
Engineering geomorphology of the Widentobel catchment, Apenzell and Sankt
Gallen, Switzerland: A geomorphological inventory system applied to geotechnical
appraisal of slope stability. Engineering Geology, 26, pp. 33–68.
SAATY, T.L., 1980, The Analytical Hierarchy Process (New York: McGraw Hill).
SAHA, A.K., GUPTA, R.P. and ARORA, M.K., 2002, GIS-based Landslide Hazard Zonation in
the Bhagirathi (Ganga) valley, Himalayas. International Journal of Remote Sensing,
23, pp. 357–369.
SOETERS, R. and VAN WESTEN, C.J., 1996, Slope instability recognition, analysis and zonation.
In Landslides: Investigation and Mitigation, Transportation Research Board National
Research Council, Special Report K.A. Turner, and R.L. Schuster (Eds), Vol. 247,
pp. 129–177 (Washington, DC: National Academy Press, 1996).
SÜZEN, M.L., 2002, Data driven landslide hazard assessment using geographical information
systems and remote sensing. PhD Thesis, Middle East Technical University, Turkey
(unpublished).
SÜZEN, M.L. and DOYURAN, V., 2004, Data driven bivariate landslide susceptibility
assessment using geographical information systems: a method and application to
Asarsuyu catchment, Turkey. Engineering Geology, 71, pp. 303–321.
VALDIYA, K.S., 1980, Geology of Kumaon Lesser Himalaya (Dehradun: Wadia Institute of
Himalayan Geology).
VAN WESTEN, C.J., 1993, Application of geographic information systems to landslide hazard
zonation. ITC Publication, vol. 15, ITC, Enschede, The Netherlands.
YIN, K.L. and YAN, T.Z., 1988, Statistical prediction model for slope instability of
metamorphosed rocks. In Proceedings Fifth International Symposium on Landslides, 2,
Rotterdam, pp. 1269–1272.
ZWEIG, M.H. and CAMPBELL, G., 1993, Receiver Operator Characteristic plots: a
fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39(4), pp.
561–577.
