
Socio-Economic Planning Sciences 43 (2009) 141–150

Contents lists available at ScienceDirect

Socio-Economic Planning Sciences


journal homepage: www.elsevier.com/locate/seps

Studying the association between air pollution and lung cancer incidence in a large metropolitan area using a kernel density function
Boris A. Portnov a, *, Jonathan Dubnov b, d, Micha Barchana c, d
a Department of Natural Resources & Environmental Management, Graduate School of Management, University of Haifa, Israel
b Haifa District Health Office, Ministry of Health, Israel
c Israel National Cancer Registry, Ministry of Health, Israel
d School of Public Health, University of Haifa, Israel

Article info

Article history:
Available online 9 October 2008

Abstract

In the absence of patient-specific data, composite-level data are often used in epidemiological studies. However, since individual exposure levels cannot accurately be inferred
from aggregate data, such an approach may lead to erroneous estimates of the health effects of
potential environmental risk factors. In the present study, we attempt to address this
information-loss problem by using the kernel density function, which estimates the
intensity of events across a surface by calculating the overall number of cases situated
within a given search radius from a target point. The present paper illustrates the use of
this analytical technique for a study of the association between the geographical distributions
of lung cancer cases and SO2 air pollution estimates in the Greater Haifa Metropolitan Area
(GHMA). In the analysis, the results obtained by kernel smoothing are contrasted with
those obtained by areal aggregation techniques more commonly used in empirical studies.
© 2008 Elsevier Ltd. All rights reserved.

Keywords:
Kernel density
Air pollution
Cancer

1. Introduction
According to recent epidemiological studies, the
majority of cancer cases derive from environmental causes,
that is, factors attributed to either the indoor environment
(housing conditions, indoor pollution and professional
exposure) or outdoor air pollution and soil contamination
(see inter alia [1–3]). A popular approach to exploring the
etiology of malignant diseases relates the spatial distribution of disease cases to the spatial patterns of potential
health-risk factors [4–6].
However, a typical criticism of traditional epidemiological studies is that they often rely on aggregated data
and frequently make use of coarse geographical units,
defined for purposes other than health investigations [7–9].
An underlying assumption behind this aggregate
approach is that the estimated average exposure in
a particular region may serve as a proxy for the actual
exposure of individuals.

* Corresponding author.
E-mail address: portnov@nrem.haifa.ac.il (B.A. Portnov).
0038-0121/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.seps.2008.09.001

However, as early as 1950, Robinson [10] distinguished between two types of correlation: ecological and
individual. The former is obtained for a group of people,
while the latter is estimated for indivisible units, such as
individuals. According to Robinson's line of argument,
ecological and individual correlations tend to be dissimilar.
As a result, any assumption about an individual based on
average data obtained for a group to which the individual
belongs may result in an assessment error, known as
"ecological fallacy" or "ecological bias" [7,11–14]. Follow-up
studies (see inter alia [15,16]) shed additional light on
Robinson's findings, showing that the size of correlation
coefficients tends, in general, to increase with data aggregation into areal units of larger size or with changes in the
shape of the areal units under investigation. Openshaw [15]
termed this phenomenon the "modifiable areal unit
problem", or MAUP.
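The gap between individual and ecological correlations can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions, not an analysis of the study data: each hypothetical area shares one common factor that shifts both exposure and outcome, individual values add independent noise on top of it, and all names and parameter values (`simulate`, 200 areas, 50 individuals per area, noise standard deviation of 2) are invented for the example.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_areas=200, per_area=50, seed=1):
    """Return (individual-level r, area-level r) for synthetic data."""
    rng = random.Random(seed)
    exposure, outcome, mean_exp, mean_out = [], [], [], []
    for _ in range(n_areas):
        shared = rng.gauss(0, 1)  # hypothetical area-wide factor
        xs = [shared + rng.gauss(0, 2) for _ in range(per_area)]
        ys = [shared + rng.gauss(0, 2) for _ in range(per_area)]
        exposure += xs
        outcome += ys
        # Aggregation step: keep only the area averages
        mean_exp.append(sum(xs) / per_area)
        mean_out.append(sum(ys) / per_area)
    return pearson(exposure, outcome), pearson(mean_exp, mean_out)

r_individual, r_ecological = simulate()
print(f"individual r = {r_individual:.2f}, ecological r = {r_ecological:.2f}")
```

Under these assumptions the individual-level correlation is weak (about 0.2 in expectation), while the correlation between area averages is much stronger, because averaging removes most of the individual noise. This is the pattern that Robinson's argument and the MAUP literature warn about.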
Even if individual-level health data are available, it is
difficult to match them with socio-economic variables,
which are usually aggregated into census-designated
statistical areas (i.e., census blocks and tracts) and are
rarely available for individuals [7–9].
Geographic Information Systems (GIS) technology,
which has become increasingly popular in environmental
studies in recent years, may help to establish individual
air pollution estimates via interpolation, and thus to
minimize the possibility of ecological bias attributed to
information loss due to areal data aggregation [8,9,17–20].
However, the applicability of this technique to cancer-related studies remains somewhat limited. The reason is
that data on individual patients, available in cancer registries, contain only partial information, normally limited to
street address, sex and age at the time of diagnosis,
and, in certain instances, ethnicity [21–23]. In
order to control for potential confounders, such as
income levels and housing conditions, the researcher needs
to match the individual-level data (i.e., the location of
homes of individual cancer patients) with socio-economic
estimates available for statistical areas, thus reverting again
to the use of aggregate data.
Even if sufficiently complete background information on
individual cancer patients is available, it does not always
provide an efficient solution to the above information-loss problem. In order to estimate incidence rates, the
researcher needs to normalize the overall number of cancer
cases recorded in a chosen territorial unit (e.g., statistical
area or air pollution zone) by its overall population.
Although, technically, this task is relatively simple (given
that population numbers for the areal units under investigation
are available), it effectively returns the researcher to
square one, that is, to the areal aggregation of data,
accompanied by a potential loss of information and
the possibility of erroneous estimates it entails.
Fig. 1 illustrates a situation in which ecological bias
emanates from the areal aggregation of individual-level
data. Although there may be no confounding at the individual level and no differences in the exposure effect within
the groups, an ecological bias may nevertheless occur upon
data aggregation. While Regions 1 & 4 are equally exposed
to the health-risk factor, only in Region 4 is the expected link
between the risk factor and its health effects in the study
population likely to be found. The potential bias
thus emanates from the loss of information about actual
exposure due to the aggregation of individual-level data
into areal units (regions).
In the present study, we attempt to address this information-loss problem by using the analytical technique
known as "kernel smoothing" or the "kernel density
function" (KDF). KDF estimates the intensity of events
across a surface by calculating the overall number of cases
situated within a given search radius from a target point. In
the process of calculation, all the points (i.e., observations)
that fall within the search radius are summed up and then
divided by the search area's size, to get each point's density
value. During the calculation, points lying near the center of the target
point's search area are weighted more heavily than those
lying near its edge [24]. The result is a smooth distribution
of values that can be useful for the construction of incidence/
prevalence maps and cluster detection in the study area
[25,26].
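The verbal description above can be written compactly as a formula. The following is a sketch rather than the exact estimator used in the study: it assumes a quartic (biweight) kernel, which is a common choice for this kind of smoothing but is not specified in the text, and the symbols are introduced here only for illustration.

```latex
\hat{\lambda}_h(s) \;=\; \sum_{i:\, d_i \le h} \frac{3}{\pi h^{2}}
\left(1 - \frac{d_i^{2}}{h^{2}}\right)^{2},
\qquad d_i = \lVert s - s_i \rVert
```

Here $s$ is the target point, $s_i$ are the case locations, $h$ is the search radius, and $d_i$ is the distance from the target point to case $i$. A case located exactly at the target point contributes $3/(\pi h^{2})$, and the contribution falls smoothly to zero at the edge of the search area. Because each kernel integrates to one over the search disc, the resulting surface is expressed in cases per unit area.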

Fig. 1. A hypothetical example illustrating a possibility of ecological bias
emanating from the areal aggregation of individual-level data. In the
diagram, four adjacent squares represent territorial entities under study
(e.g., census regions); grey cones mark geographic areas affected by a health-risk factor (e.g., an abnormally high level of air pollution), and small black
dots represent clusters of individuals covered by the study. Although there
may be no confounding at the individual level and no differences in the
exposure effect within the groups, an ecological bias may nevertheless occur
upon data aggregation. While Regions 1 & 4 are equally exposed to the
health-risk factor, only in Region 4 is the expected link between the risk factor
and its health effects in the study population likely to be found. The
potential bias thus emanates from the loss of information about actual
exposure due to the aggregation of individual-level data into areal units
(regions).

The KDF approach does not require the presence of
a parameter's value in a given location (e.g., incidence rate),
which is necessary for more commonly used smoothing
techniques, such as spline, kriging, etc. This feature of KDF
smoothing is thus especially beneficial for empirical studies
in which individual observations are represented only by
their geographic location, that is, by their x,y coordinates, and
have no other attributes whatsoever. Under these conditions,
the KDF technique makes it possible to generate a continuous surface of the density of individual cases by using the
information on the location of neighboring cases, i.e., without
requiring any specific information on the subject cases per se.
The present paper illustrates the use of KDF smoothing
for a study of the association between the geographical distributions of lung cancer (LC) cases and SO2 air pollution
estimates in the Greater Haifa Metropolitan Area (GHMA).
In the analysis, the results obtained by KDF smoothing are
contrasted with the results obtained by areal aggregation
techniques more commonly used in empirical epidemiological studies.
2. Materials and methods
2.1. Study population
The city of Haifa and its two nearest suburbs, the towns
of Nesher and Tirat Karmel, form the study area. In
total, the area hosts about 320,000 residents, of whom ca.
1450 persons were diagnosed with lung
cancer (LC) in 1995–2004. The data on LC morbidity were obtained from
the Israel National Cancer Registry (INCR), a population-based
national cancer registry established in 1960 and
covering the entire country. Since 1982, reporting to the
registry has been mandatory for all medical facilities (i.e., medical
institutions and pathology laboratories, both public and
private). In accordance with the law, INCR retrieves each
cancer patient's personal data from the central population
registry, including the place of birth, immigration date,
current and historical place of residence (street address
including house number), ethnicity and gender.¹ (Smoking
habits and occupational information are not reported in the
registry.)
Recently, the Israel National Cancer Registry (INCR)
published its findings, showing that the residents of the
Haifa sub-district suffered in 1984–1999 from high age-standardized incidence rates (ASIR) of cancer, considerably
exceeding the average national rates. Thus, the ASIR of all
cancers among men in Haifa was ca. 370 per 100,000
residents vs. the average national rate of 289 cases per
100,000 residents, whereas the ASIR of lung cancer was
36.5 and 29.6, respectively [21].
The elevated rates of cancer in the Haifa bay area may be
linked to its unfavorable ecological situation. Since the
early 1930s, the City of Haifa and its suburbs have developed into Israel's primary industrial region, with multiple
sources of air pollution scattered all over the area. Thus,
GHMA hosts the country's largest oil refineries (with a
total production capacity of some 8 million tons of oil per
year), a major oil-fired power plant (ca. 430 MW), as well as
several smaller petrochemical and agrochemical facilities.
The location of large industrial complexes in GHMA, its
complex hilly topography, and unique sea-land meteorology result in distinctive spatial patterns of air pollution
[27,28]. In addition, due to the historical patterns of urban
development, a large share of the residents of GHMA live in
satellite towns and city neighborhoods adjacent to industrial complexes and characterized by high concentrations
of airborne toxic agents [28].
Although there have been several attempts in the past to
investigate the link between the elevated levels of air
pollution in GHMA and local cancer rates, the results of
these studies were largely inconclusive and failed to reveal
a significant association between cancer incidence rates
and local air pollution levels [30,31]. Concurrently, several
studies carried out elsewhere reported a significant association between air pollution and certain types of cancer,
such as lung cancer (LC), non-Hodgkin's lymphoma (NHL),
and bladder cancer [6,23,32–34].
A possible reason for the failure of previous GHMA
studies to demonstrate any significant association between
its elevated cancer rates and local air pollution levels may be
the composite-level data used in the analysis [35].
In the present study, we examined data on all the residents living in the study area and diagnosed with LC during
the ten-year period of 1995–2004. The average latency
period of LC is estimated as 10–15 years prior to the diagnosis date [36,38]. Therefore, only the patients who lived in
the study area for at least 10 years prior to the diagnosis
were included in the analysis. Accurate street addresses
suitable for mapping were available for 1330 out of 1446
reported LC cases (92%). This number (1330 cases) thus
formed the basis for the subsequent investigation of
possible linkages between estimated air pollution and LC
incidence. The sample was verified and found to be fairly
representative of the entire LC cohort living in the study
area with respect to age and gender.

¹ A comprehensive survey carried out in 1991 revealed that the
completeness of the INCR registration was above 94% [29].

2.2. Air pollution data
Air pollution data were obtained from 20 air quality
monitoring stations located in the study area. These
stations continuously measure and report half-hour SO2
concentration levels. Most of these stations became operational in 1995, while only 7 stations have monitored SO2 levels in
the Haifa bay area since 1990. In the analysis, we used the
average annual values of SO2 recorded in ppb and transformed into µg/m3 by the Department of Civil and Environmental Engineering of the Israel Institute of Technology
(Yuval and Broday, unpublished data). Average SO2 levels
for 1995–2003 were used in the analysis, because pollution
data prior to 1995 were considered too sparse to allow
spatial interpolation. Such an interpolation was required to
estimate pollution levels in places where the individual
cancer patients reside. For the same reason (data sparseness), our analysis did not include particulate matter (such
as PM2.5 and PM10), because its measurements were
available only for 6 (out of 20) monitoring stations located
in the study area.
Recent studies conducted in the USA [4] and Europe
[42,43] revealed elevated risks of mortality even for relatively low concentrations of air pollutants. Therefore, our
choice of average annual concentrations is consistent with
the current state of knowledge about the adverse effects of air
pollution even in low concentrations [4,39–41,44].
Neither the location of major industrial enterprises in
the Haifa bay area nor the local meteorology (in particular
the wind speed and direction) has changed considerably
over the past decades, although the emission of local air
pollutants has tended to decline overall [28].
At the next stage of the analysis, the average annual
values of SO2 recorded by the abovementioned 20 monitoring stations were interpolated by the inverse distance
weighted (IDW) interpolation method [45], to obtain
a continuous SO2 pollution surface for the entire study area
(see Fig. 2).
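The interpolation step can be sketched in a few lines. This is a minimal illustration of inverse distance weighting, not the GIS routine used in the study: the distance exponent of 2 is an assumed default (the text does not state the power used), and the station coordinates and SO2 values are hypothetical.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples.
    power: distance exponent (assumed here; not given in the text).
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly at a station: use its reading
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x_km, y_km, mean annual SO2)
stations = [(0.0, 0.0, 2.0), (4.0, 0.0, 6.0), (0.0, 3.0, 10.0)]
print(idw(2.0, 0.0, stations))
```

Evaluating `idw` on a regular grid of points would yield a continuous surface of the kind shown in Fig. 2. Nearby stations dominate each estimate, and a larger exponent makes the surface more local.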
2.3. Exposure classification
The digital map of GHMA containing boundaries of
Small Census Areas (SCAs), similar in size to Census Block
Groups in the U.S.A., and street layers were obtained from
the Israel National Survey & Mapping Center.
Fig. 2. Location of homes of lung cancer patients in the city of Haifa and its suburbs juxtaposed upon SO2 air pollution estimates (average annual concentration in
µg/m3 in 1995–2003).

In the first phase of the analysis, the age-standardized
incidence rates (ASIR) of lung cancer and average air
pollution levels were calculated for each SCA. The SCAs are
the smallest geographical units for which the Israel Central
Bureau of Statistics provides socio-demographic data [46].
A total of 87 SCAs were covered by the analysis. The
calculation of cancer rates was performed separately for
males and females by calculating gender- and age-specific
rates and normalizing by each SCA's overall population size to
the World Standard Population in order to obtain world
standardized incidence rates for each SCA [47]. The SO2
values of raster cells located within SCAs were also averaged, to obtain a representative estimate of SO2 air
pollution in each SCA.
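The standardization step can be illustrated with a short sketch of direct age standardization. The function name, the three age bands, and all numbers below are hypothetical: they are neither study data nor the actual World Standard Population weights, and any division by the number of observation years is left out.

```python
def direct_asir(cases, population, std_weights, per=100_000):
    """Directly age-standardized incidence rate.

    cases[a], population[a]: counts in age band a of one area;
    std_weights[a]: standard-population weight of band a.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("all inputs need one entry per age band")
    # Weight each age-specific rate by the standard population share
    weighted = sum(w * c / p for c, p, w in zip(cases, population, std_weights))
    return per * weighted / sum(std_weights)

# Hypothetical SCA with three age bands (illustrative values only)
cases = [1, 4, 10]
population = [5000, 4000, 1000]
weights = [60, 30, 10]
print(round(direct_asir(cases, population, weights), 1))
```

Weighting the age-specific rates by a common standard population makes rates comparable between areas with different age structures, which a crude rate (all cases divided by total population) does not.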
In order to determine whether kernel smoothing
provides different estimates of the LC incidence–air pollution relationship as opposed to SCA averaging, in the
second phase of the analysis, we estimated the level of SO2
pollution at the place of residence of each LC patient
covered by the analysis.² To this end, the residences of
individual patients were mapped using the ArcGIS 9
software (see small black dots in Fig. 2).
² It may be instructive to compare the KD technique with other
smoothing techniques commonly used in the geographic and epidemiological
literature, such as spline or kriging interpolations. However, such
a comparison is rather impracticable. The reason is that the latter techniques require information on a parameter's value in each location (e.g.,
disease rate). However, this information is unavailable for simple point
data representing locations of homes of individual LC patients, such as
those used in the present study. In contrast, as previously mentioned, the
KD smoothing techniques make it possible to construct a continuous
density surface of disease incidence using solely the information on the
location of neighboring cases, i.e., without requiring any specific data on
the subject cases per se.

Next, individual SO2 exposure estimates were calculated
for each LC patient using the "spatial join" tool of the
ArcGIS 9 software. The spatial join operation involves
matching rows from different geographic layers to the
target layer based on the spatial relationship between them
[45]. In the calculation, SO2 values and the locations of
homes of individual cancer patients shown in Fig. 2 were
juxtaposed over the continuous SO2 pollution surface, to
obtain individual SO2 pollution estimates. The "spatial join"
tool of ArcGIS was also used to calculate the distance
from each cancer patient to the nearest major road
(see Fig. 2), so as to account for individuals' exposure to
mobile sources of air pollution, such as motor vehicles.
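The distance-to-road calculation can be approximated outside a GIS package. The sketch below is not the ArcGIS spatial join itself; it assumes planar coordinates in meters and roads stored as polylines, and the function names and coordinates are hypothetical.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Projection of P onto AB, clamped to the segment's ends
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_nearest_road(home, roads):
    """home: (x, y); roads: list of polylines, each a list of (x, y) vertices."""
    return min(
        point_segment_distance(*home, *a, *b)
        for road in roads
        for a, b in zip(road, road[1:])
    )

# Hypothetical coordinates in meters
roads = [[(0, 0), (100, 0), (100, 100)], [(-50, 200), (50, 200)]]
print(distance_to_nearest_road((40, 30), roads))
```

The brute-force search over all segments is adequate for a sample of about 1330 homes. A spatial index would be the usual choice for much larger data sets.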
In a similar manner (i.e., using the "spatial join" tool), we
calculated the elevation of the home of each LC patient
above sea level, using the layer of elevation contours
provided by the engineering department of the Haifa
municipality. The elevation variable is particularly
important in the local context. In the hot and humid climate
of the area, high elevations of dwellings above sea level
help to provide cross-ventilation of indoor spaces, which is
especially important during long and hot summers. High
elevations also tend to provide better views of open spaces
and other attractive landscape features. Therefore, in Haifa,
high elevations of dwellings are closely associated with
more expensive housing and, generally, with a high socio-economic status of local residents [48].³ In the absence of
detailed socio-economic characteristics of individual
cancer patients, the elevation variable was used in the
present study as a proxy for the individual LC patients'
welfare levels.

³ The interrelationship between the hedonic value of urban location
and property prices is a subject of long-standing debate in the real estate
literature. Importantly, the association between the elevation of existing
dwellings and the welfare levels of their residents is not bi-directional, that is,
welfare levels may be affected by the location of dwellings on the
geographic terrain but not vice versa.


Fig. 3. Kernel density of lung cancer (LC) patients in the city of Haifa and its suburbs in 1995–2004 (the average number of LC patients per km2).

At the next stage, the kernel density surface of LC
patients was calculated (see Fig. 3). In this surface, each
50 × 50-m cell (which is sufficiently small to distinguish
between clusters of individual buildings) represents the
estimated number of LC patients per km2 of surrounding
area, i.e., the density of cancer patients whose homes are
located within a predefined search radius around each
cell.⁴ By overlaying this surface with the location of
homes of individual cancer patients, the values of LC
patient density were estimated for all locations in which
individual cancer patients reside.

⁴ Search radius (or kernel bandwidth) is a user-specified variable which
determines the distance a point must lie within in order to contribute to
a cell value. A large bandwidth value leads to over-smoothing of the
kernel surface, while a small bandwidth value leads to a spiked kernel
over a location [24]. Therefore, setting the bandwidth may change the
outcome of the analysis. To verify this possibility, three different search
radii (500 m, 750 m, and 1000 m) were tested. In the following
discussion, only the results for the best performing (500 m) radius are
reported.

There are two basic approaches to density calculation:
simple and kernel density. In a simple density calculation,
points that fall within the search radius (bandwidth) are summed
and then divided by the search area size. In contrast, in the
kernel density calculation, points lying near the center of
a search area are weighted more heavily than those lying
near its edge, which results in a smoother distribution of
values [24]. In the present study, the latter calculation
approach was implemented, using the Spatial Analyst
module of the ArcGIS 9 software.
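The difference between the two approaches can be seen in a small sketch. It is an illustration under assumptions, not the Spatial Analyst implementation: the kernel is taken to be quartic (the text does not name the kernel shape), coordinates are in kilometers, and the case locations are hypothetical.

```python
import math

def simple_density(target, points, radius):
    """Count of points within `radius` of `target`, per unit area."""
    tx, ty = target
    n = sum(1 for x, y in points if math.hypot(x - tx, y - ty) <= radius)
    return n / (math.pi * radius ** 2)

def kernel_density(target, points, radius):
    """Quartic-kernel density at `target` (the kernel shape is an assumption)."""
    tx, ty = target
    total = 0.0
    for x, y in points:
        d = math.hypot(x - tx, y - ty)
        if d <= radius:
            total += (1.0 - (d / radius) ** 2) ** 2  # weight falls to 0 at the edge
    return 3.0 * total / (math.pi * radius ** 2)

# Hypothetical case locations in km; 0.5 km search radius
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.45), (0.8, 0.8)]
for name, fn in (("simple", simple_density), ("kernel", kernel_density)):
    print(name, round(fn((0.0, 0.0), cases, 0.5), 2))
```

In the simple calculation, a case just inside the search radius counts as much as one at the center, so the estimate jumps whenever a case crosses the boundary. With the kernel weights, the same case contributes almost nothing, which is why the kernel surface is smoother.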
The density of cancer patients around a point in space
may be high because the area is densely populated. To
account for population density patterns, data on the
residential location of each person residing in the study
area (both LC patients and all other residents of the study
area) are required. However, such information was not
available for the analysis. Therefore, we opted for the
geographic layer of individual houses, to represent the
density of development in each part of the city. Since
the area under study is highly urbanized and formed
predominantly by multi-storey structures built in the
1960s to early 1990s (with the exception of a few small
enclaves of low-density development), the layer of building
density of residential houses used in the present analysis
was assumed to represent fairly accurately the density of
population, in the absence of other, more direct population
density measures.
2.4. Statistical analysis
The strength of association between LC rates and several
explanatory variables was investigated using the multiple
regression analysis (MRA) technique in the SPSS software.
The analysis was performed in two stages. First, the 1995–2004
age-standardized LC incidence rates (ASIR per
100,000 inhabitants) were calculated for Small Census
Areas (SCAs), for each gender separately, standardized to
the world standard population [47,49]. Next, multiple
regression analysis (MRA) was used to identify and
measure the effects of the socio-economic variables on the
lung cancer incidence rates. During the analysis, the multicollinearity, normality, and homogeneity of variance
assumptions were tested, and the results were found
satisfactory. The threshold for statistical significance was
set at p < 0.05 (two-tailed). Testing the normality of OLS
regression residuals by the Kolmogorov–Smirnov (KS) test
revealed that their distributions were fairly normal (Z > 1.2,
p > 0.10).
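The regression and the multicollinearity diagnostic reported in the tables (VIF) can be sketched without a statistics package. This is not the SPSS procedure: it is a minimal ordinary least squares fit via the normal equations, with hypothetical function names and made-up data, and `vif` needs at least two predictors.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients with the intercept first, R squared).
    """
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

def vif(columns, j):
    """Variance inflation factor of predictor j: 1 / (1 - R_j squared)."""
    others = [c for i, c in enumerate(columns) if i != j]
    _, r2 = ols(others, columns[j])
    return 1.0 / (1.0 - r2)

# Hypothetical data: two predictors and an outcome for six areas
so2 = [2.0, 3.5, 5.0, 6.5, 8.0, 9.5]
density = [10.0, 30.0, 20.0, 40.0, 25.0, 45.0]
lc_rate = [12.0, 20.0, 19.0, 30.0, 27.0, 38.0]
coefficients, r_squared = ols([so2, density], lc_rate)
print(coefficients, r_squared, vif([so2, density], 0))
```

A VIF close to 1 means that a predictor is nearly uncorrelated with the others. Values well above 1 signal that its coefficient is estimated less precisely because of collinearity.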
In the analysis, the average SCA levels of SO2 were
controlled for by several socio-economic variables available at
the SCA level, viz., percent of college graduates, housing
density, and average family income. The regression analysis
was performed separately for male and female LC patients,
to detect possible differences in the SO2 exposure effects
between genders.
During the second phase of the analysis, the association
between SO2 exposure and LC incidence rates was investigated using kernel density estimates. In the analysis, such
estimates were calculated for 1330 locations in which
individual LC patients reside (see Fig. 3), and used as
the dependent variable. In particular, the kernel density
was estimated for all LC patients, separately for both
genders, and for LC patients aged 65 years and older.⁵
In the analysis, the following variables were used as
predictors: individual SO2 exposure estimates, elevation of
an LC patient's home above sea level, building density,
and distance to the closest main road (see the previous
subsection for more detail).
⁵ In Israel, as in most western countries, the incidence rate of LC differs
considerably by gender and age, with approximately 70% of all LC cases in
men diagnosed at the age of 65 and older [49].

The investigation of regression residuals from the OLS
models revealed significant autocorrelation of residuals
within up to the 500–600-m proximity range (Moran's
I > 0.1; p < 0.05; see Fig. 4). The observed autocorrelation
necessitated the use of spatial dependency models. There
are two primary types of such models: a) the spatial error
(SE) model, which assumes that the error terms across
different spatial units are correlated, and b) the spatial lag
(SL) model, which presumes that the dependent variable in
place i is affected by the independent variables in both
place i and in neighboring locations, j [58]. Since, from the
outset of the analysis, the density of LC patients at a given
point in space was assumed to depend on the predictors'
values at both the subject location and in neighboring areas
(especially with respect to SO2 pollution levels), the SL model
best fitted our needs. In the analysis, the SL model was
estimated by three alternative methods: Conditional
Autoregression (CAR), Simultaneous Autoregression (SAR)
and Moving Averages (MA). (For brevity's sake, spatial lag
models are reported in the following
discussion only for all LC cases.) The analysis was performed in the
S+SpatialStats software.
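The residual diagnostic can be sketched as follows. This is an illustrative computation of Moran's I by distance band, not the routine used in the study: binary weights (1 for pairs whose separation falls inside the band, 0 otherwise) are an assumption, and the residuals and coordinates are hypothetical.

```python
import math

def morans_i(values, coords, d_min, d_max):
    """Moran's I with binary weights: w_ij = 1 when d_min < distance <= d_max."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0  # sum of w_ij * z_i * z_j over ordered pairs
    s0 = 0       # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d_min < d <= d_max:
                cross += z[i] * z[j]
                s0 += 1
    if s0 == 0:
        raise ValueError("no pairs fall in this distance band")
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical residuals at four homes spaced 50 m apart along a street
residuals = [1.0, 2.0, 3.0, 4.0]
homes = [(0, 0), (50, 0), (100, 0), (150, 0)]
for lo, hi in ((0, 50), (50, 100)):
    print(f"{lo}-{hi} m:", round(morans_i(residuals, homes, lo, hi), 3))
```

Repeating the calculation for successive lag increments gives a correlogram of the kind plotted in Fig. 4. Positive values at short lags indicate that neighboring residuals resemble each other, which violates the independence assumption of OLS.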
3. Results
Table 1 reports the factors affecting age-standardized
rates (ASIR) of lung cancer in the city of Haifa and its
suburbs, estimated by the multiple regression analysis
(MRA) procedure for Small Census Areas (SCAs), separately for both genders. Concurrently, Table 2 reports the
factors affecting the kernel density of LC patients estimated
by the MRA method for all LC patients (Model 1), LC
patients 65 years old and older (Model 2), and, separately, for males and females (Models 3 & 4).
As Table 1 shows, the regression fits of the SCA-level
models, measured by their R2, are not especially high
(R2 = 0.239 for males, and R2 = 0.024 for females), and none
of the explanatory factors covered by the analysis appears to
be statistically significant at the established significance
level (p < 0.05). Furthermore, contrary to all expectations,
in neither model does the SO2 variable cross the 0.10 significance threshold, indicating that there is insufficient
evidence that local LC rates are associated significantly with
SO2 pollution levels in small census areas.
However, the outcome of the analysis appears to be
distinctively different when the association between SO2
pollution and LC incidence is investigated using kernel
density estimates (see Table 2). First, the model fits appear
to be substantially higher than in the previous run
(R2 = 0.217–0.496; Table 2 vs. R2 = 0.024–0.239; Table 1).
Second, several explanatory variables (including SO2 individual exposure estimates) emerge as highly statistically
significant (p < 0.01) and exhibit the expected signs. In particular, in all models (see Table 2), the areal density of LC
patients appears to increase in line with SO2 pollution
levels (t > 2.8; p < 0.01) and building density (t > 10;
p < 0.001), and to drop as elevation increases (t > 2.1,
p < 0.01). Characteristically, the distance to main road is
statistically significant (t > 2.4; p < 0.05) only in Model 4
(females), an effect that may be attributed to
the fact that females spend more time inside their homes
and in their vicinity, thus exposing themselves more to
traffic-generated air pollution from nearby thoroughfares
[38,50,51].

[Figure 4: Moran's I and Z-Normal I plotted against lag increment in meters, in 50-m bands from 0–50 to 750–800.]

Fig. 4. Spatial autocorrelation of regression residuals (Moran's I index and its Z-Normal values). Note: Dependent variable: all LC cases (see Model 1; Table 2).

In general, the use of spatial lag models (see Table 3:
Models 5–7) does not change the outcome of the analysis
substantially, confirming that the SO2 pollution variable
retains its statistical significance (p < 0.01) even
after taking the spatial interdependency of residuals into
account.
Table 4 reports the results of a sensitivity test of the change
in the LC density (number of LC cases per km2) attributed to
a plausible change in SO2 air pollution levels (from zero
SO2 pollution to 10.4 µg/m3, which is the maximum accumulated SO2 pollution level observed in the study area in
1995–2003). The simulation is based on Model 1 (All cases;
Table 2) and sets the levels of all other predictors (except
for SO2) to their respective mean values observed in the
study area.
The increase in the density of LC cases appears to be
substantial: viz., 18.6% for the minimum-mean SO2 level
comparison, and 29.2% for the mean-maximum SO2 level
comparison. Thus, on the average, the kernel density of LC

Table 1
Factors affecting age-standardized rates of lung cancer incidence in the
city of Haifa and its suburbs in 1995–2003 (units of analysis: small census
areas; method: multiple linear regression (MRA)).
Columns: Variable; ASR (males): B^a, t^b, Sig.^c, VIF^d; ASR (females): B^a, t^b, Sig.^c, VIF^d.
Rows: (Constant); SO2; Education (BA); Density; Income; N of cases; R2; Adjusted R2; Std. Error; F.

17.082
0.537 0.593
2.254 0.164 0.870
0.281 0.204 0.939 1.185
0.505 0.911 0.365 1.188
0.232 0.592 0.555 2.637 0.052 0.321 0.749 2.429
38.190
1.788 0.077 2.881 6.339
7.997 1.869 0.065 2.919
2.163
87
80
0.239
0.024
0.201
0.029
22.415
8.693
6.344
0.000
0.452

0.660 0.511 2.890


1.253 0.214 2.984

a Unstandardized regression coefficient.
b t-statistic.
c Actual significance of t-statistic.
d Variance inflation factor (multicollinearity diagnostic).

0.771

patients appears to increase, ceteris paribus, by some 0.45%
for each 1 percent rise in SO2 air pollution levels.
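The percentage changes quoted above can be related to the linear form of the regression model. The following is a sketch under the assumption that the simulation simply evaluates the linear Model 1 with all predictors other than SO2 held at their means; the symbols are introduced here for illustration and do not appear in the original tables.

```latex
\hat{y}(s) = b_0 + b_{\mathrm{SO_2}}\, s + \sum_{k} b_k \bar{x}_k ,
\qquad
\%\Delta(s_0 \to s_1) = 100 \cdot \frac{b_{\mathrm{SO_2}}\,(s_1 - s_0)}{\hat{y}(s_0)} ,
\qquad
\varepsilon(s) = \frac{b_{\mathrm{SO_2}}\, s}{\hat{y}(s)}
```

Here $\hat{y}(s)$ is the predicted LC density at SO2 level $s$, $b_{\mathrm{SO_2}}$ is the unstandardized SO2 coefficient, $\bar{x}_k$ are the means of the other predictors, and $\varepsilon(s)$ is the point elasticity, that is, the approximate percentage change in density for a 1 percent rise in SO2 at level $s$. Because the model is linear, the same absolute rise in SO2 produces a larger percentage change when the baseline density $\hat{y}(s_0)$ is lower.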
4. Conclusions
According to recent epidemiological studies, the relative
risk of lung cancer mortality increases by some 8–27% in
heavily polluted areas [6,52]. Furthermore, as Jerrett [4]
demonstrated using modern GIS tools, the relative risk of
cancer mortality reported in previous studies carried out in
the USA was likely to be underestimated. In Jerrett's view, the
health effects of air pollution are likely to be at least three
times greater than those estimated by the models relying
on comparisons between communities.
The limitations of ecological studies are well known, in
particular their inability to estimate individual exposure
levels. Therefore we believe that the kernel density
approach used in this study and based on individual
exposure assessments may help to minimize this sort of
bias and thus become a useful tool for empirical research,
looking into possible associations between environmental
risk factors and their health outcomes.
Table 2
Factors affecting the kernel density of lung cancer patients (the average number of lung cancer patients per km2) in the city of Haifa and its suburbs (method:
MRA).

Variable                 Model 1 (All cases)                     Model 2 (Age 65)
                         B^a       t^b      Sig.^c   VIF^d       B^a       t^b      Sig.^c   VIF^d
SO2                      4.714     8.339    <0.001   1.021       3.704     7.155    <0.001   1.021
Elevation                0.080     10.981   <0.001   1.048       0.045     6.620    <0.001   1.054
Building density         0.108     28.806   <0.001   1.012       0.095     27.930   <0.001   1.013
Distance to main road    0.005     0.483    0.629    1.059       0.010     1.098    0.272    1.069
Constant                 15.250    3.830    <0.001               5.114     1.385    0.012
N of cases               1330                                    966
R2                       0.437                                   0.473
Adjusted R2              0.435                                   0.471
Std. Error of Estimate   26.310                                  20.733
F                        257.283            <0.001               215.877            <0.001

Variable                 Model 3 (Males)                         Model 4 (Females)
                         B^a       t^b      Sig.^c   VIF^d       B^a       t^b      Sig.^c   VIF^d
SO2                      3.872     7.617    <0.001   1.032       0.946     2.854    0.005    1.015
Elevation                0.065     9.329    <0.001   1.063       0.008     2.109    0.035    1.035
Building density         0.086     26.449   <0.001   1.021       0.024     10.224   <0.001   1.009
Distance to main road    0.011     1.182    0.238    1.077       0.013     2.443    0.015    1.041
Constant                 2.941     0.812    0.417                13.825    6.055    <0.001
N of cases               843                                     487
R2                       0.496                                   0.217
Adjusted R2              0.494                                   0.210
Std. Error of Estimate   18.879                                  9.223
F                        206.221            <0.001               33.311             <0.001

a Unstandardized regression coefficient.
b t-statistic.
c Actual significance of t-statistic.
d Variance inflation factor (multicollinearity diagnostic).

Although the use of KD smoothing is not novel in
epidemiologic studies, its use has been mostly limited to
simple visualization tasks, such as disease mapping and the
detection of disease clusters [53–57]. However, as the
present study demonstrates, the application of this
analytical technique may surpass this simple visualization-oriented
use and become a useful analytical tool for environmental
epidemiological research, looking into possible
associations between environmental risk factors and their
health outcomes. Importantly, the KD technique does not
require the presence of a parameter's value for each
geographic location (e.g., incidence rate), which is essential
for more commonly used smoothing techniques, such as
spline, kriging, etc. [24–25,37]. Therefore, even for data
characterized only by x,y coordinates and having no
other attributes whatsoever, the KD method makes it
possible to generate a continuous surface of disease case
density (with visualization of peaks or lacunas), by using
the information on the location of neighboring cases
without requiring any specific information on the subject
cases themselves. Considerable advances in GIS technology
and its rapid integration into epidemiological
studies open the way for empirical verification of the
proposed analytical approach by researchers elsewhere.
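The point-based estimation summarized above can be illustrated with a minimal Python sketch. It assumes a quartic kernel and a small, hypothetical set of case coordinates in km; the kernel form, the search radius, and the grid spacing are illustrative choices and not the settings used in the study. Only the x,y coordinates of the cases enter the computation, with no other attribute required.

```python
import math


def quartic_kernel_density(points, target, radius):
    """Kernel density (cases per unit area) at `target`.

    Each case within `radius` of the target contributes a weight that
    decreases smoothly with distance (quartic kernel, an assumption here).
    """
    tx, ty = target
    r2 = radius ** 2
    total = 0.0
    for x, y in points:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        if d2 < r2:
            total += (1.0 - d2 / r2) ** 2
    # The factor 3 / (pi * r^2) makes each case's kernel integrate to 1.
    return 3.0 / (math.pi * r2) * total


def density_surface(points, x_coords, y_coords, radius):
    """Evaluate the density on a regular grid; rows follow y, columns follow x."""
    return [
        [quartic_kernel_density(points, (x, y), radius) for x in x_coords]
        for y in y_coords
    ]


# Hypothetical case locations (km): two small clusters.
cases = [(0.2, 0.3), (0.25, 0.35), (0.3, 0.3), (1.5, 1.6), (1.6, 1.5)]
grid = [i * 0.5 for i in range(5)]  # 0.0, 0.5, ..., 2.0 km
surface = density_surface(cases, grid, grid, radius=0.5)

for row in surface:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Grid cells near the two clusters receive positive density values, while cells with no case inside the search radius are zero, which corresponds to the peaks and lacunas mentioned in the text.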
A further improvement of the performance of the KDF
method may require more detailed personal-level data,
such as direct indices of individual welfare and population
density, accounting for a wider range of possible
confounding factors, and reflecting the whole range of
adverse effects of environmental risk factors on disease
etiology.

5. Disclaimer
The content of this paper is the responsibility of its
authors and does not necessarily reflect the views of the
Israel Ministry of Health.

Table 3
Factors affecting the kernel density of lung cancer patients (the average number of (all) lung cancer patients per km2) in the city of Haifa and its suburbs
(method: spatial autoregression).

Variable                                 Model 5               Model 6               Model 7
                                         B^a       t^b         B^a       t^b         B^a       t^b
SO2                                      4.714     8.339^c     10.084    14.948^c    2.700     3.492^c
Elevation                                0.080     10.981^c    0.044     3.990^c     0.054     6.413^c
Building density                         0.108     28.806^c    0.175     54.448^c    0.067     11.135^c
Distance to the main road                0.005     0.483       0.006     0.904       0.010     1.781
Constant                                 15.250    3.830^c     44.501    7.966^c     18.181    3.384^c
N of cases                               1330                  1330                  1330
Std. Error of Estimate                   26.310                15.450                15.551
Spatial autoregressive coefficient (r)   5e18                  0.029                 0.049
Gradient norm                            1.833e4               0.003                 2.164e4
Log-likelihood                           9130                  8444                  8393

Model 5: Conditional autoregression (CAR); Model 6: Simultaneous autoregression (SAR); Model 7: Moving averages (MA).
a Unstandardized regression coefficient.
b t-statistic.
c Indicates a two-tailed 0.01 significance level.

Table 4
Sensitivity test of the changes in the areal density of LC patients to plausible changes in the SO2 levels.

SO2 level            Estimated density of      % Change
                     LC patients per km2
SO2 = 0              32.4                      –
Min (SO2 = 4.4)      53.2                      64.2
Mean (SO2 = 6.5)     63.1                      18.6
Max (SO2 = 10.4)     81.5                      29.2

Note: The estimates are based on Model 1 (All cases; Table 2); the values of
controls are set to the average levels observed in the study area: elevation = 110 m; building density = 244 housing units per km2; distance to
main road = 74 m.

Acknowledgement
The authors express their gratitude to two anonymous
reviewers for their helpful comments and suggestions.
References
[1] Doll R, Peto R. The causes of cancer: quantitative estimates of avoidable risks of cancer in the United States today. Journal of the National Cancer Institute 1981;66:1191–308.
[2] Doll R. Epidemiological evidence of the effects of behavior and the environment on the risk of human cancer. Recent Results in Cancer Research 1998;154:3–21.
[3] National Cancer Institute & National Institute of Environmental Health Sciences (NCI & NIEHS). Cancer and the environment. National Institute of Health; 2003. Pub. 03-2039.
[4] Jerrett M, Burnett RT, Ma R, Pope 3rd CA, et al. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology 2005;16(6):727–36.
[5] Nyberg F, Gustavsson P, Jarup L, et al. Urban air pollution and lung cancer in Stockholm. Epidemiology 2000;11(5):487–95.
[6] Pope 3rd CA, Burnett RT, Thun MJ, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. Journal of the American Medical Association 2002;287:1132–41.
[7] Elliott P, Cuzick J, English D, Stern R, editors. Geographical and environmental epidemiology. Methods for small area studies. Oxford: Oxford University Press; 1992. p. 404 [1996 reprint].
[8] Elliott P, Wartenberg D. Spatial epidemiology: current approaches and future challenges. Environmental Health Perspectives 2004;112(9):998–1006.
[9] Nuckols JR, Ward MH, Jarup L. Using geographic information systems for exposure assessment in environmental epidemiology studies. Environmental Health Perspectives 2004;112(9):1007–15.
[10] Robinson WS. Ecological correlations and the behavior of individuals. American Sociological Review 1950;15:351–7.
[11] Selvin HC. Durkheim's suicide and problems of empirical research. The American Journal of Sociology 1958;63(6):607–19.
[12] Rothman KJ. Modern epidemiology. Boston: Little, Brown and Co.; 1986.
[13] Greenland S, Morgenstern H. Ecological bias, confounding, and effect modification. International Journal of Epidemiology 1989;18(1):269–74.
[14] Morgenstern H, Thomas D. Principles of study design in environmental epidemiology. Environmental Health Perspectives 1993;101(4):23–38.
[15] Openshaw S. The modifiable areal unit problem. In: Concepts and techniques in modern geography. Monograph series, vol. 38. London: Geo Books; 1984. p. 41.
[16] Unwin DJ. GIS, spatial analysis and spatial statistics. Progress in Human Geography 1996;20(4):540–51.
[17] Brauer M, Hoek G, van Vliet P, Meliefste K, Fischer P, Gehring U, et al. Estimating long-term average particulate air pollution concentrations: application of traffic indicators and geographic information systems. Epidemiology 2003;14(2):228–39.
[18] Cockings S, Dunn CE, Bhopal RS, Walker DR. Users' perspectives on epidemiological, GIS and point pattern approaches to analyzing environment and health data. Health & Place 2004;10(2):169–82.
[19] Gotway C, Young LJ. Combining incompatible spatial data. Journal of the American Statistical Association 2002;97(48):632–47.
[20] Scoggins A, Kjellstrom T, Fisher G, Connor J, Gimson N. Spatial analysis of annual air pollution exposure and mortality. Science of the Total Environment 2004;321(1–3):71–85.
[21] Barchana M, Lipshitz I, et al. Geographical mapping of malignant diseases in Israel 1984–1999 [Hebrew]. Jerusalem: Israel National Cancer Registry, Ministry of Health; 2001. 4/2001, p. 1545.
[22] Barchana M, Lipshitz I, et al. Changes in incidence patterns of selected malignant diseases in Israel. [Hebrew]. Jerusalem: Israel National Cancer Registry, Ministry of Health; 2005.
[23] Johnson KC, Pan S, Fry R, Mao Y. Residential proximity to industrial plants and non-Hodgkin lymphoma. Epidemiology 2003;14(6):687–93.
[24] McCoy J, Johnston K. Using ArcGIS spatial analyst. Redlands: ESRI; 2001.
[25] Berke O. Exploratory disease mapping: kriging the spatial risk function from regional count data. International Journal of Health Geographics 2004;3(1):18.
[26] Lai PC, Wong CM, Hedley AJ, Lo SV, Leung PY, Kong J, et al. Understanding the spatial clustering of severe acute respiratory syndrome (SARS) in Hong Kong. Environmental Health Perspectives 2004;112(15):1550–6.
[27] Goren AI, Hellman S, Brenner S, Egoz N, Rishpon S. Prevalence of respiratory conditions among schoolchildren exposed to different levels of air pollutants in the Haifa Bay area, Israel. Environmental Health Perspectives 1990;89:225–31.
[28] Yuval, Broday DM, Carmel Y. Mapping spatio-temporal variables: the impact of the time-averaging window width on the spatial accuracy. Atmospheric Environment 2005;39:3611–9.
[29] Fishler Y, Chitrit A, Barchana M, Modan B. Examination of Israel national cancer data accumulation completeness for 1991 [Hebrew]. Tel Hashomer, Israel: The National Center for Disease Control; 2003. Publication No. 230.
[30] Epstein L, Katz L, Tamir A, Rishpon S. Incidence of lung cancer in five towns in Israel, 1960–74. Israel Journal of Medical Science 1984;20(1):27–32.
[31] Tamir A. Lung cancer in towns in Israel. Case control study. PhD thesis [Hebrew], Technion: Haifa, Israel; 1990.
[32] Alberg AJ, Samet JM. Epidemiology of lung cancer. Chest 2003;123(1 Suppl.):21S–49S [Review].
[33] Skipper PL, Tannenbaum SR, Ross RK, Yu MC. Nonsmoking-related arylamine exposure and bladder cancer risk. Cancer Epidemiology Biomarkers and Prevention 2003;12(6):503–7.
[34] Vineis P, Husgafvel-Pursiainen K. Air pollution and cancer: biomarker studies in human populations. Carcinogenesis 2005;26(11):1846–55.
[35] Dubnov J, Barchana M, Rishpon S, Leventhal A, Segal I, Carel R, et al. Estimating the effect of air pollution from a coal-fired power station on the development of children's pulmonary function. Environmental Research 2007;103:87–98.
[36] Archer VE, Coons T, Saccomanno G, Hong DY. Latency and the lung cancer epidemic among United States uranium miners. Health Physics 2004;87(5):480–9.
[37] Hauptmann M, Berhane K, Langholz B, Lubin J. Using splines to analyze latency in the Colorado Plateau uranium miners cohort. Journal of Epidemiology and Biostatistics 2001;6(6):417–24.
[38] Vineis P, Hoek G, Krzyzanowski M, Vigna-Taglianti F, Veglia F, Airoldi L, et al. Air pollution and risk of lung cancer in a prospective study in Europe. International Journal of Cancer 2006;119(1):169–74.
[39] Annesi-Maesano I, Forastiere F, Kunzli N, Brunekreef B, Environment and Health Committee of the European Respiratory Society. Particulate matter, science and EU policy. European Respiratory Journal 2007;29(3):428–31.
[40] Pope CA. Invited commentary: particulate matter-mortality exposure-response relations and threshold. American Journal of Epidemiology 2000;152:407–12.
[41] Naess Ø, Nafstad P, Aamodt G, Claussen B, Rosland P. Relation between concentration of air pollution and cause-specific mortality: four-year exposures to nitrogen dioxide and particulate matter pollutants in 470 neighborhoods in Oslo, Norway. American Journal of Epidemiology 2007;165(4):435–43.
[42] Gehring U, Heinrich J, Kramer U, Grote V, Hochadel M, Sugiri D, et al. Long-term exposure to ambient air pollution and cardiopulmonary mortality in women. Epidemiology 2006;17(5):545–51.
[43] Hoek G, Brunekreef B, Goldbohm S, Fisher P, van den Brandt PA. Association between mortality and indicators of traffic related air pollution in the Netherlands: a cohort study. Lancet 2002;360:1203–9.
[44] Daniels MJ, Dominici F, Samet JM, Zeger SL. Estimating particulate matter-mortality dose-response curves and threshold levels: an analysis of daily time-series for the 20 largest US cities. American Journal of Epidemiology 2000;152(5):397–406.
[45] Minami M, Environmental Systems Research Institute. Using ArcMap: GIS. Redlands, California: ESRI; 2000. p. 365–92.
[46] Central Bureau of Statistics, Israel (CBSI). Characterization and classification of local authorities by the socio-economic level of the population; 2004 [S.P. 1222].
[47] Bray F, Guilloux A, Sankila R, Parkin DM. Practical implications of imposing a new world standard population. Cancer Causes & Control 2002;13(2):175–82.
[48] Portnov BA, Odish Y, Fleishman L. Factors affecting housing modifications and housing pricing: a case study of four residential neighborhoods in Haifa, Israel. Journal of Real Estate Research 2006;27(4):371–407.
[49] Parkin DM, Whelan SL, Ferlay J, Teppo L, Thomas DB, editors. Cancer incidence in five continents, vol. VIII. Lyon, France: International Agency for Research on Cancer (IARC); 2002. Scientific Publications No. 155.
[50] Kaluzny SP, Vega SC, Cardoso TP, Shelly AA. S+SpatialStats. New York: Springer; 1997.
[51] Teixeira E, Conde S, Alves P, Ferreira L, Figueiredo A, Parente B. Lung cancer and women. Revista Portuguesa de Pneumologia 2003;9(3):225–47.
[52] Beeson WL, Abbey DE, Knutsen SF. Long-term concentrations of ambient air pollutants and incident lung cancer in California adults: results from the AHSMOG study. Adventist Health Study on Smog. Environmental Health Perspectives 1998;106(12):813–23.
[53] Laden F, Schwartz J, Speizer FE, Dockery DW. Reduction in fine particulate air pollution and mortality: extended follow-up of the Harvard Six Cities study. American Journal of Respiratory and Critical Care Medicine 2006;173(6):667–72.
[54] Bithell JF. An application of density estimation to geographical epidemiology. Statistics in Medicine 1990;9:691–701.
[55] Gatrell AC, Bailey TC, Diggle PJ, Rowlingson BS. Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British Geographers 1996;21:256–74.
[56] Sabel CE, Gatrell AC, Loytonen M, Maasilta P, Jokelainen M. Modelling exposure opportunities: estimating relative risk for motor neurone disease in Finland. Social Science & Medicine 2000;50:1121–37.
[57] Webster T, Vieira V, Weinberg J, Aschengrau A. Method for mapping population-based case-control studies: an application using generalized additive models. International Journal of Health Geographics 2006;9(5):26.
[58] Wheeler DC. A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996–2003. International Journal of Health Geographics 2007;27(6):13.
Boris A. Portnov (Studying the Association between Air Pollution and
Lung Cancer Incidence in a Large Metropolitan Area Using a Kernel
Density Function) is an Associate Professor, Department of Natural
Resources & Environmental Management, Graduate School of Management,
University of Haifa, Israel. He earned an MA in architecture from
Poltava Civil Engineering Institute, Ukraine Republic, a Ph.D. from Central
Scientific and Project Institute of Town-Building, Moscow, and a D.Sc.
(Second Russian Doctoral Degree) from Moscow Architectural Institute.
Professor Portnov's current research program covers five interrelated
aspects of population geography, urban & regional planning, and
environmental studies: (1) environmental factors of real estate appraisal; (2)
urban clustering; (3) environmental epidemiology; (4) interregional
inequality; and (5) sustainability of urban growth in peripheral areas.
Professor Portnov is an Associate Editor of the International Journal of
Sustainable Society, and serves on the Editorial Boards of Socio-Economic
Planning Sciences; International Journal of Society Systems Science, and
Open Family Studies Journal. He has authored or co-authored four books
and one textbook, including Urban Clustering: The Benefits and Drawbacks
of Location, Ashgate, 2001; Regional Inequalities in Small Countries,
Springer, 2005. Professor Portnov's research appears in more than 100
refereed articles and book chapters, published in a variety of journals,
including Annals of Regional Science, Chronobiology International,
Environmental Science and Policy, International Migration, Italian Journal of
Regional Science (Scienze Regionale), Journal of Arid Environments, Journal
of Regional Science, Regional Studies, Space & Polity, Socio-Economic
Planning Sciences, Urban Studies, etc. He holds, or shares, three patents in
the former USSR.

Micha Barchana (Studying the Association between Air Pollution and
Lung Cancer Incidence in a Large Metropolitan Area Using a Kernel Density
Function) is a Director of the Israel National Cancer Registry and a Senior
Lecturer at the School of Public Health, University of Haifa, Israel. He holds
an M.D. from Bologna University, Italy and an MPH degree from Hebrew
University, Jerusalem. He is board certified in Public Health and has
a second specialization in Medical Administration. Dr. Barchana is a
Principal Investigator of the Middle East Cancer Consortium (MECC), a Vice
President of the Mediterranean Oncology Society, and a Board Member of
The Israeli Association of Public Health Physicians, and the Israeli
Association for Preventive Medicine. He also serves on selected ministerial
committees, including the Committee for the Evaluation of Health
Hazards from Environmental Pollution, and the Inter-Ministerial
Committee on Mutagenous, Teratogenous and Cancerogenous Substances.
Dr. Barchana has published more than 60 peer-reviewed articles, as well as
several dozen articles in the Hebrew press.

Jonathan Dubnov (Studying the Association between Air Pollution and
Lung Cancer Incidence in a Large Metropolitan Area Using a Kernel
Density Function) is a Lecturer at the School of Public Health, Faculty of
Social Welfare and Health Studies, University of Haifa, and the Deputy
District Health Officer, Ministry of Health, Haifa, Israel. He received his
M.D. at State Institute of Medicine, Yekaterinburg (formerly, Sverdlovsk),
Russia and an M.P.H. at School of Public Health, Hebrew University, Israel.
Dr. Dubnov has published several refereed papers in the fields of public
health and environmental science. His current research focuses on air
pollution, environmental epidemiology, and public health impact
assessment and policy. He is a board member of the National Steering
Committees on the Effects of Air Pollution on Public Health.
