a Department of Natural Resources & Environmental Management, Graduate School of Management, University of Haifa, Israel
b Haifa District Health Office, Ministry of Health, Israel
c Israel National Cancer Registry, Ministry of Health, Israel
d School of Public Health, University of Haifa, Israel
Article info
Article history:
Available online 9 October 2008
Keywords:
Kernel density
Air pollution
Cancer

Abstract
In the absence of patient-specific data, composite level data are often used in epidemiological studies. However, since individual exposure levels cannot accurately be inferred from aggregate data, such an approach may lead to erroneous estimates of health effects of potential environmental risk factors. In the present study, we attempt to address this information-loss problem by using the kernel density function, which estimates the intensity of events across a surface, by calculating the overall number of cases situated within a given search radius from a target point. The present paper illustrates the use of this analytical technique for a study of association between the geographical distributions of lung cancer cases and SO2 air pollution estimates in the Greater Haifa Metropolitan Area (GHMA). In the analysis, the results obtained by kernel smoothing are contrasted with those obtained by areal aggregation techniques more commonly used in empirical studies.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
According to recent epidemiological studies, the majority of cancer cases derive from environmental causes, that is, factors attributed to either the indoor environment (housing conditions, indoor pollution and occupational exposure) or outdoor air pollution and soil contamination (see inter alia [1–3]). A popular approach to exploring the etiology of malignant diseases relates the spatial distribution of disease cases to the spatial patterns of potential health-risk factors [4–6].
However, a typical criticism of traditional epidemiological studies is that they often rely on aggregated data and frequently make use of coarse geographical units, defined for purposes other than health investigations [7–9].
An underlying assumption behind this aggregate
approach is that the estimated average exposure in
* Corresponding author.
E-mail address: portnov@nrem.haifa.ac.il (B.A. Portnov).
0038-0121/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.seps.2008.09.001
1 A comprehensive survey carried out in 1991 revealed that the
completeness of the INCR registration was above 94% [29].
Fig. 2. Location of homes of lung cancer patients in the city of Haifa and its suburbs juxtaposed upon SO2 air pollution estimates (average annual concentration in mg/m3 in 1995–2003).
2 It may be instructive to compare the KD technique with other smoothing techniques commonly used in the geographic and epidemiological literature, such as spline or kriging interpolation. However, such a comparison is impracticable, because the latter techniques require information on a parameter's value at each location (e.g., disease rate). This information is unavailable for simple point data representing the locations of homes of individual LC patients, such as those used in the present study. In contrast, as previously mentioned, KD smoothing makes it possible to construct a continuous density surface of disease incidence using solely the information on the location of neighboring cases, i.e., without requiring any specific data on the subject cases per se.
3 The interrelationship between the hedonic value of urban location
and property prices is a subject of long-standing debate in real estate
literature. Importantly, the association between the elevation of existing
dwellings and welfare levels of their residents is not bi-directional, that is,
welfare levels may be affected by the location of dwellings on the
geographic terrain but not vice versa.
Fig. 3. Kernel density of lung cancer (LC) patients in the city of Haifa and its suburbs in 1995–2004 (the average number of LC patients per km2).
4 Search radius (or kernel bandwidth) is a user-specified variable that determines the distance within which a point must lie in order to contribute to a cell value. A large bandwidth leads to over-smoothing of the kernel surface, while a small bandwidth produces a spiked kernel over each location [24]. Therefore, the choice of bandwidth may change the outcome of the analysis. To verify this possibility, three different search radii (500 m, 750 m, and 1000 m) were tested. In the following discussion, only the results for the best performing radius (500 m) are reported.
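The role of the search radius can be illustrated with a minimal sketch. It assumes a quartic (biweight) kernel, which is a common choice in GIS software but is not stated in the text, and it uses hypothetical case coordinates in kilometres rather than data from the study. The function name and the example points are illustrative only.

```python
import math


def quartic_kernel_density(cases, target, radius):
    """Estimate case density (cases per unit area) at `target`.

    Only cases closer than `radius` contribute; each contribution is
    weighted by a quartic kernel that integrates to 1 over the disc.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    tx, ty = target
    r2 = radius * radius
    density = 0.0
    for x, y in cases:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        if d2 < r2:
            # Quartic kernel: (3 / (pi * h^2)) * (1 - d^2 / h^2)^2
            density += 3.0 / (math.pi * r2) * (1.0 - d2 / r2) ** 2
    return density


# Hypothetical case locations (km); NOT data from the study.
example_cases = [(0.0, 0.0), (0.2, 0.1), (0.6, 0.0), (2.0, 2.0)]

if __name__ == "__main__":
    # The three radii tested in the paper, expressed in km.
    for radius_km in (0.5, 0.75, 1.0):
        value = quartic_kernel_density(example_cases, (0.0, 0.0), radius_km)
        print(f"radius {radius_km} km: {value:.2f} cases per km2")
```

With these made-up points, the smallest radius gives the highest and most localized estimate at the target, while larger radii draw in more distant cases and flatten the surface. This is the over-smoothing versus spiking trade-off described above.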
5 In Israel, as in most western countries, the incidence rate of LC differs
considerably by gender and age, with approximately 70% of all LC cases in
men diagnosed at the age of 65 and older [49].
I > 0.1; p < 0.05; see Fig. 4). The observed autocollinearity necessitated the use of spatial dependency models. There are two primary types of such models: a) the spatial error (SE) model, which assumes that the error terms across different spatial units are correlated, and b) the spatial lag (SL) model, which presumes that the dependent variable in place i is affected by the independent variables in both place i and neighboring locations j [58]. Since, from the outset of the analysis, the density of LC patients at a given point in space was assumed to depend on the predictors' values both at the subject location and in neighboring areas (especially with respect to SO2 pollution levels), the SL model best fitted our needs. In the analysis, the SL model was estimated by three alternative methods: Conditional Autoregression (CAR), Simultaneous Autoregression (SAR) and Moving Averages (MA). (For brevity's sake, only the spatial lag models for all LC cases are reported in the following discussion.) The analysis was performed in the S+SpatialStats software.
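For readers unfamiliar with the Moran's I statistic referred to above, the following is a minimal sketch of how it can be computed for one distance band. It assumes binary weights (1 if two points lie within the band, 0 otherwise); the weighting scheme actually used in the study is not given in the text, and the function name and inputs are illustrative.

```python
import math


def morans_i(points, values, band):
    """Moran's I for `values` observed at `points`, using binary weights.

    Two observations are treated as neighbors when the distance between
    them falls in the half-open interval (band[0], band[1]].
    """
    n = len(values)
    if n != len(points) or n < 2:
        raise ValueError("need matching points and values, at least two")
    lower, upper = band
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    cross_products = 0.0  # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_sum = 0.0      # sum of all w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            distance = math.hypot(points[i][0] - points[j][0],
                                  points[i][1] - points[j][1])
            if lower < distance <= upper:
                weight_sum += 1.0
                cross_products += deviations[i] * deviations[j]

    variance_sum = sum(d * d for d in deviations)
    if weight_sum == 0.0 or variance_sum == 0.0:
        raise ValueError("no neighbor pairs in band, or constant values")
    return (n / weight_sum) * (cross_products / variance_sum)


if __name__ == "__main__":
    # Hypothetical transect of four locations with a smooth trend.
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(morans_i(pts, [1.0, 2.0, 3.0, 4.0], (0.0, 1.0)))
```

Positive values indicate that neighboring locations tend to have similar values; negative values indicate alternation. Repeating the calculation for successive distance bands gives a profile of spatial dependence by distance.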
3. Results
Table 1 reports the factors affecting age-standardized rates (ASIR) of lung cancer in the city of Haifa and its suburbs, estimated by the multiple regression analysis (MRA) procedure for small census areas (SCAs), separately for each gender. Table 2, in turn, reports the factors affecting the kernel density of LC patients estimated by the MRA method for all LC patients (Model 1), LC patients aged 65 and older (Model 2), and, separately, for males and females (Models 3 & 4).
As Table 1 shows, the regression fits of the SCA-level models, measured by their R2, are not especially high (R2 = 0.239 for males, and R2 = 0.024 for females), and none of the explanatory factors covered by the analysis appears to be statistically significant at the established significance level (p < 0.05). Furthermore, contrary to expectations, in neither model does the SO2 variable cross the 0.10 significance threshold, indicating that there is insufficient evidence that local LC rates are significantly associated with SO2 pollution levels in small census areas.
However, the outcome of the analysis is distinctly different when the association between SO2 pollution and LC incidence is investigated using kernel density estimates (see Table 2). First, the model fits are substantially higher than in the previous run (R2 = 0.217–0.496; Table 2 vs. R2 = 0.024–0.239; Table 1). Second, several explanatory variables (including SO2 individual exposure estimates) emerge as highly statistically significant (p < 0.01) and exhibit the expected signs. In particular, in all models (see Table 2), the areal density of LC patients appears to increase in line with SO2 pollution levels (t > 2.8; p < 0.01) and building density (t > 10; p < 0.001), and to drop as elevation increases (t > 2.1; p < 0.05). Characteristically, the distance to main road is statistically significant (t > 2.4; p < 0.05) only in Model 4 (females), an effect that may be attributed to the fact that females spend more time at home and in its vicinity, thus exposing themselves more to traffic-generated air pollution from nearby thoroughfares [38,50,51].
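The reported model fits can be cross-checked against the reported F statistics using the standard identity F = (R2 / k) / ((1 - R2) / (n - k - 1)) for an ordinary least squares model with an intercept. The sketch below assumes that "N of cases" in Table 2 is the number of observations n and that each model has k = 4 predictors (SO2, elevation, building density, distance to main road); the function name is illustrative.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic implied by R2 for an OLS model with intercept."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    df_resid = n_obs - n_predictors - 1
    if n_predictors <= 0 or df_resid <= 0:
        raise ValueError("not enough observations for this many predictors")
    return (r_squared / n_predictors) / ((1.0 - r_squared) / df_resid)


# (R2, N of cases, reported F) as listed in Table 2 for Models 1 to 4.
table2_models = {
    "Model 1": (0.437, 1330, 257.283),
    "Model 2": (0.473, 966, 215.877),
    "Model 3": (0.496, 843, 206.221),
    "Model 4": (0.217, 487, 33.311),
}

if __name__ == "__main__":
    for name, (r2, n, reported_f) in table2_models.items():
        implied = f_statistic(r2, n, 4)
        print(f"{name}: implied F = {implied:.1f}, reported F = {reported_f}")
```

Under these assumptions the implied values agree with the reported F statistics to within rounding of R2, which supports reading the Table 2 columns as one block of statistics per model.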
[Fig. 4: plot of Moran's I and Z-normal I by distance band, from 0–50 to 750–800.]
Table 1
Factors affecting age-standardized rates of lung cancer incidence in the city of Haifa and its suburbs in 1995–2003 (units of analysis: small census areas; method: multiple linear regression (MRA)).
Columns: Variable; ASR (males): B, t, Sig., VIF; ASR (females): B, t, Sig., VIF
Variables: Constant, SO2, Education (BA), Density, Income
17.082
0.537 0.593
2.254 0.164 0.870
0.281 0.204 0.939 1.185
0.505 0.911 0.365 1.188
0.232 0.592 0.555 2.637 0.052 0.321 0.749 2.429
38.190
1.788 0.077 2.881 6.339
7.997 1.869 0.065 2.919
2.163

                 ASR (males)    ASR (females)
N of cases       87             80
R2               0.239          0.024
Adjusted R2      0.201          0.029
Std. Error       22.415         8.693
F                6.344          0.452
Sig. (F)         0.000          0.771
Table 2
Factors affecting the kernel density of lung cancer patients (the average number of lung cancer patients per km2) in the city of Haifa and its suburbs (method: MRA).

Model 1 (All cases)
Variable                  B         t         Sig.      VIF
SO2                       4.714     8.339     <0.001    1.021
Elevation                 0.080     10.981    <0.001    1.048
Building density          0.108     28.806    <0.001    1.012
Distance to main road     0.005     0.483     0.629     1.059
Constant                  15.250    3.830     <0.001
N of cases                1330
R2                        0.437
Adjusted R2               0.435
Std. Error of Estimate    26.310
F                         257.283 (Sig. <0.001)

Model 2 (LC patients aged 65 and older)
Variable                  B         t         Sig.      VIF
SO2                       3.704     7.155     <0.001    1.021
Elevation                 0.045     6.620     <0.001    1.054
Building density          0.095     27.930    <0.001    1.013
Distance to main road     0.010     1.098     0.272     1.069
Constant                  5.114     1.385     0.012
N of cases                966
R2                        0.473
Adjusted R2               0.471
Std. Error of Estimate    20.733
F                         215.877 (Sig. <0.001)

Model 3 (Males)
Variable                  B         t         Sig.      VIF
SO2                       3.872     7.617     <0.001    1.032
Elevation                 0.065     9.329     <0.001    1.063
Building density          0.086     26.449    <0.001    1.021
Distance to main road     0.011     1.182     0.238     1.077
Constant                  2.941     0.812     0.417
N of cases                843
R2                        0.496
Adjusted R2               0.494
Std. Error of Estimate    18.879
F                         206.221 (Sig. <0.001)

Model 4 (Females)
Variable                  B         t         Sig.      VIF
SO2                       0.946     2.854     0.005     1.015
Elevation                 0.008     2.109     0.035     1.035
Building density          0.024     10.224    <0.001    1.009
Distance to main road     0.013     2.443     0.015     1.041
Constant                  13.825    6.055     <0.001
N of cases                487
R2                        0.217
Adjusted R2               0.210
Std. Error of Estimate    9.223
F                         33.311 (Sig. <0.001)
5. Disclaimer
The content of this paper is the responsibility of its authors and does not necessarily reflect the views of the Israel Ministry of Health.
Table 3
Factors affecting the kernel density of lung cancer patients (the average number of (all) lung cancer patients per km2) in the city of Haifa and its suburbs (method: spatial autoregression).

                                        Model 5                Model 6                Model 7
Variable                                B(a)      t(b)         B(a)      t(b)         B(a)      t(b)
SO2                                     4.714     8.339(c)     10.084    14.948(c)    2.700     3.492(c)
Elevation                               0.080     10.981(c)    0.044     3.990(c)     0.054     6.413(c)
Building density                        0.108     28.806(c)    0.175     54.448(c)    0.067     11.135(c)
Distance to the main road               0.005     0.483        0.006     0.904        0.010     1.781
Constant                                15.250    3.830(c)     44.501    7.966(c)     18.181    3.384(c)
N of cases                              1330                   1330                   1330
Std. Error of Estimate                  26.310                 15.450                 15.551
Spatial autoregressive coefficient (ρ)  5e18                   0.029                  0.049
Gradient norm                           1.833e4                0.003                  2.164e4
Log-likelihood                          9130                   8444                   8393

Model 5: Conditional autoregression (CAR); Model 6: Simultaneous autoregression (SAR); Model 7: Moving averages (MA).
(a) Unstandardized regression coefficient.
(b) t-statistic.
(c) Indicates a two-tailed 0.01 significance level.
SO2 level             Estimated density of LC patients per km2    % Change
SO2 = 0               32.4
Min (SO2 = 4.4)       53.2                                        64.2
Mean (SO2 = 6.5)      63.1                                        18.6
Max (SO2 = 10.4)      81.5                                        29.2
Note: The estimates are based on Model 1 (All cases; Table 2); the values of controls are set to the average levels observed in the study area: elevation = 110 m; building density = 244 housing units per km2; distance to main road = 74 m.
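The estimated densities above can be reproduced from the Model 1 coefficients in Table 2. The sketch below is a check rather than the authors' procedure. It assumes that the elevation and distance-to-main-road coefficients are negative: the minus signs do not appear in the tabulated values, but the text states that density drops as elevation increases, and only negative signs reproduce the tabulated densities. It also assumes that each percentage change is computed from the rounded densities relative to the preceding row. Variable names are illustrative.

```python
# Model 1 coefficients from Table 2. The negative signs on elevation and
# distance are an ASSUMPTION (see the lead-in), not printed in the table.
CONSTANT = 15.250
B_SO2 = 4.714
B_ELEVATION = -0.080
B_BUILDING_DENSITY = 0.108
B_DISTANCE = -0.005

# Control values given in the table note.
ELEVATION_M = 110
BUILDING_DENSITY = 244  # housing units per km2
DISTANCE_M = 74


def estimated_density(so2):
    """Predicted LC patients per km2 at a given SO2 level, controls at means."""
    return (CONSTANT
            + B_SO2 * so2
            + B_ELEVATION * ELEVATION_M
            + B_BUILDING_DENSITY * BUILDING_DENSITY
            + B_DISTANCE * DISTANCE_M)


so2_levels = [0.0, 4.4, 6.5, 10.4]  # zero, minimum, mean, maximum
densities = [round(estimated_density(s), 1) for s in so2_levels]

# Percentage change of each rounded density relative to the previous row.
pct_changes = [round((curr / prev - 1.0) * 100.0, 1)
               for prev, curr in zip(densities, densities[1:])]

if __name__ == "__main__":
    print(densities)
    print(pct_changes)
```

Under these assumptions the densities come out as 32.4, 53.2, 63.1 and 81.5 patients per km2, and the successive changes as 64.2%, 18.6% and 29.2%, matching the tabulated values.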
Acknowledgement
The authors express their gratitude to two anonymous
reviewers for their helpful comments and suggestions.
References
[1] Doll R, Peto R. The causes of cancer: quantitative estimates of avoidable risks of cancer in the United States today. Journal of the National Cancer Institute 1981;66:1191–308.
[2] Doll R. Epidemiological evidence of the effects of behavior and the environment on the risk of human cancer. Recent Results in Cancer Research 1998;154:3–21.
[3] National Cancer Institute & National Institute of Environmental Health Sciences (NCI & NIEHS). Cancer and the environment. National Institute of Health; 2003. Pub. 03-2039.
[4] Jerrett M, Burnett RT, Ma R, Pope 3rd CA, et al. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology 2005;16(6):727–36.
[5] Nyberg F, Gustavsson P, Jarup L, et al. Urban air pollution and lung cancer in Stockholm. Epidemiology 2000;11(5):487–95.
[6] Pope 3rd CA, Burnett RT, Thun MJ, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. Journal of the American Medical Association 2002;287:1132–41.
[7] Elliott P, Cuzick J, English D, Stern R, editors. Geographical and environmental epidemiology. Methods for small area studies. Oxford: Oxford University Press; 1992. p. 404 [1996 reprint].
[8] Elliott P, Wartenberg D. Spatial epidemiology: current approaches and future challenges. Environmental Health Perspectives 2004;112(9):998–1006.
[9] Nuckols JR, Ward MH, Jarup L. Using geographic information systems for exposure assessment in environmental epidemiology studies. Environmental Health Perspectives 2004;112(9):1007–15.
[10] Robinson WS. Ecological correlations and the behavior of individuals. American Sociological Review 1950;15:351–7.
[11] Selvin HC. Durkheim's suicide and problems of empirical research. The American Journal of Sociology 1958;63(6):607–19.
[12] Rothman KJ. Modern epidemiology. Boston: Little, Brown and Co.; 1986.
[13] Greenland S, Morgenstern H. Ecological bias, confounding, and effect modification. International Journal of Epidemiology 1989;18(1):269–74.
[14] Morgenstern H, Thomas D. Principles of study design in environmental epidemiology. Environmental Health Perspectives 1993;101(4):23–38.
[15] Openshaw S. The modifiable areal unit problem. In: Concepts and techniques in modern geography. Monograph series, vol. 38. London: Geo Books; 1984. p. 41.
[16] Unwin DJ. GIS, spatial analysis and spatial statistics. Progress in Human Geography 1996;20(4):540–51.
[17] Brauer M, Hoek G, van Vliet P, Meliefste K, Fischer P, Gehring U, et al. Estimating long-term average particulate air pollution concentrations: application of traffic indicators and geographic information systems. Epidemiology 2003;14(2):228–39.
[18] Cockings S, Dunn CE, Bhopal RS, Walker DR. Users' perspectives on epidemiological, GIS and point pattern approaches to analyzing environment and health data. Health & Place 2004;10(2):169–82.
[19] Gotway C, Young LJ. Combining incompatible spatial data. Journal of the American Statistical Association 2002;97(48):632–47.
[44] Daniels MJ, Dominici F, Samet JM, Zeger SL. Estimating particulate matter-mortality dose-response curves and threshold levels: an analysis of daily time-series for the 20 largest US cities. American Journal of Epidemiology 2000;152(5):397–406.
[45] Minami M, Environmental Systems Research Institute. Using ArcMap: GIS. Redlands, California: ESRI; 2000. p. 365–92.
[46] Central Bureau of Statistics, Israel (CBSI). Characterization and classification of local authorities by the socio-economic level of the population; 2004 [S.P. 1222].
[47] Bray F, Guilloux A, Sankila R, Parkin DM. Practical implications of imposing a new world standard population. Cancer Causes & Control 2002;13(2):175–82.
[48] Portnov BA, Odish Y, Fleishman L. Factors affecting housing modifications and housing pricing: a case study of four residential neighborhoods in Haifa, Israel. Journal of Real Estate Research 2006;27(4):371–407.
[49] Parkin DM, Whelan SL, Ferlay J, Teppo L, Thomas DB, editors. Cancer incidence in five continents, vol. VIII. Lyon, France: International Agency for Research on Cancer (IARC); 2002. Scientific Publications No. 155.
[50] Kaluzny SP, Vega SC, Cardoso TP, Shelly AA. S+SpatialStats. New York: Springer; 1997.
[51] Teixeira E, Conde S, Alves P, Ferreira L, Figueiredo A, Parente B. Lung cancer and women. Revista Portuguesa de Pneumologia 2003;9(3):225–47.
[52] Beeson WL, Abbey DE, Knutsen SF. Long-term concentrations of ambient air pollutants and incident lung cancer in California adults: results from the AHSMOG study. Adventist Health Study on Smog. Environmental Health Perspectives 1998;106(12):813–23.
[53] Laden F, Schwartz J, Speizer FE, Dockery DW. Reduction in fine particulate air pollution and mortality: extended follow-up of the Harvard Six Cities study. American Journal of Respiratory and Critical Care Medicine 2006;173(6):667–72.
[54] Bithell JF. An application of density estimation to geographical epidemiology. Statistics in Medicine 1990;9:691–701.
[55] Gatrell AC, Bailey TC, Diggle PJ, Rowlingson BS. Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British Geographers 1996;21:256–74.
[56] Sabel CE, Gatrell AC, Loytonen M, Maasilta P, Jokelainen M. Modelling exposure opportunities: estimating relative risk for motor neurone disease in Finland. Social Science & Medicine 2000;50:1121–37.
[57] Webster T, Vieira V, Weinberg J, Aschengrau A. Method for mapping population-based case-control studies: an application using generalized additive models. International Journal of Health Geographics 2006;9(5):26.
[58] Wheeler DC. A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996–2003. International Journal of Health Geographics 2007;27(6):13.
Boris A. Portnov (Studying the Association between Air Pollution and
Lung Cancer Incidence in a Large Metropolitan Area Using a Kernel
Density Function) is an Associate Professor, Department of Natural
Resources & Environmental Management, Graduate School of Management, University of Haifa, Israel. He earned an MA in architecture from