Volume 9 | Issue 1 Article 26
5-1-2010
Recommended Citation
Eidous, Omar M.; Marie, Mohammad Abd Alrahem Shafeq; and Ebrahem, Mohammed H. Baker Al-Haj (2010) "A Comparative
Study for Bandwidth Selection in Kernel Density Estimation," Journal of Modern Applied Statistical Methods: Vol. 9 : Iss. 1 , Article 26.
DOI: 10.22237/jmasm/1272687900
Available at: http://digitalcommons.wayne.edu/jmasm/vol9/iss1/26
This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for
inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState.
Journal of Modern Applied Statistical Methods Copyright © 2010 JMASM, Inc.
May 2010, Vol. 9, No. 1, 263-273 1538 – 9472/10/$95.00
Omar M. Eidous, Mohammad Abd Alrahem Shafeq Marie, Mohammed H. Baker Al-Haj Ebrahem
Yarmouk University, Irbid, Jordan
The nonparametric kernel density estimation method makes no assumptions about the functional form of the curves of interest; hence it allows flexible modeling of data. A crucial problem in kernel density estimation is how to determine the bandwidth (smoothing) parameter. This article examines the most important bandwidth selection methods, in particular least squares cross-validation, biased cross-validation, direct plug-in, solve-the-equation rules and contrast methods. The methods are described and their expressions are presented. The main practical contribution is a comparative simulation study that aims to isolate the most promising methods. The performance of each method is evaluated on the basis of the mean integrated squared error for small-to-moderate sample sizes. Simulation results show that the contrast method is the most promising method for the simulated families considered.
Key words: Probability Density Function, Bandwidth, Least Squares Cross-Validation, Biased Cross-Validation, Contrast Method, Direct Plug-In, Solve-The-Equation Rules.
BANDWIDTH SELECTION IN KERNEL DENSITY ESTIMATION
where $K_h(u) = h^{-1} K(u h^{-1})$. $K$ is the kernel function and is assumed to be symmetric (Silverman, 1986), and $h$ is the bandwidth (or smoothing parameter) that controls the degree of smoothing applied to the data. Both $K$ and $h$ are under the control of the user; therefore, their determination is necessary in order to analyze results about the kernel estimator.

The bandwidth can be chosen to minimize the asymptotic mean integrated squared error, or AMISE (Silverman, 1986). In this case, $h$ can be obtained by minimizing

$$\mathrm{MISE} = \int_{-\infty}^{\infty} \left[ \left( \mathrm{Bias}\,\hat{f}(x) \right)^2 + \mathrm{Var}\left( \hat{f}(x) \right) \right] dx. \qquad (1.2)$$

If $\mathrm{Bias}\,\hat{f}(x)$ and $\mathrm{Var}(\hat{f}(x))$ are replaced in (1.2) by their leading asymptotic terms, then $h$ is obtained by solving

$$\min_h \mathrm{AMISE} = \min_h \int_{-\infty}^{\infty} \left\{ \left[ \frac{1}{2} h^2 f''(x)\, k_2 \right]^2 + \frac{f(x)}{nh} \int_{-\infty}^{\infty} K^2(t)\, dt \right\} dx .$$

Taking the derivative of AMISE with respect to $h$ and equating it to zero yields

$$h = k_2^{-2/5} \left\{ \int_{-\infty}^{\infty} K^2(t)\, dt \right\}^{1/5} \left\{ \int_{-\infty}^{\infty} \left( f''(t) \right)^2 dt \right\}^{-1/5} n^{-1/5} = \left[ \frac{\mu(K)}{k_2^2\, R(f'')\, n} \right]^{1/5}, \qquad (1.3)$$

where

$$k_2 = \int_{-\infty}^{\infty} t^2 K(t)\, dt, \qquad \mu(K) = \int_{-\infty}^{\infty} K^2(t)\, dt, \qquad R(f'') = \int_{-\infty}^{\infty} \left( f''(t) \right)^2 dt .$$

Formula (1.3) is disappointing because the optimal bandwidth is a function of the second derivative of the density function being estimated. Therefore, unless the true density is known, it is impossible to know the optimal bandwidth. Moreover, when the true density is known, no estimation problem exists. Nonetheless, the quantity $\int_{-\infty}^{\infty} (f''(x))^2\, dx$ in (1.3) can be estimated by using a kernel estimator.

Methodology
Selecting the Bandwidth
The practical implementation of the kernel density estimator requires specification of the bandwidth $h$. A widely used criterion is to choose an $h$ that minimizes the AMISE. The bandwidth controls the smoothness of the fitted density curve: a larger $h$ provides a smoother estimate with smaller variance and larger bias, while a smaller $h$ produces a rougher estimate with larger variance and smaller bias.

Most methods for choosing the bandwidth presented in the literature are proposed for the case in which the underlying probability density function $f(x)$ has support $(-\infty, \infty)$. In addition, a survey of the literature indicates that the methods presented herein are commonly used to estimate the smoothing parameter $h$ in practice.

Least squares cross-validation (LSCV)
Least squares cross-validation (LSCV), proposed by Rudemo (1982) and Bowman (1984), is a completely automatic method for choosing the bandwidth $h$. Following Rudemo's (1982) derivations, the optimal bandwidth estimator can be obtained by minimizing:

$$\mathrm{LSCV}(h) = \int_{-\infty}^{\infty} \hat{f}^2(x; h)\, dx - 2 n^{-1} (n-1)^{-1} \sum_{i=1}^{n} \sum_{j \neq i} K_h(X_i - X_j). \qquad (1.4)$$

According to Rudemo (1982), formula (1.4) is derived based on the exact MISE. If the kernel function is the Gaussian density, then
EIDOUS, SHAFEQ MARIE & AL-HAJ EBRAHEM
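$$\int_{-\infty}^{\infty} \hat{f}^2(x; h)\, dx = \frac{1}{n^2 h^2} \int_{-\infty}^{\infty} \sum_{i=1}^{n} \sum_{j=1}^{n} K\!\left( \frac{x - X_i}{h} \right) K\!\left( \frac{x - X_j}{h} \right) dx = \frac{1}{2 n^2 h \sqrt{\pi}} \sum_{i=1}^{n} \sum_{j=1}^{n} e^{-\frac{(X_i - X_j)^2}{4h^2}}$$

and

$$\sum_{i=1}^{n} \sum_{j \neq i} K_h(X_i - X_j) = \frac{1}{h} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(X_i - X_j)^2}{2h^2}} .$$

Therefore,

$$\mathrm{LSCV}(h) = \frac{1}{2 n^2 h \sqrt{\pi}} \sum_{i=1}^{n} \sum_{j=1}^{n} e^{-\frac{(X_i - X_j)^2}{4h^2}} - \frac{2}{n (n-1) h \sqrt{2\pi}} \sum_{i=1}^{n} \sum_{j \neq i} e^{-\frac{(X_i - X_j)^2}{2h^2}} . \qquad (1.5)$$

The optimal bandwidth $h$ is obtained by minimizing the right side of (1.5) over $h$.

... is the Gaussian kernel. Because $k_2 = 1$ and $\mu(K) = \dfrac{1}{2\sqrt{\pi}}$, BCV(h) is given by

$$\mathrm{BCV}(h) = \frac{1}{2 n h \sqrt{\pi}} + \frac{3}{32\, n^2 h^5 \sqrt{\pi}} \left[ h^4 \sum_{i=1}^{n} \sum_{j=1}^{n} e^{-\frac{(X_i - X_j)^2}{4h^2}} - h^2 \sum_{i=1}^{n} \sum_{j=1}^{n} (X_i - X_j)^2\, e^{-\frac{(X_i - X_j)^2}{4h^2}} + \frac{1}{12} \sum_{i=1}^{n} \sum_{j=1}^{n} (X_i - X_j)^4\, e^{-\frac{(X_i - X_j)^2}{4h^2}} \right] .$$

The optimal value of $h$ is obtained by minimizing BCV(h) over $h$.

$$\mu\!\left( f^{(r)} \right) = \int_{-\infty}^{\infty} \left( f^{(r)}(x) \right)^2 dx, \qquad r = 2, 4, 6, 8, \ldots$$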
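The Gaussian-kernel criterion (1.5) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes NumPy, the function names `lscv` and `lscv_bandwidth` are hypothetical, and the grid search centred on the familiar normal-reference value $1.06\,s\,n^{-1/5}$ is an assumed way of bracketing the minimizer, not something prescribed by the article. The first term uses the exponent $-(X_i - X_j)^2/(4h^2)$, which is what the convolution of two Gaussian kernels with standard deviation $h$ gives.

```python
import numpy as np

def lscv(h, x):
    """Least squares cross-validation criterion for a Gaussian kernel, as in (1.5)."""
    n = len(x)
    d2 = (x[:, None] - x[None, :]) ** 2           # squared pairwise differences
    # Integral of the squared estimate: double sum over all pairs (i, j).
    term1 = np.exp(-d2 / (4 * h**2)).sum() / (2 * n**2 * h * np.sqrt(np.pi))
    # Leave-one-out part: pairs with j != i (each diagonal entry equals 1).
    off_diag = np.exp(-d2 / (2 * h**2)).sum() - n
    term2 = 2 * off_diag / (n * (n - 1) * h * np.sqrt(2 * np.pi))
    return term1 - term2

def lscv_bandwidth(x, grid=None):
    """Return the grid point that minimizes LSCV(h)."""
    x = np.asarray(x, dtype=float)
    if grid is None:
        # Assumed search range: 0.1 to 3 times the normal-reference bandwidth.
        h0 = 1.06 * x.std(ddof=1) * len(x) ** (-0.2)
        grid = h0 * np.geomspace(0.1, 3.0, 200)
    scores = [lscv(h, x) for h in grid]
    return grid[int(np.argmin(scores))]

# Usage: select a bandwidth for a simulated standard normal sample.
rng = np.random.default_rng(0)
data = rng.normal(size=200)
print("LSCV bandwidth:", lscv_bandwidth(data))
```

A grid search is used instead of a derivative-based optimizer because LSCV(h) can have several local minima; the grid limits are a practical choice and may need widening for unusual data.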
where the pilot bandwidth for the estimation of $\psi_4$ is a function $\gamma$ of $h$. The choice of $\gamma$ may ...

... $\sum_{j=1}^{q} p_j = 0$, the bandwidth that minimizes the ...
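$$\hat{f}(x; h) = \sum_{j=1}^{q} c_j \hat{f}_j(x; h).$$

The kernels may have equal weights if $q$ is chosen as an even integer, where

$$\sum_{j=1}^{q} c_j = 1; \qquad c_j = 1/q \ \text{ for } j = 1, \ldots, q$$

and

$$\sum_{j=1}^{q} p_j = 0; \qquad p_j = -p_{2j} \ \text{ for } j = 1, \ldots, q/2.$$

The simulation results in this article were found by taking $p_1 = -p_2$, $p_2 = -1$, $c_1 = c_2 = 1/2$, ...

$$K_2\!\left( \frac{x - X_j}{h} \right) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x - X_j)^2}{8h^2}} .$$

Therefore,

$$\mathrm{ISE}(h)_{\mathrm{CONT}} = \frac{1}{2 n^2 h^2} \int_{-\infty}^{\infty} \left[ \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x - X_i)^2}{2h^2}} \right]^2 dx - \frac{1}{n^2 h^2} \int_{-\infty}^{\infty} \left[ \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x - X_i)^2}{2h^2}} \right] \left[ \sum_{j=1}^{n} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x - X_j)^2}{8h^2}} \right] dx + \frac{1}{2 n^2 h^2} \int_{-\infty}^{\infty} \left[ \sum_{j=1}^{n} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x - X_j)^2}{8h^2}} \right]^2 dx$$

$$= \cdots \sum_{i=1}^{n} \sum_{j=1}^{n} e^{-\frac{(X_i - X_j)^2}{10h^2}} \cdots$$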
of the different methods for the estimator of $f(x)$ for different values of the sample size, $n$. In this study, four different normal mixture densities were simulated; these densities are (Marron & Wand, 1992):

a. Gaussian:
$$f_1(x) = \phi(x).$$

b. Kurtotic Unimodal:
$$f_2(x) = \tfrac{2}{3}\, \phi(x) + \tfrac{1}{3}\, \phi_{1/10}(x)$$

c. Bimodal:
$$f_3(x) = \tfrac{1}{2}\, \phi_{2/3}(x + 1) + \tfrac{1}{2}\, \phi_{2/3}(x - 1)$$

d. Strongly Skewed:
$$f_4(x) = \sum_{l=1}^{8} \tfrac{1}{8}\, \phi_{(2/3)^{l-1}} \left\{ x - 3\left[ (2/3)^{l-1} - 1 \right] \right\} \qquad (1.9)$$

where $\phi_A(u) = A^{-1} \phi(u/A)$ and $\phi$ denotes the probability density function (pdf) of the standard normal distribution, so that

$$\phi_A(u) = \frac{1}{\sqrt{2\pi}\, A}\, e^{-\frac{u^2}{2A^2}} .$$

These densities represent symmetric, kurtotic unimodal, bimodal and strongly skewed distributions, respectively. Figure 1 displays the shapes of these densities, which are a small subset of the fifteen normal mixtures used by Marron and Wand (1992).

The general normal mixture density is given by (Marron & Wand, 1992):

$$f(x) = \sum_{l=1}^{k} w_l\, \phi_{\sigma_l}(x - \mu_l)$$

where $-\infty < \mu_l < \infty$, $\sigma_l > 0$ and $w = (w_1, \ldots, w_k)$ is a vector of weights with positive entries summing to unity, for $l = 1, 2, \ldots, k$. It is assumed that $f$ has a normal $k$-mixture density with parameters $\{ (w_l, \mu_l, \sigma_l^2) : l = 1, 2, \ldots, k \}$.

Fryer (1976) and Deheuvels (1977) first showed that the MISE could be calculated exactly when both the underlying density and the kernel function are Gaussian. The integrated squared error (ISE) of the estimator, if the true underlying density is known to be $f(x)$ as in equation (1.37), is given by Marron and Wand (1992) as

$$\mathrm{ISE}(\hat{f}) = \int_{-\infty}^{\infty} \left[ \hat{f}(x; h) - f(x) \right]^2 dx = \frac{1}{n^2} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \phi_{h\sqrt{2}}(X_{i_1} - X_{i_2}) - \frac{2}{n} \sum_{i=1}^{n} \sum_{l=1}^{k} w_l\, \phi_{(h^2 + \sigma_l^2)^{1/2}}(X_i - \mu_l) + U(h, 0)$$

where

$$U(h, q) = \sum_{l_1=1}^{k} \sum_{l_2=1}^{k} w_{l_1} w_{l_2}\, \phi_{(q h^2 + \sigma_{l_1}^2 + \sigma_{l_2}^2)^{1/2}}(\mu_{l_1} - \mu_{l_2}).$$

... expected value of the ISE, called the MISE.

For each normal mixture density in (1.9) and each sample size $n = 50, 100, 200, 500$, 1,000 samples were simulated from $f(x)$. For each sample, the bandwidths $h$ based on the LSCV, BCV, DPI, CONT and STE methods were obtained. Subsequently, for each sample the ISE value was obtained by using (1.9) according to the simulated density $f(x)$. The MISE values were then determined empirically as the mean of the ISE values over the samples. Table 1 displays the simulation results, giving the MISEs against the sample sizes for the different underlying normal mixture densities. Moreover, the relative efficiencies of the contrast (CONT) method against the LSCV, BCV, DPI and STE methods are given in Table 2. The relative efficiency is given by

$$\mathrm{RE}(\hat{h}) = \frac{\mathrm{MISE}(\hat{h}^{*})}{\mathrm{MISE}(\hat{h}_{\mathrm{CONT}})},$$
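The simulation design described above (draw samples from a normal mixture, compute the exact ISE for a Gaussian kernel, average to obtain the MISE, then form a relative efficiency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes NumPy, all function names are hypothetical, and two textbook rule-of-thumb selectors (`normal_reference` and `robust_rule`) stand in for the LSCV, BCV, DPI, STE and CONT selectors, which are not reproduced here. The number of replications is kept far below the 1,000 used in the study so that the sketch runs quickly.

```python
import numpy as np

# The four densities in (1.9) as (weights, means, standard deviations).
_l = np.arange(8)  # l - 1 for l = 1, ..., 8
MIXTURES = {
    "f1_gaussian": ([1.0], [0.0], [1.0]),
    "f2_kurtotic": ([2 / 3, 1 / 3], [0.0, 0.0], [1.0, 0.1]),
    "f3_bimodal": ([0.5, 0.5], [-1.0, 1.0], [2 / 3, 2 / 3]),
    "f4_skewed": (np.full(8, 1 / 8), 3 * ((2 / 3) ** _l - 1), (2 / 3) ** _l),
}

def phi(u, s):
    """phi_s(u): normal density with mean 0 and standard deviation s."""
    return np.exp(-0.5 * (u / s) ** 2) / (np.sqrt(2 * np.pi) * s)

def draw(mix, n, rng):
    """Draw n observations from a normal mixture."""
    w, mu, sd = (np.asarray(a, dtype=float) for a in mix)
    comp = rng.choice(len(w), size=n, p=w)   # mixture component of each draw
    return rng.normal(mu[comp], sd[comp])

def exact_ise(x, h, mix):
    """Closed-form ISE of a Gaussian-kernel estimate against a normal mixture."""
    w, mu, sd = (np.asarray(a, dtype=float) for a in mix)
    n = len(x)
    t1 = phi(x[:, None] - x[None, :], h * np.sqrt(2)).sum() / n**2
    t2 = 2 / n * (w * phi(x[:, None] - mu, np.sqrt(h**2 + sd**2))).sum()
    s12 = np.sqrt(sd[:, None] ** 2 + sd**2)
    u0 = (np.outer(w, w) * phi(mu[:, None] - mu, s12)).sum()   # U(h, 0)
    return t1 - t2 + u0

def normal_reference(x):
    """Stand-in selector: 1.06 * s * n^(-1/5)."""
    return 1.06 * x.std(ddof=1) * len(x) ** (-0.2)

def robust_rule(x):
    """Stand-in selector: 0.9 * min(s, IQR / 1.34) * n^(-1/5)."""
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(x.std(ddof=1), (q75 - q25) / 1.34) * len(x) ** (-0.2)

def mc_mise(mix, n, selectors, reps, rng):
    """Monte Carlo MISE per selector, using the same samples for every selector."""
    ise = {name: [] for name in selectors}
    for _ in range(reps):
        x = draw(mix, n, rng)
        for name, rule in selectors.items():
            ise[name].append(exact_ise(x, rule(x), mix))
    return {name: float(np.mean(v)) for name, v in ise.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(2010)
    rules = {"reference": robust_rule, "other": normal_reference}
    for name, mix in MIXTURES.items():
        mise = mc_mise(mix, 100, rules, 200, rng)
        # Relative efficiency: MISE of the other selector over the reference one.
        print(name, mise, "RE =", mise["other"] / mise["reference"])
```

Evaluating every selector on the same simulated samples keeps the ratio of MISEs from being dominated by sampling noise; a value above one favours the selector in the denominator.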
Table 1: The MISE($\hat{f}$) for Different Methods to Choose the Value of Bandwidth
Columns: Method, Sample Size, and MISE($\hat{f}$) under $f_1(x)$, $f_2(x)$, $f_3(x)$, $f_4(x)$
4. The DPI and STE methods produce similar results in terms of their MISE values for all densities and for all sample sizes. The DPI method performs better than the STE method for small sample sizes, and as the sample size increases the STE method is better than the DPI method. This indicates that the convergence rate of the STE method is faster than that of the DPI method.
5. The performance of the CONT method generally is better than the performance of the other methods. A significant improvement for the CONT method over the other methods is clearly demonstrated in the bimodal ($f_3(x)$) and the strongly skewed ($f_4(x)$) models.
6. The relative efficiency values in Table 2 show that, for most of the densities and sample sizes, a considerable gain in the relative efficiency for the CONT method is achieved. The relative efficiency values are less than one in some cases, which indicates that the performance of the corresponding method is better than that of the CONT method, but the relative efficiency remains acceptable in these cases.
7. Comparing the MISE values for different methods when the data are simulated from $f_4(x)$ to the MISE values when the data are simulated from the other densities, it may be concluded that $f_4(x)$ is difficult to estimate by any of the methods considered. That is, the strongly skewed density contains features that cannot be recovered from the sample sizes considered.
8. On the basis of the simulation results, the CONT method may be recommended as a global method to select the bandwidth $h$ in kernel density estimation.
This study has shown that the CONT method is a useful technique for choosing the bandwidth of the kernel estimator. The CONT method produces reasonable estimates for $f(x)$ in almost all cases considered (see Table 2). Although the conclusions are based on four different densities, many other candidate shapes exist for the densities from which it is assumed that the data were obtained (Marron & Wand, 1992). Therefore, it is not possible to claim that the CONT method performs better than the other methods for any set of data. However, based on the simulation study, the different methods can be ranked from best to worst according to their performances as follows:

1. CONT
2. DPI (2-stage) and STE (2-stage)
3. LSCV
and lastly,
4. BCV

References
Ahmad, I. A., & Fan, Y. (2001). Optimal bandwidth for kernel density estimators of functions of observations. Statistics and Probability Letters, 51(3), 245-251.
Ahmad, I. A., & Mugdadi, A. R. (2003). Analysis of kernel density estimation of functions of random variables. Journal of Nonparametric Statistics, 15, 579-605.
Ahmad, I. A., & Ran, I. S. (1998). Kernel contrasts: A data-based method of choosing smoothing parameters in nonparametric density estimation. Unpublished manuscript.
Ahmad, I. A., & Ran, I. S. (2004). Kernel contrasts: A data-based method of choosing smoothing parameters in nonparametric density estimation. Journal of Nonparametric Statistics, 16, 671-707.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353-360.
Deheuvels, P. (1977). Estimation nonparametrique de la densite par histogrammes generalises. Rev. Statistical Applications, 25, 5-42.
Fryer, M. J. (1976). Some errors associated with the nonparametric estimation of density functions. Journal of Instructional Mathematical Applications, 18, 371-380.
Hall, P., & Marron, J. S. (1985). Extent to which least-square cross-validation minimizes integrated square error in nonparametric density estimation. Technical Report, 94, University of North Carolina, Department of Statistics.
Janssen, P., Marron, J. S., Veraverbeke, N., & Sarle, W. (1995). Scale measures for bandwidth selection. Journal of Nonparametric Statistics, 5, 359-380.
Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical Econometrics, 13, 187-208.
Marron, J. S., & Wand, M. P. (1992). Exact mean integrated squared error. Annals of Statistics, 20, 712-736.
Park, B. U., & Marron, J. S. (1990). Comparison of data-driven bandwidth selectors. Journal of the American Statistical Association, 85, 66-72.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65-78.
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley.
Scott, D. W., & Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131-1146.
Scott, D. W., Tapia, R. A., & Thompson, J. R. (1977). Kernel density estimation revisited. Nonlinear Analysis, 1, 339-372.
Sheather, S. J. (1986). An improved data-based algorithm for choosing the window width when estimating the density at a point. Computational Statistical Data Analysis, 4, 61-65.
Sheather, S. J. (1992). The performance of six popular bandwidth selection methods on some real data sets (with discussion). Computational Statistics, 7, 225-50, 271-281.
Sheather, S. J., & Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, B53, 683-690.