Kernel Density Estimation and Its Application
XLVIII Seminar of Applied Mathematics
Abstract. Kernel density estimation is an essential technique for estimating a probability density function: it enables the user to analyse the studied probability distribution better than a traditional histogram does. Unlike the histogram, the kernel technique produces a smooth estimate of the pdf, uses the locations of all sample points, and suggests multimodality more convincingly. In two-dimensional applications its advantage is even greater, as a 2D histogram additionally requires the orientation of the 2D bins to be defined. Two concepts play a fundamental role in kernel estimation: the shape of the kernel function and the coefficient of smoothness, of which the latter is crucial to the method. Several real-life examples, for both univariate and bivariate applications, are shown.
*
Corresponding author: [email protected]
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(http://creativecommons.org/licenses/by/4.0/).
ITM Web of Conferences 23, 00037 (2018) https://doi.org/10.1051/itmconf/20182300037
Fig. 2. Construction of kernel density estimator (1) (continuous line) with an asymmetric kernel (dashed lines) for the same 4-element sample as in Fig. 1.

Kernel         Definition
Epanechnikov   $K(t) = \frac{3}{4\sqrt{5}}\left(1 - \frac{1}{5}t^2\right)$ for $|t| < \sqrt{5}$; $0$ otherwise
Biweight       $K(t) = \frac{15}{16}\left(1 - t^2\right)^2$ for $|t| < 1$; $0$ otherwise
Triangular     $K(t) = 1 - |t|$ for $|t| < 1$; $0$ otherwise
Gaussian       $K(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$
Rectangular    $K(t) = \frac{1}{2}$ for $|t| < 1$; $0$ otherwise
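The symmetric kernels listed above can be checked numerically. The following is a minimal sketch, assuming plain Python with only the standard library; the helper `integrate` and the dictionary `KERNELS` are illustrative names, not part of the paper. It confirms that each kernel integrates to one, which is what makes the resulting estimate a proper density.

```python
import math

SQRT5 = math.sqrt(5.0)

def epanechnikov(t):
    # Scaled so that the support is (-sqrt(5), sqrt(5)), as in the table.
    return 3.0 / (4.0 * SQRT5) * (1.0 - t * t / 5.0) if abs(t) < SQRT5 else 0.0

def biweight(t):
    return 15.0 / 16.0 * (1.0 - t * t) ** 2 if abs(t) < 1.0 else 0.0

def triangular(t):
    return 1.0 - abs(t) if abs(t) < 1.0 else 0.0

def gaussian(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def rectangular(t):
    return 0.5 if abs(t) < 1.0 else 0.0

KERNELS = {
    "Epanechnikov": epanechnikov,
    "Biweight": biweight,
    "Triangular": triangular,
    "Gaussian": gaussian,
    "Rectangular": rectangular,
}

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

for name, kernel in KERNELS.items():
    area = integrate(kernel, -10.0, 10.0)
    symmetric = kernel(-0.5) == kernel(0.5)
    print(f"{name:13s} area = {area:.4f}  symmetric at +/-0.5: {symmetric}")
```

The integration range of [-10, 10] is an arbitrary choice that covers every compact support in the table and leaves a negligible Gaussian tail outside.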
Fig. 1 shows that the shape of a symmetric kernel is the same for all sample points, while Fig. 2 reveals that the shape of an asymmetric kernel varies with the placement of the point.

Table 2. Examples of asymmetrical kernel functions. Symbol b denotes the smoothing parameter.

Kernel                 Definition
Gamma 1 [18]           $K_{GAM1}(x, b; t) = \frac{t^{x/b}\, e^{-t/b}}{b^{x/b+1}\, \Gamma(x/b+1)}$
Gamma 2 [18]           $K_{GAM2}(\rho_b(x), b; t) = \frac{t^{\rho_b(x)-1}\, e^{-t/b}}{b^{\rho_b(x)}\, \Gamma(\rho_b(x))}$, where $\rho_b(x) = x/b$ for $x \ge 2b$ and $\rho_b(x) = \frac{1}{4}(x/b)^2 + 1$ for $x \in [0, 2b)$
Inverse Gaussian [19]  $K_{IG}(x, b; t) = \frac{1}{\sqrt{2\pi b t^3}} \exp\left(-\frac{1}{2bx}\left(\frac{t}{x} - 2 + \frac{x}{t}\right)\right)$
Lognormal [20]         $K_{LN}(x, b; t) = \frac{1}{t\sqrt{8\pi \ln(1+b)}} \exp\left(-\frac{(\ln t - \ln x)^2}{8\ln(1+b)}\right)$

The symmetry property allows the kernel function to be written in the form used most frequently:

$$K_{sym}(x, t) = \frac{1}{h} K\left(\frac{x - t}{h}\right) \tag{5}$$

where the parameter h, called the smoothing parameter, window width or bandwidth, governs the amount of smoothing applied to the sample (Fig. 3).

For symmetrical kernel functions, the choice of the shape of the kernel function K(.) has rather little effect on the shape of the estimator [11, 21], whereas, as Fig. 3 shows, the value of h affects it considerably. Too small a value of h may cause the estimator to show insignificant details, while too large a value of h causes oversmoothing of the information contained in the sample, which in consequence may mask some important characteristics of f(x), e.g. multimodality (cf. Fig. 3). A certain compromise is needed.
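The effect of h can be illustrated by counting the local maxima of the estimate for several bandwidths. This is a minimal sketch under stated assumptions: the estimator is taken to be the average of the scaled kernels (5) over the sample, the kernel is Gaussian, and the ten-point sample and the helper `count_modes` are hypothetical, chosen only so that the sample has two visible clusters.

```python
import math

def gaussian(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def k_sym(x, t, h, kernel=gaussian):
    """Scaled symmetric kernel of (5): K((x - t) / h) / h."""
    return kernel((x - t) / h) / h

def kde(x, sample, h, kernel=gaussian):
    """Assumed estimator form: the average of K_sym(x, x_i) over the sample."""
    return sum(k_sym(x, xi, h, kernel) for xi in sample) / len(sample)

def count_modes(sample, h, lo=-6.0, hi=6.0, step=0.01):
    """Count local maxima of the estimate on an evenly spaced grid."""
    n_points = int(round((hi - lo) / step)) + 1
    values = [kde(lo + k * step, sample, h) for k in range(n_points)]
    return sum(
        1
        for k in range(1, n_points - 1)
        if values[k] > values[k - 1] and values[k] >= values[k + 1]
    )

# Hypothetical sample with two clusters, used only for illustration.
sample = [-2.1, -1.9, -1.6, -1.4, -1.0, 1.1, 1.5, 1.8, 2.0, 2.3]

for h in (0.05, 0.5, 3.0):
    print(f"h = {h:4.2f}: {count_modes(sample, h)} local maxima")
```

A very small h gives one bump per observation, a moderate h recovers the two clusters, and a very large h merges them into a single mode, which is the compromise described above.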
Fig. 5. Different symmetrical kernel functions applied to a sample of 45 standardized annual maximum flows (1961–1995) of the Odra river recorded at the Racibórz-Miedonia gauge station (data source: [22]).

The univariate case can easily be extended formally to the multivariate case [23]. However, its illustrative (graphical) power works well for the bivariate case only. The most frequently used bivariate kernel function is symmetric, giving the estimator

$$\hat{f}(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^{n} K\left(\frac{x_i - x}{h_x}, \frac{y_i - y}{h_y}\right) \tag{6}$$

where {x_i, y_i}, i = 1, 2, ..., n, is a sample, and h_x and h_y are smoothing coefficients. Multivariate counterparts of the univariate kernel functions listed in Table 1 are available, e.g., the multivariate Epanechnikov kernel or the multivariate Gaussian kernel [11].

With a product of two univariate kernels, the estimator takes the form

$$\hat{f}(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^{n} K\left(\frac{x_i - x}{h_x}\right) K\left(\frac{y_i - y}{h_y}\right) \tag{7}$$

The mean square error of the estimator at a point x is

$$\mathrm{MSE}_x(\hat{f}) = E\left[\hat{f}(x) - f(x)\right]^2 \tag{9}$$

which, after simple transformations, can be presented as follows:

$$\mathrm{MSE}_x(\hat{f}) = \left[E\hat{f}(x) - f(x)\right]^2 + \operatorname{var}\hat{f}(x) = \left[\operatorname{bias}\hat{f}(x)\right]^2 + \operatorname{var}\hat{f}(x) \tag{10}$$

that is, MSE_x is the sum of the squared bias and the variance of $\hat{f}(x)$ at x. Reducing the bias causes the variance to increase and vice versa, so a trade-off between these terms is needed.

MSE_x is a local measure. Integrating MSE_x over all x gives a global measure of conformity of $\hat{f}(x)$ with f(x), called the mean integrated square error, MISE [11], defined in (11).
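The trade-off in (10) can be made visible by simulation. The following is a rough sketch, assuming standard normal data, a Gaussian kernel, evaluation at the single point x = 0, and arbitrary values for the sample size, the number of replications and the seed; the function `mse_parts` is a hypothetical helper, not a method from the paper.

```python
import math
import random
import statistics

def gaussian_kde_at(x, sample, h):
    """Gaussian-kernel density estimate at a single point x."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

def mse_parts(h, x=0.0, n=50, replications=400, seed=1):
    """Monte Carlo squared bias, variance and MSE of the estimate at x for N(0, 1) data."""
    rng = random.Random(seed)
    true_density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    estimates = []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(gaussian_kde_at(x, sample, h))
    bias_sq = (statistics.fmean(estimates) - true_density) ** 2
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_density) ** 2 for e in estimates])
    return bias_sq, variance, mse

print("    h    bias^2  variance       MSE")
for h in (0.1, 0.5, 1.5):
    bias_sq, variance, mse = mse_parts(h)
    print(f"{h:5.2f}  {bias_sq:8.5f}  {variance:8.5f}  {mse:8.5f}")
```

In such a run the variance column should shrink and the squared-bias column should grow as h increases, while the last column equals the sum of the other two up to rounding, as (10) states.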
$$\mathrm{MISE}(\hat{f}) = \int_{-\infty}^{\infty} \mathrm{MSE}_x(\hat{f})\, dx = \int_{-\infty}^{\infty} \left[\operatorname{bias}\hat{f}(x)\right]^2 dx + \int_{-\infty}^{\infty} \operatorname{var}\hat{f}(x)\, dx \tag{11}$$

$$\mathrm{ISE}(\hat{f}) = \int \left[\hat{f}(x) - f(x)\right]^2 dx \tag{12}$$

which is also a discrepancy measure used to estimate the magnitude of the smoothing parameter.

4 Methods for calculating optimum value of smoothing parameter

The choice of the optimal smoothing parameter is based, among others, on formulas that minimize the criterion functions discussed above, mainly ISE [27], MISE [28] and AMISE [11, 15, 29–32].

Many other methods for calculating the smoothing parameter are available in the relevant literature; many of them are also available through statistical software. Two methods are described below one for the

Silverman [11] believes that the value (13) smooths non-unimodal distributions too much, and as one of the remedies proposes a slightly reduced value of the smoothing parameter (13):

$$h = 0.9 \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) n^{-1/5} \tag{15}$$

The value (15) is widely used in practice and referred to as Silverman's bandwidth or (Silverman's) rule of thumb, and will be used in most of the remainder of the paper.

$$\mathrm{ISE}(h) = \int \left[\hat{f}(x) - f(x)\right]^2 dx = \int \hat{f}^2(x)\, dx - 2\int \hat{f}(x) f(x)\, dx + \int f^2(x)\, dx \tag{16}$$

The last term of expression (16) does not depend on the estimator $\hat{f}(x)$ (it is a constant); therefore the choice of the smoothing parameter (in the sense of minimizing ISE) corresponds to the choice of h which minimizes the function

$$R(\hat{f}) = \int \hat{f}^2(x)\, dx - 2\int \hat{f}(x) f(x)\, dx \tag{17}$$

$$\mathrm{LSCV}(h) = \frac{1}{n^2} \sum_{i,j} \int K(x_i, x)\, K(x_j, x)\, dx - \frac{2}{n(n-1)} \sum_{i} \sum_{j \ne i} K(x_i, x_j) \tag{20}$$

Least squares cross-validation is also referred to as unbiased cross-validation [26].

Unfortunately, the LSCV method also has drawbacks: the variance of the obtained smoothing
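The two bandwidth selectors in this section, the rule of thumb (15) and the LSCV criterion (20), can be compared on simulated data. This is a sketch under several assumptions that are not taken from the paper: the kernel is Gaussian and already scaled by 1/h, so the integral in (20) has the closed form of a normal density with standard deviation h√2; the quartiles come from `statistics.quantiles` with its default convention; the minimum of (20) is located by a simple grid search; and the sample, seed and grid are arbitrary.

```python
import math
import random
import statistics

def normal_pdf(u, sd):
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth (15): 0.9 * min(sd, IQR / 1.34) * n**(-1/5)."""
    n = len(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile convention is an assumption
    spread = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def lscv(h, sample):
    """Criterion (20) for a Gaussian kernel scaled by 1/h."""
    n = len(sample)
    conv_sum = 0.0   # sum over all (i, j) of the integral of K(x_i, x) K(x_j, x) dx
    cross_sum = 0.0  # sum over i != j of K(x_i, x_j)
    for i, xi in enumerate(sample):
        for j, xj in enumerate(sample):
            d = xi - xj
            # The product of two Gaussian kernels integrates to a normal density with sd h*sqrt(2).
            conv_sum += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                cross_sum += normal_pdf(d, h)
    return conv_sum / n ** 2 - 2.0 * cross_sum / (n * (n - 1))

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(80)]  # simulated data, for illustration only

grid = [0.05 * k for k in range(1, 41)]  # candidate bandwidths 0.05, 0.10, ..., 2.00
h_lscv = min(grid, key=lambda h: lscv(h, sample))
h_rot = silverman_bandwidth(sample)
print(f"rule of thumb (15): h = {h_rot:.3f}")
print(f"LSCV grid minimum (20): h = {h_lscv:.3f}")
```

A grid search is used here only because it is easy to follow; a one-dimensional numerical minimizer would serve the same purpose.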
however, as Silverman ([11], p. 141) concludes: "It may be futile to expect very high power from procedures aimed at such broad hypotheses as unimodality and multimodality". Nevertheless, kernel estimation is a good method for the initial stage of a planned study of a probability distribution.

When the variable under study is nonnegative, the kernel estimate may exhibit an undesirable effect: probability leakage below zero. It occurs when part of the sample lies near zero and the magnitude of the smoothing coefficient allows a considerable amount of probability to cross zero. Four such cases are presented in Fig. 10.

If the amount of probability leakage cannot be disregarded, one remedy is to take logarithms of the data and apply kernel estimation to the transformed data. If the pdf estimate of the log-transformed data is $\hat{g}(x)$, the following back-transformation should be used:

$$\hat{f}(x) = \frac{1}{x}\, \hat{g}(\ln x) \tag{21}$$

Fig. 11(b) shows the result. The leakage has been removed; unfortunately, the second mode has disappeared, although a certain suggestion of non-unimodality remains visible in the heaviness of the right tail.

Another remedy is to use an asymmetric kernel, as shown in Fig. 11(c). This approach preserves the bimodality revealed in Fig. 11(a). In terms of the cumulative distribution function (Fig. 11(d)), the log transformation and the asymmetric kernel approach are almost equivalent.
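The leakage and both remedies can be sketched in a few lines. The example below is illustrative only: the eight-point sample and all bandwidth values are hypothetical, the symmetric kernel is assumed to be Gaussian, and the Gamma 1 kernel of Table 2 is chosen as the asymmetric kernel without any claim that it is the one used for Fig. 11(c).

```python
import math

def gaussian_kde(x, sample, h):
    """Gaussian-kernel density estimate at x."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

def leakage_below_zero(sample, h):
    """Probability mass that the Gaussian-kernel estimate places on x < 0."""
    # Each kernel is a normal density centred at xi, so its mass below zero is Phi(-xi / h).
    return sum(0.5 * math.erfc(xi / (h * math.sqrt(2.0))) for xi in sample) / len(sample)

def log_transform_kde(x, sample, h_log):
    """Back-transformation (21): f(x) = g(ln x) / x, with g estimated from the log data."""
    if x <= 0.0:
        return 0.0
    logs = [math.log(xi) for xi in sample]
    return gaussian_kde(math.log(x), logs, h_log) / x

def gamma1_kde(x, sample, b):
    """Average of the Gamma 1 kernel of Table 2 over the sample, for x >= 0."""
    shape = x / b + 1.0
    log_norm = shape * math.log(b) + math.lgamma(shape)
    return sum(
        math.exp((shape - 1.0) * math.log(t) - t / b - log_norm) for t in sample
    ) / len(sample)

# Hypothetical positive sample with several observations close to zero.
sample = [0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 2.0, 3.5]

print(f"mass below zero, Gaussian kernel, h = 0.5: {leakage_below_zero(sample, 0.5):.3f}")
for x in (0.05, 0.5, 2.0):
    print(
        f"x = {x:4.2f}  log-transform: {log_transform_kde(x, sample, 0.5):.3f}"
        f"  Gamma 1: {gamma1_kde(x, sample, 0.3):.3f}"
    )
```

Both alternatives are defined only for nonnegative x, so neither can place probability below zero; the bandwidth of the log-transformed data and the parameter b live on different scales and are not comparable with h.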