International Journal of Pure and Applied Mathematics
Volume 117 No. 13 2017, 117-126
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue
Zero-Inflated Negative Binomial-Sushila Distribution and Its Application
K.M. Sakthivel, C.S. Rajitha and K.B. Alshad
Department of Statistics, Bharathiar University,
Coimbatore-641046, Tamilnadu, India.
[email protected]

Abstract
The statistics literature contains a substantial body of work on mixture and
compound probability distributions for count models, especially for data that
contain excess zeros. In this paper, we introduce a new probability distribution
obtained by compounding the zero-inflated negative binomial (ZINB) distribution
with the Sushila distribution; we call it the zero-inflated negative
binomial-Sushila (ZINB-S) distribution. It can be used as an alternative and
effective way of modeling overdispersed count data. The probability mass function
(PMF) and some key characteristics of the ZINB-S distribution are derived. The
method of maximum likelihood estimation (MLE) is employed for estimating the model
parameters. Further, an example is given to show that the ZINB-S distribution
provides a better fit than traditional models for overdispersed count data.
AMS Subject Classification (2010): 62F10; 62P05
Key Words and Phrases: Negative binomial-Sushila distribution, Method of maximum
likelihood, Zero-inflated distribution, Poisson distribution, Negative binomial
distribution
1 Introduction
Modeling of count data plays a vital role in many research areas, such as public
health, epidemiology and insurance. One of the most commonly used models for count
data is the Poisson model, which assumes that the mean and the variance are equal.
In many applications this restriction is violated because the count data are
overdispersed. Hence researchers have used the negative binomial model and the
generalized Poisson model (Consul and Jain, 1973) for analyzing this type of data.
In addition to the mean parameter, these models incorporate one more parameter,
known as the dispersion parameter, which helps to detect overdispersion
or underdispersion in the population. Moreover, one cause of overdispersion is
the presence of an excess number of zero counts in the data, known as zero
inflation. The concept of zero inflation was first introduced by Neyman (1939) and
Feller (1943). In actuarial applications, count data often show an excess number
of zero counts. As a result, numerous studies have developed flexible probability
models, such as mixed distribution models and zero-inflated models, for this kind
of data. These models fit overdispersed count data better than conventional count
distributions such as the Poisson and negative binomial models. Zero-inflated
models (Lambert, 1992; Greene, 1994) have some desirable properties for modeling
zero-inflated data. These models are formulated as a mixture of two components:
the zero counts and the positive counts coming from a truncated count
distribution. Some standard zero-inflated distributions considered in the
actuarial literature are the zero-inflated Poisson (ZIP) model and the
zero-inflated negative binomial (ZINB) model (Neelon et al., 2010). Zero inflation
is also observed in many other settings, such as manufacturing, road safety and
epidemiology. Related work can be found in health insurance (Mouatassim and
Ezzahid, 2012) and dental epidemiology (Bohning et al., 1999). Recent studies show
that negative binomial mixture distributions provide a better fit than baseline
distributions such as the Poisson and negative binomial distributions (Zamani and
Ismail, 2010; Gomez-Deniz et al., 2008; Wang, 2011). Recently, a negative binomial
two-parameter Lindley distribution (Denthet et al., 2016) and a negative
binomial-generalized exponential distribution (Aryuyuen and Bodhisuwan, 2013) were
developed to model count data.
Further, Yip and Yau (2005) discussed several zero-inflated count distribution
models, such as the ZIP, ZINB, zero-inflated generalized Poisson (ZIGP) and
zero-inflated double Poisson (ZIDP) distributions. Subsequently, Aryuyuen et al.
(2014) developed a zero-inflated negative binomial-generalized exponential
(ZINB-GE) distribution for modeling count data with an excess number of zero
counts and showed that it provides a better fit than the ZIP and ZINB
distributions. Saengthong et al. (2015) formulated a zero-inflated negative
binomial-Crack (ZINB-CR) distribution and showed that it provides a better fit
than the ZIP, ZINB and negative binomial-Crack (NB-CR) distributions.
Various methods of parameter estimation for count data models are used in the
literature. Famoye (1997) used maximum likelihood estimation (MLE), minimum
chi-square (MC), and the method of the first two moments and the proportion of
zeros, among others, to estimate the parameters of the generalized negative
binomial distribution (GNBD), and compared the estimators through their relative
efficiencies. For the parameters of the generalized Poisson distribution (GPD),
Wagh and Kamalja (2017) used both moment estimation (ME) and MLE and showed that
ME performs relatively better than MLE when the sample size is small.
In this paper we introduce a new zero-inflated negative binomial-Sushila (ZINB-S)
distribution as an alternative distribution for count data with an excess number
of zero counts. In this model, a zero count arises either as an excess zero or as
a zero from the negative binomial-Sushila (NB-S) distribution, while the positive
counts come from the NB-S distribution. The NB-S distribution has recently been
developed by Yamrubboon et al. (2017). Negative
binomial-Lindley (NB-L) distribution (Zamani and Ismail, 2010) is a special case of
this distribution. The rest of the paper is organized as follows. Section 2 gives
the PMF of the zero-inflated negative binomial-Sushila distribution together with
a graphical representation. Section 3 provides some characteristics of the ZINB-S
distribution, namely its factorial moments, mean and variance. Section 4 discusses
parameter estimation for the ZINB-S distribution using the method of maximum
likelihood estimation (MLE). Section 5 applies the ZINB-S distribution to a real
data set. Section 6 provides the discussion and conclusions of the study.
2 Zero-inflated Negative Binomial-Sushila Distribution
2.1 Zero-inflated Count Models
The probability mass function of a zero-inflated count model can be written in
the following form:

$$
P(X = x) =
\begin{cases}
\pi + (1 - \pi)\, g(0; \Theta), & \text{if } x = 0, \\
(1 - \pi)\, g(x; \Theta), & \text{if } x = 1, 2, \ldots
\end{cases}
\tag{1}
$$

where $X$ is the count random variable, $g(x; \Theta)$ is the PMF of the
underlying count distribution with parameter vector $\Theta$, and $\pi$, with
$0 < \pi < 1$, is the zero-inflation parameter, that is, the proportion of extra
zero counts.
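As a minimal sketch of equation (1), the following Python function wraps an arbitrary base PMF g. The function names and the Poisson base distribution with rate 0.5 are illustrative assumptions only; the paper itself uses the NB-S distribution as the base.

```python
import math

def zero_inflated_pmf(x, pi, base_pmf):
    """Equation (1): point mass pi at zero mixed with a base count PMF."""
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if x == 0:
        return pi + (1.0 - pi) * base_pmf(0)
    return (1.0 - pi) * base_pmf(x)

def poisson_pmf(x, lam=0.5):
    # Illustrative base distribution (an assumption, not the paper's choice).
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probabilities of x = 0, ..., 59 under a zero-inflated Poisson with pi = 0.3.
probs = [zero_inflated_pmf(x, 0.3, poisson_pmf) for x in range(60)]
```

The zero-inflated probabilities still sum to one, and the probability of a zero count is always larger than under the base distribution alone.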
2.2 Negative Binomial-Sushila Distribution
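The probability mass function of the NB-S distribution is given below:

$$
f(x; r, \alpha, \theta) = \frac{\theta^2}{\theta + 1} \binom{x + r - 1}{x}
\sum_{j=0}^{x} \binom{x}{j} (-1)^j
\frac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2},
\tag{2}
$$

where $x = 0, 1, 2, \ldots$ and $r, \alpha, \theta > 0$.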
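Equation (2) can be evaluated directly for moderate x. The sketch below is one possible implementation, not the authors' code; the helper names are hypothetical. It uses the log-gamma function for the generalized binomial coefficient because r need not be an integer. The alternating sum loses accuracy for large x, so this is intended for small counts only.

```python
import math

def sushila_laplace(s, alpha, theta):
    """Summand of Eq. (2) including theta^2/(theta+1), evaluated at s = r + j."""
    return theta**2 * (theta + alpha * s + 1.0) / ((theta + 1.0) * (theta + alpha * s) ** 2)

def gen_binom(x, r):
    """Binomial coefficient C(x + r - 1, x) for real r > 0, via log-gamma."""
    return math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1))

def nbs_pmf(x, r, alpha, theta):
    """NB-S probability mass function, Eq. (2), for integer x >= 0."""
    total = math.fsum(
        math.comb(x, j) * (-1) ** j * sushila_laplace(r + j, alpha, theta)
        for j in range(x + 1)
    )
    return gen_binom(x, r) * total

# Illustrative parameter values (assumptions, not estimates from the paper).
probs = [nbs_pmf(x, r=2.0, alpha=0.2, theta=10.0) for x in range(15)]
```

For these illustrative parameters the probabilities decay quickly, so the first fifteen terms already account for essentially all of the mass.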
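The mean, variance and factorial moments of the NB-S distribution are given below:

$$
E(X) = r \left( \frac{\theta^2 (\theta - \alpha + 1)}{(\theta + 1)(\theta - \alpha)^2} - 1 \right),
\tag{3}
$$

$$
V(X) = \frac{r^2 \delta_3 + r \delta_3 - r \delta_2}{\delta_1} - \frac{r^2 \delta_2^2}{\delta_1^2},
\tag{4}
$$

where

$$
\delta_1 = \frac{\theta + 1}{\theta^2}, \quad
\delta_2 = \frac{\theta - \alpha + 1}{(\theta - \alpha)^2}, \quad
\delta_3 = \frac{\theta - 2\alpha + 1}{(\theta - 2\alpha)^2},
$$

and

$$
\mu_k[X] = E[X(X - 1)(X - 2) \cdots (X - k + 1)]
= \frac{\Gamma(r + k)}{\Gamma(r)} \sum_{j=0}^{k} \binom{k}{j} (-1)^j
\frac{\theta^2 (\theta - (k - j)\alpha + 1)}{(\theta + 1)(\theta - (k - j)\alpha)^2},
\tag{5}
$$

where $k = 1, 2, \ldots$ and $\Gamma(\cdot)$ is the gamma function defined by

$$
\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx, \quad t > 0.
$$

The factorial moment of order $k$ is finite only when $\theta > k\alpha$.
Equation (4) follows from (5) through $V(X) = \mu_2 + \mu_1 - \mu_1^2$, with
$\mu_1 = r(\delta_2/\delta_1 - 1)$ and
$\mu_2 = r(r + 1)(\delta_3/\delta_1 - 2\delta_2/\delta_1 + 1)$.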
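The factorial moments in equation (5) are straightforward to compute, and the mean and variance then follow from the standard identity V(X) = E[X(X-1)] + E(X) - E(X)^2. The sketch below is an illustration under that identity; the function names and parameter values are assumptions, and the helper raises an error when theta <= k * alpha, where the corresponding moment does not exist.

```python
import math

def sushila_mgf(t, alpha, theta):
    """Summand of Eq. (5) at t = k - j; finite only for theta > t * alpha."""
    if theta <= t * alpha:
        raise ValueError("requires theta > t * alpha")
    return theta**2 * (theta - t * alpha + 1.0) / ((theta + 1.0) * (theta - t * alpha) ** 2)

def nbs_factorial_moment(k, r, alpha, theta):
    """Equation (5): k-th factorial moment of NB-S(r, alpha, theta)."""
    rising = math.exp(math.lgamma(r + k) - math.lgamma(r))  # Gamma(r + k) / Gamma(r)
    return rising * sum(
        math.comb(k, j) * (-1) ** j * sushila_mgf(k - j, alpha, theta)
        for j in range(k + 1)
    )

def nbs_mean_var(r, alpha, theta):
    mu1 = nbs_factorial_moment(1, r, alpha, theta)
    mu2 = nbs_factorial_moment(2, r, alpha, theta)
    return mu1, mu2 + mu1 - mu1**2

# Illustrative parameter values (assumptions, not estimates from the paper).
mean, var = nbs_mean_var(r=2.0, alpha=0.5, theta=10.0)
```

With these values the variance exceeds the mean, which is the overdispersion the NB-S mixture is meant to capture.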
2.3 Proposed Zero-inflated Negative Binomial-Sushila Distribution
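If $X \mid \lambda \sim \mathrm{ZINB}(r, p = e^{-\lambda}, \pi)$ and
$\lambda \sim \mathrm{Sushila}(\alpha, \theta)$, then the PMF of $X$ can be
obtained as

$$
g(x) = \int_0^\infty p(X = x \mid \lambda)\, f(\lambda; \alpha, \theta)\, d\lambda,
$$

where $f(\lambda; \alpha, \theta)$ is the Sushila density and

$$
p(X = x \mid \lambda) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda r}, & \text{if } x = 0, \\
(1 - \pi) \binom{x + r - 1}{x} e^{-\lambda r} (1 - e^{-\lambda})^x, & \text{if } x \neq 0.
\end{cases}
$$

The PMF of the ZINB-S$(\pi, r, \alpha, \theta)$ distribution at $x = 0$ is

$$
g(0; r, \alpha, \theta, \pi) = \pi + (1 - \pi)
\frac{\theta^2 (\theta + r\alpha + 1)}{(\theta + 1)(\theta + r\alpha)^2}.
$$

For $x \neq 0$, expanding
$(1 - e^{-\lambda})^x = \sum_{j=0}^{x} \binom{x}{j} (-1)^j e^{-\lambda j}$ by the
binomial theorem and integrating term by term gives

$$
\begin{aligned}
g(x; r, \alpha, \theta, \pi)
&= (1 - \pi) \binom{x + r - 1}{x} \int_0^\infty e^{-\lambda r} (1 - e^{-\lambda})^x
f(\lambda; \alpha, \theta)\, d\lambda \\
&= (1 - \pi) \binom{x + r - 1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^j
\frac{\theta^2 (\theta + (r + j)\alpha + 1)}{(\theta + 1)(\theta + (r + j)\alpha)^2}.
\end{aligned}
$$

Therefore the ZINB-S distribution is given by

$$
g(x; r, \pi, \alpha, \theta) =
\begin{cases}
\pi + (1 - \pi) \dfrac{\theta^2 (\theta + r\alpha + 1)}{(\theta + 1)(\theta + r\alpha)^2},
& \text{when } x = 0, \\[2ex]
(1 - \pi) \dfrac{\theta^2}{\theta + 1} \dbinom{x + r - 1}{x}
\displaystyle\sum_{j=0}^{x} \binom{x}{j} (-1)^j
\dfrac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2},
& \text{when } x = 1, 2, \ldots
\end{cases}
\tag{6}
$$

where $r, \alpha, \theta > 0$ and $0 < \pi < 1$. In the limiting case $\pi = 0$,
the ZINB-S distribution reduces to the NB-S distribution, and it reduces to the
negative binomial-Lindley (NB-L) distribution when $\pi = 0$ and $\alpha = 1$.
Hence the ZINB-S distribution is a generalized form of the NB-S and NB-L
distributions. Figure 1 illustrates the PMF of the ZINB-S distribution for
different values of the parameters $\theta$ and $\pi$.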
Figure 1: PMF of ZINB-S distribution with different values of parameters θ and π
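To make the closed form in equation (6) concrete, the following sketch evaluates the ZINB-S PMF numerically. It is an illustration under assumed parameter values, not the code used to draw Figure 1, and the function names are hypothetical. As with any alternating binomial sum, it is only reliable for small to moderate x.

```python
import math

def nbs_pmf(x, r, alpha, theta):
    """NB-S PMF, Eq. (2), for integer x >= 0."""
    coef = math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1))
    total = math.fsum(
        math.comb(x, j) * (-1) ** j
        * theta**2 * (theta + alpha * (r + j) + 1.0)
        / ((theta + 1.0) * (theta + alpha * (r + j)) ** 2)
        for j in range(x + 1)
    )
    return coef * total

def zinbs_pmf(x, pi, r, alpha, theta):
    """ZINB-S PMF, Eq. (6): extra mass pi at zero, NB-S scaled by 1 - pi elsewhere."""
    base = nbs_pmf(x, r, alpha, theta)
    return pi + (1.0 - pi) * base if x == 0 else (1.0 - pi) * base

# Illustrative parameter values (assumptions): compare two zero-inflation levels.
low_pi = [zinbs_pmf(x, 0.1, 2.0, 0.2, 10.0) for x in range(15)]
high_pi = [zinbs_pmf(x, 0.5, 2.0, 0.2, 10.0) for x in range(15)]
```

Increasing pi moves probability from the positive counts to zero while leaving the relative shape of the positive part unchanged.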
3 Characteristics of the ZINB-S distribution
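In this section, we present some basic characteristics of the ZINB-S distribution.
Theorem 1. If $X \sim \mathrm{ZINB\text{-}S}(\pi, r, \alpha, \theta)$, then the
factorial moment of order $k$ of $X$ can be written as

$$
\mu_k[X] = (1 - \pi) \frac{\Gamma(r + k)}{\Gamma(r)} \sum_{j=0}^{k} \binom{k}{j} (-1)^j
\frac{\theta^2 (\theta - (k - j)\alpha + 1)}{(\theta + 1)(\theta - (k - j)\alpha)^2},
\tag{7}
$$

for $k = 1, 2, \ldots$, provided $\theta > k\alpha$.
The mean and variance of the ZINB-S distribution are easily obtained from the
factorial moments.
Theorem 2. If $X \sim \mathrm{ZINB\text{-}S}(\pi, r, \alpha, \theta)$, then the
mean and variance of $X$ can be written as

$$
E(X) = (1 - \pi)\, r \left( \frac{\theta^2 (\theta - \alpha + 1)}{(\theta + 1)(\theta - \alpha)^2} - 1 \right),
\tag{8}
$$

$$
V(X) = r (1 - \pi) \left\{ (r + 1)\delta_2 - (2r + 1)\delta_1 + r
- (1 - \pi)\, r\, (\delta_1 - 1)^2 \right\},
\tag{9}
$$

where

$$
\delta_k = \frac{\theta^2 (\theta - k\alpha + 1)}{(\theta + 1)(\theta - k\alpha)^2},
\quad k = 1, 2, 3.
$$

Equation (9) follows from (7) through $V(X) = \mu_2 + \mu_1 - \mu_1^2$, with
$\mu_1 = (1 - \pi)\, r (\delta_1 - 1)$ and
$\mu_2 = (1 - \pi)\, r (r + 1)(\delta_2 - 2\delta_1 + 1)$.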
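As a numerical illustration of Theorem 1, the sketch below evaluates the factorial moments in equation (7) and derives the mean and variance from the first two of them using V(X) = E[X(X-1)] + E(X) - E(X)^2. The function names and the parameter values are assumptions chosen for illustration.

```python
import math

def sushila_mgf(t, alpha, theta):
    """theta^2 (theta - t*alpha + 1) / ((theta + 1)(theta - t*alpha)^2), theta > t*alpha."""
    if theta <= t * alpha:
        raise ValueError("requires theta > t * alpha")
    return theta**2 * (theta - t * alpha + 1.0) / ((theta + 1.0) * (theta - t * alpha) ** 2)

def zinbs_factorial_moment(k, pi, r, alpha, theta):
    """Equation (7): k-th factorial moment of ZINB-S(pi, r, alpha, theta)."""
    rising = math.exp(math.lgamma(r + k) - math.lgamma(r))  # Gamma(r + k) / Gamma(r)
    alternating = sum(
        math.comb(k, j) * (-1) ** j * sushila_mgf(k - j, alpha, theta)
        for j in range(k + 1)
    )
    return (1.0 - pi) * rising * alternating

def zinbs_mean_var(pi, r, alpha, theta):
    mu1 = zinbs_factorial_moment(1, pi, r, alpha, theta)
    mu2 = zinbs_factorial_moment(2, pi, r, alpha, theta)
    return mu1, mu2 + mu1 - mu1**2  # V(X) = E[X(X-1)] + E(X) - E(X)^2

# Illustrative parameter values (assumptions, not estimates from the paper).
mean, var = zinbs_mean_var(pi=0.3, r=2.0, alpha=0.5, theta=10.0)
```

The mean is simply (1 - pi) times the NB-S mean, whereas the variance is not a simple rescaling because the squared-mean term carries a factor (1 - pi)^2.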
4 Parameter Estimation
This section describes parameter estimation for the ZINB-S distribution using the
method of maximum likelihood estimation. We define the indicator function

$$
I(x) =
\begin{cases}
1, & \text{if } x = 0, \\
0, & \text{if } x \in \{1, 2, \ldots\}.
\end{cases}
\tag{10}
$$

Then the likelihood function of the ZINB-S distribution can be written as follows
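$$
L = \prod_{i=1}^{n} \left\{ I_i \left[ \pi + (1 - \pi)
\frac{\theta^2 (\theta + r\alpha + 1)}{(\theta + 1)(\theta + r\alpha)^2} \right]
+ (1 - I_i) \left[ (1 - \pi) \frac{\theta^2}{\theta + 1} \binom{x_i + r - 1}{x_i}
\sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^j
\frac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2} \right] \right\},
\tag{11}
$$

where $x_1, \ldots, x_n$ is the observed sample and $I_i = I(x_i)$. Take

$$
P = \pi + (1 - \pi) \frac{\theta^2 (\theta + r\alpha + 1)}{(\theta + 1)(\theta + r\alpha)^2}
$$

and

$$
Q_i = (1 - \pi) \frac{\theta^2}{\theta + 1} \binom{x_i + r - 1}{x_i}
\sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^j
\frac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2}.
$$

Then the log-likelihood function takes the form

$$
\log L = \sum_{i=1}^{n} \log\left( I_i P + (1 - I_i) Q_i \right).
\tag{12}
$$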
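For frequency data, identical observations contribute identical terms to the log-likelihood, so the sum over observations can be taken over the distinct counts weighted by their frequencies. The sketch below evaluates the log-likelihood in that grouped form. The function names, the toy frequency table and the parameter values are assumptions for illustration only.

```python
import math

def zinbs_pmf(x, pi, r, alpha, theta):
    """ZINB-S PMF for integer x >= 0 (zero-inflated NB-S)."""
    coef = math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1))
    total = math.fsum(
        math.comb(x, j) * (-1) ** j
        * theta**2 * (theta + alpha * (r + j) + 1.0)
        / ((theta + 1.0) * (theta + alpha * (r + j)) ** 2)
        for j in range(x + 1)
    )
    base = coef * total
    return pi + (1.0 - pi) * base if x == 0 else (1.0 - pi) * base

def zinbs_loglik(freq, pi, r, alpha, theta):
    """Log-likelihood for grouped data: freq maps each count x to its frequency."""
    return sum(
        n_x * math.log(zinbs_pmf(x, pi, r, alpha, theta))
        for x, n_x in freq.items()
        if n_x > 0
    )

# Toy frequency table and parameter values (assumptions, not the paper's data).
toy_freq = {0: 3, 1: 2, 2: 1}
loglik = zinbs_loglik(toy_freq, pi=0.3, r=2.0, alpha=0.2, theta=10.0)
```

A numerical optimizer can then maximize this function over the four parameters, subject to r, alpha, theta > 0 and 0 < pi < 1.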
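The partial derivatives of the log-likelihood function are obtained by
differentiating $\log L$ with respect to the parameters $\theta, r, \pi, \alpha$
as follows, where $I_i = I(x_i)$ and $Q_i$ denotes $Q$ evaluated at $x_i$:

$$
\begin{aligned}
\frac{\partial \log L}{\partial \theta} = \sum_{i=1}^{n} \frac{1}{I_i P + (1 - I_i) Q_i}
\Bigg\{ & I_i (1 - \pi)\, \theta \left[
\frac{(\theta + r\alpha)(\theta + 1)(3\theta + 2r\alpha + 2)
- \theta (\theta + r\alpha + 1)(3\theta + r\alpha + 2)}
{(\theta + 1)^2 (\theta + r\alpha)^3} \right] \\
& + (1 - I_i)(1 - \pi) \binom{x_i + r - 1}{x_i} \sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^j \\
& \quad \times \frac{\theta \left[ (\theta + 1)(\theta + \alpha(r + j))(3\theta + 2\alpha(r + j) + 2)
- \theta (\theta + \alpha(r + j) + 1)(3\theta + \alpha(r + j) + 2) \right]}
{(\theta + 1)^2 (\theta + \alpha(r + j))^3} \Bigg\},
\end{aligned}
\tag{13}
$$

$$
\begin{aligned}
\frac{\partial \log L}{\partial r} = \sum_{i=1}^{n} \frac{1}{I_i P + (1 - I_i) Q_i}
\Bigg\{ & I_i \left[ -(1 - \pi)
\frac{\theta^2 \alpha (\theta + r\alpha + 2)}{(\theta + 1)(\theta + r\alpha)^3} \right] \\
& + (1 - I_i)(1 - \pi) \frac{\theta^2}{\theta + 1} \Bigg[
\alpha \frac{\Gamma(x_i + r)}{x_i!\, \Gamma(r)} \sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^{j+1}
\frac{\theta + \alpha(r + j) + 2}{(\theta + \alpha(r + j))^3} \\
& \quad + \frac{\Gamma(r)\Gamma'(x_i + r) - \Gamma(x_i + r)\Gamma'(r)}{x_i!\, (\Gamma(r))^2}
\sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^j
\frac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2} \Bigg] \Bigg\},
\end{aligned}
\tag{14}
$$

$$
\begin{aligned}
\frac{\partial \log L}{\partial \pi} = \sum_{i=1}^{n} \frac{1}{I_i P + (1 - I_i) Q_i}
\Bigg\{ & I_i \left[ 1 - \frac{\theta^2 (\theta + r\alpha + 1)}{(\theta + 1)(\theta + r\alpha)^2} \right] \\
& - (1 - I_i) \left[ \frac{\theta^2}{\theta + 1} \binom{x_i + r - 1}{x_i}
\sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^j
\frac{\theta + \alpha(r + j) + 1}{(\theta + \alpha(r + j))^2} \right] \Bigg\},
\end{aligned}
\tag{15}
$$

$$
\begin{aligned}
\frac{\partial \log L}{\partial \alpha} = \sum_{i=1}^{n} \frac{1}{I_i P + (1 - I_i) Q_i}
\Bigg\{ & -I_i (1 - \pi) \left[
\frac{r \theta^2 (\theta + r\alpha + 2)}{(\theta + 1)(\theta + r\alpha)^3} \right] \\
& + (1 - I_i)(1 - \pi) \left[ \frac{\theta^2}{\theta + 1} \binom{x_i + r - 1}{x_i}
\sum_{j=0}^{x_i} \binom{x_i}{j} (-1)^{j+1}
\frac{(r + j)(\theta + \alpha(r + j) + 2)}{(\theta + \alpha(r + j))^3} \right] \Bigg\}.
\end{aligned}
\tag{16}
$$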
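All four score functions are built from derivatives of one building block, h(theta, s) = theta^2 (theta + s + 1) / ((theta + 1)(theta + s)^2) with s = alpha (r + j). The sketch below checks the analytic derivatives of h with respect to theta and s against central finite differences. The function names, the step size and the evaluation point are assumptions for illustration.

```python
def h(theta, s):
    """theta^2 (theta + s + 1) / ((theta + 1)(theta + s)^2), with s = alpha * (r + j)."""
    return theta**2 * (theta + s + 1.0) / ((theta + 1.0) * (theta + s) ** 2)

def dh_dtheta(theta, s):
    """Analytic derivative of h with respect to theta (the bracketed ratio in the theta score)."""
    num = theta * (
        (theta + 1.0) * (theta + s) * (3.0 * theta + 2.0 * s + 2.0)
        - theta * (theta + s + 1.0) * (3.0 * theta + s + 2.0)
    )
    return num / ((theta + 1.0) ** 2 * (theta + s) ** 3)

def dh_ds(theta, s):
    """Analytic derivative of h with respect to s (used in the r and alpha scores)."""
    return -theta**2 * (theta + s + 2.0) / ((theta + 1.0) * (theta + s) ** 3)

def central_diff(f, x, eps=1e-6):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

theta0, s0 = 2.0, 1.5  # arbitrary evaluation point
err_theta = abs(dh_dtheta(theta0, s0) - central_diff(lambda t: h(t, s0), theta0))
err_s = abs(dh_ds(theta0, s0) - central_diff(lambda s: h(theta0, s), s0))
```

By the chain rule, the derivative with respect to r multiplies dh/ds by alpha and the derivative with respect to alpha multiplies it by (r + j).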
The maximum likelihood estimates of the ZINB-S parameters are obtained by setting
the above partial derivatives to zero and solving the resulting equations
numerically using R software.
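One step of this system can be solved in closed form. For fixed (r, alpha, theta), setting the derivative with respect to pi to zero gives pi + (1 - pi) f(0) = n0 / n, where f(0) is the NB-S probability of zero, n0 is the number of observed zeros and n is the sample size. The sketch below implements this profile step only. The function names and the values of r, alpha and theta are illustrative assumptions, not the fitted estimates; the remaining three parameters would still need a numerical optimizer.

```python
def nbs_p0(r, alpha, theta):
    """NB-S probability of a zero count, f(0; r, alpha, theta)."""
    return theta**2 * (theta + r * alpha + 1.0) / ((theta + 1.0) * (theta + r * alpha) ** 2)

def profile_pi(n_zero, n_total, r, alpha, theta):
    """Value of pi that zeroes the pi-score for fixed (r, alpha, theta).

    Returns 0.0 when the base NB-S model already predicts at least the
    observed share of zeros, so no extra zero mass is supported.
    """
    p0 = nbs_p0(r, alpha, theta)
    pi_hat = (n_zero / n_total - p0) / (1.0 - p0)
    return max(pi_hat, 0.0)

# 7840 zero-claim policies out of 9461, with assumed (not fitted) r, alpha, theta.
pi_hat = profile_pi(7840, 9461, r=2.0, alpha=0.5, theta=5.0)
fitted_zero_share = pi_hat + (1.0 - pi_hat) * nbs_p0(2.0, 0.5, 5.0)
```

Whenever the profile value is positive, the fitted model reproduces the observed proportion of zeros exactly, which is a general property of zero-inflated maximum likelihood fits.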
5 Application
Consider a real data set on 9,461 automobile insurance policies, taken from Zamani
and Ismail (2010), which records the number of accidents per policy. Table 1 gives
the observed frequencies and the expected frequencies under the Poisson, negative
binomial (NB), NB-S and ZINB-S distributions, and compares the models in terms of
the log-likelihood, the chi-square statistic and its p-value. The ZINB-S
distribution has the largest log-likelihood (-5330.87) and the smallest chi-square
statistic (0.7078, p-value 0.8714), so it fits these data better than the Poisson,
negative binomial and NB-S distributions.
Table 1: Fitting of different distributions

Number of      Number of                    Expected frequencies
accidents      policies      Poisson        NB             NB-S           ZINB-S
0              7840          7638.3         7843.3         7846.3         7840.0
1              1317          1634.6         1290.2         1299.2         1320.3
2              239           174.9          257.7          244.4          227.8
3              42            12.5           54.5           52.97          51.7
4              14            0.7            11.8           12.99          14.2
5              4             0              2.6            3.54           4.5
6              4             0              0.6            1.05           1.6
7              1             0              0.2            0.34           0.6
8+             0             0              0.1            0.11           0.3

Parameter estimates          λ̂ = 0.214      r̂ = 0.70       r̂ = 2.0180     r̂ = 1.4210
                                            p̂ = 0.765      α̂ = 0.0100     θ̂ = 9.8729
                                                           θ̂ = 0.1881     α̂ = 0.8552
                                                                          π̂ = 0.4450
Chi-square                   293.8          8.65           5.898          0.7078
p-value                      <0.01          0.07           0.43           0.8714
Log-likelihood               -5490.78       -5348          -5344          -5330.87
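The overdispersion and excess zeros in these data can be checked directly from the observed column of Table 1. The sketch below computes the sample mean, variance and dispersion index, together with the number of zeros a Poisson model with the same mean would predict. The variable names are illustrative.

```python
import math

# Observed frequencies from Table 1 (number of accidents -> number of policies).
observed = {0: 7840, 1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in observed.items()) / n
dispersion_index = variance / mean          # equals 1 for a Poisson distribution
poisson_zeros = n * math.exp(-mean)         # expected zeros under Poisson(mean)

print(f"n = {n}, mean = {mean:.4f}, variance = {variance:.4f}")
print(f"dispersion index = {dispersion_index:.3f}")
print(f"Poisson expected zeros = {poisson_zeros:.1f}, observed zeros = {observed[0]}")
```

The sample mean is about 0.214, in line with the Poisson estimate in Table 1, while the variance is about 0.289. The variance exceeds the mean and the Poisson model predicts roughly 200 fewer zero-claim policies than observed, which is why the Poisson fit is poor.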
6 Conclusion
In this paper, the zero-inflated negative binomial-Sushila distribution is
introduced. We have obtained basic characteristics of the distribution, such as
its factorial moments, mean and variance. The parameters are estimated by the
method of maximum likelihood estimation. The performance of the ZINB-S
distribution compared to other conventional probability distributions is
illustrated with an application to a real data set.
References
[1] S. Aryuyuen, W. Bodhisuwan, The negative binomial-generalized exponential
(NB-GE) distribution, Applied Mathematical Sciences, 7 (2013), 1093-1105.
doi:10.12988/ams.2013.13099.
[2] S. Aryuyuen, W. Bodhisuwan, T. Supapakorn, Zero-inflated negative
binomial-generalized exponential distribution and its applications,
Songklanakarin Journal of Science and Technology, 36 (2014), 483-491.
[3] D. Bohning, E. Dietz, P. Schlattmann, L. Mendonca, U. Kirchner, The
zero-inflated Poisson model and the decayed, missing and filled teeth index in
dental epidemiology, Journal of the Royal Statistical Society, Series A, 162
(1999), 195-209.
[4] P.C. Consul, G.C. Jain, A generalization of the Poisson distributions,
Technometrics, 15 (1973), 791-799.
[5] S. Denthet, A. Thongteeraparp, W. Bodhisuwan, Mixed distribution of negative
binomial and two-parameter Lindley distributions, 12th International Conference
on Mathematics, Statistics and Their Applications (ICMSA), Banda Aceh, Indonesia
(2016).
[6] F. Famoye, Parameter estimation for generalized negative binomial
distribution, Communications in Statistics-Simulation and Computation, 26 (1997),
269-279.
[7] W. Feller, On a general class of contagious distributions, Annals of
Mathematical Statistics, 14 (1943), 389-400.
[8] E. Gomez-Deniz, J.M. Sarabia, E. Calderin-Ojeda, Univariate and multivariate
versions of the negative binomial-inverse Gaussian distributions with
applications, Insurance: Mathematics and Economics, 42 (2008), 39-49.
[9] W. Greene, Accounting for excess zeros and sample selection in Poisson and
negative binomial regression models, Working Paper EC-94-10, Department of
Economics, New York University (1994).
[10] D. Lambert, Zero-inflated Poisson regression with an application to defects
in manufacturing, Technometrics, 34 (1992), 1-17. doi:10.2307/1269547.
[11] Y. Mouatassim, E.H. Ezzahid, Poisson regression and zero-inflated Poisson
regression: Application to private health insurance data, European Actuarial
Journal, 2 (2012), 187-204.
[12] J. Neyman, On a new class of contagious distributions applicable in
entomology and bacteriology, Annals of Mathematical Statistics, 10 (1939), 35-57.
[13] B. Neelon, A.J. O'Malley, S.L.T. Normand, A Bayesian model for repeated
measures zero-inflated count data with application to psychiatric outpatient
service use, Statistical Modelling, 10 (2010), 421-439.
[14] P. Saengthong, W. Bodhisuwan, A. Thongteeraparp, The zero-inflated negative
binomial-Crack distribution: some properties and parameter estimation,
Songklanakarin Journal of Science and Technology, 37 (2015), 701-711.
[15] Y.S. Wagh, K.K. Kamalja, Comparison of methods of estimation for parameters
of generalized Poisson distribution through simulation study, Communications in
Statistics-Simulation and Computation, 46 (2017), 4098-4112.
[16] Z. Wang, One mixed negative binomial distribution with application, Journal
of Statistical Planning and Inference, 141 (2011), 1153-1160.
[17] K.C.H. Yip, K.K.W. Yau, On modeling claim frequency data in general insurance
with extra zeros, Insurance: Mathematics and Economics, 36 (2005), 153-163.
[18] D. Yamrubboon, W. Bodhisuwan, C. Pudprommarat, L. Saothayanun, The negative
binomial-Sushila distribution with application in count data analysis, Thailand
Statistician, 15 (2017), 69-77.
[19] H. Zamani, N. Ismail, Negative binomial-Lindley distribution and its
application, Journal of Mathematics and Statistics, 6 (2010), 4-9.