This is to certify that the thesis prepared by Dejene Tsegaye Ayane, titled: Automatic Plant
Species Identification Using Image Processing Techniques and submitted in partial
fulfillment of the requirements for the Degree of Master of Science in Computer Science
complies with the regulations of the University and meets the accepted standards with respect
to originality and quality.
ABSTRACT
Plants play an essential role for all living beings on Earth. They form a fundamental part of life, providing us with breathable oxygen, food, fuel, medicine and more. Plants also help to regulate the climate and provide habitats and food for insects and other animals. However, due to lack of awareness and environmental deterioration, many plants are on the verge of extinction. Understanding plant behavior and ecology is therefore very important for human beings and the entire planet.
Plants possess unique features in their leaves that distinguish them from one another. Taxonomists use these features to classify and identify plant species. However, there is a shortage of such skilled subject matter experts, as well as limited financial resources.
Several leaf image based plant species identification methods have been proposed to address the plant identification problem. However, most methods are insufficiently accurate. The invariant moments used for leaf shape feature extraction have limitations: Hu moments are inadequate when leaves from different species have very similar shapes, and the computation of Zernike moments involves a discrete approximation of a continuous integral term, which results in loss of information. Hence, it is important to look for an improved method of plant species classification and identification using image processing techniques.
In this work, a new method for the classification and identification of plant species is proposed, based on combined leaf shape and texture features and a class of ensemble methods called Random Forest. Morphological features and Radial Chebyshev moments are extracted from the leaf shape, and Gabor filter features are extracted from the leaf texture. These three feature groups are combined, and important features are selected to form the feature set that trains the Random Forest classifier.
The Random Forest was trained with 1907 sample leaves of 32 different plant species taken from the Flavia dataset. The proposed approach achieves 97% accuracy with the Random Forest classifier.
I would also like to thank my classmates and friends who provided valuable assistance and
support.
Finally, I am deeply grateful to my family for their endless support and encouragement.
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.5 Methods
CHAPTER 5: EXPERIMENTATION
5.4.1. Evaluation Techniques
References
LIST OF TABLES
Table 5.2: Test result for training model using morphological features
Table 5.3: Test result for training model using Chebyshev moments
Table 5.4: Test result for training model using Gabor texture features
Table 5.5: Test result for training model using fused features
Table 5.6: Test result of training model using selected features
LIST OF FIGURES
Figure 2.1: Different parts of plant as illustrated in [12]
Figure 2.2: Simple leaf parts
Figure 2.7: Relationship between physiological width and height of plant leaf
Figure 4.3: Grayscale leaf image after applying weighted averaging method
Figure 4.5: Binary leaf image obtained after applying Otsu thresholding method
Figure 4.7: Gabor filter output for different orientations and frequency values
Figure 4.8: Sample leaf image before Gabor filter is applied
ACRONYM AND ABBREVIATION
Plant species identification is the finding of the correct name for an unknown plant in
order to access all the information so far available about that plant. Plants have many
characteristics that can be used to identify particular species: overall size and shape; the
color, size and shape of leaves; the texture, color and shape of twigs and buds; and the
color and texture of bark, fruit and flowers. Growing range is also useful in identifying
plant species. Most people use several of these characteristics to identify a specific plant
[2, 3].
There are a number of methods for plant species identification: the use of pictures and illustrations, identification keys in botanical books and floras, and asking experts, among others [4]. The image-based identification of different species of plants, collected by both botanical scientists and expert users, has become a key area of study in plant biology [5].
Herbaria are also used for identifying plant species. One such herbarium is the National Herbarium of Ethiopia, the sole laboratory for researchers to identify and study the plant biodiversity of Ethiopia. The herbarium houses over 80,000 plant specimens collected from all over Ethiopia and Eritrea (and also from Somalia, Kenya, Tanzania, Uganda and other countries). In addition to housing well authenticated plant specimens in two rooms, the Herbarium has copies of eight volumes (ten books) of the Flora of Ethiopia and Eritrea [6].
Although there are well known methods of plant species identification in plant science, these methods are tedious and exhausting, which calls for an automatic species identification method. Hence, this research addresses the plant species identification problem using image processing techniques.
1.2 Motivation
Plant species identification is a tedious task. There are huge numbers of specimens in the National Herbarium of Ethiopia that serve plant species identification. The books (the Flora of Ethiopia; ten books in eight volumes) are also the key reference for plant species identification. However, researchers spend considerable time at the herbarium searching for and identifying the specimens they bring from different places, and referring to the books to identify a plant is cumbersome and time consuming. To greatly speed up the process of plant species identification, there should be a solution that eases the researchers' identification burden and saves their precious time. One such solution is the use of leaf image based plant species identification.
A leaf image based plant species identification method will greatly simplify the identification process by allowing a user to search through the species image data using models that match images of newly collected specimens with images of those previously discovered and described.
For example, there are huge numbers of specimens (over 80,000) in the collections of the National Herbarium of Ethiopia. But these collections are not easily accessible, as there are no visual means of specimen identification.
Other significant sources of knowledge include floras, taxonomic keys and monographs. Identifying the name of a plant using these sources is time consuming and requires an experienced taxonomist. Moreover, there are a number of plant species in a given family, which complicates the identification of species. For example, there are twenty-two endemic plants of Ethiopia in the same family, Euphorbiaceae [7].
1.4 Objective
General objectives
The general objective of this research is to develop an automatic plant species
identification system using image processing techniques.
Specific objectives
The specific objectives of the research are to:
1.5 Methods
The following methods will be applied to achieve the objectives of this research work.
Literature Review
Different literature on image processing techniques and identification of plant species will be reviewed. Articles, theses and conference papers related to the research topic will also be examined.
Data collection
The publicly available Flavia [8] dataset will be used for modeling the automatic plant species identification system. Attributes of each plant species (local plant species name, scientific name, etc.) will also be captured as additional data for easy description of each plant species.
Evaluation
The model designed for identification of plant species will be evaluated for its accuracy,
precision, recall, f-score and support [9].
1.7 Application of Results
Results of this method can generally be applied in areas where plant species studies being
conducted. The result of this research benefits taxonomists, botanists, agriculturalists,
environmentalists etc. Different institutions that can research on plant species are also
among the candidate beneficiaries of this study.
The application of such a method has advantage for the plant species identification as it
greatly simplifies the process of plant species identification. Some of the advantages are:
It provides reliable, faster and convenient plant species identification;
It reduces the time that the researches will have to spend in the field and
herbarium to identify and classify plant species; and
It automatically identify and classify plant species.
CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
This chapter is concerned with plant identification, images and image processing
techniques. The main goal of this research is to build a plant species identification model
that classifies leaves and assigns the names of the plant that they belong to, when
supplied only with the leaf image as input. This goal is met through a series of sequential
steps, namely, image preprocessing, segmentation, extraction of features, fusion of
features, feature selection, classification and identification of plant species. Several
researchers have contributed and proposed various algorithms in each of these steps.
This chapter presents a review of some important publications made in these steps to
understand the current research status.
A typical plant body consists of different parts, as illustrated in Figure 2.1. The plant body consists of two major organ systems, namely, the shoot system and the root system. The shoot system exists above the ground and includes organs such as buds, leaves, fruits, flowers and seeds.
Leaf structure: Leaves of most plants have a flat structure called the lamina or blade, but not all leaves are flat; some are cylindrical. Leaves can be simple, with a single leaf blade, or compound, with several leaflets. Figure 2.2 and Figure 2.3 show simple and compound leaves respectively [13].
Figure 2.2: Simple leaf parts          Figure 2.3: Compound leaf
The edge (or margin) of a leaf is the structure of the leaf at its boundary. The edges (or margins) of leaves may be smooth, toothed, lobed, incised, or wavy. Figure 2.4 shows the different leaf margins.
Leaf arrangement refers to how leaves grow on the stem. Some leaves grow opposite, some alternate, some in rosette forms and others in whorls. The different leaf arrangements are illustrated in Figure 2.5.
Venation is the pattern of veins imprinted in the leaf surface. Venation may be parallel, dichotomous (forming a “Y”), palmate (emanating out from a central point), or pinnate (where the veins are arrayed from the midrib).
The texture of the leaf is another aspect to consider during plant leaf identification. Leaf texture can be firm and waxy, shiny, thick, stiff, limp, etc. It is easy to identify whether the leaf has sticky glands, prickly thorns, or fine hairs. The different leaf types based on their form, shape, venation, margins and arrangement on the stem are summarized in Figure 2.6 below [16].
Plant classification is the placing of known plants into groups or categories to show
some relationship within the category. Scientific classification follows a system of rules
that standardizes the results, and groups successive categories into a hierarchy. In this
hierarchy, kingdom is the broadest and species is the most specific category. As an
example, olive classification is depicted in Table 2.1 [17].
Table 2.1: Classification of olive according to classical taxonomy
Leaf shape is valuable for plant species identification. Plant organs such as flowers and
fruits are seasonal in nature and root and stem characteristics are inconsistent. On the
other hand, leaves are present for several months and they also contain taxonomic
identity of a plant which is useful for plant classification. Moreover, plant leaves are two
dimensional in nature while flowers and fruits are three dimensional and are not suitable
for machine processing compared to plant leaves. Hence, a plant’s leaf shape is the most
discriminative feature and classification of plants based on leaves is the quickest and
easiest way to identify a plant [10, 19].
2.2.2 Digital Morphological features
The digital morphological features generally include basic geometric features and morphological features [20]. The features are computed from the contour of the leaf. The basic geometric features of a leaf include the longest diameter, physiological length, physiological width, leaf area and leaf perimeter. They are defined below.
Longest Diameter: the longest distance between any two points on the contour of the leaf. It is denoted as D.
Physiological Length: the distance between the two terminals of a leaf. It is denoted as LP.
Physiological Width: the longest distance orthogonal to the physiological length; two lines are considered orthogonal if the angle between them is 90°. It is denoted as WP.
Leaf Area: the smoothed leaf image is used to find the leaf area, which is the number of pixels having binary value 1. It is denoted as A.
Leaf Perimeter: calculated by counting the number of pixels constituting the leaf margin. It is denoted as P.
Figure 2.7: Relationship between physiological width and height of plant leaf
Based on the above five basic geometric features, the following commonly used morphological features can be defined (a short code sketch follows the list).
1. Smooth factor: the ratio between the area of the leaf image smoothed by a 5 × 5 rectangular averaging filter and the area of the leaf image smoothed by a 2 × 2 rectangular averaging filter.
2. Rectangularity: the ratio between the product of physiological length and physiological width and the leaf area. Thus, (LP × WP)/A.
3. Aspect ratio: the ratio between physiological length and physiological width. Thus, LP/WP.
4. Perimeter ratio of diameter: the ratio between perimeter (P) and diameter (D). Thus, P/D.
5. Form factor: defined as 4πA/P², where A is the leaf area and P is the perimeter of the leaf margin.
6. Narrow factor: the ratio between diameter and physiological length. Thus, D/LP.
7. Perimeter ratio of physiological length and physiological width: the ratio between the perimeter and the sum of physiological length and physiological width. Thus, P/(LP + WP).
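As an illustration of how these ratios can be computed in practice, the sketch below derives them from a binary leaf mask with OpenCV and NumPy. It is a minimal sketch and not the thesis implementation: the function names are made up, the OpenCV 4.x findContours signature is assumed, and the oriented bounding box is used as an approximation of physiological length and width.

```python
import cv2
import numpy as np

def basic_geometric_features(binary):
    """Approximate D, LP, WP, A and P from a binary leaf mask (leaf pixels = 255)."""
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)           # keep the largest blob (the leaf)
    A = cv2.contourArea(cnt)                           # leaf area
    P = cv2.arcLength(cnt, True)                       # leaf perimeter (closed contour)
    (_, _), (w, h), _ = cv2.minAreaRect(cnt)           # oriented bounding box of the leaf
    LP, WP = max(w, h), min(w, h)                      # physiological length/width (approx.)
    pts = cnt.reshape(-1, 2).astype(np.float32)
    D = max(np.linalg.norm(pts - p, axis=1).max() for p in pts)  # longest diameter
    return D, LP, WP, A, P

def morphological_features(binary):
    D, LP, WP, A, P = basic_geometric_features(binary)
    return {
        "aspect_ratio": LP / WP,                       # LP / WP
        "rectangularity": (LP * WP) / A,               # (LP * WP) / A
        "perimeter_to_diameter": P / D,                # P / D
        "form_factor": 4 * np.pi * A / P ** 2,         # 4*pi*A / P^2
        "narrow_factor": D / LP,                       # D / LP
        "perimeter_to_length_width": P / (LP + WP),    # P / (LP + WP)
    }
```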
A digital image is an array of real numbers represented by a finite number of bits. The basic definition of image processing refers to the processing of a digital image, i.e. removing the noise and any irregularities present in the image, which may creep in either during its formation or during transformation. For
mathematical analysis, an image may be defined as a two dimensional function f(x, y)
where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of
coordinates (x, y) is called the intensity or gray level of the image at that point. When x,
y, and the amplitude values of f are all finite, discrete quantities, we call the image a
digital image. Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, image elements, or pixels [22].
There are different types of digital images [23]: binary, grayscale and RGB images. A binary image is the simplest type of image and has two values, black and white, or ‘0’ and ‘1’. A binary image is referred to as a 1-bit image because it takes only one binary digit to represent each pixel.
RGB Image: an RGB image does not use a color map; the image is represented by three color component intensities: red, green, and blue. An RGB image uses the 8-bit monochrome standard for each channel and therefore has 24 bits per pixel (8 bits each for red, green and blue).
2.3.1. Image Acquisition
The leaf images can be acquired using a digital camera, such as the camera embedded in a cell phone. There is no restriction on resolution or image format. Generally, a scanned or digital image, which is two dimensional in nature and in RGB format, is used for image processing. In some cases binary and grayscale images can be used as well. However, the image background needs to be cleaned and made single colored using image segmentation [24].
Point Processing Operation: this is the most basic operation in image processing, where each pixel value is replaced with a new value obtained from the old one. Point processing operations take the form shown in equation (2.1):
g(x, y) = T[f(x, y)]    (2.1)
where f(x, y) is the input image, g(x, y) is the output image and T is the transformation function or point processing operation.
Smoothing filters: smoothing filters are used to reduce noise or to prepare the image for further processing such as segmentation. There are different types of smoothing filters [27]. A simple mean smoothing filter replaces each pixel value in an input image by the mean (or average) value of its neighbors, including itself. This has the effect of eliminating pixel values that are unrepresentative of their surroundings. The filter is based around a kernel, which represents the shape and size of the neighborhood to be sampled in the calculation. Often a 3 x 3 square kernel is used, as shown in Figure 2.8, although larger kernels (e.g., a 5 x 5 square) can be used for more severe smoothing.
Figure 2.8: An example of applying the 3x3 averaging kernel
Gaussian filter: this method of image smoothing convolves an input image with a Gaussian kernel. The Gaussian filter suppresses noise with high spatial frequencies and produces a smoothing effect. In two dimensions, an isotropic (i.e., circularly symmetric) Gaussian filter function is given by equation (2.3):
G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²))    (2.3)
Median filter: another smoothing filter used to reduce noise in an image, somewhat like a mean filter. However, it performs better than a mean filter at preserving useful detail in the image. It is especially effective for removing impulse noise, which is characterized by bright and/or dark high-frequency features appearing randomly over the image. Statistically, impulse noise falls well outside the peak of the distribution of any given pixel neighborhood, so the median is largely unaffected by such outliers and removes them by exclusion.
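For reference, all three smoothing filters described above are available as single OpenCV calls. A minimal sketch; the file name, kernel sizes and sigma are illustrative placeholders, not values used in this work.

```python
import cv2

img = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE)     # hypothetical grayscale leaf image

mean_smoothed = cv2.blur(img, (3, 3))                   # 3 x 3 averaging kernel (Figure 2.8)
gaussian_smoothed = cv2.GaussianBlur(img, (5, 5), 1.0)  # 5 x 5 Gaussian kernel, sigma = 1
median_smoothed = cv2.medianBlur(img, 5)                # 5 x 5 median filter (impulse noise)
```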
Threshold Based Image Segmentation: this method partitions an input image into pixels of two or more values through comparison of each pixel value with a predefined threshold value T [29]. It transforms an input image into a segmented binary image. The three commonly known thresholding approaches are global thresholding, local thresholding and adaptive thresholding [30].
Local Thresholding: a single threshold will not work well when there is uneven illumination due to shadows or the direction of illumination. In local thresholding, the idea is to partition the image into m x n sub-images and then choose a threshold for each sub-image.
Region Based Image Segmentation: The region based segmentation is partitioning of
an image into similar/homogenous areas of connected pixels through the application of
homogeneity/similarity criteria among candidate sets of pixels. Each of the pixels in a
region is similar with respect to some characteristics or computed property such as color,
intensity and/or texture [29].
Cluster Based Image Segmentation: the goal is to partition an image data set into a number of disjoint groups or clusters, that is, to classify the pixels of an image into different clusters that exhibit similar features. A cluster is therefore a collection of objects which are “similar” to each other and “dissimilar” to the objects belonging to other clusters [31].
Canny Edge Detection: the Canny edge detector is a multi-step algorithm that can detect edges while suppressing noise at the same time. The algorithm is not very susceptible to noise and it is a superior edge detector compared to many of the newer edge detection algorithms.
Sobel Edge Detection: this method finds edges using the Sobel approximation to the derivative. It returns edges at those points where the gradient is highest. The Sobel technique performs a 2-D spatial gradient measurement on an image and so highlights regions of high spatial frequency that correspond to edges. In general it is used to find the estimated absolute gradient magnitude at each point in an input grayscale image.
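Both detectors map directly onto OpenCV calls; a minimal sketch with placeholder parameters (not the settings used in this thesis):

```python
import cv2
import numpy as np

gray = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical grayscale leaf image

# Canny: multi-step detector; the hysteresis thresholds (50, 150) are placeholders
canny_edges = cv2.Canny(gray, 50, 150)

# Sobel: 2-D spatial gradient; combine the x and y responses into a gradient magnitude
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_magnitude = np.sqrt(gx ** 2 + gy ** 2)
```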
In theory at least, the operator consists of a pair of 3x3 convolution kernels as given in the tables below. One kernel is simply the other rotated by 90°. This is very similar to the Roberts Cross operator.
Laplacian of Gaussian (LoG) Edge Detection: it has two effects: it smooths the image and it computes the Laplacian, which yields a double-edge image. Locating edges then consists of finding the zero crossings between the double edges. It is generally used to find whether a pixel is on the dark or light side of an edge. The digital implementation of the Laplacian function is usually made through the mask below.
Pixel values at each point in the output represent the estimated absolute magnitude of the spatial gradient of the input image at that point.
Prewitt Edge Detection: the Prewitt edge detector was proposed by Prewitt in 1970. Prewitt is an appropriate way to estimate the magnitude and orientation of an edge. While differential gradient edge detection requires a rather time-consuming calculation to estimate the orientation from the magnitudes in the x and y directions, compass edge detection obtains the orientation directly from the kernel that gives the highest response. It is limited to 8 possible orientations; however, experience shows that most direct orientation estimates are not much more accurate. This gradient based edge detector is estimated in a 3x3 neighborhood for eight directions. All eight convolution masks are calculated, and the mask with the largest response is then selected.
2.2. Feature Extraction
When the pre-processing and the desired level of segmentation have been achieved,
feature extraction technique is applied to the segments to obtain image features. Image
features are those items which uniquely describe an image, such as size, shape,
composition, location etc. Quantitative measurements of object features allow
classification and description of the image [21].
An image descriptor is applied globally and extracts a single feature vector. Feature
descriptors on the other hand describe local, small regions of an image. It is possible to
get multiple feature vectors from an image with feature descriptors. A feature vector is
a list of numbers used to abstractly quantify and represent the image. Feature vectors can
be used for machine learning, building an image search engine, etc. [34]
There are a number of feature extraction techniques available for the extraction of image
features. It is essential to focus on the feature extraction phase as it has an observable
impact on the efficiency of the recognition system. The most commonly used feature
extraction techniques are discussed below.
2.3.1. Moments
Moments are scalar quantities that are used in a variety of applications to describe shape
of an object for recognition of different types of images. Image moments that are
invariant with respect to the transformations of scale, translation, and rotation find
applications in areas such as pattern recognition, object identification and template
matching [35, 36].
There are several moment based descriptors available in the literature, of which the best known are presented below. In general, a moment of order (p + q) of an image f(x, y) is obtained by projecting the image onto a basis function:
Mpq = ∬ Ψpq(x, y) f(x, y) dx dy
where Ψpq is a continuous function of (x, y) known as the moment weighting kernel or the basis set.
Geometric moments: the geometric moment of order (p + q) for a two-dimensional discrete function such as an image is computed using equation (2.7). If the image has nonzero values in a finite part of the xy plane, then moments of all orders exist for it [37].
mpq = Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} x^p y^q f(x, y)    (2.7)
where f(x, y) is the image function and M, N are the image dimensions. Then, using (2.8), the geometric central moments of order (p + q) can be computed as:
μpq = Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} (x − x̄)^p (y − ȳ)^q f(x, y)    (2.8)
where x̄ and ȳ are the coordinates of the center of gravity of the image, calculated using equation (2.9):
x̄ = m10/m00,  ȳ = m01/m00    (2.9)
For a binary image, m00 = μ00 is the count of foreground pixels and is directly related to the image scale. Therefore, the central moments can be made scale normalized using (2.10):
ηpq = μpq / m00^a,  a = (p + q)/2 + 1    (2.10)
Hu Invariant Moments: based on the normalized central moments, Hu (1962) introduced seven nonlinear functions which are invariant with respect to object translation, scale, and rotation. Hu defined the following seven functions (2.11), computed from the central moments through order three [38]:
ϕ1 = η20 + η02
ϕ2 = (η20 − η02)² + 4η11²
ϕ3 = (η30 − 3η12)² + (3η21 − η03)²
ϕ4 = (η30 + η12)² + (η21 + η03)²
ϕ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
ϕ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
ϕ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
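As a concrete reference, the normalized central moments and the seven Hu invariants above are available directly in OpenCV. A minimal sketch; the file name is a placeholder and the log-scaling step is a common convention rather than something taken from this thesis.

```python
import cv2
import numpy as np

binary = cv2.imread("leaf_mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical binary leaf mask

m = cv2.moments(binary, binaryImage=True)   # raw, central and normalized central moments
hu = cv2.HuMoments(m).flatten()             # the seven invariants of equation (2.11)

# Optional log scaling, commonly used because the invariants span many orders of magnitude
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```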
Zernike Moments: Zernike moments are obtained from a transformed unit disk space
that allows for the extraction of shape descriptors which are invariant to rotation,
translation, and scale as well as skew and stretch, thus preserving more shape
information for the feature extraction process [39].
The Zernike moment of order n with repetition m for a continuous image function f(ρ, θ) defined over the unit disk is:
Vnm = ((n + 1)/π) ∫_0^{2π} ∫_0^{1} f(ρ, θ) Rnm(ρ) e^{−jmθ} ρ dρ dθ    (2.12)
Using this equation, Zernike moments from order 2 to 10 can be computed for extracting features.
Radial Chebyshev moment (RCM): it [40] is a discrete orthogonal moment that has distinctive advantages over other moments for feature extraction. Unlike the invariant moments, its orthogonal basis leads to minimum information redundancy, and its discrete nature offers benefits over Zernike moments (ZM): there are no numerical approximation errors and no extra computational cost owing to normalization. The radial Chebyshev moment of order p and repetition q for an image of size N x N with m = (N/2) + 1 is defined in equation (2.13):
Spq = (1/(2πρ(p, m))) Σ_{r=0}^{m−1} Σ_{θ=0}^{2π} tp(r) e^{−jqθ} f(r, θ)    (2.13)
where tp(r) is the orthogonal Chebyshev polynomial of order p for an image of size N x N, given by the recurrence
t0(x) = 1
t1(x) = (2x − N + 1)/N
tp(x) = [(2p − 1) t1(x) tp−1(x) − (p − 1){1 − (p − 1)²/N²} tp−2(x)] / p
and ρ(p, N) is the squared norm:
ρ(p, N) = N(1 − 1/N²)(1 − 2²/N²) ··· (1 − p²/N²) / (2p + 1)
with p = 0, 1, …, N − 1 and m = (N/2) + 1.
2.3.2. Texture Features
Textures are characteristic intensity variations that typically originate from roughness of
object surfaces that are capable of representing the content of many real-world images.
In leaf based plant species identification, texture features capture the internal vein
structure of leaf image. Texture features can be extracted by using various methods.
Gray-level co-occurrence matrices (GLCMs), Gabor Filter, Histogram of Oriented
Gradients and Local binary pattern (LBP) are examples of popular methods to extract
texture features [41].
Grey-level co-occurrence matrices (GLCM) Texture: Grey-level co-occurrence
matrices (GLCM) have been successfully used for deriving texture measures from
images. This technique uses a spatial co-occurrence matrix that computes the
relationships of pixel values and uses these values to compute the second-order statistics.
The GLCM approach assumes that the texture information in an image is contained in the overall or “average” spatial relationships between pixels of different grey levels.
For texture feature extraction, the following four measures are commonly used from the
gray level co-occurrence matrix of the gray scaled image: contrast, correlation, energy
and homogeneity [42].
Energy: one approach to generating texture features is to use local kernels to detect various types of texture. After convolution with a specified kernel, the texture energy measure (TEM) is computed by summing the absolute values of the filtered image C over a local neighborhood:
TEM(i, j) = Σ_m Σ_n |C(i + m, j + n)|    (2.14)
If n kernels are applied, the result is an n-dimensional feature vector at each pixel of the image being analyzed.
Homogeneity: it measures how closely the co-occurrence mass is concentrated on the GLCM diagonal:
Ch = Σ_i Σ_j Pd[i, j] / (1 + |i − j|)    (2.15)
When the range of gray levels is small, the Pd[i, j] values will tend to be clustered around the main diagonal and homogeneity will be high; a heterogeneous image will result in an even spread of Pd[i, j] values.
Contrast: if there is a large amount of variation in an image, the Pd[i, j] values will be concentrated away from the main diagonal and the contrast Σ_i Σ_j (i − j)^k Pd[i, j]^n will be high (typically k = 2, n = 1).
Correlation: the GLCM correlation measure is built from the marginal means and variances of Pd:
μi = Σ_i i Pd[i, j],  μj = Σ_j j Pd[i, j]
σi² = Σ_i i² Pd[i, j] − μi²,  σj² = Σ_j j² Pd[i, j] − μj²
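To make these measures concrete, the sketch below builds a normalized co-occurrence matrix for a single pixel offset with plain NumPy and derives contrast, homogeneity and energy (angular second moment) from it. The offset, the number of grey levels and the energy definition are simplifying assumptions for illustration, not the thesis configuration.

```python
import numpy as np

def glcm_features(gray, levels=8, dx=1, dy=0):
    """Contrast, homogeneity and energy from a normalized GLCM for offset (dy, dx)."""
    q = (gray.astype(np.int64) * levels) // 256           # quantize to `levels` grey levels
    q = np.clip(q, 0, levels - 1)
    P = np.zeros((levels, levels), dtype=np.float64)
    a = q[: q.shape[0] - dy, : q.shape[1] - dx]            # reference pixels
    b = q[dy:, dx:]                                        # neighbours at offset (dy, dx)
    np.add.at(P, (a.ravel(), b.ravel()), 1.0)              # accumulate co-occurrences
    P /= P.sum()                                           # normalize to probabilities Pd[i, j]
    i, j = np.indices((levels, levels))
    contrast = np.sum(((i - j) ** 2) * P)                  # k = 2, n = 1
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))        # equation (2.15)
    energy = np.sum(P ** 2)                                # angular second moment
    return contrast, homogeneity, energy
```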
Gabor Filter: the term Gabor filter is named after Dennis Gabor, who in 1946 proposed this representation of signals. In image processing tasks, Gabor filters have been used extensively for feature extraction from digital leaf images due to their spatial locality, orientation selectivity and frequency characteristics. The frequency and orientation representation used in Gabor filters is useful for texture representation and discrimination, and a similar mechanism operates in the human visual system. The Gabor features are invariant to illumination, rotation, scale and translation. Gabor filters have several advantages in the feature extraction process over other techniques such as the Gray Level Co-occurrence Matrix (GLCM), and the Gabor feature vectors can be used directly as input to a classifier. A two-dimensional Gabor function g(x, y) consists of a sinusoidal plane wave of some frequency and orientation (the carrier), modulated by a two dimensional translated Gaussian envelope [43, 44, 45]. The Gabor filter is defined as in equation (2.18):
g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) exp(i(2π x′/λ + ψ))    (2.18)
where
x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ
λ: the wavelength of the sinusoidal factor,
θ: the orientation of the normal to the parallel stripes of the Gabor function,
ψ: phase offset,
σ: sigma, the standard deviation of the Gaussian envelope,
γ: spatial aspect ratio.
The Gabor transform of an image I(x, y) is defined as the convolution of a Gabor filter g(x, y) with the image:
R(x, y) = g(x, y) * I(x, y) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} g(m, n) I(x − m, y − n)    (2.19)
where * denotes two-dimensional linear convolution and M, N are the size of the Gabor filter mask.
The following six texture features are extracted using the Gabor filter:
a) Mean: it is defined as given in equation (2.20) below:
μ(s,θ) = (1/(NM)) Σ_{x=1}^{N} Σ_{y=1}^{M} G(s,θ)(x, y)    (2.20)
b) Energy: the local energy of the filtered image is obtained by computing the absolute average deviation of the transformed values of the filtered image from the mean μ within a window W of size Mx × My. The texture energy E(x, y) is given by equation (2.21):
E(x, y) = (1/M) Σ_{(a,b)∈W} |R(a, b) − μ|    (2.21)
where M = Mx·My is the number of pixels in the window W.
c) Standard Deviation: the standard deviation is computed as given in equation (2.22):
σ(s,θ) = sqrt( (1/(NM)) Σ_{x=1}^{N} Σ_{y=1}^{M} (G(s,θ)(x, y) − μ(s,θ))² )    (2.22)
d) Skewness: it is the measure of asymmetry, denoted by γ; it is positive when the distribution tails off to the right and negative when it tails off to the left. It is given by equation (2.23):
γ(s,θ) = μ3(s,θ) / σ(s,θ)³    (2.23)
where μ3 is the third central moment of the filtered image.
For example, if we consider a grayscale image f(x, y) of size 512x512 with horizontal kernel DX = [-1 0 1] and vertical kernel DY = [-1 0 1]T, the horizontal and vertical gradients are given by Equations (2.26) and (2.27) respectively.
2.3. Feature Fusion
The aim of the feature fusion technique is to combine many independent (or approximately independent) features into a more representative feature set for leaf images, which increases the accuracy of identification. The features are combined by concatenation into one feature set.
The feature fusion technique has two problems. The first is the compatibility of different features; i.e. the features may lie in different ranges of numbers. Thus, the features must be normalized to the same range. There are different normalization methods such as Z-score, Min-Max, and Decimal Scaling. The Z-score normalization method is the most common and simplest; it maps the input scores to a distribution with mean zero and standard deviation one as follows:
xi = (fi − µi) / σi
where fi represents the i-th feature vector, µi and σi are the mean and standard deviation of the i-th vector, respectively, and xi is the i-th normalized feature vector. The second problem of the feature fusion technique is high dimensionality, which may lead to high computation time and more storage space. Thus, dimensionality reduction techniques such as Linear Discriminant Analysis or Principal Component Analysis are used to reduce the dimensionality of the combined feature set [46].
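A minimal sketch of the fusion step as described above: z-score each feature column and concatenate the three feature groups into one feature set. The feature counts and random arrays are placeholders, not the dimensions used in this work.

```python
import numpy as np

def zscore(features, eps=1e-12):
    """Column-wise z-score normalization: x = (f - mu) / sigma."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

# Hypothetical per-leaf feature matrices (rows = leaf samples, columns = features)
morph = np.random.rand(1907, 7)          # morphological ratios
chebyshev = np.random.rand(1907, 25)     # radial Chebyshev moment magnitudes
gabor = np.random.rand(1907, 32)         # Gabor local-energy features

fused = np.hstack([zscore(morph), zscore(chebyshev), zscore(gabor)])
print(fused.shape)                       # (1907, 64) combined feature set
```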
LDA takes full consideration of the class labels for patterns. It is generally believed that
the label information can make the recognition algorithm more discriminative. LDA
projects the original data into an optimal subspace by a linear transformation. The
transformation matrix consists of the eigenvectors whose corresponding eigenvalues can
maximize the ratio of the trace of the between-class scatter to the trace of the within-
class scatter [48].
Let SB and SW denote the between-class and the within-class scatter matrices, which are defined in Equations (2.30) and (2.31) respectively:
SB = Σ_{i=1}^{c} ni (mi − m)(mi − m)ᵀ    (2.30)
SW = Σ_{i=1}^{c} Σ_{xj ∈ Xi} (xj − mi)(xj − mi)ᵀ    (2.31)
where mi denotes the mean of class i, m is the global mean of the entire sample, and the number of vectors in class Xi is denoted by ni.
LDA looks for a linear subspace W (C − 1 components) within which the projections of the different classes are best separated, by maximizing the discriminant criterion defined by Equation (2.32):
J(W) = max_W  tr{Wᵀ SB W} / tr{Wᵀ SW W}    (2.32)
Along with the orthogonality constraint on W, this can be solved as a generalized eigenvector and eigenvalue problem, stated in Equation (2.33):
SB Wi = λi SW Wi    (2.33)
where Wi and λi are the i-th generalized eigenvector and eigenvalue of SB with regard to SW. The LDA solution W contains all the C − 1 eigenvectors with non-zero eigenvalues (SB has a maximal rank of C − 1).
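For illustration only, scikit-learn's LinearDiscriminantAnalysis solves this generalized eigenproblem and keeps at most C − 1 components. scikit-learn is not named among the thesis tools, so this is an assumed convenience, shown with placeholder data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(1907, 64)                  # hypothetical fused leaf feature matrix
y = np.random.randint(0, 32, size=1907)       # hypothetical labels for 32 species

lda = LinearDiscriminantAnalysis(n_components=31)   # at most C - 1 = 31 components
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)                        # (1907, 31)
```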
2.5. Image Classification
The primary objective of image classification is to detect, identify and classify the
features occurring in an image in terms of the type of class these features represent on
the field. Image Classification can be broadly divided into supervised and unsupervised.
The most common classification methods used recently in plant recognition systems are reviewed below [49, 50].
2.5.1. Supervised Image Classification
A supervised classification problem falls under the category of learning from instances
where each instance or pattern or example is associated with a label or class.
Conventionally an individual classifier like Neural Network, Decision Tree, or a Support
Vector Machine is trained on a labeled data set. Depending on the distribution of the
patterns, it is possible that not all the patterns are learned well by an individual classifier.
The most common classification methods used recently in plant recognition systems are
presented below:
k-Nearest Neighbor (k-NN): for plant leaf classification, we first compute the feature vector of a test sample and then calculate the Euclidean distance between the test sample and each training sample. In this way similarity is measured and the class of the test sample is determined. The k-nearest neighbors algorithm is among the simplest of all machine learning algorithms. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors, where k is a small positive integer. If k = 1, then the object is simply assigned to the class of its nearest neighbor. In binary (two class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. Note that the basic k-NN rule does not take into account that different neighbors may provide different evidence. It is, however, reasonable to assume that objects which are close together (according to some appropriate metric) will belong to the same category.
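A small illustration of the k-NN rule with Euclidean distance, using scikit-learn (an assumed tool, not named in the thesis) and placeholder data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(100, 64)             # hypothetical training feature vectors
y_train = np.random.randint(0, 32, size=100)  # hypothetical species labels
X_test = np.random.rand(5, 64)                # hypothetical test feature vectors

# Majority vote among the k = 5 nearest neighbours under Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict(X_test))
```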
Support Vector Machine: it is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. A Support Vector Machine training algorithm creates a model which assigns new examples to one category or the other. The idea behind the method is to non-linearly map the input data to some high dimensional space where the data can be linearly separated, thus providing good classification performance. For plant leaf classification it transforms the feature vector extracted from the leaf's shape. SVM finds the optimal separating hyperplane by maximizing the margin between the classes. The data vectors nearest to the constructed hyperplane in the transformed space are called the support vectors. The SVM estimates a function for classifying data into two classes. Using a nonlinear transformation that depends on a regularization parameter, the input vectors are placed into a high-dimensional feature space, where a linear separation is employed. The inner product (x, y) is replaced by a kernel function K(x, y) to construct a non-linear decision function, which can be described by Equation (2.34):
f(x) = sgn( Σ_{i=1}^{n} αi γi k(xi, x) + b )    (2.34)
where f indicates the class membership of x. From a given kernel set, the basis k(xi, x), i = 1, 2, ..., N, is selected by the first layer of the SVM, and a linear function in that space is created by the second layer. SVM is independent of the dimensionality of the input space. It has a simple geometric construction and it generates a sparse solution. Classification is performed using the support vectors obtained from the training set.
Deep Neural Network: in Deep Neural Network [60] classification, random weights are initially used for the training set. The main advantage of a Deep Neural Network is that it is capable of reducing classification error when applied to huge datasets. A conventional neural network requires more time for training, whereas a deep neural network considers input for training based on their weights. It is a feed-forward neural network which contains more hidden layers. Hidden layers are used to map the input features. A conventional mapping function is used in this work:
O = 1 / (1 + e^{−(b + f·w)})    (2.35)
2.5.2. Unsupervised Classification
In unsupervised classification, we have only input data (X) and no corresponding output variables. The goal of unsupervised classification is to model the underlying structure or distribution of the data in order to learn more about it. This is called unsupervised learning because, unlike the supervised learning described above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data. Unsupervised learning problems can be further grouped into clustering and association problems [51].
CHAPTER 3: RELATED WORK
3.1 Introduction
In this Chapter, related works of different researchers in the area of plant species
identification using image processing techniques are reviewed. In recent years, the
techniques of plant species identification based on leaf features have achieved great
progress. Morphological features, leaf shape moments, morphological features
combined with other leaf shape feature descriptors and other miscellaneous plant leaf
shape feature descriptors have been used along with different classification models for
plant species identification. In order to describe each work clearly, we divided the chapter into three sections based on the features used for classification and recognition in each area of related work.
Adil Salman et al. [53] preprocessed leaf shape image, traced the boundary of the leaf
using Canny Edge Operator and extracted fifteen features (Area, Convex Area, Filled
Area, Perimeter, Eccentricity, Solidity, Orientation etc.) from the binary image and used
Support Vector Machine Classifier to classify 22 plant species of the Flavia dataset.
Overall classification accuracy of 85%-87% is reported.
3.3 Classification based on moments
Marko Lukic et al. [54] used invariant Hu moments as leaf discriminative features, but indicated that Hu moments are inadequate in some cases when leaves from different species have very similar shapes. In order to overcome this, the authors used uniform local binary pattern histogram parameters (mean, standard deviation, energy and entropy) as additional discriminative feature descriptors. A support vector machine, tuned using hierarchical grid search, is used as the classifier for leaf recognition. The algorithm is tested using the Flavia dataset and an accuracy of 94.13% is reported.
Zalikha et al. [55] compared the effectiveness of Zernike Moment Invariants (ZMI), Legendre Moment Invariants (LMI), and Tchebichef Moment Invariants (TMI) as feature descriptors of leaf images. A Generalized Regression Neural Network (GRNN) is used for classification of 130 pre-processed leaf images of four plant families. A 100% classification accuracy is reported, but the authors recommended further study, as the number of leaf images used is small and the result may be due to overfitting of the GRNN. Nevertheless, the results indicate that TMI is the better feature descriptor, with the TMI features being the most effective.
Celebi and Aslandogan [56] also studied and compared three moment-based descriptors: invariant moments such as Hu moments, Zernike moments, and radial Chebyshev (a.k.a. Tchebichef) moments. Invariant moments suffer from a high degree of information redundancy, sensitivity to noise and numerical instability. The authors experimentally showed that radial Chebyshev moments have the highest retrieval performance compared to the other two shape descriptors.
Isnanto et al. [57] developed a herbal plant identification system based on the shape of the herbal plants' leaves. Hu's seven invariant moments are used for feature extraction. Euclidean and Canberra distance similarity measures are used as the recognition method and the results of both are analyzed. The highest and lowest accuracies achieved using the Euclidean distance measure are 86.67% and 40% respectively, while they are 72% and 20% using the Canberra measure.
Patil and Bhagat [59] used a combination of Gabor and Gray-Level Co-Occurrence Matrix (GLCM) texture features to recognize leaf shape, with a Decision Tree as the classifier. The authors also used Principal Component Analysis (PCA) to reduce the dimensionality of the extracted features and increase the discriminative power of the Decision Tree classifier. The highest classification accuracy reported is 96% on the Swedish Leaf Dataset of 10 plant species.
of China that are collected by Wu et al. [61], and a classification accuracy of 95.38% is reported, which shows the robustness of their approach compared to other methods.
Leaf growth, image translation, rotation and scaling independent morphological features along with Zernike moments are used by Harish et al. [10] to identify and classify plant leaves. The features are then fed as input to four classifiers (Naive Bayes, k-NN, SVM, and PNN). It is reported that Naive Bayes and k-NN are lazy learners and less accurate compared to SVM and PNN.
Alaa Tharwat et al. [62] used a feature fusion technique to combine color, shape, and texture features of colored leaf images. Color moments, invariant moments, and the Scale Invariant Feature Transform (SIFT) are used to extract the color, shape, and texture features, respectively. Linear Discriminant Analysis (LDA) is used to reduce the number of features and a Bagging ensemble is used as the classifier. The proposed approach was tested using the Flavia dataset, which consists of 1907 colored images of leaves. The feature fusion method achieved better accuracy (70%) than all single feature extraction methods.
Du and Zhai [63] proposed plant species identification based on a multi-feature radial basis probabilistic neural network ensemble classifier (RBPNNE). The RBPNNE consists of several independent neural networks trained on different feature domains extracted using texture extraction techniques such as autocorrelation, edge frequency, wavelet transform and discrete cosine transform of the original plant leaf images. In addition, Hu invariant moments are also used for feature extraction. The final classification results represent a combined response of the individual networks. For testing purposes, the authors used their own dataset of 1100 images of 50 species of plant leaves. The experimental results show that the RBPNNE achieves higher recognition accuracy (79.24%) and better classification efficiency than any single feature domain.
3.6 Summary
Several leaf based plant classification methods have been proposed to address the plant identification problem. Plant leaf shape descriptors such as Hu, Zernike, and Chebyshev moments are extensively used with different learning algorithms for plant species identification. Digital morphological features combined with other feature descriptors are also used. However, most methods are insufficiently accurate and the dataset sizes used for the experiments are limited. In some cases the reported classification results are high but the sample size is too small [54, 55, 59]. The computation of Zernike moments involves a discrete approximation of a continuous integral term, which results in loss of information. Hu moments are inadequate in some cases when leaves from different species have very similar shapes [59]. The radial Chebyshev moment has distinctive advantages over other moments for feature extraction due to its discrete characteristics. Unlike the invariant moments, its orthogonal basis leads to minimum information redundancy, and it has no numerical errors and no extra computational complexity owing to normalization [40]. Therefore, in this work, an improved method based on the fusion of morphological features and Radial Chebyshev moments of the leaf shape and Gabor filter features of the leaf texture, with a class of ensemble methods called Random Forest, is proposed for plant species classification and identification.
CHAPTER 4: DESIGN OF PLANT SPECIES IDENTIFICATION
4.1 Introduction
In recent years there has been an increased interest in applying image processing
techniques to the problem of automatic plant species identification. There are ample
opportunities to improve plant species identification through the designing of a
convenient automatic plant species identification system. Many different approaches are
used to classify plant species to its predefined classes using the features of plants leaf.
In this chapter, we explain the design and system architecture of the proposed approach. The different processes used in the system architecture, such as plant leaf preprocessing, segmentation, feature extraction, model training and classification, are also explained in detail.
The proposed approach consists of two phases, namely, a training phase and an identification phase. In the training phase, leaf images of plant species (i.e. training images) are preprocessed, features are extracted from each leaf image and fused into a feature set. Lastly, important features are selected from the feature set and provided as input to the learning model. In the identification phase, the same procedure is followed as in the training phase, except that at the later stage the extracted plant leaf image features are used for classification instead of training: the extracted features are used for identification of the plant leaf using the knowledge base.
Figure 4.1: Architecture of Automatic Plant Species Identification
Image resizing: it is computationally expensive to process large images. Hence, the leaf images in the database are resized to reduce image processing time.
Conversion of RGB image to Grayscale Format
Leaf texture is extracted from the grayscale image; in this work we apply Gabor filters on grayscale leaf images to extract Gabor features. Hence, the RGB image is converted to a grayscale image using the weighted averaging method. Assuming R, G and B represent the respective intensities of the red, green and blue channels of a pixel, the grayscale value is computed using the weighted averaging method as given in equation (4.1).
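Equation (4.1) is not reproduced above; a minimal sketch of the conversion with OpenCV follows. The 0.299/0.587/0.114 weights are the common ITU-R BT.601 weights that OpenCV's converter uses and are assumed here rather than taken from the thesis.

```python
import cv2
import numpy as np

bgr = cv2.imread("leaf.jpg")                      # hypothetical input; OpenCV loads as BGR
gray_cv = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)   # built-in weighted average (BT.601 weights)

# Equivalent explicit weighted average; the weights are an assumption here, not equation (4.1)
B, G, R = bgr[:, :, 0], bgr[:, :, 1], bgr[:, :, 2]
gray_manual = (0.299 * R + 0.587 * G + 0.114 * B).astype(np.uint8)
```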
Figure 4.2: Original RGB leaf image          Figure 4.3: Grayscale leaf image after applying weighted averaging method
To convert a grayscale image into a binary image, Otsu's [64] method of automatic thresholding, explained in section 4.4, is applied. During the thresholding process, if a pixel value is greater than the threshold value, it is assigned one value (e.g. white); otherwise it is assigned another value (e.g. black). The shape of the histogram is used for automatic thresholding, and the threshold is chosen so as to minimize the intra-class variance of the black and white pixels.
Figure 4.4: Grayscale leaf image          Figure 4.5: Binary leaf image obtained after applying Otsu thresholding method
Boundary extraction: identifying the contour of the leaf plays a paramount role in the computation of the morphological features of the leaf image. Convolving the leaf image with a Canny edge detector produces an image where higher grey level values indicate the presence of an edge in the leaf image.
4.4 Leaf image segmentation
Among all the segmentation methods, Otsu method is one of the most successful region
based segmentation methods for automatic image thresholding because of its simple
calculation. In Otsu's method we exhaustively search for the threshold that minimizes
the intra-class variance (the variance within the class), defined as a weighted sum of
variances of the two classes [65].
σw²(t) = w0(t) σ0²(t) + w1(t) σ1²(t)
Weights w0 and w1 are the probabilities of the two classes separated by a threshold t, and σ0² and σ1² are the variances of these two classes. The class probabilities w0,1(t) are computed from the L bins of the histogram:
w0(t) = Σ_{i=0}^{t−1} p(i)
w1(t) = Σ_{i=t}^{L−1} p(i)
Otsu shows that minimizing the intra-class variance is the same as maximizing the inter-class variance:
σb²(t) = σ² − σw²(t) = w0(μ0 − μT)² + w1(μ1 − μT)² = w0(t) w1(t) [μ0(t) − μ1(t)]²
which is expressed in terms of the class probabilities w and class means μ, where the class means μ0(t), μ1(t) and μT(t) are:
μ0(t) = Σ_{i=0}^{t−1} i p(i) / w0
μ1(t) = Σ_{i=t}^{L−1} i p(i) / w1
μT(t) = Σ_{i=0}^{L−1} i p(i)
The class probabilities and class means can be computed iteratively. This idea yields an effective Otsu algorithm.
Algorithm 4.1: Otsu Algorithm
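Algorithm 4.1 itself is not reproduced here. As a stand-in, the sketch below uses OpenCV's built-in Otsu threshold selection on a smoothed grayscale leaf image; the file name and smoothing kernel are placeholders.

```python
import cv2

gray = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical grayscale leaf image
blur = cv2.GaussianBlur(gray, (5, 5), 0)              # light smoothing helps the histogram split

# A threshold of 0 is ignored when THRESH_OTSU is set; t is the threshold Otsu selects
t, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t)
```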
There are a number of shape and texture features that can be extracted from plant leaves for the identification of plant species using image processing techniques. In this work, Radial Chebyshev moments and morphological features are used as shape descriptors, and Gabor features are used as texture descriptors. The shape descriptors capture the global shape of the leaf image, while the internal vein structure is captured by the texture descriptors. The shape and texture features are combined by concatenation and, finally, important features are selected from the combined set and fed as input to the ensemble of classifiers. Feature extraction from the shape and texture of plant leaves involves major steps, and each of these steps consists of many methods that contribute to improved results. The three feature types are presented below along with their corresponding algorithms.
4.5.1 Radial Chebyshev Moments
The radial Chebyshev moment of order p and repetition q for an image of size N x N is defined in equation (2.13) of Chapter 2 as [40]:
Mpq = (1/(2πρ(p, m))) Σ_{r=0}^{m−1} Σ_{θ=0}^{2π} tp(r) e^{−jqθ} f(r, θ)    (4.2)
where
t0(x) = 1
t1(x) = (2x − N + 1)/N
tp(x) = [(2p − 1) t1(x) tp−1(x) − (p − 1){1 − (p − 1)²/N²} tp−2(x)] / p    (4.3)
Algorithm 4.2: Radial Chebyshev Moments
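Algorithm 4.2 itself is not reproduced above. The sketch below is one possible NumPy realization of equations (4.2)-(4.3) under simplifying assumptions: the image is assumed square with the leaf centred, f(r, θ) is sampled on a polar grid by nearest-neighbour lookup, and the sum over θ uses n_theta discrete angles. It is an illustration, not the author's implementation.

```python
import numpy as np

def chebyshev_poly(p_max, N):
    """Scaled discrete Chebyshev polynomials t_p(x), x = 0..N-1, via the recurrence in (4.3)."""
    t = np.zeros((p_max + 1, N))
    x = np.arange(N, dtype=np.float64)
    t[0] = 1.0
    if p_max >= 1:
        t[1] = (2 * x - N + 1) / N
    for p in range(2, p_max + 1):
        t[p] = ((2 * p - 1) * t[1] * t[p - 1]
                - (p - 1) * (1 - (p - 1) ** 2 / N ** 2) * t[p - 2]) / p
    return t

def squared_norm(p, N):
    """rho(p, N) = N (1 - 1/N^2)(1 - 2^2/N^2) ... (1 - p^2/N^2) / (2p + 1)."""
    rho = N / (2 * p + 1)
    for k in range(1, p + 1):
        rho *= 1 - k ** 2 / N ** 2
    return rho

def radial_chebyshev_moments(img, p_max=5, q_max=5, n_theta=360):
    """Magnitudes |S_pq| of equation (4.2), using nearest-neighbour polar sampling."""
    N = img.shape[0]                                   # assume a square N x N image
    m = N // 2 + 1
    cy = cx = (N - 1) / 2.0                            # assume the leaf is centred
    t = chebyshev_poly(p_max, m)
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    r = np.arange(m, dtype=np.float64)
    ys = np.clip(np.round(cy + np.outer(r, np.sin(theta))).astype(int), 0, N - 1)
    xs = np.clip(np.round(cx + np.outer(r, np.cos(theta))).astype(int), 0, N - 1)
    f = img[ys, xs].astype(np.float64)                 # f(r, theta), shape (m, n_theta)
    S = np.zeros((p_max + 1, q_max + 1), dtype=complex)
    for p in range(p_max + 1):
        for q in range(q_max + 1):
            kernel = t[p][:, None] * np.exp(-1j * q * theta)[None, :]
            S[p, q] = np.sum(kernel * f) / (2 * np.pi * squared_norm(p, m))
    return np.abs(S)                                   # magnitudes are rotation invariant
```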
4.5.2 Morphological Features
In this work, morphological features that provide significant information about the leaf image have been considered. Leaf images from the same class share similar properties such as area, aspect ratio, convexity, extent, etc., and these properties can be extracted effectively with the help of morphological features [49]. The leaf image contour plays a significant role in the extraction of the different morphological features. A contour is a curve that joins all the continuous points having the same color or intensity along the boundary of the leaf shape; it stores the (x, y) coordinates of the boundary of a shape in a 2-dimensional array. For better accuracy, the leaf images are converted to binary images using a threshold method before the selected features are extracted. The following morphological features are extracted from the leaf image [66].
4.5.3 Gabor Filter
In this work, Gabor filters have been used for texture feature extraction from the digital leaf images due to their spatial locality, orientation selectivity and frequency characteristics, as explained in Chapter 2, section 2.3.2.
In this study the texture features were obtained as follows using the Gabor filter:
1. Use a bank of Gabor filters at multiple scales and orientations to obtain filtered images R(x, y).
2. Compute the texture feature, namely the local energy of each filtered image, using equation (2.21) of Chapter 2.
A Gabor filter bank with 8 orientations and 4 frequency values is used for texture feature extraction from the leaf image. The filter bank is illustrated in Table 4.1 for the 8 orientations (0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8) and the 4 frequencies (0.2, 0.4, 0.6, 0.8), values which are experimentally determined. Hence, a total of 8 × 4 = 32 filters are used for each leaf image for the extraction of the texture features.
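A sketch of such a filter bank with OpenCV's getGaborKernel. OpenCV parameterizes the filter by wavelength rather than frequency, so the frequencies above are converted as lambda = 1/f; the kernel size, sigma and gamma values are illustrative assumptions, not the thesis settings.

```python
import cv2
import numpy as np

orientations = [k * np.pi / 8 for k in range(8)]        # 0, pi/8, ..., 7pi/8
frequencies = [0.2, 0.4, 0.6, 0.8]

gray = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical input

features = []
for theta in orientations:
    for freq in frequencies:
        lam = 1.0 / freq                                # OpenCV expects wavelength, not frequency
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5, 0)  # ksize, sigma, theta,
                                                                        # lambda, gamma, psi
        filtered = cv2.filter2D(gray, cv2.CV_32F, kernel)               # R(x, y) = g * I
        # local energy of the filtered image (cf. equation 2.21, computed over the whole image)
        features.append(float(np.mean(np.abs(filtered - filtered.mean()))))

print(len(features))                                    # 32 texture features per leaf
```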
Figure 4.7: Gabor filter output for different orientations and frequency values
When we convolve the Gabor filters with the sample leaf image in Figure 4.8, 32 different filtered leaf images are produced. The 32 filtered leaf images are depicted in Figure 4.9.
Figure 4.9: Leaf images after application of Gabor filter
4.5.5 Feature selection
In this work, the number of features is high, and selecting the important features that
contribute significantly to the model's predictions is essential for improved performance and
accuracy. Random Forest plays a paramount role in feature selection, as it uses the Gini
importance, or mean decrease in impurity (MDI), to calculate the importance of each feature.
The Gini importance is also known as the total decrease in node impurity: it measures how
much the model fit or accuracy decreases when a variable is dropped. The larger the decrease,
the more significant the variable is, so the mean decrease serves as the criterion for variable
selection. The Gini index thus describes the overall explanatory power of a feature [67].
To compute the Gini impurity of a node with K classes, let i ∈ {1, 2, …, K} and let p_i be the
fraction of samples in the node labeled with class i. The Gini index is then given by
equation 4.5 [66]:
\[
G = \sum_{i=1}^{K} p_i\,(1 - p_i) = 1 - \sum_{i=1}^{K} p_i^{2} \qquad (4.5)
\]
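A minimal sketch of this impurity computation is given below for illustration; scikit-learn computes it internally when reporting feature importances, so this function is not part of the actual pipeline.

import numpy as np

def gini_impurity(labels):
    # labels: class labels of the samples that fall in one tree node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # fraction p_i of each class
    return 1.0 - np.sum(p ** 2)        # Gini index of equation (4.5)

# A pure node has impurity 0; a perfectly mixed two-class node has 0.5
print(gini_impurity(np.array([3, 3, 3])))      # 0.0
print(gini_impurity(np.array([1, 2, 1, 2])))   # 0.5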
4.6 Learning
Results reported in several studies [67, 68, 69] have shown the high efficiency of ensemble
methods for image classification. Random Forest is an ensemble method that builds multiple
decision trees and merges them together to obtain a more accurate and stable prediction [72].
In random forests, each tree in the ensemble is built from a sample drawn with replacement
from the training set. In addition, instead of using all the features, a random subset of features
is selected, further randomizing the tree. As a result, the bias of the forest increases slightly,
but due to the averaging of less correlated trees, its variance decreases, resulting in an overall
better model [67, 72].
The Random Forest method is highly accurate and robust because of the number of decision
trees participating in the process. It is far less prone to overfitting, as it averages the
predictions of many trees, which cancels out individual biases. The method also plays a
paramount role in feature selection, which is a key advantage over alternative machine
learning algorithms. Hence, we decided to use the Random Forest method for automatic plant
species identification. The Random Forest algorithm proceeds as follows [73]:
1. For b = 1 to B:
   a. Draw a bootstrap sample Z* of size N from the training data.
   b. Grow a random-forest tree T_b on the bootstrapped data by recursively repeating the
      following steps for each terminal node of the tree, until the minimum node size n_min
      is reached:
      i. Select m variables at random from the p variables.
      ii. Pick the best variable/split-point among the m.
      iii. Split the node into two daughter nodes.
2. Output the ensemble of trees {T_b}, b = 1, …, B.
3. To make a prediction at a new point x (classification): let Ĉ_b(x) be the class prediction
   of the b-th random-forest tree; then Ĉ_rf^B(x) = majority vote over {Ĉ_b(x), b = 1, …, B}.
In summary, Random Forest works in four steps (a short illustrative sketch follows this list):
1. Select random samples from a given dataset.
2. Construct a decision tree for each sample and get a prediction result from each decision
tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final prediction.
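A toy sketch of these four steps, built from scikit-learn decision trees, is given below purely for illustration; the experiments in Chapter 5 use scikit-learn's RandomForestClassifier, which performs the bootstrap sampling, random feature selection and voting internally. NumPy arrays and non-negative integer class labels are assumed.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_new, n_trees=100):
    rng = np.random.default_rng(0)
    all_votes = []
    for _ in range(n_trees):
        # 1. Select a random (bootstrap) sample from the given dataset
        idx = rng.integers(0, len(X_train), size=len(X_train))
        # 2. Construct a decision tree for the sample and get its predictions;
        #    max_features="sqrt" restricts the variables considered at each split
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X_train[idx], y_train[idx])
        all_votes.append(tree.predict(X_new))
    # 3. and 4. Collect the votes and return the majority class for each sample
    all_votes = np.array(all_votes, dtype=int)
    return np.array([np.bincount(col).argmax() for col in all_votes.T])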
: EXPERIMENTATION
5.1. Introduction
Developing a prototype is one of the objectives of this work. The prototype serves to
demonstrate the validity and usability of the proposed automatic plant species
identification system. It covers plant leaf preprocessing, segmentation, feature extraction,
feature fusion and feature selection, in order to demonstrate as well as evaluate the
developed automatic plant species identification model. We also present the tools and
development environments used to realize this work.
5.3. Implementation
In order to implement the plant species identification model, we made use of several
open source libraries. The following libraries are used on Windows 10 Pro with an
Intel(R) Core(TM) i7 CPU at 2.70 GHz and 12.0 GB of RAM [75, 76, 77, 78].
Python: We chose Python as the programming language to develop the algorithms,
mainly because of its simplicity and code readability. The IDE used is Anaconda 3
with Spyder 3.2.8.
OpenCV-Python: a library of Python bindings designed to solve computer vision
problems.
NumPy: the fundamental package for scientific computing with Python, providing a
powerful N-dimensional array object. This library is needed in order to treat images as
matrices and is the key library for image manipulation. It is very fast, which allows our
algorithms to run with high computational efficiency, one of the desired properties of
the proposed work. OpenCV-Python itself makes use of NumPy, a highly optimized
library for numerical operations.
pandas: an open source library providing high-performance, easy-to-use data structures
and data analysis tools for Python; it is used to write the extracted features to a CSV
file.
Scikit-learn: a free machine learning library for Python. It features various
classification, regression and clustering algorithms, including support vector machines
and random forests, and is designed to interoperate with NumPy and SciPy.
5.3.3. Training model
For the proposed solution, we decided to use the Random Forest algorithm, as explained
in Section 4.6. Random Forest is an ensemble classifier with robust and accurate
classification capabilities. The classifier is first trained with the combined features. The
result is evaluated, and the classifier is then trained again with the selected features,
which are 48 in number; we excluded 17 features because their importance is very low
for training the model. The feature selection is done using the Gini index, which is
readily available from the Random Forest implementation in Scikit-learn. The threshold
for deciding the importance of the features is determined empirically by running the
experiments several times.
# Load the extracted-feature data and train the Random Forest classifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

filename = 'path to data'              # path to the CSV file of extracted features
data = pd.read_csv(filename)
n = data.shape[1] - 1                  # assumes the last column holds the class label
X = data.iloc[:, 0:n]                  # assign feature set
y = data.iloc[:, n]                    # assign class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
rfc = RandomForestClassifier(n_estimators=1000)
rfc.fit(X_train, y_train)
# Use the random forest's predict method on the test data
p = rfc.predict(X_test)
print(accuracy_score(y_test, p))       # overall classification accuracy
Code to view a list of features with their importance scores:

# Pair each feature name with its Gini importance score from the trained forest
feature_importance = list(zip(X_train.columns, rfc.feature_importances_))
for name, score in feature_importance:
    print(name, score)
5.4. Evaluation
5.4.1. Evaluation Techniques
To test the accuracy of the model, 20% of the leaves in the database are used as query
images. The widely used metrics precision, recall, F1 score and support are used to
compute the accuracy of the system; they are defined as follows [9].
Precision is defined as the ratio tp / (tp + fp), where tp is the number of true positives
and fp is the number of false positives. Precision is intuitively the ability of the classifier
not to label as positive a sample that is negative.
Recall is the ratio tp / (tp + fn), where tp is the number of true positives and fn is the
number of false negatives. Recall is intuitively the ability of the classifier to find all
the positive samples.
The F1 score can be interpreted as a weighted harmonic mean of precision and recall
and is defined as F1 score = 2 * (Recall * Precision) / (Recall + Precision).
Support is defined as the number of samples of the true response that lie in a given class.
Scikit-learn provides a convenient report for classification problems that gives a quick idea
of the accuracy of a model using a number of measures: the classification_report() function
displays the precision, recall, F1 score and support for each class.
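Continuing the training snippet in Section 5.3.3 (where y_test holds the true labels of the held-out 20% and p the predictions), the report can be produced as follows:

from sklearn.metrics import classification_report

# Per-class precision, recall, F1 score and support for the test predictions
print(classification_report(y_test, p))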
The test results are presented individually in Tables 5.2, 5.3, 5.4, 5.5 and 5.6 for the
morphological features, Radial Chebyshev moments, Gabor texture features, fused
features and selected features, respectively.
Table 5.2: Test result for training model using morphological features
Table 5.3: Test result for training model using Chebyshev moments
Table 5.4: Test result for training model using Gabor texture features
Table 5.5: Test result for training model using fused features
Feature selection: We closely examined the importance of each feature for the training of
the Random Forest classifier. After repeated experimentation, we decided that features with
an importance value of less than 0.009 do not contribute much to the training of the
classifier. Hence, the Radial Chebyshev moments m01, m11, m20, m21, m23, m41, m43 and
the Gabor texture features mse3, mse7, mse9, mse12, mse13, mse17, mse20, mse21, mse25,
mse31 are removed from the fused feature set, as their importance values fall below the
threshold we set. After removing these features, the model is re-trained using the remaining
selected features.
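The selection step itself can be expressed in a few lines. The sketch below reuses the variable names from the snippets in Section 5.3.3 and is illustrative only, not the exact code used in the experiments.

# Keep only the features whose Gini importance meets the empirically chosen threshold
threshold = 0.009
selected = [name for name, score in zip(X_train.columns, rfc.feature_importances_)
            if score >= threshold]
X_train_sel, X_test_sel = X_train[selected], X_test[selected]
rfc_selected = RandomForestClassifier(n_estimators=1000)
rfc_selected.fit(X_train_sel, y_train)
print(accuracy_score(y_test, rfc_selected.predict(X_test_sel)))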
Table 5.6: Test result of training model using selected features
5.5. Discussion
The results show that the classification accuracy of the Random Forest model using
morphological features, Radial Chebyshev moments and Gabor texture features is
93.0%, 91.0% and 85.0%, respectively. When the classifier is trained on the fused and
selected features, the accuracy achieved is 97.0%, which is higher than the results obtained
by training the classifier on the individual feature sets.
Morphological features are generally good shape descriptors but have difficulty representing
some plant species with irregular shapes. For example, for the plant species in class 30, the
precision, recall and F1 score are 0.75, 0.86 and 0.80 respectively, which are slightly low
compared to the results for the other classes. This is because of the irregular shape of the
leaf examples in this class.
Radial Chebyshev moments of higher order and repetition are time-consuming to compute;
we use a maximum order of 5 and repetition of 5. Radial Chebyshev moments are better at
representing irregular leaf shapes than the morphological technique: for the same class 30,
the precision, recall and F1 score are 0.86, 0.86 and 0.86 respectively.
Setting the Gabor filter parameters, such as the orientation, frequency, kernel size, standard
deviation of the Gaussian envelope, spatial aspect ratio, phase offset and type of filter
coefficient, requires careful selection supported by experiment; poorly set parameters
degrade the result of feature extraction. The Gabor filter parameters used in this work,
provided in Table 4.1, were determined after several rounds of experimentation. However,
using the feature selection technique we employed, we found 10 of the Gabor texture
features (mse3, mse7, mse9, mse12, mse13, mse17, mse20, mse21, mse25 and mse31) to be
of low importance. This might indicate that the Gabor filter parameter settings require some
further refinement. The more features we have that represent the leaf image well, the more
accurate our classification result will be.
: CONCLUSION AND FUTURE WORK
6.1 Conclusion
In this work we described an algorithm for automatic plant species identification. We used
Random Forest as the classifier and tuned it using feature selection techniques. The classifier
was trained separately with the morphological features, the Radial Chebyshev moment
features and the Gabor texture features, obtaining accuracies of 93%, 91% and 85%,
respectively. We then fused these feature sets and trained the classifier on the whole set,
obtaining 96% accuracy. Finally, we applied feature selection to the fused feature set and
trained the Random Forest classifier again, achieving an accuracy of 97%, which is higher
than other approaches in the literature.
From the results achieved, we can conclude that feature fusion combined with an ensemble
classifier such as Random Forest is an excellent choice for automatic plant species
identification.
6.3 Recommendation
The proposed work can be further extended to identify complex images with petioles and
clustered leaves. It can also be extended to identify plant species in real time from their
leaf images.
We used leaf shape morphology, moments and texture as leaf features. Even though the
use of leaf color has its own limitations, as described in Section 2.2.1, it is still worth
considering leaf color alongside the features used in this work, as it might improve the
results.
The work can also be extended to identify plant species from three-dimensional (3D)
flowers, as flowers are 3D in nature.
The use of Radial Chebyshev moments for feature extraction proved to provide good
results. However, its implementation involves recursive computation and is generally slow
for higher orders and repetitions of the moment values. Developing an alternative, more
efficient algorithm is an area of future work.
References
[1] James S. Cope, David Corney, Jonathan Y. Clark, Paolo Remagnino and Paul Wilkin,
"Plant species identification using digital morphometrics: A review," Expert Systems
with Applications, vol. 39, pp. 7562-7573, 2012.
[2] James Cullen, Practical Plant Identification, Cambridge University Press, 2006, pp. 1.
[3] Scott Bissette and David Lane, "Identification of Trees," in Common Forest Trees of
North Carolina, 21st ed., North Carolina Department of Agriculture and Consumer
Services, North Carolina, USA, 2015, ch. 1, pp. 1.
[4] Key to Nature, “Types of Identification Keys”, [online]. Available:
http://www.keytonature.eu/handbook/Types_of_identification_keys. [Accessed: 24-
Dec- 2017].
[5] Naiara Aginako et al., "Identification of plant species on large botanical image datasets,"
in Proc. of 1st Int. Workshop on Environmental Multimedia Retrieval co-located with
ACM Int. Conf. on Multimedia Retrieval (ICMR 2014), Glasgow, UK, April 1, 2014.
[6] Addis Ababa University, [Online] Available: http://www.aau.edu.et/cns/department-
of-plant-biology-and-biodiveristy-management/facilities-of-pb/.
[7] Tesfaye Awas (PhD), “Endemic plants of Ethiopia”, Institute of Biodiversity
Conservation, P. O. Box 30726, Addis Ababa, Ethiopia.
[8] Stephen Gang Wu, Forrest Sheng Bao, Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang
and Chiao-Liang Shiang, "A Leaf Recognition Algorithm for Plant Classification Using
Probabilistic Neural Network," IEEE 7th International Symposium on Signal
Processing and Information Technology, Dec. 2007, Cairo, Egypt.
[9] Scikit-learn, "metrics", [Online]. Available: http://scikit-learn.org/stable/modules/
generated/sklearn.metrics.precision_recall_fscore_support.html. [Accessed: 15-Sep-2018].
[10] B. S. Harish, Aditi Hedge, Om Priya Venkatesh, D. G. Spoorthy and D. Sushma,
"Classification of Plant Leaves Using Morphological Features and Zernike
Moments," in International Conference on Advances in Computing, Communications
and Informatics, pp. 1827-1831, 2013.
[11] Botanical-Online, “The importance of plants”, [Online]. Available:
https://www.botanical-online.com/theimportanceofplants.htm. [Accessed: 24- Dec-
2017].
[12] Education with Fun, "Parts of Plants", [Online]. Available:
http://educationwithfun.com/course/view.php?id=18&section=2. [Accessed: 24-Dec-2017].
[13] Plant facts glossary, “Leave variation”, [Online]. Available:
https://plantfacts.osu.edu/resources/hcs300/glossary/glossary.htm. [Accessed: 24-
Dec- 2017].
[14] Encyclopedia Britannica,” Summary of Leaf variation”, [Online]. Available:
https://www.britannica.com/science/simple-leaf/media/545336/374. [Accessed: 24-
Dec- 2017].
[15] 6BC Botanical Garden, “Glossary of leaf morphology”, [Online]. Available:
https://www.6bcgarden.org/glossary-of-leaf-morphology.html. [Accessed: 24- Dec-
2017].
[16] Gardening know how,” Leaf Identification – Learn about Different Leaf Types in
Plants”, [Online]. Available: https://www.gardeningknowhow.com/garden-how-
to/info/different-leaf-types-in-plants.htm. [Accessed: 24- Dec- 2017].
[17] Colombia University Press, “Plant Taxonomy”, [Online]. Available:
https://cup.columbia.edu/book/plant-taxonomy/9780231147125. [Accessed: 24-
Dec- 2017].
[18] Olive Oil Source, “Olive Classification”, [Online]. Available:
https://www.oliveoilsource.com/page/olive-classification. [Accessed: 24- Dec-
2017].
[19] Marko Lukic, Eva Tuba and Milan Tuba, “Leaf Recognition Algorithm using
Support Vector Machine with Hu Moments and Local Binary Patterns, “ in
Proceedings of IEEE 15th International Symposium on Applied Machine
Intelligence and Informatics, pp. 485 – 490, 2017.
[20] V. R. Patil and R. R. Manza, "A Method of Feature Extraction from Leaf
Architecture", in International Journal of Advanced Research in Computer Science
and Software Engineering, Volume 5, Issue 7, July 2015 ISSN: 2277 128X
[21] B. Chitradevi and P.Srimathi, “An Overview on Image Processing Techniques”, in
International Journal of Innovative Research in Computer and Communication
Engineering Vol. 2, Issue 11, November 2014.
[22] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing, 2nd Edition",
Prentice Hall, pp1-2,567-568
[23] K. Padmavathi and K. Thangadurai, "Implementation of RGB and Grayscale
Images in Plant Leaves Disease Detection – Comparative Study", Indian Journal of
Science and Technology, Vol. 9(6), DOI: 10.17485/ijst/2016/v9i6/77739, February
2016.
[24] Sapna Sharma and Dr.Chitvan Gupta, “Recognition of Plant Species based on leaf
images using Multilayer Feed Forward Neural Network”, in International Journal of
Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163 Issue 6,
Volume 2, June 2015.
[25] S.S. Bedi and Rati Khandelwal, "Various Image Enhancement Techniques - A
Critical Review", International Journal of Advanced Research in Computer and
Communication Engineering Vol. 2, Issue 3, March 2013
[26] Raman Maini and Himanshu Aggarwal, "A Comprehensive Review of Image
Enhancement Techniques", Journal of Computing, Vol. 2, Issue 3, March2010.
[27] Frank Shih,"Image Processing and Pattern Recognition", The Institute of Electrical
and Electronics Engineers, Inc., 2010, pp 52-55
[28] Heba F.Eid, "Performance Improvement of Plant Identification Model based on PSO
Segmentation ",International Journal of Intelligent Systems and Applications, 2016,
2, 53-58,DOI: 10.5815/ijisa.2016.02.07
[29] Shilpa Kamdi and R.K.Krishna,"Image Segmentation and Region Growing
Algorithm" in International Journal of Computer Technology and Electronics
Engineering (IJCTEE) Volume 2, Issue 1, February 2012.
[30] K.Bhargavi and S. Jyothi, "A Survey on Threshold Based Segmentation Technique in
Image Processing", in International Journal of Innovative Research and Development,
Vol. 12(12), Nov. 2014.
[31] B.Sathya and R.Manavalan, "Image Segmentation by Clustering Methods:
Performance Analysis", International Journal of Computer Applications (0975 –
8887), Volume 29– No.11, September 2011
[32] Muthukrishnan.R and M.Radha, "Edge Detection Techniques for Image
Segmentation", in International Journal of Computer Science & Information
Technology (IJCSIT) Vol 3, No 6, Dec 2011.
[33] N. Valliammal, "Computer-Aided Plant Identification Through Leaf Recognition
Using Enhanced Image Processing And Machine Learning Algorithms", PhD in
Computer Science, Avinashilingam Institute for Home science and Higher Education
for Women, Coimbatore – 641 043, October, 2013.
[34] Adrian Rosebrock, https://www.pyimagesearch.com/2014/03/03/charizard-explains-
describe-quantify-image-using-feature-vectors/
[35] Jan Flusser, Tomáš Suk and Barbara Zitová, Moments and Moment Invariants in
Pattern Recognition © 2009 John Wiley & Sons, Ltd. ISBN: 978-0-470-69987-4
[36] R. Athilakshmi and Dr. Amitabh Wahi, "An Efficient Method for Shape based Object
Classification using Radial Chebyshev Moment on Square Transform", in Australian
Journal of Basic and Applied Sciences, 8(13) August 2014, Pages: 51-60
[37] Hamid Reza Boveiri,"On Pattern Classification Using Statistical Moments",
International Journal of Signal Processing, Image Processing and Pattern Recognition
Vol. 3, No. 4, December, 2010
[38] Anant Bhardwaj and Manpreet Kaur," A review on plant recognition and classification
techniques using leaf images", International Journal of Engineering Trends and
Technology- Vol.4, Issue 2,2013
[39] Pallavi P and V.S Veena Devi, "Leaf Recognition Based on Feature Extraction and
Zernike Moments", International Journal of Innovative Research in Computer and
Communication Engineering Vol.2, Special Issue 2, May 2014
[40] Pouya Bolourchi, Hasan Demirel and Sener Uysal, "Target recognition in SAR images
using radial Chebyshev moments", SIViP (2017) 11:1033–1040,
DOI: 10.1007/s11760-017-1054-2, 21 January 2017.
[41] Balasubramanian Raman, Sanjeev Kumar, Partha Pratim Roy and Debashis, "Leaf
Identification Using Shape and Texture Features," in Advances in Intelligent Systems
and Computing, Proceedings of International Conference on Computer Vision and
Image Processing, Volume 2, Springer, 2016, pp. 531-53.
[42] Anup S Vibhute and Ismail Saheb Bagalkote, "Identification of Grape variety Plant
species using image processing ", Avishkar – Solapur University Research Journal,
Vol. 3, 2014
[43] Arun Kumar, Vinod Patidar, Deepak Khazanchi and Poonam Saini, "Role of
Feature Selection on Leaf Image Classification", Journal of Data Analysis and
Information Processing, 2015, 3, 175-183. Published online November 2015 in
SciRes. http://www.scirp.org/journal/jdaip
http://dx.doi.org/10.4236/jdaip.2015.34018
[44] Urszula Marmol, "Use of Gabor Filters for Texture Classification of Airborne images
and Lidar Data",Archives of Photogrammetry, Cartography and Remote Sensing, Vol.
22, 2011, pp. 325-336, ISSN 2083-2214
[45] Jyotismita Chaki and Ranjan Parekh, "Plant Leaf Recognition using Gabor Filter",
International Journal of Computer Applications (0975 – 8887), Volume 56– No.10,
October 2012.
[46] Alaa Tharwat, Tarek Gaber, Yasser M. Awad, Nilanjan Dey, Vaclav Snasel and Aboul
Ella Hassanien, "Plants Identification using Feature Fusion Technique and Bagging
Classifier", ResearchGate Conference Paper, November 2015.
[47] Telgaonkar Archana H. and Deshmukh Sachin "Dimensionality Reduction and
Classification through PCA and LDA ",International Journal of Computer
Applications (0975 – 8887) Volume 122 – No.17, July 2015
[48] Minggang Du and Xianfeng Wang, Linear Discriminant Analysis and Its Application
in Plant Classification,2011 Fourth International Conference on Information and
Computing
[49] Kanika Kalra, Anil Kumar Goswami and Rhythm Gupta, "A comparative Study of
Supervised Image Classification Algorithms for Satellite Images ", International
Journal of Electrical, Electronics and Data Communication, ISSN: 2320-2084
Volume-1, Issue-10, Dec, 2013.
[50] Prof. Meeta Kumar, Mrunali Kamble, Shubhada Pawar, Prajakta Patil and Neha
Bonde,"Survey on Techniques for Plant Leaf Classification", International Journal of
Modern Engineering Research (IJMER),Vol.1, Issue.2, pp-538-544 ISSN: 2249-6645
[51] Jason Brownlee, Ph.D. [Online], Available: https://machinelearningmastery.com/
supervised -and-unsupervised-machine-learning-algorithms/
[52] Qingmao Zeng, Tonglin Zhu, Xueying Zhuang, Mingxuan Zheng and Yubin Guo,
"Using the periodic wavelet descriptor of plant leaf to identify plant species," Springer
Science and Business Media, 2015, New York.
[53] Adil Salman, Ashish Semwal,Upendra Bhatt and V. M Thakkar, "Leaf Classification
and Identification using Canny Edge Detector and SVM Classifier," in International
Conference on Inventive Systems and Control, pp.1-4,2017.
[54] Marko Lukic, Eva Tuba and Milan Tuba, “Leaf Recognition Algorithm using Support
Vector Machine with Hu Moments and Local Binary Patterns, “ in Proceedings of
IEEE 15th International Symposium on Applied Machine Intelligence and
Informatics, pp. 485 – 490, 2017.
[55] Zalikha Zulkifli, Puteh Saad and Itaza Afiani Mohtar, “Plant Leaf Identification using
Moment Invariants & General Regression Neural Network,” in 11th International
Conference on Hybrid Intelligent Systems (HIS), pp.430-435, 2011.
[56] M. E. Celebi and Y. A. Aslandogan, “A comparative study of three moment-based
shape descriptors,” in Proceedings of IEEE International Conference on Information
Technology: Coding and Computing, Vol. 2, No. 1, pp. 788 – 793, April 4-6,2005.
[57] R Rizal Isnanto, Ajub Ajulian Zahra, and Patricia Julietta, “Pattern Recognition on
Herbs Leaves Using Region-Based Invariants Feature Extraction,” in Proceedings of
3rd International Conference on Information Technology, Computer, and Electrical
Engineering, pp. 455 – 459,Oct 19-21, 2016.
[58] Gunjan Mukherjee, Arpitam Chatterjee and Bipan Tudu, "Study on the potential of
combined GLCM features towards medicinal plant classification," in 2nd
International Conference on Control, Instrumentation, Energy & Communication
(CIEC), 2016.
[59] Patil and Bhagat, "Plants Identification by Leaf Shape using GLCM, Gabor Wavelets
and PCA," in International Journal of Engineering Trends and Technology (IJETT) –
Volume 37 Number 3- July 2016
[60] Anusha Rao and Dr. S.B. Kulkarni, “An Improved Technique of Plant Leaf
Classification Using Hybrid Feature Modeling,” in Proceedings of IEEE International
Conference on Innovative Mechanisms for Industry Applications, pp. 5-9, 21-23 Feb.
2017.
[61] S. Wu, F. Bao, E. Xu, Y. Wang, Y. Chang, and Q. Xiang, “A Leaf Recognition
Algorithm for Plant Classification Using Probabilistic Neural Network,” IEEE 7th
International Symposium on Signal Processing and Information Technology,
December 2007.
[62] Alaa Tharwat, Tarek Gaber, Yasser M. Awad, Nilanjan Dey, Vaclav Snasel, and
Aboul Ella Hassanien, “Plants Identification using Feature Fusion Technique and
Bagging Classifier,” in Conference Paper · November 2015
[63] Ji-Xiang Du and Chuan-Min Zhai, “Plant Species Recognition Based on Radial Basis
Probabilistic Neural Networks Ensemble Classifier,” in Advanced Intelligent
Computing Theories and Applications. Vol 6216, Springer, Berlin, Heidelberg, 2010.
[64] Open Source Computer Vision,” Otsu’s Binarization”, [Online]. Available:
https://docs.opencv.org/3.4.0/d7/d4d/tutorial_py_thresholding.html, [Accessed: 24-
Jul- 2018].
[65] Wikipedia, “Otsu's method”, [Online]. Available:
https://en.wikipedia.org/wiki/Otsu%27s_method. [Accessed: 28- Jul- 2018].
[66] Alexander Mordvintsev and Abid K., "OpenCV-Python Tutorials Documentation,
Release 1", February 28, 2017, pp. 87-98.
[67] Avinash Navlani, “Understanding Random Forests Classifiers in Python”, May 16th,
2018, [Online]. Available: https://www.datacamp.com/community/tutorials/random-
forests-classifier-python. [Accessed: 24- Aug- 2018].
[68] Wikipedia, “Decision Tree Learning”, [Online]. Available:
https://en.wikipedia.org/wiki/Decision_tree_learning. [Accessed: 25- Aug- 2018].
[69] A. Riaz, S. Farhan, M. A. Fahiem and H. Tauseef, "An Ensemble Classifier based
Leaf Recognition Approach for Plant Species Classification using Leaf Texture,
Morphology and Shape", Department of Computer Science, Lahore College for
Women University, Lahore, Pakistan.
[70] R. Putri Ayu Pramesti, Yeni Herdiyeni and Anto Satriyo Nugroho, "Weighted
Ensemble Classifier for Plant Leaf Identification", TELKOMNIKA, Vol. 16, No. 3,
June 2018, pp. 1386-1393, ISSN: 1693-6930, accredited A by DIKTI, Decree No:
58/DIKTI/Kep/2013, DOI: 10.12928/TELKOMNIKA.v16i3.7615.
[71] Nikita Joshi and Shweta Srivastava, "Improving Classification Accuracy Using
Ensemble Learning Technique (Using Different Decision Trees)", IJCSMC, Vol. 3,
Issue 5, May 2014, pp. 727-732.
[72] Savan Patel, “Random Forest Classifier”, May 18, 2017, [Online]. Available:
https://medium.com/machine-learning-101/chapter-5-random-forest-classifier-
56dc7425c3e1. [Accessed: 25- Aug- 2018].
[73] Trevor Hastie, Robert Tibshirani and Jerome Friedman, "Random Forests," in The
Elements of Statistical Learning, 2nd ed., 2008, pp. 601-603. [Online]. Available:
http://statweb.stanford.edu/~tibs/book/chap17. [Accessed: 25-Aug-2018].
[74] Pedro F. B. Silva, André R. S. Marçal and Rubim Almeida da Silva, "Leaf Data Set",
February 2014.
[75] Open Computer Vision, "OpenCV-Python", [Online]. Available:
https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_setup/py_intro/py_intro.html#intro.
[Accessed: 5-Sep-2018].
[76] The pandas project, "Python Data Analysis Library", [Online]. Available:
https://pandas.pydata.org/. [Accessed: 5-Aug-2018].
[77] NumPy developers, "NumPy", [Online]. Available: http://www.numpy.org/.
[Accessed: 5-Sep-2018].
[78] Scikit-learn, "Scikit-learn: machine learning in Python", [Online]. Available:
http://scikit-learn.org/stable/. [Accessed: 15-Sep-2018].
Declaration
I, the undersigned, declare that this thesis is my original work and has not been presented
for a degree in any other university, and that all sources of material used for the thesis have
been duly acknowledged.
Declared by:
Name: _____________________________________
Signature: __________________________________
Date: ______________________________________
Confirmed by advisor:
Name: _____________________________________
Signature: __________________________________
Date: ______________________________________