
ADDIS ABABA UNIVERSITY

COLLEGE OF NATURAL SCIENCES

Automatic Plant Species Identification Using Image


Processing Techniques

Dejene Tsegaye Ayane

A Thesis Submitted to the Department of Computer Science in


Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Science

Addis Ababa, Ethiopia


October, 2018
ADDIS ABABA UNIVERSITY
COLLEGE OF NATURAL SCIENCES
Dejene Tsegaye Ayane

Advisor: Yaregal Assabie (PhD)

This is to certify that the thesis prepared by Dejene Tsegaye Ayane, titled: Automatic Plant
Species Identification Using Image Processing Techniques and submitted in partial
fulfillment of the requirements for the Degree of Master of Science in Computer Science
complies with the regulations of the University and meets the accepted standards with respect
to originality and quality.

Signed by the Examining Committee:

Name Signature Date

Advisor: Yaregal Assabie (PhD)

Examiner:

Examiner:
ABSTRACT
Plants play an essential role for all living beings on Earth. They form a fundamental
part of life, providing us with breathable oxygen, food, fuel, medicine and more. Plants
also help to regulate the climate and provide habitats and food for insects and other
animals. However, due to lack of awareness and environmental deterioration, many plants
are on the verge of extinction. Understanding plant behavior and ecology is therefore
vital for human beings and the entire planet.

Plants possess unique features in their leaves that distinguish them from one another.
Taxonomists use these unique features to classify and identify plant species. However,
there is a shortage of such skilled subject matter experts, as well as a limit on
financial resources.

Several leaf-image-based plant species identification methods have been proposed to
address the plant identification problem. However, most methods are inaccurate, and the
invariant moments used for leaf shape feature extraction are inadequate. Hu moments fail
when leaves from different species have very similar shapes, and the computation of
Zernike moments involves a discrete approximation of a continuous integral term, which
results in loss of information. Hence, it is important to look for an improved method of
plant species classification and identification using image processing techniques.

In this work, a new method for the classification and identification of plant species is
proposed, based on combined leaf shape and texture features and a class of ensemble
methods called Random Forest. Morphological features and Radial Chebyshev moments are
extracted from the leaf shape, and Gabor filter responses are extracted from the leaf
texture. These three feature sets are fused, and the most important features are selected
to form the feature set used to train the Random Forest classifier.

The Random Forest was trained with 1907 sample leaves of 32 different plant species taken
from the Flavia dataset. The proposed approach achieves 97% accuracy using the Random
Forest classifier.

Keywords: Plant Species Identification, Morphological Features, Radial Chebyshev
Moments, Gabor Filter, Random Forest
ACKNOWLEDGEMENTS
I would like to express my greatest appreciation to my advisor, Yaregal Assabie (PhD),
for his patient guidance throughout this thesis and the time he spent on consultations.
His suggestions, ideas and comments were invaluable.

I would also like to thank my classmates and friends who provided valuable assistance and
support.

Finally, I am deeply grateful to my family for their endless support and encouragement.
TABLE OF CONTENTS

LIST OF TABLES ............................................................................................................ V

LIST OF FIGURES ......................................................................................................... VI

LIST OF ALGORITHMS .............................................................................................. VII

ACRONYM AND ABBREVIATION ........................................................................ VIII

CHAPTER ONE: INTRODUCTION ...................................................................................9

1.1 Background ................................................................................................................9

1.2 Motivation ................................................................................................................10

1.3 Statement of the Problem .........................................................................................10

1.4 Objective ..................................................................................................................11

1.5 Methods....................................................................................................................12

1.6 Scope and limitation ................................................................................................ 12

1.7 Application of Results.............................................................................................. 13

1.8 Organization of the Rest of the Thesis .....................................................................13

CHAPTER TWO: LITERATURE REVIEW .....................................................................14

2.1 Introduction ..............................................................................................................14

2.2 Overview of plant species ........................................................................................14

2.2.1 Plant Species Identification ...........................................................................18

2.2.2 Digital Morphological features ......................................................................19

2.3 Digital Image Processing .........................................................................................20

2.4 Plant species identification model and techniques...................................................42

CHAPTER THREE: RELATED WORK ................................................................................43

3.1 Introduction ..............................................................................................................43

3.2 Classification based on morphological feature ........................................................43

3.3 Classification based on moments ...............................................................................44


3.4 Classification based on texture feature ......................................................................45

3.5 Classification based on fused feature .........................................................................45

3.6 Summary ..................................................................................................................47

CHAPTER FOUR: DESIGN OF PLANT SPECIES IDENTIFICATION ........................48

4.1 Introduction ..............................................................................................................48

4.2 System Architecture .................................................................................................48

4.3 Leaf image preprocessing ........................................................................................49

4.4 Leaf image segmentation .........................................................................................52

4.5 Features Extraction ..................................................................................................53

4.5.1 Radial Chebyshev Moments ..........................................................................54

4.5.2 Morphological Feature ...................................................................................56

4.5.3 Gabor Filter ....................................................................................................57

4.5.4 Features Fusion .............................................................................................. 59

4.5.5 Feature selection ............................................................................................60

4.6 Learning ...................................................................................................................60

4.7 Leaf Identification ....................................................................................................62

CHAPTER FIVE: EXPERIMENTATION .........................................................................63

5.1. Introduction ..............................................................................................................63

5.2. Data collection .........................................................................................................63

5.3. Implementation ........................................................................................................64

5.3.1. Leaf preprocessing and segmentation ......................................................................64

5.3.2. Feature extraction.....................................................................................................66

5.3.3. Training model .........................................................................................................67

5.4. Evaluation ................................................................................................................68

5.4.1. Evaluation Techniques .............................................................................................68

5.4.2. Test Result ...............................................................................................................69

5.5. Discussion ................................................................................................................76

CHAPTER SIX: CONCLUSION AND FUTURE WORK .............................................77

6.1 Conclusion ...............................................................................................................77

6.2 Contribution of the thesis work................................................................................77

6.3 Recommendation .....................................................................................................78

References .........................................................................................................................79

LIST OF TABLES

Table 2.1: Classification of olive according to classical taxonomy .................................... 18

Table 4.1: Parameters Used for Gabor feature extraction ................................................... 57

Table 5.1: Test Result using various features ...................................................................... 69

Table 5.2: Test result for training model using morphological features ............................. 70

Table 5.3: Test result for training model using Chebyshev moments ................................. 71

Table 5.4: Test result for training model using Gabor texture features ............................... 72

Table 5.5: Test result for training model using fused features ............................................ 73

Table 5.6: Test result of training model using selected features ......................................... 75

LIST OF FIGURES
Figure 2.1: Different parts of plant as illustrated in [12] ..................................................... 15
Figure 2.2:Simple leaf parts ................................................................................................ 16

Figure 2.3: Compound leaf .................................................................................................. 16

Figure 2.4: Leaf margin as illustrated in [13] ...................................................................... 16

Figure 2.5: Arrangement of leaf as depicted in [13]............................................................ 16

Figure 2.6: Summary of Leaf variation as illustrated in [14] .............................................. 17

Figure 2.7: Relationship between physiological width and height of plant leaf ................. 19

Figure 2.8: An example of applying the 3x3 averaging kernel ........................................... 24

Figure 2.9: Sobel 3 x 3 masks in x and y direction ............................................................. 27

Figure 2.10: Laplacian 3 x 3 masks in x and y direction ..................................................... 27

Figure 2.11: Roberts 2 x 2 masks in x and y direction ........................................................ 28

Figure 2.12: Prewitt 3 x 3 masks in x and y direction ......................................................... 28

Figure 2.13: Block diagram of classification system as illustrated in [10] ......................... 42

Figure 4.1: Architecture of Automatic Plant Species Identification ................................... 49

Figure 4.2: Original RGB leaf image .................................................................................. 50

Figure 4.3: Grayscale leaf image after applying weighted averaging method .................... 50

Figure 4.4: Grayscale leaf image ......................................................................................... 51

Figure 4.5: Binary leaf image obtained after applying Otsu thresholding method ............. 51

Figure 4.6: Plant leaf image contour ................................................................................... 51

Figure 4.7: Gabor filter output for different orientations and frequency values ................. 58

Figure 4.8: Sample Leaf image before Gabor filter is applied ............................................ 58

Figure 4.9: Leaf images after application of Gabor filter .................................................... 59

Figure 4.10: How Random Forest works as illustrated in [67] ........................................... 62

Figure 5.1: Sample Leaf images .......................................................................................... 63


LIST OF ALGORITHMS
Algorithm 4.1: Otsu Algorithm ........................................................................... 53

Algorithm 4.2: Radial Chebyshev Moments ....................................................................... 55

Algorithm 4.3: Morphological features ............................................................................... 56

ACRONYM AND ABBREVIATION

IDE Integrated Development Environment

Spyder Scientific PYthon Development EnviRonment

OpenCV Open Source Computer Vision

RGB Red, Green, Blue

JPEG Joint Photographic Experts Group

k-NN K-Nearest Neighbor Classifier

PNN Probabilistic Neural Network

SVM Support Vector Machine

RCM Radial Chebyshev Moments

MSE Mean Square Error

CSV Comma Separated Value

SciPy Scientific Python


CHAPTER ONE: INTRODUCTION
1.1 Background
Plants form a fundamental part of life on Earth, providing us with breathable oxygen,
food, fuel, medicine and more. Plants also help to regulate the climate, provide habitats
and food for insects and other animals and provide a natural way to regulate flooding. A
good understanding of plants is necessary to improve agricultural productivity and
sustainability, to discover new pharmaceuticals, to plan for and mitigate the worst effects
of climate change, and to come to a better understanding of life as a whole. It is therefore
becoming increasingly important to identify new or rare plant species [1].

Plant species identification is the finding of the correct name for an unknown plant in
order to access all the information so far available about that plant. Plants have many
characteristics that can be used to identify particular species: overall size and shape; the
color, size and shape of leaves; the texture, color and shape of twigs and buds; and the
color and texture of bark, fruit and flowers. Growing range is also useful in identifying
plant species. Most people use several of these characteristics to identify a specific plant
[2, 3].

There are a number of methods for plant species identification, including the use of
pictures and illustrations, identification keys in botanical books and floras, and
consultation of experts [4]. The image-based identification of different plant species,
collected by both botanical scientists and expert users, has become a key study in plant
biology [5].

Herbaria are also used for identifying plant species. One such herbarium is the National
Herbarium of Ethiopia, the sole laboratory for researchers to identify and study the
plant biodiversity of Ethiopia. The herbarium houses over 80,000 plant specimens
collected from all over Ethiopia and Eritrea (and also from Somalia, Kenya, Tanzania,
Uganda and other countries). In addition to housing well-authenticated plant specimens
in two rooms, the Herbarium has copies of eight volumes (ten books)
of the Flora of Ethiopia and Eritrea [6].

Although there are well-known methods of plant species identification in plant science,
these methods are tedious and exhausting, which calls for an automatic species
identification method. Hence, this research addresses the plant species identification
problem using image processing techniques.

1.2 Motivation
Plant species identification is a tedious task. There are huge numbers of specimens in
the National Herbarium of Ethiopia which serve for plant species identification. The
books (the Flora of Ethiopia; ten books in eight volumes) are also the key reference for
plant species identification. However, researchers spend considerable time searching for
and identifying the specimens they bring from different places at the herbarium, and
referring to the books to identify a plant is cumbersome and time-consuming. To greatly
speed up the process of plant species identification, there should be a solution that
eases researchers' identification burden and saves their precious time. One such
solution is the use of leaf image based plant species identification.

A leaf image based plant species identification method greatly simplifies the
identification process by allowing a user to search through species image data using
models that match images of newly collected specimens with images of those previously
discovered and described.

1.3 Statement of the Problem


The traditional approach of identifying plant species and their relationships is to train
taxonomists who can examine specimens and assign taxonomic labels to them. However,
there is a shortage of such skilled subject matter experts, as well as a limit on financial
resources. Furthermore, an expert on one species or family may be unfamiliar with
another. This has led to an increasing interest in automating the process of plant species
identification.
Botanists collect specimens of plants and preserve them in herbaria. Herbarium
collections can be seen as major, structured repositories of expert knowledge. For
example, there are huge numbers of specimen (over 80,000) collections at the National
herbarium of Ethiopia. However, these collections are not easily accessible as there are no
visual means of specimen identification.

Other significant sources of knowledge include flora, taxonomic keys and monographs.
Identifying the name of a plant using these sources of knowledge is time consuming and
requires an experienced taxonomist. Moreover, there are a number of plant species in a
given family, which complicates the identification of species. For example, there are
twenty-two endemic plants of Ethiopia within the same family, Euphorbiaceae [7].

Therefore, there is a need to develop a robust automated species identification system
that would allow people with only limited botanical training and expertise to carry out
plant species identification.

1.4 Objective
General objectives
 The general objective of this research is to develop an automatic plant species
identification system using image processing techniques.

Specific objectives
The specific objectives of the research are to:

 Review relevant literature and related works to understand the domain;


 Study the physical characteristics that are used for plant species identification;
 Identify tools and technologies that will be used for designing and modeling
automatic plant species identification system;
 Design a model for automatic plant species identification;
 Develop a prototype; and
 Evaluate the performance of the developed model.

1.5 Methods
The following methods will be applied to achieve the objectives of this research work.
Literature Review
Different literature on image processing techniques and identification of plant species
will be reviewed. Articles, thesis and conference papers that are related to the research
topic will also be examined.

Data collection
The publicly available Flavia [8] dataset will be used for modeling the automatic plant
species identification system. Attributes (local plant species name, scientific name, etc.)
of each plant species will also be captured as additional data for easy description of each
plant species.

Evaluation
The model designed for identification of plant species will be evaluated for its accuracy,
precision, recall, f-score and support [9].
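As a concrete reference, these metrics can be computed directly from lists of true and predicted labels. The sketch below is illustrative pure Python, not the thesis implementation; libraries such as scikit-learn provide the same figures via its classification report.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall, F1-score and support for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = Counter(y_true)[cls]  # number of true samples of this class
    return precision, recall, f1, support
```

For example, with true labels ["a", "a", "b", "b"] and predictions ["a", "b", "b", "b"], class "b" has precision 2/3, recall 1.0 and F1 0.8, with support 2.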

1.6 Scope and limitation


The focus of this work is primarily on modeling the characteristics of plant leaf shape
for easy identification of plant species using Radial Chebyshev moments, digital
morphological features and texture features.

This work does not include:


 Use of color features for the identification of plant species.
 Use of other parts of a plant such as flowers, fruits, stems, roots, etc. for the
identification of plant species.
 Use of the whole image of a plant for identification of the plant species.
 Working with high-noise images, with complex backgrounds or with bad
resolution images.

1.7 Application of Results
Results of this method can generally be applied in areas where plant species studies are
being conducted. The results of this research benefit taxonomists, botanists,
agriculturalists, environmentalists, etc. Institutions that conduct research on plant
species are also among the candidate beneficiaries of this study.

The application of such a method greatly simplifies the process of plant species
identification. Some of its advantages are:
 It provides reliable, fast and convenient plant species identification;
 It reduces the time that researchers have to spend in the field and the
herbarium to identify and classify plant species; and
 It automatically identifies and classifies plant species.

1.8 Organization of the Rest of the Thesis


The rest of this thesis is organized as follows. Chapter two discusses plant species
identification, digital image processing and related subject areas as a literature
review. Chapter three is devoted to related work on plant species identification and
other topics related to plant species classification. Chapter four gives a detailed
description of the architecture and design of our system; the main components of the
system and their functional operation are discussed in this chapter. Chapter five
presents the implementation of the proposed system architecture and the experimental
results. Chapter six concludes the thesis and recommends future work, showing some
research directions that can be pursued to improve automatic plant species
identification.

CHAPTER TWO: LITERATURE REVIEW

2.1 Introduction
This chapter is concerned with plant identification, images and image processing
techniques. The main goal of this research is to build a plant species identification model
that classifies leaves and assigns the names of the plant that they belong to, when
supplied only with the leaf image as input. This goal is met through a series of sequential
steps, namely, image preprocessing, segmentation, extraction of features, fusion of
features, feature selection, classification and identification of plant species. Several
researchers have contributed and proposed various algorithms in each of these steps.
This chapter presents a review of some important publications made in these steps to
understand the current research status.

2.2 Overview of plant species


Plants are living organisms belonging to the plant kingdom. There are about 3,000,000
plant species that have been named and classified [10]. Life on earth depends on plants.
Plants are responsible for the presence of oxygen, which is vital for human beings. They
form the base of the human food chain, and humans directly or indirectly take their food
from plants. Plants are used to prevent soil erosion and to provide building materials.
They also play a vital role in medicine, where more than one-quarter of all prescribed
drugs come directly from plant derivatives [11].

A typical plant body consists of different parts, as illustrated in Figure 2.1.

Figure 2.1: Different parts of plant as illustrated in [12]

The plant body consists of two major organ systems, namely the shoot system and the root
system. The shoot system exists above the ground and includes organs such as buds,
leaves, fruits, flowers and seeds.

Leaf structure: Leaves of most plants have a flat structure called the lamina or blade,
but not all leaves are flat; some are cylindrical. Leaves can be simple, with a single
leaf blade, or compound, with several leaflets. Figure 2.2 and Figure 2.3 show simple
and compound leaves respectively [13].

Figure 2.2: Simple leaf parts          Figure 2.3: Compound leaf

Edges (or margins) of a leaf describe the structure of the leaf at its boundary. The
edges of leaves may be smooth, toothed, lobed, incised, or wavy. Figure 2.4 shows the
different leaf margins.

Figure 2.4: Leaf margin as illustrated in [13]

Arrangement of leaves: this refers to how leaves grow on the stem. Some leaves grow
opposite, some alternate, some in rosettes and others in whorls. The different leaf
arrangements are illustrated in Figure 2.5.

Figure 2.5: Arrangement of leaf as depicted in [13]

Venation refers to the pattern of veins imprinted in the leaf surface. Venation may be
parallel, dichotomous (forming a "Y"), palmate (emanating from a central point), or
pinnate (where the veins are arrayed from the midrib).

The texture of the leaf is another aspect to consider during plant leaf identification.
Leaf texture can be firm and waxy, shiny, thick, stiff, limp, etc. It is easy to identify
whether the leaf has sticky glands, prickly thorns, or fine hairs. The different leaf
types based on their form, shape, venation, margins and arrangement on the stem are
summarized in Figure 2.6 below [16].

Figure 2.6: Summary of Leaf variation as illustrated in [14]

Plant classification is the placing of known plants into groups or categories to show
relationships within each category. Scientific classification follows a system of rules
that standardizes the results and groups successive categories into a hierarchy. In this
hierarchy, kingdom is the broadest category and species is the most specific. As an
example, the classification of olive is depicted in Table 2.1 [17].

Table 2.1: Classification of olive according to classical taxonomy

2.2.1 Plant Species Identification


Plant species identification is the determination of the identity of an unknown plant
using its morphological characteristics, such as the structure of stems, roots, leaves
and flowers, followed by comparison with previously collected specimens or with the aid
of books or identification manuals. The identification process connects the specimen
with a published name. Once a plant specimen is identified, its name and properties are
known [17, 18].

Leaf shape is valuable for plant species identification. Plant organs such as flowers
and fruits are seasonal in nature, and root and stem characteristics are inconsistent.
Leaves, on the other hand, are present for several months and carry the taxonomic
identity of a plant, which is useful for plant classification. Moreover, plant leaves
are essentially two-dimensional, whereas flowers and fruits are three-dimensional and
therefore less suitable for machine processing. Hence, a plant's leaf shape is the most
discriminative feature, and classification of plants based on leaves is the quickest and
easiest way to identify a plant [10, 19].

2.2.2 Digital Morphological features
The digital morphological features generally include basic geometric features and
morphological features [20]. These features are computed from the leaf contour. The
basic geometric features of a leaf are the longest diameter, physiological length,
physiological width, leaf area and leaf perimeter, defined as follows.

 Longest diameter: the longest distance between any two points on the leaf
contour. It is denoted as D.
 Physiological length: the distance between the two terminals of a leaf. It is
denoted as LP.
 Physiological width: the longest distance orthogonal to the physiological
length (two lines are considered orthogonal if the angle between them is 90°).
It is denoted as WP.
 Leaf area: the number of pixels with binary value 1 in the smoothed leaf image.
It is denoted as A.
 Leaf perimeter: calculated by counting the number of pixels that make up the
leaf margin. It is denoted as P.

Figure 2.7: Relationship between physiological width and height of plant leaf

Based on above five basic geometric features, it is possible to define the following
commonly used morphological features.

19
1. Smooth factor: the ratio between the area of the leaf image smoothed by a
5 × 5 rectangular averaging filter and the area of the leaf image smoothed by a
2 × 2 rectangular averaging filter.
2. Rectangularity: the ratio between the product of physiological length and
physiological width and the leaf area. Thus, (LP * WP)/A.
3. Aspect ratio: the ratio between physiological length and physiological width.
Thus, LP/WP.
4. Perimeter ratio of diameter: the ratio between perimeter (P) and diameter (D).
Thus, P/D.
5. Form factor: defined as 4πA/P², where A is the area of the leaf and P is the
perimeter of the leaf margin. It measures how close the leaf shape is to a circle.
6. Narrow factor: the ratio between diameter and physiological length. Thus,
D/LP.
7. Perimeter ratio of physiological length and physiological width: the ratio
between perimeter and the sum of physiological length and physiological
width. Thus, P/(LP + WP).
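These derived features are straightforward to compute once a segmented binary leaf mask is available. The sketch below, using NumPy, approximates the physiological length and width by the bounding-box extents and the diameter by the maximum pairwise contour distance; the function name and these simplifications are illustrative, not the implementation used in this work.

```python
import numpy as np

def morphological_features(mask):
    """Compute the basic geometric and derived morphological features
    from a binary leaf mask (1 = leaf, 0 = background)."""
    ys, xs = np.nonzero(mask)
    A = len(xs)                                   # leaf area: count of foreground pixels

    # Physiological length/width approximated by the bounding-box extents
    # (a real system would measure them along the main leaf axis).
    LP = ys.max() - ys.min() + 1
    WP = xs.max() - xs.min() + 1

    # Perimeter: foreground pixels with at least one 4-connected background neighbour.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    P = int((mask & ~interior).sum())

    # Longest diameter D: maximum pairwise distance between contour points.
    pts = np.column_stack(np.nonzero(mask & ~interior)).astype(float)
    diffs = pts[:, None, :] - pts[None, :, :]
    D = np.sqrt((diffs ** 2).sum(-1)).max()

    return {
        "aspect_ratio": LP / WP,                  # LP / WP
        "rectangularity": (LP * WP) / A,          # LP*WP / A
        "form_factor": 4 * np.pi * A / P ** 2,    # 4πA / P²
        "perimeter_to_diameter": P / D,           # P / D
        "narrow_factor": D / LP,                  # D / LP
        "perimeter_ratio": P / (LP + WP),         # P / (LP + WP)
    }
```

For a filled 20 × 10 rectangle this yields an aspect ratio of 2.0 and a rectangularity of exactly 1.0, as expected.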

2.3 Digital Image Processing


Digital image processing is an interesting field that provides improved pictorial
information for human interpretation and enables processing of image data for storage,
transmission, and machine perception. Image processing techniques are used to enhance
raw images received from cameras or sensors placed on satellites, space probes and
aircraft, as well as pictures taken in normal day-to-day life, for various
applications [21].

A digital image is an array of real numbers represented by a finite number of bits. The
basic definition of image processing refers to the processing of a digital image, i.e. removing
the noise and any kind of irregularity present in the image. Noise or irregularities may
creep into the image either during its formation or during transformation. For
mathematical analysis, an image may be defined as a two-dimensional function f(x, y),
where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of
coordinates (x, y) is called the intensity or gray level of the image at that point. When x,
y, and the amplitude values of f are all finite, discrete quantities, we call the image a
digital image. Note that a digital image is composed of a finite number of
elements, each of which has a particular location and value. These elements are called
picture elements, image elements, or pixels [22].

There are different types of digital images [23]: binary, grayscale and RGB
images. A binary image is the simplest type of image and has two values, black and white
or '0' and '1'. A binary image is referred to as a 1-bit image because it takes only one
binary digit to represent each pixel.

A grayscale image is a monochrome, or one-color, image. It contains brightness
information only and no color information, so the grayscale data matrix values represent
intensities. A typical grayscale image uses 8 bits per pixel, which allows it to represent
256 (0-255) different brightness (gray) levels.

RGB Image: an RGB image does not use a color map; the image is represented by three
color component intensities: red, green, and blue. An RGB image uses the 8-bit
monochrome standard for each of the three colors and therefore has 24 bits per pixel.

2.3.1. Image Acquisition
Leaf images can be acquired using a digital camera, such as the one embedded in a
cell phone. There is no restriction on resolution or image format. Generally a scanned
or digital image, which is two dimensional in nature and in RGB format, is used for
image processing. In some cases binary and grayscale images can be used as well.
However, the image background needs to be cleaned and made single colored using
image segmentation [24].

2.3.2. Image Enhancement


Image enhancement is the modification of image by changing the pixel brightness values
to improve its visual impact. Images suffer from poor contrast and noise because of the
limitations of imaging sub systems and illumination conditions while capturing image.
Hence, it is necessary to enhance the contrast and remove the noise to increase image
quality. In image processing, image enhancement is the most important stage that
improves the quality (clarity) of images by removing blurring and noise, increasing
contrast, and revealing image details for human viewing or machine interpretation.
While image noise is a random effect that causes variation in image brightness or color
information, image contrast is the difference in visual properties making an object in
image distinguishable from other objects and the background. In general, image
enhancement techniques can be divided into three categories [25, 26].
 Spatial-domain methods that directly manipulate pixels in an image;
 Frequency-domain methods that operate on the Fourier transform or other
frequency domains of an image; and
 Combinational methods that process an image in both spatial and frequency
domains.
The commonly used image enhancement techniques include point processing
operations, the logarithmic transform, histogram equalization and smoothing filters [26].

Point Processing Operation: this is the most basic operation in image processing,
where each pixel value is replaced with a new value obtained from the old one. Point
processing operations take the form shown in equation (2.1)
g(x, y) =T[f(x, y)] (2.1)

Where, f(x, y): input image, g(x, y): output image and T is the transformation function
or point processing operation.

Logarithmic Transform: this is a spatial-domain image enhancement technique that
maps a narrow range of low gray levels into a wider range of gray levels, i.e., it
expands the values of dark pixels and compresses the values of bright pixels. It is
typically used to increase the detail (or contrast) of lower intensity values and brighten
the intensities of an image. If k is the scaling factor, the logarithmic transformation is
achieved using equation (2.2)

y =k*log (1+│x│) (2.2)

Histogram Equalization: this is a common technique for enhancing the appearance of
images. Suppose we have an image which is predominantly dark. Its histogram
would then be skewed towards the lower end of the grey scale, with all the image detail
compressed into the dark end. If we could stretch out the grey levels at the dark end to
produce a more uniformly distributed histogram, the image would become much
clearer.
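Histogram equalization can be implemented directly from the cumulative histogram. A minimal sketch for 8-bit images follows (the function name is illustrative, and a non-constant image is assumed):

```python
import numpy as np

def equalize_histogram(img):
    """Classic histogram equalization for an 8-bit grayscale image:
    remap each grey level through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # first occupied bin of the CDF
    # Map levels so the output histogram is approximately uniform
    # (assumes the image is not constant, so img.size > cdf_min).
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]
```

Applied to an image whose levels are compressed into a narrow band, the remapping stretches them across the full 0-255 range.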

Smoothing filters: smoothing filters are used to reduce noise or prepare the image for
further processing such as segmentation. There are different types of smoothing filters
[27]. A simple mean smoothing filter replaces each pixel value in an input image by the
mean (or average) value of its neighbors, including itself. This has the effect of
eliminating pixel values that are unrepresentative of their surroundings. The filter is
based around a kernel, which represents the shape and size of the neighborhood to be
sampled in the calculation. Often a 3 x 3 square kernel is used, as shown in Figure 2.8,
although larger kernels (e.g., a 5 x 5 square) can be used for more severe smoothing.

Figure 2.8: An example of applying the 3x3 averaging kernel

Gaussian filter: this method of image smoothing convolves an input image by the
Gaussian filter. The Gaussian filter will screen noise with the high spatial frequencies
and produce a smoothing effect. In two dimensions, an isotropic (i.e., circularly
symmetric) Gaussian filter function is given by equation (2.3).
G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²))     (2.3)

Median filter: this is another smoothing filter that is used to reduce noise in an image,
somewhat like a mean filter. However, it performs better than a mean filter in the sense
of preserving useful details in the image. It is especially effective for removing impulse
noise, which is characterized by bright and/or dark high-frequency features appearing
randomly over the image. Statistically, impulse noise falls well outside the peak of the
distribution of any given pixel neighborhood, so the median is well suited to detect it,
and hence to remove it by exclusion.
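A 3 × 3 median filter can be sketched in a few lines of NumPy; this illustrative version handles the image borders by replicating edge pixels:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter: replace each pixel by the median of its
    neighbourhood. Borders are handled by replicating edge pixels."""
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image, then take the median.
    stack = np.stack([padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)
```

A single impulse (e.g., one pixel at 255 in a flat region of 100) is entirely removed, since the median of its neighbourhood ignores the outlier.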

2.3.3. Image Segmentation


Image segmentation is the process of partitioning a digital image into multiple segments
to locate different objects and boundaries in the image content. Its goal is to partition an
image into multiple segments that are more meaningful to analyze. First, the digital
image is divided into two parts: background and foreground, where the foreground is the
interesting objects and the background is the rest of the image. All the pixels in the
foreground are similar with respect to a specific characteristic, such as intensity, color,
or texture. The result of image segmentation is a set of segments that collectively cover
the entire image. Image segmentation methods are classified into threshold based, region
based, cluster based and edge based image segmentation [28].

Threshold Based Image Segmentation: This method partitions an input image into
pixels of two or more values through comparison of each pixel value with a predefined
threshold value T [29]. It transforms an input image into a segmented binary image.
The three commonly known threshold algorithms are global thresholding, local
thresholding and adaptive thresholding [30].

Global thresholding: This threshold algorithm is applicable when the intensity


distribution of objects and background pixels are sufficiently distinct. In the global
threshold, a single threshold value is used in the whole image. If G(x, y) is a threshold
version of f(x, y) at some global threshold T, then the G(x,y) will be given as in
equation (2.4)
G(x, y) = 1 if f(x, y) ≥ T, and 0 otherwise     (2.4)
There are a number of global thresholding techniques. Otsu thresholding, optimal
thresholding, histogram analysis thresholding, iterative thresholding, maximum
correlation thresholding, clustering thresholding, and multispectral and
multilevel thresholding are among the global thresholding techniques.
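Otsu's method, the first of the global techniques listed above, can be sketched as a search for the threshold that maximizes the between-class variance of the two resulting pixel populations. This is a standard textbook formulation, not necessarily the exact variant used in any cited work:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's global threshold: pick T that maximizes the between-class
    variance of the two pixel populations it induces."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability for each T
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b2))

def threshold(img, T):
    """Binarize per equation (2.4): G(x, y) = 1 if f(x, y) >= T else 0."""
    return (img >= T).astype(np.uint8)
```

For a clearly bimodal image (two grey-level populations), the selected T falls between the two modes.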

Local Thresholding: A single threshold will not work well when we have uneven
illumination due to shadows or due to the direction of illumination. In local
thresholding, the idea is to partition the image into m x n sub images and then choose
a threshold for each sub image individually.

Adaptive Thresholding: In this method, different threshold values are used for
different local areas. It typically takes a grayscale or color image as input and, in
the simplest implementation, outputs a binary image representing the segmentation.
For each pixel in the image, a threshold is calculated. If the pixel value is
below the threshold it is set to the background value; otherwise it assumes the
foreground value.

Region Based Image Segmentation: The region based segmentation is partitioning of
an image into similar/homogenous areas of connected pixels through the application of
homogeneity/similarity criteria among candidate sets of pixels. Each of the pixels in a
region is similar with respect to some characteristics or computed property such as color,
intensity and/or texture [29].

Cluster Based Image Segmentation: It is to partition an image data set into a number
of disjoint groups or clusters. That is, classifying pixels in an image into different clusters
that exhibit similar features. A cluster is therefore a collection of objects which are
“similar” between them and are “dissimilar” to the objects belonging to other clusters
[31].

Edge Based Image Segmentation: it is a fundamental tool for image segmentation.


Edge is the location of pixels in the image that correspond to the boundaries of the
objects seen in the image. It is assumed that since it is a boundary of a region or an object
then it is closed and that the number of objects of interest is equal to the number of
boundaries in an image. There are several edge detection techniques in the literature for
image segmentation and the most widely used techniques are Canny edge detection,
Sobel edge detection, Laplacian of Gaussian edge detection, Roberts edge detection and
Prewitt edge detection [32].

Canny Edge Detection: The Canny edge detection is a multi-step algorithm that can
detect edges with noise suppressed at the same time. The algorithm is not very
susceptible to noise and it is superior edge detector compared to many of the newer
edge detection algorithms.

Sobel Edge Detection: This method finds edges using the Sobel approximation to
the derivative. It returns edges at those points where the gradient is highest. The
Sobel technique performs a 2-D spatial gradient measurement on an image and so
highlights regions of high spatial frequency that correspond to edges. In general it is
used to find the estimated absolute gradient magnitude at each point in an input
grayscale image.
In theory at least, the operator consists of a pair of 3x3 kernels as given in
Figure 2.9. One kernel is simply the other rotated by 90°. This is very similar to the
Roberts Cross operator.

Figure 2.9: Sobel 3 x 3 masks in x and y direction
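The Sobel masks above can be applied with an ordinary 2-D convolution, after which the gradient magnitude is |G| = √(Gx² + Gy²). The sketch below is illustrative (zero padding at the borders; function names are not from any cited library):

```python
import numpy as np

def convolve2d(img, kernel):
    """Minimal 'same'-size 2-D convolution with zero padding."""
    kh, kw = kernel.shape
    padded = np.pad(img.astype(float), ((kh // 2,), (kw // 2,)))
    flipped = kernel[::-1, ::-1]            # convolution flips the kernel
    out = np.zeros_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = (padded[y:y + kh, x:x + kw] * flipped).sum()
    return out

# Sobel masks in the x and y directions (Figure 2.9)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(img):
    gx = convolve2d(img, SOBEL_X)
    gy = convolve2d(img, SOBEL_Y)
    return np.hypot(gx, gy)                 # gradient magnitude per pixel
```

On a vertical step edge the magnitude peaks at the transition columns and is zero in the flat regions.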

Laplacian of Gaussian (LoG) edge detection: The LoG of an image f(x, y) is a


second order derivative defined by equation (2.5)
∇²f = ∂²f/∂x² + ∂²f/∂y²     (2.5)

It has two effects: it smooths the image and it computes the Laplacian, which yields
a double-edge image. Locating edges then consists of finding the zero crossings
between the double edges. It is generally used to find whether a pixel is on the dark
or light side of an edge. The digital implementation of the Laplacian function is
usually made through the mask shown in Figure 2.10.

Figure 2.10: Laplacian 3 x 3 masks in x and y direction

Roberts Edge Detection: The Roberts edge detection operator was introduced by
Lawrence Roberts (1965). It performs a simple, quick to compute, 2-D spatial gradient
measurement on an image. The method emphasizes regions of high spatial frequency,
which often correspond to edges. In its most common usage, the input to the operator
is a grayscale image, as is the output. Pixel values at every point in the output represent
the estimated absolute magnitude of the spatial gradient of the input image at that
point.

Figure 2.11: Roberts 2 x 2 masks in x and y direction

Prewitt Edge Detection: The Prewitt edge detection operator was proposed by Prewitt
in 1970. It is an appropriate way to estimate the magnitude and orientation of an edge.
While differential gradient edge detectors need a rather time-consuming calculation to
estimate the direction from the magnitudes in the x and y directions, the compass
edge detector obtains the direction directly from the kernel with the highest
response. It is limited to 8 possible directions; however, experience shows that most
direct direction estimates are not much more accurate. This gradient based edge
detector is estimated in a 3x3 neighborhood for eight directions. All eight
convolution masks are calculated, and the mask with the largest module is then
selected.

Figure 2.12: Prewitt 3 x 3 masks in x and y direction

Prewitt detection is slightly simpler to implement computationally than the Sobel
detection, but it tends to produce somewhat noisier results.

2.4. Feature Extraction
When the pre-processing and the desired level of segmentation have been achieved,
feature extraction technique is applied to the segments to obtain image features. Image
features are those items which uniquely describe an image, such as size, shape,
composition, location etc. Quantitative measurements of object features allow
classification and description of the image [21].

Feature extraction refers to taking measurements, geometric or otherwise, of possibly
segmented, meaningful regions in the image. Features are described by a set of numbers
that characterize some property of the plant or the plant's organs captured in the images.
Shape is one of the most important image features of recognizing objects by human
perception. Humans generally describe objects either by giving examples or by sketching
the shape. In computer vision, shape is the most commonly used feature for
characterizing objects [33].

An image descriptor is applied globally and extracts a single feature vector. Feature
descriptors on the other hand describe local, small regions of an image. It is possible to
get multiple feature vectors from an image with feature descriptors. A feature vector is
a list of numbers used to abstractly quantify and represent the image. Feature vectors can
be used for machine learning, building an image search engine, etc. [34]

There are a number of feature extraction techniques available for the extraction of image
features. It is essential to focus on the feature extraction phase as it has an observable
impact on the efficiency of the recognition system. The most commonly used feature
extraction techniques are discussed below.

2.4.1. Moments
Moments are scalar quantities that are used in a variety of applications to describe shape
of an object for recognition of different types of images. Image moments that are
invariant with respect to the transformations of scale, translation, and rotation find
applications in areas such as pattern recognition, object identification and template
matching [35, 36].
There are several moment based descriptors available in the literature and of these the
well-known are presented below.

A general definition of moment functions Φpq of order (p + q) of an image intensity
function f(x, y) is given as follows [20]:

Φpq = ∬ Ψpq(x, y) f(x, y) dx dy,   p, q = 0, 1, 2, 3, ...     (2.6)

where Ψpq is a continuous function of (x, y) known as the moment weighting kernel or
the basis set.
Geometric moments: The geometric moment of order (p + q) for a two-dimensional
discrete function such as an image is computed using equation (2.7). If the image has
nonzero values in a finite part of the xy plane, then moments of all orders exist for it [37].

m_pq = Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} x^p y^q f(x, y)     (2.7)

where f(x, y) is the image function and M, N are the image dimensions. Then, the
geometric central moments of order (p + q) can be computed using equation (2.8):

μ_pq = Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} (x − x̄)^p (y − ȳ)^q f(x, y)     (2.8)

where x̄ and ȳ are the center of gravity of the image, calculated using equation (2.9).
Because the image is translated to the coordinate origin while computing the central
moments, they are translation invariant.

x̄ = m10/m00,   ȳ = m01/m00     (2.9)

For a binary image, m00 = μ00 is the count of foreground pixels and is directly related
to image scale. Therefore, central moments can be made scale normalized using (2.10).

η_pq = μ_pq / m00^a,   a = (p + q)/2 + 1     (2.10)
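Equations (2.7)-(2.10) translate directly into NumPy; the helper names below are illustrative:

```python
import numpy as np

def geometric_moment(f, p, q):
    """Raw geometric moment m_pq of a 2-D image array (equation 2.7)."""
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return (x ** p * y ** q * f).sum()

def central_moment(f, p, q):
    """Central moment mu_pq, translation invariant (equation 2.8)."""
    m00 = geometric_moment(f, 0, 0)
    xc = geometric_moment(f, 1, 0) / m00     # centroid, equation (2.9)
    yc = geometric_moment(f, 0, 1) / m00
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return ((x - xc) ** p * (y - yc) ** q * f).sum()

def normalized_moment(f, p, q):
    """Scale-normalized central moment eta_pq (equation 2.10)."""
    a = (p + q) / 2 + 1
    return central_moment(f, p, q) / geometric_moment(f, 0, 0) ** a
```

As a sanity check, the same binary shape placed at two different positions yields identical central moments, and its first-order central moment is zero.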

Hu Invariant Moments: Based on normalized central moments, Hu (1962) introduced
seven nonlinear functions which are invariant with respect to object's translation, scale,
and rotation. Hu defines the following seven functions, computed form central moments
through order three, that are invariant with respect to object scale, translation and
rotation [38]:

ϕ1 = η20 + η02
ϕ2 = (η20 − η02)² + 4η11²
ϕ3 = (η30 − 3η12)² + (3η21 − η03)²
ϕ4 = (η30 + η12)² + (η21 + η03)²
ϕ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]     (2.11)
   + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
ϕ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
ϕ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
   − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]

Zernike Moments: Zernike moments are obtained from a transformed unit disk space
that allows for the extraction of shape descriptors which are invariant to rotation,
translation, and scale as well as skew and stretch, thus preserving more shape
information for the feature extraction process [39].

The Zernike moment of order n and repetition m for a continuous image function
f(x, y) is defined as:

Z_nm = ((n + 1)/π) ∫_0^{2π} ∫_0^1 f(ρ, θ) R_nm(ρ) e^{−jmθ} ρ dρ dθ     (2.12)

In the xy image plane,

Z_nm = ((n + 1)/π) ∬ f(x, y) V*_nm(ρ, θ) dx dy,   x² + y² ≤ 1

where V*_nm is the complex conjugate of the Zernike basis function. The real-valued
radial polynomial R_nm is defined as

R_nm(ρ) = Σ_{s=0}^{(n−|m|)/2} (−1)^s [(n − s)! / (s! ((n + |m|)/2 − s)! ((n − |m|)/2 − s)!)] ρ^{n−2s}

Using these equations, Zernike moments from order 2 to 10 can be computed for
extracting features.
Radial Chebyshev moment (RCM): It [40] is a discrete orthogonal moment that has
distinctive advantages over other moments for feature extraction. Unlike invariant
moments, its orthogonal basis leads to having minimum information redundancy, and its
discrete characteristics explore some benefits over Zernike moments (ZM) due to having
no numerical errors and no computational complexity owing to normalization. The radial
Chebyshev moment of order p and repetition q for an image of size N x N with m = (N/2)
+ 1 is defined in equation (2.13)

S_pq = (1/(2π ρ(p, m))) Σ_{r=0}^{m−1} Σ_{θ=0}^{2π} t_p(r) e^{−jqθ} f(r, θ)     (2.13)

where t_p(r) is an orthogonal Chebyshev polynomial basis function for an image of
size N x N:

t_0(x) = 1
t_1(x) = (2x − N + 1)/N
t_p(x) = [(2p − 1) t_1(x) t_{p−1}(x) − (p − 1)(1 − (p − 1)²/N²) t_{p−2}(x)] / p

and ρ(p, N) is the squared norm:

ρ(p, N) = N(1 − 1²/N²)(1 − 2²/N²) ... (1 − p²/N²) / (2p + 1)

with p = 0, 1, ..., N − 1 and m = (N/2) + 1.

2.4.2. Texture Features
Textures are characteristic intensity variations that typically originate from roughness of
object surfaces that are capable of representing the content of many real-world images.
In leaf based plant species identification, texture features capture the internal vein
structure of leaf image. Texture features can be extracted by using various methods.
Gray-level co-occurrence matrices (GLCMs), Gabor Filter, Histogram of Oriented
Gradients and Local binary pattern (LBP) are examples of popular methods to extract
texture features [41].

Grey-level co-occurrence matrices (GLCM) Texture: Grey-level co-occurrence
matrices (GLCM) have been successfully used for deriving texture measures from
images. This technique uses a spatial co-occurrence matrix that computes the
relationships of pixel values and uses these values to compute the second-order statistics.
The GLCM approach assumes that the texture information in an image is contained in
the overall or "average" spatial relationships between pixels of different grey levels.

For texture feature extraction, the following four measures are commonly used from the
gray level co-occurrence matrix of the gray scaled image: contrast, correlation, energy
and homogeneity [42].

Energy: one approach to generating texture features is to use local kernels to detect
various types of texture. After convolution with the specified kernel, the texture
energy measure (TEM) is computed by summing the absolute values in a local
neighborhood:

L_e = Σ_{i=1}^{m} Σ_{j=1}^{n} |C(i, j)|     (2.14)

If n kernels are applied, the result is an n-dimensional feature vector at each pixel of the
image being analyzed.

Homogeneity: a homogeneous image will result in a co-occurrence matrix with a
combination of high and low P[i, j]'s.

C_h = Σ_i Σ_j P_d[i, j] / (1 + |i − j|)     (2.15)

When the range of gray levels is small, the P[i, j]'s will tend to be clustered around the
main diagonal. A heterogeneous image will result in an even spread of P[i, j]'s.

Contrast: contrast is a measure of the local variations present in an image.

C(k, n) = Σ_i Σ_j (i − j)^k P_d[i, j]^n     (2.16)

If there is a large amount of variation in an image the P[i,j]’s will be concentrated away
from the main diagonal and contrast will be high (typically k=2, n=1).

Correlation: correlation is a measure of image linearity.

C_c = (Σ_i Σ_j i j P_d[i, j] − μ_i μ_j) / (σ_i σ_j)     (2.17)

where
μ_i = Σ_i Σ_j i P_d[i, j]
μ_j = Σ_i Σ_j j P_d[i, j]
σ_i² = Σ_i Σ_j i² P_d[i, j] − μ_i²
σ_j² = Σ_i Σ_j j² P_d[i, j] − μ_j²
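A small illustrative sketch of computing a normalized co-occurrence matrix and the four measures above, for a single displacement d = (1, 0) and a reduced number of grey levels (function names are hypothetical):

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Grey-level co-occurrence matrix P_d for displacement d = (dx, dy),
    normalized so its entries sum to one. img must hold values < levels."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """Energy, contrast, homogeneity and correlation of equations (2.14)-(2.17)."""
    i, j = np.mgrid[0:P.shape[0], 0:P.shape[1]]
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    s_i = np.sqrt((i ** 2 * P).sum() - mu_i ** 2)
    s_j = np.sqrt((j ** 2 * P).sum() - mu_j ** 2)
    return {
        "energy": (P ** 2).sum(),
        "contrast": ((i - j) ** 2 * P).sum(),
        "homogeneity": (P / (1 + np.abs(i - j))).sum(),
        "correlation": ((i * j * P).sum() - mu_i * mu_j) / (s_i * s_j),
    }
```

On an alternating 0/1 stripe pattern the horizontal co-occurrences are all between different levels, so contrast is 1 and correlation is −1, matching intuition for a perfectly anti-correlated texture.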
Gabor Filter: The Gabor filter is named after Dennis Gabor, who proposed the
underlying signal representation in 1946. In image processing tasks, Gabor filters have
been extensively used for feature extraction from digital leaf images due to their
spatial locality, orientation selectivity and frequency characteristics. The frequency and
orientation representation used in Gabor filters is useful for texture representation and
discrimination, and the same concept is used in the human visual system. Gabor
features are invariant to illumination, rotation, scale and translation. Gabor filters have
several advantages in the feature extraction process over other techniques such as the
Gray Level Co-occurrence Matrix (GLCM), and the Gabor feature vectors can be used
directly as input to a classifier. A two-dimensional Gabor function g(x, y) consists of a
sinusoidal plane wave of some frequency and orientation (the carrier), modulated by a
two-dimensional translated Gaussian envelope [43, 44, 45]. The Gabor filter is defined
as in equation (2.18)

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) exp(i(2π x′/λ + ψ))     (2.18)

where
x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ
λ: the wavelength of the sinusoidal factor,
θ: the orientation of the normal to the parallel stripes of a Gabor function,
ψ: the phase offset,
σ: the standard deviation of the Gaussian envelope,
γ: the spatial aspect ratio.

The Gabor transform R(x, y) of an image I(x, y) is defined as the convolution of a
Gabor filter g(x, y) with the image:

R(x, y) = g(x, y) * I(x, y) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} g(m, n) I(x − m, y − n)     (2.19)

where * denotes two-dimensional linear convolution and M and N are the size of the
Gabor filter mask.
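The Gabor function of equation (2.18) can be sampled on a grid to build a filter kernel. The sketch below generates the real part; all parameter defaults are arbitrary illustrations, not values used in this work:

```python
import numpy as np

def gabor_kernel(ksize=15, lam=6.0, theta=0.0, psi=0.0, sigma=3.0, gamma=0.5):
    """Real part of the 2-D Gabor function of equation (2.18):
    a Gaussian envelope modulating a sinusoidal carrier."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates x'
    yr = -x * np.sin(theta) + y * np.cos(theta)   # rotated coordinates y'
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier
```

With θ = 0 and ψ = 0 the kernel is symmetric under a 180° rotation and its center value is exactly 1 (unit envelope times cos 0). Convolving such kernels at several scales and orientations with a leaf image yields the filtered responses from which the statistics below are computed.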

The following six texture features are extracted using the Gabor filter:

a) Mean: defined as given in equation (2.20):

μ(s,θ) = (1/NM) Σ_{x=1}^{N} Σ_{y=1}^{M} G(s,θ)(x, y)     (2.20)

b) Energy: the local energy of the filtered image E(x, y) is obtained by computing the
absolute average deviation of the transformed values of the filtered image from the
mean μ within a window W of size Mx x My. The texture energy E(x, y) is given by
equation (2.21):

E(x, y) = (1/M) Σ_{(a,b)∈W} |R(a, b) − μ|     (2.21)

c) Standard Deviation: the standard deviation is computed as given in equation (2.22):

σ(s,θ) = sqrt((1/NM) Σ_{x=1}^{N} Σ_{y=1}^{M} (G(s,θ)(x, y) − μ(s,θ))²)     (2.22)

d) Skewness: the measure of asymmetry, denoted by γ. It is positive when the
distribution tends towards the right and negative when the distribution tends
towards the left. It is given by equation (2.23):

γ(s,θ) = μ(s,θ)³ / σ(s,θ)³     (2.23)

e) Kurtosis: defines the degree of peakedness of the dataset and is given by equation
(2.24):

k(s,θ) = μ(s,θ)⁴ / σ(s,θ)⁴     (2.24)

f) Contrast: given by equation (2.25):

ψ(s,θ) = μ(s,θ) / k(s,θ)^0.25     (2.25)

Histogram of Oriented Gradients (HOG): The histogram of oriented gradients divides
an image into cells and blocks. The gradient magnitudes and directions of the pixels in
each cell are computed, and a histogram is created in which each bin accumulates the
pixels with similar gradient direction. The histogram of each block is formed by
accumulating the histograms of its cells. Finally, all histograms are concatenated to
form the HOG descriptor [41].

For example, consider a grayscale image f(x, y) of size 512x512 with horizontal
kernel DX = [-1 0 1] and vertical kernel DY = [-1 0 1]T. The horizontal and vertical
gradients are given by equations (2.26) and (2.27) respectively.

∇fX(x,y) = f(x,y)*DX (2.26)


∇fY(x,y) = f(x,y)*DY (2.27)
Where * denotes the convolution operator, ∇fX(x,y) and ∇fY(x,y) are horizontal and
vertical gradients. The magnitude and orientation of the gradients are given by Equation
(2.28) and (2.29) respectively.

G = √(∇fX(x, y)² + ∇fY(x, y)²)     (2.28)

θ = tan⁻¹(∇fY(x, y) / ∇fX(x, y))     (2.29)
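Equations (2.26)-(2.29) plus one orientation histogram amount to HOG for a single cell; the following is an illustrative sketch without block normalization (function name hypothetical):

```python
import numpy as np

def cell_histogram(img, bins=9):
    """Gradient magnitude/orientation per equations (2.26)-(2.29) plus a
    single magnitude-weighted orientation histogram, i.e. HOG for one cell."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # convolution with [-1 0 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]      # and with [-1 0 1]^T
    mag = np.hypot(gx, gy)                      # equation (2.28)
    ang = np.degrees(np.arctan2(gy, gx)) % 180  # unsigned orientation, (2.29)
    hist = np.zeros(bins)
    idx = (ang / (180 / bins)).astype(int) % bins
    np.add.at(hist, idx.ravel(), mag.ravel())   # magnitude-weighted votes
    return hist
```

On a horizontal intensity ramp every gradient points in the same direction, so all the histogram mass falls into a single bin.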

2.5. Feature Fusion
The aim of the feature fusion technique is to combine many independent (or
approximately independent) features to give a more representative features for leaf
images that increase the accuracy of identification. The features are combined by
concatenating into one feature set.

The feature fusion technique has two problems. The first problem is the compatibility
of different features; i.e. the features may lie in different ranges of numbers. Thus, the
features must be normalized to the same range. There are different normalization
methods such as Z-score, Min-Max, and Decimal Scaling. The Z-score normalization
method is the most common and simplest method; it maps the input scores to a
distribution with mean of zero and standard deviation of one as xi = (fi − µi)/σi, where
fi represents the ith feature vector, µi and σi are the mean and standard deviation of the
ith vector, respectively, and xi is the ith normalized feature vector. The second problem
of the feature fusion technique is high dimensionality, which may lead to high
computation time and the need for more storage space. Thus, feature selection
techniques such as Linear Discriminant Analysis or Principal Component Analysis are
used to reduce the dimensionality of the combined feature set [46].
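Z-score normalization followed by concatenation can be sketched as follows (the function name is hypothetical, and each column is assumed to have non-zero variance):

```python
import numpy as np

def fuse_features(*feature_sets):
    """Z-score-normalize each feature set column-wise, then concatenate
    them into one fused feature matrix (one row per leaf image)."""
    normalized = []
    for F in feature_sets:
        F = np.asarray(F, dtype=float)
        # x_i = (f_i - mu_i) / sigma_i, applied per feature column
        normalized.append((F - F.mean(axis=0)) / F.std(axis=0))
    return np.hstack(normalized)
```

After fusion, a shape feature measured in thousands of pixels and a texture feature measured in fractions contribute on the same scale: every column of the fused matrix has mean zero and unit standard deviation.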

2.6. Dimensionality Reduction


Usage of combined feature set increases the accuracy of identification, but on the other
hand it suffers from the curse of high dimensionality. Feature selection, a process of
removing irrelevant and redundant features overcome this problem. Principal
Component Analysis (PCA) and linear discriminant analysis (LDA) are two popular
methods which have been widely used in many classification applications for reducing
the dimensionality of a feature set. PCA is unsupervised while LDA is supervised and
can achieve better classification results due to the utilization of label information. PCA
preserves as much of the variance in the high dimensional space as possible, while
LDA preserves as much of the class discriminatory information as possible [47].

LDA takes full consideration of the class labels for patterns. It is generally believed that
the label information can make the recognition algorithm more discriminative. LDA
projects the original data into an optimal subspace by a linear transformation. The
transformation matrix consists of the eigenvectors whose corresponding eigenvalues can
maximize the ratio of the trace of the between-class scatter to the trace of the within-
class scatter [48].

Let S_B and S_W denote the between-class and within-class scatter matrices, which
are defined in equations (2.30) and (2.31) respectively:

S_B = Σ_{i=1}^{C} n_i (m_i − m)(m_i − m)^T     (2.30)

S_W = Σ_{i=1}^{C} Σ_{j=1}^{n_i} (x_i^j − m_i)(x_i^j − m_i)^T     (2.31)

where m_i denotes the mean of class i, m is the global mean of the entire sample, n_i
is the number of vectors in class i, and x_i^j is the j-th vector of class i.

LDA looks for a linear subspace W (C − 1 components) within which the projections
of the different classes are best separated, by maximizing the discriminant criterion
defined in equation (2.32):

J(W) = max tr{W^T S_B W} / tr{W^T S_W W}     (2.32)

Along with the orthogonality constraint on W, this can be solved as the generalized
eigenvector and eigenvalue problem stated in equation (2.33):

S_B w_i = λ_i S_W w_i     (2.33)

where w_i and λ_i are the i-th generalized eigenvector and eigenvalue of S_B with
regard to S_W. The LDA solution W contains all the C − 1 eigenvectors with non-zero
eigenvalues (S_B has a maximal rank of C − 1).
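The scatter matrices and the generalized eigenproblem translate into a short NumPy sketch (the function name is illustrative, and S_W is assumed invertible, which holds when there are enough samples per class):

```python
import numpy as np

def lda_projection(X, y):
    """Fisher LDA: build S_B and S_W per equations (2.30)-(2.31) and solve
    the generalized eigenproblem S_B w = lambda S_W w of equation (2.33)."""
    classes = np.unique(y)
    m = X.mean(axis=0)                       # global mean
    d = X.shape[1]
    SB = np.zeros((d, d))
    SW = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                 # class mean m_i
        SB += len(Xc) * np.outer(mc - m, mc - m)
        SW += (Xc - mc).T @ (Xc - mc)
    # Solve S_W^{-1} S_B w = lambda w; keep the C-1 leading eigenvectors.
    vals, vecs = np.linalg.eig(np.linalg.solve(SW, SB))
    order = np.argsort(vals.real)[::-1][:len(classes) - 1]
    return vecs[:, order].real               # projection matrix W
```

For two tight, well-separated 2-D clusters, the single resulting component projects the classes onto non-overlapping intervals of the real line.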

2.7. Image Classification
The primary objective of image classification is to detect, identify and classify the
features occurring in an image in terms of the type of class these features represent in
the field. Image classification can be broadly divided into supervised and unsupervised
approaches. The most common classification methods used recently in plant
recognition systems are reviewed below [49, 50].
2.7.1. Supervised Image Classification
A supervised classification problem falls under the category of learning from instances
where each instance or pattern or example is associated with a label or class.
Conventionally an individual classifier like Neural Network, Decision Tree, or a Support
Vector Machine is trained on a labeled data set. Depending on the distribution of the
patterns, it is possible that not all the patterns are learned well by an individual classifier.
The most common classification methods used recently in plant recognition systems are
presented below:

K-Nearest Neighbor Classifier: This classifier determines the class of a given point
from its distances to other points. Suppose we have some training objects whose
attribute vectors are given and an unknown object w is to be categorized. It is
reasonable to assume that objects which are close together (according to some
appropriate metric) belong to the same category. For example, according to the k-NN
rule, suppose we select the k = 5 nearest neighbors of w. Because three of these five
neighbors belong to class 2 and two of them to class 3, the object w is assigned to
class 2. Note that the k-NN rule does not take into consideration the fact that
different neighbors may provide different evidence.

For plant leaf classification, we first find the feature vector of a test sample and then
calculate the Euclidean distance between the test sample and each training sample. This
yields a similarity measure from which the class of the test sample is determined. The
k-nearest neighbors algorithm is among the simplest of all machine learning algorithms:
an object is classified by a majority vote of its neighbors, being assigned to the class
most common among its k nearest neighbors, where k is a small positive integer. If k = 1,
the object is simply assigned to the class of its nearest neighbor. In binary (two-class)
classification problems, it is helpful to choose k to be an odd number, as this avoids
tied votes.
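The majority-vote rule described above can be sketched with scikit-learn's `KNeighborsClassifier`; the toy two-dimensional training vectors below are hypothetical stand-ins for leaf feature vectors:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D feature vectors standing in for leaf feature vectors (hypothetical data).
X_train = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # class 0
                    [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

# k = 5 neighbours with the Euclidean distance used in the text.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

# An unknown object w lying close to the class-1 cluster.
w = np.array([[2.9, 3.1]])
predicted = knn.predict(w)[0]
```

With k = 5, the three class-1 points plus the two nearest class-0 points vote, so the majority assigns w to class 1.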

Probabilistic Neural Network: Probabilistic neural networks (PNNs) can be used for
classification problems. A PNN is a parallel distributed processor that has a natural
tendency for storing experiential knowledge, and it is derived from the Radial Basis
Function (RBF) network. A PNN basically works with three layers. The first layer is the
input layer, which accepts an input vector. When an input is presented, the first layer
computes the distances from the input vector to the training input vectors and produces
a vector whose elements indicate how close the input is to each training input; this
radial basis layer scales the distances nonlinearly with a radial basis function. The
second layer sums these contributions for each class of inputs to produce, as its net
output, a vector of probabilities. The last layer, the competitive layer, produces the
classification decision: the class with the maximum probability is assigned 1 and all
other classes are assigned 0. A key benefit of neural networks is that a model of the
system can be built from the available data.
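The three-layer structure (radial basis pattern layer, per-class summation layer, competitive layer) can be sketched in a few lines of NumPy; the Gaussian kernel and the σ value are illustrative assumptions:

```python
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=1.0):
    """Minimal PNN sketch: Gaussian pattern layer, per-class summation, argmax."""
    # Pattern layer: radial-basis activation for every training vector.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    activations = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summation layer: accumulate the activations of each class.
    classes = np.unique(y_train)
    sums = np.array([activations[y_train == c].sum() for c in classes])
    # Competitive layer: the winning class gets 1, all others 0 (argmax).
    return classes[np.argmax(sums)]

# Hypothetical training vectors for two classes.
X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])
label = pnn_classify(np.array([0.1, 0.1]), X, y)
```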

Support Vector Machine: An SVM is a discriminative classifier formally defined by a
separating hyperplane. In other words, given labeled training data (supervised learning),
the algorithm outputs an optimal hyperplane which categorizes new examples. An SVM
training algorithm creates a model which assigns new examples to one category or the
other. The idea behind the method is to non-linearly map the input data to some
high-dimensional space, where the data can be linearly separated, thus providing good
classification performance. For plant leaf classification, it transforms the feature
vector extracted from the leaf's shape. The SVM finds the optimal separating hyperplane
by maximizing the margin between the classes; the data vectors nearest to the
constructed hyperplane in the transformed space are called the support vectors. The SVM
estimates a function for classifying data into two classes. Using a nonlinear
transformation that depends on a regularization parameter, the input vectors are placed
into a high-dimensional feature space, where a linear separation is employed. The inner
product (x, y) is replaced by a kernel function K(x, y) to construct a non-linear
decision function, as described by Equation (2.34):

f(x) = sgn(∑_{i=1}^{n} α_i y_i k(x_i, x) + b)    (2.34)

where f(x) indicates the class membership of x. From a given kernel set, the basis
functions k(x_i, x), i = 1, 2, ..., N, are selected by the first layer of the SVM, and a
linear function in the resulting space is created by the second layer. The SVM is
independent of the dimensionality of the input space, has a simple geometric
interpretation, and generates sparse solutions. Classification is performed using the
support vectors, which are obtained from the training set.
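The kernel-based decision function of equation (2.34) is what scikit-learn's `SVC` implements; the following minimal sketch uses hypothetical leaf-shape feature vectors and an RBF kernel as the implicit non-linear mapping:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical leaf-shape feature vectors for two species.
X = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
              [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])

# The RBF kernel performs the implicit mapping to a high-dimensional space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

pred = clf.predict([[0.12, 0.18]])[0]       # a point near the class-0 cluster
n_support = clf.support_vectors_.shape[0]   # the support vectors define the hyperplane
```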

Deep Neural Network: In Deep Neural Network [60] classification, random weights are
initially used for the training set. The main advantage of a Deep Neural Network is that
it is capable of reducing classification error when applied to huge datasets. A
conventional neural network requires more time for training, whereas a deep neural
network considers input for training based on the weights. It is a feed-forward neural
network which contains multiple hidden layers. The hidden layers are used to map the
input features, and a conventional mapping function is used in this work:
O = 1 / (1 + e^{−(b + f·w)})    (2.35)

Input features are denoted by f, weights by w, the bias by b, and the output by O.
Complex relations between input and output are modeled with the help of this mapping
function. The network can be trained using back-propagated derivatives, which measure
the agreement between input and output for each training set. Deep Neural Network
pre-training can be performed using a discriminative method and a supervised
pre-training approach.
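Equation (2.35) can be checked numerically; the NumPy sketch below evaluates this sigmoid mapping for one unit, with illustrative feature, weight, and bias values:

```python
import numpy as np

def sigmoid_unit(f, w, b):
    """Logistic mapping O = 1 / (1 + exp(-(b + f.w))) from equation (2.35)."""
    return 1.0 / (1.0 + np.exp(-(b + np.dot(f, w))))

f = np.array([0.5, -1.0, 2.0])   # input features (illustrative)
w = np.array([0.4, 0.3, 0.1])    # weights (illustrative)
b = 0.1                          # bias
out = sigmoid_unit(f, w, b)      # f.w + b = 0.2, so out = sigmoid(0.2)
```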

2.3.2. Unsupervised Classification

In unsupervised classification, we have only input data (X) and no corresponding output
variables. The goal of unsupervised classification is to model the underlying structure
or distribution of the data in order to learn more about it. These methods are called
unsupervised learning because, unlike the supervised learning described above, there are
no correct answers and there is no teacher; algorithms are left to their own devices to
discover and present the interesting structure in the data. Unsupervised learning
problems can be further grouped into clustering and association problems [51].

2.4 Plant species identification model and techniques

Plant species identification models and techniques follow a known sequence of steps:
image acquisition, image preprocessing, feature extraction, classification, and labeling
of the output, as depicted in Figure 2.13.

Figure 2.13: Block diagram of classification system as illustrated in [10]

CHAPTER 3: RELATED WORK

3.1 Introduction
In this chapter, related works of different researchers in the area of plant species
identification using image processing techniques are reviewed. In recent years,
techniques for plant species identification based on leaf features have achieved great
progress. Morphological features, leaf shape moments, morphological features combined
with other leaf shape feature descriptors, and other miscellaneous plant leaf shape
feature descriptors have been used along with different classification models for plant
species identification. In order to describe each research work clearly, we divided the
chapter into sections based on the features used for classification and recognition in
each area of related work.

3.2 Classification based on morphological features

Qingmao Zeng et al. [52] used the Periodic Wavelet Descriptor (PWD) as a shape
descriptor for plant leaf shape. This descriptor is a global morphological feature of a
leaf shape which is invariant to translation, scaling, and rotation, and it can be used
to retrieve shapes at multiple resolutions. The authors select the leaves of six
different Apocynaceae plants as experimental objects, and a Back Propagation Neural
Network (BPNN) classifier is trained to carry out the plant species identification
experiment. The accuracy of the method is evaluated and a correct identification rate of
about 90% is reported.

Adil Salman et al. [53] preprocessed the leaf image, traced the boundary of the leaf
using the Canny edge operator, extracted fifteen features (area, convex area, filled
area, perimeter, eccentricity, solidity, orientation, etc.) from the binary image, and
used a Support Vector Machine classifier to classify 22 plant species of the Flavia
dataset. An overall classification accuracy of 85%-87% is reported.

3.3 Classification based on moments
Marko Lukic et al. [54] used invariant Hu moments as leaf discriminative features, but
noted that Hu moments are inadequate in cases where leaves from different species have
very similar shapes. To overcome this, the authors also used uniform local binary
pattern histogram parameters (mean, standard deviation, energy, and entropy) as
discriminative feature descriptors. A support vector machine, tuned using hierarchical
grid search, is used as the classifier for leaf recognition. The algorithm is tested on
the Flavia dataset and an accuracy of 94.13% is reported.

Zalikha et al. [55] compared the effectiveness of Zernike Moment Invariants (ZMI),
Legendre Moment Invariants (LMI), and Tchebichef Moment Invariants (TMI) as feature
descriptors of leaf images. A Generalized Regression Neural Network (GRNN) is used for
classification of 130 pre-processed leaf images of four plant families. A 100%
classification accuracy is reported, but the authors recommended further study since the
number of leaf images used is small and the result may be due to overfitting of the
GRNN. Nevertheless, the classification results indicate that features from the TMI are
the most effective.

Celebi and Aslandogan [56] also studied and compared three moment-based descriptors:
invariant (Hu) moments, Zernike moments, and radial Chebyshev (a.k.a. Tchebichef)
moments. Invariant moments suffer from a high degree of information redundancy,
sensitivity to noise, and numerical instability. The authors experimentally showed that
radial Chebyshev moments have the highest retrieval performance of the three shape
descriptors.

Isnanto et al. [57] developed a herbal plant identification system based on the shape of
the herbal plants' leaves, using Hu's seven invariant moments for feature extraction.
Euclidean and Canberra distance similarity measures are used for recognition, and the
results of both methods are analyzed. The highest and lowest accuracies achieved using
the Euclidean distance measure are 86.67% and 40%, respectively, while they are 72% and
20% using the Canberra measure.

3.4 Classification based on texture features

Gunjan et al. [58] used the gray level co-occurrence matrix (GLCM) for extraction of
texture features (absolute value, contrast, contrast-inertia, correlation, energy,
entropy, Haralick correlation, homogeneity, sum average, sum entropy, etc.) of two
popular Indian medicinal plant leaves, namely Neem and Tulsi. A back-propagation
multi-layer perceptron (BP-MLP) neural network classifier is used for classification of
30 Neem and 30 Tulsi leaf images. A classification accuracy of 80% is reported using
preprocessed combined GLCM texture features. It is indicated that the use of
preprocessed combined GLCM features provides a higher classification rate compared to
raw single GLCM features.

Patil and Bhagat [59] used a combination of Gabor and Gray-Level Co-Occurrence Matrix
(GLCM) texture features to recognize leaf shape, with a decision tree as the classifier.
The authors also used Principal Component Analysis (PCA) to reduce the dimensionality of
the extracted features and increase the discriminative power of the decision tree
classifier. The highest classification accuracy reported is 96% on the Swedish Leaf
Dataset of 10 plant species.

3.5 Classification based on fused features

Rao and Kulkarni [60] used a hybrid approach for feature extraction by combining
morphological, shape, and SIFT features. The authors also used an auto-regressive model
to enhance image quality prior to feature extraction and applied a Deep Neural Network
for classification performance evaluation. The proposed hybrid model of leaf-based plant
classification is tested using a publicly available dataset of 1800 common plant leaves
of China collected by Wu et al. [61], and a classification accuracy of 95.38% is
reported, which shows the robustness of the approach compared to other methods.

Morphological features that are independent of leaf growth, image translation, rotation,
and scaling, along with Zernike moments, are used by Harish et al. [10] to identify and
classify plant leaves. The features are fed as input to four classifiers (Naive Bayes,
k-NN, SVM, and PNN). It is reported that Naive Bayes and k-NN are lazy learners and less
accurate than SVM and PNN.

Alaa Tharwat et al. [62] used a feature fusion technique to combine color, shape, and
texture features of colored leaf images. Color moments, invariant moments, and the Scale
Invariant Feature Transform (SIFT) are used to extract the color, shape, and texture
features, respectively. Linear Discriminant Analysis (LDA) is used to reduce the number
of features, and a Bagging ensemble is used as the classifier. The proposed approach was
tested on the Flavia dataset, which consists of 1907 colored leaf images. The feature
fusion method achieved better accuracy (70%) than all single feature extraction methods.

Du and Zhai [63] proposed plant species identification based on a multi-feature radial
basis probabilistic neural network ensemble classifier (RBPNNE). The RBPNNE consists of
several independent neural networks trained on different feature domains, extracted
using texture extraction techniques such as autocorrelation, edge frequency, wavelet
transform, and discrete cosine transform of the original plant leaf images. In addition,
Hu invariant moments are used for feature extraction. The final classification results
represent a combined response of the individual networks. For testing purposes, the
authors used their own dataset of 1100 leaf images of 50 plant species. The experimental
results show that the RBPNNE achieves higher recognition accuracy (79.24%) and better
classification efficiency than a single feature domain.

3.6 Summary
Several leaf-based plant classification methods have been proposed to address the plant
identification problem. Plant leaf shape descriptors such as Hu, Zernike, and Chebyshev
moments are extensively used with different learning algorithms for plant species
identification. Digital morphological features combined with other feature descriptors
are also used. However, most methods are inaccurate, and the dataset sizes used for the
experiments are limited; in some cases the classification results achieved are high but
the sample size is too small [54, 55, 59]. The computation of Zernike moments involves a
discrete approximation of a continuous integral term, which results in loss of
information. Hu moments are inadequate in cases where leaves from different species have
very similar shapes [59]. The radial Chebyshev moment has distinctive advantages over
other moments for feature extraction due to its discrete characteristics. Unlike
invariant moments, its orthogonal basis leads to minimum information redundancy, and it
is free of the numerical errors and computational complexity caused by normalization
[40]. Therefore, in this work, an improved method based on the fusion of morphological
features and radial Chebyshev moments of leaf shape with Gabor filter features of leaf
texture, classified with an ensemble method called Random Forest, is proposed for plant
species classification and identification.

CHAPTER 4: DESIGN OF PLANT SPECIES
IDENTIFICATION

4.1 Introduction
In recent years there has been increased interest in applying image processing
techniques to the problem of automatic plant species identification. There are ample
opportunities to improve plant species identification through the design of a convenient
automatic identification system, and many different approaches are used to classify
plant species into predefined classes using features of the plant leaf. In this chapter,
we explain the design and system architecture of the proposed work. The different
processes used in the system architecture, such as leaf preprocessing, segmentation,
feature extraction, model training, and classification, are also explained in detail.

4.2 System Architecture

The architecture of the proposed work is depicted in Figure 4.1. Its major components
include leaf preprocessing, leaf image segmentation, feature extraction, learning, and
plant species classification.

The proposed approach consists of two phases, namely a training phase and an
identification phase. In the training phase, leaf images of plant species (i.e. training
images) are preprocessed, and features are extracted from each leaf image and fused into
a feature set. Lastly, important features are selected from the feature set and provided
as input to the learning model. In the identification phase, the same procedure is
followed as in the training phase, except that at the final stage the extracted features
are used for classification rather than training: the input plant leaf is identified
using the knowledge base.

Figure 4.1: Architecture of Automatic Plant Species Identification

4.3 Leaf image preprocessing

The main goal of preprocessing is to identify the leaf in an image and discard all
information other than the leaf shape and texture. As part of preprocessing, we resized
the input image, converted the RGB image to grayscale, and converted the grayscale image
to binary. We also extracted the boundary of the leaf image. These operations are
described below.

Image resizing: it is computationally expensive to process large images. Hence, the leaf
images in the database are resized to reduce image processing time.

Conversion of RGB image to Grayscale Format
Leaf texture is extracted from the grayscale image; in this work we applied a Gabor
filter on grayscale leaf images to extract Gabor features. Hence, the RGB image is
converted to a grayscale image using the weighted averaging method. Assuming R, G, and B
represent the respective intensities of the red, green, and blue channels of a pixel,
the grayscale value is computed as given in equation (4.1):

Grayscale = 0.2989·R + 0.5870·G + 0.1140·B    (4.1)

Figure 4.2: Original RGB leaf image
Figure 4.3: Grayscale leaf image after applying weighted averaging method
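Equation (4.1) amounts to a single dot product along the channel axis; a minimal NumPy sketch on a toy 2×2 image:

```python
import numpy as np

def rgb_to_grayscale(rgb):
    """Weighted averaging from equation (4.1): 0.2989 R + 0.5870 G + 0.1140 B."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb @ weights   # applies the weights along the last (channel) axis

# A 2x2 toy "image" with RGB channels in the last dimension.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)
gray = rgb_to_grayscale(img)   # pure red -> 76.22, pure white -> 254.97
```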

Conversion of Grayscale Format to Binary

Feature extraction is faster with binary images because every pixel value in a binary
image is either zero or one, so computations become faster. Thus, the images are
preprocessed and converted into smaller files in binary format without the loss of any
morphological (shape-related) information. In this work, morphological features and
radial Chebyshev moments are extracted from binary images.

To convert a grayscale image into a binary image, Otsu's [64] method of automatic
thresholding, explained in Section 4.4, is applied. During the thresholding process, if
a pixel value is greater than a threshold value, it is assigned one value (e.g. white);
otherwise it is assigned another value (e.g. black). The shape of the histogram is used
for automatic thresholding, and the threshold is chosen so as to minimize the
intra-class variance of the black and white pixels.

Figure 4.4: Grayscale leaf image
Figure 4.5: Binary leaf image obtained after applying Otsu thresholding method

Boundary extraction: Identifying the contour of the leaf plays a paramount role in the
computation of morphological features of the leaf image. Applying the Canny edge
detector to the leaf image produces an image in which higher gray-level values indicate
the presence of an edge.

Figure 4.6: Plant leaf image contour
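The prototype obtains the contour with OpenCV's Canny detector; as a dependency-free illustration of the same idea, the boundary of an already-binarized leaf mask can also be obtained as the mask minus its erosion (foreground pixels with at least one 4-connected background neighbour):

```python
import numpy as np

def binary_boundary(mask):
    """Boundary of a binary leaf mask: foreground pixels whose 4-neighbourhood
    is not fully foreground (i.e. mask minus its erosion)."""
    padded = np.pad(mask, 1)
    eroded = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
              & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~eroded

mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True                 # a 3x3 square standing in for a leaf
contour = binary_boundary(mask)       # keeps the 8 border pixels, drops the centre
```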

4.4 Leaf image segmentation
Among the many segmentation methods, the Otsu method is one of the most successful
region-based methods for automatic image thresholding because of its simple calculation.
In Otsu's method, we exhaustively search for the threshold that minimizes the
intra-class variance (the variance within each class), defined as a weighted sum of the
variances of the two classes [65]:
𝜎𝑤2 (𝑡) = 𝑤0 (𝑡)𝜎02 (𝑡) + 𝑤1 (𝑡)𝜎12 (𝑡)
Weights 𝑤0 and 𝑤1 are the probabilities of the two classes separated by a threshold t,
and 𝜎02 and 𝜎12 are variances of these two classes.

The class probabilities w0(t) and w1(t) are computed from the L bins of the histogram:

w0(t) = ∑_{i=0}^{t−1} p(i),    w1(t) = ∑_{i=t}^{L−1} p(i)

Otsu shows that minimizing the intra-class variance is equivalent to maximizing the
inter-class variance:

σb²(t) = σ² − σw²(t) = w0(μ0 − μT)² + w1(μ1 − μT)² = w0(t)·w1(t)·[μ0(t) − μ1(t)]²
which is expressed in terms of class probabilities 𝜔 and class means 𝜇.
while the class means μ0(t), μ1(t) and the total mean μT are:

μ0(t) = ∑_{i=0}^{t−1} i·p(i) / w0(t),    μ1(t) = ∑_{i=t}^{L−1} i·p(i) / w1(t),
μT = ∑_{i=0}^{L−1} i·p(i)

The class probabilities and class means can be computed iteratively. This idea yields an
efficient Otsu algorithm.

Algorithm 4.1: Otsu Algorithm
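The exhaustive search described above can be sketched in NumPy as follows (a minimal illustration of the equations, not the prototype's OpenCV-based implementation); it scans every candidate threshold t and keeps the one maximizing the inter-class variance:

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustive Otsu search: maximise the inter-class variance
    sigma_b^2(t) = w0(t) * w1(t) * (mu0(t) - mu1(t))^2 over all thresholds t."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                 # normalised histogram p(i)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:            # skip degenerate splits
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_b = w0 * w1 * (mu0 - mu1) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t

# Bimodal toy image: dark background (~30) and bright leaf (~200).
img = np.concatenate([np.full(500, 30), np.full(500, 200)]).astype(np.uint8)
t = otsu_threshold(img)                   # lands between the two modes
binary = (img >= t).astype(np.uint8)
```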

4.5 Feature Extraction

A leaf image can be characterized by its color, texture, and shape. Since the color of a
leaf varies with the seasons and climatic conditions, and since most plants have similar
leaf colors, color may not be a useful discriminating feature for plant species
identification [40]. Hence, we used only shape and texture features as discriminating
features.

A number of shape and texture features can be extracted from plant leaves for the
identification of plant species using image processing techniques. In this work, radial
Chebyshev moments and morphological features are used as shape descriptors, and Gabor
features are used as texture descriptors. The shape descriptors capture the global shape
of the leaf image, while the internal vein structure is captured by the texture
descriptors. The shape and texture features are combined by concatenation, and the
important features selected from the combined set are fed as input to the ensemble of
classifiers. Feature extraction from the shape and texture of plant leaves involves
major steps, each consisting of methods that contribute to improved results. The three
feature types are presented below along with their corresponding algorithms.
4.5.1 Radial Chebyshev Moments
The radial Chebyshev moment of order p and repetition q for an image of size N × N,
introduced in Chapter 2, is defined as [40]:

M_pq = (1 / (2π·ρ(p, m))) ∑_{r=0}^{m−1} ∑_{θ=0}^{2π} t_p(r) e^{−jqθ} f(r, θ)    (4.2)

where

t_0(x) = 1
t_1(x) = (2x − N + 1) / N
t_p(x) = [(2p − 1)·t_1(x)·t_{p−1}(x) − (p − 1)·(1 − (p − 1)²/N²)·t_{p−2}(x)] / p    (4.3)

and ρ(p, N) is the squared norm:

ρ(p, N) = N·(1 − 1²/N²)(1 − 2²/N²)···(1 − p²/N²) / (2p + 1)    (4.4)

with p = 0, 1, ..., N − 1, m = (N/2) + 1, and j = √−1.

Algorithm 4.2: Radial Chebyshev Moments
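The recurrence (4.3) and the double sum (4.2) can be sketched in NumPy. This is an illustrative reading of the formulas, not the thesis prototype: it assumes the image has already been resampled to polar coordinates f(r, θ), and it evaluates ρ at the radial domain size m:

```python
import numpy as np

def chebyshev_poly(p, x, N):
    """Scaled discrete Chebyshev polynomial t_p(x) via the recurrence (4.3)."""
    t0 = np.ones_like(x, dtype=float)
    if p == 0:
        return t0
    t1 = (2.0 * x - N + 1.0) / N
    if p == 1:
        return t1
    t_prev2, t_prev1 = t0, t1
    for k in range(2, p + 1):
        t_k = ((2 * k - 1) * t1 * t_prev1
               - (k - 1) * (1.0 - ((k - 1) / N) ** 2) * t_prev2) / k
        t_prev2, t_prev1 = t_prev1, t_k
    return t_prev1

def squared_norm(p, n):
    """rho(p, n) as in equation (4.4)."""
    return n * np.prod([1.0 - (k / n) ** 2 for k in range(1, p + 1)]) / (2 * p + 1)

def radial_chebyshev_moment(f_polar, p, q):
    """M_pq of equation (4.2) for an image already resampled to polar
    coordinates, f_polar[r, theta] with shape (m, n_theta)."""
    m, n_theta = f_polar.shape
    r = np.arange(m)
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    kernel = np.outer(chebyshev_poly(p, r, m), np.exp(-1j * q * theta))
    return (kernel * f_polar).sum() / (2.0 * np.pi * squared_norm(p, m))

# Sanity value: the (0, 0) moment of a constant polar image.
m00 = radial_chebyshev_moment(np.ones((5, 8)), 0, 0)
```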

4.5.2 Morphological Features
In this work, morphological features that provide significant information about the leaf
image have been considered. Leaf images from the same class share properties such as
area, aspect ratio, convexity, and extent, and these properties can be extracted
effectively as morphological features [49]. The leaf image contour plays a significant
role in the extraction of the different morphological features. A contour is a curve
that joins all the continuous points having the same color or intensity along the
boundary of the leaf shape; it stores the (x, y) coordinates of the boundary of a shape
in a 2-dimensional array. For better accuracy, the leaf images are converted to binary
images using the thresholding method before the selected features are extracted. The
following morphological features are extracted from the leaf image [66].

Algorithm 4.3: Morphological features
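A few of the listed properties can be computed directly from a binary leaf mask. The NumPy sketch below is illustrative only: the prototype relies on OpenCV contour functions, and properties that need a convex hull (e.g. solidity) are omitted here:

```python
import numpy as np

def morphological_features(mask):
    """A subset of the contour-based properties, computed from a binary mask."""
    ys, xs = np.nonzero(mask)
    area = float(len(xs))                          # number of leaf pixels
    width = xs.max() - xs.min() + 1                # bounding-box width
    height = ys.max() - ys.min() + 1               # bounding-box height
    return {
        "area": area,
        "aspect_ratio": width / height,            # bounding-box width / height
        "extent": area / (width * height),         # area / bounding-box area
        "equiv_diameter": np.sqrt(4.0 * area / np.pi),
    }

mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:8] = True                              # a 4x6 rectangular "leaf"
feats = morphological_features(mask)
```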

4.5.3 Gabor Filter
In this work, Gabor filters are used for texture feature extraction from the digital leaf
images due to their spatial locality, orientation selectivity, and frequency
characteristics, as explained in Chapter 2, Section 2.2.2.

Scheme of the algorithm

In this study the texture features are obtained as follows using the Gabor filter:
1. Apply a bank of Gabor filters at multiple scales and orientations to obtain the
filtered images R(x, y).
2. Compute the texture feature, the local energy of each filtered image, using equation
2.21 of Chapter 2, Section 2.2.2.

A Gabor filter bank with 8 orientations and 4 frequency values is used for texture
feature extraction from the leaf image. The parameters are listed in Table 4.1 for the
8 orientations (0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8) and 4 frequencies (0.2, 0.4,
0.6, 0.8), which are experimentally determined. Hence, a total of 8 × 4 = 32 filters are
used per leaf image for extraction of the texture features.

Table 4.1: Parameters Used for Gabor feature extraction
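The 32-filter bank of Table 4.1 can be sketched in NumPy. The kernel size and σ below are illustrative assumptions, and FFT-based circular convolution stands in for the OpenCV filtering used in the prototype:

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=2.0, size=11):
    """Real Gabor kernel with spatial frequency `freq` and orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    gauss = np.exp(-(x_t ** 2 + y_t ** 2) / (2.0 * sigma ** 2))
    return gauss * np.cos(2.0 * np.pi * freq * x_t)

def local_energy(img, kernel):
    """Local energy of the filtered image: sum of squared filter responses
    (FFT-based circular convolution keeps the sketch NumPy-only)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, img.shape)))
    return float((response ** 2).sum())

# The 8 orientations and 4 frequencies from Table 4.1 give a bank of 32 filters.
orientations = [k * np.pi / 8 for k in range(8)]
frequencies = [0.2, 0.4, 0.6, 0.8]

img = np.random.default_rng(0).random((32, 32))
features = [local_energy(img, gabor_kernel(f, t))
            for f in frequencies for t in orientations]
```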

Figure 4.7: Gabor filter output for different orientations and frequency values

The following leaf image is used to illustrate Gabor filter.

Figure 4.8: Sample Leaf image before Gabor filter is applied

When we convolve the Gabor filters with the sample leaf image in Figure 4.8, 32
different filtered leaf images are produced, as depicted in Figure 4.9.

Figure 4.9: Leaf images after application of Gabor filter

4.5.4 Feature Fusion

The aim of the feature fusion technique is to combine many independent (or approximately
independent) features to give a more representative feature set for the objects or
patterns. The features are combined by concatenating them into one feature vector. Using
the combined feature set increases the accuracy of plant species recognition. Hence, we
combined the morphological features, radial Chebyshev moments, and Gabor local energy
features into one feature vector.

For each image:

1. MF := morphological features
2. CM := Radial Chebyshev moments
3. GF := Gabor local energy
4. Fused feature := CONCATENATE(MF, CM, GF)

Algorithm 4.4: Feature Fusion
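Algorithm 4.4 amounts to a single concatenation. In the sketch below, random values stand in for real extracted features; the sizes follow Section 5.3.2 (8 morphological + 25 Chebyshev + 32 Gabor = 65 features):

```python
import numpy as np

# Hypothetical per-image feature vectors (random stand-ins for real features).
mf = np.random.default_rng(1).random(8)    # morphological features
cm = np.random.default_rng(2).random(25)   # Radial Chebyshev moments
gf = np.random.default_rng(3).random(32)   # Gabor local-energy features

# Fusion by simple concatenation into one 65-dimensional feature vector.
fused = np.concatenate([mf, cm, gf])
```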

4.5.5 Feature selection
In this work the number of features is high, and selecting the important features that
contribute significantly to the model's prediction is essential for improved performance
and accuracy. Random Forest plays a paramount role in feature selection as it uses the
Gini importance, or mean decrease in impurity (MDI), to calculate the importance of each
feature. Gini importance is also known as the total decrease in node impurity: it
measures how much the model fit or accuracy decreases when a variable is dropped, and
the larger the decrease, the more significant the variable. The Gini index thus
describes the overall explanatory power of the features [67].

To compute the Gini impurity for a set of samples with K classes, let i ∈ {1, 2, ..., K},
and let p_i be the fraction of samples labeled with class i in the set. Then the Gini
index is given by equation (4.5) [66]:

Gini = 1 − ∑_{i=1}^{K} p_i²    (4.5)
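Gini-importance-based selection as described can be sketched with scikit-learn, whose Random Forest exposes the MDI scores as `feature_importances_`. The synthetic data and the 0.01 threshold below are illustrative assumptions (the thesis determines its threshold empirically):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the fused leaf features: 65 columns, few informative.
X, y = make_classification(n_samples=300, n_features=65, n_informative=10,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Gini (mean-decrease-in-impurity) importance of each feature; sums to 1.
importances = rf.feature_importances_

# Keep only the features whose importance clears the threshold.
threshold = 0.01   # illustrative value; the thesis chooses it by experiment
selected = np.where(importances > threshold)[0]
```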

4.6 Learning
The results obtained in different scientific studies [67, 68, 69] show the high
efficiency of ensemble methods for image classification. Random Forest is a class of
ensemble method that builds an ensemble of multiple decision trees and merges them
together to get a more accurate and stable prediction [72].

In random forests, each tree in the ensemble is built from a sample drawn with
replacement from the training set. In addition, instead of using all the features, a
random subset of features is selected, further randomizing the trees. As a result, the
bias of the forest increases slightly, but due to the averaging of less correlated trees
its variance decreases, resulting in an overall better model [67, 72].

The random forest method is highly accurate and robust because of the number of decision
trees participating in the process. It is less prone to overfitting because it averages
all the predictions, which cancels out individual biases. The method also plays a
paramount role in feature selection, which is a key advantage over alternative machine
learning algorithms. Hence, we decided to use the Random Forest method for automatic
plant species identification.

1. For b = 1 to B:
   a. Draw a bootstrap sample Z* of size N from the training data.
   b. Grow a random-forest tree T_b on the bootstrapped data by recursively
      repeating the following steps for each terminal node of the tree, until
      the minimum node size n_min is reached:
      i. Select m variables at random from the p variables.
      ii. Pick the best variable/split-point among the m.
      iii. Split the node into two daughter nodes.
2. Output the ensemble of trees {T_b}_1^B.
3. To make a prediction at a new point x:
   Classification: Let Ĉ_b(x) be the class prediction of the b-th random-forest
   tree. Then Ĉ_rf^B(x) = majority vote {Ĉ_b(x)}_1^B.

Algorithm 4.5: Algorithm of Random Forest [73]

Random Forest works in four steps:
1. Select random samples from a given dataset.
2. Construct a decision tree for each sample and get a prediction result from each decision
tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final prediction.

Figure 4.10: How Random Forest works as illustrated in [67]

4.7 Leaf Identification

The identification component of the automatic plant species identification architecture
takes the extracted, combined, and selected features derived from the leaf morphology,
radial Chebyshev moments, and Gabor filters, and passes them to the knowledge base. The
knowledge base identifies the input plant leaf image using the trained random forest
classifier.

CHAPTER 5: EXPERIMENTATION

5.1. Introduction
Developing a prototype is one of the objectives of this work. The prototype serves to
demonstrate the validity and usability of the proposed automatic plant species
identification system. A prototype is developed for leaf preprocessing, segmentation,
feature extraction, feature fusion, and feature selection, in order to demonstrate as
well as evaluate the proposed identification model. We also present the tools and
development environments used to realize this work.

5.2. Data collection

Several leaf datasets are commonly used for experimentation; the Flavia dataset [8] and
the "Leaf" dataset [74] are among those with the largest numbers of leaf images. For
this work, we primarily used the Flavia dataset for testing the proposed approach. The
Flavia dataset contains 1907 leaf images from 32 different species, all with a
resolution of 1600x1200. Sample leaves from this dataset are shown in Figure 5.1.

Figure 5.1: Sample Leaf images

5.3. Implementation

In order to implement the plant species identification model, we made use of several
open source libraries. The following libraries are used on Windows 10 Pro with an
Intel(R) Core(TM) i7 CPU at 2.70 GHz and 12.0 GB of RAM [75, 76, 77, 78].
 Python: We chose Python as the programming language to develop the algorithms,
mainly because of its simplicity and code readability.
 The IDE used is Anaconda 3, Spyder 3.2.8.
 OpenCV-Python, a library of Python bindings designed to solve computer vision
problems.
 NumPy: the fundamental package for scientific computing with Python, containing a
powerful N-dimensional array object. This library is needed in order to treat images
as matrices and is the key library for image manipulation. It is very fast, which
allows our algorithms to run with high computational efficiency, one of the desired
features of the proposed work. OpenCV-Python itself makes use of NumPy, which is a
highly optimized library for numerical operations.
 pandas: an open source library providing high-performance, easy-to-use data
structures and data analysis tools for Python, used here to write the extracted
features to a CSV file.
 Scikit-learn: a free machine learning library for Python. It features various
classification, regression, and clustering algorithms, including support vector
machines and random forests, and is designed to interoperate with NumPy and SciPy.

5.3.1. Leaf preprocessing and segmentation


As part of leaf image preprocessing, we resized each leaf image, converted the RGB
image to grayscale and the grayscale image to binary. Leaf boundary extraction,
needed for morphological feature extraction, is also performed. The pseudo code for
these processes is presented below.
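As an illustrative sketch of these preprocessing steps, the grayscale conversion and binarization can be expressed in plain NumPy. This is not the exact OpenCV-based implementation used in this work; the luminance weights and the hand-rolled Otsu routine below are standard textbook choices, and the assumption that the leaf is darker than the background is specific to datasets like Flavia with light backgrounds.

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an HxWx3 RGB image to grayscale (standard luminance weights)."""
    return img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu's method)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize_leaf(rgb_img):
    """Grayscale conversion followed by Otsu thresholding; leaf pixels become 1."""
    gray = rgb_to_gray(rgb_img)
    t = otsu_threshold(gray)
    # Assumes the leaf is darker than the background, as in Flavia images
    return (gray < t).astype(np.uint8)
```

In the actual implementation, OpenCV's `cv2.cvtColor` and `cv2.threshold` with the `THRESH_OTSU` flag perform the same steps far more efficiently.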
5.3.2. Feature extraction
We extracted 25 Radial Chebyshev moment features (m00, m01, m02, m03, m04, m10,
m11, …, m44), 8 morphological features (area, aspect ratio, extent, solidity, equivalent
diameter, form factor, major axis and minor axis) and 32 Gabor texture features (mse1,
mse2, …, mse32) for each leaf image.
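Some of the morphological features listed above can be computed directly from the binary leaf mask. The sketch below is a simplified NumPy illustration covering only area, aspect ratio, extent and equivalent diameter; solidity, form factor and the axis lengths require contour and convex-hull routines (e.g. OpenCV's `findContours` and `convexHull`) and are omitted here.

```python
import numpy as np

def morphological_features(mask):
    """Compute a subset of shape features from a binary (0/1) leaf mask."""
    area = mask.sum()                       # number of leaf pixels
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1             # bounding-box height
    w = xs.max() - xs.min() + 1             # bounding-box width
    return {
        "area": float(area),
        "aspect_ratio": w / h,              # bounding-box width / height
        "extent": area / (w * h),           # leaf area / bounding-box area
        "equiv_diameter": np.sqrt(4.0 * area / np.pi),  # diameter of the equal-area circle
    }
```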

5.3.3. Training model
For the proposed solution, we decided to use the Random Forest algorithm, as already
explained in Section 4.6. Random Forest is a class of ensemble classifier with
robust and accurate classification capabilities. This classifier is first trained with the
combined features. The result is evaluated, and the classifier is then trained again with
the 48 selected features. We excluded 17 features because their importance for
training the model is very low. The feature selection is done using the Gini index,
which is readily available in the Random Forest implementation of scikit-learn. The
threshold for deciding the importance of the features was determined empirically by
conducting experiments several times.

Scikit-learn pseudo code for the Random Forest implementation

# Load data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

filename = 'path to data'    # CSV file of extracted features
data = pd.read_csv(filename)
X = data.iloc[:, 0:n]        # assign feature set (first n columns)
y = data.iloc[:, n]          # assign class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
rfc = RandomForestClassifier(n_estimators=1000)
rfc.fit(X_train, y_train)
# Use the random forest's predict method on the test data
p = rfc.predict(X_test)
print(accuracy_score(y_test, p))

Pseudo code to view a list of features with their importance scores

feature_importance = list(zip(X_train.columns, rfc.feature_importances_))
for name, score in feature_importance:
    print(name, score)

5.4. Evaluation
5.4.1. Evaluation Techniques
To test the accuracy of the model, 20% of the leaves in the database are used as query
images. Widely used metrics such as precision, recall, f1 score and support are used
to compute the accuracy of the system; they are defined as follows [9].

Precision is defined as the ratio of tp / (tp + fp), where tp is the number of true positives
and fp is number of false positives. The precision is intuitively the ability of the classifier
not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the
number of false negatives. The recall is intuitively the ability of the classifier to find all
the positive samples.

The f1 score can be interpreted as a weighted harmonic mean of the precision and recall
and is defined as f1 score = 2 * (recall * precision) / (recall + precision).

Support is defined as the number of samples of the true response that lie in that class.

Accuracy is defined as the ratio of correctly predicted observations to the total
observations. That is, accuracy = (tp + tn) / (tp + fp + fn + tn), where tn is the number
of true negatives.
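These definitions can be checked with a few lines of plain Python; the confusion-matrix counts in the example are made-up numbers for illustration, not results from this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Example with hypothetical counts: 8 true positives, 2 false positives,
# 2 false negatives and 88 true negatives.
p, r, f1, acc = classification_metrics(tp=8, fp=2, fn=2, tn=88)
# p = r = f1 = 0.8, acc = 0.96
```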

Scikit-learn provides a convenient report for classification problems that gives a quick
idea of the accuracy of a model using a number of measures. The
classification_report() function displays the precision, recall, f1-score and support for
each class.

5.4.2. Test Result


In Sections 5.3.2 and 5.3.3, we explained the extracted features and how the selected
algorithm is used to train the model on them. Here we present the experimental results,
first from training the algorithm on each feature set separately (morphological
features, Radial Chebyshev moment features and Gabor texture features) and then on
the combined feature set, both without and with feature selection.

Table 5.1: Test Result using various features

The classification accuracy of the Random Forest model using morphological, Radial
Chebyshev moment and Gabor texture features is 93.0%, 91.0% and 85.0%
respectively. However, when we fused the three feature sets and trained the model, we
achieved 96.0% accuracy. We then proceeded with feature selection and re-trained the
model; the accuracy of the model on the selected features is 97.0%.

Individually, the test results are presented in Tables 5.2, 5.3, 5.4, 5.5 and 5.6 for
morphological features, Radial Chebyshev moments, Gabor texture features, fused
features and selected features respectively.

Table 5.2: Test result for training model using morphological features

Table 5.3: Test result for training model using Chebyshev moments

Table 5.4: Test result for training model using Gabor texture features

Table 5.5: Test result for training model using fused features

Feature selection: We closely examined the importance of each feature for the training of
the Random Forest classifier. After repeated experimentation, we decided that features
with an importance value less than 0.009 do not contribute much to the training of the
classifier. Hence, the Radial Chebyshev moments m01, m11, m20, m21, m23, m41 and
m43 and the Gabor texture features mse3, mse7, mse9, mse12, mse13, mse17, mse20,
mse21, mse25 and mse31 are removed from the fused feature set, as their importance
values are below the threshold we set. After removing these features, the algorithm is
re-trained using the remaining selected features.
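Given the importance scores exposed by scikit-learn's feature_importances_ attribute, the thresholding step itself is straightforward. The sketch below illustrates it in NumPy with made-up scores and the 0.009 threshold from the text; the feature names are examples, not the full feature list.

```python
import numpy as np

def select_features(names, importances, threshold=0.009):
    """Keep only features whose Gini importance is at or above the threshold."""
    importances = np.asarray(importances)
    keep = importances >= threshold
    return [n for n, k in zip(names, keep) if k]

# Hypothetical importance scores for four features:
names = ["m00", "m01", "mse3", "area"]
scores = [0.031, 0.004, 0.002, 0.055]
print(select_features(names, scores))  # ['m00', 'area']
```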

Table 5.6: Test result of training model using selected features

5.5. Discussion
The results show that the classification accuracy of the Random Forest model using
morphological features, Radial Chebyshev moments and Gabor texture features is
93.0%, 91.0% and 85.0% respectively. When the classifier is trained on fused and selected
features, the achieved accuracy is 97.0%, which is higher than the results obtained by
training the classifier on the individual feature sets.

Morphological features are generally good shape descriptors but have difficulty
representing some plant species with irregular shapes. For example, for the plant species
in class 30, the precision, recall and f1 score are 0.75, 0.86 and 0.80 respectively, which
are slightly low compared to the results for other classes. This is because of the irregular
shape of the leaf examples in this class.

Radial Chebyshev moments are time-consuming to compute for higher orders and
repetitions; we take a maximum order of 5 and a maximum repetition of 5. Radial
Chebyshev moments are better at representing irregular leaf shapes than the
morphological technique: for the same class 30, the precision, recall and f1 score are
all 0.86.
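The recursive structure mentioned here stems from the three-term recurrence of the scaled discrete Chebyshev polynomials underlying the radial moments. The sketch below evaluates that recurrence iteratively for all x at once (following Mukundan's scaled form, assumed here rather than taken from this work), avoiding repeated recursive calls; it illustrates only the polynomial part, not the full radial moment computation.

```python
import numpy as np

def scaled_chebyshev(max_order, N):
    """Scaled discrete Chebyshev polynomials t_0..t_max_order on x = 0..N-1,
    built with the three-term recurrence instead of recursive calls."""
    x = np.arange(N)
    T = np.empty((max_order + 1, N))
    T[0] = 1.0
    if max_order >= 1:
        T[1] = (2.0 * x + 1.0 - N) / N
    for p in range(2, max_order + 1):
        T[p] = ((2 * p - 1) * T[1] * T[p - 1]
                - (p - 1) * (1.0 - ((p - 1) / N) ** 2) * T[p - 2]) / p
    return T

T = scaled_chebyshev(5, 64)
# Polynomials of different orders are orthogonal over x = 0..N-1;
# np.dot(T[0], T[1]) is zero up to floating-point roundoff.
```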

Setting Gabor filter parameters such as orientation, frequency, kernel size, standard
deviation of the Gaussian envelope, spatial aspect ratio, phase offset and type of filter
coefficient requires careful selection supported by experiment. A poorly set parameter
impacts the result of feature extraction. The Gabor filter parameters used in this work,
provided in Table 4.1, were determined after several rounds of experimentation.
However, using the feature selection technique we employed, we found 10 of the
Gabor texture features (mse3, mse7, mse9, mse12, mse13, mse17, mse20, mse21,
mse25 and mse31) to be less important. This might indicate that the Gabor filter
parameter settings require some further refinement. The more discriminative the
features representing the leaf image, the more accurate the classification result will be.
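For reference, a real-valued Gabor kernel parameterized by the quantities named here (orientation theta, wavelength, standard deviation sigma of the Gaussian envelope, spatial aspect ratio gamma and phase offset psi) can be generated as follows; the parameter values in the example are arbitrary placeholders, not the tuned values from Table 4.1.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lambd, gamma, psi=0.0):
    """Real (cosine) Gabor kernel, analogous to cv2.getGaborKernel."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_t / lambd + psi)
    return envelope * carrier

# Placeholder parameters for illustration only:
k = gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5)
# For zero phase offset, the kernel peaks at its center: k[10, 10] == 1.0
```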

CHAPTER 6: CONCLUSION AND FUTURE WORK

6.1 Conclusion
In this work we described an algorithm for automatic plant species identification. We used
Random Forest as a classifier and tuned it using feature selection techniques. The classifier
is separately trained using morphological features, Radial Chebyshev moment features and
Gabor texture features, obtaining accuracies of 93%, 91% and 85% respectively. We also
fused these feature sets and trained the classifier on the whole feature set, obtaining 96%
accuracy. Finally, we applied feature selection on the fused feature set, trained the Random
Forest classifier again and achieved an accuracy of 97%, which is higher than other
approaches in the literature.
From the results achieved, we can conclude that the use of feature fusion and an ensemble
classifier such as Random Forest is an excellent choice for automatic plant species
identification.

6.2 Contribution of the thesis work

The main contributions of this thesis work are outlined as follows:


a. The morphological features, Radial Chebyshev moments and Gabor texture
features are fused together and 48 features are selected. These features effectively
identify plant species, making our method more suitable for real-world
applications.
b. The study has implemented automatic plant species identification using fused
features and a Random Forest classifier.
c. The study showed how fused features can enhance automatic plant species
identification.
d. This work can be used as a reference for identifying objects in images, as the
basic underlying principles of object detection and recognition are the same.

6.3 Recommendation

The proposed work can be further extended to identify complex images with petioles
and clustered leaves. It can also be extended to identify plant species in real time from
their leaf images.

We used leaf shape morphology, moments and texture as leaf features. Even though
the use of leaf color has its own limitations, as described in Section 2.2.1, it is still
worth considering leaf color alongside the features used in this work, as it might
improve the result.

The work can also be extended to identify plant species from three-dimensional (3D)
flower images, as flowers are 3D in nature.

The use of Radial Chebyshev moments for feature extraction has proved to provide
good results. However, their implementation involves recursive function calls and is
generally a slow process for higher orders and repetitions of the moment values. The
development of an alternative, more efficient algorithm is an area of future work.

References

[1] James S.Cope, David Corney, Jonathan Y.Clark, Paolo Remagnino and Paul Wilkin,
“Plant species identification using digital morphometrics: A review,” Expert System
with Application, vol. 39, pp.7562-7573, 2012.
[2] J. Cullen and James Cullen, Practical Plant Identification, Cambridge University Press,
2006, pp.1
[3] Scott Bissette and David Lane, ”Identification of Trees,” in Common Forest Tress of
North Carolina, 21 ed. North Carolina Department of Agriculture and Consumer
Services, Northern Carolina, USA,2015, ch.1, pp.1
[4] Key to Nature, “Types of Identification Keys”, [online]. Available:
http://www.keytonature.eu/handbook/Types_of_identification_keys. [Accessed: 24-
Dec- 2017].
[5] Naiara Aginakoetal.,”Identification of plant species on large botanical image datasets”
in Proc.of 1st Int. Workshop on Environmental Multimedia Retrieval co-located with
ACM Intl Conf. on Multimedia Retrieval (ICMR 2014), Glasgow, UK, April 1, 2014.
[6] Addis Ababa University, [Online] Available: http://www.aau.edu.et/cns/department-
of-plant-biology-and-biodiveristy-management/facilities-of-pb/.
[7] Tesfaye Awas (PhD), “Endemic plants of Ethiopia”, Institute of Biodiversity
Conservation, P. O. Box 30726, Addis Ababa, Ethiopia.
[8] Stephen Gang Wu, Forrest Sheng Bao, Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang
and Chiao-Liang Shiang, A Leaf Recognition Algorithm for Plant classification Using
Probabilistic Neural Network, IEEE 7th International Symposium on Signal
Processing and Information Technology, Dec. 2007, Cairo, Egypt.
[9] Scikit-learn, “metrics”, [Online]. Available: http://scikit-learn.org/stable/modules/
generated/ sklearn.metrics.precision_recall_fscore_support.html. [Accessed: 15- Sep-
2018].
[10] B.S.Harish, Aditi Hedge, OmPriyaVenkatesh, D.G.Spoorthy,D.Sushma,
“Classification of Plant Leaves Using Morphological Features and Zernike

Moments,” in International Conference on Advances in Computing, Communications
and Informatics, pp.1827-1831,2013.
[11] Botanical-Online, “The importance of plants”, [Online]. Available:
https://www.botanical-online.com/theimportanceofplants.htm. [Accessed: 24- Dec-
2017].
[12] Education with Fun, “Parts of Plants”, [Online]. Available:
http://educationwithfun.com/course/view.php?id=18&section=2. [Accessed: 24-
Dec- 2017].
[13] Plant facts glossary, “Leave variation”, [Online]. Available:
https://plantfacts.osu.edu/resources/hcs300/glossary/glossary.htm. [Accessed: 24-
Dec- 2017].
[14] Encyclopedia Britannica,” Summary of Leaf variation”, [Online]. Available:
https://www.britannica.com/science/simple-leaf/media/545336/374. [Accessed: 24-
Dec- 2017].
[15] 6BC Botanical Garden, “Glossary of leaf morphology”, [Online]. Available:
https://www.6bcgarden.org/glossary-of-leaf-morphology.html. [Accessed: 24- Dec-
2017].
[16] Gardening know how,” Leaf Identification – Learn about Different Leaf Types in
Plants”, [Online]. Available: https://www.gardeningknowhow.com/garden-how-
to/info/different-leaf-types-in-plants.htm. [Accessed: 24- Dec- 2017].
[17] Colombia University Press, “Plant Taxonomy”, [Online]. Available:
https://cup.columbia.edu/book/plant-taxonomy/9780231147125. [Accessed: 24-
Dec- 2017].
[18] Olive Oil Source, “Olive Classification”, [Online]. Available:
https://www.oliveoilsource.com/page/olive-classification. [Accessed: 24- Dec-
2017].
[19] Marko Lukic, Eva Tuba and Milan Tuba, “Leaf Recognition Algorithm using
Support Vector Machine with Hu Moments and Local Binary Patterns, “ in
Proceedings of IEEE 15th International Symposium on Applied Machine
Intelligence and Informatics, pp. 485 – 490, 2017.
[20] V. R. Patil and R. R. Manza, "A Method of Feature Extraction from Leaf
Architecture", in International Journal of Advanced Research in Computer Science
and Software Engineering, Volume 5, Issue 7, July 2015 ISSN: 2277 128X
[21] B. Chitradevi and P.Srimathi, “An Overview on Image Processing Techniques”, in
International Journal of Innovative Research in Computer and Communication
Engineering Vol. 2, Issue 11, November 2014.
[22] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing, 2nd Edition",
Prentice Hall, pp1-2,567-568
[23] K. Padmavathi1 and K. Thangadurai, "Implementation of RGB and Grayscale
Images in Plant Leaves Disease Detection – Comparative Study", Indian Journal of
Science and Technology, Vol 9(6), DOI: 10.17485/ijst/2016/v9i6/77739, February
2016
[24] Sapna Sharma and Dr.Chitvan Gupta, “Recognition of Plant Species based on leaf
images using Multilayer Feed Forward Neural Network”, in International Journal of
Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163 Issue 6,
Volume 2, June 2015.
[25] S.S. Bedi and Rati Khandelwal, "Various Image Enhancement Techniques - A
Critical Review", International Journal of Advanced Research in Computer and
Communication Engineering Vol. 2, Issue 3, March 2013
[26] Raman Maini and Himanshu Aggarwal, "A Comprehensive Review of Image
Enhancement Techniques", Journal of Computing, Vol. 2, Issue 3, March2010.
[27] Frank Shih,"Image Processing and Pattern Recognition", The Institute of Electrical
and Electronics Engineers, Inc., 2010, pp 52-55
[28] Heba F.Eid, "Performance Improvement of Plant Identification Model based on PSO
Segmentation ",International Journal of Intelligent Systems and Applications, 2016,
2, 53-58,DOI: 10.5815/ijisa.2016.02.07
[29] Shilpa Kamdi and R.K.Krishna,"Image Segmentation and Region Growing
Algorithm" in International Journal of Computer Technology and Electronics
Engineering (IJCTEE) Volume 2, Issue 1, February 2012.

[30] K.Bhargavi and S. Jyothi, "A Survey on Threshold Based Segmentation Technique in
Image Processing", in International Journal of Innovative Research and Development,
Vol. 12(12), Nov. 2014.
[31] B.Sathya and R.Manavalan, "Image Segmentation by Clustering Methods:
Performance Analysis", International Journal of Computer Applications (0975 –
8887), Volume 29– No.11, September 2011
[32] Muthukrishnan.R and M.Radha, "Edge Detection Techniques for Image
Segmentation", in International Journal of Computer Science & Information
Technology (IJCSIT) Vol 3, No 6, Dec 2011.
[33] N. Valliammal, "Computer-Aided Plant Identification Through Leaf Recognition
Using Enhanced Image Processing And Machine Learning Algorithms", PhD in
Computer Science, Avinashilingam Institute for Home science and Higher Education
for Women, Coimbatore – 641 043, October, 2013.
[34] Adrian Rosebrock, https://www.pyimagesearch.com/2014/03/03/charizard-explains-
describe-quantify-image-using-feature-vectors/
[35] Jan Flusser, Tomáš Suk and Barbara Zitová, Moments and Moment Invariants in
Pattern Recognition © 2009 John Wiley & Sons, Ltd. ISBN: 978-0-470-69987-4
[36] R. Athilakshmi and Dr. Amitabh Wahi, "An Efficient Method for Shape based Object
Classification using Radial Chebyshev Moment on Square Transform", in Australian
Journal of Basic and Applied Sciences, 8(13) August 2014, Pages: 51-60
[37] Hamid Reza Boveiri,"On Pattern Classification Using Statistical Moments",
International Journal of Signal Processing, Image Processing and Pattern Recognition
Vol. 3, No. 4, December, 2010
[38] Anant Bhardwaj and Manpreet Kaur," A review on plant recognition and classification
techniques using leaf images", International Journal of Engineering Trends and
Technology- Vol.4, Issue 2,2013
[39] Pallavi P and V.S Veena Devi, "Leaf Recognition Based on Feature Extraction and
Zernike Moments", International Journal of Innovative Research in Computer and
Communication Engineering Vol.2, Special Issue 2, May 2014

[40] Pouya Bolourchi, Hasan Demirel and Sener Uysal,"Target recognition in SAR images
using radial Chebyshev moments",SIViP (2017) 11:1033–1040, DOI10.1007/s11760-
017-1054-2, 21 January 2017
[41] Balasubramanian Raman, Sanjeev Kumar, ParthaPratim Roy and Debashis,”Leaf
Identification Using Shape and Texture Features” in Advances in Intelligent Systems
and Computing in Proceedings of International Conference on Computer Vision and
Image Processing, Volume 2, Springer, 2016, page 531-53.
[42] Anup S Vibhute and Ismail Saheb Bagalkote, "Identification of Grape variety Plant
species using image processing ", Avishkar – Solapur University Research Journal,
Vol. 3, 2014
[43] Arun Kumar1, Vinod Patidar, Deepak Khazanchi, and Poonam Saini,"Role of
Feature Selection on Leaf Image Classification", Journal of Data Analysis and
Information Processing, 2015, 3, 175-183 Published Online November 2015 in
SciRes. http://www.scirp.org/journal/jdaip
http://dx.doi.org/10.4236/jdaip.2015.34018
[44] Urszula Marmol, "Use of Gabor Filters for Texture Classification of Airborne images
and Lidar Data",Archives of Photogrammetry, Cartography and Remote Sensing, Vol.
22, 2011, pp. 325-336, ISSN 2083-2214
[45] Jyotismita Chaki and Ranjan Parekh, "Plant Leaf Recognition using Gabor Filter",
International Journal of Computer Applications (0975 – 8887), Volume 56– No.10,
October 2012.
[46] Alaa Tharwat,Tarek Gaber,Yasser M. Awad,Nilanjan Dey,Vaclav Snasel and Aboul
Ella Hassanien,"Plants Identification using Feature Fusion Technique and Bagging
Classier",ResearchGate Conference Paper,November 2015.
[47] Telgaonkar Archana H. and Deshmukh Sachin "Dimensionality Reduction and
Classification through PCA and LDA ",International Journal of Computer
Applications (0975 – 8887) Volume 122 – No.17, July 2015
[48] Minggang Du and Xianfeng Wang, Linear Discriminant Analysis and Its Application
in Plant Classification,2011 Fourth International Conference on Information and
Computing
[49] Kanika Kalra, Anil Kumar Goswami and Rhythm Gupta, "A comparative Study of
Supervised Image Classification Algorithms for Satellite Images ", International
Journal of Electrical, Electronics and Data Communication, ISSN: 2320-2084
Volume-1, Issue-10, Dec, 2013.
[50] Prof. Meeta Kumar, Mrunali Kamble, Shubhada Pawar, Prajakta Patil and Neha
Bonde,"Survey on Techniques for Plant Leaf Classification", International Journal of
Modern Engineering Research (IJMER),Vol.1, Issue.2, pp-538-544 ISSN: 2249-6645
[51] Jason Brownlee, Ph.D. [Online], Available: https://machinelearningmastery.com/
supervised -and-unsupervised-machine-learning-algorithms/
[52] Qingmao Zeng, Tonglin Zhu, XueyingZhuang, MingxuanZheng and YubinGuo,
“Using the periodic wavelet descriptor of plant leaf to identify plant species,” Springer
Science and Business Media, 2015, New York.
[53] Adil Salman, Ashish Semwal,Upendra Bhatt and V. M Thakkar, "Leaf Classification
and Identification using Canny Edge Detector and SVM Classifier," in International
Conference on Inventive Systems and Control, pp.1-4,2017.
[54] Marko Lukic, Eva Tuba and Milan Tuba, “Leaf Recognition Algorithm using Support
Vector Machine with Hu Moments and Local Binary Patterns, “ in Proceedings of
IEEE 15th International Symposium on Applied Machine Intelligence and
Informatics, pp. 485 – 490, 2017.
[55] Zalikha Zulkifli, Puteh Saad and Itaza Afiani Mohtar, “Plant Leaf Identification using
Moment Invariants & General Regression Neural Network,” in 11th International
Conference on Hybrid Intelligent Systems (HIS), pp.430-435, 2011.
[56] M. E. Celebi and Y. A. Aslandogan, “A comparative study of three moment-based
shape descriptors,” in Proceedings of IEEE International Conference on Information
Technology: Coding and Computing, Vol. 2, No. 1, pp. 788 – 793, April 4-6,2005.
[57] R Rizal Isnanto, Ajub Ajulian Zahra, and Patricia Julietta, “Pattern Recognition on
Herbs Leaves Using Region-Based Invariants Feature Extraction,” in Proceedings of
3rd International Conference on Information Technology, Computer, and Electrical
Engineering, pp. 455 – 459,Oct 19-21, 2016.

[58] Gunjan Mukherjee, Arpitam Chatterjee, and BipanTudu, “Study on the potential of
combined GLCM features towards medicinal plant classification,” in 2nd
International Conference on Control, Instrumentation, Energy & Communication
(CIEC), 2016.
[59] Patil and Bhagat, "Plants Identification by Leaf Shape using GLCM, Gabor Wavelets
and PCA," in International Journal of Engineering Trends and Technology (IJETT) –
Volume 37 Number 3- July 2016

[60] Anusha Rao and Dr. S.B. Kulkarni, “An Improved Technique of Plant Leaf
Classification Using Hybrid Feature Modeling,” in Proceedings of IEEE International
Conference on Innovative Mechanisms for Industry Applications, pp. 5-9, 21-23 Feb.
2017.
[61] S. Wu, F. Bao, E. Xu, Y. Wang, Y. Chang, and Q. Xiang, “A Leaf Recognition
Algorithm for Plant Classification Using Probabilistic Neural Network,” IEEE 7th
International Symposium on Signal Processing and Information Technology,
December 2007.
[62] Alaa Tharwat, Tarek Gaber, Yasser M. Awad, Nilanjan Dey, Vaclav Snasel, and
Aboul Ella Hassanien, “Plants Identification using Feature Fusion Technique and
Bagging Classifier,” in Conference Paper · November 2015
[63] Ji-Xiang Du and Chuan-Min Zhai, “Plant Species Recognition Based on Radial Basis
Probabilistic Neural Networks Ensemble Classifier,” in Advanced Intelligent
Computing Theories and Applications. Vol 6216, Springer, Berlin, Heidelberg, 2010.
[64] Open Source Computer Vision,” Otsu’s Binarization”, [Online]. Available:
https://docs.opencv.org/3.4.0/d7/d4d/tutorial_py_thresholding.html, [Accessed: 24-
Jul- 2018].
[65] Wikipedia, “Otsu's method”, [Online]. Available:
https://en.wikipedia.org/wiki/Otsu%27s_method. [Accessed: 28- Jul- 2018].
[66] Alexander Mordvintsev & Abid K,"OpenCV-Python Tutorials Documentation
Release 1",February 28, 2017,pp. 87-98

[67] Avinash Navlani, “Understanding Random Forests Classifiers in Python”, May 16th,
2018, [Online]. Available: https://www.datacamp.com/community/tutorials/random-
forests-classifier-python. [Accessed: 24- Aug- 2018].
[68] Wikipedia, “Decision Tree Learning”, [Online]. Available:
https://en.wikipedia.org/wiki/Decision_tree_learning. [Accessed: 25- Aug- 2018].
[69] A. Riaz, S. Farhan, M. A. Fahiem and H. Tauseef,"An Ensemble Classifier based
Leaf Recognition Approach for Plant Species Classification using Leaf Texture,
Morphology and Shape",Department of Computer Science, Lahore College for
Women University, Lahore, Pakistan
[70] R. Putri Ayu Pramesti, Yeni Herdiyeni, Anto Satriyo Nugroho,” Weighted
Ensemble Classifier for Plant Leaf Identification”, TELKOMNIKA, Vol.16, No.3,
June 2018, pp. 1386~1393 ISSN: 1693-6930, accredited A by DIKTI, Decree No:
58/DIKTI/Kep/2013 DOI : 10.12928/TELKOMNIKA.v16i3.7615
[71] Nikita Joshi1, Shweta Srivastava,” Improving Classification Accuracy Using
Ensemble Learning Technique (Using Different Decision Trees)”, IJCSMC, Vol. 3,
Issue. 5, May 2014, pg.727 – 732
[72] Savan Patel, “Random Forest Classifier”, May 18, 2017, [Online]. Available:
https://medium.com/machine-learning-101/chapter-5-random-forest-classifier-
56dc7425c3e1. [Accessed: 25- Aug- 2018].
[73] Trevor Hastie, Robert Tibshirani and Jerome Friedman, Random Forests in "The
Elements of Statistical Learning, 2nd ed., 2008. [Online]. Available:
http://statweb.stanford.edu/~tibs/book/chap17. Pp.601-603. [Accessed: 25- Aug-
2018].
[74] Pedro F. B. Silva, André R. S. Marçal and Rubim Almeida da Silva, “Leaf Data
Set”, February 2014.
[75] Open Computer Vision, “Open CV python”, [Online]. Available:
https://docs.opencv.org/3.0- beta/doc/py_tutorials/py_setup/
py_intro/py_intro.html#intro. [Accessed: 5- Sep- 2018].
[76] The pandas project,” Python Data Analysis Library”, [Online]. Available:
https://pandas.pydata.org/. [Accessed: 5- Aug- 2018].
[77] NumPy developers,”Numpy”, [Online]. Available: http://www.numpy.org/.
[Accessed: 5- Sep- 2018].
[78] Scikit-learn, “Scikit-learn: machine learning in Python”, [Online]. Available:
http://scikit-learn.org/stable/. [Accessed: 15- Sep- 2018].

Declaration
I, the undersigned, declare that this thesis is my original work and has not been presented
for a degree in any other university, and that all source of materials used for the thesis have
been duly acknowledged.

Declared by:
Name: _____________________________________
Signature: __________________________________
Date: ______________________________________

Confirmed by advisor:
Name: _____________________________________
Signature: __________________________________
Date: ______________________________________

Place and date of submission: Addis Ababa, Oct 26, 2018.

