Project Report

Uploaded by

sozya

THYROID DISEASE DETECTION USING DEEP LEARNING 2021-22

CHAPTER 1

INTRODUCTION
Thyroid disease is a medical condition that affects the function of the thyroid
gland. The thyroid gland is an endocrine organ located at the front of the neck; it
produces thyroid hormones that travel through the blood and help regulate many other
organs. These hormones normally act in the body to regulate energy use,
infant development, and childhood development. Classification techniques play a vital
role in analysing diseases and in reducing costs for patients. In India, an estimated
42 million people suffer from thyroid disorders. Symptoms include weight gain,
tiredness, weakness, and feeling cold.

The thyroid gland sits in the lower part of the neck, beneath the Adam’s apple. It
secretes thyroid hormones, which chiefly influence the rate of metabolism and protein
synthesis. Thyroid hormones control the body's metabolism in many ways, including
how fast the heart beats and how quickly calories are burnt.
The thyroid gland produces two active thyroid hormones, thyroxine
(abbreviated T4) and triiodothyronine (abbreviated T3). These hormones are also
essential for regulating body temperature and for the body's overall growth and
maintenance. When thyroid hormone production is low, the hypothalamus in the brain
produces thyrotropin-releasing hormone (TRH), which causes the pituitary gland to
release thyroid-stimulating hormone (TSH). TSH in turn stimulates the thyroid gland
to release T4. Thyroid disease can be classified mainly into two types:

a. Hyperthyroidism:
This is a disorder in which the thyroid gland produces excessive amounts of thyroid
hormones. Common symptoms include restlessness, agitation, tremors,
weight loss, rapid heartbeat and frequent bowel movements. In this disease most of the
T4 hormone gets converted to T3. The main causes of hyperthyroidism are excessive
intake of iodine, abnormal secretion of TSH and excessive intake of thyroid hormones.
Graves’ disease is a severe thyroid condition that may lead to death.

Dept. of CSE, CIT Page 1



b. Hypothyroidism
Hypothyroidism is a condition in which the production of thyroid hormone
is reduced. Common symptoms of hypothyroidism include dry skin, constipation,
feeling cold, prolonged menstrual bleeding and sudden weight gain. It is associated
with other disorders such as thyroid hormone resistance and Hashimoto’s thyroiditis.

Figure 1.1: Hypothyroidism


Thyroid hormones are required for the mental and physical development of the body. They
help maintain the levels of electrolytes and other minerals in the body, and they
influence the central nervous system, the brain and other parts of the body. Thyroid
hormones are therefore important for the overall development of the human body. Thyroid
disorders are conditions in which the thyroid gland functions improperly. This gland
plays numerous and diverse roles, such as regulating various metabolic processes in the
human body.

1.1 Problem Statement


Thyroid disorders can range from a small, harmless goiter (enlarged gland) that needs
no treatment to life-threatening cancer. The most common thyroid problems involve
abnormal production of thyroid hormones and are associated with thyroiditis, iodine
deficiency, nodules and excessive iodine. Women are much more likely than men to have
thyroid disease. The disease is also more common among people older than 60. The
problem is the lack of a sustainable system that can predict thyroid disease from the
test values.


1.2 Objectives
• To build an automated system that uses deep learning to solve the problem stated
above.
• To classify a person as thyroid positive or thyroid negative
based on historical data.
• To identify the best possible model for detecting the presence of thyroid disease in
the given data.
1.3 Scope of the project
Thyroid disease detection using deep learning is realized by making
use of attributes such as age, sex, sick, pregnant, and the values of thyroid-related
hormones such as thyroxine, triiodothyronine and thyroid-stimulating hormone. The
project includes research, content strategy and an accurate prediction system. It
consists of two phases. The first phase imports and pre-processes the dataset:
unnecessary columns are removed to ensure better detection, and the data is reshaped
from 2D to 3D to match the input expected by the chosen models. The second phase
applies the models: the 3D data is split into train and test sets, and the models are
trained on the train set and evaluated on the test set.
The project is mainly useful in the medical field. It helps patients
interpret their thyroid results by entering the values from
the thyroid tests they have undergone. It also helps doctors diagnose the
disease and treat their patients promptly.


CHAPTER 2

LITERATURE SURVEY
A literature review provides an overview of current knowledge, methods, and gaps in
the existing research. During the literature survey, we collected research
papers and information about the thyroid disease detection mechanisms that have been used.

1) Thyroid Disease Classification Using Decision Tree and SVM:


a. Methodology: This paper describes the diagnosis of thyroid disorders using
decision tree attribute splitting rules. The study compares thyroid disease
diagnosis using machine learning techniques such as SVM,
Naïve Bayes and Decision Trees. The data consists of blood samples collected from
thyroid patients [1].
b. Conclusion: Among all the models, the one based on the decision tree is the most
efficient.
2) Interactive Thyroid Disease Prediction System Using Machine
Learning Technique:
a. Methodology: This paper proposes different machine learning
techniques for the diagnosis and prevention of thyroid disease [2]. SVM, KNN, Decision
Tree and ANN were used to estimate a patient’s risk of
developing thyroid disease.
b. Conclusion: Among all the models, the one based on SVM is the most efficient.
3) Comparative Analysis of Thyroid Disease based on Hormone Level
using Data Mining Techniques:
a. Methodology: In this work, techniques such as SVM, KNN, Decision Tree,
Naïve Bayes, Random Forest and Logistic Regression are used to identify the type of
thyroid disease [3]. The dataset is taken from a data repository and consists
of records of thyroid patients.
b. Conclusion: The system uses data mining classification and
regression algorithms to provide thyroid diagnosis reports to patients at
reduced cost. Logistic regression is more efficient and accurate than the
other classification techniques.


4) A Comparative Study of Machine Learning Based Model for
Thyroid Disease Prediction:
a. Methodology: The study in [4] predicts thyroid disease
using a Decision Tree and an Artificial Neural Network, and aims to find the better
of the two classification methods. The dataset used for the experiments is
downloaded from the UCI machine learning repository.
b. Conclusion: Classification is performed with both methods and
their accuracy is compared using the confusion matrix. The study concludes that
the ANN is more accurate than the decision tree classifier.
5) A Comparative study of machine learning algorithms on thyroid
disease prediction:
a. Methodology: The study in [5] compares thyroid disease diagnosis
using the machine learning techniques SVM, Multiple Linear
Regression, Naïve Bayes and Decision Trees. The thyroid disease
dataset from the UCI machine learning repository was used for this purpose.
b. Conclusion: The results show that the decision tree outperforms the
other techniques in the accuracy of diagnosing the disease.
6) Prediction of Thyroid Disease Using Data Mining Techniques:
a. Methodology: The study in [6] analyses and compares four classification
models: Naive Bayes, Decision Tree, Multilayer Perceptron and Radial Basis
Function Network. The data set used to build and to validate the classifier was
provided by UCI machine learning repository and by a website with Romanian
data.
b. Conclusion: On comparing the four classification models, the results indicate a
significant accuracy for all the classification models, the best classification rate
being that of the Decision Tree model.
7) Deep Learning:
Feedforward neural networks, or multi-layer perceptrons (MLPs), comprise an
input layer, one or more hidden layers, and an output layer. Data is fed into these
models to train them, and they are the foundation for computer vision, natural language
processing, and other neural networks.


Deep-learning architectures such as deep neural networks, deep belief networks, deep
reinforcement learning, recurrent neural networks and convolutional neural networks
have been applied to fields including computer vision, speech recognition, natural
language processing, where they have produced results comparable to and in some cases
surpassing human expert performance [7].
A deep neural network (DNN) is an artificial neural network with multiple layers
between the input and output layers. There are different types of neural networks but they
always consist of the same components: neurons, synapses, weights, biases, and
functions. These components function similarly to the human brain and can be trained
like any other ML algorithm.
Convolutional neural networks (CNNs) are similar to feedforward networks, but
they’re usually utilized for image recognition, pattern recognition, computer vision [8].
Recurrent neural networks (RNNs) are identified by their feedback loops. These
learning algorithms are primarily leveraged when using time-series data to make
predictions about future outcomes, such as stock market predictions or sales forecasting
[8].
As the thyroid dataset we have taken consists of sequences of textual data, we have
chosen RNNs.
8) Recurrent Neural Network:
A recurrent neural network (RNN) is a class of artificial neural networks where
connections between nodes form a directed or undirected graph along a temporal
sequence. This allows it to exhibit temporal dynamic behaviour. Derived from
feedforward neural networks, RNNs can use their internal state (memory) to process
variable length sequences of inputs. This makes them applicable to tasks such as
unsegmented, connected handwriting recognition or speech recognition [8].
Both finite impulse and infinite impulse recurrent networks can have additional stored
states, and the storage can be under direct control by the neural network. The storage can
also be replaced by another network or graph if that incorporates time delays or has
feedback loops. Such controlled states are referred to as gated state or gated memory, and
are part of long short-term memory networks (LSTMs) and gated recurrent units (GRUs).
This is also called Feedback Neural Network (FNN).


9) Long Short-Term Memory:


Long short-term memory (LSTM) is an artificial neural network used in the fields of
artificial intelligence and deep learning. Unlike standard feedforward neural networks,
LSTM has feedback connections. Such a recurrent neural network can process not only
single data points (such as images), but also entire sequences of data (such as speech or
video) [9].
10) Gated Recurrent Unit:
The Gated Recurrent Unit is like a long short-term memory (LSTM) with a forget
gate, but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance
on certain tasks of polyphonic music modelling, speech signal modelling and natural
language processing was found to be similar to that of LSTM. GRUs have been shown to
exhibit better performance on certain smaller and less frequent datasets [10].


CHAPTER 3

SYSTEM ANALYSIS

3.1 Existing System


Machine learning is the science of getting computers to act without being explicitly
programmed. One of the common uses for machine learning is binary classification,
which looks at an input and predicts which of two possible classes it belongs to. Such
models are trained with datasets labelled with 1s and 0s representing the two classes.
Data usually changes on a routine basis. Numerous machine learning algorithms can be
applied to all kinds of data problems.

After the literature survey, it has been observed that most of the authors have used
the most common machine learning algorithms such as Decision Tree, Naïve Bayes,
KNN, SVM, Random Forest, Regression algorithms and other techniques to predict,
classify and analyse the data.

Drawbacks with the existing system:


The drawbacks of using the traditional machine learning algorithms are:
• Linear regression oversimplifies the problem by assuming a linear
relationship among the variables.
• Logistic regression tends to underperform when there are multiple or non-linear
decision boundaries.
• SVMs are trickier to tune because of the importance of picking the right kernel.
• Decision trees are prone to overfitting, especially when a tree is particularly deep.
• Random Forest requires a long training time, as it combines many decision
trees to determine the class.
• Naive Bayes assumes class conditional independence, which can decrease the
accuracy of the model.
• KNN requires a lot of storage because it keeps all the training data, and its
prediction is slow. The algorithm gets slower as the number of examples, predictors
or independent variables increases.

Deep learning can be used for binary classification, too. In fact, building a neural
network that acts as a binary classifier is little different than building one that acts as a
regressor [11].
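
The remark above can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that the final layer of a network produces raw scores; a regressor would output those scores directly, whereas a binary classifier passes them through a sigmoid and is trained with binary cross-entropy. The scores and labels below are made-up values, not results from this project.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

# Hypothetical raw outputs of the last linear layer for three patients.
logits = np.array([-2.0, 0.0, 3.0])
y_true = np.array([0.0, 1.0, 1.0])  # 0 = negative, 1 = positive

probs = sigmoid(logits)                # classifier output: probabilities
labels = (probs >= 0.5).astype(int)    # threshold at 0.5 to get a class
loss = binary_cross_entropy(y_true, probs)
```

Only the output activation and the loss differ from a regression setup; the layers before them can stay the same.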

3.2 Proposed System


The proposed system uses a deep learning technique, the Recurrent Neural Network.
It uses models based on the RNN architecture, LSTM and GRU, for detecting
thyroid disease. The thyroid dataset is taken from the UCI Machine Learning Repository.
The dataset consists of personal attributes such as age, sex, sick and pregnant,
and thyroid-related attributes such as the values of thyroxine, triiodothyronine and
thyroid-stimulating hormone. The system includes research,
content strategy and an accurate prediction system.
The proposed system classifies the thyroid dataset in two phases. The first phase imports
and pre-processes the dataset: the dataset and the required machine
learning libraries are imported, the unnecessary columns are removed
to ensure better detection, and the data is reshaped from 2D to 3D to match the input
expected by the chosen models. The second phase applies the models: the 3D data is split
into train and test sets, and the different variants of LSTM models are trained on the
train set and evaluated on the test set. The output is a binary classification, i.e.,
either positive or negative.


CHAPTER 4

SYSTEM DESIGN
The design of the system deals with how the system is developed. It briefly explains
the flow of functionality. This chapter contains the system architecture, flowchart,
sequence diagram and use case diagram described below.

4.1 System Architecture


The machine learning architecture defines the various layers involved in the machine
learning cycle and the major steps carried out in transforming raw
data into training data sets that enable the decision making of a system.
Here, the user supplies the thyroid dataset as input. The input is
transformed and converted to 3D, and the dataset is then split into training and test
sets. Next, using LSTM, thyroid disease is predicted based on previous thyroid
results. Finally, the results are displayed as graphs and the model is
validated.

Figure 4.1: Architecture of the proposed system


4.2 Flowchart
A flowchart is a type of diagram that represents an algorithm, workflow or process.
Flowchart can also be defined as a diagrammatic representation of an algorithm (step by
step approach to solve a task).
Start → Import dataset and libraries → Data Pre-processing → Reshape dataframe →
Splitting of data → Train model → Evaluate model → Predict training data values → Stop

Figure 4.2: Flowchart for the proposed system


4.3 Sequence Diagram


A sequence diagram shows object interactions arranged in time sequence. It depicts
the objects and classes involved in the scenario and the sequence of messages exchanged
between the objects needed to carry out the functionality of the scenario. Sequence
diagrams are typically associated with use case realizations in the Logical View of the
system under development. Sequence diagrams are sometimes called event diagrams or
event scenarios.
The sequence diagram shows how the user detects thyroid disease with the LSTM model.
It is split into two phases. In phase 1, the user inputs the data, which is then
converted into a supervised learning format, as shown in Figure 4.3 (a). In phase 2,
the 3D data returned by phase 1 is processed through to the final
prediction, as shown in Figure 4.3 (b).

Figure 4.3 (a): Sequence diagram of the proposed system (Phase1)

Figure 4.3 (b): Sequence diagram of the proposed system (Phase2)


4.4 Use case Diagram


Use cases are used during the analysis phase of a project to identify system
functionality. They separate the system into actors and use cases. Actors represent roles
that are played by users of the system. Users may be humans, other computers, or even
other software systems.
Use case diagrams are used to gather the requirements of a system including internal
and external influences. These requirements are mostly design requirements. Hence,
when a system is analysed to gather its functionalities, use cases are prepared and actors
are identified.

Figure 4.4: Use Case diagram of the proposed system.


CHAPTER 5

IMPLEMENTATION

5.1 Importing Libraries


• NumPy: NumPy is the fundamental package for scientific computing in Python.
It is a Python library that provides a multidimensional array object, various
derived objects (such as masked arrays and matrices), and an assortment of routines
for fast operations on arrays, including mathematical, logical, shape manipulation,
sorting, selecting, I/O, discrete Fourier transforms and much more [12].
• Pandas: Pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labelled” data both easy
and intuitive. It aims to be the fundamental high-level building block for doing
practical, real-world data analysis in Python [13].
• Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python [14].
• Seaborn: Seaborn is a library for making statistical graphics in Python. It builds on
top of Matplotlib and integrates closely with Pandas. Its plotting functions operate on
dataframes and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots. Its
dataset-oriented, declarative API lets you focus on what the different elements of
your plots mean, rather than on the details of how to draw them [15].
• Scikit-learn: Scikit-learn (also known as sklearn) is an open-source machine
learning library that supports supervised and unsupervised learning. It also provides
various tools for model fitting, data pre-processing, model selection, model
evaluation, and many other utilities [16].
• Warnings: Warning messages are typically issued in situations where it is useful to
alert the user of some condition in a program, where that condition (normally)
doesn’t warrant raising an exception and terminating the program. For example, one
might want to issue a warning when a program uses an obsolete module [17].
• TensorFlow: TensorFlow is an end-to-end open-source platform for machine
learning. It has a comprehensive, flexible ecosystem of tools, libraries and
community resources that lets researchers push the state-of-the-art in ML and
developers easily build and deploy ML powered applications [18].
5.2 Dataset Collection

The dataset is acquired from the UCI machine learning repository. The dataset consists
of 3772 instances, 30 attributes of which 24 attributes are categorical and six attributes
are continuous. All the instances in the dataset are labelled to one of the two classes:
positive and negative. The description of the attributes which are present in the dataset is
shown in Table 5.1 given below:

Table 5.1: Thyroid Disease Dataset Attribute Description

5.3 Data Pre-processing


Data pre-processing is a step in data analysis that takes raw data and transforms it
into a format that computers can understand and analyse [19]. The steps
involved in data pre-processing are:
• Data Cleaning
• Data Transformation
• Data Visualization
A. Data Cleaning: Data cleaning is the process of filling in missing data and correcting,
repairing, or removing incorrect or irrelevant data from a data set. Data cleaning is
the most important step of pre-processing because it ensures that the data is
ready for downstream use. Data cleaning includes the following stages:
• Dropping Unwanted Features
• Handling irrelevant data
a) Dropping Unwanted Features: Features that merely record which tests were
conducted and the sources of the data are dropped, as
they are informational only and not useful for analysis.
b) Handling irrelevant data: Irrelevant data is data that does not
fit the context of the problem we are trying to solve. In the dataset, some
entries have the value ‘?’, which does not fit the analysis. These entries are handled by
a. removing them, for categorical features
b. replacing them with the mean value of the column, for numerical features
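
The cleaning steps above can be sketched with Pandas as follows. This is a minimal illustration on a toy frame, not the project's actual code: the column names, the helper `clean`, and the choice to compute the mean after dropping rows are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are illustrative, not the full dataset schema.
raw = pd.DataFrame({
    "sex": ["F", "M", "?", "F"],
    "TSH": ["1.3", "?", "4.1", "2.0"],
    "TSH_measured": ["t", "f", "t", "t"],  # bookkeeping column, to be dropped
})

def clean(df, drop_cols, categorical, numerical):
    """Drop unwanted columns and handle the '?' placeholder values."""
    out = df.drop(columns=drop_cols).replace("?", np.nan)
    # Categorical features: remove rows whose value is missing.
    out = out.dropna(subset=categorical).copy()
    # Numerical features: convert to numbers, then fill gaps with the column mean.
    for col in numerical:
        out[col] = pd.to_numeric(out[col])
        out[col] = out[col].fillna(out[col].mean())
    return out.reset_index(drop=True)

cleaned = clean(raw, drop_cols=["TSH_measured"],
                categorical=["sex"], numerical=["TSH"])
```

In this toy case the row with an unknown `sex` is removed, and the missing `TSH` value is replaced by the mean of the remaining `TSH` values.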

Figure 5.1: The dataset after data cleaning


B. Data Transformation: Data transformation is the process of turning the data into the
proper format(s) needed for analysis and other downstream processes. Data
transformation includes the following stages:
• Replace the data with binary numbers, for categorical features
• Normalize the data, for numerical features
a) Replace the data with binary numbers: The two-valued data in the categorical
features is replaced with binary numbers so that it can be used in the
analysis. The data in the target column is likewise mapped to binary numbers.
Mapping substitutes each value in a Series with another value, which
may be derived from a function, a dictionary or a Series.
b) Normalize the data: The data in the numerical features is subjected
to normalization. Normalizing the data refers to scaling the data values to a
much smaller range such as [-1, 1] or [0.0, 1.0]. Of the different methods available
to normalize the data, min-max normalization is used.
a. Min-max normalization: Min-max scaling is a common feature pre-processing
technique which results in scaled data values that fall in the range
[0, 1]. When applied to a Python sequence, such as a Pandas Series, scaling
results in a new sequence such that 0 corresponds to the minimum value and 1 to the
maximum value of the original unscaled sequence.
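
Both transformation steps can be sketched with Pandas as shown below. The column names, the 'f'/'t' and 'negative'/'positive' encodings, and the helper `min_max` are assumptions chosen for illustration; the formula applied is x' = (x - min) / (max - min).

```python
import pandas as pd

# Toy data; column names and value encodings are assumed for illustration.
df = pd.DataFrame({
    "on_thyroxine": ["f", "t", "f"],
    "age": [20.0, 45.0, 70.0],
    "target": ["negative", "positive", "negative"],
})

# a) Map two-valued categorical data, and the target column, to 0/1.
df["on_thyroxine"] = df["on_thyroxine"].map({"f": 0, "t": 1})
df["target"] = df["target"].map({"negative": 0, "positive": 1})

def min_max(series):
    """Scale a numeric Series to [0, 1]; a constant column maps to all zeros."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return series * 0.0
    return (series - lo) / (hi - lo)

# b) Normalize a numerical feature.
df["age"] = min_max(df["age"])
```

Here the smallest age becomes 0, the largest becomes 1, and values in between are placed proportionally. Values not present in the mapping dictionary would become NaN, so the dictionary should cover every category that occurs in the column.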

Figure 5.2: The dataset after data transformation


C. Data Visualization: Data visualization is a graphical representation of
information and data. By using visual elements like charts, graphs,
and maps, data visualization techniques provide an accessible way to see and
understand trends, outliers, and patterns in data [20]. There are three different types
of analysis for data visualization:
• Univariate Analysis: In univariate analysis, a single feature is used
to analyse almost all of its properties. A distribution plot can be used to visualize
univariate analysis.
• Bivariate Analysis: When the data is compared between exactly 2 features, it
is known as bivariate analysis. A bar plot can be used to visualize bivariate analysis.
• Multivariate Analysis: In multivariate analysis, more
than 2 variables are compared. A heatmap can be used to visualize multivariate analysis.
A heatmap is a plot of rectangular data as a color-encoded matrix. Here it takes the
pairwise correlation of all columns in the data frame as input.

Figure 5.3: Heatmap of the dataset
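
As a rough sketch of how such a heatmap can be produced, the pairwise correlations are computed with Pandas and then passed to Seaborn. The toy values and the helper name `plot_heatmap` are assumptions for the example; the colour map and value limits are arbitrary choices.

```python
import pandas as pd

# Toy numeric data; the values are invented for illustration.
df = pd.DataFrame({
    "TSH": [1.0, 2.0, 3.0, 4.0],
    "TT4": [8.0, 6.0, 4.0, 2.0],
    "T3":  [1.0, 3.0, 2.0, 4.0],
})

# Pairwise Pearson correlation of all columns: a square, symmetric matrix.
corr = df.corr()

def plot_heatmap(matrix):
    """Draw the correlation matrix as a colour-encoded grid."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    ax = sns.heatmap(matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.show()
    return ax
```

Calling `plot_heatmap(corr)` would display the figure; each cell shows how strongly two columns move together, from -1 to 1.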


Figure 5.4: Age-wise analysis of thyroid data

Figure 5.5: Gender-wise analysis of thyroid data

Figure 5.6: Medication-wise analysis of thyroid data

Figure 5.7: Sick-wise analysis of thyroid data

Figure 5.8: Pregnancy-wise analysis of thyroid data

Figure 5.9: Surgery-wise analysis of thyroid data

Figure 5.10: Treatment-wise analysis of thyroid data

Figure 5.11: Lithium-wise analysis of thyroid data

Figure 5.12: Goitre-wise analysis of thyroid data

Figure 5.13: Tumor-wise analysis of thyroid data

Figure 5.14: Distribution plot of TSH

Figure 5.15: Distribution plot of T3

Figure 5.16: Distribution plot of TT4

Figure 5.17: Distribution plot of T4U

Figure 5.18: Distribution plot of FTI

Figure 5.19: Count plot of target attribute


5.4 Reshape Dataframe:


In Pandas, data reshaping means transforming the structure of a table or vector
(DataFrame or Series) to make it suitable for further analysis.
The chosen models are based on recurrent neural networks and expect
3-dimensional input. The pre-processed dataset is 2-dimensional, so the structure of the
dataframe is transformed to 3 dimensions before the data is input to the models.
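
A minimal sketch of this reshaping with NumPy is given below. The report does not state the exact target layout, so the example assumes the common (samples, time steps, features) convention with a single time step per record; the array contents are placeholders.

```python
import numpy as np

# Pre-processed 2D data: 4 samples (rows) with 3 features (columns).
X_2d = np.arange(12, dtype=float).reshape(4, 3)

# Assumed layout: (samples, time steps, features) with one time step per record.
n_samples, n_features = X_2d.shape
X_3d = X_2d.reshape(n_samples, 1, n_features)
```

An alternative layout, (samples, features, 1), treats every feature as its own time step; which one suits the data better is a modelling choice.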

Split into Train and Test Set: Before building the models for prediction, the predictors
(X) and target (y) of the dataset are separated. Then, the dataset is split into train_set
and test_set: 70% of the data is assigned to train_set and the remaining 30% to
test_set. The models are trained on the train_set, and their predicted output is
compared with the test_set. The split is done using train_test_split.

• train_test_split: train_test_split is a function in Sklearn model selection for splitting
data arrays into two subsets: training data and testing data. With this function,
you don't need to divide the dataset manually. By default, Sklearn train_test_split
makes random partitions for the two subsets.
Parameters:
• train_test_split(X, y, train_size=0.*, test_size=0.*, random_state=*).
• X, y. The first parameters are the data arrays to be split.
• train_size. This parameter sets the size of the training dataset.
• test_size. This parameter specifies the size of the testing dataset.
• random_state. The default mode performs a random split using [Link].
Alternatively, you can pass an integer to make the split reproducible.
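
The 70/30 split described above might look like the following sketch. The arrays are placeholders standing in for the reshaped predictors and the binary target, and the seed value 42 is an arbitrary assumption used only to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples, 1 time step, 2 features, with binary labels.
X = np.arange(200, dtype=float).reshape(100, 1, 2)
y = np.array([0, 1] * 50)

# 70% of the samples go to training, the remaining 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=42
)
```

The split is made along the first axis, so each sample keeps its (time steps, features) block intact.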
5.5 Models
After splitting the dataset into train and test sets, the models are applied on train_set.
To apply the models, a few libraries are imported. The libraries essential for
applying the models based on the RNN architecture are:
• TensorFlow
• Keras
A) Tensorflow: TensorFlow is a free and open-source software library for machine
learning and artificial intelligence. It can be used across a range of tasks but has a
particular focus on training and inference of deep neural networks [21]. TensorFlow
can be used in a wide variety of programming languages. This flexibility lends itself
to a range of applications in many different sectors.


B) Keras: Keras is an open-source software library that provides a Python interface for
artificial neural networks. Keras acts as an interface for the TensorFlow library [22].
Keras contains numerous implementations of commonly used neural-network
building blocks such as layers, objectives, activation functions, optimizers, and a host
of tools to make working with image and text data easier to simplify the coding
necessary for writing deep neural network code.

The models are built based on RNN architecture. The models used are:

• Long Short-Term Memory
• Gated Recurrent Unit
a) Long Short-Term Memory:
Long short-term memory (LSTM) is an artificial neural network used in the fields of
artificial intelligence and deep learning. Unlike standard feedforward neural networks,
LSTM has feedback connections. A common LSTM unit is composed of a cell, an input
gate, an output gate and a forget gate. The cell remembers values over arbitrary time
intervals and the three gates regulate the flow of information into and out of the cell.
LSTMs were developed to deal with the vanishing gradient problem that can be
encountered when training traditional RNNs.

Figure 5.20: An LSTM Cell
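
To make the roles of the cell and the three gates concrete, the following NumPy sketch computes one LSTM time step using the standard formulation. The weight layout (gates stacked as input, forget, candidate, output), the sizes and the random inputs are assumptions for illustration; this is not the Keras implementation used in the project.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    each stacked in the order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell is kept
    g = np.tanh(z[2 * H:3 * H])  # candidate values for the cell
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much of the cell is exposed
    c = f * c_prev + i * g       # updated cell state (the memory)
    h = o * np.tanh(c)           # updated hidden state (the output)
    return h, c

# Run a short random sequence through the cell (sizes are arbitrary).
rng = np.random.default_rng(0)
D, H = 5, 3
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(4, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`), gradients can flow across many time steps, which is how the design addresses the vanishing gradient problem mentioned above.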


The different univariate LSTM models are:
• Vanilla LSTM
• Bidirectional LSTM
• Stacked LSTM


a. Vanilla LSTM: A Vanilla LSTM (also known as a Simple LSTM or Classical LSTM)
is an LSTM model that has a single hidden layer of LSTM units, and an output layer
used to make a prediction. A vanilla LSTM unit is composed of a cell, an input gate,
an output gate and a forget gate. The cell remembers values over arbitrary time
intervals and the three gates regulate the flow of information associated with the cell.
In the remainder of this section, LSTM will refer to the vanilla version as this is the
most popular LSTM architecture. This does not imply, however, that it is also the
superior one in every situation.

Figure 5.21: Architecture of typical vanilla LSTM block


b. Bidirectional LSTM: The Bi-LSTM neural network is composed of LSTM units that
operate in both directions to incorporate past and future context information. Unlike
the LSTM network, the Bi-LSTM network has two parallel layers that propagate in
two directions with forward and reverse passes to capture dependencies in two
contexts.

Figure 5.22: Architecture of Bidirectional LSTM


c. Stacked LSTM: A Stacked LSTM architecture is an LSTM model composed of
multiple LSTM layers. Each LSTM layer passes a sequence output, rather than a
single value, to the next LSTM layer in the stack: one output per input time step,
rather than one output for all input time steps.

Figure 5.23: Architecture of Stacked LSTM with 2 LSTMs

b) Gated Recurrent Unit:


The GRU is like an LSTM with a forget gate, but has fewer parameters than an LSTM, as
it lacks an output gate. Unlike the LSTM, it does not maintain an Internal Cell State: the
information that an LSTM recurrent unit stores in the Internal Cell State is incorporated
into the hidden state of the GRU, and this collective information is passed on to the next
GRU step. A GRU is usually described in terms of an update gate and a reset gate,
together with a current memory gate (the candidate hidden state). The Update Gate
determines how much of the past knowledge needs to be passed along into the future; it
plays a role comparable to the combined Forget Gate and Input Gate of an LSTM
recurrent unit. The Reset Gate determines how much of the past knowledge to forget
when the new candidate state is computed. The Current Memory Gate is incorporated
into the Reset Gate, just as the Input Modulation Gate is a sub-part of the Input Gate in an
LSTM, and is used to introduce some non-linearity into the input and to make the input
zero-mean.

The basic workflow of a GRU resembles that of a basic RNN. The main difference
between the two lies in the internal working of each recurrent unit, as GRUs contain
gates that modulate the current input and the previous hidden state.

Figure 5.24: Architecture of GRU block
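The GRU update described above can be sketched as one time step in NumPy. This is an illustration under stated assumptions, not the project's code: the parameter names, sizes and random weights are invented for the example, and the final blending follows the convention in which the update gate z is the fraction of the previous hidden state that is kept (some references swap the roles of z and 1 - z).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One time step of a GRU cell (no separate cell state, only a hidden state)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)    # candidate ("current memory") state
    # Assumed convention: z weights the past, (1 - z) weights the candidate.
    return z * h_prev + (1.0 - z) * h_cand

# Illustrative sizes and random weights (assumptions, not trained values).
rng = np.random.default_rng(1)
features, units, timesteps = 3, 2, 5
params = (
    rng.normal(size=(units, features)), rng.normal(size=(units, units)), np.zeros(units),
    rng.normal(size=(units, features)), rng.normal(size=(units, units)), np.zeros(units),
    rng.normal(size=(units, features)), rng.normal(size=(units, units)), np.zeros(units),
)

h = np.zeros(units)
for x_t in rng.normal(size=(timesteps, features)):
    h = gru_step(x_t, h, params)
print(h.shape)
```

The sketch uses three weight sets (update, reset, candidate) where an LSTM needs four, which is where the smaller parameter count comes from.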


The various steps involved in building the models mentioned above are:

Step 1: Define Network

Step 2: Compile Network

Step 3: Fit Network

Step 4: Evaluate Network

Step 5: Make Predictions

Step 1: Define Network. The first step in building the model

Neural networks are defined in Keras as a sequence of layers. The container for these
layers is the Sequential class. The first step is to create an instance of the Sequential class.
Then you can create your layers and add them in the order that they should be connected.
The LSTM recurrent layer comprised of memory units is called LSTM(). A fully
connected layer that often follows LSTM layers and is used for outputting a prediction is
called Dense(). For example, we can define the network in two steps:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(2))
model.add(Dense(1))

We can also do this in one step by creating an array of layers and passing it to the
constructor of the Sequential class.

layers = [LSTM(2), Dense(1)]
model = Sequential(layers)

The first layer in the network must define the number of inputs to expect. The input
must be three-dimensional, comprising samples, timesteps, and features.

• Samples are the rows in your data.
• Timesteps are the past observations for a feature, such as lag variables.
• Features are the columns in your data.
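The three-dimensional input layout listed above can be illustrated with a small NumPy sketch that turns a univariate series into lagged windows. The helper name `make_windows` and the toy series are hypothetical and are not taken from the thyroid dataset; the sketch only shows how samples, timesteps and features map onto array axes.

```python
import numpy as np

def make_windows(series, timesteps):
    """Split a 1-D series into inputs of shape (samples, timesteps, 1) and next-value targets."""
    X, y = [], []
    for start in range(len(series) - timesteps):
        X.append(series[start:start + timesteps])   # the lagged observations
        y.append(series[start + timesteps])          # the value that follows them
    X = np.array(X, dtype=float).reshape(-1, timesteps, 1)  # one feature per time step
    return X, np.array(y, dtype=float)

series = [10, 20, 30, 40, 50, 60]   # toy data for illustration only
X, y = make_windows(series, timesteps=3)
print(X.shape)  # (3, 3, 1): 3 samples, 3 timesteps, 1 feature
print(y)
```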

LSTM layers can be stacked by adding them to the Sequential model. Importantly,
when stacking LSTM layers, we must output a sequence rather than a single value for
each input so that the subsequent LSTM layer can have the required 3D input. We can do
this by setting the return_sequences argument to True.
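As a hedged illustration of the point about return_sequences, the sketch below defines the four model variants discussed in this chapter with the Keras API bundled in TensorFlow. The input shape, the number of units and the sigmoid output layer are assumptions made for the example and are not the configuration used in this project.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TIMESTEPS, FEATURES, UNITS = 5, 1, 8   # assumed sizes, for illustration only

def vanilla_lstm():
    return keras.Sequential([
        keras.Input(shape=(TIMESTEPS, FEATURES)),
        layers.LSTM(UNITS),                            # single hidden LSTM layer
        layers.Dense(1, activation="sigmoid"),
    ])

def bidirectional_lstm():
    return keras.Sequential([
        keras.Input(shape=(TIMESTEPS, FEATURES)),
        layers.Bidirectional(layers.LSTM(UNITS)),      # forward and backward passes
        layers.Dense(1, activation="sigmoid"),
    ])

def stacked_lstm():
    return keras.Sequential([
        keras.Input(shape=(TIMESTEPS, FEATURES)),
        layers.LSTM(UNITS, return_sequences=True),     # 3D output: one vector per time step
        layers.LSTM(UNITS),                            # last LSTM returns a single vector
        layers.Dense(1, activation="sigmoid"),
    ])

def gru():
    return keras.Sequential([
        keras.Input(shape=(TIMESTEPS, FEATURES)),
        layers.GRU(UNITS),
        layers.Dense(1, activation="sigmoid"),
    ])

# Each variant maps a batch of shape (samples, timesteps, features) to one value per sample.
dummy_batch = np.zeros((2, TIMESTEPS, FEATURES), dtype="float32")
output_shapes = {
    build.__name__: build().predict(dummy_batch, verbose=0).shape
    for build in (vanilla_lstm, bidirectional_lstm, stacked_lstm, gru)
}
print(output_shapes)
```

Without return_sequences=True on the first layer of the stacked model, the second LSTM would receive a 2D tensor and Keras would raise an error when the model is built.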

We have used the ‘tanh’ activation function, which bounds the activation between -1
and 1. The resulting activation is then weighted to give the features used in making our
predictions.
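The bounded range of tanh can be checked with a few sample values. This is a small illustrative sketch using only the Python standard library; the inputs are arbitrary.

```python
import math

inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]   # arbitrary sample pre-activations
outputs = [math.tanh(x) for x in inputs]

for x, value in zip(inputs, outputs):
    # Large negative inputs approach -1, large positive inputs approach 1.
    print(f"tanh({x:+.1f}) = {value:+.4f}")
```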

Step 2: Compile Network. The second step in building the model

Compilation transforms the simple sequence of layers that we defined into a highly
efficient series of matrix transforms in a format intended to be executed.

Compilation is always required after defining a model. It requires a number of
parameters to be specified, tailored to training your network: specifically, the
optimization algorithm used to train the network and the loss function used to evaluate
it. The loss functions used for different purposes are:

• For Regression: mean_squared_error.
• For Binary Classification (2 classes): binary_crossentropy.
• For Multiclass Classification (>2 classes): categorical_crossentropy.
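To make the three loss functions listed above concrete, here is a minimal pure-Python sketch of their usual textbook definitions. It is not the Keras implementation; the function names mirror the Keras loss names only for readability, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep the logarithm finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-probability of the true class (one-hot encoded labels)."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return total / len(y_true)

mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
bce = binary_crossentropy([1, 0], [0.5, 0.5])
cce = categorical_crossentropy([[0, 1, 0]], [[0.2, 0.5, 0.3]])
print(mse, bce, cce)
```

Since this project classifies records into two classes, binary_crossentropy is the entry in the list that matches the task.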

For example, below is a case of compiling a defined model and specifying the
stochastic gradient descent (sgd) optimization algorithm and the mean squared error
(mean_squared_error) loss function, intended for a regression type problem.

model.compile(optimizer='sgd', loss='mean_squared_error')

Alternately, the optimizer can be created and configured before being provided as an
argument to the compilation step.

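from tensorflow.keras.optimizers import SGD

# Recent Keras versions name this argument learning_rate (older versions used lr).
algorithm = SGD(learning_rate=0.1, momentum=0.3)
model.compile(optimizer=algorithm, loss='mean_squared_error')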

Stochastic gradient descent is the most common optimization algorithm, but Keras
also supports a suite of other state-of-the-art optimization algorithms that work well with
little or no configuration. Perhaps the most commonly used optimization algorithms,
because of their generally good performance, are:

• Stochastic Gradient Descent, or sgd, which requires tuning of the learning rate and
momentum.
• Adam, or adam, which requires tuning of the learning rate.
• RMSprop, or rmsprop, which requires tuning of the learning rate.
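The role of the learning rate and momentum can be sketched with a hand-written momentum update on a one-parameter toy problem. This illustrates the update rule only and is not how Keras is invoked in this project; the quadratic objective and the helper name `sgd_momentum` are assumptions, while the values 0.1 and 0.3 mirror the SGD example above.

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.1, momentum=0.3, steps=100):
    """Gradient descent with momentum on a single scalar parameter."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

A learning rate that is too large makes the updates overshoot, and one that is too small slows convergence, which is why these two values need tuning.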

Finally, you can also specify metrics to collect while fitting your model in addition to
the loss function. Generally, the most useful additional metric to collect is accuracy for
classification problems. For example:

model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])

Step 3: Fit Network. The third step in building the model

Fitting the network requires the training data to be specified, both a matrix of input
patterns, X, and an array of matching output patterns, y.

The network is trained using the backpropagation algorithm and optimized according
to the optimization algorithm and loss function specified when compiling the model. The
backpropagation algorithm requires that the network be trained for a specified number of
epochs, or exposures to the training dataset. One epoch is one complete pass of the entire
training dataset forward and backward through the neural network.

Each epoch can be partitioned into groups of input-output pattern pairs called batches.
This defines the number of patterns that the network is exposed to before the weights are
updated within an epoch. A minimal example of fitting a network is as follows:

history = model.fit(X, y, batch_size=10, epochs=100)
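The relationship between batch size, epochs and the number of weight updates can be worked out with a short calculation. The sample counts below are hypothetical and are not the size of the thyroid dataset; batch_size=10 and epochs=100 mirror the fit example above.

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates) for mini-batch training."""
    # The last batch of an epoch may be smaller, so round up.
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

# Hypothetical dataset of 1000 samples.
per_epoch, total = weight_updates(n_samples=1000, batch_size=10, epochs=100)
print(per_epoch, total)  # 100 batches per epoch, 10000 updates in total
```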

Once fit, a history object is returned that provides a summary of the performance of
the model during training. This includes both the loss and any additional metrics specified
when compiling the model, recorded each epoch.

Training can take a long time, from seconds to hours to days, depending on the size of
the network and the size of the training data. The amount of information displayed during
training is controlled by the verbose argument: setting it to 2 reduces the output to one
line per epoch, and setting it to 0 turns it off entirely. For example:

history = model.fit(X, y, batch_size=10, epochs=100, verbose=0)

Step 4: Evaluate Network. The fourth step in building the model

Once the network is trained, it can be evaluated. The network can be evaluated on the
training data, but this will not provide a useful indication of the performance of the
network as a predictive model, as it has seen all of this data before.
We can evaluate the performance of the network on a separate dataset, unseen during
training. This will provide an estimate of the performance of the network at making
predictions for unseen data in the future.
For example, for a model compiled with the accuracy metric, we could evaluate it on a
new dataset as follows:

loss, accuracy = model.evaluate(X, y)
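Holding out data for evaluation can be sketched with a simple shuffled index split. This is a generic illustration using the standard library; the helper name, the 20% test fraction and the seed are assumptions and do not describe how the data was split in this project.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and reserve a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed makes the split repeatable
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]     # (train indices, test indices)

train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

The model is then fit only on the rows selected by the training indices and evaluated on the rows selected by the test indices.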

As with fitting the network, verbose output is provided to give an idea of the progress
of evaluating the model. We can turn this off by setting the verbose argument to 0.

loss, accuracy = model.evaluate(X, y, verbose=0)

Step 5: Make predictions. The last step in building the model

Once we are satisfied with the performance of our fit model, we can use it to make
predictions on new data.
This is as easy as calling the predict() function on the model with an array of new input
patterns. For example:

predictions = model.predict(X)

The predictions will be returned in the format provided by the output layer of the
network.

• In the case of a regression problem, these predictions may be in the format of the
problem directly, provided by a linear activation function.
• For a binary classification problem, the predictions may be an array of probabilities for
the first class that can be converted to a 1 or 0 by rounding.
• For a multiclass classification problem, the results may be in the form of an array of
probabilities (assuming a one-hot encoded output variable) that may need to be converted
to a single class output prediction using the argmax() NumPy function.

Alternatively, for classification problems, older versions of Keras provide the
predict_classes() function, which automatically converts the predicted probabilities to
crisp integer class values. This function was removed in later TensorFlow releases (2.6
onwards), where the conversion is done explicitly by rounding or with argmax().

predictions = model.predict_classes(X)
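The conversions described above, rounding for a binary output and argmax() for a multiclass output, can be sketched directly in NumPy. The probability values are made up for illustration and are not model outputs from this project.

```python
import numpy as np

# Binary case: one probability per sample, as produced by a single sigmoid output unit.
binary_probs = np.array([[0.91], [0.08], [0.50], [0.73]])
binary_classes = (binary_probs > 0.5).astype(int).ravel()   # threshold at 0.5

# Multiclass case: one probability per class, as produced by a softmax output layer.
multi_probs = np.array([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])
multi_classes = np.argmax(multi_probs, axis=1)               # index of the most likely class

print(binary_classes)  # [1 0 0 1]
print(multi_classes)   # [1 0]
```

These two expressions also work in Keras versions where predict_classes() is no longer available.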

As with fitting and evaluating the network, verbose output is provided to give an idea
of the progress of the model making predictions. We can turn this off by setting the
verbose argument to zero (0).

predictions = model.predict(X, verbose=0)
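The five steps above can be put together in one short sketch. It is an illustration under stated assumptions rather than the project's training script: the data is randomly generated stand-in data (not the thyroid dataset), and the layer sizes, the Adam optimizer and the 25% validation split are choices made only for the example. The batch size of 16 and the 2 epochs echo the settings reported in the next chapter.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data: 64 samples, 5 timesteps, 1 feature, with binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5, 1)).astype("float32")
y = (X.sum(axis=(1, 2)) > 0).astype("float32")
X_train, y_train = X[:48], y[:48]
X_test, y_test = X[48:], y[48:]

# Step 1: define the network.
model = keras.Sequential([
    keras.Input(shape=(5, 1)),
    layers.LSTM(4),
    layers.Dense(1, activation="sigmoid"),
])

# Step 2: compile the network with a loss suited to binary classification.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Step 3: fit the network, holding back part of the training data for validation.
history = model.fit(X_train, y_train, batch_size=16, epochs=2,
                    validation_split=0.25, verbose=0)

# Step 4: evaluate the network on data it has not seen.
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

# Step 5: make predictions and convert probabilities to class labels.
probabilities = model.predict(X_test, verbose=0)
classes = (probabilities > 0.5).astype(int).ravel()

print(sorted(history.history.keys()))
print(loss, accuracy, classes[:5])
```

The history object holds one value per epoch for the loss, accuracy, validation loss and validation accuracy, which are the four quantities reported in the results.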

CHAPTER 6

RESULTS
We have applied different types of LSTM models and a variant of LSTM (the GRU) to
classify the data of a given dataset into two classes. The models were fit for 16
batches and 2 epochs and evaluated on the accuracy, loss, validation accuracy
and validation loss of each epoch. We also obtain a confusion matrix, which reports
the numbers of true positives (TP), true negatives (TN), false positives (FP) and
false negatives (FN). TP is the number of correct predictions that a value belongs to a
class. TN is the number of correct predictions that a value does not belong to that class.
FP is the number of incorrect predictions that a value belongs to the class when it actually
belongs to some other class. FN is the number of incorrect predictions that a value belongs
to some other class when it actually belongs to the class.

Accuracy, or the training accuracy, is a measure of a classification model’s
performance. It can be calculated using the equation:

Accuracy = (TP+TN) / (TP+TN+FN+FP)
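The accuracy equation above can be applied to a small hand-made example. The labels and predictions below are invented for illustration and are not results from this project; the helper names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif predicted != positive and truth != positive:
            tn += 1          # predicted negative, actually negative
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn

def accuracy_from_counts(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fn + fp)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                        # 3 3 1 1
print(accuracy_from_counts(tp, tn, fp, fn))  # 0.75
```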

Loss, or the training loss, is the summation of the errors made for each sample in the
training or validation sets. During the training process the goal is to minimize this
value. The final experimental results for the accuracies and losses of the models are
shown in Table 6.1 below:

Table 6.1: Accuracies and losses of models

MODEL                   ACCURACY              LOSS
                        Epoch 1    Epoch 2    Epoch 1    Epoch 2
Vanilla LSTM            76.89%     92.24%     0.3793     0.2736
Bidirectional LSTM      82.65%     92.20%     0.3545     0.2851
Stacked LSTM            77.57%     91.92%     0.3944     0.2935
Gated Recurrent Unit    56.54%     91.76%     0.6808     0.2951

The test (or testing) accuracy, often referred to as the validation accuracy, is the
accuracy calculated on the data that is not used for training but is used during the training
process to validate (or "test") the generalisation ability of the model, or for "early
stopping".

Validation loss is a metric used to assess the performance of a deep learning model on
the validation set. The validation set is a portion of the dataset set aside to validate the
performance of the model. The validation loss is similar to the training loss and is
calculated from a sum of the errors for each example in the validation set. The final
experimental results for the validation accuracies and validation losses of the models are
shown in Table 6.2 below:

Table 6.2: Validation accuracies and validation losses of models

MODEL                   VALIDATION ACCURACY   VALIDATION LOSS
                        Epoch 1    Epoch 2    Epoch 1    Epoch 2
Vanilla LSTM            92.16%     92.16%     0.2773     0.2745
Bidirectional LSTM      92.16%     92.16%     0.2752     0.2746
Stacked LSTM            92.16%     92.16%     0.2749     0.2746
Gated Recurrent Unit    92.16%     92.16%     0.2754     0.2748

From the above results, we can draw the following observations:

a) Train Set:
• In terms of accuracy after the second epoch, the model based on vanilla LSTM
performs the best, followed by the model based on bidirectional LSTM.
• As far as the loss is concerned, the model based on vanilla LSTM has the lowest
loss value of all the models.
b) Validation Set:
• Even though the validation accuracy is the same for all models, the model based on
vanilla LSTM has the lowest validation loss after the second epoch, which makes it
the best of the models compared.

Figure 6.1: Vanilla LSTM Model Accuracy

Figure 6.2: Vanilla LSTM Model Loss

Figure 6.3: Bidirectional LSTM Model Accuracy

Figure 6.4: Bidirectional LSTM Model Loss

Figure 6.5: Stacked LSTM Model Accuracy

Figure 6.6: Stacked LSTM Model Loss

Figure 6.7: GRU Model Accuracy

Figure 6.8: GRU Model Loss

CHAPTER 7

CONCLUSION
In this work, we have used deep learning techniques to detect thyroid disease. We
have used different types of LSTM, namely vanilla LSTM, bidirectional LSTM and
stacked LSTM, along with the Gated Recurrent Unit. The study found that the model
based on vanilla LSTM gave the best accuracy with the least loss compared to the other
models. Other recent techniques could be combined in future to give still more accurate
results for thyroid diagnosis.

REFERENCES

[1] K. Dharmarajan, K. Balasree, A. Arunachalam and K. Abirmai, "Thyroid Disease
Classification Using Decision Tree and SVM," Indian Journal of Public Health Research &
Development, vol. 11, pp. 224-229, March 2020.

[2] A. Tyagi, R. Mehra and A. Saxena, "Interactive Thyroid Disease Prediction System Using
Machine Learning Technique," 5th IEEE International Conference on Parallel, Distributed
and Grid Computing, pp. 689-693, 20-22 December 2018.

[3] G. Pushpanathan, G. Singh, U. A. Kumar, P. Ramesh and A. K. Dubey, "Comparative
Analysis of Thyroid Disease based on Hormone Level using Data Mining Techniques,"
International Journal of Engineering Research & Technology, vol. 8, no. 14, pp. 30-34,
2020.

[4] S. K. Kashyap and N. Sahu, "A Comparative Study of Machine Learning Based Model
for Thyroid Disease Prediction," International Journal of Creative Research Thoughts,
vol. 9, no. 4, pp. 2140-2144, April 2021.

[5] S. Razia, P. S. Prathyusha, N. V. Krishna and N. S. Sumana, "A Comparative study of
machine learning algorithms on thyroid disease prediction," International Journal of
Engineering & Technology, vol. 7, pp. 315-319, 2018.

[6] I. Ioniță and L. Ioniță, "Prediction of Thyroid Disease Using Data Mining Techniques,"
Broad Research in Artificial Intelligence and Neuroscience, vol. 7, no. 3, pp. 115-124,
August 2016.

[7] W. contributors, "Deep learning," Wikipedia, The Free Encyclopedia, [Online]. Available:
[Link] [Accessed
17 May 2022].

[8] "Neural Networks," IBM Cloud Education, 17 August 2020. [Online]. Available:
[Link]

[9] W. contributors, "Long short-term memory," Wikipedia, The Free Encyclopedia, [Online].
Available: [Link]

[10] W. contributors, "Gated recurrent unit," Wikipedia, The Free Encyclopedia, [Online].
Available: [Link]

[11] J. Prosise, "Binary Classification with Neural Networks," Atmosera, 20 September 2021.
[Online]. Available: [Link]
networks/.

[12] "What is NumPy?," [Online]. Available: [Link]

[13] "Package overview," [Online]. Available: [Link]

[14] "Matplotlib 3.5.2 documentation," [Online]. Available: [Link]

[15] M. Waskom, "An introduction to seaborn," [Online]. Available: [Link]

[16] Pedregosa et al., "Scikit-learn: Machine Learning in Python," [Online]. Available:
[Link]

[17] "warnings — Warning control," [Online]. Available: [Link]

[18] Martín Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous
systems," [Online]. Available: [Link]

[19] "What Is Data Preprocessing & What Are The Steps Involved?," [Online]. Available:
[Link]

[20] A. Goyal, "Must Known Data Visualization Techniques for Data Science," 7 June 2021.
[Online]. Available: [Link]
visualization-techniques-for-data-science/.

[21] W. contributors, "TensorFlow," Wikipedia, The Free Encyclopedia., 1 June 2022. [Online].
Available: [Link]

[22] W. contributors, "Keras," Wikipedia, The Free Encyclopedia., 1 June 2022. [Online].
Available: [Link]
