Project Report
CHAPTER 1
INTRODUCTION
Thyroid disease is a medical condition that affects the function of the thyroid
gland. The thyroid gland is located at the front of the neck and produces thyroid
hormones that travel through the blood to help regulate many other organs, which makes
it an endocrine organ. These hormones normally act in the body to regulate energy use,
infant development, and childhood development. Classification techniques play a vital
role in analysing diseases and in helping to reduce costs for patients. In India, an
estimated 42 million people suffer from thyroid disorders. Symptoms include weight
gain, tiredness, weakness and feeling cold.
The thyroid gland is an endocrine gland in the neck. It sits in the lower part
of the human neck, beneath the Adam’s apple, and secretes thyroid hormones that
influence the rate of metabolism and protein synthesis. Thyroid hormones control the
body’s metabolism in many ways, including how fast the heart beats and how quickly
calories are burnt. The thyroid gland produces two active thyroid hormones, thyroxine
(abbreviated T4) and triiodothyronine (abbreviated T3). These hormones are also
essential for regulating body temperature and for the body’s overall growth and
maintenance. The hypothalamus in the brain produces thyrotropin-releasing hormone
(TRH), which causes the pituitary gland to release thyroid-stimulating hormone (TSH).
This happens when thyroid hormone production is too low. TSH in turn stimulates the
thyroid gland to release T4. Thyroid disease can be classified mainly into two types:
a. Hyperthyroidism
This is a disorder in which the thyroid gland produces an excessive amount of
thyroid hormones. Common symptoms include restlessness, agitation, tremors,
weight loss, rapid heartbeat and frequent bowel movements. In this disease most of
the T4 hormone gets converted to T3. The main causes of hyperthyroidism are excessive
intake of iodine, abnormal secretion of TSH and excessive intake of thyroid hormones.
Graves’ disease is a severe thyroid condition which may lead to death.
b. Hypothyroidism
Hypothyroidism is a condition in which the production of thyroid hormone is
reduced. Common symptoms of hypothyroidism include dry skin, constipation,
feeling cold, prolonged menstrual bleeding and sudden weight gain. It is associated
with other disorders such as thyroid hormone resistance and Hashimoto’s thyroiditis.
1.2 Objectives
To build an automated system that uses deep learning to detect thyroid disease.
To classify a person as thyroid positive or thyroid negative based on historical
data.
To identify the best-performing model for detecting thyroid disease in the given data.
1.3 Scope of the project
Thyroid disease detection using deep learning is realized by making use of different
attributes such as age, sex, sick, pregnant, and the values of different thyroid
hormones such as thyroxine, triiodothyronine and thyroid-stimulating hormone. It
includes research, content strategy and an accurate prediction system. The work consists
of two phases. The first is importing the dataset and pre-processing the data:
unnecessary columns are removed from the dataset to ensure better detection, and the
data is reshaped from 2D to 3D to match the input expected by the chosen models. The
second is applying the models: the 3D data is split into train and test sets, and the
models are trained on the train set and evaluated on the test set.
The project is mainly useful in the medical field. It helps patients identify their
thyroid status, based on their symptoms, by entering the values from the thyroid tests
they have undergone. It also helps doctors diagnose the disease and treat their
patients promptly.
CHAPTER 2
LITERATURE SURVEY
A literature review provides an overview of current knowledge, methods, and gaps in
the existing research. During the literature survey, we collected research papers and
information about the thyroid disease detection mechanisms that have been used.
Deep-learning architectures such as deep neural networks, deep belief networks, deep
reinforcement learning, recurrent neural networks and convolutional neural networks
have been applied to fields including computer vision, speech recognition, natural
language processing, where they have produced results comparable to and in some cases
surpassing human expert performance [7].
A deep neural network (DNN) is an artificial neural network with multiple layers
between the input and output layers. There are different types of neural networks but they
always consist of the same components: neurons, synapses, weights, biases, and
functions. These components function similarly to those of the human brain, and the
network can be trained like any other ML algorithm.
Convolutional neural networks (CNNs) are similar to feedforward networks, but
they’re usually utilized for image recognition, pattern recognition, computer vision [8].
Recurrent neural networks (RNNs) are identified by their feedback loops. These
learning algorithms are primarily leveraged when using time-series data to make
predictions about future outcomes, such as stock market predictions or sales forecasting
[8].
As the thyroid dataset we have taken consists of a sequence of textual data, we have
considered RNNs.
8) Recurrent Neural Network:
A recurrent neural network (RNN) is a class of artificial neural networks where
connections between nodes form a directed or undirected graph along a temporal
sequence. This allows it to exhibit temporal dynamic behaviour. Derived from
feedforward neural networks, RNNs can use their internal state (memory) to process
variable length sequences of inputs. This makes them applicable to tasks such as
unsegmented, connected handwriting recognition or speech recognition [8].
Both finite impulse and infinite impulse recurrent networks can have additional stored
states, and the storage can be under direct control by the neural network. The storage can
also be replaced by another network or graph if that incorporates time delays or has
feedback loops. Such controlled states are referred to as gated state or gated memory, and
are part of long short-term memory networks (LSTMs) and gated recurrent units (GRUs).
This is also called Feedback Neural Network (FNN).
CHAPTER 3
SYSTEM ANALYSIS
After the literature survey, it has been observed that most of the authors have used
the most common machine learning algorithms such as Decision Tree, Naïve Bayes,
KNN, SVM, Random Forest, Regression algorithms and other techniques to predict,
classify and analyse the data.
Deep learning can be used for binary classification, too. In fact, building a neural
network that acts as a binary classifier is little different than building one that acts as a
regressor [11].
Dept. of CSE, CIT Page 8
THYROID DISEASE DETECTION USING DEEP LEARNING 2021-22
CHAPTER 4
SYSTEM DESIGN
The design of the system deals with how the system is developed. It explains the flow
of functionality in brief. This section contains the system data flow diagram, flowchart
and sequence diagram described below.
4.2 Flowchart
A flowchart is a type of diagram that represents an algorithm, workflow or process.
Flowchart can also be defined as a diagrammatic representation of an algorithm (step by
step approach to solve a task).
Start
Data Pre-processing
Reshape dataframe
Splitting of data
Train model
Evaluate model
Stop
CHAPTER 5
IMPLEMENTATION
The dataset is acquired from the UCI machine learning repository. The dataset consists
of 3772 instances and 30 attributes, of which 24 are categorical and six are
continuous. Every instance in the dataset is labelled with one of two classes:
positive or negative. The description of the attributes which are present in the dataset is
shown in Table 5.1 given below:
Data pre-processing is a step in the data analysis that takes raw data and transforms it
into a format that can be understood and analysed by the computers [19]. The steps
involved in data pre-processing are:
Data Cleaning
Data Transformation
Data Visualization
A. Data Cleaning: Data cleaning is the process of adding missing data and correcting,
repairing, or removing incorrect or irrelevant data from a data set. Data cleaning is
the most important step of pre-processing because it will ensure that your data is
ready to go for your downstream needs. Data Cleaning includes the following stages:
Dropping Unwanted Features
Handling irrelevant data
a) Dropping Unwanted Features: The features which were just the information of
the tests conducted and the sources from which the data is taken are dropped as
they are not useful for analysis, but just useful for the information.
b) Handling irrelevant data: Irrelevant data can be defined as the data that don't
fit under the context of the problem we're trying to solve. In the dataset, there are
data with the value ‘?’ which don’t fit the analysis. These data are handled by
a. removing them, for categorical features
b. replacing with mean value of the column, for numerical features
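The two cleaning rules above can be sketched with pandas. This is only an illustration under assumptions: the column names (`sex`, `TSH`) and the four-row frame are hypothetical, and the sketch assumes that "removing" means dropping the affected rows, which the report does not state explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; '?' marks a missing value.
df = pd.DataFrame({
    "sex": ["F", "M", "?", "F"],
    "TSH": ["1.3", "?", "0.5", "2.2"],
})

categorical_cols = ["sex"]   # assumed column grouping
numerical_cols = ["TSH"]

# Treat '?' as a proper missing value.
df = df.replace("?", np.nan)

# Categorical features: remove rows whose value is missing.
df = df.dropna(subset=categorical_cols).copy()

# Numerical features: convert to float, then fill gaps with the column mean.
for col in numerical_cols:
    df[col] = pd.to_numeric(df[col])
    df[col] = df[col].fillna(df[col].mean())

df = df.reset_index(drop=True)
print(df)
```

Note that in this sketch the mean is computed after the rows with missing categorical values have been dropped; computing it beforehand would give a slightly different fill value.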
B. Data Transformation: Data transformation is the process of turning the data into the
proper format(s) which is needed for analysis and other downstream processes. Data
Transformation includes following stages:
Replace the data with binary number, for categorical features
Normalize the data, for numerical features
a) Replace the data with binary number: The binary data present in categorical
features are replaced with binary numbers in order to consider these data for the
analysis. The data in the target column is mapped with the binary numbers.
Mapping is used for substituting each value in a Series with another value, that
may be derived from a function, a Dictionary or a Series.
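A minimal sketch of this mapping step with pandas `Series.map` is given here. The column name `on_thyroxine` and the label encodings (`'t'`/`'f'`, `'positive'`/`'negative'`) are assumptions made for illustration, not values confirmed by the report.

```python
import pandas as pd

# Hypothetical categorical columns; the labels 't'/'f' and
# 'positive'/'negative' are assumptions about how the values are encoded.
df = pd.DataFrame({
    "on_thyroxine": ["f", "t", "f"],
    "target": ["negative", "positive", "negative"],
})

# Series.map substitutes each value using a dictionary lookup.
df["on_thyroxine"] = df["on_thyroxine"].map({"f": 0, "t": 1})
df["target"] = df["target"].map({"negative": 0, "positive": 1})

print(df)
```

Any value that is absent from the dictionary becomes NaN, so it helps to clean the data before mapping.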
b) Normalize the data: The data which is present in numerical features is subjected
to data normalization. Normalizing the data refers to scaling the data values to a
much smaller range such as [-1, 1] or [0.0, 1.0]. Out of different methods available
to normalize the data, the Min-max normalization is used.
a. Min-max normalization: Min-max scaling is a common feature pre-
processing technique which results in scaled data values that fall in the range
[0,1]. When applied to a Python sequence, such as a Pandas Series, scaling
results in a new sequence such that 0 is the minimum value and 1 is the
maximum value of the prior unscaled sequence.
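Min-max scaling as described above can be sketched in a few lines. The helper name `min_max_normalize` and the sample TT4 readings are hypothetical, and the handling of a constant column is an added safeguard rather than something the report describes.

```python
import pandas as pd

def min_max_normalize(series: pd.Series) -> pd.Series:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    value_range = series.max() - series.min()
    if value_range == 0:
        # A constant column carries no spread; map it to zeros to avoid 0/0.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / value_range

# Hypothetical TT4 readings, used only to illustrate the scaling.
tt4 = pd.Series([50.0, 100.0, 150.0, 250.0])
scaled = min_max_normalize(tt4)
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```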
Figure 5.4: Age-wise analysis of thyroid data
Figure 5.5: Gender-wise analysis of thyroid data
Figure 5.6: Medication-wise analysis of thyroid data
Figure 5.7: Sick-wise analysis of thyroid data
Figure 5.8: Pregnancy-wise analysis of thyroid data
Figure 5.9: Surgery-wise analysis of thyroid data
Figure 5.10: Treatment-wise analysis of thyroid data
Figure 5.11: Lithium-wise analysis of thyroid data
Figure 5.12: Goitre-wise analysis of thyroid data
Figure 5.13: Tumor-wise analysis of thyroid data
Figure 5.14: Distribution plot of TSH
Figure 5.15: Distribution plot of T3
Figure 5.16: Distribution plot of TT4
Figure 5.17: Distribution plot of T4U
Figure 5.18: Distribution plot of FTI
Figure 5.19: Count plot of target attribute
Split into Train and Test Set: Before building the models for prediction, the predictors
(X) and target (y) of the dataset are separated. Then, the dataset is split into train_set and
test_set: 70% of the data is assigned to train_set and the remaining 30% to
test_set. The models are then trained on train_set, and their predictions on test_set
are compared with the actual labels. The split is done using train_test_split.
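A 70/30 split of this kind might look as follows with scikit-learn's `train_test_split`. The stand-in arrays and the `random_state` value are assumptions for illustration; the report does not say whether a seed or stratification was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 3 predictors, binary target.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.3 keeps 70% for training; random_state (an assumed value)
# makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)
```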
B) Keras: Keras is an open-source software library that provides a Python interface for
artificial neural networks. Keras acts as an interface for the TensorFlow library [22].
Keras contains numerous implementations of commonly used neural-network
building blocks such as layers, objectives, activation functions, optimizers, and a host
of tools to make working with image and text data easier to simplify the coding
necessary for writing deep neural network code.
The models are built based on RNN architecture. The models used are:
a. Vanilla LSTM: A Vanilla LSTM (also known as a Simple LSTM or Classical LSTM)
is an LSTM model that has a single hidden layer of LSTM units, and an output layer
used to make a prediction. A vanilla LSTM unit is composed of a cell, an input gate,
an output gate and a forget gate. The cell remembers values over arbitrary time
intervals and the three gates regulate the flow of information associated with the cell.
In the remainder of this section, LSTM will refer to the vanilla version as this is the
most popular LSTM architecture. This does not imply, however, that it is also the
superior one in every situation.
like the Input Modulation Gate is a sub-part of the Input Gate and is used to introduce
some non-linearity into the input and to also make the input Zero-mean.
The basic workflow of a GRU is similar to that of a basic RNN when illustrated. The
main difference between the two lies in the internal working of each recurrent unit:
GRUs consist of gates which modulate the current input and the previous hidden state.
Neural networks are defined in Keras as a sequence of layers. The container for these
layers is the Sequential class. The first step is to create an instance of the Sequential class.
Then you can create your layers and add them in the order that they should be connected.
The LSTM recurrent layer comprised of memory units is called LSTM(). A fully
connected layer that often follows LSTM layers and is used for outputting a prediction is
called Dense(). For example, we can define the network in two steps:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(2))
model.add(Dense(1))
But we can also do this in one step by creating an array of layers and passing it to the
constructor of the Sequential.
layers = [LSTM(2), Dense(1)]
model = Sequential(layers)
The first layer in the network must define the number of inputs to expect. The input
must be three-dimensional, comprising samples, timesteps, and features.
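The conversion of tabular data into this three-dimensional layout can be sketched with NumPy. Treating each row as a sequence with a single timestep is an assumption made here for illustration; the report only states that the data is reshaped from 2D to 3D.

```python
import numpy as np

# Hypothetical tabular data: 5 samples with 4 features each.
X_2d = np.arange(20, dtype=np.float32).reshape(5, 4)

# Assumption: each row is treated as a sequence of length 1, so the
# layout becomes (samples, timesteps=1, features).
X_3d = X_2d.reshape(X_2d.shape[0], 1, X_2d.shape[1])

print(X_3d.shape)  # (5, 1, 4)
```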
LSTM layers can be stacked by adding them to the Sequential model. Importantly,
when stacking LSTM layers, we must output a sequence rather than a single value for
each input so that the subsequent LSTM layer can have the required 3D input. We can do
this by setting the return_sequences argument to True.
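A stacked arrangement of this kind might be defined as in the sketch below. The layer sizes, the input shape of 1 timestep with 21 features, and the sigmoid output layer are all assumed values, not the configuration used in this project.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Assumed input shape: 1 timestep with 21 features per sample.
model = Sequential([
    Input(shape=(1, 21)),
    # return_sequences=True emits one output per timestep (3D output),
    # which the next LSTM layer needs as its input.
    LSTM(8, return_sequences=True),
    # The last LSTM returns only its final output (2D).
    LSTM(4),
    Dense(1, activation="sigmoid"),
])

print(model.output_shape)  # (None, 1)
```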
We have used the ‘tanh’ activation function, which squashes values into the range -1
to 1. The resulting activations are then weighted to give the features used in making
our predictions.
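The bounded range of tanh can be checked numerically. The input values in this short sketch are arbitrary and chosen only for illustration.

```python
import numpy as np

# Arbitrary sample pre-activation values, chosen for illustration.
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
a = np.tanh(z)

# Every output lies strictly between -1 and 1, and tanh(0) is 0.
print(np.round(a, 3))
```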
Compilation transforms the simple sequence of layers that we defined into a highly
efficient series of matrix transforms in a format intended to be executed. It requires
specifying the optimization algorithm used to train the network and the loss function
used to evaluate the network. Different loss functions suit different purposes: for
example, mean squared error for regression and binary cross-entropy for binary
classification.
For example, below is a case of compiling a defined model and specifying the
stochastic gradient descent (sgd) optimization algorithm and the mean squared error
(mean_squared_error) loss function, intended for a regression type problem.
model.compile(optimizer='sgd', loss='mean_squared_error')
Alternately, the optimizer can be created and configured before being provided as an
argument to the compilation step.
The most common optimization algorithm is stochastic gradient descent, but Keras
also supports a suite of other state-of-the-art optimization algorithms that work well with
little or no configuration. Perhaps the most commonly used optimization algorithms
because of their generally better performance are:
Stochastic Gradient Descent, or sgd, that requires the tuning of a learning rate and
momentum.
ADAM, or adam, that requires the tuning of learning rate.
RMSprop, or rmsprop, that requires the tuning of learning rate.
Finally, you can also specify metrics to collect while fitting your model in addition to
the loss function. Generally, the most useful additional metric to collect is accuracy for
classification problems. For example:
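One way this could look for a binary classifier is sketched here, with the optimizer created and configured before compilation. The architecture, input shape and learning rate are assumptions chosen for illustration rather than the settings used in this project.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

# Assumed architecture and input shape, for illustration only.
model = Sequential([
    Input(shape=(1, 21)),
    LSTM(8),
    Dense(1, activation="sigmoid"),
])

# The optimizer is configured first, then passed to compile() together
# with a binary classification loss and the accuracy metric.
optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
```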
Fitting the network requires the training data to be specified, both a matrix of input
patterns, X, and an array of matching output patterns, y.
The network is trained using the back propagation algorithm and optimized according
to the optimization algorithm and loss function specified when compiling the model. The
backpropagation algorithm requires that the network be trained for a specified number of
epochs or exposures to the training dataset. One epoch is when an entire dataset is passed
forward and backward through the neural network only once.
Each epoch can be partitioned into groups of input-output pattern pairs called batches.
This defines the number of patterns that the network is exposed to before the weights are
updated within an epoch. A minimal example of fitting a network is as follows:
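A small, self-contained sketch of fitting is given here. The random stand-in data, the layer sizes, and the choice of 2 epochs with a batch size of 16 are assumptions for illustration.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Random stand-in data: 32 samples, 1 timestep, 5 features (all assumed).
rng = np.random.default_rng(0)
X = rng.random((32, 1, 5)).astype("float32")
y = rng.integers(0, 2, size=(32, 1)).astype("float32")

model = Sequential([Input(shape=(1, 5)), LSTM(4), Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Two passes over the data, updating the weights after every 16 samples.
# verbose=0 silences the progress output.
history = model.fit(X, y, epochs=2, batch_size=16, verbose=0)

# history.history maps each metric name to a list with one value per epoch.
print(history.history["loss"])
```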
Once fit, a history object is returned that provides a summary of the performance of
the model during training. This includes both the loss and any additional metrics specified
when compiling the model, recorded each epoch.
Training can take a long time, from seconds to hours to days depending on the size of
the network and the size of the training data. The amount of information displayed can be
reduced to one summary line per epoch by setting the verbose argument to 2, or
silenced entirely by setting it to 0.
Once the network is trained, it can be evaluated. The network can be evaluated on the
training data, but this will not provide a useful indication of the performance of the
network as a predictive model, as it has seen all of this data before.
We can evaluate the performance of the network on a separate dataset, unseen during
training. This will provide an estimate of the performance of the network at making
predictions for unseen data in the future.
For example, for a model compiled with the accuracy metric, we could evaluate it on a
new dataset as follows:
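The sketch below shows one possible form of this evaluation. The data is random and the model is a small assumed architecture, so the printed numbers carry no meaning beyond demonstrating the call.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Random stand-in data (all shapes assumed): 40 training and 12 held-out samples.
rng = np.random.default_rng(1)
X_train = rng.random((40, 1, 5)).astype("float32")
y_train = rng.integers(0, 2, size=(40, 1)).astype("float32")
X_test = rng.random((12, 1, 5)).astype("float32")
y_test = rng.integers(0, 2, size=(12, 1)).astype("float32")

model = Sequential([Input(shape=(1, 5)), LSTM(4), Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1, batch_size=16, verbose=0)

# evaluate() returns the loss followed by each compiled metric.
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"loss={loss:.3f} accuracy={accuracy:.3f}")
```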
As with fitting the network, verbose output is provided to give an idea of the progress
of evaluating the model. We can turn this off by setting the verbose argument to 0.
Once we are satisfied with the performance of our fit model, we can use it to make
predictions on new data.
This is as easy as calling the predict() function on the model with an array of new input
patterns. For example:
predictions = model.predict(X)
The predictions will be returned in the format provided by the output layer of the
network.
In the case of a regression problem, these predictions may be in the format of the
problem directly, provided by a linear activation function.
For a binary classification problem, the predictions may be an array of probabilities for
the first class that can be converted to a 1 or 0 by rounding.
For a multiclass classification problem, the results may be in the form of an array of
probabilities (assuming a one hot encoded output variable) that may need to be converted
to a single class output prediction using the argmax() NumPy function. Alternately, for
classification problems, older Keras releases provide the predict_classes() function, which
automatically converts uncrisp predictions to crisp integer class values (it was removed
in TensorFlow 2.6, where thresholding or argmax() on the output of predict() is used
instead).
predictions = model.predict_classes(X)
As with fitting and evaluating the network, verbose output is provided to give an idea
of the progress of the model making predictions. We can turn this off by setting the
verbose argument to zero (0).
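Converting predicted probabilities into crisp class labels can be sketched with NumPy alone. The probability arrays here are hypothetical stand-ins for the output of predict(), and the 0.5 threshold is the usual convention rather than a value taken from this report.

```python
import numpy as np

# Hypothetical sigmoid outputs from a binary classifier, shape (samples, 1).
binary_probs = np.array([[0.91], [0.08], [0.55]])
# Threshold at 0.5 to obtain crisp 0/1 labels.
binary_classes = (binary_probs > 0.5).astype("int32").ravel()
print(binary_classes)  # [1 0 1]

# Hypothetical softmax outputs for a three-class problem.
multi_probs = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
# argmax picks the index of the most probable class in each row.
multi_classes = np.argmax(multi_probs, axis=-1)
print(multi_classes)  # [1 0]
```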
CHAPTER 6
RESULTS
We have applied different types of LSTM models and a variant of LSTM to classify
the data into two classes on a given dataset. The models which were built are fit for 16
batches and 2 epochs and are evaluated based on the accuracy, loss, validation accuracy
and validation loss for each epoch. We also obtain a confusion matrix. The confusion
matrix holds the counts of true positives (TP), false negatives (FN), false positives
(FP) and true negatives (TN). TP is the number of instances correctly predicted as
belonging to the positive class. TN is the number of instances correctly predicted as not
belonging to the positive class. FP is the number of instances incorrectly predicted as
positive when they actually belong to the other class. FN is the number of instances
incorrectly predicted as belonging to the other class when they are actually positive.
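The four counts can be computed directly from the true and predicted labels. The label arrays in this sketch are made up for illustration and are not results from this project.

```python
import numpy as np

# Hypothetical labels: 1 = thyroid positive, 0 = thyroid negative.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

# Accuracy is the share of all predictions that were correct.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```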
Loss or the training loss can be defined as a summation of the errors made for each
sample in training or validation sets. During the training process the goal is to minimize
this value. The final experimental results for the accuracies and losses of models are as
shown in the Table 6.1 below:
MODEL | ACCURACY (Epoch 1) | ACCURACY (Epoch 2) | LOSS (Epoch 1) | LOSS (Epoch 2)
Test (or testing) accuracy, which often refers to validation accuracy, is the accuracy
calculated on data that is not used for training but is used during the training
process to validate (or "test") the generalisation ability of the model or for "early
stopping".
Validation loss is a metric used to assess the performance of a deep learning model on
the validation set. The validation set is a portion of the dataset set aside to validate the
performance of the model. The validation loss is similar to the training loss and is
calculated from a sum of the errors for each example in the validation set. The final
experimental results for the validation losses of models are as shown in the Table 6.2
below:
MODEL ACCURACY
a) Train Set:
In terms of accuracy, the model based on vanilla LSTM performs the best, followed
by the model based on bidirectional LSTM.
As far as the loss is concerned, the model based on vanilla LSTM has the lowest
loss value of all the models.
b) Validation Set:
Even though the validation accuracy is the same for all models, the model based on
vanilla LSTM has the lowest validation loss, which makes it the best of the
models compared.
CHAPTER 7
CONCLUSION
In this work, we have used deep learning techniques to detect thyroid disease. In
this system, we have used different recurrent models: vanilla LSTM, bidirectional
LSTM, stacked LSTM and Gated Recurrent Unit. The study found that the model based
on vanilla LSTM gave the best accuracy with the least loss compared to other models.
Other recent techniques could be combined with this approach in future to give still
more accurate thyroid diagnoses.
REFERENCES
[2] A. Tyagi, R. Mehra and A. Saxena, "Interactive Thyroid Disease Prediction System Using
Machine Learning Technique," 5th IEEE International Conference on Parallel, Distributed
and Grid Computing, pp. 689-693, 20-22 December 2018.
[6] I. Ioniţă and L. Ioniţă, "Prediction of Thyroid Disease Using Data Mining Techniques,"
Broad Research in Artificial Intelligence and Neuroscience, vol. 7, no. 3, pp. 115-124,
August 2016.
[7] Wikipedia contributors, "Deep learning," Wikipedia, The Free Encyclopedia, [Online]. Available:
[Link] [Accessed 17 May 2022].
[8] "Neural Networks," IBM Cloud Education, 17 August 2020. [Online]. Available:
[Link]
[9] Wikipedia contributors, "Long short-term memory," Wikipedia, The Free Encyclopedia, [Online].
Available: [Link]
[10] Wikipedia contributors, "Gated recurrent unit," Wikipedia, The Free Encyclopedia, [Online].
Available: [Link]
[11] J. Prosise, "Binary Classification with Neural Networks," Atmosera, 20 September 2021.
[Online]. Available: [Link]
networks/.
[19] "What Is Data Preprocessing & What Are The Steps Involved?," [Online]. Available:
[Link]
[20] A. Goyal, "Must Known Data Visualization Techniques for Data Science," 7 June 2021.
[Online]. Available: [Link]
visualization-techniques-for-data-science/.
[21] Wikipedia contributors, "TensorFlow," Wikipedia, The Free Encyclopedia, 1 June 2022. [Online].
Available: [Link]
[22] Wikipedia contributors, "Keras," Wikipedia, The Free Encyclopedia, 1 June 2022. [Online].
Available: [Link]