Liver Disease Prediction Using ML Techniques
Abstract. Liver diseases are among the deadliest disorders in several countries. These liver
disorders are increasing among the population due to alcohol consumption, inhalation of
poisonous fumes and intake of contaminated food and drugs, causing a huge loss of life.
There is ongoing research on datasets of patients suffering from liver diseases to develop
predictive models that can help determine liver disorders. Such a dataset was used here for
prediction with classification algorithms, thus reducing the burden on medical practitioners.
This disease is spreading like an epidemic in India, where one out of every five adults is
affected. The tests currently used to diagnose liver diseases are expensive. To make
diagnosis affordable and reliable for common people, and to reduce the workload in the
medical field, we have studied a liver disease dataset. Several machine learning algorithms
are implemented here to determine which performs best.
Our purpose is to carry out a comparative study of different classification techniques and
judge them with evaluation metrics such as the confusion matrix, accuracy, precision,
recall and F1-score. We then select the best-performing algorithm, the one achieving the
highest accuracy, and base our analysis on it.
The objective is to identify the most effective machine learning algorithm for liver disease
prediction by comparing multiple models on accuracy, precision, recall, specificity, F1-
score, confusion matrix and execution time, ensuring optimal performance for early
diagnosis in comparison with existing traditional methods.
2. Introduction
Chronic liver disease has increased over the years in many countries, with India being
a major center. Owing to recent lifestyle habits such as consumption of junk food and
alcohol, it now affects one in five adults. The liver is one of the most vital organs in the
human body; among its functions, it decomposes insulin.
2 Soumyadip Chanda and Supriya Sarkar and Saurabh Banerjee
2.1.1 Infection - Infections caused by parasites and viruses in the liver contribute to
inflammation or swelling and subsequently impede the working condition of the
liver. Viruses that damage the liver are mainly transmitted through semen or blood,
contaminated food, polluted water, or contact with a person who is already infected.
The liver infections that may affect anyone are Hepatitis A, B and C.
2.1.2 Immunity-related disorder - In some diseases, the body's immune system becomes
anomalous and begins to attack other parts of the body. In such conditions, the liver
is affected too. These health issues may arise because of Autoimmune Hepatitis,
Primary Sclerosing Cholangitis or Primary Biliary Cholangitis.
2.1.3 Genetic inheritance - An abnormally inherited gene can lead to the buildup of
various substances in our liver that might result in liver damage. Some examples of
genetically inherited liver disorders are: Hemochromatosis, Alpha-1 Antitrypsin
Deficiency and Wilson's Disease.
2.4.2 Alkaline Phosphatase - It is present in every tissue of the human body, with
its highest concentrations found in the liver, bile duct, intestinal mucosa, bones,
placenta and kidneys. Two types of alkaline phosphatase isozymes are present in
serum - hepatic (from the liver) and skeletal. At an early age, the predominant
source of alkaline phosphatase in the human body is skeletal. These alkaline
phosphatases are found in most mammals, including humans. The main human
isozymes are:
ALPI: the intestinal isozyme, a protein with a weight of about 150 kDa.
ALPL: the tissue-nonspecific isozyme, traced in bones, liver and kidney.
ALPP: the placental isozyme, also known as the Regan isozyme.
GCAP: the germ-cell alkaline phosphatase isozyme.
main bloodstream. Thus, conducting an AST test can help us monitor or detect any
damage or dysfunction in the liver.
2.4.4 Albumin - These are proteins that are globular in nature. Albumin is abundant
in serum and constitutes the major portion of protein in our blood. Its primary role is
to regulate as well as sustain oncotic pressure in blood. Additionally, it binds with
cations, bilirubin and fatty acids.
2.4.5 Globulin - These are also globular proteins; they are insoluble in pure water but
dissolve in weak salt solutions. The liver produces certain globulins. In normal human
blood, the globulin concentration generally lies between 2.6 and 3.5 grams per deciliter.
Several kinds of globulins exist in the human body - alpha 1, alpha 2, beta and gamma.
Excessive chemical production in the kidneys can often lead to imbalances in the body,
which in turn may also result in liver diseases.
3. Literature Review
In the thesis by M.B. Priya (2018) [1], liver patient datasets are investigated to build
classification models for predicting liver disease. The thesis implemented feature model
construction and a comparative analysis for improving prediction accuracy on Indian
liver patients in four phases. In the first phase, a min-max normalization algorithm is
applied to the original liver patient dataset collected from the UCI repository. In the
second phase, PSO feature selection is used to obtain a subset of the normalized liver
patient dataset comprising only the significant attributes. In the third phase,
classification algorithms are applied to this data set. In the fourth phase, the accuracy
is calculated using root mean square error values. After applying PSO feature selection,
the J48 algorithm is considered the best-performing algorithm. Finally, the evaluation is
done based on accuracy values. The outputs of the proposed classification
implementations indicate that, with feature selection, the J48 algorithm outperforms all
other classification algorithms with an accuracy of 95.04%.
A.K.M. Sazzadur Rahman (2019) [2], in his study, used six ML algorithms - Logistic
Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Naïve
Bayes, and Random Forest. The performance of these classification techniques was
evaluated using measures such as accuracy, precision, recall, F1-score, and specificity.
The analysis shows that Logistic Regression achieved the highest accuracy. Moreover,
their study mainly focused on the use of clinical data for liver disease prediction and
explored different ways of representing such data through analysis.
P.C. Sen (2020) [3], in his research, has shown the implementation of machine
learning for the prediction of liver diseases in human bodies. Supervised learning, one
of the two broad branches of machine learning, enables a model to predict future
outcomes after being trained on past data: input/output pairs, or labeled data, are used
to train the model with the goal of producing a function that approximates well enough
to predict outputs for new, unseen inputs. Supervised learning problems can be grouped
into regression problems and classification problems: a regression problem is one
where outputs are continuous, whereas a classification problem is one where outputs
are categorical. This paper compares different types of classification algorithms,
precisely the widely used ones, on the basis of some basic conceptions.
Chronic liver disease is one of the principal causes of death affecting large portions
of the global population. An accumulation of liver-damaging factors deteriorates this
condition. Obesity, an undiagnosed hepatitis infection, alcohol abuse, coughing or
vomiting blood, kidney or hepatic failure, jaundice, liver encephalopathy, and many
more disorders are responsible for it. Thus, immediate intervention is needed to
diagnose the ailment before it is too late. The work by M. Sarker (2021)[4] aims to
evaluate several machine learning algorithm outputs, namely logistic regression,
random forest, XGBoost, support vector machine (SVM), AdaBoost, K-NN, and
decision tree for predicting and diagnosing chronic liver disease. The classification
algorithms are evaluated based on various measurement criteria, such as accuracy,
precision, recall, F1-score, area under the curve (AUC), and specificity. Among
the algorithms, the random forest algorithm showed better performance in liver
disease prediction with an accuracy of 83.70%. Furthermore, the random forest
algorithm also showed better precision, F1, recall, and AUC metrics. Hence, random
forest is considered the best algorithm for early liver disease prediction.
There has been a rapid growth in the use of automatic decision-making systems and
tools in the medical domain. By using the concepts of big data, deep learning, and
machine learning, these systems extract useful information from large medical
datasets and help physicians in making accurate and timely decisions regarding
predictions and diagnosis of diseases. In this regard, the study by Neha Tanwar
(2021)[5] provides an extensive review of the progress of applying Artificial
Intelligence in forecasting and detecting liver diseases and then summarizes related
limitations of the studies followed by future research.
Recently, liver diseases have become among the most lethal disorders in a number of
countries. The count of patients with liver disorders has been going up because of
alcohol intake, breathing of harmful gases, and consumption of spoiled food and drugs.
Liver patient data sets are being studied for the purpose of developing classification
models to predict liver disorders. This data set was used to implement prediction and
classification algorithms, which in turn reduce the workload on doctors. In the work
by Srilatha Tokala (2023) [6], the authors proposed applying machine learning
algorithms to check each patient for liver disorder. Chronic liver disorder is defined as
a liver disorder that lasts for at least six months. As a result, they used the percentage
of patients who contract the disease as both positive and negative information. They
processed liver disease percentages with classifiers, and the results are displayed as a
confusion matrix. They proposed several classification schemes that can effectively
improve classification performance when a training data set is available. Then, using a
machine learning classifier, good and bad values are classified. Thus, the outputs of
the proposed classification model show accuracy in predicting the result.
4. Dataset Description
The publicly available medical records of liver patients provide the data for this paper
and it comprises data of 30,691 individuals (22,888 males and 7,803 females) and 11
columns (attributes – Patient’s age, Gender of the patient, Total Bilirubin count in
body, Direct Bilirubin count in body, Alkphos Alkaline Phosphatase, Sgpt Alamine
Aminotransferase, Sgot Aspartate Aminotransferase, Total Proteins count, Albumin,
Albumin and Globulin Ratio and the patient status i.e., the Result/Target Variable).
necessary correlation matrix using df.corr() [here, ‘df’ refers to the dataset used],
which is used to interpret the feature relationships (highly correlated, weakly
correlated or negatively correlated). This is very useful for analyzing correlations
between the individual features and the target variable and identifying redundant
features.
Correlation values range from -1 to 1, indicating the strength and direction of the
relationship between the features as follows –
4.1.1. 1.0 : Perfect positive correlation (features increase together)
4.1.2. -1.0 : Perfect negative correlation (one feature increases while the other
decreases)
4.1.3. 0 : No correlation (features are independent)
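The correlation check described above can be sketched in pandas. This is a minimal illustration on a toy frame; the column names and values are hypothetical stand-ins, not the actual dataset:

```python
import pandas as pd

# Toy frame standing in for the liver dataset ('df' in the text is the real data).
df = pd.DataFrame({
    "Total_Bilirubin":  [0.7, 10.9, 7.3, 1.0, 3.9],
    "Direct_Bilirubin": [0.1, 5.5, 4.1, 0.4, 2.0],
    "Albumin":          [3.3, 3.2, 3.3, 3.4, 2.4],
})

# Pairwise Pearson correlations; values near +1 or -1 flag redundant features.
corr = df.corr()
print(corr.round(2))

# Total and Direct Bilirubin move together, so one may be a candidate for dropping.
print(corr.loc["Total_Bilirubin", "Direct_Bilirubin"] > 0.9)
```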
4.3 Inferences
4.3.1 Feature Selection – TB and DB are highly correlated, so one might be dropped
to avoid redundancy.
4.3.2 Weak Correlations with Target – Advanced feature importance methods (e.g.,
SHAP, etc.) can be used to confirm the most significant predictors.
8 Soumyadip Chanda and Supriya Sarkar and Saurabh Banerjee
4.3.3 Potential Feature Engineering – Combining correlated features like TP, ALBA
or using feature transformation techniques might improve model performance.
5. Methodology
We used various machine learning algorithms in our study for predicting liver disease.
This allowed us to detect the possibility of having liver disease based on the
individual characteristics of the patients, gaining a better understanding and the
effectiveness of each of these algorithms. We used measurements such as confusion
matrix, precision, accuracy and recall to evaluate performance. These tell us about the
overall success rate of the models while providing information on how good the
models are at correctly classifying positive cases of liver disease.
The experimental study demonstrated that some of the algorithms stood out in
particular aspects. For instance, the accuracy of the Decision Tree as well as the
Random Forest models was remarkably high. The Gradient Boosting Classifier was
remarkable for its effective recall performance. Logistic Regression as well as KNN
showed strong performance, suggesting that both are applicable under different
measurements. This analysis highlights the need to evaluate different algorithms to
obtain an optimal model for predicting liver disease based on the dataset and the goals
of the analysis.
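As a sketch of this comparison, the five models named above can be trained and scored with scikit-learn. The data here is a synthetic stand-in generated for illustration, not the liver dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the liver dataset (10 numeric features, binary target).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and record its held-out accuracy for side-by-side comparison.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```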
5.1.1 Removing Missing Values – Missing values can arise because of incorrect data
entry, sensor failures or incomplete records. Handling them is essential to prevent
biases in model training.
Steps :
Checked for Missing Values : Used ‘df.isnull().sum()’ [‘df’: dataset] to
identify missing values in each column.
Strategies Used :
If missing values were few (<5%), we used mean/median imputation for
numerical features (e.g., replacing missing albumin levels with the median).
If missing values were many (>30%), we considered dropping the column if
it was not that crucial.
For categorical features like Gender, we used mode imputation.
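The imputation strategy above might look as follows in pandas. The data is a toy frame with deliberately introduced gaps; the column names mirror the dataset but the values are illustrative:

```python
import numpy as np
import pandas as pd

# Toy frame with missing entries, standing in for the real dataset.
df = pd.DataFrame({
    "Albumin": [3.3, np.nan, 3.1, 2.8, np.nan],
    "Gender": ["Male", "Female", None, "Male", "Male"],
})

print(df.isnull().sum())          # missing values per column

# Median imputation for the numeric feature, mode imputation for the categorical one.
df["Albumin"] = df["Albumin"].fillna(df["Albumin"].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

print(df.isnull().sum().sum())    # 0 missing values remain
```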
5.1.2 Handling Duplicate Data - Duplicate records can distort model predictions and
inflate accuracy falsely.
Steps :
Used ‘df.duplicated().sum()’ to check for duplicate rows.
Removed exact duplicates using ‘df.drop_duplicates(inplace=True)’.
Verified that removing duplicates did not lead to data loss, ensuring class
balance was maintained.
Normalization of numerical features standardized the input data.
Liver Disease Prediction 9
The entire dataset is divided into two portions - training set contains 80
percent of the data and testing set contains 20 percent of the data.
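The normalization and 80/20 split can be sketched with scikit-learn. Random stand-in data is used here, and StandardScaler is assumed as the normalizer since the text mentions standardization; fitting it on the training portion only avoids data leakage:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # stand-in feature matrix
y = rng.integers(0, 2, size=100)   # stand-in binary target

# 80/20 split, then fit the scaler on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)
```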
5.1.3 Handling Outliers – Outliers are the extreme values that can skew model
performance, especially in medical datasets where biochemical indicators vary within
a range.
Visualization : Used box plots and histograms to detect outliers in several
features
Identification : Calculated the IQR and removed the extreme values
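The IQR rule can be illustrated on a toy series; the conventional 1.5 × IQR fence is assumed here:

```python
import pandas as pd

# Toy bilirubin-like values with one extreme outlier at the end.
s = pd.Series([0.7, 0.9, 1.0, 1.1, 1.2, 1.0, 75.0])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only values inside the IQR fences.
filtered = s[(s >= lower) & (s <= upper)]
print(len(s) - len(filtered))  # number of outliers removed (here: 1)
```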
5.1.4 Handling Class Imbalances – We found that the dataset we were working with
was imbalanced: one target class had significantly more training samples than the
other. Such imbalances could lead to biased model predictions, where the classifier
favours the majority class. In order to handle this imbalance, we used ADASYN
(Adaptive Synthetic Sampling) for oversampling because of the following reasons –
It generates synthetic samples for the minority class (Y) instead of just
duplicating existing ones.
It focuses more on the difficult-to-learn samples, improving model
generalization.
It helps in reducing bias towards the majority class.
Steps :
Checked the overall class distribution primarily.
Imported ADASYN from imblearn.over_sampling and applied it to
the training data.
Verified the new class distribution to ensure balance.
Outcomes :
The dataset became balanced, reducing the overall bias in model
predictions.
The model learnt more effectively, improving classification performance for
both the classes.
This structure helps to obtain the most important quantitative indicators, including
accuracy, precision, sensitivity (recall) and the F1-score. This is particularly
pertinent for datasets that contain imbalanced classes, where accuracy alone can be
misleading.
5.4 Accuracy
The accuracy metric in machine learning measures the correctness with which a
model predicts results. Accuracy is measured as the ratio of the number of correctly
predicted instances to the total number of observations, and hence is often reported as
a percentage. It is quite popular as it is easy to interpret and gives a quick overview of
how well a model performs. For a comprehensive evaluation on a class-imbalanced
distribution, accuracy should be supplemented with further metrics such as support,
precision, recall and F1-score.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
5.5 Precision
It measures how often the model's positive predictions are correct. It is defined as
the number of correct positive identifications relative to all positive identifications,
both correct and incorrect. Precision is crucial when the cost of false positives is
considerably high. Nevertheless, to meaningfully appreciate what a model really does,
other measures such as recall have to be used together with precision.

Precision = TP / (TP + FP)
5.6 Recall
Recall in machine learning evaluates the ability of a model to recognize all relevant
instances of a given class, especially in classification tasks. It is computed as the
number of true positive predictions divided by the sum of true positive and false
negative predictions. High recall means the model picks out most of the true positive
cases, which is valuable in situations where one wants to reduce false negatives.
However, recall must not be allowed to compromise precision if the predictions are to
remain relevant.

Recall = TP / (TP + FN)
5.7 F1-Score
It is a crucial metric in machine learning, especially in classification, that maintains a
fair balance between precision and recall: it combines the two using their harmonic
mean. This measure is particularly valuable for imbalanced datasets, as it punishes a
model that shines on one metric but fails on the other. An F1-score of 1 indicates
perfect precision and recall, while a score of 0 means the model lacks both.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
5.8 Support
In machine learning, support is a measure of how many times each class label has
occurred in the dataset. It shows how frequent each class is and is particularly useful
for imbalanced datasets with unequal sample sizes for different classes. Support is
often reported with other metrics, which include recall, precision and F1-score within
classification reports.
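All of the metrics above can be computed with scikit-learn; the labels below are purely illustrative:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # illustrative labels (1 = liver disease)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # illustrative model predictions

print(confusion_matrix(y_true, y_pred))      # rows: actual, columns: predicted
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1-score:  {f1_score(y_true, y_pred):.2f}")
print(classification_report(y_true, y_pred))  # includes support per class
```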
In our project, we’ve used Logistic Regression, Decision Tree, Random Forest, K-
Nearest Neighbors and Gradient Boosting Classifier. A comparison of these models
has been done in terms of accuracy of both training and testing datasets using metrics
like precision, recall and confusion matrix. This will allow us to assess the strengths
and weaknesses of each model in predicting liver disease. As we can see in the above
figure (Fig. 6), among the tested models, Random Forest and Gradient Boosting
Classifier have the best accuracies; therefore, they can be good selections for the
prediction of liver disease. Overall, the Gradient Boosting Classifier performs highly
on accuracy and precision, pointing to its competence in detecting both true positives
and true negatives.
Both Decision Tree and Logistic Regression models showed noteworthy accuracy
despite some misclassifying cases. Logistic Regression demonstrated high sensitivity,
suggesting its efficiency in detecting individuals with liver disease. But this model
had a slightly lower precision, indicating potential inaccuracies in its predictions.
KNN performed decently but failed when compared to ensemble techniques such as
Random Forest and GBC, perhaps due to complex interrelation between features in
the dataset. The application of distance-based metrics in KNN might have led to a loss
of the ability to generalize in scenarios where feature values overlap.
In summary, both Random Forest and Gradient Boosting Classifier have performed
quite efficiently and, since they exhibit balanced and good performance on accuracy,
precision and recall, should be refined further for application in the prediction of liver
disease. Given how well these models work, machine learning shows real potential to
help in the early diagnosis and management of liver disease. The Gradient Boosting
Classifier surpasses all other algorithms in accuracy and the other performance
metrics, reaching a 99.6% accuracy rate (Fig. 7). Therefore, in this case, GBC is the
best algorithm for predicting liver disease.
6.1.1 Tree-Based Models Dominate : Ensemble methods (RF, GBC) and Decision
Trees surpassed other algorithms, probably because of their capacity to manage
intricate, non-linear relationships within the data.
6.1.2 High Precision and Recall : RF and GBC reached nearly perfect precision
(>99%) and recall (>99%), reflecting only slight misclassification of both healthy and
diseased patients.
6.1.3 LR Limitations : Logistic Regression had difficulties with recall (60.5%),
missing numerous true positive cases, which is vital for medical diagnoses.
The results of comparison among the models based on the evaluation metrics are -
7. Conclusion
This liver disease prediction paper incorporates several classification algorithms -
Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest and
Gradient Boosting Classifier - to compare how successfully each can diagnose liver
disease at a high accuracy rate. The performance assessment used precision, accuracy,
recall and the confusion matrix, which together give an overview of the general
robustness of these models. The models showed their specific benefits; for example,
Random Forest and Gradient Boosting had the highest recall values, so they can better
identify positive instances, while the Decision Tree model showed high accuracy. This
comparison explains why certain algorithms should be chosen over others depending
on the distinct objectives and characteristics of the data provided.
In general, this paper provides valuable knowledge about the prediction of liver
diseases in human bodies using certain machine learning models that seem to be
useful in improving medical diagnosis.
8. Future Work
For the future, there are multiple ways to improve and expand this model of
prediction of liver diseases. Machine learning methods that are more advanced than
those used in this paper could bring even greater accuracy in the predictions when the
datasets involved are larger in size. Ensemble learning with algorithms that combine
the strengths of different approaches may yield additional performance metric
improvements. Exploring techniques like recursive feature elimination for feature
engineering and selection could help point to the most important features, ultimately
streamlining models and enhancing performance. The model could also be adapted to
predict different kinds of liver diseases, which means the model has to use a
multiclass classification method.
References