Abstract
Hyperspectral Microscopic Images (HSMI) offer a more comprehensive spectral range, making them particularly useful in medical diagnostics, including early detection of cancerous tissues. Cholangiocarcinoma, a highly lethal bile duct cancer, traditionally requires histopathological analysis of tissue samples under a microscope. However, this method is prone to subjectivity and error, often leading to delays in diagnosis and contributing to patient mortality. This study focuses on the automated diagnosis of cholangiocarcinoma using deep learning techniques on microscopic hyperspectral data. Leveraging the spectral richness of HSMI, we developed a framework utilizing a modified Visual Geometry Group (VGG) architecture to process this hyperspectral data, aiming to detect cholangiocarcinoma at its early stages. With the growing role of artificial intelligence in pathology, this approach minimizes human error and enhances diagnostic accuracy. The dataset used in this study includes 880 cholangiocarcinoma tissue samples from 174 individuals, comprising 689 partial cancer regions, 49 full cancer regions, and 142 healthy scenes, all meticulously labeled by expert pathologists. Our ensemble learning technique integrates image preprocessing, spectral feature extraction, and classification, significantly improving diagnostic accuracy. The proposed system demonstrates a substantial improvement in the early detection of cholangiocarcinoma and offers a valuable tool for smart microscopy-based diagnosis, potentially easing the diagnostic burden in clinical settings.
1 Introduction
Cholangiocarcinomas, also referred to as bile duct cancer, can develop either within or outside of the liver. In Western nations, the death rate from intrahepatic cancer is increasing at a faster pace than the mortality rate from extrahepatic cancer. One possible cause is the increasing incidence of liver diseases, while another is the incorrect diagnosis of some cancers [1]. The specific types of this extremely dangerous disease, which affects the liver and biliary system, include intrahepatic, perihilar, and distal cholangiocarcinoma. Geographical groups may exhibit different symptoms, respond differently to therapy, and have different genetic concerns [2]. Cholangiocarcinoma is often not detected until it has reached an advanced stage. Although it can occur at any age, cholangiocarcinoma most often affects people aged 50 and older [3]. The main problem with this type of cancer is that it is often diagnosed late, as physical signs are hidden or missed in the early stages. The patient's prognosis worsens because late-stage disease limits the treatment options. Traditional methods, such as microscopic analysis of tissue samples, rely on visual validation by pathologists. Although these methods are reliable, they are complex and sometimes require subjective technical interpretation [4, 5]. Researchers are investigating how best to meet these needs using high-resolution imaging. Hyperspectral imaging offers thorough morphological and molecular information about cells by combining spectroscopy with imaging. Rapid study of tissue structures is made possible by its ability to expose chemical and structural changes invisible to conventional microscopes [6]. This approach considerably aids in detecting cancer cells and distinguishing them from normal cells. As a sensitive and non-invasive diagnostic method for cholangiocarcinoma, hyperspectral imaging holds considerable promise.
High throughput imaging can help identify cancer at an early stage, forecast disease development, and guide therapy decisions when paired with cutting-edge computing methods including statistical analysis and pattern recognition [7].
The primary objective of this work is to provide pathologists with a comprehensive and trustworthy framework. This will be accomplished by integrating images with microscope equipment and using advanced deep learning algorithms. Hyperspectral imaging holds significant potential in analyzing histopathology images due to its ability to capture detailed data across a wider electromagnetic spectrum. This technique provides rich insights into the composition and characteristics of biological samples. The key contributions of this research are as follows:
-
This study builds on the first microscopic hyperspectral imaging dataset specifically designed for cholangiocarcinoma detection. The dataset features high-resolution data from microscopic pathological samples, offering a valuable resource for the field. It has the potential to serve as a benchmark for pathological imaging, enabling standardized cancer detection methods and facilitating comparisons across studies.
-
To address the risk of overfitting and the limited-data problem in multi-class classification, we employed image augmentation strategies on RGB microscopic data. The augmentation enlarges the dataset while preserving microscopic image details and improves model performance in detecting cancerous tissues.
-
This study uses VGG-16 as a base learner, extended with a two-block convolutional network, to analyze microscopic images by capturing both spatial and spectral features. This enhances feature extraction and enables more accurate classification of cancerous tissues. The method provides a deeper understanding of the subtle differences between healthy and cancerous cells, which is crucial for advancing pathological analysis.
-
The proposed model was tested using holdout splitting and k-fold cross-validation to assess its efficiency and robustness. The literature was extensively studied and compared with the proposed model. The proposed model performed well for cancer diagnosis using RGB imaging data and can be adapted to different datasets and real-world applications.
2 Related Work
A significant number of individuals have perished owing to misdiagnosis or delayed treatment of cholangiocarcinoma, a cancer that originates in the lining of the bile ducts. The manual examination of histopathological images obtained from specimens is a conventional method for cancer identification, but it requires substantial expert input and empirical data. The major purpose of the research [8] was to create an automated machine-based system that can diagnose cholangiocarcinoma from biopsy images with little human intervention and without prior training. CholangioNet is a cutting-edge, efficient neural network adept at detecting cholangiocarcinoma. RGB images are fundamental to histology, so the histology RGB image collection was expanded with additional photographs to support the research endeavour, after which the dataset underwent minimal preprocessing. Additional effort was required to improve the model's effectiveness in some scenarios. In contrast, newly developed automated machine learning technologies have the potential to improve the back-end operations of the IT system. Their work was a continuation of previous research that had shown the potential of machine learning approaches to aid in the initial classification of malignancies by utilising CT images [9].
The original deep learning system (DLS), created by the authors using several whole-slide images (WSIs), had significant success and precision in histopathologic identification. The samples included results from operations and tumour investigations conducted at six distinct institutions. Two specialists in the domain analysed the diverse materials. The authors used four distinct deep neural networks and evaluated their performance with a heat map, receiver operating characteristic curve, classification map, and confusion matrix. They further substantiated this by comparing the anticipated effectiveness of the most precise model with that of nine doctors [10]. The study [11] aims to create a repository of colour hyperspectral photographs of cholangiocarcinoma. The dataset comprises 880 scenes from 174 individuals: 142 examples are cancer-free, 49 exhibit full malignancy, and 689 exhibit partial cancer regions. Furthermore, histology specialists have delineated each cancerous area. Despite the fact that only 20% of individuals with intrahepatic cholangiocarcinoma display treatable alterations, the death rate associated with targeted therapy remains elevated. The authors of [12] employ whole-slide imaging in their Genetic Alteration Prediction method to predict genetic alterations; it relies on self-supervised and multi-instance learning methodologies.
In another work [13], the authors propose using hyperspectral microscopy to identify and classify cholangiocarcinomas using a three-dimensional U-Net architecture. The authors included several preprocessing approaches in their design to enhance classification accuracy and reduce computational cost. They demonstrated the effectiveness of the proposed network and preprocessing procedures by contrasting their performance with other commonly used supervised and unsupervised learning methodologies and with recent approaches. Their findings indicate that using the conventional filtering step with the proposed design increases the classification accuracy (CA) by 1.29%, while utilising the selected preprocessing steps enhances it by 4.29%. Detection of cholangiocarcinoma using hyperspectral images from a microscope has also been based on the ResNet-50 architecture. The microscopy hyperspectral imaging system acquires pictures of hyperspectral choledoch tissue under microscopic examination, after which skilled pathologists manually annotate each image. To train the classification model, the authors refined the data and divided it into training (6800 images) and testing (210 images) sets. The classification model can autonomously differentiate between malignant and noncancerous areas in the choledochal tissue images [14].
The study [15] encompassed a total of 117 individuals who were diagnosed with ICC following surgery. A set of 78 cases formed the modeling set, while 49 cases comprised the testing set. Deep learning algorithms identified both tissue types and lymphocytes in whole-slide images. The analysis of the histomorphological patterns of cancer cells and tissues produced 107-dimensional characteristics, encompassing various graph features on the whole-slide images. The mRMR method used 5-fold cross-validation to identify the three most distinctive criteria for assessing the patient's likelihood of survival. The research [16] employs microscopic hyperspectral imaging (HSI) pathology datasets to automate the detection of cholangiocarcinoma (CC) using a deep learning methodology, establishing the first benchmark on microscopic hyperspectral pathological images. The research collects 880 multidimensional hyperspectral images of cholangiocarcinoma and manually annotates each pixel as tumour or non-tumour to train supervised learning systems. Moreover, a binary label indicating the individual's health status as either healthy or diseased accompanies each scene on the slide. In contrast to conventional RGB images, HSI captures pixels at many spectral intervals, hence augmenting the channel dimension beyond that of a 3-channel RGB image.
The study [17] introduces a method for categorizing cholangiocarcinoma histopathological images using spatial-channel feature fusion in convolutional neural networks. The proposed model features a spatial branch in addition to a channel branch. The residual blocks inside the spatial branch yield deep spatial features. To enhance the model's descriptive capability, the channel branch integrates a multi-scale feature extraction module along with several multi-level feature extraction modules that derive channel features. Experiments conducted on the Multidimensional Choledoch Database demonstrate that the suggested approach surpasses conventional CNN classification techniques. Cholangiocarcinoma has become a leading cause of mortality and disability, making it one of the most dangerous kinds of cancer that may affect people. The study [18] highlights EDLM as the most effective method for the early detection of cholangiocarcinoma, a highly aggressive form of cancer. EDLM employs three distinct deep learning techniques to identify changes associated with cholangiocarcinoma. Another study [19] introduces a specialized cholangiocarcinoma segmentation network, HLCA-UNet, which leverages high-resolution images rich in hyperspectral data. Compared to the original UNet, HLCA-UNet demonstrates superior performance in transmitting information between the encoder and decoder by incorporating a channel attention mechanism and hierarchical feature extraction. The hierarchical module captures high-resolution, high-level features while bridging semantic gaps, whereas the channel attention mechanism strengthens connections between low-level and high-level features. HLCA-UNet outperforms other methods in three key metrics: accuracy (82.84%), precision (69.60%), and recall (77.99%). Finally, the study [20] addressed overfitting by applying augmentation techniques to microscopic data divided into two classes, normal and abnormal.
3 Proposed Methodology
The proposed methodology provides a detailed description of the dataset, including important details such as the number of classes and the sample distribution. The dataset is prepared through several steps that improve its quality and readiness for training, including standardizing the data and balancing the class distribution. Augmentation strategies enhance the variety of the training data and improve the model's capacity to generalize. Methods such as flips, random translations, rotations, and image brightness and contrast adjustments are used. This approach exposes the model to a greater number of data variations, hence increasing its robustness to different input conditions. The ensemble model is formed by integrating predictions from two base models to enhance the overall accuracy of predictions and system resilience. The suggested technique utilizes a cohesive workflow that integrates dataset description, preprocessing, augmentation, and ensemble modeling to address issues in real-world datasets and attain strong performance in classification tasks. Figure 1 represents the proposed workflow of the methodology, and detailed information is presented in the subsections.
The proposed cholangiocarcinoma smart diagnosis workflow consists of various image preprocessing techniques such as resizing, normalization, enhancement, and histogram analysis, together with augmentation techniques such as rotation, zoom, flipping, shearing, and brightness adjustment. The finalized dataset is then used to train the pretrained and proposed frameworks, after which the models are evaluated.
3.1 Dataset
Histological examinations are crucial for diagnosing and treating illnesses. Artificial intelligence has led to an increase in pathological databases, but these are often limited to grayscale or RGB color photographs, making deep learning algorithms less effective. This study focuses on 174 slides of bile duct tissue examined using clinical microscopy, provided by Changhai Hospital in Shanghai, China. The slides have a thickness of 10 micrometers and were prepared under the supervision of trained pathologists. The dataset includes 880 scenes from the 174 individuals, available in two formats: 30-channel hyperspectral images and 3-channel RGB images [16]; 689 of the samples contain partial cancer regions. The camera automatically adjusts exposure and focus to ensure sufficient detail. All cancerous regions have been accurately identified and annotated by knowledgeable pathologists. The study aims to improve the effectiveness of deep learning strategies in histopathology diagnosis.
3.2 Preprocessing and Augmentation
The image collection consists of 880 RGB images divided into three groups: those without cancer spots, those with entire cancer regions, and those with partial cancer areas. The dataset is class-imbalanced, with each category containing a different number of images. Figure 2 illustrates the different preprocessing steps and their histograms. Offline data augmentation was performed on the initially collected RGB image dataset to increase the number of images in each category and address the class imbalance [21]. The preprocessing pseudocode is shown in Algorithm 1. The value ranges for the different augmentation techniques are shown in Table 1. Augmentation [22] methods are crucial in the process of training deep learning models, as they expand the diversity and resilience of datasets. Adjustments to the brightness, rotation, horizontal or vertical inversion, resolution, and shear all contribute to the diversity of the dataset. The orientation of an object is altered by rotation, whereas the placement of an item is modified by flipping. Brightness and scale adjustments vary illumination and resolution, while shearing geometrically distorts the image. These methodologies enhance the diversity of the dataset, mitigate the issue of overfitting, and empower models to acquire invariant attributes. The augmentation pipeline is presented in Algorithm 2.
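As a minimal sketch of the flip, rotation, and brightness transforms described above (not the paper's Algorithm 2; the exact parameter ranges are those in Table 1, and the values below are illustrative), the core operations can be expressed with plain NumPy:

```python
import numpy as np

def augment(image, rng):
    """Apply a random flip, 90-degree rotation, and brightness shift.

    `image` is an H x W x 3 array with values in [0, 255]; the
    probabilities and the brightness range are illustrative choices,
    not the exact settings reported in Table 1.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)      # horizontal flip
    if rng.random() < 0.5:
        image = np.flipud(image)      # vertical flip
    k = rng.integers(0, 4)            # rotate by 0/90/180/270 degrees
    image = np.rot90(image, k)
    factor = rng.uniform(0.8, 1.2)    # brightness jitter
    image = np.clip(image * factor, 0, 255)
    return image

rng = np.random.default_rng(0)
sample = np.full((150, 150, 3), 128.0)  # dummy mid-gray patch
out = augment(sample, rng)
print(out.shape)
```

Applying such transforms offline, once per source image and per parameter draw, is how the three categories were expanded to equal size.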
Table 2 presents the data before and after augmentation. The finalized training dataset comprises 18,900 images, with each category providing 6,300 images. The images undergo image-enhancement procedures, beginning with the application of a Gaussian-blur filter to minimize noise in the histopathological images. The images are then resized from their original dimensions of \(1728 \times 2304\) to \(150 \times 150 \times 3\). This procedure improves computational efficiency, enabling quicker and uninterrupted processing of images, and the reduced image size makes training on the enlarged dataset feasible, enhancing the robustness and effectiveness of the deep network models. Once the images have been resized, they are standardized before being fed into the neural network architectures for further analysis.
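The denoise-resize-standardize pipeline above can be sketched with SciPy; the blur strength and interpolation order below are illustrative assumptions, not values reported in the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(image, target=(150, 150)):
    """Denoise, resize, and standardize one RGB image.

    A sketch of the described pipeline; `sigma` and the
    interpolation order are illustrative choices.
    """
    image = image.astype(np.float64)
    # Gaussian blur on the spatial axes only (sigma 0 on channels)
    image = gaussian_filter(image, sigma=(1.0, 1.0, 0))
    # Resize 1728 x 2304 -> 150 x 150, leaving the 3 channels intact
    factors = (target[0] / image.shape[0], target[1] / image.shape[1], 1)
    image = zoom(image, factors, order=1)
    # Scale pixel values to [0, 1] before feeding the network
    return image / 255.0

x = np.random.default_rng(1).uniform(0, 255, (1728, 2304, 3))
y = preprocess(x)
print(y.shape)  # (150, 150, 3)
```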
3.3 Proposed Model
Ensemble models are machine learning techniques that combine the predictions of multiple models to improve overall performance. They are especially beneficial for improving detection accuracy because of the probabilistic nature of deep neural networks. This work used a fusion of VGG-16 and a CNN to improve image detection accuracy, as shown in Fig. 3. The ensemble model was created by merging a pre-trained deep learning model with a CNN that has an input shape of (150, 150, 3). The CNN blocks were added progressively to maintain consistency in input size with the pre-trained model [23]. The include_top parameter was set to exclude the fully connected layers of the pre-trained VGG-16, and the convolutional blocks were appended before new fully connected layers. This strategy ensures that the input sizes of the CNN and the pre-trained model match, hence improving the overall performance of the model.
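To illustrate the general principle of combining model predictions stated above (shown here as soft-voting over class probabilities; the paper's actual model instead stacks a two-block CNN on truncated VGG-16 features, and the probabilities below are hypothetical):

```python
import numpy as np

def soft_vote(prob_a, prob_b):
    """Average the class-probability outputs of two base models and
    return the predicted class per sample.

    A sketch of why combining models reduces individual-model error;
    not the exact fusion architecture used in this work.
    """
    combined = (prob_a + prob_b) / 2.0
    return combined.argmax(axis=1)

# Hypothetical per-class probabilities for 3 samples, 3 tissue classes
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4], [0.3, 0.3, 0.4]])
p2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.1, 0.2, 0.7]])
print(soft_vote(p1, p2))  # [0 2 2]
```

Averaging damps the disagreement between the two base models: the second sample is ambiguous under model 1 alone but resolves clearly once both opinions are pooled.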
The VGG-16 is a popular architecture developed by the Visual Geometry Group (VGG) at Oxford University [24]. Thirteen of the layers in the design are convolutional, while the other three are fully connected. The network uses learnable filters to extract properties from the input image across several spatial dimensions to classify the image into predefined categories. The architectural design starts with a series of convolutional layers using small 3x3 convolutional filters with a stride of 1. Padding is used to maintain the spatial dimensions of feature maps.
Max-pooling layers are added after convolutional layers to reduce the spatial dimensionality of feature maps while retaining important information. This downsampling technique enhances the network’s receptive field and translation invariance. As the network deepens, the spatial dimensions of feature maps decrease as the number of channels grows. This allows the network to create more abstract and advanced representations of the input image.
Acquiring hierarchical representations is essential for gathering high-level semantic details and low-level elements like edges, textures, object forms, and structures. Feature maps are transformed into one-dimensional vectors at the end of the network and fed into fully connected layers. These layers act as classifiers by merging information from the convolutional layers to make predictions about the class of the input image.
The Rectified Linear Unit (ReLU) activation function and Conv2D layer are crucial components of convolutional neural networks for feature extraction from input data, especially images, and pattern recognition.
Conv2D: In neural networks, the two-dimensional convolutional layer slides a set of learnable filters across the spatial dimensions of the input, producing feature maps that respond to local patterns such as edges and textures. Because the filters are shared across positions, the layer preserves spatial structure while keeping the number of parameters small. The feature maps produced by stacked Conv2D layers are later flattened and passed to the fully connected layers for classification.
ReLU Activation: ReLU incorporates nonlinearity into the output of convolutional layers. It is computationally efficient and does not suffer from the notorious vanishing gradient problem [25]. ReLU passes positive values through unchanged and sets negative values to zero, yielding sparse, robust, and discriminative activations. This is important for capturing complex interactions in nonlinear real-world data. By applying ReLU after each convolution operation, CNNs can distinguish between different classes in tasks such as image classification and object detection.
Batch Normalization: Batch normalization is a technique used to stabilize training and accelerate the convergence of deep neural networks such as the VGG-16 architecture. It normalizes activations across each mini-batch and applies learnable scale and offset parameters. By reducing internal covariate shift, it permits higher learning rates, accelerating convergence and improving overall generalization. Batch normalization also reduces overfitting by adding a small amount of noise during the training process.
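The forward pass of batch normalization, with the per-feature mini-batch statistics and the learnable scale (gamma) and offset (beta) just described, can be sketched as:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize activations x of shape (batch, features).

    A forward-pass sketch only: the per-feature mean and variance
    are computed over the mini-batch, then the learnable scale
    (gamma) and offset (beta) are applied.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0))  # ~0 per feature
```

With gamma = 1 and beta = 0 the output has zero mean and unit variance per feature; during training the network learns gamma and beta so it can undo the normalization where that helps.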
Flatten layer: This layer transforms multidimensional feature maps into a single one-dimensional vector and is a key component in deep learning architectures. Since dense layers require one-dimensional input, the flatten layer is needed to pass the spatial information obtained from the previous convolutional layers to the fully connected layers.
Dropout Layer: Dropout is used to regularize neural networks and improve model generalization. Dropout randomly removes a portion of the neurons during training, thus reducing the reliance on specific features in the training set; in effect, a different sub-network is trained at each update step. At inference time all neurons are active, and activations are scaled by the keep probability p (or, with inverted dropout, the surviving activations are scaled by 1/p during training) so that the expected activations match between training and inference.
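The inverted-dropout variant mentioned above, where the scaling happens during training rather than at inference, can be sketched as:

```python
import numpy as np

def inverted_dropout(x, p_drop, rng, training=True):
    """Inverted dropout: during training each unit is kept with
    probability (1 - p_drop) and the survivors are scaled by
    1 / (1 - p_drop), so no rescaling is needed at inference."""
    if not training:
        return x
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep  # random keep/drop mask
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones((4, 1000))
y = inverted_dropout(x, p_drop=0.5, rng=rng)
print(y.mean())  # close to 1.0: expected activation is preserved
```

Roughly half the units are zeroed and the rest doubled, so the mean activation stays near its original value, which is exactly the training/inference consistency dropout requires.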
Softmax: Softmax is a mathematical function used in the output layer of neural networks, especially for multiclass classification problems. It converts the raw logits into a probability distribution over the classes, ensuring that the outputs are non-negative and sum to one. This allows the network's output to be interpreted as the probability that the input belongs to each class, which makes softmax a standard choice for classification models.
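The conversion from logits to a probability distribution can be written directly (the logit values below are hypothetical):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before
    exponentiating, then normalize so the outputs sum to one."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for one sample over the three tissue classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # probabilities sum to 1
```

Subtracting the maximum logit leaves the result unchanged mathematically but prevents overflow in `np.exp` for large scores.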
4 Results
This study analyzes 174 slides of bile duct tissue from Changhai Hospital in Shanghai, China, utilizing clinical microscopy techniques. The dataset consists of 880 images in 3-channel RGB format. All cancerous regions are accurately identified and classified by knowledgeable pathologists, with the aim of improving deep learning strategies in histopathology diagnosis. The dataset underwent several preprocessing and augmentation procedures before deep learning was utilized for disease identification. This section provides a detailed description of the experimental results. Key training parameters are shown in Table 3.
4.1 Performance Measures
Performance measures play a crucial role in deep learning [26]. Accurately assessing a model's performance, within the appropriate framework, relies on selecting the right metric for that model. Accuracy is the proportion of correct predictions among all predictions generated by the model; it ranges from zero to one, with the extremes corresponding to predictions that are entirely wrong or consistently correct. The limitations of accuracy can be overcome by also employing specificity, recall, and precision. Precision is the proportion of positive predictions that are correct, \(\mathrm{Precision} = TP/(TP+FP)\), where TP and FP denote true and false positives. Recall measures the proportion of actual positives that are successfully recognized, \(\mathrm{Recall} = TP/(TP+FN)\), where FN denotes false negatives. The F1 score, which is often disregarded, combines precision and recall as their harmonic mean, \(F1 = 2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}/(\mathrm{Precision} + \mathrm{Recall})\); a score of 0 signifies a complete absence of precision or recall, whereas a score of 1 signifies flawless recall and precision.
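The four measures follow directly from the TP/FP/FN/TN counts; a minimal sketch on hypothetical labels (1 = cancerous, 0 = healthy):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels
    directly from the confusion-matrix counts defined in the text."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical labels, not values from the paper's experiments
acc, prec, rec, f1 = binary_metrics([1, 1, 0, 0, 1, 0],
                                    [1, 0, 0, 1, 1, 0])
print(acc, prec, rec, f1)
```

For multi-class results such as Table 6, the same formulas are applied per class in a one-vs-rest fashion.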
4.2 Overall Results of Proposed Versus Deep Learning Models
Table 4 displays the overall results of the proposed model compared to deep learning models. The ResNet-50 model achieved an accuracy of 61.51% and a precision of 77.73%. VGG-16 achieved an accuracy of 55.09% and a recall of 55.09%. CNN achieved an accuracy of 49.81% and an F1 score of 53.36%. DenseNet achieved 80% accuracy and a 44.69% MCC. EfficientNet achieved 78.87% accuracy and a 79.05% F1 score. MobileNet achieved 63.77% accuracy and a 28.99% MCC. The proposed model achieved an accuracy of 84.53% and an F1 score of 83.76% utilizing the original imbalanced dataset. Most of the six deep learning models, particularly DenseNet, performed well on the imbalanced data and yielded favorable outcomes, but CNN produced poor results.
The overall results of the proposed model are displayed in Table 5, which compares them to the results of deep learning models. The accuracy and precision of the ResNet-50 model were 97.15% and 97.16%, respectively. An accuracy of 94.68% and a precision of 94.75% were reached by VGG-16. CNN scored a recall rate of 88.54% and a kappa of 83.96%. Utilizing the balanced dataset, the proposed approach attained a recall of 98.68%, a precision of 98.70%, and a 98.03% MCC. Although most of the six deep learning models, and EfficientNet in particular, performed exceptionally well on the balanced data and delivered good findings, MobileNet failed to produce satisfactory outputs.
4.3 Class Wise Results of Proposed Model Using Imbalance and Balance Data
The class-specific results of the proposed model are presented in Table 6, contrasting the imbalanced and balanced data. Using the imbalanced data, the model achieved 91.12% precision for class 0 and 63.41% for class 2, and a 92.64% F1 score for class 0 versus 61.90% for class 2. Using the balanced data, the proposed model achieved 99.70% precision for class 0, 100% recall for class 1, and 99.11% recall for class 2. The class-specific results indicate that the model achieved inferior performance for the full cancer class compared to the partial cancer and cancer-free classes.
K-fold cross-validation is a machine learning method that divides a dataset into k similarly sized folds. The cross-validation performance is shown in Table 7. We took the validation accuracy of the last ten epochs and computed its mean and standard deviation; the same procedure was applied to every model. The CNN model achieved a mean accuracy of 0.8228 over the last ten epochs with a standard deviation of 0.1369. ResNet-50 achieved the highest mean among the baseline models, with 0.9528 accuracy and 0.0296 STD. The proposed model achieved 0.9778 mean accuracy and 0.0231 STD.
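The fold-splitting step behind k-fold cross-validation can be sketched as follows (the fold count and seed are illustrative; the paper does not specify them here):

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Split shuffled sample indices into k roughly equal folds;
    each fold serves once as the validation set while the
    remaining folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# 880 scenes split into 5 train/validation pairs (illustrative k)
splits = list(k_fold_indices(n_samples=880, k=5))
print(len(splits))  # 5
```

Each sample appears in exactly one validation fold, so averaging the per-fold scores gives an estimate of generalization that uses every sample for both training and validation.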
Figures 4 and 5 illustrate the confusion matrices for the imbalanced and balanced data. The confusion matrix is an important method for evaluating DL models, particularly for classification, as it provides a thorough view of overall performance, error rates, and model behavior. Since this study addresses a multi-class classification problem, a 3x3 matrix is used. Error analysis based on the matrix offers suggestions for enhancing the training process, data preparation, or model design. The proposed model achieved 195 accurate predictions for class 0 using the imbalanced data and 2615 using the balanced data.
Deep learning model evaluation involves a number of measures, including training loss, validation loss, training accuracy, and validation accuracy. Figure 6 illustrates these measures, which are essential for assessing model performance. Training accuracy is the proportion of correct predictions on the training dataset and shows how effectively the model learns from the training data; a high training accuracy signifies efficient learning. Training loss, in turn, measures the discrepancy between the model's predictions and the actual values in the training dataset, and minimizing it reduces training errors and improves model performance. Validation accuracy is the proportion of correct predictions on a separate dataset; it is crucial for assessing the model's ability to generalise beyond the training data. Validation loss quantifies the difference between the predictions on the validation dataset and the ground truth, and monitoring it helps avoid overfitting and improves the model's capacity to generalise to new, untested data.
The validation set is not used directly for weight optimization; instead, it guides hyperparameter tuning and choices such as model architecture and training duration.
Visualization of ROC-AUC for the proposed method is depicted in Fig. 7. The ROC-AUC is a statistical measure utilized to assess the performance of a classification model, especially in classification problems.
The method integrates the ROC curve and the AUC; both axes of the ROC curve range from 0 to 1. The ROC curve illustrates the ability of a classifier to distinguish between positive and negative categories. The true positive rate (TPR) is determined by dividing the number of true positives by the sum of true positives and false negatives.
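Tracing the ROC curve and integrating it can be sketched on hypothetical labels and scores (not values from the paper's experiments):

```python
import numpy as np

def roc_points(y_true, scores, thresholds):
    """Sweep thresholds over predicted scores and return the
    (FPR, TPR) pairs that trace the ROC curve."""
    y_true = np.asarray(y_true)
    fpr, tpr = [], []
    for t in thresholds:
        pred = np.asarray(scores) >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tn = np.sum(~pred & (y_true == 0))
        tpr.append(tp / (tp + fn))   # TP / (TP + FN)
        fpr.append(fp / (fp + tn))   # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

y = [0, 0, 1, 1]                     # hypothetical ground truth
s = [0.1, 0.4, 0.35, 0.8]            # hypothetical model scores
fpr, tpr = roc_points(y, s, thresholds=np.linspace(1.0, 0.0, 101))
# AUC as the trapezoidal area under the (FPR, TPR) curve
auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
print(round(auc, 2))  # 0.75
```

An AUC of 0.5 corresponds to a random classifier, while 1.0 corresponds to perfect separation of the positive and negative classes.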
4.4 Statistical Significance
This study employed t-tests and p-values to assess the significance of our proposed approach relative to the baseline models. We concentrated on validation accuracy as it represents the model's optimal performance and offers the most dependable approximation of final accuracy. The t-test assesses whether the differences in performance between the proposed model and a baseline model are statistically significant, indicating that they did not arise by chance. For each model pair, we take the paired performance scores, compute their differences, and test whether the mean difference deviates significantly from zero. If the p-value from the test is below 0.05, the performance advantage of the proposed model is considered statistically significant. Table 8 illustrates the statistical significance of the proposed model.
Alongside the t-tests, we computed 95% confidence intervals for the mean of each measure. These intervals delineate the range within which the true average is expected to lie, helping us judge the precision of the reported performance. Combining the p-value with the confidence interval allowed us to assess the models not only by their means but also by the reliability of their outcomes. The confidence interval analysis is shown in Fig. 8.
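A minimal sketch of this procedure, using SciPy and hypothetical per-fold accuracies (not the paper’s actual measurements), shows how the paired t-test and the 95% confidence interval for the mean difference fit together:

```python
from scipy import stats

# Hypothetical per-fold validation accuracies (illustrative only).
proposed = [0.985, 0.987, 0.984, 0.988, 0.986]
baseline = [0.942, 0.951, 0.938, 0.949, 0.945]

# Paired t-test on the per-fold differences.
t_stat, p_value = stats.ttest_rel(proposed, baseline)

# 95% confidence interval for the mean difference.
diffs = [a - b for a, b in zip(proposed, baseline)]
n = len(diffs)
mean_d = sum(diffs) / n
half_width = stats.t.ppf(0.975, df=n - 1) * stats.sem(diffs)
ci = (mean_d - half_width, mean_d + half_width)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, 95% CI = {ci}")
```

If the interval for the mean difference excludes zero (and p < 0.05), the improvement is unlikely to be due to chance.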
Table 9 summarizes state-of-the-art studies. An ensemble of deep learning models (Inception-V3, Xception, and ResNet50) was used in [9] for detection, achieving a 93.5% AUC. Cheng et al. [10] employed automated machine learning and achieved 88% accuracy and 85% sensitivity in segmentation. Cao et al. [27] employed the MT-SCNet architecture and achieved 92% recall. Similarly, Gao et al. [19] used HLCA-UNet and achieved a 77.99% recall rate.
4.5 Limitations and Future Directions
The dataset was derived from a single hospital, which may limit heterogeneity in terms of patient demographics and geography and may affect the generalizability of the results to other populations. Furthermore, while the model performed well on the balanced dataset, its performance was less consistent on the imbalanced dataset, highlighting the challenges of real-world application. Another limitation is that RGB data were used for cholangiocarcinoma diagnosis. Despite the model’s promising performance, cancer segmentation was not feasible with the current approach. Future efforts will focus on enhancing the model’s ability to precisely segment cancer regions, which will further improve diagnostic accuracy. Additionally, incorporating a lightweight version of the model into a web-based application would facilitate broader adoption, allowing healthcare workers to diagnose cholangiocarcinoma more efficiently. We plan to develop such a lightweight web application to facilitate clinical access and support real-time decision-making, and we will employ model enhancement techniques, such as post-training optimization and cross-validation, to improve memory utilization and computational efficiency. To ensure rapid deployment, the developed model has been optimized for CPU inference and on-device processing so that it remains responsive in high-load environments. Based on our preliminary analysis, we expect the system to perform well on modest hardware configurations that do not require a dedicated GPU. The web application uses a simple interface implemented through a Python API; it allows clinicians to upload images and obtain insights within seconds, and it supports seamless integration with existing digital workflows. Future work will explore the model’s potential for detecting other abdominal cancers using this microscopy-based diagnostic technology.
The work will focus on improving the robustness of the model against imbalanced data and expanding the test to different clinical contexts to better understand the generalizability of the results.
5 Conclusion
This study successfully employed microscopic imaging analysis, coupled with a modified Visual Geometry Group (VGG) architecture, to develop a reliable and efficient AI-based diagnostic method for cholangiocarcinoma. By utilizing microscopic data, this approach taps into the intricate spatial and spectral details provided by hyperspectral imaging, elevating the accuracy of cancer detection. The integration of advanced image augmentation techniques, such as random translations, rotations, contrast and brightness adjustments, and flips, enhanced the model’s generalization ability by exposing it to a broader range of data variations, making the model more resilient to input changes.
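The augmentations listed above can be sketched in a few lines of NumPy. This is a simplified, hypothetical pipeline for illustration only (the ranges, probabilities, and 90-degree rotation shortcut are assumptions; production pipelines typically use a library such as torchvision or albumentations):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Randomly flip, rotate, translate, and jitter one H x W x C image
    with values in [0, 1]. A sketch of the augmentations described above."""
    # Random horizontal / vertical flips.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    if rng.random() < 0.5:
        img = img[::-1, :, :]
    # Random 90-degree rotation (arbitrary angles would need interpolation).
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    # Random translation by up to 2 pixels via a wrap-around roll.
    dy, dx = rng.integers(-2, 3, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))
    # Random brightness and contrast jitter, clipped back to [0, 1].
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.9, 1.1)
    return np.clip(img * contrast + brightness, 0.0, 1.0)

sample = rng.random((64, 64, 3))
out = augment(sample)
print(out.shape)  # (64, 64, 3)
```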
The proposed ensemble model demonstrated a notable improvement in detection accuracy, primarily due to the combination of base models, which allowed for more robust classification performance. Initially, the model was tested on an imbalanced dataset, achieving an accuracy of 84.53% and an F1 score of 83.76%. After applying augmentation techniques, the model’s performance improved significantly, reaching an accuracy of 98.66% and a recall of 98.58%. This study also highlights that, while EfficientNet showed impressive results, DenseNet excelled in handling imbalanced data, offering valuable insights into the adaptability of deep learning architectures for microscopic data analysis. The proposed method achieved an AUC of 99.93 for class 0 and 99.96 for class 2, indicating its reliability for early-stage detection.
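The core idea of combining base models can be illustrated with soft voting: averaging the class-probability outputs of the base models and taking the argmax. This is a generic sketch with invented toy outputs, not the paper’s exact ensembling scheme.

```python
import numpy as np

def soft_vote(prob_list):
    """Average the class-probability outputs of several base models and
    return the index of the highest mean probability per sample."""
    return np.argmax(np.mean(prob_list, axis=0), axis=1)

# Hypothetical outputs of three base models on two samples, three classes.
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
m3 = np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])

print(soft_vote([m1, m2, m3]))  # [0 2]
```

Averaging smooths out individual models’ miscalibrations, which is one reason ensembles tend to be more robust than any single base model.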
Data Availability
The dataset used for the experiments is cited in the paper as reference [16].
References
Brindley, P.J., Bachini, M., Ilyas, S.I., Khan, S.A., Loukas, A., Sirica, A.E., Teh, B.T., Wongkham, S., Gores, G.J.: Cholangiocarcinoma. Nat. Rev. Disease Prime. 7(1), 65 (2021)
Vithayathil, M., Khan, S.A.: Current epidemiology of cholangiocarcinoma in western countries. J. Hepatol. 77(6), 1690–1698 (2022)
Malhi, H., Gores, G.J.: Cholangiocarcinoma: modern advances in understanding a deadly old disease. J. Hepatol. 45(6), 856–867 (2006)
Malka, D., Siebenhüner, A.R., Mertens, J.C., Schirmacher, P.: The importance of molecular testing in the treatment of cholangiocarcinoma. Oncology (2020)
Vicent, S., Lieshout, R., Saborowski, A., Verstegen, M.M., Raggi, C., Recalcati, S., Invernizzi, P., Laan, L.J., Alvaro, D., Calvisi, D.F., et al.: Experimental models to unravel the molecular pathogenesis, cell of origin and stem cell properties of cholangiocarcinoma. Liver Int. 39, 79–97 (2019)
Cheng, S., Liu, S., Yu, J., Rao, G., Xiao, Y., Han, W., Zhu, W., Lv, X., Li, N., Cai, J., et al.: Robust whole slide image analysis for cervical cancer screening using deep learning. Nat. Commun. 12(1), 5639 (2021)
Cui, R., Yu, H., Xu, T., Xing, X., Cao, X., Yan, K., Chen, J.: Deep learning in medical hyperspectral images: a review. Sensors 22(24), 9790 (2022)
Chakrabarti, S., Rao, U.S.: Lightweight neural network for smart diagnosis of cholangiocarcinoma using histopathological images. Sci. Rep. 13(1), 18854 (2023)
Hu, R., Li, H., Horng, H., Thomasian, N.M., Jiao, Z., Zhu, C., Zou, B., Bai, H.X.: Automated machine learning for differentiation of hepatocellular carcinoma from intrahepatic cholangiocarcinoma on multiphasic mri. Sci. Rep. 12(1), 7924 (2022)
Cheng, N., Ren, Y., Zhou, J., Zhang, Y., Wang, D., Zhang, X., Chen, B., Liu, F., Lv, J., Cao, Q., et al.: Deep learning-based classification of hepatocellular nodular lesions on whole-slide histopathologic images. Gastroenterology 162(7), 1948–1961 (2022)
Zhang, Q., Li, Q., Yu, G., Sun, L., Zhou, M., Chu, J.: A multidimensional choledoch database and benchmarks for cholangiocarcinoma diagnosis. IEEE Access 7, 149414–149421 (2019). https://doi.org/10.1109/ACCESS.2019.2947470
Xiao, H., Wang, J., Weng, Z., Lin, X., Shu, M., Shen, J., Sun, P., Cai, M., Xiang, X., Li, B., et al.: A histopathology-based artificial intelligence system assisting the screening of genetic alteration in intrahepatic cholangiocarcinoma. Br. J. Cancer 1–8 (2024). https://doi.org/10.1038/s41416-024-02910-5
Kumar, S.S., Sahoo, O.P., Mundada, G., Aala, S., Sudarsa, D., Pandey, O.J., Chinnadurai, S., Matoba, O., Muniraj, I., Deshpande, A.: Deep learning-based hyperspectral microscopic imaging for cholangiocarcinoma detection and classification. Opt. Contin. 3(8), 1311–1324 (2024)
Deng, Y., Yin, J., Wang, Y., Chen, J., Sun, L., Li, Q.: Resnet-50 based method for cholangiocarcinoma identification from microscopic hyperspectral pathology images. In: Journal of Physics: Conference Series, vol. 1880, p. 012019 (2021). IOP Publishing
Xie, J., Pu, X., He, J., Qiu, Y., Lu, C., Gao, W., Wang, X., Lu, H., Shi, J., Xu, Y., et al.: Survival prediction on intrahepatic cholangiocarcinoma with histomorphological analysis on the whole slide images. Comput. Biol. Med. 146, 105520 (2022)
Sun, L., Zhou, M., Li, Q., Hu, M., Wen, Y., Zhang, J., Lu, Y., Chu, J.: Diagnosis of cholangiocarcinoma from microscopic hyperspectral pathological dataset by deep convolution neural networks. Methods 22–30 (2021). https://doi.org/10.1016/j.ymeth.2021.04.005
Zhou, H., Li, J., Huang, J., Yue, Z.: A histopathological image classification method for cholangiocarcinoma based on spatial-channel feature fusion convolution neural network. Front. Oncol. 13, 1237816 (2023)
Shah, A.A., Alturise, F., Alkhalifah, T., Faisal, A., Khan, Y.D.: Edlm: ensemble deep learning model to detect mutation for the early detection of cholangiocarcinoma. Genes 14(5), 1104 (2023)
Gao, H., Yang, M., Cao, X., Liu, Q., Xu, P.: A high-level feature channel attention unet network for cholangiocarcinoma segmentation from microscopy hyperspectral images. Mach. Vis. Appl. 34(5), 72 (2023)
Hammad, M., Bakrey, M., Bakhiet, A., Tadeusiewicz, R., Abd El-Latif, A.A., Pławiak, P.: A novel end-to-end deep learning approach for cancer detection based on microscopic medical images. Biocybern. Biomed. Eng. 42(3), 737–748 (2022)
Li, C., Wang, M., Sun, X., Zhu, M., Gao, H., Cao, X., Ullah, I., Liu, Q., Xu, P.: A novel dimensionality reduction algorithm for cholangiocarcinoma hyperspectral images. Opt. Laser Technol. 167, 109689 (2023)
El-Shafai, W., Mahmoud, A.A., Ali, A.M., El-Rabaie, E.-S.M., Taha, T.E., El-Fishawy, A.S., Zahran, O., El-Samie, F.E.A.: Efficient classification of different medical image multimodalities based on simple cnn architecture and augmentation algorithms. J. Opt. 53(2), 775–787 (2024)
Salehi, A.W., Khan, S., Gupta, G., Alabduallah, B.I., Almjally, A., Alsolai, H., Siddiqui, T., Mellit, A.: A study of cnn and transfer learning in medical imaging: Advantages, challenges, future scope. Sustainability 15(7), 5930 (2023)
Aburaed, N., Al-Saad, M., Zitouni, M.S., Alkhatib, M.Q., Wahbah, M., Halawani, Y., Panthakkan, A.: Cancer detection in hyperspectral imagery using artificial intelligence: Current trends and future directions. In: Artificial Intelligence for Medicine, pp. 133–149. Elsevier (2024)
Banerjee, C., Mukherjee, T., Pasiliao Jr, E.: An empirical study on generalizations of the relu activation function. In: Proceedings of the 2019 ACM Southeast Conference, pp. 164–167 (2019)
Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 12(1), 5979 (2022)
Cao, X., Gao, H., Zhang, H., Fei, S., Xu, P., Wang, Z.: Mt-scnet: multi-scale token divided and spatial-channel fusion transformer network for microscopic hyperspectral image segmentation. Front. Oncol. 14, 1469293 (2024)
Acknowledgements
This research is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors are also thankful to Prince Sultan University, Riyadh, Saudi Arabia, for APC support.
Funding
This research is funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Author information
Authors and Affiliations
Contributions
MM conceived the idea, performed data curation, and wrote the original draft. KK performed data curation and formal analysis and designed the methodology. MA dealt with software and performed visualization. SA carried out project administration. AE performed visualization and the initial investigation and supervised the study. AR performed validation and reviewed and edited the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mujahid, M., Kanwal, K., Abubakar, M. et al. Smart Diagnosis of Cholangiocarcinoma from Microscopic Images Using a Modified Visual Geometry Group Network with Adaptive Augmentation. Int J Comput Intell Syst 18, 230 (2025). https://doi.org/10.1007/s44196-025-00965-7
Received:
Revised:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1007/s44196-025-00965-7