Highlights
• DGCBG-Net tackles the challenges of insufficient cross-modal interaction and ambiguous tumor boundaries in tumor segmentation tasks.
• Cross-modal fusion of global features facilitates feature interaction between PET and CT images.
• Multi-stage boundary feature extraction fully explores the potential boundary features of CT images.
• Experimental results demonstrate that DGCBG-Net outperforms existing methods and is competitive with state-of-the-art approaches.
DGCBG-Net: a dual-branch network with global cross-modal interaction and boundary guidance for tumor segmentation in PET/CT images

Ziwei Zou (a), Beiji Zou (a), Xiaoyan Kui (a,*), Zhi Chen (a) and Yang Li (b)

(a) School of Computer Science and Engineering, Central South University, No. 932, Lushan South Road, Changsha, 410083, China
(b) School of Informatics, Hunan University of Chinese Medicine, No. 300, Xueshi Road, Changsha, 410208, China
DGCBG-Net is illustrated in Figure 3. DGCBG-Net consists of two branches: (a) an SSB for obtaining visual representations of tumors from multi-modal images; (b) a BPGB for extracting boundary feature information to assist the SSB in tumor boundary segmentation.

3.1. The semantic segmentation branch
The SSB is based on the UNet architecture, comprising a dual-branch encoder and a single-branch decoder, as shown in Figure 3(A). The encoder extracts multi-modal feature representations, and the decoder obtains the reconstructed segmentation result. The encoder consists of encoder blocks, GCIM and SMDM; the decoder includes decoder blocks and transposed convolution layers. The GCIM extracts global context features from the frequency domain of PET/CT images, facilitating cross-modal interaction of global features. The SMDM learns cross-modal semantic information and reduces the loss of feature information during downsampling. Both the encoder blocks and the decoder blocks comprise two sets of convolution layers, batch normalization layers, and ReLU activation functions. A skip connection is established between the encoder block and the decoder block with the same dimensionality to preserve the original feature information. The goal of the SSB is to learn multi-modal complementary image features, enhancing the accuracy of tumor segmentation.

3.1.1. Global cross-modal interaction module
The PET branch takes PET images as input; the CT branch takes CT images and the boundary features provided by BPGB. Each branch is designed to extract high-level semantic features from PET/CT images. Both branches are composed of the same structure: convolution layers, a 2D fast Fourier transform (FFT), parameter-learnable global filters, Fourier channel attention (FCA) blocks and a 2D inverse fast Fourier transform (IFFT).

Figure 4: The structure of the global cross-modal interaction module. It comprises five basic components: convolutional layer, 2D fast Fourier transform block, global filter (green area in the figure), Fourier channel attention block (yellow area in the figure), and 2D inverse fast Fourier transform block. Two parallel branches merge global feature information of different modalities through cross-modal fusion.
\[
\begin{cases}
F'_{ct} = F_{ct} \oplus (F_{pet} \odot K) \\
F'_{pet} = F_{pet} \oplus (F_{ct} \odot K)
\end{cases}
\tag{2}
\]
where ⊙ denotes the Hadamard product and ⊕ denotes element-wise addition.
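For concreteness, a minimal PyTorch sketch of the bilateral fusion in Eq. (2) is given below. It assumes the fusion is applied to the frequency-domain tensors produced by the 2D FFT, with K realized as a learnable complex-valued global filter shared by both directions, consistent with Figure 4; the class name, tensor shapes and initialization are illustrative rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GlobalCrossModalFusion(nn.Module):
    """Sketch of the GCIM fusion in Eq. (2): each modality is mapped to the
    frequency domain with a 2D FFT, the filtered spectrum of the *other*
    modality (modulated by a shared learnable global filter K) is added back,
    and the result is returned to the spatial domain (hypothetical sizes)."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # Learnable complex global filter K, one weight per frequency bin.
        # rfft2 keeps width // 2 + 1 frequency columns.
        self.K = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1, dtype=torch.cfloat) * 0.02
        )

    def forward(self, f_ct: torch.Tensor, f_pet: torch.Tensor):
        # 2D FFT of both modality feature maps.
        F_ct = torch.fft.rfft2(f_ct, norm="ortho")
        F_pet = torch.fft.rfft2(f_pet, norm="ortho")
        # Eq. (2): F'_ct = F_ct + (F_pet * K),  F'_pet = F_pet + (F_ct * K)
        F_ct_fused = F_ct + F_pet * self.K
        F_pet_fused = F_pet + F_ct * self.K
        # Back to the spatial domain with the inverse FFT.
        h, w = f_ct.shape[-2:]
        f_ct_out = torch.fft.irfft2(F_ct_fused, s=(h, w), norm="ortho")
        f_pet_out = torch.fft.irfft2(F_pet_fused, s=(h, w), norm="ortho")
        return f_ct_out, f_pet_out
```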
To capture channel feature information from the complex tensors in the frequency domain, we propose FCA, defined by Eq. (3) and Eq. (4). FCA employs global average pooling to gain a global understanding of each channel and obtains a weight factor for each channel through the sigmoid function. FCA enhances the network's global channel comprehension and emphasizes learned feature maps related to tumor regions. Finally, feature information fusion between the spatial domain and the frequency domain is accomplished through residual connections. The corresponding mathematical formulas are provided below:
\[
\begin{cases}
F''_{ct} = \mathrm{Pooling}_{global}\big(\mathrm{ReLU}\big[\mathrm{Conv2d}\{F'_{ct}\}\big]\big) \\
F''_{pet} = \mathrm{Pooling}_{global}\big(\mathrm{ReLU}\big[\mathrm{Conv2d}\{F'_{pet}\}\big]\big)
\end{cases}
\tag{3}
\]
\[
\begin{cases}
F'''_{ct} = \mathrm{Sigmoid}\big[\mathrm{Conv1d}\{F''_{ct}\}\big] \otimes F'_{ct} \\
F'''_{pet} = \mathrm{Sigmoid}\big[\mathrm{Conv1d}\{F''_{pet}\}\big] \otimes F'_{pet}
\end{cases}
\tag{4}
\]

where ⊗ denotes element-wise multiplication.
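A compact sketch of FCA as defined by Eqs. (3) and (4) follows. Using the magnitude of the complex spectrum for the pooling path and the specific kernel sizes are our assumptions; the paper's implementation may differ.

```python
import torch
import torch.nn as nn

class FourierChannelAttention(nn.Module):
    """Sketch of FCA (Eqs. (3)-(4)): global average pooling gives a per-channel
    summary, a Conv1d + sigmoid turns it into channel weights, and the input is
    reweighted channel-wise. Pooling over the complex spectrum's magnitude is
    an assumption; the paper does not spell this out."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv2d = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d(1)          # Pooling_global
        self.conv1d = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, C, H, W), possibly complex (output of the 2D FFT); use its
        # magnitude to compute real-valued channel statistics.
        mag = f.abs() if f.is_complex() else f
        # Eq. (3): F'' = Pooling_global(ReLU(Conv2d(F')))
        s = self.pool(self.relu(self.conv2d(mag)))           # (B, C, 1, 1)
        # Eq. (4): F''' = Sigmoid(Conv1d(F'')) * F'
        w = self.conv1d(s.squeeze(-1).transpose(1, 2))       # (B, 1, C)
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1)
        return f * w                    # channel-wise reweighting of the input
```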
3.1.2. Shared multi-path downsampling module
To assist the encoder in enhancing the effectiveness of extracting complementary features from the PET/CT modalities and minimizing feature loss during downsampling, we propose the SMDM. During model training, the weights of each SMDM are updated twice within a single epoch, enabling the SMDM to learn features from both modalities while reducing model complexity. Consequently, when F'''_ct passes through the SMDM to obtain F''''_ct, F''''_ct will incorporate the relevant PET modality features. Similarly, when F'''_pet passes through the SMDM to obtain F''''_pet, F''''_pet will incorporate the relevant CT modality features.
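The weight-sharing mechanism can be sketched as follows: a single module instance processes the CT and PET streams in turn, so its parameters accumulate gradients from both modalities. The two-path structure (max-pooling plus a strided 2×2 convolution, described in the next paragraph and in Figure 5) is taken from the description; the squeeze-and-excitation-style attention is an illustrative stand-in for the enhanced channel attention block.

```python
import torch
import torch.nn as nn

class SharedMultiPathDownsampling(nn.Module):
    """Sketch of SMDM: two parallel downsampling paths (a max-pooling layer and
    a strided 2x2 convolution, per Figure 5) whose outputs are merged, followed
    by a channel attention block. Layer choices here are illustrative, not the
    paper's exact configuration."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)   # prominent features
        self.conv = nn.Conv2d(channels, channels, kernel_size=2, stride=2)
        # Squeeze-and-excitation-style block standing in for the paper's
        # "enhanced channel attention" (assumption).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x) + self.conv(x)   # merge the two downsampling paths
        return y * self.attn(y)

# Weight sharing: one instance serves both modality streams, so its weights
# receive gradients from the CT pass and the PET pass alike.
smdm = SharedMultiPathDownsampling(channels=64)
f_ct, f_pet = torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128)
f_ct_down, f_pet_down = smdm(f_ct), smdm(f_pet)
```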
The structure of SMDM is illustrated in Figure 5. SMDM comprises convolution layers, max-pooling layers and an enhanced channel attention block. A 2×2 convolution layer is employed to preserve spatial information, enhancing the model's capacity for multi-scale learning. The max-pooling layer extracts prominent features from PET/CT images.

Figure 5: The structure of the shared multi-path downsampling module. It comprises two downsampling approaches: a max-pooling layer and a 2D convolutional layer. It also utilizes an enhanced channel attention block to capture global channel information.

3.2. Boundary prior-guided branch
To fully leverage multi-stage potential boundary features to enhance tumor boundary segmentation performance in the SSB, we propose the BPGB. The architecture of BPGB is illustrated in Figure 3(B). Since CT images can reflect the physical structural differences among various tissues and provide clear boundary information, we utilize CT images as the input. BPGB is constructed upon the UNet framework and comprises encoder blocks, downsampling blocks, BEM and decoder blocks.

3.2.1. Boundary extraction module
To extract potential boundary feature information from CT images, we propose BEM. As illustrated in Figure 6, BEM comprises Sobel operators, global filter layers, convolutional layers, batch normalization layers, and ReLU activation. The Sobel operator is employed to extract boundary information from CT images, using the horizontal boundary detection operator Sobel_x and the vertical boundary detection operator Sobel_y. Sobel_x is used to obtain the horizontal gradient map F_bx, while Sobel_y is used to obtain the vertical gradient map F_by. The final boundary feature map F_bxy is obtained through the sigmoid function.

Figure 6: The structure of the boundary extraction module. It includes Sobel_x and Sobel_y operators to compute horizontal and vertical gradient maps, respectively. Additionally, it integrates global filters and convolutional layers to extract global and local boundary features.
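As a sketch of BEM's gradient step, the snippet below applies fixed Sobel_x and Sobel_y kernels to a single-channel CT feature map and passes the result through a sigmoid. Combining the two gradient maps by their magnitude before the sigmoid is our assumption; the paper only states that F_bxy is obtained through the sigmoid function, and BEM's global filter and convolutional layers are omitted here.

```python
import torch
import torch.nn.functional as F

def sobel_boundaries(ct: torch.Tensor) -> torch.Tensor:
    """Sketch of BEM's boundary extraction for a single-channel input of shape
    (B, 1, H, W): fixed Sobel kernels produce the horizontal and vertical
    gradient maps F_bx and F_by, which are merged by gradient magnitude
    (assumption) and squashed with a sigmoid to give F_bxy."""
    k = ct.new_tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]])
    sobel_x = k.view(1, 1, 3, 3)        # horizontal boundary detector
    sobel_y = k.t().view(1, 1, 3, 3)    # vertical boundary detector
    f_bx = F.conv2d(ct, sobel_x, padding=1)   # horizontal gradient map F_bx
    f_by = F.conv2d(ct, sobel_y, padding=1)   # vertical gradient map F_by
    return torch.sigmoid(torch.sqrt(f_bx ** 2 + f_by ** 2 + 1e-6))  # F_bxy
```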
Table 2
Ablation experiments on the Hecktor 2022 dataset (Mean ± Standard Deviation). The p-value is derived by comparing the Dice scores of other methods with our approach. Settings 1-7 combine the Baseline with different subsets of the GCIM, SMDM and BPGB modules.

Settings | Dice(%) | IOU(%) | PPV(%) | Recall(%) | HD(mm) | p-value
1 | 74.26 ± 0.61 | 63.13 ± 0.71 | 85.97 ± 0.55 | 70.54 ± 0.64 | 11.45 ± 0.48 | 7.84e-5
2 | 77.06 ± 0.23 | 66.64 ± 0.36 | 87.29 ± 0.30 | 73.88 ± 0.38 | 11.78 ± 0.77 | 2.79e-3
3 | 75.47 ± 0.53 | 66.08 ± 0.43 | 87.33 ± 0.46 | 73.21 ± 0.11 | 14.98 ± 1.24 | 4.68e-4
4 | 76.27 ± 0.18 | 66.44 ± 0.22 | 88.38 ± 0.15 | 72.87 ± 0.27 | 10.55 ± 1.36 | 1.19e-4
5 | 78.45 ± 0.29 | 68.64 ± 0.32 | 88.86 ± 0.21 | 75.14 ± 0.33 | 14.77 ± 0.74 | 4.82e-6
6 | 78.14 ± 0.28 | 67.41 ± 0.37 | 85.48 ± 0.23 | 76.19 ± 0.32 | 8.46 ± 0.53 | 7.55e-6
7 | 79.29 ± 0.21 | 70.18 ± 0.22 | 88.16 ± 0.23 | 75.66 ± 0.39 | 8.72 ± 0.60 | -
At this point, the model's emphasis on dividing boundary areas as completely as possible increases, leading to a greater focus on correctly detecting all positives, i.e., increased attention to the Recall metric. At the same time, the model will, to some extent, decrease its focus on maximizing correctly identified positives (the PPV metric), resulting in a decrease in PPV performance. Similarly, when the experiment setting changes from 6 to 7, we introduce the SMDM module, which focuses on learning complementary features of the PET/CT modalities. At this point, the model's attention to discriminative features increases, leading to a greater focus on maximizing correctly identified positives, i.e., increased attention to the PPV metric, and it will, to some extent, decrease its focus on correctly detecting all positives (the Recall metric), resulting in a decrease in Recall performance. Additionally, although incorporating all three modules introduces some trade-offs, it achieves superior performance on most metrics (Dice, IOU, and HD) and performs only slightly below optimal on the others.
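The PPV/Recall trade-off discussed above follows directly from the definitions of the two metrics; a minimal sketch (hypothetical helper, binary masks assumed):

```python
import torch

def ppv_recall(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """PPV (precision) and Recall from binary masks: PPV = TP / (TP + FP)
    rewards avoiding false positives, while Recall = TP / (TP + FN) rewards
    covering every true tumor pixel, hence the tension between the two."""
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).sum().float()    # true positives
    fp = (pred & ~target).sum().float()   # false positives
    fn = (~pred & target).sum().float()   # false negatives
    return tp / (tp + fp + eps), tp / (tp + fn + eps)
```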
We visualized the learned feature maps of the three proposed modules to provide a clearer explanation of their roles. We conducted experiments on the STS dataset and the Hecktor 2022 dataset, combining the baseline model with SMDM, GCIM, and BPGB, and then visualized the feature maps extracted from the first skip connection of the combined models, as illustrated in Figure 7. The first two columns show the visualization results for the Hecktor 2022 dataset, while the last two columns show the results for the STS dataset. Figure 7 reveals that the baseline model and the baseline model with BPGB may focus on some incorrect areas or fail to fully cover all target areas (as indicated by the green and orange circles). However, the baseline model combined with SMDM can accurately extract feature information from the tumor region, demonstrating that the proposed SMDM module effectively mitigates the impact of misleading features. Additionally, the feature heatmaps of the baseline model combined with GCIM are the closest to the ground truth among all feature heatmaps, demonstrating the effective extraction of global features by GCIM and its assistance in improving segmentation accuracy. Finally, the boundary areas in the feature maps learned by the baseline model with BPGB are closest to the ground truth, demonstrating that the boundary features extracted by BPGB can help the model reduce its uncertainty in tumor boundary regions. In summary, the visualization results of the feature maps demonstrate the effectiveness of the proposed SMDM, BPGB, and GCIM modules.

4.5.2. Ablation study on loss function
To analyze the impact of different loss functions on model segmentation performance, we conduct experiments with various common loss functions on the Hecktor 2022 dataset. The examined loss functions consist of BCE loss, Dice loss, Focal loss [39], and the weighted BCE+Dice loss used in this study. The results are shown in Table 3: among the single loss functions, Dice loss demonstrates superior segmentation performance, surpassing the other loss functions in all metrics. Our weighted BCE+Dice loss combines the strengths of Dice loss and BCE loss by assigning different weights, resulting in a more competitive segmentation performance.
Figure 7: Comparison of spatial feature maps generated by the GCIM, SMDM and BPGB modules at the first skip connection on the STS and Hecktor 2022 datasets. From left to right, the first three columns represent the original CT image, PET image, and Ground Truth. The subsequent columns show the spatial feature maps of the different modules.
Table 3
Ablation experiments on loss functions on the Hecktor 2022 dataset (Mean ± Standard Deviation). The p-value is derived by comparing the Dice scores of other methods with our approach.

Losses | Dice(%) | IOU(%) | PPV(%) | Recall(%) | HD(mm) | p-value
BCE Loss | 76.55 ± 0.15 | 63.38 ± 0.26 | 80.32 ± 0.37 | 71.15 ± 0.21 | 12.36 ± 0.79 | 3.65e-4
Focal Loss | 77.68 ± 0.23 | 65.97 ± 0.48 | 85.36 ± 0.54 | 73.97 ± 0.32 | 13.24 ± 1.16 | 2.08e-4
Dice Loss | 78.02 ± 0.34 | 71.02 ± 0.31 | 88.36 ± 0.27 | 74.68 ± 0.36 | 10.56 ± 1.32 | 6.16e-5
Weight BCE+Dice (Dice=0.2, BCE=0.8) | 77.68 ± 0.22 | 65.75 ± 0.44 | 82.49 ± 0.42 | 73.35 ± 0.17 | 11.27 ± 1.12 | 5.98e-5
Weight BCE+Dice (Dice=0.4, BCE=0.6) | 78.62 ± 0.34 | 68.82 ± 0.12 | 85.91 ± 0.53 | 74.64 ± 0.26 | 10.52 ± 1.39 | 7.74e-6
Weight BCE+Dice (Dice=0.5, BCE=0.5) | 77.93 ± 0.35 | 68.37 ± 0.38 | 86.83 ± 0.24 | 74.01 ± 0.52 | 10.28 ± 0.95 | 4.48e-5
Weight BCE+Dice (Dice=0.6, BCE=0.4) | 78.81 ± 0.28 | 69.48 ± 0.36 | 87.57 ± 0.32 | 74.98 ± 0.45 | 9.64 ± 0.74 | 6.44e-7
Weight BCE+Dice (Dice=0.8, BCE=0.2) | 79.29 ± 0.21 | 70.18 ± 0.22 | 88.16 ± 0.23 | 75.66 ± 0.39 | 8.72 ± 0.60 | -
It outperforms Dice loss by 1.27% in Dice score and 0.98% in Recall score, and reduces the HD distance by 1.84mm.
In addition, to analyze the optimal weight allocation ratio between Dice loss and BCE loss, we conducted a series of experiments and selected five representative results, shown in the lower part of Table 3. When the weight allocation ratio of Dice loss to BCE loss is 0.8 to 0.2, all performance metrics reach their peak. For Dice scores, the loss function setting in this paper exhibits statistically significant differences compared to the other settings (p ≤ 0.01). Consequently, the optimal weight allocation ratio for the loss function is set to 0.8 to 0.2.
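A minimal sketch of the weighted BCE+Dice loss with the selected 0.8/0.2 ratio is given below; the soft-Dice form used here is a common formulation and stands in for the paper's exact definition.

```python
import torch
import torch.nn as nn

class WeightedBCEDiceLoss(nn.Module):
    """Sketch of the weighted BCE+Dice loss with the paper's best ratio
    (Dice = 0.8, BCE = 0.2); logits and binary targets of shape (B, 1, H, W)
    are assumed."""
    def __init__(self, dice_weight: float = 0.8, bce_weight: float = 0.2,
                 smooth: float = 1.0):
        super().__init__()
        self.dice_weight, self.bce_weight, self.smooth = dice_weight, bce_weight, smooth
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        prob = torch.sigmoid(logits)
        # Soft Dice over the spatial dimensions, averaged over the batch.
        inter = (prob * target).sum(dim=(-2, -1))
        union = prob.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
        dice = 1.0 - ((2.0 * inter + self.smooth) / (union + self.smooth)).mean()
        return self.dice_weight * dice + self.bce_weight * self.bce(logits, target)
```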
4.6. Comparisons with the existing methods
We perform comparisons between DGCBG-Net and SOTA methods on both the STS and Hecktor 2022 datasets. To ensure a fair comparison, we apply consistent preprocessing methods and experimental details to all methods. The compared methods include the single-modal UNet, the dense UNet proposed by [40], the deep dilated segmentation network (DiSegNet) proposed by [18], the multi-modal spatial attention module method (MSAM) proposed by [16], the collaborative learning feature fusion method (CoFeatureModel) proposed by [14], and the vision transformer based on UNet (TransUNet) proposed by [21].

4.6.1. Comparison on the STS dataset
Table 4 shows the quantitative segmentation results of all methods on the STS dataset. Table 4 reveals that UNet++ exhibits the lowest Dice and Recall scores, with values of 75.60% and 76.03%, respectively, while DiSegNet records the worst IOU, PPV, and HD values, at 62.23%, 71.80%, and 20.77mm, respectively. In contrast, DGCBG-Net leads in Dice, PPV, Recall, and HD, achieving values of 80.33%, 81.14%, 84.28%, and 12.02mm, respectively. Additionally, its IOU score of 67.64% is second only to TransUNet. For Dice scores, significance analysis yields p-values below 0.01, indicating statistical significance of the results.
Figure 8: Qualitative segmentation results of four test cases on the STS dataset. From left to right, the first three columns
represent the original CT image, PET image, and Ground Truth. The subsequent columns show the segmentation results of
different segmentation models. The result images are cropped and resized for enhanced clarity. The red area represents the true
positive area, the green area represents the false positive area, and the purple area represents the false negative area.
Figure 8 illustrates visual segmentation results on the STS dataset. It is evident that the segmentation results predicted by DGCBG-Net have the largest true positive regions and the smallest false negative regions. This notable improvement can be attributed to the enhanced cross-modal interaction provided by the GCIM and SMDM, which allow a more comprehensive utilization of multi-modal feature representations. Additionally, our BPGB enhances the model's accuracy in segmenting tumor boundary regions, significantly reducing false positive regions. Quantitative and qualitative results indicate that our DGCBG-Net achieves superior segmentation results under the same conditions.
4.6.2. Comparison on the Hecktor 2022 dataset
Table 5 presents the quantitative segmentation results of all methods on the Hecktor 2022 dataset. It is evident that UNet++ demonstrates the lowest Dice score at 76.45%, CoFeatureModel records the lowest IOU and Recall scores, at 63.62% and 71.01%, respectively, and MSAM has the lowest PPV score at 86.86%; among the compared methods, TransUNet attains the best HD at 12.71mm. In contrast, DGCBG-Net excels in Dice, IOU, and HD, achieving the best values (Dice: 79.29%, IOU: 70.18%, HD: 8.72mm), and ranks second in PPV and Recall (PPV: 88.16%, Recall: 75.66%). For Dice scores, significance analysis yields p-values below 0.01, indicating statistical significance of the results.

Figure 9 illustrates the visual segmentation results on the Hecktor 2022 dataset. Notably, DGCBG-Net consistently exhibits the largest true positive regions, the smallest false positive regions, and the smallest false negative regions. Additionally, in the bottom-most case in Figure 9, our method correctly predicts the small tumor regions surrounding the large tumor region (as indicated by the yellow circle). In contrast, TransUNet predicts an incorrect location (as indicated by the red circle), and the other methods fail to predict this tumor region. This observation underscores the positive impact of the global contextual features extracted by GCIM on the accuracy of small tumor region segmentation. Furthermore, the cross-modal information interaction effectively reduces the likelihood of mis-segmentation by the model.
Figure 9: Qualitative segmentation results of four test cases in the Hecktor 2022 dataset. From left to right, the first three columns represent the original CT image, PET image, and Ground Truth. The subsequent columns show the segmentation results of different segmentation models. The result images are cropped and resized for enhanced clarity. The red area represents the true positive area, the green area represents the false positive area, and the purple area represents the false negative area.
4.6.3. Comparison with SOTA methods
To comprehensively analyze the differences between DGCBG-Net and SOTA methods, we select three recent works for a segmentation performance comparison. Luo et al. [19] proposed C2BA-UNet, which comprises two crucial components: a multi-set boundary-aware module that effectively captures feature information in the tumor boundary region by combining gradient maps, uncertainty maps, and horizontal maps, and a context coordination module that effectively integrates multi-scale information using attention mechanisms. Zhang et al. [20] proposed ASE-Net, which includes two key components: a pseudo-enhanced CT image method based on metabolic intensity, which enhances the positional distinctiveness of high and low metabolic areas, and an adaptive scale attention supervision module that learns different-scale tumor features. Isensee et al. [41] proposed NNUNet, which utilizes strong preprocessing and postprocessing strategies while employing the basic UNet architecture, achieving high-precision medical image segmentation.

C2BA-UNet and ASE-Net both conducted extensive experiments on the STS dataset, achieving competitive segmentation performance. Unfortunately, the unavailability of open-source code for these methods prevented us from replicating their experimental procedures. Therefore, we followed the experimental settings outlined in the two papers and retrained our DGCBG-Net, with input image resolutions of 128 × 128 for C2BA-UNet and 256 × 256 for ASE-Net. We compared our results to those reported by C2BA-UNet and ASE-Net across four performance metrics: Dice, IOU, Recall, and PPV. As demonstrated in Table 6, our method outperforms the SOTA methods on the majority of the metrics at both resolution settings. Specifically, at a resolution of 128 × 128, our method surpasses C2BA-UNet with a 1.62% higher Dice score and a 0.82% higher IOU score. At a resolution of 256 × 256, our method outperforms ASE-Net with a 1.77% higher Dice score, a 2.12% higher IOU score, a 1.59% higher Recall score, and a 3.35% higher PPV score.
We compared with NNUNet on the two datasets separately, with the quantitative results shown in Tables 4 and 5 and the qualitative results shown in Figures 8 and 9, respectively. It can be observed from Tables 4 and 5 that, on the STS dataset, DGCBG-Net improved over NNUNet by 1.79% in Dice score and 1.99% in Recall score and reduced the HD distance by 2.27mm, with an IOU score close to NNUNet's. On the Hecktor 2022 dataset, DGCBG-Net improved by 0.8% in Dice score and 0.91% in IOU score, and reduced the HD distance by 1.64mm. From Figures 8 and 9, it can be seen that DGCBG-Net has larger true positive regions than NNUNet, as well as smaller false positive and false negative regions. Both quantitative and qualitative results demonstrate that DGCBG-Net performs excellently in segmentation, surpassing SOTA methods.
4.6.4. Comparison of tumor boundary segmentation results
To further analyze the segmentation performance of DGCBG-Net in the tumor boundary region, we evaluate the boundary segmentation results of all methods on the STS and Hecktor 2022 datasets, respectively. The visualized boundary segmentation results are shown in Figure 10. On the Hecktor 2022 dataset, all methods exhibit insufficient segmentation of the edges, and the compared methods produce inferior results in some smaller or irregularly shaped tumor regions. Conversely, our DGCBG-Net, benefiting from the multi-scale boundary feature information provided by BPGB, achieves superior segmentation results in these regions. On the STS dataset, all methods exhibit over-segmentation, particularly in regions with irregular tumor edges, where the compared methods struggle to achieve accurate segmentation. Because of the boundary-assisting information provided by BPGB, our approach aligns more closely with the ground truth in these areas.
4.6.5. Comparison of model complexity
Table 7 summarizes the parameter counts and computational resource requirements of all methods. It is evident that DGCBG-Net involves fewer parameters and computational resources than the majority of methods. In particular, when contrasted with TransUNet, another model capable of extracting global contextual features, DGCBG-Net has significantly fewer parameters. Although DGCBG-Net does require more parameters and computational resources than the minimal CoFeatureModel, it achieves an effective balance between segmentation performance and computational costs.

Table 7
Comparison of model sizes and computational costs expressed in terms of Params and FLOPs
5. Discussion
Automatic tumor segmentation is a crucial step in computer-aided disease diagnosis and treatment planning. In this study, we propose a dual-branch network based on global cross-modal interaction and boundary guidance for tumor segmentation (DGCBG-Net). DGCBG-Net includes three crucial components: GCIM, SMDM, and BPGB. We conduct extensive ablation and comparative experiments on two public datasets. The experimental results presented in Tables 1-7 and Figures 7-10 demonstrate the effectiveness of the three key components and the superiority of DGCBG-Net in automatic tumor segmentation. Additionally, we conduct further experiments to analyze the rationality of the proposed key modules.

We design a set of comparative experiments for GCIM, transforming the cross-modal interaction approach into an early fusion method, as shown in Figure 11. The experimental results on the STS dataset are presented in Table 8: when employing the comparative architecture as the cross-modal interaction approach, the Dice, IOU, and Recall scores all fall short of the Baseline results. For Dice scores, GCIM shows statistically significant differences compared to the other methods (p ≤ 0.01). We infer that the reason for this is the feature concatenation along the channel dimension, which leads to the model receiving a significant amount of redundant features. These redundant features cause the model to allocate more attention to background pixel regions than to foreground pixel regions, consequently impairing the model's segmentation performance. By contrast, GCIM ensures the comprehensive utilization of global feature representations from different modalities through the incorporation of global filters and bilateral cross-modal feature interaction. The experimental results in Table 8 further validate the effectiveness of GCIM.
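For reference, the early-fusion baseline of Figure 11 amounts to channel-wise concatenation of the two modalities before a single encoder (shapes illustrative):

```python
import torch

# Early fusion used by the comparative architecture: CT and PET slices are
# concatenated along the channel dimension and fed to one encoder, instead of
# GCIM's bilateral frequency-domain interaction.
ct, pet = torch.randn(1, 1, 256, 256), torch.randn(1, 1, 256, 256)
fused_input = torch.cat([ct, pet], dim=1)   # (1, 2, 256, 256)
```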
Figure 11: Comparative architecture of GCIM. It utilizes early fusion as the cross-modal interaction approach: CT and PET modality data are concatenated in a serial manner as input.
We analyzed the impact of weight sharing in SMDM by introducing a multi-path downsampling module (MDM) without weight sharing as a comparative reference. The experimental results, shown in Table 9, indicate that SMDM outperforms MDM by 0.59% in Dice score, 3.31% in IOU score, 4.7% in PPV score, 1.22% in Recall score, and 0.8514mm in HD, while reducing the parameter count by 1.04Mb. For Dice scores, SMDM shows statistically significant differences compared to the other methods (p ≤ 0.01). The experimental results suggest that weight sharing has a positive impact on feature extraction from multi-modal images. We infer that this is because PET scans and CT scans are spatially aligned, which means that the feature representations of the two scans can complement each other in SMDM. Weight sharing leads to a reduction in false positive and false negative regions when delineating tumor areas.

Despite the impressive performance achieved by DGCBG-Net, it still generates erroneous cases under certain conditions, as illustrated in Figure 12. When the similarity between tumor and normal tissue is too high, and the discrepancy between the tumor region displayed in the PET image and the actual tumor region is too large, DGCBG-Net will produce some segmentation failure cases. Figures 12 a) and b) represent failure cases in the STS dataset, while c) and d) represent failure cases in the Hecktor 2022 dataset. Such circumstances may affect physicians' assessment and diagnosis, thus impacting the precision of surgery and radiotherapy. In such cases, physicians need to integrate their clinical experience with other imaging information (e.g., MRI) to make comprehensive judgments, thus reducing the risks of misdiagnosis and mistreatment.

Although our DGCBG-Net has achieved satisfactory segmentation results on the STS and Hecktor 2022 datasets, our method still has some limitations. For instance, on the STS dataset, DGCBG-Net obtains a relatively large Hausdorff distance. We infer that this could be attributed to the limited size of the STS dataset, the variations in tumor size and shape, and a relatively random spatial distribution, all of which can affect the model's generalization ability. Given the limited generalization performance of our model, it is suitable for common tumor segmentation tasks; for rare tumor types, doctors need to integrate other imaging information with clinical experience for a comprehensive judgment. Furthermore, DGCBG-Net performs tumor segmentation solely on 2D planes without considering spatial structural information in the 3D dimension, which may impact doctors' comprehension of the real tumor's three-dimensional structure.
Table 9
Comparative experimental results of SMDM on the Hecktor 2022 dataset (Mean ± Standard Deviation). The p-value is derived by comparing the Dice scores of other methods with our approach.

Methods | Dice(%) | IOU(%) | PPV(%) | Recall(%) | HD(mm) | p-value | Params(Mb) | FLOPs(Gbps)
Baseline | 74.26 ± 0.61 | 63.13 ± 0.71 | 85.97 ± 0.55 | 70.54 ± 0.64 | 11.45 ± 0.48 | 1.66e-3 | 17.63 | 24.06
SMDM | 75.47 ± 0.53 | 66.08 ± 0.43 | 87.33 ± 0.46 | 73.21 ± 0.11 | 14.98 ± 1.24 | - | 19.82 | 25.69
MDM | 74.88 ± 0.62 | 62.77 ± 0.37 | 82.63 ± 0.58 | 71.99 ± 0.30 | 15.84 ± 1.41 | 5.12e-4 | 20.86 | 25.69
6. Conclusion
In this paper, we propose DGCBG-Net, a dual-branch tumor segmentation network based on global cross-modal interaction and boundary guidance for PET/CT image segmentation. DGCBG-Net comprises three essential components. To extract global contextual features from the PET and CT modalities and facilitate bilateral cross-modal interaction of global features, we propose GCIM. To assist the encoder in enhancing the effectiveness of extracting cross-modal complementary features and reducing feature loss during downsampling, we propose SMDM. Finally, BPGB is proposed to extract multi-stage potential boundary features, assisting the SSB in tumor boundary area segmentation. Extensive experiments conducted on two datasets have demonstrated the effectiveness of the three proposed key modules. On the STS dataset, DGCBG-Net achieved a Dice score of 80.33%, an IOU score of 67.64%, and an HD distance of 12.02mm. On the Hecktor 2022 dataset, DGCBG-Net achieved a Dice score of 79.29%, an IOU score of 70.18%, and an HD distance of 8.72mm. Experimental results on both datasets have demonstrated that DGCBG-Net outperforms existing methods and is competitive with the SOTA approaches.
CRediT authorship contribution statement
Ziwei Zou: Writing - original draft preparation, Software. Beiji Zou: Conceptualization of this study. Xiaoyan Kui: Methodology, Writing - review & editing. Zhi Chen: Data curation. Yang Li: Supervision.
Figure 12: Illustration of failure cases with PET image as background. The red area represents the true positive area, the green
area represents the false positive area, and the purple area represents the false negative area.