
Journal Pre-proof

DGCBG-Net: a dual-branch network with global cross-modal interaction and boundary guidance for
tumor segmentation in PET/CT images

Ziwei Zou, Beiji Zou, Xiaoyan Kui, Zhi Chen and Yang Li

PII: S0169-2607(24)00121-4
DOI: https://doi.org/10.1016/j.cmpb.2024.108125
Reference: COMM 108125

To appear in: Computer Methods and Programs in Biomedicine

Received date: 14 November 2023


Revised date: 24 February 2024
Accepted date: 7 March 2024

Please cite this article as: Z. Zou, B. Zou, X. Kui et al., DGCBG-Net: a dual-branch network with global cross-modal interaction and boundary guidance for
tumor segmentation in PET/CT images, Computer Methods and Programs in Biomedicine, 108125, doi: https://doi.org/10.1016/j.cmpb.2024.108125.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for
readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its
final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2024 Published by Elsevier.


Highlights

• DGCBG-Net aims to tackle the challenge of insufficient cross-modal interaction and ambiguous tumor boundary segmentation in tumor segmentation tasks.
• Cross-modal fusion of global features facilitates feature interaction between PET and CT images.
• Multi-stage boundary feature extraction is employed to fully explore the potential boundary features of CT images.
• The experimental results demonstrate that DGCBG-Net outperforms existing methods and is competitive with state-of-the-art methods.
DGCBG-Net: a dual-branch network with global cross-modal
interaction and boundary guidance for tumor segmentation in PET/CT
images
Ziwei Zou a, Beiji Zou a, Xiaoyan Kui a,∗, Zhi Chen a and Yang Li b
a School of Computer Science and Engineering, Central South University, No. 932, Lushan South Road, Changsha, 410083, China
b School of Informatics, Hunan University of Chinese Medicine, No. 300, Xueshi Road, Changsha, 410208, China

ARTICLE INFO

Keywords: Tumor segmentation; Cross-modal interaction; Boundary guidance; PET/CT imaging

ABSTRACT

Background and objectives: Automatic tumor segmentation plays a crucial role in cancer diagnosis and treatment planning. Computed tomography (CT) and positron emission tomography (PET) are extensively employed for their complementary medical information. However, existing methods ignore bilateral cross-modal interaction of global features during feature extraction, and they underutilize multi-stage tumor boundary features.

Methods: To address these limitations, we propose a dual-branch tumor segmentation network based on global cross-modal interaction and boundary guidance in PET/CT images (DGCBG-Net). DGCBG-Net consists of 1) a global cross-modal interaction module that extracts global contextual information from PET/CT images and promotes bilateral cross-modal interaction of global features; 2) a shared multi-path downsampling module that learns complementary features from the PET/CT modalities to mitigate the impact of misleading features and decrease the loss of discriminative features during downsampling; and 3) a boundary prior-guided branch that extracts potential boundary features from CT images at multiple stages, assisting the semantic segmentation branch in improving the accuracy of tumor boundary segmentation.

Results: Extensive experiments are conducted on the STS and Hecktor 2022 datasets to evaluate the proposed method. The average Dice scores of our DGCBG-Net on the two datasets are 80.33% and 79.29%, with average IOU scores of 67.64% and 70.18%. DGCBG-Net outperformed the current state-of-the-art methods with a 1.77% higher Dice score and a 2.12% higher IOU score.

Conclusions: Extensive experimental results demonstrate that DGCBG-Net outperforms existing segmentation methods and is competitive with state-of-the-art methods.

1. Introduction

Cancer has emerged as one of the world's most prevalent and deadly diseases, claiming over ten million lives annually [1]. Medical imaging techniques play a pivotal role in early cancer diagnosis and treatment, assisting physicians in ascertaining tumor dimensions and locations and in visualizing the relationships between tumors and adjacent tissues [2, 3]. Two common medical imaging techniques used in cancer diagnosis are computed tomography (CT) and positron emission tomography (PET) [4]. CT provides high-resolution anatomical structural information [5]. PET reflects the biological activity and metabolic function of different tissues and organs. Notably, because tumor cells preferentially use glucose as their energy source, 18F-fluorodeoxyglucose (18F-FDG) PET enables high sensitivity and specificity in tumor visualization [6]. Figure 1 illustrates cross-sectional PET/CT image pairs of patients with head and neck tumors or soft tissue sarcomas, respectively. When the tumor intensity in CT images is similar to that of adjacent tissues, PET images provide differentiation through variations in metabolic intensity. Concurrently, the anatomical details from CT images help identify necrotic tumor regions characterized by reduced metabolic activity in PET. In current clinical diagnosis, tumor segmentation still relies on manual annotation by physicians. This process is laborious and time-intensive, and it is susceptible to variability due to differences in physicians' expertise and perspectives [7]. Such variability can affect subsequent tumor resection and treatment planning. Consequently, computer-aided diagnosis (CAD) technology is extensively employed in the field of tumor segmentation [8, 9]. CAD assists physicians in tumor diagnosis and localization, providing technical support for the formulation of subsequent treatment strategies [10, 11].

In CAD, automatic tumor segmentation methods based on multi-modal scanning have become a current research hotspot [12, 13]. The key challenge in multi-modal tumor segmentation lies in effectively utilizing the complementary features [14, 15]. Existing methods typically utilize two branches to separately extract multi-stage features from PET and CT images, followed by integrating the feature representations of the two modalities through complexly designed modules and network architectures [16, 17, 18, 19, 20]. However, these methods rely on implicit learning of the correlation and interaction between CT and PET images, respectively.

This work was supported in part by the National Key R&D Program of China under Grant 2018AAA0102100, the National Natural Science Foundation of China under Grants U22A2034 and 62177047, the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies under Grant 2023QYIC020.
∗ Corresponding author: [email protected] (X. Kui)



Figure 1: CT/PET image examples of soft tissue sarcomas and head and neck tumors. From left to right, the first column shows CT image examples of the two types of tumors, the second column shows the corresponding PET image examples, and the third column shows the ground truth of the two types of tumors.

The multi-modal feature representations have not been fully explored. Individual PET/CT images are not entirely flawless; accurate tumor segmentation requires exploiting the beneficial features of the corresponding modality image. Additionally, extracting global dependencies in image features is also a critical step in achieving precise tumor segmentation [21]. Common approaches for obtaining global contextual information introduce higher computational complexity and relatively lower efficiency [22].

Existing tumor segmentation methods give limited consideration to tumor boundary features. Indeed, boundary features play a crucial role in assisting models with tumor localization and reducing uncertainty in tumor boundary segmentation [23, 24]. To segment tumor boundaries accurately, existing methods primarily concentrate on tumor boundary features at the model's input or output layers [25, 26, 27, 28]. These methods employ boundary-aware algorithms or enhanced loss functions to acquire boundary supervision information. Although these approaches have demonstrated meaningful results in improving the model's ability to segment tumor boundaries, the challenge of extracting boundary features at different stages remains to be addressed. Investigating these boundary features can offer a more comprehensive understanding of the tumor boundary, potentially leading to more accurate and robust tumor segmentation.

To fully exploit the complementary semantic information and boundary features from multi-modal images and achieve precise automatic tumor segmentation, this study proposes a dual-branch network based on global cross-modal interaction and boundary guidance, termed DGCBG-Net. DGCBG-Net comprises a semantic segmentation branch (SSB) and a boundary prior-guided branch (BPGB). The SSB extracts visual representations of tumors from multi-modal images. The BPGB focuses on multi-stage boundary feature extraction, assisting the SSB in precise tumor boundary segmentation and decreasing the model's uncertainty in the boundary regions. The SSB consists of two crucial components: the global cross-modal interaction module (GCIM) and the shared multi-path downsampling module (SMDM). GCIM facilitates the learning of complementary cross-modal feature representations without negatively impacting the extraction of high-level features in the individual modality branches. GCIM utilizes global filters to extract crucial global contextual features from the frequency domain of PET/CT images, integrating them into the corresponding modality branch through cross-fusion to promote cross-modal feature interaction. SMDM enhances the model's ability to utilize complementary features from the PET/CT modalities through weight sharing, improving tumor segmentation performance. Additionally, SMDM integrates multiple downsampling methods to mitigate feature loss during downsampling. Within BPGB, we propose a boundary extraction module (BEM), which can extract potential boundary features from CT images. BPGB acquires multi-level boundary features by applying the BEM at different stages.

The main contributions of this paper are:

• We propose DGCBG-Net, which consists of the SSB and BPGB to fully utilize cross-modal semantic information and boundary features.

• We develop an SSB to extract visual representations of tumors, which includes GCIM and SMDM. The GCIM facilitates bilateral cross-modal interaction of global features, while SMDM enhances the model's ability to utilize complementary features from the PET/CT modalities.

• We design a BPGB that extracts multi-stage potential boundary features to enhance tumor boundary segmentation performance.

• The experimental results on the STS and Hecktor 2022 datasets demonstrate that DGCBG-Net outperforms existing methods and is competitive with state-of-the-art (SOTA) methods.

2. Related work

2.1. Tumor segmentation methods based on multi-modal images
Multi-modal tumor segmentation methods can be categorized into three types based on their fusion strategies [29, 30, 31, 32]: early fusion methods, late fusion methods and hybrid fusion methods, as shown in Figure 2.

2.1.1. Early Fusion methods
Early fusion refers to concatenating multi-modal images as the model's input and using customized modules to extract abundant multi-modal features. Xu et al. [18] designed a multi-stage atrous spatial pyramid pooling module, which provided a wealth of feature information to the decoder.



Figure 2: Three different fusion strategies in multi-modal segmentation methods. Part a) represents the early fusion strategy, part b) represents the late fusion strategy, and part c) represents the hybrid fusion strategy.

Furthermore, this approach introduced a novel loss function named "cosine-sine" to guide the model's focus towards misclassified pixels. Luo et al. [19] proposed C2BA-UNet, which comprises two key components: a multi-set boundary-aware module that combines gradient maps, uncertainty maps, and horizontal maps, effectively capturing feature information in the tumor boundary region; and a context coordination module that effectively integrates multi-scale information with attention mechanisms. Zhang et al. [20] proposed a method that generates pseudo-enhanced CT images based on metabolic intensity, enhancing tumor regions while suppressing background regions. They also introduced an adaptive scale attention supervision module, which automatically selects the suitable path for tumors at various scales. However, these methods commonly employ fixed fusion ratios, overlooking spatially varying visual features, such as the varying importance of different regions.

2.1.2. Late Fusion methods
Late fusion involves the use of multiple branches to separately extract high-level semantic features from PET/CT images. These multi-modal features are then integrated through dedicated fusion modules, achieving accurate tumor segmentation. Zhao et al. [12] proposed a multi-modal tumor segmentation method based on VNet, which employed dual-branch encoders to extract features from CT and PET images and utilized cascaded convolutional blocks to fuse features from both modalities. Zhong et al. [13] designed two parallel 3D UNets for PET and CT images, enabling mutual communication between feature maps of the two modalities in a cascaded manner. This approach enables each UNet to learn complementary feature representations from images of different modalities. Kumar et al. [14] presented a co-learning feature fusion map approach, utilizing two parallel encoders to extract feature maps for the PET and CT images, respectively. These feature maps are subsequently fused in a spatially varying manner, enabling clear quantification of the relative weights assigned to different modality features. Huang et al. [15] designed an improved spatial attention network to emphasize tumor regions, achieving competitive segmentation performance on multiple datasets. However, these methods may neglect the correlation between the two modalities during feature extraction and lack cross-modal interaction of features.

2.1.3. Hybrid Fusion methods
Hybrid fusion refers to integrating feature maps from other modalities into the model's feature extraction or image reconstruction process. This enables the model to acquire additional feature representations at different stages, ultimately enhancing segmentation performance. Bi et al. [30] introduced a cyclic fusion network that progressively fuses intermediate segmentation results from different stages with multi-modal image features, consequently enhancing the accuracy of tumor segmentation. Fu et al. [16] designed a dual-branch multi-modal segmentation network. It utilizes the segmentation result from the PET branch as a spatial attention map, which is progressively fed into the decoder of the CT branch through resampling techniques, achieving complementary multi-modal feature learning. Zhou et al. [17] proposed a multi-modal segmentation network guided by triple attention fusion. By integrating triple attentions into feature maps at different decoder stages, the model acquires complementary multi-modal fusion feature representations. However, these approaches have not fully leveraged the correlations and interactions in multi-modal data; the discriminative features of the two modal images should complement each other. Furthermore, these methods lack a global understanding of image features, which can affect the model's accuracy and generalization.

2.2. Segmentation methods for tumor boundaries
To enhance the model's ability for tumor boundary segmentation, Dutta et al. [25] designed a reproducible deep learning algorithm. This algorithm extracted radiomics features from the segmented tumor regions and used these features to evaluate the sensitivity of tumor boundary features. Tang et al. [26] designed a boundary prediction module that utilized distance transformation on boundary images to acquire boundary supervision information. Han et al. [27] designed a boundary loss function that incorporates weighted calculations on the boundaries, distances, and areas of the segmentation results. This approach facilitates the learning of additional boundary and contour features. Tang et al. [28] proposed a weakly supervised general lesion segmentation method, which incorporated a dual attention module and a scale attention module to generate high-resolution feature representations with strong positional sensitivity. This work also proposed a region level set loss function to enhance the segmentation accuracy of tumor boundary regions. Nevertheless, these methods exclusively concentrate on extracting boundary features at the model's input or output layers, overlooking the latent multi-stage boundary features.

3. Methods
We develop a dual-branch tumor segmentation network based on global cross-modal interaction and boundary guidance, referred to as DGCBG-Net.



Figure 3: The overview of our DGCBG-Net, consisting of two crucial branches: (A) the semantic segmentation branch (green area in the figure), and (B) the boundary prior-guided branch (orange area in the figure). The semantic segmentation branch comprises four modules: encoder block, global cross-modal interaction module, shared multi-path downsampling module, and decoder. The boundary prior-guided branch consists of four modules: encoder block, downsampling block, boundary extraction module, and decoder. The loss function of DGCBG-Net is the weighted sum of the segmentation loss and the edge loss.

The overview of DGCBG-Net is illustrated in Figure 3. DGCBG-Net consists of two branches: (a) an SSB for obtaining visual representations of tumors from multi-modal images, and (b) a BPGB for extracting boundary feature information to assist the SSB in tumor boundary segmentation.

3.1. The semantic segmentation branch
The SSB is based on the UNet architecture, comprising a dual-branch encoder and a single-branch decoder, as shown in Figure 3(A). The encoder extracts multi-modal feature representations, and the decoder produces the reconstructed segmentation result. The encoder consists of encoder blocks, GCIM, and SMDM; the decoder includes decoder blocks and transposed convolution layers. The GCIM extracts global context features from the frequency domain of PET/CT images, facilitating cross-modal interaction of global features. The SMDM learns cross-modal semantic information and reduces the loss of feature information during downsampling. Both the encoder blocks and the decoder blocks comprise two sets of convolution layers, batch normalization layers, and ReLU activation functions. A skip connection is established between the encoder block and the decoder block with the same dimensionality to preserve the original feature information. The goal of the SSB is to learn multi-modal complementary image features, enhancing the accuracy of tumor segmentation.
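To make the encoder/decoder block structure concrete, the following PyTorch sketch shows a double Conv-BN-ReLU block of the kind described above; the class name, channel sizes, and the concatenation-style skip connection are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two stacked Conv-BN-ReLU layers, as used in the encoder and decoder blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Skip connection: an encoder feature map is concatenated with the upsampled
# decoder feature map of the same spatial size before the next decoder block.
# x = torch.cat([upsampled_decoder_feat, encoder_feat], dim=1)
```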

3.1.1. Global cross-modal interaction module
To extract global context features from PET and CT images and facilitate bilateral cross-modal feature interaction, we propose the GCIM, inspired by [22, 33]. GCIM extracts crucial global contextual features from the frequency domain of PET/CT images and merges them through cross-fusion into the other branch. GCIM promotes complementary feature interaction without negatively impacting the extraction of high-level features in the individual branches.

The structure of GCIM is illustrated in Figure 4. GCIM consists of two branches: a CT branch and a PET branch. The PET branch takes PET images as input; the CT branch takes CT images and the boundary features provided by the BPGB. Each branch is designed for extracting high-level semantic features from PET/CT images. Both branches are composed of the same structure: convolution layers, a 2D fast Fourier transform (FFT), parameter-learnable global filters, Fourier channel attention (FCA) blocks, and a 2D inverse fast Fourier transform (IFFT).

Figure 4: The structure of the global cross-modal interaction module. It comprises five basic components: convolutional layer, 2D fast Fourier transform block, global filter (green area in the figure), Fourier channel attention block (yellow area in the figure), and 2D inverse fast Fourier transform block. Two parallel branches merge global feature information of different modalities through cross-modal fusion.

Initially, non-destructive transformations between the frequency domain and the spatial domain are accomplished using a 2D FFT, as formulated in Eq. (1).

$$
\begin{cases}
F_{ct} = f\left[X_{ct}\right] \in \mathbb{C}^{H \times W \times C} \\
F_{pet} = f\left[X_{pet}\right] \in \mathbb{C}^{H \times W \times C}
\end{cases} \tag{1}
$$

where $f[\cdot]$ represents the 2D FFT, $X_{ct}$ and $X_{pet}$ represent the inputs of the CT branch and the PET branch, respectively, and $F_{ct}$ and $F_{pet}$ represent the spectra of $X_{ct}$ and $X_{pet}$, respectively.



To facilitate cross-modal interaction of global features, global filters $K$ are employed to extract global contextual features from the PET/CT images. These crucial global features are then cross-merged into the respective other branch through element-wise addition, as formulated in Eq. (2). Additionally, the global filter $K$ shares its parameter weights between the two branches, enabling it to learn feature representations of different modality images.

$$
\begin{cases}
F'_{ct} = F_{ct} \oplus (F_{pet} \odot K) \\
F'_{pet} = F_{pet} \oplus (F_{ct} \odot K)
\end{cases} \tag{2}
$$

where $\odot$ denotes the Hadamard product and $\oplus$ indicates element-wise addition.

To capture channel feature information from the complex tensors in the frequency domain, we propose FCA, defined by Eq. (3) and Eq. (4). FCA employs global average pooling to gain a global understanding of each channel and obtains weight factors for each channel through the sigmoid function. FCA enhances the network's global channel comprehension and emphasizes learned feature maps related to tumor regions. Finally, feature information fusion between the spatial domain and the frequency domain is accomplished through residual connections. The corresponding mathematical formulas are provided below:

$$
\begin{cases}
F''_{ct} = Pooling_{global}\left(ReLU\left[Conv2d\left\{F'_{ct}\right\}\right]\right) \\
F''_{pet} = Pooling_{global}\left(ReLU\left[Conv2d\left\{F'_{pet}\right\}\right]\right)
\end{cases} \tag{3}
$$

$$
\begin{cases}
F'''_{ct} = Sigmoid\left[Conv1d\left\{F''_{ct}\right\}\right] \otimes F'_{ct} \\
F'''_{pet} = Sigmoid\left[Conv1d\left\{F''_{pet}\right\}\right] \otimes F'_{pet}
\end{cases} \tag{4}
$$

where $\otimes$ denotes element-wise multiplication.
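To make Eqs. (1)-(4) concrete, the PyTorch sketch below implements one plausible reading of GCIM: both inputs are mapped to the frequency domain with a 2D FFT, a learnable global filter K (shared by the two branches) produces the cross-fused spectra of Eq. (2), and a Fourier channel attention reweights channels before a residual connection back to the spatial features. The module name, tensor shapes, and the placement of the inverse FFT before the channel attention are simplifying assumptions; the released code may organize these steps differently.

```python
import torch
import torch.nn as nn

class GCIM(nn.Module):
    """Sketch of the global cross-modal interaction module (Eqs. 1-4)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # Learnable global filter K, shared by both branches (complex weights
        # stored as a real tensor with a trailing dimension of 2).
        self.K = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)
        self.conv_ct = nn.Conv2d(channels, channels, 1)
        self.conv_pet = nn.Conv2d(channels, channels, 1)
        self.fca_ct = nn.Conv1d(1, 1, kernel_size=3, padding=1)
        self.fca_pet = nn.Conv1d(1, 1, kernel_size=3, padding=1)

    def _fca(self, x, conv1d):
        # Fourier channel attention: global average pooling -> Conv1d -> sigmoid.
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3)).view(b, 1, c)            # per-channel statistics
        w = torch.sigmoid(conv1d(w)).view(b, c, 1, 1)   # channel weight factors
        return x * w

    def forward(self, x_ct, x_pet):
        K = torch.view_as_complex(self.K)
        # Eq. (1): 2D FFT of both modalities.
        F_ct, F_pet = torch.fft.rfft2(x_ct), torch.fft.rfft2(x_pet)
        # Eq. (2): cross-fusion of globally filtered spectra.
        F_ct_p = F_ct + F_pet * K
        F_pet_p = F_pet + F_ct * K
        # Back to the spatial domain before the channel attention (simplification).
        f_ct = torch.fft.irfft2(F_ct_p, s=x_ct.shape[-2:])
        f_pet = torch.fft.irfft2(F_pet_p, s=x_pet.shape[-2:])
        # Eqs. (3)-(4): Fourier channel attention, plus a residual connection.
        out_ct = x_ct + self._fca(torch.relu(self.conv_ct(f_ct)), self.fca_ct)
        out_pet = x_pet + self._fca(torch.relu(self.conv_pet(f_pet)), self.fca_pet)
        return out_ct, out_pet
```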
3.1.2. Shared multi-path downsampling module
To assist the encoder in enhancing the effectiveness of extracting complementary features from the PET/CT modalities and minimizing feature loss during downsampling, we propose the SMDM. During model training, the weights of each SMDM are updated twice within a single epoch, enabling the SMDM to learn features from different modalities while reducing model complexity. Consequently, when $F'''_{ct}$ passes through the SMDM to obtain $F''''_{ct}$, $F''''_{ct}$ incorporates the relevant PET modality features. Similarly, when $F'''_{pet}$ passes through the SMDM to obtain $F''''_{pet}$, $F''''_{pet}$ incorporates the relevant CT modality features.

The structure of SMDM is illustrated in Figure 5. SMDM comprises convolution layers, max-pooling layers, and an enhanced channel attention block. A 2×2 convolution layer is employed to preserve spatial information, enhancing the model's capacity for multi-scale learning. The max-pooling layer extracts prominent features from the PET/CT images. To mitigate the loss of discriminative features, we employ element-wise addition to combine the two downsampling approaches. Subsequently, the enhanced channel attention block captures global channel information, selectively emphasizing the learned feature maps related to the tumor region in the input image.

Figure 5: The structure of the shared multi-path downsampling module. It comprises two downsampling approaches: a max-pooling layer and a 2D convolutional layer. It also utilizes an enhanced channel attention block to capture global channel information.
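A minimal PyTorch sketch of the SMDM idea follows: a max-pooling path and a 2×2 convolution path (stride 2 is an assumption) are fused by element-wise addition, an enhanced channel attention block reweights the fused channels, and the same module instance is applied to both modality streams so that its weights are updated by CT and PET features alike. The attention design and channel sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SMDM(nn.Module):
    """Sketch of the shared multi-path downsampling module.

    One instance is shared by the CT and PET branches, so its weights see
    both modalities within each training epoch."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool_path = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(in_ch, out_ch, 1))
        self.conv_path = nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2)
        # Enhanced channel attention: squeeze to per-channel statistics,
        # then re-weight the fused feature map.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        fused = self.pool_path(x) + self.conv_path(x)   # element-wise addition
        return fused * self.attn(fused)

# Weight sharing: the same module processes both modality streams.
# smdm = SMDM(64, 128)
# f_ct_down, f_pet_down = smdm(f_ct), smdm(f_pet)
```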



3.2. Boundary prior-guided branch
To fully leverage multi-stage potential boundary features and enhance the tumor boundary segmentation performance of the SSB, we propose the BPGB. The architecture of BPGB is illustrated in Figure 3(B). Since CT images can reflect the physical structural differences among various tissues and provide clear boundary information, we utilize CT images as the input. BPGB is constructed upon the UNet framework and comprises encoder blocks, downsampling blocks, BEM, and decoder blocks.

3.2.1. Boundary extraction module
To extract potential boundary feature information from CT images, we propose the BEM. As illustrated in Figure 6, BEM comprises Sobel operators, global filter layers, convolutional layers, batch normalization layers, and ReLU activation. The Sobel operator is employed to extract boundary information from CT images, including the horizontal boundary detection operator $Sobel_x$ and the vertical boundary detection operator $Sobel_y$. $Sobel_x$ is used to obtain the horizontal gradient map $F_{bx}$, while $Sobel_y$ is used to obtain the vertical gradient map $F_{by}$. The final boundary feature map $F_{bxy}$ is obtained through the sigmoid function. The corresponding formulas are provided below:

$$
\begin{cases}
F_{bx} = X_{ct} \times \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} \\[2ex]
F_{by} = X_{ct} \times \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}
\end{cases} \tag{5}
$$

$$
F_{bxy} = Sigmoid\left\{\sqrt{\left(F_{bx}\right)^{2} + \left(F_{by}\right)^{2}}\right\} \tag{6}
$$

where $X_{ct}$ represents the input CT image.

To extract tumor boundary features from $F_{bxy}$, we employ a global filter to extract global boundary context features and utilize a 3×3 convolution for extracting local boundary features. A 1×1 convolution reduces the feature map to a single channel, generating the final boundary feature map $F'_{bxy}$, as formulated in Eq. (7). Additionally, $F'_{bxy}$ is also fed to the next stage to obtain multi-stage tumor boundary feature maps.

$$
F'_{bxy} = Conv_{1 \times 1}\left[Conv_{3 \times 3}\left\{Filter_{global}\left(F_{bxy}\right)\right\}\right] \tag{7}
$$

Figure 6: The structure of the boundary extraction module. It includes the Sobel_x and Sobel_y operators to compute horizontal and vertical gradient maps, respectively. Additionally, it integrates global filters and convolutional layers to extract global and local boundary features.
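The sketch below mirrors Eqs. (5)-(7) in PyTorch: fixed Sobel kernels produce the horizontal and vertical gradient maps, their magnitude is squashed by a sigmoid, and a learnable frequency-domain global filter followed by 3×3 and 1×1 convolutions yields the single-channel boundary map. A single-channel CT input and the intermediate channel width are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BEM(nn.Module):
    """Sketch of the boundary extraction module (Eqs. 5-7)."""
    def __init__(self, height, width):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = torch.tensor([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]])
        # Fixed Sobel kernels, registered as non-trainable buffers.
        self.register_buffer("kx", sobel_x.view(1, 1, 3, 3))
        self.register_buffer("ky", sobel_y.view(1, 1, 3, 3))
        # Learnable global filter in the frequency domain (Eq. 7).
        self.gf = nn.Parameter(torch.randn(1, height, width // 2 + 1, 2) * 0.02)
        self.conv3 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                                   nn.BatchNorm2d(8), nn.ReLU(inplace=True))
        self.conv1 = nn.Conv2d(8, 1, 1)

    def forward(self, x_ct):
        # Eq. (5): horizontal and vertical gradient maps.
        f_bx = F.conv2d(x_ct, self.kx, padding=1)
        f_by = F.conv2d(x_ct, self.ky, padding=1)
        # Eq. (6): gradient magnitude squashed to (0, 1).
        f_bxy = torch.sigmoid(torch.sqrt(f_bx ** 2 + f_by ** 2 + 1e-8))
        # Eq. (7): global filtering, then local 3x3 and 1x1 convolutions.
        spec = torch.fft.rfft2(f_bxy) * torch.view_as_complex(self.gf)
        g = torch.fft.irfft2(spec, s=f_bxy.shape[-2:])
        return self.conv1(self.conv3(g))
```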
3.3. Loss function
In DGCBG-Net, the total loss function $L_{total}$ is defined as the weighted sum of $L_{SSB}$ and $L_{BPGB}$. For the SSB, we employ a combination of BCE loss [34] and Dice loss [35] to optimize the network weights. In the BPGB, we utilize Dice loss to optimize the BPGB weights. The relevant mathematical formulas are presented as follows:

$$
L_{total} = L_{SSB} + \lambda L_{BPGB} \tag{8}
$$

$$
L_{SSB} = \alpha \left( - \sum_{i=1}^{N} \left[ S_i \cdot \log(G_i) + (1 - S_i) \cdot \log(1 - G_i) \right] \right) + (1 - \alpha) \left( 1 - \frac{2 \times \left| G_{sr} \cap G_{gt} \right|}{\left| G_{sr} \right| + \left| G_{gt} \right|} \right) \tag{9}
$$

$$
L_{BPGB} = 1 - \frac{2 \times \left| G_{sr} \cap G_{gt} \right|}{\left| G_{sr} \right| + \left| G_{gt} \right|} \tag{10}
$$

where $\lambda = 0.1$ denotes the weighting factor of the boundary loss $L_{BPGB}$, and $\alpha$ and $(1-\alpha)$ denote the weighting factors of the two terms. $S_i$ and $G_i$ denote the predicted segmentation result and the ground truth at pixel $i$, respectively. $N$ represents the total number of pixels, $G_{sr}$ and $G_{gt}$ denote the predicted segmentation result set and the ground truth set, respectively, and $|\cdot|$ denotes the number of pixels in a set.
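A compact PyTorch rendering of Eqs. (8)-(10) is given below, assuming sigmoid (binary) outputs for both the segmentation and boundary heads; the function names and argument layout are illustrative. With alpha = 0.2 the BCE and Dice terms of L_SSB receive the weights 0.2 and 0.8 that the ablation in Section 4.5.2 identifies as optimal, and lam = 0.1 matches the reported lambda.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(prob, target, eps=1e-6):
    # 1 - 2|G_sr ∩ G_gt| / (|G_sr| + |G_gt|), computed on soft predictions.
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()

def dgcbg_loss(seg_logits, edge_logits, seg_gt, edge_gt, alpha=0.2, lam=0.1):
    """L_total = L_SSB + lambda * L_BPGB (Eqs. 8-10)."""
    bce = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)   # BCE term of Eq. (9)
    dice = soft_dice_loss(torch.sigmoid(seg_logits), seg_gt)       # Dice term of Eq. (9)
    l_ssb = alpha * bce + (1.0 - alpha) * dice
    l_bpgb = soft_dice_loss(torch.sigmoid(edge_logits), edge_gt)   # Eq. (10)
    return l_ssb + lam * l_bpgb                                    # Eq. (8)
```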



4. EXPERIMENTS
4.1. Datasets
In the experimental session, we utilize two publicly accessible datasets to evaluate our DGCBG-Net: the soft tissue sarcomas (STS) dataset [36] and the head and neck tumor (Hecktor) 2022 dataset [37]. Additionally, during the model training phase, we apply k-fold cross-validation to each dataset individually to maximize data utilization and reduce overfitting.

The STS dataset contains PET/CT scan data from 51 confirmed cases of four-limb STS patients, acquired from different scanning machines. The CT scans in the STS dataset have a 512 × 512 in-plane resolution. Each CT and PET volume contains 91 to 311 slices, with 15 to 82 slices annotated per tumor case, and tumor annotation volumes ranging from 17.35 cm³ to 2332.72 cm³. The pixel spacing within CT slices is 0.98 mm × 0.98 mm, with a slice thickness of 3.27 mm. The PET scans have a 128 × 128 in-plane resolution and match the number of slices in the CT scans. The pixel spacing within PET slices is 3.91 mm × 3.91 mm, with a slice thickness of 3.27 mm. For evaluating the model's performance, we select 36 cases (1826 slices in total) from the STS dataset as the cross-validation dataset, and the remaining 15 cases (685 slices in total) serve as the test set.

The Hecktor 2022 dataset comprises PET/CT scan data from 524 confirmed cases of oropharyngeal head and neck cancer patients. These scans were collected from nine different centers, utilizing various scanning machines and schemes. Within the Hecktor 2022 dataset, the CT scans have a 512 × 512 in-plane resolution. Each CT and PET volume contains 91 to 348 slices, with 16 to 73 slices annotated per tumor case, and tumor annotation volumes ranging from 0.79 cm³ to 186 cm³. Pixel spacing within CT slices ranges from 0.98 mm to 1.36 mm, and the slice thickness is 3.27 mm. The PET scans have a 128 × 128 in-plane resolution, with the same number of slices as the CT scans. Pixel spacing within PET slices ranges from 2.73 mm to 5.47 mm, and the slice thickness is 3.27 mm. To evaluate the trained model's performance, we select 370 cases (17245 slices in total) from the Hecktor 2022 dataset as the cross-validation dataset, and the remaining 154 cases (7487 slices in total) serve as the test set.

Because the extracted training data contain a large number of true negative samples, we rebalanced the classes in the training dataset to avoid bias during model training: we extracted all true positive slices containing tumor targets and combined them with an equal number of randomly selected true negative slices from the same cases [19, 38].
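As a sketch of this rebalancing step (the data structure and field names are assumptions), one possible implementation is:

```python
import random

def rebalance_slices(slices, seed=0):
    """slices: list of dicts such as {"case_id": str, "has_tumor": bool, "path": str}.

    Keep every tumor-containing (true positive) slice and add an equal number of
    randomly chosen tumor-free slices drawn from the same cases."""
    rng = random.Random(seed)
    positives = [s for s in slices if s["has_tumor"]]
    pos_cases = {s["case_id"] for s in positives}
    negatives = [s for s in slices if not s["has_tumor"] and s["case_id"] in pos_cases]
    sampled = rng.sample(negatives, k=min(len(positives), len(negatives)))
    return positives + sampled
```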
4.2. Preprocessing
We preprocess the CT/PET scans before the experiments. First, we truncate the Hounsfield units (HU) of the CT scans to the range [-260, 260]. Next, we utilize bilinear interpolation to resample the CT scans to 256 × 256. Subsequently, we truncate the standardized uptake values (SUV) of the PET scans to the range [0, 5]. Using the resampled CT scans as the spatial alignment reference, we employ bilinear interpolation to resample the PET scans to 256 × 256. Finally, both the CT and PET scans are normalized using zero-mean normalization.

During the experiments, we transform the PET/CT volumes into 2D images. To mitigate the limited scale of the STS dataset, we employ data augmentation techniques to expand the training dataset, including random rotation, horizontal flip, vertical flip, and contrast processing.
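A minimal sketch of this preprocessing pipeline for a single CT/PET slice pair is shown below; per-slice normalization and the array layout are assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess_pair(ct_hu, pet_suv, size=(256, 256)):
    """ct_hu, pet_suv: 2D numpy arrays (a CT slice in HU, a PET slice in SUV)."""
    ct = np.clip(ct_hu, -260, 260).astype(np.float32)   # truncate HU to [-260, 260]
    pet = np.clip(pet_suv, 0, 5).astype(np.float32)     # truncate SUV to [0, 5]

    def resample(img):
        t = torch.from_numpy(img)[None, None]            # (1, 1, H, W)
        t = F.interpolate(t, size=size, mode="bilinear", align_corners=False)
        return t[0, 0]

    ct_t, pet_t = resample(ct), resample(pet)
    # Zero-mean normalization (here applied per slice).
    ct_t = (ct_t - ct_t.mean()) / (ct_t.std() + 1e-8)
    pet_t = (pet_t - pet_t.mean()) / (pet_t.std() + 1e-8)
    return ct_t, pet_t
```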
4.3. Evaluation metrics
Five evaluation metrics are employed to assess the segmentation performance of our DGCBG-Net: the Dice coefficient (Dice), intersection over union (IOU), positive predictive value (PPV), recall, and Hausdorff distance (HD). The calculation formulas of these evaluation metrics are as follows:

$$
Dice = \frac{2 \times TP}{2TP + FP + FN} \tag{11}
$$

$$
IOU = \frac{TP}{TP + FP + FN} \tag{12}
$$

$$
PPV = \frac{TP}{TP + FP} \tag{13}
$$

$$
Recall = \frac{TP}{TP + FN} \tag{14}
$$

$$
HD(G, P) = \max\left[ \max_{g \in G}\left\{ \min_{p \in P} \| g - p \| \right\}, \ \max_{p \in P}\left\{ \min_{g \in G} \| p - g \| \right\} \right] \tag{15}
$$

where $TP$ denotes true positives, $FP$ denotes false positives, $FN$ denotes false negatives, $G$ denotes the set of ground truth pixels, $P$ denotes the set of predicted result pixels, and $\|\cdot\|$ denotes the distance norm. Among these evaluation metrics, the Dice index primarily reflects the overlap between the predicted results and the ground truth, making it the most important evaluation metric in medical image segmentation tasks. Thus, to evaluate the effectiveness of our DGCBG-Net, we conducted a significance T-test based on the Dice index and report the p-values between our proposed method and other methods.
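For reference, a straightforward NumPy/SciPy sketch of Eqs. (11)-(15) is given below; it assumes non-empty binary masks and computes the Hausdorff distance between the full pixel sets of the ground truth and the prediction.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def segmentation_metrics(pred, gt):
    """pred, gt: binary 2D numpy arrays. Returns Dice, IOU, PPV, Recall, HD."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn + 1e-8)
    iou = tp / (tp + fp + fn + 1e-8)
    ppv = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    # Symmetric Hausdorff distance between the two pixel sets (Eq. 15).
    g_pts, p_pts = np.argwhere(gt), np.argwhere(pred)
    hd = max(directed_hausdorff(g_pts, p_pts)[0],
             directed_hausdorff(p_pts, g_pts)[0])
    return dice, iou, ppv, recall, hd
```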
4.4. Implementation details
All models in the experiments are implemented using Python 3.8 and the PyTorch 2.0.0 framework. The training is conducted on an NVIDIA RTX 3090 GPU. The optimizer used in the experiments is stochastic gradient descent with a momentum of 0.9. For the hyperparameters, the learning rate is set to 0.0001 and the batch size is set to 16. The experiments employ 5-fold cross-validation for evaluation. During training, early stopping is applied: if the loss on the validation set does not decrease for 10 consecutive epochs, model training is stopped. All code and models are available in our GitHub repository: https://github.com/zzwzzzsuc/DGCBG-Net_code
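A skeleton of the training procedure under the reported settings (SGD with momentum 0.9, learning rate 1e-4, early stopping after 10 epochs without validation improvement) is sketched below; the model interface (two inputs, two outputs) and the batch structure are assumptions, and the batch size of 16 would be configured in the data loaders.

```python
import torch

def train(model, train_loader, val_loader, loss_fn, epochs=200, patience=10):
    """Training loop sketch: SGD (momentum 0.9), lr 1e-4, early stopping."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for ct, pet, seg_gt, edge_gt in train_loader:
            opt.zero_grad()
            seg_logits, edge_logits = model(ct, pet)
            loss = loss_fn(seg_logits, edge_logits, seg_gt, edge_gt)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(*model(ct, pet), seg_gt, edge_gt).item()
                      for ct, pet, seg_gt, edge_gt in val_loader) / max(len(val_loader), 1)
        if val < best_val:
            best_val, stale = val, 0
        else:
            stale += 1
            if stale >= patience:   # stop after 10 epochs without improvement
                break
```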
4.5. Ablation studies
We perform separate ablation experiments on the STS and Hecktor 2022 datasets to assess the segmentation performance of our DGCBG-Net.

4.5.1. Ablation studies on different components
We perform separate ablation experiments to validate the contribution of each component in our DGCBG-Net. First, we construct a baseline model based on the UNet architecture. The baseline model consists of a dual-branch encoder and a single-branch decoder. To assess the effectiveness of the different components of DGCBG-Net, we conduct experiments with seven distinct settings on the two datasets.

Tables 1 and 2 present the quantitative results of the ablation experiments conducted on the STS and Hecktor 2022 datasets, respectively. Compared with the baseline network, GCIM increases the Dice score by 2.37% and the IOU score by 2.4%, and reduces the Hausdorff distance by 7.01 mm on the STS dataset. Additionally, GCIM improves the Dice score by 2.8% and the IOU score by 3.51% on the Hecktor 2022 dataset, with a slight increase of 0.32 mm in the Hausdorff distance. These results demonstrate that GCIM enhances the model's utilization of multi-modal complementary feature information, resulting in a positive impact on segmentation performance.

In comparison with the baseline model, SMDM improves the Dice score by 1.98%, the IOU score by 1.68%, and the Recall score by 1.91% on the STS dataset. Furthermore, SMDM increases the Dice score by 1.21%, the IOU score by 2.95%, and the Recall score by 2.67% on the Hecktor 2022 dataset. These results illustrate that SMDM strengthens the model's tumor segmentation capability by learning complementary features from the PET/CT modalities within the same spatial distribution. When compared to the baseline model, BPGB increases the Dice score by 2.91% and the IOU score by 1.33%, and decreases the Hausdorff distance by 11.03 mm on the STS test dataset. Moreover, BPGB improves the Dice score by 2.01% and the IOU score by 3.31% on the Hecktor 2022 test dataset, and decreases the Hausdorff distance by 0.9 mm. These findings suggest that the auxiliary boundary features provided by BPGB at different stages reduce the model's uncertainty in boundary segmentation. This hypothesis is supported by the results of the boundary-sensitive Hausdorff distance metric. Additionally, the results in Tables 1 and 2 demonstrate a notable enhancement in the model's segmentation performance when GCIM is combined with SMDM and BPGB, respectively. For Dice scores, there is a statistically significant improvement between setting 7 and settings 1-6, with p-values less than 0.01. This confirms the efficacy of our proposed modules.

However, according to the results in Tables 1 and 2, it is evident that there is a decline in the PPV metric from setting 5 to setting 7. Likewise, a similar trend is observed in the Recall metric from setting 6 to setting 7. We speculate on the possible reasons as follows: when the experimental setting changes from 5 to 7, we introduce the BPGB branch, which focuses on segmenting tumor boundary areas.



Table 1
Ablation experiments on the STS dataset (Mean ± Standard Deviation). The 𝑝-value is derived by comparing the Dice scores of
other methods with our approach.
Methods Metrics
Settings Baseline GCIM SMDM BPGB Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value
1 74.52 ± 0.31 62.68 ± 0.28 72.82 ± 0.27 80.20 ± 0.32 26.66 ± 1.14 3.09e-4
2 76.89 ± 0.25 65.08 ± 0.24 73.33 ± 0.31 80.19 ± 0.26 19.65 ± 1.08 2.77e-3
3 76.50 ± 0.34 64.36 ± 0.32 78.73 ± 0.30 82.11 ± 0.28 21.18 ± 1.34 3.18e-4
4 77.43 ± 0.27 64.01 ± 0.19 77.25 ± 0.19 83.13 ± 0.30 15.63 ± 1.54 4.72e-6
5 79.25 ± 0.45 66.53 ± 0.26 82.17 ± 0.29 83.17 ± 0.42 18.17 ± 1.78 1.60e-5
6 78.75 ± 0.44 65.47 ± 0.32 80.59 ± 0.37 84.50 ± 0.26 19.48 ± 1.47 1.42e-6
7 80.33 ± 0.25 67.64 ± 0.37 81.14 ± 0.44 84.28 ± 0.31 12.02 ± 1.11 -

Table 2
Ablation experiments on the Hecktor 2022 dataset (Mean ± Standard Deviation). The 𝑝-value is derived by comparing the Dice
scores of other methods with our approach.
Methods Metrics
Settings Baseline GCIM SMDM BPGB Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value
1 74.26 ± 0.61 63.13 ± 0.71 85.97 ± 0.55 70.54 ± 0.64 11.45 ± 0.48 7.84e-5
2 77.06 ± 0.23 66.64 ± 0.36 87.29 ± 0.30 73.88 ± 0.38 11.78 ± 0.77 2.79e-3
3 75.47 ± 0.53 66.08 ± 0.43 87.33 ± 0.46 73.21 ± 0.11 14.98 ± 1.24 4.68e-4
4 76.27 ± 0.18 66.44 ± 0.22 88.38 ± 0.15 72.87 ± 0.27 10.55 ± 1.36 1.19e-4
5 78.45 ± 0.29 68.64 ± 0.32 88.86 ± 0.21 75.14 ± 0.33 14.77 ± 0.74 4.82e-6
6 78.14 ± 0.28 67.41 ± 0.37 85.48 ± 0.23 76.19 ± 0.32 8.46 ± 0.53 7.55e-6
7 79.29 ± 0.21 70.18 ± 0.22 88.16 ± 0.23 75.66 ± 0.39 8.72 ± 0.60 -

At this point, the model's emphasis on delineating boundary areas as completely as possible increases, leading to a greater focus on correctly detecting all positives, i.e., increased attention to the Recall metric. The model then decreases its focus on maximizing correctly identified positives (the PPV metric) to some extent, resulting in a decrease in PPV performance. Similarly, when the experimental setting changes from 6 to 7, we introduce the SMDM module, which focuses on learning complementary features of the PET/CT modalities. At this point, the model's attention to discriminative features increases, leading to a greater focus on maximizing correctly identified positives, i.e., increased attention to the PPV metric. The model then decreases its focus on correctly detecting all positives (the Recall metric) to some extent, resulting in a decrease in Recall performance. Although the incorporation of all three modules introduces some trade-offs, it achieves superior performance on most metrics (Dice, IOU, and HD) and performs only slightly below optimal on the other metrics.

We visualized the learned feature maps of the three proposed modules to provide a clearer explanation of their roles. We conducted experiments on the STS dataset and the Hecktor 2022 dataset, combining the baseline model with SMDM, GCIM, and BPGB. We then visualized the feature maps extracted from the first skip connection in the combined models, as illustrated in Figure 7. The first two columns show the visualization results of the feature maps for the Hecktor 2022 dataset, while the last two columns show the visualization results for the STS dataset. Figure 7 reveals that the baseline model and the baseline model with BPGB may focus on some incorrect areas or fail to fully cover all target areas (as indicated by the green and orange circles). However, the baseline model combined with BPGB or GCIM can accurately extract feature information from the tumor region, demonstrating that the proposed SMDM module effectively mitigates the impact of misleading features. Additionally, the feature heatmap results of the baseline model combined with GCIM are the closest to the ground truth among all feature heatmaps, demonstrating the effective extraction of global features by GCIM and its assistance in improving segmentation accuracy. Finally, the boundary areas in the feature maps learned by the baseline model with BPGB are closest to the ground truth, demonstrating that the boundary features extracted by BPGB can help the model reduce its uncertainty in tumor boundary regions. In summary, the visualization results of the feature maps demonstrate the effectiveness of the proposed SMDM, BPGB, and GCIM modules.

4.5.2. Ablation study on loss function
To analyze the impact of different loss functions on model segmentation performance, we conduct experiments with various common loss functions on the Hecktor 2022 dataset. The examined loss functions consist of BCE loss, Dice loss, Focal loss [39], and the weighted BCE+Dice loss used in this study. The results are shown in Table 3. The Dice loss demonstrates the best segmentation performance among the single loss functions, surpassing the other single losses in all metrics. Our weighted BCE+Dice loss combines the strengths of Dice loss and BCE loss by assigning different weights, resulting in a more competitive segmentation performance.



Figure 7: Comparison of spatial feature maps generated by the GCIM, SMDM and BPGB modules at the first skip connection on the STS and Hecktor 2022 datasets. From left to right, the first three columns represent the original CT image, PET image, and Ground Truth. The subsequent columns show the spatial feature maps of the different configurations (Baseline, Baseline+SMDM, Baseline+GCIM, Baseline+BPGB).

Table 3
Ablation experiments on different loss functions on the Hecktor 2022 dataset (Mean ± Standard Deviation). The 𝑝-value is derived by comparing the Dice scores of the other loss settings with our final setting.
Losses Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value
BCE Loss 76.55 ± 0.15 63.38 ± 0.26 80.32 ± 0.37 71.15 ± 0.21 12.36 ± 0.79 3.65e-4
Focal Loss 77.68 ± 0.23 65.97 ± 0.48 85.36 ± 0.54 73.97 ± 0.32 13.24 ± 1.16 2.08e-4
Dice Loss 78.02 ± 0.34 71.02 ± 0.31 88.36 ± 0.27 74.68 ± 0.36 10.56 ± 1.32 6.16e-5
Weight BCE+Dice Loss
Dice=0.2 BCE=0.8 77.68 ± 0.22 65.75 ± 0.44 82.49 ± 0.42 73.35± 0.17 11.27± 1.12 5.98e-5
Dice=0.4 BCE=0.6 78.62 ± 0.34 68.82 ± 0.12 85.91 ± 0.53 74.64± 0.26 10.52± 1.39 7.74e-6
Dice=0.5 BCE=0.5 77.93 ± 0.35 68.37 ± 0.38 86.83 ± 0.24 74.01± 0.52 10.28± 0.95 4.48e-5
Dice=0.6 BCE=0.4 78.81 ± 0.28 69.48 ± 0.36 87.57 ± 0.32 74.98± 0.45 9.64± 0.74 6.44e-7
Dice=0.8 BCE=0.2 79.29 ± 0.21 70.18 ± 0.22 88.16 ± 0.23 75.66 ± 0.39 8.72 ± 0.60 -

It outperforms the Dice loss by 1.27% in Dice score and 0.98% in Recall score, and reduces the HD by 1.84 mm.

In addition, to analyze the optimal weight allocation ratio between the Dice loss and the BCE loss, we conducted a series of experiments and selected five representative results, as shown in the lower part of Table 3. When the weight allocation ratio of Dice loss to BCE loss is 0.8 to 0.2, all performance metrics reach their peak. For Dice scores, the loss function setting used in this paper exhibits statistically significant differences compared to the other settings (p ≤ 0.01). Consequently, the optimal weight allocation ratio for the loss function is set to 0.8 to 0.2.

4.6. Comparisons with the existing methods
We perform comparisons between DGCBG-Net and SOTA methods on both the STS and Hecktor 2022 datasets. To ensure a fair comparison, we apply consistent preprocessing methods and experimental details to all methods. The compared methods include the single-modal UNet, the dense UNet (UNet++) proposed by [40], the deep dilated segmentation network (DiSegNet) proposed by [18], the multi-modal spatial attention module method (MSAM) proposed by [16], the collaborative learning feature fusion method (CoFeatureModel) proposed by [14], and the vision transformer based on UNet (TransUNet) proposed by [21].

4.6.1. Comparison in STS dataset
Table 4 shows the quantitative segmentation results of all methods on the STS dataset. Table 4 reveals that UNet++ exhibits the lowest Dice and Recall scores, with values of 75.60% and 76.03%, respectively. DiSegNet shows the lowest IOU and PPV scores, with values of 62.23% and 71.80%, and an HD of 20.77 mm. In contrast, DGCBG-Net performs best in terms of Dice, PPV, Recall, and HD, achieving values of 80.33%, 81.14%, 84.28%, and 12.02 mm, respectively. Additionally, its IOU score of 67.64% is second only to TransUNet. For Dice scores, significance analysis yields p-values less than 0.01, indicating that the results are statistically significant.



Table 4
Quantitative evaluation results with other methods on STS dataset (Mean ± Standard Deviation). The 𝑝-value is derived by
comparing the Dice scores of other methods with our approach.
Methods Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value
CT 69.51 ± 0.38 52.77 ± 0.32 64.26 ± 0.45 74.38 ± 0.30 34.05 ± 1.22 1.10e-3
PET 59.29 ± 0.44 49.37 ± 0.34 69.38 ± 0.41 62.98 ± 0.46 26.23 ± 1.49 5.12e-6
Baseline 74.52 ± 0.31 62.68 ± 0.28 72.82 ± 0.27 80.20 ± 0.32 26.66 ± 1.14 3.09e-4
Unet++([40] ) 75.60 ± 0.30 63.41 ± 0.42 74.71 ± 0.37 76.03 ± 0.36 18.00 ± 1.61 1.57e-4
DiSegNet([18]) 75.80 ± 0.32 62.23 ± 0.26 71.80 ± 0.35 77.44 ± 0.39 20.77 ± 1.76 5.49e-5
MSAM([16]) 78.22 ± 0.38 63.31 ± 0.47 72.69 ± 0.61 83.12 ± 0.37 17.05 ± 1.26 4.86e-6
CoFeatureModel([14]) 77.69 ± 0.44 62.75 ± 0.48 75.28 ± 0.50 79.01 ± 0.33 17.92 ± 1.81 7.74e-5
TransUNet([21]) 77.89 ± 0.35 68.06 ± 0.38 80.35 ± 0.35 81.05 ± 0.35 14.62 ± 0.93 4.46e-4
NNUNet([41]) 78.54 ± 0.25 67.68 ± 0.18 80.38 ± 0.24 82.29 ± 0.26 14.29 ± 0.84 5.79e-5
DGCBG-Net(ours) 80.33 ± 0.25 67.64 ± 0.37 81.14 ± 0.44 84.28 ± 0.31 12.02 ± 1.11 -

Figure 8: Qualitative segmentation results of four test cases on the STS dataset. From left to right, the first three columns
represent the original CT image, PET image, and Ground Truth. The subsequent columns show the segmentation results of
different segmentation models. The result images are cropped and resized for enhanced clarity. The red area represents the true
positive area, the green area represents the false positive area, and the purple area represents the false negative area.

Figure 8 illustrates the visual segmentation results on the STS dataset. It is evident that the segmentation results predicted by DGCBG-Net have the largest true positive regions and the smallest false negative regions. This notable improvement can be attributed to the enhanced cross-modal interaction provided by GCIM and SMDM. These modules allow for a more comprehensive utilization of multi-modal feature representations. Additionally, our BPGB enhances the model's accuracy in segmenting tumor boundary regions, significantly reducing false positive regions. The quantitative and qualitative results indicate that our DGCBG-Net achieves superior segmentation results under the same conditions.

4.6.2. Comparison in Hecktor 2022 dataset
Table 5 presents the quantitative segmentation results of all methods on the Hecktor 2022 dataset. It is evident that UNet++ demonstrates the lowest Dice score, at 76.45%, and CoFeatureModel records the lowest IOU and Recall scores, with values of 63.62% and 71.01%, respectively. MSAM has the lowest PPV score, at 86.86%, while TransUNet has the largest HD among the compared multi-modal methods, at 12.71 mm. In contrast, DGCBG-Net achieves the best Dice, IOU, and HD values (Dice: 79.29%, IOU: 70.18%, HD: 8.72 mm) and ranks second in terms of PPV and Recall (88.16% and 75.66%, respectively). Based on the Dice scores, significance analysis yields p-values less than 0.01, indicating that the results are statistically significant.

Figure 9 illustrates the visual segmentation results on the Hecktor 2022 dataset. Notably, DGCBG-Net consistently exhibits the largest true positive regions, the smallest false positive regions, and the smallest false negative regions. Additionally, in the bottom-most case in Figure 9, our method correctly predicts the small tumor regions surrounding the large tumor region (as indicated by the yellow circle). In contrast, TransUNet predicts an incorrect location (as indicated by the red circle), and the other methods fail to predict this tumor region. This observation underscores the positive impact of the global contextual features extracted by GCIM on the accuracy of small tumor region segmentation.



Table 5
Quantitative evaluation results with other methods on Hecktor 2022 dataset (Mean ± Standard Deviation). The 𝑝-value is
derived by comparing the Dice scores of other methods with our approach.
Methods Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value
CT 59.26 ± 0.62 46.20 ± 0.45 75.40 ± 0.53 54.47 ± 0.54 24.88 ± 2.23 6.68e-4
PET 71.81 ± 0.41 60.01 ± 0.47 85.50 ± 0.39 66.92 ± 0.56 13.02 ± 0.79 4.20e-3
Baseline 74.26 ± 0.61 63.13 ± 0.71 85.97 ± 0.55 70.54 ± 0.64 11.45 ± 0.48 7.84e-5
Unet++([40]) 76.45 ± 0.26 69.24 ± 0.27 87.00 ± 0.30 77.39 ± 0.34 11.86 ± 0.65 8.90e-6
DiSegNet([18]) 77.62 ± 0.21 66.97 ± 0.23 88.29 ± 0.21 73.59 ± 0.24 10.07 ± 0.84 1.63e-4
MSAM([16]) 78.21 ± 0.17 67.59 ± 0.19 86.86 ± 0.15 74.62 ± 0.23 9.38 ± 1.10 1.30e-5
CoFeatureModel([14]) 77.06 ± 0.26 63.62 ± 0.33 87.06 ± 0.17 71.01 ± 0.37 12.20 ± 1.27 5.90e-3
TransUNet([21]) 78.56 ± 0.13 68.09 ± 0.16 88.77 ± 0.18 74.62 ± 0.15 12.71 ± 1.16 5.11e-4
NNUNet([41]) 78.49 ± 0.18 69.27 ± 0.24 87.84 ± 0.21 75.48 ± 0.19 10.36 ± 0.69 4.42e-4
DGCBG-Net(ours) 79.29 ± 0.21 70.18 ± 0.22 88.16 ± 0.23 75.66 ± 0.39 8.72 ± 0.60 -

Figure 9: Qualitative segmentation results of four test cases in the hecktor 2022 dataset. From left to right, the first three
columns represent the original CT image, PET image, and Ground Truth. The subsequent columns show the segmentation results
of different segmentation models. The result images are cropped and resized for enhanced clarity. The red area represents the
true positive area, the green area represents the false positive area, and the purple area represents the false negative area.

Furthermore, the cross-modal information interaction effectively reduces the likelihood of mis-segmentation by the model.

4.6.3. Comparison with SOTA methods
To comprehensively analyze the differences between DGCBG-Net and SOTA methods, we select three recent works for a segmentation performance comparison. Luo et al. [19] proposed C2BA-UNet, which comprises two crucial components: a multi-set boundary-aware module that effectively captures feature information in the tumor boundary region by combining gradient maps, uncertainty maps, and horizontal maps; and a context coordination module that effectively integrates multi-scale information using attention mechanisms. Zhang et al. [20] proposed ASE-Net, which includes two key components: a pseudo-enhanced CT image method based on metabolic intensity, which enhances the positional distinctiveness of high and low metabolic areas; and an adaptive scale attention supervision module that learns tumor features at different scales. Isensee et al. [41] proposed NNUNet, which utilizes strong preprocessing and postprocessing strategies while employing the basic UNet architecture, achieving high-precision medical image segmentation.

C2BA-UNet and ASE-Net both conducted extensive experiments on the STS dataset, achieving competitive segmentation performance. Unfortunately, the unavailability of open-source code for these methods prevented us from replicating their experimental procedures. Therefore, we followed the experimental settings outlined in the two papers and retrained our DGCBG-Net, with input image resolutions of 128 × 128 for the comparison with C2BA-UNet and 256 × 256 for the comparison with ASE-Net. We compared our results to those reported by C2BA-UNet and ASE-Net across four performance metrics: Dice, IOU, Recall, and PPV. As demonstrated in Table 6, our method outperforms the SOTA methods in the majority of the metrics at both resolution settings.



Table 6
Quantitative evaluation results with SOTA methods on STS dataset (Mean ± Standard Deviation).
Methods In-plane resolution (pixel) Dice IOU Recall PPV
C2BA-UNet([19]) 128 × 128 78.1 64.9 83.7 -
DGCBG-Net(ours) 128 × 128 79.72 ± 0.30 65.72 ± 0.28 81.37 ± 0.42 80.37 ± 0.38
ASE-Net([20]) 256 × 256 78.56 65.52 82.69 77.79
DGCBG-Net(ours) 256 × 256 80.33 ± 0.25 67.64 ± 0.37 84.28 ± 0.31 81.14 ± 0.44
Note: ”-” denotes the results that were not reported

Dice score, a 2.12% higher IOU score, a 1.59% higher Recall DGCBG-Net does require more parameters and computa-
score, and a 3.35% higher PPV score. tional resource requirements than the minimal CoFeature-
We compared with NNUNet on two datasets separately, Model. DGCBG-Net has achieve an effective balance be-
with the quantitative results of the two datasets shown in tween segmentation performance and computational costs.
tables 4 and 5, and the qualitative results shown in Figures
8 and 9 respectively. It can be observed from tables 4 and 5. Discussion
5 that on the STS dataset, compared to NNUNet, DGCBG-
Net improved by 1.79% in Dice score, 1.99% in Recall score, Automatic tumor segmentation is a crucial step in com-
and reduced HD distance by 2.27mm, with IOU score being puter aided disease diagnosis and treatment planning. In
close to NNUNet. On the Hecktor 2022 dataset, DGCBG- this study, we propose a dual-branch network based on
Net improved by 0.8% in Dice score, 0.91% in IOU score, global cross-modal interaction and boundary guidance for
and reduced HD distance by 1.64mm. From Figures 8 and tumor segmentation (DGCBG-Net). DGCBG-Net includes
9, it can be seen that DGCBG-Net has larger true positive three crucial components: GCIM, SMDM, and BPGB. We
regions than NNUNet, as well as smaller false positive conduct extensive ablation and comparative experiments on
and false negative regions. Both quantitative and qualitative two publicly datasets. The experimental results presented in
results demonstrate that DGCBG-Net performs excellently tables 1 - 7 and Figures 7 - 10, demonstrate the effectiveness
in segmentation performance, surpassing SOTA methods. of the three key components and the superiority of DGCBG-
4.6.4. Comparison of tumor boundary segmentation results

To further analyze the segmentation performance of DGCBG-Net in the tumor boundary region, we evaluate the boundary segmentation results of all methods on the STS and Hecktor 2022 datasets, respectively. The visualized boundary segmentation results are shown in Figure 10. On the Hecktor 2022 dataset, all methods exhibit insufficient segmentation of the edges, and the compared methods produce inferior results in some smaller or irregularly shaped tumor regions. Conversely, our DGCBG-Net, benefiting from the multi-scale boundary feature information provided by BPGB, achieves superior segmentation results in these regions. On the STS dataset, all methods exhibit over-segmentation, particularly in regions with irregular tumor edges, where the compared methods struggle to achieve accurate segmentation. Because of the boundary-assisting information provided by BPGB, our approach aligns more closely with the ground truth in these areas.
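A simple way to obtain boundary curves such as those overlaid in Figure 10, or to restrict scoring to a narrow band around the tumor edge, is morphological erosion of the binary masks. The sketch below only illustrates this idea and is not necessarily the visualization pipeline used to produce Figure 10; the helper names are ours.

    # Sketch: extract thin boundaries from binary masks for overlays in the style of Figure 10.
    import numpy as np
    from scipy.ndimage import binary_erosion, binary_dilation

    def mask_boundary(mask: np.ndarray) -> np.ndarray:
        """Return the one-pixel-wide boundary of a binary mask."""
        mask = mask.astype(bool)
        return mask & ~binary_erosion(mask)

    def boundary_band(mask: np.ndarray, width: int = 2) -> np.ndarray:
        """A band of `width` pixels around the mask boundary, e.g. for boundary-restricted scoring."""
        return binary_dilation(mask_boundary(mask), iterations=width)

    # Overlay idea: draw mask_boundary(gt) in red and mask_boundary(pred) in green on the CT slice.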
4.6.5. Comparison of model complexity

Table 7 summarizes the parameter counts and computational resource requirements of all methods. DGCBG-Net involves fewer parameters and less computation than the majority of methods. In particular, compared with TransUNet, another model capable of extracting global contextual features, DGCBG-Net has significantly fewer parameters. Although DGCBG-Net requires more parameters and computational resources than the smallest model, CoFeatureModel, it achieves an effective balance between segmentation performance and computational cost.
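For completeness, the sketch below shows how figures of the kind listed in Table 7 are commonly obtained. It is not the measurement script used here; Params(Mb) is read as millions of trainable parameters, and FLOPs are usually reported by a third-party profiler rather than computed by hand.

    # Illustrative parameter/FLOPs accounting for Table 7-style comparisons.
    import torch
    import torch.nn as nn

    def count_params_millions(model: nn.Module) -> float:
        # Assumes "Params(Mb)" means millions of trainable parameters.
        return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

    # FLOPs/MACs are typically measured with a profiling tool, for example:
    #   from thop import profile
    #   macs, params = profile(model, inputs=(torch.randn(1, 1, 256, 256),))
    # (third-party package; the input shape and the protocol used for Table 7 may differ).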
5. Discussion

Automatic tumor segmentation is a crucial step in computer-aided disease diagnosis and treatment planning. In this study, we propose a dual-branch network based on global cross-modal interaction and boundary guidance for tumor segmentation (DGCBG-Net). DGCBG-Net includes three crucial components: GCIM, SMDM, and BPGB. We conduct extensive ablation and comparative experiments on two public datasets. The experimental results presented in Tables 1-7 and Figures 7-10 demonstrate the effectiveness of the three key components and the superiority of DGCBG-Net in automatic tumor segmentation. We also conduct additional experiments to analyze the rationality of the proposed key modules.

We design a set of comparative experiments for GCIM, replacing the cross-modal interaction approach with an early fusion method, as shown in Figure 11. The experimental results on the STS dataset are presented in Table 8: when the comparative architecture is employed as the cross-modal interaction approach, the Dice, IOU, and Recall scores all fall short of the Baseline results. For Dice scores, GCIM shows statistically significant differences compared to the other methods (p ≤ 0.01). We infer that the reason is the feature concatenation along the channel dimension, which exposes the model to a large amount of redundant features; these redundant features cause the model to allocate more attention to background pixels than to foreground pixels, consequently degrading segmentation performance. By contrast, GCIM ensures the comprehensive utilization of the global feature representations of the different modalities through the incorporation of global filters and bilateral cross-modal feature interaction. The experimental results in Table 8 further validate the effectiveness of GCIM.
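To make the contrast with early fusion concrete, the sketch below illustrates a GFNet-style global filter [22] combined with a simple bilateral exchange between the CT and PET branches. It is a minimal illustration under assumed module names (GlobalFilter, BilateralGlobalInteraction) and a fixed feature-map size; it is not a faithful implementation of GCIM, whose exact design is given in the methodology section.

    # Minimal GFNet-style global filtering with a bilateral PET/CT exchange (illustrative only).
    import torch
    import torch.nn as nn

    class GlobalFilter(nn.Module):
        """Learnable frequency-domain filter in the spirit of GFNet [22]; assumes a fixed (h, w)."""
        def __init__(self, channels: int, h: int, w: int):
            super().__init__()
            # Complex filter stored as a real tensor of shape (C, H, W//2 + 1, 2).
            self.weight = nn.Parameter(torch.randn(channels, h, w // 2 + 1, 2) * 0.02)

        def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
            freq = torch.fft.rfft2(x, dim=(-2, -1), norm="ortho")
            freq = freq * torch.view_as_complex(self.weight)
            return torch.fft.irfft2(freq, s=x.shape[-2:], dim=(-2, -1), norm="ortho")

    class BilateralGlobalInteraction(nn.Module):
        """Each modality is globally filtered and then mixed with the other modality's result."""
        def __init__(self, channels: int, h: int, w: int):
            super().__init__()
            self.filter_ct, self.filter_pet = GlobalFilter(channels, h, w), GlobalFilter(channels, h, w)
            self.mix_ct = nn.Conv2d(2 * channels, channels, kernel_size=1)
            self.mix_pet = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, f_ct, f_pet):
            g_ct, g_pet = self.filter_ct(f_ct), self.filter_pet(f_pet)
            out_ct = self.mix_ct(torch.cat([g_ct, g_pet], dim=1))    # CT branch sees PET globals
            out_pet = self.mix_pet(torch.cat([g_pet, g_ct], dim=1))  # PET branch sees CT globals
            return out_ct, out_pet

    # The early-fusion baseline of Figure 11, by contrast, simply concatenates the raw modalities:
    #   x = torch.cat([ct, pet], dim=1)  # single-stream input, no bilateral interaction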
We analyzed the impact of weight sharing in SMDM by introducing a multi-path downsampling module (MDM) as a comparative reference. The experimental results, shown in Table 9, indicate that SMDM outperforms MDM by 0.59% in Dice score, 3.31% in IOU score, 4.7% in PPV score, 1.22% in Recall score, and 0.8514mm in HD, while reducing the parameter count by 1.04Mb. For Dice scores, SMDM shows statistically significant differences compared to the other methods (p ≤ 0.01). These results suggest that weight sharing has a positive impact on feature extraction from multi-modal images. We infer that, because PET and CT scans are spatially aligned, the feature representations of the two scans can complement each other in SMDM. Weight sharing leads to a reduction in false positive and false negative regions when delineating tumor areas.
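In practical terms, weight sharing means that a single downsampling block processes both modality streams, whereas the MDM reference keeps an independent path per modality. The following sketch only illustrates this distinction under assumed module names; it does not reproduce the exact SMDM and MDM definitions.

    # Shared-weight downsampling (SMDM-like) versus separate paths (MDM-like); illustrative only.
    import torch
    import torch.nn as nn

    def downsample_block(c_in: int, c_out: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    class SharedDownsample(nn.Module):
        """One set of weights applied to both (spatially aligned) PET and CT feature maps."""
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.block = downsample_block(c_in, c_out)

        def forward(self, f_ct, f_pet):
            return self.block(f_ct), self.block(f_pet)   # weights shared across modalities

    class SeparateDownsample(nn.Module):
        """Reference with an independent path per modality, i.e. roughly twice the parameters."""
        def __init__(self, c_in: int, c_out: int):
            super().__init__()
            self.block_ct = downsample_block(c_in, c_out)
            self.block_pet = downsample_block(c_in, c_out)

        def forward(self, f_ct, f_pet):
            return self.block_ct(f_ct), self.block_pet(f_pet)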



Figure 10: Segmentation results for tumor boundaries on the STS and Hecktor 2022 datasets. The top two rows show the boundary segmentation results on the Hecktor 2022 dataset; the bottom two rows show the results on the STS dataset. The red lines represent the ground truth, while the green lines represent the predicted results.

Table 7
Comparison of model sizes and calculations expressed in terms of Params and FLOPs

Methods               Params(Mb)   FLOPs(Gbps)
Baseline              17.63        24.06
Unet++ [40]           47.19        200.04
DiSegNet [18]         101.81       207.64
MSAM [16]             62.08        109.32
CoFeatureModel [14]   10.67        24.22
TransUNet [21]        66.81        42.53
DGCBG-Net (ours)      27.70        40.00

Figure 11: Comparative architecture of GCIM. It utilizes early fusion as the cross-modal interaction approach: CT and PET modality data are concatenated in a serial manner and fed to the network as a single input.

Despite the impressive performance achieved by DGCBG-Net, it still generates erroneous cases under certain conditions, as illustrated in Figure 12. When the similarity between tumor and normal tissue is too high, and the discrepancy between the tumor region displayed in the PET image and the actual tumor region is too large, DGCBG-Net produces segmentation failures. Panels a) and b) of Figure 12 show failure cases from the STS dataset, while panels c) and d) show failure cases from the Hecktor 2022 dataset. Such circumstances may affect physicians' assessment and diagnosis, thus impacting the precision of surgery and radiotherapy. In these situations, physicians need to integrate their clinical experience with other imaging information (e.g., MRI) to make comprehensive judgments, thereby reducing the risks of misdiagnosis and mistreatment.

Although our DGCBG-Net has achieved satisfactory segmentation results on the STS and Hecktor 2022 datasets, our method still has some limitations. For instance, on the STS dataset, DGCBG-Net obtains a relatively large Hausdorff distance. We infer that this could be attributed to the limited size of the STS dataset, the large variations in tumor size and shape, and the relatively random spatial distribution of the tumors, all of which affect the model's generalization ability. Given this limited generalization performance, our model is suitable for common tumor segmentation tasks; for rare tumor types, doctors need to combine other imaging information with clinical experience to make a comprehensive judgment. Furthermore, DGCBG-Net performs tumor segmentation solely on 2D planes without considering 3D spatial structural information, which may limit physicians' comprehension of the tumor's true three-dimensional structure.



Table 8
Comparative experimental results of GCIM on STS dataset (Mean ± Standard Deviation). The 𝑝-value is derived by comparing
the Dice scores of other methods with our approach.
Methods Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value Params(Mb) FLOPs(Gbps)
Baseline 74.52 ± 0.31 62.68 ± 0.28 72.82 ± 0.27 80.20 ± 0.32 26.66 ± 1.14 1.59e-4 17.63 24.06
GCIM 76.89 ± 0.25 65.08 ± 0.24 73.33 ± 0.31 80.19 ± 0.26 19.65 ± 1.08 - 18.77 30.16
Comparative architecture 73.08 ± 0.35 62.09 ± 0.20 74.60 ± 0.21 78.80 ± 0.33 23.57 ± 1.75 4.33e-5 25.06 36.22

Table 9
Comparative experimental results of SMDM on Hecktor 2022 dataset (Mean ± Standard Deviation). The 𝑝-value is derived by
comparing the Dice scores of other methods with our approach.
Methods Dice(%) IOU(%) PPV(%) Recall(%) HD(mm) 𝑝-value Params(Mb) FLOPs(Gbps)
Baseline 74.26 ± 0.61 63.13 ± 0.71 85.97 ± 0.55 70.54 ± 0.64 11.45 ± 0.48 1.66e-3 17.63 24.06
SMDM 75.47 ± 0.53 66.08 ± 0.43 87.33 ± 0.46 73.21 ± 0.11 14.98 ± 1.24 - 19.82 25.69
MDM 74.88 ± 0.62 62.77 ± 0.37 82.63 ± 0.58 71.99 ± 0.30 15.84 ± 1.41 5.12e-4 20.86 25.69
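The p-values in Tables 8 and 9 are obtained by comparing the Dice scores of each method with those of our approach. Purely as an illustration (the exact statistical protocol may differ from this sketch), such values can be computed with a paired test over matched cases or runs:

    # Illustrative paired significance test between the Dice scores of two methods.
    import numpy as np
    from scipy import stats

    def compare_dice(dice_ours: np.ndarray, dice_other: np.ndarray):
        """Paired tests over matched cases; returns the t-test and Wilcoxon p-values."""
        _, p_ttest = stats.ttest_rel(dice_ours, dice_other)
        _, p_wilcoxon = stats.wilcoxon(dice_ours, dice_other)
        return p_ttest, p_wilcoxon

    # A value of p <= 0.01 is read as a statistically significant difference.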

6. Conclusion

In this paper, we propose DGCBG-Net, a dual-branch tumor segmentation network based on global cross-modal interaction and boundary guidance for PET/CT image segmentation. DGCBG-Net comprises three essential components. To extract global contextual features from the PET and CT modalities and facilitate bilateral cross-modal interaction of global features, we propose GCIM. To assist the encoder in enhancing the extraction of cross-modal complementary features and reducing feature loss during downsampling, we propose SMDM. Finally, BPGB is proposed to extract multi-stage potential boundary features, assisting the SSB in segmenting tumor boundary areas. Extensive experiments conducted on two datasets have demonstrated the effectiveness of the three proposed key modules. On the STS dataset, DGCBG-Net achieved a Dice score of 80.33%, an IOU score of 67.64%, and an HD distance of 12.02mm. On the Hecktor 2022 dataset, DGCBG-Net achieved a Dice score of 79.29%, an IOU score of 70.18%, and an HD distance of 8.72mm. Experimental results on both datasets have demonstrated that DGCBG-Net outperforms existing methods and is competitive with the SOTA approaches.

CRediT authorship contribution statement

Ziwei Zou: Writing - Original draft preparation, Software. Beiji Zou: Conceptualization of this study. Xiaoyan Kui: Methodology, Writing - review & editing. Zhi Chen: Data curation. Yang Li: Supervision.

Figure 12: Illustration of failure cases a)-d) with the PET image as background. The red area represents the true positive area, the green area represents the false positive area, and the purple area represents the false negative area.

References

[1] Rebecca L Siegel, Kimberly D Miller, Nikita Sandeep Wagle, and Ahmedin Jemal. Cancer statistics, 2023. Ca Cancer J Clin, 73(1):17–48, 2023.
[2] Peter C Nowell. Tumor progression: a brief historical perspective. In Seminars in cancer biology, volume 12, pages 261–266. Elsevier, 2002.
[3] Luc Soler, Herve Delingette, Grégoire Malandain, Johan Montagnat, Nicholas Ayache, Christophe Koehl, Olivier Dourthe, Benoit Malassagne, Michelle Smith, Didier Mutter, et al. Fully automatic anatomical, pathological, and functional segmentation from ct scans for hepatic surgery. Computer Aided Surgery, 6(3):131–142, 2001.
[4] Inês Domingues, Gisèle Pereira, Pedro Martins, Hugo Duarte, João Santos, and Pedro Henriques Abreu. Using deep learning techniques in medical imaging: a systematic review of applications on ct and pet. Artificial Intelligence Review, 53:4093–4160, 2020.
[5] Anders Eklund, Paul Dufort, Daniel Forsberg, and Stephen M LaConte. Medical image processing on the gpu–past, present and future. Medical image analysis, 17(8):1073–1094, 2013.
[6] Neeraj Sharma and Lalit M Aggarwal. Automated medical image segmentation techniques. Journal of medical physics/Association of Medical Physicists of India, 35(1):3, 2010.
[7] Jeya Maria Jose Valanarasu, Poojan Oza, Ilker Hacihaliloglu, and Vishal M Patel. Medical transformer: Gated axial-attention for medical image segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pages 36–46. Springer, 2021.
[8] Yu Gu, Jingqian Chi, Jiaqi Liu, Lidong Yang, Baohua Zhang, Dahua Yu, Ying Zhao, and Xiaoqi Lu. A survey of computer-aided diagnosis of lung nodules from ct scans using deep learning. Computers in biology and medicine, 137:104806, 2021.
[9] Changjian Sun, Shuxu Guo, Huimao Zhang, Jing Li, Meimei Chen, Shuzhi Ma, Lanyi Jin, Xiaoming Liu, Xueyan Li, and Xiaohua Qian. Automatic segmentation of liver tumors from multiphase contrast-enhanced ct images based on fcns. Artificial intelligence in medicine, 83:58–66, 2017.



[10] Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-Wing Fu, and Pheng-Ann Heng. H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes. IEEE transactions on medical imaging, 37(12):2663–2674, 2018.
[11] Jianpeng Zhang, Yutong Xie, Pingping Zhang, Hao Chen, Yong Xia, and Chunhua Shen. Light-weight hybrid convolutional network for liver tumor segmentation. In IJCAI, volume 19, pages 4271–4277, 2019.
[12] Xiangming Zhao, Laquan Li, Wei Lu, and Shan Tan. Tumor co-segmentation in pet/ct using multi-modality fully convolutional neural network. Physics in Medicine & Biology, 64(1):015011, 2018.
[13] Zisha Zhong, Yusung Kim, Kristin Plichta, Bryan G Allen, Leixin Zhou, John Buatti, and Xiaodong Wu. Simultaneous cosegmentation of tumors in pet-ct images using deep fully convolutional networks. Medical physics, 46(2):619–633, 2019.
[14] Ashnil Kumar, Michael Fulham, Dagan Feng, and Jinman Kim. Co-learning feature fusion maps from pet-ct images of lung cancer. IEEE Transactions on Medical Imaging, 39(1):204–217, 2019.
[15] Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, Haiyan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wang, et al. Isa-net: Improved spatial attention network for pet-ct tumor segmentation. Computer Methods and Programs in Biomedicine, 226:107129, 2022.
[16] Xiaohang Fu, Lei Bi, Ashnil Kumar, Michael Fulham, and Jinman Kim. Multimodal spatial attention module for targeting multimodal pet-ct lung tumor segmentation. IEEE Journal of Biomedical and Health Informatics, 25(9):3507–3516, 2021.
[17] Tongxue Zhou, Su Ruan, Pierre Vera, and Stéphane Canu. A tri-attention fusion guided multi-modal segmentation network. Pattern Recognition, 124:108417, 2022.
[18] Guoping Xu, Hanqiang Cao, Jayaram K Udupa, Yubing Tong, and Drew A Torigian. Disegnet: A deep dilated convolutional encoder-decoder architecture for lymph node segmentation on pet/ct images. Computerized Medical Imaging and Graphics, 88:101851, 2021.
[19] Shijie Luo, Huiyan Jiang, and Meng Wang. C2ba-unet: A context-coordination multi-atlas boundary-aware unet-like method for pet/ct images based tumor segmentation. Computerized Medical Imaging and Graphics, 103:102159, 2023.
[20] Junzhi Zhang, Huiyan Jiang, and Tianyu Shi. Ase-net: A tumor segmentation method based on image pseudo enhancement and adaptive-scale attention supervision module. Computers in Biology and Medicine, 152:106363, 2023.
[21] Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L Yuille, and Yuyin Zhou. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
[22] Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, and Jiwen Lu. Gfnet: Global filter networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10960–10973, 2023.
[23] Tianheng Cheng, Xinggang Wang, Lichao Huang, and Wenyu Liu. Boundary-preserving mask r-cnn. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 660–676. Springer, 2020.
[24] David Acuna, Amlan Kar, and Sanja Fidler. Devil is in the edges: Learning semantic boundaries from noisy annotations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11075–11083, 2019.
[25] David Acuna, Amlan Kar, and Sanja Fidler. Devil is in the edges: Learning semantic boundaries from noisy annotations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11075–11083, 2019.
[26] Youbao Tang, Yuxing Tang, Yingying Zhu, Jing Xiao, and Ronald M Summers. E2net: an edge enhanced network for accurate liver and tumor segmentation on ct scans. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 512–522. Springer, 2020.
[27] Yuexing Han, Xiaolong Li, Bing Wang, and Lu Wang. Boundary loss-based 2.5D fully convolutional neural networks approach for segmentation: a case study of the liver and tumor on computed tomography. Algorithms, 14(5):144, 2021.
[28] Youbao Tang, Jinzheng Cai, Ke Yan, Lingyun Huang, Guotong Xie, Jing Xiao, Jingjing Lu, Gigin Lin, and Le Lu. Weakly-supervised universal lesion segmentation with regional level set loss. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24, pages 515–525. Springer, 2021.
[29] Zeyu Ren, Xiangyu Kong, Yudong Zhang, and Shuihua Wang. Ukssl: Underlying knowledge based semi-supervised learning for medical image classification. IEEE Open Journal of Engineering in Medicine and Biology, 2023.
[30] Lei Bi, Michael Fulham, Nan Li, Qiufang Liu, Shaoli Song, David Dagan Feng, and Jinman Kim. Recurrent feature fusion learning for multi-modality pet-ct tumor segmentation. Computer Methods and Programs in Biomedicine, 203:106043, 2021.
[31] Yudong Zhang, Lijia Deng, Hengde Zhu, Wei Wang, Zeyu Ren, Qinghua Zhou, Siyuan Lu, Shiting Sun, Ziquan Zhu, Juan Manuel Gorriz, et al. Deep learning in food category recognition. Information Fusion, page 101859, 2023.
[32] Zeyu Ren, Shuihua Wang, and Yudong Zhang. Weakly supervised machine learning. CAAI Transactions on Intelligence Technology, 2023.
[33] Chang Qiao, Di Li, Yuting Guo, Chong Liu, Tao Jiang, Qionghai Dai, and Dong Li. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nature Methods, 18(2):194–202, 2021.
[34] J. Shore and R. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26–37, 1980.
[35] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571, 2016.
[36] Martin Vallières, Carolyn R Freeman, Sonia R Skamene, and Issam El Naqa. A radiomics model from joint fdg-pet and mri texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Physics in Medicine & Biology, 60(14):5471, 2015.
[37] Valentin Oreiller, Vincent Andrearczyk, Mario Jreige, Sarah Boughdad, Hesham Elhalawani, Joel Castelli, Martin Vallieres, Simeng Zhu, Juanying Xie, Ying Peng, et al. Head and neck tumor segmentation in pet/ct: the hecktor challenge. Medical image analysis, 77:102336, 2022.
[38] Vajira Thambawita, Andrea M Storås, Steven A Hicks, Pål Halvorsen, and Michael A Riegler. Mlc at hecktor 2022: The effect and importance of training data when analyzing cases of head and neck tumors using machine learning. In 3D Head and Neck Tumor Segmentation in PET/CT Challenge, pages 166–177. Springer, 2022.
[39] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[40] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, pages 3–11. Springer, 2018.
[41] Fabian Isensee, Paul F Jaeger, Simon AA Kohl, Jens Petersen, and Klaus H Maier-Hein. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2):203–211, 2021.



Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
