
Journal Pre-proof

PONet: Prototype Optimization Network for Few-shot Medical Image Segmentation

Wang Siqi, Yu Xiaosheng, Chi Jianning, Wu Chengdong, Gao Xiujing

PII: S0925-2312(25)01785-0
DOI: https://doi.org/10.1016/j.neucom.2025.131113
Reference: NEUCOM 131113

To appear in: Neurocomputing

Received Date: 23 May 2025
Revised Date: 13 July 2025
Accepted Date: 23 July 2025

Please cite this article as: Siqi W, Xiaosheng Y, Jianning C, Chengdong W, Xiujing G, PONet: Prototype Optimization Network for Few-shot Medical Image Segmentation, Neurocomputing (2025), doi: https://doi.org/10.1016/j.neucom.2025.131113.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Highlights

PONet: Prototype Optimization Network for Few-shot Medical Image Segmentation

Wang Siqi, Yu Xiaosheng, Chi Jianning, Wu Chengdong, Gao Xiujing

• Proposed PONet, a novel FSS method for medical image segmentation with limited data.

• Introduced a boundary prototype contrastive learning module to address inter-class inconsistency.

• Introduced a query-guided prototype optimization module to optimize support prototypes for better query relevance.

• Achieved SOTA segmentation performance on the medical datasets Abd-MRI, Abd-CT, and Card-MRI.
PONet: Prototype Optimization Network for Few-shot
Medical Image Segmentation
Wang Siqi a, Yu Xiaosheng a, Chi Jianning a,b, Wu Chengdong a,∗, Gao Xiujing c

a The School of Robot Science and Engineering, Northeastern University, Chuanxin Road, Shen Yang, 110170, Liao Ning, China
b Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Chuanxin Road, Shen Yang, 110170, Liao Ning, China
c The School of Smart Marine Science and Technology, Fujian University of Technology, Xuefu Road, Fu Zhou, 350118, Fu Jian, China

∗ Corresponding author.
Email addresses: [email protected] (Wang Siqi), [email protected] (Yu Xiaosheng), [email protected] (Chi Jianning), [email protected] (Wu Chengdong), [email protected] (Gao Xiujing)

Abstract
Although deep learning algorithms have achieved remarkable success in medical image segmentation, their reliance on extensive manually annotated datasets restricts their generalization to scenarios with limited training data. Few-shot segmentation has emerged as a promising solution to this problem by enabling effective segmentation with limited data. However, when applied directly to medical images, two primary challenges often arise: inter-class and intra-class inconsistency. First, inter-class inconsistency refers to the high similarity between different pathological states or tissue structures in pathological images, which negatively impacts the quality of support prototypes. Second, intra-class inconsistency stems from the varying appearances of the same organ across different samples, thereby disrupting the support-query alignment process. To counter these challenges, we propose a prototype optimization network to achieve accurate few-shot medical image segmentation. Specifically, we first employ the simple linear iterative clustering (SLIC) method to generate multiple foreground and background sub-regions within the support image. Subsequently, we introduce a boundary prototype contrastive learning (BPCL) module that utilizes contrastive learning to self-optimize the support prototypes, enhancing their quality. This reduces the risk of misclassifying the query background as foreground during the support-query alignment process, thereby mitigating inter-class inconsistency. Furthermore, we propose the query-guided prototype optimization (QGPO) module, which introduces a query-guided optimization strategy to generate support prototypes tailored to the current query image. This enhances the connection between support features and the query image, thereby alleviating intra-class inconsistency. Comprehensive experiments are conducted on three medical datasets. Our method achieves state-of-the-art segmentation performance compared to other medical FSS methods. Code is available at: https://github.com/WANGSIQII/FSS.git

Keywords: Few-shot learning, Medical image segmentation, Prototype learning, Contrastive learning


1. Introduction
Medical image segmentation aims to accurately identify and depict anatom-
ical structures or pathological regions in various medical images, including
ultrasound, computed tomography (CT), and magnetic resonance imaging
(MRI) scans [1]. The target segmentation results can be widely used in
clinical processes such as disease diagnosis [2], treatment planning [3], and
computer-assisted intervention [4]. In recent years, deep learning algorithms
based on the convolutional neural network (CNN) [5, 6], Transformer [7, 8],
and Mamba [9, 10], have made significant progress in medical image process-
ing, thereby effectively reducing the workload of healthcare professionals. In
addition, the Segment Anything Model (SAM) [11] has gained considerable
attention due to its outstanding performance in conventional image segmen-
tation tasks [12]. However, when applied to medical image segmentation, its
effectiveness is diminished due to the model’s lack of training on medical-
specific data. To address this limitation, Ma et al. presented MedSAM
[13], a foundational model trained on a large-scale annotated medical image
dataset, designed to enable universal medical image segmentation. Zhou et
al. introduced MIT-SAM [14], a novel model for text-assisted medical im-
age segmentation, which incorporates a SAM-enhanced image encoder and

a Bert-based text encoder. Despite advancements, these methods continue
to face significant challenges. For instance, their training processes typically
require large amounts of annotated data, which is challenging to obtain due
to high costs, variations in imaging equipment, and privacy concerns. These
factors hinder the practical application of these deep learning approaches.
Furthermore, due to the scarcity of abnormal organs and rare lesion samples,
models trained on such medical datasets tend to recognize only the anatom-
ical structures they have seen, struggling to generalize to unseen semantic
categories.
To tackle the above challenges, few-shot segmentation (FSS) has emerged
as a promising solution that enables deep learning algorithms to have supe-
rior performance with limited data [15]. Specifically, FSS methods enable the
segmentation of unseen anatomical structures with few labeled data. This
capability is particularly valuable in clinical contexts, such as diagnosing rare
or complex diseases. Specifically, FSS methods initially extract representa-
tive semantic information for a specific class from a limited set of labeled
images (i.e., the support set), then use the learned knowledge to guide the
segmentation of unlabeled images (i.e., the query set).
The development of FSS has given rise to numerous effective methods for medical image segmentation, typically categorized into two-branch interactive methods [16, 17, 18] and prototypical methods [19, 20]. Interaction-based approaches often integrate techniques such as attention mechanisms and contrastive learning to facilitate interactions between the support and query branches, enabling the propagation of learned knowledge from annotated support images to unannotated query images. In contrast, prototypical methods are more prevalent in medical image segmentation. Their main workflow can be summarized as mining foreground features from support images to generate prototypes, and then using these prototypes for feature matching to segment the target region in the query image, as depicted in Figure 1(a).
Although prototype-based FSS methods have made progress in medical
image segmentation, there remains room for improvement. We comprehen-
sively analyze the factors that lead to incorrect prediction masks, summarized
in two aspects: (1) Inter-class inconsistency leads to the generation of poor
support prototypes. As illustrated in Figure 2(a), inter-class inconsistency
occurs when different pathological states or tissue structures within the same
image are highly similar, resulting in the generation of low-quality support
prototypes. These inferior prototypes may fail to regenerate their corre-
sponding support masks, thereby hindering the accurate capture of specific

Figure 1: Workflow for generating foreground and background prototypes.

Figure 2: Visual segmentation results of the prototype-based method [19] under inter-class
and intra-class inconsistency: (a) The first column represents the inter-class inconsistency
that results in poor quality of the support prototype, which may even fail to restore
the segmentation mask of the supporting image. The second column illustrates cases of
query mask segmentation failure caused by poor support prototypes. (b) Query image
segmentation results guided by different support images.

features during the guided query image segmentation process and negatively
impacting segmentation accuracy. (2) Intra-class inconsistency leads to poor
generalization ability in the support-query guided process. Intra-class in-
consistency refers to the variation among samples within the same category.

Figure 3: Workflow for generating foreground and background prototypes.

The selection of support and query images in training is random, so they in-
evitably have differences in appearance, such as organ shape, size, and gray
level. Common prototype methods mainly generate support prototypes from support features and design a series of modules to enhance the expressiveness of these prototypes, thereby reducing the difference between support and query. We consider this strategy suboptimal, as it overlooks the need for customization specific to the query samples. As
shown in Figure 2(b), there is a noticeable variation in the query segmen-
tation quality when guided by prototypes generated from different support
images.
To mitigate the above issues, we propose the prototype optimization net-
work (PONet), which is composed of the boundary prototype contrastive
learning (BPCL) module and query guidance prototype optimization (QGPO)
module for accurate few-shot medical image segmentation. As is shown in
Figure 3, we first use the simple linear iterative clustering (SLIC) method
[21] to over-segment the support image, and combine the mask to generate
three independent sets of region-level prototypes from the support feature:
foreground prototypes, adjacent-boundary background prototypes, and non-
adjacent-boundary background prototypes. Instead of using a single sup-
port prototype, we guide segmentation by optimizing multiple support fore-
ground and background prototypes. Then, to enhance the quality of the
support prototype, we design the BPCL module to emphasize the impor-
tance of adjacent-boundary background prototypes, as these often exhibit
features similar to the foreground. Specifically, the BPCL module primar-
ily optimizes the background prototypes by introducing contrastive learning
to increase the distance between adjacent-boundary background prototypes

and foreground prototypes. This optimization of the background prototype
helps alleviate inter-class inconsistency during the subsequent guided query
segmentation process. To mitigate intra-class inconsistency, we introduce the
QGPO module, which designs a query-guided support prototype optimiza-
tion strategy that generates customized support prototypes based on the
current query image. Notably, the QGPO module incorporates the Mamba
model [22], which captures long-range dependencies while maintaining linear
complexity, to enhance the interaction between support and query, refining
the support prototypes.
In short, the main contributions of our work are:
1) We propose a new FSS method, PONet, to solve the data scarcity
problem in medical segmentation applications.
2) We propose the BPCL module, which introduces contrastive learning to enhance the quality of support prototypes, thereby addressing inter-class inconsistency.
3) We propose the QGPO module, which introduces the query-guided
optimization strategy to enhance the connection between support features
and the query image, thereby alleviating intra-class inconsistency.
4) Extensive experiments on Abdominal MRI (Abd-MRI), Abdominal
CT (Abd-CT), and Cardiac MRI (Card-MRI) datasets demonstrate that our
PONet achieves state-of-the-art (SOTA) segmentation performance.

2. Related Works
2.1. Supervised Medical Image Segmentation
Medical image segmentation plays a vital role in many clinical studies and
practical applications by automatically identifying and depicting regions of
interest. In recent years, convolutional neural networks (CNN) have shown
remarkable performance in medical image segmentation, successfully solving
a variety of segmentation tasks such as tissue structure, lesion region, and or-
gan delineation. As one of the most prominent CNN architectures, U-Net [23]
employs an encoder-decoder structure and incorporates skip connections that map features from the encoder stage to the decoder stage, facilitating the interaction between texture and semantic features, which proves effective in various medical segmentation tasks.
ants have been proposed, including U-Net++ [24], nnU-Net [25], and atten-
tion U-Net [26], each with representative contributions. Additionally, several
supervised medical segmentation methods based on Transformer [7, 8] and

Mamba [9, 10] architectures have also shown promising performance. For in-
stance, Patil et al. [7] proposed the permutation invariant multi-headed self-
attention module integrated into a U-shaped transformer architecture, which
enhances segmentation performance by improving the robustness across dif-
ferent spatial locations in medical images. Gu et al. [8] proposed RAMIS, a
novel hybrid architecture combining CNN and Vision Transformer for med-
ical image segmentation. Cheng et al. [9] proposed Mamba-Sea, a novel
Mamba-based framework for domain generalization in medical image seg-
mentation, which incorporates global-to-local sequence augmentation to im-
prove performance. While these medical image segmentation methods have
achieved certain success, their efficacy often depends on the availability of
widely annotated datasets, which poses a potential limitation to their appli-
cation in real-world medical scenarios.

2.2. Few-Shot Segmentation


Few-shot segmentation (FSS) is a novel approach for solving the anno-
tation sparsity problem in semantic segmentation tasks, which leverages a
limited number of annotated examples to predict the labels of unseen data.
Shaban et al. [27] pioneered the FSS research area by introducing a dual-
branch network architecture for semantic segmentation. Building upon this
foundation, Dong et al. [28] introduced the concept of prototypes, utilizing
prototype vectors to generate support image representations and compare
these with query features to generate segmentation results. Later, Zhang et
al. [29] proposed the mask average pooling (MAP) method to obtain certain
foreground class prototypes in the support set, which has been widely used
to measure the similarity with query features. Early FSS methods tended
to focus on capturing global image features, overlooking the positive impact
of local features on prediction outcomes. Recognizing this limitation, re-
searchers have developed a series of methods, such as PMM[30], PPNet[31],
and ASGNet[32], which generate multiple local prototypes to represent fore-
ground target features through expectation maximization, K-means cluster-
ing, and superpixel-guided methods, respectively. Recently, Yang et al. [33]
proposed the MIANet, which incorporates semantic word embeddings and
instance information to facilitate accurate segmentation. Zhu et al. [34]
attempted to leverage large language models (LLMs) in few-shot segmen-
tation, achieving excellent results on multiple datasets by enabling LLMs
to produce segmentation outputs and utilizing multimodal guidance and
curriculum learning for improved performance. Yang et al. [35] proposed

MASNet, a multi-scale and attention-based few-shot semantic segmentation
network that enhances feature representation through a multi-scale feature
enhancement module and channel attention, achieving improved accuracy in
segmentation tasks. Wang et al. [36] proposed ESSNet, an Embedded-Self-
Supplementing Network that combines semantic word embedding and query
set self-supplementing information to address inter-class inconsistency and
information loss in FSS tasks. Wu et al. [37] proposed DefectSAM, which
fine-tunes SAM by incorporating few-shot learning and low-rank adaptation
to achieve effective industrial defect segmentation with limited training data.
Wang et al. [38] proposed LPFS, a meta-learning-based few-shot segmenta-
tion method that leverages a learnable prototype module and global-attention
correlation map to effectively adapt models to unseen geographic categories
with minimal support examples.
Although FSS techniques have made significant strides in natural image
segmentation, their direct application to medical images presents several chal-
lenges. For instance, medical images often have limited grayscale variation
and unclear boundaries, making segmentation more difficult. Furthermore,
ethical concerns related to personal privacy further restrict the availability
of labeled data, thereby limiting the effectiveness of FSS methods. These
challenges underscore the necessity of developing specialized FSS techniques
for medical tasks.

2.3. Few-Shot Medical Image Segmentation


Given the greater challenges in label acquisition for medical images com-
pared to natural images, few-shot medical image segmentation has been pro-
posed as an extension of FSS tailored to medical contexts [39]. As previ-
ously mentioned, medical FSS methods are primarily categorized into two
main research directions: those based on two-branch interactive structure
[16, 17, 18, 40, 41] and those based on prototypical network structures [19,
20, 42, 43, 44, 45, 46, 47, 48]. Compared to natural images, medical im-
ages have unique characteristics that require the development of specialized
algorithms for processing. For instance, medical images exhibit boundary
blurring, uneven gray-level distribution, and inter-individual variability, pos-
ing significant challenges during the support-guided query segmentation pro-
cess. Methods based on dual-branch interaction structures focus on estab-
lishing innovative connections and interactions between support and query
images using approaches such as attention mechanisms, contrastive learning,
or cross-feature guidance. For instance, Lin et al. [18] proposed a new FSS

framework based on cross attention transformer, termed CAT, which mines
correlations between support and query images, and eliminates useless pixel
information to enhance query feature purity. Gong et al. [41] proposed a
CGNet model that incorporates a cross feature module (CFM) to enhance
lesion detail understanding by facilitating interaction between query and sup-
port sets, and a support guide query (SGQ) module to refine segmentation by
integrating features at different scales to enhance segmentation performance
for intracranial hemorrhage. Although interactive methods have shown some
progress, they still face significant challenges. Attention-based models typ-
ically exhibit high computational complexity and require a larger number
of labeled images for training, which may lead to overfitting if insufficient
data is available. Contrastive learning or cross-feature guidance generally
involves two stages, resulting in increased computational demands. Further-
more, cross-feature guidance methods are often affected by blurry boundaries,
which may cause the background features of the query image to mix with
the foreground features of the support image, leading to suboptimal query
segmentation performance.
In contrast, the approach based on prototypical network structures is
more commonly used. Ouyang et al. [19] proposed a self-supervised few-shot
medical image segmentation model, SSL-ALPNet, which laid a solid foun-
dation for subsequent research on prototype learning. Subsequently, a series
of outstanding FSS works [20, 43, 44, 45, 46, 47, 48] combine self-supervised
with prototype learning. For instance, Zhu et al. [44] proposed a search and
filtering (S&F) module designed based on the self-selection mechanism to al-
leviate the impact of intra-class differences during the support-guided query
segmentation process. Rashid et al. [48] proposed the ViT-CAPS model,
which leverages Vision Transformers, the adaptive context embedding mod-
ule, and the meta prompt generator to improve few-shot segmentation per-
formance in dynamic, low-annotation settings. Although the aforementioned
methods have made some progress, they still face certain limitations. For
example, they typically rely on a single support prototype to guide query
segmentation. When the support image itself contains significant inter-class
inconsistency, it leads to the generation of low-quality support prototypes,
which severely impact the performance of query segmentation. Additionally,
they only consider support features during the prototype generation process,
overlooking potential appearance differences between the target objects in
the support and query images, thus hindering the generalization ability of
the support prototype. In contrast, our proposed PONet optimizes multi-

ple local support prototypes in a self-supervised manner through the BPCL
module and establishes relationships between support and query prototypes
using the QGPO module, generating customized prototypes suited for the
current query and enhancing medical FSS segmentation performance.
Recent advancements in computer vision have introduced the Segment
Anything Model (SAM) [11], which has been pre-trained on 11 million im-
ages and 1 billion masks, demonstrating exceptional performance across a
wide range of general image segmentation tasks. SAM has attracted a great
deal of attention in the field of medical segmentation, but it performs poorly
when it is applied directly to medical images due to the huge differences
between natural and medical images [12]. Consequently, SAM-inspired mod-
els such as MedSAM [13], MIT-SAM [14], BiASAM [49], MASG-SAM [50],
have demonstrated the potential of foundational segmentation models for 3D
medical image segmentation. For instance, Zhou et al. proposed BiASAM
[49], which uniquely incorporates two bidirectional attention mechanisms into
SAM for medical image few-shot segmentation. These models typically re-
quire input prompts, such as click points or bounding boxes, for effective
operation. Although SAM-based medical segmentation methods have pro-
gressed, they still face significant limitations due to the scarcity of labeled
medical data, compounded by privacy concerns and the high cost of data
acquisition.

3. Methods
3.1. Problem Definition
In the FSS task, the complete dataset $D$ is divided into two separate subsets: the training set $D_{train}$ and the testing set $D_{test}$. The segmentation model is trained on $D_{train}$, and the trained model is then evaluated on $D_{test}$. $D_{train}$ is labeled by $C_{train}$ and $D_{test}$ is labeled by $C_{test}$, where $C_{train}$ and $C_{test}$ are disjoint, i.e., $C_{train} \cap C_{test} = \emptyset$.

FSS aims at segmenting query objects of a specific semantic class based on extremely few labeled support images. Following the current approach [32], we combine training and testing with meta-learning [51], also called episodic learning. Specifically, we use the episode mode to set up an N-way K-shot segmentation task, where N represents the number of classes to be segmented in each episode and K represents the number of images contained in each class. In each episode for a specific class c, the input to the model $F(S, Q)$ consists of the support set $S = \{(I_s^i, M_s^i)\}_{i=1}^{K}$ and the query set $Q = \{I_q, M_q\}$.
Table 1: A summary of the segmentation methods involved in related works.

Section  Method            Source               Year  Main contributions
2.1      U-Net[23]         MICCAI               2015  Encoder-decoder architecture
2.1      U-Net++[24]       IEEE TMI             2020  Dense skip pathways
2.1      nnU-Net[25]       Nat. Methods         2021  Self-configuring
2.1      Attention U-Net[26]  Arxiv             2018  Attention mechanism
2.1      PISA[7]           Neurocomputing       2025  U-shaped transformer
2.1      RAMIS[8]          Neurocomputing       2025  Hybrid CNN-transformer
2.1      Mamba-Sea[9]      IEEE TMI             2025  Mamba framework
2.1      DGFE-Mamba[10]    J. Bionic Eng.       2025  Global-local feature
2.2      OSLSM[27]         Arxiv                2017  Dual-branch structure
2.2      FSSPL[28]         BMVC                 2018  Prototype learning
2.2      SG-One[29]        IEEE Trans. Cybern.  2020  Masked average pooling
2.2      PMM[30]           ECCV                 2020  Prototype mixture models
2.2      PPNet[31]         ECCV                 2020  Part-aware prototypes
2.2      ASGNet[32]        CVPR                 2021  Superpixel-guided prototypes
2.2      MIANet[33]        CVPR                 2023  Multi-information aggregation
2.2      LLaFS[34]         CVPR                 2024  Large language models
2.2      MASNet[35]        Neurocomputing       2024  Multi-scale and attention
2.2      ESSNet[36]        Neurocomputing       2025  Word vector embedding
2.2      DefectSAM[37]     IEEE TIM             2025  Fine-tunes SAM
2.2      LPFS[38]          IEEE TGRS            2025  Learnable prototype
2.3      SSL-ALPNet[19]    IEEE TMI             2022  Self-supervised learning
2.3      IFSL[16]          IEEE TMI             2021  Interactive learning
2.3      CRAPNet[17]       WACV                 2023  Cycle-resemblance attention
2.3      CAT[18]           MICCAI               2023  Cross attention transformer
2.3      CMRCLNet[40]      IEEE SPL             2024  Contrast learning
2.3      CGNet[41]         CMIG                 2025  Cross features module
2.3      SR&CL[42]         MICCAI               2022  Self-reference and contrastive learning
2.3      Q-Net[43]         IntelliSys           2023  Threshold adaptation
2.3      ADNet++[20]       MedIA                2023  Uncertainty estimation
2.3      RPT[44]           MICCAI               2023  Transformer
2.3      PFMNet[45]        CMIG                 2024  Feature mapping
2.3      DGPANet[46]       IEEE TIM             2024  Dual optimization
2.3      PGRNet[47]        IEEE TMI             2025  Graph reasoning
2.3      ViT-CAPS[48]      Neurocomputing       2025  Transformer and contrastive learning
2.3      SAM[11]           ICCV                 2023  Foundation model
2.3      MedSAM[13]        Nat. Commun.         2024  Medical foundation model
2.3      MIT-SAM[14]       IEEE JBHI            2025  Text-assisted
2.3      BiASAM[49]        ISPL                 2025  Bidirectional-attention guided
2.3      MASG-SAM[50]      IEEE JBHI            2025  Multi-scale attention
Figure 4: Overview of the proposed PONet.

$I_s^i$ and $I_q$ represent the i-th support image and the query image, and their masks are denoted $M_s^i$ and $M_q$, respectively.
set S and the query image I q as input, leveraging the semantic knowledge
extracted from the labeled support images to guide the segmentation of the
query image and output its predicted mask. The prediction mask is then
supervised using the ground truth mask M q . Following previous works [19]
and [20] on medical FSS works, we set N = K = 1.
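
For concreteness, the episode construction described above can be sketched in Python as follows; the dataset interface and sampling helpers are illustrative placeholders, not the released implementation.

```python
import random

def sample_episode(dataset, n_way=1, k_shot=1):
    """Builds one N-way K-shot episode (N = K = 1 in this work): a support
    set S of labeled slices and a query pair Q for the same class c."""
    cls = random.choice(dataset.classes)                                   # class c for this episode
    support = [dataset.sample_labeled_slice(cls) for _ in range(k_shot)]   # S = {(I_s^i, M_s^i)}
    query = dataset.sample_labeled_slice(cls)                              # Q = {I_q, M_q}
    return support, query
```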

3.2. Network Overview


Figure 4 depicts an overview of the proposed PONet, which consists
of three key components: 1) the boundary prototype contrastive learning
(BPCL) module for enhancing adjacent-boundary background feature speci-
ficity, 2) the query guidance prototype optimization (QGPO) module for nar-
rowing the inconsistency between the support image and the query image,
3) the feature decoding module. Specifically, given the support image $I_s$ and the query image $I_q$, the pipeline first adopts the weight-sharing ResNet-50 [52] as the backbone to extract the support feature $F_s \in \mathbb{R}^{H \times W \times C}$ and the query feature $F_q \in \mathbb{R}^{H \times W \times C}$, respectively, where H, W, and C represent the height, width, and number of channels of the feature. The ResNet-50 has been pre-trained on the MS-COCO dataset [53]. Then, the extracted features $F_s$ and $F_q$ are used as inputs to the BPCL and QGPO modules to optimize the regional support background prototypes $P_s^b$ and foreground prototypes $P_s^f$, respectively. Finally, the optimized support prototype $P_s = P_s^b \cup P_s^f$ and the query feature $F_q$ are used to generate the final query image prediction.
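
A minimal structural sketch of this shared feature extraction step is given below; the backbone truncation point and the torchvision weights are assumptions for illustration (the paper's backbone is pre-trained on MS-COCO), and the prototype modules are sketched in the following subsections.

```python
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50(weights=None)
encoder = nn.Sequential(*list(backbone.children())[:-2])   # weight-sharing feature extractor

def extract_features(support_img, query_img):
    """support_img, query_img: (B, 3, H, W) tensors; returns F_s and F_q."""
    with torch.no_grad():
        return encoder(support_img), encoder(query_img)
```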

3.3. Boundary Prototype Contrastive Learning Module


In the medical FSS task based on prototype learning, inter-class inconsis-
tency arises from structural similarities and fuzzy boundaries between tissues,
which may degrade the quality of support prototypes. As shown in Figure
2(a), the disturbed support prototypes face significant challenges in even re-
covering their images. As a result, during the prototype learning process,
some query background regions that closely resemble the foreground, espe-
cially those near the foreground boundary (called adjacent-boundary back-
ground regions in this paper), may be mistakenly activated as foreground,
hindering the segmentation of the query image. To address this issue, we pro-
pose the boundary prototype contrastive learning (BPCL) module to enhance
the quality of support prototypes and reduce the inconsistency from these
adjacent-boundary background regions. Specifically, we minimize the dis-
tance between adjacent-boundary background prototypes and non-adjacent-
boundary background prototypes, while maximizing the distance between
adjacent-boundary background prototypes and the foreground prototype.
As shown in Figure 4(a), given the support image $I_s$, we first use the simple linear iterative clustering (SLIC) method to over-segment $I_s$ into N super-pixel blocks, where N is set to 100 by default. Then, combining the N super-pixel block coordinates, the support mask is decomposed into three sets: foreground regional masks $X = \{X_n\}_{n=1}^{K_1}$, adjacent-boundary background regional masks $Y = \{Y_n\}_{n=1}^{K_2}$, and non-adjacent-boundary background regional masks $Z = \{Z_n\}_{n=1}^{K_3}$, where $K_1$, $K_2$, and $K_3$ denote the number of super-pixel blocks in each region. Then, by combining each set of regional masks with the support feature $F_s \in \mathbb{R}^{H \times W \times C}$, we generate three sets of support regional prototypes: the foreground regional prototypes $P_s^f = \{p_{s,n}^{f}\}_{n=1}^{K_1}$, the adjacent-boundary background prototypes $P_s^{b\_a} = \{p_{s,n}^{b\_a}\}_{n=1}^{K_2}$, and the non-adjacent-boundary background prototypes $P_s^{b\_n} = \{p_{s,n}^{b\_n}\}_{n=1}^{K_3}$. This process employs the mask average pooling (MAP) strategy, guided by the following formulations:

$$p_{s,n}^{f} = \mathrm{MAP}(F_s, X_n) = \frac{1}{|X_n|}\sum_{i=1}^{HW} F_{s,i}\, X_{n,i} \qquad (1)$$

$$p_{s,n}^{b\_a} = \mathrm{MAP}(F_s, Y_n) = \frac{1}{|Y_n|}\sum_{i=1}^{HW} F_{s,i}\, Y_{n,i} \qquad (2)$$

$$p_{s,n}^{b\_n} = \mathrm{MAP}(F_s, Z_n) = \frac{1}{|Z_n|}\sum_{i=1}^{HW} F_{s,i}\, Z_{n,i} \qquad (3)$$
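
As a concrete illustration of Eqs. (1)-(3), the sketch below computes region-level prototypes by masked average pooling over super-pixel region masks; it is a minimal PyTorch example under assumed tensor shapes, with the SLIC over-segmentation delegated to scikit-image, and is not the authors' released implementation.

```python
import torch
from skimage.segmentation import slic

def masked_average_pooling(feat, region_masks, eps=1e-6):
    """Eqs. (1)-(3): one prototype per super-pixel region via masked average pooling.

    feat:         support feature F_s, shape (C, H, W)
    region_masks: binary region masks (e.g., X_n, Y_n, or Z_n), shape (K, H, W)
    returns:      region prototypes, shape (K, C)
    """
    feat = feat.flatten(1)                                 # (C, HW)
    masks = region_masks.flatten(1).float()                # (K, HW)
    area = masks.sum(dim=1, keepdim=True).clamp_min(eps)   # |X_n|
    return masks @ feat.t() / area                         # (K, C)

# Super-pixel blocks from SLIC (N = 100 by default); support_image is a
# numpy array of the support slice:
# superpixels = slic(support_image, n_segments=100, start_label=0)
```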
As mentioned, inter-class inconsistency usually occurs in adjacent-boundary background regions. We mitigate this inconsistency by introducing the contrastive learning loss $\mathcal{L}_{contrast}$. Specifically, we obtain the positive sample $P_{positive}$ and the negative sample $P_{negative}$ by performing global average pooling (GAP) on $P_s^{b\_n}$ and $P_s^{f}$, respectively. Then, we take $P_s^{b\_a}$ as the anchor and utilize the triplet loss to update each prototype $p_n^{b\_a}$ in $P_s^{b\_a}$ so that it moves closer to the positive sample $P_{positive}$ and farther from the negative sample $P_{negative}$:

$$P_{positive} = \mathrm{GAP}(P_s^{b\_n}) \qquad (4)$$

$$P_{negative} = \mathrm{GAP}(P_s^{f}) \qquad (5)$$

$$\mathcal{L}_{contrast} = \sum_{n=1}^{K_2} \max\!\left( L_2\!\left(p_n^{b\_a}, P_{positive}\right) - L_2\!\left(p_n^{b\_a}, P_{negative}\right) + M,\; 0 \right) \qquad (6)$$

where M is a constant margin equal to 0.5 and $L_2$ denotes the Euclidean distance.

Finally, we concatenate the updated prototypes $P_s^{b\_a}$ with $P_s^{b\_n}$ to generate the regional support background prototype $P_s^b = P_s^{b\_a} \cup P_s^{b\_n}$, which is used to guide the subsequent QGPO module and the final segmentation of the query image.
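
A minimal sketch of the triplet-style objective in Eqs. (4)-(6) follows, assuming the prototype tensors produced above; the margin follows the constant M = 0.5 in the text, and averaging over each prototype set stands in for the GAP operation.

```python
import torch
import torch.nn.functional as F

def boundary_prototype_contrast(p_ba, p_bn, p_f, margin=0.5):
    """Eq. (6): push adjacent-boundary background prototypes (anchors) toward
    the pooled non-adjacent background prototype (positive) and away from the
    pooled foreground prototype (negative).

    p_ba: (K2, C) adjacent-boundary background prototypes
    p_bn: (K3, C) non-adjacent-boundary background prototypes
    p_f:  (K1, C) foreground prototypes
    """
    positive = p_bn.mean(dim=0, keepdim=True)    # Eq. (4), GAP over the prototype set
    negative = p_f.mean(dim=0, keepdim=True)     # Eq. (5)
    d_pos = torch.norm(p_ba - positive, dim=1)   # Euclidean distances to positive
    d_neg = torch.norm(p_ba - negative, dim=1)   # and to negative
    return F.relu(d_pos - d_neg + margin).sum()
```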

3.4. Query Guidance Prototype Optimization Module


Currently, most previous prototype-based FSS methods for medical im-
ages [18, 46, 45] have adopted various strategies to obtain more representative
prototypes from the support feature, while they often overlook the specific
requirements of query images. In medical images, the appearance of the same
organ may vary greatly across different patients or imaging devices. Because support and query images are selected randomly, this variation further contributes to intra-class inconsistency, which may disrupt the support-query prototype learning process. To tackle this challenge, we introduce the query guidance prototype optimization (QGPO) module to mitigate intra-class inconsistency.

Figure 5: The structural diagram of the query prototype generation module.

Specifically, in the QGPO module, we design a query-guided
support foreground prototype optimization strategy that reweights support
foreground prototypes by evaluating the importance of various parts of sup-
port prototypes for the current query image segmentation, aiming to make
support prototypes more adaptable to the query image content.
As shown in Figure 4(b), in the QGPO module, we first introduce the query prototype generation (QPG) module to generate the query prototype $P_q$ according to the current support foreground prototype $P_s^f$ and background prototype $P_s^b$. As shown in Figure 5, following [20, 43], in the QPG module, the compressed target region features are combined with learned soft thresholds to predict the query foreground and background masks jointly. The specific process is as follows:

$$M_q^f = 1 - \sigma\big(0.5\,(S(\mathrm{GAP}(P_s^f), F_q) - \tau_1)\big) \qquad (7)$$

$$M_q^b = 1 - \sigma\big(0.5\,(S(\mathrm{GAP}(P_s^b), F_q) - \tau_2)\big) \qquad (8)$$

where $M_q^f$ and $M_q^b$ denote the query foreground and background masks, respectively, $S(x, y) = -\alpha\,\mathrm{Cos}(x, y)$ denotes the scaled negative cosine similarity with scaling factor $\alpha = 20$, $\sigma$ denotes the Sigmoid function, and $\tau_1$, $\tau_2$ are learnable thresholds derived from the ResNet-50 features through two distinct fully connected layers. The query prediction mask $M_q$ is then obtained as:

$$M_q = \lambda M_q^f + (1 - \lambda)(1 - M_q^b) \qquad (9)$$

Based on this, the query prototype $P_q$ can be generated using the MAP operation:

$$P_q = \mathrm{MAP}(F_q, M_q) = \frac{1}{|M_q|}\sum_{i=1}^{HW} F_{q,i}\, M_{q,i} \qquad (10)$$
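
For illustration, a compact PyTorch sketch of the QPG computation in Eqs. (7)-(10) is given below; the helper names, the fixed λ, and the way the learnable thresholds are passed in are readability assumptions rather than the authors' exact interface.

```python
import torch
import torch.nn.functional as F

def qpg_masks(fg_protos, bg_protos, query_feat, tau_f, tau_b, alpha=20.0, lam=0.5):
    """Eqs. (7)-(10): soft-threshold query masks and the resulting query prototype.

    fg_protos:  (K1, C) support foreground prototypes P_s^f
    bg_protos:  (Kb, C) support background prototypes P_s^b
    query_feat: (C, H, W) query feature F_q
    tau_f, tau_b: learnable scalar thresholds tau_1 and tau_2
    """
    c, h, w = query_feat.shape
    fq = query_feat.flatten(1).t()                            # (HW, C)

    def neg_scaled_cos(protos):                               # S(x, y) = -alpha * Cos(x, y)
        p = protos.mean(dim=0, keepdim=True)                  # GAP over the prototype set
        return -alpha * F.cosine_similarity(fq, p, dim=1)     # (HW,)

    m_f = 1 - torch.sigmoid(0.5 * (neg_scaled_cos(fg_protos) - tau_f))  # Eq. (7)
    m_b = 1 - torch.sigmoid(0.5 * (neg_scaled_cos(bg_protos) - tau_b))  # Eq. (8)
    m_q = lam * m_f + (1 - lam) * (1 - m_b)                   # Eq. (9)

    p_q = (m_q.unsqueeze(1) * fq).sum(0) / m_q.sum().clamp_min(1e-6)    # Eq. (10)
    return m_f.view(h, w), m_b.view(h, w), m_q.view(h, w), p_q
```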

Then, given the support prototype $P_s^f \in \mathbb{R}^{K_1 \times C}$ and $P_q \in \mathbb{R}^{1 \times C}$, we feed them into the bias-alleviation Mamba (BaM) module to correct $P_s^f$ and regenerate the updated support prototype $\hat{P}_s^f \in \mathbb{R}^{K_1 \times C}$. Specifically, we first compute the similarity matrix $S = \{\mathrm{Cos}(p_{s,n}^{f}, P_q)\}_{n=1}^{K_1} \in \mathbb{R}^{K_1 \times 1}$ to reveal the correspondence between the query prototype and each support prototype. Then, we propose a selective mapping mechanism to filter out support prototypes irrelevant to the query. This process can be expressed as:

$$\lambda_i = \begin{cases} S_i & \text{if } S_i > 0 \\ -\infty & \text{otherwise} \end{cases}, \quad i \in \{1, \ldots, K_1\} \qquad (11)$$

where $\lambda \in \mathbb{R}^{K_1 \times 1}$ denotes the filtered similarity matrix.

Subsequently, we apply the Softmax function and matrix multiplication to reallocate weights on $P_s^f$, aiming to eliminate inconsistent parts within the support prototypes. The preliminarily rectified support prototype $\tilde{P}_s^f \in \mathbb{R}^{K_1 \times C}$ can be expressed as:

$$\tilde{P}_s^f = \mathrm{Softmax}(\lambda) \otimes P_s^f \qquad (12)$$
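
The selective mapping and reweighting of Eqs. (11)-(12) can be sketched as follows; the broadcasted multiplication is one reasonable reading of the ⊗ in Eq. (12), and the code assumes that at least one support prototype has positive similarity to the query.

```python
import torch
import torch.nn.functional as F

def query_guided_reweighting(support_protos, query_proto):
    """Eqs. (11)-(12): keep only support prototypes with positive cosine
    similarity to the query prototype, then reweight them with a softmax.

    support_protos: (K1, C) foreground prototypes P_s^f
    query_proto:    (C,) query prototype P_q
    """
    q = query_proto.reshape(1, -1)
    sim = F.cosine_similarity(support_protos, q, dim=1)                   # S, shape (K1,)
    lam = torch.where(sim > 0, sim, torch.full_like(sim, float('-inf')))  # Eq. (11)
    weights = torch.softmax(lam, dim=0)                                   # filtered entries get weight 0
    return weights.unsqueeze(1) * support_protos                          # Eq. (12), shape (K1, C)
```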

Finally, we introduce the Mamba model [22], which takes the prototype $\tilde{P}_s^f$ as an input sequence to capture essential information while maintaining long-range dependencies. To be specific, Mamba uses the selective state space model (SSM) to identify key information from the input $\tilde{P}_s^f$ and generate the refined $\hat{P}_s^f$. As shown in Figure 4(b), the Mamba block consists of layer normalization (LN), a linear layer (Linear), a 1D convolution (Conv1d), the selective SSM, the SiLU activation [54], and a residual connection. Given the input matrix x, the output matrix y is computed by the Mamba block as follows:

$$x_1 = \mathrm{SelectiveSSM}(\mathrm{SiLU}(\mathrm{Conv1d}(\mathrm{Linear}(\mathrm{LN}(x))))) \qquad (13)$$

$$x_2 = \mathrm{SiLU}(\mathrm{Linear}(\mathrm{LN}(x))) \qquad (14)$$

$$y = \mathrm{Linear}(x + (x_1 \odot x_2)) \qquad (15)$$

where $\odot$ denotes the element-wise product. Please refer to Mamba [22] for a more detailed explanation of the selective SSM approach.
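
Below is a structural sketch of the block in Eqs. (13)-(15); the gating, normalization, depthwise Conv1d, SiLU, and residual path follow the equations, but the selective SSM itself is replaced by a simple causal cumulative-average mixer as a placeholder, so this is not the actual Mamba [22] implementation.

```python
import torch
import torch.nn as nn

class GatedSSMBlockSketch(nn.Module):
    """Sketch of the bias-alleviation Mamba block, Eqs. (13)-(15)."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj_a = nn.Linear(dim, dim)
        self.in_proj_b = nn.Linear(dim, dim)
        self.conv1d = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)
        self.act = nn.SiLU()

    def sequence_mixer(self, x):
        # Placeholder for SelectiveSSM: causal cumulative average along the
        # prototype sequence; swap in a real selective SSM for fidelity.
        steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return torch.cumsum(x, dim=1) / steps

    def forward(self, x):                       # x: (B, K1, C) prototype sequence
        h = self.norm(x)
        x1 = self.act(self.conv1d(self.in_proj_a(h).transpose(1, 2)).transpose(1, 2))
        x1 = self.sequence_mixer(x1)            # Eq. (13)
        x2 = self.act(self.in_proj_b(h))        # Eq. (14)
        return self.out_proj(x + x1 * x2)       # Eq. (15)
```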
To enhance bias alleviation, we employ a stacked arrangement of QGPO modules to iteratively update the support and query prototypes; experimental results show that a stack of three modules achieves the best effect (see Section 4.3 for details), so the number of modules is set to 3.

In summary, in the QGPO module, the current support foreground prototype $P_s^f$ is first fed into the QPG module to generate the query prototype $P_q$. Subsequently, both $P_s^f$ and $P_q$ are input into the bias-alleviation Mamba (BaM) module to generate the updated support foreground prototype $\hat{P}_s^f$. This updated $\hat{P}_s^f$ then replaces $P_s^f$, and the entire process is repeated three times. The update process can be expressed as:

$$P_q = \mathrm{QPG}(P_s^f, P_s^b, F_q) \qquad (16)$$

$$\hat{P}_s^f = \mathrm{BaM}(P_s^f, P_q) \qquad (17)$$

Following optimization by the QGPO module, the support foreground prototype $P_s^f$, combined with the support background prototype $P_s^b$, is used to predict the final query image mask $M_q$ through Eqs. (7), (8), and (9).
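
The iterative update of Eqs. (16)-(17) can be sketched as a short loop over the illustrative helpers defined above (qpg_masks, query_guided_reweighting, and a GatedSSMBlockSketch instance standing in for BaM); the loop count follows the M = 3 setting reported in Section 4.3.

```python
def qgpo_refine(fg_protos, bg_protos, query_feat, tau_f, tau_b, bam, steps=3):
    """Stacked QGPO updates: query prototype -> reweighting -> BaM refinement."""
    for _ in range(steps):
        _, _, _, p_q = qpg_masks(fg_protos, bg_protos, query_feat, tau_f, tau_b)  # Eq. (16)
        reweighted = query_guided_reweighting(fg_protos, p_q)                     # Eqs. (11)-(12)
        fg_protos = bam(reweighted.unsqueeze(0)).squeeze(0)                       # Eq. (17)
    return fg_protos
```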

3.5. Loss Function

We use the segmentation loss $\mathcal{L}_{seg}$ and the contrastive loss $\mathcal{L}_{contrast}$ to train the proposed PONet. The segmentation loss computes the cross-entropy between the predicted masks for the support and query samples and their respective ground truth (GT), and the contrastive loss is calculated as in Eq. (6).

$$\mathcal{L}_{total} = \mathcal{L}_{seg} + \mathcal{L}_{contrast} \qquad (18)$$

$\mathcal{L}_{seg}$ consists of two separate loss components: the query loss $\mathcal{L}_{query}$ and the support loss $\mathcal{L}_{support}$,

$$\mathcal{L}_{seg} = \alpha_1 \mathcal{L}_{query} + \alpha_2 \mathcal{L}_{support} \qquad (19)$$

where $\alpha_1$ and $\alpha_2$ are balance coefficients, set to 0.6 and 0.4, respectively.

The query loss $\mathcal{L}_{query}$ measures the error between the final query prediction mask $M_q$ and the corresponding GT $\tilde{M}_q$:

$$\mathcal{L}_{query} = \mathrm{BCE}(\tilde{M}_q, M_q) \qquad (20)$$

where BCE denotes the binary cross-entropy loss. Furthermore, for the support loss $\mathcal{L}_{support}$, we first generate the support mask $M_s$ by combining the query prediction mask $M_q$ with the support foreground prototype $P_s^f$ and background prototype $P_s^b$, as described in Eqs. (7), (8), and (9). Subsequently, $\mathcal{L}_{support}$ measures the error between the corresponding GT $\tilde{M}_s$ and the predicted mask $M_s$:

$$\mathcal{L}_{support} = \mathrm{BCE}(\tilde{M}_s, M_s) \qquad (21)$$
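
A minimal sketch of Eqs. (18)-(21), assuming predicted foreground probabilities and binary ground-truth masks as inputs and reusing the contrastive term from the BPCL sketch:

```python
import torch.nn.functional as F

def total_loss(query_pred, query_gt, support_pred, support_gt, l_contrast,
               a1=0.6, a2=0.4):
    """Eqs. (18)-(21): weighted BCE segmentation losses plus the contrastive term.

    query_pred / support_pred: predicted foreground probabilities in [0, 1]
    query_gt / support_gt:     binary ground-truth masks
    l_contrast:                output of boundary_prototype_contrast()
    """
    l_query = F.binary_cross_entropy(query_pred, query_gt.float())         # Eq. (20)
    l_support = F.binary_cross_entropy(support_pred, support_gt.float())   # Eq. (21)
    l_seg = a1 * l_query + a2 * l_support                                  # Eq. (19)
    return l_seg + l_contrast                                              # Eq. (18)
```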

4. Experiments
4.1. Experimental Setup
1) Benchmark Datasets: We evaluated the proposed PONet on three pub-
licly available datasets, including Abd-MRI [55], Abd-CT [56], and Card-MRI
[57]. Specifically, Abd-MRI is an abdominal MRI dataset from the ISBI 2019
Combined Healthy Abdominal Organ Segmentation Challenge (CHAOS). It
includes 20 3D T2-SPIR MRI scans, each with an average of 36 slices. Abd-
CT is an abdominal CT dataset from the MICCAI 2015 Multi-Atlas Abdom-
inal Labeling challenge, consisting of 30 3D abdominal CT scans. Abd-MRI
and Abd-CT share the same annotation classes, which include the liver, left
kidney (LK), right kidney (RK), and spleen. Card-MRI, from the MICCAI
2019 multi-sequence cardiac MRI segmentation challenge, includes 35 3D car-
diac MRI scans with an average of 13 slices per scan. The annotation labels
for Card-MRI include left ventricle myocardium (LV-MYO), left ventricular
blood pool (LV-BP), and right ventricle (RV).
2) Implementation Details: Our PONet is implemented using PyTorch
(v1.10.2) on the NVIDIA RTX 3090 GPU. During training, we utilize the
standard feature extraction approach, employing the ResNet-50 pre-trained
on the MS-COCO dataset as the backbone of the feature extractor. Refer to
[19] and [58] for the data pre-processing pipeline. Adopting the meta-learning
strategy, we configure the network for 1-way 1-shot learning, performing 50K
iterations with a batch size of 1. The network is optimized using the SGD
[59] optimizer, with an initial learning rate of 0.001 and a step decay of 0.8 every 1000 iterations. We select support and query slices according to
the strategy in [43] and use five-fold cross-validation for training and testing,
recording the final average values.
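
The optimization schedule described above can be written, for example, as the following PyTorch sketch; model, training_step, and episode_loader are placeholders rather than the released training script.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)            # initial lr 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.8)

for iteration in range(50_000):                                      # 50K episodes, batch size 1
    loss = training_step(model, next(episode_loader))                # placeholder helpers
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                                 # decay 0.8 every 1000 iterations
```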
3) Experiment Settings: In the evaluation phase, we repeat the training
for each experimental scenario 5 times and use the Dice Similarity Coefficient

(DSC) and the Boundary F1 (BF1) score to record the mean experimental
results, thereby assessing the similarity between the predicted mask and the
ground truth. The BF1 score is designed to assess the segmentation quality
on the boundaries. It evaluates the alignment between the dilated predicted
boundaries and the ground truth boundaries, disregarding the performance
on interior pixels and concentrating solely on the precision of the boundary
delineation. We set the dilation radius to 0.75% of the image diagonal length (in pixels) when calculating the BF1 score. Following [19, 46], we also use
two different supervision settings to evaluate the generalization ability of the
proposed PONet to novel samples. Specifically, in Setting 1 : test classes are
allowed to appear in the background of training slices. This situation makes
it possible for test classes to participate in the training implicitly and not
be treated as truly invisible new classes, while in Setting 2, slices containing
test classes are forcefully removed from the training data to ensure that test
classes are truly invisible. Notably, setting 2 does not apply to Card-MRI
because all organ classes are usually simultaneously present on one slice.
Therefore, we only consider setting 1 for Card-MRI.
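
For reference, the DSC used throughout the evaluation can be computed as follows (a standard definition, shown here for binary numpy masks; the BF1 boundary matching is omitted for brevity):

```python
import numpy as np

def dice_score(pred, gt, eps=1e-6):
    """Dice Similarity Coefficient between predicted and ground-truth binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```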

4.2. Comparison With State-of-the-Art Methods


1) Quantitative Results: Tables 2-3 display a quantitative comparison of
the proposed PONet with a series of classical FSS methods, including SSL-
ALPNet [19], SR&CL [42], ADNet++ [20], Q-Net [43], CRAPNet [17], CAT
[18], RPT [44], PFMNet [45], DGPANet [46], CGNet [41], PGRNet [47],
ViT-CAPS [48], along with SAM-based FSS methods including BiASAM
[49] and MASG-SAM [50], as well as supervised learning methods such as
U-Net [23] and ISONet [60]. The quantitative comparison is conducted on
the Abd-MRI, Abd-CT, and Card-MRI datasets.
As shown in Tables 2 and 3, the DSC and BF1 scores obtained by the
proposed method outperform all the listed methods under two experimental
settings. Specifically, for the Abd-MRI dataset, our PONet has the high-
est mean DSC score under settings 1 and 2, with 85.96% and 81.83%, re-
spectively, and outperforms the suboptimal method DGPANet and RPT by
2.26% and 2.30%, respectively. Meanwhile, our method achieved the highest
mean BF1 scores of 77.49% and 75.53%, indicating more accurate segmen-
tation that is closer to the boundaries. For the Abd-CT dataset, our PONet
provides better segmentation results for organs such as the RK, LK, spleen,
and liver. The mean DSC score outperforms the suboptimal methods DG-
PANet and RPT by 3.62% and 3.61% under settings 1 and 2, respectively.

Table 2: Experimental comparison results on the Abd-MRI and Abd-CT datasets.

Setting  Method  Source  Year  |  Abd-MRI: LK  RK  Spleen  Liver  mDSC  mBF1  |  Abd-CT: LK  RK  Spleen  Liver  mDSC  mBF1
SSL-ALPNet[19] TMI 2022 81.92 85.18 72.18 76.10 78.84 65.26 72.36 71.81 70.96 78.29 73.35 61.43
SR&CL[42] MICCAI 2022 79.34 87.42 76.01 80.23 80.77 67.35 73.45 71.22 73.41 76.06 73.53 64.22
ADNet++[20] MIA 2023 86.80 86.62 75.69 74.85 80.99 66.35 53.47 50.29 65.76 74.24 60.94 51.27
Q-Net[43] IntelliSys 2023 78.36 87.98 75.99 81.74 81.02 68.22 76.89 71.87 76.31 77.08 75.54 65.31
CRAPNet[17] WACV 2023 81.95 86.42 74.32 76.46 79.79 67.63 74.69 74.18 70.37 75.41 73.66 63.91
CAT[18] MICCAI 2023 74.01 78.90 68.83 78.98 75.18 64.36 63.36 60.05 67.65 75.31 66.59 58.78
RPT[44] MICCAI 2023 81.83 88.73 76.37 82.59 82.38 70.77 76.52 80.57 72.38 81.32 77.69 69.48
PFMNet[45] CMIG 2024 77.48 81.35 72.33 73.55 76.17 65.88 70.32 75.48 68.52 69.36 70.92 62.81
1
DGPANet[46] TIM 2024 85.84 86.99 79.62 81.31 83.70 75.45 82.67 79.56 83.28 65.59 77.77 71.52
CGNet[41] CMIG 2025 80.43 82.69 76.33 77.31 79.19 70.62 75.26 70.38 73.35 73.22 73.05 66.17
PGRNet[47] TMI 2025 81.44 87.44 81.72 83.27 83.47 74.96 74.23 79.88 72.09 82.48 77.17 71.37
ViT-CAPS[48] Neuro. 2025 80.59 85.82 77.38 78.93 80.68 70.63 78.69 76.38 76.60 78.82 70.52 61.77
BiASAM[49] ISPL 2025 82.35 83.50 76.89 78.82 80.39 72.21 76.58 75.36 76.21 77.35 76.37 70.73
MASG-SAM[50] JBHI 2025 84.36 86.58 77.65 82.35 82.73 74.36 76.85 77.16 78.09 77.82 77.48 72.61
U-Net[23] MICCAI 2015 80.64 80.35 75.39 77.22 78.40 71.33 76.33 74.56 76.36 78.25 76.37 69.67
ISONet[60] ESWA 2025 81.28 84.52 76.58 78.63 80.25 71.67 77.25 76.85 74.44 76.89 76.35 70.25
Ours - - 88.66 90.26 80.32 84.60 85.96 77.49 80.65 80.89 79.19 83.63 81.39 75.45
SSL-ALPNet[19] TMI 2022 73.63 78.39 67.02 73.05 73.02 61.22 63.34 54.82 60.25 73.65 63.02 54.43
SR&CL[42] MICCAI 2022 77.07 84.24 73.73 75.55 77.65 72.41 67.39 63.37 67.36 73.63 67.94 62.77
ADNet++[20] MIA 2023 76.25 77.82 69.88 70.65 73.65 67.21 45.62 45.36 61.76 68.42 55.30 48.07
Q-Net[43] IntelliSys 2023 64.81 65.94 65.37 78.25 68.59 60.88 65.67 51.47 63.38 77.07 64.40 57.21
CRAPNet[17] WACV 2023 74.66 82.77 70.82 73.82 75.52 68.58 70.91 67.33 70.17 70.45 69.72 63.35
CAT[18] MICCAI 2023 75.31 83.23 67.31 75.02 75.22 70.31 68.82 64.56 66.02 80.51 70.88 63.74
RPT[44] MICCAI 2023 74.51 86.73 75.80 81.09 79.53 72.32 72.36 67.54 71.95 74.13 71.49 66.21
PFMNet[45] CMIG 2024 72.11 74.35 68.35 69.31 71.03 66.35 66.32 69.35 65.30 65.21 66.54 58.28
2
DGPANet[46] TIM 2024 73.76 75.96 74.10 69.21 73.72 68.21 74.10 68.06 65.91 65.56 68.41 62.33
CGNet[41] CMIG 2025 76.38 77.25 74.36 72.09 75.02 69.28 70.23 66.36 69.39 70.82 69.20 62.33
PGRNet[47] TMI 2025 77.38 81.25 73.58 78.48 77.67 70.58 69.72 67.88 68.36 75.35 70.32 64.13
ViT-CAPS[48] Neuro. 2025 74.63 79.36 74.21 75.06 75.81 70.88 73.48 69.14 67.25 70.21 70.02 64.11
BiASAM[49] ISPL 2025 73.55 77.43 72.80 73.98 74.44 67.59 70.66 68.17 67.89 73.16 70.47 64.33
MASG-SAM[50] JBHI 2025 76.82 82.58 73.55 79.09 78.01 72.43 70.85 69.16 70.38 75.08 71.36 65.12
U-Net[23] MICCAI 2015 74.35 77.31 68.31 70.02 72.49 66.36 68.20 66.31 69.37 71.25 68.78 62.57
ISONet[60] ESWA 2025 73.06 76.58 71.35 72.33 73.33 66.21 69.33 67.39 68.21 70.25 68.79 63.11
Ours - - 78.83 87.23 76.62 81.66 81.83 75.53 76.21 72.12 74.53 77.56 75.10 68.33

* Best results are shown in bold, suboptimal results are indicated by a horizontal line.

Our method also demonstrates superior performance in delineating organ boundaries, with mBF1 scores of 75.45% and 68.33%. In addition, on the
Card-MRI dataset, we observe that the proposed PONet significantly im-
proves the segmentation of organs LV-BP, LV-MYO, and RV, and the mean
DSC and mBF1 scores are increased by 0.80% and 1.32% compared with the
second-best method, DGPANet.
2) Qualitative Results: To visually assess the segmentation performance
of our PONet, we present qualitative results in Figures 6, 7, comparing our
PONet with the latest baseline methods across three datasets Abd-MRI,
Abd-CT, and Card-MRI. For the Abd-MRI dataset, the proposed PONet

Table 3: Experimental comparison results on the Card-MRI dataset.

Setting  Method  Source  Year  |  Card-MRI: LV-BP  LV-MYO  RV  mDSC  mBF1
SSL-ALPNet[19] TMI 2022 83.99 66.74 79.96 76.90 68.35
SR&CL[42] MICCAI 2022 84.74 65.83 78.41 76.32 69.22
ADNet++[20] MIA 2023 82.79 58.67 67.57 69.68 64.43
Q-Net[43] IntelliSys 2023 90.25 65.92 78.19 78.15 71.69
CRAPNet[17] WACV 2023 83.02 65.48 78.27 75.59 69.08
CAT[18] MICCAI 2023 90.54 66.85 79.71 79.03 73.36
RPT[44] MICCAI 2023 89.57 66.82 80.17 78.85 74.66
PFMNet[45] CMIG 2024 86.35 61.58 74.38 74.10 69.31
1
DGPANet[46] TIM 2024 89.82 67.62 80.09 79.18 74.13
CGNet[41] CMIG 2025 87.82 64.28 75.33 75.81 68.19
PGRNet[47] TMI 2025 88.52 62.59 77.47 76.52 70.22
ViT-CAPS[48] Neuro. 2025 86.53 60.86 76.57 74.65 67.39
BiASAM[49] ISPL 2025 88.12 63.59 77.23 76.31 70.28
MASG-SAM[50] JBHI 2025 89.35 65.93 78.88 78.05 73.36
U-Net[23] MICCAI 2015 84.28 61.39 75.48 73.71 67.77
ISONet[60] ESWA 2025 86.86 61.83 75.85 74.84 65.25
Ours - - 91.44 67.85 80.65 79.98 75.45
* Best results are shown in bold, suboptimal results are indicated by a horizontal line.

accurately identifies four organ regions and delineates the foreground edge
regions. Specifically, our method can accurately predict the intact bound-
ary of the liver and spleen organ compared with other methods, effectively
mitigating inconsistency caused by background similarity. Additionally, it
exhibits superior performance in capturing edge details for the LK and RK
organs. For the Abd-CT and Card-MRI datasets, our method can provide a
more comprehensive depiction of organ boundaries compared to other base-
line methods, effectively mitigating false segmentation issues caused by am-
biguous boundaries.

Figure 6: Qualitative results on the Abd-MRI and Abd-CT datasets under Setting 1.

Figure 7: Qualitative results on the Card-MRI dataset under Setting 1.

4.3. Ablation Study and Variable Analysis


In this section, we perform an ablation study to evaluate the effectiveness
of each component and hyperparameter configuration within the proposed
method. All ablation experiments are conducted on the Abd-MRI dataset
under setting 1.

1) Combined Ablation Analysis for BPCL and QGPO Modules: In the
proposed PONet, we introduce the BPCL and QGPO modules to enhance
segmentation performance for medical FSS tasks. To evaluate the effective-
ness of these modules, we conduct a joint ablation study. Specifically, we
design three different experimental settings. Setting (a) removes the BPCL and QGPO modules from the PONet architecture, directly guiding the query segmentation with prototypes generated from the support image. Settings (b) and (c) build upon setting (a) by incorporating the proposed BPCL and QGPO modules, respectively.
The experimental results are shown in Table 4. By comparing setting (b)
with setting (a) and our method with setting (c), we evaluate the network’s
performance with and without the BPCL module. The quantitative results
show that the mean DSC score improves by 6.53% and 4.25%, respectively.
Furthermore, as shown in the visualization results in Figure 8, the LK and
spleen organs often experience false segmentation due to boundary confusion
and organ similarity. Our method introduces the BPCL module to reduce
the class inconsistency between LK and the spleen, thereby achieving more
accurate segmentation results. For instance, in setting (a), the DSC for LK
is 72.55%, with 822 false positive (FP) pixels caused by boundary confusion.
In setting (b), the DSC for LK increases to 80.55%, with the FP pixels re-
duced to 283, demonstrating that the introduction of BPCL leads to a 65%
reduction in FP. By comparing setting (c) with setting (a) and our method
with setting (b), we evaluate the network’s performance with and without
the QGPO module. The quantitative results show that the mean DSC score
improves by 8.22% and 4.25%, respectively, demonstrating that the integra-
tion of the QGPO module effectively enhances segmentation performance.
Furthermore, the segmentation results in Figure 8 further illustrate that in-
corporating the QGPO module leads to more complete segmentation results.
In addition to the visual comparisons, the DSC box plot clearly shows the
distribution across different settings. As shown in Figure 9, the box plots of
mean DSC under different settings further highlight that the integration of
the BPCL and QGPO modules not only improves accuracy but also enhances
robustness.
To further verify the effectiveness of the proposed BPCL and QGPO
modules, we apply the t-SNE [61] to visualize the feature distributions under
different settings. The t-SNE is performed during the testing phase, and
each point represents a query feature. As shown in Figure 10, in setting
(a), we observed that the inter-class distance between features is small, while

the intra-class distance is large, resulting in insufficient distinguishability be-
tween feature samples. In contrast, after integrating BPCL and QGPO, the
inter-class distance between features increases, and the intra-class distance
decreases, thereby enhancing the distinguishability between feature samples.
The visualization clearly shows that query features of our method exhibit
intra-class cohesion and inter-class separation, which contributes to improved
segmentation performance.

Figure 8: The visual effect of integrating BPCL and QGPO modules.

Table 4: The joint performance analysis of BPCL and QGPO modules.

Setting      BPCL  QGPO  |  Abd-MRI: LK     RK     Spleen  Liver  mDSC
(a)          ✗     ✗        76.23  75.63  68.58   73.54  73.49 ± 4.64
(b)          ✓     ✗        82.38  83.21  74.58   79.83  80.02 ± 4.18
(c)          ✗     ✓        85.14  84.85  72.63   80.24  81.21 ± 3.70
Our method   ✓     ✓        88.66  90.26  80.32   84.60  85.96 ± 2.64

Figure 9: Compare the box plots of mean DSC under different settings on the Abd-MRI
dataset.

Figure 10: Visualization of t-SNE embedding for the query features. The colors purple,
blue, green, red, and yellow correspond to the background, LK, RK, liver, and spleen,
respectively.

2) Analysis of Super-pixel Blocks Parameter N: To improve the transfer

of support information, we propose using the SLIC method to over-segment
the supporting image and then extract multiple supporting foreground and
background sub-region prototypes. An ablation study is conducted to assess
the impact of the number of superpixel blocks N on PONet performance.
Specifically, we assess the DSC scores for four organs across various settings of
N. As shown in Figure 11, the DSC scores for each organ increased to varying
degrees as region N increased. More specifically, from N = 10, the DSC scores
of the four organs increased significantly, reaching the peak at N = 100,
whereas the DSC scores of the RK and LK organs began to decrease at N =
140. This trend suggests that increasing the number of partitions can enhance
the representativeness of subregions, thereby improving the refinement of
support prototypes by eliminating less relevant parts. However, for smaller-
shaped organs such as the LK and RK, an excessive number of subregions
can result in insufficient feature information within each block, potentially
introducing noise and degrading segmentation performance.

Figure 11: Analysis of super-pixel blocks parameter N.

3) Ablation analysis of the quantity of QGPO modules: In our method,


the network stacks M QGPO modules to effectively mitigate intra-class in-
consistency. To analyze its effectiveness, we configure the model with varying

M and evaluate the DSC scores for each organ. As shown in Figure 12, the
highest mean DSC score of 85.96 was achieved with M =3. Beyond this
configuration, additional stacking does not result in further improvements.
While increasing the number of modules helps accelerate model convergence,
too many stacked QGPO modules tend to overfit the model and thus fail to
achieve superior segmentation performance.
To further explore the QGPO module’s impact on performance across
networks with varying parameters, we introduce the parameter quantity
shifting-fitting performance (PQS-FP) coordinate system proposed by Xi-
ang et al. [62]. This framework categorizes models into two regions based
on their behavior as the number of parameters increases: the underfitting
decay region (UAR) and the overfitting deterioration region (OER). Specif-
ically, we begin by reducing the number of channels in the ResNet-50 back-
bone, changing the default configuration from C1 = [256, 512, 1024, 2048] to
C2 = [128, 256, 512, 1024]. We then compare the impact of varying the num-
ber of QGPO blocks on the network performance under these two configu-
rations. The quantitative experimental results are shown in Table 5. Then,
based on the network performance under different settings, we further plotted
the localization of these settings in the PQS-FP coordinate system. As shown
in Figure 13, the configurations with C1 and M ∈ {1, 2, 3, 5} fall into the UAR.
In this regime, increasing the number of parameters helps to alleviate under-
fitting, thereby improving segmentation accuracy. In contrast, the configura-
tions with C1 and M ∈ {7, 11} fall into the OER, where increasing the number
of parameters exacerbates overfitting: although model complexity increases,
segmentation performance degrades. This trend is consistent with the mean
DSC scores observed in Table 5. The PQS-FP coordinate system thus effec-
tively captures the relationship between the number of parameters and
segmentation performance when stacking different quantities of QGPO modules.

Figure 12: Mean DSC scores for different settings of M.

Table 5: DSC scores for different M module settings under varying channel numbers.

Channels  M   LK     RK     Spleen  Liver  mDSC
C1        1   85.38  86.59  74.63   83.14  82.43 ± 4.32
C1        2   86.91  88.83  76.63   83.83  84.05 ± 2.97
C1        3   88.66  90.26  80.32   84.60  85.96 ± 2.64
C1        5   88.61  90.12  79.11   84.30  85.03 ± 2.55
C1        7   88.62  90.01  78.25   84.25  85.10 ± 2.34
C1        11  85.26  87.32  75.38   83.62  82.89 ± 3.13
C2        1   83.42  84.86  71.37   79.09  79.68 ± 4.15
C2        2   84.83  86.77  73.62   80.92  81.53 ± 4.32
C2        3   86.27  88.73  77.87   82.47  83.83 ± 3.53
C2        5   87.56  89.36  78.58   83.89  84.84 ± 3.06
C2        7   88.60  90.18  78.87   84.21  85.46 ± 2.55
C2        11  86.53  87.69  76.85   83.57  83.66 ± 3.62

Figure 13: The positioning of different settings in the PQS-FP coordinate system.

4) Optimization sensitivity analysis: To further assess the stability and


robustness of the proposed method, we conducted an ablation study focused
on sensitivity analysis. Specifically, we compared the segmentation perfor-
mance and loss convergence behavior of the adopted SGD [59] optimizer with
three alternative optimizers: Adam [63], Adamax [63], and AdaBoB [64]. For
Adam, Adamax, and AdaBoB, we set the learning rate to r = 0.001, con-
sistent with the default setting of the SGD optimizer, while keeping other
parameters at their default values (β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸). As shown
in Table 6, the quantitative comparison demonstrates that the proposed
method maintains accurate segmentation performance across the different
optimizer settings, with a mean DSC above 85%, and exhibits stable loss
convergence.
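The following sketch shows how the optimizer settings compared in Table 6 can be instantiated, assuming a standard PyTorch training setup. AdaBoB is not part of torch.optim and is therefore omitted, and the SGD momentum is left at its default since only the learning rate is specified above.

```python
import torch

def build_optimizer(model, name: str = "sgd", lr: float = 1e-3):
    """Return one of the optimizers compared in Table 6 (AdaBoB omitted)."""
    params = model.parameters()
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, betas=(0.9, 0.999), eps=1e-8)
    if name == "adamax":
        return torch.optim.Adamax(params, lr=lr, betas=(0.9, 0.999), eps=1e-8)
    raise ValueError(f"unknown optimizer: {name}")
```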

Table 6: The quantitative experimental comparison results of different optimizers.

Optimizer Source Year mDSC Loss


Adam[63] ICLR 2015 85.77 ± 3.37 0.134
Adamax[63] ICLR 2015 85.13 ± 2.32 0.147
AdaBoB[64] PR 2025 85.48 ± 2.77 0.140
SGD[59] AMS 1951 85.96 ± 2.64 0.121

5) 1-shot versus multi-shot experimental analysis: To assess the perfor-


mance and generalization ability of the proposed method under varying num-
bers of support samples, we conducted comparative experiments with 1-shot,
5-shot, 10-shot, and 30-shot settings. The primary distinction between these
settings lies in the number of support samples used during model optimiza-
tion. In the 1-shot setting, a single support image is used to guide query
segmentation in each training iteration, while in the 5-shot, 10-shot, and
30-shot settings, five, ten, and thirty support images are utilized to assist
in model optimization. As shown in Table 7, the quantitative experimental
results indicate that the mean DSC increased by 2.88 percentage points in
the 5-shot setting compared to the 1-shot setting. Furthermore, the 10-shot and 30-shot settings
achieved slight improvements in the DSC, with the 30-shot setting reaching
the highest mean DSC of 88.88. The qualitative results presented in Figure 14
demonstrate that the 5-shot setting further reduces over-segmentation in the
LK, RK, and liver, while also alleviating under-segmentation in the spleen.
The above experimental results indicate that a certain number of support
samples, such as 5-shot and 10-shot, can lead to performance improvements.
However, increasing the number of support samples further does not result
in additional performance gains.
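As an illustration of how the k-shot settings can be realized, one common choice (assumed here, not necessarily PONet's exact fusion rule) is to average the prototypes extracted from the k support images before matching them against the query:

```python
from typing import List
import torch

def fuse_kshot_prototypes(per_shot_prototypes: List[torch.Tensor]) -> torch.Tensor:
    """Average the prototypes from k support images, each of shape (num_protos, C)."""
    return torch.stack(per_shot_prototypes, dim=0).mean(dim=0)
```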

Table 7: DSC scores for different shot settings.

Setting LK RK Spleen Liver mDSC


1-shot 88.66 90.26 80.32 84.60 85.96 ± 2.64
5-shot 91.32 92.43 84.18 87.43 88.84 ± 2.43
10-shot 91.38 92.46 84.20 87.45 88.87 ± 2.73
30-shot 91.39 92.45 84.22 87.46 88.88 ± 2.38

Figure 14: Comparison of qualitative segmentation results under the 1-shot and 5-shot
settings.

6) Complexity analysis:
The proposed PONet uses the Mamba module to refine support proto-
types. Mamba's success is mainly attributed to its ability to capture long-
range dependencies while maintaining linear complexity with respect to the
input sequence length, making it a promising alternative to CNNs and Trans-
formers. To validate its effectiveness, we conducted an ablation study, evaluating
the accuracy and complexity of the proposed method under different con-
figurations, including the Convolutional Block Attention Module (CBAM)
[65], Transformer [66], Gated Axial-Attention (GAA) [67], and Mamba. To
evaluate the trade-off between performance and cost, we report the results
of inference runtime, number of parameters (Params), floating-point oper-
ations (FLOPs), frames processed per second (FPS), and mean DSC. As
shown in Table 8, integrating Mamba into the QGPO module delivers opti-
mal adaptation performance, yielding the best DSC. Furthermore, in terms
of model efficiency, the Mamba-based model outperforms the other configu-
rations. These findings indicate that the introduction of Mamba effectively
balances performance and efficiency, making it suitable for resource-limited
clinical settings. Additionally, we believe that improving the efficiency of
the FSS model without compromising segmentation performance could be a
promising direction for future research.
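A rough sketch of how the efficiency metrics in Table 8 can be measured is given below: the parameter count and FPS come from the model itself and timed forward passes, while FLOPs usually require an external profiler and are omitted. The single-tensor input and the 256 × 256 resolution are assumptions made for illustration; an FSS model would typically take support and query inputs.

```python
import time
import torch

@torch.no_grad()
def efficiency_report(model, input_shape=(1, 3, 256, 256), n_runs=50):
    """Parameter count [M], mean inference time [s], and FPS for one model."""
    model.eval()
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    x = torch.randn(*input_shape)
    model(x)  # warm-up pass
    start = time.time()
    for _ in range(n_runs):
        model(x)
    elapsed = time.time() - start
    return {"Params [M]": params_m,
            "Inference [s]": elapsed / n_runs,
            "FPS": n_runs / elapsed}
```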

Table 8: Comparison of inference runtime [s], number of parameters (Params) [M],
floating-point operations (FLOPs) [G], frames processed per second (FPS), and mean
DSC across different configurations.

Model Source Year Inference Params FLOPs FPS mDSC


CBAM[65] ECCV 2018 0.43 27.45 21.21 8.13 82.86 ± 2.98
Transformer[66] NIPS 2017 0.51 38.63 28.63 7.73 84.62 ± 3.85
GAA[67] MICCAI 2021 0.42 22.15 19.68 8.35 82.52 ± 4.35
Mamba[22] ArXiv 2024 0.40 18.26 17.52 8.69 85.96 ± 2.64

5. Conclusion
In this work, we introduce PONet, a model specifically designed for few-
shot medical image segmentation. PONet aims to address common issues of
inter-class and intra-class inconsistency in medical image FSS tasks. Specif-
ically, we use the SLIC method to over-segment the support images and
generate multiple support foreground and background sub-regions using the
support mask. We then propose the BPCL module, which employs con-
trastive learning to reduce inconsistency from adjacent-boundary background
regions, decreasing the likelihood of them being activated as foreground and
thus minimizing inter-class inconsistency. Next, we introduce the QGPO
module, which employs a query-guided support foreground prototype opti-
mization strategy to make support foreground prototypes more adaptable to
the content of the query image, thereby minimizing intra-class inconsistency.
Extensive experimental results show that the proposed PONet outperforms
other SOTA methods.
Although our FSS method has made progress, it still faces
challenges with limited labeled data. In the future, we aim to explore more
effective approaches, such as very few-shot or zero-shot learning, to improve
the model’s performance in real-world scenarios with scarce annotations. In
addition, we will focus on addressing real-time performance in clinical envi-
ronments, optimizing the model to achieve instant response without compro-
mising accuracy.

Abbreviations
Computed tomography (CT); Magnetic resonance imaging (MRI); Few-shot segmentation (FSS); Prototype optimization network (PONet); Boundary prototype contrastive learning (BPCL); Query guidance prototype optimization (QGPO); Simple linear iterative clustering (SLIC); State-of-the-art (SOTA); Convolutional neural networks (CNN); Abdominal MRI (Abd-MRI); Abdominal CT (Abd-CT); Cardiac MRI (Card-MRI); Mask average pooling (MAP); Class-agnostic segmentation network (CANet); Large language models (LLMs); Cross feature module (CFM); Support guide query (SGQ); Search and filtering (S&F); Segment anything model (SAM); Global average pooling (GAP); Query prototype generation (QPG); Selective state space model (SSM); Layer normalization (LN); 1D convolution (Conv1d); Ground truth (GT); Binary cross-entropy (BCE); Combined Healthy Abdominal Organ Segmentation Challenge (CHAOS); Left kidney (LK); Right kidney (RK); Left ventricle myocardium (LV-MYO); Left ventricular blood pool (LV-BP); Right ventricle (RV); Dice Similarity Coefficient (DSC); Triplet loss (TL); Parameter quantity shifting-fitting performance (PQS-FP); Underfitting decay region (UAR); Overfitting deterioration region (OER); Convolutional Block Attention Module (CBAM); Gated Axial-Attention (GAA); Parameters (Params); Floating-point operations (FLOPs).

Declarations
This work was supported in part by the National Natural Science Foun-
dation of China under Grant nos. 62306187, 62403108, the Ministry of In-
dustry and Information Technology Project TC220H05X-04, the Liaoning
Provincial Natural Science Foundation Joint Fund 2023-MSBA-075 and the
Fundamental Research Funds for the Central Universities N2426005.

References
[1] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani,
J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al., The multi-
modal brain tumor image segmentation benchmark (brats), IEEE trans-
actions on medical imaging 34 (10) (2014) 1993–2024.

[2] Q. Zhu, H. Wang, B. Xu, Z. Zhang, W. Shao, D. Zhang, Multimodal


triplet attention network for brain disease diagnosis, IEEE Transactions
on Medical Imaging 41 (12) (2022) 3884–3894.

[3] M. V. Sherer, D. Lin, S. Elguindi, S. Duke, L.-T. Tan, J. Cacicedo,


M. Dahele, E. F. Gillespie, Metrics to evaluate the performance of auto-

segmentation for radiation treatment planning: A critical review, Ra-
diotherapy and Oncology 160 (2021) 185–191.

[4] L. Wang, Q. Dou, P. T. Fletcher, S. Speidel, S. Li, Medical image com-


puting and computer assisted intervention–miccai 2022, in: Proceedings
of the 24th International Conference, Strasbourg, France, Vol. 12901,
2021, pp. 109–119.

[5] M. Ali, H. Hu, T. Wu, M. Mansoor, Q. Luo, W. Zheng, N. Jin, Seg-


mentation of mri tumors and pelvic anatomy via cgan-synthesized data
and attention-enhanced u-net, Pattern Recognition Letters 187 (2025)
100–106. doi:https://doi.org/10.1016/j.patrec.2024.11.003.
URL https://www.sciencedirect.com/science/article/pii/
S016786552400309X

[6] M. Ali, H. Hu, T. Muhammad, M. A. Qureshi, T. Mahmood, Deep


learning and shape-driven combined approach for breast cancer tumor
segmentation, in: 2025 6th International Conference on Advancements
in Computational Sciences (ICACS), 2025, pp. 1–6. doi:10.1109/
ICACS64902.2025.10937847.

[7] S. S. Patil, M. Ramteke, A. S. Rathore, Permutation invariant self-


attention infused u-shaped transformer for medical image segmentation,
NEUROCOMPUTING 625 (APR 7 2025). doi:10.1016/j.neucom.
2025.129577.

[8] J. Gu, F. Tian, I.-S. Oh, Ramis: Increasing robustness and accuracy
in medical image segmentation with hybrid cnn-transformer synergy,
NEUROCOMPUTING 618 (FEB 14 2025). doi:10.1016/j.neucom.
2024.129009.

[9] Z. Cheng, J. Guo, J. Zhang, L. Qi, L. Zhou, Y. Shi, Y. Gao, Mamba-sea:


A mamba-based framework with global-to-local sequence augmentation
for generalizable medical image segmentation, IEEE Transactions on
Medical Imaging (2025) 1–1doi:10.1109/TMI.2025.3564765.

[10] J. Sun, K. Chen, S. Wang, Y. Zhang, Z. Xu, X. Wu, C. Tang, Dgfe-


mamba: Mamba-based 2d image segmentation network, JOURNAL
OF BIONIC ENGINEERING (2025 APR 26 2025). doi:10.1007/
s42235-025-00711-x.

[11] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson,
T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, R. Girshick,
Segment anything, in: 2023 IEEE/CVF International Conference on
Computer Vision (ICCV), Paris, France, 2023, pp. 3992–4003.
doi:10.1109/ICCV51070.2023.00371.

[12] M. Ali, T. Wu, H. Hu, Q. Luo, D. Xu, W. Zheng, N. Jin, C. Yang,


J. Yao, A review of the segment anything model (sam) for medi-
cal image analysis: Accomplishments and perspectives, COMPUT-
ERIZED MEDICAL IMAGING AND GRAPHICS 119 (JAN 2025).
doi:10.1016/j.compmedimag.2024.102473.

[13] J. Ma, Y. He, F. Li, L. Han, C. You, B. Wang, Segment anything in


medical images, NATURE COMMUNICATIONS 15 (1) (JAN 22 2024).
doi:10.1038/s41467-024-44824-z.

[14] X. Zhou, L. Yan, R. Ding, C. C. Atabansi, J. Nie, L. Chen, Y. Feng,


H. Liu, Mit-sam: Medical image-text sam with mutually enhanced het-
erogeneous features fusion for medical image segmentation, IEEE Jour-
nal of Biomedical and Health Informatics (2025) 1–14doi:10.1109/
JBHI.2025.3561425.

[15] Y. Song, T. Wang, P. Cai, S. K. Mondal, J. P. Sahoo, A comprehensive


survey of few-shot learning: Evolution, applications, challenges, and
opportunities, ACM Computing Surveys 55 (13s) (2023) 1–40.

[16] R. Feng, X. Zheng, T. Gao, J. Chen, W. Wang, D. Z. Chen, J. Wu,


Interactive few-shot learning: Limited supervision, better medical image
segmentation, IEEE Transactions on Medical Imaging 40 (10) (2021)
2575–2588.

[17] H. Ding, C. Sun, H. Tang, D. Cai, Y. Yan, Few-shot medical image


segmentation with cycle-resemblance attention, in: Proceedings of the
IEEE/CVF Winter Conference on Applications of Computer Vision,
2023, pp. 2488–2497.

[18] Y. Lin, Y. Chen, K.-T. Cheng, H. Chen, Few shot medical image seg-
mentation with cross attention transformer, in: H. Greenspan, A. Mad-
abhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood,
R. Taylor (Eds.), Medical Image Computing and Computer Assisted In-
tervention – MICCAI 2023, Springer Nature Switzerland, Cham, 2023,
pp. 233–243.

[19] C. Ouyang, C. Biffi, C. Chen, T. Kart, H. Qiu, D. Rueckert, Self-


supervised learning for few-shot medical image segmentation, IEEE
Transactions on Medical Imaging 41 (7) (2022) 1837–1848. doi:10.
1109/TMI.2022.3150682.

[20] S. Hansen, S. Gautam, S. A. Salahuddin, M. Kampffmeyer, R. Jenssen,


Adnet++: A few-shot learning framework for multi-class medical im-
age volume segmentation with uncertainty-guided feature refinement,
Medical Image Analysis 89 (2023) 102870.

[21] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Suesstrunk,


Slic superpixels compared to state-of-the-art superpixel methods, IEEE
TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE IN-
TELLIGENCE 34 (11) (2012) 2274–2281. doi:10.1109/TPAMI.2012.
120.

[22] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective


state spaces (2024). arXiv:2312.00752.
URL https://arxiv.org/abs/2312.00752

[23] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional net-


works for biomedical image segmentation, in: N. Navab, J. Horneg-
ger, W. Wells, A. Frangi (Eds.), MEDICAL IMAGE COMPUTING
AND COMPUTER-ASSISTED INTERVENTION, PT III, Vol. 9351
of Lecture Notes in Computer Science, Tech Univ Munich; Friedrich
Alexander Univ Erlangen Nuremberg, 2015, pp. 234–241, 18th In-
ternational Conference on Medical Image Computing and Computer-
Assisted Intervention (MICCAI), Munich, GERMANY, OCT 05-09,
2015. doi:10.1007/978-3-319-24574-4\_28.

[24] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, Unet plus plus :


Redesigning skip connections to exploit multiscale features in image seg-

mentation, IEEE TRANSACTIONS ON MEDICAL IMAGING 39 (6)
(2020) 1856–1867. doi:10.1109/TMI.2019.2959609.

[25] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, K. H. Maier-Hein,


nnu-net: a self-configuring method for deep learning-based biomedical
image segmentation, NATURE METHODS 18 (2) (2021) 203+. doi:
10.1038/s41592-020-01008-z.

[26] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Mis-


awa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., At-
tention u-net: Learning where to look for the pancreas, arXiv preprint
arXiv:1804.03999 (2018).

[27] A. Shaban, S. Bansal, Z. Liu, I. Essa, B. Boots, One-shot learning for


semantic segmentation, arXiv preprint arXiv:1709.03410 (2017).

[28] N. Dong, E. P. Xing, Few-shot semantic segmentation with prototype


learning., in: BMVC, Vol. 3, 2018, p. 4.

[29] X. Zhang, Y. Wei, Y. Yang, T. S. Huang, Sg-one: Similarity guidance


network for one-shot semantic segmentation, IEEE transactions on cy-
bernetics 50 (9) (2020) 3855–3865.

[30] Y. Liu, X. Zhang, S. Zhang, X. He, Part-aware prototype network


for few-shot semantic segmentation, in: Computer Vision–ECCV 2020:
16th European Conference, Glasgow, UK, August 23–28, 2020, Proceed-
ings, Part IX 16, Springer, 2020, pp. 142–158.

[31] B. Yang, C. Liu, B. Li, J. Jiao, Q. Ye, Prototype mixture models for few-
shot semantic segmentation, in: Computer Vision–ECCV 2020: 16th
European Conference, Glasgow, UK, August 23–28, 2020, Proceedings,
Part VIII 16, Springer, 2020, pp. 763–778.

[32] G. Li, V. Jampani, L. Sevilla-Lara, D. Sun, J. Kim, J. Kim, Adaptive


prototype learning and allocation for few-shot segmentation, in: Pro-
ceedings of the IEEE/CVF conference on computer vision and pattern
recognition, 2021, pp. 8334–8343.

[33] Y. Yang, Q. Chen, Y. Feng, T. Huang, Mianet: Aggregating unbiased


instance and general information for few-shot semantic segmentation,

in: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2023, pp. 7131–7140.
[34] L. Zhu, T. Chen, D. Ji, J. Ye, J. Liu, Llafs: When large language
models meet few-shot segmentation, in: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2024, pp.
3065–3075.
[35] Y. Yang, Y. Gao, L. Wei, M. He, Y. Shi, H. Wang, Q. Li, Z. Zhu,
Self-support matching networks with multiscale attention for few-shot
semantic segmentation, NEUROCOMPUTING 594 (AUG 14 2024).
doi:10.1016/j.neucom.2024.127811.
[36] X. Wang, Q. Chen, Y. Yang, Word vector embedding and self-
supplementing network for generalized few-shot semantic segmentation,
NEUROCOMPUTING 613 (JAN 14 2025). doi:10.1016/j.neucom.
2024.128737.
[37] Z. Wu, S. Zhao, Y. Zhang, Y. Jin, Defectsam: Prototype prompt
guided sam for few-shot defect segmentation, IEEE TRANSACTIONS
ON INSTRUMENTATION AND MEASUREMENT 74 (2025). doi:
10.1109/TIM.2025.3548183.
[38] J. Wang, Y. Liu, Q. Zhou, Z. Wang, F. Wang, Few-shot semantic seg-
mentation on remote sensing images with learnable prototype, IEEE
TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 63
(2025). doi:10.1109/TGRS.2025.3568475.
[39] P. Teng, W. Liu, X. Wang, D. Wu, C. Yuan, Y. Cheng, D.-S. Huang,
Beyond singular prototype: A prototype splitting strategy for few-shot
medical image segmentation, NEUROCOMPUTING 597 (SEP 7 2024).
doi:10.1016/j.neucom.2024.127990.
[40] K. Tang, S. Wang, Y. Chen, Cross modulation and region contrast learn-
ing network for few-shot medical image segmentation, IEEE Signal Pro-
cessing Letters (2024).
[41] W. Gong, Y. Luo, F. Yang, H. Zhou, Z. Lin, C. Cai, Y. Lin, J. Chen,
Cgnet: Few-shot learning for intracranial hemorrhage segmentation,
COMPUTERIZED MEDICAL IMAGING AND GRAPHICS 121 (APR
2025). doi:10.1016/j.compmedimag.2025.102505.

[42] R. Wang, Q. Zhou, G. Zheng, Few-shot medical image segmentation
regularized with self-reference and contrastive learning, in: L. Wang,
Q. Dou, P. T. Fletcher, S. Speidel, S. Li (Eds.), Medical Image Com-
puting and Computer Assisted Intervention – MICCAI 2022, Springer
Nature Switzerland, Cham, 2022, pp. 514–523.

[43] Q. Shen, Y. Li, J. Jin, B. Liu, Q-net: Query-informed few-shot medical


image segmentation, in: K. Arai (Ed.), Intelligent Systems and Appli-
cations, Springer Nature Switzerland, Cham, 2024, pp. 610–628.

[44] Y. Zhu, S. Wang, T. Xin, H. Zhang, Few-shot medical image segmenta-


tion via a region-enhanced prototypical transformer, in: H. Greenspan,
A. Madabhushi, P. Mousavi, S. Salcudean, J. Duncan, T. Syeda-
Mahmood, R. Taylor (Eds.), Medical Image Computing and Computer
Assisted Intervention – MICCAI 2023, Springer Nature Switzerland,
Cham, 2023, pp. 271–280.

[45] R. Wang, G. Zheng, Pfmnet: Prototype-based feature mapping net-


work for few-shot domain adaptation in medical image segmentation,
COMPUTERIZED MEDICAL IMAGING AND GRAPHICS 116 (SEP
2024). doi:10.1016/j.compmedimag.2024.102406.

[46] Y. Shen, W. Fan, C. Wang, W. Liu, W. Wang, Q. Zhang, D. Zhou,


Dual-guided prototype alignment network for few-shot medical image
segmentation, IEEE Transactions on Instrumentation and Measurement
(2024).

[47] W. Huang, J. Hu, J. Xiao, Y. Wei, X. Bi, B. Xiao, Prototype-guided


graph reasoning network for few-shot medical image segmentation, IEEE
TRANSACTIONS ON MEDICAL IMAGING 44 (2) (2025) 761–773.
doi:10.1109/TMI.2024.3459943.

[48] K. I. Rashid, C. Yang, Vit-caps: Vision transformer with contrastive


adaptive prompt segmentation, NEUROCOMPUTING 625 (APR 7
2025). doi:10.1016/j.neucom.2025.129578.

[49] W. Zhou, G. Guan, W. Cui, Y. Yi, Biasam: Bidirectional-attention


guided segment anything model for very few-shot medical image seg-
mentation, IEEE SIGNAL PROCESSING LETTERS 32 (2025) 246–
250. doi:10.1109/LSP.2024.3513240.

[50] W. Zhou, G. Guan, Y. Gao, P. Si, M. Xu, Q. Yan, Masg-sam: Enhanc-
ing few-shot medical image segmentation with multi-scale attention and
semantic guidance, IEEE Journal of Biomedical and Health Informatics
(2025) 1–12doi:10.1109/JBHI.2025.3571430.

[51] T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in


neural networks: A survey, IEEE transactions on pattern analysis and
machine intelligence 44 (9) (2021) 5149–5169.

[52] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image
recognition, in: 2016 IEEE CONFERENCE ON COMPUTER VI-
SION AND PATTERN RECOGNITION (CVPR), IEEE Conference
on Computer Vision and Pattern Recognition, IEEE Comp Soc; Comp
Vis Fdn, 2016, pp. 770–778, 2016 IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR), Seattle, WA, JUN 27-30, 2016.
doi:10.1109/CVPR.2016.90.

[53] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,


P. Dollar, C. L. Zitnick, Microsoft coco: Common objects in context,
in: D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.), COMPUTER
VISION - ECCV 2014, PT V, Vol. 8693 of Lecture Notes in Computer
Science, 2014, pp. 740–755, 13th European Conference on Computer
Vision (ECCV), Zurich, SWITZERLAND, SEP 06-12, 2014. doi:10.
1007/978-3-319-10602-1\_48.

[54] D. Hendrycks, K. Gimpel, Gaussian error linear units (gelus) (2023).


arXiv:1606.08415.
URL https://arxiv.org/abs/1606.08415

[55] A. E. Kavur, N. S. Gezer, M. Barış, S. Aslan, P.-H. Conze,


V. Groza, D. D. Pham, S. Chatterjee, P. Ernst, S. Özkan, B. Bay-
dar, D. Lachinov, S. Han, J. Pauli, F. Isensee, M. Perkonigg,
R. Sathish, R. Rajan, D. Sheet, G. Dovletov, O. Speck, A. Nürn-
berger, K. H. Maier-Hein, G. Bozdağı Akar, G. Ünal, O. Dicle,
M. A. Selver, Chaos challenge - combined (ct-mr) healthy abdomi-
nal organ segmentation, Medical Image Analysis 69 (2021) 101950.
doi:https://doi.org/10.1016/j.media.2020.101950.
URL https://www.sciencedirect.com/science/article/pii/
S1361841520303145

[56] B. Landman, Z. Xu, J. Igelsias, M. Styner, T. Langerak, A. Klein,
Miccai multi-atlas labeling beyond the cranial vault–workshop and
challenge, in: Proc. MICCAI Multi-Atlas Labeling Beyond Cranial
Vault—Workshop Challenge, Vol. 5, 2015, p. 12.

[57] X. Zhuang, Multivariate mixture model for myocardial segmentation


combining multi-source images, IEEE transactions on pattern analysis
and machine intelligence 41 (12) (2018) 2933–2946.

[58] S. Hansen, S. Gautam, R. Jenssen, M. Kampffmeyer, Anomaly


detection-inspired few-shot medical image segmentation through self-
supervision with supervoxels, Medical Image Analysis 78 (2022) 102385.

[59] H. Robbins, S. Monro, A stochastic approximation method, The annals


of mathematical statistics (1951) 400–407.

[60] Q. Xiang, X. Wang, Y. Song, L. Lei, Isonet: Reforming 1dcnn for


aero-engine system inter-shaft bearing fault diagnosis via input spatial
over-parameterization, EXPERT SYSTEMS WITH APPLICATIONS
277 (JUN 5 2025). doi:10.1016/j.eswa.2025.127248.

[61] L. van der Maaten, G. Hinton, Visualizing data using t-sne, JOURNAL
OF MACHINE LEARNING RESEARCH 9 (2008) 2579–2605.

[62] Q. Xiang, X. Wang, J. Lai, L. Lei, Y. Song, J. He, R. Li, Quadruplet


depth-wise separable fusion convolution neural network for ballistic tar-
get recognition with limited samples, EXPERT SYSTEMS WITH AP-
PLICATIONS 235 (JAN 2024). doi:10.1016/j.eswa.2023.121182.

[63] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in:
International Conference on Learning Representations (ICLR), San
Diego, CA, 2015.

[64] Q. Xiang, X. Wang, L. Lei, Y. Song, Dynamic bound adaptive gradient


methods with belief in observed gradients, PATTERN RECOGNITION
168 (DEC 2025). doi:10.1016/j.patcog.2025.111819.

[65] S. Woo, J. Park, J.-Y. Lee, I. S. Kweon, Cbam: Convolutional block


attention module, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss
(Eds.), COMPUTER VISION - ECCV 2018, PT VII, Vol. 11211 of
Lecture Notes in Computer Science, 2018, pp. 3–19, 15th European

Conference on Computer Vision (ECCV), Munich, GERMANY, SEP
08-14, 2018. doi:10.1007/978-3-030-01234-2\_1.

[66] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.


Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon,
U. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Gar-
nett (Eds.), ADVANCES IN NEURAL INFORMATION PROCESSING
SYSTEMS 30 (NIPS 2017), Vol. 30 of Advances in Neural Information
Processing Systems, 2017, 31st Annual Conference on Neural Informa-
tion Processing Systems (NIPS), Long Beach, CA, DEC 04-09, 2017.

[67] J. M. J. Valanarasu, P. Oza, I. Hacihaliloglu, V. M. Patel, Medi-


cal transformer: Gated axial-attention for medical image segmenta-
tion, in: M. DeBruijne, P. Cattin, S. Cotin, N. Padoy, S. Speidel,
Y. Zheng, C. Essert (Eds.), MEDICAL IMAGE COMPUTING AND
COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT I, Vol.
12901 of Lecture Notes in Computer Science, 2021, pp. 36–46, interna-
tional Conference on Medical Image Computing and Computer Assisted
Intervention (MICCAI), ELECTR NETWORK, SEP 27-OCT 01, 2021.
doi:10.1007/978-3-030-87193-2\_4.

Siqi Wang received the M.S. degree from Northeastern Uni-


versity in 2021. He is currently pursuing the Ph.D. degree in
the Faculty of Robot Science and Engineering, Northeastern
University, Shenyang, China. His research interests include
medical image segmentation, deep learning, and feature fu-
sion.

Xiaosheng Yu received the Ph.D. degree from Northeastern


University, China, in 2015. He is an associate professor with
the Faculty of Robot Science and Engineering, Northeastern
University, Shenyang, China. His research interests include
medical image processing, visual significance detection, and
abnormal event detection.

Jianning Chi received the Ph.D. degree in Computer Sci-
ence from the University of Saskatchewan, Canada, in 2017.
He is an associate professor with the Faculty of Robot Sci-
ence and Engineering, Northeastern University, Shenyang,
China. His research interests include image quality enhance-
ment, object recognition, and scene understanding.

Chengdong Wu received the M.S. degree from Tsinghua


University in 1988, and the Ph.D. degree in Industrial Au-
tomation, Northeastern University, Shenyang, China, in
1994. He is a professor with the Faculty of Robot Science and
Engineering, Northeastern University, Shenyang, China. His
research interests include machine vision technology, wireless
sensor networks, and intelligent image processing.

Xiujing Gao received the Ph.D. degree from the Tokyo


University of Marine Science and Technology, Japan. He
is currently a professor with the College of Intelligent Ma-
rine Science and Technology, Fujian University of Technol-
ogy, Fuzhou, China. His research interests include underwa-
ter robotics, unmanned vessels, and autonomous navigation.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: