
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

Ziyang Wang¹, Jian-Qing Zheng¹, Yichi Zhang², Ge Cui³, Lei Li⁴

¹ University of Oxford, UK
² Fudan University, China
³ University of Pittsburgh, USA
⁴ University of Copenhagen, DK

arXiv:2402.05079v1 [eess.IV] 7 Feb 2024

Abstract. In recent advancements in medical image analysis, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have set significant benchmarks. While the former excel at capturing local features through convolution operations, the latter achieve remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently modeling long-range dependencies within medical images, which is a critical aspect of precise segmentation. Inspired by the Mamba architecture, a State Space Model (SSM) known for its proficiency in handling long sequences and global contextual information with enhanced computational efficiency, we propose Mamba-UNet, a novel architecture that combines the U-Net design for medical image segmentation with Mamba's capabilities. Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network. This design facilitates a comprehensive feature learning process, capturing both intricate details and broader semantic contexts within medical images. We introduce a novel integration mechanism within the VMamba blocks to ensure seamless connectivity and information flow between the encoder and decoder paths, enhancing segmentation performance. We conducted experiments on a publicly available MRI cardiac multi-structure segmentation dataset. The results show that Mamba-UNet outperforms UNet and Swin-UNet in medical image segmentation under the same hyper-parameter setting¹. The source code and baseline implementations are available at https://github.com/ziyangwang007/Mamba-UNet.

Keywords: Medical Image Segmentation · Convolution · Transformer · Mamba · State Space Models.

¹ The hyper-parameter setting includes the loss function, optimizer, training iterations, batch size, learning rate, the same data split, etc.

1 Introduction

Medical image segmentation is essential for diagnosis and treatment, and deep learning-based networks have shown dominant performance in this field [20]. U-Net is one of the most essential architectures, known for its symmetrical encoder-decoder architecture and skip connections [24]: the encoder and decoder extract feature information at different levels, and the skip connections efficiently transfer feature information between them. Many studies further extend U-Net with advanced network blocks such as dense connections [12], residual blocks [10], attention mechanisms [30], depthwise convolutions [11], and atrous convolutions [33,35], resulting in various modified U-Nets for CT, MRI, and ultrasound image segmentation [23,13,15,29,34,36].

Fig. 1. A brief overview of the evolution of UNet with the incorporation of Transformers and State Space Models (SSMs) for medical image segmentation.

Motivated by the success of self-attention mechanisms in natural language processing [26], ViT was the first to use a pure multi-head self-attention mechanism for image recognition, achieving state-of-the-art performance [5]. This showcased its promising capability to model long-range dependencies. Techniques such as shifted windows have further tailored ViT, resulting in the Swin Transformer [18], which enhances its applicability to dense prediction tasks in computer vision, such as image segmentation and detection [19,31,17]. In medical image segmentation, the integration of ViT with U-Net architectures, inspired by traditional CNN designs, has also led to various hybrid and pure ViT-based U-Nets. For instance, TransUNet is the first work to harness the feature learning power of ViT in the encoder of UNet [4]. UNETR combines ViT with UNet for 3D segmentation [9], while Swin-UNet and DCSUnet further explore U-Net-based structures built purely from Swin Transformer blocks [3,28].
While Transformers excel at capturing long-range dependencies, their high computational cost, due to the quadratic scaling of the self-attention mechanism with input size, poses a challenge, particularly for high-resolution biomedical images [32,21]. Recent developments in State Space Models (SSMs) [6,22,27], especially Structured SSMs (S4) [8], offer a promising solution with their efficient performance in processing long sequences. The Mamba model enhances S4 with a selective mechanism and hardware optimization, showing superior performance in dense data domains [7]. The introduction of the Cross-Scan Module (CSM) in the Visual State Space Model (VMamba) further enhances Mamba's applicability to computer vision tasks by enabling traversal of the spatial domain and converting non-causal visual images into ordered patch sequences [16]. Inspired by these capabilities, we propose leveraging Visual State Space (VSS) blocks from Visual Mamba within the U-Net architecture to improve long-range dependency modeling in medical image analysis, resulting in Mamba-UNet. The evolution of U-Net with various network blocks and the positioning of our proposed Mamba-UNet are briefly illustrated in Figure 1.
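The linear-time property of SSMs can be illustrated with a toy recurrence. The Python sketch below is a simplified, scalar version of a selective state-space scan written for illustration only; it is not the Mamba implementation, which uses multi-dimensional states, learned projections, and a hardware-aware parallel scan. The parameters `A`, `w_delta`, `w_b`, and `w_c` are hypothetical scalars standing in for learned weights; making the step size and the input/output coefficients depend on the current input is what "selective" refers to here.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Toy scalar selective state-space scan over a 1-D sequence.

    Illustrative only: A, w_delta, w_b and w_c are hypothetical scalars,
    not learned Mamba parameters. Runs in a single O(L) pass.
    """
    h = 0.0  # hidden state carried along the sequence
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        a_bar = math.exp(delta * A)    # discretized state decay (A < 0)
        b_bar = delta * (w_b * x)      # input-dependent, Euler-discretized B
        h = a_bar * h + b_bar * x      # recurrent state update
        ys.append((w_c * x) * h)       # input-dependent readout C
    return ys


print(selective_scan([1.0, 0.5, -0.5, 2.0]))
```

Each token costs a constant amount of work, so the pass is O(L) in the sequence length L, whereas self-attention compares all L² token pairs.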

Fig. 2. The architecture of Mamba-UNet, which is composed of an encoder, a bottleneck, a decoder, and skip connections. The encoder, bottleneck, and decoder are all built from Visual Mamba blocks.

2 Approach

2.1 Architecture Overview

The architecture of the proposed Mamba-UNet, sketched in Figure 2, is motivated by UNet [24] and Swin-UNet [3]. The input 2D grey-scale image of size $H \times W \times 1$ is first split into non-overlapping $4 \times 4$ patches, similar to ViT and VMamba [5,16], and then flattened into a 1-D sequence with dimensions $\frac{H}{4} \times \frac{W}{4} \times 16$. An initial linear embedding layer projects the feature dimension to an arbitrary size denoted as $C$. These patch tokens are then processed through multiple VSS blocks and patch merging layers, creating hierarchical features: patch merging layers handle downsampling and dimension increase, while VSS blocks focus on learning feature representations. The outputs of the encoder stages have resolutions of $\frac{H}{4} \times \frac{W}{4} \times C$, $\frac{H}{8} \times \frac{W}{8} \times 2C$, $\frac{H}{16} \times \frac{W}{16} \times 4C$, and $\frac{H}{32} \times \frac{W}{32} \times 8C$, respectively.

Mirroring the encoder, the decoder comprises VSS blocks and patch expanding layers, so that each decoder stage outputs features of exactly the same size as the corresponding encoder stage; this allows the skip connections to restore spatial details lost during downsampling. Two VSS blocks are used in each stage of the encoder and decoder, and the pretrained VMamba-Tiny [16] weights are loaded into the encoder, following the same procedure by which Swin-UNet loads the pretrained SwinViT-Tiny [3]. The VSS block, the patch merging of the encoder, and the patch expanding of the decoder are detailed in the following subsections.
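As a quick check of the shapes above, the following Python sketch (hypothetical helper names, plain nested lists instead of tensors) splits an image into 4×4 patches and lists the per-stage encoder output shapes. The embedding width C = 96 used in the usage line is an illustrative assumption, not a value stated in this section.

```python
def patch_partition(img, p=4):
    """Split an H x W single-channel image into non-overlapping p x p patches.

    Returns an (H/p) x (W/p) grid whose entries are flattened patches of
    length p * p (16 for p = 4 and one input channel).
    """
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    return [
        [[img[i * p + di][j * p + dj] for di in range(p) for dj in range(p)]
         for j in range(w // p)]
        for i in range(h // p)
    ]


def encoder_stage_shapes(h, w, c):
    """Per-stage encoder output shapes: resolution halves, channels double."""
    return [(h // (4 * 2 ** s), w // (4 * 2 ** s), c * 2 ** s) for s in range(4)]


# Hypothetical C = 96 for a 224 x 224 input.
print(encoder_stage_shapes(224, 224, 96))
# -> [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
```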

2.2 VSS Block

The VSS network block, illustrated in Figure 3, is mainly based on Visual Mamba [16]. In the VSS block, the input feature first passes through a linear embedding layer and then splits into two branches. One branch applies a depth-wise convolution [11] and a SiLU activation [25], followed by the SS2D module and layer normalization; its output is then merged with the other branch after that branch's own SiLU activation. Unlike typical vision transformers, the VSS block uses no positional embedding and omits the MLP stage, which allows a denser stack of blocks within the same depth budget.
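The SS2D module builds on VMamba's cross-scan, which unfolds the patch grid into ordered sequences. Below is a minimal Python sketch, assuming the four scan routes are row-major, column-major, and the reverse of each, and that the four processed sequences are merged back onto the grid by element-wise summation. `scan_fn` is a hypothetical stand-in for the selective scan, and `cross_scan`, `cross_merge`, and `ss2d` are illustrative names, not VMamba's API.

```python
def cross_scan(grid):
    """Unfold an H x W grid into four 1-D sequences (four scan routes)."""
    h, w = len(grid), len(grid[0])
    row_major = [grid[i][j] for i in range(h) for j in range(w)]
    col_major = [grid[i][j] for j in range(w) for i in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def cross_merge(seqs, h, w):
    """Map each processed sequence back to grid positions and sum them."""
    row_fwd, col_fwd, row_rev, col_rev = seqs
    n = h * w
    out = [[0.0] * w for _ in range(h)]
    for k in range(n):
        i, j = divmod(k, w)    # grid position of row-major index k
        out[i][j] += row_fwd[k] + row_rev[n - 1 - k]
        j2, i2 = divmod(k, h)  # grid position of column-major index k
        out[i2][j2] += col_fwd[k] + col_rev[n - 1 - k]
    return out


def ss2d(grid, scan_fn):
    """Hypothetical SS2D skeleton: scan four ways, process each, merge."""
    h, w = len(grid), len(grid[0])
    return cross_merge([scan_fn(s) for s in cross_scan(grid)], h, w)


cumsum = lambda s: [sum(s[: i + 1]) for i in range(len(s))]
print(ss2d([[1, 2], [3, 4]], cumsum))  # -> [[22.0, 24.0], [26.0, 28.0]]
```

With the identity as `scan_fn`, each position simply receives four copies of its own value. With a cumulative sum, each position receives twice the sum of the whole grid plus twice its own value, which illustrates how combining forward and reverse scans gives every token access to global context.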

Fig. 3. The detailed structure of the Visual State Space (VSS) Block.

2.3 Encoder

In the encoder, C-dimensional tokenized inputs with reduced resolution pass through two consecutive VSS blocks for feature learning, which preserve both the feature dimension and the resolution. Patch merging is applied three times in the encoder of Mamba-UNet for downsampling. Each time, it halves the resolution along each spatial dimension (reducing the token count to 1/4) and doubles the feature dimension, by splitting the input into four sub-grids of 1/4 size taken at alternating rows and columns, concatenating them, and normalizing the result with a LayerNorm.
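Patch merging can be sketched on nested lists as follows. The sub-grid ordering and the final linear projection from 4C to 2C channels follow the Swin Transformer convention and are assumptions here; the weight matrix passed to `merge_stage` is a hypothetical placeholder for learned parameters.

```python
import math


def patch_merge(x):
    """Concatenate the four interleaved sub-grids of an H x W x C map.

    Order: x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2],
    giving an (H/2) x (W/2) x 4C map with 1/4 as many tokens.
    """
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0
    return [
        [x[2 * i][2 * j] + x[2 * i + 1][2 * j]
         + x[2 * i][2 * j + 1] + x[2 * i + 1][2 * j + 1]  # list concatenation
         for j in range(w // 2)]
        for i in range(h // 2)
    ]


def layer_norm(v, eps=1e-5):
    """Normalize one token's channels to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    return [(t - mean) / math.sqrt(var + eps) for t in v]


def linear(v, weight):
    """Project a channel vector with a (C_out x C_in) weight matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in weight]


def merge_stage(x, weight):
    """H x W x C -> (H/2) x (W/2) x 2C, with weight of shape 2C x 4C."""
    return [[linear(layer_norm(t), weight) for t in row] for row in patch_merge(x)]
```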

2.4 Decoder

Mirroring the encoder, the decoder uses two successive VSS blocks per stage for feature reconstruction, employing patch expanding layers instead of patch merging layers to upscale deep features [3]. These layers double the resolution (2× upscaling) while halving the feature dimension. For example, the first patch expanding layer doubles the feature dimension with a linear layer and then rearranges the features into a map with twice the resolution, which reduces the feature dimension to half of its input.
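A matching sketch of patch expanding, assuming the Swin-UNet convention of a linear layer followed by a 2×2 rearrangement of channel chunks into space. The weight matrix is a hypothetical placeholder for learned parameters, and the normalization that usually follows is omitted.

```python
def linear(v, weight):
    """Project a channel vector with a (C_out x C_in) weight matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in weight]


def patch_expand(x, weight):
    """H x W x C -> 2H x 2W x C/2, with weight of shape 2C x C.

    The linear layer maps each token to 2C channels; the 2C vector is then
    cut into four chunks of C/2 that fill a 2 x 2 block of output pixels.
    """
    h, w = len(x), len(x[0])
    expanded = [[linear(t, weight) for t in row] for row in x]  # H x W x 2C
    c_out = len(expanded[0][0]) // 4                           # C/2 channels
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            v = expanded[i][j]
            for k in range(4):
                di, dj = divmod(k, 2)  # chunk k goes to offset (di, dj)
                out[2 * i + di][2 * j + dj] = v[k * c_out:(k + 1) * c_out]
    return out
```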

2.5 Bottleneck & Skip Connections

Two VSS blocks are used for the bottleneck of Mamba-UNet. At each level, skip connections blend the multi-scale encoder features with the upscaled decoder outputs, enhancing spatial detail by merging shallow and deep layers. A subsequent linear layer keeps the dimensionality of this fused feature set consistent with that of the upscaled features.
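A sketch of the skip-connection fusion, assuming the encoder and decoder features are concatenated along the channel dimension (the text above only says they are blended) and that a hypothetical linear weight maps the 2C concatenated channels back to C.

```python
def linear(v, weight):
    """Project a channel vector with a (C_out x C_in) weight matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in weight]


def skip_fuse(dec, enc, weight):
    """Fuse same-size H x W x C decoder and encoder maps.

    Channels are concatenated (C + C = 2C), then projected back to C with
    a hypothetical weight of shape C x 2C.
    """
    assert len(dec) == len(enc) and len(dec[0]) == len(enc[0])
    return [
        [linear(d + e, weight) for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(dec, enc)
    ]
```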

3 Experiments and Results

3.1 Data Sets

We conducted our experiments on the publicly available ACDC MRI cardiac segmentation dataset from the MICCAI 2017 Challenge [1]. This dataset comprises MRI scans from 100 patients, annotated for multiple cardiac structures: the right ventricle and both the endocardial and epicardial walls of the left ventricle. It covers a diverse range of pathological conditions in five subgroups (normal, myocardial infarction, dilated cardiomyopathy, hypertrophic cardiomyopathy, and abnormal right ventricle), ensuring a broad distribution of image features. To meet the input requirements of the ViT segmentation backbone, all images were resized to 224×224. The dataset was partitioned so that 20% of the images were allocated to the test set, with the remainder used for training (including validation).
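The preprocessing can be sketched as below. The section does not state the interpolation method or whether the 80/20 split is made per image or per patient, so both are assumptions: nearest-neighbour resizing is used because it keeps label values intact, and `split_ids` works on whatever IDs it is given (splitting by patient ID keeps slices of one patient out of both sets at once). The function names and the seed are hypothetical.

```python
import random


def resize_nearest(mask, out_h=224, out_w=224):
    """Nearest-neighbour resize; label values are copied, never blended."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def split_ids(ids, test_frac=0.2, seed=0):
    """Shuffle IDs reproducibly and hold out test_frac of them for testing."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    return ids[n_test:], ids[:n_test]  # (train incl. validation, test)


train_ids, test_ids = split_ids(range(100))
print(len(train_ids), len(test_ids))  # -> 80 20
```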

3.2 Implementation Details

The implementation was carried out on an Ubuntu 20.04 system, using Python
3.8.8, PyTorch 1.10, and CUDA 11.3. The hardware setup included an Nvidia
GeForce RTX 3090 GPU and an Intel Core i9-10900K CPU. The average run-
time was approximately 5 hours, encompassing data transfer, model training,
and inference processes. The dataset was specifically processed for 2D image
segmentation. The Mamba-UNet model underwent training for 10,000 iterations
with a batch size of 24. The Stochastic Gradient Descent (SGD) optimizer [2]
was employed with a learning rate of 0.01, momentum of 0.9, and weight decay
set to 0.0001. Network performance was evaluated on the validation set every
200 iterations, with model weights being saved only upon achieving a new best
performance on the validation set.
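The optimizer and checkpoint policy can be sketched in plain Python. The update rule mirrors `torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)` with default dampening and without Nesterov momentum. `grad_fn` and `val_fn` are hypothetical callables standing in for a mini-batch gradient (batch size 24 in the setup above) and a validation score such as mean Dice.

```python
def sgd_step(params, grads, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum and L2 weight decay (in place)."""
    for k in params:
        g = grads[k] + weight_decay * params[k]          # add weight decay
        velocity[k] = momentum * velocity.get(k, 0.0) + g
        params[k] -= lr * velocity[k]


def train(params, grad_fn, val_fn, iters=10_000, eval_every=200):
    """Run SGD, validate every eval_every steps, keep only the best weights."""
    velocity, best, best_params, n_evals = {}, float("-inf"), None, 0
    for it in range(1, iters + 1):
        sgd_step(params, grad_fn(params), velocity)
        if it % eval_every == 0:
            n_evals += 1
            score = val_fn(params)
            if score > best:  # "save" only on a new best validation score
                best, best_params = score, dict(params)
    return best_params, best, n_evals


# Toy problem: minimize (p - 3)^2, scoring validation as its negative.
best_params, best, n_evals = train(
    {"p": 0.0},
    lambda ps: {"p": 2 * (ps["p"] - 3.0)},
    lambda ps: -(ps["p"] - 3.0) ** 2,
)
print(n_evals, round(best_params["p"], 3))
```

Over 10,000 iterations with validation every 200 iterations, the loop evaluates 50 times and returns the best parameters seen on validation.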

3.3 Baseline Methods

For comparison, UNet [24] and Swin-UNet [3] were trained under identical hyperparameter configurations and are compared directly with Mamba-UNet.

3.4 Evaluation Metrics

Mamba-UNet is assessed against the baseline methods with a broad spectrum of evaluation metrics. Similarity measures, for which higher is better, include Dice, Intersection over Union (IoU), Accuracy, Precision, Sensitivity, and Specificity, denoted with an upward arrow (↑). Conversely, difference measures such as the 95% Hausdorff Distance (HD) and the Average Surface Distance (ASD), marked with a downward arrow (↓), are better when lower, indicating closer resemblance between the predicted and ground-truth segmentations.
$$\text{Dice} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{1}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{5}$$

where TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
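The metrics above can be computed from binary masks as follows. IoU is included using its standard definition TP / (TP + FP + FN), since it appears in Table 1 but has no equation here. For multi-structure masks such as ACDC, the functions would be applied per structure (one-vs-rest), which is an assumption about the evaluation protocol. Empty denominators (e.g. a structure absent from both masks) need a convention and are not handled in this sketch.

```python
def confusion(pred, gt):
    """Count TP, TN, FP, FN for flat binary masks (1 = foreground)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    tn = sum(1 for p, g in zip(pred, gt) if not p and not g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    return tp, tn, fp, fn


def overlap_metrics(pred, gt):
    """Eqs. (1)-(5) plus IoU; raises ZeroDivisionError on empty cases."""
    tp, tn, fp, fn = confusion(pred, gt)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


print(overlap_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```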

$$\text{Hausdorff Distance (HD) 95\%} = \max\left( \max_{a \in A} \min_{b \in B} d(a, b),\; \max_{b \in B} \min_{a \in A} d(a, b) \right)_{95\%} \tag{6}$$

$$\text{Average Surface Distance (ASD)} = \frac{1}{|A| + |B|} \left( \sum_{a \in A} \min_{b \in B} d(a, b) + \sum_{b \in B} \min_{a \in A} d(a, b) \right) \tag{7}$$

where A and B represent the sets of points on the predicted and ground-truth surfaces, respectively, a and b are points in those sets, and d(a, b) denotes the Euclidean distance between two points. HD 95% is a modified version of the Hausdorff Distance that uses the 95th percentile of the distances instead of the maximum, reducing the impact of outliers.
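A brute-force sketch of Eqs. (6) and (7) on small point sets. It reads the 95% in Eq. (6) as replacing each directed maximum with the 95th percentile (linear interpolation, the same rule as NumPy's default); some toolkits instead pool both directions before taking the percentile, which can give slightly different values. In practice, A and B would be surface voxels scaled by the pixel spacing; the function names are hypothetical.

```python
import math


def _percentile(values, q):
    """Linear-interpolation percentile of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    f = math.floor(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)


def _directed(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(a, b) for b in dst) for a in src]


def hd95(A, B):
    """Eq. (6): larger of the two directed 95th-percentile distances."""
    return max(_percentile(_directed(A, B), 95), _percentile(_directed(B, A), 95))


def asd(A, B):
    """Eq. (7): all nearest-neighbour distances, averaged over |A| + |B|."""
    d_ab, d_ba = _directed(A, B), _directed(B, A)
    return (sum(d_ab) + sum(d_ba)) / (len(A) + len(B))
```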

3.5 Qualitative Results

Figure 4 shows three randomly selected raw images together with the corresponding predictions of all methods, including Mamba-UNet, against the published ground truth, where different colours indicate the boundaries of the ground-truth structures.

3.6 Quantitative Results

Table 1 reports the direct comparison of Mamba-UNet against the other segmentation networks on both similarity and difference measures. Mamba-UNet achieves the best Dice, IoU, accuracy, precision, HD, and ASD, and the second-best sensitivity and specificity after UNet. These quantitative results indicate that Mamba-UNet is more likely to predict precise segmentation masks. To further evaluate Mamba-UNet on the test set, we also assess the predictions image by image; the distribution of per-image Dice coefficients is sketched in Figure 5, where the x-axis is the Dice coefficient and the y-axis is the number of predictions. This histogram further indicates that Mamba-UNet is more likely to produce predictions with a high Dice coefficient.
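The per-image histogram in Figure 5 amounts to binning Dice scores. A minimal sketch with ten equal-width bins over [0, 1]; the bin count and the function name are assumptions, not details of the figure.

```python
def dice_histogram(scores, n_bins=10):
    """Count per-image Dice scores in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[idx] += 1
    return counts


print(dice_histogram([0.95, 0.92, 0.85, 0.5, 1.0, 0.05]))
# -> [1, 0, 0, 0, 0, 1, 0, 0, 1, 3]
```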

Table 1. Direct comparison of segmentation network performance on the MRI cardiac test set.

Framework       Dice↑   IoU↑    Acc↑    Pre↑    Sen↑    Spe↑    HD↓     ASD↓
UNet [24]       0.9248  0.8645  0.9969  0.9157  0.9364  0.9883  2.7655  0.8180
Swin-UNet [3]   0.9188  0.8545  0.9968  0.9151  0.9231  0.9857  3.1817  0.9932
Mamba-UNet      0.9281  0.8698  0.9972  0.9275  0.9289  0.9859  2.4645  0.7677

Fig. 4. The visual comparison of segmentation results of Mamba-UNet and other seg-
mentation methods against ground truth. (a) Raw MRI Image, (b) Mamba-UNet, (c)
UNet, (d) Swin-UNet.

Fig. 5. The histogram of the Dice distribution of Mamba-UNet and other segmentation
methods against ground truth.

4 Conclusion

In this paper, we introduced Mamba-UNet, a UNet-style network built purely from Visual Mamba blocks for medical image segmentation. The results demonstrate that Mamba-UNet outperforms similar classical networks such as UNet and Swin-UNet. In the future, we aim to explore more medical image segmentation tasks across different modalities and targets in greater depth, with comparisons to more segmentation backbones. We also aim to extend Mamba-UNet to 3D medical images and to semi- and weakly-supervised learning [14] to further advance medical imaging.

References

1. Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)
2. Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nîmes 91. EC2, Nîmes, France (1991)
3. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation. In: European Conference on Computer Vision. pp. 205–218. Springer (2022)
4. Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
6. Gu, A.: Modeling Sequences with Structured State Spaces. Ph.D. thesis, Stanford University (2023)
7. Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)
8. Gu, A., Goel, K., Ré, C.: Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396 (2021)
9. Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D.: UNETR: Transformers for 3D medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 574–584 (2022)
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
11. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
12. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
13. Ibtehaz, N., Rahman, M.S.: MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Networks 121, 74–87 (2020)
14. Jiao, R., Zhang, Y., Ding, L., Xue, B., Zhang, J., Cai, R., Jin, C.: Learning with limited annotations: A survey on deep semi-supervised learning for medical image segmentation. Computers in Biology and Medicine (2023)
15. Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A.: H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Transactions on Medical Imaging 37(12), 2663–2674 (2018)
16. Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., Liu, Y.: VMamba: Visual state space model. arXiv preprint arXiv:2401.10166 (2024)
17. Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12009–12019 (2022)
18. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022 (2021)
19. Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S., Hu, H.: Video Swin Transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3202–3211 (2022)
20. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440 (2015)
21. Ma, J., Li, F., Wang, B.: U-Mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722 (2024)
22. Mehta, H., Gupta, A., Cutkosky, A., Neyshabur, B.: Long range language modeling via gated state spaces. arXiv preprint arXiv:2206.13947 (2022)
23. Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
24. Ronneberger, O., et al.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI (2015)
25. Shazeer, N.: GLU variants improve transformer. arXiv preprint arXiv:2002.05202 (2020)
26. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)
27. Wang, J., Zhu, W., Wang, P., Yu, X., Liu, L., Omar, M., Hamid, R.: Selective structured state-spaces for long-form video understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6387–6397 (2023)
28. Wang, Z., Su, M., Zheng, J.Q., Liu, Y.: Densely connected Swin-UNet for multiscale information aggregation in medical image segmentation. In: 2023 IEEE International Conference on Image Processing (ICIP). pp. 940–944. IEEE (2023)
29. Wang, Z., Zhang, Z., Voiculescu, I.: RAR-U-Net: a residual encoder to attention decoder by residual connections framework for spine segmentation under noisy labels. In: 2021 IEEE International Conference on Image Processing (ICIP). pp. 21–25. IEEE (2021)
30. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2018)
31. Xie, Z., Lin, Y., Yao, Z., Zhang, Z., Dai, Q., Cao, Y., Hu, H.: Self-supervised learning with Swin Transformers. arXiv preprint arXiv:2105.04553 (2021)
32. Xing, Z., Ye, T., Yang, Y., Liu, G., Zhu, L.: SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation. arXiv preprint arXiv:2401.13560 (2024)
33. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
34. Zhang, Y., Yuan, L., Wang, Y., Zhang, J.: SAU-Net: efficient 3D spine MRI segmentation using inter-slice attention. In: Medical Imaging with Deep Learning. pp. 903–913. PMLR (2020)
35. Zhou, X.Y., Zheng, J.Q., Li, P., Yang, G.Z.: ACNN: a full resolution DCNN for medical image segmentation. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 8455–8461. IEEE (2020)
36. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging 39(6), 1856–1867 (2019)
