Title
nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance
01
Introduction
In modern clinical workflows, efficient and accurate segmentation of medical images is essential for disease diagnosis and prognosis, treatment planning and monitoring, and tracking of treatment outcomes. Traditionally, medical image segmentation has been a highly time-consuming and labor-intensive task. The advent of deep-learning-based automatic segmentation has significantly reduced the time and effort required of radiologists and radiation oncologists. Among the many deep learning architectures designed for biomedical image segmentation, U-Net stands out for its ability to capture both global and local features effectively and efficiently, yielding better segmentation results. Building on the U-Net backbone, numerous studies have developed modified architectures for different tasks. For example, TransUNet integrates the strengths of U-Net and Transformers, defining a new benchmark for medical image segmentation: by leveraging the global context understanding of Transformers and the precise localization capability of U-Net, it captures long-range dependencies while maintaining segmentation accuracy on local structures. Another example is UNet++, designed to bridge the semantic gap between encoder and decoder feature maps; it combines deeply supervised encoder-decoder networks with nested, dense skip pathways to improve segmentation accuracy. Yet another network, SwinUNet, introduces a Transformer-based approach to medical image segmentation, using a U-shaped encoder-decoder architecture with skip connections to enhance local-global semantic feature learning; it has demonstrated performance superior to traditional convolution-based methods and hybrid Transformer-convolution techniques. However, much segmentation work still requires substantial manual effort in architecture modification and hyperparameter tuning to adapt to different applications or datasets. To address this challenge, the nnUNet framework was proposed. nnUNet takes a distinctive "no new network" approach: rather than proposing a new architecture, it refocuses on methodology, architecture search, and data preprocessing to obtain the best performance. The nnUNet strategy demonstrates that, with the right combination of pre- and post-processing, even a basic network architecture can achieve state-of-the-art performance across a wide range of medical segmentation tasks.
Abstract
The recent developments of foundation models in computer vision, especially the Segment Anything Model (SAM), allow scalable and domain-agnostic image segmentation to serve as a general-purpose segmentation tool. In parallel, the field of medical image segmentation has benefited significantly from specialized neural networks like the nnUNet, which is trained on domain-specific datasets and can automatically configure the network to tailor to specific segmentation challenges. To combine the advantages of foundation models and domain-specific models, we present nnSAM, which synergistically integrates the SAM model with the nnUNet model to achieve more accurate and robust medical image segmentation. The nnSAM model leverages the powerful and robust feature extraction capabilities of SAM, while harnessing the automatic configuration capabilities of nnUNet to promote dataset-tailored learning. Our comprehensive evaluation of the nnSAM model on different sizes of training samples shows that it allows few-shot learning, which is highly relevant for medical image segmentation, where high-quality annotated data can be scarce and costly to obtain. By melding the strengths of both its predecessors, nnSAM positions itself as a potential new benchmark in medical image segmentation, offering a tool that combines broad applicability with specialized efficiency. The code is available at https://github.com/Kent0n-Li/Medical-Image-Segmentation.
METHOD
2.1 Architecture Overview
The architecture of the proposed nnSAM framework is depicted in Fig. 1. The model is designed to combine the strengths of nnUNet [8] and SAM [9]. Specifically, nnSAM consists of two parallel encoders: the nnUNet encoder and the SAM encoder. The SAM encoder is a pre-trained Vision Transformer (ViT) [13]. The embeddings from both encoders are concatenated and subsequently fed into nnUNet's decoder to output the final segmentation map. Furthermore, the SAM encoder is used as a plug-and-play plugin whose parameters are frozen during training. Correspondingly, only the weights of the encoder and decoder of the nnUNet are updated during training.
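To make the fusion step concrete, the following is a minimal pure-Python sketch of the dual-encoder forward pass described above. All shapes, channel counts, and helper names (`nnunet_encoder`, `sam_encoder`, `nnsam_fuse`) are illustrative assumptions, not the authors' implementation; in practice the feature maps would be tensors produced by the trainable CNN encoder and the frozen SAM ViT encoder.

```python
# A minimal sketch of nnSAM's dual-encoder fusion. Feature maps are modeled
# as lists of 2D channel grids; every shape and helper name below is an
# illustrative assumption, not the paper's actual configuration.

H, W = 16, 16          # spatial size of the encoder outputs (assumed)
C_NN, C_SAM = 32, 256  # channel counts of the two embeddings (assumed)

def nnunet_encoder(image):
    # Stand-in for nnUNet's CNN encoder; its weights ARE updated in training.
    return [[[0.0] * W for _ in range(H)] for _ in range(C_NN)]

def sam_encoder(image):
    # Stand-in for the pre-trained SAM ViT encoder; in nnSAM its parameters
    # are frozen, so it acts as a fixed, plug-and-play feature extractor.
    return [[[0.0] * W for _ in range(H)] for _ in range(C_SAM)]

def nnsam_fuse(image):
    # Core idea: concatenate both embeddings along the channel axis; the
    # fused features are then handed to nnUNet's decoder (omitted here).
    return nnunet_encoder(image) + sam_encoder(image)

fused = nnsam_fuse(image=None)
print(len(fused))  # 288 channels feed into nnUNet's decoder
```

Because the SAM branch is frozen, only the nnUNet encoder and decoder contribute trainable parameters; the concatenation simply widens the channel dimension seen by the decoder.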
CONCLUSION
We introduce nnSAM, a novel few-shot learning solution for medical image segmentation that melds the strengths of the Segment Anything Model (SAM) and nnUNet. Our extensive evaluation across different numbers of 2D training samples suggests a potential new benchmark in medical image segmentation, especially in scenarios where training data is scarce. The results also highlight the robustness and superior segmentation performance of nnSAM, making it a promising tool for future research and practical applications in medical imaging.
Fig
Fig. 1. The architecture of nnSAM, which integrates nnUNet's encoder with the pre-trained SAM encoder. The correspondingly concatenated embeddings are input into nnUNet's decoder to output the final segmentation. A cardiac sub-structure segmentation example is presented. (LV: left ventricle; RV: right ventricle; LA: left atrium; RA: right atrium; Myo: myocardium of LV)
Fig. 2. Example 1 of segmentation visualization results for different methods on different numbers of training samples.
Fig. 3. Example 2 of segmentation visualization results for different methods on different numbers of training samples.
Table
Table 1. DICE and ASD of different methods on different training sample sizes.
Table 2. DICE results of different cardiac sub-structures, for different methods trained with different sample sizes.
Table 3. ASD results of different cardiac sub-structures, for different methods trained with different sample sizes.