Stable Diffusion Segmentation
for Biomedical Images with Single-step Reverse Process

¹School of Biomedical Engineering and ²School of Computer Science and Engineering, Sun Yat-sen University
³International School, Beijing University of Posts and Telecommunications

*Corresponding Author

MICCAI 2024

The overview of SDSeg.

Highlights

🌟 The name "Stable" signals both that SDSeg is built on Stable Diffusion and that its predictions are remarkably stable.
🌟 SDSeg requires only a single-step reverse process to generate segmentation results.
🌟 SDSeg is stable enough that it does not need to sample multiple times and average the results.

Motivation

Inspired by Occam's razor✂️, we believe that removing redundant techniques from conventional diffusion models can benefit segmentation tasks.

🔒 Challenges

(1) The pixel-level diffusion process is computationally expensive during training.
(2) The multi-step reverse process and the multiple-sample averaging scheme are time-consuming during inference.

🔑 Solutions

(1) Use a latent-level diffusion model (Stable Diffusion, SD).
SD's autoencoder also generalizes to segmentation maps, so no fine-tuning is needed.
(2) Design a single-step reverse process with strong stability against initial noise.

Visualization of latent representation maps.

The latent representations are highly similar to their corresponding segmentation maps, and segmentation maps carry much less semantic information than RGB images. These observations indicate that a much simpler diffusion process can be established.

Single-step Reverse process

SDSeg needs only ☝️ step to generate the final segmentation map, and this single step even gives better results than dozens of DDIM sampling steps.

Comparison of DDIM convergence speed w/ and w/o latent estimation.

Visualization of the predicted probability maps in reverse process (DDIM sampler).
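
To make the idea of latent estimation concrete, here is a minimal sketch of how a clean latent can be estimated in one step from a noisy latent and a predicted noise, using the standard DDPM identity. It is not taken from the SDSeg code: the linear beta schedule, the function names, the latent shape, and the use of the true noise in place of a trained denoiser are all assumptions made for illustration.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(z0, noise, alpha_bar_t):
    """Forward process: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * noise."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * noise

def estimate_latent(z_t, eps_pred, alpha_bar_t):
    """Single-step estimate of z0 obtained by inverting the forward process."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

# Hypothetical stand-in for the latent of a segmentation map.
z0 = rng.standard_normal((4, 32, 32))
noise = rng.standard_normal(z0.shape)

t = 999  # the noisiest timestep
z_t = add_noise(z0, noise, alpha_bars[t])

# A trained denoiser would predict the noise from z_t and the image condition;
# here the true noise is used as an oracle so the identity can be checked.
z0_hat = estimate_latent(z_t, noise, alpha_bars[t])
recovery_error = float(np.max(np.abs(z0_hat - z0)))
```

With a perfect noise prediction the estimate matches the clean latent up to floating-point error, which is why an accurate denoiser can, in principle, skip the iterative sampling loop.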

Stability Evaluation

We propose a Stability Evaluation scheme to measure the stability of any diffusion-based segmentation method:

👉 Dataset-level Stability: performs repeated inferences on test data to measure variability.
👉 Instance-level Stability: examines the model’s consistency under varying initial noise.

The proposed Stability Evaluation scheme.
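
The two levels of the scheme can be sketched roughly as follows. This is an illustrative sketch, not the evaluation code from the paper: the choice of Dice as the score, the standard deviation across runs as the dataset-level measure, the mean pairwise Dice as the instance-level measure, and the `toy_segment` model are all assumptions.

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice overlap between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def dataset_level_stability(segment_fn, images, masks, num_runs=5, seed=0):
    """Repeat inference over the whole test set; return mean and std of per-run Dice."""
    run_means = []
    for run in range(num_runs):
        rng = np.random.default_rng(seed + run)  # fresh initial noise per run
        scores = [dice(segment_fn(img, rng), m) for img, m in zip(images, masks)]
        run_means.append(np.mean(scores))
    return float(np.mean(run_means)), float(np.std(run_means))

def instance_level_stability(segment_fn, image, num_samples=5, seed=0):
    """Mean pairwise Dice between predictions for one image under different noises."""
    preds = [segment_fn(image, np.random.default_rng(seed + i))
             for i in range(num_samples)]
    pairs = [dice(preds[i], preds[j])
             for i in range(num_samples) for j in range(i + 1, num_samples)]
    return float(np.mean(pairs))

def toy_segment(image, rng, noise_scale=0.05):
    """Hypothetical stochastic segmenter: thresholds the image plus random noise."""
    return (image + noise_scale * rng.standard_normal(image.shape)) > 0.5

data_rng = np.random.default_rng(42)
images = [data_rng.uniform(size=(16, 16)) for _ in range(8)]
masks = [img > 0.5 for img in images]

mean_dice, std_dice = dataset_level_stability(toy_segment, images, masks)
consistency = instance_level_stability(toy_segment, images[0])
```

A smaller standard deviation across runs and a pairwise consistency closer to 1 both indicate a model that is less sensitive to its initial noise.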

Other Qualitative Analysis

Visualization of the latent representations of medical images from the trainable vision encoder.

At iteration 0, the encoder, pre-trained on natural images, cannot yet capture enough meaningful semantic features for segmentation. During training, this conditioning encoder gradually learns to focus on the segmentation targets.

BibTeX


        @InProceedings{lin2024stable,
          author="Lin, Tianyu
            and Chen, Zhiguang
            and Yan, Zhonghao
            and Yu, Weijiang
            and Zheng, Fudan",
          title="Stable Diffusion Segmentation for Biomedical Images with Single-Step Reverse Process",
          booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024",
          year="2024",
          publisher="Springer Nature Switzerland",
          address="Cham",
          pages="656--666",
          isbn="978-3-031-72111-3"
          }