# TSLDSeg

[Pattern Recognition 2025] Official implementation of the paper "TSLDSeg: A Texture-aware and Semantic-enhanced Latent Diffusion Model for Medical Image Segmentation".
This repository is based on SDSeg, a latent diffusion model for medical image segmentation.
Specifically, our work focuses on alleviating the information loss caused by perceptual compression in the conditioning branch by:
- Enhancing fine-grained representations to preserve high-frequency details (edges, fine textures, orientations).
- Leveraging hypergraph modeling to capture semantic relationships and spatial/topological constraints.
## Requirements

A suitable conda environment named `TSLDSeg` can be created and activated with:

```shell
conda env create -f environment.yaml
conda activate TSLDSeg
```

Then, install some dependencies by:
```shell
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
```

### Solve GitHub connection issues when downloading taming-transformers or clip
After creating and entering the TSLDSeg environment:

- create an `src` folder and enter it:

```shell
mkdir src
cd src
```

- download the following codebases as `*.zip` files and upload them to `src/`:
  - https://github.com/CompVis/taming-transformers, `taming-transformers-master.zip`
  - https://github.com/openai/CLIP, `CLIP-main.zip`

- unzip and install taming-transformers:

```shell
unzip taming-transformers-master.zip
cd taming-transformers-master
pip install -e .
cd ..
```

- unzip and install clip:

```shell
unzip CLIP-main.zip
cd CLIP-main
pip install -e .
cd ..
```

- install TSLDSeg:

```shell
cd ..
pip install -e .
```

Then you're good to go!
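As a quick sanity check (a small illustrative snippet, not part of the repo), you can confirm from inside the `TSLDSeg` environment that the editable installs are importable:

```python
import importlib.util

# "taming" and "clip" are the package names provided by the editable
# installs above; find_spec returns None when a package is not importable.
for pkg in ("taming", "clip"):
    status = "ok" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")
```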
## Pre-trained Weights

TSLDSeg uses pre-trained weights from LDM for initialization before training.

For the pre-trained weights of the autoencoder and conditioning model, run

```shell
bash scripts/download_first_stages_f8.sh
```

For the pre-trained weights of the denoising UNet, run

```shell
bash scripts/download_models_lsun_churches.sh
```

## Training

Take the CVC dataset as an example; run
```shell
nohup python -u main.py --base configs/latent-diffusion/cvc-ldm-kl-8.yaml -t --gpus 0, --name experiment_name > nohup/experiment_name.log 2>&1 &
```

You can check the training log with

```shell
tail -f nohup/experiment_name.log
```

Also, TensorBoard logging is enabled automatically. You can start a TensorBoard session with `--logdir=./logs/`. For example:

```shell
tensorboard --logdir=./logs/
```

**Note**
If you want to use parallel training, change `trainer_config["accelerator"] = "gpu"` in `main.py` to `trainer_config["accelerator"] = "ddp"`. However, parallel training is not recommended, since in my experience it brings no performance gain.
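The change amounts to a one-line edit; a minimal sketch, with `trainer_config` standing in for the dict assembled in `main.py`:

```python
# Stand-in for the trainer configuration dict built in main.py.
trainer_config = {"accelerator": "gpu"}  # default: single-GPU training

# Switch to DistributedDataParallel for multi-GPU runs (older PyTorch
# Lightning versions selected the distributed backend via this field).
trainer_config["accelerator"] = "ddp"
print(trainer_config["accelerator"])  # → ddp
```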
**Warning**
A single TSLDSeg model checkpoint is around 5 GB. By default, only the last model and the model with the highest dice score are saved. If you have plenty of storage space, feel free to save more models by increasing the `save_top_k` parameter in `main.py`.
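The effect of `save_top_k` can be illustrated with a toy retention policy (the file names and dice scores below are invented; the real bookkeeping is done by PyTorch Lightning's checkpoint callback):

```python
import heapq

def retain_top_k(checkpoints, k):
    """Return the k checkpoints with the highest dice score."""
    # checkpoints: list of (dice_score, path) tuples
    return heapq.nlargest(k, checkpoints)

ckpts = [(0.81, "epoch3.ckpt"), (0.88, "epoch7.ckpt"), (0.85, "epoch5.ckpt")]
# With save_top_k=1, only the best-dice checkpoint is kept (plus the last one).
print(retain_top_k(ckpts, 1))  # → [(0.88, 'epoch7.ckpt')]
```

Each retained checkpoint costs roughly 5 GB of disk, so `save_top_k` trades storage for the ability to compare several strong checkpoints after training.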
## Inference

After training a TSLDSeg model, you should manually modify the run paths in `scripts/slice2seg.py`, then start an inference process like

```shell
python -u scripts/slice2seg.py --dataset cvc
```

## Citation

If you find this repository useful in your research, please consider citing:
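The run-path edit in `scripts/slice2seg.py` looks roughly like the following (the variable names and path layout are illustrative, not the script's actual code; check the script for the real ones):

```python
# Point the script at your trained run before inference.
run_path = "logs/experiment_name"                # your training run directory
ckpt_path = f"{run_path}/checkpoints/last.ckpt"  # checkpoint to load
print(ckpt_path)  # → logs/experiment_name/checkpoints/last.ckpt
```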
```bibtex
@article{YANG2026112795,
  title   = {TSLDSeg: A texture-aware and semantic-enhanced latent diffusion model for medical image segmentation},
  journal = {Pattern Recognition},
  volume  = {173},
  pages   = {112795},
  year    = {2026},
  issn    = {0031-3203},
  doi     = {10.1016/j.patcog.2025.112795},
  author  = {Zongjian Yang and Chunquan Li and Jiquan Ma},
}
```

or
Z. Yang, C. Li and J. Ma, “TSLDSeg: A texture-aware and semantic-enhanced latent diffusion model for medical image segmentation,” Pattern Recognition, vol. 173, p. 112795, 2026, doi: 10.1016/j.patcog.2025.112795.
## Acknowledgements

This work is built upon the following open-source projects. We sincerely thank the authors for their excellent contributions: