[Cross-Post] MAISI ControlNet Semantic Mask Conditioning Training Issues #1991

pshavela · 2025-05-19T05:58:40Z

pshavela
May 19, 2025

Hello all,
I will cross-post this from the main repository ( #8451 )
I am currently working on my master's project and it is a bit urgent:

firstly I would like to thank the MAISI team for their amazing work.
I have been playing around with it and I am currently trying to reproduce the mask-conditioned synthesis on a coronary CT dataset. I have trained the VAE and the MAISI diffusion model from scratch according to the tutorials, with great success.

Unfortunately, I cannot get the ControlNet to converge, I have tried different learning rates (1e-5 and 1e-6), let it train
for more than 20000 steps, however, as the images below show the conditioning is not applied well enough.
I am also comparing the conditionally generated images at every epoch, but they yield similar results.

Does someone have some pointers on how to properly train the ControlNet with semantic masks and how to achieve comparable results?

Can-Zhao · 2025-05-22T16:34:42Z

Can-Zhao
May 22, 2025
Collaborator

Did you see it get better after training it longer?

12 replies

sara-create Aug 21, 2025

@pshavela Thank you so much for your answer ! I really appreciate it. So at the end the solution that worked best for you was the hypernet along with the DDPM is that right ? You retrained the DDPM from scratch ? Was it enough with 700 volumes ? I am using their checkpoint since I thought , in my case , that 300 volumes weren't enough to retrain the DDPM from scratch

pshavela Aug 21, 2025
Author

@sara-create Exactly, we trained the DDPM and the hypernet both from scratch at the same time. Keep in mind that
this requires more GPU memory than training seperately. And yes, 700 images for training was enough.

But i would also like to mention that we switched to a flow matching-based approach (rectified flow) instead of DDPM, as it proved to be superior in terms of training (500 epochs was enough) and also inference speeds. we used image sizes of 256x256x128.

sara-create Aug 22, 2025

Thank you so much @pshavela !! One more last question , have you retrained the VAE or used their checkpoint ?

Can-Zhao Aug 22, 2025
Collaborator

One possible reason that fune-tuning controlnet did not work: The pretrained controlnet is based on a certain label definition ./configs/label_dict.json
If your data label did not follow it, then fine-tuning from MAISI controlnet does not make sense. Training controlnet from scratch is the way to go. But you can still reuse VAE and diffusion model.

If your data label already followed it, but still not working, then maybe trying rectified flow could be a direction.

pshavela Aug 22, 2025
Author

@sara-create no problem, yes, we did retrain the VAE 👍

Mudassar-MLE · 2025-08-28T02:42:21Z

Mudassar-MLE
Aug 28, 2025

Hi @Can-Zhao ,

First, thank you and the MAISI team for the incredible work. I am currently exploring MAISI for neck CT, but I’m running into several challenges. There don’t seem to be many anatomies supported for head/neck CT. Even the existing models are unable to generate meaningful images of the brain, head, or neck although the documentation mentions that indexing “1” should support head-neck CT.

I tried fine-tuning ControlNet following the provided instructions, adding around 40 new labels with 5–10 cases. However, the model did not converge. After experimenting with different parameters and epochs, it showed slight convergence, but the outputs still mostly resembled chest anatomy with a rough neck contour.

In addition, I reviewed the configs/image_median_statistics.json file in the HuggingFace repo and couldn’t find anything related to neck anatomy. Dataset preparation is not an issue on my side, I am generating embeddings through the MAISI scripts, and masks are 100% verified.

This leaves me confused: do I need to train ControlNet from scratch (and potentially the each model from scratch), or is MAISI simply not designed to handle neck scenarios? Also, what would be a good dataset size to train on? I noticed in discussions that @sara-create had issues even with 300 cases, while @pshavela trained on ~700 cases and reported good results.
A reply on this issue would really help me a lot, much appreciated!

1 reply

Can-Zhao Sep 2, 2025
Collaborator

Thank you! I answered it in #2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Cross-Post] MAISI ControlNet Semantic Mask Conditioning Training Issues #1991

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 13 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

[Cross-Post] MAISI ControlNet Semantic Mask Conditioning Training Issues #1991

Uh oh!

pshavela May 19, 2025

Replies: 2 comments · 13 replies

Uh oh!

Can-Zhao May 22, 2025 Collaborator

Uh oh!

sara-create Aug 21, 2025

Uh oh!

pshavela Aug 21, 2025 Author

Uh oh!

sara-create Aug 22, 2025

Uh oh!

Can-Zhao Aug 22, 2025 Collaborator

Uh oh!

pshavela Aug 22, 2025 Author

Uh oh!

Mudassar-MLE Aug 28, 2025

Uh oh!

Can-Zhao Sep 2, 2025 Collaborator

pshavela
May 19, 2025

Replies: 2 comments 13 replies

Can-Zhao
May 22, 2025
Collaborator

pshavela Aug 21, 2025
Author

Can-Zhao Aug 22, 2025
Collaborator

pshavela Aug 22, 2025
Author

Mudassar-MLE
Aug 28, 2025

Can-Zhao Sep 2, 2025
Collaborator