Replies: 2 comments 13 replies
-
Did you see it get better after training it longer?
-
Hi @Can-Zhao,

First, thank you and the MAISI team for the incredible work. I am currently exploring MAISI for neck CT, but I'm running into several challenges.

There don't seem to be many anatomies supported for head/neck CT. Even the existing models are unable to generate meaningful images of the brain, head, or neck, although the documentation mentions that indexing "1" should support head-neck CT.

I tried fine-tuning ControlNet following the provided instructions, adding around 40 new labels with 5–10 cases. However, the model did not converge. After experimenting with different parameters and epoch counts, it showed slight convergence, but the outputs still mostly resembled chest anatomy with only a rough neck contour. In addition, I reviewed the configs/image_median_statistics.json file in the HuggingFace repo and couldn't find anything related to neck anatomy.

Dataset preparation is not an issue on my side: I am generating embeddings through the MAISI scripts, and the masks are 100% verified.

This leaves me confused: do I need to train ControlNet from scratch (and potentially each model from scratch), or is MAISI simply not designed to handle neck scenarios? Also, what would be a good dataset size to train on? I noticed in the discussions that @sara-create had issues even with 300 cases, while @pshavela trained on ~700 cases and reported good results.
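Since configs/image_median_statistics.json has no neck entries, one option while waiting for an answer is to compute per-label intensity statistics for the new anatomies from your own data. The sketch below is a minimal, hypothetical helper using only NumPy; the function name, the exact percentiles, and the JSON layout are assumptions on my part and may not match what MAISI actually expects in that file:

```python
import json
import numpy as np

def region_median_stats(image, mask, label_ids):
    """Per-label median/percentile HU statistics from a CT volume and its
    integer segmentation mask. Hypothetical helper: the real
    configs/image_median_statistics.json schema may differ."""
    stats = {}
    for name, idx in label_ids.items():
        voxels = image[mask == idx]
        if voxels.size == 0:
            continue  # label absent in this case
        stats[name] = {
            "median": float(np.median(voxels)),
            "percentile_0_5": float(np.percentile(voxels, 0.5)),
            "percentile_99_5": float(np.percentile(voxels, 99.5)),
        }
    return stats

# Toy example: a synthetic 3D "CT" with one labelled region.
rng = np.random.default_rng(0)
image = rng.normal(40.0, 10.0, size=(8, 8, 8))   # fake soft-tissue HU values
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[:4] = 1                                      # hypothetical new neck label ID 1
stats = region_median_stats(image, mask, {"neck_soft_tissue": 1})
print(json.dumps(stats, indent=2))
```

In practice you would aggregate these statistics across all training cases (e.g. the median of per-case medians) before writing them into the config.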
-
Hello all,
I will cross-post this from the main repository ( #8451 )
I am currently working on my master's project, so this is a bit urgent. First, I would like to thank the MAISI team for their amazing work.
I have been playing around with it and am currently trying to reproduce mask-conditioned synthesis on a coronary CT dataset. I have trained the VAE and the MAISI diffusion model from scratch according to the tutorials, with great success.
Unfortunately, I cannot get the ControlNet to converge. I have tried different learning rates (1e-5 and 1e-6) and let it train for more than 20,000 steps; however, as the images below show, the conditioning is not applied well enough.
I am also comparing the conditionally generated images at every epoch, but they yield similar results.
Does anyone have pointers on how to properly train the ControlNet with semantic masks and achieve comparable results?
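One diagnostic that helped me rule things out before touching hyperparameters was checking that the conditioning masks actually contain the label IDs the ControlNet is trained on, with a non-trivial voxel fraction. This is a generic sanity-check sketch using only NumPy, not MAISI's own validation code; the function name and expected-label list are my own:

```python
import numpy as np

def check_mask_labels(mask, expected_labels):
    """Report which expected label IDs appear in a segmentation mask and
    what fraction of voxels each covers. If conditioning labels are absent
    or cover almost no voxels, ControlNet has little signal to learn from."""
    present = set(np.unique(mask).tolist())
    total = mask.size
    report = {}
    for idx in expected_labels:
        count = int((mask == idx).sum())
        report[idx] = {"present": idx in present, "fraction": count / total}
    # Labels in the mask that training does not know about (ignoring background 0).
    unexpected = sorted(present - set(expected_labels) - {0})
    return report, unexpected

# Toy mask: label 3 covers 25% of voxels, label 5 is missing, label 7 is stray.
mask = np.zeros((16, 16, 16), dtype=np.uint8)
mask[2:6] = 3
mask[8:9] = 7
report, unexpected = check_mask_labels(mask, expected_labels=[3, 5])
```

Running this over every training case before launching a long ControlNet run catches mislabelled or nearly empty masks early, which can look exactly like "not converging" later.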