Based on my understanding of the original paper, a trained diffusion model should be able to generate multiple views directly from a single input image, without additional per-input fine-tuning. However, when I ran ./threestudio/scripts/run_imagedream.sh, I found that training is still performed. I tried changing line 28 of the script to "--test", but the output was completely gray. My questions: does the diffusion model actually require additional fine-tuning for each input, or am I running it incorrectly? And is the training process in run_imagedream.sh working as intended?