This repository was archived by the owner on Jan 13, 2026. It is now read-only.

Is the fine-tuning needed during the inference of multi-view diffusion model? #14

@LargeRaindrop


Based on my understanding of the original paper, a trained diffusion model should be able to directly output multiple views from a single input image, without additional fine-tuning for each input. However, when I ran `./threestudio/scripts/run_imagedream.sh`, I found that training is still required. I tried changing line 28 of that script to `--test`, but the output was completely gray. Does the diffusion model require additional fine-tuning for each input, or am I running it incorrectly? Is the training process in `run_imagedream.sh` functioning as intended?

