Using the VLM to generate overall information has some problems #19

@Hotzerara

According to the paper, this VLM is trained in two stages. However, I noticed that only one checkpoint is available on Hugging Face. In 1_vlm_demo.py, this single checkpoint appears to be used for both Stage 1 and Stage 2.
I ran 1_vlm_demo.py and checked the Stage 1 output in the /demo folder. The quality seems poor: many outputs do not follow the prompt templates.
Is this result expected? Additionally, is it possible to obtain the fine-tuned model for Stage 1 separately?
