Training Vision Encoder from scratch #52

@Justin-Regef

Hi!
Thank you for the really cool research and for making the code available. I was wondering: would it be possible, feasible, or interesting to train LLM2CLIP's vision encoder from scratch using the CC-LLM as the text encoder?
I noticed that in the paper you only fine-tuned vision encoders with the CC-LLM, but I don't see why we couldn't train a randomly initialized vision encoder directly. Is it because generating that many embeddings with the CC-LLM would cost too much?
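
To make the question concrete, here is a minimal sketch of the setup I have in mind. It assumes the caption embeddings are computed once with the frozen text encoder and cached, so only the vision encoder would be trained against them. All names, dimensions, and the random stand-in data are hypothetical and not taken from the LLM2CLIP code. The loss is a generic CLIP-style symmetric contrastive loss, which may differ from what the paper actually uses.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, cached_text_emb, temperature=0.07):
    # Symmetric contrastive loss: row i of image_emb is paired with
    # row i of cached_text_emb; all other rows act as negatives.
    img = l2_normalize(image_emb)
    txt = l2_normalize(cached_text_emb)
    logits = img @ txt.T / temperature
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)

# Hypothetical stand-in for text embeddings computed once, offline, and cached.
cached_text_emb = rng.normal(size=(8, 16))

# Hypothetical stand-in for the output of an untrained vision encoder.
random_image_emb = rng.normal(size=(8, 16))

# Hypothetical stand-in for a vision encoder that has learned the alignment.
aligned_image_emb = cached_text_emb + 0.01 * rng.normal(size=(8, 16))

loss_random = clip_loss(random_image_emb, cached_text_emb)
loss_aligned = clip_loss(aligned_image_emb, cached_text_emb)
print(f"untrained: {loss_random:.3f}, aligned: {loss_aligned:.3f}")
```

With the text side cached like this, the CC-LLM would only need one forward pass per caption, which is why I am unsure whether embedding cost is really the blocker.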
