
Does the mask of multi-word concepts still use only the first token? #18

@120L020904

Thank you for your work exploring semantic segmentation with the MM-DiT generative model.
While reading the code, I noticed that you take only the first token of each concept embedding as the final token to be used.
Is this because the first token output by the T5 encoder already carries the full semantic information of a multi-word concept?
Could this lead to information loss?
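For reference, here is a minimal sketch (not the repository's code) of the two options I am asking about: keeping only the first T5 token of a multi-word concept versus mean-pooling over all of its tokens. The model name `t5-small` and the concept string are placeholders.

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Hypothetical multi-word concept; any phrase that tokenizes into several pieces works.
concept = "golden retriever"

tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer(concept, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # shape: (1, num_tokens, d_model)

first_token = hidden[:, 0]        # option 1: keep only the first token's embedding
mean_pooled = hidden.mean(dim=1)  # option 2: pool over all of the concept's tokens

print(hidden.shape)  # num_tokens > 1 for a multi-word concept
```

My concern is about the tokens that option 1 discards: even though the T5 encoder's bidirectional attention lets the first token see the rest of the phrase, the remaining token embeddings may still carry information that the mask never uses.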
