Can I provide frame-index-level dialogue information when organizing my video data? #8

@jamesbond20181031

Hi! Great job!
I want to fine-tune on my own video data, but when I checked the dataset preparation section, the annotations look like this:

    {
        "video": ["videos/xxx.mp4"],
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nWhat are the main activities that take place in the video?"
            },
            {
                "from": "gpt",
                "value": "The main activities that take place in the video are the preparation of camera equipment by a man, a group of men riding a helicopter, and a man sailing a boat through the water."
            },
            ...
        ]
    },

These annotations are video-level conversations: they only describe what the video shows overall. I want to know what happened in the video and also when it happened.
So can I provide frame-index-level dialogue information when organizing my data?
For example: "Does the person in the video pick up any object? If so, during which time segments does it happen?"
Furthermore, bounding boxes for some objects may be needed (e.g. "Is the man holding a phone? If so, give me the coordinates of the phone.").
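
To make the request concrete, here is a rough Python sketch of how such an entry could be generated. It reuses only the "video" / "conversations" / "from" / "value" fields from the example above. Everything else is an assumption for illustration: the helper names (`frames_to_seconds`, `normalize_bbox`, `build_entry`) are hypothetical, and writing time spans and normalized box coordinates as plain text inside "value" is just one possible convention. Whether the training code can actually learn from it is exactly the question.

```python
import json


def frames_to_seconds(start_frame, end_frame, fps):
    """Convert an inclusive frame-index range to a (start, end) span in seconds."""
    return (round(start_frame / fps, 2), round((end_frame + 1) / fps, 2))


def normalize_bbox(bbox, width, height):
    """Scale pixel coordinates [x1, y1, x2, y2] to the 0..1 range."""
    x1, y1, x2, y2 = bbox
    return [
        round(x1 / width, 3),
        round(y1 / height, 3),
        round(x2 / width, 3),
        round(y2 / height, 3),
    ]


def build_entry(video_path, fps, segments, bbox, frame_size):
    """Build one annotation entry in the same layout as the example above.

    Assumption: temporal spans and box coordinates are embedded as plain
    text in the answer; the repo may expect a different convention.
    """
    spans = [frames_to_seconds(start, end, fps) for start, end in segments]
    span_text = ", ".join(f"{s:.2f}s to {e:.2f}s" for s, e in spans)
    box = normalize_bbox(bbox, *frame_size)
    return {
        "video": [video_path],
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nDoes the person in the video pick up any "
                         "object? If so, during which time segments?",
            },
            {
                "from": "gpt",
                "value": f"Yes, the person picks up an object during {span_text}.",
            },
            {
                "from": "human",
                "value": "Is the man holding a phone? If so, give the "
                         "coordinates of the phone.",
            },
            {
                "from": "gpt",
                "value": f"Yes, the phone is at {box} (normalized x1, y1, x2, y2).",
            },
        ],
    }


# Illustrative values only: a 30 fps clip, 1280x720 frames.
entry = build_entry(
    "videos/xxx.mp4", 30, [(30, 59)], [320, 180, 640, 360], (1280, 720)
)
print(json.dumps(entry, indent=4))
```

Converting frame indices to seconds keeps the text independent of how many frames get sampled during training, but raw frame indices could be written instead if that is what the model expects.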
