Can I provide frame-index-level dialogue information when organizing my video data? #8

Hi! Great job!
I want to fine-tune on my video data, but when I check the dataset preparation section, the annotations look like this:
```
{
    "video": ["videos/xxx.mp4"],
    "conversations": [
        {
            "from": "human",
            "value": "<video>\nWhat are the main activities that take place in the video?"
        },
        {
            "from": "gpt",
            "value": "The main activities that take place in the video are the preparation of camera equipment by a man, a group of men riding a helicopter, and a man sailing a boat through the water."
        },
        ...
    ]
},
```
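To make the structure of this format explicit, here is a minimal sketch that builds one entry with the same fields (`video`, `conversations`, `from`, `value`) and runs a few sanity checks. The helper names, the example path, and the checks themselves (turns alternate `human`/`gpt`, only the first human turn carries the `<video>` placeholder) are my own assumptions inferred from the snippet above, not rules confirmed by the repository.

```python
import json


def make_video_record(video_path, qa_pairs):
    """Hypothetical helper: build one entry in the video-level format above.

    Assumption: only the first human turn carries the "<video>" placeholder,
    as in the example snippet.
    """
    conversations = []
    for i, (question, answer) in enumerate(qa_pairs):
        prefix = "<video>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + question})
        conversations.append({"from": "gpt", "value": answer})
    return {"video": [video_path], "conversations": conversations}


def check_record(record):
    """Hypothetical sanity checks: non-empty, alternating human/gpt turns,
    and the first turn starts with the "<video>" placeholder."""
    turns = record["conversations"]
    if not turns or len(turns) % 2 != 0:
        return False
    for i, turn in enumerate(turns):
        expected = "human" if i % 2 == 0 else "gpt"
        if turn["from"] != expected:
            return False
    return turns[0]["value"].startswith("<video>")


record = make_video_record(
    "videos/example.mp4",  # hypothetical path
    [("What are the main activities that take place in the video?",
      "A man prepares camera equipment.")],
)
print(json.dumps([record], indent=4))
print(check_record(record))
```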

These annotations are video-level conversations: they only describe what the video shows. I also want to know what happens in the video and when it happens.
So, can I provide frame-index-level dialogue information when organizing my data?
For example: "Does the person in the video pick up any object? If so, in which time segments does it happen?"
Furthermore, bounding boxes for some objects may be needed (e.g., "Is the man holding a phone? If so, give me the coordinates of the phone.").
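The format above has no dedicated field for timestamps or boxes, so one option I am considering is to encode them as plain text inside the `gpt` answers. A minimal sketch under these assumptions: the frame rate (30 fps), frame size (1280x720), frame indices, pixel box, and the normalized `[x1, y1, x2, y2]` convention are all hypothetical, as are the helpers `frames_to_seconds` and `normalize_bbox`. I don't know whether the model's training code handles answers written this way well; that is part of what I'm asking.

```python
import json


def frames_to_seconds(start_frame, end_frame, fps):
    """Hypothetical helper: map a frame-index segment to (start_s, end_s),
    using the timestamps of the first and last frame."""
    return round(start_frame / fps, 2), round(end_frame / fps, 2)


def normalize_bbox(x1, y1, x2, y2, width, height):
    """Hypothetical helper: pixel box -> [x1, y1, x2, y2] scaled to [0, 1]."""
    return [round(x1 / width, 3), round(y1 / height, 3),
            round(x2 / width, 3), round(y2 / height, 3)]


FPS = 30           # assumed frame rate of the example clip
W, H = 1280, 720   # assumed frame size in pixels

start, end = frames_to_seconds(90, 210, FPS)       # action spans frames 90..210
box = normalize_bbox(640, 360, 768, 504, W, H)     # phone box at frame 150

record = {
    "video": ["videos/example.mp4"],  # hypothetical path
    "conversations": [
        {"from": "human",
         "value": "<video>\nDoes the person in the video pick up any object? "
                  "If so, in which time segments does it happen?"},
        {"from": "gpt",
         "value": f"Yes. The person picks up a phone from {start:.1f}s to {end:.1f}s."},
        {"from": "human",
         "value": "Is the man holding a phone? If so, give me the coordinates of the phone."},
        {"from": "gpt",
         "value": f"Yes. At frame 150, the phone is at {box}."},
    ],
}
print(json.dumps(record, indent=4))
```

Writing times in seconds rather than raw frame indices may be safer if the training pipeline resamples frames, since the indices the model sees would then differ from those of the original video.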
