Hi! Great job!
I want to fine-tune on my own video data, but when I checked the dataset preparation part, the annotations look like this:
{
"video": ["videos/xxx.mp4"],
"conversations": [
{
"from": "human",
"value": "<video>\nWhat are the main activities that take place in the video?"
},
{
"from": "gpt",
"value": "The main activities that take place in the video are the preparation of camera equipment by a man, a group of men riding a helicopter, and a man sailing a boat through the water."
},
...
]
},
These annotations are video-level conversations that only describe what the video shows, but I want to know what happened in the video and also when it happened.
So can I provide frame-index-level dialogue information when organizing my data?
For example: Does the person in the video pick up any object? If so, during which time segments does it happen?
Furthermore, bounding boxes for some objects may be needed (e.g., is the man holding a phone? If so, give the coordinates of the phone).
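To make the question concrete, here is a rough sketch of the kind of record I have in mind. It reuses the "video" / "conversations" layout from the example above. Everything else is my own assumption and not something the repo is confirmed to support: the helper names (make_record, format_segments, format_bbox), the time-segment text format, the normalized [x1, y1, x2, y2] box format, and all the numbers, which are made-up placeholders.

```python
import json

def format_segments(segments):
    """Render (start_sec, end_sec) pairs as text. Hypothetical format."""
    return ", ".join(f"{start:.1f}s to {end:.1f}s" for start, end in segments)

def format_bbox(frame_idx, box):
    """Render a normalized [x1, y1, x2, y2] box for one frame. Hypothetical format."""
    x1, y1, x2, y2 = box
    return f"frame {frame_idx}: [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]"

def make_record(video_path, qa_pairs):
    """Build one annotation record in the same shape as the example above."""
    conversations = []
    for i, (question, answer) in enumerate(qa_pairs):
        # Assumption: only the first human turn carries the <video> placeholder.
        prefix = "<video>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + question})
        conversations.append({"from": "gpt", "value": answer})
    return {"video": [video_path], "conversations": conversations}

# Placeholder values, for illustration only.
pickup_segments = [(2.0, 4.5), (10.0, 12.0)]
phone_box = (0.41, 0.30, 0.52, 0.47)

record = make_record(
    "videos/xxx.mp4",
    [
        (
            "Does the person in the video pick up any object? "
            "If so, during which time segments does it happen?",
            "Yes, during " + format_segments(pickup_segments) + ".",
        ),
        (
            "Is the man holding a phone? If so, give its coordinates.",
            "Yes, " + format_bbox(48, phone_box) + ".",
        ),
    ],
)

print(json.dumps(record, indent=2))
```

If the training code only reads the "value" strings as plain text, a record like this should load the same way as the video-level ones. Whether the model can actually learn timestamps or boxes from such text is exactly what I am asking about.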