Description
I have been trying to use TimeSformer and ViViT. I have managed to convert them into regression models by changing the loss function and setting the output of the MLP head to 1. However, as I understand it, a video vision transformer takes a video clip as input (broken into frames) and outputs a single value for that clip. I would like the model to output a value for each frame of the input clip, so instead of outputting 1 value it would output 32 values. Can you guide me in this regard? For reference, the sketch below shows the kind of per-frame head I have in mind.
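A minimal sketch of the idea: instead of regressing from the CLS token, keep the per-token features, pool them per frame, and apply a linear head to each frame. The token layout `(batch, 1 + num_frames * patches_per_frame, dim)` and the frame-major ordering are assumptions for illustration, not the confirmed TimeSformer/ViViT output format, so the reshape may need adjusting for the actual backbone.

```python
import torch
import torch.nn as nn

class PerFrameRegressionHead(nn.Module):
    """Maps per-frame pooled features to one regression value per frame.

    Assumes the backbone returns token features of shape
    (batch, 1 + num_frames * patches_per_frame, dim), i.e. a CLS token
    followed by spatio-temporal patch tokens in frame-major order.
    This layout is an assumption, not the actual repo API.
    """

    def __init__(self, dim: int, num_frames: int = 32):
        super().__init__()
        self.num_frames = num_frames
        self.head = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b, n, d = tokens.shape
        patch_tokens = tokens[:, 1:, :]                       # drop the CLS token
        patches_per_frame = patch_tokens.shape[1] // self.num_frames
        # Group tokens by frame and mean-pool spatially -> (B, T, D)
        frame_feats = patch_tokens.reshape(
            b, self.num_frames, patches_per_frame, d
        ).mean(dim=2)
        # One scalar per frame -> (B, T), e.g. 32 values per clip
        return self.head(frame_feats).squeeze(-1)

# Usage with dummy features standing in for real backbone output
feats = torch.randn(2, 1 + 32 * 196, 768)  # batch 2, 32 frames, 14x14 patches, dim 768
head = PerFrameRegressionHead(dim=768, num_frames=32)
print(head(feats).shape)  # torch.Size([2, 32])
```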