<!-- [中文 README](README_cn.md) -->

State-of-the-art entries on Papers with Code:

- [Action Classification on Kinetics-400](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=internvideo2-scaling-video-foundation-models)
- [Action Classification on Kinetics-600](https://paperswithcode.com/sota/action-classification-on-kinetics-600?p=internvideo2-scaling-video-foundation-models)
- [Action Classification on Kinetics-700](https://paperswithcode.com/sota/action-classification-on-kinetics-700?p=internvideo2-scaling-video-foundation-models)
- [Action Recognition in Videos on Something-Something](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=internvideo2-scaling-video-foundation-models)
- [Action Recognition in Videos on ActivityNet](https://paperswithcode.com/sota/action-recognition-in-videos-on-activitynet?p=internvideo2-scaling-video-foundation-models)
- [Action Classification on Moments in Time](https://paperswithcode.com/sota/action-classification-on-moments-in-time?p=internvideo2-scaling-video-foundation-models)
- [Action Recognition on HACS](https://paperswithcode.com/sota/action-recognition-on-hacs?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on MSR-VTT](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-msr-vtt?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on MSVD](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-msvd?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on LSMDC](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-lsmdc?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on DiDeMo](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-didemo?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on VATEX](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-vatex?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Retrieval on ActivityNet](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-activitynet?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on MSR-VTT](https://paperswithcode.com/sota/video-retrieval-on-msr-vtt?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on DiDeMo](https://paperswithcode.com/sota/video-retrieval-on-didemo?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on MSVD](https://paperswithcode.com/sota/video-retrieval-on-msvd?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on LSMDC](https://paperswithcode.com/sota/video-retrieval-on-lsmdc?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on ActivityNet](https://paperswithcode.com/sota/video-retrieval-on-activitynet?p=internvideo2-scaling-video-foundation-models)
- [Video Retrieval on VATEX](https://paperswithcode.com/sota/video-retrieval-on-vatex?p=internvideo2-scaling-video-foundation-models)
- [Text-to-Audio Retrieval on AudioCaps](https://paperswithcode.com/sota/text-to-audio-retrieval-on-audiocaps?p=internvideo2-scaling-video-foundation-models)
- [Text-to-Audio Retrieval on Clotho](https://paperswithcode.com/sota/text-to-audio-retrieval-on-clotho?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Text-to-Audio Retrieval](https://paperswithcode.com/sota/zero-shot-text-to-audio-retrieval-on?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Text-to-Audio Retrieval on Clotho](https://paperswithcode.com/sota/zero-shot-text-to-audio-retrieval-on-clotho?p=internvideo2-scaling-video-foundation-models)
- [Audio Classification on ESC-50](https://paperswithcode.com/sota/audio-classification-on-esc-50?p=internvideo2-scaling-video-foundation-models)
- [Video Grounding on QVHighlights](https://paperswithcode.com/sota/video-grounding-on-qvhighlights?p=internvideo2-scaling-video-foundation-models)
- [Temporal Action Localization on FineAction](https://paperswithcode.com/sota/temporal-action-localization-on-fineaction?p=internvideo2-scaling-video-foundation-models)
- [Temporal Action Localization on HACS](https://paperswithcode.com/sota/temporal-action-localization-on-hacs?p=internvideo2-scaling-video-foundation-models)
- [Temporal Action Localization on THUMOS14](https://paperswithcode.com/sota/temporal-action-localization-on-thumos14?p=internvideo2-scaling-video-foundation-models)
- [Temporal Action Localization on ActivityNet](https://paperswithcode.com/sota/temporal-action-localization-on-activitynet?p=internvideo2-scaling-video-foundation-models)
- [Zero-Shot Video Question Answering on EgoSchema](https://paperswithcode.com/sota/zero-shot-video-question-answer-on-egoschema-1?p=internvideo2-scaling-video-foundation-models)
- [Video Instance Segmentation on YouTube-VIS](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=internvideo2-scaling-video-foundation-models)
This repo will soon provide the code and models of '[InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding](https://arxiv.org/abs/2403.15377)'.

- **Achieved `92.1%` Top-1 accuracy on Kinetics-400.**