
Commit 8563e05

Update README.md (#27)
* Update README.md
* Update README.md
* Update README.md
1 parent a7a8eb7 commit 8563e05


README.md

Lines changed: 23 additions & 5 deletions
@@ -1,8 +1,22 @@
 # LLaVAction: Evaluating and Training Multi-Modal Large Language Models for Action Recognition
 
+[![Static Badge](https://img.shields.io/badge/LLaVAction-paper-green)](http://arxiv.org/abs/tbd)
+[![Demo Website](https://img.shields.io/badge/LLaVAction-website-red)](https://mmathislab.github.io/llavaction/)
+[![llavaction-checkpoints](https://img.shields.io/badge/LLaVAction-checkpoints_🤗-blue)](https://huggingface.co/MLAdaptiveIntelligence)
 
-- This repository contains the implementation for our ICCV 2025 submission on evaluating and training multi-modal large language models for action recognition.
-- Our code is built on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), and files in the directory `llavaction/action` are related to this work. We thank the authors of LLaVA-NeXT for making their code publicly available.
+[![Downloads](https://static.pepy.tech/badge/llavaction)](https://pepy.tech/project/llavaction)
+[![Downloads](https://static.pepy.tech/badge/llavaction/month)](https://pepy.tech/project/llavaction)
+[![PyPI version](https://badge.fury.io/py/llavaction.svg)](https://badge.fury.io/py/llavaction)
+![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-red)
+
+## Abstract
+
+Understanding human behavior requires measuring behavioral actions. Due to its complexity, behavior is best mapped onto a rich, semantic structure such as language. Recently developed multi-modal large language models (MLLMs) are promising candidates for a wide range of action understanding tasks. In this work, we focus on evaluating and then improving MLLMs to perform action recognition. We reformulate EPIC-KITCHENS-100, one of the largest and most challenging egocentric action datasets, as a video multiple-question-answering benchmark (EPIC-KITCHENS-100-MQA). We show that when we sample difficult incorrect answers as distractors, leading MLLMs struggle to recognize the correct actions. We propose a series of methods that greatly improve the MLLMs' ability to perform action recognition, achieving state-of-the-art on the EPIC-KITCHENS-100 Challenge and outperforming GPT-4o by 21 points in accuracy on EPIC-KITCHENS-100-MQA. Lastly, we show improvements on other action-related video benchmarks such as VideoMME, PerceptionTest, and MVBench.
+
+## Code
+
+- This repository contains the implementation for our preprint on evaluating and training multi-modal large language models for action recognition.
+- Our code is built on [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), and files in the directory `llavaction/action` are related to our work. We thank the authors of LLaVA-NeXT for making their code publicly available.
 - The files in the `/eval`, `/model`, `/serve` and `/train` are directly from [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), unless modified and noted below.
 - Modified files are:
 - - /model/llava_arch.py
@@ -11,12 +25,12 @@
 - - /train/llava_trainer.py
 - - /utils.py
 - - A diff can be generated against the commit (79ef45a6d8b89b92d7a8525f077c3a3a9894a87d) of LLaVA-NeXT to see our modifications.
-- The code will be made publicly available when published. For review, the provided code and model license is [no license](https://choosealicense.com/no-permission/).
-
 
 ## Demo
 - Currently, we provide code to run video inference in a Jupyter Notebook (which can be run on Google Colaboratory).
-**Installation guide for video inference:**
+
+
+### Installation guide for video inference:
 ```bash
 conda create -n llavaction python=3.10 -y
 conda activate llavaction
@@ -25,3 +39,7 @@ pip install -e .
 ```
 
 - Please see the `/example` directory for a demo notebook.
+
+## EPIC-KITCHENS-100-MQA
+
+In our work, we introduce a new way to evaluate MLLMs for action recognition by casting EPIC-KITCHENS-100 into a multiple-question-answering benchmark. This has not yet been released (as of 3/2025), but please check the issues or open an issue if you are interested in accessing this resource before the paper is published. We also plan to integrate this into the package [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
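
The Code section of the diffed README notes that the modifications can be inspected by diffing against LLaVA-NeXT commit 79ef45a6d8b89b92d7a8525f077c3a3a9894a87d. A minimal sketch of one way to do that locally is below; the sibling-directory layout and the `llavaction/` package path mirroring LLaVA-NeXT's `llava/` package are assumptions inferred from the README, not something this commit specifies.

```bash
# From the root of this repository: pin a LLaVA-NeXT checkout to the
# commit referenced in the README (cloned next to this repository).
git clone https://github.com/LLaVA-VL/LLaVA-NeXT ../LLaVA-NeXT
git -C ../LLaVA-NeXT checkout 79ef45a6d8b89b92d7a8525f077c3a3a9894a87d

# Compare one of the files the README lists as modified; the
# llavaction/ package path is assumed to mirror LLaVA-NeXT's llava/ package.
diff -u ../LLaVA-NeXT/llava/train/llava_trainer.py \
        llavaction/train/llava_trainer.py
```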

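The Demo section points at a Jupyter notebook in `/example`, but the README only shows the environment setup. A short sketch of launching it follows, under the assumption (based on the PyPI badge) that the package is also published on PyPI as `llavaction`; if that assumption is wrong, the `pip install -e .` route from the README's installation guide is the safe path.

```bash
# Environment as in the README's installation guide.
conda create -n llavaction python=3.10 -y
conda activate llavaction

# Assumption: the PyPI badge implies a published package; the README
# itself installs from a source checkout with `pip install -e .`.
pip install llavaction jupyter

# Open the demo notebook shipped in the repository's /example directory.
cd example && jupyter notebook
```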