MERGE: Guided Vision-Language Models for Multi-actor Event Reasoning and Grounding in Human–Robot Interaction
It is recommended to use uv for the installation; see the uv installation guide.
Once you have uv installed, run the following commands from the root folder:
```shell
uv venv
uv sync
```

You can switch between the different experiments by editing ./examples/evaluate.py. The default setting is MERGE with GPT-4o.
```shell
uv run examples/evaluate.py
```

The required image files will be shared via GitHub once the repository is available.
To use GPT, you need to set

```shell
export OPENAI_API_KEY="53CRE7_KEY"
```

and for Gemini

```shell
export GEMINI_API_KEY="53CRE7_KEY"
```

After this you can run MERGE with:
```shell
uv run examples/merge_full.py
```

and the baseline examples with:

```shell
uv run examples/baselines.py
```
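Both scripts depend on the keys exported above, so a quick check that they are present can save a confusing mid-run failure. The sketch below is a hypothetical helper, not part of the MERGE codebase; the key names match the exports above:

```python
import os

# Keys expected by the example scripts above.
# (Hypothetical helper, not part of the MERGE repo.)
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")

def check_keys(env=None):
    """Return True if all required keys are set, else raise RuntimeError."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError("missing API keys: " + ", ".join(missing))
    return True

if __name__ == "__main__":
    check_keys()
    print("all keys set")
```

Running it before `uv run examples/merge_full.py` (or importing `check_keys` at the top of your own script) surfaces a missing key immediately instead of partway through an evaluation.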