[arXiv] [Project Page]
DAAAM (Describe Anything, Anywhere, at Any Moment) is a novel spatio-temporal memory framework for large-scale, real-time 4D scene understanding. It builds hierarchical 4D scene graphs annotated with detailed natural language descriptions.
Key contributions:
- Novel optimization-based frontend that derives semantic descriptions from localized captioning models
- Hierarchical 4D scene graph construction with real-time performance
- State-of-the-art results on NaVQA and SG3D benchmarks
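Since the code is not yet released, the following is only an illustrative sketch of what a hierarchical scene graph with per-node language descriptions and observation times could look like. It is not the DAAAM implementation or API: the class names (`SceneNode`, `SceneGraph4D`), the layer names, and all fields are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneNode:
    """One node of a toy hierarchical scene graph (hypothetical, not the DAAAM API)."""
    node_id: str
    layer: str                             # assumed layer names, e.g. "object" or "room"
    description: str                       # natural language description of the node
    position: Tuple[float, float, float]   # 3D position in a world frame
    first_seen: float                      # time of first observation, in seconds
    last_seen: float                       # time of latest observation, in seconds
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class SceneGraph4D:
    """Toy container: a 3D hierarchy plus an observation interval per node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, SceneNode] = {}

    def add_node(self, node: SceneNode, parent_id: Optional[str] = None) -> None:
        if node.node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node.node_id}")
        if parent_id is not None:
            parent = self.nodes[parent_id]  # raises KeyError for an unknown parent
            node.parent = parent_id
            parent.children.append(node.node_id)
        self.nodes[node.node_id] = node

    def observe(self, node_id: str, timestamp: float) -> None:
        """Extend the node's observation interval to include a new timestamp."""
        node = self.nodes[node_id]
        node.first_seen = min(node.first_seen, timestamp)
        node.last_seen = max(node.last_seen, timestamp)

    def visible_between(self, start: float, end: float) -> List[SceneNode]:
        """Return nodes whose observation interval overlaps [start, end]."""
        return [
            n for n in self.nodes.values()
            if n.first_seen <= end and n.last_seen >= start
        ]

    def ancestors(self, node_id: str) -> List[str]:
        """Walk up the hierarchy and return parent ids, nearest first."""
        chain: List[str] = []
        current = self.nodes[node_id].parent
        while current is not None:
            chain.append(current)
            current = self.nodes[current].parent
        return chain


# Example: a room containing one described object that is re-observed later.
graph = SceneGraph4D()
graph.add_node(SceneNode("room_0", "room", "a small office", (0.0, 0.0, 0.0), 0.0, 30.0))
graph.add_node(
    SceneNode("obj_0", "object", "a red mug on the desk", (1.0, 0.5, 0.8), 4.0, 6.0),
    parent_id="room_0",
)
graph.observe("obj_0", 12.5)
```

Keeping the description and the observation interval on each node lets a query combine "what", "where", and "when" in one lookup. A real system would likely need richer temporal and geometric state than this sketch carries.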
This work was supported by the ARL DCIST program and the ONR RAPID program.
Code coming soon.
If you use this code in your work, please cite the following paper:
Nicolas Gorlo, Lukas Schmid, and Luca Carlone, "Describe Anything Anywhere At Any Moment," arXiv preprint arXiv:2512.00565, 2025.
@article{Gorlo2025DAAAM,
  title={Describe Anything Anywhere At Any Moment},
  author={Nicolas Gorlo and Lukas Schmid and Luca Carlone},
  year={2025},
  eprint={2512.00565},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.00565}
}