Yi Xu, Ruining Yang, Yitian Zhang, Yizhou Wang, Jianglin Lu, Mingyuan Zhang, Lili Su, Yun Fu
Department of Electrical and Computer Engineering, Northeastern University
Recent advances in Large Language Models (LLMs) are transforming how autonomous systems understand, predict, and reason about motion. This survey offers the first comprehensive review of LLM-based trajectory prediction, highlighting how natural language can enhance modeling, supervision, interpretability, and simulation in trajectory prediction.
Click here to view the full PDF
We categorize current research into five key directions:
- Trajectory Prediction via Language Modeling Paradigms
  Reformulating trajectory generation as a language-style sequence modeling task using tokenization and autoregressive prediction.
- Direct Trajectory Prediction with Pretrained Language Models
  Employing pretrained language models (e.g., T5, GPT-3.5, LLaMA) directly to predict motion trajectories through prompting or fine-tuning.
- Language-Guided Scene Understanding for Trajectory Prediction
  Using natural language to enrich environmental understanding and support context-aware forecasting.
- Language-Driven Data Generation for Trajectory Prediction
  Generating synthetic trajectory data or driving scenarios from textual descriptions using LLMs.
- Language-Based Reasoning and Interpretability for Trajectory Prediction
  Providing natural language rationales, decision chains, and planning justifications to improve transparency and trust.
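To make the first direction concrete, here is a minimal sketch of how a continuous trajectory might be turned into discrete tokens and then extended autoregressively. It is an illustration only, not a method taken from the survey: uniform binning, the coordinate range of [-50, 50], the vocabulary of 256 bins, and the names `tokenize`, `detokenize`, and `rollout` are all assumptions made for this example, and the "model" in the usage lines is a trivial stand-in rather than an LLM.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def tokenize(points: Sequence[Point], lo: float = -50.0, hi: float = 50.0,
             bins: int = 256) -> List[int]:
    """Map each (x, y) to two integer tokens by uniform binning (assumed scheme)."""
    width = (hi - lo) / bins
    tokens: List[int] = []
    for x, y in points:
        for value in (x, y):
            clipped = min(max(value, lo), hi)  # out-of-range values saturate
            tokens.append(min(int((clipped - lo) / width), bins - 1))
    return tokens


def detokenize(tokens: Sequence[int], lo: float = -50.0, hi: float = 50.0,
               bins: int = 256) -> List[Point]:
    """Map token pairs back to bin-center coordinates."""
    width = (hi - lo) / bins
    centers = [lo + (t + 0.5) * width for t in tokens]
    return list(zip(centers[0::2], centers[1::2]))


def rollout(prefix: Sequence[int], next_token: Callable[[List[int]], int],
            steps: int) -> List[int]:
    """Autoregressive loop: each new token is predicted from all previous ones."""
    seq = list(prefix)
    for _ in range(steps):
        seq.append(next_token(seq))
    return seq[len(prefix):]


# Usage: a stand-in "model" that repeats the last (x, y) token pair.
history = [(0.0, 1.5), (2.3, -4.1)]
observed = tokenize(history)
future_tokens = rollout(observed, lambda seq: seq[-2], steps=4)
future_points = detokenize(future_tokens)
```

In this sketch the reconstruction error per coordinate is at most half a bin width, so the bin count trades vocabulary size against spatial precision.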
“Language is inherently expressive and compositional, LLMs offer a powerful tool for capturing context, goals, and intent in dynamic environments.”
This work bridges the NLP and trajectory prediction communities, showcasing how LLMs support reasoning, few-shot generalization, and multimodal integration in dynamic, agent-based scenarios.
We discuss several open challenges and promising directions, including:
- Effective tokenization of continuous motion data
- Prompt design and alignment across tasks
- Commonsense and causal reasoning with LLMs
- Multimodal context fusion (e.g., maps, images, language)
- Explanation fidelity and interpretability for real-world deployment
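The prompt-design challenge can be illustrated with a small sketch of how observed positions might be serialized into text and how a model's textual reply might be parsed back into coordinates. Everything here is hypothetical: the prompt wording, the assumption that positions are in meters, and the helper names `build_prompt` and `parse_response` are chosen for this example and do not come from the survey or from any particular method. No LLM is called; the reply string is hand-written.

```python
import re
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Matches "(x, y)" pairs of plain decimal numbers, e.g. "(2.0, -1.5)".
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def build_prompt(history: Sequence[Point], horizon: int, decimals: int = 2) -> str:
    """Serialize observed positions into a text prompt (hypothetical template)."""
    coords = ", ".join(f"({x:.{decimals}f}, {y:.{decimals}f})" for x, y in history)
    return (
        "An agent was observed at these (x, y) positions in meters, "
        f"oldest first: {coords}.\n"
        f"Predict the next {horizon} positions as a comma-separated "
        "list of (x, y) pairs."
    )


def parse_response(text: str, horizon: int) -> List[Point]:
    """Extract the first `horizon` coordinate pairs from a free-text reply."""
    pairs = [(float(x), float(y)) for x, y in _PAIR.findall(text)]
    if len(pairs) < horizon:
        raise ValueError(f"expected {horizon} pairs, found {len(pairs)}")
    return pairs[:horizon]


# Usage with a hand-written reply standing in for model output.
prompt = build_prompt([(0.0, 0.0), (1.0, 0.5)], horizon=2)
predicted = parse_response("Sure: (2.0, 1.0), (3.0, 1.5)", horizon=2)
```

The fixed number of decimals is one of the design choices such a template has to make: more digits preserve precision but lengthen the prompt, which interacts with the tokenization challenge listed above.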
We also provide structured comparisons of recent LLM-based methods across four core tasks:
- Direct Prediction
- Scene Understanding
- Data Generation
- Reasoning & Interpretability
Each table includes model types, LLM usage, prompting strategy, fine-tuning method, and datasets.
Please cite our work if you find it helpful:
@article{xu2025llmtraj,
  title={Trajectory Prediction Meets Large Language Models: A Survey},
  author={Xu, Yi and Yang, Ruining and Zhang, Yitian and Wang, Yizhou and Lu, Jianglin and Zhang, Mingyuan and Su, Lili and Fu, Yun},
  journal={arXiv preprint arXiv:2506.03408},
  year={2025},
  url={https://arxiv.org/abs/2506.03408}
}