aixiaodewugege/RoleplayAgent-Survey

True-to-Role, Tailored-to-You: A Survey of Role-Playing LLM Agents

Teaser Figure: Static-Dynamic Perspective

[ Paper ]

📖 Introduction

This is the official repository for the survey paper "True-to-Role, Tailored-to-You: A Survey of Role-Playing LLM Agents".

Abstract:

Role-playing, anchored in sociology and psychology, involves enacting a character within a context. With the emergence of Large Language Models (LLMs), role-playing has become a compelling application. However, existing research often fragments into separate lines of predefined identity and adaptive behavior. To address this, we present a systematic survey from a unified Static–Dynamic perspective:

  • 🎭 Static Persona: A prescriptive anchor of identity (who the agent is), differentiating it from general assistants.
  • 🧠 Dynamic Memory: Experience evolved through interactions, decomposed into Self Memory (internal consistency) and User Memory (external personalization).

This repository maintains a curated list of papers discussed in our survey.
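
As a rough illustration of the Static–Dynamic split described in the abstract, the sketch below models a fixed persona alongside memory that grows during interaction. It is not taken from the survey or from any listed paper: the class names (`StaticPersona`, `DynamicMemory`, `RolePlayingAgent`), the methods `observe` and `build_context`, and the plain-list storage are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class StaticPersona:
    """Prescriptive identity anchor (hypothetical): fixed once the agent is created."""
    name: str
    description: str


@dataclass
class DynamicMemory:
    """Experience that accumulates through interaction (hypothetical layout)."""
    self_memory: List[str] = field(default_factory=list)  # what the agent itself said or did
    user_memory: List[str] = field(default_factory=list)  # what the agent learned about the user


@dataclass
class RolePlayingAgent:
    persona: StaticPersona
    memory: DynamicMemory = field(default_factory=DynamicMemory)

    def observe(self, agent_turn: str, user_fact: Optional[str] = None) -> None:
        """Record one turn: always into Self Memory, optionally into User Memory."""
        self.memory.self_memory.append(agent_turn)
        if user_fact is not None:
            self.memory.user_memory.append(user_fact)

    def build_context(self) -> str:
        """Assemble a prompt context: persona first, then both memory streams."""
        lines = [f"You are {self.persona.name}. {self.persona.description}"]
        lines += [f"[self] {m}" for m in self.memory.self_memory]
        lines += [f"[user] {m}" for m in self.memory.user_memory]
        return "\n".join(lines)


agent = RolePlayingAgent(StaticPersona("Holmes", "A consulting detective."))
agent.observe("I deduced the visitor is a doctor.", user_fact="Prefers short answers.")
print(agent.build_context())
```

Marking the persona as frozen while leaving the memory mutable mirrors the distinction the survey draws: identity stays put, experience changes. Real systems in the lists below use far richer stores (graphs, hierarchies, retrieval indexes) than the plain lists assumed here.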

🗂️ Taxonomy & Table of Contents


🌟 Surveys & Foundations

Year Title Venue
2025 Memory in the Age of AI Agents arXiv
2025 From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs arXiv
2025 A Survey of Personalized Large Language Models: Progress and Future Directions arXiv
2025 A Survey of Personalization: From RAG to Agent arXiv
2025 Evaluation and Benchmarking of LLM Agents: A Survey KDD
2025 Towards a Design Guideline for RPA Evaluation ACL Findings
2025 Instruction Tuning for Large Language Models: A Survey arXiv
2025 The Oscars of AI Theater: A Survey on Role-Playing with Language Models arXiv
2024 A survey on large language model based autonomous agents Frontiers CS
2024 Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization EMNLP Findings
2023 The Rise and Potential of Large Language Model Based Agents: A Survey arXiv
2023 Role play with large language models Nature
- Theoretical Foundations
1959 The Presentation of Self in Everyday Life Book (Goffman)
1979 Role Theory: Expectations, Identities, and Behaviors Book (Biddle)
2000 The construction of autobiographical memories in the self-memory system Psych. Review

🎭 Static Persona

Persona Construction (Data & Profiling)

Year Title Venue
2025 OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas arXiv
2025 MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ICLR
2025 Scaling Synthetic Data Creation with 1,000,000,000 Personas arXiv
2025 Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents arXiv
2025 PersonaCraft: Leveraging language models for data-driven persona development IJHCS
2025 AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models arXiv
2025 Personality Traits in Large Language Models arXiv
2024 RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts NeurIPS
2024 GenLARP: Enabling Immersive Live Action Role-Play arXiv
2022 Heroes, Villains, and Victims, and GPT-3: Automated Extraction of Character Roles arXiv
2021 "Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding EMNLP
2020 Personalized Dialogue Generation with Diversified Traits arXiv
2019 Learning to Speak and Act in a Fantasy Text Adventure Game EMNLP
2018 Personalizing Dialogue Agents: I have a dog, do you have pets? ACL

Persona Alignment

Year Title Venue
2025 RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models ACL Workshop
2025 Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning ACL Findings
2025 Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment ACL Findings
2025 ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities arXiv
2025 Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization arXiv
2025 BILLY: Steering Large Language Models via Merging Persona Vectors arXiv
2025 SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models ACL
2025 Persona-Consistent Dialogue Generation via Pseudo Preference Tuning COLING
2025 ChARM: Character-based Act-adaptive Reward Modeling arXiv
2025 Let's Roleplay: Examining LLM Alignment in Collaborative Dialogues arXiv
2025 Moral Susceptibility and Robustness under Persona Role-Play arXiv
2025 Chain-of-Agents: End-to-End Agent Foundation Models arXiv
2025 SFT Memorizes, RL Generalizes: A Comparative Study arXiv
2025 Enhancing Character-Coherent Role-Playing Dialogue with a Verifiable Emotion Reward Information
2024 RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities ACL Findings
2024 Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play ACL
2024 Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing NeurIPS
2024 SimPO: Simple Preference Optimization with a Reference-Free Reward NeurIPS
2024 DeepSeekMath: Pushing the Limits of Mathematical Reasoning arXiv
2024 Role-playing Prompt Framework: Generation and Evaluation arXiv
2024 Editing Personality For Large Language Models NLPCC
2024 Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models ICML
2024 LLMs as Method Actors: A Model for Prompt Engineering and Architecture arXiv
2023 Character-LLM: A Trainable Agent for Role-Playing EMNLP
2023 SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF EMNLP Findings
2023 Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning EMNLP Findings

Persona Evaluation

Year Title Venue
2025 RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing ACL Findings
2025 Guess What I am Thinking: A Benchmark for Inner Thought Reasoning EMNLP Findings
2025 DMT-RoleBench: A Dynamic Multi-Turn Dialogue Based Benchmark AAAI
2025 RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play arXiv
2025 Role-Playing Evaluation for Large Language Models arXiv
2025 RVBench: Role values benchmark for role-playing LLMs CHB
2025 RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing arXiv
2025 Evaluating Personality Traits in LLMs: Insights from Psychological Questionnaires WWW
2025 RoleBreak: Character Hallucination as a Jailbreak Attack COLING
2024 InCharacter: Evaluating Personality Fidelity in Role-Playing Agents ACL
2024 RoleEval: A Bilingual Role Evaluation Benchmark arXiv
2024 CharacterGLM: Customizing Social Characters with LLMs EMNLP Industry
2024 CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agents EMNLP
2024 DialogBench: Evaluating LLMs as Human-like Dialogue Systems NAACL
2024 Sotopia: Interactive Evaluation for Social Intelligence in Language Agents ICLR
2024 PersonaLLM: Investigating the Ability of LLMs to Express Personality Traits NAACL Findings
2024 TimeChara: Evaluating Point-in-Time Character Hallucination ACL Findings
2024 Mitigating Hallucination in Fictional Character Role-Play (SGR) EMNLP Findings
2024 Evaluating Character Understanding via Character Profiling from Fictional Works EMNLP
2023 ChatHaruhi: Reviving Anime Character in Reality via Large Language Model arXiv
2023 Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation arXiv
2023 Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions EMNLP Findings

🧠 Dynamic Memory

Dynamic Memory Framework

Memory Structure & Storage

Year Title Venue Structure/Storage
2025 Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory arXiv Graph
2025 MemOS: An Operating System for Memory-Augmented Generation arXiv Graph + Index
2025 H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning arXiv Hierarchical
2025 HiAgent: Hierarchical Working Memory Management ACL Hierarchical
2025 A-MEM: Agentic Memory for LLM Agents arXiv Graphical (Zettelkasten)
2025 An Efficient Context-Dependent Memory Framework (CDMem) NAACL Hierarchical
2025 MIRIX: Multi-Agent Memory System for LLM-Based Agents arXiv Modular
2025 MemEngine: A Unified and Modular Library for Memory of LLM-based Agents WWW Modular
2025 From RAG to Memory: Non-Parametric Continual Learning arXiv Storage
2024 Crafting Personalized Agents through RAG on Editable Memory Graphs (EMG-RAG) EMNLP Graph
2024 MemoryBank: Enhancing Large Language Models with Long-Term Memory AAAI Indexing
2024 MemGPT: Towards LLMs as Operating Systems arXiv Hierarchical/OS

Memory Evolution

Year Title Venue
2025 THEANINE: Towards Lifelong Dialogue Agents via Timeline-based Memory Management NAACL
2025 MOOM: Maintenance, Organization and Optimization of Memory arXiv
2025 Memory as Action: Autonomous Context Curation (MemAct) arXiv
2025 ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization arXiv
2024 HippoRAG: Neurobiologically Inspired Long-Term Memory NeurIPS
2024 Expel: LLM agents are experiential learners AAAI
2023 Generative Agents: Interactive Simulacra of Human Behavior UIST
2023 Reflexion: language agents with verbal reinforcement learning NeurIPS

Memory Alignment (Personalization)

Year Title Venue Method
2025 Democratizing Large Language Models via Personalized PEFT arXiv Tuning
2025 User-LLM: Efficient LLM Contextualization with User Embeddings WWW Encoding
2025 PersonaAgent: When LLM Agents Meet Personalization at Test Time arXiv RL/Agent
2025 CoPe: Personalized LLM Decoding via Contrasting Personal Preference arXiv Decoding
2025 Teaching Language Models to Evolve with Users: Dynamic Profile Modeling arXiv Tuning/RL
2025 LLMs + Persona-Plug = Personalized LLMs ACL Encoding
2025 Hello Again! LLM-powered Personalized Agent for Long-term Dialogue arXiv Prompting
2025 Enhancing Persona Consistency using Persona-Aware Contrastive Learning ACL Findings Learning
2025 Personalize Before Retrieve: LLM-based Personalized Query Expansion arXiv RAG
2025 Does your AI agent get you? Framework for approximating human models AAAI Profiling
2025 Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling arXiv Benchmarking
2025 Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following arXiv Eval
2025 On Memory Construction and Retrieval for Personalized Conversational Agents arXiv RAG
2025 ComMer: a Framework for Compressing and Merging User Data arXiv Compression
2025 Rehearse With User: Personalized Opinion Summarization via Role-Playing ACL Findings Summarization
2025 Few-shot Personalization of LLMs with Mis-aligned Responses NAACL Prompting
2025 Measuring What Makes You Unique: Difference-Aware User Modeling ACL Findings Modeling
2025 Personalized Graph-Based Retrieval for Large Language Models arXiv RAG
2025 Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs arXiv Tuning
2025 MemInsight: Autonomous Memory Augmentation for LLM Agents EMNLP Memory
2025 Personalized LLM Response Generation with Parameterized Memory Injection arXiv Memory
2025 Evaluating Personalized Tool-Augmented LLMs arXiv Tool Use
2025 MR.Rec: Synergizing Memory and Reasoning for Personalized Recommendation arXiv RecSys
2025 TOBUGraph: Knowledge Graph-Based Retrieval EMNLP Industry Graph
2024 LaMP: When Large Language Models Meet Personalization ACL RAG
2024 Direct Preference Optimization (DPO) NeurIPS Alignment
2024 Optimization Methods for Personalizing LLMs through Retrieval Augmentation SIGIR RAG
2024 Personalized LoRA for human-centered text understanding AAAI Tuning
2024 Customizing Language Models with Instance-wise LoRA NeurIPS Tuning
2024 Guided Profile Generation Improves Personalization with LLMs EMNLP Findings Profiling
2024 Understanding the Role of User Profile in the Personalization arXiv Profiling
2024 Knowledge Graph Tuning: Real-time LLM Personalization arXiv Tuning
2024 Lifelong Personalized Low-Rank Adaptation of LLMs arXiv Tuning
2024 Personalized Pieces: Efficient Personalized LLMs arXiv Tuning
2024 Personalized Language Modeling from Personalized Human Feedback arXiv RLHF
2024 Personalized Large Language Models ICDM General
2024 PersonaRAG: Enhancing RAG Systems with User-Centric Agents arXiv RAG
2024 MeMemo: On-device Retrieval Augmentation SIGIR RAG
2024 LLM-based Medical Assistant Personalization NAACL Application
2024 CheatAgent: Attacking LLM-Empowered Recommender Systems KDD RecSys
2024 Interpretable User Satisfaction Estimation for Conversational Systems ACL Metric
2023 Integrating Summarization and Retrieval for Enhanced Personalization arXiv RAG
2023 Teach LLMs to Personalize -- An Approach inspired by Writing Education arXiv Prompting
2023 Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction arXiv RecSys
2022 LoRA: Low-Rank Adaptation of Large Language Models ICLR Tuning
2022 Training language models to follow instructions with human feedback (RLHF) NeurIPS RLHF
2017 Proximal Policy Optimization Algorithms (PPO) arXiv RL
2017 A Theoretical Framework for Conversational Search CHIIR Theory

🤝 Intersection: Persona-Memory Coordination

System Design

Year Title Venue
2025 CoSER: Coordinating LLM-Based Persona Simulation of Established Roles ICML
2025 A Persona-Aware LLM-Enhanced Framework for Multi-Session Personalized Dialogue ACL Findings
2025 Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning arXiv
2025 R-CHAR: A Metacognition-Driven Framework for Role-Playing EMNLP
2025 Character is Destiny: Can Persona-assigned Language Models Make Personal Choices? EMNLP Findings
2025 PRIME: Large Language Model Personalization with Cognitive Memory arXiv
2024 RoleInteract: Evaluating the Social Interaction of Role-Playing Agents arXiv
2024 SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents ACL

Applications

Year Title Domain
2025 CharacterBox: Evaluating the Role-Playing Capabilities in Virtual Worlds Simulation
2025 GuideLLM: Exploring LLM-Guided Conversation... in Autobiography Interviewing Interviewing
2025 Teaching According to Students' Aptitude: Personalized Mathematics Tutoring Education
2025 LLMs Can Simulate Standardized Patients via Agent Coevolution Medical
2025 Human or LLM as Standardized Patients? Medical
2025 Adaptive-VP: A Framework for LLM-Based Virtual Patients... Nurse Communication Medical
2025 Role-Playing LLM-Based Multi-Agent Support... Detecting Family Communication Bias Social Good
2025 GUARDIAN: Safeguarding LLM Multi-Agent Collaborations Safety
2025 A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization Tools
2024 AgentVerse: Facilitating Multi-Agent Collaboration Multi-Agent
2024 Emergence of Social Norms in Generative Agent Societies Social Sim
2024 EduAgent: Generative Student Agents in Learning Education
2024 ACE: A LLM-based Negotiation Coaching System Coaching
2024 PsySafe: A Framework for Psychological-based Attack, Defense... of Multi-agent Safety
2023 Voyager: An Open-Ended Embodied Agent with Large Language Models Embodied
2023 CAMEL: Communicative Agents for "Mind" Exploration Multi-Agent
2023 War and Peace (WarAgent): LLM-based Multi-Agent Simulation of World Wars Simulation
2023 SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities Companionship
2023 ReAct: Synergizing Reasoning and Acting in Language Models Agent Reasoning
2019 The Second Conversational Intelligence Challenge (ConvAI2) Challenge

⚖️ General Evaluation & Benchmarks

Year Title Venue Metric/Focus
2025 MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents ACL Findings Memory
2025 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory ICLR Long-term
2025 In Prospect and Retrospect: Reflective Memory Management (RMM) ACL Memory
2025 A Survey on Mixture of Experts in Large Language Models IEEE TKDE Architecture
2024 Evaluating Very Long-Term Conversational Memory of LLM Agents (LOCOMO) ACL Long-term
2024 Doing Personal LAPS: LLM-Augmented Dialogue... for Multi-Session Search SIGIR Session
2024 PerLTQA: A Personal Long-Term Memory Dataset arXiv QA/Memory
2024 LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding ACL Long Context
2024 Lost in the Middle: How Language Models Use Long Contexts TACL Context
2024 RULER: What's the Real Context Size of Your Long-Context Language Models? arXiv Context
2024 InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens arXiv Context
2024 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference ICML Preference
2024 Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators arXiv Eval Bias
2023 Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena NeurIPS Judge
2023 G-Eval: NLG Evaluation using Gpt-4 with Better Human Alignment EMNLP Metric
2020 BLEURT: Learning Robust Metrics for Text Generation ACL Metric
2020 BERTScore: Evaluating Text Generation with BERT ICLR Metric
2004 ROUGE: A Package for Automatic Evaluation of Summaries ACL Metric
2002 Bleu: a Method for Automatic Evaluation of Machine Translation ACL Metric

🔖 Citation

If you find this survey useful for your research, please cite:

@article{Wu2025Survey,
  title={True-to-Role, Tailored-to-You: A Survey of Role-Playing LLM Agents},
  author={},
  journal={arXiv preprint arXiv:25XX.XXXXX},
  year={2025}
}
