🔥 Recognizing the paradigm shift brought by Large Language Models, we now focus exclusively on LLM-related research.
For research from before the LLM era, an archived version is available in Old Version (this archive is no longer maintained).
This repository collects awesome papers about Story Generation / Storytelling in the LLM era.
Papers are listed chronologically (most recent first).
Thank you for the stars! We're continuously updating with the latest research. Cheers! 🍻
Your contributions matter! Please help keep this list current and accurate by opening issues or PRs for any mistakes or missing papers.
Contact: mayingpeng33 [AT] gmail [DOT] com
Entry format, e.g.: ACL-20XX Title [paper] [code] ... [authors]
- ArXiv-2024 What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [paper] [Dingyi Yang, Qin Jin]
- CHI-2024 The Value, Benefits, and Concerns of Generative AI-Powered Assistance in Writing [paper] [Zhuoyan Li, Chen Liang, Jing Peng, Ming Yin] [Mainly about ChatGPT, not including other models]
- ArXiv-2024 Weaver: Foundation Models for Creative Writing [paper] [Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, ... , Yuchen Eleanor Jiang, Wangchunshu Zhou] [Foundation Models which focus on writing capabilities]
- Neurocomputing-2023 Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey [paper] [Yuxin Wang, Jieru Lin, Zhiwei Yu, Wei Hu, Börje F. Karlsson]
- EMNLP Findings-2023 Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding [paper] [Lixing Zhu, Runcong Zhao, Lin Gui, Yulan He]

- ArXiv-2025 Learning to Reason for Long-Form Story Generation [paper] [Alexander Gurung, Mirella Lapata]
- NAACL-2025 Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement [paper] [Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang, daiyuan li, Yu Hu, Mingkui Tan]
- EMNLP-2024 Collective Critics for Creative Story Generation [paper] [Minwook Bae, Hyounghun Kim]
- ACL-2024 Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding [paper] [Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen]
- ACL Workshop-2025 Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming [paper] [Phoebe J. Wang, Max Kreminski]
- NAACL-2025 Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models [paper] [Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang]
- EACL-2024 Creating Suspenseful Stories: Iterative Planning with Large Language Models [paper] [Kaige Xie, Mark Riedl] [Prompt Engineering]
- ArXiv-2023 EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation [paper] [Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan] [Prompt Engineering]

- ArXiv-2025 HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics [paper] [Sizhou Chen, Shufan Jiang, Chi Zhang, Xiao-Lei Zhang, Xuelong Li]
- ArXiv-2025 A Cognitive Writing Perspective for Constrained Long-Form Text Generation [paper] [Kaiyang Wan, Honglin Mu, Rui Hao, Haoran Luo, Tianle Gu, Xiuying Chen]
- ICLR-2025 Agents' Room: Narrative Generation through Multi-step Collaboration [paper] [Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, Mirella Lapata]
- ACL-2024 IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation [paper] [Senyu Han, Lu Chen, Li-Min Lin, Zhengshan Xu, Kai Yu]
- EMNLP Findings-2024 HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing [paper] [Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng]
- FDG-2024 StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning [paper] [Yi Wang, Qian Zhou, David Ledo] [virtual characters]
- IJCAI-2024 AutoAgents: A Framework for Automatic Agent Generation [paper] [Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi] [Prompt Engineering]

- ArXiv-2024 SEED-Story: Multimodal Long Story Generation with Large Language Model [paper] [Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen]
- CVPR-2023 Make-A-Story: Visual Memory Conditioned Consistent Story Generation [paper] [Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal]

- ArXiv-2025 All Stories Are One Story: Emotional Arc Guided Procedural Game Level Generation [paper] [Yunge Wen, Chenliang Huang, Hangyu Zhou, Zhuo Zeng, Chun Ming Louis Po, Julian Togelius, Timothy Merino, Sam Earle]
- ArXiv-2025 Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection [paper] [Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov]
- ArXiv-2025 Learning to Reason for Long-Form Story Generation [paper] [Alexander Gurung, Mirella Lapata]
- ArXiv-2024 MLD-EA: Check and Complete Narrative Coherence by Introducing Emotions and Actions [paper] [Jinming Zhang, Yunfei Long]
- EMNLP Findings-2024 SWAG: Storytelling With Action Guidance [paper] [Zeeshan Patel, Karim El-Refai, Jonathan Pei, Tianle Li] [Reinforcement learning / SFT]
- EMNLP Findings-2023 Improving Pacing in Long-Form Story Planning [paper] [Yichen Wang, Kevin Yang, Xiaoming Liu, Dan Klein] [Story pacing]
- ArXiv-2023 End-to-End Story Plot Generator [paper] [Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian] [SFT]
- EMNLP Findings-2023 GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence [paper] [Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li] [RAG]

- ArXiv-2025 SCORE: Story Coherence and Retrieval Enhancement for AI Narratives [paper] [Qiang Yi, Yangfan He, Jianhui Wang, Xinyuan Song, Shiyao Qian, Miao Zhang, Li Sun, Tianyu Shi]
- AIIDE-2024 NarrativeGenie: Generating Narrative Beats and Dynamic Storytelling with Large Language Models [paper] [Vikram Kumaran, Jonathan Rowe, James Lester]
- ArXiv-2024 Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation [paper] [Divyam Sharma, Divya Santhanam]
- ACL-2024 MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation [paper] [code] [Yan Ma, Yu Qiao, Pengfei Liu]
- ArXiv-2024 Multigenre AI-powered Story Composition [paper] [Edirlei Soares de Lima, Margot M. E. Neggers, Antonio L. Furtado]
- NAACL-2024 Returning to the Start: Generating Narratives with Related Endpoints [paper] [code] [Anneliese Brei, Chao Zhao, Snigdha Chaturvedi] [SFT / Prompt Engineering]
- COLM-2024 With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation [paper] [Y. Wang, D. Ma, D. Cai] [LoRA]
- ICLR-2024 RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment [paper] [Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian] [Reinforcement learning]
- ArXiv-2023 RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text [paper] [code] [Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan] [Prompt Engineering]

- ArXiv-2025 STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game [paper] [Eric Zhou, Shreyas Basavatia, Moontashir Siam, Zexin Chen, Mark O. Riedl] [Game]
- ICLR-2025 R^2: A LLM BASED NOVEL-TO-SCREENPLAY GENERATION FRAMEWORK WITH CAUSAL PLOT GRAPHS [paper] [Zefeng Lin, Yi Xiao, Zhiqiang Mo, Qifan Zhang, Jie Wang, Jiayang Chen, Jiajing Zhang, Hui Zhang, Zhengyi Liu, Xianyong Fang, Xiaohua Xu]
- ArXiv-2025 Towards Enhanced Immersion and Agency for LLM-based Interactive Drama [paper] [Hongqiu Wu, Weiqi Wu, Tianyang Xu, Jiameng Zhang, Hai Zhao]
- ArXiv-2025 Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style [paper] [Xueran Han, Yuhan Liu, Mingzhe Li, Wei Liu, Sen Hu, Rui Yan, Zhiqiang Xu, Xiuying Chen] [Style]
- ArXiv-2025 Whose story is it? Personalizing story generation by inferring author styles [paper] [Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan]
- EMNLP-2024 MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models [paper] [Sarfaroz Yunusov, Hamza Sidat, Ali Emami]
- ArXiv-2024 CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer [paper] [Zhen Tao, Dinghao Xi, Zhiyu Li, Liumin Tang, Wei Xu] [Style]
- ArXiv-2023 Learning to Generate Text in Arbitrary Writing Styles [paper] [Aleem Khan, Andrew Wang, Sophia Hager, Nicholas Andrews] [Style]

- ICCV AISTORY Workshop-2025 Re:Verse -- Can Your VLM Read a Manga? [paper] [Aaditya Baranwal, Madhav Kataria, Naitik Agrawal, Yogesh S Rawat, Shruti Vyas]
- ArXiv-2025 CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization [paper] [Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang]
- ArXiv-2025 LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm [paper] [Siwei Wu, Yizhi Li, Xingwei Qu, Rishi Ravikumar, Yucheng Li, Tyler Loakman, Shanghaoran Quan, Xiaoyong Wei, Riza Batista-Navarro, Chenghua Lin]
- ArXiv-2025 Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs [paper] [Weijia Xu, Nebojsa Jojic, Sudha Rao, Chris Brockett, Bill Dolan]
- ArXiv-2024 Evaluating Creative Short Story Generation in Humans and Large Language Models [paper] [Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas]
- ArXiv-2024 CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints [paper] [Anirudh Atmakuru, Jatin Nainani, Rohith Siddhartha Reddy Bheemreddy, Anirudh Lakkaraju, Zonghai Yao, Hamed Zamani, Haw-Shiuan Chang]
- COLING-2025 Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs [paper] [Guillermo Marco, Luz Rello, Julio Gonzalo]
- NAACL-2025 FACTTRACK: Time-Aware World State Tracking in Story Outlines [paper] [Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein]
- EMNLP-2024 Are Large Language Models Capable of Generating Human-Level Narratives? [paper] [Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng]
- EMNLP-2024 STORYSUMM: Evaluating Faithfulness in Story Summarization [paper] [Melanie Subbiah, Faisal Ladhak, Akankshya Mishra, Griffin Adams, Lydia B. Chilton, Kathleen McKeown]
- ArXiv-2024 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing? [paper] [Guillermo Marco, Julio Gonzalo, Ramón del Castillo, María Teresa Mateo Girona]
- EMNLP-2024 Measuring Psychological Depth in Language Models [paper] [Fabrice Harel-Canada, Hanyu Zhou, Sreya Muppalla, Zeynep Yildiz, Miryung Kim, Amit Sahai, Nanyun Peng]
- TACL-2024 Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation [paper] [Cyril Chhun, Fabian M. Suchanek, Chloé Clavel]
- TACL-2024 Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers [paper] [Melanie Subbiah, Sean Zhang, Lydia B. Chilton, Kathleen McKeown]
- EMNLP Findings-2023 A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing [paper] [Carlos Gómez-Rodríguez, Paul Williams]
- EMNLP-2024 Learning Personalized Alignment for Evaluating Open-ended Text Generation [paper] [Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian]
- ICLR-2024 BooookScore: A systematic exploration of book-length summarization in the era of LLMs [paper] [Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer]
- TMLR-2024 TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks [paper] [Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen]
- CHI-2023 Art or Artifice? Large Language Models and the False Promise of Creativity [paper] [Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu]
- ACL-2023 HAUSER: Towards Holistic and Automatic Evaluation of Simile Generation [paper] [Qianyu He, Yikai Zhang, Jiaqing Liang, Yuncheng Huang, Yanghua Xiao, Yunwen Chen]
- ACL-2023 Can Large Language Models Be an Alternative to Human Evaluations? [paper] [Cheng-Han Chiang, Hung-yi Lee]
- EMNLP Findings-2023 DeltaScore: Evaluating Story Generation with Differentiating Perturbations [paper] [Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau]
- EMNLP-2022 StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning [paper] [Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama]
- COLING-2022 Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation [paper] [Cyril Chhun, Pierre Colombo, Chloé Clavel, Fabian M. Suchanek]
- TACL-2022 LOT: A story-centric benchmark for evaluating Chinese long text understanding and generation [paper] [Jian Guan, Zhuoer Feng, Yamei Chen, Ruilin He, Xiaoxi Mao, Changjie Fan, Minlie Huang]
- ACL-2021 Openmeva: A benchmark for evaluating open-ended story generation metrics [paper] [Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang]
- EMNLP-2020 Union: An unreferenced metric for evaluating open-ended story generation [paper] [code] [Jian Guan, Minlie Huang]

- EMNLP Findings-2024 BookWorm: A Dataset for Character Description and Analysis [paper] [Argyrios Papoudakis, Mirella Lapata, Frank Keller]
- EMNLP Workshop-2025 The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories [paper] [Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz]
- NAACL Findings-2025 CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis [paper] [Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee]
- ACL Findings-2024 Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives [paper] [Runcong Zhao, Qinglin Zhu, Hainiu Xu, Jiazheng Li, Yuxiang Zhou, Yulan He, Lin Gui]
- LREC-COLING-2024 CLAUSE-ATLAS: A Corpus of Narrative Information to Scale up Computational Literary Analysis [paper] [Enrica Troiano, Piek T.J.M. Vossen]
- LREC-COLING-2024 Reflections & Resonance: Two-Agent Partnership for Advancing LLM-based Story Annotation [paper] [Yuetian Chen, Mei Si]
- LREC-COLING-2024 CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation [paper] [Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu]
- ArXiv-2023 STONYBOOK: A System and Resource for Large-Scale Analysis of Novels [paper] [Charuta Pethe, Allen Kim, Rajesh Prabhakar, Tanzir Pial, Steven Skiena]
- ACL-2023 StoryWars: A Dataset and Instruction Tuning Baselines for Collaborative Story Understanding and Generation [paper] [Yulun Du, Lydia Chilton]
- NAACL-2022 A corpus for understanding and generating moral stories [paper] [Jian Guan, Ziqi Liu, Minlie Huang]
- EVAL4NLP-2021 StoryDB: Broad Multi-language Narrative Dataset [paper] [Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov]
- ACL-2022 SummScreen: A Dataset for Abstractive Screenplay Summarization [paper] [data] [Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel]
- ArXiv-2021 TVStoryGen: A Dataset for Generating Stories with Character Descriptions [paper] [Mingda Chen, Kevin Gimpel]
- EMNLP-2020 STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation [paper] [Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng, Mohit Iyyer]
- NAACL-2016 A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories [paper] [Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen]

- Understanding AI for Stories is a survey blog on applying AI to story generation, covering both its potential and the challenges it faces.
- ROC Stories is a compilation of 100,000 five-sentence stories and 3,742 Story Cloze Test stories, capturing a rich array of causal and temporal commonsense connections between everyday events, making it suitable for story generation tasks.
- CommonGen was developed by combining crowdsourced and existing caption corpora, containing 79k commonsense descriptions across 35k distinct concept-sets.
- CMU Movie Summary Corpus is a dataset of movie plot summaries and related metadata.
- Scifi TV Show Plot Summaries & Events is a collection of plot synopses for long-running (80+ episodes) science fiction TV shows, sourced from Fandom.com wikis.