Agentic Web: Weaving the Next Web with AI Agents

Yingxuan Yang¹ Mulei Ma² Yuxuan Huang³ Huacan Chai¹ Chenyu Gong² Haoran Geng⁴ Yuanjian Zhou⁵ Ying Wen¹ Meng Fang³ Muhao Chen⁶ Shangding Gu⁴* Ming Jin⁷ Costas Spanos⁴ Yang Yang² Pieter Abbeel⁴ Dawn Song⁴ Weinan Zhang¹,⁵* Jun Wang⁸*

¹Shanghai Jiao Tong University ²The Hong Kong University of Science and Technology, Guangzhou ³University of Liverpool ⁴University of California, Berkeley ⁵Shanghai Innovation Institute ⁶University of California, Davis ⁷Virginia Tech ⁸University College London

* Corresponding authors (Project Lead).

This repository collects research on the Agentic Web. If any authors do not want their paper listed here, please contact [email protected]. (This repository is under active development. We appreciate any constructive comments and suggestions.)

You are more than welcome to update this list! If you find a paper about the agentic web that is not listed here, please

fork this repository, add it and merge back;
or report an issue here;
or email [email protected]

Content

Agentic Web Development
Information Retrieval
Recommendation
Agent Planning
Multi-Agent Learning
Safety and Security
Benchmark
Citation

Figure 1: Web Evolution: From Directories to Agents.

Figure 2: Timeline of Web Evolution: Three Major Eras. Note: These eras are not strictly separated. Transitions occurred gradually, and features of one era often coexisted with the next. Technologies and business models frequently overlapped during these transitions.

Agentic Web Development

BetaWeb: Towards a Blockchain-enabled Trustworthy Agentic Web by Guo, Zihan, Yuanjian Zhou, Chenyi Wang, Linlin You, Minjie Bian, and Weinan Zhang. 2025
A Survey of AI Agent Registry Solutions by Aditi Singh, Abul Ehtesham, Ramesh Raskar, Mahesh Lambe, Pradyumna Chari, Jared James Grogan, Abhishek Singh, Saket Kumar. 2025
Using the NANDA Index Architecture in Practice: An Enterprise Perspective by Sichao Wang, Ramesh Raskar, Mahesh Lambe, Pradyumna Chari, Rekha Singhal, Shailja Gupta, Rajesh Ranjan, Ken Huang. 2025
Web3 x AI Agents: Landscape, Integrations, and Foundational Challenges by Yiming Shen, Jiashuo Zhang, Zhenzhe Shao, Wenxuan Luo, Yanlin Wang, Ting Chen, Zibin Zheng, Jiachi Chen. 2025
Plan-and-act: Improving planning of agents for long-horizon tasks by Erdogan, Lutfi Eren, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, and Amir Gholami. 2025
A survey of webagents: Towards next-generation ai agents for web automation with large foundation models by Ning, Liangbo, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei et al. 2025
WebDancer: Towards Autonomous Information Seeking Agency by Wu, Jialong, Baixuan Li, Runnan Fang, Wenbiao Yin, Liwen Zhang, Zhengwei Tao, Dingchu Zhang et al. 2025
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents by Zhang, Weizhi, Yangning Li, Yuanchen Bei, Junyu Luo, Guancheng Wan, Liangwei Yang, Chenxuan Xie et al. 2025
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning by Nguyen, Thang, Peter Chin, and Yu-Wing Tai. 2025
Deep Research Agents: A Systematic Examination And Roadmap by Huang, Yuxuan, Yihang Chen, Haozheng Zhang, Kang Li, Meng Fang, Linyi Yang, Xiaoguang Li et al. 2025
Smartagent: Chain-of-user-thought for embodied personalized agent in cyber world by Zhang, Jiaqi, Chen Gao, Liyuan Zhang, Yong Li, and Hongzhi Yin. 2025
ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation by Wang, Shu, Yixiang Fang, Yingli Zhou, Xilin Liu, and Yuchi Ma. 2025
Macrec: A multi-agent collaboration framework for recommendation by Wang, Zhefan, Yuanqing Yu, Wendi Zheng, Weizhi Ma, and Min Zhang. 2024
Webarena: A realistic web environment for building autonomous agents by Zhou, Shuyan, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng et al. 2023
Toolllm: Facilitating large language models to master 16000+ real-world apis by Qin, Yujia, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin et al. 2023
Api-bank: A comprehensive benchmark for tool-augmented llms by Li, Minghao, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. 2023
React: Synergizing reasoning and acting in language models by Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023
Voyager: An open-ended embodied agent with large language models by Wang, Guanzhi, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023
Toolformer: Language models can teach themselves to use tools by Schick, Timo, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023
Swe-bench: Can language models resolve real-world github issues? by Jimenez, Carlos E., John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023
Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face by Shen, Yongliang, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. 2023
Training language models to follow instructions with human feedback by Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang et al. 2022
Webgpt: Browser-assisted question-answering with human feedback by Nakano, Reiichiro, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse et al. 2021
Deep reinforcement learning for list-wise recommendations by Zhao, Xiangyu, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2019
SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets by Ie, Eugene, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, and Craig Boutilier. 2019

Information Retrieval

Agentic information retrieval by Zhang, Weinan, Junwei Liao, Ning Li, Kounianhua Du, and Jianghao Lin. 2025
Large language models for generative information extraction: A survey by Xu, Derong, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, and Enhong Chen. 2024
Bias and unfairness in information retrieval systems: New challenges in the llm era by Dai, Sunhao, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, and Jun Xu. 2024
Retrieval-augmented code generation for universal information extraction by Guo, Yucan, Zixuan Li, Xiaolong Jin, Yantao Liu, Yutao Zeng, Wenxuan Liu, Xiang Li et al. 2024
Large language models for information retrieval: A survey by Zhu, Yutao, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen. 2023
Inpars-v2: Large language models as efficient dataset generators for information retrieval by Jeronymo, Vitor, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, and Rodrigo Nogueira. 2023
Unified structure generation for universal information extraction by Lu, Yaojie, Qing Liu, Dai Dai, Xinyan Xiao, Hongyu Lin, Xianpei Han, Le Sun, and Hua Wu. 2022
An Adversarial Imitation Click Model for Information Retrieval by Dai, Xinyi, Jianghao Lin, Weinan Zhang, Shuai Li, Weiwen Liu, Ruiming Tang, Xiuqiang He, Jianye Hao, Jun Wang, and Yong Yu. 2021
A survey on deep matrix factorizations by De Handschutter, Pierre, Nicolas Gillis, and Xavier Siebert. 2021
Deepcf: A unified framework of representation learning and matching function learning in recommender system by Deng, Zhi-Hong, Ling Huang, Chang-Dong Wang, Jian-Huang Lai, and Philip S. Yu. 2019
On the Equilibrium of Query Reformulation and Document Retrieval by Zou, Shihao, Guanyu Tao, Jun Wang, Weinan Zhang, and Dell Zhang. 2018
Irgan: A minimax game for unifying generative and discriminative information retrieval models by Wang, Jun, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. 2017
Neural collaborative filtering by He, Xiangnan, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017
DeepFM: a factorization-machine based neural network for CTR prediction by Guo, Huifeng, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017
Wide & deep learning for recommender systems by Cheng, Heng-Tze, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson et al. 2016
Autorec: Autoencoders meet collaborative filtering by Sedhain, Suvash, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015
Top-k Retrieval using Facility Location Analysis by Zuccon, Guido, Leif Azzopardi, Dell Zhang, and Jun Wang. 2012
Mean-variance analysis: A new document ranking theory in information retrieval by Wang, Jun. 2009
Portfolio theory of information retrieval by Wang, Jun, and Jianhan Zhu. 2009
The probabilistic relevance framework: BM25 and beyond by Robertson, Stephen, and Hugo Zaragoza. 2009
Matrix factorization techniques for recommender systems by Koren, Yehuda, Robert Bell, and Chris Volinsky. 2009
Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords by Edelman, Benjamin, Michael Ostrovsky, and Michael Schwarz. 2007
A user-item relevance model for log-based collaborative filtering by Wang, Jun, Arjen P. De Vries, and Marcel JT Reinders. 2006
Item-based collaborative filtering recommendation algorithms by Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001
The PageRank citation ranking: Bringing order to the web by Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999
Indexing by latent semantic analysis by Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990
A statistical interpretation of term specificity and its application in retrieval by Sparck Jones, Karen. 1972

Recommendation

A survey of large language model empowered agents for recommendation and search: Towards next-generation information retrieval by Zhang, Yu, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, and Yong Li. 2025
A survey on llm-powered agents for recommender systems by Peng, Qiyao, Hongtao Liu, Hua Huang, Qing Yang, and Minglai Shao. 2025
AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems by Shang, Yu, Peijie Liu, Yuwei Yan, Zijing Wu, Leheng Sheng, Yuanqing Yu, Chumeng Jiang et al. 2025
Deep reinforcement learning based resource allocation for network slicing with massive MIMO by Yan, Dandan, Benjamin K. Ng, Wei Ke, and Chan-Tong Lam. 2024
Macrec: A multi-agent collaboration framework for recommendation by Wang, Zhefan, Yuanqing Yu, Wendi Zheng, Weizhi Ma, and Min Zhang. 2024
Raserec: Retrieval-augmented sequential recommendation by Zhao, Xinping, Baotian Hu, Yan Zhong, Shouzheng Huang, Zihao Zheng, Meng Wang, Haofen Wang, and Min Zhang. 2024
Probing early modification of gravity with Planck, ACT and SPT by Abellán, Guillermo Franco, Matteo Braglia, Mario Ballardini, Fabio Finelli, and Vivian Poulin. 2023
SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets by Ie, Eugene, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, and Craig Boutilier. 2019
Novelty and diversity metrics for recommender systems: choice, discovery and relevance by Castells, Pablo, Saúl Vargas, and Jun Wang. 2011

Agent Planning

Plangenllms: A modern survey of llm planning capabilities by Wei, Hui, Zihao Zhang, Shenghua He, Tian Xia, Shijia Pan, and Fei Liu. 2025
Plan-and-act: Improving planning of agents for long-horizon tasks by Erdogan, Lutfi Eren, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, and Amir Gholami. 2025
Acpbench: Reasoning about action, change, and planning by Kokel, Harsha, Michael Katz, Kavitha Srinivas, and Shirin Sohrabi. 2025
Natural plan: Benchmarking llms on natural language planning by Zheng, Huaixiu Steven, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou et al. 2024
Adaplanner: Adaptive planning from feedback with language models by Sun, Haotian, Yuchen Zhuang, Lingkai Kong, Bo Dai, and Chao Zhang. 2023
Toolllm: Facilitating large language models to master 16000+ real-world apis by Qin, Yujia, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin et al. 2023
Webarena: A realistic web environment for building autonomous agents by Zhou, Shuyan, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng et al. 2023

Multi-Agent Learning

Realm-bench: A real-world planning benchmark for llms and multi-agent systems by Geng, Longling, and Edward Y. Chang. 2025
Autogen: Enabling next-gen LLM applications via multi-agent conversations by Wu, Qingyun, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang et al. 2024
Agentboard: An analytical evaluation board of multi-turn llm agents by Ma, Chang, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, and Junxian He. 2024
Learning to use tools via cooperative and interactive agents by Shi, Zhengliang, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, and Zhaochun Ren. 2024
Camel: Communicative agents for "mind" exploration of large language model society by Li, Guohao, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023
Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents by Chen, Weize, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan et al. 2023
Metagpt: Meta programming for multi-agent collaborative framework by Hong, Sirui, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang et al. 2023
Taxai: A dynamic economic simulator and benchmark for multi-agent reinforcement learning by Mi, Qirui, Siyu Xia, Yan Song, Haifeng Zhang, Shenghao Zhu, and Jun Wang. 2023
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems by Slumbers, Oliver, David Henry Mguni, Stefano B. Blumberg, Stephen Marcus Mcaleer, Yaodong Yang, and Jun Wang. 2023
Chatdev: Communicative agents for software development by Qian, Chen, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang et al. 2023
MarlRank: Multi-agent Reinforced Learning to Rank by Zou, Shihao, Zhonghua Li, Mohammad Akbari, Jun Wang, and Peng Zhang. 2019
Magent: A many-agent reinforcement learning platform for artificial collective intelligence by Zheng, Lianmin, Jiacheng Yang, Han Cai, Ming Zhou, Weinan Zhang, Jun Wang, and Yong Yu. 2018
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising by Jin, Junqi, Chengru Song, Han Li, Kun Gai, Jun Wang, and Weinan Zhang. 2018

Safety and Security

Securing agentic ai: A comprehensive threat model and mitigation framework for generative ai agents by Narajala, Vineeth Sai, and Om Narayan. 2025
Open challenges in multi-agent security: Towards secure systems of interacting ai agents by Schroeder de Witt, Christian. 2025
Model context protocol (mcp): Landscape, security threats, and future research directions by Hou, Xinyi, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025
Ai agents under threat: A survey of key security challenges and future pathways by Deng, Zehang, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. 2025
Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies by Narajala, Vineeth Sai, and Idan Habler. 2025
Position: AI Safety Must Embrace an Antifragile Perspective by Jin, Ming, and Hyunin Lee. 2025
Red-teaming llm multi-agent systems via communication attacks by He, Pengfei, Yupin Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. 2025
Skin-in-the-game: Decision making via multi-stakeholder alignment in llms by Sel, Bilgehan, Priya Shanmugasundaram, Mohammad Kachuee, Kun Zhou, Ruoxi Jia, and Ming Jin. 2024
Ai safety in generative ai large language models: A survey by Chua, Jaymari, Yun Li, Shiyi Yang, Chen Wang, and Lina Yao. 2024
Agent-safetybench: Evaluating the safety of llm agents by Zhang, Zhexin, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, and Minlie Huang. 2024
Mart: Improving llm safety with multi-round automatic red-teaming by Ge, Suyu, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, and Yuning Mao. 2023
Aart: Ai-assisted red-teaming with diverse data generation for new llm-powered applications by Radharapu, Bhaktipriya, Kevin Robinson, Lora Aroyo, and Preethi Lahoti. 2023
Red teaming language models with language models by Perez, Ethan, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. 2022
Improving alignment of dialogue agents via targeted human judgements by Glaese, Amelia, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh et al. 2022
Adversarial training for high-stakes reliability by Ziegler, Daniel, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis et al. 2022
Analyzing dynamic adversarial training data in the limit by Wallace, Eric, Adina Williams, Robin Jia, and Douwe Kiela. 2021
Dynabench: Rethinking benchmarking in NLP by Kiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen et al. 2021
Beyond accuracy: Behavioral testing of NLP models with CheckList by Ribeiro, Marco Tulio, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. 2020
HateCheck: Functional tests for hate speech detection models by Röttger, Paul, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, and Janet B. Pierrehumbert. 2020
Recipes for safety in open-domain chatbots by Xu, Jing, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, and Emily Dinan. 2020
Counterfactual fairness in text classification through robustness by Garg, Sahaj, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. 2019
Avoiding reasoning shortcuts: Adversarial evaluation, training, and model development for multi-hop QA by Jiang, Yichen, and Mohit Bansal. 2019
Build it break it fix it for dialogue safety: Robustness from adversarial human attack by Dinan, Emily, Samuel Humeau, Bharath Chintagunta, and Jason Weston. 2019
Adversarial NLI: A new benchmark for natural language understanding by Nie, Yixin, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, and Douwe Kiela. 2019
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation by Brundage, Miles, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe et al. 2018
Measuring and mitigating unintended bias in text classification by Dixon, Lucas, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018
Adversarial examples for evaluating reading comprehension systems by Jia, Robin, and Percy Liang. 2017
Concrete problems in AI safety by Amodei, Dario, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016

Benchmark

Safearena: Evaluating the safety of autonomous web agents by Tur, Ada Defne, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, and Siva Reddy. 2025
An Illusion of Progress? Assessing the Current State of Web Agents by Xue, Tianci, Weijian Qi, Tianneng Shi, Chan Hee Song, Boyu Gou, Dawn Song, Huan Sun, and Yu Su. 2025
Workarena: How capable are web agents at solving common knowledge work tasks? by Drouin, Alexandre, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom Marty, Léo Boisvert et al. 2024
The browsergym ecosystem for web agent research by Le Sellier De Chezelles, Thibault, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong et al. 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? by Yoran, Ori, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, and Jonathan Berant. 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks by Koh, Jing Yu, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, and Daniel Fried. 2024
St-webagentbench: A benchmark for evaluating safety and trustworthiness in web agents by Levy, Ido, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, and Segev Shlomov. 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents by Yuan, Tongxin, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu et al. 2024
Webcanvas: Benchmarking web agents in online environments by Pan, Yichen, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu et al. 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models by He, Hongliang, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, and Dong Yu. 2024
Mind2web: Towards a generalist agent for the web by Deng, Xiang, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023
Webshop: Towards scalable real-world web interaction with grounded language agents by Yao, Shunyu, Howard Chen, John Yang, and Karthik Narasimhan. 2022

Citation

If you find the repository useful, please cite the study:

@article{yang2025agentic,
  title={Agentic Web: Weaving the Next Web with AI Agents},
  author={Yang, Yingxuan and Ma, Mulei and Huang, Yuxuan and Chai, Huacan and Gong, Chenyu and Geng, Haoran and Zhou, Yuanjian and Wen, Ying and Fang, Meng and Chen, Muhao and others},
  journal={arXiv preprint arXiv:2507.21206},
  year={2025}
}
