Thanks for a great survey and for setting up this repository!
I understand that the papers listed here come from the survey by Zhiheng Xi et al., but if you are open to adding newer papers to the list, I would like to suggest AppWorld.
- Website: https://appworld.dev/
- Paper: https://arxiv.org/abs/2407.18901
- Tweet: https://x.com/harsh3vedi/status/1818311843976233198
- Blog: https://appworld.dev/blog
- Video(s): https://appworld.dev/video
- Code: https://github.com/stonybrooknlp/appworld
- Data (task, trajectories) explorer, playground: https://appworld.dev/task-explorer
- API explorer: https://appworld.dev/api-explorer
- Leaderboard: https://appworld.dev/leaderboard
TLDR: Introduces AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of 106 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging autonomous agent tasks requiring rich and interactive coding.
In my opinion, AppWorld fits in the following (sub)sections:
- 4.1 Benchmarks for LLM-based Agents
- 1.3.1 Tool Using
- 3.2.1 Text-based Environment
- 2.1.1 Task-oriented Deployment (Web scenarios)
- 3.2.2 Virtual Sandbox Environment
- 1.1.5 Transferability and Generalization