234 changes: 180 additions & 54 deletions docs/mkdocs/docs/all_about_agents.md
@@ -7,104 +7,230 @@ Welcome to our comprehensive resource collection for AI agents. This page curate
## Table of Contents

!!! abstract "Resource Categories"
1. [Agent Frameworks](#agent-frameworks)
2. [Agent Memory](#agent-memory)
3. [Papers](#papers)
4. [Evaluation](#evaluation)
1. [Agent Papers](#agent-papers)
2. [Agent Frameworks](#agent-frameworks)
3. [Evaluation](#evaluation)
4. [Agent Memory](#agent-memory)
5. [Blogs](#blogs)

---

## Agent Frameworks

!!! info "Popular Agent Development Frameworks"
Comprehensive frameworks for building and deploying AI agents across different domains.

- **MiroFlow**: Build, manage, and scale your AI agents with ease
- [:material-github: GitHub](https://github.com/MiroMindAI/MiroFlow)

- **Youtu-Agent**: A simple yet powerful agent framework that delivers with open-source models
- [:material-github: GitHub](https://github.com/TencentCloudADP/youtu-agent)

- **OpenManus**: No fortress, purely open ground. OpenManus is Coming
- [:material-github: GitHub](https://github.com/FoundationAgents/OpenManus)

- **OpenBB Platform**: Financial data platform for analysts, quants and AI agents
- [:material-link: Project](https://github.com/OpenBB-finance/OpenBB)

---

## Agent Memory

!!! tip "Memory Systems for Persistent Agent Intelligence"
Advanced memory solutions for building agents with long-term context and learning capabilities.

- **Mem0**: Building Production-Ready AI Agents with Scalable Long-Term Memory
- [:material-github: GitHub](https://github.com/mem0ai/mem0)

- **memobase**: Profile-Based Long-Term Memory for AI Applications
- [:material-github: GitHub](https://github.com/memodb-io/memobase)

- **Memento**: Fine-tuning LLM Agents without Fine-tuning LLMs
- [:material-file-document: Paper](https://arxiv.org/abs/2508.16153) · [:material-github: GitHub](https://github.com/Agent-on-the-Fly/Memento)

---


## Papers
### Agent Papers

!!! note "Research Papers & Publications"
Latest research in agent systems, methodologies, and theoretical foundations.

- **WebThinker**: Empowering Large Reasoning Models with Deep Research Capability
- [:material-file-document: Paper](https://arxiv.org/abs/2504.21776) · [:material-github: GitHub](https://github.com/sunnynexus/WebThinker)
- **Profile-Aware Maneuvering**: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld
- [:material-file-document: Paper](https://arxiv.org/abs/2508.09889)

- **AFlow**: Automating Agentic Workflow Generation
- [:material-file-document: Paper](https://arxiv.org/abs/2410.10762)

- **AgentFly**: Fine-tuning LLM Agents without Fine-tuning LLMs
- [:material-file-document: Paper](https://arxiv.org/abs/2508.16153v2)

- **Throttling Web Agents Using Reasoning Gates**
- [:material-file-document: Paper](https://arxiv.org/abs/2509.01619)

- **The Landscape of Agentic Reinforcement Learning for LLMs**: A Survey
- [:material-file-document: Paper](https://arxiv.org/abs/2509.02547)
- **BrowseMaster**: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair
- [:material-file-document: Paper](https://arxiv.org/abs/2508.09129) · [:material-github: GitHub](https://github.com/sjtu-sai-agents/Browse-Master)
- **Long Term Memory**: The Foundation of AI Self-Evolution
- [:material-file-document: Paper](https://arxiv.org/abs/2410.15665)
- **DeepResearcher**: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- [:material-file-document: Paper](https://arxiv.org/abs/2504.03160) · [:material-github: GitHub](https://github.com/GAIR-NLP/DeepResearcher)
- **Web-Shepherd**: Advancing PRMs for Reinforcing Web Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2505.15277) · [:material-github: GitHub](https://github.com/kyle8581/Web-Shepherd)
- **SimpleDeepSearcher**: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
- [:material-file-document: Paper](https://arxiv.org/abs/2505.16834) · [:material-github: GitHub](https://github.com/RUCAIBox/SimpleDeepSearcher)
- **Alita**: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
- [:material-file-document: Paper](https://arxiv.org/abs/2505.20286) · [:material-github: GitHub](https://github.com/CharlesQ9/Alita)
- **MCP-Zero**: Active Tool Discovery for Autonomous LLM Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2506.01056) · [:material-github: GitHub](https://github.com/xfey/MCP-Zero)
- **AgentOrchestra**: Orchestrating Hierarchical Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol
- [:material-file-document: Paper](https://arxiv.org/abs/2506.12508) · [:material-github: GitHub](https://github.com/SkyworkAI/DeepResearchAgent)
- **Deep Research Agents**: A Systematic Examination And Roadmap
- [:material-file-document: Paper](https://arxiv.org/abs/2506.18096) · [:material-github: GitHub](https://github.com/ai-agents-2030/awesome-deep-research-agent)
- **SciMaster**: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
- [:material-file-document: Paper](https://arxiv.org/abs/2507.05241) · [:material-github: GitHub](https://github.com/sjtu-sai-agents/X-Master)
- **Deep Researcher with Test-Time Diffusion**: Enhancing research capabilities through diffusion-based test-time adaptation
- [:material-file-document: Paper](https://arxiv.org/abs/2507.16075)
- **Multi-Agent Tool-Integrated Policy Optimization**: Enhancing multi-agent systems through integrated tool usage and policy optimization
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04678)
- **WALT**: Web Agents that Learn Tools
- [:material-file-document: Paper](https://arxiv.org/abs/2510.01524)
- **Learning to Route**: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation
- [:material-file-document: Paper](https://arxiv.org/abs/2510.02388)
- **SurveyBench**: Can LLM(-Agents) Write Academic Surveys that Align with Reader Needs?
- [:material-file-document: Paper](https://arxiv.org/abs/2510.03120)
- **FocusAgent**: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
- [:material-file-document: Paper](https://arxiv.org/pdf/2510.03204)
- **LLM-Based Data Science Agents**: A Survey of Capabilities, Challenges, and Future Directions
- [:material-file-document: Paper](https://arxiv.org/pdf/2510.04023)
- **Agentic Context Engineering**: Evolving Contexts for Self-Improving Language Models
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04618)
- **MARS**: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04935)
- **QuantAgents**: Towards Multi-agent Financial System via Simulated Trading
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04643) · [:material-link: Project](https://quantagents.github.io/)
- **Small Language Models for Agentic Systems**: A Survey of Architectures, Capabilities, and Deployment Trade-offs
- [:material-file-document: Paper](https://arxiv.org/abs/2510.03847)
- **Open Agent Specification (Agent Spec)**: Technical Report
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04173v1)
- **AudioToolAgent**: An Agentic Framework for Audio-Language Models
- [:material-file-document: Paper](https://arxiv.org/abs/2510.02995)
- **ThinkBrake**: Mitigating Overthinking in Tool Reasoning
- [:material-file-document: Paper](https://arxiv.org/abs/2510.00546)
- **TOUCAN**: Synthesizing 1.5M Tool-Agentic Trajectories from Real Environments
- [:material-file-document: Paper](https://arxiv.org/abs/2510.01179)
- **ToolTweak**: An Attack on Tool Selection in LLM-Based Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2510.02554)
- **ToolBrain**: A Flexible RL Framework for Agentic Tools
- [:material-file-document: Paper](https://arxiv.org/abs/2510.00023)
- **TUMIX**: Multi-Agent Test-Time Scaling with Tool-Use Mixture
- [:material-file-document: Paper](https://arxiv.org/abs/2510.01279)


---

### Agent Frameworks

!!! info "Popular Agent Development Frameworks"
Comprehensive frameworks for building and deploying AI agents across different domains.

- **MiroFlow**: Build, manage, and scale your AI agents with ease
- [:material-github: GitHub](https://github.com/MiroMindAI/MiroFlow)
- **Youtu-Agent**: A simple yet powerful agent framework that delivers with open-source models
- [:material-github: GitHub](https://github.com/TencentCloudADP/youtu-agent)
- **OpenManus**: No fortress, purely open ground. OpenManus is Coming
- [:material-github: GitHub](https://github.com/FoundationAgents/OpenManus)
- **OpenBB Platform**: Financial data platform for analysts, quants and AI agents
- [:material-link: Project](https://github.com/OpenBB-finance/OpenBB)
- **TradingAgents**: Multi-Agents LLM Financial Trading Framework
- [:material-file-document: Paper](https://arxiv.org/abs/2412.20138) · [:material-github: GitHub](https://github.com/TauricResearch/TradingAgents)
- **JoyAgent-JDGenie**: Technical Report on the GAIA
- [:material-file-document: Paper](https://arxiv.org/abs/2510.00510) · [:material-github: GitHub](https://github.com/jd-opensource/joyagent-jdgenie)



## Evaluation
---

### Evaluation

!!! success "Benchmarks & Evaluation Frameworks"
Comprehensive evaluation tools and benchmarks for measuring agent performance across various tasks.

- **LiveMCP-101**: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
- [:material-file-document: Paper](https://arxiv.org/abs/2508.15760)

- **BrowseComp-Plus**: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
- [:material-file-document: Paper](https://arxiv.org/abs/2508.06600)

- **HotpotQA**: A Dataset for Diverse, Explainable Multi-hop Question Answering
- [:material-file-document: Paper](https://arxiv.org/abs/1809.09600)

- **GAIA**: a benchmark for General AI Assistants
- [:material-file-document: Paper](https://arxiv.org/abs/2311.12983) · [:material-trophy: Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)

- **xbench**: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
- [:material-file-document: Paper](https://arxiv.org/abs/2506.13651)

- **MCP-Universe**: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
- [:material-file-document: Paper](https://arxiv.org/abs/2508.14704)

- **FutureX**: An Advanced Live Benchmark for LLM Agents in Future Prediction
- [:material-file-document: Paper](https://arxiv.org/abs/2508.11987)

- **Terminal-Bench**: the benchmark for testing AI agents in real terminal environments
- [:material-github: GitHub](https://github.com/laude-institute/terminal-bench)

- **Gaia2 and ARE**: Empowering the Community to Evaluate Agents
- [:material-file-document: Blog Post](https://huggingface.co/blog/gaia2)
- **GPQA**: A Graduate-Level Google-Proof Q&A Benchmark
- [:material-file-document: Paper](https://arxiv.org/abs/2311.12022) · [:material-github: GitHub](https://github.com/idavidrein/gpqa/)
- **WebWalkerQA**: WebWalker: Benchmarking LLMs in Web Traversal
- [:material-file-document: Paper](https://arxiv.org/abs/2501.07572) · [:material-github: GitHub](https://github.com/Alibaba-NLP/DeepResearch) · [:material-trophy: Leaderboard](https://huggingface.co/spaces/callanwu/WebWalkerQALeaderboard)
- **HLE**: Humanity's Last Exam
- [:material-file-document: Paper](https://arxiv.org/abs/2501.14249) · [:material-link: Website](https://lastexam.ai/)
- **BFCL**: Berkeley Function Calling Leaderboard
- [:material-github: GitHub](https://github.com/ShishirPatil/gorilla) · [:material-trophy: Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html)
- **When2Call**: When (not) to Call Tools
- [:material-file-document: Paper](https://arxiv.org/abs/2504.18851) · [:material-github: GitHub](https://github.com/NVIDIA/When2Call)
- **ToolSandbox**: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
- [:material-file-document: Paper](https://arxiv.org/abs/2408.04682) · [:material-github: GitHub](https://github.com/apple/ToolSandbox)
- **ToolBeHonest**: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models
- [:material-file-document: Paper](https://arxiv.org/abs/2406.20015) · [:material-github: GitHub](https://github.com/ToolBeHonest/ToolBeHonest)
- **SuperGPQA**: Scaling LLM Evaluation across 285 Graduate Disciplines
- [:material-file-document: Paper](https://arxiv.org/abs/2502.14739) · [:material-link: Website](https://supergpqa.github.io/)
- **Terminal-Bench**: A benchmark for testing AI agents in terminal environments
- [:material-trophy: Leaderboard](https://www.tbench.ai/leaderboard) · [:material-link: Website](https://www.tbench.ai/)
- **τ-bench**: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- [:material-file-document: Paper](https://arxiv.org/abs/2406.12045) · [:material-github: GitHub](https://github.com/sierra-research/tau-bench)
- **τ2-Bench**: Evaluating Conversational Agents in a Dual-Control Environment
- [:material-file-document: Paper](https://arxiv.org/abs/2506.07982) · [:material-github: GitHub](https://github.com/sierra-research/tau2-bench)
- **Deep Research Bench**: Evaluating AI Web Research Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2506.06287) · [:material-link: Website](https://drb.futuresearch.ai/)
- **Beyond the Final Answer**: Evaluating the Reasoning Trajectories of Tool-Augmented Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2510.02837)
- **TRAJECT-Bench**: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04550)
- **ARC-AGI**: The General Intelligence Benchmark
- [:material-link: Website](https://arcprize.org/arc-agi)


---

### Agent Memory

!!! tip "Memory Systems for Persistent Agent Intelligence"
Advanced memory solutions for building agents with long-term context and learning capabilities.

- **Mem0**: Building Production-Ready AI Agents with Scalable Long-Term Memory
- [:material-github: GitHub](https://github.com/mem0ai/mem0)
- **memobase**: Profile-Based Long-Term Memory for AI Applications
- [:material-github: GitHub](https://github.com/memodb-io/memobase)
- **Memento**: Fine-tuning LLM Agents without Fine-tuning LLMs
- [:material-file-document: Paper](https://arxiv.org/abs/2508.16153) · [:material-github: GitHub](https://github.com/Agent-on-the-Fly/Memento)
- **MEMTRACK**: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
- [:material-file-document: Paper](https://arxiv.org/abs/2510.01353)
- **A-MEM**: Agentic Memory for LLM Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2502.12110) · [:material-github: GitHub](https://github.com/WujiangXu/A-mem)
- **MemoryOS**: Memory OS of AI Agent
- [:material-file-document: Paper](https://arxiv.org/abs/2506.06326) · [:material-github: GitHub](https://github.com/BAI-LAB/MemoryOS)
- **Memory-R1**: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
- [:material-file-document: Paper](https://arxiv.org/abs/2508.19828)
- **HippoRAG**: Neurobiologically Inspired Long-Term Memory for Large Language Models
- [:material-file-document: Paper](https://arxiv.org/abs/2405.14831) · [:material-github: GitHub](https://github.com/OSU-NLP-Group/HippoRAG)
- **MaxKB**: Open-source platform for building enterprise-grade agents
- [:material-github: GitHub](https://github.com/1Panel-dev/MaxKB)
- **MemAgent**: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
- [:material-file-document: Paper](https://arxiv.org/abs/2507.02259) · [:material-link: Website](https://memagent-sialab.github.io/)
- **LEGOMem**: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation
- [:material-file-document: Paper](https://arxiv.org/abs/2510.04851)
- **Memp**: Exploring Agent Procedural Memory
- [:material-file-document: Paper](https://www.arxiv.org/abs/2508.06433)
- **MIRIX**: Multi-Agent Memory System for LLM-Based Agents
- [:material-file-document: Paper](https://arxiv.org/abs/2507.07957) · [:material-link: Website](https://mirix.io/)
- **A-MemGuard**: A Proactive Defense Framework for LLM-Based Agent Memory
- [:material-file-document: Paper](https://www.arxiv.org/abs/2510.02373)


---

## Blogs

!!! info "Blog Posts & Tutorials"
Curated collection of blog posts, tutorials, and articles about AI agents from various sources and languages.

#### General Blogs

- **ChatGPT Agent**: Introducing ChatGPT Agent
- [:material-link: Blog Post](https://openai.com/index/introducing-chatgpt-agent/) · OpenAI's latest agent capabilities and features

- **Tongyi DeepResearch**: Deep Research Agent for Complex Tasks
- [:material-link: Blog Post](https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/) · Alibaba's advanced research agent system

#### Chinese Blogs

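!!! quote "Chinese Blogs and Resources"
A curated selection of Chinese-language blog posts, tutorials, and resources on AI agents, to help Chinese-speaking readers better understand and apply agent technology.

- **A Quick Comparison of 17 Mainstream Agent Frameworks**
- [:material-link: Blog Post](https://zhuanlan.zhihu.com/p/1957319210951746186) · Zhihu column article comparing and analyzing mainstream agent frameworks

- **Tongyi DeepResearch**
- [:material-link: Blog Post](https://tongyi-agent.github.io/zh/blog/introducing-tongyi-deep-research/) · An introduction to Alibaba's Tongyi deep research agent system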

---

12 changes: 6 additions & 6 deletions docs/mkdocs/docs/openai-gpt.md
@@ -6,14 +6,14 @@ OpenAI's latest models including GPT-5, GPT-4o and advanced reasoning models wit

`GPT5OpenAIClient`

## Environment Setup
### Environment Setup

```bash title="Environment Variables"
export OPENAI_API_KEY="your-openai-key"
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
```
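
As a quick pre-flight check, the two variables above can be validated before an agent is launched. The following is a minimal sketch and not part of the project: the helper name `load_openai_env` and the keys of the returned dictionary are hypothetical, and only the variable names and the default URL are taken from the snippet above.

```python
import os
from typing import Mapping, Optional

# Default endpoint, as shown in the environment snippet above.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def load_openai_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the OpenAI settings found in ``env`` (defaults to os.environ).

    Hypothetical helper, for illustration only.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # The key is required, so fail early with a clear message.
        raise RuntimeError("OPENAI_API_KEY is not set")
    # The base URL is optional and falls back to the public endpoint.
    base_url = env.get("OPENAI_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests.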

## Configuration
### Configuration

```yaml title="Agent Configuration"
main_agent:
@@ -31,7 +31,7 @@ main_agent:
openai_base_url: "${oc.env:OPENAI_BASE_URL,https://api.openai.com/v1}"
```
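
The `${oc.env:NAME,default}` form used for `openai_base_url` is OmegaConf's environment resolver: it yields the value of the environment variable when it is set and the text after the comma otherwise. The sketch below imitates that lookup with the standard library only, to make the fallback behaviour concrete. It is not OmegaConf's implementation, and `resolve_env` is a hypothetical name; raising when neither a value nor a default exists is an assumption modelled on how the resolver is generally described.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${oc.env:NAME} or ${oc.env:NAME,default}.
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace oc.env-style placeholders in ``value`` (illustrative only)."""
    env = os.environ if env is None else env

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]  # the environment variable wins when present
        if default is not None:
            return default  # otherwise use the text after the comma
        raise KeyError(f"environment variable {name!r} is not set and has no default")

    return _ENV_PATTERN.sub(_substitute, value)
```

With this stand-in, the template from the configuration above resolves to the public endpoint when `OPENAI_BASE_URL` is unset and to the override when it is exported.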

## Usage
### Usage

```bash title="Example Command"
# Create custom OpenAI config
@@ -43,14 +43,14 @@ uv run main.py trace --config_file_name=your_config_file \

`GPTOpenAIClient`

## Environment Setup
### Environment Setup

```bash title="Environment Variables"
export OPENAI_API_KEY="your-openai-key"
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
```

## Configuration
### Configuration

```yaml title="Agent Configuration"
main_agent:
@@ -61,7 +61,7 @@ main_agent:
openai_base_url: "${oc.env:OPENAI_BASE_URL,https://api.openai.com/v1}"
```

## Usage
### Usage

```bash title="Example Command"
# Create custom OpenAI config
2 changes: 1 addition & 1 deletion src/llm/providers/claude_openrouter_client.py
@@ -411,4 +411,4 @@ def _apply_cache_control(self, messages):
else:
# Other messages add directly
cached_messages.append(turn)
return list(reversed(cached_messages))
return list(reversed(cached_messages))
2 changes: 1 addition & 1 deletion src/llm/providers/gpt5_openai_client.py
@@ -132,7 +132,7 @@ async def _create_message(
extra_body["min_p"] = self.min_p
if self.repetition_penalty != 1.0:
extra_body["repetition_penalty"] = self.repetition_penalty

assert self.model_name in ["gpt-5-2025-08-07", "gpt-5"]
params = {
"model": self.model_name,