# Agent Development Kit (ADK)
## High-Level Summary
The Agent Development Kit (ADK) is an open-source, code-first Python toolkit designed for developers building, evaluating, and deploying sophisticated AI agents, with a strong focus on integration with Google Cloud services and Gemini models. It emphasizes flexibility and fine-grained control over agent behavior, orchestration, and tool usage directly within Python code.
**Key Features:**
* **Rich Tool Ecosystem:** Supports built-in tools (Google Search, Code Execution, Vertex AI Search), custom Python functions, OpenAPI spec integration, third-party libraries (LangChain, CrewAI), Google Cloud integrations (API Hub, Application Integration, MCP Toolbox for DBs), MCP standard tools, and using other agents as tools. Includes robust authentication handling.
* **Code-First Development:** Define agent logic, workflows, and state management directly in Python, enabling testability, versioning, and debugging.
* **Flexible Orchestration:** Build multi-agent systems using predefined workflow agents (`SequentialAgent`, `ParallelAgent`, `LoopAgent`) for structured processes or leverage `LlmAgent` for dynamic, LLM-driven routing and decision-making. Custom agents (`BaseAgent`) allow for arbitrary logic.
* **Context & State Management:** Provides mechanisms for managing conversational context (`Session`), short-term state (`State` with session/user/app/temp scopes), long-term memory (`MemoryService`), and binary data (`ArtifactService`).
* **Callbacks for Control:** Offers hooks (`before/after_agent`, `before/after_model`, `before/after_tool`) to observe, customize, or intercept agent execution flow for logging, validation, guardrails, caching, and more.
* **Deployment Ready:** Facilitates deployment to various environments, including local testing, Google Cloud Run, and the scalable Vertex AI Agent Engine.
* **Evaluation Framework:** Includes tools and patterns for evaluating agent performance based on trajectory (tool usage) and final response quality against predefined test cases.
* **Responsible AI:** Provides guidance and mechanisms (guardrails, callbacks, identity management) for building safer and more secure agents.
The documentation covers getting started guides (installation, quickstarts, tutorial), core concepts (agents, tools, sessions, context, runtime, events), advanced topics (multi-agent systems, callbacks, custom agents, memory, artifacts, authentication), deployment strategies, evaluation methods, and responsible AI practices. Code examples and snippets illustrate key functionalities.
### **Table of Contents**
1. **Introduction to Agent Development Kit (ADK)**
* What is ADK?
* Core Concepts
* Key Capabilities
2. **Getting Started**
* Installation
* Quickstart (Basic Agent)
* Quickstart (Streaming)
3. **Agent Fundamentals**
* Types of Agents
* LLM Agents (LlmAgent, Agent)
* Workflow Agents (SequentialAgent, ParallelAgent, LoopAgent)
* Custom Agents
* Multi-Agent Systems
* Agent Hierarchy
* Workflow Agents as Orchestrators
* Interaction & Communication Mechanisms
* Common Multi-Agent Patterns
4. **Tools and Capabilities**
* What is a Tool?
* How Agents Use Tools
* Tool Types in ADK
* Function Tools
* Function Tool
* Long Running Function Tool
* Agent-as-a-Tool
* Built-in Tools
* Google Search
* Code Execution
* Vertex AI Search
* Third-Party Tools
* LangChain Tools
* CrewAI Tools
* Google Cloud Tools
* Apigee API Hub Tools
* Application Integration Tools
* Toolbox Tools for Databases
* OpenAPI Integration
* Model Context Protocol (MCP) Tools
* Tool Context
* Defining Effective Tool Functions
* Toolsets
* Authentication with Tools
5. **Conversational Context and Runtime**
* Session, State, and Memory
* Session
* State
* Memory
* Events
* Context
* Runtime
* Runtime Configuration (RunConfig)
6. **Deployment**
* Deployment Options
* Agent Engine in Vertex AI
* Cloud Run
* GKE
7. **Evaluation and Safety**
* Why Evaluate Agents?
* Evaluation Approaches
* Evaluation Criteria
* Running Evaluation
* Safety & Security for AI Agents
* Best Practices
8. **Community Resources and Contribution**
* Community Resources
* Contributing Guide
---
### **1. Introduction to Agent Development Kit (ADK)**
The Agent Development Kit (ADK) is a flexible and modular framework developed by Google for building, managing, evaluating, and deploying AI-powered agents. It is designed to simplify the development of AI agents by providing a robust and flexible environment that makes the process feel more like traditional software development. ADK supports both conversational and non-conversational agents, capable of handling tasks ranging from simple to complex workflows. It is model-agnostic, deployment-agnostic, and built for compatibility with other frameworks.
#### **Core Concepts**
ADK is built around several key primitives that make it powerful and flexible. These include:
* **Agent**: The fundamental worker unit designed for specific tasks. Agents can use language models (LlmAgent) for complex reasoning or act as deterministic controllers for execution, known as "workflow agents" (SequentialAgent, ParallelAgent, LoopAgent).
* **Tool**: Equips agents with capabilities beyond conversation, allowing them to interact with external APIs, search for information, run code, or call other services.
* **Callbacks**: Custom code snippets that run at specific points in an agent's process, enabling checks, logging, or behavior modifications.
* **Session Management (Session & State)**: Handles the context of a single conversation (Session), including its history (Events) and the agent's working memory for that conversation (State).
* **Memory**: Enables agents to recall information about a user across multiple sessions, providing long-term context distinct from short-term session State.
* **Artifact Management (Artifact)**: Allows agents to save, load, and manage files or binary data (such as images and PDFs) associated with a session or user.
* **Code Execution**: The ability for agents, usually via Tools, to generate and execute code for complex calculations or actions.
* **Planning**: An advanced capability where agents can break down complex goals into smaller steps and plan how to achieve them, similar to a ReAct planner.
* **Models**: The underlying Large Language Model (LLM) that powers LlmAgents, enabling their reasoning and language understanding abilities.
* **Event**: The basic unit of communication representing occurrences during a session (user message, agent reply, tool use), forming the conversation history.
* **Runner**: The engine that manages the execution flow, orchestrates agent interactions based on Events, and coordinates with backend services.
#### **Key Capabilities**
ADK offers several advantages for developing agentic applications. These capabilities include:
* **Multi-Agent System Design**: Facilitates building applications composed of multiple, specialized agents arranged hierarchically, allowing them to coordinate complex tasks and delegate sub-tasks using LLM-driven transfer or explicit AgentTool invocation.
* **Rich Tool Ecosystem**: Agents can be equipped with diverse capabilities, integrating custom functions (FunctionTool), using other agents as tools (AgentTool), leveraging built-in functionalities like code execution, and interacting with external data sources and APIs (e.g., Search, Databases). Support for long-running tools is also available.
* **Flexible Orchestration**: Complex agent workflows can be defined using built-in workflow agents (SequentialAgent, ParallelAgent, LoopAgent) alongside LLM-driven dynamic routing, enabling both predictable pipelines and adaptive agent behavior.
* **Integrated Developer Tooling**: Provides tools like a command-line interface (CLI) and a Developer UI for local development, allowing users to run agents, inspect execution steps, debug interactions, and visualize agent definitions.
* **Native Streaming Support**: Supports real-time, interactive experiences with bidirectional streaming (text and audio), integrating seamlessly with the Gemini Live API for both Google AI Studio and Vertex AI.
* **Built-in Agent Evaluation**: Tools are included for systematically assessing agent performance, allowing the creation of multi-turn evaluation datasets and running evaluations locally to measure quality and guide improvements.
* **Broad LLM Support**: Optimized for Google's Gemini models, the framework is also designed for flexibility, enabling integration with various LLMs, including open-source or fine-tuned models, through its BaseLlm interface.
* **Artifact Management**: Provides mechanisms (ArtifactService, context methods) for agents to save, load, and manage versioned artifacts such as images, documents, or generated reports during execution.
* **Extensibility and Interoperability**: Promotes an open ecosystem, allowing developers to integrate and reuse tools from other popular agent frameworks, including LangChain and CrewAI.
* **State and Memory Management**: Automatically handles short-term conversational memory (State within a Session) managed by the SessionService, and provides integration points for longer-term Memory services for cross-session recall.
### **2. Getting Started**
#### **Installation**
ADK is available for both Python and Java. For Python, it's recommended to create and activate a virtual environment using venv before installing the google-adk package via pip. For Java, google-adk and google-adk-dev packages can be added via Maven or Gradle dependencies in the pom.xml or build.gradle file, respectively.
#### **Quickstart (Basic Agent)**
The quickstart guides users through creating a basic agent with multiple tools and running it locally. For Python, this involves creating a `parent_folder/multi_tool_agent` directory structure with `__init__.py`, `agent.py`, and a `.env` file. For Java, the structure typically includes `project_folder/src/main/java/agents/multitool/MultiToolAgent.java` and a `pom.xml` or `build.gradle`.
The `agent.py` or `MultiToolAgent.java` defines the agent, including its name, model (e.g., `gemini-2.0-flash`), description, instruction, and a list of tools (e.g., `get_weather`, `get_current_time`). The `.env` file or environment variables are used to set up the LLM, either with a Google AI Studio API key (`GOOGLE_API_KEY`, `GOOGLE_GENAI_USE_VERTEXAI=FALSE`) or Google Cloud Vertex AI project details (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, `GOOGLE_GENAI_USE_VERTEXAI=TRUE`).
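The two configurations described above can be sketched as `.env` contents. The variable names and `TRUE`/`FALSE` values come from the text; the placeholder key, project ID, and the `us-central1` location are illustrative values, not defaults:

```shell
# .env — option 1: Google AI Studio (key-based)
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-api-key-here

# .env — option 2: Google Cloud Vertex AI (project-based); comment out option 1
# GOOGLE_GENAI_USE_VERTEXAI=TRUE
# GOOGLE_CLOUD_PROJECT=your-project-id
# GOOGLE_CLOUD_LOCATION=us-central1
```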
Agents can be run locally using the `adk web` command for a browser-based Dev UI, `adk run` for terminal interaction, or `adk api_server` for a local FastAPI server. The Dev UI allows selecting the agent, chatting, and inspecting function calls, responses, and model responses.
#### **Quickstart (Streaming)**
ADK supports real-time, interactive experiences through streaming, including low-latency bidirectional voice and video communication. Supported models for voice/video streaming must support the Gemini Live API, with specific model IDs available in the Google AI Studio and Vertex AI documentation.
For Python, the streaming quickstart involves a similar project structure to the basic quickstart, with an app folder containing `.env`, `main.py`, static files (for the web client), and the `google_search_agent` folder. The `agent.py` defines the agent using a streaming-compatible model (e.g., `gemini-2.0-flash-exp`) and tools like Google Search. The setup for the LLM platform (Google AI Studio or Google Cloud Vertex AI) is similar to the basic quickstart, requiring API keys or project configurations.
Running the streaming agent locally involves navigating to the app directory, setting `SSL_CERT_FILE` (required for `wss://` connections), and running `adk web`. Users can then interact with the agent via text, voice, or video in the Dev UI.
For Java, the streaming quickstart involves setting up a Maven project, adding `google-adk` and `google-adk-dev` dependencies in `pom.xml`, and creating a `ScienceTeacherAgent.java`. The agent must be initialized as a `public static final BaseAgent ROOT_AGENT`. Environment variables for the Gemini key and `GOOGLE_GENAI_USE_VERTEXAI` are also required. The Dev UI can be launched using `mvn exec:java`. For custom live audio applications, a specific `pom.xml` and `LiveAudioRun.java` are provided, demonstrating microphone input and speaker output for real-time voice conversations.
### **3. Agent Fundamentals**
#### **Types of Agents**
In the Agent Development Kit (ADK), an **Agent** is a self-contained execution unit designed to act autonomously to achieve specific goals. Agents can perform tasks, interact with users, utilize external tools, and coordinate with other agents. The foundation for all agents is the BaseAgent class, which is extended in three main ways:
* **LLM Agents (LlmAgent, Agent)**: These agents use Large Language Models (LLMs) as their core engine to understand natural language, reason, plan, generate responses, and dynamically decide how to proceed or which tools to use. They are ideal for flexible, language-centric tasks and their behavior is non-deterministic. Key parameters include name, description, model, instruction, and tools. Advanced configurations include `generate_content_config` for fine-tuning LLM generation, `input_schema`, `output_schema`, and `output_key` for structuring data, and `include_contents` for managing context.
* **Workflow Agents (SequentialAgent, ParallelAgent, LoopAgent)**: These specialized agents control the execution flow of other agents in predefined, deterministic patterns (sequence, parallel, or loop) without using an LLM for flow control. They are perfect for structured processes needing predictable execution.
* SequentialAgent executes sub-agents one after another in the order they are listed, passing the same InvocationContext sequentially.
* ParallelAgent executes its sub-agents concurrently, facilitating tasks like multi-source data retrieval or heavy computations. They operate in independent branches, but still access the same shared session.state.
* LoopAgent repeatedly runs a sequence of agents for a specified number of iterations or until a termination condition is met. Termination can be set by `max_iterations` or by a sub-agent signaling escalation.
* **Custom Agents**: Created by extending BaseAgent directly, these agents allow for implementing unique operational logic, specific control flows, or specialized integrations not covered by standard types. They provide ultimate flexibility by defining arbitrary orchestration logic.
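The deterministic flow control described above can be illustrated with plain Python. These are simplified stand-ins, not the real ADK classes: a shared dict plays the role of `session.state`, and the mock sequential agent runs its sub-agents strictly in listed order.

```python
# Simplified stand-ins for ADK's workflow-agent behavior (not the real API).

class EchoAgent:
    """A trivial 'agent' that records its name in a shared state dict."""
    def __init__(self, name):
        self.name = name

    def run(self, state):
        # Each agent reads/writes the same shared state, mirroring
        # how sub-agents share session.state within one invocation.
        state.setdefault("trace", []).append(self.name)

class MockSequentialAgent:
    """Runs sub-agents one after another, in the order they are listed."""
    def __init__(self, sub_agents):
        self.sub_agents = sub_agents

    def run(self, state):
        for agent in self.sub_agents:
            agent.run(state)

state = {}
pipeline = MockSequentialAgent(
    [EchoAgent("draft"), EchoAgent("review"), EchoAgent("publish")]
)
pipeline.run(state)
print(state["trace"])  # ['draft', 'review', 'publish']
```

A real `SequentialAgent` passes a full `InvocationContext` rather than a bare dict, but the ordering guarantee is the same.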
#### **Multi-Agent Systems**
Complex applications often benefit from structuring multiple, distinct BaseAgent instances into a **Multi-Agent System (MAS)**. This approach enhances modularity, specialization, reusability, maintainability, and allows for structured control flows.
* **Agent Hierarchy**: The foundation for MAS is the parent-child relationship defined in BaseAgent. Agents are passed as a list to the `sub_agents` argument when initializing a parent agent, and ADK automatically sets the `parent_agent` attribute on each child. An agent instance can only have one parent, and the hierarchy defines the scope for workflow agents and influences LLM-driven delegation targets.
* **Workflow Agents as Orchestrators**: These specialized agents (e.g., SequentialAgent, ParallelAgent, LoopAgent) are derived from BaseAgent and orchestrate the execution flow of their sub-agents. They define the control flow deterministically without using an LLM for orchestration.
* **Interaction & Communication Mechanisms**: Agents within a system can exchange data or trigger actions:
* **Shared Session State (session.state)**: The most fundamental way for agents to communicate passively within the same invocation. One agent writes a value to `context.state['data_key']`, and a subsequent agent reads it. The `output_key` property on LlmAgent automatically saves the agent's final response to a specified state key.
* **LLM-Driven Delegation (Agent Transfer)**: Leverages an LlmAgent's understanding to dynamically route tasks to other suitable agents within the hierarchy. The LLM generates a function call (`transfer_to_agent(agent_name='target_agent_name')`), which the AutoFlow intercepts to switch execution focus. Target agents need distinct descriptions for informed decisions.
* **Explicit Invocation (AgentTool)**: Allows an LlmAgent to treat another BaseAgent instance as a callable function or Tool. The target agent is wrapped in AgentTool and included in the parent LlmAgent's tools list. When called, it runs the target agent, captures its final response, and forwards state/artifact changes back to the parent's context.
* **Common Multi-Agent Patterns**: ADK primitives can be combined to implement various collaboration patterns:
* **Coordinator/Dispatcher Pattern**: A central LlmAgent routes incoming requests to specialized `sub_agents` using LLM-Driven Delegation or Explicit Invocation (AgentTool).
* **Sequential Pipeline Pattern**: A SequentialAgent executes `sub_agents` in a fixed order, typically using Shared Session State to pass outputs between steps.
* **Parallel Fan-Out/Gather Pattern**: A ParallelAgent runs multiple `sub_agents` concurrently, often followed by an agent that aggregates results, using Shared Session State for communication.
* **Hierarchical Task Decomposition**: A multi-level tree of agents where higher-level agents break down complex goals and delegate sub-tasks to lower-level agents, using LLM-Driven Delegation or Explicit Invocation (AgentTool).
* **Review/Critique Pattern (Generator-Critic)**: Two agents (Generator and Critic/Reviewer) typically work within a SequentialAgent, using Shared Session State to improve output quality.
* **Iterative Refinement Pattern**: A LoopAgent executes one or more agents over multiple iterations to progressively improve a result in session state, with termination based on `max_iterations` or an escalation signal.
* **Human-in-the-Loop Pattern**: Integrates human intervention points using a custom Tool that pauses execution and interacts with an external system for human input.
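The Iterative Refinement pattern's two termination rules can be sketched in plain Python. This is an illustrative mock, not the ADK `LoopAgent` API: the loop stops either when the refiner signals escalation or when `max_iterations` is exhausted.

```python
# Illustrative sketch of LoopAgent-style termination (not the real ADK API).

def refine(state):
    """Mock refiner step: improves a quality score in shared state.

    Returns True to signal escalation (i.e., 'good enough, stop looping'),
    mirroring a sub-agent setting the escalate action.
    """
    state["quality"] = state.get("quality", 0) + 30
    return state["quality"] >= 90

def run_loop(state, max_iterations=5):
    """Re-runs the refiner until escalation or the iteration cap."""
    for i in range(max_iterations):
        if refine(state):
            return i + 1  # number of iterations actually run
    return max_iterations

state = {}
iterations = run_loop(state)
print(iterations, state["quality"])  # 3 90
```

Here the loop escalates after three passes (30 → 60 → 90) well before the cap; with a harder target it would stop at `max_iterations` instead.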
### **4. Tools and Capabilities**
#### **What is a Tool?**
A Tool in ADK represents a specific capability provided to an AI agent, enabling it to perform actions and interact with the world beyond its core text generation and reasoning abilities. Tools are action-oriented, extend agent capabilities, and execute predefined logic, allowing agents to access real-time information, affect external systems, and overcome knowledge limitations.
#### **How Agents Use Tools**
Agents leverage tools dynamically through function calling mechanisms. The process involves the agent's LLM reasoning about the user request, selecting the appropriate tool based on available tools and their docstrings, generating required arguments, invoking the tool, receiving its output, and finally incorporating the output into its reasoning process.
#### **Tool Types in ADK**
ADK supports various types of tools:
* **Function Tools**: Custom tools tailored to specific application needs.
* **Function Tool**: Regular Python functions or Java methods that perform specific actions. Parameters should use JSON-serializable types and avoid default values. The preferred return type is a dictionary in Python or a Map in Java, which provides context and clarity to the LLM. Docstrings or source code comments serve as the tool's description for the LLM.
* **Long Running Function Tool**: Designed for tasks that require significant processing time without blocking the agent's execution, and is a subclass of FunctionTool. The function initiates a long-running operation, optionally returns an initial result, and the agent runner can pause the agent run. The agent client can then query the progress and send back intermediate or final responses.
* **Agent-as-a-Tool**: Allows leveraging the capabilities of other agents by calling them as tools, effectively delegating responsibility. Unlike sub-agents, where control is fully transferred, an Agent-as-a-Tool returns its answer back to the calling agent, which then summarizes and responds to the user, retaining control for future input. The AgentTool class is used to wrap the agent.
* **Built-in Tools**: Ready-to-use tools provided by the framework for common tasks. These include:
* **Google Search**: Allows the agent to perform web searches using Google Search, compatible with Gemini 2 models.
* **Code Execution**: Enables the agent to execute code using the `built_in_code_execution` tool, typically with Gemini 2 models, for calculations or data manipulation.
* **Vertex AI Search**: Uses Google Cloud Vertex AI Search for agents to search across private, configured data stores.
* **GKE Code Executor (`GkeCodeExecutor`)**: Provides a secure and scalable method for running LLM-generated code by leveraging a gVisor-sandboxed GKE environment. It creates ephemeral, isolated Kubernetes Jobs for each execution request.
* **Limitations**: Currently, each root agent or single agent only supports one built-in tool, and no other tools of any type can be used in the same agent. Built-in tools are also not supported within sub-agents.
* **Third-Party Tools**: Integrates tools from other AI Agent frameworks like CrewAI and LangChain, enabling faster development and reuse of existing tools.
* **LangChain Tools**: Uses the LangchainTool wrapper to integrate tools from the LangChain ecosystem (e.g., Tavily search tool).
* **CrewAI Tools**: Uses the CrewaiTool wrapper to integrate tools from the CrewAI library (e.g., Serper API for web search).
* **Google Cloud Tools**: Facilitates connecting agents to Google Cloud products and services.
* **Apigee API Hub Tools**: ApiHubToolset allows turning any documented API from Apigee API hub into a tool, supporting various authentication methods.
* **Application Integration Tools**: ApplicationIntegrationToolset enables secure and governed access to enterprise applications via Integration Connector's pre-built connectors (e.g., Salesforce, ServiceNow) and existing Application Integration process automations.
* **Toolbox Tools for Databases**: Integrates with the open-source MCP Toolbox for Databases server for accessing data in databases like Spanner, AlloyDB, and Postgres.
* **OpenAPI Integration**: Simplifies interacting with external REST APIs by automatically generating callable tools (RestApiTool) directly from an OpenAPI Specification (v3.x). The OpenAPIToolset parses the spec, discovers operations, generates tools, and handles authentication.
* **Model Context Protocol (MCP) Tools**: ADK integrates with the Model Context Protocol (MCP), an open standard for LLMs to communicate with external applications, data sources, and tools. This includes using existing MCP servers within ADK (ADK as an MCP client) and exposing ADK tools via an MCP server (MCP server exposing ADK). The MCPToolset class is used to integrate tools from an MCP server, handling connection management, tool discovery, and proxying tool calls.
#### **Tool Context**
For advanced scenarios, tool functions can include a `tool_context: ToolContext` parameter in their signature. This object provides access to the current session's state (`tool_context.state`), allows influencing agent actions via `tool_context.actions` (e.g., `skip_summarization`, `transfer_to_agent`, `escalate`), and offers methods to interact with configured services like Artifacts and Memory. It also includes `function_call_id` for tracking tool invocations and `auth_response` for authentication.
#### **Defining Effective Tool Functions**
The effectiveness of an agent's tool usage heavily depends on how the tool functions are defined. Key guidelines include:
* **Function Name**: Use descriptive, verb-noun based names that clearly indicate the action (e.g., `get_weather`, `searchDocuments`).
* **Parameters (Arguments)**: Use clear and descriptive names, provide type hints in Python, and ensure all parameter types are JSON serializable. Avoid setting default values for parameters.
* **Return Type**: Must be a dictionary in Python or a Map in Java. Design the dictionary/Map keys and values to be descriptive and easily understood by the LLM, often including a status key.
* **Docstring / Source Code Comments**: Critical for describing what the tool does, when it should be used, explaining each parameter, and describing the expected return value structure. The `tool_context` parameter should not be described in the docstring.
* **Simplicity and Focus**: Keep tools focused on one well-defined task, minimize parameters, use simple data types, and decompose complex tasks into smaller, more focused tools.
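A tool function following the guidelines above might look like this. It is a self-contained sketch: the verb-noun name, typed JSON-serializable parameter without a default, descriptive dict return with a `status` key, and usage-oriented docstring all come from the guidelines, while the city data itself is made up for illustration.

```python
def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Use this when the user asks about current weather conditions.

    Args:
        city: The name of the city, e.g. "London".

    Returns:
        A dict with a "status" key ("success" or "error"). On success it
        also contains a "report" string; on error, an "error_message".
    """
    # Stand-in data source; a real tool would call a weather API here.
    known = {"london": "Cloudy, 15°C", "tokyo": "Sunny, 22°C"}
    report = known.get(city.lower())
    if report is None:
        return {"status": "error", "error_message": f"No data for {city}."}
    return {"status": "success", "report": report}

print(get_weather("London"))
# {'status': 'success', 'report': 'Cloudy, 15°C'}
```

Because the docstring states when to use the tool and what the return keys mean, the LLM can both select the tool correctly and interpret its output without extra prompting.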
#### **Toolsets**
Toolsets, via the BaseToolset interface, allow for managing and providing a collection of BaseTool instances, often dynamically, to an agent. This is beneficial for organizing related tools, enabling dynamic tool availability based on context, and integrating external tool providers. The BaseToolset defines get\_tools() to return a list of BaseTool instances and close() for cleanup.
#### **Authentication with Tools**
Many tools require authentication to access protected resources. ADK provides a system to handle various authentication methods securely. Key components include AuthScheme (defines how API expects credentials, e.g., API Key, OAuth 2.0) and AuthCredential (holds initial information to start authentication, e.g., OAuth Client ID/Secret). Supported initial credential types include `API_KEY`, `HTTP`, `OAUTH2`, `OPEN_ID_CONNECT`, and `SERVICE_ACCOUNT`. Authentication is configured on tools during initialization, with different methods for OpenAPI-based Toolsets and Google API Toolsets. The system handles interactive OAuth/OIDC flows, where the Agent Client application redirects the user for authorization and then sends the authentication result back to ADK.
### **5. Conversational Context and Runtime**
#### **Session, State, and Memory**
Meaningful, multi-turn conversations require agents to understand context. ADK manages this through Session, State, and Memory.
* **Session**: Represents a single, ongoing interaction between a user and the agent system. It contains a chronological sequence of Events (messages and actions) and can hold temporary data (State) relevant only to that conversation. Key properties include id, appName, userId for identification, events for history, state for session-specific data, and lastUpdateTime for activity tracking. SessionService manages the lifecycle of Session objects. Implementations include InMemorySessionService (non-persistent) and VertexAiSessionService or DatabaseSessionService (persistent).
* **State (session.state)**: A dictionary or Map within each Session for storing and updating dynamic details needed during the conversation. It holds serializable key-value pairs and its persistence depends on the SessionService. State can be organized using prefixes: no prefix for session-specific, `user:` for user-specific across sessions, `app:` for app-wide, and `temp:` for temporary state not persisted. State should be updated by adding an Event to the session history via `session_service.append_event()`, either through `output_key` for agent text responses or `EventActions.state_delta` for complex updates.
* **Memory**: A searchable store of information that can span multiple past sessions or include external data sources. MemoryService defines the interface for managing this long-term knowledge, handling ingestion of session information (`add_session_to_memory`) and searching (`search_memory`). Implementations include InMemoryMemoryService (in-memory, non-persistent) and VertexAiRagMemoryService (persistent, leverages Vertex AI RAG Corpus).
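The prefix convention above determines which scope a state key belongs to. A small helper (not an ADK API, purely illustrative; the key names below are made up) makes the routing concrete:

```python
# Illustrative helper showing how the documented state-key prefixes
# (user:, app:, temp:, or none) map to persistence scopes.

def scope_of(key: str) -> str:
    """Returns the scope a state key falls into, based on its prefix."""
    for prefix, scope in (
        ("user:", "user"),       # persists across sessions for one user
        ("app:", "app"),         # shared across all users of the app
        ("temp:", "temporary"),  # never persisted
    ):
        if key.startswith(prefix):
            return scope
    return "session"             # no prefix: tied to this session only

print(scope_of("user:preferred_language"))  # user
print(scope_of("app:api_endpoint"))         # app
print(scope_of("temp:raw_api_response"))    # temporary
print(scope_of("booking_step"))             # session
```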
#### **Accessing State in Instructions**
`LlmAgent` instructions can directly inject session state values using `{key}` templating. The framework replaces the placeholder with the value from `session.state` before sending the instruction to the LLM.
* **Syntax**: `{key}` for required keys, `{key?}` for optional keys.
* **Bypassing Injection**: To use literal `{{` and `}}`, provide the instruction as a function (an `InstructionProvider`) instead of a string. The `InstructionProvider` receives a `ReadonlyContext` object.
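A simplified sketch of the `{key}` / `{key?}` substitution described above, in plain Python. The error handling is one plausible reading (a missing required key raises, a missing optional key becomes empty); the real framework's behavior may differ in details.

```python
import re

def inject_state(instruction: str, state: dict) -> str:
    """Replaces {key} and {key?} placeholders with values from state.

    {key}  - required: raises KeyError if absent from state.
    {key?} - optional: replaced with "" if absent from state.
    """
    def repl(match):
        key, optional = match.group(1), match.group(2) == "?"
        if key in state:
            return str(state[key])
        if optional:
            return ""
        raise KeyError(key)
    # Keys may contain word characters or ":" (for prefixed keys like user:name).
    return re.sub(r"\{([\w:]+)(\??)\}", repl, instruction)

state = {"topic": "tides"}
print(inject_state("Explain {topic} simply.{style?}", state))
# Explain tides simply.
```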
#### **Updating State**
State should be updated as part of an `Event` to ensure tracking and persistence.
1. **`output_key`**: The simplest method for an `LlmAgent`. The agent's final text response is automatically saved to `session.state[output_key]`.
2. **`EventActions.state_delta`**: For complex updates, manually construct a dictionary of changes and assign it to the `state_delta` of an `EventActions` object when creating an `Event`.
3. **`CallbackContext` or `ToolContext`**: The recommended method within callbacks and tools. Directly modify the `state` attribute on the provided context object (e.g., `tool_context.state['my_key'] = 'new_value'`). The framework automatically captures these changes and includes them in the event's `state_delta`.
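Option 3 above can be illustrated with a mock context object (not the real `ToolContext` class): every write through the context is applied to the session state and simultaneously recorded as a delta, which is what the framework then attaches to the event.

```python
# Illustrative mock of delta-tracking state writes (not the ADK class).

class MockToolContext:
    """Records writes so they can be emitted as an event's state_delta."""

    def __init__(self, session_state):
        self._session_state = dict(session_state)
        self.state_delta = {}

    @property
    def state(self):
        # Expose dict-like access via __setitem__/__getitem__ below.
        return self

    def __setitem__(self, key, value):
        self._session_state[key] = value
        self.state_delta[key] = value  # captured for the outgoing event

    def __getitem__(self, key):
        return self._session_state[key]

ctx = MockToolContext({"booking_step": "start"})
ctx.state["booking_step"] = "payment"
ctx.state["user:last_tool"] = "book_flight"
print(ctx.state_delta)
# {'booking_step': 'payment', 'user:last_tool': 'book_flight'}
```

Note that the pre-existing `booking_step: start` value is not in the delta; only the changes made during this tool call are.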
#### **Events**
Events are the fundamental units of information flow within ADK, representing every significant occurrence during an agent's interaction lifecycle. An Event is an immutable record capturing user messages, agent replies, tool requests, tool results, state changes, control signals, and errors. Events are central for communication, signaling state/artifact changes, controlling flow, and providing history. Events can be identified by `event.author` (e.g., 'user', 'AgentName'), `event.content` (text, tool call, tool result), and `event.partial` for streaming output. Key information can be extracted from `event.content.parts[0].text` for text, `event.get_function_calls()` for tool calls, and `event.get_function_responses()` for tool results. The `event.actions` object signals changes and side effects, including `state_delta`, `artifact_delta`, `transfer_to_agent`, `escalate`, and `skip_summarization`. `event.is_final_response()` is a helper to identify complete, user-facing responses.
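The kind of filtering described above, when iterating over an event stream, can be sketched with a simplified stand-in (not the ADK Event class; the `is_final_response` logic here is a rough approximation: complete, non-partial content with no pending tool calls).

```python
from dataclasses import dataclass, field

# Simplified Event stand-in for illustrating stream filtering.

@dataclass
class MockEvent:
    author: str
    text: str = ""
    partial: bool = False
    function_calls: list = field(default_factory=list)

    def is_final_response(self) -> bool:
        # Approximation: not a streaming fragment, no outstanding tool calls.
        return not self.partial and not self.function_calls

events = [
    MockEvent(author="WeatherAgent", text="Looking", partial=True),
    MockEvent(author="WeatherAgent", function_calls=["get_weather"]),
    MockEvent(author="WeatherAgent", text="It is sunny in Tokyo."),
]
final = [e.text for e in events if e.is_final_response()]
print(final)  # ['It is sunny in Tokyo.']
```

Only the last event qualifies as user-facing; the streaming fragment and the tool-call event are intermediate steps.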
#### **Context**
"Context" in ADK refers to the crucial bundle of information available to agents and tools during specific operations. It enables maintaining state, passing data, accessing services (Artifact Storage, Memory, Authentication), and provides identity and tracking. The central piece is InvocationContext, which the ADK framework creates and passes implicitly.
* InvocationContext: Received directly within an agent's core implementation methods (\_run\_async\_impl, \_run\_live\_impl), providing access to the entire state of the current invocation, including session, agent, invocation\_id, user\_content, and configured services.
* ReadonlyContext: Used where only read access is needed (e.g., InstructionProvider functions), offering a safe, read-only view of fundamental details like invocation\_id, agent\_name, and state.
* CallbackContext: Passed to agent lifecycle callbacks and model interaction callbacks, allowing inspection and modification of state, interaction with artifacts, and access to invocation details. It adds mutable state, load\_artifact, save\_artifact, and direct user\_content access.
* ToolContext: Passed to tool functions and tool execution callbacks, providing everything CallbackContext does, plus specialized methods for tool execution like request\_credential, get\_auth\_response for authentication, list\_artifacts, and search\_memory. It also has function\_call\_id and actions.
Common tasks using context include:
* **Accessing Information**: Reading session state via dictionary-style access on the `state` property, getting current identifiers like `agent_name` and `invocation_id`, and accessing the initial user input via `user_content`.
* **Managing Session State**: Modifying state through `CallbackContext.state` or `ToolContext.state` automatically tracks changes as deltas in `EventActions.state_delta`, which are then persisted by the `SessionService`.
* **Working with Artifacts**: Using `save_artifact` and `load_artifact` to manage file references or large data blobs associated with the session; `list_artifacts()` discovers available files.
* **Handling Tool Authentication**: `ToolContext` provides `auth_response`, `request_credential(auth_config)`, and `get_auth_response()` for securely managing API keys or other credentials needed by tools.
* **Leveraging Memory**: `ToolContext.search_memory(query)` lets tools retrieve relevant information from past conversations or external sources via the configured `memory_service`.
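Put together, a typical tool body might look like the following sketch. `StubToolContext` is a hypothetical test double standing in for the `ToolContext` that ADK passes at runtime:

```python
# Sketch of a tool that reads/writes state and archives an artifact.
# StubToolContext is a hypothetical stand-in, not an ADK class.
class StubToolContext:
    def __init__(self):
        self.state: dict = {}
        self._artifacts: dict = {}
        self.agent_name = "demo_agent"

    def save_artifact(self, filename: str, data: bytes) -> int:
        self._artifacts[filename] = data
        return 0  # version number

    def list_artifacts(self) -> list[str]:
        return sorted(self._artifacts)


def summarize_report(report_text: str, tool_context: StubToolContext) -> dict:
    """Example tool: stores a result in state and archives the raw input."""
    summary = report_text[:40]                    # stand-in for real work
    tool_context.state["last_summary"] = summary  # tracked as a state delta
    tool_context.save_artifact("report.txt", report_text.encode("utf-8"))
    return {"status": "ok", "summary": summary}


ctx = StubToolContext()
result = summarize_report("Quarterly numbers are up across regions.", ctx)
print(result["status"], ctx.list_artifacts())  # ok ['report.txt']
```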
#### **Runtime**
The ADK Runtime is the underlying engine that powers agent applications during user interactions, orchestrating execution, managing information flow, state changes, and interactions with external services. It operates on an **Event Loop**, which facilitates back-and-forth communication between the Runner component and the "Execution Logic" (Agents, LLM calls, Callbacks, Tools).
* **Runner's Role**: Acts as the central orchestrator for a single user invocation. It initiates the process, receives and processes events (committing changes via Services), and forwards events upstream.
* **Execution Logic's Role**: Custom code within agents, tools, and callbacks performs computation and decision-making. It constructs and yields Event objects to the Runner, pausing until the Runner processes and commits the changes, and then resumes execution.
* **State Updates & Commitment Timing**: Changes to session state are guaranteed to be persisted *after* the `Event` carrying the corresponding `state_delta` has been yielded and processed by the Runner.
* **"Dirty Reads"**: Code running later within the same invocation, but before a state-changing event is yielded and processed, can often see local, uncommitted changes.
* **Streaming vs. Non-Streaming Output**: For streaming, multiple `Event` objects with `partial=True` are yielded, but the Runner fully processes actions only from the final non-partial event.
* **Async is Primary**: The Runtime is fundamentally built on asynchronous libraries (Python's `asyncio`, Java's RxJava).
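The yield/commit rhythm described above can be sketched with plain generators. This is purely illustrative: the real Runner is asynchronous and does far more, but the control flow — agent yields an event, pauses, Runner commits the delta, agent resumes — is the same:

```python
# Minimal event-loop sketch: the "agent" yields events; the "runner"
# commits each event's state_delta to the session before the agent resumes.
def agent_logic():
    yield {"text": "step 1", "state_delta": {"phase": "started"}}
    # Execution resumes here only after the runner has processed the event.
    yield {"text": "done", "state_delta": {"phase": "finished"}}


def run(session_state: dict) -> list[str]:
    texts = []
    for event in agent_logic():                     # runner drives the generator
        session_state.update(event["state_delta"])  # commit before resuming
        texts.append(event["text"])
    return texts


session = {}
print(run(session), session)  # ['step 1', 'done'] {'phase': 'finished'}
```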
#### **Runtime Configuration (RunConfig)**
RunConfig defines runtime behavior and options for agents, controlling speech and streaming settings, function calling, artifact saving, and limits on LLM calls. Parameters include:
* `speech_config`: Configures speech synthesis (voice, language).
* `response_modalities`: List of desired output modalities (e.g., "TEXT", "AUDIO").
* `save_input_blobs_as_artifacts`: If true, saves input blobs as run artifacts.
* `streaming_mode`: Sets the streaming behavior (`NONE`, `SSE`, `BIDI`).
* `output_audio_transcription`: Configures transcription of generated audio output.
* `max_llm_calls`: Limits total LLM calls per run.
* `support_cfc`: Enables Compositional Function Calling (Python only, experimental).
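For illustration, the shape of such a configuration can be mirrored with a plain dataclass. This is a hypothetical stand-in: the real `RunConfig` lives in ADK, and its defaults (including the `max_llm_calls` cap shown here) should be checked against the ADK source:

```python
# Hypothetical mirror of the RunConfig fields listed above -- an
# illustration of the option surface, not the ADK class itself.
from dataclasses import dataclass, field
from enum import Enum


class StreamingMode(Enum):
    NONE = "NONE"
    SSE = "SSE"
    BIDI = "BIDI"


@dataclass
class RunConfigSketch:
    response_modalities: list = field(default_factory=lambda: ["TEXT"])
    save_input_blobs_as_artifacts: bool = False
    streaming_mode: StreamingMode = StreamingMode.NONE
    max_llm_calls: int = 500    # assumed default cap; verify against ADK
    support_cfc: bool = False


cfg = RunConfigSketch(streaming_mode=StreamingMode.SSE, max_llm_calls=200)
print(cfg.streaming_mode.value, cfg.max_llm_calls)  # SSE 200
```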
### **6. Deployment**
ADK agents can be deployed to various environments based on production needs or custom flexibility.
* **Agent Engine in Vertex AI**: A fully managed, auto-scaling service on Google Cloud for deploying, managing, and scaling AI agents built with frameworks like ADK. It requires installing the Vertex AI SDK and initializing it with a project ID, location, and staging bucket. Agents are prepared for Agent Engine using `reasoning_engines.AdkApp()`.
* **Cloud Run**: A managed, auto-scaling compute platform on Google Cloud for running container-based agent applications. Deployment can be done with the `adk deploy cloud_run` command (recommended for Python) or `gcloud run deploy` with a Dockerfile.
* **GKE**: Google Cloud's managed Kubernetes service for deploying and managing containerized applications. Deployment involves enabling the necessary APIs, creating a GKE cluster, configuring a Kubernetes Service Account for Vertex AI (if applicable), building a container image from a Dockerfile, creating Kubernetes manifests (`deployment.yaml`), and deploying with `kubectl`.
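A Cloud Run deployment along the lines above can be sketched as a CLI fragment. The project ID, region, service name, and agent path are placeholders, and the flag names are indicative only; consult `adk deploy cloud_run --help` for the authoritative options:

```shell
# Sketch of a Cloud Run deployment; placeholder values throughout.
export GOOGLE_CLOUD_PROJECT="my-project"      # placeholder project ID
export GOOGLE_CLOUD_LOCATION="us-central1"    # placeholder region

adk deploy cloud_run \
  --project="$GOOGLE_CLOUD_PROJECT" \
  --region="$GOOGLE_CLOUD_LOCATION" \
  --service_name="my-agent-service" \
  ./my_agent                                  # path to the agent package
```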
### **7. Evaluation and Safety**
#### **Callbacks**
Callbacks are functions that hook into an agent's execution lifecycle, allowing for observation, customization, and control. They are associated with an agent at creation.
* **Lifecycle Points**: `before_agent`, `after_agent`, `before_model`, `after_model`, `before_tool`, `after_tool`.
* **Context Objects**: Callbacks receive `CallbackContext` or `ToolContext`, providing access to session state and runtime information.
* **Control Flow**:
* **`return None` (or `Optional.empty()` in Java)**: Allows the default ADK behavior to proceed.
* **`return <Specific Object>`**: Overrides the default behavior. For example, returning an `LlmResponse` from `before_model_callback` skips the LLM call and uses the returned object as the response. Returning a `dict` from `before_tool_callback` skips the tool execution and uses the dictionary as the tool's result. This is the core mechanism for implementing guardrails and custom logic.
---
#### **Why Evaluate Agents?**
Evaluating agents is crucial for ensuring they operate safely, securely, and align with brand values. Traditional software testing is insufficient due to the probabilistic nature of LLM agents. Evaluation involves assessing the quality of both the final output and the agent's trajectory (sequence of steps).
Evaluation can be broken down into:
* **Evaluating Trajectory and Tool Use**: Analyzing the steps an agent takes to reach a solution, including choice of tools, strategies, and efficiency. Ground-truth-based trajectory evaluations include exact match, in-order match, any-order match, precision, recall, and single-tool use.
* **Evaluating the Final Response**: Assessing the quality, relevance, and correctness of the agent's final output.
ADK offers two approaches for evaluation:
* **Using a test file**: Creating individual `.test.json` files, each representing a single, simple agent-model interaction (session), ideal for unit testing and rapid execution. Test files include user content, the expected intermediate tool-use trajectory, expected intermediate agent responses, and the final response.
* **Using an Evalset File**: Uses a dedicated dataset (an "evalset") containing multiple, potentially lengthy sessions, ideal for integration tests and simulating complex, multi-turn conversations. Evalsets contain multiple "evals," each representing a distinct session with turns, user queries, expected tool use, intermediate responses, and a reference response.
* **Evaluation Criteria**: Define how agent performance is measured against the evalset. Metrics include `tool_trajectory_avg_score` (compares actual tool usage to expected) and `response_match_score` (compares the final natural-language response to the reference using ROUGE). The default criteria are a `tool_trajectory_avg_score` of 1.0 and a `response_match_score` of 0.8.
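The tool-trajectory metric can be illustrated with a small sketch. This assumes a per-step exact-match average; ADK's actual scoring may align or weight steps differently:

```python
# Sketch: average of per-step exact matches between the expected and
# actual tool-call sequences. Not ADK's exact implementation.
def tool_trajectory_avg_score(expected: list, actual: list) -> float:
    if not expected:
        return 1.0 if not actual else 0.0
    hits = sum(1 for e, a in zip(expected, actual) if e == a)
    return hits / len(expected)


score = tool_trajectory_avg_score(
    expected=["get_weather", "format_reply"],
    actual=["get_weather", "search_web"],   # second step diverges
)
print(score)  # 0.5
```

Under the default criterion of 1.0, any divergence from the expected trajectory fails the eval, which is why this run (score 0.5) would not pass.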
Evaluation can be run via:
* **Web-based UI (adk web)**: Provides an interactive way to evaluate agents and generate evaluation datasets.
* **Programmatically (pytest)**: Integrates evaluation into testing pipelines using pytest and test files.
* **Command Line Interface (adk eval)**: Runs evaluations on an existing evaluation set file directly from the command line, useful for automation.
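A command-line run of the third option can be sketched as follows. The paths are placeholders and the argument shape is indicative; see `adk eval --help` for the authoritative interface:

```shell
# Sketch: evaluate an agent module against an existing eval set file.
adk eval \
  path/to/my_agent \
  path/to/my_agent/evalset.test.json \
  --print_detailed_results
```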
#### **Safety & Security for AI Agents**
Ensuring AI agents operate safely and securely is paramount due to risks like misaligned actions, data exfiltration, and inappropriate content generation. Sources of risk include vague instructions, model hallucination, jailbreaks, and prompt injections. Google Cloud Vertex AI provides a multi-layered approach to mitigate these risks.
* **Safety and Security Risks**: Risks categorized as misalignment/goal corruption, harmful content generation (including brand safety), and unsafe actions (e.g., executing damaging commands, leaking sensitive data).
* **Best Practices**:
* **Identity and Authorization**: Control who the agent acts as by defining agent and user authentication. Agent-Auth means the tool uses the agent's own identity, suitable for scenarios where all users share the same access level. User Auth means the tool uses the identity of the "controlling user", typically implemented using OAuth.
* **Guardrails to screen inputs and outputs**:
* **In-tool guardrails**: Designing tools with security in mind, limiting actions exposed to the model, and using deterministically set Tool Context information to validate model behavior.
* **Built-in Gemini Safety Features**: Leveraging Gemini models' in-built content safety filters (non-configurable for prohibited content, configurable for harm categories) and system instructions for safety (guiding model behavior and content).
* **Model and Tool Callbacks**: Using before-tool or before-model callbacks to pre-validate calls, inspect arguments, and block execution when a policy would be violated.
* **Using Gemini as a safety guardrail**: Employing a fast and cheap LLM like Gemini Flash Lite as a safety filter in callbacks to mitigate content safety, agent misalignment, and brand safety risks from user and tool inputs.
* **Sandboxed Code Execution**: Using sandboxing to prevent model-generated code from compromising the local environment. Options include Vertex Gemini Enterprise API code execution and Vertex Code Interpreter Extension.
* **Evaluations**: Utilizing evaluation tools to assess the quality, relevance, and correctness of the agent's final output.
* **VPC-SC Perimeters and Network Controls**: Confining agent activity within secure perimeters to prevent data exfiltration.
* **Other Security Risks**: Always escaping model-generated content in UIs to prevent execution of HTML or JS content, which could lead to data exfiltration or malicious actions.
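The last point is easy to get wrong. In Python, a minimal sketch of neutralizing model output before embedding it in HTML uses the standard library's `html.escape`:

```python
# Always escape model-generated text before embedding it in HTML, so any
# injected markup is rendered inert rather than executed by the browser.
import html

model_output = '<img src=x onerror="exfiltrate(document.cookie)">Hi!'
safe = html.escape(model_output)
print(safe)
# &lt;img src=x onerror=&quot;exfiltrate(document.cookie)&quot;&gt;Hi!
```

Escaping at render time is a baseline; frameworks with auto-escaping templates (e.g., Jinja2 in its default HTML mode) apply the same idea automatically.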
### **8. Community Resources and Contribution**
The ADK community provides various resources, including:
* **Translations**: Community-provided translations of the ADK documentation (e.g., adk.wiki for Chinese documentation).
* **Tutorials, Guides & Blog Posts**: Community-written guides covering ADK features, use cases, and integrations (e.g., building an e-commerce recommendation AI agent).
* **Videos & Screencasts**: Video walkthroughs, talks, and demos showcasing ADK.
Contributions to the Agent Development Kit are welcome for both the core framework (Python and Java) and its documentation. Contributions must be accompanied by a Contributor License Agreement (CLA). The project follows Google's Open Source Community Guidelines. Discussions can be joined on the Python or Java GitHub Discussions pages.
Ways to contribute include:
* **Reporting Issues**: For framework bugs, open an issue in google/adk-python or google/adk-java; for documentation errors, open an issue in google/adk-docs.
* **Suggesting Enhancements**: For framework enhancements, open an issue in google/adk-python or google/adk-java; for documentation enhancements, open an issue in google/adk-docs.
* **Improving Documentation**: Submit Pull Requests (PRs) to google/adk-docs.
* **Writing Code**: Submit PRs to google/adk-python, google/adk-java, or google/adk-docs. All contributions undergo a review process via GitHub Pull Requests.
By contributing, you agree that your contributions will be licensed under the project's Apache 2.0 License.