Before diving into more complex applications, let’s take a moment to introduce the topic. This way, we can get comfortable with the terms and concepts we'll be working with!
This tutorial is a summary of Unit 1 of the [HuggingFace](https://huggingface.co/learn/agents-course/unit0/introduction) course on Agents.
After performing an action, the framework follows these steps in order:
* **Parse the action** to identify the function(s) to call and the argument(s) to use.
* **Execute the action**.
* **Append** the result as an Observation.
In many agent frameworks, rules and guidelines are embedded directly into the system prompt. In the following System Message example, we can see:
* The Agent’s behavior.
* The Tools our Agent has access to.
* The Thought-Action-Observation Cycle that we insert into the LLM instructions.

Let’s consider a practical example. Imagine we ask an agent about the temperature in Toronto. When the agent receives this question, it begins the initial step of the "Think" process. This "Think" step represents the agent’s internal reasoning and planning activities to solve the task at hand. The agent utilizes its LLM capabilities to analyze the information presented in its prompt.
During this process, the agent can break down complex problems into smaller, more manageable steps, reflect on past experiences, and continuously adjust its plans based on new information. Key components of this thought process include planning, analysis, decision-making, problem-solving, memory integration, self-reflection, goal-setting, and prioritization.
**For LLMs that are fine-tuned for function-calling, the thought process is optional.**
<blockquote>
The user needs current weather information for Toronto. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details.
</blockquote>
This step shows the agent breaking the problem into steps: first, gathering the necessary data.
Based on its reasoning and the fact that the Agent is aware of a <code>get_weather</code> tool, the Agent prepares a JSON-formatted command to call the weather API tool. For example, its first action could be:
```python
{
  "action": "get_weather",
  "action_input": {
    "location": "Toronto"
  }
}
```
The "Observation" step refers to the environment's response to an API call or the raw data received. This observation is then added to the prompt as additional context. Before the Agent formats and presents the final answer to the user, it returns to the "Think" step to update its internal reasoning. If the observation indicates an error or incomplete data, the Agent may re-enter the cycle to correct its approach.
The ability to call external tools, such as a weather API, empowers the Agent to access real-time data, which is a critical capability for any effective AI agent. Each cycle prompts the Agent to integrate new information (observations) into its reasoning (thought process), ensuring that the final outcome is accurate and well-informed. This illustrates the core principle of the ReAct cycle: the dynamic interplay of Thought, Action, and Observation that enables AI agents to tackle complex tasks with precision and efficiency. By mastering these principles, you can design agents that not only reason through their tasks but also leverage external tools to achieve their objectives, continuously refining their outputs based on environmental feedback.
### The ReAct Approach
Another technique is the ReAct approach, which combines “Reasoning” (Think) with “Acting” (Act). ReAct is a straightforward prompting method that adds step-by-step reasoning before the LLM decodes the next tokens. Encouraging the model to think this way steers decoding toward tokens that form a plan, rather than jumping to a final solution, because the model is prompted to break the problem down into smaller tasks. This lets the model examine sub-steps more thoroughly, which generally results in fewer errors than attempting to produce the final solution all at once. For instance, DeepSeek models have been fine-tuned to "think before answering": they are trained to always include a thinking section (enclosed between `<think>` and `</think>` special tokens). This is not just a prompting technique like ReAct, but a training method in which the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do.
### Actions
Actions refer to the specific steps that an AI agent undertakes to engage with its surroundings. Whether it involves searching the internet for information or managing a physical device, every action is a purposeful task performed by the agent. For instance, an agent that aids in customer service could obtain customer information, provide support articles, or escalate problems to a human representative.
There are multiple types of Agents that take actions differently.

An essential aspect of an agent is its ability to stop generating new tokens once an action is complete. This applies across all formats (JSON, code, function-calling), prevents unintended output, and ensures clarity. The LLM produces text describing the desired action and its parameters.
One approach to implementing actions is known as the **stop and parse approach**. This method ensures that output generation is structured, using formats like JSON or code. It aims to avoid producing unnecessary tokens and to call the appropriate tool to extract the required parameters.
```python
Thought: I need to check the current weather.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "Toronto"}
}
```
Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments.
An alternative Action approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.
{align="center"}
This approach offers several advantages:
* **Expressiveness**: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
* **Modularity and Reusability**: Generated code can include functions and modules that are reusable across different actions or tasks.
* **Enhanced Debuggability**: With a well-defined programming syntax, code errors are often easier to detect and correct.
* **Direct Integration**: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.
For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:
```python
# Code Agent Example: Retrieve Weather Information
import requests

def get_weather(city):
    # Placeholder endpoint for illustration; substitute a real weather API
    response = requests.get(f"https://api.example.com/weather?location={city}")
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("Toronto")
final_answer = f"The current weather in Toronto is: {result}"
print(final_answer)
```
This method also follows the stop and parse approach by clearly delimiting the code block and signaling when execution is complete by printing the <code>final_answer</code>.
### Observation
Observations are how an Agent perceives the consequences of its actions. We can understand them as signals from the environment that guide the next cycle of thought.
In the observation phase, the agent:
- **Collects Feedback**: Receives confirmation of action success.
- **Appends Results**: Updates its memory with new information.
- **Adapts Strategy**: Refines future actions based on updated context.
This process of using feedback helps the agent stay on track with its goals. It allows the agent to learn and adjust continuously based on real-world results. Observation can also be seen as Tool “logs” that provide textual feedback of the Action execution.
**Types of Observation and Examples**
1. **System Feedback:** Error messages, success notifications, or status codes.
2. **Data Changes:** Updates in the database, modifications to the file system, or changes in state.
3. **Environmental Data:** Readings from sensors, system metrics, or resource usage information.
4. **Response Analysis:** Responses from APIs, query results, or outputs from computations.
5. **Time-based Events:** Completion of scheduled tasks or milestones reached, such as deadlines.
Comments:
Both tutorials may seem technical, but they offer a useful overview of the potential of AI agents. In the next blog post, we will discuss an implementation in public transit.