- Combine it with other capabilities like file system access and web browsing

This modular approach is what makes MCP so powerful for building flexible AI applications.

## Bonus: Build a Browser Automation MCP Server with Playwright

As a bonus, let's explore how to use the Playwright MCP server for browser automation with Tiny Agents. This demonstrates the extensibility of the MCP ecosystem beyond our sentiment analysis example.

<Tip>

This section is based on the [Tiny Agents blog post](https://huggingface.co/blog/tiny-agents) and adapted for the MCP course.

</Tip>

In this section, we'll show you how to build an agent that can perform web automation tasks like searching, clicking, and extracting information from websites.

The Playwright MCP server exposes tools that allow your agent to:
1. Open browser tabs
2. Navigate to URLs
3. Click on elements
4. Type into forms
5. Extract content from webpages
6. Take screenshots

Here's an example interaction with our browser automation agent:

```
User: Search for "tiny agents" on GitHub and collect the names of the top 3 repositories

Agent: I'll search GitHub for "tiny agents" repositories.
[Agent opens browser, navigates to GitHub, performs the search, and extracts repository names]

Here are the top 3 (not real) repositories for "tiny agents":
1. huggingface/tiny-agents
2. modelcontextprotocol/tiny-agents-examples
3. langchain/tiny-agents-js
```
This browser automation capability can be combined with other MCP servers to create powerful workflows—for example, extracting text from a webpage and then analyzing it with custom tools.

## How to run the complete demo
If you have NodeJS (with `pnpm` or `npm`), just run this in a terminal:
```bash
npx @huggingface/mcp-client
```
or if using `pnpm`:
```bash
pnpx @huggingface/mcp-client
```
This installs the package into a temporary folder and then executes its command.

You'll see your simple Agent connect to multiple MCP servers (running locally), load their tools (similar to how it would load your Gradio sentiment analysis tool), and then prompt you for a conversation.

By default, our example Agent connects to the following two MCP servers (see the configuration sketch after the list):
- the "canonical" [file system server](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem), which gets access to your Desktop,
259
+
- and the [Playwright MCP](https://github.com/microsoft/playwright-mcp) server, which knows how to use a sandboxed Chromium browser for you.
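
Under the hood, each entry in that list is just a set of `StdioServerParameters` (a command plus arguments used to spawn the server). Here is a minimal sketch of what such a configuration can look like; the exact package names and the Desktop path are assumptions you should adapt to your setup:

```ts
import { homedir } from "node:os";
import { join } from "node:path";
import type { StdioServerParameters } from "@modelcontextprotocol/sdk/client/stdio.js";

// Illustrative server list; check the MCP server docs for the exact packages you want.
const SERVERS: StdioServerParameters[] = [
  {
    // File system server, scoped to your Desktop folder
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
  },
  {
    // Playwright MCP server (drives a sandboxed Chromium browser)
    command: "npx",
    args: ["-y", "@playwright/mcp@latest"],
  },
];
```
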
260
+
261
+
<Tip>

This is a bit counter-intuitive, but currently all MCP servers in Tiny Agents are actually local processes (though remote servers are coming soon).

</Tip>

Our input for this first example was:

> write a haiku about the Hugging Face community and write it to a file named "hf.txt" on my Desktop

Now let's try a prompt that involves some web browsing:

> do a Web Search for HF inference providers on Brave Search and open the first 3 results

## Implementing an MCP client on top of InferenceClient

Now that we know what a tool is in recent LLMs, let's implement the actual MCP client that will communicate with MCP servers and expose their tools to the LLM.

The official doc at https://modelcontextprotocol.io/quickstart/client is fairly well-written. You only have to replace any mention of the Anthropic client SDK with any other OpenAI-compatible client SDK. (There is also a [llms.txt](https://modelcontextprotocol.io/llms-full.txt) you can feed into your LLM of choice to help you code along.)

As a reminder, we use HF's `InferenceClient` for our inference client.
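
If you haven't used it before, instantiating the client is a one-liner. The provider and model below are placeholders; pick any combination you have access to:

```ts
import { InferenceClient } from "@huggingface/inference";

// Reads your Hugging Face token from the environment
const client = new InferenceClient(process.env.HF_TOKEN);

const out = await client.chatCompletion({
  provider: "nebius", // placeholder provider
  model: "Qwen/Qwen2.5-72B-Instruct", // placeholder model
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(out.choices[0].message.content);
```
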
<Tip>

The complete `McpClient.ts` code file is [here](https://github.com/huggingface/huggingface.js/blob/main/packages/mcp-client/src/McpClient.ts) if you want to follow along using the actual code 🤓

</Tip>

Our `McpClient` class has:
- an Inference Client (works with any Inference Provider, and `huggingface/inference` supports both remote and local endpoints)
- a set of MCP client sessions, one for each connected MCP server (this allows us to connect to multiple servers)
- and a list of available tools, populated from the connected servers and lightly re-formatted.

`StdioServerParameters` is an interface from the MCP SDK that lets you easily spawn a local process: as we mentioned earlier, currently all MCP servers are local processes.
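
To make that concrete, here is a rough sketch of connecting to one server and collecting its tools with the MCP TypeScript SDK. The helper name and the client metadata are illustrative, not the exact `McpClient.ts` code:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport, type StdioServerParameters } from "@modelcontextprotocol/sdk/client/stdio.js";

// Illustrative helper; the real McpClient.ts differs in the details.
async function connectAndListTools(server: StdioServerParameters) {
  // Spawn the MCP server as a local child process and talk to it over stdio
  const transport = new StdioClientTransport(server);
  const mcp = new Client({ name: "mcp-course-client", version: "1.0.0" });
  await mcp.connect(transport);

  // Ask the server which tools it exposes, then re-format them for chat completion
  const { tools } = await mcp.listTools();
  return tools.map((tool) => ({
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.inputSchema,
    },
  }));
}
```
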
### How to use the tools
Using our sentiment analysis tool (or any other MCP tool) is straightforward. You just pass `this.availableTools` to your LLM chat-completion, in addition to your usual array of messages:
```ts
const stream = this.client.chatCompletionStream({
  provider: this.provider,
  model: this.model,
  messages,
  tools: this.availableTools,
  tool_choice: "auto",
});
```
`tool_choice: "auto"` is the parameter you pass for the LLM to generate zero, one, or multiple tool calls.

When parsing or streaming the output, the LLM will generate some tool calls (i.e. a function name and some JSON-encoded arguments), which you (as a developer) need to execute. The MCP client SDK once again makes that very easy; it has a `client.callTool()` method:
```ts
// Look up the MCP session that registered this tool
const client = this.clients.get(toolName);
if (client) {
  const result = await client.callTool({ name: toolName, arguments: toolArgs });
  toolMessage.content = result.content[0].text;
} else {
  toolMessage.content = `Error: No session found for tool: ${toolName}`;
}
```
If the LLM chooses to use a tool, this code will automatically route the call to the MCP server, execute the analysis, and return the result back to the LLM.

Finally, you will add the resulting tool message to your `messages` array and send it back to the LLM.
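
In code, that last step is just a push followed by another chat-completion call (a sketch reusing the names from the snippets above):

```ts
// Feed the tool result back to the LLM and let it continue
this.messages.push(toolMessage);
const followUp = await this.client.chatCompletion({
  provider: this.provider,
  model: this.model,
  messages: this.messages,
  tools: this.availableTools,
  tool_choice: "auto",
});
```
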
## Our 50-lines-of-code Agent 🤯
Now that we have an MCP client capable of connecting to arbitrary MCP servers to get lists of tools, and capable of injecting them into and parsing them out of the LLM inference, well... what is an Agent?
> Once you have an inference client with a set of tools, then an Agent is just a while loop on top of it.

In more detail, an Agent is simply a combination of:
- a system prompt
- an LLM Inference client
- an MCP client to hook a set of Tools into it from a bunch of MCP servers
- some basic control flow (see below for the while loop)

<Tip>

The complete `Agent.ts` code file is [here](https://github.com/huggingface/huggingface.js/blob/main/packages/mcp-client/src/Agent.ts).

</Tip>

Our `Agent` class simply extends `McpClient`:
```ts
export class Agent extends McpClient {
  private readonly servers: StdioServerParameters[];
  protected messages: ChatCompletionInputMessage[];

  constructor({
    provider,
    model,
    apiKey,
    servers,
    prompt,
  }: {
    provider: InferenceProvider;
    model: string;
    apiKey: string;
    servers: StdioServerParameters[];
    prompt?: string;
  }) {
    super({ provider, model, apiKey });
    this.servers = servers;
    this.messages = [
      {
        role: "system",
        content: prompt ?? DEFAULT_SYSTEM_PROMPT,
      },
    ];
  }
}
```
By default, we use a very simple system prompt inspired by the one shared in the [GPT-4.1 prompting guide](https://cookbook.openai.com/examples/gpt4-1_prompting_guide).

Even though this comes from OpenAI 😈, this sentence in particular applies to more and more models, both closed and open:
> We encourage developers to exclusively use the tools field to pass tools, rather than manually injecting tool descriptions into your prompt and writing a separate parser for tool calls, as some have reported doing in the past.

Which is to say, we don't need to provide painstakingly formatted lists of tool-use examples in the prompt. The `tools: this.availableTools` param is enough, and the LLM will know how to use both the filesystem tools and the browser tools.

Loading the tools on the Agent is literally just connecting to the MCP servers we want (in parallel because it's so easy to do in JS), as sketched below:
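
Something along these lines; the `addMcpServer` helper name is an assumption about the real code:

```ts
async loadTools(): Promise<void> {
  // Connect to every configured MCP server in parallel and register its tools
  await Promise.all(this.servers.map((server) => this.addMcpServer(server)));
}
```
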
When calling any of these tools, the Agent will break its loop and give control back to the user for new input.
### The complete while loop
Behold our complete while loop. 🎉

The gist of our Agent's main while loop is that we simply iterate with the LLM, alternating between tool calling and feeding it the tool results, and we do so **until the LLM starts to respond with two non-tool messages in a row**.
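
A stripped-down sketch of that loop is shown below. The real `Agent.ts` also streams tokens and defines special "exit loop" tools, and the method and property names here are assumptions:

```ts
async run(userInput: string): Promise<void> {
  this.messages.push({ role: "user", content: userInput });

  let consecutiveNonToolMessages = 0;
  while (true) {
    // Ask the LLM for its next message, always passing the MCP tools
    const response = await this.client.chatCompletion({
      provider: this.provider,
      model: this.model,
      messages: this.messages,
      tools: this.availableTools,
      tool_choice: "auto",
    });
    const message = response.choices[0].message;
    this.messages.push(message);

    if (message.tool_calls?.length) {
      consecutiveNonToolMessages = 0;
      // Execute each tool call through the MCP client and feed the result back
      for (const toolCall of message.tool_calls) {
        this.messages.push(await this.callTool(toolCall));
      }
    } else {
      // Two plain (non-tool) responses in a row: consider the task finished
      consecutiveNonToolMessages += 1;
      if (consecutiveNonToolMessages >= 2) {
        return;
      }
    }
  }
}
```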