
Commit afab954

committed
improve relationship between tiny agents and gradio use case
1 parent 8e25a3a commit afab954

1 file changed: units/en/unit2/tiny-agents.mdx (+109 additions, -47 deletions)
# Tiny Agents: an MCP-powered agent in 50 lines of code

Now that we've built MCP servers in Gradio, let's explore MCP clients even further. This section builds on the experimental project [Tiny Agents](https://huggingface.co/blog/tiny-agents), which demonstrates a super simple way of deploying MCP clients that can connect to services like our Gradio sentiment analysis server.

In this short exercise, we will walk you through how to implement a TypeScript (JS) MCP client that can communicate with any MCP server, including the Gradio-based sentiment analysis server we built in the previous section. You'll see how MCP standardizes the way agents interact with tools, making Agentic AI development significantly simpler.
![meme](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/tiny-agents/thumbnail.jpg)
<figcaption>Image credit https://x.com/adamdotdev</figcaption>

We will show you how to connect your tiny agent to Gradio-based MCP servers, allowing it to leverage both your custom sentiment analysis tool and other pre-built tools.

## How to run the complete demo

Run the client with `npx`:

```bash
npx @huggingface/mcp-client
```

or if using `pnpm`:

```bash
pnpx @huggingface/mcp-client
```

This installs the package into a temporary folder, then executes its command.

You'll see your simple Agent connect to multiple MCP servers (running locally), loading their tools (similar to how it would load your Gradio sentiment analysis tool), then prompting you for a conversation.
<video controls autoplay loop>
  <source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/tiny-agents/use-filesystem.mp4" type="video/mp4">
</video>

By default our example Agent connects to the following two MCP servers:

- the "canonical" [file system server](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem), which gets access to your Desktop,
- and the [Playwright MCP](https://github.com/microsoft/playwright-mcp) server, which knows how to use a sandboxed Chromium browser for you.

You can easily add your Gradio sentiment analysis server to this list, as we'll demonstrate later in this section.

> [!NOTE]
> Note: this is a bit counter-intuitive, but currently all MCP servers in Tiny Agents are actually local processes (though remote servers are coming soon). This doesn't include our Gradio server, which runs on localhost:7860 and is reached over HTTP.

Our input for this first video was:
Now let us try this prompt that involves some Web browsing:

<video controls autoplay loop>
  <source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/tiny-agents/brave-search.mp4" type="video/mp4">
</video>

With our Gradio sentiment analysis tool connected, we could similarly ask:

> analyze the sentiment of this review: "I absolutely loved the product, it exceeded all my expectations!"

### Default model and provider

In terms of model/provider pair, our example Agent uses by default:
- ["Qwen/Qwen2.5-72B-Instruct"](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- running on [Nebius](https://huggingface.co/docs/inference-providers/providers/nebius)

This is all configurable through env variables! Here, we'll also show how to add our Gradio MCP server:
```ts
const agent = new Agent({
  provider: process.env.PROVIDER ?? "nebius",
  model: process.env.MODEL_ID ?? "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN,
  servers: [
    // Default servers
    {
      // Filesystem MCP server; replace the path with a directory you want to expose
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    },
    {
      // Playwright MCP server (sandboxed browser)
      command: "npx",
      args: ["@playwright/mcp@latest"]
    },
    // Our Gradio sentiment analysis server, bridged over HTTP/SSE by mcp-remote
    {
      command: "npx",
      args: [
        "mcp-remote",
        "http://localhost:7860/gradio_api/mcp/sse"
      ]
    }
  ],
});
```

<Tip>

We connect to our Gradio-based MCP server via the [`mcp-remote`](https://www.npmjs.com/package/mcp-remote) package, which bridges remote HTTP/SSE MCP servers to the local stdio transport used by the client.

</Tip>
## The foundation for this: tool calling native support in LLMs

What makes connecting Gradio MCP servers to our Tiny Agent possible is that recent LLMs (both closed and open) have been trained for function calling, a.k.a. tool use. This same capability powers our integration with the sentiment analysis tool we built with Gradio.

A tool is defined by its name, a description, and a JSONSchema representation of its parameters, which is exactly how we defined our sentiment analysis function in the Gradio server. Let's look at a simple example:

```ts
const weatherTool = {
  // ... name, description, and JSONSchema parameters ...
};
```
Our Gradio sentiment analysis tool would have a similar structure, with `text` as the input parameter instead of `location`.
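
For illustration, here is roughly what that could look like (a sketch: the actual name and description are generated by Gradio from our Python function, so treat the exact strings as assumptions):

```ts
// Hypothetical schema for our Gradio sentiment analysis tool (illustrative only).
const sentimentTool = {
  type: "function",
  function: {
    name: "sentiment_analysis",
    description: "Analyze the sentiment of the provided text.",
    parameters: {
      type: "object",
      properties: {
        text: {
          type: "string",
          description: "The text to analyze",
        },
      },
      required: ["text"],
    },
  },
};
```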
The canonical documentation I will link to here is [OpenAI's function calling doc](https://platform.openai.com/docs/guides/function-calling?api-mode=chat). (Yes... OpenAI pretty much defines the LLM standards for the whole community 😅).

Inference engines let you pass a list of tools when calling the LLM, and the LLM is free to call zero, one or more of those tools.
As a developer, you run the tools and feed their result back into the LLM to continue the conversation.

## Implementing an MCP client on top of InferenceClient

Now that we know what a tool is in recent LLMs, let's implement the actual MCP client that will communicate with our Gradio server and other MCP servers.

The official doc at https://modelcontextprotocol.io/quickstart/client is fairly well-written. You only have to replace any mention of the Anthropic client SDK by any other OpenAI-compatible client SDK. (There is also a [llms.txt](https://modelcontextprotocol.io/llms-full.txt) you can feed into your LLM of choice to help you code along).
As a reminder, we use HF's `InferenceClient` for our inference client.

Our `McpClient` class has:
- an Inference Client (works with any Inference Provider, and `huggingface/inference` supports both remote and local endpoints)
- a set of MCP client sessions, one for each connected MCP server (this allows us to connect to multiple servers, including our Gradio server)
- and a list of available tools that is going to be filled from the connected servers and just slightly re-formatted.

```ts
export class McpClient {
  // ...
}
```

To connect to an MCP server (like our Gradio sentiment analysis server), the official `@modelcontextprotocol/sdk/client` TypeScript SDK provides a `Client` class with a `listTools()` method:

```ts
async addMcpServer(server: StdioServerParameters): Promise<void> {
  // ...
}
```
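
The body of `addMcpServer` is not shown here; in essence it spawns the server process and opens a session over stdio. A rough sketch using the official SDK, simplified and without the error handling and tool registration of the real implementation:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport, type StdioServerParameters } from "@modelcontextprotocol/sdk/client/stdio.js";

// Illustrative sketch: spawn the MCP server (a local process) and open a session.
// For our Gradio server, `server` would be the `mcp-remote` entry shown earlier.
async function connectToServer(server: StdioServerParameters): Promise<Client> {
  const transport = new StdioClientTransport(server);
  const mcp = new Client({ name: "tiny-agent", version: "1.0.0" });
  await mcp.connect(transport);
  return mcp;
}
```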

`StdioServerParameters` is an interface from the MCP SDK that will let you easily spawn a local process: as we mentioned earlier, currently, all MCP servers are spawned as local processes. Our Gradio server is the slight exception: the local process we spawn is the `mcp-remote` proxy, which forwards requests to the Gradio server over HTTP.
For each MCP server we connect to (including our Gradio sentiment analysis server), we slightly re-format its list of tools and add them to `this.availableTools`.
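
As a sketch of that re-formatting step (assumed helper name, not the exact code from the `mcp-client` package):

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Illustrative sketch: fetch the tools from one connected MCP session and re-format
// them into the chat-completion "function" tool shape that the LLM expects.
async function collectTools(mcp: Client) {
  const { tools } = await mcp.listTools();
  return tools.map((tool) => ({
    type: "function" as const,
    function: {
      name: tool.name,              // e.g. our Gradio server's sentiment tool
      description: tool.description,
      parameters: tool.inputSchema, // already a JSONSchema object
    },
  }));
}
```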
### How to use the tools

Using our sentiment analysis tool (or any other MCP tool) is straightforward. You just pass `this.availableTools` to your LLM chat-completion, in addition to your usual array of messages:

```ts
const stream = this.client.chatCompletionStream({
  // ... model, messages, and tools: this.availableTools ...
});

// ... (streaming and tool-call handling not shown here) ...
if (client) {
  // ...
}
```

If the LLM chooses to use our sentiment analysis tool, this code will automatically route the call to our Gradio server, execute the analysis, and return the result back to the LLM.
Finally you will add the resulting tool message to your `messages` array and back into the LLM.
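
Putting those pieces together, here is a minimal sketch of executing one tool call (names are assumptions; the real streaming code also accumulates the tool-call arguments chunk by chunk):

```ts
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Minimal sketch: execute one tool call emitted by the LLM by routing it to the
// MCP session that owns that tool (e.g. our Gradio sentiment server), then append
// the result as a "tool" message so the LLM can continue the conversation.
async function runToolCall(
  toolCall: { id: string; function: { name: string; arguments: string } },
  clientsByToolName: Map<string, Client>,
  messages: Array<Record<string, unknown>>
): Promise<void> {
  const client = clientsByToolName.get(toolCall.function.name);
  if (client) {
    const result = await client.callTool({
      name: toolCall.function.name,
      arguments: JSON.parse(toolCall.function.arguments || "{}"),
    });
    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result.content),
    });
  }
}
```

In the actual client this logic lives inside the streaming loop shown above, but the routing idea is the same.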
## Our 50-lines-of-code Agent 🤯

Now that we have an MCP client capable of connecting to arbitrary MCP servers (including our Gradio sentiment analysis server) to get lists of tools, and capable of injecting them into and parsing them out of the LLM inference, well... what is an Agent?

> Once you have an inference client with a set of tools, then an Agent is just a while loop on top of it.

In more detail, an Agent is simply a combination of:

- a system prompt
- an LLM Inference client
- an MCP client to hook a set of Tools into it from a bunch of MCP servers (including our Gradio server)
- some basic control flow (see below for the while loop)
Even though this comes from OpenAI 😈, this sentence in particular applies here:

> We encourage developers to exclusively use the tools field to pass tools, rather than manually injecting tool descriptions into your prompt and writing a separate parser for tool calls, as some have reported doing in the past.

Which is to say, we don't need to provide painstakingly formatted lists of tool use examples in the prompt. The `tools: this.availableTools` param is enough, and the LLM will know how to use both the filesystem tools and our Gradio sentiment analysis tool.

Loading the tools on the Agent is literally just connecting to the MCP servers we want (in parallel, because it's so easy to do in JS):
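
A minimal sketch of what that can look like, assuming the `addMcpServer` method shown earlier (the exact method name on the Agent is an assumption):

```ts
// Sketch: connect to all configured MCP servers in parallel before starting the chat.
// `this.servers` holds entries like the ones in the Agent config above.
async loadTools(): Promise<void> {
  await Promise.all(this.servers.map((server) => this.addMcpServer(server)));
}
```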
Then comes the while loop at the heart of the Agent:

```ts
while (true) {
  // ...
}
```
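
Since the loop body is not shown above, here is a conceptual sketch of the idea (the helper names are assumptions, not the exact code from the `mcp-client` package):

```ts
// Conceptual sketch of the Agent's control flow: keep asking the LLM for the next
// step, execute any tool calls it makes (filesystem, browser, our Gradio sentiment
// tool, ...), and stop once it answers without requesting another tool.
while (true) {
  const { reply, toolCalls } = await this.askLLM(this.messages); // assumed helper
  this.messages.push(reply);

  if (toolCalls.length === 0) {
    break; // final answer produced, exit the loop
  }

  for (const toolCall of toolCalls) {
    const toolMessage = await this.executeToolCall(toolCall); // routed to the owning MCP server
    this.messages.push(toolMessage);
  }
}
```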

## Connecting Tiny Agents with Gradio MCP Servers

Now that we understand both Tiny Agents and Gradio MCP servers, let's see how they work together! The beauty of MCP is that it provides a standardized way for agents to interact with any MCP-compatible server, including our Gradio-based sentiment analysis server.

### Using the Gradio Server with Tiny Agents

To connect our Tiny Agent to the Gradio sentiment analysis server we built earlier, we just need to add it to our list of servers. Here's how we can modify our agent configuration:

```ts
const agent = new Agent({
  provider: process.env.PROVIDER ?? "nebius",
  model: process.env.MODEL_ID ?? "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN,
  servers: [
    // ... existing servers ...
    {
      command: "npx",
      args: [
        "mcp-remote",
        "http://localhost:7860/gradio_api/mcp/sse" // Your Gradio MCP server
      ]
    }
  ],
});
```
Now our agent can use the sentiment analysis tool alongside other tools! For example, it could:

1. Read text from a file using the filesystem server
2. Analyze its sentiment using our Gradio server
3. Write the results back to a file

### Example Interaction

Here's what a conversation with our agent might look like:

```
User: Read the file "feedback.txt" from my Desktop and analyze its sentiment

Agent: I'll help you analyze the sentiment of the feedback file. Let me break this down into steps:

1. First, I'll read the file using the filesystem tool
2. Then, I'll analyze its sentiment using the sentiment analysis tool
3. Finally, I'll write the results to a new file

[Agent proceeds to use the tools and provide the analysis]
```
### Deployment Considerations

When deploying your Gradio MCP server to Hugging Face Spaces, you'll need to update the server URL in your agent configuration to point to your deployed Space:

```ts
{
  command: "npx",
  args: [
    "mcp-remote",
    "https://YOUR_USERNAME-mcp-sentiment.hf.space/gradio_api/mcp/sse"
  ]
}
```
This allows your agent to use the sentiment analysis tool from anywhere, not just locally!
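
One convenient pattern (an assumption on our side, not something the course requires) is to read the endpoint from an environment variable so the same configuration works locally and against your deployed Space:

```ts
// Pick the Gradio MCP endpoint from the environment, falling back to the local server.
const GRADIO_MCP_URL =
  process.env.GRADIO_MCP_URL ?? "http://localhost:7860/gradio_api/mcp/sse";

const gradioServer = {
  command: "npx",
  args: ["mcp-remote", GRADIO_MCP_URL],
};
```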