docs/source/en/guides/inference.md (63 additions & 0 deletions)
@@ -443,6 +443,69 @@ strictly the same as the sync-only version.
For more information about the `asyncio` module, please refer to the [official documentation](https://docs.python.org/3/library/asyncio.html).
## MCP Client
The `huggingface_hub` library now includes an experimental [`MCPClient`], which lets Large Language Models (LLMs) interact with external Tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP). This client extends [`AsyncInferenceClient`] to seamlessly integrate Tool usage.
The [`MCPClient`] connects to MCP servers (either local `stdio` scripts or remote `http`/`sse` services) that expose tools. It feeds these tools to an LLM (via [`AsyncInferenceClient`]). If the LLM decides to use a tool, [`MCPClient`] manages the execution request to the MCP server and relays the Tool's output back to the LLM, often streaming results in real time.
In the following example, we use the [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) model via the [Nebius](https://nebius.com/) inference provider. We then add a remote MCP server, in this case an SSE server that makes the Flux image generation tool available to the LLM. The example below was truncated in this view; the completion is a sketch that assumes `add_mcp_server` and `process_single_turn_with_tools` behave as described above, and the server URL is a placeholder.
```python
import os

from huggingface_hub import ChatCompletionInputMessage, ChatCompletionStreamOutput, MCPClient


async def main():
    async with MCPClient(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    ) as client:
        # Register the remote SSE server that exposes the Flux image generation
        # tool (the URL below is illustrative; point it at a real MCP server).
        await client.add_mcp_server(type="sse", url="https://example.com/gradio_api/mcp/sse")

        messages = [{"role": "user", "content": "Generate a picture of a cat on the moon"}]

        async for chunk in client.process_single_turn_with_tools(messages):
            # Text deltas streamed from the LLM
            if isinstance(chunk, ChatCompletionStreamOutput):
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")
            # Tool-call results relayed back from the MCP server
            elif isinstance(chunk, ChatCompletionInputMessage):
                print(f"\nCalled tool '{chunk.name}'. Result: '{chunk.content}'")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
For even simpler development, we offer a higher-level [`Agent`] class. This 'Tiny Agent' simplifies creating conversational Agents by managing the chat loop and state: it is essentially a simple while loop built right on top of an [`MCPClient`]. You can run these Agents directly from the command line:
```bash
# Install the latest version of huggingface_hub with the mcp extra
pip install -U "huggingface_hub[mcp]"
# Run an agent that uses the Flux image generation tool
tiny-agents run julien-c/flux-schnell-generator
```
When launched, the Agent will load, list the Tools it has discovered from its connected MCP servers, and then be ready for your prompts!
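
You can also drive an [`Agent`] from Python rather than the command line. The following is a minimal sketch, assuming `Agent` is exported from `huggingface_hub` alongside [`MCPClient`], supports async context management, and exposes async `load_tools()` and `run()` methods matching the chat loop described above; the server entry follows the `{"type": ..., "config": ...}` shape documented in the `Agent` docstring later in this PR, and the URL is a placeholder.

```python
import asyncio
import os

from huggingface_hub import Agent, ChatCompletionStreamOutput  # Agent export is assumed here


async def main():
    # Constructor arguments follow the Agent docstring shown later in this PR;
    # the SSE server URL is a placeholder for a real MCP endpoint.
    async with Agent(
        model="Qwen/Qwen2.5-72B-Instruct",
        provider="nebius",
        api_key=os.environ["HF_TOKEN"],
        servers=[{"type": "sse", "config": {"url": "https://example.com/gradio_api/mcp/sse"}}],
    ) as agent:
        await agent.load_tools()  # assumed: discovers tools on the connected servers
        async for chunk in agent.run("Generate a picture of a cat on the moon"):
            # Assumed: run() streams the same chunk types as MCPClient
            if isinstance(chunk, ChatCompletionStreamOutput):
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")


asyncio.run(main())
```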
## Advanced tips
In the above section, we saw the main aspects of [`InferenceClient`]. Let's dive into some more advanced tips.
src/huggingface_hub/inference/_mcp/agent.py (30 additions & 2 deletions)

```diff
@@ -11,8 +11,27 @@
 class Agent(MCPClient):
     """
-    Python implementation of a Simple Agent
-    i.e. just a basic while loop on top of an Inference Client with MCP-powered tools
+    Implementation of a Simple Agent, which is a simple while loop built right on top of an [`MCPClient`].
+
+    <Tip warning={true}>
+
+    This class is experimental and might be subject to breaking changes in the future without prior notice.
+
+    </Tip>
+
+    Args:
+        model (`str`):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        servers (`Iterable[Dict]`):
+            MCP servers to connect to. Each server is a dictionary containing a `type` key and a `config` key. The `type` key can be `"stdio"` or `"sse"`, and the `config` key is a dictionary of arguments for the server.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers.
+            If model is a URL or `base_url` is passed, then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use your own provider API key to interact directly with the provider's service.
+        prompt (`str`, *optional*):
+            The system prompt to use for the agent. Defaults to the default system prompt in `constants.py`.
```
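
To make the documented `servers` shape concrete, here is a short sketch; the inner `config` keys (`command`/`args` for `stdio`, `url` for `sse`) follow common MCP server parameters and are assumptions here, since the docstring only pins down the outer `type`/`config` structure.

```python
# Hypothetical server entries matching the documented {"type", "config"} shape.
# The inner config keys are assumptions based on common MCP server parameters.
servers = [
    {"type": "stdio", "config": {"command": "python", "args": ["my_mcp_server.py"]}},
    {"type": "sse", "config": {"url": "https://example.com/gradio_api/mcp/sse"}},
]
```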
src/huggingface_hub/inference/_mcp/mcp_client.py (28 additions & 17 deletions)

```diff
@@ -61,6 +61,16 @@ class MCPClient:
     This class is experimental and might be subject to breaking changes in the future without prior notice.

     </Tip>
+
+    Args:
+        model (`str`, *optional*):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers.
+            If model is a URL or `base_url` is passed, then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use your own provider API key to interact directly with the provider's service.
```
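
As a quick illustration of the `model`/`provider` interplay documented above, a minimal sketch (the endpoint URL is a placeholder):

```python
from huggingface_hub import MCPClient

# Hub model id: `provider` selects the inference provider ("auto", the default,
# picks the first provider available for the model per your user settings).
client = MCPClient(model="meta-llama/Meta-Llama-3-8B-Instruct", provider="nebius")

# Model given as a URL (placeholder below): `provider` is not used in this case.
endpoint_client = MCPClient(model="https://my-endpoint.example.com/v1")
```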