
Commit 00d4775

LysandreJik and pcuenca authored
Reorder serving docs (#39634)
* Slight reorg
* LLMs + draft VLMs
* Actual VLM examples
* Initial responses
* Reorder
* Update docs/source/en/serving.md
* Update docs/source/en/tiny_agents.md
* Update docs/source/en/open_webui.md
* Update docs/source/en/cursor.md
* Update docs/source/en/serving.md
* Responses API
* Address Pedro's comments

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 8c4ea67 commit 00d4775

File tree

7 files changed: +414, -98 lines

docs/source/en/_toctree.yml

Lines changed: 12 additions & 2 deletions
```diff
@@ -89,6 +89,18 @@
   - local: chat_extras
     title: Tools and RAG
   title: Chat with models
+- sections:
+  - local: serving
+    title: Serving LLMs, VLMs, and other chat-based models
+  - local: jan
+    title: Jan
+  - local: cursor
+    title: Cursor
+  - local: tiny_agents
+    title: Tiny-Agents CLI and MCP tools
+  - local: open_webui
+    title: Open WebUI
+  title: Serving
 - sections:
   - local: perf_torch_compile
     title: torch.compile
@@ -103,8 +115,6 @@
     title: Agents
   - local: tools
     title: Tools
-  - local: serving
-    title: Serving
   - local: transformers_as_backend
     title: Inference server backends
   title: Inference
```

docs/source/en/cursor.md

Lines changed: 42 additions & 0 deletions
# Using Cursor as a client of transformers serve

This example shows how to use `transformers serve` as a local LLM provider for [Cursor](https://cursor.com/), the popular IDE. In this particular case, requests to `transformers serve` will come from an external IP (Cursor's server IPs), which requires some additional setup. Furthermore, some of Cursor's requests require [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS), which is disabled by default for security reasons.

To launch a server with CORS enabled, run

```shell
transformers serve --enable-cors
```
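
Before exposing the server, you can check that it responds locally; a quick sketch, assuming the server mirrors the OpenAI-compatible `/v1/models` route:

```shell
# sanity check: list the models the server exposes
# (assumes the OpenAI-compatible /v1/models route)
curl http://localhost:8000/v1/models
```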
You'll also need to expose your server to external IPs. A potential solution is to use [`ngrok`](https://ngrok.com/), which has a permissive free tier. After setting up your `ngrok` account and authenticating on your server machine, run

```shell
ngrok http [port]
```

where `port` is the port used by `transformers serve` (`8000` by default). In the terminal where you launched `ngrok`, you'll see an https address in the "Forwarding" row, as in the image below. This is the address to send requests to.

<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_ngrok.png"/>
</h3>
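
To confirm the tunnel works end to end, you can send a request through the forwarding address; a sketch with a placeholder `ngrok` URL that you should replace with your own:

```shell
# replace the URL with your own "Forwarding" address from ngrok
# (assumes the OpenAI-compatible /v1/models route)
curl https://your-subdomain.ngrok-free.app/v1/models
```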
You're now ready to set things up on the app side! In Cursor, while you can't set a new provider, you can change the endpoint for OpenAI requests in the model selection settings. First, navigate to "Settings" > "Cursor Settings", "Models" tab, and expand the "API Keys" collapsible. To set your `transformers serve` endpoint, follow this order:

1. Unselect ALL models in the list above (e.g. `gpt4`, ...);
2. Add and select the model you want to use (e.g. `Qwen/Qwen3-4B`);
3. Add some random text to the "OpenAI API Key" field. This field won't be used, but it can't be empty;
4. Add the https address from `ngrok` to the "Override OpenAI Base URL" field, appending `/v1` to the address (i.e. `https://(...).ngrok-free.app/v1`);
5. Hit "Verify".

After you follow these steps, your "Models" tab should look like the image below. Your server should also have received a few requests from the verification step.

<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_cursor.png"/>
</h3>
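
If verification fails, it can help to reproduce the kind of request Cursor sends from your own terminal; a rough sketch, assuming the OpenAI-compatible `/v1/chat/completions` route and the example model above:

```shell
# replace the URL with your ngrok forwarding address
curl https://your-subdomain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-4B", "messages": [{"role": "user", "content": "Hello!"}]}'
```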
You are now ready to use your local model in Cursor! For instance, if you toggle the AI Pane, you can select the model you added and ask it questions about your local files.

<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_cursor_chat.png"/>
</h3>

docs/source/en/jan.md

Lines changed: 32 additions & 0 deletions
# Jan: using the serving API as a local LLM provider

This example shows how to use `transformers serve` as a local LLM provider for the [Jan](https://jan.ai/) app. Jan is a ChatGPT-alternative graphical interface that runs fully on your machine. The requests to `transformers serve` come directly from the local app -- while this section focuses on Jan, you can extrapolate some instructions to other apps that make local requests.

## Running models locally

To connect `transformers serve` with Jan, you'll need to set up a new model provider ("Settings" > "Model Providers"). Click on "Add Provider", and set a new name. In your new model provider page, all you need to set is the "Base URL", which follows this pattern:

```shell
http://[host]:[port]/v1
```

where `host` and `port` are the `transformers serve` CLI parameters (`localhost:8000` by default). After setting this up, you should be able to see some models in the "Models" section after hitting "Refresh". Make sure you also add some text in the "API key" field -- this data is not actually used, but the field can't be empty. Your custom model provider page should look like this:

<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_jan_model_providers.png"/>
</h3>
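
If the defaults don't suit you, you can pick the host and port when launching the server; a minimal sketch, assuming the `host` and `port` parameters mentioned above are exposed as `--host` and `--port` flags:

```shell
# launch the server on an explicit host/port (the flag spelling is an
# assumption; check `transformers serve --help` for the exact names)
transformers serve --host localhost --port 8000
```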
You are now ready to chat!

> [!TIP]
> You can add any `transformers`-compatible model to Jan through `transformers serve`. In the custom model provider you created, click on the "+" button in the "Models" section and add its Hub repository name, e.g. `Qwen/Qwen3-4B`.

## Running models on a separate machine

To conclude this example, let's look into a more advanced use case. If you have a beefy machine to serve models with, but prefer using Jan on a different device, you need to add port forwarding. If you have `ssh` access from your Jan machine into your server, this can be accomplished by typing the following into your Jan machine's terminal:

```shell
# -N: don't run a remote command; -f: run in the background;
# -L: forward local port 8000 to port 8000 on the server
ssh -N -f -L 8000:localhost:8000 your_server_account@your_server_IP -p port_to_ssh_into_your_server
```

Port forwarding is not Jan-specific: you can use it to connect `transformers serve` running on a different machine to an app of your choice.

docs/source/en/open_webui.md

Lines changed: 22 additions & 0 deletions
# Audio transcriptions with WebUI and `transformers serve`

This guide shows how to do audio transcription for chat purposes, using `transformers serve` and [Open WebUI](https://openwebui.com/). It assumes you have Open WebUI installed on your machine and ready to run. Please refer to the examples above to use the text functionalities of `transformers serve` with Open WebUI -- the instructions are the same.

To start, let's launch the server. Some of Open WebUI's requests require [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS), which is disabled by default for security reasons, so you need to enable it:

```shell
transformers serve --enable-cors
```

Before you can speak into Open WebUI, you need to update its settings to use your server for speech-to-text (STT) tasks. Launch Open WebUI and navigate to the audio tab inside the admin settings. If you're using Open WebUI with the default ports, [this link (default)](http://localhost:3000/admin/settings/audio) or [this link (python deployment)](http://localhost:8080/admin/settings/audio) will take you there. Make the following changes there:

1. Change the type of "Speech-to-Text Engine" to "OpenAI";
2. Update the address to your server's address -- `http://localhost:8000/v1` by default;
3. Type your model of choice into the "STT Model" field, e.g. `openai/whisper-large-v3` ([available models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)).

If you've done everything correctly, the audio tab should look like this:

<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_openwebui_stt_settings.png"/>
</h3>

You're now ready to speak! Open a new chat, utter a few words after hitting the microphone button, and you should see the corresponding text on the chat input after the model transcribes it.
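
If transcriptions don't show up, you can test the route Open WebUI calls directly from the command line; a sketch, assuming the server mirrors the OpenAI `/v1/audio/transcriptions` endpoint (the route the "OpenAI" engine setting implies) and that you have a short local audio file:

```shell
# sample.wav is a placeholder: any short audio file works
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=openai/whisper-large-v3
```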
