
Commit 88e09ea

Merge branch 'main' into switch-itl-to-tpot
2 parents: c413fde + a4bdbb5

6 files changed (+533, -302 lines)

docs/backends.md

Lines changed: 18 additions & 0 deletions
@@ -40,6 +40,24 @@ docker run --gpus 1 -ti --shm-size 1g --ipc=host --rm -p 8080:80 \

For more information on starting a TGI server, see the [TGI Documentation](https://huggingface.co/docs/text-generation-inference/index).

### 3. llama.cpp

[llama.cpp](https://github.com/ggml-org/llama.cpp) provides a lightweight, OpenAI-compatible server through its [llama-server](https://github.com/ggml-org/llama.cpp/blob/master/tools/server) tool.

To start a llama.cpp server with the gpt-oss-20b model, you can use the following command:

```bash
llama-server -hf ggml-org/gpt-oss-20b-GGUF --alias gpt-oss-20b --ctx-size 0 --jinja -ub 2048 -b 2048
```

Note that we set the alias `gpt-oss-20b` as the model name because `guidellm` uses the model name to retrieve model metadata in JSON format, and GGUF model repositories do not include this metadata. A simple workaround is to download the metadata files from the original safetensors repository and place them in a local directory named after the alias:

```bash
huggingface-cli download openai/gpt-oss-20b --include "*.json" --local-dir gpt-oss-20b/
```
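
Before starting a benchmark, it can help to confirm that the download produced the JSON files in the expected place. The sketch below is a hypothetical helper (`check_model_metadata` is not part of `guidellm` or the Hugging Face CLI). It assumes the directory layout created by the command above and treats `config.json` as the one file that must be present, which is an assumption rather than a documented `guidellm` requirement.

```bash
# Hypothetical helper (not part of guidellm): check that a local metadata
# directory named after the llama-server alias holds the downloaded JSON files.
check_model_metadata() {
  local dir="$1"

  # Require a directory argument, e.g. gpt-oss-20b
  if [ -z "$dir" ]; then
    echo "usage: check_model_metadata <dir>" >&2
    return 2
  fi

  # The directory should exist and be named after the alias.
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi

  # Assumption: config.json is the minimum file needed to describe the model.
  if [ ! -f "$dir/config.json" ]; then
    echo "missing $dir/config.json" >&2
    return 1
  fi

  # Print the top-level JSON files (skips the CLI's .cache/ subdirectory).
  find "$dir" -maxdepth 1 -name '*.json' -print | sort
}
```

For example, `check_model_metadata gpt-oss-20b` lists the downloaded JSON files and returns a non-zero status if `config.json` is missing.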

Now you can run `guidellm` as usual, and it will fetch the model metadata from the local directory.

## Expanding Backend Support

GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md) file.

pdm.toml

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
[strategy]
update = "reuse"
[lock]
format = "pylock"