forked from ggml-org/llama.cpp
-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
documentationImprovements or additions to documentationImprovements or additions to documentationroadmap
Description
Note: This issue was copied from ggml-org#9291
Original Author: @ggerganov
Original Issue Number: ggml-org#9291
Created: 2024-09-03T06:56:11Z
Overview
This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.
If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.
See also:
Recent API changes (most recent at the top)
| version | PR | desc |
|---|---|---|
| b6523 | ggml-org#16109 | In stream mode, error events are now OAI-compatible |
| b6508 | ggml-org#16052 | include usage statistics only when stream_options.include_usage is specified |
| b6399 | ggml-org#15827 | added return_progress and timings.cache_n |
| b6243 | ggml-org#15108 | Add multimodal support to completions and embeddings endpoints |
| b6205 | ggml-org#15416 | Disable context shift by default |
| b5441 | ggml-org#13660 | Remove /metrics fields related to KV cache tokens and cells` |
| b5441 | ggml-org#13660 | Remove /metrics fields related to KV cache tokens and cells` |
| b5223 | ggml-org#13174 | For chat competion, if last message is assistant, it will be a prefilled message |
| b4599 | ggml-org#9639 | /v1/chat/completions now supports tools & tool_choice |
| TBD. | ggml-org#10974 | /v1/completions is now OAI-compat |
| TBD. | ggml-org#10783 | logprobs is now OAI-compat, default to pre-sampling probs |
| TBD. | ggml-org#10861 | /embeddings supports pooling type none |
| TBD. | ggml-org#10853 | Add optional "tokens" output to /completions endpoint |
| b4337 | ggml-org#10803 | Remove penalize_nl |
| b4265 | ggml-org#10626 | CPU docker images working directory changed to /app |
| b4285 | ggml-org#10691 | (Again) Change /slots and /props responses |
| b4283 | ggml-org#10704 | Change /slots and /props responses |
| b4027 | ggml-org#10162 | /slots endpoint: remove slot[i].state, add slot[i].is_processing |
| b3912 | ggml-org#9865 | Add option to time limit the generation phase |
| b3911 | ggml-org#9860 | Remove self-extend support |
| b3910 | ggml-org#9857 | Remove legacy system prompt support |
| b3897 | ggml-org#9776 | Change default security settings, /slots is now disabled by defaultEndpoints now check for API key if it's set |
| b3887 | ggml-org#9510 | Add /rerank endpoint |
| b3754 | ggml-org#9459 | Add [DONE]\n\n in OAI stream response to match spec |
| b3721 | ggml-org#9398 | Add seed_cur to completion response |
| b3683 | ggml-org#9308 | Environment variable updated |
| b3599 | ggml-org#9056 | Change /health and /slots |
For older changes, use:
git log --oneline -p b3599 -- examples/server/README.mdUpcoming API changes
- TBD
Metadata
Metadata
Assignees
Labels
documentationImprovements or additions to documentationImprovements or additions to documentationroadmap