Skip to content

changelog : llama-server REST API #245

@jakexcosme

Description

@jakexcosme

Note: This issue was copied from ggml-org#9291

Original Author: @ggerganov
Original Issue Number: ggml-org#9291
Created: 2024-09-03T06:56:11Z


Overview

This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.

If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

version PR desc
b6523 ggml-org#16109 In stream mode, error events are now OAI-compatible
b6508 ggml-org#16052 include usage statistics only when stream_options.include_usage is specified
b6399 ggml-org#15827 added return_progress and timings.cache_n
b6243 ggml-org#15108 Add multimodal support to completions and embeddings endpoints
b6205 ggml-org#15416 Disable context shift by default
b5441 ggml-org#13660 Remove /metrics fields related to KV cache tokens and cells`
b5441 ggml-org#13660 Remove /metrics fields related to KV cache tokens and cells`
b5223 ggml-org#13174 For chat competion, if last message is assistant, it will be a prefilled message
b4599 ggml-org#9639 /v1/chat/completions now supports tools & tool_choice
TBD. ggml-org#10974 /v1/completions is now OAI-compat
TBD. ggml-org#10783 logprobs is now OAI-compat, default to pre-sampling probs
TBD. ggml-org#10861 /embeddings supports pooling type none
TBD. ggml-org#10853 Add optional "tokens" output to /completions endpoint
b4337 ggml-org#10803 Remove penalize_nl
b4265 ggml-org#10626 CPU docker images working directory changed to /app
b4285 ggml-org#10691 (Again) Change /slots and /props responses
b4283 ggml-org#10704 Change /slots and /props responses
b4027 ggml-org#10162 /slots endpoint: remove slot[i].state, add slot[i].is_processing
b3912 ggml-org#9865 Add option to time limit the generation phase
b3911 ggml-org#9860 Remove self-extend support
b3910 ggml-org#9857 Remove legacy system prompt support
b3897 ggml-org#9776 Change default security settings, /slots is now disabled by default
Endpoints now check for API key if it's set
b3887 ggml-org#9510 Add /rerank endpoint
b3754 ggml-org#9459 Add [DONE]\n\n in OAI stream response to match spec
b3721 ggml-org#9398 Add seed_cur to completion response
b3683 ggml-org#9308 Environment variable updated
b3599 ggml-org#9056 Change /health and /slots

For older changes, use:

git log --oneline -p b3599 -- examples/server/README.md

Upcoming API changes

  • TBD

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentationroadmap

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions