
Commit 01cc848

Add missing documentation for OpenAI compatible endpoints
1 parent 82b9831 commit 01cc848

2 files changed: +129 -3 lines

llamafile/server/doc/endpoints.md

Lines changed: 7 additions & 3 deletions
@@ -1,5 +1,9 @@
  # LLaMAfiler Endpoints Reference

- - [`/tokenize`](tokenize.md)
- - [`/embedding`](embedding.md)
- - [`/v1/chat/completions`](v1_chat_completions.md)
+ - [`/v1/tokenize`](tokenize.md) endpoint provides a robust interface for
+   converting text prompts into tokens.
+ - [`/v1/embedding`](embedding.md) endpoint provides a way to
+   transform textual prompts into numerical representations.
+ - [`/v1/chat/completions`](v1_chat_completions.md) endpoint lets you build a chatbot.
+ - [`/v1/completions`](v1_completions.md) returns a predicted completion for a given prompt.
+ - `/v1/models` returns basic model info, which is usually used by OpenAI clients for discovery and health checks.
llamafile/server/doc/v1_completions.md

Lines changed: 122 additions & 0 deletions

@@ -0,0 +1,122 @@
# LLaMAfiler Completions Endpoint

The `/v1/completions` endpoint generates text completions based on a
given prompt. It provides a flexible interface for text generation,
allowing customization of parameters such as temperature, top-p
sampling, and maximum tokens.

This endpoint supports the following features:

1. Deterministic outputs using a fixed seed
2. Streaming responses for real-time token generation
3. Configurable stopping criteria for token generation

## Request URIs

- `/v1/completions` (OpenAI API compatible)

## Request Methods

- `POST`

## Request Content Types

- `application/json` must be used.
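For illustration, here is a minimal sketch of a non-streaming request using
only the Python standard library. The server address and model name are
assumptions for a locally running llamafiler, and the response is read
assuming the OpenAI-style `choices[0].text` shape:

```python
import json
import urllib.request

# Assumed local llamafiler address; adjust host and port for your setup.
URL = "http://localhost:8080/v1/completions"

body = {
    "model": "local-model",              # placeholder model name
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0.0,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)

# Assumes an OpenAI-compatible response shape.
print(result["choices"][0]["text"])
```

Any HTTP client works the same way; the endpoint only requires a `POST`
with an `application/json` body.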
## Request Parameters

- `model`: `string`

  Specifies the name of the model to run.

  Only a single model is currently supported, so this field is simply
  copied along to the response. In the future, this will matter.

  This field is required in the request.

- `prompt`: `string`

  The input text that the model will generate a completion for.

  This field is required.

- `stream`: `boolean|null`

  If this field is optionally set to true, then this endpoint will
  return a text/event-stream using HTTP chunked transfer encoding. This
  allows your chatbot to rapidly show text as it's being generated. The
  standard JSON response is slightly modified so that its message field
  will be named delta instead. It's assumed the client will reconstruct
  the full conversation. A streaming sketch appears after this parameter
  list.

- `stream_options`: `object|null`

  Options for streaming the API response. This parameter is only
  applicable when `stream: true` is also specified. Default is `null`.

  - `include_usage`: `boolean|null`

    Whether to include usage statistics in the streaming response.
    Default is `false`.

    If set to `true`, a `usage` field with the usage information will be
    included in an additional empty chunk. Note that all other chunks will
    also contain this field, but with a `null` value.

- `max_tokens`: `integer|null`

  Specifies an upper bound for the number of tokens that can be
  generated for this completion. This can be used to control compute
  and/or latency costs.

- `top_p`: `number|null`

  May optionally be used to set the `top_p` sampling parameter. This
  should be a floating point number. Setting this to 1.0 (the default)
  will disable this feature. Setting this to, for example, 0.1, would
  mean that only the top 10% probability tokens are considered.

  We generally recommend altering this or `temperature` but not both.

- `temperature`: `number|null`

  Configures the randomness level of the generated text.

  This field may be set to a value between 0.0 and 2.0 inclusive. It
  defaults to 1.0. Lower numbers are more deterministic. Higher numbers
  mean more randomness.

  We generally recommend altering this or `top_p` but not both.

- `seed`: `integer|null`

  If specified, llamafiler will make its best effort to sample
  deterministically, even when temperature is non-zero. This means that
  repeated requests with the same seed and parameters should return the
  same result.

- `presence_penalty`: `number|null`

  Number between -2.0 and 2.0. Positive values penalize new tokens based
  on whether they appear in the text so far, increasing the model's
  likelihood to talk about new topics.

- `frequency_penalty`: `number|null`

  Number between -2.0 and 2.0. Positive values penalize new tokens based
  on their existing frequency in the text so far, decreasing the model's
  likelihood to repeat the same line verbatim.

- `user`: `string|null`

  A unique identifier representing your end-user, which can help
  llamafiler to monitor and detect abuse.

- `stop`: `string|array<string>|null`

  Specifies up to 4 stop sequences where the API will cease text generation.
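As a rough illustration of how the parameters above combine, here is a
request body written as a Python dict; it can be posted exactly like the
earlier example, and the values are placeholders rather than
recommendations:

```python
# Illustrative values only; see the parameter descriptions above.
body = {
    "model": "local-model",          # placeholder model name
    "prompt": "Write a haiku about summer rain.",
    "max_tokens": 64,                # upper bound on generated tokens
    "temperature": 0.8,              # alter this or top_p, not both
    "seed": 42,                      # best-effort deterministic sampling
    "presence_penalty": 0.5,         # nudge the model toward new topics
    "stop": ["\n\n"],                # up to 4 stop sequences
}
```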
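The `stream` description above refers to a streaming sketch; here is one,
again using only the Python standard library. It assumes the event stream
follows the common OpenAI convention of `data: {...}` lines terminated by
`data: [DONE]`; the field carrying the generated text is also an
assumption and may differ:

```python
import json
import urllib.request

URL = "http://localhost:8080/v1/completions"    # assumed local llamafiler address

body = {
    "model": "local-model",                     # placeholder model name
    "prompt": "Once upon a time",
    "max_tokens": 32,
    "stream": True,                             # request a text/event-stream response
    "stream_options": {"include_usage": True},  # final chunk carries usage statistics
}

request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    for raw_line in response:                   # events arrive as "data: {...}" lines
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue                            # skip blank lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":                 # assumed end-of-stream marker
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # Field name is an assumption; adjust if your build emits "delta".
            print(choices[0].get("text", ""), end="", flush=True)
        if chunk.get("usage"):
            print("\nusage:", chunk["usage"])
```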
## See Also

- [LLaMAfiler Documentation Index](index.md)
- [LLaMAfiler Endpoints Reference](endpoints.md)
- [LLaMAfiler Technical Details](technical_details.md)
