# LLaMAfiler Completions Endpoint

The `/v1/completions` endpoint generates text completions based on a
given prompt. It provides a flexible interface for text generation,
allowing customization of parameters such as temperature, top-p
sampling, and maximum tokens.

This endpoint supports the following features:

1. Deterministic outputs using a fixed seed
2. Streaming responses for real-time token generation
3. Configurable stopping criteria for token generation

## Request URIs

- `/v1/completions` (OpenAI API compatible)

## Request Methods

- `POST`

## Request Content Types

- `application/json` must be used.

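For illustration, here is a minimal request sketch using only the
Python standard library. The host and port are assumptions; point the
URL at wherever your llamafiler instance is listening.

```python
import json
import urllib.request

# Assumed server address; adjust to wherever llamafiler is listening.
url = "http://localhost:8080/v1/completions"

payload = {
    "model": "LLaMA",  # copied into the response (see below)
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# OpenAI-compatible responses carry completions in the "choices" array.
print(body["choices"][0]["text"])
```
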
## Request Parameters

- `model`: `string`

  Specifies the name of the model to run.

  Only a single model is currently supported, so this field is simply
  copied into the response. In the future, this will matter.

  This field is required in the request.

- `prompt`: `string`

  The input text that the model will generate a completion for.

  This field is required.

- `stream`: `boolean|null`

  If this field is set to `true`, then this endpoint will return a
  text/event-stream using HTTP chunked transfer encoding. This allows
  your chatbot to rapidly show text as it's being generated. The
  standard JSON response is slightly modified so that its `message`
  field will be named `delta` instead. It's assumed the client will
  reconstruct the full conversation; see the sketch under
  `stream_options` below.

- `stream_options`: `object|null`

  Options for streaming the API response. This parameter is only
  applicable when `stream: true` is also specified. Default is `null`.

  - `include_usage`: `boolean|null`

    Whether to include usage statistics in the streaming response.
    Default is `false`.

    If set to `true`, a `usage` field with the usage information will
    be included in an additional empty chunk. Note that all other
    chunks will also contain this field, but with a `null` value.

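  As a rough illustration of streaming, the sketch below reuses the
  assumed local address from the earlier example and reads the
  `text/event-stream` response line by line. The `data:` framing and
  `[DONE]` sentinel follow OpenAI conventions and are assumptions here,
  as is the exact shape of each chunk:

  ```python
  import json
  import urllib.request

  req = urllib.request.Request(
      "http://localhost:8080/v1/completions",  # assumed address
      data=json.dumps({
          "model": "LLaMA",
          "prompt": "Once upon a time",
          "max_tokens": 32,
          "stream": True,
          "stream_options": {"include_usage": True},
      }).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )

  with urllib.request.urlopen(req) as resp:
      for raw in resp:  # the response is a line-oriented event stream
          line = raw.decode("utf-8").strip()
          if not line.startswith("data:"):
              continue  # skip blank separator lines between events
          data = line[len("data:"):].strip()
          if data == "[DONE]":  # OpenAI-style end sentinel (assumed)
              break
          chunk = json.loads(data)
          if chunk.get("usage"):  # extra final chunk with statistics
              print(chunk["usage"])
          for choice in chunk.get("choices", []):
              # field name per the `stream` description above
              print(choice.get("delta", ""), end="", flush=True)
  ```
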
- `max_tokens`: `integer|null`

  Specifies an upper bound for the number of tokens that can be
  generated for this completion. This can be used to control compute
  and/or latency costs.

- `top_p`: `number|null`

  May optionally be used to set the `top_p` sampling parameter. This
  should be a floating point number. Setting this to 1.0 (the default)
  disables this feature. Setting this to, for example, 0.1, means that
  only the tokens comprising the top 10% of probability mass are
  considered.

  We generally recommend altering this or `temperature` but not both.

- `temperature`: `number|null`

  Configures the randomness level of generated text.

  This field may be set to a value between 0.0 and 2.0 inclusive. It
  defaults to 1.0. Lower values make output more deterministic; higher
  values increase randomness.

  We generally recommend altering this or `top_p` but not both.

- `seed`: `integer|null`

  If specified, llamafiler will make its best effort to sample
  deterministically, even when temperature is non-zero. This means that
  repeated requests with the same seed and parameters should return the
  same result.

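  For instance, this sketch (again assuming the local address used
  above) issues the same request twice with a fixed seed; per the
  best-effort guarantee described here, both completions should
  normally be identical:

  ```python
  import json
  import urllib.request

  def complete(seed):
      # helper for this sketch only; not part of the llamafiler API
      req = urllib.request.Request(
          "http://localhost:8080/v1/completions",  # assumed address
          data=json.dumps({
              "model": "LLaMA",
              "prompt": "Write a haiku about rivers.",
              "temperature": 1.0,
              "seed": seed,
          }).encode("utf-8"),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["choices"][0]["text"]

  # Same seed and parameters: results should match (best effort).
  assert complete(42) == complete(42)
  ```
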
- `presence_penalty`: `number|null`

  Number between -2.0 and 2.0. Positive values penalize new tokens based
  on whether they appear in the text so far, increasing the model's
  likelihood to talk about new topics.

- `frequency_penalty`: `number|null`

  Number between -2.0 and 2.0. Positive values penalize new tokens based
  on their existing frequency in the text so far, decreasing the model's
  likelihood to repeat the same line verbatim.

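  To make the two penalties concrete, here is the conventional
  OpenAI-style formulation as a hedged Python sketch; whether
  llamafiler applies this exact formula internally is an assumption:

  ```python
  from collections import Counter

  def penalize(logits, generated_tokens, presence_penalty, frequency_penalty):
      """Adjust token logits using the conventional OpenAI-style penalty
      formula (assumed here, not confirmed for llamafiler)."""
      counts = Counter(generated_tokens)
      adjusted = dict(logits)
      for token, count in counts.items():
          if token in adjusted:
              # presence: flat penalty for any prior occurrence;
              # frequency: scales with how often the token appeared
              adjusted[token] -= presence_penalty + frequency_penalty * count
      return adjusted
  ```
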
- `user`: `string|null`

  A unique identifier representing your end-user, which can help
  llamafiler to monitor and detect abuse.

- `stop`: `string|array<string>|null`

  Specifies up to 4 stop sequences at which the API will cease text
  generation.

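  For example, a request could stop generation before a fourth list
  item or a blank line; the sequences below are purely illustrative:

  ```python
  # Illustrative payload: generation halts when either sequence would appear.
  payload = {
      "model": "LLaMA",
      "prompt": "List three colors:\n1.",
      "max_tokens": 64,
      "stop": ["\n\n", "4."],
  }
  ```
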
## See Also

- [LLaMAfiler Documentation Index](index.md)
- [LLaMAfiler Endpoints Reference](endpoints.md)
- [LLaMAfiler Technical Details](technical_details.md)