examples/server/README.md (10 additions & 9 deletions)
@@ -146,6 +146,7 @@ The project is under active development, and we are [looking for feedback and co
|`--host HOST`| ip address to listen (default: 127.0.0.1)<br/>(env: LLAMA_ARG_HOST) |
|`--port PORT`| port to listen (default: 8080)<br/>(env: LLAMA_ARG_PORT) |
|`--path PATH`| path to serve static files from (default: )<br/>(env: LLAMA_ARG_STATIC_PATH) |
+|`--no-webui`| disable the Web UI<br/>(env: LLAMA_ARG_NO_WEBUI) |
|`--embedding, --embeddings`| restrict to only support embedding use case; use only with dedicated embedding models (default: disabled)<br/>(env: LLAMA_ARG_EMBEDDINGS) |
|`--reranking, --rerank`| enable reranking endpoint on server (default: disabled)<br/>(env: LLAMA_ARG_RERANKING) |
|`--api-key KEY`| API key to use for authentication (default: none)<br/>(env: LLAMA_API_KEY) |
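
For reference, a minimal client-side sketch of using the `--api-key` option above. This is an assumption-laden example, not part of the diff: it presumes the key is sent as a standard HTTP Bearer token, reuses the `/completion` endpoint from the example later in this README, and the key `"mysecret"` is hypothetical:

```javascript
// Sketch: call a server started with `--api-key mysecret`.
// Assumption: the key is passed as an HTTP Bearer token.
async function testAuth() {
    const response = await fetch("http://127.0.0.1:8080/completion", {
        method: "POST",
        headers: { "Authorization": "Bearer mysecret" },
        body: JSON.stringify({ prompt: "Hello", n_predict: 16 }),
    })
    console.log((await response.json()).content)
}

testAuth()
```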
@@ -302,23 +303,23 @@ mkdir llama-client
cd llama-client
```

-Create a index.js file and put this inside:
+Create an index.js file and put this inside:

```javascript
-const prompt = `Building a website can be done in 10 simple steps:`;
+const prompt = "Building a website can be done in 10 simple steps:"

-async function Test() {
+async function test() {
    let response = await fetch("http://127.0.0.1:8080/completion", {
-        method: 'POST',
+        method: "POST",
        body: JSON.stringify({
            prompt,
-            n_predict: 512,
+            n_predict: 64,
        })
    })
    console.log((await response.json()).content)
}

-Test()
+test()
```

And run it:
@@ -380,7 +381,7 @@ Multiple prompts are also supported. In this case, the completion result will be
`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.

-`stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
+`stream`: Allows receiving each predicted token in real-time instead of waiting for the completion to finish (uses a different response format). To enable this, set to `true`.

`stop`: Specify a JSON array of stopping strings.
These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`
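
To make the parameters in the hunk above concrete, here is a hedged sketch of a request combining `n_keep`, `stop`, and the default non-streaming mode. Endpoint and field names follow the completion example earlier in this README; the stop string `"###"` is an arbitrary choice:

```javascript
// Sketch: keep all prompt tokens on context overflow (n_keep: -1) and
// stop generation at "###" (the stop string itself is not included in
// the returned content).
async function testParams() {
    const response = await fetch("http://127.0.0.1:8080/completion", {
        method: "POST",
        body: JSON.stringify({
            prompt: "Building a website can be done in 10 simple steps:",
            n_predict: 128,
            n_keep: -1,
            stop: ["###"],
        }),
    })
    console.log((await response.json()).content)
}

testParams()
```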
@@ -441,11 +442,11 @@ These words will not be included in the completion, so make sure to add them to
`samplers`: The order the samplers should be applied in. An array of strings representing sampler type names. If a sampler is not set, it will not be used. If a sampler is specified more than once, it will be applied multiple times. Default: `["dry", "top_k", "typ_p", "top_p", "min_p", "xtc", "temperature"]` - these are all the available values.

-`timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`
+`timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`

**Response format**

-- Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until end of completion.
+- Note: In streaming mode (`stream`), only `content` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.

- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
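
Because `EventSource` cannot issue `POST` requests, a streaming client has to parse the Server-sent events stream itself. Below is a minimal sketch using `fetch` in Node 18+; the `data: {...}` line framing follows the SSE standard linked above, and the `content`/`stop` fields follow the note in the hunk above:

```javascript
// Sketch: read the /completion response with stream: true by manually
// parsing SSE frames, since EventSource cannot POST.
async function testStream() {
    const response = await fetch("http://127.0.0.1:8080/completion", {
        method: "POST",
        body: JSON.stringify({
            prompt: "Building a website can be done in 10 simple steps:",
            n_predict: 64,
            stream: true,
        }),
    })
    const decoder = new TextDecoder()
    let buffered = ""
    for await (const chunk of response.body) {
        buffered += decoder.decode(chunk, { stream: true })
        const lines = buffered.split("\n")
        buffered = lines.pop() // keep a possibly incomplete line for the next chunk
        for (const line of lines) {
            if (!line.startsWith("data: ")) continue
            const event = JSON.parse(line.slice("data: ".length))
            process.stdout.write(event.content)
            if (event.stop) return
        }
    }
}

testStream()
```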