### POST `/completion`: Given a `prompt`, it returns the predicted completion.

*Options:*

`prompt`: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, if `cache_prompt` is `true`, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated. A `BOS` token is inserted at the start if all of the following conditions are true:

  - The prompt is a string or an array with the first element given as a string
  - The model's `tokenizer.ggml.add_bos_token` metadata is `true`

These input shapes and data types are allowed for `prompt`:

  - Single string: `"string"`
  - Single sequence of tokens: `[12, 34, 56]`
  - Mixed tokens and strings: `[12, 34, "string", 56, 78]`

Multiple prompts are also supported. In this case, the completion result will be an array (see the request sketch after this list).

  - Only strings: `["string1", "string2"]`
  - Strings and sequences of tokens: `["string1", [12, 34, 56]]`
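
As an illustrative sketch (not part of the original README), the requests below exercise each accepted `prompt` shape. The server address `http://localhost:8080` and the use of Python's `requests` package are assumptions, and the numeric token ids are placeholders rather than real vocabulary entries.

```python
import requests  # assumed third-party dependency: pip install requests

BASE_URL = "http://localhost:8080"  # assumed default server address

# Single string prompt
r = requests.post(f"{BASE_URL}/completion",
                  json={"prompt": "Building a website can be done in 10 simple steps:",
                        "n_predict": 64})
print(r.json()["content"])

# Mixed tokens and strings in one prompt (token ids are illustrative)
requests.post(f"{BASE_URL}/completion",
              json={"prompt": [12, 34, "string", 56, 78], "n_predict": 16})

# Multiple prompts: a string plus a sequence of tokens;
# the completion result is expected to be an array with one entry per prompt
requests.post(f"{BASE_URL}/completion",
              json={"prompt": ["string1", [12, 34, 56]], "n_predict": 16})
```
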
`temperature`: Adjust the randomness of the generated text. Default: `0.8`

`dynatemp_range`: Dynamic temperature range. The final temperature will be in the range of `[temperature - dynatemp_range; temperature + dynatemp_range]`. Default: `0.0`, which is disabled.

`dynatemp_exponent`: Dynamic temperature exponent. Default: `1.0`

`top_k`: Limit the next token selection to the K most probable tokens. Default: `40`

`top_p`: Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P. Default: `0.95`

`min_p`: The minimum probability for a token to be considered, relative to the probability of the most likely token. Default: `0.05`

`n_predict`: Set the maximum number of tokens to predict when generating text. **Note:** May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. Default: `-1`, where `-1` is infinity.

`n_indent`: Specify the minimum line indentation for the generated text in number of whitespace characters. Useful for code completion tasks. Default: `0`

`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.

`stream`: Allows receiving each predicted token in real time instead of waiting for the completion to finish. Set to `true` to enable (see the streaming sketch below).
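
A minimal streaming sketch, assuming the server emits server-sent events whose `data: `-prefixed JSON chunks carry a `content` field and a final `stop` flag (verify against your server version); the address and the `requests` package are also assumptions:

```python
import json
import requests  # assumed dependency

resp = requests.post(
    "http://localhost:8080/completion",   # assumed default server address
    json={"prompt": "Once upon a time", "n_predict": 64, "stream": True},
    stream=True,                          # stream the HTTP response body
)
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue                          # skip blank keep-alive lines
    chunk = json.loads(line[len(b"data: "):])
    print(chunk.get("content", ""), end="", flush=True)
    if chunk.get("stop"):                 # assumed field marking the final chunk
        break
```
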
`stop`: Specify a JSON array of stopping strings.
These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`

`typical_p`: Enable locally typical sampling with parameter p. Default: `1.0`, which is disabled.

`repeat_penalty`: Control the repetition of token sequences in the generated text. Default: `1.1`

`repeat_last_n`: Last n tokens to consider for penalizing repetition. Default: `64`, where `0` is disabled and `-1` is ctx-size.

`penalize_nl`: Penalize newline tokens when applying the repeat penalty. Default: `true`

`presence_penalty`: Repeat alpha presence penalty. Default: `0.0`, which is disabled.

`frequency_penalty`: Repeat alpha frequency penalty. Default: `0.0`, which is disabled.

`dry_multiplier`: Set the DRY (Don't Repeat Yourself) repetition penalty multiplier. Default: `0.0`, which is disabled.

`dry_base`: Set the DRY repetition penalty base value. Default: `1.75`

`dry_allowed_length`: Tokens that extend repetition beyond this receive exponentially increasing penalty: multiplier * base ^ (length of repeating sequence before token - allowed length). Default: `2`

`dry_penalty_last_n`: How many tokens to scan for repetitions. Default: `-1`, where `0` is disabled and `-1` is context size.

`dry_sequence_breakers`: Specify an array of sequence breakers for DRY sampling. Only a JSON array of strings is accepted. Default: `['\n', ':', '"', '*']`
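
To make the DRY formula quoted under `dry_allowed_length` concrete, here is a small worked sketch (illustrative only; the real sampler operates on token sequences inside the server):

```python
def dry_penalty(repeat_len: int,
                dry_multiplier: float = 0.8,   # any value > 0 enables DRY
                dry_base: float = 1.75,
                dry_allowed_length: int = 2) -> float:
    """Penalty for a token that would extend a repetition of `repeat_len` tokens."""
    if repeat_len < dry_allowed_length:
        return 0.0  # per the description above, short repetitions are not penalized
    return dry_multiplier * dry_base ** (repeat_len - dry_allowed_length)

# Extending a 4-token repetition with the values above costs 0.8 * 1.75**2 = 2.45
print(dry_penalty(4))
```
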
`mirostat`: Enable Mirostat sampling, controlling perplexity during text generation. Default: `0`, where `0` is disabled, `1` is Mirostat, and `2` is Mirostat 2.0.

`mirostat_tau`: Set the Mirostat target entropy, parameter tau. Default: `5.0`

`mirostat_eta`: Set the Mirostat learning rate, parameter eta. Default: `0.1`

`grammar`: Set grammar for grammar-based sampling. Default: no grammar

`json_schema`: Set a JSON schema for grammar-based sampling (e.g. `{"items": {"type": "string"}, "minItems": 10, "maxItems": 100}` of a list of strings, or `{}` for any JSON). See [tests](../../tests/test-json-schema-to-grammar.cpp) for supported features. Default: no JSON schema.
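
As a hedged sketch (address and `requests` usage assumed), a request constraining the output with the example schema above might look like this:

```python
import requests

r = requests.post(
    "http://localhost:8080/completion",   # assumed default server address
    json={
        "prompt": "List some fruit names as JSON:",
        "n_predict": 128,
        # the example schema from the text above: a list of 10-100 strings
        "json_schema": {"items": {"type": "string"}, "minItems": 10, "maxItems": 100},
    },
)
print(r.json()["content"])  # the generated text is constrained to match the schema
```
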
`seed`: Set the random number generator (RNG) seed. Default: `-1`, which is a random seed.

`ignore_eos`: Ignore end of stream token and continue generating. Default: `false`

`logit_bias`: Modify the likelihood of a token appearing in the generated text completion. For example, use `"logit_bias": [[15043,1.0]]` to increase the likelihood of the token 'Hello', or `"logit_bias": [[15043,-1.0]]` to decrease its likelihood. Setting the value to false, `"logit_bias": [[15043,false]]` ensures that the token `Hello` is never produced. The tokens can also be represented as strings, e.g. `[["Hello, World!",-0.5]]` will reduce the likelihood of all the individual tokens that represent the string `Hello, World!`, just like the `presence_penalty` does. Default: `[]`
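
A short sketch of the two `logit_bias` entry forms described above (token id `15043` is just the example id from the text, not guaranteed for your model; address and `requests` are assumptions):

```python
import requests

requests.post(
    "http://localhost:8080/completion",  # assumed default server address
    json={
        "prompt": "Greetings:",
        "n_predict": 32,
        "logit_bias": [
            [15043, 1.0],              # nudge the token with id 15043 upward
            ["Hello, World!", -0.5],   # penalize every token that spells this string
        ],
    },
)
```
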
`n_probs`: If greater than 0, the response also contains the probabilities of top N tokens for each generated token given the sampling settings. Note that for temperature < 0 the tokens are sampled greedily but token probabilities are still being calculated via a simple softmax of the logits without considering any other sampler settings. Default: `0`

`min_keep`: If greater than 0, force samplers to return N possible tokens at minimum. Default: `0`

`t_max_predict_ms`: Set a time limit in milliseconds for the prediction (a.k.a. text-generation) phase. The timeout will trigger if the generation takes more than the specified time (measured since the first token was generated) and if a new-line character has already been generated. Useful for FIM applications. Default: `0`, which is disabled.

`image_data`: An array of objects holding base64-encoded image `data` and the `id`s used to reference them in `prompt`. You can specify where an image appears in the prompt as follows: `USER:[img-12]Describe the image in detail.\nASSISTANT:`. In this case, `[img-12]` will be replaced by the embeddings of the image with id `12` in the following `image_data` array: `{..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}`. Use `image_data` only with multimodal models, e.g., LLaVA.
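
A sketch of a multimodal request using `image_data` (assumes a multimodal-capable model is loaded; the address, `requests`, and the local image path are illustrative):

```python
import base64
import requests

with open("example.jpg", "rb") as f:                  # hypothetical local image file
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

r = requests.post(
    "http://localhost:8080/completion",               # assumed default server address
    json={
        "prompt": "USER:[img-12]Describe the image in detail.\nASSISTANT:",
        "n_predict": 128,
        "image_data": [{"data": img_b64, "id": 12}],  # id 12 matches [img-12] above
    },
)
print(r.json()["content"])
```
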
`id_slot`: Assign the completion task to a specific slot. If set to `-1`, the task will be assigned to an idle slot. Default: `-1`

`cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation), enabling this option can cause nondeterministic results. Default: `false`
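
As a sketch of typical `cache_prompt` usage (same assumptions as the earlier snippets), consecutive requests that share a long prefix let the server re-process only the differing suffix:

```python
import requests

URL = "http://localhost:8080/completion"               # assumed default server address
shared_prefix = "Some long document pasted here...\n"  # stand-in for a long shared context

for question in ["Summarize the text.", "List the key dates mentioned."]:
    r = requests.post(URL, json={
        "prompt": shared_prefix + question,
        "n_predict": 128,
        "cache_prompt": True,   # reuse the KV cache built for the shared prefix
    })
    print(r.json()["content"])
```
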
`samplers`: The order the samplers should be applied in. An array of strings representing sampler type names. If a sampler is not set, it will not be used. If a sampler is specified more than once, it will be applied multiple times. Default: `["top_k", "typical_p", "top_p", "min_p", "temperature"]` - these are all the available values.
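
Pulling several of the options above into one request, a hedged end-to-end sketch (parameter values are illustrative, not recommendations; address and `requests` assumed):

```python
import requests

r = requests.post(
    "http://localhost:8080/completion",   # assumed default server address
    json={
        "prompt": "Write a haiku about autumn.",
        "n_predict": 64,
        "temperature": 0.8,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
        "repeat_penalty": 1.1,
        "seed": 42,
        "stop": ["\n\n"],                 # stop strings are not included in the output
        # apply only these samplers, in this order
        "samplers": ["top_k", "min_p", "temperature"],
    },
)
print(r.json()["content"])
```
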
**Response format**

### POST `/tokenize`: Tokenize a given text

*Options:*

`content`: (Required) The text to tokenize.

`add_special`: (Optional) Boolean indicating if special tokens, i.e. `BOS`, should be inserted. Default: `false`

`with_pieces`: (Optional) Boolean indicating whether to return token pieces along with IDs. Default: `false`
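
A minimal `/tokenize` sketch (the response shape shown in the comments is an assumption to verify against the full README; address and `requests` assumed):

```python
import requests

r = requests.post(
    "http://localhost:8080/tokenize",   # assumed default server address
    json={"content": "Hello world", "add_special": True, "with_pieces": True},
)
# Assumed response shape: {"tokens": [{"id": ..., "piece": "..."}, ...]};
# without "with_pieces" the tokens are expected to be plain integer ids.
for tok in r.json()["tokens"]:
    print(tok)
```
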