
Commit fb669fb

Undocument x-wait-for-model and x-use-cache headers (#1673)

* Undocument x-wait-for-model and x-use-cache headers
* feedback

1 parent 924a726 · commit fb669fb

37 files changed: +127 -223 lines

docs/inference-providers/tasks/audio-classification.md

Lines changed: 5 additions & 10 deletions

@@ -46,6 +46,11 @@ No snippet available for this task.
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. |
@@ -54,16 +59,6 @@ No snippet available for this task.
 | **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |
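The same pattern repeats on every task page in this commit: the standalone headers section documenting `authorization`, `x-use-cache`, and `x-wait-for-model` is deleted, and a trimmed table documenting only `authorization` is added under `#### Request`. For readers wondering what these headers look like in practice, here is a minimal sketch of a request against the serverless Inference API using `requests`; the endpoint, model id, and token are placeholders, and, going by the commit title, `x-use-cache` and `x-wait-for-model` are merely undocumented rather than removed, so they are shown with their previously documented defaults noted:

```python
import requests

# Placeholder endpoint and token -- substitute a real model id and hf_ token.
API_URL = "https://api-inference.huggingface.co/models/<model-id>"
headers = {
    # The only header still documented on these task pages:
    "Authorization": "Bearer hf_****",
    # Undocumented by this commit, but previously documented options:
    "x-use-cache": "false",      # bypass the shared result cache (default: true)
    "x-wait-for-model": "true",  # wait for a cold model instead of a 503 (default: false)
}

# Audio classification accepts raw audio bytes when no `parameters` are sent.
with open("sample.flac", "rb") as f:
    response = requests.post(API_URL, headers=headers, data=f.read())
print(response.json())
```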

docs/inference-providers/tasks/automatic-speech-recognition.md

Lines changed: 5 additions & 10 deletions

@@ -48,6 +48,11 @@ Explore all available models and find the one that suits you best [here](https:/
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. |
@@ -72,16 +77,6 @@ Explore all available models and find the one that suits you best [here](https:/
 | **                use_cache** | _boolean_ | Whether the model should use the past last key/values attentions to speed up decoding |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |
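As the payload table above notes, raw bytes only work when no `parameters` are sent; otherwise the audio must travel base64-encoded inside a JSON body. A hedged sketch of that second shape, with endpoint and token again as placeholders:

```python
import base64
import requests

API_URL = "https://api-inference.huggingface.co/models/<asr-model-id>"  # placeholder
headers = {"Authorization": "Bearer hf_****"}  # placeholder token

# With a JSON payload, `inputs` carries the audio as a base64-encoded string;
# task options (such as the generation parameters documented above) would sit
# alongside it under a `parameters` key.
with open("sample.flac", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(API_URL, headers=headers, json={"inputs": audio_b64})
print(response.json())  # e.g. {"text": "..."}
```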

docs/inference-providers/tasks/chat-completion.md

Lines changed: 5 additions & 10 deletions

@@ -79,6 +79,11 @@ conversational />
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **frequency_penalty** | _number_ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
@@ -140,16 +145,6 @@ conversational />
 | **top_p** | _number_ | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 Output type depends on the `stream` input parameter.
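The chat-completion page documents OpenAI-style sampling knobs such as `frequency_penalty` and `top_p`. A minimal sketch of how those parameters surface through `huggingface_hub`'s `InferenceClient`; the model id and token are placeholders:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_****")  # placeholder token

response = client.chat_completion(
    model="<chat-model-id>",  # placeholder
    messages=[{"role": "user", "content": "Summarize nucleus sampling in one sentence."}],
    top_p=0.1,              # nucleus sampling: keep only the top 10% probability mass
    frequency_penalty=0.5,  # penalize tokens already frequent in the output
    max_tokens=64,
)
print(response.choices[0].message.content)
```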

docs/inference-providers/tasks/feature-extraction.md

Lines changed: 5 additions & 10 deletions

@@ -47,6 +47,11 @@ Explore all available models and find the one that suits you best [here](https:/
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _unknown_ | One of the following: |
@@ -58,16 +63,6 @@ Explore all available models and find the one that suits you best [here](https:/
 | **truncation_direction** | _enum_ | Possible values: Left, Right. |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |

docs/inference-providers/tasks/fill-mask.md

Lines changed: 5 additions & 10 deletions

@@ -39,6 +39,11 @@ No snippet available for this task.
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The text with masked tokens |
@@ -47,16 +52,6 @@ No snippet available for this task.
 | **        targets** | _string[]_ | When passed, the model will limit the scores to the passed targets instead of looking up in the whole vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first resulting token will be used (with a warning, and that might be slower). |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |

docs/inference-providers/tasks/image-classification.md

Lines changed: 5 additions & 10 deletions

@@ -44,6 +44,11 @@ Explore all available models and find the one that suits you best [here](https:/
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. |
@@ -52,16 +57,6 @@ Explore all available models and find the one that suits you best [here](https:/
 | **        top_k** | _integer_ | When specified, limits the output to the top K most probable classes. |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |

docs/inference-providers/tasks/image-segmentation.md

Lines changed: 5 additions & 10 deletions

@@ -43,6 +43,11 @@ Explore all available models and find the one that suits you best [here](https:/
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. |
@@ -53,16 +58,6 @@ Explore all available models and find the one that suits you best [here](https:/
 | **        threshold** | _number_ | Probability threshold to filter out predicted masks. |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |

docs/inference-providers/tasks/image-to-image.md

Lines changed: 5 additions & 10 deletions

@@ -46,6 +46,11 @@ Explore all available models and find the one that suits you best [here](https:/
 
 #### Request
 
+| Headers | | |
+| :--- | :--- | :--- |
+| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with "Inference Providers" permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). |
+
+
 | Payload | | |
 | :--- | :--- | :--- |
 | **inputs*** | _string_ | The input image data as a base64-encoded string. If no `parameters` are provided, you can also provide the image data as a raw bytes payload. |
@@ -59,16 +64,6 @@ Explore all available models and find the one that suits you best [here](https:/
 | **                height*** | _integer_ | |
 
 
-Some options can be configured by passing headers to the Inference API. Here are the available headers:
-
-| Headers | | |
-| :--- | :--- | :--- |
-| **authorization** | _string_ | Authentication header in the form `'Bearer: hf_****'` when `hf_****` is a personal user access token with Inference API permission. You can generate one from [your settings page](https://huggingface.co/settings/tokens). |
-| **x-use-cache** | _boolean, default to `true`_ | There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as they are deterministic (meaning the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query. Read more about caching [here](../parameters#caching]). |
-| **x-wait-for-model** | _boolean, default to `false`_ | If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it will limit hanging in your application to known places. Read more about model availability [here](../overview#eligibility]). |
-
-For more information about Inference API headers, check out the parameters [guide](../parameters).
-
 #### Response
 
 | Body | |
