Commit 4fe884b

Partial commit (#1822)
1 parent eff7ac4 commit 4fe884b

1 file changed

+30
-0
lines changed

docs/inference-providers/register-as-a-provider.md

Lines changed: 30 additions & 0 deletions
@@ -296,6 +296,36 @@ Here is an example of response:
}
```

### Automatic validation

Once a mapping is created through the API, Hugging Face performs periodic automated tests to ensure the mapped endpoint functions correctly.

Each model is tested every 6 hours by making API calls to your service. If the test is successful, the model remains active and continues to be tested periodically. However, if the test fails (e.g., your service returns an HTTP error status during an inference request), the provider will be temporarily removed from the list of active providers.

<div class="flex justify-center">
<picture>
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/automatic-validation-light.png">
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/automatic-validation-dark.png">
</picture>
</div>

A failed mapping undergoes retesting every hour. Additionally, updating the status of a model mapping triggers an immediate validation test.
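
The retest cadence can be summarized in a short sketch. It only encodes the intervals stated in this section (6 hours for a passing mapping, 1 hour after a failure, immediately after a status update). The helper name `next_test_time` and its parameters are hypothetical and say nothing about how Hugging Face actually schedules its tests.

```python
from datetime import datetime, timedelta
from typing import Optional

# Intervals taken from the text; everything else here is an assumption.
HEALTHY_INTERVAL = timedelta(hours=6)
FAILED_INTERVAL = timedelta(hours=1)


def next_test_time(
    last_test: datetime,
    last_test_passed: bool,
    status_updated_at: Optional[datetime] = None,
) -> datetime:
    """Return when a mapping would next be validated (hypothetical helper)."""
    # A status update after the last test triggers an immediate test.
    if status_updated_at is not None and status_updated_at > last_test:
        return status_updated_at
    # Otherwise: every 6 hours while passing, every hour after a failure.
    interval = HEALTHY_INTERVAL if last_test_passed else FAILED_INTERVAL
    return last_test + interval
```

For a provider, the practical consequence is that a fix deployed after a failure may take up to an hour to be picked up, unless the mapping status is updated to force a retest.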

The validation process checks the following:

- The Inference API is reachable, and the HTTP call succeeds.
- The output format is compatible with the Hugging Face JavaScript Inference Client.
- Latency requirements are met:
  - For conversational and text models: under 5 seconds (time to first token in streaming mode).
  - For other tasks: under 30 seconds.
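
To check your own endpoint against the latency limits before registering a mapping, a local measurement along these lines may help. This is a minimal sketch: the two thresholds come from the list above, while the task names in `TEXT_TASKS` and the helper names are assumptions made for illustration, not part of the validation service.

```python
import time
from typing import Callable, Iterable

# Thresholds from the list above, in seconds.
FIRST_TOKEN_LIMIT_S = 5.0
OTHER_TASK_LIMIT_S = 30.0

# Assumption: which task names count as "conversational and text models".
TEXT_TASKS = {"conversational", "text-generation"}


def latency_limit_s(task: str) -> float:
    """Latency limit that would apply to a task (hypothetical helper)."""
    return FIRST_TOKEN_LIMIT_S if task in TEXT_TASKS else OTHER_TASK_LIMIT_S


def time_to_first_chunk(
    stream: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Seconds elapsed until the stream yields its first chunk."""
    start = clock()
    for _ in stream:
        return clock() - start
    raise ValueError("stream ended without producing any chunk")


def meets_latency(task: str, measured_s: float) -> bool:
    """The text says "under", so the comparison is strict."""
    return measured_s < latency_limit_s(task)
```

Pass `time_to_first_chunk` the lazy iterator returned by your streaming client so that the timer starts before the request is sent.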

For large language models (LLMs), additional behavioral tests are conducted:

- Tool calling support.
- Structured output support.

These tests involve sending specific inference requests to the model and verifying that the responses meet the expected format.
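
The exact requests and pass criteria are not spelled out here, so the following is only a sketch of the kind of response check you could run locally. It assumes an OpenAI-compatible chat completion response (`choices[0].message` with `tool_calls` or JSON `content`); the function names `has_tool_call` and `is_structured_output` are hypothetical.

```python
import json


def has_tool_call(response: dict, expected_name: str) -> bool:
    """True if the first choice calls `expected_name` with JSON arguments."""
    try:
        calls = response["choices"][0]["message"].get("tool_calls") or []
        for call in calls:
            function = call["function"]
            if function["name"] == expected_name:
                # Arguments are assumed to be a JSON-encoded string.
                json.loads(function["arguments"])
                return True
    except (KeyError, IndexError, TypeError, AttributeError, ValueError):
        return False
    return False


def is_structured_output(response: dict, required_keys: set) -> bool:
    """True if the message content is a JSON object with the required keys."""
    try:
        content = response["choices"][0]["message"]["content"]
        parsed = json.loads(content)
    except (KeyError, IndexError, TypeError, ValueError):
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)
```

Running checks like these against your endpoint's raw responses can surface format problems before the automated tests do.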

## 4. Billing

For routed requests (see figure below), i.e. when users authenticate via HF, our intent is that
