diff --git a/docs/inference-providers/register-as-a-provider.md b/docs/inference-providers/register-as-a-provider.md index fb2e68f2f..a8eef2349 100644 --- a/docs/inference-providers/register-as-a-provider.md +++ b/docs/inference-providers/register-as-a-provider.md @@ -154,14 +154,46 @@ Create a new mapping item, with the following body (JSON-encoded): - `hfModel` is the model id on the Hub's side. - `providerModel` is the model id on your side (can be the same or different). -In the future, we will add support for a new parameter (ping us if it's important to you now): +The output of this route is a mapping ID that you can later use to update the mapping's status or delete it. + +### Using a tag-filter to map several HF models to a single inference endpoint + +We also support mapping HF models based on their `tags`. Using tag filters, you can automatically map multiple HF models to a single inference endpoint on your side. +For example, any model tagged with both `lora` and `base_model:adapter:black-forest-labs/FLUX.1-dev` can be mapped to your Flux-dev LoRA inference endpoint. + + + + +Important: Make sure that the JS client library can handle LoRA weights for your provider. Check out [fal's implementation](https://github.com/huggingface/huggingface.js/blob/904964c9f8cd10ed67114ccb88b9028e89fd6cad/packages/inference/src/providers/fal-ai.ts#L78-L124) for more details. + + + +The API is as follows: + +```http +POST /api/partners/{provider}/models +``` +Create a new mapping item, with the following body (JSON-encoded): + ```json { - "hfFilter": ["string"] - // ^Power user move: register a "tag" slice of HF in one go. - // Example: tag == "base_model:adapter:black-forest-labs/FLUX.1-dev" for all Flux-dev LoRAs + "type": "tag-filter", // required + "task": "WidgetType", // required + "tags": ["string"], // required: any HF model with all of those tags will be mapped to providerModel + "providerModel": "string", // required: the partner's "model id" i.e. id on your side + "adapterType": "lora", // required: only "lora" is supported at the moment + "status": "live" | "staging" // Optional: defaults to "staging". "staging" models are only available to members of the partner's org, then you switch them to "live" when they're ready to go live } ``` + +- `task`, also known as `pipeline_tag` in the HF ecosystem, is the type of model / type of API +(examples: "text-to-image", "text-generation", but you should use "conversational" for chat models) +- `tags` is the set of model tags to match. For example, to match all LoRAs of Flux, you can use: `["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"]` +- `providerModel` is the model ID on your side (can be the same or different from the HF model ID). +- `adapterType` is a literal value that helps client libraries interpret how to call your API. The only supported value at the moment is `"lora"`. + +The output of this route is a mapping ID that you can later use to update the mapping's status or delete it. + #### Authentication You need to be in the _provider_ Hub organization (e.g. https://huggingface.co/togethercomputer @@ -178,26 +210,31 @@ huggingface.js/inference call of the corresponding task i.e. the API specs are v ### Delete a mapping item ```http -DELETE /api/partners/{provider}/models?hfModel=namespace/model-name +DELETE /api/partners/{provider}/models/{mapping ID} ``` +Where `mapping ID` is the mapping's id obtained upon creation. +You can also retrieve it from the [list API endpoint](#list-the-whole-mapping). + ### Update a mapping item's status Call this HTTP PUT endpoint: ```http -PUT /api/partners/{provider}/models/status +PUT /api/partners/{provider}/models/{mapping ID}/status ``` With the following body (JSON-encoded): ```json { - "hfModel": "namespace/model-name", // The name of the model on HF "status": "live" | "staging" // The new status, one of "staging" or "live" } ``` +Where `mapping ID` is the mapping's id obtained upon creation. +You can also retrieve it from the [list API endpoint](#list-the-whole-mapping). + ### List the whole mapping ```http @@ -217,26 +254,41 @@ Here is an example of response: { "text-to-image": { "black-forest-labs/FLUX.1-Canny-dev": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", "providerId": "black-forest-labs/FLUX.1-canny", "status": "live" }, "black-forest-labs/FLUX.1-Depth-dev": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", "providerId": "black-forest-labs/FLUX.1-depth", "status": "live" + }, + "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", + "status": "live", + "providerId": "sdxl-lora-mutualized", + "adapterType": "lora", + "tags": [ + "base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0", + "lora" + ] } }, "conversational": { "deepseek-ai/DeepSeek-R1": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", "providerId": "deepseek-ai/DeepSeek-R1", "status": "live" } }, "text-generation": { "meta-llama/Llama-2-70b-hf": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", "providerId": "meta-llama/Llama-2-70b-hf", "status": "live" }, "mistralai/Mixtral-8x7B-v0.1": { + "_id": "xxxxxxxxxxxxxxxxxxxxxxxx", "providerId": "mistralai/Mixtral-8x7B-v0.1", "status": "live" } @@ -264,9 +316,11 @@ provide the cost for each request via an HTTP API you host on your end. We ask that you expose an API that supports a HTTP POST request. The body of the request is a JSON-encoded object containing a list of request IDs for which we request the cost. +The authentication system should be the same as your Inference service; for example, a bearer token. ```http POST {your URL here} +Authorization: {authentication info - eg "Bearer token"} Content-Type: application/json { @@ -297,7 +351,7 @@ Content-Type: application/json ### Price Unit -We require the price to be an **integer** number of **nano-USDs** (10^-9 USD). +We require the price to be a **non-negative integer** number of **nano-USDs** (10^-9 USD). ### How to define the request ID