Merged
70 changes: 62 additions & 8 deletions docs/inference-providers/register-as-a-provider.md
@@ -154,14 +154,46 @@ Create a new mapping item, with the following body (JSON-encoded):
- `hfModel` is the model id on the Hub's side.
- `providerModel` is the model id on your side (can be the same or different).

In the future, we will add support for a new parameter (ping us if it's important to you now):
The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.

### Using a tag-filter to map several HF models to a single inference endpoint

We also support mapping HF models based on their `tags`. Using tag filters, you can automatically map multiple HF models to a single inference endpoint on your side.
For example, any model tagged with both `lora` and `base_model:adapter:black-forest-labs/FLUX.1-dev` can be mapped to your Flux-dev LoRA inference endpoint.


<Tip>

Important: Make sure that the JS client library can handle LoRA weights for your provider. Check out [fal's implementation](https://github.com/huggingface/huggingface.js/blob/904964c9f8cd10ed67114ccb88b9028e89fd6cad/packages/inference/src/providers/fal-ai.ts#L78-L124) for more details.

</Tip>

The API is as follows:

```http
POST /api/partners/{provider}/models
```
Create a new mapping item, with the following body (JSON-encoded):

```json
{
"hfFilter": ["string"]
// ^Power user move: register a "tag" slice of HF in one go.
// Example: tag == "base_model:adapter:black-forest-labs/FLUX.1-dev" for all Flux-dev LoRAs
"type": "tag-filter", // required
"task": "WidgetType", // required
"tags": ["string"], // required: any HF model with all of those tags will be mapped to providerModel
"providerModel": "string", // required: the partner's "model id" i.e. id on your side
"adapterType": "lora", // required: only "lora" is supported at the moment
"status": "live" | "staging" // optional: defaults to "staging". "staging" models are only available to members of the partner's org; switch them to "live" once they are ready
}
```

- `task`, also known as `pipeline_tag` in the HF ecosystem, is the type of model / type of API
(examples: "text-to-image", "text-generation", but you should use "conversational" for chat models)
- `tags` is the set of model tags to match. For example, to match all LoRAs of Flux, you can use: `["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"]`
- `providerModel` is the model ID on your side (can be the same or different from the HF model ID).
- `adapterType` is a literal value that helps client libraries interpret how to call your API. The only supported value at the moment is `"lora"`.

The output of this route is a mapping ID that you can later use to update the mapping's status or delete it.
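
As a rough illustration of the tag-filter route above, the sketch below builds (but does not send) the `POST` request with Python's standard library. The base URL `https://huggingface.co`, the bearer-token header, the provider name `my-provider`, and the helper name `build_tag_filter_request` are all assumptions made for the example and are not specified in this document.

```python
import json
from urllib.request import Request

# Assumption: the Hub API is served from this base URL.
HUB_BASE_URL = "https://huggingface.co"


def build_tag_filter_request(provider, token, task, tags, provider_model, status="staging"):
    """Build (without sending) a request that creates a tag-filter mapping.

    Hypothetical helper: the field names follow the JSON body documented above,
    while the bearer-token authentication scheme is an assumption.
    """
    if not tags:
        raise ValueError("tags must contain at least one tag")
    if status not in ("staging", "live"):
        raise ValueError(f"unsupported status: {status!r}")
    body = {
        "type": "tag-filter",
        "task": task,
        "tags": list(tags),
        "providerModel": provider_model,
        "adapterType": "lora",  # only "lora" is supported at the moment
        "status": status,
    }
    return Request(
        url=f"{HUB_BASE_URL}/api/partners/{provider}/models",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Usage: map every Flux-dev LoRA to a single (hypothetical) endpoint id.
request = build_tag_filter_request(
    provider="my-provider",
    token="hf_xxx",
    task="text-to-image",
    tags=["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"],
    provider_model="flux-dev-lora",
)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the helper stops short of that so the payload can be inspected first.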

#### Authentication

You need to be in the _provider_ Hub organization (e.g. https://huggingface.co/togethercomputer
@@ -178,26 +210,31 @@ huggingface.js/inference call of the corresponding task i.e. the API specs are v
### Delete a mapping item

```http
DELETE /api/partners/{provider}/models?hfModel=namespace/model-name
DELETE /api/partners/{provider}/models/{mapping ID}
@julien-c julien-c Apr 29, 2025

{:mappingId} would be easier to "parse" mentally

```

Where `mapping ID` is the mapping's id obtained upon creation.
You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
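
A minimal sketch of the lookup described above, assuming the list endpoint returns a JSON object keyed by task and then by model (or `tag-filter=...`) key, with each entry carrying an `_id`, as in the example response in this document. The helper names, the provider name, and the sample `_id` values are hypothetical.

```python
def find_mapping_id(mapping, key):
    """Return the `_id` of the entry stored under `key`, searching every task.

    `mapping` is the decoded list response; `key` is either an HF model id or
    a "tag-filter=..." key. Returns None when no entry matches.
    """
    for entries in mapping.values():
        entry = entries.get(key)
        if entry is not None:
            return entry["_id"]
    return None


def delete_path(provider, mapping_id):
    """Build the path of the DELETE route for a given mapping ID."""
    return f"/api/partners/{provider}/models/{mapping_id}"


# Sample data shaped like the documented list response (placeholder ids).
listing = {
    "text-to-image": {
        "black-forest-labs/FLUX.1-Canny-dev": {
            "_id": "id-canny",
            "providerId": "black-forest-labs/FLUX.1-canny",
            "status": "live",
        },
    },
    "conversational": {
        "deepseek-ai/DeepSeek-R1": {
            "_id": "id-r1",
            "providerId": "deepseek-ai/DeepSeek-R1",
            "status": "live",
        },
    },
}

mapping_id = find_mapping_id(listing, "deepseek-ai/DeepSeek-R1")
path = delete_path("my-provider", mapping_id)
```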

### Update a mapping item's status

Call this HTTP PUT endpoint:

```http
PUT /api/partners/{provider}/models/status
PUT /api/partners/{provider}/models/{mapping ID}/status
```

With the following body (JSON-encoded):

```json
{
"hfModel": "namespace/model-name", // The name of the model on HF
"status": "live" | "staging" // The new status, one of "staging" or "live"
}
```

Where `mapping ID` is the mapping's id obtained upon creation.
You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
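
The status update can be sketched as follows. This assumes that, with the mapping ID in the path, the body only needs to carry `status`; the helper name `build_status_update` and the example ids are hypothetical.

```python
import json

VALID_STATUSES = ("staging", "live")


def build_status_update(provider, mapping_id, status):
    """Return (method, path, body) for a status update of one mapping item.

    Assumption: the mapping is identified by the ID in the path, so the JSON
    body contains only the new status.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    path = f"/api/partners/{provider}/models/{mapping_id}/status"
    body = json.dumps({"status": status})
    return "PUT", path, body


# Usage: promote a staging mapping once it is ready.
method, path, body = build_status_update("my-provider", "id-r1", "live")
```

Validating the status locally keeps an obviously invalid value from reaching the API.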

### List the whole mapping

```http
@@ -217,26 +254,41 @@ Here is an example of response:
{
"text-to-image": {
"black-forest-labs/FLUX.1-Canny-dev": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"providerId": "black-forest-labs/FLUX.1-canny",
"status": "live"
},
"black-forest-labs/FLUX.1-Depth-dev": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"providerId": "black-forest-labs/FLUX.1-depth",
"status": "live"
},
"tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"status": "live",
"providerId": "sdxl-lora-mutualized",
"adapterType": "lora",
"tags": [
"base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0",
"lora"
]
}
},
"conversational": {
"deepseek-ai/DeepSeek-R1": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"providerId": "deepseek-ai/DeepSeek-R1",
"status": "live"
}
},
"text-generation": {
"meta-llama/Llama-2-70b-hf": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"providerId": "meta-llama/Llama-2-70b-hf",
"status": "live"
},
"mistralai/Mixtral-8x7B-v0.1": {
"_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
"providerId": "mistralai/Mixtral-8x7B-v0.1",
"status": "live"
}
@@ -264,9 +316,11 @@ provide the cost for each request via an HTTP API you host on your end.
We ask that you expose an API that supports an HTTP POST request.
The body of the request is a JSON-encoded object containing a list of request IDs for which we
request the cost.
The authentication system should be the same as your Inference service; for example, a bearer token.

```http
POST {your URL here}
Authorization: {authentication info - eg "Bearer token"}
Content-Type: application/json

{
@@ -297,7 +351,7 @@ Content-Type: application/json

### Price Unit

We require the price to be an **integer** number of **nano-USDs** (10^-9 USD).
We require the price to be a **non-negative integer** number of **nano-USDs** (10^-9 USD).
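
To make the unit concrete, here is one possible way to convert a USD amount into nano-USDs. It uses `Decimal` to avoid floating-point rounding; the helper name `usd_to_nano_usd` and the choice to reject sub-nano-USD amounts (rather than round them) are assumptions of this sketch.

```python
from decimal import Decimal

NANO_USD_PER_USD = Decimal(10) ** 9  # 1 USD = 10^9 nano-USD


def usd_to_nano_usd(amount_usd: str) -> int:
    """Convert a decimal USD string to a non-negative integer of nano-USDs.

    Raises ValueError for negative amounts and for amounts that cannot be
    represented exactly as a whole number of nano-USDs.
    """
    nano = Decimal(amount_usd) * NANO_USD_PER_USD
    if nano < 0:
        raise ValueError("price must be non-negative")
    if nano != nano.to_integral_value():
        raise ValueError("price has sub-nano-USD precision")
    return int(nano)


price = usd_to_nano_usd("0.0025")  # 2_500_000 nano-USD
```

Taking the amount as a string keeps values such as `0.1` exact, which a binary `float` would not.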

### How to define the request ID
