[Inference Providers] Update partners API documentation #1717
Merged
@@ -154,14 +154,46 @@
Create a new mapping item, with the following body (JSON-encoded):

- `hfModel` is the model id on the Hub's side.
- `providerModel` is the model id on your side (can be the same or different).

The output of this route is a mapping ID that you can use later to update the mapping's status or to delete it.

### Using a tag-filter to map several HF models to a single inference endpoint

We also support mapping HF models based on their `tags`. This is useful, for example, to automatically map LoRA adapters to a single Inference Endpoint on your side.
<Tip>

Important note: the client library (JavaScript) must be able to handle LoRA weights for your provider. Check out [fal's implementation](https://github.com/huggingface/huggingface.js/blob/904964c9f8cd10ed67114ccb88b9028e89fd6cad/packages/inference/src/providers/fal-ai.ts#L78-L124) for more details.

</Tip>
The API is as follows:

```http
POST /api/partners/{provider}/models
```

Create a new mapping item, with the following body (JSON-encoded):

```json
{
  "hfFilter": ["string"], // register a whole "tag" slice of HF models in one go, e.g. "base_model:adapter:black-forest-labs/FLUX.1-dev" for all FLUX.1-dev LoRAs
  "type": "tag-filter", // required
  "task": "WidgetType", // required
  "tags": ["string"], // required: any HF model with all of those tags will be mapped to providerModel
  "providerModel": "string", // required: the partner's "model id", i.e. the id on your side
  "adapterType": "lora", // required: only "lora" is supported at the moment
  "status": "live" | "staging" // optional: defaults to "staging". "staging" models are only available to members of the partner's org; switch them to "live" when they are ready to go live
}
```
- `task`, also known as `pipeline_tag` in the HF ecosystem, is the type of model / type of API (examples: "text-to-image", "text-generation"; use "conversational" for chat models)
- `tags` is the set of model tags to match. For example, to match all LoRAs of FLUX.1-dev, you can use: `["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"]`
- `providerModel` is the model id on your side (can be the same or different).
- `adapterType` is a literal value designed to help client libraries interpret how to request your API. The only supported value at the moment is `"lora"`.
The output of this route is a mapping ID that you can use later to update the mapping's status or to delete it.
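To make the shape of this call concrete, here is a minimal TypeScript sketch. The base URL, the `your-provider` slug, and the token value are placeholder assumptions, not part of the spec above; only the path and body fields come from this documentation.

```typescript
// Hedged sketch: assembling the tag-filter registration request.
// Assumptions: the Hub base URL and the placeholder provider/token values.
interface TagFilterMappingBody {
  type: "tag-filter";
  task: string;                // pipeline_tag, e.g. "text-to-image"
  tags: string[];              // a model must carry ALL of these tags to match
  providerModel: string;       // model id on the provider's side
  adapterType: "lora";         // only "lora" is supported at the moment
  status?: "live" | "staging"; // defaults to "staging"
}

function buildCreateMappingRequest(
  provider: string,
  token: string,
  body: TagFilterMappingBody
): { url: string; method: string; headers: Record<string, string>; payload: string } {
  return {
    url: `https://huggingface.co/api/partners/${provider}/models`, // assumed base URL
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    payload: JSON.stringify(body),
  };
}

const req = buildCreateMappingRequest("your-provider", "hf_xxx", {
  type: "tag-filter",
  task: "text-to-image",
  tags: ["lora", "base_model:adapter:black-forest-labs/FLUX.1-dev"],
  providerModel: "flux-dev-lora-mutualized",
  adapterType: "lora",
});
```

The resulting object can be passed straight to `fetch(req.url, { method: req.method, headers: req.headers, body: req.payload })`.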
#### Authentication

You need to be in the _provider_ Hub organization (e.g. https://huggingface.co/togethercomputer
@@ -178,26 +210,31 @@
huggingface.js/inference call of the corresponding task i.e. the API specs are v
### Delete a mapping item

```http
DELETE /api/partners/{provider}/models/{mapping ID}
```

Where `mapping ID` is the mapping's id obtained upon creation.
You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
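As a small sketch (the base URL and the mapping id here are placeholder assumptions; only the path shape comes from the docs above):

```typescript
// Hedged sketch: building the DELETE URL for a mapping item.
// The base URL is an assumption; the path shape comes from the docs above.
function buildDeleteMappingUrl(provider: string, mappingId: string): string {
  return `https://huggingface.co/api/partners/${provider}/models/${encodeURIComponent(mappingId)}`;
}

const deleteUrl = buildDeleteMappingUrl("your-provider", "665544332211aabbccddeeff");
```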
### Update a mapping item's status

Call this HTTP PUT endpoint:

```http
PUT /api/partners/{provider}/models/{mapping ID}/status
```

With the following body (JSON-encoded):

```json
{
  "status": "live" | "staging" // The new status, one of "staging" or "live"
}
```

Where `mapping ID` is the mapping's id obtained upon creation.
You can also retrieve it from the [list API endpoint](#list-the-whole-mapping).
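A minimal TypeScript sketch of assembling this call (the base URL, provider slug, and mapping id are placeholder assumptions):

```typescript
// Hedged sketch: assembling the status-update call.
// Base URL, provider slug, and mapping id are placeholder assumptions.
function buildStatusUpdateRequest(
  provider: string,
  mappingId: string,
  status: "live" | "staging"
): { url: string; method: string; payload: string } {
  return {
    url: `https://huggingface.co/api/partners/${provider}/models/${mappingId}/status`,
    method: "PUT",
    payload: JSON.stringify({ status }),
  };
}

const putReq = buildStatusUpdateRequest("your-provider", "665544332211aabbccddeeff", "live");
```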
### List the whole mapping

```http
```

@@ -217,26 +254,41 @@
Here is an example of response:
```json
{
  "text-to-image": {
    "black-forest-labs/FLUX.1-Canny-dev": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "black-forest-labs/FLUX.1-canny",
      "status": "live"
    },
    "black-forest-labs/FLUX.1-Depth-dev": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "black-forest-labs/FLUX.1-depth",
      "status": "live"
    },
    "tag-filter=base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0,lora": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "status": "live",
      "providerId": "sdxl-lora-mutualized",
      "adapterType": "lora",
      "tags": [
        "base_model:adapter:stabilityai/stable-diffusion-xl-base-1.0",
        "lora"
      ]
    }
  },
  "conversational": {
    "deepseek-ai/DeepSeek-R1": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "deepseek-ai/DeepSeek-R1",
      "status": "live"
    }
  },
  "text-generation": {
    "meta-llama/Llama-2-70b-hf": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "meta-llama/Llama-2-70b-hf",
      "status": "live"
    },
    "mistralai/Mixtral-8x7B-v0.1": {
      "_id": "xxxxxxxxxxxxxxxxxxxxxxxx",
      "providerId": "mistralai/Mixtral-8x7B-v0.1",
      "status": "live"
    }
  }
}
```
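A TypeScript sketch of consuming this response, for example to flatten it into rows for display. The `MappingEntry` and `flattenMapping` names are our own; only the field names (`_id`, `providerId`, `status`, `adapterType`, `tags`) mirror the example above.

```typescript
// Hedged sketch: flattening the mapping response for display.
interface MappingEntry {
  _id: string;
  providerId: string;
  status: "live" | "staging";
  adapterType?: "lora";
  tags?: string[];
}
type MappingResponse = Record<string, Record<string, MappingEntry>>;

interface MappingRow { task: string; hfModelOrFilter: string; providerId: string; status: string }

function flattenMapping(resp: MappingResponse): MappingRow[] {
  const rows: MappingRow[] = [];
  for (const [task, entries] of Object.entries(resp)) {
    // The key is either an HF model id or a "tag-filter=..." expression.
    for (const [key, entry] of Object.entries(entries)) {
      rows.push({ task, hfModelOrFilter: key, providerId: entry.providerId, status: entry.status });
    }
  }
  return rows;
}

const sample: MappingResponse = {
  conversational: {
    "deepseek-ai/DeepSeek-R1": {
      _id: "xxxxxxxxxxxxxxxxxxxxxxxx",
      providerId: "deepseek-ai/DeepSeek-R1",
      status: "live",
    },
  },
};
const rows = flattenMapping(sample);
```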
@@ -264,9 +316,11 @@
provide the cost for each request via an HTTP API you host on your end.

We ask that you expose an API that supports an HTTP POST request.
The body of the request is a JSON-encoded object containing a list of request IDs for which we request the cost.
The authentication system should be the same as your Inference service; for example, a bearer token.

```http
POST {your URL here}
Authorization: {authentication info - e.g. "Bearer token"}
Content-Type: application/json

{
```
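A TypeScript sketch of the core logic such a partner-side endpoint might implement. The response field names used here (`requests`, `requestId`, `costNanoUsd`) are illustrative assumptions, not the documented schema; consult the full spec for the actual shapes.

```typescript
// Hedged sketch of the partner-side cost endpoint's core logic.
// Field names are illustrative assumptions, not the documented schema.
type CostLookup = (requestId: string) => number | undefined; // cost in nano-USD

function buildCostResponse(requestIds: string[], lookup: CostLookup) {
  return {
    requests: requestIds.map((requestId) => ({
      requestId,
      // Assumption for the sketch: report 0 for unknown request ids.
      costNanoUsd: lookup(requestId) ?? 0,
    })),
  };
}

const costs = new Map<string, number>([["req-1", 12_500_000]]); // 0.0125 USD
const costResp = buildCostResponse(["req-1", "req-2"], (id) => costs.get(id));
```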
@@ -297,7 +351,7 @@
Content-Type: application/json

### Price Unit

We require the price to be a **non-negative integer** number of **nano-USDs** (10^-9 USD).
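For example, converting a USD price into this unit (the `usdToNanoUsd` helper name is ours, not part of the API):

```typescript
// Sketch: converting a USD price into the required integer nano-USD unit.
// Math.round absorbs binary floating-point drift (e.g. 0.0125 * 1e9).
function usdToNanoUsd(usd: number): number {
  const nano = Math.round(usd * 1e9);
  if (nano < 0) {
    throw new Error("price must be a non-negative number of nano-USD");
  }
  return nano;
}

const perImageCost = usdToNanoUsd(0.0125); // 12_500_000 nano-USD
```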
### How to define the request ID