Merge branch 'main' into v0.30-release

Wauplin · Wauplin · commit 276c42e7519f · 2025-03-28T15:07:58.000+01:00
diff --git a/.github/workflows/python-prerelease.yml b/.github/workflows/python-prerelease.yml
@@ -3,8 +3,12 @@ name: Python prerelease
 on:
   push:
     tags:
-      - "v*.rc*"
-
+      - v*rc*
+  workflow_dispatch:
+    inputs:
+      tag:
+        description: "Tag to test (e.g., 0.30.0rc2)"
+        required: true
 jobs:
   trigger_rc_testing:
     runs-on: ubuntu-latest
@@ -15,9 +19,14 @@ jobs:
         target-repo: ["transformers", "datasets", "diffusers"]
 
     steps:
-      - name: Extract version from tag
+      - name: Determine version from tag
         id: get-version
-        run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
+        run: |
+          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+            echo "VERSION=${{ inputs.tag }}" >> $GITHUB_OUTPUT
+          else
+            echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
+          fi
 
       - name: Checkout target repo
         uses: actions/checkout@v4
@@ -80,4 +89,4 @@ jobs:
         run: |
           VERSION=${{ steps.get-version.outputs.VERSION }}
           echo "https://github.com/huggingface/${{ matrix.target-repo }}/actions"
-          echo "https://github.com/huggingface/${{ matrix.target-repo }}/compare/main...ci-test-huggingface-hub-${VERSION}"
+          echo "https://github.com/huggingface/${{ matrix.target-repo }}/compare/main...ci-test-huggingface-hub-${VERSION}"
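The tag filter and the "Determine version from tag" step can be modeled in Python as a minimal sketch. `tag_triggers_workflow` and `determine_version` are hypothetical helpers, not part of the workflow. `fnmatch.fnmatchcase` only approximates GitHub's filter-pattern syntax: its `*` also matches `/`, which GitHub's does not. The sketch shows why the old `v*.rc*` pattern would not match a tag such as `v0.30.0rc2` (there is no `.` right before `rc`). It also shows that a push-triggered run keeps the leading `v` in `VERSION`, while the input description's example (`0.30.0rc2`) omits it. Whichever form the later steps expect (not shown in these hunks) should be used consistently.

```python
from fnmatch import fnmatchcase


def tag_triggers_workflow(tag: str, pattern: str = "v*rc*") -> bool:
    """Approximate the `on.push.tags` filter; fnmatch's `*` also matches `/`, GitHub's does not."""
    return fnmatchcase(tag, pattern)


def determine_version(event_name: str, github_ref: str, input_tag: str = "") -> str:
    """Mirror the shell logic of the `get-version` step (hypothetical helper)."""
    if event_name == "workflow_dispatch":
        # Manual runs: use the `tag` input verbatim.
        return input_tag
    # Tag pushes: like ${GITHUB_REF#refs/tags/}, strip the prefix only if present.
    prefix = "refs/tags/"
    return github_ref[len(prefix):] if github_ref.startswith(prefix) else github_ref


print(tag_triggers_workflow("v0.30.0rc2"))  # True with the new pattern
print(tag_triggers_workflow("v0.30.0rc2", "v*.rc*"))  # False with the old pattern
print(determine_version("push", "refs/tags/v0.30.0rc2"))  # v0.30.0rc2
print(determine_version("workflow_dispatch", "refs/heads/main", "0.30.0rc2"))  # 0.30.0rc2
```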
diff --git a/docs/source/en/guides/download.md b/docs/source/en/guides/download.md
@@ -176,7 +176,7 @@ There are two options to speed up downloads. Both involve installing a Python pa
 Take advantage of faster downloads through `hf_xet`, the Python binding to the [`xet-core`](https://github.com/huggingface/xet-core) library that enables 
 chunk-based deduplication for faster downloads and uploads. `hf_xet` integrates seamlessly with `huggingface_hub`, but uses the Rust `xet-core` library and Xet storage instead of LFS.
 
-`hf_xet` uses the Xet storage system, which breaks files down into immutable chunks, storing collections of these chunks (called blocks or xorbs) remotely and retrieving them to reassemble the file when requested. When downloading, after confirming the user is authorized to access the files, `hf_xet` will query the Xet content-addressable service (CAS) with the LFS SHA256 hash for this file to receive the reconstruction metadata (ranges within xorbs) to assemble these files, along with presigned URLs to download the xorbs directly. Then `hf_xet` will efficiently download the xorb ranges necessary and will write out the files on disk. `hf_xet` uses a local disk cache to only download chunks once, learn more in the [Chunk-based caching(Xet)](./manage-cache.md#chunk-based-caching-xet) section.
+`hf_xet` uses the Xet storage system, which breaks files down into immutable chunks, storing collections of these chunks (called blocks or xorbs) remotely and retrieving them to reassemble the file when requested. When downloading, after confirming the user is authorized to access the files, `hf_xet` queries the Xet content-addressable service (CAS) with the file's LFS SHA256 hash to receive the reconstruction metadata (ranges within xorbs) needed to assemble the file, along with presigned URLs to download the xorbs directly. `hf_xet` then efficiently downloads the necessary xorb ranges and writes the files to disk. `hf_xet` uses a local disk cache so that each chunk is downloaded only once; learn more in the [Chunk-based caching (Xet)](./manage-cache#chunk-based-caching-xet) section.
 
 To enable it, specify the `hf_xet` package when installing `huggingface_hub`:
 
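The following sketch gives a rough mental model of the download path described above. It reassembles a file from reconstruction metadata (an ordered list of ranges within xorbs) and caches each downloaded range so it is fetched only once. Everything here is hypothetical: `Term`, `RangeCache`, and `reconstruct` are illustrative names, not `hf_xet` APIs. The in-memory `fetch` callback stands in for downloads through presigned URLs, and ranges are treated as plain byte ranges. Real Xet reconstruction also handles chunk boundaries, compression, and an on-disk cache, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

RangeKey = Tuple[str, int, int]


@dataclass(frozen=True)
class Term:
    """One hypothetical reconstruction entry: bytes [start, end) of a xorb."""

    xorb_id: str
    start: int  # inclusive
    end: int  # exclusive


class RangeCache:
    """Stand-in for the local chunk cache: each xorb range is fetched at most once."""

    def __init__(self, fetch: Callable[[str, int, int], bytes]) -> None:
        self._fetch = fetch  # a real client would download via a presigned URL
        self._store: Dict[RangeKey, bytes] = {}
        self.downloads = 0

    def get(self, term: Term) -> bytes:
        key = (term.xorb_id, term.start, term.end)
        if key not in self._store:
            self._store[key] = self._fetch(*key)
            self.downloads += 1
        return self._store[key]


def reconstruct(terms: List[Term], cache: RangeCache) -> bytes:
    """Concatenate the ranges listed in the reconstruction metadata, in order."""
    return b"".join(cache.get(term) for term in terms)


# Fake remote storage: two xorbs addressed by id.
remote_xorbs = {"xorb-a": b"hello world", "xorb-b": b"!?"}
cache = RangeCache(lambda xorb_id, start, end: remote_xorbs[xorb_id][start:end])

file_1 = [Term("xorb-a", 0, 5), Term("xorb-b", 0, 1)]
file_2 = [Term("xorb-a", 0, 5), Term("xorb-a", 5, 11)]  # shares its first range with file_1

print(reconstruct(file_1, cache))  # b'hello!'
print(reconstruct(file_2, cache))  # b'hello world'
print(cache.downloads)  # 3: the shared range was downloaded only once
```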
diff --git a/docs/source/en/guides/inference.md b/docs/source/en/guides/inference.md
@@ -320,6 +320,23 @@ For more information about the `asyncio` module, please refer to the [official d
 
 In the above section, we saw the main aspects of [`InferenceClient`]. Let's dive into some more advanced tips.
 
+### Billing
+
+As an HF user, you get monthly credits to run inference through various providers on the Hub. The amount of credits depends on your account type (Free, PRO, or Enterprise Hub). Each inference request is charged according to the provider's pricing table. By default, requests are billed to your personal account. However, you can have them charged to an organization you are a member of by passing `bill_to="<your_org_name>"` to `InferenceClient`. For this to work, the organization must be subscribed to Enterprise Hub. For more details about billing, check out [this guide](https://huggingface.co/docs/api-inference/pricing#features-using-inference-providers).
+
+```py
+>>> from huggingface_hub import InferenceClient
+>>> client = InferenceClient(provider="fal-ai", bill_to="my-org")  # replace with your organization's name
+>>> image = client.text_to_image(
+...     "A majestic lion in a fantasy forest",
+...     model="black-forest-labs/FLUX.1-schnell",
+... )
+>>> image.save("lion.png")
+```
+
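+Note that you cannot charge another user or an organization you are not a member of. To share credits with someone else, create an organization you both belong to. Also note that `bill_to` only applies to Hugging Face billing: if you pass a provider's own API key instead of a Hugging Face token, the provider bills you directly and `bill_to` is ignored.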
+
+
 ### Timeout
 
 Inference calls can take a significant amount of time. By default, [`InferenceClient`] will wait "indefinitely" until the inference complete. If you want more control in your workflow, you can set the `timeout` parameter to a specific value in seconds. If the timeout delay expires, an [`InferenceTimeoutError`] is raised, which you can catch in your code:
diff --git a/src/huggingface_hub/constants.py b/src/huggingface_hub/constants.py
@@ -73,6 +73,7 @@ def _as_int(value: Optional[str]) -> Optional[int]:
 HUGGINGFACE_HEADER_X_REPO_COMMIT = "X-Repo-Commit"
 HUGGINGFACE_HEADER_X_LINKED_ETAG = "X-Linked-Etag"
 HUGGINGFACE_HEADER_X_LINKED_SIZE = "X-Linked-Size"
+HUGGINGFACE_HEADER_X_BILL_TO = "X-HF-Bill-To"
 
 INFERENCE_ENDPOINT = os.environ.get("HF_INFERENCE_ENDPOINT", "https://api-inference.huggingface.co")
 
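Since `bill_to` is ultimately sent as the `X-HF-Bill-To` header, a caller using a plain HTTP client could in principle set it directly. The sketch below builds (but does not send) such a request with the standard library. `build_billed_request`, the model id `my-user/my-model`, the payload, and the token are hypothetical placeholders. The sketch assumes the server honors the header the same way when it comes from another HTTP client; `InferenceClient(bill_to=...)` remains the documented route.

```python
import json
import urllib.request

# Values copied from constants.py in the diff above.
HUGGINGFACE_HEADER_X_BILL_TO = "X-HF-Bill-To"
INFERENCE_ENDPOINT = "https://api-inference.huggingface.co"


def build_billed_request(model_id: str, payload: dict, token: str, bill_to: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for billing to `bill_to` (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{INFERENCE_ENDPOINT}/models/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            HUGGINGFACE_HEADER_X_BILL_TO: bill_to,
        },
        method="POST",
    )


req = build_billed_request("my-user/my-model", {"inputs": "Hello"}, token="hf_xxx", bill_to="my-org")
# urllib stores header names via str.capitalize(), so look them up in that form.
print(req.get_header("X-hf-bill-to"))  # my-org
```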
diff --git a/src/huggingface_hub/inference/_client.py b/src/huggingface_hub/inference/_client.py
@@ -146,6 +146,9 @@ class InferenceClient:
         headers (`Dict[str, str]`, `optional`):
             Additional headers to send to the server. By default only the authorization and user-agent headers are sent.
             Values in this dictionary will override the default values.
+        bill_to (`str`, `optional`):
+            The billing account to use for the requests. By default, requests are billed to the user's account.
+            Requests can only be billed to an organization that the user is a member of and that has subscribed to Enterprise Hub.
         cookies (`Dict[str, str]`, `optional`):
             Additional cookies to send to the server.
         proxies (`Any`, `optional`):
@@ -168,6 +171,7 @@ def __init__(
         headers: Optional[Dict[str, str]] = None,
         cookies: Optional[Dict[str, str]] = None,
         proxies: Optional[Any] = None,
+        bill_to: Optional[str] = None,
         # OpenAI compatibility
         base_url: Optional[str] = None,
         api_key: Optional[str] = None,
@@ -203,7 +207,25 @@ def __init__(
 
         self.model: Optional[str] = base_url or model
         self.token: Optional[str] = token
-        self.headers = headers if headers is not None else {}
+
+        self.headers = {**headers} if headers is not None else {}
+        if bill_to is not None:
+            if (
+                constants.HUGGINGFACE_HEADER_X_BILL_TO in self.headers
+                and self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] != bill_to
+            ):
+                warnings.warn(
+                    f"Overriding existing '{self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO]}' value in headers with '{bill_to}'.",
+                    UserWarning,
+                )
+            self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] = bill_to
+
+            if token is not None and not token.startswith("hf_"):
+                warnings.warn(
+                    "You've provided an external provider's API key, so requests will be billed directly by the provider. "
+                    "The `bill_to` parameter is only applicable for Hugging Face billing and will be ignored.",
+                    UserWarning,
+                )
 
         # Configure provider
         self.provider = provider if provider is not None else "hf-inference"
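The header-handling logic added to `__init__` can be read in isolation as the sketch below. `merge_bill_to_header` is a hypothetical standalone function, not part of `huggingface_hub`. It reproduces the copy-then-override behavior and the two warnings from the diff, which `TestBillToOrganization` exercises.

```python
import warnings
from typing import Dict, Optional

HUGGINGFACE_HEADER_X_BILL_TO = "X-HF-Bill-To"


def merge_bill_to_header(
    headers: Optional[Dict[str, str]], bill_to: Optional[str], token: Optional[str] = None
) -> Dict[str, str]:
    """Return a new headers dict with the bill-to header applied (mirrors the diff's __init__ logic)."""
    # Shallow copy so the caller's dict is never mutated.
    merged = {**headers} if headers is not None else {}
    if bill_to is None:
        return merged
    if HUGGINGFACE_HEADER_X_BILL_TO in merged and merged[HUGGINGFACE_HEADER_X_BILL_TO] != bill_to:
        warnings.warn(
            f"Overriding existing '{merged[HUGGINGFACE_HEADER_X_BILL_TO]}' value in headers with '{bill_to}'.",
            UserWarning,
        )
    merged[HUGGINGFACE_HEADER_X_BILL_TO] = bill_to
    # Tokens not starting with "hf_" are treated as provider keys: the provider bills directly.
    if token is not None and not token.startswith("hf_"):
        warnings.warn(
            "You've provided an external provider's API key, so requests will be billed directly by the provider. "
            "The `bill_to` parameter is only applicable for Hugging Face billing and will be ignored.",
            UserWarning,
        )
    return merged


original = {"X-HF-Bill-To": "huggingface"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merged = merge_bill_to_header(original, bill_to="my-org", token="hf_xxx")
print(merged, original, len(caught))
```

Like the code in the diff, the lookup is case-sensitive. A header passed as `x-hf-bill-to` would therefore not trigger the override warning, and both spellings would end up in the dict.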
diff --git a/src/huggingface_hub/inference/_generated/_async_client.py b/src/huggingface_hub/inference/_generated/_async_client.py
@@ -134,6 +134,9 @@ class AsyncInferenceClient:
         headers (`Dict[str, str]`, `optional`):
             Additional headers to send to the server. By default only the authorization and user-agent headers are sent.
             Values in this dictionary will override the default values.
+        bill_to (`str`, `optional`):
+            The billing account to use for the requests. By default, requests are billed to the user's account.
+            Requests can only be billed to an organization that the user is a member of and that has subscribed to Enterprise Hub.
         cookies (`Dict[str, str]`, `optional`):
             Additional cookies to send to the server.
         trust_env ('bool', 'optional'):
@@ -159,6 +162,7 @@ def __init__(
         cookies: Optional[Dict[str, str]] = None,
         trust_env: bool = False,
         proxies: Optional[Any] = None,
+        bill_to: Optional[str] = None,
         # OpenAI compatibility
         base_url: Optional[str] = None,
         api_key: Optional[str] = None,
@@ -194,7 +198,25 @@ def __init__(
 
         self.model: Optional[str] = base_url or model
         self.token: Optional[str] = token
-        self.headers = headers if headers is not None else {}
+
+        self.headers = {**headers} if headers is not None else {}
+        if bill_to is not None:
+            if (
+                constants.HUGGINGFACE_HEADER_X_BILL_TO in self.headers
+                and self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] != bill_to
+            ):
+                warnings.warn(
+                    f"Overriding existing '{self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO]}' value in headers with '{bill_to}'.",
+                    UserWarning,
+                )
+            self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] = bill_to
+
+            if token is not None and not token.startswith("hf_"):
+                warnings.warn(
+                    "You've provided an external provider's API key, so requests will be billed directly by the provider. "
+                    "The `bill_to` parameter is only applicable for Hugging Face billing and will be ignored.",
+                    UserWarning,
+                )
 
         # Configure provider
         self.provider = provider if provider is not None else "hf-inference"
diff --git a/tests/test_inference_client.py b/tests/test_inference_client.py
@@ -1079,3 +1079,30 @@ def test_cannot_pass_token_false():
     """
     with pytest.raises(ValueError):
         InferenceClient(token=False)
+
+
+class TestBillToOrganization:
+    def test_bill_to_added_to_new_headers(self):
+        client = InferenceClient(bill_to="huggingface_hub")
+        assert client.headers["X-HF-Bill-To"] == "huggingface_hub"
+
+    def test_bill_to_added_to_existing_headers(self):
+        headers = {"foo": "bar"}
+        client = InferenceClient(bill_to="huggingface_hub", headers=headers)
+        assert client.headers["X-HF-Bill-To"] == "huggingface_hub"
+        assert client.headers["foo"] == "bar"
+        assert headers == {"foo": "bar"}  # do not mutate the original headers
+
+    def test_warning_if_bill_to_already_set(self):
+        headers = {"X-HF-Bill-To": "huggingface"}
+        with pytest.warns(UserWarning, match="Overriding existing 'huggingface' value in headers with 'openai'."):
+            client = InferenceClient(bill_to="openai", headers=headers)
+        assert client.headers["X-HF-Bill-To"] == "openai"
+        assert headers == {"X-HF-Bill-To": "huggingface"}  # do not mutate the original headers
+
+    def test_warning_if_bill_to_with_direct_calls(self):
+        with pytest.warns(
+            UserWarning,
+            match="You've provided an external provider's API key, so requests will be billed directly by the provider.",
+        ):
+            InferenceClient(bill_to="openai", token="replicate_key", provider="replicate")