5 changes: 2 additions & 3 deletions docs/hub/academia-hub.md
@@ -1,8 +1,7 @@
# Academia Hub

-<Tip>
-Ask your university's IT or Procurement Team to <a href="https://huggingface.co/contact/sales?from=academia" target="_blank">get in touch</a> from a university-affiliated email address to initiate the subscription process.
-</Tip>
+> [!TIP]
+> Ask your university's IT or Procurement Team to <a href="https://huggingface.co/contact/sales?from=academia" target="_blank">get in touch</a> from a university-affiliated email address to initiate the subscription process.

Academia Hub is a paid offering that provides the Hugging Face Hub’s PRO features to every student, researcher, or faculty member of an academic institution. Explore advanced tools, enhanced collaboration, and exclusive resources to accelerate your learning, research, and teaching. The Hugging Face team is able to work with your IT or procurement department to set the product up.

5 changes: 2 additions & 3 deletions docs/hub/advanced-compute-options.md
@@ -1,8 +1,7 @@
# Advanced Compute Options

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.

Enterprise Hub organizations gain access to advanced compute options to accelerate their machine learning journey.

14 changes: 4 additions & 10 deletions docs/hub/agents.md
@@ -18,11 +18,8 @@ With the HF MCP Server, you can enhance your AI assistant's capabilities by conn

Visit [huggingface.co/settings/mcp](https://huggingface.co/settings/mcp) to configure your MCP client and get started. Read the dedicated one‑page guide: [HF MCP Server](./hf-mcp-server).

-<Tip warning={true}>
-
-This feature is experimental ⚗️ and will continue to evolve.
-
-</Tip>
+> [!WARNING]
+> This feature is experimental ⚗️ and will continue to evolve.

## tiny-agents (JS and Python)

@@ -143,11 +140,8 @@ To use a local LLM (such as [llama.cpp](https://github.com/ggerganov/llama.cpp),

Optionally, add a `PROMPT.md` to customize the system prompt.

-<Tip>
-
-Don't hesitate to contribute your agent to the community by opening a Pull Request in the [tiny-agents](https://huggingface.co/datasets/tiny-agents/tiny-agents) Hugging Face dataset.
-
-</Tip>
+> [!TIP]
+> Don't hesitate to contribute your agent to the community by opening a Pull Request in the [tiny-agents](https://huggingface.co/datasets/tiny-agents/tiny-agents) Hugging Face dataset.

## Gradio MCP Server / Tools

12 changes: 4 additions & 8 deletions docs/hub/api.md
@@ -27,9 +27,8 @@ All API calls are subject to the HF-wide [Rate limits](./rate-limits). Upgrade y

The following endpoints help get information about models, datasets, and Spaces stored on the Hub.

-<Tip>
-When making API calls to retrieve information about repositories, the <code>createdAt</code> attribute indicates the time when the respective repository was created. It's important to note that there is a unique value, <code>2022-03-02T23:29:04.000Z</code> assigned to all repositories that were created before we began storing creation dates.
-</Tip>
+> [!TIP]
+> When making API calls to retrieve information about repositories, the <code>createdAt</code> attribute indicates the time when the respective repository was created. It's important to note that there is a unique value, <code>2022-03-02T23:29:04.000Z</code> assigned to all repositories that were created before we began storing creation dates.

### GET /api/models

@@ -486,11 +485,8 @@ If no parameter is set, all collections are returned.

The response is paginated. To get all collections, you must follow the [`Link` header](https://docs.github.com/en/rest/guides/using-pagination-in-the-rest-api?apiVersion=2022-11-28#link-header).

-<Tip warning={true}>
-
-When listing collections, the item list per collection is truncated to 4 items maximum. To retrieve all items from a collection, you need to make an additional call using its collection slug.
-
-</Tip>
+> [!WARNING]
+> When listing collections, the item list per collection is truncated to 4 items maximum. To retrieve all items from a collection, you need to make an additional call using its collection slug.

Payload:

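For illustration, a minimal sketch of the `Link`-header pagination described in the warning above, using `requests`; it assumes the public `/api/collections` endpoint and omits query filters and error handling:

```python
import requests

# Fetch every page of collections by following the `Link` header.
url = "https://huggingface.co/api/collections"
while url:
    response = requests.get(url)
    response.raise_for_status()
    for collection in response.json():
        # Item lists are truncated to 4 entries here; fetch the full
        # collection via its slug to get every item.
        print(collection["slug"])
    # `requests` parses the Link header into `response.links`.
    url = response.links.get("next", {}).get("url")
```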
5 changes: 2 additions & 3 deletions docs/hub/audit-logs.md
@@ -1,8 +1,7 @@
# Audit Logs

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.

Audit Logs enable organization admins to easily review actions taken by members, including organization membership, repository settings and billing changes.

11 changes: 4 additions & 7 deletions docs/hub/datasets-duckdb-combine-and-export.md
@@ -84,13 +84,10 @@ SELECT COUNT(*) FROM 'output.parquet';

```

-<Tip>
-
-You can also export to [CSV](https://duckdb.org/docs/guides/file_formats/csv_export), [Excel](https://duckdb.org/docs/guides/file_formats/excel_export
-) and [JSON](https://duckdb.org/docs/guides/file_formats/json_export
-) formats.
-
-</Tip>
+> [!TIP]
+> You can also export to [CSV](https://duckdb.org/docs/guides/file_formats/csv_export), [Excel](https://duckdb.org/docs/guides/file_formats/excel_export
+> ) and [JSON](https://duckdb.org/docs/guides/file_formats/json_export
+> ) formats.

Finally, let's push the resulting dataset to the Hub. You can use the Hub UI, the `huggingface_hub` client library and more to upload your Parquet file, see more information [here](./datasets-adding).

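As a hedged sketch of the export tip above, via DuckDB's Python client (the output filenames are arbitrary):

```python
import duckdb

# Re-export the combined Parquet result as CSV; JSON works the same way
# with COPY ... TO 'output.json'.
duckdb.sql("""
    COPY (SELECT * FROM 'output.parquet')
    TO 'output.csv' (HEADER, DELIMITER ',')
""")
```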
7 changes: 2 additions & 5 deletions docs/hub/datasets-duckdb-sql.md
@@ -152,8 +152,5 @@ Finally, lets highlight some of the DuckDB functions used in this section:
- `regexp_replace`, if the string contains the regexp pattern, replaces the matching part with replacement.
- `LENGTH`, gets the number of characters in the string.

-<Tip>
-
-There are plenty of useful functions available in DuckDB's [SQL functions overview](https://duckdb.org/docs/sql/functions/overview). The best part is that you can use them directly on Hugging Face datasets.
-
-</Tip>
+> [!TIP]
+> There are plenty of useful functions available in DuckDB's [SQL functions overview](https://duckdb.org/docs/sql/functions/overview). The best part is that you can use them directly on Hugging Face datasets.
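A hedged sketch of the functions listed above, applied to a Hub-hosted file through DuckDB's Python client; the dataset path and `text` column are placeholder assumptions:

```python
import duckdb

# Normalize whitespace and measure string lengths directly on a Hub dataset.
duckdb.sql(r"""
    SELECT regexp_replace(text, '\s+', ' ', 'g') AS cleaned,
           LENGTH(text) AS n_chars
    FROM 'hf://datasets/my-username/my-dataset/data.parquet'
    LIMIT 5
""").show()
```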
38 changes: 16 additions & 22 deletions docs/hub/datasets-duckdb.md
@@ -10,11 +10,8 @@ You can use the Hugging Face paths (`hf://`) to access data on the Hub:
The [DuckDB CLI](https://duckdb.org/docs/api/cli/overview.html) (Command Line Interface) is a single, dependency-free executable.
There are also other APIs available for running DuckDB, including Python, C++, Go, Java, Rust, and more. For additional details, visit their [clients](https://duckdb.org/docs/api/overview.html) page.

-<Tip>
-
-For installation details, visit the [installation page](https://duckdb.org/docs/installation).
-
-</Tip>
+> [!TIP]
+> For installation details, visit the [installation page](https://duckdb.org/docs/installation).

Starting from version `v0.10.3`, the DuckDB CLI includes native support for accessing datasets on the Hugging Face Hub via URLs with the `hf://` scheme. Here are some features you can leverage with this powerful tool:

@@ -45,23 +42,20 @@ hf://datasets/{my-username}/{my-dataset}/{path_to_file}
- **path_to_parquet_file**, the parquet file path which supports glob patterns, e.g `**/*.parquet`, to query all parquet files


-<Tip>
-
-You can query auto-converted Parquet files using the @~parquet branch, which corresponds to the `refs/convert/parquet` revision. For more details, refer to the documentation at https://huggingface.co/docs/datasets-server/en/parquet#conversion-to-parquet.
-
-To reference the `refs/convert/parquet` revision of a dataset, use the following syntax:
-
-```plaintext
-hf://datasets/{my-username}/{my-dataset}@~parquet/{path_to_file}
-```
-
-Here is a sample URL following the above syntax:
-
-```plaintext
-hf://datasets/ibm/duorc@~parquet/ParaphraseRC/test/0000.parquet
-```
-
-</Tip>
+> [!TIP]
+> You can query auto-converted Parquet files using the @~parquet branch, which corresponds to the `refs/convert/parquet` revision. For more details, refer to the documentation at https://huggingface.co/docs/datasets-server/en/parquet#conversion-to-parquet.
+>
+> To reference the `refs/convert/parquet` revision of a dataset, use the following syntax:
+>
+> ```plaintext
+> hf://datasets/{my-username}/{my-dataset}@~parquet/{path_to_file}
+> ```
+>
+> Here is a sample URL following the above syntax:
+>
+> ```plaintext
+> hf://datasets/ibm/duorc@~parquet/ParaphraseRC/test/0000.parquet
+> ```

Let's start with a quick demo to query all the rows of a dataset:

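The demo itself is elided here, but a minimal stand-in using the sample `@~parquet` URL from the tip above might look like this (it assumes the `duckdb` Python package and `pandas` are installed):

```python
import duckdb

# Query the auto-converted Parquet files of ibm/duorc via the @~parquet revision.
df = duckdb.sql(
    "SELECT * FROM 'hf://datasets/ibm/duorc@~parquet/ParaphraseRC/test/0000.parquet' LIMIT 10"
).df()
print(df)
```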
7 changes: 2 additions & 5 deletions docs/hub/datasets-gated.md
@@ -147,11 +147,8 @@ By clicking on **Agree**, you agree to share your username and email address wit

Once the access request is sent, there are two possibilities. If the approval mechanism is automatic, you immediately get access to the dataset files. Otherwise, the requests have to be approved manually by the authors, which can take more time.

-<Tip warning={true}>
-
-The dataset authors have complete control over dataset access. In particular, they can decide at any time to block your access to the dataset without prior notice, regardless of approval mechanism or if your request has already been approved.
-
-</Tip>
+> [!WARNING]
+> The dataset authors have complete control over dataset access. In particular, they can decide at any time to block your access to the dataset without prior notice, regardless of approval mechanism or if your request has already been approved.

### Download files

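A minimal sketch of downloading a file from a gated dataset once your request is approved; the repository and filename are placeholders, and you must be authenticated (for example via `huggingface-cli login`):

```python
from huggingface_hub import hf_hub_download

# Gated files download like any other once access has been granted.
local_path = hf_hub_download(
    repo_id="my-username/my-gated-dataset",  # placeholder repo
    filename="data/train.parquet",           # placeholder file
    repo_type="dataset",
)
print(local_path)
```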
30 changes: 12 additions & 18 deletions docs/hub/datasets-manual-configuration.md
@@ -73,11 +73,8 @@ configs:
---
```

-<Tip warning={true}>
-
-Note that `config_name` field is required even if you have a single subset.
-
-</Tip>
+> [!WARNING]
+> Note that `config_name` field is required even if you have a single subset.

## Multiple Subsets

@@ -105,19 +102,16 @@ configs:

Note that the order of subsets shown in the viewer is the default one first, then alphabetical.

-<Tip>
-
-You can set a default subset using `default: true`
-
-```yaml
-- config_name: main_data
-  data_files: "main_data.csv"
-  default: true
-```
-
-This is useful to set which subset the Dataset Viewer shows first, and which subset data libraries load by default.
-
-</Tip>
+> [!TIP]
+> You can set a default subset using `default: true`
+>
+> ```yaml
+> - config_name: main_data
+>   data_files: "main_data.csv"
+>   default: true
+> ```
+>
+> This is useful to set which subset the Dataset Viewer shows first, and which subset data libraries load by default.
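For example, with the `datasets` library the default subset is what a plain `load_dataset` call resolves to (a hedged sketch; the repository name is a placeholder):

```python
from datasets import load_dataset

# With `default: true` on main_data, these two calls load the same subset.
ds_default = load_dataset("my-username/my-dataset")
ds_explicit = load_dataset("my-username/my-dataset", "main_data")
```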


## Builder parameters
7 changes: 2 additions & 5 deletions docs/hub/datasets-pandas.md
@@ -187,11 +187,8 @@ df["audio"] = df["audio"].sf.write()
You can use `transformers` pipelines on pandas DataFrames to classify, generate text, images, etc.
This section shows a few examples with `tqdm` for progress bars.

-<Tip>
-
-Pipelines don't accept a `tqdm` object as input but you can use a python generator instead, in the form `x for x in tqdm(...)`
-
-</Tip>
+> [!TIP]
+> Pipelines don't accept a `tqdm` object as input but you can use a python generator instead, in the form `x for x in tqdm(...)`

### Text Classification

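A hedged sketch of that generator pattern (the dataset path, `text` column, and model choice are assumptions):

```python
import pandas as pd
from tqdm import tqdm
from transformers import pipeline

df = pd.read_parquet("hf://datasets/my-username/my-dataset/data.parquet")
pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Wrap the column in a generator so the pipeline streams it with a progress bar.
df["label"] = [out["label"] for out in pipe(x for x in tqdm(df["text"]))]
```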
7 changes: 2 additions & 5 deletions docs/hub/datasets-polars.md
@@ -26,11 +26,8 @@ import polars as pl
pl.read_parquet("hf://datasets/roneneldan/TinyStories/data/train-00000-of-00004-2d5a1467fff1081b.parquet")
```

-<Tip>
-
-Polars provides two APIs: a lazy API (`scan_parquet`) and an eager API (`read_parquet`). We recommend using the eager API for interactive workloads and the lazy API for performance as it allows for better query optimization. For more information on the topic, check out the [Polars user guide](https://docs.pola.rs/user-guide/concepts/lazy-api/#when-to-use-which).
-
-</Tip>
+> [!TIP]
+> Polars provides two APIs: a lazy API (`scan_parquet`) and an eager API (`read_parquet`). We recommend using the eager API for interactive workloads and the lazy API for performance as it allows for better query optimization. For more information on the topic, check out the [Polars user guide](https://docs.pola.rs/user-guide/concepts/lazy-api/#when-to-use-which).

Polars supports globbing to download multiple files at once into a single DataFrame.

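A short sketch of the lazy API with globbing; the filter is arbitrary and `str.len_chars` assumes a recent Polars release:

```python
import polars as pl

# Nothing is downloaded until .collect(), so Polars can prune columns
# and push the filter down into the scan.
df = (
    pl.scan_parquet("hf://datasets/roneneldan/TinyStories/data/train-*.parquet")
    .filter(pl.col("text").str.len_chars() > 100)
    .head(10)
    .collect()
)
print(df)
```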
11 changes: 4 additions & 7 deletions docs/hub/datasets-pyarrow.md
@@ -81,13 +81,10 @@ pq.write_table(table_test , "hf://datasets/username/my_dataset/test.parquet", us

We use `use_content_defined_chunking=True` to enable faster uploads and downloads from Hugging Face thanks to Xet deduplication (it requires `pyarrow>=21.0`).

-<Tip>
-
-Content defined chunking (CDC) makes the Parquet writer chunk the data pages in a way that makes duplicate data chunked and compressed identically.
-Without CDC, the pages are arbitrarily chunked and therefore duplicate data are impossible to detect because of compression.
-Thanks to CDC, Parquet uploads and downloads from Hugging Face are faster, since duplicate data are uploaded or downloaded only once.
-
-</Tip>
+> [!TIP]
+> Content defined chunking (CDC) makes the Parquet writer chunk the data pages in a way that makes duplicate data chunked and compressed identically.
+> Without CDC, the pages are arbitrarily chunked and therefore duplicate data are impossible to detect because of compression.
+> Thanks to CDC, Parquet uploads and downloads from Hugging Face are faster, since duplicate data are uploaded or downloaded only once.

Find more information about Xet [here](https://huggingface.co/join/xet).

5 changes: 2 additions & 3 deletions docs/hub/datasets-upload-guide-llm.md
@@ -1,8 +1,7 @@
# Hugging Face Dataset Upload Decision Guide

-<Tip>
-This guide is primarily designed for LLMs to help users upload datasets to the Hugging Face Hub in the most compatible format. Users can also reference this guide to understand the upload process and best practices.
-</Tip>
+> [!TIP]
+> This guide is primarily designed for LLMs to help users upload datasets to the Hugging Face Hub in the most compatible format. Users can also reference this guide to understand the upload process and best practices.


> Decision guide for uploading datasets to Hugging Face Hub. Optimized for Dataset Viewer compatibility and integration with the Hugging Face ecosystem.
5 changes: 2 additions & 3 deletions docs/hub/datasets-viewer-sql-console.md
@@ -20,9 +20,8 @@ Through the SQL Console, you can:
- Embed the results of the query in your own webpage using an iframe
- Query datasets with natural language

-<Tip>
-You can also use the DuckDB locally through the CLI to query the dataset via the `hf://` protocol. See the <a href="https://huggingface.co/docs/hub/en/datasets-duckdb" target="_blank" rel="noopener noreferrer">DuckDB Datasets documentation</a> for more information. The SQL Console provides a convenient `Copy to DuckDB CLI` button that generates the SQL query for creating views and executing your query in the DuckDB CLI.
-</Tip>
+> [!TIP]
+> You can also use the DuckDB locally through the CLI to query the dataset via the `hf://` protocol. See the <a href="https://huggingface.co/docs/hub/en/datasets-duckdb" target="_blank" rel="noopener noreferrer">DuckDB Datasets documentation</a> for more information. The SQL Console provides a convenient `Copy to DuckDB CLI` button that generates the SQL query for creating views and executing your query in the DuckDB CLI.


## Examples
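Roughly what the `Copy to DuckDB CLI` button produces, approximated here with DuckDB's Python client (the dataset path is a placeholder):

```python
import duckdb

# Create a view over the dataset via hf://, then query it.
duckdb.sql(
    "CREATE VIEW train AS "
    "SELECT * FROM 'hf://datasets/my-username/my-dataset/train.parquet'"
)
duckdb.sql("SELECT COUNT(*) FROM train").show()
```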
7 changes: 2 additions & 5 deletions docs/hub/datasets-viewer.md
@@ -58,11 +58,8 @@ In this case, an informational message lets you know that the Viewer is partial.

To power the dataset viewer, the first 5GB of every dataset are auto-converted to the Parquet format (unless it was already a Parquet dataset). In the dataset viewer (for example, see [GLUE](https://huggingface.co/datasets/nyu-mll/glue)), you can click on [_"Auto-converted to Parquet"_](https://huggingface.co/datasets/nyu-mll/glue/tree/refs%2Fconvert%2Fparquet/cola) to access the Parquet files. Please, refer to the [dataset viewer docs](/docs/datasets-server/parquet_process) to learn how to query the dataset parquet files with libraries such as Polars, Pandas or DuckDB.

-<Tip>
-
-Parquet is a columnar storage format optimized for querying and processing large datasets. Parquet is a popular choice for big data processing and analytics and is widely used for data processing and machine learning. You can learn more about the advantages associated with this format in the <a href="https://huggingface.co/docs/datasets-server/parquet">documentation</a>.
-
-</Tip>
+> [!TIP]
+> Parquet is a columnar storage format optimized for querying and processing large datasets. Parquet is a popular choice for big data processing and analytics and is widely used for data processing and machine learning. You can learn more about the advantages associated with this format in the <a href="https://huggingface.co/docs/datasets-server/parquet">documentation</a>.
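For instance, the auto-converted files live under the `refs/convert/parquet` revision and can be listed with `huggingface_hub` (a hedged sketch using the GLUE dataset mentioned above):

```python
from huggingface_hub import list_repo_files

# List the Parquet files produced by the auto-conversion of GLUE.
files = list_repo_files(
    "nyu-mll/glue", repo_type="dataset", revision="refs/convert/parquet"
)
print([f for f in files if f.endswith(".parquet")][:5])
```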

### Conversion bot

11 changes: 4 additions & 7 deletions docs/hub/dduf.md
@@ -14,13 +14,10 @@ This work draws inspiration from the [GGUF](https://github.com/ggerganov/ggml/bl

Check out the [DDUF](https://huggingface.co/DDUF) org to start using some of the most popular diffusion models in DDUF.

-<Tip>
-
-We welcome contributions with open arms!
-
-To create a widely adopted file format, we need early feedback from the community. Nothing is set in stone, and we value everyone's input. Is your use case not covered? Please let us know in the DDUF organization [discussions](https://huggingface.co/spaces/DDUF/README/discussions/2).
-
-</Tip>
+> [!TIP]
+> We welcome contributions with open arms!
+>
+> To create a widely adopted file format, we need early feedback from the community. Nothing is set in stone, and we value everyone's input. Is your use case not covered? Please let us know in the DDUF organization [discussions](https://huggingface.co/spaces/DDUF/README/discussions/2).

Its key features include the following.

5 changes: 2 additions & 3 deletions docs/hub/enterprise-hub-advanced-security.md
@@ -1,8 +1,7 @@
# Advanced Security

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.

Enterprise Hub organizations can improve their security with advanced security controls for both members and repositories.

10 changes: 4 additions & 6 deletions docs/hub/enterprise-hub-advanced-sso.md
@@ -1,8 +1,7 @@
# Advanced Single Sign-On (SSO)

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/contact/sales?from=enterprise" target="_blank">Enterprise Plus</a> plan.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/contact/sales?from=enterprise" target="_blank">Enterprise Plus</a> plan.

Advanced Single Sign-On (SSO) capabilities extend the standard [SSO features](./security-sso) available in the Enterprise Hub, offering enhanced control and automation for user management and access across the entire Hugging Face platform for your organization members.

@@ -34,9 +33,8 @@ This feature is particularly beneficial for organizations requiring a higher deg

## Limitations on Managed User Accounts

-<Tip warning={true}>
-Important Considerations for Managed Accounts.
-</Tip>
+> [!WARNING]
+> Important Considerations for Managed Accounts.

To ensure organizational control and data governance, user accounts provisioned and managed via Advanced SSO ("managed user accounts") have specific limitations:

5 changes: 2 additions & 3 deletions docs/hub/enterprise-hub-analytics.md
@@ -1,8 +1,7 @@
# Analytics

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.

## Publisher Analytics Dashboard

5 changes: 2 additions & 3 deletions docs/hub/enterprise-hub-datasets.md
@@ -1,8 +1,7 @@
# Datasets

-<Tip warning={true}>
-This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.
-</Tip>
+> [!WARNING]
+> This feature is part of the <a href="https://huggingface.co/enterprise">Team & Enterprise</a> plans.

Data Studio is enabled on private datasets under your Enterprise Hub organization.
