2 changes: 2 additions & 0 deletions docs/hub/_toctree.yml
@@ -175,6 +175,8 @@
sections:
- local: datasets-argilla
title: Argilla
- local: datasets-daft
title: Daft
- local: datasets-dask
title: Dask
- local: datasets-usage
2 changes: 1 addition & 1 deletion docs/hub/datasets-adding.md
@@ -67,7 +67,7 @@ The rich features set in the `huggingface_hub` library allows you to manage repo

## Using other libraries

Some libraries like [🤗 Datasets](/docs/datasets/index), [Pandas](https://pandas.pydata.org/), [Polars](https://pola.rs), [Dask](https://www.dask.org/) or [DuckDB](https://duckdb.org/) can upload files to the Hub.
Some libraries like [🤗 Datasets](/docs/datasets/index), [Pandas](https://pandas.pydata.org/), [Polars](https://pola.rs), [Dask](https://www.dask.org/), [DuckDB](https://duckdb.org/), or [Daft](https://daft.ai/) can upload files to the Hub.
See the list of [Libraries supported by the Datasets Hub](./datasets-libraries) for more information.
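
For example, a minimal sketch with 🤗 Datasets (the repository name `username/my_dataset` is a placeholder, and you need to be logged in with `huggingface_hub`):

```python
from datasets import Dataset

# Build a tiny in-memory dataset and push it to the Hub as a dataset repository.
ds = Dataset.from_dict({"text": ["hello", "world"]})
ds.push_to_hub("username/my_dataset")
```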

## Using Git
79 changes: 79 additions & 0 deletions docs/hub/datasets-daft.md
@@ -0,0 +1,79 @@
# Daft

[Daft](https://daft.ai/) is a high-performance data engine providing simple and reliable data processing for any modality and scale. Daft has native support for reading from and writing to Hugging Face datasets.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/daft_hf.png"/>
</div>


## Getting Started

To get started, install `daft` with the `huggingface` extra:

```bash
pip install 'daft[huggingface]'
```

## Read

Daft can read datasets directly from the Hugging Face Hub using the [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface) function or via the `hf://datasets/` protocol.

### Reading an Entire Dataset

Using [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface), you can easily load a dataset.


```python
import daft

df = daft.read_huggingface("username/dataset_name")
```

This will read the entire dataset into a DataFrame.
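
Daft evaluates queries lazily, so rows are only fetched when you materialize the result. A small sketch, reusing the placeholder repository name:

```python
import daft

df = daft.read_huggingface("username/dataset_name")

# Inspect the schema without materializing the data.
print(df.schema())

# Fetch and display the first rows.
df.show(8)
```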

### Reading Specific Files

Not only can you read entire datasets, but you can also read individual files from a dataset repository. Using a read function that takes in a path (such as [`daft.read_parquet()`](https://docs.daft.ai/en/stable/api/io/#daft.read_parquet), [`daft.read_csv()`](https://docs.daft.ai/en/stable/api/io/#daft.read_csv), or [`daft.read_json()`](https://docs.daft.ai/en/stable/api/io/#daft.read_json)), specify a Hugging Face dataset path via the `hf://datasets/` prefix:

```python
import daft

# read a specific Parquet file
df = daft.read_parquet("hf://datasets/username/dataset_name/file_name.parquet")

# or a CSV file
df = daft.read_csv("hf://datasets/username/dataset_name/file_name.csv")

# or a set of Parquet files using a glob pattern
df = daft.read_parquet("hf://datasets/username/dataset_name/**/*.parquet")
```

## Write

Daft can write Parquet files to a Hugging Face dataset repository using [`daft.DataFrame.write_huggingface`](https://docs.daft.ai/en/stable/api/dataframe/#daft.DataFrame.write_huggingface). Daft supports [Content-Defined Chunking](https://huggingface.co/blog/parquet-cdc) and [Xet](https://huggingface.co/blog/xet-on-the-hub) for faster, deduplicated writes.

Basic usage:

```python
import daft

df: daft.DataFrame = ...

df.write_huggingface("username/dataset_name")
```
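
For instance, a minimal end-to-end sketch (the repository `username/dataset_name` is a placeholder and must be a dataset repo you can write to):

```python
import daft

# Build a small DataFrame in memory and push it to the Hub as Parquet files.
df = daft.from_pydict({
    "id": [1, 2, 3],
    "text": ["foo", "bar", "baz"],
})
df.write_huggingface("username/dataset_name")
```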

See the [`DataFrame.write_huggingface`](https://docs.daft.ai/en/stable/api/dataframe/#daft.DataFrame.write_huggingface) API page for more info.

## Authentication

The `token` parameter in [`daft.io.HuggingFaceConfig`](https://docs.daft.ai/en/stable/api/config/#daft.io.HuggingFaceConfig) can be used to specify a Hugging Face access token for requests that require authentication (e.g. reading private dataset repositories or writing to a dataset repository).

Example of loading a dataset with a specified token:

```python
import daft
from daft.io import IOConfig, HuggingFaceConfig

io_config = IOConfig(hf=HuggingFaceConfig(token="your_token"))
df = daft.read_parquet("hf://datasets/username/dataset_name", io_config=io_config)
```
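
Depending on your Daft version, you may also be able to register the configuration as a session-wide default instead of passing it to each call — a sketch, assuming the same placeholder token and a private repository:

```python
import daft
from daft.io import IOConfig, HuggingFaceConfig

# Make the Hugging Face token the default for all subsequent reads and writes.
io_config = IOConfig(hf=HuggingFaceConfig(token="your_token"))
daft.set_planning_config(default_io_config=io_config)

df = daft.read_huggingface("username/private_dataset")
```
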
3 changes: 2 additions & 1 deletion docs/hub/datasets-libraries.md
@@ -9,6 +9,7 @@ The table below summarizes the supported libraries and their level of integratio
| Library | Description | Download from Hub | Push to Hub |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ----------------- | ----------- |
| [Argilla](./datasets-argilla) | Collaboration tool for AI engineers and domain experts that value high quality data. | ✅ | ✅ |
| [Daft](./datasets-daft) | Data engine for large scale, multimodal data processing with a Python-native interface. | ✅ | ✅ |
| [Dask](./datasets-dask) | Parallel and distributed computing library that scales the existing Python and PyData ecosystem. | ✅ | ✅ |
| [Datasets](./datasets-usage) | 🤗 Datasets is a library for accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP). | ✅ | ✅ |
| [Distilabel](./datasets-distilabel) | The framework for synthetic data generation and AI feedback. | ✅ | ✅ |
@@ -87,7 +88,7 @@ Examples of this kind of integration:

#### Rely on an existing library's integration with the Hub

Polars, Pandas, Dask, Spark and DuckDB all can write to a Hugging Face Hub repository. See [datasets libraries](https://huggingface.co/docs/hub/datasets-libraries) for more details.
Polars, Pandas, Dask, Spark, DuckDB, and Daft can all write to a Hugging Face Hub repository. See [datasets libraries](https://huggingface.co/docs/hub/datasets-libraries) for more details.

If you are already using one of these libraries in your code, adding the ability to push to the Hub is straightforward. For example, if you have a synthetic data generation library that can return a Pandas DataFrame, here is the code you would need to write to the Hub:
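
A minimal sketch of that pattern (assuming a dataset repository `username/my_dataset` you can write to, with `huggingface_hub` installed and authenticated):

```python
import pandas as pd
from huggingface_hub import create_repo

# A stand-in for whatever DataFrame your library produces.
df = pd.DataFrame({"prompt": ["Write a haiku."], "completion": ["..."]})

# Create the dataset repo if needed, then write straight to it through the
# hf:// filesystem provided by huggingface_hub.
create_repo("username/my_dataset", repo_type="dataset", exist_ok=True)
df.to_parquet("hf://datasets/username/my_dataset/data.parquet")
```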

100 changes: 49 additions & 51 deletions docs/inference-providers/_toctree.yml
@@ -6,42 +6,9 @@
title: Pricing and Billing
- local: hub-integration
title: Hub integration
- local: register-as-a-provider
title: Register as an Inference Provider
- local: security
title: Security

- title: Providers
sections:
- local: providers/cerebras
title: Cerebras
- local: providers/cohere
title: Cohere
- local: providers/fal-ai
title: Fal AI
- local: providers/featherless-ai
title: Featherless AI
- local: providers/fireworks-ai
title: Fireworks
- local: providers/groq
title: Groq
- local: providers/hyperbolic
title: Hyperbolic
- local: providers/hf-inference
title: HF Inference
- local: providers/nebius
title: Nebius
- local: providers/novita
title: Novita
- local: providers/nscale
title: Nscale
- local: providers/replicate
title: Replicate
- local: providers/sambanova
title: SambaNova
- local: providers/together
title: Together

- title: Guides
sections:
- local: guides/first-api-call
@@ -57,25 +24,19 @@
- local: guides/image-editor
title: Build an Image Editor


- title: API Reference
- local: tasks/index
title: Inference Tasks
sections:
- local: tasks/index
title: Index
- local: hub-api
title: Hub API
- title: Popular Tasks
sections:
- local: tasks/chat-completion
title: Chat Completion
- local: tasks/feature-extraction
title: Feature Extraction
- local: tasks/text-to-image
title: Text to Image
- local: tasks/text-to-video
title: Text to Video
- local: tasks/chat-completion
title: Chat Completion
- local: tasks/feature-extraction
title: Feature Extraction
- local: tasks/text-to-image
title: Text to Image
- local: tasks/text-to-video
title: Text to Video
- title: Other Tasks
isExpanded: false
isExpanded: False
sections:
- local: tasks/audio-classification
title: Audio Classification
@@ -108,4 +69,41 @@
- local: tasks/translation
title: Translation
- local: tasks/zero-shot-classification
title: Zero Shot Classification
title: Zero Shot Classification

- title: Providers
sections:
- local: providers/cerebras
title: Cerebras
- local: providers/cohere
title: Cohere
- local: providers/fal-ai
title: Fal AI
- local: providers/featherless-ai
title: Featherless AI
- local: providers/fireworks-ai
title: Fireworks
- local: providers/groq
title: Groq
- local: providers/hyperbolic
title: Hyperbolic
- local: providers/hf-inference
title: HF Inference
- local: providers/nebius
title: Nebius
- local: providers/novita
title: Novita
- local: providers/nscale
title: Nscale
- local: providers/replicate
title: Replicate
- local: providers/sambanova
title: SambaNova
- local: providers/together
title: Together

- local: hub-api
title: Hub API

- local: register-as-a-provider
title: Register as an Inference Provider
4 changes: 2 additions & 2 deletions docs/inference-providers/providers/featherless-ai.md
@@ -52,7 +52,7 @@ Find out more about Chat Completion (LLM) [here](../tasks/chat-completion).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"featherless-ai":{"modelId":"moonshotai/Kimi-K2-Instruct","providerModelId":"moonshotai/Kimi-K2-Instruct"} } }
providersMapping={ {"featherless-ai":{"modelId":"meta-llama/Llama-3.1-8B-Instruct","providerModelId":"meta-llama/Meta-Llama-3.1-8B-Instruct"} } }
conversational />


@@ -72,6 +72,6 @@ Find out more about Text Generation [here](../tasks/text_generation).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"featherless-ai":{"modelId":"moonshotai/Kimi-K2-Instruct","providerModelId":"moonshotai/Kimi-K2-Instruct"} } }
providersMapping={ {"featherless-ai":{"modelId":"meta-llama/Llama-3.1-8B-Instruct","providerModelId":"meta-llama/Meta-Llama-3.1-8B-Instruct"} } }
/>

30 changes: 5 additions & 25 deletions docs/inference-providers/providers/hf-inference.md
@@ -57,16 +57,6 @@ Find out more about Automatic Speech Recognition [here](../tasks/automatic_speec
/>


### Chat Completion (LLM)

Find out more about Chat Completion (LLM) [here](../tasks/chat-completion).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"hf-inference":{"modelId":"HuggingFaceTB/SmolLM3-3B","providerModelId":"HuggingFaceTB/SmolLM3-3B"} } }
conversational />


### Feature Extraction

Find out more about Feature Extraction [here](../tasks/feature_extraction).
@@ -93,7 +83,7 @@ Find out more about Image Classification [here](../tasks/image_classification).

<InferenceSnippet
pipeline=image-classification
providersMapping={ {"hf-inference":{"modelId":"Falconsai/nsfw_image_detection","providerModelId":"Falconsai/nsfw_image_detection"} } }
providersMapping={ {"hf-inference":{"modelId":"dima806/fairface_age_image_detection","providerModelId":"dima806/fairface_age_image_detection"} } }
/>


@@ -103,7 +93,7 @@ Find out more about Image Segmentation [here](../tasks/image_segmentation).

<InferenceSnippet
pipeline=image-segmentation
providersMapping={ {"hf-inference":{"modelId":"mattmdjaga/segformer_b2_clothes","providerModelId":"mattmdjaga/segformer_b2_clothes"} } }
providersMapping={ {"hf-inference":{"modelId":"facebook/mask2former-swin-small-ade-semantic","providerModelId":"facebook/mask2former-swin-small-ade-semantic"} } }
/>


@@ -153,17 +143,7 @@ Find out more about Text Classification [here](../tasks/text_classification).

<InferenceSnippet
pipeline=text-classification
providersMapping={ {"hf-inference":{"modelId":"ProsusAI/finbert","providerModelId":"ProsusAI/finbert"} } }
/>


### Text Generation

Find out more about Text Generation [here](../tasks/text_generation).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"hf-inference":{"modelId":"HuggingFaceTB/SmolLM3-3B","providerModelId":"HuggingFaceTB/SmolLM3-3B"} } }
providersMapping={ {"hf-inference":{"modelId":"distilbert/distilbert-base-uncased-finetuned-sst-2-english","providerModelId":"distilbert/distilbert-base-uncased-finetuned-sst-2-english"} } }
/>


@@ -183,7 +163,7 @@ Find out more about Token Classification [here](../tasks/token_classification).

<InferenceSnippet
pipeline=token-classification
providersMapping={ {"hf-inference":{"modelId":"iiiorg/piiranha-v1-detect-personal-information","providerModelId":"iiiorg/piiranha-v1-detect-personal-information"} } }
providersMapping={ {"hf-inference":{"modelId":"microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank","providerModelId":"microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank"} } }
/>


@@ -203,6 +183,6 @@ Find out more about Zero Shot Classification [here](../tasks/zero_shot_classific

<InferenceSnippet
pipeline=zero-shot-classification
providersMapping={ {"hf-inference":{"modelId":"facebook/bart-large-mnli","providerModelId":"facebook/bart-large-mnli"} } }
providersMapping={ {"hf-inference":{"modelId":"joeddav/xlm-roberta-large-xnli","providerModelId":"joeddav/xlm-roberta-large-xnli"} } }
/>

4 changes: 2 additions & 2 deletions docs/inference-providers/providers/nebius.md
@@ -50,7 +50,7 @@ Find out more about Chat Completion (LLM) [here](../tasks/chat-completion).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"nebius":{"modelId":"openai/gpt-oss-120b","providerModelId":"openai/gpt-oss-120b"} } }
providersMapping={ {"nebius":{"modelId":"openai/gpt-oss-20b","providerModelId":"openai/gpt-oss-20b"} } }
conversational />


@@ -80,7 +80,7 @@ Find out more about Text Generation [here](../tasks/text_generation).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"nebius":{"modelId":"openai/gpt-oss-120b","providerModelId":"openai/gpt-oss-120b"} } }
providersMapping={ {"nebius":{"modelId":"openai/gpt-oss-20b","providerModelId":"openai/gpt-oss-20b"} } }
/>


2 changes: 1 addition & 1 deletion docs/inference-providers/providers/replicate.md
@@ -70,6 +70,6 @@ Find out more about Text To Video [here](../tasks/text_to_video).

<InferenceSnippet
pipeline=text-to-video
providersMapping={ {"replicate":{"modelId":"Wan-AI/Wan2.2-TI2V-5B","providerModelId":"wan-video/wan-2.2-5b-fast"} } }
providersMapping={ {"replicate":{"modelId":"Wan-AI/Wan2.2-T2V-A14B","providerModelId":"wan-video/wan-2.2-t2v-fast"} } }
/>

4 changes: 2 additions & 2 deletions docs/inference-providers/providers/together.md
@@ -50,7 +50,7 @@ Find out more about Chat Completion (LLM) [here](../tasks/chat-completion).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"together":{"modelId":"openai/gpt-oss-120b","providerModelId":"openai/gpt-oss-120b"} } }
providersMapping={ {"together":{"modelId":"openai/gpt-oss-20b","providerModelId":"OpenAI/gpt-oss-20B"} } }
conversational />


@@ -70,7 +70,7 @@ Find out more about Text Generation [here](../tasks/text_generation).

<InferenceSnippet
pipeline=text-generation
providersMapping={ {"together":{"modelId":"openai/gpt-oss-120b","providerModelId":"openai/gpt-oss-120b"} } }
providersMapping={ {"together":{"modelId":"openai/gpt-oss-20b","providerModelId":"OpenAI/gpt-oss-20B"} } }
/>

