`docs/hub/datasets-adding.md` (+4 −4)

```diff
@@ -63,11 +63,11 @@ Adding a Dataset card is super valuable for helping users find your dataset and
 ## Using the `huggingface_hub` client library

-The rich features set in the `huggingface_hub` library allows you to manage repositories, including creating repos and uploading datasets to the Hub. Visit [the client library's documentation](https://huggingface.co/docs/huggingface_hub/index) to learn more.
+The rich features set in the `huggingface_hub` library allows you to manage repositories, including creating repos and uploading datasets to the Hub. Visit [the client library's documentation](/docs/huggingface_hub/index) to learn more.

 ## Using other libraries

-Some libraries like [🤗 Datasets](../datasets/index), [Pandas](https://pandas.pydata.org/), [Polars](https://pola.rs), [Dask](https://www.dask.org/) or [DuckDB](https://duckdb.org/) can upload files to the Hub.
+Some libraries like [🤗 Datasets](/docs/datasets/index), [Pandas](https://pandas.pydata.org/), [Polars](https://pola.rs), [Dask](https://www.dask.org/) or [DuckDB](https://duckdb.org/) can upload files to the Hub.

 See the list of [Libraries supported by the Datasets Hub](./datasets-libraries) for more information.

 ## Using Git
@@ -107,8 +107,8 @@ After uploading your dataset, make sure the Dataset Viewer correctly shows your
 ## Large scale datasets

-The Hugging Face Hub supports large scale datasets, usually uploaded in Parquet (e.g. via `push_to_hub()` using [🤗 Datasets](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.push_to_hub)) or [WebDataset](https://github.com/webdataset/webdataset) format.
+The Hugging Face Hub supports large scale datasets, usually uploaded in Parquet (e.g. via `push_to_hub()` using [🤗 Datasets](/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.push_to_hub)) or [WebDataset](https://github.com/webdataset/webdataset) format.

 You can upload large scale datasets at high speed using the `huggingface-hub` library.

-See [how to upload a folder by chunks](https://huggingface.co/docs/huggingface_hub/guides/upload#upload-a-folder-by-chunks), the [tips and tricks for large uploads](https://huggingface.co/docs/huggingface_hub/guides/upload#tips-and-tricks-for-large-uploads) and the [repository limitations and recommendations](./repositories-recommendations).
+See [how to upload a folder by chunks](/docs/huggingface_hub/guides/upload#upload-a-folder-by-chunks), the [tips and tricks for large uploads](/docs/huggingface_hub/guides/upload#tips-and-tricks-for-large-uploads) and the [repository limitations and recommendations](./repositories-recommendations).
```
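The `datasets-adding.md` hunks above cover creating a repo and uploading with the `huggingface_hub` client. A minimal sketch of that flow, assuming a hypothetical repo id and local folder (neither appears in the diff):

```python
def upload_dataset_folder(local_dir, repo_id, token=None):
    """Create a dataset repo if needed, then upload a local folder to it."""
    # Imported lazily so this sketch stays importable even when
    # huggingface_hub is not installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # exist_ok=True makes the call idempotent if the repo already exists.
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    # upload_folder commits every file under local_dir to the dataset repo.
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="dataset")
```

For example, `upload_dataset_folder("./data", "username/my-dataset")`. For very large datasets, the chunked-upload guide linked in the diff is the better fit.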
`docs/hub/datasets-dask.md` (+5 −5)

````diff
@@ -1,23 +1,23 @@
 # Dask

 [Dask](https://github.com/dask/dask) is a parallel and distributed computing library that scales the existing Python and PyData ecosystem.
-Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write remote data, you can use the Hugging Face paths ([`hf://`](https://huggingface.co/docs/huggingface_hub/guides/hf_file_system#integrations)) to read and write data on the Hub:
+Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write remote data, you can use the Hugging Face paths ([`hf://`](/docs/huggingface_hub/guides/hf_file_system#integrations)) to read and write data on the Hub:

-First you need to [Login with your Hugging Face account](../huggingface_hub/quick-start#login), for example using:
+First you need to [Login with your Hugging Face account](/docs/huggingface_hub/quick-start#login), for example using:

 ```
 huggingface-cli login
 ```

-Then you can [Create a dataset repository](../huggingface_hub/quick-start#create-a-repository), for example using:
+Then you can [Create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:

-For more information on the Hugging Face paths and how they are implemented, please refer to the [the client library's documentation on the HfFileSystem](https://huggingface.co/docs/huggingface_hub/guides/hf_file_system).
+For more information on the Hugging Face paths and how they are implemented, please refer to the [the client library's documentation on the HfFileSystem](/docs/huggingface_hub/guides/hf_file_system).
````
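The Dask page diffed above reads and writes Hub data through `hf://` paths. A hedged sketch of that write path, assuming a hypothetical repo id and that `dask`, `pandas`, and `huggingface_hub` are installed and you are logged in:

```python
def to_hub_parquet(pandas_df, repo_id, npartitions=2):
    """Write a pandas DataFrame to a Hub dataset repo as partitioned Parquet."""
    # Lazy import keeps this sketch importable without the optional dependency.
    import dask.dataframe as dd

    ddf = dd.from_pandas(pandas_df, npartitions=npartitions)
    # fsspec resolves hf:// through huggingface_hub's HfFileSystem,
    # so this writes Parquet part files directly into the dataset repo.
    ddf.to_parquet(f"hf://datasets/{repo_id}/data")
```

Reading back is symmetric, e.g. `dd.read_parquet(f"hf://datasets/{repo_id}/data")`.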
`docs/hub/datasets-data-files-configuration.md` (+1 −1)

```diff
@@ -7,7 +7,7 @@ Often it is as simple as naming your data files according to their split names,
 ## What are splits and subsets?

-Machine learning datasets typically have splits and may also have subsets. A dataset is generally made of _splits_ (e.g. `train` and `test`) that are used during different stages of training and evaluating a model. A _subset_ (also called _configuration_) is a sub-dataset contained within a larger dataset. Subsets are especially common in multilingual speech datasets where there may be a different subset for each language. If you're interested in learning more about splits and subsets, check out the [Splits and subsets](https://huggingface.co/docs/datasets-server/configs_and_splits) guide!
+Machine learning datasets typically have splits and may also have subsets. A dataset is generally made of _splits_ (e.g. `train` and `test`) that are used during different stages of training and evaluating a model. A _subset_ (also called _configuration_) is a sub-dataset contained within a larger dataset. Subsets are especially common in multilingual speech datasets where there may be a different subset for each language. If you're interested in learning more about splits and subsets, check out the [Splits and subsets](/docs/datasets-server/configs_and_splits) guide!
```
`docs/hub/datasets-download-stats.md` (+1 −1)

```diff
@@ -4,5 +4,5 @@
 The Hub provides download stats for all datasets loadable via the `datasets` library. To determine the number of downloads, the Hub counts every time `load_dataset` is called in Python, excluding Hugging Face's CI tooling on GitHub. No information is sent from the user, and no additional calls are made for this. The count is done server-side as we serve files for downloads. This means that:

-* The download count is the same regardless of whether the data is directly stored on the Hub repo or if the repository has a [script](https://huggingface.co/docs/datasets/dataset_script) to load the data from an external source.
+* The download count is the same regardless of whether the data is directly stored on the Hub repo or if the repository has a [script](/docs/datasets/dataset_script) to load the data from an external source.
 * If a user manually downloads the data using tools like `wget` or the Hub's user interface (UI), those downloads will not be included in the download count.
```
`docs/hub/datasets-downloading.md` (+1 −1)

````diff
@@ -16,7 +16,7 @@ If a dataset on the Hub is tied to a [supported library](./datasets-libraries),
 ## Using the Hugging Face Client Library

-You can use the [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub) library to create, delete, update and retrieve information from repos. You can also download files from repos or integrate them into your library! For example, you can quickly load a CSV dataset with a few lines using Pandas.
+You can use the [`huggingface_hub`](/docs/huggingface_hub) library to create, delete, update and retrieve information from repos. You can also download files from repos or integrate them into your library! For example, you can quickly load a CSV dataset with a few lines using Pandas.

 This command automatically retrieves the stored token from `~/.cache/huggingface/token`.

-First you need to [Login with your Hugging Face account](../huggingface_hub/quick-start#login), for example using:
+First you need to [Login with your Hugging Face account](/docs/huggingface_hub/quick-start#login), for example using:

 ```bash
 huggingface-cli login
@@ -43,4 +43,4 @@ Alternatively, you can set your Hugging Face token as an environment variable:
 export HF_TOKEN="hf_xxxxxxxxxxxxx"
 ```

-For more information on authentication, see the [Hugging Face authentication](https://huggingface.co/docs/huggingface_hub/main/en/quick-start#authentication) documentation.
+For more information on authentication, see the [Hugging Face authentication](/docs/huggingface_hub/main/en/quick-start#authentication) documentation.
````
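The downloading page diffed above mentions loading a CSV dataset "with a few lines using Pandas". A hedged sketch of what that looks like, assuming a hypothetical repo id and filename and that `pandas` plus `huggingface_hub` are installed:

```python
def load_hub_csv(repo_id, filename):
    """Read a CSV file from a Hub dataset repo into a pandas DataFrame."""
    # Lazy import keeps the sketch importable without pandas installed.
    import pandas as pd

    # pandas delegates hf:// URLs to fsspec and huggingface_hub's HfFileSystem;
    # private repos additionally require being logged in.
    return pd.read_csv(f"hf://datasets/{repo_id}/{filename}")
```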
`docs/hub/datasets-duckdb-select.md` (+1 −1)

````diff
@@ -49,7 +49,7 @@ SELECT COUNT(*) FROM 'hf://datasets/jamescalam/world-cities-geo/*.jsonl';
 ```

-You can also query Parquet files using the `read_parquet` function (or its alias `parquet_scan`). This function, along with other [parameters]((https://duckdb.org/docs/data/parquet/overview.html#parameters)), provides flexibility in handling Parquet files specially if they dont have a `.parquet` extension. Let's explore these functions using the auto-converted Parquet files from the same dataset.
+You can also query Parquet files using the `read_parquet` function (or its alias `parquet_scan`). This function, along with other [parameters](https://duckdb.org/docs/data/parquet/overview.html#parameters), provides flexibility in handling Parquet files specially if they dont have a `.parquet` extension. Let's explore these functions using the auto-converted Parquet files from the same dataset.

 Select using [read_parquet](https://duckdb.org/docs/guides/file_formats/query_parquet.html) function:
````
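The DuckDB page diffed above queries Hub files with `read_parquet`. A minimal Python-side sketch of the same idea, assuming `duckdb` is installed; the path argument is a placeholder, not a file from the diff:

```python
def count_parquet_rows(path):
    """Count rows in Parquet files (even extension-less ones) via read_parquet."""
    # Lazy import keeps the sketch importable without duckdb installed.
    import duckdb

    # read_parquet also accepts hf:// paths and glob patterns,
    # e.g. "hf://datasets/user/repo/*.parquet".
    return duckdb.sql(f"SELECT COUNT(*) FROM read_parquet('{path}')").fetchone()[0]
```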