
Commit 9ab4248

Merge pull request #2707 from AI-Hypercomputer:aireen/arrayrecord_doc
PiperOrigin-RevId: 833568709
2 parents 8104f65 + 7da33be commit 9ab4248


1 file changed: +1 -0 lines changed


docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state

## Using Grain
1. Grain currently supports two data formats: [ArrayRecord](https://github.com/google/array_record) (random access) and [Parquet](https://arrow.apache.org/docs/python/parquet.html) (partial random-access through row groups). Only the ArrayRecord format supports the global shuffle mentioned above. For converting a dataset into ArrayRecord, see [Apache Beam Integration for ArrayRecord](https://github.com/google/array_record/tree/main/beam). Additionally, other random access data sources can be supported via a custom [data source](https://google-grain.readthedocs.io/en/latest/data_sources.html) class.
+ * **Community Resource**: The MaxText community has created [ArrayRecord Documentation](https://array-record.readthedocs.io/). Note: we appreciate this community contribution, but it has not yet been verified by the MaxText or ArrayRecord developers.
2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
```sh
bash tools/setup/setup_gcsfuse.sh \
