docs/source/hns_buckets.rst (1 addition, 1 deletion)
@@ -53,7 +53,7 @@ Important Differences to Keep in Mind
While ``gcsfs`` aims to abstract the differences via the ``fsspec`` API, you should be aware of standard HNS limitations imposed by the Google Cloud Storage API:
1. **Implicit directories:** In standard GCS, you can create an object ``a/b/c.txt`` without the directories ``a/`` or ``a/b/`` physically existing. In HNS, the parent folder resources must exist (or be created) before the object can be written. ``gcsfs`` handles parent folder creation natively under the hood.
- 2. **``mkdir`` behavior:** Previously, in a flat namespace, calling ``mkdir`` on a path could only ensure the underlying bucket exists. With HNS enabled, calling ``mkdir`` will create an actual folder resource in GCS. Furthermore, if you want to create nested folders (eg: bucket/a/b/c/d) pass ``create_parents=True``, it will physically create all intermediate folder resources along the specified path.
+ 2. **``mkdir`` behavior:** Previously, in a flat namespace, calling ``mkdir`` on a path could only ensure the underlying bucket exists. With HNS enabled, calling ``mkdir`` will create an actual folder resource in GCS. Furthermore, if you want to create nested folders (eg: bucket/a/b/c/d), pass ``create_parents=True``, it will physically create all intermediate folder resources along the specified path.
3. **No mixing or toggling:** You cannot toggle HNS on an existing flat-namespace bucket. You must create a new HNS bucket and migrate your data.
4. **Object naming:** Object names in HNS cannot end with a slash (``/``) unless without the creation of physical folder resources.
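Reviewer note: the ``mkdir`` item in this hunk describes folder resources that must physically exist along a nested path. The sketch below is a toy in-memory model of that rule, not the ``gcsfs`` or GCS API. The class ``ToyHnsBucket`` and its ``FileNotFoundError`` failure mode are assumptions made purely for illustration; only the ``create_parents`` parameter name comes from the documentation text.

```python
class ToyHnsBucket:
    """Toy model of explicit folder resources (hypothetical, not gcsfs)."""

    def __init__(self):
        # Each entry is a folder resource that "physically" exists, e.g. "a/b".
        self.folders = set()

    def mkdir(self, path, create_parents=True):
        parts = path.strip("/").split("/")
        # Every proper prefix of the path is a parent folder.
        parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
        missing = [p for p in parents if p not in self.folders]
        if missing and not create_parents:
            # Assumed failure mode for this sketch only.
            raise FileNotFoundError(f"missing parent folder: {missing[0]}")
        # Create all intermediate folders, then the target folder itself.
        self.folders.update(parents)
        self.folders.add("/".join(parts))


bucket = ToyHnsBucket()
bucket.mkdir("a/b/c/d", create_parents=True)
print(sorted(bucket.folders))  # ['a', 'a/b', 'a/b/c', 'a/b/c/d']
```

In this model, one call with ``create_parents=True`` leaves four folder entries behind, which mirrors the "all intermediate folder resources" wording in the changed line.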
docs/source/rapid_storage_support.rst (6 additions, 6 deletions)
@@ -99,14 +99,14 @@ The table below highlights how core filesystem and file-level operations change
- Closes streams but leaves the object unfinalized (appendable) by default. Use ``finalize_on_close=True`` when opening file or calling ``close()`` or use ``.commit()`` to finalize. Note that ``autocommit`` does not work for Rapid buckets.
* - **mv**
- Object-level copy-and-delete logic.
- - Uses native, atomic ``rename_folder`` API for folders. All directory semantics described in the :doc:`HNS documentation <hns_buckets>` also apply For Rapid.
+ - Uses native, atomic ``rename_folder`` API for folders. All directory semantics described in the :doc:`HNS documentation <hns_buckets>` also apply for Rapid.
Performance Benchmarks
----------------------
Rapid Storage via gRPC significantly improves read and write performance compared to standard HTTP regional buckets.
- Here are the microbenchmarks
- Rapid drastically outperform standard buckets across different read patterns, including both sequential and random reads, as well as for writes.
+ Here are the microbenchmarks.
+ Rapid drastically outperforms standard buckets across different read patterns, including both sequential and random reads, as well as for writes.
To reproduce using more combinations, please see the `gcsfs/perf/microbenchmarks <https://github.com/fsspec/gcsfs/tree/main/gcsfs/tests/perf/microbenchmarks>`_ directory.
.. list-table:: **Sequential Reads**
@@ -182,11 +182,11 @@ Because `gcsfs` relies on gRPC to interact with Rapid storage, developers must b
However, gRPC Python wraps gRPC core, which uses internal multithreading for performance, and hence doesn't support `fork()`.
Using `fork()` for multi-processing can lead to hangs or segmentation faults when child processes attempt to use the network layer
where the application creates gRPC Python objects (e.g., client channel)before invoking `fork()`. However, if the application only
- instantiate gRPC Python objects after calling `fork()`, then `fork()` will work normally, since there is no C extension binding at this point.
+ instantiates gRPC Python objects after calling `fork()`, then `fork()` will work normally, since there is no C extension binding at this point.
**Alternative: Use `forkserver` or `spawn` instead of `fork`**
- To resolve `fork` issue, you can use `forkserver` or `spawn` instead of `fork` where the child process will create their own grpc connection.
+ To resolve the `fork` issue, you can use `forkserver` or `spawn` instead of `fork` where the child processes will create their own gRPC connections.
You can configure Python's `multiprocessing` module to override the start method as shown in the snippet below.
For example while using data loaders in frameworks like PyTorch
(e.g., `torch.utils.data.DataLoader` with `num_workers > 0`) alongside `gcsfs` with Rapid storage:
@@ -198,7 +198,7 @@ For example while using data loaders in frameworks like PyTorch
# This must be done before other imports or initialization
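Reviewer note: the last hunk cuts off before the documented snippet, so its exact contents are not visible here. As a hedged illustration of the `forkserver`/`spawn` advice, the sketch below uses the standard library's `multiprocessing.get_context`, which selects a start method for one pool without changing the process-wide default. The helper names `start_context`, `square`, and `run` are hypothetical, and `square` is only a stand-in for work that would create its own gRPC-backed client inside the child.

```python
import multiprocessing as mp


def start_context(method="spawn"):
    """Return a multiprocessing context that avoids the fork start method."""
    return mp.get_context(method)


def square(x):
    # Stand-in for real work; a child would build its own client here,
    # after the process has started, rather than inheriting one.
    return x * x


def run(values, processes=2):
    """Map `square` over `values` in worker processes started via spawn."""
    ctx = start_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

With `spawn`, the parent module is re-imported in each child, so call `run([...])` from under an `if __name__ == "__main__":` guard in a script file. A scoped context like this is an alternative to overriding the global start method; which of the two fits better depends on whether other libraries in the process also create worker processes.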