Commit 3a7229d

Add fsspec explicitly to tutorial

Author: Martin Durant
1 parent 5c52819, commit 3a7229d

2 files changed, +35 -4 lines changed

docs/tutorial.rst

Lines changed: 33 additions & 2 deletions
@@ -854,13 +854,44 @@ please raise an issue on the GitHub issue tracker with any profiling data you
 can provide, as there may be opportunities to optimise further either within
 Zarr or within the mapping interface to the storage.
 
+IO with ``fsspec``
+~~~~~~~~~~~~~~~~~~
+
+As of version 2.5, zarr supports passing URLs directly to `fsspec`_,
+and having it create the "mapping" instance automatically. This means, that
+for all of the backend storage implementations `supported by fsspec`_,
+you can skip importing and configuring the storage explicitly.
+For example::
+
+    >>> g = zarr.open_group("s3://zarr-demo/store", storage_options={'anon': True})
+    >>> g['foo/bar/baz'][:].tobytes()
+    b'Hello from the cloud!'
+
+The provision of the protocol specifier "s3://" will select the correct backend.
+Notice the kwargs ``storage_options``, used to pass parameters to that backend.
+
+As of version 2.6, write mode and complex URLs are also supported, such that in::
+
+    >>> g = zarr.open_group("simplecache::s3://zarr-demo/store",
+    ... storage_options={"s3": {'anon': True}})
+    >>> g['foo/bar/baz'][:].tobytes() # downloads target file
+    b'Hello from the cloud!'
+    >>> g['foo/bar/baz'][:].tobytes() # uses cached file
+    b'Hello from the cloud!'
+
+The second invocation here will be much faster. Note that the ``storage_options``
+have become more complex here, to account for the two parts of the supplied
+URL.
+
+.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/
+
+.. _supported by fsspec: https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
+
 .. _tutorial_copy:
 
 Consolidating metadata
 ~~~~~~~~~~~~~~~~~~~~~~
 
-(This is an experimental feature.)
-
 Since there is a significant overhead for every connection to a cloud object
 store such as S3, the pattern described in the previous section may incur
 significant latency while scanning the metadata of the array hierarchy, even
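
Note on the tutorial text added above: the second example nests ``storage_options`` under an ``"s3"`` key because the URL now names two protocols, and each one can take its own parameters. The sketch below only illustrates that idea in plain Python. The helper ``options_per_protocol`` is hypothetical and is not how fsspec parses chained URLs internally; it simply splits on ``::`` and looks up options by protocol name.

```python
def options_per_protocol(url, storage_options):
    """Hypothetical helper: pair each protocol in a chained URL with its options."""
    result = {}
    for part in url.split("::"):
        # "s3://zarr-demo/store" -> "s3"; a bare "simplecache" has no "://"
        protocol = part.split("://", 1)[0]
        # Protocols without an entry get an empty options dict
        result[protocol] = storage_options.get(protocol, {})
    return result


url = "simplecache::s3://zarr-demo/store"
storage_options = {"s3": {"anon": True}}
per_protocol = options_per_protocol(url, storage_options)
print(per_protocol)
```

Under these assumptions, ``simplecache`` receives no options and ``s3`` receives ``{"anon": True}``, which matches the shape of the dictionary passed in the tutorial example.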

zarr/storage.py

Lines changed: 2 additions & 2 deletions
@@ -1021,14 +1021,14 @@ def __init__(self, url, normalize_keys=True, key_separator='.',
                  exceptions=(KeyError, PermissionError, IOError),
                  **storage_options):
         import fsspec
-        self.path = url
         self.normalize_keys = normalize_keys
         self.key_separator = key_separator
         self.map = fsspec.get_mapper(url, **storage_options)
         self.fs = self.map.fs  # for direct operations
+        self.path = self.fs._strip_protocol(url)
         self.mode = mode
         self.exceptions = exceptions
-        if self.fs.exists(url) and not self.fs.isdir(url):
+        if self.fs.exists(self.path) and not self.fs.isdir(self.path):
             raise FSPathExistNotDir(url)
 
     def _normalize_key(self, key):
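
Note on the ``zarr/storage.py`` change above: the existence check now runs against a path with the protocol removed instead of the raw URL. The following is a minimal stdlib-only sketch of that pattern, not the fsspec implementation. ``strip_protocol`` and ``is_existing_non_directory`` are hypothetical stand-ins, ``os.path`` plays the role of the filesystem object, and only a single leading ``scheme://`` prefix is handled (chained URLs are out of scope here).

```python
import os
import tempfile


def strip_protocol(url):
    """Hypothetical stand-in: drop one leading "scheme://" prefix, if present."""
    _scheme, sep, rest = url.partition("://")
    return rest if sep else url


def is_existing_non_directory(url):
    """Same shape as the check in the diff, using os.path on the stripped path."""
    path = strip_protocol(url)
    return os.path.exists(path) and not os.path.isdir(path)


with tempfile.TemporaryDirectory() as tmp:
    # Create a plain file, which should not be accepted as a store root
    file_path = os.path.join(tmp, "not_a_dir")
    with open(file_path, "w") as f:
        f.write("x")

    dir_is_rejected = is_existing_non_directory("file://" + tmp)
    file_is_rejected = is_existing_non_directory("file://" + file_path)

print(dir_is_rejected, file_is_rejected)
```

In this sketch an existing directory passes the check and an existing regular file is flagged, which is the condition under which the real code raises ``FSPathExistNotDir``.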
