Description
@katamartin and I have been making progress integrating the proxy into the data viewer. We intend to use the proxy for on-the-fly rechunking of datasets for visualization. The results look promising, and performance is satisfactory (for small datasets and datasets hosted in AWS S3) even without caching on the backend:
- https://storage.googleapis.com/carbonplan-maps/ncview/demo/single_timestep/air_temperature.zarr
- s3://carbonplan-data-viewer/demo/MURSST.zarr (the original chunk size is roughly 1.21 GB)
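To put chunk sizes like the one quoted above in perspective, the sketch below estimates the uncompressed size of a chunk from its shape and element size, and counts how many chunks are needed to tile an array. The array and chunk shapes are hypothetical and chosen only for illustration; the actual chunk shapes of the stores listed here are not stated in this thread.

```python
import math


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return math.prod(chunk_shape) * itemsize


def n_chunks(array_shape, chunk_shape):
    """Number of chunks needed to tile an array (partial edge chunks count as one)."""
    return math.prod(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))


# Hypothetical example: a float32 (4-byte) array of shape (1000, 1000),
# stored as one large chunk versus rechunked into 10 x 10 tiles.
shape = (1000, 1000)
for chunks in [(1000, 1000), (10, 10)]:
    print(chunks, chunk_nbytes(chunks, 4), "bytes per chunk,",
          n_chunks(shape, chunks), "chunks")
```

Smaller chunks let a viewer that needs only a small window transfer far less data per read, at the cost of many more chunk requests overall.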

- Retrieving data from stores hosted outside of S3 takes a long time (as expected). The following are timings for gs://ldeo-glaciology/bedmachine/bm.zarr (the original chunk size is roughly 35 MB)
There is still more work to do to ensure seamless interoperability with existing zarr clients. To illustrate this, the snippet below shows how the proxy can be used via the zarr Python library.
- instantiate a zarr store via fsspec
In [21]: url = 'http://127.0.0.1:8000/storage.googleapis.com/ldeo-glaciology/bedmachine/bm.zarr'
In [22]: store = zarr.storage.FSStore(url, client_kwargs={'headers': {"chunks": "10,10"}})
In [23]: store['.zattrs']
Out[23]: b'{"Author":"Mathieu Morlighem","Conventions":"CF-1.7","Data_citation":"Morlighem M. et al., (2019), Deep glacial troughs and stabilizing ridges unveiled beneath the margins of the Antarctic ice sheet, Nature Geoscience (accepted)","Notes":"Data processed at the Department of Earth System Science, University of California, Irvine","Projection":"Polar Stereographic South (71S,0E)","Title":"BedMachine Antarctica","false_easting":[0.0],"false_northing":[0.0],"grid_mapping_name":"polar_stereographic","ice_density (kg m-3)":[917.0],"inverse_flattening":[298.2794050428205],"latitude_of_projection_origin":[-90.0],"license":"No restrictions on access or use","no_data":[-9999.0],"nx":[13333.0],"ny":[13333.0],"proj4":"+init=epsg:3031","sea_water_density (kg m-3)":[1027.0],"semi_major_axis":[6378273.0],"spacing":[500],"standard_parallel":[-71.0],"straight_vertical_longitude_from_pole":[0.0],"version":"05-Nov-2019 (v1.38)","xmin":[-3333000],"ymax":[3333000]}'
- open an array within the zarr store
In [25]: arr = zarr.open(store, path='/bed')
In [27]: arr
Out[27]: <zarr.core.Array '/bed' (13333, 13333) float32>
- retrieve some data
In [28]: arr[:10, :10]
Out[28]:
array([[-5914.538 , -5919.3955, -5924.865 , -5930.3765, -5935.8853,
-5941.0205, -5945.997 , -5950.359 , -5954.3784, -5958.045 ],
[-5910.384 , -5915.8296, -5921.3076, -5927.158 , -5932.7554,
-5938.29 , -5943.1704, -5947.785 , -5951.881 , -5955.54 ],
[-5906.422 , -5911.8516, -5917.63 , -5923.6133, -5929.573 ,
-5935.029 , -5940.271 , -5944.9736, -5949.237 , -5952.898 ],
[-5902.613 , -5908.093 , -5914.061 , -5920.044 , -5925.9707,
-5931.7017, -5937.0083, -5941.9688, -5946.243 , -5950.265 ],
[-5899.054 , -5904.7085, -5910.5 , -5916.532 , -5922.4585,
-5928.2095, -5933.64 , -5938.608 , -5943.3335, -5947.362 ],
[-5895.9683, -5901.283 , -5907.2 , -5913.2 , -5919.1235,
-5924.6836, -5930.077 , -5935.3584, -5940.0796, -5944.544 ],
[-5892.8423, -5898.332 , -5904.08 , -5910.0503, -5915.838 ,
-5921.344 , -5926.583 , -5931.785 , -5936.9224, -5941.452 ],
[-5890.067 , -5895.4604, -5901.1587, -5906.9365, -5912.6836,
-5918.2617, -5923.3687, -5928.1724, -5933.3447, -5937.538 ],
[-5887.37 , -5892.716 , -5898.2046, -5903.9224, -5909.691 ,
-5915.144 , -5920.3755, -5925.193 , -5928.876 , -5933.021 ],
[-5884.786 , -5890.015 , -5895.455 , -5900.958 , -5906.5366,
-5912.1353, -5917.4043, -5921.5264, -5925.1343, -5928.5483]],
dtype=float32)
If we attempt to access a variable whose dimensionality does not match the chunks specified in the HTTP headers, the request fails. For instance, in our store, x is 1D, but the chunks we specified earlier are 10,10, as defined in zarr.storage.FSStore(url, client_kwargs={'headers': {"chunks": "10,10"}}). The returned metadata for x lists two chunk entries for a one-dimensional shape, and fetching its first chunk returns a 500 error:
In [29]: store['x/.zarray']
Out[29]: b'{"chunks":[10,10],"compressor":null,"dtype":"<i4","fill_value":null,"filters":[],"order":"C","shape":[13333],"zarr_format":2}'
In [30]: store['x/0']
---------------------------------------------------------------------------
ClientResponseError Traceback (most recent call last)
Cell In[30], line 1
----> 1 store['x/0']
ClientResponseError: 500, message='Internal Server Error', url=URL('http://127.0.0.1:8000/storage.googleapis.com/ldeo-glaciology/bedmachine/bm.zarr/x/0')
It would be nice if there were a way to override the headers via fsspec.
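Until headers can be overridden per request, one possible workaround is to open each array through its own store, with a `chunks` header sized to that array's rank. The sketch below is a minimal illustration, not part of the proxy or of fsspec: `rank_mismatch`, `chunks_header`, and `open_with_matching_chunks` are hypothetical helpers, and it assumes zarr v2 (`zarr.storage.FSStore`, as in the session above) and that the proxy accepts a `chunks` header with a single entry for 1D arrays.

```python
import json


def rank_mismatch(zarray_bytes):
    """Return True if the metadata has a different number of chunk entries than dimensions."""
    meta = json.loads(zarray_bytes)
    return len(meta["chunks"]) != len(meta["shape"])


def chunks_header(shape, chunk_len=10):
    """Build a `chunks` header value with one entry per array dimension."""
    return ",".join(str(chunk_len) for _ in shape)


def open_with_matching_chunks(url, name, shape, chunk_len=10):
    """Open one array through its own store so the header matches its rank.

    Assumption: the proxy accepts a `chunks` header whose number of entries
    equals the number of dimensions of the requested array.
    """
    import zarr  # lazy import: the helpers above run without zarr installed

    headers = {"chunks": chunks_header(shape, chunk_len)}
    store = zarr.storage.FSStore(url, client_kwargs={"headers": headers})
    return zarr.open(store, path=name)


# Metadata returned for `x` in the session above: two chunk entries, one dimension.
x_meta = b'{"chunks":[10,10],"compressor":null,"dtype":"<i4","fill_value":null,"filters":[],"order":"C","shape":[13333],"zarr_format":2}'
print(rank_mismatch(x_meta))          # True
print(chunks_header([13333]))         # 10
print(chunks_header([13333, 13333]))  # 10,10
```

With these helpers, `x` would be opened with `open_with_matching_chunks(url, 'x', [13333])` and `bed` with `open_with_matching_chunks(url, 'bed', [13333, 13333])`. The cost is one store per array rank, which is why a per-request header override in fsspec would be cleaner.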
I am also CC-ing some folks (@freeman-lab, @norlandrhagen, @jhamman, @rabernat) who might be interested in this, to keep them in the loop on our progress.

