Thanks very much for including the JSON kerchunk output. If it's not too much trouble, could you please specify smaller chunks in the NetCDF file?
Looking at the output at https://mnemosyne.somisana.ac.za/somisana/opendrift/20230904/test_east_coast_blowout, I see in the JSON file that the smallest retrievable chunk is 3.7MB. If I remember correctly from specifying chunk sizes for the tier3 output, there is a tradeoff between write speed and chunk size (more chunks = longer to create the NetCDF file).
It's easy to do: https://github.com/SAEON/somisana/blob/stable/toolkit/cli/applications/croco/regrid_tier3/__init__.py#L258-L289
```python
# Explicitly set chunk sizes of some dimensions
chunksizes = {
    "time": 24,
    "depth": 1,
}

# For data_vars, set chunk sizes for each dimension:
# either the override specified in "chunksizes"
# or the full length of the dimension
default_chunksizes = {dim: len(data_out[dim]) for dim in data_out.dims}
encoding = {
    var: {
        "dtype": "float32",
        "chunksizes": [
            chunksizes.get(dim, default_chunksizes[dim])
            for dim in data_out[var].dims
        ],
    }
    for var in data_out.data_vars
}

# Adjust for non-chunked variables - I can't remember why this
# doesn't override the 'chunksizes' array above
encoding["time"] = {"dtype": "i4"}
encoding["latitude"] = {"dtype": "float32"}
encoding["longitude"] = {"dtype": "float32"}
encoding["depth"] = {"dtype": "float32"}

log("Generating NetCDF data")
write_op = data_out.to_netcdf(
    output,
    encoding=encoding,
    mode="w",
    compute=False,
)
```
Here is an example of the output: https://mnemosyne.somisana.ac.za/somisana/algoa-bay/5-day-forecast/202309/20230906_hourly_avg_t3.kerchunk.json
I can make a GET request to fetch all salt values at a particular depth/time for all lat/longs.
(salt is defined over [time, depth, latitude, longitude] - "salt/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"time\",\"depth\",\"latitude\",\"longitude\"],\"long_name\":\"averaged salinity\",\"standard_name\":\"sea_water_salinity\",\"units\":\"PSU\"}" )
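For context, this is roughly how a client turns the kerchunk JSON into one of those GET requests: each chunk key like `salt/t.d.y.x` maps to a `[url, offset, length]` triple, so a single ranged request fetches a whole depth/time slice. The `refs` dict below is a made-up miniature of the real file, and `chunk_range` is a hypothetical helper, not part of any library:

```python
import json

# A miniature kerchunk-style reference mapping, shaped like the real
# JSON linked above; the byte offsets and lengths here are invented.
refs = {
    "salt/.zattrs": json.dumps(
        {"_ARRAY_DIMENSIONS": ["time", "depth", "latitude", "longitude"]}
    ),
    # Each chunk key is "var/t.d.y.x" and maps to [url, offset, length]
    "salt/0.0.0.0": ["20230906_hourly_avg_t3.nc", 1024, 2160000],
    "salt/1.0.0.0": ["20230906_hourly_avg_t3.nc", 2161024, 2160000],
}

def chunk_range(refs, var, indices):
    """Look up the (url, offset, length) triple for one chunk of `var`."""
    key = f"{var}/{'.'.join(str(i) for i in indices)}"
    url, offset, length = refs[key]
    return url, offset, length

# Locate the chunk holding time index 1, depth index 0, all lat/lon;
# a single HTTP GET with "Range: bytes=offset-(offset+length-1)"
# then retrieves exactly that slice.
url, offset, length = chunk_range(refs, "salt", (1, 0, 0, 0))
print(url, offset, length)
```

In practice tools like fsspec/zarr do this resolution automatically from the reference file; the point is that smaller chunks mean smaller ranged requests.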
