[OpenDrift] Request for smaller chunks #46

@zachsa

Thanks very much for including the JSON kerchunk output. If it's not too much trouble, could you please specify smaller chunks in the NetCDF file?

Looking at the output at https://mnemosyne.somisana.ac.za/somisana/opendrift/20230904/test_east_coast_blowout, I see in the JSON file that the smallest retrievable chunk is 3.7 MB. If I remember correctly from specifying chunk sizes for the tier3 output, there is a tradeoff between write speed and chunk size (more chunks = longer to create the NetCDF file).
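One way to check the chunk sizes recorded in a kerchunk JSON file is to read the byte length stored in each reference. This is a minimal sketch, assuming the kerchunk version 1 layout in which each chunk key under "refs" maps to [url, offset, length]. The sample references, URL, and the helper name chunk_lengths are made up for illustration and are not taken from the actual file.

```python
import json

# Made-up sample in the kerchunk version 1 layout (not real values from the file above).
SAMPLE = json.dumps({
    "version": 1,
    "refs": {
        ".zgroup": "{\"zarr_format\":2}",
        "salt/.zattrs": "{\"units\":\"PSU\"}",
        "salt/0.0.0.0": ["https://example.org/file.nc", 1000, 3700000],
        "salt/1.0.0.0": ["https://example.org/file.nc", 3701000, 3500000],
    },
})


def chunk_lengths(kerchunk_json: str, variable: str) -> dict:
    """Return {chunk_key: byte_length} for one variable's chunk references."""
    refs = json.loads(kerchunk_json)["refs"]
    prefix = variable + "/"
    lengths = {}
    for key, ref in refs.items():
        # Chunk references are [url, offset, length] lists; metadata entries are strings.
        if key.startswith(prefix) and isinstance(ref, list) and len(ref) == 3:
            lengths[key] = ref[2]
    return lengths


lengths = chunk_lengths(SAMPLE, "salt")
smallest_mb = min(lengths.values()) / 1e6
largest_mb = max(lengths.values()) / 1e6
print(f"smallest chunk: {smallest_mb:.1f} MB, largest chunk: {largest_mb:.1f} MB")
```

The reported lengths are the bytes stored on disk, so with compression enabled they can be smaller than the in-memory size of the same chunk.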

It's easy to do: https://github.com/SAEON/somisana/blob/stable/toolkit/cli/applications/croco/regrid_tier3/__init__.py#L258-L289

# Explicitly set chunk sizes of some dimensions
chunksizes = {
    "time": 24,
    "depth": 1,
}

# For data_vars, set chunk sizes for each dimension
# This is either the override specified in "chunksizes"
# or the length of the dimension
default_chunksizes = {dim: len(data_out[dim]) for dim in data_out.dims}

encoding = {
    var: {
        "dtype": "float32", 
        "chunksizes": [chunksizes.get(dim, default_chunksizes[dim]) for dim in data_out[var].dims]
    }
    for var in data_out.data_vars
}

# Adjust for non-chunked variables - I can't remember why this doesn't override the 'chunksizes' array above
encoding["time"] = {"dtype": "i4"}
encoding['latitude'] = {"dtype": "float32"}
encoding['longitude'] = {"dtype": "float32"}
encoding['depth'] = {"dtype": "float32"}

log("Generating NetCDF data")
write_op = data_out.to_netcdf(
    output,
    encoding=encoding,
    mode="w",
    compute=False,
)
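To see how the overrides above translate into chunk shapes and sizes, the same lookup can be run on plain dictionaries. This is a rough sketch, assuming made-up dimension lengths (dim_lengths) and 4-byte float32 values. The sizes are uncompressed, and chunk_shape is a hypothetical helper that mirrors the list comprehension in the snippet.

```python
from math import ceil, prod

# Overrides copied from the snippet above.
chunksizes = {"time": 24, "depth": 1}

# Hypothetical dimension lengths, for illustration only.
dim_lengths = {"time": 120, "depth": 20, "latitude": 200, "longitude": 300}

# Dimension order of a 4D variable such as salt.
var_dims = ("time", "depth", "latitude", "longitude")


def chunk_shape(dims, overrides, lengths):
    """Use the override for a dimension if present, else the full dimension length."""
    return [overrides.get(dim, lengths[dim]) for dim in dims]


shape = chunk_shape(var_dims, chunksizes, dim_lengths)

# Uncompressed bytes per chunk: number of values times 4 bytes (float32).
chunk_bytes = prod(shape) * 4

# Number of chunks needed to cover the whole variable.
n_chunks = prod(ceil(dim_lengths[d] / s) for d, s in zip(var_dims, shape))

print(shape, f"{chunk_bytes / 1e6:.2f} MB per chunk", f"{n_chunks} chunks")
```

Lowering an override (for example "time": 1) shrinks each chunk but multiplies the chunk count, which is the write-speed tradeoff mentioned earlier.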

Here is an example of the output: https://mnemosyne.somisana.ac.za/somisana/algoa-bay/5-day-forecast/202309/20230906_hourly_avg_t3.kerchunk.json

With these chunk sizes, I can make a single GET request to retrieve all salt values at a particular depth/time for all lat/longs.

(The salt variable is defined with dimensions [time, depth, latitude, longitude]: "salt/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"time\",\"depth\",\"latitude\",\"longitude\"],\"long_name\":\"averaged salinity\",\"standard_name\":\"sea_water_salinity\",\"units\":\"PSU\"}" )
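Fetching one chunk then comes down to an HTTP range request built from its kerchunk reference. This is a minimal sketch using only the standard library, assuming the reference has the [url, offset, length] form. The URL and byte values are placeholders, chunk_request is a hypothetical helper, and the request is constructed but not sent.

```python
from urllib.request import Request


def chunk_request(ref):
    """Build a GET request for the byte range described by a kerchunk reference."""
    url, offset, length = ref
    # HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    last_byte = offset + length - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Placeholder reference, not taken from the real kerchunk file.
ref = ["https://example.org/data.nc", 1000, 4096]
req = chunk_request(ref)
print(req.get_method(), req.full_url, req.get_header("Range"))
```

Passing the request to urllib.request.urlopen would return the raw chunk bytes. Decoding them into an array still needs the dtype, chunk shape, and compressor from the variable's .zarray entry.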
