
Persisting arrays on disk using zarr backend does not encode data using scale_factor/offset #495

@vlevasseur073

Description


Hello, I am processing data stored in Zarr format. Most of it is stored on disk encoded following CF conventions. For instance, I have a YAXArray:

┌ 1500×1200 YAXArray{Union{Missing, Float64}, 2} ┐
├────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ columns Sampled{Int64} 1:1500 ForwardOrdered Regular Points,
  → rows    Sampled{Int64} 1:1200 ForwardOrdered Regular Points
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 10 entries:
  "units"         => "K"
  "name"          => "s7_bt_in"
  "coordinates"   => "latitude longitude x y"
  "short_name"    => "s7_bt_in"
  "add_offset"    => 283.73
  "long_name"     => "gridded pixel brightness temperature for channel s7 (1km TIR grid, nadir view)"
  "missing_value" => -32768
  "scale_factor"  => 0.01
  "standard_name" => "toa_brightness_temperature"
  "_FillValue"    => -32768
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── loaded lazily ┤
  data size: 13.73 MB
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
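For reference, the CF attributes above mean the stored Int16 values are unpacked on read as decoded = raw * scale_factor + add_offset, with _FillValue mapped to missing. For example (the raw value here is illustrative):

raw = Int16(-1027)                # hypothetical stored Int16 value
decoded = raw * 0.01 + 283.73     # 273.46, in K per the units attribute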

The underlying data is a CFDiskArray, showing that the original Zarr array is Int16:

1500×1200 DiskArrayTools.CFDiskArray{Union{Missing, Float64}, 2, Int16, ZArray{Int16, 2, Zarr.BloscCompressor, Zarr.ConsolidatedStore{DirectoryStore}}, Float64}

Chunked: (
    [1500]
    [1200]
)

When I save the data to disk using savecube or savedataset, the array is stored as Float64, not encoded/packed as Int16.

For information, the original YAXArray is created with open_dataset from an existing Zarr file, and the data is correctly decoded/unpacked. Also, surprisingly (or not), when I read the persisted file (stored as Float64) back in, the scale_factor is not applied twice.
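A minimal reproduction sketch (the paths are placeholders, and the driver/overwrite keywords are my assumptions):

using YAXArrays, Zarr

ds  = open_dataset("input.zarr")     # existing CF-encoded Zarr store
arr = ds["s7_bt_in"]                 # lazily decoded, eltype Union{Missing, Float64}

savecube(arr, "output.zarr"; driver=:zarr, overwrite=true)

# Re-opening the copy: the chunks are now Float64 on disk, and the copied
# scale_factor/add_offset metadata is not applied a second time on read.
arr2 = Cube("output.zarr")
println(eltype(arr2))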

Am I missing an option here? DiskArrayTools seems to implement such mechanisms for reading/writing.
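For now, the only fallback I see is repacking manually before saving; a rough, untested sketch based on the attributes shown above:

using YAXArrays

props  = arr.properties                  # the metadata Dict printed above
scale  = props["scale_factor"]           # 0.01
offset = props["add_offset"]             # 283.73
fillv  = Int16(props["_FillValue"])      # -32768

# Pack the decoded Float64 values back to Int16, mapping missing to _FillValue.
pack(x) = ismissing(x) ? fillv : round(Int16, (x - offset) / scale)
packed  = YAXArray(arr.axes, pack.(arr.data[:, :]), props)

savecube(packed, "packed.zarr"; overwrite=true)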

Many thanks in advance for your feedback!
Vincent
