Zarr chunk corruption in Level1 magnetics data for 11xxx/12xxx series shots #212

@RHH-CQER

We are conducting disruption prediction research using MAST Mirnov coil data (see also #211 re: Level2 quantization).

Summary

Seventeen shots from the 11xxx and 12xxx series have zarr chunk size mismatches in their Level1 magnetics data, making them unreadable. The corruption is in the archive itself: re-downloading does not resolve it. Shot 46943 is also missing magnetics data entirely, bringing the total to 18 failed shots.

Affected Shots

From a batch of 24 shots we attempted to download and process, these 18 failed:

11xxx series (chunk size mismatch):
11779, 11848, 11849, 11852, 11855, 11860, 11897, 11957

12xxx series (similar corruption):
12144, 12167, 12185, 12230, 12291, 12441, 12442, 12444, 12457

Missing magnetics entirely:
46943

Successfully loaded:
12091, 12438, 12455, 29975, 29980, 29981

Failure Mode

When opening the zarr stores, the declared chunk shape in .zarray metadata does not match the actual stored chunk sizes. This causes zarr to raise errors on read. The issue persists across fresh downloads, confirming it is upstream.
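
To give maintainers something concrete to compare, the declared metadata and the on-disk chunk files can be listed without going through zarr at all. The sketch below is a minimal, stdlib-only illustration that assumes a Zarr v2 directory layout (a `.zarray` JSON file alongside the chunk files). The function name `summarize_zarr_v2_array` is hypothetical, and a lower-than-expected chunk count is not by itself proof of corruption, since Zarr allows chunks to be absent.

```python
import json
import math
from pathlib import Path


def summarize_zarr_v2_array(array_dir):
    """Report declared chunking and on-disk chunk files for a Zarr v2 array directory.

    Assumption: array_dir contains a `.zarray` file (Zarr v2 layout).
    """
    array_dir = Path(array_dir)
    meta = json.loads((array_dir / ".zarray").read_text())
    shape, chunks = meta["shape"], meta["chunks"]

    # Number of chunks the metadata implies along each dimension.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]

    # Treat every non-dot file below the array directory as a chunk file.
    chunk_files = sorted(
        p for p in array_dir.rglob("*")
        if p.is_file() and not p.name.startswith(".")
    )

    return {
        "shape": shape,
        "chunks": chunks,
        "dtype": meta["dtype"],
        "compressor": meta.get("compressor"),
        "expected_chunk_count": math.prod(grid),
        "found_chunk_count": len(chunk_files),
        "chunk_sizes_bytes": {
            p.relative_to(array_dir).as_posix(): p.stat().st_size
            for p in chunk_files
        },
    }
```

For example, `summarize_zarr_v2_array('11779_level1/amb/ccbv01')` could be run on the download from the reproduction steps, assuming that path is an array directory rather than a group. The returned chunk sizes can then be set against the declared `chunks` and `dtype` by whoever knows how the store was written.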

Reproduction

# Download a known-affected shot
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk \
  cp 's3://mast/level1/shots/11779.zarr/amb/ccbv01/*' 11779_level1/amb/ccbv01/

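# Attempt to read
python -c "
import zarr
# Open read-only; mode is passed by keyword because not every zarr version accepts it positionally
arr = zarr.open('11779_level1/amb/ccbv01', mode='r')
print(arr[:])  # Expected: chunk mismatch error
"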

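To repeat the read attempt across a whole batch, a small triage loop can sort shots into readable, not downloaded, and failing on read. This is only a sketch: `triage_shots` and `read_with_zarr` are hypothetical helper names, the local directory layout follows the reproduction steps above, and the reader is passed in as a callable so the loop itself does not depend on any particular zarr version.

```python
from pathlib import Path


def triage_shots(shot_dirs, read_fn):
    """Try to read each shot's array and bucket the outcome.

    shot_dirs: mapping of shot number -> local array path (assumed layout).
    read_fn:   callable that reads the array at a path and raises on failure.
    """
    results = {"ok": [], "missing": [], "failed": {}}
    for shot, path in shot_dirs.items():
        if not Path(path).exists():
            # Nothing was downloaded for this shot/signal.
            results["missing"].append(shot)
            continue
        try:
            read_fn(path)
        except Exception as exc:  # keep the error text for the bug report
            results["failed"][shot] = f"{type(exc).__name__}: {exc}"
        else:
            results["ok"].append(shot)
    return results


def read_with_zarr(path):
    """Read the full array with zarr; imported lazily so triage_shots has no hard dependency."""
    import zarr
    return zarr.open(str(path), mode="r")[:]
```

A call such as `triage_shots({11779: '11779_level1/amb/ccbv01'}, read_with_zarr)` would then record the exception type and message for each failing shot, which is the detail most useful to attach to this issue.
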
Impact

We were attempting to validate a disruption prediction method across a broad shot range. The 18 failed shots reduced our usable dataset from 24 to 6, limiting statistical power. The 11xxx series shots would have been particularly valuable as they span a different operational period.

Questions

  1. Is the chunk corruption a known issue for these shot ranges?
  2. Are there plans to rebuild/repair these zarr stores?
  3. Is there an alternative path to access the underlying data for these shots (e.g. different archive format or direct request)?
  4. Is shot 46943 expected to have no magnetics data, or is that also corruption?

Environment

  • Access method: s5cmd v2.x via s3.echo.stfc.ac.uk
  • Reader: Python zarr library
  • Date of access: December 2025 – January 2026
