Description
We are conducting disruption prediction research using MAST Mirnov coil data (see also #211 re: Level2 quantization).
Summary
Seventeen shots from the 11xxx and 12xxx series have zarr chunk size mismatches in their Level1 magnetics data, making them unreadable. The corruption is in the archive itself; re-downloading does not resolve it. Shot 46943 is also missing magnetics data entirely, for 18 failed shots in total.
Affected Shots
From a batch of 24 shots we attempted to download and process, these 18 failed:
11xxx series (chunk size mismatch):
11779, 11848, 11849, 11852, 11855, 11860, 11897, 11957
12xxx series (similar corruption):
12144, 12167, 12185, 12230, 12291, 12441, 12442, 12444, 12457
Missing magnetics entirely:
46943
Successfully loaded:
12091, 12438, 12455, 29975, 29980, 29981
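A triage like the one above can be scripted. The following is a minimal sketch, not the exact script we ran: `triage` and `read_with_zarr` are hypothetical names, and the local path template assumes the same `<shot>_level1/amb/ccbv01` layout used in the Reproduction section.

```python
def read_with_zarr(path):
    """Read a whole array; raises if the store cannot be decoded."""
    import zarr  # imported lazily so triage() itself has no hard dependency

    return zarr.open(path, mode="r")[:]


def triage(shots, loader, template="{shot}_level1/amb/ccbv01"):
    """Try to load each shot and split the batch into readable and failed.

    `template` is an assumed local layout, not something the archive mandates.
    """
    ok, failed = [], {}
    for shot in shots:
        path = template.format(shot=shot)
        try:
            loader(path)
        except Exception as exc:  # keep the error text for the report
            failed[shot] = f"{type(exc).__name__}: {exc}"
        else:
            ok.append(shot)
    return ok, failed
```

Calling `triage([11779, 12091], read_with_zarr)` on downloaded copies returns the readable shot numbers plus a dict mapping each failed shot to its error message, which is convenient for pasting into a report.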
Failure Mode
When opening the zarr stores, the declared chunk shape in .zarray metadata does not match the actual stored chunk sizes. This causes zarr to raise errors on read. The issue persists across fresh downloads, confirming it is upstream.
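The mismatch can also be inspected without going through the zarr reader. The sketch below uses only the standard library and assumes the zarr v2 on-disk layout (implied by the `.zarray` file); `check_array` and `expected_chunk_keys` are hypothetical helper names. It lists chunk files that are absent or unexpected given the declared `shape` and `chunks`, and checks byte sizes only when chunks are stored raw (no compressor, no filters, simple numeric dtype), since compressed sizes are not predictable. An absent chunk is legal in zarr (it is read as the fill value), so the `missing` entry is informational rather than proof of corruption.

```python
import json
import math
import re
from itertools import product
from pathlib import Path


def expected_chunk_keys(meta):
    """Chunk keys implied by the shape and chunks fields of a v2 .zarray."""
    sep = meta.get("dimension_separator", ".")
    grid = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
    if not grid:  # zero-dimensional arrays use the single key "0"
        return {"0"}
    return {sep.join(map(str, idx)) for idx in product(*(range(n) for n in grid))}


def check_array(array_dir):
    """Compare .zarray metadata with the chunk files actually on disk."""
    array_dir = Path(array_dir)
    meta = json.loads((array_dir / ".zarray").read_text())
    expected = expected_chunk_keys(meta)
    found = {
        p.relative_to(array_dir).as_posix()
        for p in array_dir.rglob("*")
        if p.is_file() and not p.name.startswith(".")  # skip .zarray/.zattrs
    }
    report = {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
        "wrong_size": [],
    }
    # Byte sizes are only predictable for raw chunks of simple numeric dtypes.
    m = re.fullmatch(r"[<>|][biufc](\d+)", str(meta["dtype"]))
    if meta.get("compressor") is None and not meta.get("filters") and m:
        nbytes = int(m.group(1)) * math.prod(meta["chunks"])
        for key in sorted(expected & found):
            if (array_dir / key).stat().st_size != nbytes:
                report["wrong_size"].append(key)
    return report
```

For example, `check_array("11779_level1/amb/ccbv01")` on a downloaded copy returns a dict with `missing`, `unexpected`, and `wrong_size` lists.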
Reproduction
# Download a known-affected shot
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk \
cp 's3://mast/level1/shots/11779.zarr/amb/ccbv01/*' 11779_level1/amb/ccbv01/
# Attempt to read
python -c "
import zarr
store = zarr.open('11779_level1/amb/ccbv01', 'r')
print(store[:]) # Expected: chunk mismatch error
"
Impact
We were attempting to validate a disruption prediction method across a broad shot range. The 18 failed shots reduced our usable dataset from 24 to 6, limiting statistical power. The 11xxx series shots would have been particularly valuable as they span a different operational period.
Questions
- Is the chunk corruption a known issue for these shot ranges?
- Are there plans to rebuild/repair these zarr stores?
- Is there an alternative path to access the underlying data for these shots (e.g. different archive format or direct request)?
- Is shot 46943 expected to have no magnetics data, or is that also corruption?
Environment
- Access method: s5cmd v2.x via s3.echo.stfc.ac.uk
- Reader: Python zarr library
- Date of access: December 2025 – January 2026
Related
- #211: Level2 magnetics data severely quantized (~6 effective bits despite float64 storage). Separate problem, same diagnostic.