Level2 magnetics data severely quantized (~6 effective bits despite float64 storage) #211

@RHH-CQER

Description

We are conducting disruption prediction research by applying our analysis framework to MAST Mirnov coil data. The framework relies on derivative-based metrics (instantaneous frequency, growth rates), which makes our work particularly sensitive to data resolution.

Summary

Level2 zarr data for MAST magnetics diagnostics (Mirnov coils, Saddle loops) appears to be quantized to approximately 6 effective bits of resolution, despite being stored as float64. This makes derivative-based analyses unreliable, as quantization steps generate artificial spikes. Level1 raw ADC data does not have this problem (~14.5 effective bits), so the issue appears to be in the Level2 processing pipeline.

Affected Data

  • Archive path: s3://mast/level2/ via s3.echo.stfc.ac.uk
  • Confirmed on: Shot 29980 (magnetics/b_field_pol_probe_cc_field, shape 39x269200, dtype float64)
  • Likely scope: unknown; we have tested only this one shot in detail

Reproduction

# Shell: download shot 29980
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk \
  cp 's3://mast/level2/shots/29980.zarr/*' 29980.zarr/

# Python: inspect Mirnov channel 0
import numpy as np
import zarr

store = zarr.open('29980.zarr/magnetics', 'r')
signal = store['b_field_pol_probe_cc_field'][0, :]  # Mirnov channel 0

# Check unique values in a 2ms window (1000 samples at 500kHz)
window = signal[182000:183000]  # ~253-255ms
unique = np.unique(window)
print(f"Unique values: {len(unique)}")       # Expected: hundreds. Observed: 7-9
print(f"Min step: {np.min(np.diff(np.sort(unique)))}")  # Observed: 4.375e-08 T (uniform grid)

# Full-signal analysis
all_unique = np.unique(signal)
diffs = np.diff(np.sort(all_unique))
min_step = np.min(diffs[diffs > 0])
dynamic_range = np.max(signal) - np.min(signal)
effective_bits = np.log2(dynamic_range / min_step)
print(f"Effective bits: {effective_bits:.1f}")  # Observed: 6.1
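
To see whether other channels behave the same way, the estimate above could be wrapped in a function and applied row by row. This is only a sketch: `effective_bits` and `scan_channels` are hypothetical helper names, the input is assumed to be a 2D array already loaded into memory (channels x samples), and the demo data is synthetic rather than taken from the archive.

```python
import numpy as np

def effective_bits(signal):
    """log2(dynamic range / smallest gap between distinct values); NaNs ignored."""
    signal = np.asarray(signal, dtype=float)
    values = np.unique(signal[~np.isnan(signal)])   # sorted distinct values
    if values.size < 2:
        return float("nan")                         # constant or empty channel
    min_step = np.diff(values).min()
    return float(np.log2((values[-1] - values[0]) / min_step))

def scan_channels(data):
    """Effective bits for each row of a (channels x samples) array."""
    return [effective_bits(row) for row in np.asarray(data, dtype=float)]

# Synthetic demo: one channel on a 65-level grid, one continuous channel
rng = np.random.default_rng(0)
step = 4.375e-8
demo = np.vstack([
    rng.integers(0, 65, 2000) * step,   # quantized: 64 steps of dynamic range
    rng.normal(0.0, 1e-6, 2000),        # continuous-valued
])
for channel, bits in enumerate(scan_channels(demo)):
    print(f"channel {channel}: {bits:.1f} effective bits")
```

On the real data the same call would take the full `b_field_pol_probe_cc_field` array in place of `demo`.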

Quantitative Evidence

| Metric | Expected (float64) | Observed |
|---|---|---|
| Unique values per 2 ms window | Hundreds+ | Median 9, range 7-35 |
| Effective bit depth | 52+ (mantissa) | 6.1 |
| Shannon entropy per window | Up to ~10 bits (log2 of 1000 samples) | Median 2.39 bits |
| Quantization step | Variable | Uniform 43.75 nT grid |
| Windows failing 16-unique threshold | ~0% | 98.3% (170/173) |
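
One way to compute per-window metrics like those in the table is sketched below. The function name `window_stats`, the non-overlapping windows, and the entropy definition (Shannon entropy of the empirical distribution of distinct values in each window) are assumptions for illustration, not necessarily the exact procedure behind the numbers above. The demo signal is synthetic.

```python
import numpy as np

def window_stats(signal, window=1000, min_unique=16):
    """Median unique count, median entropy (bits), and share of windows
    with fewer than `min_unique` distinct values."""
    signal = np.asarray(signal, dtype=float)
    counts, entropies = [], []
    for i in range(len(signal) // window):
        chunk = signal[i * window:(i + 1) * window]
        chunk = chunk[~np.isnan(chunk)]              # ignore NaN samples
        if chunk.size == 0:
            continue
        _, freq = np.unique(chunk, return_counts=True)
        p = freq / freq.sum()                        # empirical probabilities
        counts.append(len(freq))
        entropies.append(float(-(p * np.log2(p)).sum()))
    counts = np.array(counts)
    return {
        "median_unique": float(np.median(counts)),
        "median_entropy_bits": float(np.median(entropies)),
        "fraction_below_threshold": float(np.mean(counts < min_unique)),
    }

# Synthetic check: a slow sine rounded to a 43.75 nT grid
t = np.arange(10_000) / 500e3
step = 4.375e-8
coarse = np.round(2e-7 * np.sin(2 * np.pi * 200 * t) / step) * step
print(window_stats(coarse))
```

With this definition the entropy of a 1000-sample window cannot exceed log2(1000), about 10 bits, which is reached only when every sample is distinct.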

The uniform step size (4.375e-08 T = 43.75 nT ≈ 2⁻²⁴·⁴ T) across the entire signal confirms this is a digitization/processing artifact, not noise.
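
The uniformity of the grid can be tested directly: if all distinct values sit on one grid, each is an integer multiple of the step away from the smallest value, and the leftover fraction is negligible. The sketch below assumes the smallest gap between distinct values is the grid step; `grid_residual` is a hypothetical helper name and the demo data is synthetic.

```python
import numpy as np

def grid_residual(signal):
    """Return (step, worst distance from the grid in units of step).

    A residual near 0 means every value lies on a uniform grid;
    continuous-valued data gives a residual approaching 0.5.
    """
    signal = np.asarray(signal, dtype=float)
    values = np.unique(signal[~np.isnan(signal)])    # sorted distinct values
    if values.size < 2:
        raise ValueError("need at least two distinct values")
    step = np.diff(values).min()                     # candidate grid step
    k = (values - values[0]) / step                  # position in grid units
    return float(step), float(np.abs(k - np.round(k)).max())

rng = np.random.default_rng(0)
raw = rng.normal(0.0, 2e-7, 5000)                    # continuous-valued
quantized = np.round(raw / 4.375e-8) * 4.375e-8      # forced onto a grid

for name, data in [("raw", raw), ("quantized", quantized)]:
    step, residual = grid_residual(data)
    print(f"{name}: step={step:.3e} T, residual={residual:.2e}")

# The observed step expressed as a power of two
print(f"log2(4.375e-8) = {np.log2(4.375e-8):.2f}")
```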

Impact

Any analysis computing derivatives on this data (dB/dt, growth rates, instantaneous frequency via Hilbert transform) will produce artificial spikes at quantization boundaries. We initially identified what appeared to be physically meaningful "bouncing ball" patterns in the pre-disruption window — these turned out to be pure quantization artifacts.
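
The mechanism can be illustrated on synthetic data. A smooth signal has a smooth derivative, but once it is rounded to a coarse grid, its finite-difference derivative is zero for most samples and jumps by a full step per sample at each level crossing. In this sketch only the 500 kHz sample rate and the 43.75 nT step come from the report; the 1 kHz, 1 µT test tone is an arbitrary assumption.

```python
import numpy as np

fs = 500e3                       # sample rate from the report (500 kHz)
step = 4.375e-8                  # observed quantization step in tesla
t = np.arange(5000) / fs         # 10 ms of synthetic data

smooth = 1e-6 * np.sin(2 * np.pi * 1e3 * t)   # hypothetical 1 kHz, 1 uT tone
coarse = np.round(smooth / step) * step       # same tone on the coarse grid

d_smooth = np.diff(smooth) * fs  # finite-difference dB/dt in T/s
d_coarse = np.diff(coarse) * fs

print(f"peak |dB/dt|, smooth:    {np.abs(d_smooth).max():.3e} T/s")
print(f"peak |dB/dt|, quantized: {np.abs(d_coarse).max():.3e} T/s")
print(f"samples with zero derivative: {np.mean(d_coarse == 0):.1%}")
```

For this tone the true per-sample change is smaller than one step, so every nonzero difference in the quantized signal has magnitude `step * fs`, regardless of how fast the underlying field is actually changing.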

Workaround

Level1 raw ADC data for the same shots shows ~14.5 effective bits and does not exhibit this problem. We migrated our analysis pipeline to Level1 and the artifacts disappeared.

Additional Observation: Saddle Loop NaN Rates

On the same shot, Saddle loop data (b_field_tor_probe_saddle_field) has 56% NaN values (180,872 / 323,040), rising to 60% in the analysis window around the disruption. This may be a separate issue, but it is worth flagging.
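
A NaN rate like this can be computed over the whole array or over a sample range along the last axis. The helper below is a sketch: `nan_fraction` is a hypothetical name, the array is assumed to be loaded into memory, and the demo values are made up.

```python
import numpy as np

def nan_fraction(data, start=None, stop=None):
    """Fraction of NaN entries in data[..., start:stop]."""
    section = np.asarray(data, dtype=float)[..., start:stop]
    return float(np.isnan(section).mean())

# Made-up 2 x 4 array: 4 of the 8 entries are NaN
demo = np.array([[1.0, np.nan, 3.0, np.nan],
                 [np.nan, np.nan, 7.0, 8.0]])
print(f"overall: {nan_fraction(demo):.0%}")
print(f"first two samples: {nan_fraction(demo, 0, 2):.0%}")
```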

Environment

  • Access method: s5cmd v2.x with --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk
  • Analysis: Python 3.9, zarr, numpy, scipy
  • Date of access: December 2025 – January 2026

Questions for the Team

  1. Is the Level2 quantization a known limitation of the processing pipeline?
  2. Is this specific to certain shot ranges or diagnostic channels?
  3. Should downstream users default to Level1 data for quantitative analysis?
  4. Is the Saddle loop NaN rate expected for shot 29980?

Thank you for maintaining this archive — it's a valuable resource and we want to make sure the community is aware of this limitation for derivative-sensitive analyses.

Labels: bug