Description
We are conducting disruption prediction research on MAST Mirnov coil data using our analysis framework, which relies on derivative-based metrics (instantaneous frequency, growth rates). This makes our work particularly sensitive to data resolution.
Summary
Level2 zarr data for MAST magnetics diagnostics (Mirnov coils, Saddle loops) appears to be quantized to approximately 6 effective bits of resolution, despite being stored as float64. This makes derivative-based analyses unreliable, as quantization steps generate artificial spikes. Level1 raw ADC data does not have this problem (~14.5 effective bits), so the issue appears to be in the Level2 processing pipeline.
Affected Data
- Archive path: s3://mast/level2/ via s3.echo.stfc.ac.uk
- Confirmed on: Shot 29980 (magnetics/b_field_pol_probe_cc_field, shape 39x269200, dtype float64)
- Likely scope: Unknown how many shots are affected; we tested one in detail
Reproduction
# Download shot 29980
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk \
  cp 's3://mast/level2/shots/29980.zarr/*' 29980.zarr/

# Inspect the downloaded store in Python
import zarr, numpy as np
store = zarr.open('29980.zarr/magnetics', mode='r')  # keyword form works across zarr versions
signal = store['b_field_pol_probe_cc_field'][0, :] # Mirnov channel 0
# Check unique values in a 2ms window (1000 samples at 500kHz)
window = signal[182000:183000] # ~253-255ms
unique = np.unique(window)
print(f"Unique values: {len(unique)}") # Expected: hundreds. Observed: 7-9
print(f"Min step: {np.min(np.diff(np.sort(unique)))}") # Observed: 4.375e-08 T (uniform grid)
# Full-signal analysis
all_unique = np.unique(signal)
diffs = np.diff(np.sort(all_unique))
min_step = np.min(diffs[diffs > 0])
dynamic_range = np.max(signal) - np.min(signal)
effective_bits = np.log2(dynamic_range / min_step)
print(f"Effective bits: {effective_bits:.1f}") # Observed: 6.1
Quantitative Evidence
| Metric | Expected (float64) | Observed |
|---|---|---|
| Unique values per 2ms window | Hundreds+ | Median 9, range 7-35 |
| Effective bit depth | 52+ (mantissa) | 6.1 |
| Shannon entropy per window | >20 bits | Median 2.39 bits |
| Quantization step | Variable | Uniform 43.75 nT grid |
| Windows failing 16-unique threshold | ~0% | 98.3% (170/173) |
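The uniform step size (4.375e-08 T ≈ 2⁻²⁴·⁴ T) across the entire signal indicates that this is a digitization/processing artifact, not noise: noise in a genuinely continuous float64 signal would not land on a single evenly spaced grid.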
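For anyone who wants to recompute the per-window figures in the table, here is a minimal sketch of one way to do it. It assumes non-overlapping 1000-sample windows and Shannon entropy taken over the exact sample values, which may differ from the windowing we used for the table. `window_metrics` and `grid_residual` are illustrative helper names, and the demo runs on synthetic data rather than on shot 29980.

```python
import numpy as np


def window_metrics(signal, window=1000, min_unique=16):
    """Per-window unique-value count, Shannon entropy (bits) and fail flag."""
    signal = np.asarray(signal, dtype=float)
    n_windows = signal.size // window
    counts = np.zeros(n_windows, dtype=int)
    entropy = np.zeros(n_windows, dtype=float)
    for i in range(n_windows):
        chunk = signal[i * window:(i + 1) * window]
        chunk = chunk[np.isfinite(chunk)]  # drop NaN/inf samples
        _, freq = np.unique(chunk, return_counts=True)
        counts[i] = freq.size
        if freq.size:
            p = freq / freq.sum()
            entropy[i] = -np.sum(p * np.log2(p))
    failing = counts < min_unique  # windows with too few distinct values
    return counts, entropy, failing


def grid_residual(signal, step):
    """Largest distance of any finite sample from the nearest grid point, in units of step."""
    x = np.asarray(signal, dtype=float)
    x = x[np.isfinite(x)]
    k = (x - x.min()) / step
    return float(np.max(np.abs(k - np.round(k))))


# Synthetic stand-in (assumption): low-amplitude noise, with and without quantization
rng = np.random.default_rng(0)
step = 4.375e-08  # step size reported above, in tesla
smooth = 5e-8 * rng.standard_normal(5000)
quantized = np.round(smooth / step) * step

counts_q, entropy_q, failing_q = window_metrics(quantized)
counts_s, entropy_s, failing_s = window_metrics(smooth)
print("quantized: median unique =", np.median(counts_q), "failing =", failing_q.mean())
print("smooth:    median unique =", np.median(counts_s), "failing =", failing_s.mean())
print("grid residual (quantized):", grid_residual(quantized, step))
```

On the real data, `signal` from the reproduction snippet can be passed in directly; a grid residual near zero means every sample sits on a multiple of the step.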
Impact
Any analysis computing derivatives on this data (dB/dt, growth rates, instantaneous frequency via Hilbert transform) will produce artificial spikes at quantization boundaries. We initially identified what appeared to be physically meaningful "bouncing ball" patterns in the pre-disruption window — these turned out to be pure quantization artifacts.
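The mechanism is easy to see on a synthetic signal. The sketch below is an illustration under stated assumptions, not our analysis code: it quantizes a hypothetical 1 kHz, 1.5 µT sine to the observed 43.75 nT step at the 500 kHz sample rate and compares finite-difference derivatives. The sine parameters are made up for the demo.

```python
import numpy as np

fs = 500_000.0     # sample rate in Hz, as in the reproduction snippet
step = 4.375e-08   # observed quantization step in tesla

# Hypothetical test signal (assumption): 1 kHz sine, 1.5 uT amplitude, 10 ms long
t = np.arange(5000) / fs
clean = 1.5e-6 * np.sin(2 * np.pi * 1_000.0 * t)
quantized = np.round(clean / step) * step

# Finite-difference dB/dt in T/s
d_clean = np.diff(clean) * fs
d_quant = np.diff(quantized) * fs

zero_fraction = np.mean(d_quant == 0.0)   # samples where the quantized signal did not move
spike_height = np.max(np.abs(d_quant))    # one quantization step per sample interval
print(f"true peak dB/dt:      {np.max(np.abs(d_clean)):.4g} T/s")
print(f"quantized peak dB/dt: {spike_height:.4g} T/s")
print(f"fraction of zero-derivative samples: {zero_fraction:.2f}")
```

Because the true signal changes by less than one step per sample, the quantized derivative is either exactly zero or a single step times the sample rate (about 0.022 T/s here), regardless of the underlying slope. Those isolated jumps are what show up as spikes in growth-rate and instantaneous-frequency estimates.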
Workaround
Level1 raw ADC data for the same shots shows ~14.5 effective bits and does not exhibit this problem. We migrated our analysis pipeline to Level1 and the artifacts disappeared.
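To compare the two levels on the same footing, or to check whether the problem is limited to certain channels, the effective-bit estimate from the reproduction can be applied to every channel of an array. This is a sketch under assumptions: it expects a (channels, samples) array, the helper names are illustrative, and loading Level1 data is left out because its layout is not described here. The demo uses synthetic data.

```python
import numpy as np


def effective_bits(channel):
    """log2(dynamic range / smallest step between distinct values) for one channel."""
    x = np.asarray(channel, dtype=float)
    x = x[np.isfinite(x)]            # ignore NaN/inf samples
    levels = np.unique(x)            # sorted distinct values
    if levels.size < 2:
        return 0.0                   # constant or empty channel carries no resolution
    steps = np.diff(levels)          # strictly positive because levels are unique
    return float(np.log2((levels[-1] - levels[0]) / steps.min()))


def effective_bits_per_channel(data):
    """Apply effective_bits to each row of a (channels, samples) array."""
    arr = np.atleast_2d(np.asarray(data, dtype=float))
    return np.array([effective_bits(row) for row in arr])


# Synthetic stand-in (assumption): one coarse channel on a 65-level grid, one continuous channel
rng = np.random.default_rng(1)
step = 4.375e-08
coarse = rng.integers(0, 65, size=2000) * step
coarse[:2] = [0.0, 64 * step]        # make sure both extremes of the grid are present
fine = rng.random(2000)

bits = effective_bits_per_channel(np.vstack([coarse, fine]))
print(bits)
```

For the Level2 store, `store['b_field_pol_probe_cc_field'][:]` from the reproduction snippet could be passed in to get one value per Mirnov channel.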
Additional Observation: Saddle Loop NaN Rates
On the same shot, Saddle loop data (b_field_tor_probe_saddle_field) has 56% NaN values (180,872 / 323,040), rising to 60% in the analysis window around the disruption. This may be a separate issue but worth flagging.
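A per-channel breakdown helps distinguish a few dead channels from dropouts spread across all channels. The sketch below shows one way to compute it; `nan_report` is an illustrative helper, the (channels, samples) layout is an assumption, and the demo array is synthetic.

```python
import numpy as np


def nan_report(data, start=None, stop=None):
    """Return (overall NaN fraction, per-channel NaN fractions) over samples start:stop."""
    arr = np.atleast_2d(np.asarray(data, dtype=float))
    arr = arr[:, start:stop]         # optional restriction to an analysis window
    mask = np.isnan(arr)
    return float(mask.mean()), mask.mean(axis=1)


# Synthetic stand-in (assumption): 4 channels, 10 samples each
demo = np.ones((4, 10))
demo[0, :] = np.nan                  # one channel entirely missing
demo[1, 5:] = np.nan                 # one channel missing its second half

overall, per_channel = nan_report(demo)
windowed, _ = nan_report(demo, start=5, stop=10)
print(f"overall NaN fraction: {overall:.3f}")
print("per-channel NaN fraction:", per_channel)
print(f"NaN fraction in samples 5-9: {windowed:.3f}")
```

On the real data, the array read from `b_field_tor_probe_saddle_field` would take the place of `demo`.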
Environment
- Access method: s5cmd v2.x with --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk
- Analysis: Python 3.9, zarr, numpy, scipy
- Date of access: December 2025 – January 2026
Questions for the Team
- Is the Level2 quantization a known limitation of the processing pipeline?
- Is this specific to certain shot ranges or diagnostic channels?
- Should downstream users default to Level1 data for quantitative analysis?
- Is the Saddle loop NaN rate expected for shot 29980?
Thank you for maintaining this archive — it's a valuable resource and we want to make sure the community is aware of this limitation for derivative-sensitive analyses.