Level2 magnetics data severely quantized (~6 effective bits despite float64 storage)

We are conducting disruption-prediction research on MAST Mirnov coil data. Our analysis framework relies on derivative-based metrics (instantaneous frequency, growth rates), which makes it particularly sensitive to data resolution.

### Summary

Level2 zarr data for MAST magnetics diagnostics (Mirnov coils, Saddle loops) appears to be quantized to approximately 6 effective bits of resolution, despite being stored as float64. This makes derivative-based analyses unreliable, as quantization steps generate artificial spikes. Level1 raw ADC data does not have this problem (~14.5 effective bits), so the issue appears to be in the Level2 processing pipeline.

### Affected Data

- **Archive path:** `s3://mast/level2/` via `s3.echo.stfc.ac.uk`
- **Confirmed on:** Shot 29980 (`magnetics/b_field_pol_probe_cc_field`, shape 39x269200, dtype float64)
- **Likely scope:** Unknown how many shots are affected; we tested one in detail

### Reproduction

```bash
# Download shot 29980
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk \
  cp 's3://mast/level2/shots/29980.zarr/*' 29980.zarr/
```

```python
import numpy as np
import zarr

# Pass the mode by keyword; newer zarr releases do not accept it positionally
store = zarr.open('29980.zarr/magnetics', mode='r')
signal = store['b_field_pol_probe_cc_field'][0, :]  # Mirnov channel 0

# Check unique values in a 2ms window (1000 samples at 500kHz)
window = signal[182000:183000]  # ~253-255ms
unique = np.unique(window)
print(f"Unique values: {len(unique)}")       # Expected: hundreds. Observed: 7-9
print(f"Min step: {np.min(np.diff(np.sort(unique)))}")  # Observed: 4.375e-08 T (uniform grid)

# Full-signal analysis
all_unique = np.unique(signal)
diffs = np.diff(np.sort(all_unique))
min_step = np.min(diffs[diffs > 0])
dynamic_range = np.max(signal) - np.min(signal)
effective_bits = np.log2(dynamic_range / min_step)
print(f"Effective bits: {effective_bits:.1f}")  # Observed: 6.1
```

### Quantitative Evidence

| Metric | Expected (float64) | Observed |
|--------|-------------------|----------|
| Unique values per 2ms window | Hundreds+ | Median 9, range 7-35 |
| Effective bit depth | 52+ (mantissa) | 6.1 |
| Shannon entropy per window | Up to ~10 bits (log2 of 1000 samples) | Median 2.39 bits |
| Quantization step | Variable | Uniform 43.7 nT grid |
| Windows failing 16-unique threshold | ~0% | 98.3% (170/173) |
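
For anyone who wants to reproduce the per-window rows of this table, the metrics can be computed along the following lines. This is a minimal sketch, not the exact code behind the numbers above: the helper name `window_metrics`, the non-overlapping 1000-sample window layout, and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np

def window_metrics(signal, window_len=1000, min_unique=16):
    """Unique-value count and Shannon entropy (bits) per non-overlapping window."""
    n_windows = len(signal) // window_len
    counts = np.empty(n_windows, dtype=int)
    entropy = np.empty(n_windows)
    for i in range(n_windows):
        w = signal[i * window_len:(i + 1) * window_len]
        w = w[np.isfinite(w)]                       # ignore NaN/inf samples
        _, freq = np.unique(w, return_counts=True)  # histogram over exact values
        counts[i] = freq.size
        p = freq / max(freq.sum(), 1)               # empirical probabilities
        entropy[i] = -np.sum(p * np.log2(p))
    failing = np.mean(counts < min_unique)          # fraction of windows below threshold
    return counts, entropy, failing

# Synthetic stand-in for one Mirnov channel (NOT archive data)
rng = np.random.default_rng(0)
t = np.arange(5000) / 500e3
raw = 1e-7 * np.sin(2 * np.pi * 10e3 * t) + 5e-9 * rng.standard_normal(t.size)
step = 4.375e-8                                     # step size reported in this issue
quantized = np.round(raw / step) * step

counts_q, entropy_q, failing_q = window_metrics(quantized)
counts_r, entropy_r, failing_r = window_metrics(raw)
print("quantized:", counts_q, entropy_q.round(2), failing_q)
print("raw:      ", counts_r, entropy_r.round(2), failing_r)
```

Note that the entropy of a 1000-sample window cannot exceed log2(1000) ≈ 9.97 bits, which is the ceiling an unquantized float64 signal approaches.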

The uniform step size (4.375e-08 T ≈ 2⁻²⁴·⁴ T) across the entire signal confirms this is a digitization/processing artifact, not noise.
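
One way to test the uniform-grid claim on any channel is to check that every gap between adjacent unique values is an integer multiple of the smallest gap. The sketch below does this on a synthetic quantized sine; `grid_check` is a hypothetical helper name, and the synthetic signal only stands in for archive data.

```python
import numpy as np

def grid_check(signal):
    """Return (smallest gap, worst deviation of any gap from an integer multiple of it)."""
    values = np.unique(signal[np.isfinite(signal)])
    gaps = np.diff(values)
    gaps = gaps[gaps > 0]
    step = gaps.min()
    multiples = gaps / step
    residual = np.max(np.abs(multiples - np.round(multiples)))
    return step, residual

# Synthetic example: a sine snapped to the step size reported in this issue
t = np.arange(5000) / 500e3
true_step = 4.375e-8
synthetic = np.round(1e-7 * np.sin(2 * np.pi * 1e3 * t) / true_step) * true_step

step_est, residual = grid_check(synthetic)
print(f"Estimated step: {step_est:.4e} T, worst off-grid residual: {residual:.2e}")
```

A residual near zero means all values sit on a single uniform grid; a signal with genuine sub-step resolution would show gaps that are not integer multiples of the smallest one.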

### Impact

Any analysis computing derivatives on this data (dB/dt, growth rates, instantaneous frequency via Hilbert transform) will produce artificial spikes at quantization boundaries. We initially identified what appeared to be physically meaningful "bouncing ball" patterns in the pre-disruption window — these turned out to be pure quantization artifacts.
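
The mechanism is easy to demonstrate on synthetic data. In the sketch below, a smooth 1 kHz sine (amplitude, frequency, and sample rate are illustrative assumptions) is snapped to the 43.75 nT grid and differentiated: the derivative is zero almost everywhere, with isolated spikes of size step × sample rate that are far larger than the true dB/dt.

```python
import numpy as np

fs = 500e3                  # sample rate in Hz (500 kHz, as in the reproduction above)
step = 4.375e-8             # quantization step in T, as reported in this issue
t = np.arange(5000) / fs

smooth = 1e-7 * np.sin(2 * np.pi * 1e3 * t)   # synthetic "true" field, NOT archive data
quantized = np.round(smooth / step) * step    # same signal snapped to the grid

d_smooth = np.diff(smooth) * fs               # finite-difference dB/dt in T/s
d_quant = np.diff(quantized) * fs

spike_ratio = np.max(np.abs(d_quant)) / np.max(np.abs(d_smooth))
zero_fraction = np.mean(d_quant == 0)

print(f"Peak |dB/dt| smooth:    {np.max(np.abs(d_smooth)):.3e} T/s")
print(f"Peak |dB/dt| quantized: {np.max(np.abs(d_quant)):.3e} T/s")
print(f"Spike ratio: {spike_ratio:.1f}, zero-derivative fraction: {zero_fraction:.3f}")
```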

### Workaround

Level1 raw ADC data for the same shots shows ~14.5 effective bits and does not exhibit this problem. We migrated our analysis pipeline to Level1 and the artifacts disappeared.

### Additional Observation: Saddle Loop NaN Rates

On the same shot, Saddle loop data (`b_field_tor_probe_saddle_field`) has 56% NaN values (180,872 / 323,040), rising to 60% in the analysis window around the disruption. This may be a separate issue, but it is worth flagging.
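
A per-channel breakdown helps separate a few dead channels from gaps spread across all of them. The following is a minimal sketch that assumes a `(channels, samples)` array layout; `nan_fraction` is a hypothetical helper, and the small array here is made up for illustration.

```python
import numpy as np

def nan_fraction(data, axis=-1):
    """Fraction of NaN samples along the given axis (per channel by default)."""
    return np.isnan(data).mean(axis=axis)

# Toy array standing in for the saddle loop data (NOT archive data)
data = np.zeros((3, 10))
data[0, :5] = np.nan        # channel 0: half missing
data[1, :] = np.nan         # channel 1: entirely missing

per_channel = nan_fraction(data)
overall = np.isnan(data).mean()
print("Per-channel NaN fraction:", per_channel)
print("Overall NaN fraction:", overall)
```

On the real data, the same call would be applied to the array loaded from `b_field_tor_probe_saddle_field`.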

### Environment

- Access method: s5cmd v2.x with `--no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk`
- Analysis: Python 3.9, zarr, numpy, scipy
- Date of access: December 2025 – January 2026

### Questions for the Team

1. Is the Level2 quantization a known limitation of the processing pipeline?
2. Is this specific to certain shot ranges or diagnostic channels?
3. Should downstream users default to Level1 data for quantitative analysis?
4. Is the Saddle loop NaN rate expected for shot 29980?

Thank you for maintaining this archive — it's a valuable resource and we want to make sure the community is aware of this limitation for derivative-sensitive analyses.

