Skip to content

Gate dict gdal_metadata behind the experimental rich-tag opt-in#3327

Merged
brendancol merged 2 commits into
mainfrom
issue-3320
Jun 14, 2026
Merged

Gate dict gdal_metadata behind the experimental rich-tag opt-in#3327
brendancol merged 2 commits into
mainfrom
issue-3320

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3320

Summary

to_geotiff gated attrs['gdal_metadata_xml'] and attrs['extra_tags'] behind allow_experimental_codecs=True, but not attrs['gdal_metadata']. Since _extract_rich_tags builds GDAL XML from a dict gdal_metadata and writes it to disk, a fresh DataArray could write arbitrary GDAL metadata without opting in. This closes that hole.

  • Add a gdal_metadata check to _validate_write_rich_tag_optin, firing only when the value is a dict (a non-dict value is ignored by the writer, so it stays ungated).
  • The round-trip exemption for read-back arrays (carrying _xrspatial_geotiff_contract) is unchanged, so canonical round-trips stay flag-free.
  • _extract_rich_tags is untouched.

Backend coverage

The gate lives in _validate_write_rich_tag_optin, shared by the eager/dask CPU writer and the GPU writer, so all four backends enforce it.

Test plan

  • Unit tests on _validate_write_rich_tag_optin: dict gdal_metadata rejected without the flag, non-dict ignored, accepted with the flag, round-trip marker still exempt.
  • End-to-end to_geotiff: fresh DataArray with dict gdal_metadata raises ValueError without the opt-in and writes with it; a read-back array writes flag-free.
  • Updated pack/scale-offset test setups that wrote a fresh gdal_metadata dict to pass the opt-in.
  • Full geotiff suite green (6451 passed).

A fresh DataArray carrying attrs['gdal_metadata'] as a dict could write
arbitrary GDAL metadata to the on-disk TIFF without allow_experimental_codecs,
because _validate_write_rich_tag_optin only checked gdal_metadata_xml and
extra_tags. _extract_rich_tags builds the GDAL XML from a dict gdal_metadata,
so add it to the gate (dict only; non-dict values are ignored by the writer).
The round-trip exemption for attrs carrying the contract marker is unchanged.

Update pack/scale-offset test setups that wrote a fresh gdal_metadata dict to
pass the opt-in, and note the dict case on the writer.gdal_metadata_xml row in
the release contract.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 14, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Gate dict gdal_metadata behind the experimental rich-tag opt-in

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • The gate fires on a dict attrs['gdal_metadata'] even when attrs['gdal_metadata_xml'] is also present. In that case _extract_rich_tags (_attrs.py:1474-1478) prefers the XML and never builds from the dict, so the dict alone wouldn't reach disk. This is harmless: with gdal_metadata_xml present the gate already fires for it, so the write is blocked either way, and a message listing both attrs is accurate. No change needed; noting it so the interaction is on record.
  • A test for the "both gdal_metadata_xml and dict gdal_metadata present, no flag" case would pin the message wording, but the existing gdal_metadata_xml gate test already covers the reject path. Low value.

What looks good

  • The trigger isinstance(attrs.get('gdal_metadata'), dict) mirrors _extract_rich_tags exactly, so the gate fires for precisely the inputs that would otherwise write GDAL XML to disk. Non-dict values stay ungated, matching the writer.
  • The round-trip exemption is untouched: the _xrspatial_geotiff_contract early return runs before the triggered checks, so read-back arrays still write flag-free, and there's a direct test for it.
  • The gate is shared by the eager/dask CPU writer and the GPU writer, so all four backends enforce it.
  • Test fallout was handled at the source: pack and scale-offset test helpers that build a fresh gdal_metadata dict now pass the opt-in, rather than weakening the gate. Full geotiff suite is green.

Checklist

  • Matches the write trigger in _extract_rich_tags
  • All four backends enforce the gate (shared validator)
  • Round-trip exemption preserved and tested
  • Edge cases covered (non-dict ignored, accepted with flag)
  • No premature materialization or copies (validation-only change)
  • Benchmark not needed (input validation)
  • README feature matrix not applicable (no new function)
  • Docstrings/contract doc updated (release contract row)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Addressed the review nits:

  • Added test_validate_write_rich_tag_optin_names_both_xml_and_gdal_metadata to pin the rejection message when both gdal_metadata_xml and a dict gdal_metadata are present without the opt-in (the second nit).
  • The first nit (gate also flags the dict when xml is present) is dismissed as a non-defect: _extract_rich_tags prefers the XML, the write is already blocked by the gdal_metadata_xml trigger, and listing both attrs in the message is accurate. No code change.

No remaining blockers or suggestions. The gate logic, round-trip exemption, and backend coverage are unchanged from the first pass.

@brendancol brendancol merged commit 6bd7463 into main Jun 14, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: attrs['gdal_metadata'] bypasses the experimental rich-tag write gate

1 participant