wiredfool
Member

@wiredfool wiredfool commented Jul 19, 2025

Addresses #8329 (comment). See also #8329 (comment).
Changes proposed in this pull request:

  • Add the mode name as the schema name instead of "pixel"
  • Add image band metadata in the arrow schema
  • Tests for metadata interop
  • Client testing with arro3 and nanoarrow similar to the pyarrow tests

For multiband images, we issue:

ArrowSchema:
  Format: FixedLengthArray[4]
  Children: 
    - ArrowSchema:
       Format: uint8
       name: RGB
       metadata: {image: {bands: [R, G, B, X]}}

This metadata is accessible using pa.array(img).type.field(0).metadata

For single band images we emit:

ArrowSchema:
  Format: ...
  name: F
  metadata: {image: {bands: [F]}}

This metadata is apparently not accessible via pyarrow or arro3, but it is accessible via nanoarrow.
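For readers unfamiliar with how the `metadata` entry travels in an ArrowSchema, here is a minimal stdlib-only sketch of the key/value encoding used by the Arrow C data interface (an int32 pair count, then length-prefixed keys and values in native byte order). The helper names are hypothetical, and the choice of a single `image` key holding a JSON value is an assumption based on the schema dumps above, not necessarily the exact bytes this PR writes.

```python
import json
import struct


def encode_arrow_metadata(pairs):
    """Pack string key/value pairs in the Arrow C data interface layout."""
    chunks = [struct.pack("=i", len(pairs))]  # int32: number of pairs
    for key, value in pairs.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        chunks.append(struct.pack("=i", len(k)) + k)  # int32 length + key
        chunks.append(struct.pack("=i", len(v)) + v)  # int32 length + value
    return b"".join(chunks)


def decode_arrow_metadata(buf):
    """Inverse of encode_arrow_metadata; returns a dict of str -> str."""
    (count,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = {}
    for _ in range(count):
        (klen,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        key = buf[offset:offset + klen].decode("utf-8")
        offset += klen
        (vlen,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        pairs[key] = buf[offset:offset + vlen].decode("utf-8")
        offset += vlen
    return pairs


# Assumed layout: one "image" key whose value is a JSON document.
single_band = encode_arrow_metadata({"image": json.dumps({"bands": ["F"]})})
decoded = decode_arrow_metadata(single_band)
bands = json.loads(decoded["image"])["bands"]
```

A consumer that can read the raw schema, as nanoarrow does, would see these bytes on the array's own schema; whether a higher-level library surfaces them is up to that library.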

The ultimate goal is to be able to have:

  • numpy/pandas/arrow users can call np.array(img) and still identify the RGB channel names
  • height/width metadata would ideally be available as well

@rok

@wiredfool wiredfool marked this pull request as draft July 19, 2025 15:38

rok commented Jul 20, 2025

@paleolimbot am I right to say array metadata was not really designed to pass application metadata? What are your thoughts on what's being done here?
@wiredfool has the issue here that he can't access the metadata of children in certain cases.
I'm suggesting usage of FixedShapeTensorArray, but I imagine working with arrays directly would simplify things here.

@wiredfool wiredfool force-pushed the pyarrow_band_names branch from e768bbf to 28c7645 Compare July 21, 2025 09:21
@wiredfool wiredfool force-pushed the pyarrow_band_names branch from 14ac76c to 7d2abbd Compare July 21, 2025 09:23
This metadata is available in nanoarrow, but not pyarrow or arro3
@paleolimbot

Cool!

It's true that field metadata at anything other than the top-level struct has only a medium chance of getting propagated through various Arrow operations. If you used the two metadata fields ARROW:extension:name (== pillow.image) and ARROW:extension:metadata (holding the JSON you're proposing above) at the Array level, I think you will have more success getting this to propagate correctly. For pyarrow you'd have to implement a minimal ExtensionType class that a (pyarrow) user would have to "register" for that type to show up nicely in pyarrow.
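As a rough illustration of the two-key suggestion, the sketch below builds and reads back the extension metadata using only the standard library. The function names are hypothetical, and the JSON payload shape (`{"image": {"bands": [...]}}`) is assumed from the schema dumps earlier in this thread rather than taken from any merged implementation.

```python
import json

# Extension name as suggested in the comment above.
EXTENSION_NAME = "pillow.image"


def extension_field_metadata(bands):
    """Build the two reserved Arrow extension keys for a list of band names."""
    return {
        "ARROW:extension:name": EXTENSION_NAME,
        # Assumed payload shape, mirroring the proposed band metadata.
        "ARROW:extension:metadata": json.dumps({"image": {"bands": list(bands)}}),
    }


def bands_from_field_metadata(metadata):
    """Return the band names, or None if this is not a pillow.image field."""
    if metadata.get("ARROW:extension:name") != EXTENSION_NAME:
        return None
    payload = json.loads(metadata["ARROW:extension:metadata"])
    return payload["image"]["bands"]


rgb_metadata = extension_field_metadata(["R", "G", "B", "X"])
rgb_bands = bands_from_field_metadata(rgb_metadata)
```

In pyarrow, a dict like `rgb_metadata` can be passed as the `metadata=` argument of `pa.field(...)`; having it show up as a typed array would still require the registered ExtensionType subclass mentioned above.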

Feel free to ping on anything if you have a question!
