Skip to content

Conversation

@LalatenduMohanty
Copy link
Member

@LalatenduMohanty LalatenduMohanty commented May 16, 2025

Fixes #561. Replacing the metadata parser with the metadata parser from packaging.metadata.
As we need to use the packaging library to parse metadata instead of using the email library ourselves.

@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from 8bd775c to 1247975 Compare May 19, 2025 10:45
@LalatenduMohanty LalatenduMohanty requested a review from a team as a code owner July 4, 2025 18:07
@LalatenduMohanty LalatenduMohanty changed the title [WIP] Replaceing the metadata parser from packaging.metadata Replaceing the metadata parser from packaging.metadata Jul 4, 2025
@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from cb3813b to bdada13 Compare July 4, 2025 18:35
Fixes python-wheel-build#561. Replacing the metadata parser with the metadata parser from packaging.metadata.
As we need to use the packaging library to parse metadata instead of using the email library ourselves.

Signed-off-by: Lalatendu Mohanty <[email protected]>
@LalatenduMohanty
Copy link
Member Author

@tiran @dhellmann PTAL

Copy link
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?

@tiran
Copy link
Collaborator

tiran commented Jul 7, 2025

packaging.metadata.parse_email does not round trip and looses any field that it does not understand. Is this okay? Do we need a round trip-safe function?

@LalatenduMohanty
Copy link
Member Author

How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?

My bad. I should have checked all code to see if same pattern exists else where. I can see https://github.com/python-wheel-build/fromager/blob/main/src/fromager/candidate.py#L82 . I do not think we need a common function yet. PTAL and let me know.

@LalatenduMohanty
Copy link
Member Author

packaging.metadata.parse_email does not round trip and looses any field that it does not understand. Is this okay? Do we need a round trip-safe function?

Let me get back to you on this.

@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 3 times, most recently from 6e992d6 to a1a70e7 Compare July 7, 2025 19:42
@LalatenduMohanty
Copy link
Member Author

@tiran Since fromager only reads metadata for dependency resolution and doesn't need to write it back, round trip safety isn't necessary. The benefits of type safety and validation from packaging.metadata outweigh the loss of unknown fields that aren't being used anyway.

- Replace email.parser.BytesParser with packaging.metadata.Metadata
- Remove complex TYPE_CHECKING type alias workaround
- Update metadata access patterns to use typed attributes
- Improve type safety and API consistency

Signed-off-by: Lalatendu Mohanty <[email protected]>
Update parse_metadata() to use parse_email() + Metadata.from_raw()
for round-trip safety and consistency with bootstrapper.py and
candidate.py.

Signed-off-by: Lalatendu Mohanty <[email protected]>
@LalatenduMohanty LalatenduMohanty requested a review from tiran July 8, 2025 19:09
Copy link
Contributor

@rd4398 rd4398 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I will wait for @tiran to approve since he had clarification questions

Comment on lines +78 to +80
# If we didn't find the metadata, return an empty metadata object
raw_metadata, _ = parse_email(b"")
return Metadata.from_raw(raw_metadata)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work. A metadata object has three mandatory fields:

>>> raw_metadata, _ = parse_email(b"")
>>> Metadata.from_raw(raw_metadata)
  + Exception Group Traceback (most recent call last):
  |   File "<python-input-4>", line 1, in <module>
  |     Metadata.from_raw(raw_metadata)
  |     ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 752, in from_raw
  |     raise ExceptionGroup("invalid metadata", exceptions)
  | ExceptionGroup: invalid metadata (3 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 712, in from_raw
    |     metadata_version = ins.metadata_version
    |                        ^^^^^^^^^^^^^^^^^^^^
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
    |     value = converter(value)
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 536, in _process_metadata_version
    |     raise self._invalid_metadata(f"{value!r} is not a valid metadata version")
    | packaging.metadata.InvalidMetadata: None is not a valid metadata version
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 747, in from_raw
    |     getattr(ins, key)
    |     ~~~~~~~^^^^^^^^^^
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
    |     value = converter(value)
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 541, in _process_name
    |     raise self._invalid_metadata("{field} is a required field")
    | packaging.metadata.InvalidMetadata: 'name' is a required field
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 747, in from_raw
    |     getattr(ins, key)
    |     ~~~~~~~^^^^^^^^^^
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
    |     value = converter(value)
    |   File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 554, in _process_version
    |     raise self._invalid_metadata("{field} is a required field")
    | packaging.metadata.InvalidMetadata: 'version' is a required field
    +------------------------------------

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate=False is not going to work either. It creates a broken Metadata object:

>>> raw_metadata, _ = parse_email(b"")
>>> m = Metadata.from_raw(raw_metadata, validate=False)
>>> m.name
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    m.name
  File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
    value = converter(value)
  File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 541, in _process_name
    raise self._invalid_metadata("{field} is a required field")
packaging.metadata.InvalidMetadata: 'name' is a required field

Comment on lines +787 to +788
raw_metadata, _ = parse_email(f.read())
metadata = Metadata.from_raw(raw_metadata)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using parse_email() + Metadata.from_raw() instead of Metadata.parse_email()? The Metadata.parse_email() combines parse_email(), Metadata.from_raw(), and additional validation.

This code should probably use fromager.dependencies.parse_metadata(metadata_filename).

Comment on lines +73 to +76
metadata_content = z.read(n)
raw_metadata, _ = parse_email(metadata_content)
metadata = Metadata.from_raw(raw_metadata)
return metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

Suggested change
metadata_content = z.read(n)
raw_metadata, _ = parse_email(metadata_content)
metadata = Metadata.from_raw(raw_metadata)
return metadata
return Metadata.parse_email(z.read(n))

Comment on lines -268 to +269
return Metadata.from_email(metadata_file.read_bytes(), validate=validate)
raw_metadata, _ = parse_email(metadata_file.read_bytes())
return Metadata.from_raw(raw_metadata, validate=validate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this? Metadata.from_email() is better, because it performs additional validations with validate=True.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use packaging library to parse metadata instead of doing it ourselves

4 participants