Replacing the metadata parser with the one from packaging.metadata #607
base: main
8bd775c to 1247975
1247975 to 2132d61
2132d61 to 42ef5bf
cb3813b to bdada13
Fixes python-wheel-build#561. Replace the metadata parser with the one from packaging.metadata, since we need to use the packaging library to parse metadata instead of using the email library ourselves. Signed-off-by: Lalatendu Mohanty <[email protected]>
@tiran @dhellmann PTAL
This looks good to me. How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?
My bad. I should have checked the rest of the code to see whether the same pattern exists elsewhere. I can see https://github.com/python-wheel-build/fromager/blob/main/src/fromager/candidate.py#L82. I do not think we need a common function yet. PTAL and let me know.
Let me get back to you on this.
6e992d6 to a1a70e7
@tiran Since fromager only reads metadata for dependency resolution and doesn't need to write it back, round-trip safety isn't necessary. The benefits of type safety and validation from packaging.metadata outweigh the loss of unknown fields that aren't being used anyway.
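The trade-off described above can be sketched roughly as follows. This is only an illustration, assuming a packaging release that ships `packaging.metadata.Metadata` (23.2 or newer); the sample METADATA bytes and the `X-Custom-Field` header are made up and do not come from fromager.

```python
from packaging.metadata import Metadata, parse_email

# Hypothetical METADATA content; X-Custom-Field is not a core metadata field.
SAMPLE = (
    b"Metadata-Version: 2.1\n"
    b"Name: example-pkg\n"
    b"Version: 1.0\n"
    b"Requires-Dist: requests>=2.0\n"
    b"X-Custom-Field: some value\n"
)

# parse_email() returns the recognized fields plus a dict of everything else.
raw, unparsed = parse_email(SAMPLE)

# Metadata.from_raw() only sees the recognized fields, so the unknown
# header is dropped at this point (the loss of round-trip safety).
metadata = Metadata.from_raw(raw)

print(metadata.name)           # plain str
print(metadata.version)        # packaging.version.Version instance
print(metadata.requires_dist)  # list of packaging.requirements.Requirement
print(sorted(unparsed))        # names of the fields that were not recognized
```

For dependency resolution only the typed attributes are read, which is why dropping the unrecognized header is acceptable in this use case.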
- Replace email.parser.BytesParser with packaging.metadata.Metadata
- Remove complex TYPE_CHECKING type alias workaround
- Update metadata access patterns to use typed attributes
- Improve type safety and API consistency

Signed-off-by: Lalatendu Mohanty <[email protected]>
a1a70e7 to d948afc
Update parse_metadata() to use parse_email() + Metadata.from_raw() for round-trip safety and consistency with bootstrapper.py and candidate.py. Signed-off-by: Lalatendu Mohanty <[email protected]>
Looks good to me! I will wait for @tiran to approve since he had clarification questions
    # If we didn't find the metadata, return an empty metadata object
    raw_metadata, _ = parse_email(b"")
    return Metadata.from_raw(raw_metadata)
This does not work. A metadata object has three mandatory fields:
>>> raw_metadata, _ = parse_email(b"")
>>> Metadata.from_raw(raw_metadata)
+ Exception Group Traceback (most recent call last):
| File "<python-input-4>", line 1, in <module>
| Metadata.from_raw(raw_metadata)
| ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 752, in from_raw
| raise ExceptionGroup("invalid metadata", exceptions)
| ExceptionGroup: invalid metadata (3 sub-exceptions)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 712, in from_raw
| metadata_version = ins.metadata_version
| ^^^^^^^^^^^^^^^^^^^^
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
| value = converter(value)
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 536, in _process_metadata_version
| raise self._invalid_metadata(f"{value!r} is not a valid metadata version")
| packaging.metadata.InvalidMetadata: None is not a valid metadata version
+---------------- 2 ----------------
| Traceback (most recent call last):
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 747, in from_raw
| getattr(ins, key)
| ~~~~~~~^^^^^^^^^^
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
| value = converter(value)
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 541, in _process_name
| raise self._invalid_metadata("{field} is a required field")
| packaging.metadata.InvalidMetadata: 'name' is a required field
+---------------- 3 ----------------
| Traceback (most recent call last):
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 747, in from_raw
| getattr(ins, key)
| ~~~~~~~^^^^^^^^^^
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
| value = converter(value)
| File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 554, in _process_version
| raise self._invalid_metadata("{field} is a required field")
| packaging.metadata.InvalidMetadata: 'version' is a required field
+------------------------------------
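To make the three required fields concrete, here is a small sketch that compares empty input with the smallest input that validates. It assumes packaging 23.2 or newer; `try_parse` and the sample bytes are hypothetical names used only for illustration.

```python
from packaging.metadata import Metadata

# Smallest METADATA that passes validation: the three mandatory fields.
MINIMAL = b"Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n"


def try_parse(data: bytes) -> list[str]:
    """Return the validation error messages; an empty list means valid."""
    try:
        Metadata.from_email(data)
    except Exception as exc:
        # packaging raises an ExceptionGroup; its sub-exceptions are
        # exposed through the .exceptions attribute.
        return [str(e) for e in getattr(exc, "exceptions", [exc])]
    return []


minimal_errors = try_parse(MINIMAL)
empty_errors = try_parse(b"")

print(minimal_errors)  # no errors for the minimal document
for message in empty_errors:
    print(message)     # one message per missing or invalid field
```

One possible alternative to returning an "empty" object is to signal the missing file explicitly (for example by raising or returning None), but that is a design choice for the maintainers.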
validate=False is not going to work either. It creates a broken Metadata object:
>>> raw_metadata, _ = parse_email(b"")
>>> m = Metadata.from_raw(raw_metadata, validate=False)
>>> m.name
Traceback (most recent call last):
File "<python-input-4>", line 1, in <module>
m.name
File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 514, in __get__
value = converter(value)
File "/usr/lib/python3.13/site-packages/packaging/metadata.py", line 541, in _process_name
raise self._invalid_metadata("{field} is a required field")
packaging.metadata.InvalidMetadata: 'name' is a required field

    raw_metadata, _ = parse_email(f.read())
    metadata = Metadata.from_raw(raw_metadata)
Why are you using parse_email() + Metadata.from_raw() instead of Metadata.from_email()? Metadata.from_email() combines parse_email(), Metadata.from_raw(), and additional validation.
This code should probably use fromager.dependencies.parse_metadata(metadata_filename).
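A rough sketch of how the two approaches differ, assuming packaging 23.2 or newer. The input bytes and the `X-Custom-Field` header are invented for the example; the point is that the two-step path silently ignores fields it does not recognize, while the one-step path with the default `validate=True` reports them.

```python
from packaging.metadata import Metadata, parse_email

# Hypothetical input containing one header that is not core metadata.
DATA = (
    b"Metadata-Version: 2.1\n"
    b"Name: example-pkg\n"
    b"Version: 1.0\n"
    b"X-Custom-Field: 1\n"
)

# Two-step path: unknown fields land in the second return value and
# are never looked at by Metadata.from_raw().
raw, unparsed = parse_email(DATA)
two_step = Metadata.from_raw(raw)

# One-step path: with validate=True (the default), leftover fields are
# treated as an error instead of being dropped.
reasons: list[str] = []
try:
    Metadata.from_email(DATA)
    rejected = False
except Exception as exc:  # an ExceptionGroup of InvalidMetadata errors
    rejected = True
    reasons = [str(e) for e in getattr(exc, "exceptions", [exc])]

print(two_step.name, rejected, reasons)
```

Whether the stricter behaviour is desirable depends on how tolerant the caller needs to be of packages that ship non-standard fields.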
    metadata_content = z.read(n)
    raw_metadata, _ = parse_email(metadata_content)
    metadata = Metadata.from_raw(raw_metadata)
    return metadata
How about this?
    - metadata_content = z.read(n)
    - raw_metadata, _ = parse_email(metadata_content)
    - metadata = Metadata.from_raw(raw_metadata)
    - return metadata
    + return Metadata.from_email(z.read(n))
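For context, the suggested one-liner could sit inside a wheel-reading helper along these lines. This is a self-contained sketch, not fromager's actual code: `metadata_from_wheel`, the archive entry name, and the choice to raise `LookupError` when no METADATA entry exists are all assumptions made for the example.

```python
import io
import zipfile

from packaging.metadata import Metadata


def metadata_from_wheel(wheel: zipfile.ZipFile) -> Metadata:
    """Hypothetical helper: parse the first .dist-info/METADATA entry."""
    for name in wheel.namelist():
        if name.endswith(".dist-info/METADATA"):
            return Metadata.from_email(wheel.read(name))
    # Raising avoids constructing an invalid "empty" Metadata object.
    raise LookupError("no .dist-info/METADATA entry found")


# Build a tiny in-memory archive that stands in for a real wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "example_pkg-1.0.dist-info/METADATA",
        b"Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n",
    )

with zipfile.ZipFile(buffer) as zf:
    wheel_metadata = metadata_from_wheel(zf)

print(wheel_metadata.name, wheel_metadata.version)
```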
    - return Metadata.from_email(metadata_file.read_bytes(), validate=validate)
    + raw_metadata, _ = parse_email(metadata_file.read_bytes())
    + return Metadata.from_raw(raw_metadata, validate=validate)
Why did you change this? Metadata.from_email() is better, because it performs additional validations with validate=True.
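A minimal sketch of a path-based helper built on `Metadata.from_email()`, in the spirit of the original line discussed here. The name `read_metadata` and the temporary file are hypothetical and only serve the example; it assumes packaging 23.2 or newer.

```python
import pathlib
import tempfile

from packaging.metadata import Metadata


def read_metadata(metadata_file: pathlib.Path, *, validate: bool = True) -> Metadata:
    """Hypothetical helper: parse a METADATA or PKG-INFO file from disk."""
    return Metadata.from_email(metadata_file.read_bytes(), validate=validate)


# Write a throwaway METADATA file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "METADATA"
    path.write_bytes(b"Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n")
    parsed = read_metadata(path)

print(parsed.name, parsed.version)
```

Keeping the `validate` flag as a keyword argument lets callers opt out of strict checks for known-bad packages without duplicating the parsing logic.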