Trying to make sense of purl qualifier value encoding/decoding #642
I'm using the https://github.com/package-url/packageurl-go library to handle purls. I'm at a point where I need to understand the PURL spec better to know whether the offending behavior lies in the textual purl input, the library's decoding/parsing, or the library's encoding.

Textual input

Let's start with the textual purl input and use pkg:generic as our example. Say that I have a package

Library decoding

On to decoding. For the above purls the library decodes the input as follows:

Library encoding

On to encoding of the parsed

I would really appreciate it if someone could answer these questions so I can figure out what's at fault here.
Replies: 2 comments 6 replies
If you want the canonical form:
I think all of those PURLs should be understandable by any correct implementation.
This is a bug in the library. I'm pretty sure the reason why plus is not in the new permitted characters set is to avoid this edge case. When I entered #261 for this plus vs space issue, half of the known implementations did one thing and half did another, and some of them did one thing when encoding and another when decoding such that they couldn't understand their own output.
This is another common encoding issue with PURL implementations. The old spec was less clear, but the new spec says that colon should not be escaped. Hopefully now that the tests are being fixed up, the implementations that have been adopted into the package-url organization will either be updated to pass the tests or archived so people know that they're not compliant.
A whole slew of test suite testcases were fixed after this discussion: #644
On "should" vs "must": as I understand it, the PURL spec actually defines two different classes of PURL. If you follow the procedure documented in "How to parse a purl string in its components", you end up with an implementation that accepts a much wider range of PURLs than implementations are supposed to generate. I think it's unfortunate that the prescribed algorithm does things like searching for delimiters from both ends of the string, but it's a fairly simple algorithm and it covers up some inconsistencies you might see in input data from different implementations or written by humans.