Trying to make sense of purl qualifier value encoding/decoding #642
If you want the canonical form:
I think all of those PURLs should be understandable by any correct implementation.
This is a bug in the library. I'm pretty sure the reason why plus is not in the new permitted characters set is to avoid this edge case. When I entered #261 for this plus vs space issue, half of the known implementations did one thing and half did another, and some of them did one thing when encoding and another when decoding such that they couldn't understand their own output.
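The plus vs space split is easy to reproduce with Go's standard library alone. This is a minimal sketch, not a description of how packageurl-go is implemented: `url.QueryUnescape` applies form-style decoding and reads `+` as a space, while `url.PathUnescape` only decodes `%XX` sequences. The helper names `decodeFormStyle` and `decodePercentOnly` are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeFormStyle (hypothetical name) decodes %XX sequences and
// additionally turns "+" into a space, as HTML form decoding does.
func decodeFormStyle(s string) (string, error) {
	return url.QueryUnescape(s)
}

// decodePercentOnly (hypothetical name) decodes %XX sequences and
// leaves a literal "+" untouched.
func decodePercentOnly(s string) (string, error) {
	return url.PathUnescape(s)
}

func main() {
	raw := "https://dl.openssl.org/openssl+security.tar.gz"

	form, _ := decodeFormStyle(raw)
	plain, _ := decodePercentOnly(raw)

	fmt.Println(form)  // the "+" has become a space
	fmt.Println(plain) // the "+" is preserved
}
```

An encoder that always writes `%2B` for a plus sidesteps the ambiguity, because both decoders turn `%2B` back into `+`.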
This is another common encoding issue with PURL implementations. The old spec was less clear, but the new spec says that colon should not be escaped. Hopefully now that the tests are being fixed up, the implementations that have been adopted into the package-url organization will either be updated to pass the tests or archived so people know that they're not compliant.
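As a rough illustration of an encoder that leaves the colon alone, here is a sketch in Go. The helper `encodeQualifierValue` and the exact set of characters it leaves unescaped (ASCII letters, digits, `.`, `-`, `_`, `~` and `:`) are assumptions for this example; the spec is the authority on the permitted set.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnescaped reports whether b is written literally.
// Assumption for this sketch: ASCII letters, digits, ".-_~" and ":".
func isUnescaped(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	}
	return strings.IndexByte(".-_~:", b) >= 0
}

// encodeQualifierValue (hypothetical helper) percent-encodes every byte
// outside the set above, using uppercase hex digits.
func encodeQualifierValue(v string) string {
	var sb strings.Builder
	for i := 0; i < len(v); i++ {
		if b := v[i]; isUnescaped(b) {
			sb.WriteByte(b)
		} else {
			fmt.Fprintf(&sb, "%%%02X", b)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(encodeQualifierValue("sha256:abc123"))
	// prints: sha256:abc123
	fmt.Println(encodeQualifierValue("https://dl.openssl.org/openssl+security.tar.gz"))
	// prints: https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
}
```

With this character set the output matches the percent-encoded purls in the question: the colon stays literal, while `/` and `+` are escaped.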
I'm using the https://github.com/package-url/packageurl-go library to handle purls.
It is important that these purls are canonical since I also store them as identifiers in a database.
This was largely smooth sailing until I started having purls with qualifiers, not least ones like pkg:generic with download_url.
I'm at a point where I need to understand the PURL spec better to know whether the offending behavior lies in the textual purl input, the library decoding/parsing, or the library encoding.
Textual input
Let's start with the textual purl input and use pkg:generic as our example. Say that I have a package pkg:generic/[email protected] for which I want to specify a download_url of https://dl.openssl.org/openssl+security.tar.gz and a checksum of sha256:abc123. How am I allowed to express this as a textual purl?
1. pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
2. pkg:generic/[email protected]?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123 (the same as 1, but with checksum and download_url in swapped order)
3. pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz
This one does not percent-encode the download_url qualifier value according to spec. Still, it follows the same pattern as the pkg:generic examples and this comment by @matt-phylum, so I assume it's allowed, although I do not understand how it can be, since it seems to violate the encoding requirements in the spec.
Library decoding
Onto decoding. For the above purls the library decodes the input as follows:
1. ✔️ pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
The output looks right.
2. ✔️ pkg:generic/[email protected]?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123
The output looks right here as well.
3. ❌ pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz
The decoded download_url is incorrect, with the + being replaced with a space.
Is this an error in the input or an error in the library decoding/parsing?
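For comparison, here is a minimal Go sketch of qualifier decoding that treats `+` as a literal character, which is the behavior the reply in this thread describes as correct. It is not packageurl-go's code: `parseQualifiers` is a hypothetical helper, it assumes the purl has no `#subpath`, and the sample purls leave out the version because only the qualifiers matter here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseQualifiers (hypothetical helper) extracts the qualifiers after "?"
// and percent-decodes each value. url.PathUnescape is used on purpose:
// unlike url.QueryUnescape it does not turn "+" into a space.
// Assumption: the input has no "#subpath" component.
func parseQualifiers(purl string) (map[string]string, error) {
	quals := map[string]string{}
	_, query, found := strings.Cut(purl, "?")
	if !found {
		return quals, nil
	}
	for _, pair := range strings.Split(query, "&") {
		if pair == "" {
			continue
		}
		key, raw, _ := strings.Cut(pair, "=")
		val, err := url.PathUnescape(raw)
		if err != nil {
			return nil, fmt.Errorf("qualifier %q: %w", key, err)
		}
		quals[strings.ToLower(key)] = val
	}
	return quals, nil
}

func main() {
	inputs := []string{
		"pkg:generic/openssl?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz",
		"pkg:generic/openssl?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz",
	}
	for _, in := range inputs {
		q, err := parseQualifiers(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(q["download_url"], q["checksum"])
	}
}
```

Under these assumptions the encoded and the unencoded input both decode to the same download_url, with the `+` intact.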
Library encoding
Onto encoding of the parsed packageurl.PackageURLs.
1. ❌ pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
Looks wrong (colons : should not be escaped).
2. ❌ pkg:generic/[email protected]?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123
Looks wrong (colons : should not be escaped).
3. ❌ pkg:generic/[email protected]?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz
Looks wrong (colons : should not be escaped, but the + character should be).
I would really appreciate it if someone could answer these questions so I can figure out what's at fault here.