Trying to make sense of purl qualifier value encoding/decoding #642

petergardfjall · 2025-08-25T14:02:09Z

I'm using the https://github.com/package-url/packageurl-go library to handle purls.
It is important that these purls are canonical since I also store them as identifiers in a database.
This was largely smooth sailing until I started having purls with qualifiers, not least ones like pkg:generic with download_url.

I'm at a point where I need to understand the PURL spec better to know if the offending behavior is to be found in the textual purl input, the library decoding/parsing, or the library encoding.

Textual input

Let's start with the textual purl input, using pkg:generic as our example. Say that I have a package pkg:generic/openssl+security@1.2.3 for which I want to specify a download_url of https://dl.openssl.org/openssl+security.tar.gz and a checksum of sha256:abc123. How am I allowed to express this as a textual purl?

✔️ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
- Should be allowed since this is the canonical purl encoding, if I understand the spec.
✔️ pkg:generic/openssl+security@1.2.3?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123
- Should also be fine since it just swaps checksum with download_url.
❓ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz
- This one I'm unsure of, since it does not escape the download_url qualifier value according to spec. Still, it follows the same pattern as the pkg:generic examples and this comment by @matt-phylum, so I assume it's allowed, although I do not understand how it can be, since it seems to violate the encoding requirements in the spec.

Library decoding

On to decoding. For the above purls the library decodes the input as follows:

✔️ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz. The output looks right:

packageurl.PackageURL{
  Type:"generic",
  Namespace:"",
  Name:"openssl+security",
  Version:"1.2.3",
  Qualifiers:packageurl.Qualifiers{
    packageurl.Qualifier{Key:"checksum", Value:"sha256:abc123"},
    packageurl.Qualifier{Key:"download_url", Value:"https://dl.openssl.org/openssl+security.tar.gz"}},
  Subpath:""}

✔️ pkg:generic/openssl+security@1.2.3?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123. The output looks right here as well:

packageurl.PackageURL{
  Type:"generic",
  Namespace:"",
  Name:"openssl+security",
  Version:"1.2.3",
  Qualifiers:packageurl.Qualifiers{
    packageurl.Qualifier{Key:"checksum", Value:"sha256:abc123"},
    packageurl.Qualifier{Key:"download_url", Value:"https://dl.openssl.org/openssl+security.tar.gz"}},
  Subpath:""}

❌ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz:
Here we can see that the download_url is incorrect, with the + replaced by a space.
Is this an error in the input or an error in the library decoding/parsing?

packageurl.PackageURL{
  Type:"generic",
  Namespace:"",
  Name:"openssl+security",
  Version:"1.2.3",
  Qualifiers:packageurl.Qualifiers{
    packageurl.Qualifier{Key:"checksum", Value:"sha256:abc123"},
    packageurl.Qualifier{Key:"download_url", Value:"https://dl.openssl.org/openssl security.tar.gz"}},
  Subpath:""}
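For reference, a + turning into a space is what form-style query decoding does. The sketch below uses only the Go standard library and does not call packageurl-go; whether the library takes this path internally is an assumption, and `decodeFormStyle` and `decodePercentOnly` are hypothetical helper names used for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeFormStyle resolves percent-escapes and also turns '+' into a space,
// which is how HTML form data is decoded.
func decodeFormStyle(s string) (string, error) {
	return url.QueryUnescape(s)
}

// decodePercentOnly resolves percent-escapes but leaves '+' untouched.
func decodePercentOnly(s string) (string, error) {
	return url.PathUnescape(s)
}

func main() {
	raw := "https://dl.openssl.org/openssl+security.tar.gz"
	a, _ := decodeFormStyle(raw)
	b, _ := decodePercentOnly(raw)
	fmt.Println(a) // https://dl.openssl.org/openssl security.tar.gz
	fmt.Println(b) // https://dl.openssl.org/openssl+security.tar.gz
}
```

Both helpers give the same result for the fully percent-encoded value (`%2B` becomes `+`); they only disagree on a literal `+`.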

Library encoding

On to encoding of the parsed packageurl.PackageURLs.

❌ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz.
Looks wrong (colons : should not be escaped):
```
pkg:generic/openssl%2Bsecurity@1.2.3?checksum=sha256%3Aabc123&download_url=https%3A%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
```
❌ pkg:generic/openssl+security@1.2.3?download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz&checksum=sha256:abc123.
Looks wrong (colons : should not be escaped):
```
pkg:generic/openssl%2Bsecurity@1.2.3?checksum=sha256%3Aabc123&download_url=https%3A%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
```
❌ pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz.
Looks wrong (colons : should not be escaped, but the + character should be escaped):
```
pkg:generic/openssl%2Bsecurity@1.2.3?checksum=sha256%3Aabc123&download_url=https%3A%2F%2Fdl.openssl.org%2Fopenssl+security.tar.gz
```

I would really appreciate it if someone could answer these questions so I can figure out what's at fault here.

matt-phylum · 2025-08-25T15:09:00Z

> How am I allowed to express this as a textual purl?

If you want the canonical form:

- The plus signs need to be encoded because plus is not part of the "permitted characters" set.
- checksum must come before download_url because qualifiers must be sorted by key.
- The download URL characters not in the "permitted characters" set must be encoded.
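A rough sketch of those rules in Go, using only the standard library. The permitted set in `isPermitted` (ASCII letters, digits, `.`, `-`, `_`, `~`, `:`) is an assumption inferred from the examples in this thread, not copied from the spec, and all function names are hypothetical. Key lowercasing and dropping of empty values are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isPermitted reports whether c may appear unencoded in a qualifier value.
// ASSUMPTION: this set is inferred from the thread's examples, not the spec.
func isPermitted(c byte) bool {
	switch {
	case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
		return true
	}
	return strings.IndexByte(".-_~:", c) >= 0
}

// encodeValue percent-encodes every byte outside the permitted set.
func encodeValue(v string) string {
	var sb strings.Builder
	for i := 0; i < len(v); i++ {
		if c := v[i]; isPermitted(c) {
			sb.WriteByte(c)
		} else {
			fmt.Fprintf(&sb, "%%%02X", c)
		}
	}
	return sb.String()
}

// canonicalQualifiers renders the qualifiers sorted by key, as key=value
// pairs joined by '&'.
func canonicalQualifiers(q map[string]string) string {
	keys := make([]string, 0, len(q))
	for k := range q {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+encodeValue(q[k]))
	}
	return strings.Join(parts, "&")
}

func main() {
	fmt.Println(canonicalQualifiers(map[string]string{
		"download_url": "https://dl.openssl.org/openssl+security.tar.gz",
		"checksum":     "sha256:abc123",
	}))
	// checksum=sha256:abc123&download_url=https:%2F%2Fdl.openssl.org%2Fopenssl%2Bsecurity.tar.gz
}
```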

I think all of those PURLs should be understandable by any correct implementation.

> Here we can see that the download_url is incorrect, with the + replaced by a space.
> Is this an error in the input or an error in the library decoding/parsing?

This is a bug in the library. I'm pretty sure the reason why plus is not in the new permitted characters set is to avoid this edge case. When I entered #261 for this plus vs space issue, half of the known implementations did one thing and half did another, and some of them did one thing when encoding and another when decoding such that they couldn't understand their own output.

> Looks wrong (colons : should not be escaped):

This is another common encoding issue with PURL implementations. The old spec was less clear, but the new spec says that colon should not be escaped.

Hopefully now that the tests are being fixed up, the implementations that have been adopted into the package-url organization will either be updated to pass the tests or archived so people know that they're not compliant.

6 replies

petergardfjall Aug 26, 2025
Author

Taking the Go purl library as an example, its qualifier value encoding is incorrect then, since it just calls url.PathEscape (https://pkg.go.dev/net/url#PathEscape), which fails to percent-encode + (to %2B), @ (to %40), & (to %26), and = (to %3D).
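A small standard-library check of that behaviour. It exercises `url.PathEscape` directly rather than the purl library, and `keptByPathEscape` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// keptByPathEscape returns the characters of chars that url.PathEscape
// leaves unencoded.
func keptByPathEscape(chars string) string {
	var kept []rune
	for _, r := range chars {
		if url.PathEscape(string(r)) == string(r) {
			kept = append(kept, r)
		}
	}
	return string(kept)
}

func main() {
	fmt.Println(keptByPathEscape("+@&=/"))   // +@&=
	fmt.Println(url.PathEscape("a+b@c&d=e")) // a+b@c&d=e
}
```

Only the `/` from that list is escaped (to `%2F`), so a raw `&` or `=` inside a value would survive into the qualifier string.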

matt-phylum Aug 26, 2025

On "should" vs "must": as I understand it, the PURL spec actually defines two different classes of PURL. If you follow the procedure documented in How to parse a purl string in its components you end up with an implementation that accepts a much wider range of PURLs than implementations are supposed to generate. I think it's unfortunate that the prescribed algorithm does things like searching for delimiters from both ends of the string, but it's a fairly simple algorithm and it covers up some inconsistencies you might see in input data from different implementations or written by humans.
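To make the "delimiters from both ends" point concrete, here is a rough Go sketch of the splitting step only. The order of the steps is a paraphrase of the documented procedure, not a copy of it; decoding, lowercasing, and validation are omitted; and `Parts`, `cutLast`, and `splitPurl` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Parts holds the raw, still-encoded components of a purl string.
type Parts struct {
	Scheme, Type, Namespace, Name, Version, Qualifiers, Subpath string
}

// cutLast splits s around the last occurrence of sep.
func cutLast(s, sep string) (before, after string, found bool) {
	if i := strings.LastIndex(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// splitPurl separates the components; it performs no decoding or validation.
func splitPurl(s string) Parts {
	var p Parts
	s, p.Subpath, _ = cutLast(s, "#")    // subpath: from the right
	s, p.Qualifiers, _ = cutLast(s, "?") // qualifiers: from the right
	p.Scheme, s, _ = strings.Cut(s, ":") // scheme: from the left
	s = strings.Trim(s, "/")
	p.Type, s, _ = strings.Cut(s, "/") // type: from the left
	s, p.Version, _ = cutLast(s, "@")  // version: from the right
	if ns, name, ok := cutLast(s, "/"); ok {
		p.Namespace, p.Name = ns, name // name: from the right
	} else {
		p.Name = s
	}
	return p
}

func main() {
	p := splitPurl("pkg:generic/openssl+security@1.2.3?checksum=sha256:abc123&download_url=https://dl.openssl.org/openssl+security.tar.gz")
	fmt.Printf("%+v\n", p)
}
```

Because the qualifiers are cut off before the `:` and `/` splits happen, the unescaped `https://` in the qualifier value does not confuse the rest of the parse.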

petergardfjall Aug 27, 2025
Author

I took a stab to see if I could fix the broken purl stringification of the Go library (package-url/packageurl-go#83), but quickly ran into a number of issues with the new test suite. I've opened a bunch of tickets (#643, #644, #645) to better understand expectations.

Notably, #644 is interesting.
In short, there are lots of so-called roundtrip test cases where an input like pkg:generic/openssl@1.1.10g?download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz&checksum=sha256:de4d501267da is supposed to be handed back by the library as the expected_output. The issue is that the expected_output never escapes qualifiers according to the rules of a canonical purl.

This, to me, suggests that libraries somehow need to preserve the original user input and only return a canonical string when expressly asked. Could you comment on this, since that seems to make libraries much more complex?

matt-phylum Aug 27, 2025

I think what happened was the PURL types spec had a lot of bad examples and all of those bad examples have been automatically converted into bad tests. This example is clearly not canonical because the qualifiers are not sorted.

purl-spec/PURL-TYPES.rst

Lines 278 to 282 in 337a4ed

	- Examples (truncated for brevity)::

	pkg:generic/openssl@1.1.10g
	pkg:generic/openssl@1.1.10g?download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz&checksum=sha256:de4d501267da
	pkg:generic/bitwarderl?vcs_url=git%2Bhttps://git.fsfe.org/dxtr/bitwarderl%40cc55108da32
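The ordering problem is easy to check mechanically. A minimal Go sketch, assuming the qualifier string has already been split off from the purl and that any `&` inside values is percent-encoded; `qualifierKeys` and `keysSorted` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// qualifierKeys returns the keys of a raw qualifier string in order of
// appearance.
func qualifierKeys(raw string) []string {
	var keys []string
	for _, pair := range strings.Split(raw, "&") {
		k, _, _ := strings.Cut(pair, "=")
		keys = append(keys, k)
	}
	return keys
}

// keysSorted reports whether the qualifier keys appear in ascending order.
func keysSorted(raw string) bool {
	return sort.StringsAreSorted(qualifierKeys(raw))
}

func main() {
	fmt.Println(keysSorted("download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz&checksum=sha256:de4d501267da")) // false
	fmt.Println(keysSorted("checksum=sha256:de4d501267da&download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz")) // true
}
```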

petergardfjall Aug 27, 2025
Author

Is it fair to say that the new test suite is not yet ready to be used by library implementations?
