
@tianon
Member

@tianon tianon commented Jul 18, 2025

> If the list is not empty, the tags MUST be in lexical order (i.e. case-insensitive alphanumeric order).

@tianon
Member Author

tianon commented Jul 18, 2025

Oh, that's exciting:

OCI Distribution Conformance Tests Content Discovery Test content discovery endpoints (listing tags) [It] GET request to list tags should yield 200 response and be in sorted order
/home/runner/work/distribution-spec/distribution-spec/conformance/03_discovery_test.go:257

  [FAILED] Expected
      <[]string | len:5, cap:8>: ["test0", "test1", "test2", "test3", "tagtest0"]
  to equal
      <[]string | len:5, cap:5>: ["tagtest0", "test0", "test1", "test2", "test3"]
  In [It] at: /home/runner/work/distribution-spec/distribution-spec/conformance/03_discovery_test.go:267 @ 07/18/25 05:59:58.96

@tianon
Member Author

tianon commented Jul 18, 2025

> If the list is not empty, the tags MUST be in lexical order (i.e. case-insensitive alphanumeric order).

Ah, this is slightly more complicated than what I've implemented. Moving to draft for now (though I think the CI failure is a real bug).

@tianon tianon marked this pull request as draft July 18, 2025 06:04
@tianon tianon force-pushed the sorted-tag-list branch from d9b44fd to 83278b5 Compare July 18, 2025 06:08
@tianon tianon marked this pull request as ready for review July 18, 2025 06:08
@tianon
Member Author

tianon commented Jul 18, 2025

The implementation now follows the spec more closely, although I did not add any uppercase tags to test it with. That's probably worth doing too.

(and again, that CI failure is a legit spec compliance issue)

@tianon
Member Author

tianon commented Jul 18, 2025

An edge case the spec is not currently clear on is what to do with test0 vs TEST0 in the same list - which comes first?

@sudo-bmitch
Contributor

Are there registries where this passes? If everyone is running a sort.Strings then perhaps we should adjust the spec to align with the implementations.

@tianon
Member Author

tianon commented Jul 19, 2025

I did some historical diving, and distribution/distribution@aebe850 is the first time the original spec we're based on mentioned "lexical" ordering, although that was for the _catalog endpoint (which OCI does not include). However, in distribution/distribution@006214d (part of the same patch series/PR), the "tag listing" API was updated to "heavily reference" the catalog endpoint and explicitly include the word "lexical".

This series was included in registry (distribution) v2.1.0. Digging into the code, I get down to https://github.com/distribution/distribution/blob/9ca7921603852314b18a6ecc19f91806935f34bd/registry/storage/tagstore.go#L29 as the "original" implementation of said "ordering", which I believe delegates to the storage drivers (S3, Azure, etc.) and passes their sorting through verbatim. So I don't think the sorting was implemented intentionally; it was documented as an incidental detail of the means of implementation. That's totally fair, since I don't think they were actually trying to write a perfect spec that other people could implement servers for in a 100% compatible and reliable way. I see that original spec more as documentation for implementing sane client code, but that's not really the point here.

Scratching the surface via Git Blame on the current codebase, distribution/distribution@4da2712 is extremely relevant, but 2021 is way too recent to be the origin (especially given that the "last" parameter was defined in the API back in 2015). Still, it is pretty surprising to see a simple sort.Strings in there, contrary to the API spec (now inherited into OCI!), which specifies "lexical" explicitly (and presumably intentionally).

"Tianon, stawp, what's the takeaway"

I'm not a distribution-spec maintainer, but I'm pretty well convinced that sort.Strings is actually the closest to the original intent, and that "ASCIIbetical" sorting is actually the correct and desirable result here, despite the explicit language in the specification that says otherwise (but I also find this particular API to be irritatingly sparse and the sorting fairly close to useless, so perhaps my opinion shouldn't hold much weight here).

No matter what, I think we really ought to update the spec language here in some way to clarify what the intent is (especially since we've got several popular registries that do not implement the spec language as-is, now including the original "reference" implementation).

A conservative change might be to keep the current sorting language but somehow make it clear that a given registry MUST be deterministic, because otherwise the pagination last parameter is worse than useless. Then update the test to only verify sorting, and to be generous in what is considered "sorted"? If we wanted to go the extra mile, test twice and make sure the list comes back in the same order? Or test with last set to the first tag and check that the list is almost the same?

A more proactive change would be to codify Go's sort.Strings algorithm, but given the tag grammar, that's effectively just "ASCIIbetical" anyhow and would still have some fallout 🤷

@tianon
Member Author

tianon commented Jul 19, 2025

I guess an even simpler alternative that would increase the number of compliant registries would be to remove the language about ordering entirely, but I don't know what the implications of that would be. I guess it could be downgraded to a SHOULD? No matter what, we ought to fix the spec, the test, or both.

@tianon
Member Author

tianon commented Jul 19, 2025

I guess I should either update this implementation to use `sort.SliceStable` or something like `Expect(sort.SliceIsSorted(tagList, func(i, j int) bool { return strings.ToLower(tagList[i]) < strings.ToLower(tagList[j]) })).To(BeTrue(), "tag list should be in lexical order (i.e. case-insensitive alphanumeric order): %v", tagList)` so it's more forgiving of the edge cases implied by the spec language, but I'd love some spec maintainer guidance/opinion on which direction this fix ought to go (spec, test, or both).
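
Here is the same forgiving check as a standalone sketch using only the standard library (no Gomega), so it can be run outside the conformance suite. `isLexical` is a hypothetical name, and the comparison via `strings.ToLower` is an assumed reading of "case-insensitive alphanumeric order".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isLexical reports whether tags are in case-insensitive order. Tags that
// differ only by case compare equal, so they may appear in either order.
func isLexical(tags []string) bool {
	return sort.SliceIsSorted(tags, func(i, j int) bool {
		return strings.ToLower(tags[i]) < strings.ToLower(tags[j])
	})
}

func main() {
	fmt.Println(isLexical([]string{"tagtest0", "test0", "TEST0", "test1"})) // true
	fmt.Println(isLexical([]string{"tagtest0", "TEST0", "test0", "test1"})) // true
	fmt.Println(isLexical([]string{"TEST0", "tagtest0", "test0"}))          // false
}
```

The first two calls show the point of the looser check: either placement of `test0` relative to `TEST0` passes.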

@sudo-bmitch
Contributor

Given how many registries are doing an ascii sort, my preference is to adjust the spec wording to match the implementations. Loosening or removing the wording to account for any sort order the registry wants to implement would mean client logic to list tags starting at some given point (e.g. at the start of semver named tags) would not be reliable on OCI conformant registries.

I suspect the reason this hasn't come up before is because so many repositories default to lower case tags. Until I tried pushing a tag with upper case characters, it wasn't obvious whether the sort order was lexical.

@tianon tianon force-pushed the sorted-tag-list branch from 8de2a0c to 9ac302e Compare July 19, 2025 23:23
@tianon
Member Author

tianon commented Jul 19, 2025

Yeah, that makes sense to me! For now, I've updated the test here to verify that the tag order is either lexical or "asciibetical" (sort.Strings), explicitly allowing both. I think it would also be sane to verify that the order is consistent (like if we set last to tagList[0], making sure the returned result exactly equals tagList[1:]), but I didn't go that far here. I'm happy to update if that sounds useful.

The CI failure is still a legitimate bug (because the tested registry does not appear to sort tags at all, which is problematic):

OCI Distribution Conformance Tests Content Discovery Test content discovery endpoints (listing tags) [It] GET request to list tags should yield 200 response and be in sorted order
/home/runner/work/distribution-spec/distribution-spec/conformance/03_discovery_test.go:258

  [FAILED] Expected
      <[]string | len:9, cap:16>: ["test0", "TEST0", "test1", "TEST1", "test2", "TEST2", "test3", "TEST3", "tagtest0"]
  To satisfy at least one of these matchers: [%!s(*matchers.EqualMatcher=&{[tagtest0 test0 TEST0 test1 TEST1 test2 TEST2 test3 TEST3]}) %!s(*matchers.EqualMatcher=&{[TEST0 TEST1 TEST2 TEST3 tagtest0 test0 test1 test2 test3]})]
  In [It] at: /home/runner/work/distribution-spec/distribution-spec/conformance/03_discovery_test.go:272 @ 07/19/25 23:23:48.156
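
As a sanity check on that output, here is a standalone sketch of an "either ordering" test using only the standard library. `sortedEitherWay` is a hypothetical helper, and unlike the `Equal` matchers in the log it does not care how tags that differ only by case are ordered relative to each other. The three lists are copied from the CI output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedEitherWay is a hypothetical helper: it reports whether tags are in
// byte-wise (sort.Strings) order or in case-insensitive order.
func sortedEitherWay(tags []string) bool {
	if sort.StringsAreSorted(tags) {
		return true
	}
	return sort.SliceIsSorted(tags, func(i, j int) bool {
		return strings.ToLower(tags[i]) < strings.ToLower(tags[j])
	})
}

func main() {
	// What the tested registry returned.
	returned := []string{"test0", "TEST0", "test1", "TEST1", "test2", "TEST2", "test3", "TEST3", "tagtest0"}
	// The two lists the test would have accepted.
	lexical := []string{"tagtest0", "test0", "TEST0", "test1", "TEST1", "test2", "TEST2", "test3", "TEST3"}
	ascii := []string{"TEST0", "TEST1", "TEST2", "TEST3", "tagtest0", "test0", "test1", "test2", "test3"}

	fmt.Println(sortedEitherWay(returned), sortedEitherWay(lexical), sortedEitherWay(ascii)) // false true true
}
```

The returned list fails under both readings because `tagtest0` comes last, which is consistent with the registry not sorting at all.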

@tianon tianon force-pushed the sorted-tag-list branch from 8dad02a to 2dab2cf Compare July 19, 2025 23:37
@tianon
Member Author

tianon commented Jul 19, 2025

I've added a new commit with a bare minimum change to the spec that makes it clear sort.Strings is also acceptable -- of course, I'm happy to update, amend, drop, reword, etc. 👍

tianon added 2 commits July 19, 2025 16:39
> If the list is not empty, the tags MUST be in lexical order (i.e. case-insensitive alphanumeric order).

This update explicitly allows either lexical *or* "asciibetical", as many existing registries have already implemented this requirement via `sort.Strings`.

This also adds uppercase versions of the `testN` tags to help test "case-insensitive alphanumeric order" (the edge cases that matter for sorted order).

Signed-off-by: Tianon Gravi <[email protected]>
@tianon tianon force-pushed the sorted-tag-list branch from 2dab2cf to 0bc87d5 Compare July 19, 2025 23:40
Contributor

@jdolitsky jdolitsky left a comment


this LGTM but appears to fail in CI since we are using zot. @rchincha - does zot sort tags in any way?

@andaaron

> this LGTM but appears to fail in CI since we are using zot. @rchincha - does zot sort tags in any way?

Tags in zot are sorted in lexical order

@tianon
Member Author

tianon commented Jul 22, 2025

https://github.com/project-zot/zot/blob/552242f558af313dbd58a0927bbe54340fad57c1/pkg/api/routes.go#L367 appears to be where sorting is applied (which isn't lexical, but also isn't what I'm seeing in CI here and should still pass this test, so more digging required)

@tianon
Member Author

tianon commented Jul 22, 2025

Ah, that wasn't applied unconditionally until project-zot/zot@24e37eb which was in 2.0.0-rc7 and OCI is still testing rc6.

This should fix the tag listing API to be sorted all the time (not lexical, but the updated language/test accounts for that): project-zot/zot@24e37eb

Signed-off-by: Tianon Gravi <[email protected]>
@tianon
Member Author

tianon commented Jul 22, 2025

I updated zot to 2.0.4 so that it includes that sort.Strings call unconditionally and the CI is green now (thanks to relaxing the test / spec language to explicitly allow non-lexical sorting - we've now got two example registries that implemented/interpreted it that way 😅).

I considered updating to the latest latest zot (2.1.x), but figured a more conservative update was safer here.

@tianon
Member Author

tianon commented Jul 24, 2025

In case it's helpful, https://github.com/quay/quay/blob/2172c6bd4610937ea6c2861756ef2e0f2f05e1c7/data/model/oci/tag.py#L142 seems to be where Quay has implemented sorting for the tag listing API, and it appears to be using a standard "order by" in whatever database backend they've got, which is going to be ASCIIbetical 10 times out of 10 also. 😄 ❤️

Contributor

@sudo-bmitch sudo-bmitch left a comment


LGTM

Contributor

@rchincha rchincha left a comment


lgtm

@sudo-bmitch sudo-bmitch merged commit 9d1b925 into opencontainers:main Jul 31, 2025
4 checks passed
@tianon tianon deleted the sorted-tag-list branch July 31, 2025 20:12
@tianon
Member Author

tianon commented Aug 6, 2025

Just to add another data point here, Docker Hub is also currently using "asciibetical" sorting (not lexical):

$ crane ls tianon/test | grep -i ascii
ASCIIBETICAL
AsCiIbEtIcAl
aScIiBeTiCaL
asciibetical


6 participants