refactor: begin to remove sigstore_protobuf_specs #1470

Merged
jku merged 38 commits into main from ww/rm-protobufs
Aug 11, 2025
Conversation

Member

@woodruffw woodruffw commented Jul 18, 2025

Towards #1049.

This is very WIP and won't work in CI yet since I'm using a local editable install of the new sigstore_models package while I iterate on it.

NB: I've also temporarily disabled interrogate because it has some kind of issue with a cairo dep.

@woodruffw woodruffw added this to the 4.0 milestone Jul 18, 2025
@woodruffw woodruffw self-assigned this Jul 18, 2025
@woodruffw woodruffw added the refactoring Refactoring tasks. label Jul 18, 2025
@woodruffw
Member Author

I've published https://pypi.org/project/sigstore-models/ to accompany this.

@jku
Member

jku commented Jul 23, 2025

This is great, I'll try to review tomorrow (although I don't think we need to rush with this one)

@woodruffw
Member Author

Thanks @jku!

I have an internal need for this, but there's no huge rush from me -- I can always pin to the git reference temporarily 🙂

(With that being said, I'd love it if we could land this with v4, if you think that's possible -- I think it'd be nice to have a clean major break.)

@jku
Member

jku commented Jul 29, 2025

These quirks should have no public API implications, but they require some more explicit internal round-trip handling than the sigstore_protobuf_specs APIs did. In particular, we now need to explicitly base64 encode bytes when passing them into the models, and same string-encoding uint64s.

This is the part I haven't yet wrapped my head around. Does pydantic still get some actual value from the serializers that are defined in sigstore-models?

Member

@jku jku left a comment

I'll have another look through the PR to make sure I understand the serialization changes, but so far it looks good to me

  • signed-off-by is apparently missing so tests don't run
  • The main disadvantage of not using protobuf-specs (in addition to "someone" having to maintain sigstore-models) might be that dependabot will no longer warn us of changes... not much we can do about that
  • Do you have a good feel for whether we end up modifying the public API in ways that need further documentation (like if the serialization issues ever end up visible to users?)

@woodruffw
Member Author

This is the part I haven't yet wrapped my head around. Does pydantic still get some actual value from the serializers that are defined in sigstore-models?

Nope, it's more that Pydantic (which we were already using indirectly, just with less direct control) doesn't have a super clean split between "wire type" and "real type" (for lack of a better term).

For example, here's a made-up Protobuf definition:

message Hello {
    int64 abc = 1;
    bytes def = 2;
}

when serialized as ProtoJSON, that turns into a "wire" representation like:

{ "abc": "1234", "def": "aGVsbG8=" }

i.e. int64 gets wrapped as a string for range safety, and bytes gets wrapped as a base64 string because JSON has no raw bytes.

To then pull those out of JSON and into useful Python types, we need Pydantic type adapters, e.g.:

https://github.com/astral-sh/sigstore-models/blob/d0dad407dfe19a35aa24cc1c54054dc9d90ecd0e/src/sigstore_models/_core.py#L41C1-L85C2

...which we'd then use like this:

class Hello(BaseModel):
    abc: ProtoU64
    def: ProtoBytes

This works really well once the object is actually loaded (since the revealed types for abc and def become int and bytes respectively), but it also means that initializing a Hello object needs to go through those wrapper types instead of directly through the revealed types. In other words, this doesn't work:

Hello(abc=123, def=b"hello")

instead, we have to do:

Hello(abc=str(123), def=base64.b64encode(b"hello"))

...so we end up with explicit base64 littered in a few more places than the current models 😞

There might be a way around this, but it's not super clear from the Pydantic docs. I can look into it some more.
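
For readers following along, the wire-versus-real split described above can be sketched with only the standard library. This is a minimal illustration under stated assumptions, not the sigstore-models implementation: it uses a plain dataclass rather than Pydantic, the helper names to_wire and from_wire are hypothetical, and the field is spelled def_ on the Python side because def is a reserved word.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Hello:
    # "Real" Python types, as seen once the object is loaded.
    abc: int
    def_: bytes  # hypothetical spelling; `def` is a Python keyword


def to_wire(msg: Hello) -> str:
    """Encode ProtoJSON-style: the int becomes a string, bytes become base64."""
    return json.dumps(
        {"abc": str(msg.abc), "def": base64.b64encode(msg.def_).decode("ascii")}
    )


def from_wire(text: str) -> Hello:
    """Decode the wire form back into real Python types."""
    raw = json.loads(text)
    return Hello(abc=int(raw["abc"]), def_=base64.b64decode(raw["def"]))


wire = to_wire(Hello(abc=1234, def_=b"hello"))
print(wire)  # {"abc": "1234", "def": "aGVsbG8="}
roundtrip = from_wire(wire)
```

Keeping the conversion in one pair of functions is only one possible design; the point is that callers work with int and bytes, and the string and base64 forms exist only at the JSON boundary.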

@woodruffw
Member Author

  • Do you have a good feel for whether we end up modifying the public API in ways that need further documentation (like if the serialization issues ever end up visible to users?)

If I'm right (famous last words) there should be no real public API changes from the models themselves here, although in practice this PR does shift the other public APIs quite a bit (like LogEntry and Bundle). I could put some work into making the changes a bit less disruptive, but I'm also a little bit tempted to make a "big" public API change here and see what shakes out 😅

woodruffw and others added 3 commits July 29, 2025 10:40
Member

@jku jku left a comment

It's very hard to make sure all of the b64encode(bytes.fromhex(...)) is correct but I've read through it twice and it seems to all make sense, thanks.

I would like to see the lint and test results though, can you rebase or merge main? (sorry about the parallel signing changes and staging changes making this a bit painful)

@woodruffw
Member Author

It's very hard to make sure all of the b64encode(bytes.fromhex(...)) is correct but I've read through it twice and it seems to all make sense, thanks.

Yeah, agreed unfortunately -- I think this is ultimately a net improvement but it definitely shows a weak point with how we're (ab)using Pydantic.

And no problem -- I'll deconflict this now.
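
One way to make the hex-to-base64 conversions discussed above easier to audit is to funnel them through a single helper with a round-trip check. The following is a rough sketch, not code from this PR: hex_to_b64 is a hypothetical name, and the SHA-256 digest is just an assumed example input.

```python
import base64
import hashlib


def hex_to_b64(hex_digest: str) -> str:
    """Re-encode a hex string as base64 text (hypothetical helper)."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


# Assumed example input: the hex form of a SHA-256 digest.
digest_hex = hashlib.sha256(b"hello").hexdigest()
digest_b64 = hex_to_b64(digest_hex)

# Round-trip check: decoding the base64 must give back the same raw digest.
assert base64.b64decode(digest_b64) == hashlib.sha256(b"hello").digest()
```

Note that base64.b64encode returns bytes, so the explicit decode("ascii") matters whenever the receiving field expects a str.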

@woodruffw
Member Author

Hm, looks like Rekor v2 in staging is 503ing:

        if http_error_msg:
>           raise HTTPError(http_error_msg, response=self)
E           requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://log2025-alpha1.rekor.sigstage.dev/api/v2/log/entries

jku previously approved these changes Aug 5, 2025
Member

@jku jku left a comment

yeah, lgtm.

The rekor 503 did not reproduce (I did realize the rekor-tiles http logging does not seem to be enabled while looking at this, I'll follow up on that elsewhere)

@woodruffw woodruffw requested a review from jku August 8, 2025 15:36
@jku
Member

jku commented Aug 11, 2025

DCO check was still complaining but I will count this as a good faith effort to sign-off-by :)

@jku jku merged commit 204e0f4 into main Aug 11, 2025
26 checks passed
@jku jku deleted the ww/rm-protobufs branch August 11, 2025 08:37