Skip to content

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Sep 30, 2025

  • One-line PR description: Initial draft for gogo protobuf removal

/sig api-machinery architecture
/cc @deads2k @jpbetz @dims

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 30, 2025
@serathius
Copy link
Contributor

serathius commented Sep 30, 2025

/cc @ahrtr
Might be relevant for etcd

@liggitt
Copy link
Member Author

liggitt commented Sep 30, 2025

/cc @ahrtr

I'm not sure the approach taken here is particularly relevant to etcd and etcd-io/etcd#14533 ... etcd does treat the .proto file as the canonical API definition, and does fully generate go types from .proto without the wild Go-rewriting steps Kubernetes does.

@aojea
Copy link
Member

aojea commented Sep 30, 2025

LGTM modulo one comment

We have to get rid of this sword of Damocles hanging over the project

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with @liggitt earlier about this approach. It's pragmatic. I'm onboard.

/approve

/hold
For approval from @deads2k

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 1, 2025
@ahrtr
Copy link
Member

ahrtr commented Oct 3, 2025

/cc @ahrtr

I'm not sure the approach taken here is particularly relevant to etcd and etcd-io/etcd#14533 ... etcd does treat the .proto file as the canonical API definition, and does fully generate go types from .proto without the wild Go-rewriting steps Kubernetes does.

etcd has some minor rewrite (converting v1 messages to v2), refer to https://github.com/etcd-io/etcd/blob/682d4fb5e281bb4f987e9350d5896d3bebdfadf4/scripts/genproto.sh#L177-L210

Could you point me to the rewrite & related doc Kubernetes does?

Comment on lines +448 to +449
The changes in this KEP are to code generation, and primarily result in removal of unused generated code.
There is no runtime enablement / disablement of these changes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes perfect sense!

But it seems that introducing new interfaces might be unnecessarily complicate this KEP; and it might be better to consider it as a separate KEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no new interfaces, we're referencing interfaces defined within gogo packages currently ... we're redefining those interfaces locally instead

@liggitt
Copy link
Member Author

liggitt commented Oct 3, 2025

etcd has some minor rewrite (converting v1 messages to v2), refer to https://github.com/etcd-io/etcd/blob/682d4fb5e281bb4f987e9350d5896d3bebdfadf4/scripts/genproto.sh#L177-L210

Could you point me to the rewrite & related doc Kubernetes does?

Ours is here, after running the gogo generation step:
https://github.com/kubernetes/code-generator/blob/v0.34.0/cmd/go-to-protobuf/protobuf/cmd.go#L289-L293

The rewrite is defined here: https://github.com/kubernetes/code-generator/blob/master/cmd/go-to-protobuf/protobuf/parser.go#L80

@liggitt
Copy link
Member Author

liggitt commented Oct 3, 2025

etcd has some minor rewrite (converting v1 messages to v2), refer to https://github.com/etcd-io/etcd/blob/682d4fb5e281bb4f987e9350d5896d3bebdfadf4/scripts/genproto.sh#L177-L210

that's unfortunate, I'm pretty sure gogo messages aren't fully compliant v2 messages

@ahrtr
Copy link
Member

ahrtr commented Oct 3, 2025

etcd has some minor rewrite (converting v1 messages to v2), refer to https://github.com/etcd-io/etcd/blob/682d4fb5e281bb4f987e9350d5896d3bebdfadf4/scripts/genproto.sh#L177-L210

that's unfortunate, I'm pretty sure gogo messages aren't fully compliant v2 messages

The rewrite is just to support REST API generated by grpc-gateway (so the core etcd's grpc API isn't impacted), and we have enough test cases to ensure nothing is broken.

Sorry to derail the discussion of the KEP.

@deads2k
Copy link
Contributor

deads2k commented Oct 7, 2025

Plan looks reasonable and any new component reflects our actual exposure today.

PRR also looks good

/lgtm
/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jpbetz, liggitt, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 01516b7 into kubernetes:master Oct 7, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Oct 7, 2025

We will truncate the exported methods from protobuf generation to just those methods.

The `ProtoMessage()` marker method will be relocated to build-tagged files,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I got to this post-merge. Wanted to share what Istio is doing with k8s APIs that may be impacted.

We use an attempted cast to gogoproto.Message to determine if an object is a Gogo message, which uses ProtoMessage(). From there, we use it to...

  • Call gogo.MarshalAny (I think this can be replaced with a call to Marshal() ([]byte, error) directly and then handling the Any construction
  • call gogojsonpb.Unmarshal (this doesn't seem great anyways and can just use json.Unmarshal?)
  • Call gogoproto.Equals() -- for this, I don't see a clear replacement?
  • Call gogoproto.Clone -- I don't think the generated DeepCopy methods are a suitable replacement because the deep copy methods are like DeepCopy(T) T which means if I have an any I cannot deepcopy it(?)

Copy link
Member Author

@liggitt liggitt Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Call gogo.MarshalAny (I think this can be replaced with a call to Marshal() ([]byte, error) directly and then handling the Any construction

calling one of the self-contained Marshal* methods is definitely recommended over calling the gogo or standard proto marshal methods

  • call gogojsonpb.Unmarshal (this doesn't seem great anyways and can just use json.Unmarshal?)

proto-oriented json encoding is not recommended at all for k8s API types ... definitely use json.Marshal or json.Unmarshal

Call gogoproto.Equals() -- for this, I don't see a clear replacement?

Use reflect.DeepEqual or apiequality.Semantic.DeepEqual. proto equality rules are almost certainly not what you want for comparing Kubernetes API objects.

Call gogoproto.Clone -- I don't think the generated DeepCopy methods are a suitable replacement because the deep copy methods are like DeepCopy(T) T which means if I have an any I cannot deepcopy it(?)

DeepCopy or DeepCopyObject is what you should be using for K8s API objects. I honestly don't know what the generated clone methods do for Kubernetes API types or if they are correct at all. Do you have pointers to where you are copying arbitrary Kubernetes API types not known at build time using proto.Clone()?

If you want a single method you can test as an interface, I'd recommend DeepCopyObject() runtime.Object

func DeepCopyObject[T runtime.Object](obj T) T {
	return obj.DeepCopyObject().(T)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline, DeepCopyObject would work for top-level objects, but doesn't work for substructs, which istio sometimes copies. Hoisting recommendations from that thread:

  • if there's a set of specific known types you're copying, I'd probably type-switch a concrete set you actually use along with DeepCopy
  • if you can limit copy use to top-level runtime.Objects, DeepCopyObject could work
  • otherwise you could make a helper method that uses reflect.MethodByName to call DeepCopy and then cast, but that kills dead code elimination, which isn't great

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack https://kubernetes.slack.com/archives/C0EG7JC6T/p1759867608433149, Istio is able to immediately move off all of these dependencies so will have no issues with any of the changes presented here. Thanks @liggitt

howardjohn added a commit to howardjohn/istio that referenced this pull request Oct 7, 2025
See
kubernetes/enhancements#5590 (comment);
we use a lot of very questionable gogoproto on k8s type operations.

The good news is, as far as I can every single one of these is on code
paths that never actually touch k8s types (anymore), or are on entirely
dead code paths.

My plan is to split this into two PRs to make things simpler/safer.

This PR removes strictly dead code that is in this area. There is a
surprising amount here! I think this is as we moved more and more stuff
away from these legacy `config.Config` paths towards kclient/krt the
cruft just stayed around.

In the next PRs I will remove the gogo proto handling (TBD how, if we
want to fail or fallback).

I've analyzed the remaining usages:

```
DeepCopy
  UpdateHealth, DeleteHealthCondition (WE only)
  mergeVirtualServicesIfNeeded (on VS spec)
  mergeDestinationRule
  mergeHTTPRoutes (its a VirtualService though, not a HTTPRoute)
  resolveVirtualServiceShortnames (on VS only)
  convertToWasmPluginWrapper (On WasmPlugin)
  In a bunch of tests

ApplyJSON
  FromJSON
    ConvertObject
      webhook.validate (only called for istio types)
      ParseInputs (should be called for everything)
        kube/file.NewController
          Only allows Istio objects but I think its going to read all of them, so we couldn't panic or something?
ApplyJSONStrict
  fromSchemaAndJSONMap
    validateResource (in istioctl, only used on Istio types)

Equals()
  We already explicitly do not use gogoproto.Equals EXCEPT in `needsPush` which only operates on istio.io types

ToProto():
  PilotConfigToResource (I think this will only have Istio types)

ToJSON():
  ToRaw()
    ConvertConfig
      used in configz handler, which should only have Istio types
ToPrettyJSON():
  Not even used on protos, bug in xds/debug.go
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.