
WIP: migrate estimator protobuf to use standard protoc-gen-go #7298

Open
zhzhuang-zju wants to merge 1 commit into karmada-io:master from zhzhuang-zju:protoc

@zhzhuang-zju
Contributor

What type of PR is this?

/kind api-change

What this PR does / why we need it:

This PR migrates the scheduler-estimator protobuf implementation to use the standard protoc-gen-go toolchain, replacing the legacy go-to-protobuf generator. This change is a proactive adaptation for Kubernetes 1.35+ dependencies.

Background:
Kubernetes v1.35 has removed the ProtoMessage() marker method from core REST API types (e.g., Pod, Node). To provide a buffer for downstream projects like Karmada, upstream Kubernetes introduced a temporary build tag, kubernetes_protomessage_one_more_release, which forces these types to implement an empty ProtoMessage() method. However, this compatibility layer is slated for complete removal in Kubernetes v1.36. This PR ensures Karmada is fully decoupled from this dependency ahead of time.

Solution:
Adopted a "Peer Fields" strategy:

  1. Peer Fields: Introduced new bytes fields (e.g., NodeAffinityBytes) alongside existing fields to support direct binary transmission of K8s objects.
  2. Graceful Migration: Kept the existing strongly-typed fields (deprecated) to maintain wire compatibility with older components during upgrades.
  3. Modern Toolchain: Switched to standard protoc-gen-go and protoc-gen-go-grpc.
  4. Helper Layer: Added helper methods (MustSet..., Unmarshal...) to handle the conversion between K8s Go types and the new protobuf structure seamlessly.
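
A minimal sketch of how a peer field and its helpers could fit together, offered as an illustration rather than the PR's generated code. `NodeClaim` and `Toleration` here are hypothetical stand-ins for the generated message and the K8s type, JSON stands in for whatever binary encoding the PR actually uses, and the fallback to the deprecated field in `UnmarshalTolerations` is an assumption, not something the description confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Toleration is a hypothetical stand-in for the Kubernetes toleration type.
type Toleration struct {
	Key      string `json:"key"`
	Operator string `json:"operator"`
	Effect   string `json:"effect"`
}

// NodeClaim is a hypothetical stand-in for the generated message. It carries
// the deprecated typed field and its bytes peer side by side.
type NodeClaim struct {
	Tolerations      []Toleration // deprecated: still read by older components
	TolerationsBytes []byte       // peer field: opaque serialized payload
}

// SetTolerations fills both fields so old and new readers see the same data.
func (n *NodeClaim) SetTolerations(ts []Toleration) error {
	raw, err := json.Marshal(ts) // assumption: JSON stands in for the real encoding
	if err != nil {
		return fmt.Errorf("marshal tolerations: %w", err)
	}
	n.Tolerations = ts
	n.TolerationsBytes = raw
	return nil
}

// MustSetTolerations panics instead of returning an error.
func (n *NodeClaim) MustSetTolerations(ts []Toleration) {
	if err := n.SetTolerations(ts); err != nil {
		panic(err)
	}
}

// UnmarshalTolerations prefers the bytes peer and, as an assumed behavior,
// falls back to the deprecated field when an older writer left the peer empty.
func (n *NodeClaim) UnmarshalTolerations() ([]Toleration, error) {
	if len(n.TolerationsBytes) == 0 {
		return n.Tolerations, nil
	}
	var ts []Toleration
	if err := json.Unmarshal(n.TolerationsBytes, &ts); err != nil {
		return nil, fmt.Errorf("unmarshal tolerations: %w", err)
	}
	return ts, nil
}

func main() {
	var claim NodeClaim
	claim.MustSetTolerations([]Toleration{{Key: "gpu", Operator: "Exists", Effect: "NoSchedule"}})
	got, err := claim.UnmarshalTolerations()
	fmt.Println(got, err)
}
```

Writing both fields in the setter is one way to keep older readers working during a rolling upgrade; whether the PR's helpers do exactly this should be checked against the diff.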

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

scheduler-estimator: Migrated to standard protoc-gen-go for gRPC API generation to support Kubernetes 1.35+. Introduced peer `bytes` fields for K8s types to ensure compatibility.

Copilot AI review requested due to automatic review settings March 18, 2026 09:29
@karmada-bot added the kind/api-change and do-not-merge/work-in-progress labels Mar 18, 2026
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jabellard for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot added the size/XXL label Mar 18, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request modernizes the scheduler-estimator's protobuf implementation to ensure its long-term compatibility with evolving Kubernetes APIs, specifically addressing changes in Kubernetes 1.35 and beyond. The core change involves transitioning to standard protobuf generation tools and introducing a dual-field strategy for Kubernetes object serialization. This approach allows for direct binary handling of Kubernetes types while preserving backward compatibility for existing components, facilitating a smooth upgrade path.

Highlights

  • Protobuf Migration: Migrated the scheduler-estimator's protobuf implementation to use the standard protoc-gen-go toolchain, replacing the legacy go-to-protobuf generator.
  • Kubernetes 1.35+ Compatibility: Proactively adapted the protobuf implementation to ensure compatibility with Kubernetes 1.35+ dependencies, which remove the ProtoMessage() marker method from core REST API types.
  • Peer Fields Strategy: Introduced 'Peer Fields' (e.g., NodeAffinityBytes, ResourceRequestBytes, TolerationsBytes) to support direct binary transmission of Kubernetes objects, addressing changes in Kubernetes 1.35+ where writers cannot set certain strongly-typed fields directly.
  • Backward Compatibility: Retained existing strongly-typed fields (marked as deprecated) alongside the new bytes peer fields to maintain wire compatibility with older components during upgrades.
  • Helper Methods: Added helper methods (Set..., Unmarshal..., MustSet...) to seamlessly handle the conversion between Kubernetes Go types and the new protobuf structure, simplifying interaction with the new peer fields.
  • Build Toolchain Update: Updated the build toolchain to use standard protoc-gen-go and protoc-gen-go-grpc for protobuf generation, and removed the go-to-protobuf related build steps.

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully migrates the scheduler-estimator protobuf implementation from the legacy go-to-protobuf generator to the standard protoc-gen-go toolchain, addressing compatibility concerns with Kubernetes 1.35+. The adoption of a "Peer Fields" strategy, introducing new bytes fields alongside deprecated strongly-typed fields, is a robust approach to ensure graceful migration and wire compatibility. The addition of helper methods (Set..., Unmarshal..., MustSet...) simplifies the conversion between Kubernetes Go types and the new protobuf structure. The changes are comprehensive, affecting protobuf definitions, generation scripts, and Go code across various components to align with the new protobuf standard. However, several instances of ignored errors from Set and Unmarshal helper methods have been identified, which could lead to silent failures or incorrect data processing. These should be addressed to ensure the reliability of the new protobuf serialization/deserialization logic.

Tolerations: replicaRequirements.NodeClaim.Tolerations,
}
_ = req.ReplicaRequirements.NodeClaim.SetNodeAffinity(replicaRequirements.NodeClaim.HardNodeAffinity)
_ = req.ReplicaRequirements.NodeClaim.SetTolerations(replicaRequirements.NodeClaim.Tolerations)

high

The SetTolerations method returns an error, but it is currently ignored. If marshaling fails, the TolerationsBytes field will not be populated, potentially leading to incorrect toleration matching or reliance on deprecated fields. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.
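
One way to propagate these setter errors is sketched below, under stated assumptions: `nodeClaim`, `setNodeAffinity`, `setTolerations`, and `buildNodeClaim` are hypothetical names, and JSON encoding is used only so the example runs with the standard library. The point is the shape of the call site, which collects both results with `errors.Join` instead of discarding them with `_ =`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// nodeClaim is a hypothetical stand-in for the generated NodeClaim message.
type nodeClaim struct {
	NodeAffinityBytes []byte
	TolerationsBytes  []byte
}

// setNodeAffinity mimics a setter that can fail while encoding its input.
func (n *nodeClaim) setNodeAffinity(v any) error {
	raw, err := json.Marshal(v)
	if err != nil {
		return fmt.Errorf("set node affinity: %w", err)
	}
	n.NodeAffinityBytes = raw
	return nil
}

// setTolerations mimics a setter that can fail while encoding its input.
func (n *nodeClaim) setTolerations(v any) error {
	raw, err := json.Marshal(v)
	if err != nil {
		return fmt.Errorf("set tolerations: %w", err)
	}
	n.TolerationsBytes = raw
	return nil
}

// buildNodeClaim returns an error instead of silently dropping setter failures.
// errors.Join yields nil when every argument is nil.
func buildNodeClaim(affinity, tolerations any) (*nodeClaim, error) {
	claim := &nodeClaim{}
	err := errors.Join(
		claim.setNodeAffinity(affinity),
		claim.setTolerations(tolerations),
	)
	if err != nil {
		return nil, err
	}
	return claim, nil
}

func main() {
	// A channel cannot be JSON-encoded, so this forces the failure path.
	_, err := buildNodeClaim(map[string]string{"zone": "a"}, make(chan int))
	fmt.Println("error:", err)
}
```

Joining the errors reports every failed field in one pass; returning on the first failure would work equally well if only one message is wanted.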


for _, component := range components {
if component.ReplicaRequirements.ResourceRequest == nil {
requirements, _ := component.ReplicaRequirements.UnmarshalResourceRequest()

high

The UnmarshalResourceRequest method returns an error, but it is currently ignored. If unmarshaling fails, requirements might be an empty ResourceList, leading to incorrect resource aggregation. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.
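
A minimal sketch of the "propagate" option inside an aggregation loop. The `component` type, its `unmarshalResourceRequest` method, and `sumRequests` are hypothetical; quantities are plain `int64` values and the payload is JSON purely to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// component is a hypothetical stand-in for the generated Component message.
// ResourceRequestBytes holds an encoded map of resource name to amount.
type component struct {
	Name                 string
	ResourceRequestBytes []byte
}

// unmarshalResourceRequest decodes the bytes field and reports corrupt payloads.
func (c component) unmarshalResourceRequest() (map[string]int64, error) {
	out := map[string]int64{}
	if len(c.ResourceRequestBytes) == 0 {
		return out, nil // nothing requested
	}
	if err := json.Unmarshal(c.ResourceRequestBytes, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// sumRequests stops at the first component that fails to decode, so a corrupt
// payload cannot be counted as a zero-sized request.
func sumRequests(components []component) (map[string]int64, error) {
	total := map[string]int64{}
	for _, c := range components {
		req, err := c.unmarshalResourceRequest()
		if err != nil {
			return nil, fmt.Errorf("component %q: %w", c.Name, err)
		}
		for name, qty := range req {
			total[name] += qty
		}
	}
	return total, nil
}

func main() {
	good := component{Name: "web", ResourceRequestBytes: []byte(`{"cpu":2,"memory":4}`)}
	bad := component{Name: "db", ResourceRequestBytes: []byte(`{"cpu":`)}

	total, err := sumRequests([]component{good, good})
	fmt.Println(total, err)

	total, err = sumRequests([]component{good, bad})
	fmt.Println(total, err)
}
```

Wrapping the error with the component name keeps the failure traceable once it surfaces several calls up the stack.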

NodeSelector: cr.NodeClaim.NodeSelector,
Tolerations: cr.NodeClaim.Tolerations,
}
_ = out.NodeClaim.SetNodeAffinity(cr.NodeClaim.HardNodeAffinity)

high

The SetNodeAffinity method returns an error, but it is currently ignored. If marshaling fails, the NodeAffinityBytes field will not be populated, potentially leading to incorrect node affinity matching or reliance on deprecated fields. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.

Tolerations: cr.NodeClaim.Tolerations,
}
_ = out.NodeClaim.SetNodeAffinity(cr.NodeClaim.HardNodeAffinity)
_ = out.NodeClaim.SetTolerations(cr.NodeClaim.Tolerations)

high

The SetTolerations method returns an error, but it is currently ignored. If marshaling fails, the TolerationsBytes field will not be populated, potentially leading to incorrect toleration matching or reliance on deprecated fields. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.

Namespace: replicaRequirements.Namespace,
PriorityClassName: replicaRequirements.PriorityClassName,
}
_ = req.ReplicaRequirements.SetResourceRequest(replicaRequirements.ResourceRequest)

high

The SetResourceRequest method returns an error, but it is currently ignored. If marshaling fails, the ResourceRequestBytes field will not be populated, which could lead to incorrect behavior or reliance on deprecated fields. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it if it's truly non-critical.


  if requirements.NodeClaim != nil {
- tolerations = requirements.NodeClaim.Tolerations
+ tolerations, _ = requirements.NodeClaim.UnmarshalTolerations()

high

The UnmarshalTolerations method returns an error, but it is currently ignored. If unmarshaling fails, tolerations might be an empty slice, leading to incorrect node matching. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.

tolerations, _ = requirements.NodeClaim.UnmarshalTolerations()
}

requirementsRL, _ := requirements.UnmarshalResourceRequest()

high

The UnmarshalResourceRequest method returns an error, but it is currently ignored. If unmarshaling fails, requirementsRL might be an empty ResourceList, leading to incorrect replica estimation. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.

return 0, framework.AsResult(err)
}

requirements, _ := replicaRequirements.UnmarshalResourceRequest()

high

The UnmarshalResourceRequest method returns an error, but it is currently ignored. If unmarshaling fails, requirements might be an empty ResourceList, leading to incorrect quota evaluation. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.

- func (s *SchedulingSimulator) scheduleComponent(component pb.Component) bool {
- requiredPerReplica := util.NewResource(component.ReplicaRequirements.ResourceRequest)
+ func (s *SchedulingSimulator) scheduleComponent(component *pb.Component) bool {
+ res, _ := component.ReplicaRequirements.UnmarshalResourceRequest()

high

The UnmarshalResourceRequest method returns an error, but it is currently ignored. If unmarshaling fails, res might be an empty ResourceList, leading to incorrect simulation results. It's important to handle this error to ensure the new 'Peer Fields' strategy works as intended. Consider propagating the error or logging it.
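
Because `scheduleComponent` returns only a bool, logging is the likelier fit here. The sketch below contrasts the two behaviors with hypothetical names (`component`, `fitsIgnoringError`, `fitsChecked`) and a JSON payload as a stand-in encoding. It illustrates why the ignored error matters: a failed decode leaves an empty request, which looks like a component that fits on any node.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// component is a hypothetical stand-in for the generated Component message.
type component struct {
	Name                 string
	ResourceRequestBytes []byte // assumption: JSON-encoded map of resource name to amount
}

// unmarshalResourceRequest decodes the bytes field; corrupt input yields an error.
func (c *component) unmarshalResourceRequest() (map[string]int64, error) {
	out := map[string]int64{}
	if err := json.Unmarshal(c.ResourceRequestBytes, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// fitsIgnoringError mirrors the `res, _ :=` pattern: on failure res is nil,
// every lookup yields 0, and the component appears to fit anywhere.
func fitsIgnoringError(c *component, freeCPU int64) bool {
	res, _ := c.unmarshalResourceRequest()
	return res["cpu"] <= freeCPU
}

// fitsChecked keeps the bool signature but logs the failure and refuses to
// schedule a component whose request could not be decoded.
func fitsChecked(c *component, freeCPU int64) bool {
	res, err := c.unmarshalResourceRequest()
	if err != nil {
		log.Printf("component %q: cannot decode resource request: %v", c.Name, err)
		return false
	}
	return res["cpu"] <= freeCPU
}

func main() {
	corrupt := &component{Name: "db", ResourceRequestBytes: []byte(`{"cpu":`)}
	fmt.Println(fitsIgnoringError(corrupt, 1)) // true: a false positive
	fmt.Println(fitsChecked(corrupt, 1))       // false
}
```

Treating a decode failure as "does not fit" is a conservative choice for a simulator; changing the signature to return an error would be the stricter alternative.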

// Set old field
m.ResourceRequest = make(map[string]*resource.Quantity)
for k, v := range res {
q := v

medium

For consistency and clarity, consider adding a // copy comment here, similar to line 130 in the ReplicaRequirements.SetResourceRequest method. This helps to explicitly state the intention of copying the loop variable v.
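
For context on why the copy exists: before Go 1.22 the range variable was shared across iterations, so storing `&v` directly would make every map entry point at the same variable. From Go 1.22 on each iteration gets a fresh variable, which makes the copy defensive rather than required. A small sketch, with `int64` standing in for `resource.Quantity` and `toPointerMap` as a hypothetical helper name:

```go
package main

import "fmt"

// toPointerMap converts a value map into a pointer map, as a setter for a
// map[string]*T field has to do.
func toPointerMap(in map[string]int64) map[string]*int64 {
	out := make(map[string]*int64, len(in))
	for k, v := range in {
		q := v // copy: each entry gets its own variable to point at
		out[k] = &q
	}
	return out
}

func main() {
	m := toPointerMap(map[string]int64{"cpu": 2, "memory": 4})
	fmt.Println(*m["cpu"], *m["memory"]) // prints "2 4"
}
```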

@zhzhuang-zju force-pushed the protoc branch 3 times, most recently from cbdbb02 to d7bbc4f April 2, 2026 09:47
Signed-off-by: zhzhuang-zju <m17799853869@163.com>

Labels

do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
