feat: multi cluster support #240


Merged
merged 73 commits into main from feat/mutli-cluster on Jul 10, 2025

Conversation


@vertex451 vertex451 commented May 30, 2025

Resolves #149

Changes

Gateway

  1. Refactored gateway/manager and extracted its parts into separate packages.
  2. At some point I realized that we had one map for GraphQL schemas and another for resolvers/runtime clients, and the two were coupled because both related to a single cluster.
    So I migrated to a domain-driven design and introduced a targetcluster package that represents a single Kubernetes cluster. Now, instead of several maps, we have one map of clusters, where each cluster holds its related schema, resolvers, and runtime clients.
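The single-map design described above can be sketched roughly like this (a hedged illustration only: the type and field names are assumptions, not the PR's actual API):

```go
package main

import "fmt"

// TargetCluster is an illustrative stand-in for the targetcluster package's
// core type: one value holds everything related to a single cluster.
type TargetCluster struct {
	Name          string
	Schema        string            // GraphQL schema for this cluster (simplified)
	Resolvers     map[string]string // resolver registry (simplified)
	RuntimeClient any               // runtime client for this cluster (simplified)
}

// Manager replaces several parallel, coupled maps with one map of clusters.
type Manager struct {
	clusters map[string]*TargetCluster
}

func NewManager() *Manager {
	return &Manager{clusters: make(map[string]*TargetCluster)}
}

func (m *Manager) Add(c *TargetCluster) { m.clusters[c.Name] = c }

func (m *Manager) Get(name string) (*TargetCluster, bool) {
	c, ok := m.clusters[name]
	return c, ok
}

func main() {
	m := NewManager()
	m.Add(&TargetCluster{Name: "prod", Schema: "type Query { pods: [String] }"})
	if c, ok := m.Get("prod"); ok {
		fmt.Println(c.Name, "has schema:", c.Schema != "")
	}
}
```

The benefit over parallel maps is that schema, resolvers, and client can never go out of sync per cluster: they are added and removed together.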

Listener

  1. Moved to a reconciler-driven design. We now have three reconcilers - KCP, ClusterAccess and Standard.
    This makes the flow of the listener app easy to follow.
  2. Reworked the package structure: packages used by a single reconciler were moved into that reconciler's package, and shared packages were moved to the listener/pkg dir.
  3. Used the golang-commons lifecycle package for the reconcilers.

Testing

Tested on 01.07 by checking that all three paths work:

  1. local-setup
  2. multicluster
  3. single cluster

Re-tested the local setup after removing single-node mode.

vertex451 added 5 commits May 30, 2025 17:39
On-behalf-of: @SAP [email protected]
Signed-off-by: Artem Shcherbatiuk <[email protected]>
vertex451 added 19 commits June 9, 2025 19:38
@aaronschweig aaronschweig left a comment


Left a few comments in the PR.

In general I think the PR is already in quite a good state. What I can see, though, is a divergence from the usual pattern in which we write operators.

A few things to note:

  • To adhere to best practices, the usage of our lifecycle package in the golang-commons library is encouraged.
    Artem: Done
  • Keep the reconcilers straightforward. I am not completely sold on the introduction of the strategy pattern; it feels like too much for something that is determined once at startup. I could understand such a pattern if one reconciler potentially worked on multiple different patterns/inputs. But since we have a KCP reconciler dedicated to specific KCP tasks, and a ClusterAccess reconciler that reconciles ClusterAccess resources, and none of them are managed by a single reconciler (which would not be ideal anyway), the strategies feel like an over-engineered solution for a simple configuration switch during operator startup.
    Artem: Replaced the strategy pattern with a conditional approach
  • I do think it is a good idea to abstract a cluster layer that handles the logic of getting a client and the information necessary to gateway to a cluster.
  • I didn't look into the tests yet, since you mentioned they were still WIP.
  • A small wish of mine: don't code it like Java. While I sometimes see the benefits of introducing design patterns, I want to avoid having too many indirections or abstractions where they are not absolutely necessary. Within the framework of controller-runtime, combined with the golang-commons library, there are already a lot of useful abstractions that help us build maintainable operators. Looking into them - and into the pattern of subroutines (a term used to encapsulate a specific portion of reconciliation logic) - could help when trying to find the right way to abstract/implement the logic needed.
    Artem: I appreciate your feedback and will keep the idiomatic Go way in mind.
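The "conditional approach" that replaced the strategy pattern could look roughly like this (a sketch only: the mode flags, the Reconciler interface, and the type names are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// Reconciler is a minimal illustrative interface; in the real operator the
// reconcilers come from controller-runtime / golang-commons.
type Reconciler interface{ Name() string }

type KCPReconciler struct{}

func (KCPReconciler) Name() string { return "kcp" }

type ClusterAccessReconciler struct{}

func (ClusterAccessReconciler) Name() string { return "clusteraccess" }

type StandardReconciler struct{}

func (StandardReconciler) Name() string { return "standard" }

// pickReconciler decides once at startup. Because the choice never changes
// at runtime, a plain switch is enough - no strategy objects needed.
func pickReconciler(kcpEnabled, multiCluster bool) Reconciler {
	switch {
	case kcpEnabled:
		return KCPReconciler{}
	case multiCluster:
		return ClusterAccessReconciler{}
	default:
		return StandardReconciler{}
	}
}

func main() {
	fmt.Println(pickReconciler(false, true).Name()) // clusteraccess
}
```

The design point from the review survives here: each reconciler stays dedicated to one kind of input, and the only "configuration" is a single startup-time branch.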

Comment on lines 80 to 95
tmpFile, err := os.CreateTemp("", "kubeconfig-*.yaml")
if err != nil {
	return nil, fmt.Errorf("failed to create temporary kubeconfig file: %w", err)
}
defer os.Remove(tmpFile.Name())

if _, err := tmpFile.Write(kubeconfigData); err != nil {
	tmpFile.Close()
	return nil, fmt.Errorf("failed to write kubeconfig to temporary file: %w", err)
}
tmpFile.Close()

// Build config from kubeconfig
config, err := clientcmd.BuildConfigFromFlags("", tmpFile.Name())
if err != nil {
	return nil, fmt.Errorf("failed to build config from kubeconfig: %w", err)
Member


there should be another option to parse the necessary information from the auth metadata. If it is structured, there probably is the correct struct already hidden somewhere, without the need to use a temporary file

Member Author


Removed temporary file usage.

if tc.metadata != nil && tc.metadata.Path != "" {
	path = tc.metadata.Path
}
return fmt.Sprintf("http://localhost:%s/%s/graphql", appCfg.Gateway.Port, path)
Member


this is only valid for locally running services. If this GetEndpoint is needed for something else than logging, this is not the correct approach

Member Author


Now it logs the new endpoint with localhost only when local_development=true.
Otherwise it logs /workspace/graphql
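That behavior could be sketched as (a hedged illustration; the function and parameter names are assumptions, not the PR's code):

```go
package main

import "fmt"

// endpointForLog returns what gets logged for a cluster's GraphQL endpoint.
// In local development the full localhost URL is useful; otherwise only the
// relative path is meaningful, since the external host is not known here.
func endpointForLog(localDev bool, port, path string) string {
	if localDev {
		return fmt.Sprintf("http://localhost:%s/%s/graphql", port, path)
	}
	return fmt.Sprintf("/%s/graphql", path)
}

func main() {
	fmt.Println(endpointForLog(true, "8080", "workspace"))  // http://localhost:8080/workspace/graphql
	fmt.Println(endpointForLog(false, "8080", "workspace")) // /workspace/graphql
}
```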

}

if metadata.Auth.Kubeconfig == "" {
	return nil, fmt.Errorf("kubeconfig data is empty")
Member


No interpolation is made, so errors.New is sufficient.

Member Author

@vertex451 vertex451 Jun 20, 2025


I removed that file completely during the simplification, and I am very happy about it.

Comment on lines 3 to 21
// ClusterMetadata represents cluster connection information embedded in schema files.
// Simplified for standard Kubernetes clusters with kubeconfig authentication.
type ClusterMetadata struct {
	Host string        `json:"host"`
	Path string        `json:"path"`
	Auth *AuthMetadata `json:"auth,omitempty"`
}

// AuthMetadata contains kubeconfig authentication for standard Kubernetes clusters.
type AuthMetadata struct {
	Type       string `json:"type"`       // only "kubeconfig" is supported
	Kubeconfig string `json:"kubeconfig"` // base64-encoded kubeconfig
}

// FileData represents the data extracted from a schema file.
type FileData struct {
	Definitions map[string]interface{} `json:"definitions"`
	Metadata    *ClusterMetadata       `json:"x-cluster-metadata"`
}
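For orientation, a schema file matching these types could look like the following sketch (all host, path, and definition values are made up for illustration):

```json
{
  "definitions": {
    "Pod": { "type": "object" }
  },
  "x-cluster-metadata": {
    "host": "https://api.example-cluster.local",
    "path": "example-cluster",
    "auth": {
      "type": "kubeconfig",
      "kubeconfig": "<base64-encoded kubeconfig>"
    }
  }
}
```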
Member


Not the biggest fan of type-only files - it makes most sense to colocate them with their usage if possible.

Member Author


moved

Member Author


This is also removed

Member


Same as above, I don't like the separate type files - please colocate them with their usage if possible.

Member Author

@vertex451 vertex451 Jun 20, 2025


Outdated.

vertex451 added 3 commits July 2, 2025 10:24
On-behalf-of: @SAP [email protected]
Signed-off-by: Artem Shcherbatiuk <[email protected]>
On-behalf-of: @SAP [email protected]
Signed-off-by: Artem Shcherbatiuk <[email protected]>
On-behalf-of: @SAP [email protected]
Signed-off-by: Artem Shcherbatiuk <[email protected]>
@n3rdc4ptn
Member

Created new issue with this configuration after discussion with Artem.
#271

vertex451 added 10 commits July 2, 2025 10:43
* fix: auth issues

@vertex451 vertex451 changed the title feat: multi cluster support for the listener feat: multi cluster support Jul 9, 2025
vertex451 added 2 commits July 9, 2025 13:51
@n3rdc4ptn
Member

n3rdc4ptn commented Jul 10, 2025

Great work, already works like a charm for me testing it out.
Two things which came up during testing for me:

HostRef

How hard would it be to also make the host parameter specifiable not only through the normal spec, but also through a reference to a ConfigMap or Secret?

hostRef:
  configMapRef:

  secretRef:

or similar. Open for improvement here regarding the spec.

Not a deal breaker for us, just a nice-to-have. I also had not thought about this in my initial architecture proposal.
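Filling in the sketch above, one possible shape - modeled on the common Kubernetes *Ref convention, with all names purely illustrative and open to change:

```yaml
hostRef:
  configMapRef:
    name: cluster-info
    key: host
  # or, for sensitive hosts:
  secretRef:
    name: cluster-credentials
    key: host
```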

Status

I saw that there is a condition parameter in the ClusterAccessStatus type, but I didn't see anything being reported back after the resource got reconciled.
It would be helpful to have some kind of status parameter that reports whether the definitions file was generated successfully by the listener.
Maybe something like ready: true to indicate that the new endpoint is now available through the gateway.

@aaronschweig @nexus49 @vertex451

@aaronschweig
Member

@n3rdc4ptn the status could be a quick win, since this is normally supported out of the box by our lifecycle, the other one, I would argue, can be done in a separate PR.

For the status, we just need to enable WithConditionManagement() and then add conditions to the ClusterAccessStatus @vertex451
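With condition management enabled, the reported status could take the standard Kubernetes condition shape, roughly like this sketch (type, reason, and message strings are illustrative, not the actual values the operator emits):

```yaml
status:
  conditions:
    - type: Ready
      status: "True"
      reason: Complete
      message: "schema file generated by the listener"
      lastTransitionTime: "2025-07-10T13:11:26Z"
```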

@n3rdc4ptn
Member

n3rdc4ptn commented Jul 10, 2025

@n3rdc4ptn the status could be a quick win, since this is normally supported out of the box by our lifecycle, the other one, I would argue, can be done in a separate PR.

sounds great :) Thanks for the quick response. Should I create an issue for it? @aaronschweig

@vertex451
Member Author

Added condition management.

P.S. The only thing: it just confirms that the spec file was generated, but doesn't guarantee that the endpoint is available. For that, the Gateway must parse the spec file and create the endpoint.

[Screenshot: 2025-07-10 at 13:11:26]

@n3rdc4ptn
Member

@n3rdc4ptn the status could be a quick win, since this is normally supported out of the box by our lifecycle, the other one, I would argue, can be done in a separate PR.

sounds great :) Thanks for the quick response. Should I create an issue for it? @aaronschweig

added issue: #276

@n3rdc4ptn
Member

n3rdc4ptn commented Jul 10, 2025

P.S. The only thing: it just confirms that the spec file was generated, but doesn't guarantee that the endpoint is available. For that, the Gateway must parse the spec file and create the endpoint.

From our side, this is enough for now. We are optimistic that the Gateway will do its job.
@aaronschweig what do you think? Is it required for this to be complete that the Gateway also reports status, or could that be a follow-up?

@vertex451 vertex451 merged commit 089a1a3 into main Jul 10, 2025
11 of 12 checks passed
@vertex451 vertex451 deleted the feat/mutli-cluster branch July 10, 2025 11:26
@vertex451 vertex451 added this to the 2025.Q3 milestone Jul 10, 2025
Development

Successfully merging this pull request may close these issues.

Architecture Proposal for Multi Cluster Listener
5 participants