Skip to content

Conversation

@csviri
Copy link
Contributor

@csviri csviri commented Jul 10, 2025

What changes were proposed in this pull request?

PR removes unnecessary informer for the configuration SparkOperatorConfigMapReconciler.
It adds a sanity test using KubeApiTest.

Why are the changes needed?

Having additional informer might effect performance, memory consumption, also maintains an additiona websocket connection to Kubernetes API.

Does this PR introduce any user-facing change?

No, but introduces new way of testing.

How was this patch tested?

With an integration test.

Was this patch authored or co-authored using generative AI tooling?

No

@csviri csviri changed the title Remove informer config reconciler Remove ConfigMap informer from config reconciler Jul 10, 2025
@csviri csviri changed the title Remove ConfigMap informer from config reconciler [SPARK-52755] Remove ConfigMap informer from config reconciler Jul 10, 2025
@github-actions github-actions bot added the BUILD label Jul 10, 2025
@csviri
Copy link
Contributor Author

csviri commented Jul 10, 2025

@jiangzho @dongjoon-hyun This PR introduces KubeApiTest as dependency for testing purposes.
Pls take a look and let me know if this makes sense for you.

Thank you!

@csviri csviri marked this pull request as ready for review July 10, 2025 14:20
@csviri csviri requested a review from dongjoon-hyun July 11, 2025 06:25
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR assume that the created K8s Operator ConfigMap is immutable, @csviri ?

I'm wondering if it's okay with DYNAMIC_CONFIG_SELECTOR.

Although Apache Spark generates immutable ConfigMaps, Spark K8s Operator is different, isn't it?

@dongjoon-hyun
Copy link
Member

Could you review this PR, @jiangzho ?

@jiangzho
Copy link
Contributor

Just wanna make sure I understand this - @csviri - with this patch, the config reconciler is still reconciling the config map with fixed interval. It's just not triggered by config map update events, is my understanding correct ?

But without informer, how may we ensure the resource is up-to-date at each reconciliation ? Is the cache refreshed under the hood ?

@csviri
Copy link
Contributor Author

csviri commented Jul 14, 2025

@dongjoon-hyun @jiangzho

To clarify the behavior of this reconciler; this way every ConfigMap triggers the reconciliation that has the label specified in label selector DYNAMIC_CONFIG_SELECTOR. Also, all subsequent changes to the ConfigMap trigger the conciliation.

Does this PR assume that the created K8s Operator ConfigMap is immutable

From the controller's perspective, this is transparent, if a config map won't change it won't trigger the reconciliation. If someone adds an annotation to an immutable ConfigMap, that might still trigger the reconciliation - in that sense, it is not aware of immutable ConfigMaps. Probably should not be, maybe just the reconciler implementation should be idempotent in this regard.

with this patch, the config reconciler is still reconciling the config map with fixed interval. It's just not triggered by config map update events, is my understanding correct ?

It is triggered by configMap update events.

But without informer, how may we ensure the resource is up-to-date at each reconciliation ? Is the cache refreshed under the hood ?

For a primary resource, there is an implicit informer in the background, that caches the changes and triggers the reconciliation. (This is more explicit for @ControllerConfiguration annotation for exmaple from v5. )

the config reconciler is still reconciling the config map with fixed interval

This is a bit tricky, since if everything works correctly, there is no need for a fixed interval reconciliation. Unless the reconciler implementation requires it. In JOSDK in general there is a "failsafe" fixed reconiliation for every primary resource, with a relatively high interval (10 hours), see in maxReconciliationInterval

The interval can be changed, but also turned off.

@csviri csviri requested a review from dongjoon-hyun July 14, 2025 09:23
@dongjoon-hyun
Copy link
Member

Thank you for the details. Can we have some numbers for this claim, @csviri ?

Having additional informer might effect performance, memory consumption, also maintains an additiona websocket connection to Kubernetes API.

@dongjoon-hyun
Copy link
Member

Also, cc @peter-toth

@csviri
Copy link
Contributor Author

csviri commented Jul 14, 2025

Thank you for the details. Can we have some numbers for this claim, @csviri ?

Having additional informer might effect performance, memory consumption, also maintains an additiona websocket connection to Kubernetes API.

Unfortunately, I don't have any at hand now.
Just to shed a light on this:

  • performance wise the reconciler will start reconiling the first resource when all the informers are synced (this includes the informer of the primary, thus listed all the resources of target resources into the cache)
  • additional infomer means it caches the target resources, in this case basically the informer was redundant, so had the resources twice in memory
  • additional informer means an additional websocker connection maintained continuously to K8S API server.

Will work on the number within JOSDK in close future.

cc @shawkins do you happen to have some numbers / metrics regarding informers? thank you

@shawkins
Copy link

cc @shawkins do you happen to have some numbers / metrics regarding informers? thank you

Not that I recall. Memory footprint is usually the number one concern - it of course varies with the number of resources watched and the size of them. Cluster scoped watching of secrets or configmaps would commonly be the most problematic.

The next most frequent concern with the older fabric8s using okhttp was an exhaustion of concurrent requests - that has since been fixed. I'm not sure what the server-side limit is on the number of concurrent watches / websockets.

@dongjoon-hyun
Copy link
Member

Could you check the CI failure, @csviri ?

@csviri
Copy link
Contributor Author

csviri commented Jul 15, 2025

Could you check the CI failure, @csviri ?

sure will do

csviri added 8 commits July 16, 2025 14:38
Signed-off-by: Attila Mészáros <[email protected]>
Signed-off-by: Attila Mészáros <[email protected]>
Signed-off-by: Attila Mészáros <[email protected]>
Signed-off-by: Attila Mészáros <[email protected]>
Signed-off-by: Attila Mészáros <[email protected]>
@csviri csviri force-pushed the remove-informer-config-reconciler branch from 01ff6e6 to 74502e4 Compare July 16, 2025 12:38
@dongjoon-hyun
Copy link
Member

Merged to main. Thank you, @csviri , @jiangzho , @shawkins .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants