Skip to content

Conversation

@Kevinz857
Copy link
Contributor

Summary

Allow updating Service and Ingress resources without restarting the application when only service/ingress related fields change.

When the application is in Running state and only service/ingress fields change, the controller will update the Service/Ingress resources directly without transitioning to Invalidating state.

Changes

  • Add isServiceIngressFieldsOnlyChange() to detect service/ingress-only changes
  • Skip setting Invalidating state for service/ingress-only changes in Running state
  • Add updateServiceIngressResources() to update resources in Running state
  • Add comprehensive unit tests for change detection logic

Supported Fields

The following fields can now be updated without restarting the application:

  • SparkUIOptions (ServiceAnnotations, ServiceLabels, IngressAnnotations, IngressTLS, etc.)
  • DriverIngressOptions (ServiceAnnotations, ServiceLabels, IngressAnnotations, IngressTLS, etc.)
  • Driver.ServiceAnnotations
  • Driver.ServiceLabels

How It Works

  1. When spec changes are detected in event filter, check if only service/ingress fields changed
  2. If application is in Running state and only these fields changed, don't set state to Invalidating
  3. Trigger reconcile which will update Service/Ingress resources using existing create-or-update logic
  4. Application continues running without interruption

Test Plan

  • Unit tests for isServiceIngressFieldsOnlyChange() function
  • Unit tests for hasServiceIngressFieldChanges() function
  • All existing tests pass
  • Manual test: Update ServiceAnnotations while app is running

Partial fix for #2766

@google-oss-prow google-oss-prow bot requested review from ImpSy and nabuskey December 21, 2025 07:28
@google-oss-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Kevinz857 Kevinz857 force-pushed the feature/hot-update-service-ingress branch 2 times, most recently from 054e8e4 to 14bb5f3 Compare December 21, 2025 07:39
Copy link
Contributor

@nabuskey nabuskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This supports creating and updating only correct? Are there reasons for not supporting deletions?

// These functions use create-or-update pattern, so they will update
// existing resources if configuration has changed.
if err := r.updateServiceIngressResources(ctx, app); err != nil {
logger.Error(err, "Failed to update Service/Ingress resources")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put this in events only or events and controller log? Makes debugging issues related to services easier.

@Kevinz857
Copy link
Contributor Author

Kevinz857 commented Dec 30, 2025

@nabuskey Thanks for the review!

Regarding deletions:
You're right, the current implementation only handles create/update.I intentionally kept the scope limited for the first iteration because:

  1. Deletion logic is more complex - we need to track which resources were created by previous configurations
  2. The current create-or-update pattern follows existing codebase patterns
  3. Wanted to get feedback before expanding scope

I can add deletion support if you think it's necessary for this PR. Or we could track it as a follow-up enhancement?

Regarding events:
Good suggestion! I'll add Kubernetes events when Service/Ingress resources are updated. Something like:

  • ServiceUpdated / IngressUpdated for successful updates
  • ServiceUpdateFailed / IngressUpdateFailed for errors

Will push an update soon. Thanks!

@nabuskey
Copy link
Contributor

nabuskey commented Jan 5, 2026

Please let us know when it's ready for reviews.

@Kevinz857 Kevinz857 force-pushed the feature/hot-update-service-ingress branch from 561b114 to 2723ab4 Compare January 9, 2026 03:45
@Kevinz857
Copy link
Contributor Author

@nabuskey Ready for review. I've added Kubernetes events for Service/Ingress updates:

  • SparkUIServiceUpdated / SparkUIServiceUpdateFailed
  • SparkUIIngressUpdated / SparkUIIngressUpdateFailed
  • SparkDriverIngressServiceUpdated / SparkDriverIngressServiceUpdateFailed
  • SparkDriverIngressUpdated / SparkDriverIngressUpdateFailed

Copy link
Contributor

@nabuskey nabuskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! Please see my comments.

Allow updating Service and Ingress resources without restarting the
application when only service/ingress related fields change.

When the application is in Running state and only the following fields
change, the controller will update the Service/Ingress resources
directly without transitioning to Invalidating state:
- SparkUIOptions (ServiceAnnotations, ServiceLabels, IngressAnnotations, etc.)
- DriverIngressOptions (ServiceAnnotations, ServiceLabels, IngressAnnotations, etc.)
- Driver.ServiceAnnotations
- Driver.ServiceLabels

Key changes:
- Add isServiceIngressFieldsOnlyChange() to detect service/ingress-only changes
- Skip setting Invalidating state for service/ingress-only changes
- Add updateServiceIngressResources() to update resources in Running state
- Add comprehensive unit tests for change detection logic

Signed-off-by: Kevinz857 <kevinnz@foxmail.com>
@Kevinz857 Kevinz857 force-pushed the feature/hot-update-service-ingress branch from 2723ab4 to 71c9f41 Compare January 16, 2026 06:43
@Kevinz857
Copy link
Contributor Author

@nabuskey Thanks for the review! I've addressed all your comments:

  1. Removed event from predicate - You're right, the predicate is not the right place for events. Removed the f.recorder.Eventf() call from event_filter.go.

  2. Changed log level to Debug - Changed f.logger.Info() to f.logger.V(1).Info() for debug level logging.

  3. Optimized DeepCopy for better performance - Instead of copying the entire SparkApplication, now only copying the Spec portion:

    // Before: copied entire SparkApplication
    oldCopy := oldApp.DeepCopy()
    
    // After: only copy Spec for better performance
    oldSpec := oldApp.Spec.DeepCopy()
    This reduces memory allocation and improves performance in large deployments.

Ready for another round of review. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants