@darenwkt
Contributor

@darenwkt darenwkt commented May 15, 2025

What is the purpose of the change

This PR is a feature improvement that includes the cause in the event emitted when restarting an unhealthy job:

Before:

2025-05-15 14:08:52,863 o.a.f.k.o.l.AuditUtils         [INFO ][default/basic-example] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restarting unhealthy job

After:

2025-05-15 14:08:52,863 o.a.f.k.o.l.AuditUtils         [INFO ][default/basic-example] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restart count hit threshold: 1

Brief change log

  • A new class, ClusterHealthResult, is introduced to store the job's health status and the associated error message
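
For readers who want a concrete picture of the change, here is a minimal sketch of what such a result type could look like. Only the class name and the two fields (healthy, error) are taken from the debug log in this PR; the factory methods, accessors, and hand-written toString are assumptions for illustration and may not match the merged code.

```java
import java.util.Objects;

// Hypothetical sketch: only the class name and the fields `healthy` and `error`
// come from the PR's log output. Everything else is assumed for illustration.
final class ClusterHealthResult {
    private final boolean healthy;
    private final String error; // null when the cluster is healthy

    private ClusterHealthResult(boolean healthy, String error) {
        this.healthy = healthy;
        this.error = error;
    }

    // Assumed factory for the healthy case: no error message is carried.
    static ClusterHealthResult healthy() {
        return new ClusterHealthResult(true, null);
    }

    // Assumed factory for the unhealthy case: the message explains the cause.
    static ClusterHealthResult error(String error) {
        return new ClusterHealthResult(false, Objects.requireNonNull(error));
    }

    boolean isHealthy() {
        return healthy;
    }

    String getError() {
        return error;
    }

    // Mirrors the format seen in the observer's debug log.
    @Override
    public String toString() {
        return "ClusterHealthResult(healthy=" + healthy + ", error=" + error + ")";
    }
}
```

With a type like this, the code that triggers the restart can pass getError() as the event message (for example "Restart count hit threshold: 1") instead of a fixed string.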

Verifying this change

This change adds unit tests and can be verified as follows:

  • Manually built image and run on local minikube
  • Verified that ClusterHealthResult is serialised and deserialised correctly by the observer
2025-05-15 14:09:50,968 o.a.f.k.o.o.ClusterHealthObserver [DEBUG][default/basic-example] Observed cluster health: ClusterHealthInfo(timeStamp=1747318190968, numRestarts=0, numRestartsEvaluationTimeStamp=0, numCompletedCheckpoints=0, numCompletedCheckpointsIncreasedTimeStamp=0, healthResult=ClusterHealthResult(healthy=true, error=null))
  • Verified that the created Event contains the cause of the unhealthy job
2025-05-15 14:08:52,863 o.a.f.k.o.l.AuditUtils         [INFO ][default/basic-example] >>> Event[Job]       | Warning | RESTARTUNHEALTHYJOB | Restart count hit threshold: 1

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., any changes to the CustomResourceDescriptors: (no)
  • Core observer or reconciler logic that is regularly executed: (yes)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@darenwkt
Contributor Author

Hi @gyfora, would you be able to review this PR please? Thank you

@darenwkt
Contributor Author

Thanks @gyfora for the review. I have addressed the comments; could you take another look please? Thank you

@darenwkt darenwkt requested a review from gyfora May 16, 2025 08:44
Contributor

@gyfora gyfora left a comment

Added a minor test comment, after that it should be good to go :)

@darenwkt
Contributor Author

Thanks @gyfora, addressed comment :)

@gyfora gyfora merged commit b0bc3a3 into apache:main May 16, 2025
130 checks passed

2 participants