Test [Backport 7.75.x] Fix for Spark driver init readiness pending #22306
base: 7.75.x
Conversation
* Fix for Spark driver init readiness pending
* Add changelog
* Lint
* Add config param for retries
* Simplify description
* Address comments
* Update spark/assets/configuration/spec.yaml (Co-authored-by: Steven Yuen <[email protected]>)
* ddev sync models and config

---------

Co-authored-by: Steven Yuen <[email protected]>
(cherry picked from commit 7477060)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42b986748f
```python
# Reset startup retry counter on successful JSON parse
self._startup_retry_count = 0
```
Avoid global retry reset across different Spark drivers
_startup_retry_count is a single counter for all requests and gets reset on any successful JSON parse. In driver or YARN modes, the check makes multiple _rest_request_to_json calls to different tracking URLs in the same run; if one app responds with valid JSON, the counter resets even if another app keeps returning the “Spark is starting up” message. That means the retry limit can be perpetually skipped for a stuck driver, so its startup error is never surfaced as CRITICAL and metrics stay silently missing. Consider tracking retries per address/driver or only resetting the counter for the same URL that succeeded.
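For illustration, here is a minimal sketch of the per-address tracking the review suggests. The class, attribute, and method names below are hypothetical and not taken from the integration's code; the retry limit stands in for the config param added in this PR.

```python
from collections import defaultdict


class SparkStartupRetryTracker:
    """Sketch: track startup retries per tracking URL instead of globally."""

    def __init__(self, max_retries=3):
        # Hypothetical: in the real check this limit would come from the new config param
        self.max_retries = max_retries
        # One counter per tracking URL, so one healthy app cannot reset another's count
        self._startup_retry_counts = defaultdict(int)

    def record_success(self, url):
        # Only reset the counter for the URL that actually returned valid JSON
        self._startup_retry_counts[url] = 0

    def record_startup_pending(self, url):
        # Returns True once this specific driver has exhausted its retries,
        # so its startup error can be surfaced instead of silently skipped
        self._startup_retry_counts[url] += 1
        return self._startup_retry_counts[url] > self.max_retries
```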
Ignore this PR. Just used to test a workflow.
Backport 7477060 from #22252.
What does this PR do?
Prevents JSON decode errors when the Spark driver is starting but not quite ready.
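To illustrate the general idea (not the integration's actual implementation; the function name and the "starting up" marker string are assumptions), the check can treat the driver's not-ready response as transient for a bounded number of runs before letting the error propagate:

```python
import json


def parse_driver_response(raw_text, retry_count, max_retries=3):
    """Return parsed JSON, or None while the driver is still initializing."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # The driver answers with a plain-text "starting up" page before its
        # REST API is ready; treat that as transient for a few check runs.
        if "starting up" in raw_text.lower() and retry_count < max_retries:
            return None
        # Retries exhausted, or an unrelated parse failure: propagate the error
        raise
```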
Motivation
Customer inquiry via support ticket.
Review checklist (to be filled by reviewers)
- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- Add the `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.