fix(python_strategy): clean up stale RUNNING_STRATEGIES and surface scheduled-start failures #1297
Open
RobinAThomas wants to merge 2 commits into marketcalls:main from
…cheduled-start failures

Two related bugs caused scheduled strategies to silently fail to start the day after a successful run:

1. start_strategy_process() rejects with 'Strategy already running' as soon as strategy_id is in the in-memory RUNNING_STRATEGIES dict, without checking whether the tracked process is actually alive. The dict is only cleaned by stop_strategy_process(); any path that bypasses it (natural exit, the scheduler not firing scheduled_stop, a container hiccup) leaves a stale entry that blocks all future starts until the worker is restarted.

2. scheduled_start_strategy() discards the (success, message) return value from start_strategy_process(), so when (1) fires there is no log line, no last_error on the config and no SSE broadcast: the user sees the scheduler tick happen and then... nothing. No log file is ever created (start_strategy_process bails out before opening one), which makes this very hard to diagnose.

Fix:
- start_strategy_process(): when a RUNNING_STRATEGIES entry already exists, validate liveness via Popen.poll() / psutil.Process.is_running() / psutil.pid_exists(), depending on the stored type. If the process is dead, log a warning, close the log handle, drop the stale entry and fall through to start a fresh one.
- scheduled_start_strategy(): capture the result and, on failure, log an error, set last_error on the config and broadcast an 'error' status update via SSE so the UI reflects reality.

Reproduction:
- Start a strategy via the scheduler at 09:00.
- Let it self-terminate (e.g. via internal squareoff at 16:00) so the child process exits without going through stop_strategy_process().
- Next morning at 09:00, the scheduler ticks: silent failure, no log file created, and the last run still shows yesterday's stop time.

After the fix, the next-day start either reuses the slot cleanly or, if it genuinely fails for another reason, the failure is now visible in /app/log and on the strategy card.
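The liveness check and stale-entry eviction described in the commit message can be sketched roughly as follows. This is a minimal, self-contained approximation, not the code in blueprints/python_strategy.py: the entry layout ({"process": ..., "log_file": ...}), the is_process_alive helper and the simplified start_strategy_process(strategy_id, cmd) signature are hypothetical. psutil is treated as optional so the sketch runs with only the standard library; log-file creation is omitted.

```python
import logging
import subprocess
import sys
import threading

try:
    import psutil  # optional in this sketch; used for psutil.Process / bare-PID handles
except ImportError:
    psutil = None

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the module-level state named in the PR.
RUNNING_STRATEGIES = {}
PROCESS_LOCK = threading.Lock()


def is_process_alive(handle) -> bool:
    """Return True if the tracked process still appears to be running."""
    if isinstance(handle, subprocess.Popen):
        # poll() returns None while the child has not exited yet.
        return handle.poll() is None
    if psutil is not None and isinstance(handle, psutil.Process):
        return handle.is_running()
    if psutil is not None and isinstance(handle, int):
        # Bare PIDs need psutil in this sketch.
        return psutil.pid_exists(handle)
    raise TypeError(f"cannot check liveness of {type(handle).__name__}")


def start_strategy_process(strategy_id, cmd):
    """Hypothetical sketch: start a strategy, evicting an entry whose process is dead."""
    with PROCESS_LOCK:
        entry = RUNNING_STRATEGIES.get(strategy_id)
        if entry is not None:
            if is_process_alive(entry["process"]):
                # Process really is alive: behaviour unchanged.
                return False, "Strategy already running"
            logger.warning("Dropping stale RUNNING_STRATEGIES entry for %s", strategy_id)
            log_handle = entry.get("log_file")
            if log_handle is not None:
                log_handle.close()
            del RUNNING_STRATEGIES[strategy_id]
        # Fall through to a fresh start.
        proc = subprocess.Popen(cmd)
        RUNNING_STRATEGIES[strategy_id] = {"process": proc, "log_file": None}
        return True, f"Started with PID {proc.pid}"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    cmd = [sys.executable, "-c", "pass"]  # child that exits on its own
    print(start_strategy_process("demo", cmd))
    RUNNING_STRATEGIES["demo"]["process"].wait()  # simulate a natural exit
    print(start_strategy_process("demo", cmd))  # stale entry is evicted, start succeeds
    RUNNING_STRATEGIES["demo"]["process"].wait()
```

The key design point is that the "already running" rejection is only issued after the stored handle has been checked, so a process that exited without going through stop_strategy_process() no longer blocks the next start.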
Contributor
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="blueprints/python_strategy.py">
<violation number="1" location="blueprints/python_strategy.py:1105">
P2: Scheduled auto-start incorrectly reports idempotent "already running" results as errors, causing false error state/SSE updates.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
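The review's point can be illustrated with a minimal sketch of a scheduled start that separates an idempotent "already running" result from a genuine failure. This is an assumption-laden illustration, not necessarily how the follow-up commit resolved it: the start, configs and broadcast parameters are hypothetical stand-ins for start_strategy_process(), the persisted strategy config and the SSE status broadcaster, and persistence is omitted.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger(__name__)

# Message string quoted in the PR for the "already running" rejection.
ALREADY_RUNNING = "Strategy already running"


def scheduled_start_strategy(
    strategy_id: str,
    start: Callable[[str], Tuple[bool, str]],
    configs: Dict[str, dict],
    broadcast: Callable[[str, str, str], None],
) -> bool:
    """Hypothetical sketch: run a scheduled start and surface only real failures."""
    success, message = start(strategy_id)
    if success:
        return True
    if message == ALREADY_RUNNING:
        # Idempotent outcome: the strategy is live, so no error state or SSE update.
        logger.info("Scheduled start skipped for %s: %s", strategy_id, message)
        return True
    # Genuine failure: log it, record it on the config, and tell the UI.
    logger.error("Scheduled start failed for %s: %s", strategy_id, message)
    configs.setdefault(strategy_id, {})["last_error"] = message
    broadcast(strategy_id, "error", message)
    return False
```

Matching on the message text is brittle; having start_strategy_process() return a distinct status for the "already running" case would be a sturdier signal, but that is a design suggestion rather than something the PR states.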
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Owner
@cubic-dev-ai what is the current status?
Contributor
@marketcalls The PR is currently in a ready state for final review. Here is where we stand:
The changes are surgical and contained within the
For more details on how I handle these reviews, you can check Cubic's documentation.
Summary
Two related bugs cause scheduled Python strategies to silently fail to start the day after a successful run.
Bug 1: `start_strategy_process` does not validate stale in-memory entries
The `RUNNING_STRATEGIES` dict is only cleaned by `stop_strategy_process()`. Any code path that bypasses it (natural exit, internal squareoff, a scheduled stop that did not fire, a container hiccup) leaves a stale entry that blocks every future start until the worker is restarted.

Bug 2: `scheduled_start_strategy` discards the failure
When Bug 1 fires, there is no log line, no `last_error` on the config and no SSE broadcast. No log file is ever created (`start_strategy_process` bails out before opening one), which makes this very hard to diagnose. From the user's side it looks as if the scheduler simply did not fire.

Reproduction
sys.exit) so the child process exits without going through `stop_strategy_process()`.

Fix
- `start_strategy_process`: when a `RUNNING_STRATEGIES` entry already exists, validate liveness via `Popen.poll()` / `psutil.Process.is_running()` / `psutil.pid_exists()`, depending on the stored object type. If the process is dead, log a warning, close the log handle, drop the stale entry and fall through to start a fresh one. If the process is alive, behaviour is unchanged.
- `scheduled_start_strategy`: capture `(success, message)` and, on failure, log an error, set `last_error` on the config and broadcast an `error` status update via SSE so the UI reflects reality.

Both changes are surgical (+50 / −2) and live entirely inside the existing `PROCESS_LOCK` critical section, so concurrency semantics are unchanged.

Testing
- `python -m py_compile blueprints/python_strategy.py` passes.
- Other callers of `start_strategy_process` (`stop_strategy_process`, the manual start route at line 1653, the post-login restoration paths at 2589/2654) are unaffected: they all pass through the same gateway and now also benefit from the stale-entry cleanup.

Summary by cubic
Fixes silent no-starts for scheduled Python strategies. Cleans stale `RUNNING_STRATEGIES` entries and surfaces scheduled-start failures in logs, the UI, and persisted config.
- `start_strategy_process`: validate the existing process (`poll`/`is_running`/`pid_exists`); if dead, warn, close the log handle, drop the entry, then start fresh.
- `scheduled_start_strategy`: handle `(success, message)`; on failure log an error, set and persist `last_error` on the config, and broadcast an SSE `error` update.

Written for commit e534957. Summary will update on new commits.