fix(python_strategy): clean up stale RUNNING_STRATEGIES and surface scheduled-start failures by RobinAThomas · Pull Request #1297 · marketcalls/openalgo

RobinAThomas · 2026-04-24T08:47:06Z

Summary

Two related bugs cause scheduled Python strategies to silently fail to start the day after a successful run.

Bug 1 — `start_strategy_process` does not validate stale in-memory entries

if strategy_id in RUNNING_STRATEGIES:
    return False, "Strategy already running"

The RUNNING_STRATEGIES dict is only cleaned by stop_strategy_process(). Any code path that bypasses that — natural exit, internal squareoff, scheduled stop that did not fire, container hiccup — leaves a stale entry that blocks every future start until the worker is restarted.

Bug 2 — `scheduled_start_strategy` discards the failure

logger.info(f"... all checks passed, starting")
start_strategy_process(strategy_id)   # return value thrown away

When Bug 1 fires, there is no log line, no last_error on the config, no SSE broadcast. No log file is ever created (start_strategy_process bails before opening one), making this very hard to diagnose. From the user side it looks like the scheduler simply did not fire.

Reproduction

Schedule a strategy to start at 09:00 and stop at 16:00.
Let the strategy self-terminate before 16:00 (e.g. internal squareoff that calls sys.exit) so the child process exits without going through stop_strategy_process().
Next morning at 09:00 the scheduler ticks — strategy silently does not start, no new log file is created, and the strategy card still shows yesterday's stop time.

Fix

start_strategy_process: when a RUNNING_STRATEGIES entry already exists, validate liveness via Popen.poll() / psutil.is_running() / psutil.pid_exists() depending on the stored object type. If the process is dead, log a warning, close the log handle, drop the stale entry, and fall through to start a fresh one. If the process is alive, behaviour is unchanged.
scheduled_start_strategy: capture (success, message) and on failure log an error, set last_error on the config, and broadcast an error status update via SSE so the UI reflects reality.

Both changes are surgical (50 + / 2 −) and live entirely inside the existing PROCESS_LOCK critical section, so concurrency semantics are unchanged.

Testing

Verified hot-patched on a production deployment (v2.0.0.2) that hit this exact bug today; restart was clean and the patch loads without error.
python -m py_compile blueprints/python_strategy.py passes.
Existing callers of start_strategy_process (stop_strategy_process, the manual start route at line 1653, the post-login restoration paths at 2589/2654) are unaffected — they all pass through the same gateway and now also benefit from the stale-entry cleanup.

Summary by cubic

Fixes silent no-starts for scheduled Python strategies. Cleans stale RUNNING_STRATEGIES entries and surfaces scheduled-start failures in logs, the UI, and persisted config.

Bug Fixes
- start_strategy_process: validate existing process (poll/is_running/pid_exists); if dead, warn, close the log handle, drop the entry, then start fresh.
- scheduled_start_strategy: handle (success, message); on failure log an error, set and persist last_error on the config, and broadcast an SSE error update.

^{Written for commit e534957. Summary will update on new commits.}

…cheduled-start failures Two related bugs caused scheduled strategies to silently fail to start the day after a successful run: 1. start_strategy_process() rejects with 'Strategy already running' as soon as strategy_id is in the in-memory RUNNING_STRATEGIES dict, without checking whether the tracked process is actually alive. The dict is only cleaned by stop_strategy_process(); any path that bypasses that (natural exit, scheduler not firing scheduled_stop, container hiccup) leaves a stale entry that blocks all future starts until the worker is restarted. 2. scheduled_start_strategy() discards the (success, message) return value from start_strategy_process(), so when (1) fires there is no log line, no last_error on the config, no SSE broadcast - the user sees the scheduler tick happen and then... nothing. No log file is ever created (start_strategy_process bails before opening one), making this very hard to diagnose. Fix: - start_strategy_process(): when a RUNNING_STRATEGIES entry already exists, validate liveness via Popen.poll() / psutil.is_running() / psutil.pid_exists() depending on stored type. If the process is dead, log a warning, close the log handle, drop the stale entry and fall through to start a fresh one. - scheduled_start_strategy(): capture the result and on failure log an error, set last_error on the config, and broadcast an 'error' status update via SSE so the UI reflects reality. Reproduction: - Start a strategy via the scheduler at 09:00. - Let it self-terminate (e.g. via internal squareoff at 16:00) so the child process exits without going through stop_strategy_process(). - Next morning at 09:00, scheduler ticks - silent failure, no log file created, last run shows yesterday's stop time. After the fix the next-day start either reuses the slot cleanly, or if it genuinely fails for another reason the failure is now visible in /app/log and on the strategy card.

cubic-dev-ai

1 issue found across 1 file

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="blueprints/python_strategy.py">

<violation number="1" location="blueprints/python_strategy.py:1105">
P2: Scheduled auto-start incorrectly reports idempotent "already running" results as errors, causing false error state/SSE updates.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

marketcalls · 2026-04-24T17:04:42Z

@cubic-dev-ai what is the current status?

cubic-dev-ai · 2026-04-24T17:05:10Z

@cubic-dev-ai what is the current status?

@marketcalls The PR is currently in a ready state for final review.

Here is where we stand:

Fixes Applied: The core logic to prevent silent start failures is implemented. I've confirmed that start_strategy_process now checks for process liveness (using poll(), is_running(), or pid_exists) instead of blindly trusting the RUNNING_STRATEGIES dictionary.
Improved Visibility: Failures in scheduled_start_strategy are now properly caught, logged, and broadcasted to the UI via SSE, resolving the "silent failure" issue described in the PR.
Applied Suggestions: The latest commit includes a refinement I previously suggested for blueprints/python_strategy.py.

The changes are surgical and contained within the PROCESS_LOCK, preserving existing thread-safety. You can now proceed with merging if the manual tests on the production-like environment (v2.0.0.2) remain stable.

For more details on how I handle these reviews, you can check Cubic's documentation.

cubic-dev-ai Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread blueprints/python_strategy.py Outdated

Update blueprints/python_strategy.py

e534957

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(python_strategy): clean up stale RUNNING_STRATEGIES and surface scheduled-start failures#1297

fix(python_strategy): clean up stale RUNNING_STRATEGIES and surface scheduled-start failures#1297
RobinAThomas wants to merge 2 commits intomarketcalls:mainfrom
RobinAThomas:fix/stale-running-strategies-cleanup

RobinAThomas commented Apr 24, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

Uh oh!

marketcalls commented Apr 24, 2026

Uh oh!

cubic-dev-ai Bot commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

RobinAThomas commented Apr 24, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug 1 — start_strategy_process does not validate stale in-memory entries

Bug 2 — scheduled_start_strategy discards the failure

Reproduction

Fix

Testing

Summary by cubic

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marketcalls commented Apr 24, 2026

Uh oh!

cubic-dev-ai Bot commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

RobinAThomas commented Apr 24, 2026 •

edited by cubic-dev-ai Bot

Loading

Bug 1 — `start_strategy_process` does not validate stale in-memory entries

Bug 2 — `scheduled_start_strategy` discards the failure