Fix runner workload readiness logic #260

dzirtusss · 2026-01-08T13:25:01Z

Summary

Fix runner workload update logic that could hang indefinitely
Add --ready flag to cpln apply to ensure resources are ready before proceeding
Remove redundant wait_for_runner_workload_deploy check (now handled by --ready)

Problem

cpln apply for suspended cron workloads doesn't always update lastProcessedVersion in deployment status, causing the wait loop to hang indefinitely.
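
To make the failure mode concrete, here is a minimal sketch of a version-based wait loop. It is not the project's code: `wait_for_version`, the status hash shape, and the attempt limit are assumptions made for illustration. It shows that when the status never reports `lastProcessedVersion`, only an explicit bound lets the loop end.

```ruby
# Illustrative only: a bounded stand-in for a version-based wait loop.
# `fetch_status` is any callable returning a Hash that may or may not
# contain "lastProcessedVersion" (hypothetical shape).
def wait_for_version(fetch_status, expected:, max_attempts: 5, interval: 0)
  max_attempts.times do
    version = fetch_status.call["lastProcessedVersion"]
    return true if version && version >= expected

    sleep(interval)
  end
  false # with no attempt limit, this line is never reached and the caller hangs
end

healthy   = -> { { "lastProcessedVersion" => 3 } }
suspended = -> { {} } # the field never appears

puts wait_for_version(healthy, expected: 3)
puts wait_for_version(suspended, expected: 3)
```

The second call returns `false` only because of `max_attempts`; an unbounded loop over the same status would spin forever.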

Solution

Use cpln apply --ready flag which waits for the resource to be ready before returning, making the explicit wait check unnecessary.

Summary by CodeRabbit

Bug Fixes
- Fixed an issue where the run command could hang indefinitely when updating the runner workload.
New Behavior
- Deploy/apply operations now support readiness verification by default, ensuring readiness is checked before completion.
Chores / Tests
- Added an environment toggle to disable the readiness verification for test setups.

coderabbitai · 2026-01-08T13:25:11Z

Walkthrough

Removed version-based waiting from the Run command, eliminated the expected_deployed_version attribute, added an optional --ready flag to control-plane apply, documented the fix in CHANGELOG, and disabled the apply-ready behavior in tests via an env var.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changelog `CHANGELOG.md` | Added a "Fixed" entry documenting the run command hang fix (links to PR `#260`, author Sergey Tarasov). |
| Run command `lib/command/run.rb` | Removed the `expected_deployed_version` attr_reader, the waiting calls, and the related private wait methods; create/update still call controlplane apply, but no version-based wait follows the update. |
| Control plane apply `lib/core/controlplane.rb` | `apply_template` and `apply_hash` now take a `wait:` keyword (`wait: false`/`wait:`); when `wait: true` and `DISABLE_APPLY_READY` is not set, `--ready` is appended to the `cpln apply` invocation. |
| Test setup `spec/spec_helper.rb` | Set `ENV['DISABLE_APPLY_READY'] = "true"` to disable the `--ready` behavior during tests. |
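
The conditional described for `lib/core/controlplane.rb` can be sketched roughly as follows. `build_apply_command` and its `env:` parameter are hypothetical names introduced here so the logic can run without invoking the CLI; in the PR the logic sits inside `apply_template`, and the exact contents of `gvc_org` are assumed.

```ruby
# Hypothetical helper mirroring the described behavior: append --ready only
# when the caller asks to wait and DISABLE_APPLY_READY is not set.
# `env` defaults to the real ENV but accepts a plain Hash for experimentation.
def build_apply_command(gvc_org, file_path, wait: false, env: ENV)
  cmd = "cpln apply #{gvc_org} --file #{file_path}"
  cmd += " --ready" if wait && !env["DISABLE_APPLY_READY"]
  cmd
end

puts build_apply_command("--org demo", "/tmp/template.yml", wait: true, env: {})
puts build_apply_command("--org demo", "/tmp/template.yml", wait: true,
                         env: { "DISABLE_APPLY_READY" => "true" })
```

Keeping `wait` opt-in means callers other than the run command keep their previous non-blocking behavior unless they request readiness explicitly.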

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Fix issue where run command fails when runner workload has ENV but original workload does not #227 — modifies update_runner_workload / runner workload wait handling in lib/command/run.rb.
Update run command based on new cpln changes #182 — earlier changes to create_runner_workload / update_runner_workload and controlplane apply behavior.
Fix race conditions in run command #163 — related adjustments around runner workload deploy waiting and expected_deployed_version tracking.

Suggested reviewers

justin808
rafaelgomesxyz
borela

Poem

🐰
I hopped through code at break of dawn,
Removed a wait that lingered on,
A tiny flag — ready, swift and light,
Tests tuck it in, out of sight,
— poem by a rabbit 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing runner workload readiness logic by implementing the --ready flag to prevent hanging during updates. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 682689c and 5673b97.

📒 Files selected for processing (4)

CHANGELOG.md
lib/command/run.rb
lib/core/controlplane.rb
spec/spec_helper.rb

🚧 Files skipped from review as they are similar to previous changes (1)

spec/spec_helper.rb

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: rafaelgomesxyz
Repo: shakacode/control-plane-flow PR: 183
File: examples/controlplane.yml:131-133
Timestamp: 2024-10-08T20:53:47.076Z
Learning: The `pre_deploy_hook` in the `examples/controlplane.yml` can be a command, not just a script path.

Learnt from: rafaelgomesxyz
Repo: shakacode/control-plane-flow PR: 183
File: examples/controlplane.yml:131-133
Timestamp: 2024-07-27T10:08:35.868Z
Learning: The `pre_deploy_hook` in the `examples/controlplane.yml` can be a command, not just a script path.

📚 Learning: 2024-11-01T13:56:06.103Z

Learnt from: zzaakiirr
Repo: shakacode/control-plane-flow PR: 240
File: spec/core/terraform_config/workload_spec.rb:448-521
Timestamp: 2024-11-01T13:56:06.103Z
Learning: In `spec/core/terraform_config/workload_spec.rb`, the containers are static and overrides are not needed; therefore, additional complexity like factory patterns is unnecessary.

Applied to files:

lib/command/run.rb

🧬 Code graph analysis (2)

lib/command/run.rb (2)

lib/command/base.rb (1)

cp (549-551)

lib/core/controlplane.rb (1)

apply_hash (433-435)

lib/core/controlplane.rb (1)

lib/command/apply_template.rb (1)

apply_template (161-174)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: RSpec (Fast) / rspec
GitHub Check: claude-review

🔇 Additional comments (6)

CHANGELOG.md (1)

17-20: LGTM!

The changelog entry accurately documents the bug fix and follows the established format.

lib/core/controlplane.rb (2)

433-435: LGTM!

The apply_hash method correctly delegates the wait parameter to apply_template, maintaining a clean abstraction.

407-431: The implementation correctly adds the --ready flag for the minimum required cpln CLI version (3.1.0) with proper conditional logic. The flag is only applied when wait: true and the DISABLE_APPLY_READY environment variable is not set, allowing tests to override the behavior.

The code itself is correct and ready. However, verifying that the --ready flag actually:

Behaves correctly for suspended cron workloads

Waits for resource readiness before returning

requires testing against the actual cpln CLI, which is outside the scope of code review.

lib/command/run.rb (3)

98-100: LGTM!

Removing expected_deployed_version from the public interface is correct, as the version-based wait logic has been replaced by the --ready flag approach.

151-193: Verify that wait: true ensures workload is ready before proceeding.

The change from implicit waiting to using wait: true with apply_hash is a cleaner approach. However, ensure that the --ready flag behavior is sufficient for the create_runner_workload use case, particularly for suspended cron workloads.

Consider whether any additional checks are needed after the workload is created (e.g., verifying the workload is in the expected suspended state).

195-244: Verify that wait: true handles suspended workload updates correctly.

The elimination of explicit wait logic in favor of wait: true is consistent with the PR objectives. However, since the original bug was related to suspended cron workloads not updating lastProcessedVersion, please verify that:

The --ready flag correctly handles suspended workloads (which may have different readiness semantics)

The update completes successfully before start_job is called on line 130

Based on the PR description, this should be the core fix for the hanging issue.

claude · 2026-01-08T13:27:01Z

Code Review - PR #260: Fix runner workload readiness logic

Summary

This PR addresses a critical issue where cpln apply for suspended cron workloads could hang indefinitely when lastProcessedVersion wasn't updated. The solution uses the --ready flag and removes redundant wait logic.

✅ Strengths

Good Problem Identification: The PR correctly identifies that suspended cron workloads may not update lastProcessedVersion, causing indefinite hangs.
Cleaner Code: Removing 21 lines of code (two wait methods and expected_deployed_version tracking) simplifies the implementation significantly.
Better Abstraction: Using --ready flag delegates readiness checking to the Control Plane CLI tool itself, which is more reliable than custom polling logic.
Preserved Test Coverage: The existing test suite in spec/command/run_spec.rb should catch regressions.

🔍 Potential Concerns & Questions

1. Error Handling & Timeout Behavior

The --ready flag will wait for resources to be ready, but:

Question: Does cpln apply --ready have its own timeout mechanism? If it hangs, will users have a way to interrupt or know what's happening?
Suggestion: Consider documenting the timeout behavior of --ready flag or ensuring there's appropriate timeout handling.
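
If the CLI turns out to have no timeout of its own (the thread does not establish this either way), one option is to bound the call from the Ruby side. The sketch below is an assumption-laden illustration: `run_with_timeout` is a hypothetical helper, and it is demonstrated with `sleep` rather than `cpln`.

```ruby
require "timeout"

# Hypothetical wrapper: runs an external command (given as an argv array) and
# gives up after `seconds`. Returns true/false for the exit status, nil on timeout.
def run_with_timeout(argv, seconds)
  pid = Process.spawn(*argv)
  _, status = Timeout.timeout(seconds) { Process.wait2(pid) }
  status.success?
rescue Timeout::Error
  Process.kill("TERM", pid) # stop the child so it does not outlive the caller
  Process.wait(pid)         # reap it to avoid leaving a zombie process
  nil
end

p run_with_timeout(["true"], 5)
p run_with_timeout(["sleep", "5"], 0.2)
```

Returning `nil` on timeout keeps "command failed" and "command never finished" distinguishable for the caller.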

2. Behavior Change for Non-Suspended Workloads

The old code had wait_for_runner_workload_deploy that checked if lastProcessedVersion was not nil.
The new code applies --ready to ALL workload updates, not just suspended ones.
Question: Does --ready behave correctly for all workload types (standard, cron, etc.)? Have you tested this with non-suspended workloads?

3. Missing Context in Error Messages

The old implementation had explicit step messages:

"Waiting for runner workload to be deployed"
"Waiting for runner workload to be updated"

With --ready, users might see less informative output if the command hangs. Consider whether additional logging/progress indicators are needed.
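
A progress indicator around the blocking call could look like the sketch below. `with_progress` is a hypothetical helper, not the project's existing step-message API, and the message text is only an example.

```ruby
require "stringio"

# Hypothetical helper: announce a blocking step, run it, then report the outcome.
def with_progress(message, out: $stderr)
  out.print("#{message}... ")
  result = yield
  out.puts(result ? "done" : "failed")
  result
end

log = StringIO.new
with_progress("Waiting for runner workload to be ready", out: log) { true }
puts log.string
```

Printing the message before yielding matters here: if the wrapped call hangs, the user still sees which step is in progress.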

4. Potential Race Condition

The code flow is now:

create_runner_workload if cp.fetch_workload(runner_workload).nil?
update_runner_workload  # Uses --ready flag
start_job

Question: For newly created workloads, does --ready ensure the workload is fully initialized before start_job is called?
The old code had wait_for_runner_workload_deploy after creation - this might have been intentional.
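
The ordering question can be explored with a stand-in client. Everything below is hypothetical: `FakeControlPlane` and `run_flow` only record the sequence of calls, and the real `update_runner_workload` may skip the apply when nothing changed.

```ruby
# Hypothetical stand-in for the control plane client. It records calls instead
# of shelling out to `cpln`, so the ordering can be inspected.
class FakeControlPlane
  attr_reader :calls

  def initialize(existing_workload: nil)
    @workload = existing_workload
    @calls = []
  end

  def fetch_workload(_name)
    @workload
  end

  def apply_hash(spec, wait: false)
    @calls << [:apply_hash, wait]
    @workload = spec
  end

  def start_job(name)
    @calls << [:start_job, name]
  end
end

# Mirrors the three-step flow quoted above, with both applies asked to wait.
def run_flow(cp, runner_workload)
  spec = { "name" => runner_workload }
  cp.apply_hash(spec, wait: true) if cp.fetch_workload(runner_workload).nil? # create
  cp.apply_hash(spec, wait: true)                                            # update
  cp.start_job(runner_workload)
end

cp = FakeControlPlane.new
run_flow(cp, "rails-runner")
p cp.calls
```

If each apply blocks until the resource is ready, `start_job` cannot run before the create and update have both returned, which is the property the removed wait was meant to provide.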

5. Test Coverage Gap

Looking at spec/command/run_spec.rb:

Line 95: Test expects "not to include('Updating runner workload')" - this might need review
Lines 231, 252, 274: Tests explicitly check for "Updating runner workload" message
Suggestion: Add a specific test for the suspended cron workload scenario that previously caused hangs

📝 Recommendations

Add Integration Test: Create a test that specifically validates the suspended cron workload scenario:

context "when runner workload is a suspended cron" do
  # Test that update completes without hanging
end

Document --ready Behavior: Add a comment explaining what --ready does:

# Use --ready flag to wait for resource to be ready before returning.
# This handles suspended cron workloads that don't update lastProcessedVersion.
cmd = "cpln apply #{gvc_org} --file #{f.path} --ready"

Verify Backwards Compatibility: Ensure existing tests pass, especially:
- spec/command/run_spec.rb:227-233 (updates runner workload with non-default values)
- spec/command/run_spec.rb:247-253 (updates runner workload with different ENV)
Consider Timeout Configuration: If --ready can hang, consider allowing users to configure timeout behavior.

🎯 Overall Assessment

Verdict: ✅ Approve with minor suggestions

This is a solid improvement that:

✅ Fixes a real bug with suspended cron workloads
✅ Simplifies the codebase significantly
✅ Uses a more reliable approach (delegating to CLI)

The main concerns are around edge cases and ensuring --ready behaves correctly in all scenarios. I recommend:

Running the full test suite to ensure no regressions
Testing manually with both suspended and non-suspended workloads
Adding documentation/comments about the --ready flag behavior

Great work identifying and fixing this issue! 🚀

claude · 2026-01-08T13:30:42Z

Pull Request Review: Fix runner workload readiness logic

Overview

This PR addresses a critical bug where the run command could hang indefinitely when updating suspended cron workloads. The fix simplifies the logic by leveraging the --ready flag in cpln apply instead of manually polling for deployment status.

Code Quality and Best Practices

Strengths:

Simplification: Replacing ~20 lines of complex polling logic with the --ready flag is a significant improvement. It follows the principle of preferring built-in functionality, which is easier to maintain.
Clean refactoring: The removal of expected_deployed_version, wait_for_runner_workload_deploy, and wait_for_runner_workload_update methods eliminates unnecessary state tracking.
Consistency: The --ready flag is now used universally in apply_template (lib/core/controlplane.rb:411), making the codebase more consistent.
Good documentation: The CHANGELOG entry clearly explains the fix and links to the PR.

Suggestions:

Remove unused attr_reader: Line 100 in lib/command/run.rb still lists expected_deployed_version in the attr_reader, but this variable is no longer used after the changes. This should be removed for consistency.

Potential Bugs and Issues

Critical Observation:

The PR removes the explicit waiting logic. Both create_runner_workload (line 128) and update_runner_workload (line 242) use cp.apply_hash which internally calls apply_template with --ready. This means operations will now block until ready, which is the intended fix and should resolve the hanging issue.

Concern - Error Handling:
The --ready flag may fail if a workload cannot reach a ready state. Consider what happens if a suspended cron workload has configuration errors or the deployment times out before reaching ready state. The current code in apply_template has error handling, but it is unclear if cpln apply --ready provides meaningful error messages when it fails to reach ready state.

Recommendation: Consider documenting the timeout behavior of cpln apply --ready or adding logging to help debug if this blocks longer than expected.

Performance Considerations

Improvements:

Reduced polling: Eliminates the Ruby-side polling loop that was continuously checking cron_workload_deployed_version.
Fewer API calls: The old implementation made repeated API calls to check deployment status. The new approach delegates this to the CLI.

Potential concerns:

Blocking behavior: The command now blocks during both creation and update of runner workloads. However, this is the correct behavior - we should wait for the workload to be ready before proceeding.

Security Concerns

No security issues identified. The changes are purely related to deployment synchronization logic.

Test Coverage

Good news:
The existing test suite in spec/command/run_spec.rb is comprehensive and covers interactive/non-interactive modes, workload creation/updates, different configurations, and various scenarios.

Observations:

Line 95 checks that Updating runner workload is NOT in stderr for the success case
Lines 231 and 251 check that Updating runner workload IS present when updates are needed

Recommendation: Consider adding a specific test case for the bug this PR fixes to catch potential regressions.

Additional Observations

Method cron_workload_deployed_version still exists in lib/core/controlplane.rb:301-309 but is no longer used. Consider removing it if unused elsewhere.
Line 242 in lib/command/run.rb could benefit from a comment explaining that apply_hash now waits for readiness via --ready flag.

Summary

This is a well-executed bug fix that simplifies the code while solving a real problem. The approach of using cpln apply --ready is much cleaner than manual polling.

Action items before merge:

Remove expected_deployed_version from attr_reader on line 100
Consider if cron_workload_deployed_version method should be removed
Optional: Add comment explaining --ready behavior

Overall assessment: Approved - This PR significantly improves reliability and code quality. The minor cleanup items are nice-to-haves but not blockers.

Great work on identifying and fixing this subtle bug!

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b4c8ea and 3dfa17b.

📒 Files selected for processing (3)

CHANGELOG.md
lib/command/run.rb
lib/core/controlplane.rb

🧬 Code graph analysis (1)

lib/core/controlplane.rb (1)

lib/core/shell.rb (1)

cmd (66-73)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: claude-review
GitHub Check: RSpec (Fast) / rspec

🔇 Additional comments (3)

CHANGELOG.md (1)

17-20: LGTM!

The changelog entry clearly documents the bug fix and follows the proper format.

lib/command/run.rb (2)

100-100: LGTM!

The removal of expected_deployed_version from the attribute reader is consistent with the elimination of explicit deployment version tracking. The new --ready flag approach in apply_template now handles readiness implicitly.

195-244: LGTM!

The simplified update logic correctly relies on the --ready flag behavior added to apply_template in lib/core/controlplane.rb. The removal of explicit wait loops should resolve the indefinite hang issue described in the PR objectives, assuming the --ready flag works correctly for suspended cron workloads.

coderabbitai · 2026-01-08T13:32:35Z

lib/core/controlplane.rb

      f.write(data)
      f.rewind
-      cmd = "cpln apply #{gvc_org} --file #{f.path}"
+      cmd = "cpln apply #{gvc_org} --file #{f.path} --ready"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

cpln apply --ready flag documentation Control Plane CLI

💡 Result:

I couldn't find any documented --ready flag for the cpln apply command in the official Control Plane CLI docs — the cpln apply guide and CLI reference list flags like --file, --gvc, --k8s and stdin usage but make no mention of --ready. [1][2]

If you meant a different command or a newer CLI version, tell me the version (cpln --version) and I’ll re-check the release notes and docs for that version.

Sources:

cpln apply (Apply YAML Manifests) guide. [1]

Control Plane CLI reference / overview. [2]

🏁 Script executed:

# Find cpln version specification in the project
find . -name "*.gemfile" -o -name "Gemfile*" -o -name "*.yml" -o -name "*.yaml" | head -20

Repository: shakacode/control-plane-flow

Length of output: 792

🏁 Script executed:

# Search for references to --ready flag in the codebase
rg "\-\-ready" --type rb

Repository: shakacode/control-plane-flow

Length of output: 98

🏁 Script executed:

# Look for lastProcessedVersion references
rg "lastProcessedVersion" --type rb -i

Repository: shakacode/control-plane-flow

Length of output: 98

🏁 Script executed:

# Check the suspended cron workload reference at lib/command/run.rb line 173
head -n 180 lib/command/run.rb | tail -n 20

Repository: shakacode/control-plane-flow

Length of output: 789

🏁 Script executed:

# Check Gemfile for cpln CLI version specification
cat Gemfile

Repository: shakacode/control-plane-flow

Length of output: 482

🏁 Script executed:

# Search for --ready flag in all Ruby files (fixing the rg command)
rg "\-\-ready" .

Repository: shakacode/control-plane-flow

Length of output: 159

🏁 Script executed:

# Search for lastProcessedVersion references
rg "lastProcessedVersion" .

Repository: shakacode/control-plane-flow

Length of output: 184

🏁 Script executed:

# Check for any cpln version or CLI setup
rg "cpln" Gemfile Gemfile.lock 2>/dev/null || echo "No cpln in Gemfile"

Repository: shakacode/control-plane-flow

Length of output: 90

🏁 Script executed:

# Look for CLI documentation or version info in the project
find . -name "*.md" -type f | xargs grep -l "ready" 2>/dev/null | head -5

Repository: shakacode/control-plane-flow

Length of output: 159

🏁 Script executed:

# Check README for cpln version requirements
grep -i "cpln" README.md | head -20

Repository: shakacode/control-plane-flow

Length of output: 2009

🏁 Script executed:

# Check docs for cpln version info
grep -i "cpln\|version\|ready" docs/*.md | grep -i "ready\|version" | head -10

Repository: shakacode/control-plane-flow

Length of output: 945

🏁 Script executed:

# Look at the context around lastProcessedVersion in controlplane.rb
grep -B 10 -A 10 "lastProcessedVersion" lib/core/controlplane.rb

Repository: shakacode/control-plane-flow

Length of output: 701

🏁 Script executed:

# Check if there's any PR description or CHANGELOG mentioning --ready
find . -name "CHANGELOG*" -o -name "*.md" | xargs grep -l "ready" 2>/dev/null

Repository: shakacode/control-plane-flow

Length of output: 188

🏁 Script executed:

# Look for any .git directory and check recent commits mentioning --ready or lastProcessedVersion
git log --all --oneline --grep="ready\|lastProcessedVersion" 2>/dev/null | head -20

Repository: shakacode/control-plane-flow

Length of output: 109

🏁 Script executed:

# Search for any issue references or PR discussions
find . -name "*.md" -o -name "*.txt" | xargs grep -i "ready.*flag\|--ready" 2>/dev/null

Repository: shakacode/control-plane-flow

Length of output: 56

🏁 Script executed:

# Check if there's version history or release notes
find . -name "CHANGELOG*" -o -name "HISTORY*" -o -name "RELEASES*" 2>/dev/null

Repository: shakacode/control-plane-flow

Length of output: 86

🏁 Script executed:

# Look at surrounding code context - check what changed around line 411
sed -n '400,425p' lib/core/controlplane.rb

Repository: shakacode/control-plane-flow

Length of output: 857

🏁 Script executed:

# Search for any control plane API or CLI documentation URLs in the codebase
rg "control.*plane.*cli|cpln.*doc|api.*reference" -i . 2>/dev/null | head -10

Repository: shakacode/control-plane-flow

Length of output: 959

🏁 Script executed:

# Check CHANGELOG for --ready flag or recent cpln changes
cat CHANGELOG.md | head -50

Repository: shakacode/control-plane-flow

Length of output: 3075

🏁 Script executed:

# Also search CHANGELOG for lastProcessedVersion or ready-related changes
grep -i "ready\|lastProcessedVersion\|cpln apply" CHANGELOG.md | head -20

Repository: shakacode/control-plane-flow

Length of output: 54

The --ready flag is not documented in the official Control Plane CLI, and its availability remains unverified.

The web search found no mention of this flag in the official cpln apply documentation. Additionally:

No cpln version constraint exists in the project dependencies to specify when this flag became available

No CHANGELOG entries document the --ready flag or its behavior

The flag appears only in this one location, with no error handling or fallback logic if it fails

How this flag specifically handles the lastProcessedVersion issue for suspended cron workloads is unclear

Before merging, confirm:

Which cpln CLI version introduced the --ready flag

That the flag is available in the project's minimum supported cpln version

That it correctly resolves the lastProcessedVersion polling issue for suspended cron workloads (as referenced in the cron_workload_deployed_version method in this same file)
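
Once the introducing version is known, a guard along these lines could make the requirement explicit. This is a sketch under assumptions: `version_at_least?` is a hypothetical helper, and the version strings below are placeholders, since the thread does not confirm which cpln release added `--ready`.

```ruby
require "rubygems" # provides Gem::Version; already loaded in a default Ruby setup

# Hypothetical helper: compare dotted version strings numerically rather than
# as plain strings (so "3.10.0" sorts after "3.9.0").
def version_at_least?(installed, required)
  Gem::Version.new(installed) >= Gem::Version.new(required)
end

puts version_at_least?("3.2.0", "3.1.0")
puts version_at_least?("3.0.9", "3.1.0")
puts version_at_least?("3.10.0", "3.9.0")
```

A check like this would let the tool fall back to a plain `cpln apply` or fail with a clear message on older CLI versions instead of passing an unknown flag.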

claude · 2026-01-08T14:11:17Z

Code Review for PR #260: Fix runner workload readiness logic

Summary

This PR simplifies the runner workload update logic by leveraging the cpln apply --ready flag instead of manually polling for deployment status. This is a well-targeted bug fix that removes redundant code and resolves a real issue where suspended cron workloads could hang indefinitely.

✅ Strengths

Root Cause Fix: Using --ready flag delegates the readiness check to the cpln CLI itself, which is more reliable than manual polling. This is the right approach.
Code Simplification: Removes 17 lines of code (2 methods and 1 instance variable), reducing complexity and maintenance burden.
Proper Testing Consideration: Added DISABLE_APPLY_READY environment variable to disable the flag during tests, preventing test slowdowns. This shows thoughtful design.
Good Documentation: CHANGELOG entry clearly explains the fix with attribution.

🔍 Code Quality

lib/command/run.rb (line 100)

✅ Removed expected_deployed_version from attr_reader - good cleanup
✅ Removed obsolete methods wait_for_runner_workload_deploy and wait_for_runner_workload_update
✅ Simplified flow in the call method

lib/core/controlplane.rb (line 412)

cmd += " --ready" unless ENV.fetch("DISABLE_APPLY_READY", nil)

✅ Uses ENV.fetch with default nil for safety
ℹ️ Minor suggestion: Consider ENV["DISABLE_APPLY_READY"] instead of ENV.fetch("DISABLE_APPLY_READY", nil) for brevity since they're equivalent
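
One detail worth keeping in mind with either spelling: the check is on presence, not on the value, because every Ruby string is truthy. The sketch below uses a hypothetical `apply_ready_enabled?` helper with an injectable `env` to show the consequence.

```ruby
# Mirrors the condition `unless ENV.fetch("DISABLE_APPLY_READY", nil)`.
# `env` is injectable so the check can be exercised without touching the real ENV.
def apply_ready_enabled?(env = ENV)
  !env.fetch("DISABLE_APPLY_READY", nil)
end

puts apply_ready_enabled?({})
puts apply_ready_enabled?({ "DISABLE_APPLY_READY" => "true" })
puts apply_ready_enabled?({ "DISABLE_APPLY_READY" => "false" })
```

The last call also reports the flag as disabled: setting `DISABLE_APPLY_READY=false` still turns `--ready` off, so the variable has to be unset to re-enable it.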

spec/spec_helper.rb (line 7)

✅ Correctly disables the flag for all tests to maintain test performance

🐛 Potential Issues

1. Broader Impact on All `apply` Operations

The --ready flag is now added to ALL cpln apply calls throughout the codebase, not just runner workload updates. This affects:

lib/command/apply_template.rb
lib/command/setup_app.rb
Any other code calling apply_hash or apply_template

Questions:

Have you tested other apply operations to ensure they don't timeout or behave unexpectedly with --ready?
Are there scenarios where we don't want to wait for readiness (e.g., applying multiple resources quickly)?
What is the timeout behavior of cpln apply --ready? Could it hang indefinitely in other contexts?

Recommendation: Consider whether the flag should be:

Added selectively (only for workload updates)
Made configurable per call
Or confirmed that it's safe globally (which may be the case, but needs verification)

2. Error Handling

If cpln apply --ready fails or times out, what error message does the user see? Ensure the error handling in apply_template (lib/core/controlplane.rb:407-429) provides clear feedback.

3. Test Coverage Gap

The existing tests in spec/command/run_spec.rb disable the --ready flag via DISABLE_APPLY_READY. This means:

✅ Tests won't be slowed down
❌ The actual fix (using --ready) is not tested

Recommendation: Add at least one integration test that:

Does NOT set DISABLE_APPLY_READY
Verifies that runner workload updates complete successfully with --ready
Potentially tests the bug scenario (suspended cron workload) if feasible
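
Since `spec/spec_helper.rb` sets the variable globally, such a test would need to clear it for its own duration and restore it afterwards. A minimal sketch, assuming a hypothetical `with_env` helper that is not part of the project's test suite:

```ruby
# Hypothetical test helper: temporarily override environment variables.
# Assigning nil to an ENV key deletes it, so nil means "unset for this block".
def with_env(overrides)
  saved = overrides.keys.to_h { |key| [key, ENV[key]] }
  overrides.each { |key, value| ENV[key] = value }
  yield
ensure
  saved&.each { |key, value| ENV[key] = value } # restore even if the block raises
end

ENV["DISABLE_APPLY_READY"] = "true" # what the spec helper is described as doing
inside = with_env("DISABLE_APPLY_READY" => nil) { ENV["DISABLE_APPLY_READY"] }
p inside
p ENV["DISABLE_APPLY_READY"]
```

The `ensure` clause keeps one slow, flag-enabled test from leaking its environment into the rest of the suite.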

🔒 Security Considerations

✅ No security concerns identified
✅ No user input involved in the flag addition
✅ Environment variable check is safe

⚡ Performance Considerations

Positive:

Removes redundant polling, reducing API calls to Control Plane
More efficient overall flow

Potential Concern:

The --ready flag will now cause ALL apply operations to wait for readiness. For rapid successive applies or large deployments, this could:
- Slow down multi-resource deployments
- Increase overall deployment time if applied to resources that don't need immediate readiness

Question: What's the typical wait time for --ready? Is there a timeout?

📝 Suggested Improvements

Add a test without DISABLE_APPLY_READY to verify the fix works in production-like conditions

Document the --ready flag behavior in comments:

# The --ready flag waits for the resource to reach a ready state before returning.
# This prevents race conditions when applying workload updates, especially for cron workloads
# that may not update lastProcessedVersion immediately.
cmd += " --ready" unless ENV["DISABLE_APPLY_READY"]

Consider making it configurable if broader impact is a concern:

def apply_template(data, wait_for_ready: true)
  # ...
  cmd += " --ready" if wait_for_ready && !ENV["DISABLE_APPLY_READY"]
  # ...
end

Verify timeout behavior: Ensure cpln apply --ready has reasonable timeout defaults or consider adding timeout configuration

🧪 Test Coverage Assessment

Current Coverage:

✅ Existing integration tests cover the run command flow
✅ Tests verify workload updates work
❌ Tests bypass the actual --ready flag behavior

Recommendation:
Add one slow test without DISABLE_APPLY_READY to verify the fix, or document why testing with the flag enabled is impractical.

✅ Verdict

This is a solid bug fix with good simplification. The core change is correct, but there are a few areas for consideration:

Major: Verify the broader impact of adding --ready to all apply operations
Medium: Add test coverage for the actual fix (without DISABLE_APPLY_READY)
Minor: Add inline documentation about the flag's purpose

Recommendation: Approve with minor changes. Consider addressing points 1-2 before merging if possible.

Great work on identifying and fixing this race condition! The approach of using the native --ready flag is much cleaner than manual polling. 👍

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dfa17b and 682689c.

📒 Files selected for processing (4)

CHANGELOG.md
lib/command/run.rb
lib/core/controlplane.rb
spec/spec_helper.rb

🚧 Files skipped from review as they are similar to previous changes (1)

CHANGELOG.md

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: rafaelgomesxyz
Repo: shakacode/control-plane-flow PR: 183
File: examples/controlplane.yml:131-133
Timestamp: 2024-10-08T20:53:47.076Z
Learning: The `pre_deploy_hook` in the `examples/controlplane.yml` can be a command, not just a script path.

Learnt from: rafaelgomesxyz
Repo: shakacode/control-plane-flow PR: 183
File: examples/controlplane.yml:131-133
Timestamp: 2024-07-27T10:08:35.868Z
Learning: The `pre_deploy_hook` in the `examples/controlplane.yml` can be a command, not just a script path.

📚 Learning: 2024-11-01T13:43:58.455Z

Learnt from: zzaakiirr
Repo: shakacode/control-plane-flow PR: 240
File: spec/core/terraform_config/generator_spec.rb:337-345
Timestamp: 2024-11-01T13:43:58.455Z
Learning: In `spec/core/terraform_config/generator_spec.rb`, prefer to keep configuration values inline in the tests rather than extracting them into named constants.

Applied to files:

spec/spec_helper.rb

📚 Learning: 2024-11-01T13:55:15.053Z

Learnt from: zzaakiirr
Repo: shakacode/control-plane-flow PR: 240
File: spec/core/terraform_config/workload_spec.rb:24-409
Timestamp: 2024-11-01T13:55:15.053Z
Learning: The user prefers to keep tests in `spec/core/terraform_config/workload_spec.rb` written without shared examples, as they find the current version easier to read.

Applied to files:

spec/spec_helper.rb

📚 Learning: 2024-11-01T11:39:47.992Z

Learnt from: zzaakiirr
Repo: shakacode/control-plane-flow PR: 240
File: lib/command/terraform/generate.rb:28-36
Timestamp: 2024-11-01T11:39:47.992Z
Learning: In `lib/command/terraform/generate.rb`, the `generate_app_config` method uses `File.write` with `mode: "a+"` because multiple templates with the same kind can appear, and appending to existing files is necessary.

Applied to files:

lib/core/controlplane.rb

🧬 Code graph analysis (1)

lib/core/controlplane.rb (1)

lib/core/shell.rb (1)

cmd (66-73)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: claude-review
GitHub Check: RSpec (Fast) / rspec

🔇 Additional comments (3)

spec/spec_helper.rb (1)

7-7: LGTM!

Disabling the --ready flag in tests is appropriate. This prevents tests from blocking while waiting for resource readiness and maintains existing test behavior.

lib/command/run.rb (2)

100-100: LGTM!

The removal of expected_deployed_version from the public interface is correct, as version tracking is no longer needed with the --ready flag handling deployment readiness.

128-130: Ensure --ready flag handles suspended workloads correctly.

The explicit wait logic for runner workload deployment and updates has been removed, relying instead on the --ready flag in cpln apply to handle readiness. However, the runner workload is created as suspended (line 173 sets suspend: true), and the PR description mentions that suspended cron workloads don't always update lastProcessedVersion.

Verify that:

The --ready flag returns successfully for suspended workloads (doesn't wait indefinitely for a workload that will never become "ready" while suspended)

The workload is in the correct state when start_job executes at line 130

This verification is related to the one requested in lib/core/controlplane.rb.

coderabbitai · 2026-01-08T14:13:38Z

lib/core/controlplane.rb

      f.write(data)
      f.rewind
      cmd = "cpln apply #{gvc_org} --file #{f.path}"
+      cmd += " --ready" unless ENV.fetch("DISABLE_APPLY_READY", nil)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify cpln CLI supports --ready flag and check its behavior

# Check if --ready flag is documented in cpln apply help
echo "=== Checking cpln apply help for --ready flag ==="
cpln apply --help | grep -i "ready" || echo "No --ready flag found in help output"

# Search codebase for cpln version requirements or constraints
echo -e "\n=== Checking for cpln version specifications ==="
rg -n "cpln.*version|CPLN.*VERSION" -g '!*.md' -g '!CHANGELOG*'

# Check for any references to cpln in installation/setup files
echo -e "\n=== Checking for cpln installation or version specs ==="
find . -maxdepth 2 \( -name "*.gemfile" -o -name "Gemfile*" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" \) -type f | head -20

Repository: shakacode/control-plane-flow

Length of output: 1676

🌐 Web query:

Control Plane CLI cpln apply --ready flag documentation and behavior with suspended cron workloads

💡 Result:

Short answer: cpln apply --ready waits for created/updated workloads to reach a "ready" state; because a suspended workload (including a suspended cron whose replicas are 0) will not create running replicas, it will not become ready—so using --ready against a suspended cron will block until the CLI's wait times out (or indefinitely if no timeout is supplied). [1][2]

What to do:

Remove the workload's suspend (unsuspend/start it) before running cpln apply --ready, or

Omit --ready when applying resources you intend to keep suspended, or

Unsuspend as a separate step after apply and then wait for readiness. [2][1]

Sources:

cpln apply / release note adding --ready flag. [1]

Workload "Suspend" behavior (suspend sets scale to 0 / stops serving traffic). [2]

Critical issue: --ready flag will hang indefinitely when applying suspended cron workloads.

The --ready flag does exist in the cpln apply command and waits for workloads to reach a "ready" state. However, suspended cron workloads (with replicas set to 0) will never become ready, causing cpln apply --ready to block indefinitely or until timeout.

The current code at line 412 adds the --ready flag to every cpln apply invocation unless DISABLE_APPLY_READY is set. This creates a significant risk: users applying configurations with suspended cron workloads will experience hangs.

Required solutions:

Detect suspended workloads and conditionally omit --ready, or

Document this behavior prominently and require explicit DISABLE_APPLY_READY for workflows with suspended resources, or

Implement a timeout mechanism with clear user messaging.
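
If the concern above holds, the first option could look roughly like the sketch below. It is not the code in lib/core/controlplane.rb: `build_apply_command`, its `suspended:` parameter, and the `env:` parameter are hypothetical names used only for illustration, and the sketch assumes the caller already knows whether the workload is suspended.

```ruby
# Hypothetical helper, not the repository's apply_template.
# Builds a `cpln apply` command and appends --ready only when waiting is safe.
def build_apply_command(file_path, wait: false, suspended: false, env: ENV)
  cmd = "cpln apply --file #{file_path}"

  return cmd unless wait                     # caller did not ask to wait
  return cmd if env["DISABLE_APPLY_READY"]   # toggle set (e.g. in tests)
  return cmd if suspended                    # assumption: suspended workloads may never report ready

  "#{cmd} --ready"
end

puts build_apply_command("/tmp/template.yml", wait: true, env: {})
puts build_apply_command("/tmp/template.yml", wait: true, suspended: true, env: {})
```

Passing `env:` explicitly keeps the decision testable without touching the real process environment.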

claude · 2026-01-08T14:29:44Z

Pull Request Review: Fix runner workload readiness logic

Summary

This PR addresses a critical bug where the run command could hang indefinitely when updating suspended cron workloads. The solution simplifies the logic by using cpln apply --ready to handle readiness verification, removing the need for manual version tracking.

Code Quality ✅

Strengths:

Simplification: The change removes 20 lines of complex version-tracking logic and replaces it with a cleaner, more reliable approach
Consistency: Uses the same wait: true parameter pattern for both create_runner_workload and update_runner_workload (lib/command/run.rb:191, 242)
Proper encapsulation: The wait parameter is cleanly threaded through apply_hash → apply_template in lib/core/controlplane.rb:433-434
Documentation: CHANGELOG.md properly documents the fix with attribution

Minor observations:

The removed methods wait_for_runner_workload_deploy, wait_for_runner_workload_update, and instance variable @expected_deployed_version are cleanly eliminated without leaving dead code
Code follows the repository's RuboCop style conventions (double quotes, etc.)

Potential Issues 🔍

1. Test Coverage Gap

The PR adds ENV["DISABLE_APPLY_READY"] = "true" to spec/spec_helper.rb:7, which means the new --ready flag behavior is not tested. The existing tests in spec/command/run_spec.rb will keep running cpln apply without the flag.

Recommendation: Consider adding at least one test that exercises the --ready flag behavior by:

Creating a test with DISABLE_APPLY_READY unset or set to nil
Verifying that the workload is ready after apply completes
This would increase confidence that the fix actually resolves the hanging issue
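
One way to unset the toggle for a single test is a small helper like the sketch below. `with_env` is a hypothetical name and not part of the repository; the sketch only shows the environment handling, not the assertion on the workload itself.

```ruby
# Hypothetical spec helper: temporarily override an environment variable
# and restore the previous value afterwards, even if the block raises.
def with_env(name, value)
  previous = ENV[name]
  ENV[name] = value # assigning nil removes the variable
  yield
ensure
  ENV[name] = previous
end

ENV["DISABLE_APPLY_READY"] = "true" # as spec/spec_helper.rb does

with_env("DISABLE_APPLY_READY", nil) do
  # Inside the block the toggle is unset, so code under test would append --ready.
  puts ENV.key?("DISABLE_APPLY_READY") # false
end

puts ENV["DISABLE_APPLY_READY"] # true (restored)
```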

2. Environment Variable Pattern

The use of ENV.fetch("DISABLE_APPLY_READY", nil).nil? (lib/core/controlplane.rb:412) is correct but could be more idiomatic:

# Current
cmd += " --ready" if wait && ENV.fetch("DISABLE_APPLY_READY", nil).nil?

# Alternative (more explicit)
cmd += " --ready" if wait && ENV["DISABLE_APPLY_READY"] != "true"

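Both work with the current test setup, which sets the variable to "true", and the second is slightly clearer about intent. They are not equivalent in general: the first skips --ready for any set value, including "false", while the second skips it only when the value is exactly "true". Not a blocker.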
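
A quick sketch comparing the two checks on a few inputs. The method names are made up for this comparison, and plain hashes stand in for ENV:

```ruby
# Each method returns true when --ready should be appended.
def ready_by_presence?(env)
  env.fetch("DISABLE_APPLY_READY", nil).nil?
end

def ready_by_value?(env)
  env["DISABLE_APPLY_READY"] != "true"
end

[
  {},
  { "DISABLE_APPLY_READY" => "true" },
  { "DISABLE_APPLY_READY" => "false" }
].each do |env|
  puts "#{env.inspect}: presence=#{ready_by_presence?(env)} value=#{ready_by_value?(env)}"
end
```

The checks agree when the variable is unset or set to "true". They disagree for any other value, such as "false".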

3. Backward Compatibility

The change fundamentally alters when cpln apply returns:

Before: Returned immediately after submitting the update
After: Waits for the resource to be ready (when wait: true)

This is the intended fix, but consider:

Impact on CI/CD pipelines: Timeouts may need adjustment if workloads take a long time to become ready
Error handling: What happens if a workload never becomes ready? Does cpln apply --ready have its own timeout?

Recommendation: Verify that cpln apply --ready has reasonable timeout behavior and document any known limitations.
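
If cpln apply --ready turns out to have no timeout of its own, a caller-side guard is one option. The sketch below is a generic Ruby pattern, not the repository's shell helper: `run_with_timeout` is a hypothetical name, and `sleep` stands in for the real command.

```ruby
require "timeout"

# Hypothetical wrapper: run a command and give up after `seconds`.
# Returns true only if the command exits successfully in time.
def run_with_timeout(command, seconds:)
  pid = Process.spawn(command)
  _, status = Timeout.timeout(seconds) { Process.wait2(pid) }
  status.success?
rescue Timeout::Error
  # Ask the child to stop, then reap it so no zombie process is left behind.
  Process.kill("TERM", pid)
  Process.wait(pid)
  false
end

puts run_with_timeout("sleep 5", seconds: 0.2) # false: timed out
puts run_with_timeout("true", seconds: 5)      # true: exited 0 in time
```

Note that a command containing shell metacharacters is run through a shell, so the TERM signal would reach the shell rather than the command it launched. A real implementation would need to handle that case.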

Performance Considerations ⚡

Positive impact:

Eliminates polling: The old code used step(..., retry_on_failure: true) which likely polled repeatedly. The --ready flag should be more efficient as it's built into cpln
Reduces API calls: No more separate calls to check cron_workload_deployed_version

Potential concern:

The --ready flag blocks synchronously. For workloads that take a long time to deploy, this could appear to hang to users (even though it's actually working correctly). Consider adding progress indication if cpln apply --ready doesn't provide it.

Security Concerns 🔒

No security issues identified. The change:

Doesn't introduce new input validation requirements
Doesn't expose sensitive data
Doesn't change authentication/authorization logic
The DISABLE_APPLY_READY env var is appropriately scoped to tests only

Test Coverage 📊

Current state:

✅ Existing tests pass (tests explicitly disable the new behavior)
⚠️ New --ready flag path is not tested
✅ Error cases (workload not found) remain covered
✅ Various run scenarios (interactive, detached, different images) remain covered

Missing coverage:

Test that verifies workload readiness after creation/update
Test for behavior when DISABLE_APPLY_READY is not set
Test for timeout/failure scenarios with --ready flag

Architecture Considerations 🏗️

Good decisions:

Using the platform's built-in --ready flag is more reliable than reimplementing version tracking
The cron_workload_deployed_version method (lib/core/controlplane.rb:301-309) is preserved for other potential uses, avoiding premature deletion

Question:

Is cron_workload_deployed_version still used elsewhere? If not, consider removing it in a follow-up PR to reduce maintenance burden. A quick grep would clarify:
```
grep -r "cron_workload_deployed_version" --include="*.rb" .
```

Recommendations Summary

High Priority:

Add test coverage for the --ready flag behavior to ensure the fix works as intended

Medium Priority:

Document timeout behavior of cpln apply --ready if not already covered
Verify that cron_workload_deployed_version is still needed elsewhere

Low Priority (Nice to have):

Consider more idiomatic env var check pattern
Consider adding progress indication for long-running deployments

Conclusion ✨

This is a well-executed fix that simplifies the code while solving a critical bug. The main concern is the lack of test coverage for the new behavior due to DISABLE_APPLY_READY being set in tests. The implementation is clean, follows conventions, and should significantly improve reliability of the run command.

Overall Assessment: Approve with recommendation to add test coverage for the new --ready flag behavior.

Great work on identifying and fixing this issue! 🎉

dzirtusss force-pushed the sergey/fix-runner-logic branch from 4dd0605 to 3dfa17b Compare January 8, 2026 13:27

coderabbitai bot added the bug Something isn't working label Jan 8, 2026

coderabbitai bot requested changes Jan 8, 2026

dzirtusss force-pushed the sergey/fix-runner-logic branch from 3dfa17b to 682689c Compare January 8, 2026 14:09

coderabbitai bot requested changes Jan 8, 2026

Fix runner readiness logic check

5673b97

dzirtusss force-pushed the sergey/fix-runner-logic branch from 682689c to 5673b97 Compare January 8, 2026 14:28

1. Broader Impact on All `apply` Operations