Conversation

@MasterPtato
Contributor

No description provided.

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

MasterPtato force-pushed the 01-07-fix_add_tokio_worker_threads_subtract_break_earlier_when_allocating_pending_actors branch from dce9507 to d1b9f5c on January 7, 2026 at 20:29
MasterPtato force-pushed the 01-06-chore_remove_method_from_key_for_consistently branch from 39a5303 to 33639ab on January 7, 2026 at 20:29
@claude

claude bot commented Jan 7, 2026

PR Review: Add TOKIO_WORKER_THREADS_SUBTRACT and Fix Loop Breaking

Summary

This PR makes two improvements:

  1. Adds TOKIO_WORKER_THREADS_SUBTRACT env var to dynamically set Tokio worker threads based on available parallelism
  2. Fixes loop breaking logic in allocate_pending_actors to properly exit the outer loop

Code Quality ✅

engine/packages/engine/src/main.rs

  • Good adherence to CLAUDE.md guidelines by avoiding glob imports from anyhow
  • Changed from use anyhow::* to use anyhow::Result

engine/packages/pegboard/src/workflows/runner2.rs

  • Correctly uses labeled break statements (break 'queue_loop) to exit the outer loop
  • Previously, break statements on lines 715, 725, and 733 would only break the inner loop, not the outer 'queue_loop, which was likely a bug
  • This fix ensures that when runners run out or version mismatches occur, the entire queue processing stops as intended ✓
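For readers less familiar with Rust loop labels, a minimal generic sketch (not the workflow's actual code) shows why the label matters:

```rust
// Generic sketch of labeled vs. plain break; names are illustrative only.
fn visit_with_labeled_break() -> Vec<(i32, i32)> {
    let mut visited = Vec::new();
    'outer: for q in 0..3 {
        for r in 0..3 {
            visited.push((q, r));
            if r == 1 {
                // A plain `break` here would only exit the inner loop and the
                // outer loop would continue with the next `q`.
                break 'outer; // exits both loops at once
            }
        }
    }
    visited
}

fn main() {
    // With the labeled break, iteration stops right after (0, 1).
    assert_eq!(visit_with_labeled_break(), vec![(0, 0), (0, 1)]);
}
```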

engine/packages/runtime/src/lib.rs

  • New TOKIO_WORKER_THREADS_SUBTRACT feature is well-implemented with proper safeguards
  • Uses saturating_sub() to prevent underflow
  • Ensures minimum of 1 thread with .max(1)
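The described clamping can be sketched as a standalone expression (a hypothetical helper; the actual code inlines this when building the runtime):

```rust
// Hypothetical helper mirroring the described clamp:
// available parallelism minus the configured subtract, floored at 1.
fn compute_worker_threads(available: usize, subtract: usize) -> usize {
    available.saturating_sub(subtract).max(1)
}

fn main() {
    assert_eq!(compute_worker_threads(8, 2), 6);
    // saturating_sub floors at 0 instead of underflowing; max(1) keeps the
    // result valid for Tokio, which rejects 0 worker threads.
    assert_eq!(compute_worker_threads(2, 10), 1);
}
```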

Potential Issues & Suggestions

1. Error Handling - Missing Parse Error Handling ⚠️

engine/packages/runtime/src/lib.rs:132

The code uses .parse().unwrap() which will panic if the env var contains invalid input:

.saturating_sub(subtract_worker_threads.parse().unwrap())

Recommendation: Add error handling consistent with other env var parsing in the file, or at minimum add a descriptive expect message:

.saturating_sub(
    subtract_worker_threads
        .parse()
        .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid integer")
)

Better yet, handle the error gracefully with a warning and fallback to default behavior.
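A sketch of that graceful fallback (illustrative names; `eprintln!` stands in for the crate's tracing setup):

```rust
// Sketch: fall back to 0 (no subtraction) on bad input instead of panicking.
fn parse_subtract(raw: &str) -> usize {
    raw.parse().unwrap_or_else(|_| {
        eprintln!(
            "warning: TOKIO_WORKER_THREADS_SUBTRACT={:?} is not a valid integer; ignoring",
            raw
        );
        0
    })
}

fn main() {
    assert_eq!(parse_subtract("3"), 3);
    // Invalid input produces a warning and the no-op default.
    assert_eq!(parse_subtract("not-a-number"), 0);
}
```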

2. Thread Count Edge Case 🤔

engine/packages/runtime/src/lib.rs:128-135

When TOKIO_WORKER_THREADS_SUBTRACT is set to a value >= available parallelism, the result will be 1 thread. This might be intentional but could cause performance issues if misconfigured.

Recommendation: Consider adding a warning log if the resulting thread count is significantly reduced:

let available = std::thread::available_parallelism()
    .map_or(1, std::num::NonZeroUsize::get);
let subtract = subtract_worker_threads.parse().unwrap();
let threads = available.saturating_sub(subtract).max(1);

if threads == 1 && available > 2 {
    tracing::warn!(
        available,
        subtract,
        "TOKIO_WORKER_THREADS_SUBTRACT resulted in only 1 worker thread"
    );
}

rt_builder.worker_threads(threads);

3. Missing Documentation 📝

The new TOKIO_WORKER_THREADS_SUBTRACT env var isn't documented. Consider:

  • Adding a comment explaining the use case (e.g., reserving CPU cores for non-Tokio workloads)
  • Documenting valid values and behavior

Performance Considerations ✅

The loop fix in allocate_pending_actors should improve performance by:

  • Avoiding unnecessary iteration when runners are exhausted
  • Properly respecting the early timeout check by breaking completely out of nested loops

The thread count calculation is efficient and should have negligible overhead.

Security Concerns ✅

No security issues identified. The changes are internal configuration and control flow improvements.

Test Coverage 🧪

Missing: No tests for the new TOKIO_WORKER_THREADS_SUBTRACT functionality.

Recommendation: Consider adding unit tests for:

  • Valid subtract values
  • Edge cases (subtract >= available parallelism)
  • Invalid input handling (once error handling is added)
  • Zero/negative values

For the loop fix, integration tests covering the allocate_pending_actors workflow would be valuable if they don't already exist.

Additional Notes

The labeled break fix suggests this was a subtle bug where the inner loop would break but processing would continue with the next queue entry. This could have caused:

  • Wasted work attempting allocations when no runners are available
  • Incorrect handling of version mismatches
  • Potential for exceeding the early timeout

Good catch on this issue! 👍


Overall Assessment: The changes are sound and fix real issues. Main concern is the missing error handling for the env var parsing. With that addressed, this PR is good to merge.

@claude

claude bot commented Jan 7, 2026

PR Review

This PR introduces two improvements: a new environment variable for controlling Tokio worker thread counts and a fix for loop control flow in the actor allocation logic.

✅ Positive Points

  1. Good import hygiene (engine/packages/engine/src/main.rs:1): The change from glob import to specific import follows the CLAUDE.md guideline: "Do not glob import (::*) from anyhow. Instead, import individual types and traits"

  2. Clear intent (engine/packages/runtime/src/lib.rs:127-135): The new TOKIO_WORKER_THREADS_SUBTRACT feature provides a useful way to reserve CPU cores for non-Tokio work, which is helpful in multi-runtime environments

  3. Correct loop control (engine/packages/pegboard/src/workflows/runner2.rs:715,725,733): The fix from break to break 'queue_loop ensures the outer loop terminates correctly when conditions are met

⚠️ Issues to Address

1. Missing error handling (engine/packages/runtime/src/lib.rs:132)

Severity: High

The .unwrap() call on the parsed environment variable will panic if the value is not a valid integer:

.saturating_sub(subtract_worker_threads.parse().unwrap())

Recommendation: Use proper error handling with either expect() with a clear message, or unwrap_or_else() with a warning and default value.

2. Potential logical issue (engine/packages/runtime/src/lib.rs:133)

Severity: Medium

The .max(1) ensures we never set worker threads to 0 (which would be invalid for Tokio), but the logic could be clearer with intermediate variables and debug logging.
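One possible shape for that (a sketch with hypothetical names; `eprintln!` stands in for the crate's debug logging):

```rust
// Sketch: intermediate variables make each step of the calculation visible.
fn worker_threads_with_logging(available: usize, subtract: usize) -> usize {
    let after_subtract = available.saturating_sub(subtract);
    let threads = after_subtract.max(1); // Tokio rejects 0 worker threads
    eprintln!("worker threads: available={available} subtract={subtract} threads={threads}");
    threads
}

fn main() {
    assert_eq!(worker_threads_with_logging(8, 3), 5);
    assert_eq!(worker_threads_with_logging(2, 9), 1);
}
```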

3. Missing documentation

Severity: Low

The new environment variable should be documented somewhere to help users understand what TOKIO_WORKER_THREADS_SUBTRACT does and when to use it.

📋 Additional Observations

  1. Loop labels are correct: The use of labeled breaks in allocate_pending_actors properly exits the outer loop
  2. No test coverage visible: Consider adding tests for edge cases with the new env var

📊 Summary

Code Quality: Good overall structure with one critical issue
Bugs: Error handling missing for env var parsing
Performance: No concerns
Security: No concerns
Test Coverage: Not visible in this diff

🎯 Recommendation

Request changes - Please address the error handling issue before merging. The other suggestions are optional improvements.

@claude

claude bot commented Jan 7, 2026

PR Review: Fix add TOKIO_WORKER_THREADS_SUBTRACT, break earlier when allocating pending actors

Summary

This PR makes two distinct improvements:

  1. Adds a new TOKIO_WORKER_THREADS_SUBTRACT environment variable to configure Tokio worker threads relative to available CPU cores
  2. Fixes loop breaking behavior in allocate_pending_actors to properly exit the queue loop

Code Quality & Best Practices

✅ Positive Observations

  1. Proper anyhow import fix (engine/packages/engine/src/main.rs:1):

    • Correctly follows CLAUDE.md guidance: "Do not glob import (::*) from anyhow. Instead, import individual types and traits"
    • Changes use anyhow::*; to use anyhow::Result;
  2. Loop label fix (engine/packages/pegboard/src/workflows/runner2.rs:714-733):

    • The changes from break; to break 'queue_loop; are correct and important
    • The inner loop previously broke only itself, not the outer 'queue_loop, which could cause incorrect behavior
    • Now properly exits both loops when:
      • Stream is exhausted
      • Version is lower than highest version
      • Runner has no remaining millislots

⚠️ Issues & Concerns

1. Error Handling - Critical Issue (engine/packages/runtime/src/lib.rs:128-135)

Problem: The .parse().unwrap() call on line 132 will panic if TOKIO_WORKER_THREADS_SUBTRACT contains an invalid value.

.saturating_sub(subtract_worker_threads.parse().unwrap())

Impact: Production crashes if the environment variable is misconfigured.

Recommendation:

if let Ok(subtract_worker_threads) = env::var("TOKIO_WORKER_THREADS_SUBTRACT") {
    let subtract: usize = subtract_worker_threads
        .parse()
        .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid number");
    rt_builder.worker_threads(
        std::thread::available_parallelism()
            .map_or(1, std::num::NonZeroUsize::get)
            .saturating_sub(subtract)
            .max(1),
    );
}

Using .expect() with a descriptive message is better than .unwrap() as it provides context when the panic occurs.

2. Potential Edge Case - Subtract Value Too Large

Scenario: If TOKIO_WORKER_THREADS_SUBTRACT is set to a value >= available cores, the calculation becomes:

available_parallelism().saturating_sub(subtract_value) = 0
0.max(1) = 1

Question: Is 1 thread the desired fallback behavior, or should there be validation/logging when the subtract value is too large?

Recommendation: Consider adding a warning:

let available = std::thread::available_parallelism()
    .map_or(1, std::num::NonZeroUsize::get);
let subtract: usize = subtract_worker_threads
    .parse()
    .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid number");

if subtract >= available {
    tracing::warn!(
        available,
        subtract,
        "TOKIO_WORKER_THREADS_SUBTRACT is >= available cores, using 1 thread"
    );
}

rt_builder.worker_threads(available.saturating_sub(subtract).max(1));

3. Missing TOKIO_WORKER_THREADS Handling

Observation: The diff shows TOKIO_WORKER_THREADS_SUBTRACT being added but the original TOKIO_WORKER_THREADS logic appears to be removed (based on the diff context).

Question: Was this intentional? Should both be supported, with TOKIO_WORKER_THREADS_SUBTRACT taking precedence if both are set?

Current behavior: If both env vars are set, only TOKIO_WORKER_THREADS_SUBTRACT will be used.

Recommendation: Either:

  • Document that TOKIO_WORKER_THREADS_SUBTRACT replaces TOKIO_WORKER_THREADS, OR
  • Support both with a precedence order and mutual exclusivity check
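A sketch of one possible precedence scheme (hypothetical; the PR itself only reads the new variable):

```rust
use std::env;

// Hypothetical: prefer an explicit TOKIO_WORKER_THREADS if set, otherwise
// derive the count from available parallelism minus the subtract value.
fn resolve_worker_threads(available: usize) -> Option<usize> {
    if let Ok(explicit) = env::var("TOKIO_WORKER_THREADS") {
        return explicit.parse().ok();
    }
    if let Ok(subtract) = env::var("TOKIO_WORKER_THREADS_SUBTRACT") {
        let subtract: usize = subtract.parse().ok()?;
        return Some(available.saturating_sub(subtract).max(1));
    }
    None // neither set: let Tokio pick its default
}

fn main() {
    // Result depends on the process environment, so just exercise the call.
    let _ = resolve_worker_threads(8);
}
```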

Performance Considerations

✅ Good Performance Decisions

  1. Configurable Thread Count: The new env var allows tuning worker threads based on workload, which is useful for:

    • Reserving cores for other processes
    • Testing with reduced parallelism
    • Working around Tokio scheduling issues
  2. Correct Loop Breaking: The break 'queue_loop fixes prevent unnecessary iterations through the outer queue loop when the inner runner allocation loop determines no more work can be done.

Security Concerns

⚠️ Low Priority

  1. DoS via Environment Variables: Setting TOKIO_WORKER_THREADS_SUBTRACT to a very large value could throttle the runtime down to a single worker thread, but this risk is comparable to the existing TOKIO_WORKER_THREADS and requires environment access.

Test Coverage

❌ Missing Test Coverage

No tests were added for:

  1. TOKIO_WORKER_THREADS_SUBTRACT parsing and calculation logic
  2. Edge cases (subtract value too large, invalid values, etc.)
  3. The loop breaking behavior fix in allocate_pending_actors

Recommendations:

  1. Add unit tests for the worker thread calculation with various inputs:

    • Normal case: subtract < available_parallelism
    • Edge case: subtract >= available_parallelism
    • Invalid: non-numeric strings (should panic with clear message)
  2. Consider integration tests for allocate_pending_actors to verify:

    • Proper early exit when version is too low
    • Proper exit when runners have no millislots
    • Timeout behavior
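A minimal unit-test sketch for the thread-count calculation (assumes the logic is factored into a testable helper, which the current code does not do):

```rust
// Hypothetical helper; the PR currently inlines this expression, so factoring
// it out like this would be a prerequisite for unit testing it.
pub fn worker_threads(available: usize, subtract: usize) -> usize {
    available.saturating_sub(subtract).max(1)
}

#[cfg(test)]
mod tests {
    use super::worker_threads;

    #[test]
    fn normal_case() {
        assert_eq!(worker_threads(8, 2), 6);
    }

    #[test]
    fn subtract_exceeds_available() {
        // saturating_sub floors at 0; max(1) lifts the result back to 1.
        assert_eq!(worker_threads(4, 16), 1);
    }

    #[test]
    fn zero_subtract_is_a_no_op() {
        assert_eq!(worker_threads(8, 0), 8);
    }
}
```

Run with `cargo test`; invalid-input cases would belong in whichever parsing layer replaces the current `.unwrap()`.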

Additional Observations

Documentation

Missing: No comments or documentation for the new TOKIO_WORKER_THREADS_SUBTRACT functionality. Consider adding:

  • A comment explaining the calculation: // Sets worker threads to (available_cores - N), with minimum of 1
  • Environment variable documentation in a README or config guide

Code Style

✅ Follows repository conventions:

  • Hard tabs (as per rustfmt.toml)
  • Structured logging would apply if there were log statements (there aren't any added)
  • Imports follow the pattern of not glob-importing

Summary of Recommendations

Must Fix (Blocking):

  1. Replace .unwrap() with .expect("descriptive message") on line 132

Should Consider:

  1. Add warning when subtract value >= available cores
  2. Clarify relationship with removed TOKIO_WORKER_THREADS env var
  3. Add unit tests for the new worker thread calculation
  4. Add documentation/comments for the new env var

Nice to Have:

  1. Integration tests for the loop breaking behavior
  2. Validation that prevents setting both TOKIO_WORKER_THREADS and TOKIO_WORKER_THREADS_SUBTRACT

Overall Assessment

The PR addresses two legitimate issues:

  • Proper anyhow import hygiene (follows CLAUDE.md)
  • Critical bug fix in loop breaking logic

However, the new TOKIO_WORKER_THREADS_SUBTRACT feature needs better error handling before merging to production. The .unwrap() is a potential crash point that should be addressed.

Rating: ⚠️ Needs minor fixes before merge (primarily the error handling)

@graphite-app
Contributor

graphite-app bot commented Jan 7, 2026

Merge activity

  • Jan 7, 8:57 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Jan 7, 8:57 PM UTC: CI is running for this pull request on a draft pull request (#3779) due to your merge queue CI optimization settings.
  • Jan 7, 8:58 PM UTC: Merged by the Graphite merge queue via draft PR: #3779.

graphite-app bot pushed a commit that referenced this pull request Jan 7, 2026
@graphite-app graphite-app bot closed this Jan 7, 2026
@graphite-app graphite-app bot deleted the 01-07-fix_add_tokio_worker_threads_subtract_break_earlier_when_allocating_pending_actors branch January 7, 2026 20:58