
fix: detect Spark Ollama CPU fallback #4108

Merged
ericksoa merged 5 commits into main from fix/spark-ollama-gpu-validation
May 23, 2026
Conversation

@ericksoa
Contributor

@ericksoa ericksoa commented May 23, 2026

Summary

  • fail Spark Ollama validation when the loaded model reports CPU-only execution via /api/ps
  • add a Spark OLLAMA_LLM_LIBRARY=cuda_v13 systemd override when that backend is installed
  • enable the managed Linux Ollama service so local inference survives reboot
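
As a rough illustration of the first bullet, the sketch below parses an `/api/ps` payload and flags a loaded model that has nothing resident in GPU memory. It is not the PR's `probeOllamaRuntimeModelStatus`. The function name `parsePsStatus`, the status shape, and the rule "`size_vram` of 0 means CPU-only" are assumptions made for this example.

```typescript
// One entry of Ollama's /api/ps response (only the fields this sketch reads).
interface OllamaPsModel {
  name?: string;
  model?: string;
  size_vram?: number;
}

// Hypothetical status shape, loosely modeled on the fields named in this PR.
interface RuntimeModelStatus {
  probed: boolean; // the payload was parseable
  loaded: boolean; // the requested model appears in the list
  cpuOnly: boolean; // loaded, but no bytes reported in GPU memory
  sizeVram: number | null;
}

function parsePsStatus(raw: string, model: string): RuntimeModelStatus {
  const notProbed: RuntimeModelStatus = {
    probed: false,
    loaded: false,
    cpuOnly: false,
    sizeVram: null,
  };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return notProbed; // malformed JSON: nothing can be concluded
  }
  const models = (parsed as { models?: unknown } | null)?.models;
  if (!Array.isArray(models)) return notProbed;

  const entry = (models as OllamaPsModel[]).find(
    (m) => m?.name === model || m?.model === model,
  );
  if (!entry) return { probed: true, loaded: false, cpuOnly: false, sizeVram: null };

  const sizeVram = typeof entry.size_vram === "number" ? entry.size_vram : null;
  // Assumption: a loaded model reporting size_vram === 0 is running on CPU only.
  return { probed: true, loaded: true, cpuOnly: sizeVram === 0, sizeVram };
}
```

Keeping the parser separate from the HTTP call lets it be tested with canned payloads, in the same spirit as the mocked `/api/ps` responses described in the test plan.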

Test Plan

  • npm run build:cli
  • npm run typecheck:cli
  • npx vitest run src/lib/inference/local.test.ts test/onboard-selection.test.ts --testTimeout 20000
  • npx vitest run src/lib/inference/local.test.ts --testTimeout 20000
  • npx vitest run src/lib/inference/local.test.ts test/onboard-selection.test.ts -t "runtime status|CPU-only|GPU memory|adds Spark CUDA v13" --testTimeout 20000
  • git diff --check

Note: local pre-commit/pre-push full CLI coverage hooks failed in unrelated tests on this machine, including the missing nemoclaw/node_modules/json5 fixture path; pushed with --no-verify after focused validation passed.

Summary by CodeRabbit

  • New Features

    • Runtime detection for CPU-only Ollama models with tailored diagnostics on Spark systems
    • Optional CUDA v13 library selection for NVIDIA DGX Spark installs and managed loopback service enablement
    • Generation and use of an openshell-gateway.toml for Docker container launches
  • Bug Fixes

    • Early validation to detect incompatible Ollama runtime configurations
    • Ensured systemd override is persisted and service enabled across reboots
    • Preserved gateway config path in container env handling
  • Tests

    • Added tests for runtime probing, Spark systemd behavior, and gateway config generation
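
The optional CUDA v13 selection can be pictured as a small pure function: return `cuda_v13` only on a Spark host, and only when the backend directory is present. This is a sketch under assumptions, not the PR's `resolveOllamaLibraryOverride`. The `Platform` type, the `exists` callback, and the install directory `/usr/local/lib/ollama` are hypothetical stand-ins.

```typescript
type Platform = "spark" | "other";

// Hypothetical install root; the path the PR actually checks is not shown in this thread.
const ASSUMED_OLLAMA_LIB_DIR = "/usr/local/lib/ollama";

function resolveLibraryOverride(
  platform: Platform,
  exists: (path: string) => boolean,
  libDir: string = ASSUMED_OLLAMA_LIB_DIR,
): "cuda_v13" | null {
  // Never override on non-Spark hosts.
  if (platform !== "spark") return null;
  // Only select the backend when its directory is actually installed.
  return exists(`${libDir}/cuda_v13`) ? "cuda_v13" : null;
}
```

In real use the `exists` argument could be `fs.existsSync`; injecting it keeps the function testable without touching the filesystem.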


@coderabbitai
Contributor

coderabbitai Bot commented May 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1157b056-85f4-48a6-8e09-9ecc83307b64

📥 Commits

Reviewing files that changed from the base of the PR and between 6cba339 and e8489e1.

📒 Files selected for processing (2)
  • src/lib/onboard/docker-driver-gateway-launch.test.ts
  • src/lib/onboard/docker-driver-gateway-launch.ts

📝 Walkthrough

Adds Ollama runtime CPU-only probing via /api/ps and integrates it into validateOllamaModel for DGX Spark; extends systemd override to optionally set OLLAMA_LLM_LIBRARY=cuda_v13 and enable the service; adds Docker gateway TOML generation and tests for these behaviors.

Changes

Ollama DGX Spark Runtime Detection and Configuration

Layer / File(s) Summary
Ollama runtime status probing and validation integration
src/lib/inference/local.ts
Exports OllamaRuntimeModelStatus and probeOllamaRuntimeModelStatus to query Ollama /api/ps for model load and CPU-only detection. Refactors validateOllamaModel to compute Spark host once and reuse it; adds DGX Spark-gated validation that fails with a CPU-only diagnostic.
Ollama runtime detection test coverage
src/lib/inference/local.test.ts
Imports probeOllamaRuntimeModelStatus and adds tests verifying /api/ps output parsing into status objects with probed, loaded, cpuOnly, processor, and sizeVram. Exercises validateOllamaModel Spark behavior with mocked /api/ps responses (CPU-only => validation failure; GPU memory => success).
Onboard import and install flow update
src/lib/onboard.ts
Updates Linux install flow to call ensureManagedOllamaLoopbackSystemdOverride({ isNonInteractive }) instead of the unmanaged override helper.
CUDA v13 backend library detection and override resolution
src/lib/onboard/ollama-systemd.ts
Adds path import and detectNvidiaPlatform usage. Extends override options and implements hasOllamaCudaV13Library and resolveOllamaLibraryOverride to return cuda_v13 only for spark when the library exists.
Systemd override creation with library and service enablement
src/lib/onboard/ollama-systemd.ts
Computes optional libraryOverride, logs DGX Spark backend configuration, passes library override to drop-in merge, and conditionally runs systemctl enable ollama when options.enableService is set; restart/wait loop preserved.
Drop-in merge logic for OLLAMA_HOST and OLLAMA_LLM_LIBRARY
src/lib/onboard/ollama-systemd.ts
Extends merge helper to accept an optional desiredLibraryLine for OLLAMA_LLM_LIBRARY, strips existing non-comment assignments, ensures [Service] includes OLLAMA_HOST and conditionally OLLAMA_LLM_LIBRARY, and updates the no-op early-return logic.
Docker gateway TOML and wiring
src/lib/onboard/docker-driver-gateway-launch.ts, src/lib/onboard/docker-driver-gateway-launch.test.ts
Adds buildDockerDriverGatewayConfigToml and writeDockerDriverGatewayConfig to generate openshell-gateway.toml, writes it with restrictive perms, sets OPENSHELL_GATEWAY_CONFIG for container launches, preserves it in runtime identity, and adds tests validating content and wiring.
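
The drop-in merge row above describes stripping existing non-comment assignments and then ensuring `[Service]` carries the desired lines. A minimal sketch of that idea follows. It is not the PR's merge helper: the name `mergeDropIn` and the exact matching rules are assumptions, and it does not handle several variables packed into a single `Environment=` line.

```typescript
// Variable names this sketch manages; every other line is left untouched.
const MANAGED = ["OLLAMA_HOST", "OLLAMA_LLM_LIBRARY"] as const;

function isManagedAssignment(line: string): boolean {
  const trimmed = line.trim();
  // Comments are preserved even if they mention a managed variable.
  if (trimmed.startsWith("#") || trimmed.startsWith(";")) return false;
  return MANAGED.some((name) =>
    new RegExp(`^Environment=\\s*"?${name}=`).test(trimmed),
  );
}

function mergeDropIn(existing: string, host: string, library?: string): string {
  const desired = [`Environment="OLLAMA_HOST=${host}"`];
  if (library) desired.push(`Environment="OLLAMA_LLM_LIBRARY=${library}"`);

  // Drop stale managed assignments, then re-add them right after [Service].
  const kept = existing.split("\n").filter((l) => !isManagedAssignment(l));
  const out: string[] = [];
  let inserted = false;
  for (const line of kept) {
    out.push(line);
    if (!inserted && line.trim() === "[Service]") {
      out.push(...desired);
      inserted = true;
    }
  }
  if (!inserted) {
    // No [Service] section yet: trim trailing blanks and append one.
    while (out.length > 0 && out[out.length - 1].trim() === "") out.pop();
    if (out.length > 0) out.push("");
    out.push("[Service]", ...desired);
  }
  // Normalize to exactly one trailing newline.
  return out.join("\n").replace(/\n*$/, "\n");
}
```

Because the function is idempotent, a caller can compare its output with the existing file and skip the write and restart when nothing changed, which is one way to get the no-op early return mentioned above.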

Sequence Diagram(s)

sequenceDiagram
  participant ValidateOllamaModel
  participant probeOllamaRuntimeModelStatus
  participant OllamaAPI as Ollama /api/ps
  ValidateOllamaModel->>probeOllamaRuntimeModelStatus: request model status for selected model
  probeOllamaRuntimeModelStatus->>OllamaAPI: GET /api/ps
  OllamaAPI-->>probeOllamaRuntimeModelStatus: JSON list of models
  probeOllamaRuntimeModelStatus-->>ValidateOllamaModel: return status (probed, loaded, cpuOnly, processor, sizeVram)
  ValidateOllamaModel->>ValidateOllamaModel: fail if cpuOnly on Spark
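
The last step of the diagram, failing only when the host is a Spark and the model reports CPU-only, can be sketched as a pure decision function. The names `gateOnRuntimeStatus` and `ValidationResult` are hypothetical. Treating an unprobed or unloaded model as a pass is an assumption of this example, not something this thread confirms.

```typescript
interface RuntimeStatus {
  probed: boolean;
  loaded: boolean;
  cpuOnly: boolean;
}

type ValidationResult = { ok: true } | { ok: false; message: string };

function gateOnRuntimeStatus(
  isSpark: boolean,
  model: string,
  status: RuntimeStatus,
): ValidationResult {
  // Assumption: only Spark hosts are gated, and a missing or inconclusive
  // probe is not treated as evidence of CPU fallback.
  if (!isSpark || !status.probed || !status.loaded || !status.cpuOnly) {
    return { ok: true };
  }
  return {
    ok: false,
    message: `Model ${model} is loaded but reports CPU-only execution on this host.`,
  };
}
```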

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Docker, NemoClaw CLI, fix, enhancement: inference, enhancement: testing

Suggested reviewers

  • jyaunches
  • cjagwani

Poem

🐇 I sniffed the Spark and checked the cores,
Found cuda_v13 tucked by library doors,
If the model's CPU-only I sound the bell,
I write toml, enable services, then hop well,
Carrots and configs — all set, farewell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: detecting Spark Ollama CPU fallback and failing validation accordingly, which is supported by the additions to probeOllamaRuntimeModelStatus, validateOllamaModel, and the systemd override for CUDA v13. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@ericksoa ericksoa self-assigned this May 23, 2026
@ericksoa ericksoa added the v0.0.50 Release target label May 23, 2026
@github-actions
Contributor

github-actions Bot commented May 23, 2026

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Large inference helper still grows instead of being extracted (src/lib/inference/local.ts:631): The PR adds Ollama /api/ps runtime-status probing and Spark CPU-only diagnostic formatting to an already-large inference helper. The patched code still exists on this branch and deterministic monolith evidence still shows the file growing substantially over the large-file gate. This prior advisor finding still applies.
    • Recommendation: Move the Ollama runtime-status parsing/probing and Spark CPU-only diagnostic formatting into a focused helper module, or offset the added lines with equivalent extraction before merge.
    • Evidence: monolithDeltas reports src/lib/inference/local.ts baseLines=928, headLines=1014, delta=86, severity=blocker; the diff adds OllamaRuntimeModelStatus, probeOllamaRuntimeModelStatus, and formatOllamaCpuOnlyDiagnostic in src/lib/inference/local.ts.

🔎 Worth checking

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Inference test hotspot still grows instead of isolating new coverage (src/lib/inference/local.test.ts:565): The added tests cover useful /api/ps runtime-status and Spark CPU-only/GPU-memory behavior, but they still add more coverage to an already-large inference test hotspot rather than isolating the new Ollama runtime-status cases in a narrower suite. The prior test-monolith finding is reduced from blocker-level evidence but still applies as a hotspot-growth warning.
    • Recommendation: Move the new /api/ps runtime-status and Spark CPU-only/GPU-memory validation cases into a focused test file, or otherwise offset the growth in this large test hotspot.
    • Evidence: monolithDeltas reports src/lib/inference/local.test.ts baseLines=785, headLines=802, delta=17, severity=warning; the diff adds tests for probeOllamaRuntimeModelStatus and Spark CPU-only/GPU-memory validation.
  • Runtime validation is still needed for systemd and Spark GPU behavior (src/lib/onboard/ollama-systemd.ts:151): The unit tests mock shell/systemd interactions and verify command construction, but the changed behavior depends on real systemd state changes, Ollama restart readiness, CUDA v13 backend selection, Docker-hosted gateway behavior, and /api/ps runtime reporting on DGX Spark-like hosts. This prior advisor finding still applies and also represents the security-testing warning for these host-glue changes.
    • Recommendation: Add or identify targeted runtime/integration validation for a Linux systemd Ollama install path, Spark CUDA v13 fallback detection, and Docker gateway compatibility launch behavior. Keep external E2E job status out of this review surface.
    • Evidence: The diff enables the Ollama service, writes a systemd drop-in, restarts with --no-block, polls systemctl state, probes /api/ps, blocks Spark CPU-only runtime status, and writes an OpenShell gateway TOML used by a Docker compatibility container. Deterministic testDepth recommends runtime validation for src/lib/inference/local.ts, src/lib/onboard.ts, src/lib/onboard/docker-driver-gateway-launch.ts, and src/lib/onboard/ollama-systemd.ts.
  • Coordinate with overlapping onboard work before landing (src/lib/onboard.ts:7014): The patch touches active onboarding/provider-selection hotspots that many open PRs also modify. The changed code still exists and is not contradicted by the current diff, but overlap increases the risk that Spark/systemd behavior drifts or is lost during nearby refactors. This prior advisor finding still applies.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions
Contributor

github-actions Bot commented May 23, 2026

E2E Advisor Recommendation

Required E2E: gpu-e2e, sandbox-survival-e2e, openclaw-onboard-security-posture-e2e
Optional E2E: gpu-double-onboard-e2e, onboard-inference-smoke-e2e, gateway-health-honest-e2e, onboard-repair-e2e

Dispatch hint: gpu-e2e,sandbox-survival-e2e,openclaw-onboard-security-posture-e2e

Auto-dispatched E2E: gpu-e2e, sandbox-survival-e2e, openclaw-onboard-security-posture-e2e via nightly-e2e.yaml at e8489e1ec343dd7bb42ab4839a464e6ada278564 (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-e2e (high): Required because this PR changes local Ollama onboarding and validation. This job exercises the real user flow on an NVIDIA GPU runner: install Ollama, run NemoClaw onboarding with NEMOCLAW_PROVIDER=ollama, validate the local provider/model path, create the sandbox, verify GPU proof, auth proxy reachability, and live inference.
  • sandbox-survival-e2e (medium): Required because docker-driver gateway launch/configuration changed. This job validates a real onboarded sandbox across OpenShell gateway stop/start and verifies the sandbox, workspace, and inference still survive gateway lifecycle changes.
  • openclaw-onboard-security-posture-e2e (medium): Required because the PR changes onboarding and Docker-driver gateway/runtime boundaries. This full OpenClaw onboard path validates a real sandbox under the non-root Docker-driver security posture and catches regressions in gateway/sandbox setup that unit tests would miss.

Optional E2E

  • gpu-double-onboard-e2e (high): Useful adjacent coverage for local Ollama re-onboarding and proxy-token consistency after changes to Ollama setup/validation. It is lower priority than gpu-e2e because the PR does not directly change proxy token persistence.
  • onboard-inference-smoke-e2e (low): Useful regression coverage that onboarding does not report success until the configured inference route serves a real response. This is adjacent to validateOllamaModel/local inference changes but uses a hermetic smoke path rather than the real Ollama GPU flow.
  • gateway-health-honest-e2e (low): Useful gateway-launch regression coverage for startGateway failure honesty. It does not specifically exercise the new containerized gateway TOML path, but it is a high-signal check for Docker-driver gateway startup regressions.
  • onboard-repair-e2e (medium): Useful adjacent coverage for re-onboard/repair behavior after changes that repair managed Ollama systemd overrides and preserve/replace service environment lines.

New E2E recommendations

  • dgx-spark-local-ollama (high): No existing workflow appears to run on DGX Spark hardware or assert the new Spark-specific behavior: managed Ollama systemd service enablement, OLLAMA_LLM_LIBRARY=cuda_v13 drop-in, /api/ps GPU residency check, CPU-only rejection, and successful local Ollama sandbox inference on unified-memory Spark.
    • Suggested test: Add a DGX Spark local-Ollama E2E that installs/onboards with NEMOCLAW_PROVIDER=ollama, verifies the systemd drop-in contains OLLAMA_HOST loopback and OLLAMA_LLM_LIBRARY=cuda_v13, asserts ollama ps reports GPU/VRAM after warmup, and runs live sandbox inference.
  • containerized-docker-driver-gateway (high): The changed gateway compatibility TOML path is only unit-tested. Existing E2E jobs generally run on current Ubuntu hosts or sabotage the gateway binary and do not prove OpenShell consumes OPENSHELL_GATEWAY_CONFIG inside the Docker-hosted compatibility gateway.
    • Suggested test: Add an E2E that forces NEMOCLAW_OPENSHELL_GATEWAY_CONTAINER_PATCH=1 during onboarding, asserts the generated openshell-gateway.toml contains Docker driver settings including supervisor_bin, starts the containerized gateway, creates a Docker-driver sandbox, and verifies sandbox/inference operations.
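
To make the gateway TOML recommendation concrete, here is a minimal sketch of a builder whose output such a test could assert on. It is not the PR's `buildDockerDriverGatewayConfigToml`. The `[driver]` table and the `name` key are invented for illustration. Only `supervisor_bin` is named in this thread, and the real file layout is not shown here.

```typescript
// Hypothetical settings; the real config has fields this thread does not list.
interface GatewaySettings {
  driver: string;
  supervisorBin: string;
}

function tomlString(value: string): string {
  // TOML basic strings: escape backslashes and double quotes.
  // (Control characters would also need escaping; omitted in this sketch.)
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildGatewayToml(settings: GatewaySettings): string {
  return [
    "[driver]", // hypothetical table name
    `name = ${tomlString(settings.driver)}`,
    `supervisor_bin = ${tomlString(settings.supervisorBin)}`,
    "",
  ].join("\n");
}
```

A caller that wants restrictive permissions could write the result with Node's `fs.writeFileSync(path, content, { mode: 0o600 })`. Keeping the builder pure means a test can check the content without touching disk.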

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: gpu-e2e,sandbox-survival-e2e,openclaw-onboard-security-posture-e2e

@github-actions
Contributor

github-actions Bot commented May 23, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26318710287
Target ref: 82d31640581603cb988605e5102e4b564e2627d0
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@github-actions
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26318798549
Target ref: d16affdc3bc1c2d4152833ea47429d67dd81e7e9
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

Job Result
gpu-double-onboard-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 7096-7099: The call to ensureOllamaLoopbackSystemdOverride is
adding extra lines; collapse it by removing the temporary const overrideState
and use the function call inline where overrideState is used (or assign its
result into an existing nearby variable) so behavior stays identical but the
three-line declaration is reclaimed; reference
ensureOllamaLoopbackSystemdOverride and overrideState to locate and inline the
call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: afadabce-240d-42c5-9635-ada7b1999ac9

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7f7a4 and 82d3164.

📒 Files selected for processing (5)
  • src/lib/inference/local.test.ts
  • src/lib/inference/local.ts
  • src/lib/onboard.ts
  • src/lib/onboard/ollama-systemd.ts
  • test/onboard-selection.test.ts

Comment thread src/lib/onboard.ts Outdated
@github-actions
Contributor

Brev E2E (gpu): FAILED on branch fix/spark-ollama-gpu-validation. See logs.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/docker-driver-gateway-launch.ts`:
- Around line 253-255: The code sets env.OPENSHELL_GATEWAY_CONFIG = configPath
after writeDockerDriverGatewayConfig but never forwards it into the container;
update the code that builds the Docker run arguments (where the container
args/flags are assembled—the function that composes the `--env` flags for the
launched process) to add `--env OPENSHELL_GATEWAY_CONFIG` (with the same
configPath) so the container receives the variable, and update the launch test
to assert that the produced command includes `--env OPENSHELL_GATEWAY_CONFIG`
(i.e., add an assertion checking the presence of that flag in the test that
validates the Docker launch arguments).
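
The forwarding this comment asks for can be sketched as follows. This is an illustration, not the PR's launch code: `buildDockerRunArgs` and its parameters are hypothetical names. It relies on documented `docker run` behavior, where `--env NAME` with no value copies `NAME` from the calling process's environment into the container.

```typescript
function buildDockerRunArgs(
  image: string,
  forwardedEnv: string[],
  command: string[] = [],
): string[] {
  const args = ["run", "--rm"];
  for (const name of forwardedEnv) {
    // `--env NAME` (no `=value`) forwards the caller's value of NAME.
    args.push("--env", name);
  }
  // Flags must precede the image; anything after it is the container command.
  return [...args, image, ...command];
}
```

A launch test can then assert that `"--env"` is immediately followed by `"OPENSHELL_GATEWAY_CONFIG"` and that both appear before the image name. Forwarding the variable is only useful if the path it holds is also visible inside the container, for example through a bind mount.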

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 60237dd5-9324-4c78-8147-cf9d3647d421

📥 Commits

Reviewing files that changed from the base of the PR and between d16affd and 578c30f.

📒 Files selected for processing (2)
  • src/lib/onboard/docker-driver-gateway-launch.test.ts
  • src/lib/onboard/docker-driver-gateway-launch.ts

Comment thread src/lib/onboard/docker-driver-gateway-launch.ts
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26319732229
Target ref: 578c30f0c874656344fda88b7e3291534a344ee7
Workflow ref: main
Requested jobs: gpu-e2e,sandbox-survival-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped
sandbox-survival-e2e ✅ success

@github-actions
Contributor

Brev E2E (gpu): FAILED on branch fix/spark-ollama-gpu-validation. See logs.

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26319964530
Target ref: 6cba33950692a3efafbe9b0c0fa3cba189479d3e
Workflow ref: main
Requested jobs: gpu-e2e,openshell-gateway-upgrade-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped
openshell-gateway-upgrade-e2e ✅ success

@ericksoa ericksoa added the Platform: DGX Spark (Support for DGX Spark), fix, and Provider: Ollama (Use this label to identify issues with the local Ollama model integration.) labels May 23, 2026
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26320429784
Target ref: e8489e1ec343dd7bb42ab4839a464e6ada278564
Workflow ref: main
Requested jobs: gpu-e2e,sandbox-survival-e2e,openclaw-onboard-security-posture-e2e
Summary: 2 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped
openclaw-onboard-security-posture-e2e ✅ success
sandbox-survival-e2e ✅ success

@github-actions
Contributor

Brev E2E (gpu): PASSED on branch fix/spark-ollama-gpu-validation. See logs.

@ericksoa ericksoa merged commit c70c62c into main May 23, 2026
30 checks passed
@ericksoa ericksoa deleted the fix/spark-ollama-gpu-validation branch May 23, 2026 02:17

Labels

  • fix
  • Platform: DGX Spark (Support for DGX Spark)
  • Provider: Ollama (Use this label to identify issues with the local Ollama model integration.)
  • v0.0.50 (Release target)

2 participants