Skip to content

fix: disable reasoning during bootstrap to prevent empty responses#791

Closed
yasinBursali wants to merge 2 commits intoLight-Heart-Labs:mainfrom
yasinBursali:fix/bootstrap-disable-reasoning
Closed

fix: disable reasoning during bootstrap to prevent empty responses#791
yasinBursali wants to merge 2 commits intoLight-Heart-Labs:mainfrom
yasinBursali:fix/bootstrap-disable-reasoning

Conversation

@yasinBursali
Copy link
Copy Markdown
Contributor

What

Disable llama-server reasoning/thinking mode during bootstrap so the Qwen3.5-2B bootstrap model produces visible responses.

Why

Qwen3.5-2B is a "thinking" model that allocates all token budget to internal reasoning_content before generating visible content. At default token limits (50-1000 tokens), the model produces completely empty visible responses — the user sees nothing during the entire bootstrap period. This breaks the first-run experience.

How

llama-server (build b8248) supports --reasoning off and the LLAMA_ARG_REASONING environment variable. The fix:

  1. docker-compose.base.yml — Pass LLAMA_ARG_REASONING=${LLAMA_REASONING:-auto} to the llama-server container
  2. installers/phases/11-services.sh — Set LLAMA_REASONING=off in .env when bootstrap is active
  3. scripts/bootstrap-upgrade.sh — Remove LLAMA_REASONING=off from .env before restarting with the full model (restores auto default)
  4. installers/macos/install-macos.sh — Set LLAMA_REASONING=off in .env and as shell variable during bootstrap; add --reasoning flag to native llama-server launch
  5. .env.schema.json — Register LLAMA_REASONING

Lifecycle: off during bootstrap → removed on upgrade → auto default resumes.

Testing

  • shellcheck on all modified shell files — clean (no new warnings)
  • .env.schema.json JSON validation — valid
  • docker-compose.base.yml YAML validation — valid
  • No secrets, no unrelated changes

Manual test steps:

  • Fresh install with bootstrap → verify LLAMA_REASONING=off in .env
  • Send chat message during bootstrap → verify visible response (not empty)
  • After background upgrade completes → verify LLAMA_REASONING removed from .env
  • Non-bootstrap install → verify LLAMA_REASONING absent, auto default used

Review

Critique Guardian: APPROVED

  • Note: AMD/Lemonade-server likely ignores LLAMA_ARG_REASONING env var — harmless no-op

Platform Impact

  • Linux/WSL2 (Docker): LLAMA_ARG_REASONING env var passed through compose — works
  • macOS (native Metal): --reasoning CLI flag + shell variable + .env persistence — works
  • Windows/WSL2: Same Docker compose path as Linux — works

🤖 Generated with Claude Code

Qwen3.5-2B (bootstrap model) is a thinking model that allocates all tokens
to reasoning_content before generating visible output. At default token
limits, users see empty responses during bootstrap.

Set LLAMA_REASONING=off in .env during bootstrap, passed to llama-server
via LLAMA_ARG_REASONING env var (Docker) or --reasoning flag (macOS native).
Removed by bootstrap-upgrade.sh when full model loads, restoring auto default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@Lightheartdevs Lightheartdevs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audit: REQUEST CHANGES — macOS bug

The lifecycle is well-designed for Docker (Linux/WSL), but there's a bug on macOS:

Bug: macOS native llama-server is never restarted after upgrade.
bootstrap-upgrade.sh only handles Docker containers (docker compose stop/up). On macOS, llama-server runs as a native process. After the background model download completes:

  • .env gets cleaned up correctly (LLAMA_REASONING removed) ✓
  • But the running native process keeps --reasoning off AND the old bootstrap model indefinitely ✗
  • No code path exists in bootstrap-upgrade.sh to restart the native process

Users on Apple Silicon will be stuck with a degraded bootstrap model with reasoning disabled after upgrade, until manual restart.

Fix options:

  1. Add native process restart logic to bootstrap-upgrade.sh (detect PID, kill, relaunch with new model and --reasoning auto)
  2. At minimum, print a user-facing message: "Restart llama-server to complete the upgrade"

Other findings (non-blocking):

  • AMD/Lemonade backend ignores LLAMA_ARG_REASONING entirely — bootstrap fix has no effect on AMD installs. Worth documenting.
  • .env.example not updated with LLAMA_REASONING — minor inconsistency with other llama-server params documented there
  • mv vs cat > file && rm inconsistency in bootstrap-upgrade.shmv replaces the inode and could change file ownership/permissions. Existing code in the same file uses the cat pattern. Recommend matching it.
  • Commented-out lines (# LLAMA_REASONING=...) are correctly preserved by the awk patterns ✓
  • docker-compose.override.yml interaction is correct (user override wins) ✓
  • ${LLAMA_REASONING:-auto} handles absent/empty/set cases correctly ✓

…n for .env

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yasinBursali
Copy link
Copy Markdown
Contributor Author

Addressing review feedback

Bug (macOS native llama-server never restarted) — Fixed:

  • Added elif [[ -f "$INSTALL_DIR/data/.llama-server.pid" ]] block in bootstrap-upgrade.sh after the Docker restart section
  • Detects macOS native llama-server via PID file and prints restart notice:
    Native llama-server detected (macOS Metal mode).
    NOTICE: Restart llama-server to load the new model and re-enable reasoning.
    Run: ./dream-macos.sh restart
    
  • Chose notice over automatic restart to avoid PID reuse risks and rollback complexity in this scope

Non-blocking: mvcat pattern — Fixed:

  • Replaced mv "${ENV_FILE}.tmp" "$ENV_FILE" with cat "${ENV_FILE}.tmp" > "$ENV_FILE" && rm -f "${ENV_FILE}.tmp"
  • Preserves inode and file ownership/permissions
  • Matches the pattern used on lines 87, 92, 97, 101 of the same file

@Lightheartdevs
Copy link
Copy Markdown
Collaborator

Closing — superseded by #795 which merged with full macOS auto-restart + rollback logic. This PR only printed a restart notice; #795 does the actual hot-swap with old-model rollback on failure, PID verification, and stale status handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants