fix: disable reasoning during bootstrap to prevent empty responses#791
Closed
yasinBursali wants to merge 2 commits intoLight-Heart-Labs:mainfrom
Closed
fix: disable reasoning during bootstrap to prevent empty responses#791yasinBursali wants to merge 2 commits intoLight-Heart-Labs:mainfrom
yasinBursali wants to merge 2 commits intoLight-Heart-Labs:mainfrom
Conversation
Qwen3.5-2B (bootstrap model) is a thinking model that allocates all tokens to reasoning_content before generating visible output. At default token limits, users see empty responses during bootstrap. Set LLAMA_REASONING=off in .env during bootstrap, passed to llama-server via LLAMA_ARG_REASONING env var (Docker) or --reasoning flag (macOS native). Removed by bootstrap-upgrade.sh when full model loads, restoring auto default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightheartdevs
requested changes
Apr 4, 2026
Collaborator
Lightheartdevs
left a comment
There was a problem hiding this comment.
Audit: REQUEST CHANGES — macOS bug
The lifecycle is well-designed for Docker (Linux/WSL), but there's a bug on macOS:
Bug: macOS native llama-server is never restarted after upgrade.
bootstrap-upgrade.sh only handles Docker containers (docker compose stop/up). On macOS, llama-server runs as a native process. After the background model download completes:
.envgets cleaned up correctly (LLAMA_REASONING removed) ✓- But the running native process keeps
--reasoning offAND the old bootstrap model indefinitely ✗ - No code path exists in
bootstrap-upgrade.shto restart the native process
Users on Apple Silicon will be stuck with a degraded bootstrap model with reasoning disabled after upgrade, until manual restart.
Fix options:
- Add native process restart logic to
bootstrap-upgrade.sh(detect PID, kill, relaunch with new model and--reasoning auto) - At minimum, print a user-facing message: "Restart llama-server to complete the upgrade"
Other findings (non-blocking):
- AMD/Lemonade backend ignores
LLAMA_ARG_REASONINGentirely — bootstrap fix has no effect on AMD installs. Worth documenting. .env.examplenot updated withLLAMA_REASONING— minor inconsistency with other llama-server params documented theremvvscat > file && rminconsistency inbootstrap-upgrade.sh—mvreplaces the inode and could change file ownership/permissions. Existing code in the same file uses thecatpattern. Recommend matching it.- Commented-out lines (
# LLAMA_REASONING=...) are correctly preserved by the awk patterns ✓ docker-compose.override.ymlinteraction is correct (user override wins) ✓${LLAMA_REASONING:-auto}handles absent/empty/set cases correctly ✓
…n for .env Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
Addressing review feedbackBug (macOS native llama-server never restarted) — Fixed:
Non-blocking:
|
9 tasks
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Disable llama-server reasoning/thinking mode during bootstrap so the Qwen3.5-2B bootstrap model produces visible responses.
Why
Qwen3.5-2B is a "thinking" model that allocates all token budget to internal
reasoning_contentbefore generating visiblecontent. At default token limits (50-1000 tokens), the model produces completely empty visible responses — the user sees nothing during the entire bootstrap period. This breaks the first-run experience.How
llama-server (build b8248) supports
--reasoning offand theLLAMA_ARG_REASONINGenvironment variable. The fix:docker-compose.base.yml— PassLLAMA_ARG_REASONING=${LLAMA_REASONING:-auto}to the llama-server containerinstallers/phases/11-services.sh— SetLLAMA_REASONING=offin.envwhen bootstrap is activescripts/bootstrap-upgrade.sh— RemoveLLAMA_REASONING=offfrom.envbefore restarting with the full model (restoresautodefault)installers/macos/install-macos.sh— SetLLAMA_REASONING=offin.envand as shell variable during bootstrap; add--reasoningflag to native llama-server launch.env.schema.json— RegisterLLAMA_REASONINGLifecycle:
offduring bootstrap → removed on upgrade →autodefault resumes.Testing
shellcheckon all modified shell files — clean (no new warnings).env.schema.jsonJSON validation — validdocker-compose.base.ymlYAML validation — validManual test steps:
LLAMA_REASONING=offin.envLLAMA_REASONINGremoved from.envLLAMA_REASONINGabsent,autodefault usedReview
Critique Guardian: APPROVED
LLAMA_ARG_REASONINGenv var — harmless no-opPlatform Impact
LLAMA_ARG_REASONINGenv var passed through compose — works--reasoningCLI flag + shell variable +.envpersistence — works🤖 Generated with Claude Code