
Conversation

@ilopezluna
Contributor

--reasoning-budget N can only be 0 or -1, judging from the llama.cpp help text:

controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:

  • none: leaves thoughts unparsed in message.content
  • deepseek: puts thoughts in message.reasoning_content
  • deepseek-legacy: keeps tags in message.content while also populating message.reasoning_content
    (default: auto)
    (env: LLAMA_ARG_THINK)

So we can only partially support the Ollama API: https://docs.ollama.com/api/generate#body-think-one-of-0
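
For illustration, here is a minimal sketch of that coercion, assuming a hypothetical helper (the function name and error messages are illustrative, not the PR's actual code):

```go
package ollama

import "fmt"

// reasoningBudgetFromThink maps Ollama's "think" request field onto
// llama.cpp's --reasoning-budget, which here can only be -1 (unrestricted)
// or 0 (disabled).
func reasoningBudgetFromThink(think any) (int32, error) {
	switch v := think.(type) {
	case bool:
		if v {
			return -1, nil // thinking enabled: unrestricted budget
		}
		return 0, nil // thinking disabled
	case string:
		// Ollama also accepts "high"/"medium"/"low", but those levels cannot
		// be expressed with only 0 and -1, hence the partial support.
		return 0, fmt.Errorf("unsupported think level %q: only boolean think is supported", v)
	default:
		return 0, fmt.Errorf("invalid think value: %v", v)
	}
}
```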

In this PR I also roll back the changes to compose.go, since it's no longer used; the current compose integration uses docker model configure, so I've updated that command instead.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the configuration process for model backends by simplifying the think parameter to align with the llama.cpp server's limited reasoning budget options. It introduces a dedicated ConfigureFlags struct to centralize and standardize how configuration options are handled across commands, improving maintainability and consistency. The compose command has been refactored to remove outdated configuration logic, ensuring that docker model configure is the primary interface for setting model parameters. These changes are supported by updated documentation and comprehensive test coverage.

Highlights

  • Simplified Reasoning Budget: The --think parameter for reasoning models has been simplified to accept only boolean values (true/false), which map directly to the llama.cpp server's --reasoning-budget options of -1 (unlimited) or 0 (disabled). This removes support for string-based budget levels like 'high', 'medium', or 'low'.
  • Centralized Configuration Flags: A new ConfigureFlags struct and associated methods have been introduced in configure_flags.go to centralize the definition, registration, and parsing of all model configuration flags. This promotes consistency and reusability across different commands that configure model backends (a rough sketch follows this list).
  • Refactored compose Command: The compose up command has been cleaned up by removing deprecated logic related to mode and think parameters, as its configuration now relies on the updated docker model configure command. This includes the removal of unused constants, helper functions, and flag parsing logic.
  • Updated Documentation and Tests: Documentation for docker model configure and docker model compose up has been updated to reflect the changes in flag availability and usage. Corresponding unit tests for flag parsing and behavior have also been adjusted or added to validate the new configuration structure and simplified think parameter.
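
As a rough sketch of the centralization described above (field and method names here are hypothetical; the real struct lives in configure_flags.go and may differ), using spf13/pflag as cobra commands do:

```go
package commands

import "github.com/spf13/pflag"

// ConfigureFlags holds pointer-valued options so that unset flags stay nil
// and never override backend defaults.
type ConfigureFlags struct {
	ContextSize *int32 // nil when --context-size was not provided
	Think       *bool  // nil when --think was not provided
}

// Register defines the flags on a command's flag set.
func (f *ConfigureFlags) Register(flags *pflag.FlagSet) {
	flags.Int32("context-size", 0, "context size in tokens")
	flags.Bool("think", false, "enable thinking (maps to llama.cpp --reasoning-budget -1/0)")
}

// Parse populates the struct only for flags the user actually set.
func (f *ConfigureFlags) Parse(flags *pflag.FlagSet) error {
	if flags.Changed("context-size") {
		v, err := flags.GetInt32("context-size")
		if err != nil {
			return err
		}
		f.ContextSize = &v
	}
	if flags.Changed("think") {
		v, err := flags.GetBool("think")
		if err != nil {
			return err
		}
		f.Think = &v
	}
	return nil
}
```
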
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request refactors and simplifies the model configuration logic. The introduction of configure_flags.go to centralize flag handling for docker model configure is a great improvement for maintainability. The simplification of the compose up command and the think parameter handling across the CLI and Ollama API layer are also positive changes.

I've found one issue in cmd/cli/commands/compose.go where the behavior for handling the --context-size flag has unintentionally changed. My review includes a suggestion to fix this.

@ilopezluna requested a review from a team on December 9, 2025 at 20:17
@ilopezluna marked this pull request as ready for review on December 9, 2025 at 20:17
Contributor

@sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In compose up, switching from cmd.Flags().Changed("context-size") to always constructing ContextSize means a value of 0 will now be sent to ConfigureBackend even when the flag is not set; if 0 is not a meaningful override, consider restoring the Changed check or gating the configuration on ctxSize > 0 at the request-building point, not just for logging.
  • The reasoning-budget constants and logic now exist both in the Ollama HTTP handler and in configure_flags.go; if these are intended to stay aligned, consider centralizing them in a shared package or helper to avoid future divergence.
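
One possible shape for the shared helper this suggests (package name and location are hypothetical):

```go
// Package reasoningbudget centralizes the llama.cpp --reasoning-budget
// values so the Ollama handler and configure_flags.go stay aligned.
package reasoningbudget

const (
	Unlimited int32 = -1 // thinking enabled, no budget cap
	Disabled  int32 = 0  // thinking disabled
)

// FromBool converts the boolean --think / "think" setting to a budget value.
func FromBool(think bool) int32 {
	if think {
		return Unlimited
	}
	return Disabled
}
```
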
## Individual Comments

### Comment 1
Location: `cmd/cli/commands/compose.go:84-86`
Code context:
-				sendInfo(fmt.Sprintf("Setting think mode to %s", think))
-			}
-
 			for _, model := range models {
-				configuration := inference.BackendConfiguration{
-					Speculative: speculativeConfig,
-				}
-				if cmd.Flags().Changed("context-size") {
-					// TODO is the context size the same for all models?
-					v := int32(ctxSize)
-					configuration.ContextSize = &v
-				}
-
-				// Set llama.cpp-specific reasoning budget if provided
-				if reasoningBudget != nil {
-					if configuration.LlamaCpp == nil {
-						configuration.LlamaCpp = &inference.LlamaCppConfig{}
-					}
-					configuration.LlamaCpp.ReasoningBudget = reasoningBudget
-				}
-
+				size := int32(ctxSize)
 				if err := desktopClient.ConfigureBackend(scheduling.ConfigureRequest{
-					Model:                model,
-					Mode:                 backendMode,

Issue (bug risk): Context size is now always sent (including 0) instead of only when explicitly set, which changes previous behavior and may unintentionally override defaults.

Previously `ContextSize` was only set when `cmd.Flags().Changed("context-size")` was true; now we always construct `size := int32(ctxSize)` and send `ContextSize: &size`. When the user omits `--context-size`, we now send an explicit `0` instead of `nil`, bypassing the server default. If `0` is not a valid context size, this will silently alter existing workflows. Please preserve the “only set when flag is provided” behavior (e.g., track whether the flag was set, or use a `*int32` flag as in `ConfigureFlags`) before populating `ContextSize`.
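
A minimal sketch of the gating the reviewer describes, reusing identifiers visible in the diff above (cmd, ctxSize, models, desktopClient, scheduling.ConfigureRequest) and eliding the request's other fields:

```go
// Only populate ContextSize when --context-size was explicitly provided;
// a nil pointer lets the backend keep its default.
var contextSize *int32
if cmd.Flags().Changed("context-size") {
	v := int32(ctxSize)
	contextSize = &v
}
for _, model := range models {
	if err := desktopClient.ConfigureBackend(scheduling.ConfigureRequest{
		Model:       model,
		ContextSize: contextSize, // nil when the flag was not set
	}); err != nil {
		return err
	}
}
```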

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ericcurtin
Contributor

ericcurtin commented Dec 10, 2025

Boolean thinking (partial Ollama API support) sounds fine to me; it's a small subset of people that will want to do this anyway.

Contributor

@doringeman left a comment


Thanks!
Tested with ⬇️.

MODEL_RUNNER_HOST=http://localhost:8080 docker model configure --context-size 1024 --mode embedding embeddinggemma
curl -sS 'localhost:8080/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{"model":"embeddinggemma","input":"hello world"}'

@ilopezluna merged commit 6bef10a into main on Dec 10, 2025 (13 checks passed)
@ilopezluna deleted the simplify-configure branch on December 10, 2025 at 08:47