
Conversation

@ilopezluna
Contributor

@ilopezluna ilopezluna commented Dec 5, 2025

This PR standardizes the type used for context_size and reasoning_budget configuration parameters across all backends to int32.

When using Compose, we were defaulting to a context size of -1 in llama.cpp. If the context size (or reasoning budget) is not specified, we should not set a predefined default value.

Changes

  • Changed ContextSize type from *uint64 to *int32 in model and backend configurations
  • Changed ReasoningBudget type from *int64 to *int32 in backend configuration
  • Added UnlimitedContextSize = -1 constant to represent unlimited context in llama.cpp
  • Updated GetContextSize() to return *int32 (nil when no valid value) instead of a default value
  • Removed the hardcoded default context size of 4096, now delegating defaults to llama.cpp/vllm (see the sketch below)
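A minimal sketch of the resulting precedence logic, assuming a simplified signature (the real GetContextSize takes the model and backend configuration structs; only the UnlimitedContextSize constant is taken from the PR itself):

const UnlimitedContextSize int32 = -1

// GetContextSize prefers the model-level value when it is valid
// (-1 for unlimited, or any positive size), then the backend-level
// value, and returns nil when neither is set so that llama.cpp/vLLM
// apply their own defaults.
func GetContextSize(modelCtx, backendCtx *int32) *int32 {
	valid := func(p *int32) bool {
		return p != nil && (*p == UnlimitedContextSize || *p > 0)
	}
	if valid(modelCtx) { // model config takes precedence
		return modelCtx
	}
	if valid(backendCtx) {
		return backendCtx
	}
	return nil
}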

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the data types used for context-size and reasoning-budget parameters, standardizing them to int32 across the application. This change improves type consistency, especially when interacting with backend systems like llama.cpp which may expect specific integer sizes. It also introduces more robust handling of optional configuration values through the use of pointers and simplifies the configuration flow for certain backends and API endpoints.

Highlights

  • Type Refactoring: The data types for context-size and reasoning-budget parameters have been consistently changed from uint64 or int64 to int32 across the codebase. This affects various CLI commands, internal model configuration, and backend inference settings.
  • Pointer Usage for Configuration: Many configuration fields, such as ContextSize in BackendConfiguration and ModelPackageRequest, are now pointers to int32 (*int32). This allows for explicit differentiation between a zero value and an unset value, providing more precise control over configuration application.
  • MLX Backend Simplification: The MLX backend's GetMaxTokens function has been simplified to always return nil, indicating that MLX will now rely on its internal model defaults for context size rather than deriving it from external configurations. The associated test file (mlx_config_test.go) has been removed.
  • Ollama API Configuration Flow: The configureModel function in the Ollama API handler no longer returns the context size and has been streamlined. The logic for preloading models with a specific context size in handleGenerate has been removed, as configuration is now handled directly within configureModel.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily refactors the handling of context size and reasoning budget parameters across the codebase, standardizing their types from uint64 or int64 to *int32. This change impacts various CLI commands (compose, configure, package, mdltool), internal distribution logic (builder, mutate, store), and inference backend configurations (backend.go, llamacpp, vllm).

Specifically, context size is now passed as a pointer to int32 and is only applied if explicitly set by a flag or a valid positive/unlimited value. The llamacpp backend now has an UnlimitedContextSize constant and defaults to 4096 if no context size is specified. The mlx backend's GetMaxTokens function was simplified to always return nil, effectively removing its ability to configure context size. The ollama HTTP handler was updated to use the new int32 types for context size and reasoning budget, and its model configuration logic was streamlined, removing a redundant preload step.

Review comments highlighted a potential regression in the mlx backend's context size configuration, suggesting the previous logic be restored and adapted. Additionally, the review pointed out potential integer overflow issues when converting uint64 to int32 for context size in package.go and main.go, recommending validation checks. Minor suggestions included removing a redundant type cast in llamacpp.go and refining the context size validation logic in llamacpp_config.go and vllm_config.go to ensure consistency and prevent invalid values.

Comment on lines +287 to 290
 if cmd.Flags().Changed("context-size") {
 	cmd.PrintErrf("Setting context size %d\n", opts.contextSize)
-	pkg = pkg.WithContextSize(opts.contextSize)
+	pkg = pkg.WithContextSize(int32(opts.contextSize))
 }
Contributor

medium

The conversion from uint64 to int32 for contextSize could lead to an integer overflow if a user provides a value larger than math.MaxInt32. While unlikely for a context size, adding a validation check would make the code more robust.

if cmd.Flags().Changed("context-size") {
	if opts.contextSize > 2147483647 { // math.MaxInt32
		return fmt.Errorf("context size %d is too large, must be less than or equal to 2147483647", opts.contextSize)
	}
	cmd.PrintErrf("Setting context size %d\n", opts.contextSize)
	pkg = pkg.WithContextSize(int32(opts.contextSize))
}

Comment on lines 322 to 325
 if contextSize > 0 {
 	fmt.Println("Setting context size:", contextSize)
-	b = b.WithContextSize(contextSize)
+	b = b.WithContextSize(int32(contextSize))
 }
Contributor

medium

Similar to another file, the conversion from uint64 to int32 for contextSize could overflow. It's best to add a check to ensure the value is within the valid range for an int32.

if contextSize > 0 {
	if contextSize > 2147483647 { // math.MaxInt32
		fmt.Fprintf(os.Stderr, "context size %d is too large, must be less than or equal to 2147483647\n", contextSize)
		return 1
	}
	fmt.Println("Setting context size:", contextSize)
	b = b.WithContextSize(int32(contextSize))
}
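As a side note, both suggestions hard-code the limit; the standard library constant can replace the literal. A minimal helper sketch (the name validateContextSize is hypothetical, not part of the PR):

import (
	"fmt"
	"math"
)

// validateContextSize converts a user-supplied uint64 into int32,
// rejecting values that would overflow instead of silently truncating.
func validateContextSize(v uint64) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("context size %d is too large, must be <= %d", v, math.MaxInt32)
	}
	return int32(v), nil
}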

 // Model config takes precedence
-if modelCfg.ContextSize != nil {
-	return *modelCfg.ContextSize
+if modelCfg.ContextSize != nil && (*modelCfg.ContextSize == UnlimitedContextSize || *modelCfg.ContextSize > 0) {
Contributor

medium

The condition *modelCfg.ContextSize == UnlimitedContextSize is also checked for the backendCfg. It might be cleaner to check for *modelCfg.ContextSize < 0 to handle all negative values, assuming only -1 is a special valid negative value. However, the current implementation is correct if -1 is the only special value intended.

Contributor Author

-1 is the only special value

Comment on lines 81 to 83
if modelCfg.ContextSize != nil {
	return modelCfg.ContextSize
}
Contributor

medium

The logic for retrieving ContextSize from modelCfg doesn't check if the value is positive. This is inconsistent with the check for backendCfg and could lead to passing 0 or a negative value to vLLM if it's set in the model configuration. It's safer to ensure the context size is a positive value.

Suggested change

-if modelCfg.ContextSize != nil {
-	return modelCfg.ContextSize
-}
+if modelCfg.ContextSize != nil && *modelCfg.ContextSize > 0 {
+	return modelCfg.ContextSize
+}

Contributor Author

In llama.cpp, a context size of -1 is a valid value.

@ilopezluna ilopezluna changed the title from "fix: change context size and reasoning budget types from uint64 to int32" to "Change context size and reasoning budget types from int64/uint64 to int32" on Dec 9, 2025
@ilopezluna ilopezluna marked this pull request as ready for review December 9, 2025 09:59
@ilopezluna ilopezluna requested a review from a team December 9, 2025 09:59
@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In llamacpp.GetRequiredMemoryForModel the contextSize used for memory estimation now comes directly from GetContextSize and may be UnlimitedContextSize (-1); consider normalizing -1 to a concrete value (or rejecting it) before calling EstimateLLaMACppRun, since passing -1 as WithLLaMACppContextSize may not produce meaningful estimates.
  • The new convertToInt32 helper silently returns 0 for unparsable string inputs (and uses a slightly odd double Sscanf pattern); given that 0 now means "ignore/unspecified" rather than "use default", consider switching to strconv.ParseInt (or similar) and surfacing parse errors to callers so that invalid num_ctx or related options don’t get silently discarded (see the sketch below).
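A sketch of the suggested direction (also normalizing range errors rather than discarding them). The helper name matches the review, but its signature and supported input types are assumptions, not the PR's actual code:

import (
	"fmt"
	"math"
	"strconv"
)

// convertToInt32 parses an Ollama-style option value into int32 and
// surfaces errors instead of silently returning 0. Hypothetical
// sketch; the PR's actual helper may differ.
func convertToInt32(v any) (int32, error) {
	switch x := v.(type) {
	case float64: // JSON numbers decode to float64
		if x < math.MinInt32 || x > math.MaxInt32 {
			return 0, fmt.Errorf("option value %v out of int32 range", x)
		}
		return int32(x), nil
	case string:
		n, err := strconv.ParseInt(x, 10, 32)
		if err != nil {
			return 0, fmt.Errorf("invalid integer option %q: %w", x, err)
		}
		return int32(n), nil
	default:
		return 0, fmt.Errorf("unsupported option type %T", v)
	}
}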

Comment on line 146 of pkg/inference/backends/llamacpp/llamacpp_config_test.go

 		{
+			name: "unlimited context size from backend config",

suggestion (testing): Add tests for model-config context size, including UnlimitedContextSize and invalid/zero values

To fully cover GetContextSize, please add table cases for:

  • Model ContextSize = UnlimitedContextSize (-1) and backend also set (the model value should still win and propagate -1).
  • Model ContextSize positive and backend UnlimitedContextSize (the model value should still win).
  • Model ContextSize = 0 or an invalid negative (e.g. -2) with a valid backend value (the model value should be ignored and the backend value used).

These will validate the precedence and validation behavior of the new *int32 API and UnlimitedContextSize; a table-test sketch follows.
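A minimal sketch of such a table test, reusing the hypothetical GetContextSize shape sketched earlier in this thread (names and cases are illustrative, not the repository's actual test file):

import "testing"

func TestGetContextSizePrecedence(t *testing.T) {
	i32 := func(v int32) *int32 { return &v }
	eq := func(a, b *int32) bool {
		if a == nil || b == nil {
			return a == b
		}
		return *a == *b
	}
	cases := []struct {
		name           string
		model, backend *int32
		want           *int32
	}{
		{"model unlimited wins over backend", i32(-1), i32(4096), i32(-1)},
		{"model positive wins over backend unlimited", i32(2048), i32(-1), i32(2048)},
		{"zero model falls back to backend", i32(0), i32(4096), i32(4096)},
		{"invalid negative model falls back to backend", i32(-2), i32(4096), i32(4096)},
		{"nothing set returns nil", nil, nil, nil},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := GetContextSize(tc.model, tc.backend); !eq(got, tc.want) {
				t.Fatalf("GetContextSize() = %v, want %v", got, tc.want)
			}
		})
	}
}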



pkg/inference/backends/mlx/mlx_config.go

// Model config takes precedence over backend config.
// Returns nil if neither is specified (MLX will use model defaults).
func GetMaxTokens(modelCfg types.Config, backendCfg *inference.BackendConfiguration) *uint64 {
	// Model config takes precedence
Contributor

issue (bug_risk): MLX GetMaxTokens no longer respects model or backend context size configuration

The previous implementation of GetMaxTokens honored both modelCfg.ContextSize and backendCfg.ContextSize, falling back to nil only when neither was set. The new version always returns nil, so MLX will ignore configured context sizes and rely solely on model defaults, diverging from llama.cpp and vLLM, which still use BackendConfiguration.ContextSize. If consistency with other backends is desired, consider restoring the logic to consult modelCfg.ContextSize (after any needed type conversion) and backendCfg.ContextSize instead of always returning nil.
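As a rough sketch of one way to restore that precedence, under the assumption that both configs expose a *int32 context size (the parameter shapes here are illustrative, not the actual types.Config/BackendConfiguration fields):

// GetMaxTokens consults the model config first, then the backend
// config, and returns nil when neither is set so MLX falls back to
// its model defaults. Hypothetical restoration sketch.
func GetMaxTokens(modelCtx, backendCtx *int32) *uint64 {
	toU64 := func(v int32) *uint64 {
		u := uint64(v)
		return &u
	}
	if modelCtx != nil && *modelCtx > 0 { // model config takes precedence
		return toU64(*modelCtx)
	}
	if backendCtx != nil && *backendCtx > 0 {
		return toU64(*backendCtx)
	}
	return nil
}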


cmd/cli/commands/integration_test.go

 // createAndPushTestModel creates a minimal test model and pushes it to the local registry.
 // Returns the model ID, FQDNs for host and network access, and the manifest digest.
-func createAndPushTestModel(t *testing.T, registryURL, modelRef string, contextSize uint64) (modelID, hostFQDN, networkFQDN, digest string) {
+func createAndPushTestModel(t *testing.T, registryURL, modelRef string, contextSize *int32) (modelID, hostFQDN, networkFQDN, digest string) {
Contributor

suggestion (testing): Add an integration test path where contextSize is nil to ensure default behavior is preserved

Right now all integration call sites pass a non‑nil *int32 to createAndPushTestModel, so the nil path isn’t exercised. Please add a test that uses contextSize == nil and confirms the model can still be pushed/pulled/inspected and that no unexpected context-size metadata is introduced, preserving the previous default behavior after the switch to *int32.
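For example, the nil path could be exercised with a test like the following. The model reference and the setupLocalRegistry fixture are hypothetical stand-ins; only createAndPushTestModel comes from the diff above:

func TestCreateModelWithNilContextSize(t *testing.T) {
	// setupLocalRegistry is a stand-in for whatever registry fixture
	// the suite already uses; it is not a real helper in the repo.
	registryURL := setupLocalRegistry(t)

	// contextSize == nil: the model should push/pull/inspect cleanly
	// with no context-size metadata attached.
	modelID, hostFQDN, networkFQDN, digest := createAndPushTestModel(
		t, registryURL, "test/nil-context:latest", nil,
	)
	t.Logf("pushed %s (%s / %s) digest=%s", modelID, hostFQDN, networkFQDN, digest)
}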

@ilopezluna ilopezluna merged commit b4512b5 into main Dec 9, 2025
13 checks passed
@ilopezluna ilopezluna deleted the fix-types branch December 9, 2025 10:40