
Conversation

@ilopezluna
Contributor

Adds a `--gpu-memory-utilization` parameter for vLLM configuration. The implementation follows the existing patterns and ensures that no value is passed to vLLM if the parameter is not provided.
This parameter is needed for WSL2.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the configure command by adding a new --gpu-memory-utilization flag, specifically for vLLM backend configurations. This flag provides granular control over the GPU memory allocated to the model executor, addressing specific needs for environments like WSL2. The implementation includes robust parsing, validation, and integration, ensuring the parameter is correctly applied to vLLM only when specified by the user.

Highlights

  • New `--gpu-memory-utilization` flag: Introduced a `--gpu-memory-utilization` parameter for vLLM configuration, allowing users to specify the fraction of GPU memory (0.0-1.0) for the model executor.
  • Optional parameter handling: Implemented a `Float64PtrValue` type to correctly handle the flag as an optional `float64` pointer, ensuring it's only passed to vLLM if explicitly provided.
  • Input validation: Added validation to ensure the provided `gpu-memory-utilization` value is within the acceptable range of 0.0 to 1.0.
  • Comprehensive testing: Included new unit tests to verify the flag's registration, default behavior, value setting, and error handling for invalid inputs.
  • Documentation update: Updated the command-line interface documentation to reflect the new flag and its usage.
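A minimal sketch of how such an optional-pointer flag value can work. Only the `Float64PtrValue` name comes from the PR; the fields and wiring below are assumptions, shown self-contained against the standard library's `flag.Value` interface (the PR itself targets pflag, whose `Value` interface has the same `String`/`Set` shape plus a `Type()` method):

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// Float64PtrValue wraps a *float64 so the value stays nil until the user
// explicitly sets the flag -- the "no value passed to vLLM unless provided"
// behavior described above. (Sketch; the PR's actual fields may differ.)
type Float64PtrValue struct {
	ptr **float64
}

// String renders the current value, or "" when the flag was never set.
func (v Float64PtrValue) String() string {
	if v.ptr == nil || *v.ptr == nil {
		return ""
	}
	return strconv.FormatFloat(**v.ptr, 'f', -1, 64)
}

// Set parses the command-line string and stores a freshly allocated float64.
func (v Float64PtrValue) Set(s string) error {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return fmt.Errorf("invalid gpu-memory-utilization %q: %w", s, err)
	}
	*v.ptr = &f
	return nil
}

func main() {
	var util *float64
	fs := flag.NewFlagSet("configure", flag.ContinueOnError)
	fs.Var(Float64PtrValue{ptr: &util}, "gpu-memory-utilization",
		"fraction of GPU memory (0.0-1.0) for the model executor")

	_ = fs.Parse([]string{"--gpu-memory-utilization", "0.8"})
	if util == nil {
		fmt.Println("flag not set, nothing passed to vLLM")
	} else {
		fmt.Println("gpu-memory-utilization =", *util)
	}
}
```

The pointer-to-pointer field is what lets the caller distinguish "flag never set" (`util == nil`) from "flag set to a value", including an explicit `0.0`.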
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the --gpu-memory-utilization parameter for vLLM configuration. The implementation is solid, following existing patterns for adding new flags, including a custom pflag value type to handle optional float values. The changes are well-tested with comprehensive unit and behavior tests. I've identified a couple of minor areas for improvement regarding code duplication, which are detailed in the specific comments. Overall, this is a great addition.


```go
func TestGPUMemoryUtilizationBehavior(t *testing.T) {
	// Helper to create float64 pointer
	float64Ptr := func(f float64) *float64 { return &f }
```


Severity: medium

This helper function to create a *float64 is also defined in pkg/inference/backends/vllm/vllm_config_test.go as float64ptr. To avoid code duplication across test files and improve maintainability, it would be beneficial to move such common test helpers into a shared test utility package.
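With Go generics, one shared helper in a common test utility package would cover `float64Ptr`, `float64ptr`, and any future variants. A minimal sketch (the helper name and package placement are assumptions):

```go
package main

import "fmt"

// Ptr returns a pointer to any value, replacing per-file helpers such as
// float64Ptr / float64ptr. It could live in a shared testutil package.
func Ptr[T any](v T) *T {
	return &v
}

func main() {
	util := Ptr(0.8)    // *float64, no bespoke helper needed
	name := Ptr("vllm") // works for any type
	fmt.Println(*util, *name)
}
```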

Comment on lines +66 to +68
```go
if utilization < 0.0 || utilization > 1.0 {
	return nil, fmt.Errorf("gpu-memory-utilization must be between 0.0 and 1.0, got %f", utilization)
}
```


Severity: medium

The validation logic for the gpu-memory-utilization range is also present in cmd/cli/commands/configure_flags.go on lines 190-192. This duplication could lead to maintenance challenges, for instance, if the valid range changes. To improve maintainability, consider abstracting this validation into a shared function or using shared constants for the boundaries and error message. While validation at different layers can be useful, centralizing the core logic will make the code easier to manage.
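A sketch of what such a shared helper could look like; the function and constant names here are assumptions, not the PR's actual identifiers:

```go
package main

import "fmt"

// Shared bounds so the CLI layer and the vLLM args builder agree on the
// valid range and produce one consistent error message.
const (
	minGPUMemoryUtilization = 0.0
	maxGPUMemoryUtilization = 1.0
)

// ValidateGPUMemoryUtilization centralizes the range check currently
// duplicated in configure_flags.go and vllm_config.go (hypothetical name).
func ValidateGPUMemoryUtilization(utilization float64) error {
	if utilization < minGPUMemoryUtilization || utilization > maxGPUMemoryUtilization {
		return fmt.Errorf("gpu-memory-utilization must be between %.1f and %.1f, got %f",
			minGPUMemoryUtilization, maxGPUMemoryUtilization, utilization)
	}
	return nil
}

func main() {
	fmt.Println(ValidateGPUMemoryUtilization(0.8)) // in range: nil
	fmt.Println(ValidateGPUMemoryUtilization(1.5)) // out of range: error
}
```

If the valid range ever changes, only the two constants move; both call sites stay in sync automatically.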

@ilopezluna ilopezluna marked this pull request as ready for review December 11, 2025 12:49
@ilopezluna ilopezluna requested a review from a team December 11, 2025 12:49

@sourcery-ai (bot) left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The range validation for GPUMemoryUtilization is implemented in both ConfigureFlags.BuildConfigureRequest and vllm_config.GetArgs with slightly different error messages; consider extracting a shared helper (and standardizing the message) to avoid duplication and potential future drift.
## Individual Comments

### Comment 1
<location> `cmd/cli/commands/configure_test.go:131` </location>
<code_context>

```diff
 	}
 }

+func TestConfigureCmdGPUMemoryUtilizationFlag(t *testing.T) {
+	// Create the configure command
+	cmd := newConfigureCmd()
```

</code_context>

<issue_to_address>
**suggestion (testing):** Add a test case for non-float input to the --gpu-memory-utilization flag to exercise the error path in Float64PtrValue.Set.

Since `Float64PtrValue.Set` returns an error when `strconv.ParseFloat` fails, please add a subtest that calls `cmd.Flags().Set("gpu-memory-utilization", "not-a-number")` and asserts that an error is returned. This will exercise the error path and verify the custom flag’s parsing behavior end-to-end.
</issue_to_address>

### Comment 2
<location> `cmd/cli/commands/configure_test.go:164` </location>
<code_context>

```diff
+	}
+}
+
+func TestGPUMemoryUtilizationBehavior(t *testing.T) {
+	// Helper to create float64 pointer
+	float64Ptr := func(f float64) *float64 { return &f }
```

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a case that verifies GPU memory utilization is set without clobbering existing VLLM fields in the configure request.

The table-driven cases cover nil, valid, edge, and invalid `GPUMemoryUtilization` values well. Please add one case where other VLLM options (e.g. `HFOverrides`) are already set on the request, to verify that `BuildConfigureRequest` merges `GPUMemoryUtilization` into an existing `req.VLLM` without dropping those fields. This helps catch future refactors that might reinitialize `req.VLLM` and lose configuration.

Suggested implementation:

```golang
	tests := []struct {
		name               string
		gpuMemValue        *float64
		expectError        bool
		expectGPUMemSet    bool
		expectedGPUMemUtil float64
	}{
		{
			name:            "default - not set (nil)",

```

```golang
		{
			name:            "default - not set (nil)",
		},
		// NOTE: other existing cases (valid, edge, invalid) remain unchanged here...

		{
			name:               "valid gpu-mem with existing VLLM options preserved",
			// Use a valid, non-edge value to focus this case on merge behavior rather than validation.
			gpuMemValue:        float64Ptr(0.8),
			expectError:        false,
			expectGPUMemSet:    true,
			expectedGPUMemUtil: 0.8,
		},

```

To fully implement the behavior this new case is intended to check, update the body of `TestGPUMemoryUtilizationBehavior` as follows:

1. Inside the `for _, tt := range tests { t.Run(...) }` loop (which is not shown in the snippet), detect the `"valid gpu-mem with existing VLLM options preserved"` case and pre-populate the configure request with existing VLLM options **before** calling `BuildConfigureRequest`. For example:

   ```go
   for _, tt := range tests {
       t.Run(tt.name, func(t *testing.T) {
           req := &apiv1.ConfigureRequest{}

           if tt.name == "valid gpu-mem with existing VLLM options preserved" {
               req.VLLM = &apiv1.VLLMOptions{
                   HFOverrides: map[string]string{
                       "some.hf.override": "preserve-me",
                   },
               }
           }

           // existing logic that applies gpuMemValue and calls BuildConfigureRequest(...)
           err := BuildConfigureRequest(req /* other args as needed */)

           // existing assertions for tt.expectError, tt.expectGPUMemSet, tt.expectedGPUMemUtil...
           // e.g., checking req.VLLM.GPUMemoryUtilization or similar
   ```

2. After asserting the GPU memory utilization behavior for this case, add explicit checks that the original `HFOverrides` entry is still present and unchanged. For example:

   ```go
           if tt.name == "valid gpu-mem with existing VLLM options preserved" {
               if req.VLLM == nil {
                   t.Fatalf("VLLM options should not be nil after BuildConfigureRequest")
               }

               if got := req.VLLM.HFOverrides["some.hf.override"]; got != "preserve-me" {
                   t.Errorf("expected existing VLLM HFOverrides to be preserved, got %q", got)
               }
           }
       })
   }
   ```

3. Make sure to use the actual types and field names from your codebase (e.g., `apiv1.ConfigureRequest`, `apiv1.VLLMOptions`, `HFOverrides`, `GPUMemoryUtilization`) and adjust the call to `BuildConfigureRequest` to match its real signature.

These adjustments will ensure the new table case verifies that `BuildConfigureRequest` merges `GPUMemoryUtilization` into an existing `req.VLLM` without dropping pre-existing configuration such as `HFOverrides`.
</issue_to_address>
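The merge behavior this case protects can be distilled to a few lines. The following is a self-contained sketch with hypothetical stand-ins for `ConfigureRequest` and `VLLMOptions` (the real types live in the codebase's API package):

```go
package main

import "fmt"

// Hypothetical stand-ins for the real API types referenced in the review.
type VLLMOptions struct {
	HFOverrides          map[string]string
	GPUMemoryUtilization *float64
}

type ConfigureRequest struct {
	VLLM *VLLMOptions
}

// applyGPUMemoryUtilization initializes req.VLLM lazily and sets only the
// one field, so pre-existing options such as HFOverrides survive. A refactor
// that assigned req.VLLM = &VLLMOptions{...} unconditionally would fail the
// test case suggested above.
func applyGPUMemoryUtilization(req *ConfigureRequest, util *float64) {
	if util == nil {
		return // flag not provided: leave the request untouched
	}
	if req.VLLM == nil {
		req.VLLM = &VLLMOptions{}
	}
	req.VLLM.GPUMemoryUtilization = util
}

func main() {
	util := 0.8
	req := &ConfigureRequest{VLLM: &VLLMOptions{
		HFOverrides: map[string]string{"some.hf.override": "preserve-me"},
	}}
	applyGPUMemoryUtilization(req, &util)
	fmt.Println(req.VLLM.HFOverrides["some.hf.override"], *req.VLLM.GPUMemoryUtilization)
}
```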


@ilopezluna ilopezluna merged commit 0f534e9 into main Dec 11, 2025
13 checks passed
@ilopezluna ilopezluna deleted the define-gpu-memory-utilization branch December 11, 2025 13:07