
Conversation

@ilopezluna
Contributor

Adds a `--gpu-memory-utilization` parameter for vLLM configuration. The implementation follows the existing patterns and ensures that no value is passed to vLLM if the parameter is not provided.
This parameter is needed for WSL2.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the configure command by adding a new --gpu-memory-utilization flag, specifically for vLLM backend configurations. This flag provides granular control over the GPU memory allocated to the model executor, addressing specific needs for environments like WSL2. The implementation includes robust parsing, validation, and integration, ensuring the parameter is correctly applied to vLLM only when specified by the user.

Highlights

  • New `--gpu-memory-utilization` flag: Introduced a `--gpu-memory-utilization` parameter for vLLM configuration, allowing users to specify the fraction of GPU memory (0.0-1.0) for the model executor.
  • Optional parameter handling: Implemented a `Float64PtrValue` type to correctly handle the flag as an optional `float64` pointer, ensuring it's only passed to vLLM if explicitly provided.
  • Input validation: Added validation to ensure the provided `gpu-memory-utilization` value is within the acceptable range of 0.0 to 1.0.
  • Comprehensive testing: Included new unit tests to verify the flag's registration, default behavior, value setting, and error handling for invalid inputs.
  • Documentation update: Updated the command-line interface documentation to reflect the new flag and its usage.
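A minimal sketch of how such an optional-pointer flag value can work. Only the `Float64PtrValue` name comes from the PR; the fields and wiring below are assumptions, shown self-contained against the standard library's `flag.Value` interface (the PR itself targets pflag, whose `Value` interface has the same `String`/`Set` shape plus a `Type()` method):

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// Float64PtrValue wraps a *float64 so the value stays nil until the user
// explicitly sets the flag -- the "no value passed to vLLM unless provided"
// behavior described above. (Sketch; the PR's actual fields may differ.)
type Float64PtrValue struct {
	ptr **float64
}

// String renders the current value, or "" when the flag was never set.
func (v Float64PtrValue) String() string {
	if v.ptr == nil || *v.ptr == nil {
		return ""
	}
	return strconv.FormatFloat(**v.ptr, 'f', -1, 64)
}

// Set parses the command-line string and stores a freshly allocated float64.
func (v Float64PtrValue) Set(s string) error {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return fmt.Errorf("invalid gpu-memory-utilization %q: %w", s, err)
	}
	*v.ptr = &f
	return nil
}

func main() {
	var util *float64
	fs := flag.NewFlagSet("configure", flag.ContinueOnError)
	fs.Var(Float64PtrValue{ptr: &util}, "gpu-memory-utilization",
		"fraction of GPU memory (0.0-1.0) for the model executor")

	_ = fs.Parse([]string{"--gpu-memory-utilization", "0.8"})
	if util == nil {
		fmt.Println("flag not set, nothing passed to vLLM")
	} else {
		fmt.Println("gpu-memory-utilization =", *util)
	}
}
```

The pointer-to-pointer field is what lets the caller distinguish "flag never set" (`util == nil`) from "flag set to a value", including an explicit `0.0`.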
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the --gpu-memory-utilization parameter for vLLM configuration. The implementation is solid, following existing patterns for adding new flags, including a custom pflag value type to handle optional float values. The changes are well-tested with comprehensive unit and behavior tests. I've identified a couple of minor areas for improvement regarding code duplication, which are detailed in the specific comments. Overall, this is a great addition.


```go
func TestGPUMemoryUtilizationBehavior(t *testing.T) {
	// Helper to create float64 pointer
	float64Ptr := func(f float64) *float64 { return &f }
```


Severity: medium

This helper function to create a *float64 is also defined in pkg/inference/backends/vllm/vllm_config_test.go as float64ptr. To avoid code duplication across test files and improve maintainability, it would be beneficial to move such common test helpers into a shared test utility package.
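With Go generics, one shared helper in a common test utility package would cover `float64Ptr`, `float64ptr`, and any future variants. A minimal sketch (the helper name and package placement are assumptions):

```go
package main

import "fmt"

// Ptr returns a pointer to any value, replacing per-file helpers such as
// float64Ptr / float64ptr. It could live in a shared testutil package.
func Ptr[T any](v T) *T {
	return &v
}

func main() {
	util := Ptr(0.8)    // *float64, no bespoke helper needed
	name := Ptr("vllm") // works for any type
	fmt.Println(*util, *name)
}
```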

Comment on lines +66 to +68
```go
if utilization < 0.0 || utilization > 1.0 {
	return nil, fmt.Errorf("gpu-memory-utilization must be between 0.0 and 1.0, got %f", utilization)
}
```


Severity: medium

The validation logic for the gpu-memory-utilization range is also present in cmd/cli/commands/configure_flags.go on lines 190-192. This duplication could lead to maintenance challenges, for instance, if the valid range changes. To improve maintainability, consider abstracting this validation into a shared function or using shared constants for the boundaries and error message. While validation at different layers can be useful, centralizing the core logic will make the code easier to manage.
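A sketch of what such a shared helper could look like; the function and constant names here are assumptions, not the PR's actual identifiers:

```go
package main

import "fmt"

// Shared bounds so the CLI layer and the vLLM args builder agree on the
// valid range and produce one consistent error message.
const (
	minGPUMemoryUtilization = 0.0
	maxGPUMemoryUtilization = 1.0
)

// ValidateGPUMemoryUtilization centralizes the range check currently
// duplicated in configure_flags.go and vllm_config.go (hypothetical name).
func ValidateGPUMemoryUtilization(utilization float64) error {
	if utilization < minGPUMemoryUtilization || utilization > maxGPUMemoryUtilization {
		return fmt.Errorf("gpu-memory-utilization must be between %.1f and %.1f, got %f",
			minGPUMemoryUtilization, maxGPUMemoryUtilization, utilization)
	}
	return nil
}

func main() {
	fmt.Println(ValidateGPUMemoryUtilization(0.8)) // in range: nil
	fmt.Println(ValidateGPUMemoryUtilization(1.5)) // out of range: error
}
```

If the valid range ever changes, only the two constants move; both call sites stay in sync automatically.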

@ilopezluna ilopezluna marked this pull request as ready for review December 11, 2025 12:49
@ilopezluna ilopezluna requested a review from a team December 11, 2025 12:49

@sourcery-ai (bot) left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The range validation for GPUMemoryUtilization is implemented in both ConfigureFlags.BuildConfigureRequest and vllm_config.GetArgs with slightly different error messages; consider extracting a shared helper (and standardizing the message) to avoid duplication and potential future drift.
## Individual Comments

### Comment 1
<location> `cmd/cli/commands/configure_test.go:131` </location>
<code_context>

```diff
 	}
 }

+func TestConfigureCmdGPUMemoryUtilizationFlag(t *testing.T) {
+	// Create the configure command
+	cmd := newConfigureCmd()
```

</code_context>

<issue_to_address>
**suggestion (testing):** Add a test case for non-float input to the --gpu-memory-utilization flag to exercise the error path in Float64PtrValue.Set.

Since `Float64PtrValue.Set` returns an error when `strconv.ParseFloat` fails, please add a subtest that calls `cmd.Flags().Set("gpu-memory-utilization", "not-a-number")` and asserts that an error is returned. This will exercise the error path and verify the custom flag’s parsing behavior end-to-end.
</issue_to_address>

### Comment 2
<location> `cmd/cli/commands/configure_test.go:164` </location>
<code_context>

```diff
+	}
+}
+
+func TestGPUMemoryUtilizationBehavior(t *testing.T) {
+	// Helper to create float64 pointer
+	float64Ptr := func(f float64) *float64 { return &f }
```

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a case that verifies GPU memory utilization is set without clobbering existing VLLM fields in the configure request.

The table-driven cases cover nil, valid, edge, and invalid `GPUMemoryUtilization` values well. Please add one case where other VLLM options (e.g. `HFOverrides`) are already set on the request, to verify that `BuildConfigureRequest` merges `GPUMemoryUtilization` into an existing `req.VLLM` without dropping those fields. This helps catch future refactors that might reinitialize `req.VLLM` and lose configuration.

Suggested implementation:

```golang
	tests := []struct {
		name               string
		gpuMemValue        *float64
		expectError        bool
		expectGPUMemSet    bool
		expectedGPUMemUtil float64
	}{
		{
			name:            "default - not set (nil)",

```

```golang
		{
			name:            "default - not set (nil)",
		},
		// NOTE: other existing cases (valid, edge, invalid) remain unchanged here...

		{
			name:               "valid gpu-mem with existing VLLM options preserved",
			// Use a valid, non-edge value to focus this case on merge behavior rather than validation.
			gpuMemValue:        float64Ptr(0.8),
			expectError:        false,
			expectGPUMemSet:    true,
			expectedGPUMemUtil: 0.8,
		},

```

To fully implement the behavior this new case is intended to check, update the body of `TestGPUMemoryUtilizationBehavior` as follows:

1. Inside the `for _, tt := range tests { t.Run(...) }` loop (which is not shown in the snippet), detect the `"valid gpu-mem with existing VLLM options preserved"` case and pre-populate the configure request with existing VLLM options **before** calling `BuildConfigureRequest`. For example:

   ```go
   for _, tt := range tests {
       t.Run(tt.name, func(t *testing.T) {
           req := &apiv1.ConfigureRequest{}

           if tt.name == "valid gpu-mem with existing VLLM options preserved" {
               req.VLLM = &apiv1.VLLMOptions{
                   HFOverrides: map[string]string{
                       "some.hf.override": "preserve-me",
                   },
               }
           }

           // existing logic that applies gpuMemValue and calls BuildConfigureRequest(...)
           err := BuildConfigureRequest(req /* other args as needed */)

           // existing assertions for tt.expectError, tt.expectGPUMemSet, tt.expectedGPUMemUtil...
           // e.g., checking req.VLLM.GPUMemoryUtilization or similar
   ```

2. After asserting the GPU memory utilization behavior for this case, add explicit checks that the original `HFOverrides` entry is still present and unchanged. For example:

   ```go
           if tt.name == "valid gpu-mem with existing VLLM options preserved" {
               if req.VLLM == nil {
                   t.Fatalf("VLLM options should not be nil after BuildConfigureRequest")
               }

               if got := req.VLLM.HFOverrides["some.hf.override"]; got != "preserve-me" {
                   t.Errorf("expected existing VLLM HFOverrides to be preserved, got %q", got)
               }
           }
       })
   }
   ```

3. Make sure to use the actual types and field names from your codebase (e.g., `apiv1.ConfigureRequest`, `apiv1.VLLMOptions`, `HFOverrides`, `GPUMemoryUtilization`) and adjust the call to `BuildConfigureRequest` to match its real signature.

These adjustments will ensure the new table case verifies that `BuildConfigureRequest` merges `GPUMemoryUtilization` into an existing `req.VLLM` without dropping pre-existing configuration such as `HFOverrides`.
</issue_to_address>
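The merge behavior this case protects can be distilled to a few lines. The following is a self-contained sketch with hypothetical stand-ins for `ConfigureRequest` and `VLLMOptions` (the real types live in the codebase's API package):

```go
package main

import "fmt"

// Hypothetical stand-ins for the real API types referenced in the review.
type VLLMOptions struct {
	HFOverrides          map[string]string
	GPUMemoryUtilization *float64
}

type ConfigureRequest struct {
	VLLM *VLLMOptions
}

// applyGPUMemoryUtilization initializes req.VLLM lazily and sets only the
// one field, so pre-existing options such as HFOverrides survive. A refactor
// that assigned req.VLLM = &VLLMOptions{...} unconditionally would fail the
// test case suggested above.
func applyGPUMemoryUtilization(req *ConfigureRequest, util *float64) {
	if util == nil {
		return // flag not provided: leave the request untouched
	}
	if req.VLLM == nil {
		req.VLLM = &VLLMOptions{}
	}
	req.VLLM.GPUMemoryUtilization = util
}

func main() {
	util := 0.8
	req := &ConfigureRequest{VLLM: &VLLMOptions{
		HFOverrides: map[string]string{"some.hf.override": "preserve-me"},
	}}
	applyGPUMemoryUtilization(req, &util)
	fmt.Println(req.VLLM.HFOverrides["some.hf.override"], *req.VLLM.GPUMemoryUtilization)
}
```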


@ilopezluna ilopezluna merged commit 0f534e9 into main Dec 11, 2025
13 checks passed
@ilopezluna ilopezluna deleted the define-gpu-memory-utilization branch December 11, 2025 13:07