@DonalEvans DonalEvans commented Aug 26, 2025

Adds support for configuring the thinkingBudget for Gemini 2.5 models when creating chat completion inference endpoints. The thinking_budget field is nested inside the thinking_config object in task_settings.

  • Added ThinkingConfig class to contain the thinking_budget field. This results in a less flat structure for the PUT _inference/chat_completion/ call, but will make adding support for include_thoughts easier in the future
  • Added extractOptionalInteger() method to ServiceUtils
  • Unit tests for ThinkingConfig class
  • Updated existing tests to account for the new object and field
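
As a rough illustration of the first two bullets, here is a minimal sketch of how a nested thinking_config object with an optional integer field might be parsed. It is not the code from this PR: the class name ThinkingConfigSketch, the fromMap() method, the simplified extractOptionalInteger() signature, and the use of a plain error list are all assumptions made for the example, and the real ServiceUtils helper may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-ins; not the classes added in this PR. */
final class ThinkingConfigSketch {

    /** Holds the optional thinking_budget; null means it was not configured. */
    record ThinkingConfig(Integer thinkingBudget) {

        /** Reads {"thinking_config": {"thinking_budget": N}} out of a mutable settings map. */
        static ThinkingConfig fromMap(Map<String, Object> settings, List<String> errors) {
            Object nested = settings.remove("thinking_config");
            if (nested == null) {
                return new ThinkingConfig(null);
            }
            if (!(nested instanceof Map<?, ?> rawMap)) {
                errors.add("[thinking_config] must be an object");
                return new ThinkingConfig(null);
            }
            // Copy into a String-keyed map so the helper below can consume it.
            Map<String, Object> nestedMap = new HashMap<>();
            for (Map.Entry<?, ?> entry : rawMap.entrySet()) {
                nestedMap.put(String.valueOf(entry.getKey()), entry.getValue());
            }
            return new ThinkingConfig(extractOptionalInteger(nestedMap, "thinking_budget", errors));
        }
    }

    /** Removes the field and returns it if it is an Integer; records an error for any other type. */
    static Integer extractOptionalInteger(Map<String, Object> map, String field, List<String> errors) {
        Object value = map.remove(field);
        if (value == null) {
            return null;
        }
        if (value instanceof Integer intValue) {
            return intValue;
        }
        errors.add("[" + field + "] must be an integer");
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new HashMap<>();
        nested.put("thinking_budget", 256);
        Map<String, Object> taskSettings = new HashMap<>();
        taskSettings.put("thinking_config", nested);

        List<String> errors = new ArrayList<>();
        ThinkingConfig config = ThinkingConfig.fromMap(taskSettings, errors);
        System.out.println(config + " errors=" + errors);
    }
}
```

Keeping the budget inside its own object means a later field such as include_thoughts could be added to the record without changing the shape of the enclosing settings.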

These changes enable elastic/kibana#227590 to be completed

Specification PR: elastic/elasticsearch-specification#5257

Example usage:

PUT _inference/chat_completion/my_chat_completion_with_thinking_budget
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": <service account info>,
    "model_id": "gemini-2.5-pro",
    "location": "us-central1",
    "project_id": <project id>
  },
  "task_settings" : {
    "thinking_config": {
      "thinking_budget": 256
    }
  }
}
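
To show where a value like the one above could end up, the sketch below maps the snake_case thinking_budget setting onto a camelCase thinkingConfig.thinkingBudget entry in a Gemini-style generation config. The class and method names are hypothetical, and the assumption that the budget is sent under generationConfig in the outgoing request is mine rather than something stated in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; illustrates the field mapping only, not the PR's request builder. */
final class GenerationConfigSketch {

    /**
     * Builds the generation config portion of a request body.
     * The thinkingConfig object is omitted entirely when no budget was configured.
     */
    static Map<String, Object> generationConfig(Integer thinkingBudget) {
        Map<String, Object> generationConfig = new LinkedHashMap<>();
        if (thinkingBudget != null) {
            Map<String, Object> thinkingConfig = new LinkedHashMap<>();
            thinkingConfig.put("thinkingBudget", thinkingBudget);
            generationConfig.put("thinkingConfig", thinkingConfig);
        }
        return generationConfig;
    }

    public static void main(String[] args) {
        System.out.println(generationConfig(256));
        System.out.println(generationConfig(null));
    }
}
```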

@DonalEvans DonalEvans added the >enhancement, :ml, Team:ML, and v9.2.0 labels Aug 26, 2025
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner left a comment

Great work! I left a few comments.

DonalEvans and others added 3 commits August 27, 2025 14:47
- Move transport version checks from ThinkingConfig to
  GoogleVertexAiChatCompletionServiceSettings
- Remove default value argument from ThinkingConfig.of()
- Add test coverage for ServiceUtils.extractOptionalPositiveInteger()
  and extractOptionalInteger()
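
The first commit note can be pictured with a generic sketch of version-gated serialization, in which the enclosing settings object decides whether to write the optional budget based on the version of the node it is talking to. This uses plain java.io streams and an integer version rather than Elasticsearch's transport classes, and the class name, method names, and version constant are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of a version check living in the enclosing settings writer. */
final class VersionGatedSettingsSketch {

    /** Made-up version id at which the optional budget was added to the wire format. */
    static final int THINKING_BUDGET_ADDED = 2;

    /** Serializes the settings, writing the budget only for peers that understand it. */
    static byte[] write(int peerVersion, String modelId, Integer thinkingBudget) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(modelId);
            if (peerVersion >= THINKING_BUDGET_ADDED) {
                // Presence flag first, then the value if there is one.
                out.writeBoolean(thinkingBudget != null);
                if (thinkingBudget != null) {
                    out.writeInt(thinkingBudget);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the budget back, returning null for older peers or when it was not set. */
    static Integer readThinkingBudget(int peerVersion, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF(); // skip the model id
            if (peerVersion < THINKING_BUDGET_ADDED) {
                return null;
            }
            if (in.readBoolean()) {
                return in.readInt();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the check in the outer writer, the nested config object itself does not need to know anything about wire versions.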
@jonathan-buttner

Just wanted to capture our discussion from earlier this week. After thinking about this more, I think it probably makes sense to move the thinking budget settings to task_settings instead of service_settings. This gives us more flexibility in the future, because the user could in theory override task_settings on a per-request basis. Because this is for chat completion, which adheres to a strict schema format, we won't be able to allow individual requests to override the task settings, but users can at least set them in the inference endpoint creation request.

@jonathan-buttner jonathan-buttner left a comment

Looking great, just a few comments.

@jonathan-buttner jonathan-buttner left a comment

Great work!

@DonalEvans DonalEvans merged commit 1a1954f into elastic:main Sep 4, 2025
33 checks passed
@DonalEvans DonalEvans deleted the add-gemini-thinking-budget branch September 4, 2025 16:13
3 participants