[ML] Support Gemini thinking budget in inference API #133599

DonalEvans · 2025-08-26T22:07:10Z

Adding support for configuring the thinkingBudget for Gemini 2.5 models when creating chat completion inference endpoints. The thinking_budget field is nested inside the thinking_config object in task_settings.

- Added ThinkingConfig class to contain the thinking_budget field. This results in a less flat structure for the PUT _inference/chat_completion/ call but will make adding support for include_thoughts easier in the future
- Added extractOptionalInteger() method to ServiceUtils
- Added unit tests for the ThinkingConfig class
- Updated existing tests to account for the new object and field
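
For readers unfamiliar with the parsing pattern, the nested object could be read roughly as follows. This is a minimal sketch, not the code in this PR: the class ThinkingConfigSketch, the fromMap() method, and the body of extractOptionalInteger() are assumptions made for illustration, and only the names ThinkingConfig, extractOptionalInteger, thinking_config and thinking_budget come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names follow the PR description, bodies are assumptions.
final class ThinkingConfigSketch {

    // Removes `key` from the map and returns its value as an Integer.
    // Returns null when the key is absent; throws when the value is not an integer.
    static Integer extractOptionalInteger(Map<String, Object> map, String key) {
        Object value = map.remove(key);
        if (value == null) {
            return null;
        }
        if (value instanceof Integer i) {
            return i;
        }
        throw new IllegalArgumentException("[" + key + "] must be an integer, got: " + value);
    }

    // Holder for the nested thinking_config object. Keeping it as its own type
    // leaves room for further fields (for example include_thoughts) later.
    record ThinkingConfig(Integer thinkingBudget) {
        static final ThinkingConfig EMPTY = new ThinkingConfig(null);

        @SuppressWarnings("unchecked")
        static ThinkingConfig fromMap(Map<String, Object> settings) {
            Object nested = settings.remove("thinking_config");
            if (nested == null) {
                return EMPTY;
            }
            if (!(nested instanceof Map)) {
                throw new IllegalArgumentException("[thinking_config] must be an object");
            }
            // Copy so that removing keys does not mutate the caller's nested map.
            Map<String, Object> copy = new HashMap<>((Map<String, Object>) nested);
            return new ThinkingConfig(extractOptionalInteger(copy, "thinking_budget"));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new HashMap<>();
        nested.put("thinking_budget", 256);
        Map<String, Object> settings = new HashMap<>();
        settings.put("thinking_config", nested);
        System.out.println(ThinkingConfig.fromMap(settings));
    }
}
```

Returning null for a missing key keeps the field optional, so endpoints created without a thinking_config continue to parse.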

These changes enable elastic/kibana#227590 to be completed

Specification PR: elastic/elasticsearch-specification#5257

Example usage:

PUT _inference/chat_completion/my_chat_completion_with_thinking_budget
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": <service account info>,
    "model_id": "gemini-2.5-pro",
    "location": "us-central1",
    "project_id": <project id>
  },
  "task_settings" : {
    "thinking_config": {
      "thinking_budget": 256
    }
  }
}
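
As an illustration of where the value ends up, the snake_case thinking_budget from the endpoint configuration corresponds to the camelCase thinkingBudget that Gemini expects. The sketch below assumes the value is sent under generationConfig.thinkingConfig in the outgoing request body, which matches the public Gemini REST API as far as I know; the class and method names are hypothetical and this is not the request entity code from this PR.

```java
// Hypothetical sketch of the snake_case -> camelCase mapping for the outgoing request.
final class GeminiRequestSketch {

    // Builds the JSON fragment carrying the thinking budget.
    // Assumption: the budget lives under generationConfig.thinkingConfig.thinkingBudget.
    // When no budget is configured the fragment is omitted (empty object here).
    static String generationConfigJson(Integer thinkingBudget) {
        if (thinkingBudget == null) {
            return "{}";
        }
        return "{\"generationConfig\":{\"thinkingConfig\":{\"thinkingBudget\":" + thinkingBudget + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(generationConfigJson(256));
    }
}
```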

elasticsearchmachine · 2025-08-26T22:07:34Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner

Great work! I left a few comments.

...ava/org/elasticsearch/xpack/inference/services/googlevertexai/completion/ThinkingConfig.java

...nference/services/googlevertexai/completion/GoogleVertexAiChatCompletionServiceSettings.java

...icsearch/xpack/inference/services/googleaistudio/request/completion/ThinkingConfigTests.java

...e/services/googlevertexai/request/GoogleVertexAiUnifiedChatCompletionRequestEntityTests.java

.../plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ServiceUtils.java

- Move transport version checks from ThinkingConfig to GoogleVertexAiChatCompletionServiceSettings
- Remove default value argument from ThinkingConfig.of()
- Add test coverage for ServiceUtils.extractOptionalPositiveInteger() and extractOptionalInteger()

jonathan-buttner · 2025-08-29T20:02:03Z

Just wanted to capture our discussion from earlier this week. After thinking about this more, I think it probably makes sense to move the thinking budget settings to the task_settings instead of the service_settings. This gives us more flexibility in the future because the user could, in theory, override the task_settings on a per-request basis. Because this is for chat completion, which adheres to a strict schema format, we won't be able to allow individual requests to override the task settings, but they can at least set them during the inference endpoint creation request.
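
The flexibility described here can be pictured with a small sketch. It assumes a hypothetical TaskSettings record with an overriddenBy() method; neither is taken from this PR, and chat completion would not use the request-level path today, as noted above.

```java
// Hypothetical sketch of endpoint-level task settings with an optional per-request override.
final class TaskSettingsOverrideSketch {

    record TaskSettings(Integer thinkingBudget) {

        // Request-level values win where present; otherwise the value stored
        // at endpoint creation time is used unchanged.
        TaskSettings overriddenBy(TaskSettings request) {
            if (request == null || request.thinkingBudget() == null) {
                return this;
            }
            return new TaskSettings(request.thinkingBudget());
        }
    }

    public static void main(String[] args) {
        TaskSettings endpoint = new TaskSettings(256);
        // No override supplied: the endpoint's stored settings apply.
        System.out.println(endpoint.overriddenBy(null));
        // Override supplied: the request's budget replaces the stored one.
        System.out.println(endpoint.overriddenBy(new TaskSettings(1024)));
    }
}
```

Service settings, by contrast, are fixed when the endpoint is created, which is why placing the budget there would rule out this kind of override later.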

jonathan-buttner

Looking great, just a few comments.

...k/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionTaskSettings.java

jonathan-buttner

Great work!

DonalEvans added the >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.2.0 labels Aug 26, 2025

Merge branch 'main' into add-gemini-thinking-budget

bd109a0

jonathan-buttner reviewed Aug 27, 2025

DonalEvans and others added 3 commits August 27, 2025 14:47

Apply review feedback

a39936f

Merge branch 'main' into add-gemini-thinking-budget

819aac6

Fix transport versions

9e42997

DonalEvans requested a review from jonathan-buttner August 28, 2025 15:59

DonalEvans and others added 4 commits August 28, 2025 09:14

Merge branch 'main' into add-gemini-thinking-budget

a61a649

Merge branch 'main' into add-gemini-thinking-budget

95160b5

Update docs/changelog/133599.yaml

56fbae2

Merge branch 'main' into add-gemini-thinking-budget

5436712

DonalEvans added 3 commits August 29, 2025 13:30

Merge branch 'main' into add-gemini-thinking-budget

4171c69

Move thinking config settings into task settings

546f5ad

Remove unnecessary override from test

b10aed9

jonathan-buttner reviewed Sep 2, 2025

...k/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionTaskSettings.java (outdated)

...k/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionTaskSettings.java (outdated)

DonalEvans added 2 commits September 2, 2025 08:20

Remove transport version checks for task settings class

90e2a95

Merge branch 'main' into add-gemini-thinking-budget

ef70439

DonalEvans requested a review from jonathan-buttner September 2, 2025 16:51

jonathan-buttner approved these changes Sep 2, 2025

DonalEvans added 5 commits September 2, 2025 11:28

Merge branch 'main' into add-gemini-thinking-budget

b458892

Merge branch 'main' into add-gemini-thinking-budget

ac00f5b

Merge branch 'main' into add-gemini-thinking-budget

0f03736

Merge branch 'main' into add-gemini-thinking-budget

779819a

Merge branch 'main' into add-gemini-thinking-budget

83e5135

DonalEvans mentioned this pull request Sep 3, 2025

[ML] Include thinking_config in GoogleVertexAITaskSettings elastic/elasticsearch-specification#5257

Merged

DonalEvans merged commit 1a1954f into elastic:main Sep 4, 2025
33 checks passed

DonalEvans deleted the add-gemini-thinking-budget branch September 4, 2025 16:13
