Enable the thinking_config parameter for the gemini-2.5-flash and gemini-2.5-pro models in Vertex AI, allowing users to fine‑tune the model’s internal reasoning (“thinking”) process via the Gen AI SDK for Python. #30939
Shobhit0109 announced in Ideas
We could implement something like this:

```python
import os
import time

from langchain.chat_models import init_chat_model
from langchain.schema import BaseMessage
from google.genai import types

# ─────────── Environment setup ───────────
# Point to your service-account key (or rely on gcloud ADC).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "file.json"

# ─────────── Initialize the chat model ───────────
# Either of the two parameters below would work; only one is needed.
chat = init_chat_model(
    # model="gemini-2.0-flash",
    model="gemini-2.5-flash-preview-04-17",
    model_provider="google_vertexai",
    temperature=0.0,
    # Option 1: pass a full GenerateContentConfig.
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,
        ),
    ),
    # Option 2: pass the ThinkingConfig directly.
    # thinking_config=types.ThinkingConfig(
    #     thinking_budget=1024,
    # ),
)

# ─────────── Invocation ───────────
start_time = time.time()
response: BaseMessage = chat.invoke("Tell me a fun fact about space.")
end_time = time.time()
print(f"\n\tTime taken: {end_time - start_time} seconds\n\n{response.content}")

# Observed timings: gemini-2.0-flash ~3 s; with thinking enabled ~9 s.
```
Feature request
Currently, Vertex AI supports thinking budgets for 2.5 Flash in the console and exposes the thinking_budget field in the Python SDK’s ThinkingConfig class, but this option isn’t yet surfaced in our managed SDK wrappers or service APIs for gemini-2.5-flash and gemini-2.5-pro calls.
The ability to manually set a thinking budget (up to 24,576 tokens, with a floor of 1,024 tokens) or disable thinking entirely (budget = 0) is documented in Google Cloud’s Generative AI docs.
Gemini 2.5 Pro and Flash are preview models designed for advanced reasoning tasks, with 2.5 Pro offering state‑of‑the‑art accuracy and 2.5 Flash providing a cost‑efficient compromise with controllable reasoning.
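For reference, the Gen AI SDK for Python already accepts this configuration when pointed at Vertex AI, which is what the wrappers would ultimately delegate to. A minimal sketch of a direct call (the project and location values are placeholders):

```python
from google import genai
from google.genai import types

# Direct Gen AI SDK call against Vertex AI; "my-project" and "us-central1"
# are placeholders for your own project and region.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me a fun fact about space.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,  # floor 1,024; ceiling 24,576; 0 disables thinking
        ),
    ),
)
print(response.text)
```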
Motivation
Gemini 2.5 Flash and Pro models default to an automatic internal thinking process capped at 8,192 tokens, with no direct way for users to adjust this behavior.
Allowing developers to explicitly configure the thinking_config parameter would enable precise control over reasoning depth, balancing cost, latency, and output quality across diverse workloads.
Proposal (If applicable)
I propose we extend the Vertex AI Python client’s generate_content (and related) methods to accept an optional thinking_config parameter for both gemini-2.5-flash and gemini-2.5-pro, as illustrated in the Gen AI SDK for Python docs.
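Concretely, the wrapper-level surface might look like the sketch below. The `thinking_config` keyword on ChatVertexAI is hypothetical; it is the shape this proposal suggests, not an existing parameter:

```python
from langchain_google_vertexai import ChatVertexAI
from google.genai import types

# Hypothetical: ChatVertexAI forwarding a ThinkingConfig through to the
# underlying generate_content call.
llm = ChatVertexAI(
    model="gemini-2.5-pro",
    thinking_config=types.ThinkingConfig(thinking_budget=2048),
)
print(llm.invoke("Summarize the Riemann hypothesis in one sentence.").content)
```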