Enable the thinking_config parameter for the gemini-2.5-flash and gemini-2.5-pro models in Vertex AI, allowing users to fine‑tune the model’s internal reasoning (“thinking”) process via the Gen AI SDK for Python. #30939
Shobhit0109 announced in Ideas
We could implement something like this:

```python
import os
import time

from langchain.chat_models import init_chat_model
from langchain.schema import BaseMessage
from google.genai import types

# ─────────── Environment setup ───────────
# Point to your service-account key (or rely on gcloud ADC).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "file.json"

# ─────────── Initialize the chat model ───────────
# Either of the two parameters below would work; only one is needed.
chat = init_chat_model(
    # model="gemini-2.0-flash",
    model="gemini-2.5-flash-preview-04-17",
    model_provider="google_vertexai",
    temperature=0.0,
    # Option 1: pass a full GenerateContentConfig.
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,
        ),
    ),
    # Option 2: pass the ThinkingConfig directly.
    # thinking_config=types.ThinkingConfig(
    #     thinking_budget=1024,
    # ),
)

# ─────────── Invocation ───────────
start_time = time.time()
response: BaseMessage = chat.invoke("Tell me a fun fact about space.")
end_time = time.time()
print(f"\n\tTime taken: {end_time - start_time} seconds\n\n{response.content}")

# Observed timings: gemini-2.0-flash ~3 s; with thinking enabled ~9 s.
```
Feature request
Currently, Vertex AI supports thinking budgets for 2.5 Flash in the console and exposes the thinking_budget field in the Python SDK’s ThinkingConfig class, but this option isn’t yet surfaced in our managed SDK wrappers or service APIs for gemini-2.5-flash and gemini-2.5-pro calls.
The ability to manually set a thinking budget (up to 24,576 tokens, with a floor of 1,024 tokens) or disable thinking entirely (budget = 0) is documented in Google Cloud’s Generative AI docs.
Gemini 2.5 Pro and Flash are preview models designed for advanced reasoning tasks, with 2.5 Pro offering state‑of‑the‑art accuracy and 2.5 Flash providing a cost‑efficient compromise with controllable reasoning.
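For reference, the Gen AI SDK for Python already accepts this configuration when pointed at Vertex AI, which is what the wrappers would ultimately delegate to. A minimal sketch of a direct call (the project and location values are placeholders):

```python
from google import genai
from google.genai import types

# Direct Gen AI SDK call against Vertex AI; "my-project" and "us-central1"
# are placeholders for your own project and region.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me a fun fact about space.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,  # floor 1,024; ceiling 24,576; 0 disables thinking
        ),
    ),
)
print(response.text)
```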
Motivation
Gemini 2.5 Flash and Pro models default to an automatic internal thinking process capped at 8,192 tokens, with no direct way for users to adjust this behavior.
Allowing developers to explicitly configure the thinking_config parameter would enable precise control over reasoning depth, balancing cost, latency, and output quality across diverse workloads.
Proposal (If applicable)
I propose we extend the Vertex AI Python client’s generate_content (and related) methods to accept an optional thinking_config parameter for both gemini-2.5-flash and gemini-2.5-pro, as illustrated in the Gen AI SDK for Python docs.
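Concretely, the wrapper-level surface might look like the sketch below. The `thinking_config` keyword on ChatVertexAI is hypothetical; it is the shape this proposal suggests, not an existing parameter:

```python
from langchain_google_vertexai import ChatVertexAI
from google.genai import types

# Hypothetical: ChatVertexAI forwarding a ThinkingConfig through to the
# underlying generate_content call.
llm = ChatVertexAI(
    model="gemini-2.5-pro",
    thinking_config=types.ThinkingConfig(thinking_budget=2048),
)
print(llm.invoke("Summarize the Riemann hypothesis in one sentence.").content)
```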