Replies: 1 comment
You can try adding something like "Provide a direct answer without explaining your process" or "Prioritize speed over depth" to the prompt. Based on recent research, I would put the instruction first and then repeat it at the end. Of course, setting this through the API would be better.
Thank you for the v1.6.0 update; I'm finding the addition of structured output a great step forward.
I'm using LLM Vision for driveway motion analysis. Ahead of the Gemini 2.5 Flash-Lite shutdown, I have tried Gemini 3 Flash but am seeing higher latency: responses take several seconds longer than with 2.5 Flash-Lite. I believe one driver is the model defaulting to dynamic/high reasoning, which is overkill for simple person/vehicle identification.
The Request:
Would it be possible to add support for the thinking_level parameter in the service call configuration?
According to the Google API docs, Gemini 3 Flash supports a MINIMAL level which is designed specifically to minimise latency for tasks like this. Adding a toggle in the UI (or simply allowing it as a parameter in the YAML) would let us turn off unnecessary reasoning and get back to the sub-2-second response times seen with the Lite models.
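To illustrate what the integration would need to send, here is a minimal sketch of the request body for Google's generateContent endpoint with a thinking level set. The `thinkingConfig`/`thinkingLevel` field names follow Google's Gemini 3 REST docs as I understand them, but treat them (and the helper function itself) as assumptions, not LLM Vision's actual implementation:

```python
import json

# Hypothetical helper: builds a generateContent request body with a
# thinkingLevel set. Field names are assumptions based on Google's
# Gemini 3 REST documentation.
def build_request(prompt: str, thinking_level: str = "minimal") -> str:
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # The knob this feature request is about: cap the model's
            # internal reasoning to reduce latency.
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body)

print(build_request("Is a person or vehicle present in the driveway?"))
```

The point is that exposing the parameter should be a small change: it is one extra field in the generation config, passed through from the service call.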
Testing the models in AI Studio with my driveway motion prompts, Gemini 3 Flash is significantly faster when thinking is turned off or reduced.
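For concreteness, a hypothetical YAML service call showing how this could look. The `thinking_level` option does not exist in LLM Vision today, and the surrounding field names and model id are assumptions modelled on a typical `llmvision.image_analyzer` call:

```yaml
# Hypothetical: thinking_level is the requested new parameter, and the
# other fields are assumptions for illustration only.
action: llmvision.image_analyzer
data:
  provider: Google
  model: gemini-3-flash-preview   # model id is an assumption
  message: "Is a person or vehicle present in the driveway? Answer briefly."
  image_entity:
    - camera.driveway
  thinking_level: minimal         # the requested new parameter
```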