Gemini models in Helm installation broken #995

@MatthiasVervaet

What happened?

I configured the Helm chart with this Gemini config:

additionalEnvVars:
- name: CLUSTER_NAME
  value: "my-fancy-cluster"
- name: GEMINI_API_KEY
  valueFrom:
    secretKeyRef:
      name: holmes-secrets
      key: gemini-api-key
- name: TOOL_SCHEMA_NO_PARAM_OBJECT_IF_NO_PARAMS
  value: "true"  

modelList:
  gemini-2.5-pro:
    api_key: "{{ env.GEMINI_API_KEY }}"
    model: gemini/gemini-2.5-pro
    temperature: 1
  gemini-2.5-flash:
    api_key: "{{ env.GEMINI_API_KEY }}"
    model: gemini/gemini-2.5-flash
    temperature: 1
  ggemini-live-2.5-flash-preview:
    api_key: "{{ env.GEMINI_API_KEY }}"
    model: gemini/gemini-live-2.5-flash-preview
    temperature: 1

The container spun up fine, but when I executed an API call to the forwarded service using this curl command:

curl --location 'http://localhost:8080/api/chat' \
--header 'Content-Type: application/json' \
--data '{"ask": "list pods in namespace holmes-poc?", "model": "gemini-2.5-pro"}'

the request failed with a 500 and these logs:

setting up colored logging
2025-09-25 12:31:23.712 INFO     logger initialized using INFO log level
2025-09-25 12:31:23.712 INFO     No robusta config in /etc/robusta/config/active_playbooks.yaml
2025-09-25 12:31:23.712 INFO     Not connecting to Robusta platform - robusta token not provided - using ROBUSTA_AI will not be possible
2025-09-25 12:31:23.715 INFO     Loaded models: ['gemini-2.5-flash', 'gemini-2.5-pro', 'ggemini-live-2.5-flash-preview']
2025-09-25 12:31:23.715 INFO     Initializing sentry for production environment...
2025-09-25 12:31:24.766 INFO     Updating status of holmes
2025-09-25 12:31:24.767 INFO     Robusta store not initialized. Skipping upserting holmes status.
2025-09-25 12:31:24.831 INFO     Core investigation toolset loaded
2025-09-25 12:31:25.002 INFO     ✅ Toolset core_investigation
2025-09-25 12:31:25.003 INFO     ✅ Toolset internet
2025-09-25 12:31:25.005 INFO     ❌ Toolset robusta: Data access layer is disabled
2025-09-25 12:31:25.009 INFO     ✅ Toolset kubernetes/logs
2025-09-25 12:31:25.038 INFO     discovered service with label-selector: `app=kube-prometheus-stack-prometheus` at url: `http://kube-prometheus-kube-prome-prometheus.kube-prometheus.svc.cluster.local:9090`
2025-09-25 12:31:25.039 INFO     Prometheus auto discovered at url http://kube-prometheus-kube-prome-prometheus.kube-prometheus.svc.cluster.local:9090
2025-09-25 12:31:25.076 INFO     ✅ Toolset prometheus/metrics
2025-09-25 12:31:25.119 INFO     ✅ Toolset kubernetes/core
2025-09-25 12:31:25.235 INFO     Robusta store not initialized. Skipping sync holmes toolsets.
2025-09-25 12:31:25.235 INFO     Enabled toolsets: ['kubernetes/core', 'core_investigation', 'internet', 'prometheus/metrics', 'kubernetes/logs']
2025-09-25 12:31:25.236 INFO     Disabled toolsets: ['slab', 'confluence', 'kubernetes/live-metrics', 'kubernetes/kube-prometheus-stack', 'kubernetes/kube-lineage-extras', 'argocd/core', 'helm/core', 'robusta', 'opensearch/status', 'grafana/loki', 'grafana/tempo', 'newrelic', 'grafana/grafana', 'notion', 'kafka/admin', 'datadog/logs', 'datadog/general', 'datadog/metrics', 'datadog/traces', 'datadog/rds', 'opensearch/logs', 'opensearch/traces', 'coralogix/logs', 'rabbitmq/core', 'git', 'bash', 'MongoDBAtlas', 'runbook', 'azure/sql', 'ServiceNow']
2025-09-25 12:31:25,289 INFO     Started server process [1]
2025-09-25 12:31:25,290 INFO     Waiting for application startup.
2025-09-25 12:31:25,291 INFO     Application startup complete.
2025-09-25 12:31:25,292 INFO     Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit)
2025-09-25 12:34:21.298 INFO     Using selected model: gemini-2.5-pro
2025-09-25 12:34:21.298 INFO     Creating LLM with model: gemini-2.5-pro
12:34:21 - LiteLLM:INFO: utils.py:3363 - 
LiteLLM completion() model= gemini-2.5-pro; provider = gemini
2025-09-25 12:34:21.344 INFO     
LiteLLM completion() model= gemini-2.5-pro; provider = gemini

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

2025-09-25 12:34:22.754 ERROR    Error in /api/chat: litellm.BadRequestError: VertexAIException BadRequestError - {
  "error": {
    "code": 400,
    "message": "CachedContent can not be used with GenerateContent request setting system_instruction, tools or tool_config.\n\nProposed fix: move those values to CachedContent from GenerateContent request.",
    "status": "INVALID_ARGUMENT"
  }
}

Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py", line 2048, in completion
    response = client.post(url=url, headers=headers, json=data)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 782, in post
    raise e
  File "/venv/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 764, in post
    response.raise_for_status()
  File "/venv/lib/python3.11/site-packages/httpx/_models.py", line 763, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=AIzaSyBR7pKGjSuoPYakDxVz09RfxZ8ovzrUJrk'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/litellm/main.py", line 2819, in completion
    response = vertex_chat_completion.completion(  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py", line 2052, in completion
    raise VertexAIError(
litellm.llms.vertex_ai.common_utils.VertexAIError: {
  "error": {
    "code": 400,
    "message": "CachedContent can not be used with GenerateContent request setting system_instruction, tools or tool_config.\n\nProposed fix: move those values to CachedContent from GenerateContent request.",
    "status": "INVALID_ARGUMENT"
  }
}


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/server.py", line 368, in chat
    llm_call = ai.messages_call(messages=messages)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/holmes/core/tool_calling_llm.py", line 296, in messages_call
    return self.call(
           ^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/sentry_sdk/tracing_utils.py", line 867, in sync_wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/app/holmes/core/tool_calling_llm.py", line 349, in call
    full_response = self.llm.completion(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/app/holmes/core/llm.py", line 273, in completion
    result = litellm_to_use.completion(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/litellm/utils.py", line 1343, in wrapper
    raise e
  File "/venv/lib/python3.11/site-packages/litellm/utils.py", line 1218, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/litellm/main.py", line 3624, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2301, in exception_type
    raise e
  File "/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 1293, in exception_type
    raise BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: VertexAIException BadRequestError - {
  "error": {
    "code": 400,
    "message": "CachedContent can not be used with GenerateContent request setting system_instruction, tools or tool_config.\n\nProposed fix: move those values to CachedContent from GenerateContent request.",
    "status": "INVALID_ARGUMENT"
  }
}

2025-09-25 12:34:23,055 INFO     127.0.0.1:60990 - "POST /api/chat HTTP/1.1" 500
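
For reference, the conflict named in the error message looks like it can be hit directly against the Generative Language API as well. The sketch below is hypothetical (it is not Holmes' actual request); the cached-content name is a placeholder and an existing cachedContents resource would be needed:

# hypothetical reproduction, not taken from the Holmes logs above
curl --location "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=$GEMINI_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "cachedContent": "cachedContents/REPLACE_WITH_REAL_CACHE",
  "systemInstruction": {"parts": [{"text": "You are a Kubernetes assistant."}]},
  "contents": [{"role": "user", "parts": [{"text": "list pods in namespace holmes-poc?"}]}]
}'

Per the error text, combining cachedContent with systemInstruction, tools, or toolConfig in a single generateContent request should return the same INVALID_ARGUMENT.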


Thanks!

What did you expect to happen?

A correct response, without crashing.

How can we reproduce it (as minimally and precisely as possible)?

Run the same command(s) against the holmes:0.14.2 container image (currently the latest).
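
For completeness, a rough end-to-end sketch of the setup (the chart repo, release name, and resource names below are assumptions and may differ in your environment; 5050 is the port Uvicorn listens on per the logs above):

# assumed install flow, adjust names to your environment
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm install holmes robusta/holmes --values values.yaml   # values.yaml holds the additionalEnvVars and modelList above
kubectl port-forward deploy/holmes 8080:5050 &            # assumed deployment name; 5050 taken from the Uvicorn log line
curl --location 'http://localhost:8080/api/chat' \
--header 'Content-Type: application/json' \
--data '{"ask": "list pods in namespace holmes-poc?", "model": "gemini-2.5-pro"}'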

Anything else we need to know?

No response
