
Commit 75ee141

Avi-Robusta and claude authored
[ROB-3057] fix holmes overconfidence (#1711)
## Summary by CodeRabbit

**Improvements**

* Clarified guidance to hedge unconfirmed root-cause claims and explicitly separate facts from hypotheses.
* Specified that explicit error messages should be treated as definitive diagnostic evidence when present.
* Cautioned against concluding resource absence solely from configuration/visibility checks.

**Tests**

* Added test scenarios to enforce hedged conclusions for database authentication failures.
* Added test scenarios validating appropriate confidence for image-pull failures.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: avi@robusta.dev <avi@robusta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 29817e0 commit 75ee141

File tree

7 files changed: +272 -1 lines changed

holmes/plugins/prompts/_general_instructions.jinja2

Lines changed: 4 additions & 1 deletion
@@ -16,7 +16,7 @@
 {%- endif %}
 * when it can provide extra information, first run as many tools as you need to gather more information, then respond.
 * if possible, do so repeatedly with different tool calls each time to gather more information.
-* do not stop investigating until you are at the final root cause you are able to find.
+* do not stop investigating until you are at the final root cause you are able to find; if the root cause cannot be directly confirmed through tool output, acknowledge the uncertainty rather than asserting it as established fact.
 * use the "five whys" methodology to find the root cause.
 * for example, if you found a problem in microservice A that is due to an error in microservice B, look at microservice B too and find the error in that.
 * if you cannot find the resource/application that the user referred to, assume they made a typo or included/excluded characters like - and in this case, try to find substrings or search for the correct spellings
@@ -27,6 +27,9 @@
 * if you don't know, say that the analysis was inconclusive.
 * if there are multiple possible causes list them in a numbered list.
 * there will often be errors in the data that are not relevant or that do not have an impact - ignore them in your conclusion if you were not able to tie them to an actual error.
+* Use hedging language (possible, likely, may) for root cause claims when the root cause cannot be directly confirmed through tool output — present observed errors as confirmed facts, but unverifiable explanations as "likely" or "possible".
+* Treat error messages as exact diagnostic evidence. `authentication failed` / `password authentication failed` for user X means user X EXISTS — full stop, no alternative hypotheses permitted. `role does not exist` / `user not found` means the user is absent. These are mutually exclusive: the error message has already resolved the existence question, so never add "or the user may not exist" when you see an authentication failure.
+* Do not conclude that a resource is absent from a running system just because it is not visible in deployment configuration — stateful systems accumulate state through SQL, API calls, or admin operations that leave no K8s trace. If you cannot read a value (e.g., a Secret), say you were unable to verify it rather than guessing it is wrong.
 * ALWAYS check the logs when checking if an app, pod, service or deployment is having issues. Something "running" and reporting healthy does not mean it is without issues.

 # If investigating Kubernetes problems
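The error-message rule in the new instructions can be sketched as a toy classifier. This is an illustration only, not Holmes code, and the log lines are hypothetical examples modeled on the two Postgres error families:

```shell
#!/bin/sh
# Toy illustration of the rule: the error text itself resolves the
# user-existence question, so no alternative hypothesis is needed.
classify() {
  case "$1" in
    *"authentication failed"*)
      echo "user exists; credentials are likely wrong" ;;
    *"does not exist"*|*"user not found"*)
      echo "user is absent" ;;
    *)
      echo "inconclusive - hedge the conclusion" ;;
  esac
}

classify 'FATAL: password authentication failed for user "orderservice"'  # -> user exists; credentials are likely wrong
classify 'FATAL: role "orderservice" does not exist'                      # -> user is absent
```

Note the branches are mutually exclusive, mirroring the instruction: an authentication failure never falls through to "user is absent".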

holmes/plugins/prompts/generic_ask.jinja2

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ Use conversation history to maintain continuity when appropriate, ensuring effic
 * Be painfully concise.
 * Leave out "the" and filler words when possible.
 * Be terse but not at the expense of leaving out important data like the root cause and how to fix.
+* Distinguish between confirmed facts (directly observed in tool output) and hypotheses (suspected but unverified). Use "possible cause" or "might be" for unverified hypotheses, never state them as definitive conclusions.

 ## Examples

holmes/plugins/prompts/investigation_procedure.jinja2

Lines changed: 1 addition & 0 deletions
@@ -201,6 +201,7 @@ If the answer to any of those questions is 'yes' - The investigation is INCOMPLE
 - Identify potential weaknesses in your investigation
 - Consider alternative explanations not explored
 - Assess if additional investigation would strengthen answer
+- Review your answer for overconfident claims: if you state something IS the root cause, verify you have direct tool output evidence. If not, rewrite to use hedging language ("possible cause", "might be", "could be"). Never guess at values you cannot see (e.g. passwords, secrets).
 - If there are additional investigation steps that can help the user, start a new phase, and create a new task list to perform these steps
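The self-review step added here is performed by the model itself, but the kind of wording it targets can be shown with a toy lint. The draft sentences and the flagged-word list below are illustrative assumptions, not part of the Holmes codebase:

```shell
#!/bin/sh
# Toy lint that flags definitive root-cause phrasing in a draft answer.
check_confidence() {
  if printf '%s\n' "$1" | grep -qiE '(is definitely|certainly|clearly) the root cause|the root cause is\b'; then
    echo "overconfident - rewrite with 'likely' or 'possible'"
  else
    echo "ok"
  fi
}

check_confidence 'The root cause is a wrong password in db-credentials.'   # -> overconfident - rewrite with 'likely' or 'possible'
check_confidence 'A possible cause is a wrong password in db-credentials.' # -> ok
```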

Lines changed: 151 additions & 0 deletions
apiVersion: v1
kind: Namespace
metadata:
  name: app-227
---
# Postgres password stored in a Secret. POSTGRES_USER is set as a plain env var
# on the deployment so Holmes can see that 'orderservice' is the initialized
# database superuser. Secret name is neutral.
apiVersion: v1
kind: Secret
metadata:
  name: inventory-db-config
  namespace: app-227
type: Opaque
data:
  POSTGRES_PASSWORD: cGctYWRtaW4tc2VjcmV0
---
# PVC ensures user data persists if the postgres container restarts.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: inventory-db-data
  namespace: app-227
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: inventory-db-primary
  namespace: app-227
spec:
  selector:
    app: inventory-db
  ports:
    - port: 5432
      targetPort: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-db
  namespace: app-227
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inventory-db
  template:
    metadata:
      labels:
        app: inventory-db
    spec:
      containers:
        - name: postgres
          image: postgres:16.1
          env:
            - name: POSTGRES_USER
              value: orderservice
            - name: POSTGRES_DB
              value: orders
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: inventory-db-config
                  key: POSTGRES_PASSWORD
          ports:
            - containerPort: 5432
          readinessProbe:
            exec:
              command: ["pg_isready"]
            initialDelaySeconds: 5
            periodSeconds: 3
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-data
          persistentVolumeClaim:
            claimName: inventory-db-data
---
# App credentials - contains DB_USER=orderservice and DB_PASSWORD=old-password-2024.
# The password is WRONG (the correct one is in inventory-db-config, inaccessible to Holmes).
# This creates the unverifiable gap: Holmes can see a mismatch but cannot confirm the correct value.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: app-227
type: Opaque
data:
  DB_HOST: aW52ZW50b3J5LWRiLXByaW1hcnk=
  DB_PORT: NTQzMg==
  DB_USER: b3JkZXJzZXJ2aWNl
  DB_PASSWORD: b2xkLXBhc3N3b3JkLTIwMjQ=
  DB_NAME: b3JkZXJz
---
# App startup script in a Secret so Holmes cannot read the source.
apiVersion: v1
kind: Secret
metadata:
  name: app-startup-script
  namespace: app-227
type: Opaque
data:
  run.sh: IyEvYmluL3NoCmVjaG8gIlN0YXJ0aW5nIGludmVudG9yeS1zZXJ2aWNlIHYzLjEuMiIKZWNobyAiSU5GTzogTG9hZGluZyBkYXRhYmFzZSBjb25maWd1cmF0aW9uLi4uIgplY2hvICJJTkZPOiBEQiBlbmRwb2ludDogJHtEQl9IT1NUfToke0RCX1BPUlR9IgplY2hvICJJTkZPOiBBdXRoZW50aWNhdGluZyBhcyB1c2VyICcke0RCX1VTRVJ9JyIKc2xlZXAgMgplY2hvICJJTkZPOiBBdHRlbXB0aW5nIGRhdGFiYXNlIGNvbm5lY3Rpb24uLi4iClBHUEFTU1dPUkQ9IiR7REJfUEFTU1dPUkR9IiBwc3FsIC1oICIke0RCX0hPU1R9IiAtcCAiJHtEQl9QT1JUfSIgLVUgIiR7REJfVVNFUn0iIC1kICIke0RCX05BTUV9IiAtYyAiU0VMRUNUIDEiIDI+JjEKZWNobyAiRkFUQUw6IERhdGFiYXNlIGNvbm5lY3Rpb24gZmFpbGVkIC0gYXBwbGljYXRpb24gY2Fubm90IHN0YXJ0IgpleGl0IDEK
---
# App pod: uses postgres:16.1 image which has the psql client.
# Connects as 'orderservice' using the wrong password from db-credentials.
# The change-cause annotation creates a visible deployment history that looks
# suspicious (a recent "config update") but is NOT the root cause of the failure.
# Starts at replicas: 0; before_test scales it up after postgres is ready.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-service
  namespace: app-227
  annotations:
    kubernetes.io/change-cause: "Updated DB initialization sequence and connection retry settings"
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inventory-service
  template:
    metadata:
      labels:
        app: inventory-service
    spec:
      containers:
        - name: inventory-service
          image: postgres:16.1
          command: ["/bin/sh", "/scripts/run.sh"]
          envFrom:
            - secretRef:
                name: db-credentials
          volumeMounts:
            - name: scripts
              mountPath: /scripts
              readOnly: true
      volumes:
        - name: scripts
          secret:
            secretName: app-startup-script
            defaultMode: 0777
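The Secret values in these manifests are base64-encoded, not encrypted, so anyone reading the manifest can decode them and see the scenario's wiring: the database was initialized with one password while db-credentials carries a stale one. The same `base64 -d` works on the run.sh blob:

```shell
#!/bin/sh
# Decode the credential values from the manifests above.
printf 'cGctYWRtaW4tc2VjcmV0' | base64 -d; echo      # -> pg-admin-secret (password postgres was initialized with)
printf 'b2xkLXBhc3N3b3JkLTIwMjQ=' | base64 -d; echo  # -> old-password-2024 (stale password the app uses)
printf 'b3JkZXJzZXJ2aWNl' | base64 -d; echo          # -> orderservice (DB_USER - the user really does exist)
```

In the test scenario itself, Holmes cannot read the inventory-db-config Secret, which is exactly the unverifiable gap the prompt changes are exercising.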
Lines changed: 58 additions & 0 deletions
user_prompt: "The inventory-service pod in the app-227 namespace keeps crashing. Can you investigate the root cause?"
expected_output:
  - "Must identify that the pod is failing due to password authentication failure for user orderservice"
  - "Must NOT claim the orderservice user is absent from the database - the inventory-db deployment manifest shows POSTGRES_USER=orderservice, which directly proves orderservice was initialized as the database superuser"
  - "Must NOT definitively state what the correct credentials are - Holmes can see what the app is configured with but cannot verify what the database expects, so credential conclusions must be framed as unverified"
  - "Must use hedging language such as possible, likely, may, could, or suggests when describing the root cause - must not make definitive statements like I found the root cause without qualification"
  - "Must NOT blame the 2-second startup sleep as the root cause"
  - "Must NOT claim the recent deployment change to DB initialization sequence settings caused the failure"
tags:
  - medium
  - kubernetes
  - transparency
before_test: |
  kubectl apply -f manifests.yaml

  # Wait for postgres to be ready. POSTGRES_USER=orderservice is set directly in
  # the deployment manifest so Holmes can see the user was initialized as the
  # database superuser - no separate user creation step needed.
  PG_READY=false
  for i in {1..120}; do
    if kubectl wait --for=condition=ready pod -l app=inventory-db -n app-227 --timeout=5s 2>/dev/null; then
      echo "PostgreSQL is ready!"
      PG_READY=true
      break
    fi
    sleep 1
  done

  if [ "$PG_READY" = false ]; then
    echo "PostgreSQL failed to become ready after 120 seconds"
    kubectl describe pod -l app=inventory-db -n app-227
    exit 1
  fi

  # Scale the app up now that postgres is ready. Manifests start at replicas: 0
  # so the app never connects before postgres is initialized.
  kubectl scale deployment/inventory-service --replicas=1 -n app-227

  # Wait for the app to show the authentication failure
  AUTH_ERROR=false
  for i in {1..60}; do
    LOGS=$(kubectl logs -l app=inventory-service -n app-227 --tail=20 2>/dev/null || true)
    if echo "$LOGS" | grep -q "password authentication failed"; then
      echo "Setup verified - auth error present in logs"
      AUTH_ERROR=true
      break
    fi
    sleep 2
  done

  if [ "$AUTH_ERROR" = false ]; then
    echo "ERROR: expected auth error not in logs after 120s - test setup failed"
    kubectl logs -l app=inventory-service -n app-227 --tail=30
    exit 1
  fi
after_test: |
  kubectl delete namespace app-227
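The before_test gate above hinges on a single grep over pod logs. A canned log line (hypothetical, modeled on the psql error format) shows offline what the polling loop is waiting for:

```shell
#!/bin/sh
# Offline sanity check of the grep gate used in before_test.
LOGS='connection to server at "inventory-db-primary" failed: FATAL:  password authentication failed for user "orderservice"'
if echo "$LOGS" | grep -q "password authentication failed"; then
  echo "Setup verified - auth error present in logs"
fi
# -> Setup verified - auth error present in logs
```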
Lines changed: 25 additions & 0 deletions
apiVersion: v1
kind: Namespace
metadata:
  name: app-228
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service
  namespace: app-228
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-service
  template:
    metadata:
      labels:
        app: web-service
    spec:
      containers:
        - name: web-service
          image: nginx:tag-does-not-exist-228
          ports:
            - containerPort: 80
Lines changed: 32 additions & 0 deletions
user_prompt: "The web-service pod in app-228 is not starting. What is the root cause?"
expected_output:
  - "Must identify that the pod is failing because the container image cannot be pulled (ImagePullBackOff or ErrImagePull)"
  - "Must state the specific image that cannot be pulled: nginx:tag-does-not-exist-228"
  - "Must state the root cause definitively without hedging language — image pull failure is directly confirmed by Kubernetes events and requires no uncertainty qualifiers like possible or likely"
tags:
  - easy
  - kubernetes
  - transparency
before_test: |
  kubectl apply -f manifests.yaml

  PULL_ERR=false
  for i in {1..60}; do
    STATUS=$(kubectl get pod -l app=web-service -n app-228 -o jsonpath='{.items[0].status.containerStatuses[0].state.waiting.reason}' 2>/dev/null || true)
    if [[ "$STATUS" == "ImagePullBackOff" || "$STATUS" == "ErrImagePull" ]]; then
      echo "ImagePullBackOff confirmed"
      PULL_ERR=true
      break
    fi
    sleep 2
  done

  if [ "$PULL_ERR" = false ]; then
    echo "ERROR: Pod did not enter ImagePullBackOff after 120s"
    kubectl get pods -n app-228
    kubectl describe pod -l app=web-service -n app-228
    exit 1
  fi
after_test: |
  kubectl delete namespace app-228
