- Introduction
- Environment Setup
- Feature - RCA Quality Checker (check_rca_quality.py)
- Feature - Incident Correlation (correlate_incidents.py)
- Feature - Root Cause Entity Type (find_root_cause_entity_type.py)
- Gotcha - Understanding LLM Behavior
This repo contains programs that use LLMs to demonstrate:
- RCA quality checks, e.g. rating an RCA as clear, vague, or none.
- Incident correlation, e.g. recognizing that a current database issue resembles a previous issue from a given list.
- Tying a root cause to a specific category of known entities, e.g. users, vendors, or internal services.
Set OpenRouter API key (default):
export OPENROUTER_API_KEY=your-key-here
Or use AWS Bedrock:
export AWS_DEFAULT_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0
Test your API key:
./test_openrouter_api.py
Note: If the default model (deepseek/deepseek-chat) returns no results, try an alternative free model:
./check_rca_quality.py --model "x-ai/grok-beta:free" ...
./correlate_incidents.py --model "x-ai/grok-beta:free" ...
./find_root_cause_entity_type.py --model "x-ai/grok-beta:free" ...
Analyzes RCA document quality and categorizes it as clear/vague/none.
Comprehensive RCA with 5 Whys, timeline, and preventive measures:
./check_rca_quality.py \
--rca-file sample_data/rca_01_good.md \
--prompt-file prompts/find_rca_type_prompt.md \
--incident-title "SSL Certificate Expiration"
Expected output: "rca_type": "clear"
Brief RCA lacking detail and structure:
./check_rca_quality.py \
--rca-file sample_data/rca_02_vague.md \
--prompt-file prompts/find_rca_type_prompt.md \
--incident-title "Database Connection Pool"
Expected output: "rca_type": "vague"
Incident notes without proper RCA:
./check_rca_quality.py \
--rca-file sample_data/rca_03_none.md \
--prompt-file prompts/find_rca_type_prompt.md \
--incident-title "High Memory Usage"
Expected output: "rca_type": "none" or "rca_type": "vague"
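The three expected outputs above all share one field, `rca_type`. A minimal sketch of how a caller might validate that field before trusting it is shown here. The helper name `parse_rca_type` and the assumption that the script prints a single JSON object are illustrative, not taken from `check_rca_quality.py`.

```python
import json

# Labels taken from the examples above; the real script may accept others.
ALLOWED_RCA_TYPES = {"clear", "vague", "none"}


def parse_rca_type(raw_output: str) -> str:
    """Return the normalized rca_type from a JSON response, or raise ValueError."""
    data = json.loads(raw_output)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    rca_type = str(data.get("rca_type", "")).strip().lower()
    if rca_type not in ALLOWED_RCA_TYPES:
        raise ValueError(f"unexpected rca_type: {rca_type!r}")
    return rca_type


if __name__ == "__main__":
    print(parse_rca_type('{"rca_type": "clear"}'))
```

Rejecting unknown labels early keeps a stray model answer from flowing silently into downstream reports.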
Finds similar historical incidents using semantic matching.
Query for memory/CPU issues will find matches:
./correlate_incidents.py \
--query "High memory usage on service causing performance issues" \
--incidents-file sample_data/incidents_04_with_similar.json \
--prompt-file prompts/incident_similarity_search_prompt.md \
--min-similarity 0.6
Expected: Multiple similar incidents found (INC-001, INC-003, INC-006)
Query for Redis issues won't find matches in this dataset:
./correlate_incidents.py \
--query "Redis cluster failover causing connection drops" \
--incidents-file sample_data/incidents_05_without_similar.json \
--prompt-file prompts/incident_similarity_search_prompt.md \
--min-similarity 0.6
Expected: No or very few similar incidents found
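Both commands pass `--min-similarity 0.6`. One plausible reading is that the threshold is applied to the scores the model returns; the sketch below illustrates that idea only. The function `filter_by_similarity` is hypothetical, the `incident_id` and `similarity_score` field names follow the JSON output shown elsewhere in this document, and the scores in the example are made up for illustration.

```python
from typing import Any


def _score(incident: dict[str, Any]) -> float:
    """Read the similarity score, treating a missing score as 0.0."""
    return float(incident.get("similarity_score", 0.0))


def filter_by_similarity(
    incidents: list[dict[str, Any]], min_similarity: float
) -> list[dict[str, Any]]:
    """Keep incidents at or above the threshold, highest score first."""
    kept = [inc for inc in incidents if _score(inc) >= min_similarity]
    return sorted(kept, key=_score, reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; these are not real output from the script.
    candidates = [
        {"incident_id": "INC-001", "similarity_score": 0.82},
        {"incident_id": "INC-003", "similarity_score": 0.55},
        {"incident_id": "INC-006", "similarity_score": 0.71},
    ]
    matches = filter_by_similarity(candidates, min_similarity=0.6)
    print([m["incident_id"] for m in matches])  # ['INC-001', 'INC-006']
```

Applying the threshold in code rather than only in the prompt means a model that ignores the instruction cannot leak low-scoring matches into the result.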
Categorizes incidents as internal/vendor/end_user/vague based on the source of the problem.
Cloud provider (AWS/Azure/GCP) issues are INTERNAL because they support YOUR infrastructure:
./find_root_cause_entity_type.py \
--incident-file sample_data/incident_01.json \
--prompt-file prompts/find_root_cause_entity_type_prompt_02.md
Expected: "root_cause_entity_type": "internal" (AWS RDS is infrastructure supporting your database)
Third-party service providers (Stripe, PayPal, etc.) are VENDOR:
./find_root_cause_entity_type.py \
--incident-file sample_data/incident_02.json \
--prompt-file prompts/find_root_cause_entity_type_prompt_02.md
Expected: "root_cause_entity_type": "vendor" (Stripe is a third-party service provider)
Client-side issues on user devices/browsers are END_USER:
./find_root_cause_entity_type.py \
--incident-file sample_data/incident_03.json \
--prompt-file prompts/find_root_cause_entity_type_prompt_02.md
Expected: "root_cause_entity_type": "end_user" (Safari iOS browser issue on user device)
The same AWS RDS incident can be categorized differently based on prompt clarity:
Vague Prompt (Incorrect):
./find_root_cause_entity_type.py \
--incident-file sample_data/incident_01.json \
--prompt-file prompts/find_root_cause_entity_type_prompt.md
Output:
{
"incident_id": "incident_01",
"incident_title": "AWS RDS Database Connection Failures",
"root_cause_entity_type": "vendor",
"reason": "The incident is attributed to AWS RDS, a third-party service provider, experiencing connection timeouts in the us-east-1 region, which directly impacts database connectivity."
}
INCORRECT: Categorizes AWS as vendor
Clear Prompt (Correct):
./find_root_cause_entity_type.py \
--incident-file sample_data/incident_01.json \
--prompt-file prompts/find_root_cause_entity_type_prompt_02.md
Output:
{
"incident_id": "incident_01",
"incident_title": "AWS RDS Database Connection Failures",
"root_cause_entity_type": "internal",
"reason": "The incident is caused by AWS RDS database connection failures in the us-east-1 region, which is part of the cloud provider infrastructure supporting the application. Since cloud provider infrastructure issues are categorized as internal (as they support your infrastructure), this incident falls under the internal category."
}
CORRECT: Categorizes AWS as infrastructure supporting your systems
Key Distinction:
- Infrastructure Providers (AWS, Azure, GCP) = INTERNAL (they support YOUR infrastructure)
- Business Service Providers (Stripe, SendGrid, Twilio) = VENDOR (they provide business functionality)
This demonstrates how prompt engineering directly impacts categorization accuracy.
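The key distinction above can also be written down as a small regression check that flags outputs like the "vendor" result for AWS. This is a sketch under stated assumptions: the lookup table only contains the providers named in this document, `find_mismatches` is a hypothetical helper, and matching on words in `incident_title` is a simplification that a real check would likely replace.

```python
# Expected categories for the providers named in this document.
EXPECTED_BY_PROVIDER = {
    "aws": "internal",
    "azure": "internal",
    "gcp": "internal",
    "stripe": "vendor",
    "sendgrid": "vendor",
    "twilio": "vendor",
}


def find_mismatches(result: dict) -> list[str]:
    """Return providers named in the title whose expected type differs from the output."""
    title_words = result.get("incident_title", "").lower().split()
    actual = result.get("root_cause_entity_type")
    return [
        provider
        for provider, expected in EXPECTED_BY_PROVIDER.items()
        if provider in title_words and expected != actual
    ]


if __name__ == "__main__":
    vague_prompt_output = {
        "incident_title": "AWS RDS Database Connection Failures",
        "root_cause_entity_type": "vendor",
    }
    print(find_mismatches(vague_prompt_output))  # ['aws']
```

Running a check like this against the sample incidents after each prompt change gives a quick signal that a rewording has not reintroduced the vendor/internal confusion.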
Running the same query multiple times can produce different results due to LLM non-determinism:
First Run:
./correlate_incidents.py \
--query "Redis cluster failover causing connection drops" \
--incidents-file sample_data/incidents_05_without_similar.json \
--prompt-file prompts/incident_similarity_search_prompt.md \
--min-similarity 0.6
Output:
{
"similar_incidents": [
{"incident_id": "INC-103", "similarity_score": 0.75},
{"incident_id": "INC-105", "similarity_score": 0.65}
]
}
Second Run (same command):
Output:
{
"similar_incidents": []
}
This inconsistency is expected LLM behavior, not a bug.
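One way to quantify this variability is to compare the sets of incident IDs returned by repeated runs. The sketch below uses the two outputs shown above; the helpers `incident_ids`, `jaccard`, and `majority_ids` are hypothetical and not part of `correlate_incidents.py`.

```python
from collections import Counter


def incident_ids(output: dict) -> frozenset[str]:
    """Collect the incident IDs from one run's JSON output."""
    return frozenset(item["incident_id"] for item in output.get("similar_incidents", []))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two ID sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def majority_ids(outputs: list[dict]) -> set[str]:
    """Keep only the IDs returned in more than half of the runs."""
    counts = Counter(i for out in outputs for i in incident_ids(out))
    return {i for i, n in counts.items() if n > len(outputs) / 2}


if __name__ == "__main__":
    first_run = {
        "similar_incidents": [
            {"incident_id": "INC-103", "similarity_score": 0.75},
            {"incident_id": "INC-105", "similarity_score": 0.65},
        ]
    }
    second_run = {"similar_incidents": []}
    print(jaccard(incident_ids(first_run), incident_ids(second_run)))  # 0.0
    print(majority_ids([first_run, second_run]))  # set()
```

With only two disagreeing runs, no ID reaches a majority. Running the query an odd number of times and keeping the majority set is a common way to trade extra API calls for more stable results.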
Setting temperature to 0.0 improves consistency (but doesn't guarantee it):
./correlate_incidents.py \
--query "Redis cluster failover causing connection drops" \
--incidents-file sample_data/incidents_05_without_similar.json \
--prompt-file prompts/incident_similarity_search_prompt.md \
--min-similarity 0.6 \
--temperature 0.0
Output:
{
"similar_incidents": [
{"incident_id": "INC-103", "similarity_score": 0.75}
]
}
Results are more consistent across runs, but temperature 0.0 does NOT guarantee determinism, especially with free models. It is a best-effort setting for consistency.
Setting temperature too high (2.0) produces gibberish and parse errors:
./correlate_incidents.py \
--query "Redis cluster failover causing connection drops" \
--incidents-file sample_data/incidents_05_without_similar.json \
--prompt-file prompts/incident_similarity_search_prompt.md \
--min-similarity 0.6 \
--temperature 2.0
Output: JSON parse errors with random text fragments
Recommendation: Use temperature 0.0-0.2 for structured tasks. With free models, accept some variability as a tradeoff. See TEMPERATURE_GUIDE.md for details.
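Because a model response can contain stray text around (or instead of) the expected JSON, callers may want to parse defensively. The following is a minimal sketch, not the parsing logic used by the scripts in this repo: `parse_model_json` is a hypothetical helper that returns the first JSON object it can decode and `None` otherwise, so the caller can decide whether to retry.

```python
import json
from typing import Optional


def parse_model_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if nothing parses."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


if __name__ == "__main__":
    print(parse_model_json('Here you go: {"similar_incidents": []} Hope it helps'))
    print(parse_model_json("xq## {{ random fragments"))
```

Returning `None` instead of raising keeps the retry decision with the caller, for example re-running the query at a lower temperature.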