Labels: enhancement
Problem
The RCA skill's Step 5 asks Claude to categorize failures and determine root cause purely from raw evidence. Claude has no reference for what failure patterns are already known and verified by the RHDP operations team, leading to:
- Inconsistent category names across analyses (e.g., "network issue" vs "connectivity failure" vs "SSH problem")
- No validation against ground truth — Claude guesses when a verified answer already exists
- The output taxonomy doesn't match the one John's AAP2 ETL pipeline uses downstream
Solution
Integrate the existing curated pattern files, such as known_failed.yaml from the rhpds/aap2-agents repo, into the RCA skill. known_failed.yaml contains curated regex-based error patterns, each with a verified error_category and a human-readable description.
Implementation
1. Vendor the data
Copy known_failed.yaml into skills/root-cause-analysis/data/.
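2. Add classify.py (~50 LOC)
Adapt the existing matching logic from classify_errors.py: load the patterns from known_failed.yaml, compile each regex, and return the error_category and description for an error message that matches, or None if nothing does.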
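A minimal sketch of what classify.py could look like, assuming known_failed.yaml is a top-level list of entries that each carry a regex under a pattern key alongside error_category and description. The pattern key name and the list layout are assumptions to check against the real file and classify_errors.py; load_patterns, compile_patterns, and classify are hypothetical names. PyYAML is only needed when a non-empty file is actually loaded.

```python
"""classify.py: match error messages against known_failed.yaml patterns (sketch)."""
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional


def load_patterns(path: Path) -> list[dict]:
    """Load raw entries from YAML; a missing or empty file yields an empty list."""
    if not path.is_file():
        return []
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return []
    import yaml  # PyYAML, imported lazily so matching still works without it

    data = yaml.safe_load(text)
    # ASSUMPTION: the file is a top-level list of mappings.
    return data if isinstance(data, list) else []


def compile_patterns(entries: list[dict]) -> list[tuple[re.Pattern, str, str]]:
    """Compile each entry's regex; skip entries with an invalid or missing pattern/category."""
    compiled = []
    for entry in entries:
        try:
            regex = re.compile(entry["pattern"])  # ASSUMPTION: regex stored under "pattern"
            category = entry["error_category"]
        except (KeyError, TypeError, re.error):
            continue  # skip gracefully instead of failing the whole load
        compiled.append((regex, category, entry.get("description", "")))
    return compiled


def classify(message: str, compiled: list[tuple[re.Pattern, str, str]]) -> Optional[dict]:
    """Return the first matching entry's category and description, or None if nothing matches."""
    for regex, category, description in compiled:
        if regex.search(message):
            return {
                "error_category": category,
                "description": description,
                "pattern": regex.pattern,
            }
    return None
```

In this sketch the first match wins, so when several patterns can match the same message, entry order in known_failed.yaml decides the category. That is the simplest linear scan and may differ from how classify_errors.py resolves overlaps.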
3. Configure path via Claude Code settings
The path to known_failed.yaml should be set in Claude Code's settings.json (not .env), under the skill's env block:
```json
{
  "env": {
    "KNOWN_FAILED_YAML": "/path/to/known_failed.yaml"
  }
}
```
The script falls back to skills/root-cause-analysis/data/known_failed.yaml if the setting is not provided.
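One way to implement the lookup and fallback, assuming Claude Code exports the env block of settings.json as ordinary environment variables to the processes the skill runs. resolve_patterns_path is a hypothetical name, and the default path is assumed to be resolved relative to the repository root (the working directory).

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping

# Vendored copy from step 1, used when KNOWN_FAILED_YAML is unset or blank.
# Relative to the current working directory (assumed to be the repo root).
DEFAULT_PATH = Path("skills/root-cause-analysis/data/known_failed.yaml")


def resolve_patterns_path(environ: Mapping[str, str] = os.environ) -> Path:
    """Prefer KNOWN_FAILED_YAML from the environment, else fall back to the vendored file."""
    value = environ.get("KNOWN_FAILED_YAML", "").strip()
    return Path(value) if value else DEFAULT_PATH
```

Passing the environment as a parameter keeps the fallback easy to unit-test without mutating os.environ.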
4. Call between Step 4 and Step 5
After Steps 1-4 produce structured evidence, run classify.py against the error messages extracted from Steps 1 and 3. Inject the result into Claude's Step 5 prompt context:
```
Known failure match: connectivity_failure — "Unable to reach bastion host" (matched pattern: redacted)
```
If nothing matches, flag the failure as novel/unclassified and worth human review.
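A sketch of turning classification results into Step 5 context, one line per error message extracted in Steps 1 and 3. It assumes each result is a (message, match) pair, where match is a dict with error_category, description, and pattern keys, or None when nothing matched. build_step5_context and the exact wording of both lines are illustrative, not a fixed format.

```python
from __future__ import annotations

from typing import Optional


def build_step5_context(results: list[tuple[str, Optional[dict]]]) -> str:
    """Render one line per error message for the Step 5 prompt context."""
    lines = []
    for message, match in results:
        if match is None:
            # No known pattern: surface it explicitly so a human reviews it.
            lines.append(
                f"No known failure match for {message!r}: "
                "novel/unclassified failure, flag for human review"
            )
        else:
            lines.append(
                f'Known failure match: {match["error_category"]} - '
                f'"{match["description"]}" (matched pattern: {match["pattern"]})'
            )
    return "\n".join(lines)
```

Emitting the no-match case as its own line, rather than dropping it, keeps novel failures visible to Claude in Step 5 and to whoever reviews the output.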
5. Tests
- Pattern loading (valid YAML, empty file, missing file)
- Matching accuracy against a handful of known error strings from test fixtures
- No-match returns None
- Invalid regex patterns are skipped gracefully
Why this matters
- Accuracy: Claude validates against verified patterns instead of guessing
- Consistency: Output uses the same 8-category taxonomy as John's ETL pipeline
- Downstream compatibility: When multi-job analysis and Jira ticket creation are built, categories will already align with the deduplicator system designed in deduplicator_concept.md
- Low effort: The matching logic and pattern data already exist — this is integration, not invention
References
known_failed.yaml — curated error patterns
classify_errors.py — existing matching logic to adapt
deduplicator_concept.md — future system that consumes these categories