An agentic AI skill that assigns ICD-10-CM diagnosis codes from clinical documentation using the complete FY2025 code set, alphabetic index, tabular list, and official coding guidelines — following the same 10-step workflow that certified medical coders use.
| Problem | How This Skill Solves It |
|---|---|
| LLMs hallucinate ICD codes from training data | Every code is searched in the official index/tabular XML and validated against the 74K-code FY2025 flat file |
| Black-box code assignment with no audit trail | Each code comes with a rationale, guideline reference, and sequencing justification |
| Missing specificity, laterality, 7th characters | Built-in validation catches incomplete codes, missing placeholders, and Excludes1 conflicts |
| Outdated code sets | Pinned to FY2025 (effective Oct 1, 2024), with addenda and conversion table references |
| Inconsistent coding across encounters | Deterministic workflow enforces Index → Tabular → Guidelines for every code |
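
The flat-file check in the first row can be sketched in a few lines. This is a minimal illustration, not the repository's `validate_code.py`. It assumes each line of `icd10cm-codes-2025.txt` holds a dotless code, whitespace, then the description. The names `load_billable` and `check_code` are hypothetical, and the regex checks only the general shape of a code.

```python
import io
import re
from typing import Dict, TextIO

# Structural shape only: letter, digit, alphanumeric, then an optional
# dot and up to four more alphanumerics (positions 4-7).
CODE_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.?[0-9A-Z]{1,4})?$")


def normalize(code: str) -> str:
    """Uppercase and drop the dot so 'e11.22' and 'E1122' compare equal."""
    return code.strip().upper().replace(".", "")


def load_billable(stream: TextIO) -> Dict[str, str]:
    """Parse 'CODE  description' lines into {dotless code: description}.

    Assumption: one code per line, code first, whitespace, then description.
    """
    codes: Dict[str, str] = {}
    for line in stream:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            codes[parts[0].upper()] = parts[1]
    return codes


def check_code(code: str, billable: Dict[str, str]) -> str:
    """Classify a code as malformed, billable, or not billable."""
    if not CODE_SHAPE.match(code.strip().upper()):
        return "malformed"
    if normalize(code) in billable:
        return "billable"
    # Well-formed but absent: e.g. a category/subcategory header,
    # a retired code, or a code that never existed.
    return "not billable"


# Tiny in-memory sample with the three codes used in the CLI examples.
SAMPLE = io.StringIO(
    "E1122   Type 2 diabetes mellitus with diabetic chronic kidney disease\n"
    "N184    Chronic kidney disease, stage 4 (severe)\n"
    "Z794    Long term (current) use of insulin\n"
)
billable = load_billable(SAMPLE)
print(check_code("E11.22", billable))  # billable
print(check_code("E11.2", billable))   # not billable
print(check_code("11.22", billable))   # malformed
```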
- 10-step coding workflow mirroring the official coding process
- Alphabetic Index search — XML-parsed, supports main terms, subterms, cross-references, and the External Cause Index
- Tabular List lookup — full hierarchy with Excludes1/2, Code First, Use Additional Code, 7th character definitions
- Code description search — keyword-based search across all 74K+ billable codes
- Code validation — structural checks, billability verification, placeholder X detection, Excludes1 conflict checking
- Chapter-specific coding rules — sepsis, diabetes, hypertension, neoplasms, injuries, burns, poisoning/adverse effects, pregnancy, perinatal, pain, pressure ulcers, SDOH, and more
- Poisoning vs. Adverse Effect vs. Underdosing decision tree
- External cause code support (mechanism, place, activity, status)
- SDOH Z-code guidance (Z55–Z65)
- 12 worked examples covering complex real-world scenarios
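
Placeholder X detection rests on one rule from the official guidelines: a 7th character must sit in the 7th position, so any empty position before it is filled with X. Below is a minimal sketch with a hypothetical helper, `with_seventh_character`. It covers only the 7th-character case. Whether a code takes a 7th character at all, and which values are allowed, would still come from the Tabular List's 7th character definitions.

```python
def with_seventh_character(base: str, seventh: str) -> str:
    """Append a 7th character, filling empty positions 4-6 with 'X'.

    base is a code without its 7th character, e.g. 'S02.0' or 'S72.001'.
    """
    if len(seventh) != 1 or not seventh.isalnum():
        raise ValueError("the 7th character must be one letter or digit")
    category, _, rest = base.strip().upper().partition(".")
    if len(category) != 3 or len(rest) > 3:
        raise ValueError(f"unexpected code shape: {base!r}")
    # Positions 4-6 follow the dot; pad whatever is missing with X.
    return f"{category}.{rest.ljust(3, 'X')}{seventh.upper()}"


print(with_seventh_character("S02.0", "A"))    # S02.0XXA
print(with_seventh_character("S72.001", "A"))  # S72.001A
print(with_seventh_character("T15.02", "d"))   # T15.02XD
```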
icd10cm-codes/
├── SKILL.md # Skill definition — workflow, rules, output format
├── README.md # This file
├── scripts/
│ ├── search_index.py # Search the Alphabetic Index (XML)
│ ├── search_codes.py # Keyword search across all billable codes
│ ├── lookup_tabular.py # Look up a code in the Tabular List (XML)
│ └── validate_code.py # Validate code structure, billability, Excludes1 conflicts
└── references/
├── data/
│ ├── icd10cm-codes-2025.txt # All valid billable codes + descriptions (~74K)
│ ├── icd-10-cm-index-2025.xml # Alphabetic Index to Diseases & Injuries
│ ├── icd-10-cm-eindex-2025.xml # External Cause of Injuries Index
│ └── icd-10-cm-tabular-2025.xml # Tabular List (full hierarchy + notes)
├── guidelines-key-rules.md # Condensed coding guidelines & chapter-specific rules
├── coding-workflow.md # Detailed 10-step workflow + 12 worked examples
└── chapter-overview.md # All 21 chapters with section-level code ranges
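
`search_index.py` follows a main term down through its subterms, as in the `"Diabetes" "type 2" "kidney"` command example. The sketch below shows one way to do that with `xml.etree.ElementTree`. It rests on several assumptions: the element names (`mainTerm`, `term`, `title`, `code`) are assumed to match the CMS index XML, the sample fragment is trimmed and illustrative, and `find_codes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_codes(root: ET.Element, main: str, *subterms: str) -> List[Tuple[str, str]]:
    """Return (path, code) pairs for a main term narrowed by subterms.

    Each subterm may match at any depth below the previous match, so
    connective levels such as 'with' do not have to be typed.
    """
    results = []
    for main_term in root.iter("mainTerm"):
        title = main_term.findtext("title", default="")
        if not title.lower().startswith(main.lower()):
            continue
        frontier = [(main_term, title)]
        for sub in subterms:
            matches = []
            for node, path in frontier:
                for term in node.iter("term"):
                    if term is node:
                        continue  # iter() also yields the node itself
                    term_title = term.findtext("title", default="")
                    if sub.lower() in term_title.lower():
                        matches.append((term, f"{path} > {term_title}"))
            frontier = matches
        for node, path in frontier:
            code = node.findtext("code")
            if code:
                results.append((path, code))
    return results


# Trimmed, illustrative fragment (not copied verbatim from the FY2025 file).
SAMPLE_INDEX = """
<ICD10CM.index>
  <letter><title>D</title>
    <mainTerm><title>Diabetes, diabetic</title><code>E11.9</code>
      <term level="1"><title>type 2</title><code>E11.9</code>
        <term level="2"><title>with</title>
          <term level="3"><title>chronic kidney disease</title><code>E11.22</code></term>
        </term>
      </term>
    </mainTerm>
  </letter>
</ICD10CM.index>
"""
root = ET.fromstring(SAMPLE_INDEX.strip())
print(find_codes(root, "Diabetes", "type 2", "kidney"))
```

The real index also carries cross-references, which this sketch ignores.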
- Python 3.8+ (standard library only — no external dependencies)
- An LLM agent runtime that supports tool/skill invocation (e.g., Windsurf Cascade, Claude Code)
Each script can be run standalone:
# Search the Alphabetic Index
python scripts/search_index.py "Pneumonia"
python scripts/search_index.py "Diabetes" "type 2" "kidney"
# Search code descriptions by keyword
python scripts/search_codes.py "chronic kidney" "stage 4"
# Look up a code in the Tabular List
python scripts/lookup_tabular.py "E11.22"
# Validate one or more codes
python scripts/validate_code.py "E11.22" "N18.4" "Z79.4"
# Validate + check Excludes1 conflicts between codes
python scripts/validate_code.py "E11.22" "N18.4" --check-excludes
# Search the External Cause Index
python scripts/search_index.py "Fall" --external
When integrated as a skill, the LLM agent automatically:
- Reads clinical notes and extracts conditions
- Searches the Alphabetic Index for candidate codes
- Verifies each code in the Tabular List
- Applies official coding guidelines (sequencing, specificity, combination codes)
- Handles special scenarios (poisoning/adverse effects, external causes, SDOH)
- Validates every code for completeness and billability
- Outputs structured results with rationale and guideline references
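
The `--check-excludes` option and the validation step reduce to a pairwise check: for every two codes in the set, does one code's Excludes1 note name the other? This is a minimal sketch. It assumes the notes have already been reduced to a map from category to excluded code prefixes. The map here is abridged and hand-written for illustration; a real implementation would more likely build it from the Tabular List XML. `excludes1_conflicts` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Abridged, hand-written excerpt: category -> prefixes named in its
# Excludes1 note. E11 (type 2 diabetes) excludes, among others, E10.- (type 1).
EXCLUDES1: Dict[str, List[str]] = {
    "E11": ["E08", "E09", "E10", "E13"],
}


def _dotless(code: str) -> str:
    return code.strip().upper().replace(".", "")


def excludes1_conflicts(codes: List[str], excludes1: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (code, excluded_code) pairs where code's Excludes1 names the other."""
    conflicts = []
    for a, b in combinations(codes, 2):
        # Check both directions in case only one side carries the note.
        for code, other in ((a, b), (b, a)):
            for prefix in excludes1.get(_dotless(code)[:3], []):
                if _dotless(other).startswith(_dotless(prefix)):
                    conflicts.append((code, other))
                    break
    return conflicts


print(excludes1_conflicts(["E11.22", "N18.4", "Z79.4"], EXCLUDES1))  # []
print(excludes1_conflicts(["E11.22", "E10.9"], EXCLUDES1))  # [('E11.22', 'E10.9')]
```

Only category-level notes are handled here; subcategory notes would need longer keys. The guidelines also allow an exception when the two conditions are unrelated, so a hit is better treated as a flag for review than as an automatic rejection.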
Clinical Note
│
▼
┌─────────────────────┐
│ 1. Extract Conditions│ ← Identify diagnoses, symptoms, acuity, laterality, SDOH
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 2. Search Index │ ← search_index.py — find candidate codes via main terms
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 3. Search Codes │ ← search_codes.py — keyword fallback when index path unclear
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 4. Verify Tabular │ ← lookup_tabular.py — check specificity, Excludes, 7th char
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 5. Apply Guidelines │ ← Sequencing, combination codes, etiology/manifestation
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 6. External Causes │ ← Injuries: mechanism, place, activity, status (V/W/X/Y)
│ 6b. Drug Scenarios │ ← Poisoning vs adverse effect vs underdosing
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 7. Validate Codes │ ← validate_code.py — structure, billability, conflicts
└──────────┬──────────┘
▼
┌─────────────────────┐
│ 8. Output Results │ ← Structured table with codes, descriptions, rationale
└─────────────────────┘
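
Step 6b's decision tree follows the official Chapter 19 guidelines:
- A drug that is correctly prescribed and properly administered is an adverse effect.
- Taking less than prescribed is underdosing.
- Errors such as the wrong drug, wrong dose, or wrong route are poisonings, and unknown intent defaults to accidental.

In the T36-T50 codes that choice lands in the 6th character. The sketch below is minimal: `DrugEvent` and `classify` are hypothetical names, and real documentation can need more nuance (for example, a prescribed drug taken with alcohol counts as poisoning). Not every substance offers every 6th character; the Tabular List decides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# 6th-character values used for poisoning intent in T36-T50.
INTENT_CHAR = {"accidental": "1", "self-harm": "2", "assault": "3", "undetermined": "4"}


@dataclass
class DrugEvent:
    as_prescribed: bool           # right drug, right dose, right route, right patient
    took_less: bool = False       # less than prescribed or instructed
    intent: Optional[str] = None  # documented intent, only relevant for poisoning


def classify(event: DrugEvent) -> Tuple[str, str]:
    """Return (scenario, 6th character) for a T36-T50 code."""
    if event.took_less:
        return ("underdosing", "6")
    if event.as_prescribed:
        return ("adverse effect", "5")
    # Unknown or unspecified intent is coded as accidental; "undetermined"
    # requires documentation that intent cannot be determined.
    intent = event.intent if event.intent in INTENT_CHAR else "accidental"
    return ("poisoning", INTENT_CHAR[intent])


scenario, sixth = classify(DrugEvent(as_prescribed=True))
print(scenario, f"T36.0X{sixth}A")  # adverse effect T36.0X5A
```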
All data files are sourced from the official CMS ICD-10-CM FY2025 release (effective October 1, 2024):
- CMS ICD-10-CM Resources
- The Alphabetic Index and Tabular List are published by CMS/NCHS as XML; the code descriptions are a plain-text file
- Official Guidelines: ICD-10-CM Official Guidelines for Coding and Reporting, FY2025 (October 2024)
| Decision | Rationale |
|---|---|
| XML parsing, not regex | Preserves hierarchical structure (chapter → section → category → code) and inherited notes |
| No ML model for code prediction | Guards against hallucinated codes: every code is looked up in the source data |
| Validation as a separate step | Catches errors before output; can be used standalone for QA |
| Worked examples in docs | Teaches the agent by demonstration, reducing guideline misinterpretation |
| Excludes1 conflict detection | Prevents common coding errors that cause claim denials |
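
The point about inherited notes in the first row is easiest to see in code. An Excludes1 note attached to a category applies to every code beneath it, and a nested XML walk preserves that naturally. This is a minimal sketch with `xml.etree.ElementTree`. The element names (`section`, `diag`, `name`, `excludes1`, `note`) are assumed to mirror the CMS tabular XML, the fragment is trimmed for illustration, and `inherited_notes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def inherited_notes(root: ET.Element, target: str, kind: str = "excludes1") -> Optional[List[str]]:
    """Collect `kind` notes from target's own <diag> and all its ancestors.

    Returns None when the code is not found.
    """
    def walk(elem: ET.Element, inherited: List[str]) -> Optional[List[str]]:
        for diag in elem.findall("diag"):
            notes = inherited + [n.text or "" for n in diag.findall(f"{kind}/note")]
            if diag.findtext("name") == target:
                return notes
            found = walk(diag, notes)
            if found is not None:
                return found
        return None

    for section in root.iter("section"):
        found = walk(section, [])
        if found is not None:
            return found
    return None


SAMPLE_TABULAR = """
<ICD10CM.tabular>
  <chapter><name>4</name>
    <section id="E08-E13">
      <diag><name>E11</name><desc>Type 2 diabetes mellitus</desc>
        <excludes1><note>type 1 diabetes mellitus (E10.-)</note></excludes1>
        <diag><name>E11.2</name><desc>Type 2 diabetes mellitus with kidney complications</desc>
          <diag><name>E11.22</name><desc>Type 2 diabetes mellitus with diabetic chronic kidney disease</desc></diag>
        </diag>
      </diag>
    </section>
  </chapter>
</ICD10CM.tabular>
"""
root = ET.fromstring(SAMPLE_TABULAR.strip())
print(inherited_notes(root, "E11.22"))  # ['type 1 diabetes mellitus (E10.-)']
```

A line-by-line regex over the same file would see the note next to `E11` only and lose its link to `E11.22`.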
- Not a standalone auto-coder — requires an LLM agent runtime for natural language understanding
- English only — clinical note parsing assumes English documentation
- FY2025 only — code set is pinned; update data files annually for new fiscal years
- No NLP preprocessing — relies on the LLM's ability to extract clinical terms from notes
- No claim/billing integration — outputs codes only; does not generate claims or interface with EHR systems
- Guidelines condensed — the reference files summarize the official guidelines; edge cases may require consulting the full PDF
- Download the new CMS ICD-10-CM release files
- Replace the four files in `references/data/` with the new year's versions
- Update the addenda and conversion table in the workspace root
- Review `references/guidelines-key-rules.md` for any guideline changes
- Test with `validate_code.py` to confirm new codes are recognized
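
Before testing with `validate_code.py`, it can help to know which codes changed between releases. This is a minimal sketch that diffs two years' code files. It assumes both files use the same one-code-per-line layout (code first, then whitespace and a description). `diff_releases` is a hypothetical name, and the sample lines are invented placeholders, not real FY2025 or FY2026 changes.

```python
import io
from typing import List, Set, TextIO, Tuple


def read_codes(stream: TextIO) -> Set[str]:
    """Take the first whitespace-separated token of each non-blank line."""
    return {line.split(None, 1)[0].upper() for line in stream if line.strip()}


def diff_releases(old: TextIO, new: TextIO) -> Tuple[List[str], List[str]]:
    """Return (added, deleted) codes between two releases, sorted."""
    old_codes, new_codes = read_codes(old), read_codes(new)
    return sorted(new_codes - old_codes), sorted(old_codes - new_codes)


# Invented placeholder lines; NEW01 and OLD01 are not real ICD-10-CM codes.
old_year = io.StringIO("E1122   kept code\nOLD01   retired code\n")
new_year = io.StringIO("E1122   kept code\nNEW01   added code\n")
added, deleted = diff_releases(old_year, new_year)
print(added, deleted)  # ['NEW01'] ['OLD01']
```

The `added` list is a natural input for the final check: each added code should validate once the new files are in place.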
The ICD-10-CM code set is a public domain work product of the U.S. Department of Health and Human Services (CMS/NCHS). The skill code and documentation in this repository are provided as-is for educational and research purposes.
Disclaimer: This tool is intended to assist — not replace — certified medical coders. All code assignments should be reviewed by qualified coding professionals before use in billing or clinical systems.