Goal: Represent and translate the given input (federal, state, and local government policy documents and code from legacy systems) into specs (intermediate representations of rulesets), following a Rules-as-Code (RaC) approach. To create the output, the specs are used to build the modernized system, quickly building much of the UI workflows to gather the input needed to run the ruleset.
- Incremental Approach: Work on one program or rule set topic at a time, building confidence before expanding scope.
- Version Control: Commit specs after each logical milestone to track evolution and enable rollback.
- AI Collaboration: Use AI to accelerate translation, but human review ensures accuracy and policy compliance.
- Testing Focus: Comprehensive tests prevent regressions and document expected behavior.
This project treats the extracted ruleset (policies, workflows, decision logic, definitions) as a first-class code artifact rather than static documentation. Doing so provides the same rigor, traceability, and reliability we expect from software code.
- Every change is tracked and reviewable.
- Historical versions can be compared or restored.
- Rule evolution is transparent and auditable.
- Updates go through a PR review process.
- Logic changes are visible and discussable.
- Reduces silent drift and unintended consequences.
- Tests encode expected behavior and edge cases.
- Regressions are caught automatically.
- Behavior becomes executable documentation.
- Refactors are safer and verifiable.
- New test cases are added as rules are added.
- Gaps surface via failing tests.
- Accuracy improves through small, controlled updates.
- Supports validation, transpilation, and deployment.
- Potential integration with CI/CD.
Progress in Terms of Outcomes, Not Outputs (Outputs are activity. Outcomes are value.)
This rules extractor initiative:
- Outcome: Save meaningful time for project teams.
- Leading indicator: Accuracy of automatically generated rules.
- Validation loop: Early and frequent feedback from the AK team about what matters most.
- Is this testing functional feasibility?
- Yes. This prototype tests if Claude Code can be used to extract and maintain the
specsof a system (ruleset, workflows, documentation, etc.). - It treats the
specsas code, so the specs are version-controlled and there are tests for the rulesets (and other parts of thespecs) to ensure they behave as intended when run in a rules engine.
- Yes. This prototype tests if Claude Code can be used to extract and maintain the
- Is this testing experiential value?
- Yes. From this prototype, we'll learn can and can't be done with Claude Code, and how well an iterative approach works. The hypothesis is: incrementally building up the ruleset produces a more accurate and verifiable ruleset than generating and iterating on an entire ruleset.
- Is this testing look and feel?
- No. User interface and visual design are not the focus of this prototype. However, the experience of interacting with the AI and the IDE (e.g., navigating files, hover-tiggered DSL documentation) will inform ideas about user experience and AI interaction patterns.
- Is this testing performance or scale?
- Yes, it is testing extracted-ruleset accuracy and it will test performance as the policy/ruleset size grows.
- What is the primary (and secondary) purpose of this prototype?
- Primary purpose: explore iterative approach of building and maintaining the
specs. A desired outcome is that thespecsare easier to build incrementally. - Secondary purpose: explore capabilities and limitations of Claude Code on a codebase of
specscontaining files in an atypical language.
- Primary purpose: explore iterative approach of building and maintaining the
- Are we building a foundation we can develop into a product?
- Yes. The lessons learned (e.g., capabilities, incremental approach, test-driven validation of rules, transpilation/conversion to target languages) can inform the requirements and design of a product.
- Are we delivering a win to a client / project team?
- TBD. We're testing initially with the Alaska team.
- Are we trying to prove or disprove a specific piece of the approach?
- Yes, we are explicitly testing the hypothesis that incrementally building a ruleset results in a more accurate and verifiable outcome than generating a full ruleset and refining it afterward.
- Are we trying to demonstrate our approach for a potential / current client?
- Yes, the prototype aims to demonstrate that an AI-assisted workflow can produce rulesets that are accurate, verifiable, and maintainable.
- Why a custom DSL?
- Easier to customize for our evolving purposes
- Understand the degress of expressiveness needed for ruleset generation, which will help decide on a rules engine
- Versatile output: can be converted/transpiled to different output languages like Rego for OPA, OpenFisca, and natural language for humans
- Easier to adapt for input to other tools (e.g., ruleset comparison against ruleset-from-code, subset-of-rules extraction for a subtopic)
This project takes an incremental approach where each iteration involves the user (an SME on the policies) to perform the following:
- The user adds
inputdocs and code in manageable-sized amounts of policy docs- The codebase contains the input and there is no context window (and hence no limit).
- The AI searches the codebase for the data it needs (similar to RAG but without a vector DB).
- The user interacts with an AI to update the
specs. Once satisfied, the specs are committed into git for version control.- The user interacts with the AI to create/update the specs (ruleset, workflows, etc.) in manageable amounts.
- The specs are in a DSL format that will evolve over time.
- The specs are machine-readable so that it can be used to build the UI workflows.
- Tests for updated specs are added by an AI and verified by the user to ensure future changes do not cause a regression.
- Once a logical set of rules are captured, the user guides the AI to generate
output, including the ruleset and code to get end-user input and run the ruleset on a given rules engine.- A transpiler or converter may be needed to create the output ruleset so that it is usable by the modern system.
Each policy domain is a self-contained unit under domains/<name>/:
domains/
snap/ ← example: SNAP federal income eligibility
input/policy_docs/ ← source policy documents (Markdown, PDF, etc.)
specs/
eligibility.civil.yaml ← CIVIL DSL ruleset (human + AI authored)
tests/eligibility_tests.yaml ← test cases
output/
eligibility.rego ← generated OPA/Rego (do not edit manually)
demo/
main.py ← FastAPI backend
static/index.html ← browser form UI
start.sh ← starts OPA + FastAPI
<next-domain>/ ← add new domains here
input/ specs/ output/
core/
ruleset_schema.yaml ← shared CIVIL DSL schema reference
tools/
validate_civil.py ← CIVIL validator
transpile_to_opa.py ← CIVIL → OPA/Rego transpiler
run_tests.py ← test runner
x.py ← CLI entry point (all pipeline actions)
x ← shell shim: exec python x.py "$@"
- Add policy documents to
domains/<name>/input/policy_docs/ - Use the
extract-rulesetcommand (.claude/commands/extract-ruleset.md) to extract CIVIL specs with AI assistance
- Work with AI to create
domains/<name>/specs/<module>.civil.yaml - Follow the CIVIL DSL schema at
core/ruleset_schema.yaml - Commit completed specs to version control
- AI generates
domains/<name>/specs/tests/<module>_tests.yaml - Review and verify test scenarios; add edge cases and boundary conditions
./x transpile <domain> <module>generatesdomains/<name>/output/<module>.rego
./x test <domain> <module>starts OPA automatically, runs tests, and stops OPA./x demo <domain> <module>starts the demo (OPA + FastAPI)- Iterate on specs as needed
./x pipeline snap eligibility # validate + transpile + test (OPA managed automatically)
./x demo snap eligibility # start OPA + FastAPI demo at http://localhost:8000
./x list # show all available domain/module pairspython x.py setup # install uv, create .venv, install deps, install OPAAfter setup, use ./x directly (the shell shim activates the right Python).
flowchart TD
subgraph input
policy_docs@{shape: docs} --> legacy_code
legacy_code@{shape: procs}
verified_artifacts
end
policy_docs --> Extractor1[[Extractor1]] --> ruleset
legacy_code --> Extractor2[[Extractor2]] --> specs
subgraph specs
ruleset & workflows & artifacts
end
specs <--correct?--> verify[/verify/]
subgraph specs_testing
tester_rule_engine[[Rule Engine]]
ruleset --transpile?--> tester_rule_engine
test_cases([test_cases]) --> tester_rule_engine --> expected_results([expected_results])
tester_rule_engine --> explanation([explanation])
end
ruleset ---> Transpiler[[Transpiler]] --> ruleset2[ruleset]
workflows & artifacts ---> Coder[[Coder]] --> webforms
subgraph output["output (modern_system)"]
ruleset2 --> rule_engine[[Rule Engine]] <--> code <--> webforms
end
- One incarnation of
Extractor1is the Policy Extraction (doc-to-logic) prototype Extractor2will likely use AWS Transform, which also produces documentation, which would be included as part of the specs and can be used as input to the Coder.- Another option is to include verified output from AWS Transform (noted as
verified_artifacts) as part of theinput.
- Another option is to include verified output from AWS Transform (noted as
Not yet in the diagram:
- There can be multiple specs that can be compared to identify differences between systems (legacy vs legacy; modern vs modern; legacy vs modern).
- Validating the
modern_systemagainst thelegacy_system