The architecture is an early reserach prototype and is still evolving. This is not ready for prime time.
CBL (Controlled Behavioral Language) converts natural-language requirements into verified behavioral specifications. The system uses a three-layer pipeline with strict trust boundaries: an untrusted LLM proposes specifications, deterministic engines verify and refine them, and a human engineer holds final authority.
The output is a well-posed CBL specification that compiles to JSON IR for certified MBPD toolchains (Simulink Coder, SCADE KCG).
flowchart TD
classDef untrusted fill:#f8d7da,stroke:#842029,stroke-width:2px,color:#842029
classDef mitigated fill:#e8daef,stroke:#6c3483,stroke-width:2px,color:#4a235a
classDef verified fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#0c5460
classDef qualified fill:#d4edda,stroke:#155724,stroke-width:2px,color:#155724
classDef certified fill:#e2e3e5,stroke:#495057,stroke-width:2px,color:#383d41
classDef actor fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#856404
classDef gate fill:#fff,stroke:#e67e22,stroke-width:2.5px,stroke-dasharray:5 5,color:#333
classDef agentbox fill:#fef9e7,stroke:#d4ac0d,stroke-width:1.5px,color:#7d6608
classDef invis fill:none,stroke:none
%% ── Row 1: Engineer ──
Engineer(["<b>Engineer</b> <br> Final Authority"]):::actor
%% ── Row 2: Session Orchestrator (center) ──
Session["<b>Session Orchestrator</b> <br> Python · cbl-elicit"]:::actor
Engineer -- "NL requirements" --> Session
Engineer -. "answers / confirmations" .-> Session
Session -. "questions + repairs" .-> Engineer
Session -. "spec.cbl + reports" .-> Engineer
%% ── Row 3: Untrusted (right) + Agents (far right) ──
subgraph UZ ["UNTRUSTED"]
LLM1["<b>LLM — Extract</b> <br> NL text → facts"]:::untrusted
LLM2["<b>LLM — Revise</b> <br> diagnostics + answers → revised facts"]:::untrusted
end
subgraph AZ ["AGENTS (optional)"]
direction LR
AgentD["<b>Domain</b> <br> e.g., domain-fdi"]:::agentbox
AgentP["<b>Pipeline</b> <br> e.g., requirements-analyst"]:::agentbox
end
Session -- "iteration 1 · nl_text" --> LLM1
Session -- "iterations 2+ · nl_text,<br>diagnostics, answers" --> LLM2
AZ -. "injected into<br>system prompt" .-> UZ
%% ── Row 4: Schema Gate ──
Schema(["<b>Schema Gate</b> <br> jsonschema · 3 retries"]):::gate
LLM1 -- "extracted_facts.json" --> Schema
LLM2 -- "revised extracted_facts.json" --> Schema
%% ── Row 5: Provenance Gate ──
Provenance(["<b>Provenance Gate</b> <br> overwrites all LLM tags"]):::gate
Schema -- "valid facts" --> Provenance
%% ── Row 6: Reasoning Engine ──
subgraph VZ ["VERIFIED"]
Prolog["<b>Reasoning Engine</b> <br> Prolog · 13 rules"]:::verified
end
Provenance -- "tagged facts" --> Prolog
Prolog -. "diagnostics, questions, repairs" .-> Session
%% ── Row 7: Verdict Gate ──
VerdictGate(["<b>Verdict Gate</b> <br> fatal on failure"]):::gate
Prolog -- "verdict.json" --> VerdictGate
%% ── Row 8: Compiler ──
subgraph QZ ["QUALIFIED"]
Compiler["<b>CBL Compiler</b> <br> OCaml · well-posedness checker"]:::qualified
end
VerdictGate -- "valid verdict" --> Compiler
Compiler -. "diagnostics" .-> Session
%% ── Row 9: Compilation Gate ──
CompilerGate(["<b>Compilation Gate</b> <br> zero errors"]):::gate
Compiler -- "spec.cbl" --> CompilerGate
%% ── Row 10: Outputs (left: MBPD, right: Mitigations) ──
subgraph CZ ["CERTIFIED"]
MBPD["<b>MBPD Toolchain</b> <br> Simulink Coder · SCADE KCG"]:::certified
end
subgraph MZ ["LLM HALLUCINATION RISK"]
Mitigations["<b>Mitigations</b> <br> Python + Z3 · 7 modules"]:::mitigated
end
CompilerGate -- "qualified artifact" --> MBPD
Session -- "on pass" --> Mitigations
Mitigations -. "reports" .-> Session
| Zone | Components | Trust Basis | Verification Method |
|---|---|---|---|
| Untrusted | LLM (extract + revise) | Statistical model; may hallucinate | All output gated by schema + provenance + downstream checks |
| Agent Instructions | Domain agents, pipeline agents (.agent.md) |
User-authored; injected into LLM system prompt | Content reviewed by engineer; does not bypass validation gates |
| Mitigated | 7 LLM hallucination mitigation modules | Deterministic, no LLM in checking path | 74 unit tests; no statistical components |
| Verified | Prolog reasoning engine | Rule-based, 13 consistency rules | Deterministic; same input always produces same output |
| Qualified | OCaml compiler | Type-safe, exhaustive pattern matching | CompCert lineage; qualified tool assumption (DO-178C §12.2) |
| Certified | MBPD toolchain | Independently certified code generators | Vendor-supplied qualification kits |
| Authoritative | Engineer + audit log | Human judgment; append-only record | Provenance audit log is deterministic and inspectable |
The three validation gates (Schema, Provenance, Verdict) are deterministic Python code within the Session orchestrator (cbl-elicit/session.py). They enforce trust boundaries between zones but do not constitute a separate trust zone.
Three gates prevent untrusted data from reaching trusted components:
-
Schema Gate (LLM → Prolog):
extracted_facts.schema.jsonenforced by Pythonjsonschema. Retries the LLM up to 3 times on failure; aborts if still invalid. Provenance tags constrained to a 7-value enum. -
Provenance Gate (LLM → Prolog): Unconditional. The orchestrator overwrites every provenance tag the LLM assigns. Facts matching engineer text become
user_stated; previously confirmed facts becomeuser_confirmed; everything else is forced tollm_inferred, which triggers Prolog rule R9 (engineer must confirm before the fact is committed). -
Verdict Gate (Prolog → OCaml):
verdict.schema.jsonenforced by Python. Fatal on failure: a malformed Prolog verdict stops the iteration immediately rather than propagating to OCaml.
The 13 consistency rules in cbl-prolog/consistency.pl partition into errors (block compilation) and warnings (flag for review).
| Rule | Name | Description |
|---|---|---|
| R1 | Empty mode | Every mode must have at least one transition. |
| R2 | Guard completeness | Every mode needs an Otherwise clause (syntactic totality). |
| R3 | Action totality | Every transition must assign all guarantees that lack a default. |
| R4 | Invalid target | Transition target must reference a declared mode. |
| R5 | Invalid initial | The declared initial mode must exist. |
| R5b | No initial mode | At least one initial mode must be declared. |
| R6 | Duplicate declarations | No name may appear in more than one namespace. |
| R7 | Undeclared reference | Every name in guards, actions, predicates, or entry actions must be declared. |
| R8 | Unknown constant | Constants with value __unknown__ block compilation. |
| R9 | Unconfirmed inference | Facts with llm_inferred provenance block compilation until confirmed. |
| R9b | Unconfirmed default | Guarantee defaults with llm_inferred provenance are flagged. |
| R10 | Guard exclusivity | Heuristic warning when two guards in the same mode are not syntactically exclusive. |
| R11 | Type checking | Inferred type of a set action must be compatible with the guarantee's declared type. |
| R12 | Reachability | Warn if a mode is unreachable from the initial mode (tabled transitive closure). |
| R13 | Unused declaration | Warn if a declared name is never referenced. |
R5b and R9b are sub-rules of their parents. R14 was removed (entry-action undeclared references are caught by R7).
Seven post-success checks in cbl-elicit/mitigations/, all deterministic (no LLM in the checking path).
| Module | Description |
|---|---|
back_translation |
Renders the CBL spec back to plain English for side-by-side comparison with original requirements. |
dual_extraction |
Runs LLM extraction twice with different prompts; structurally diffs results to flag divergences. |
mutation_testing |
Generates AST-level mutants (flipped guards, swapped targets, dropped assignments) and checks detection. |
pattern_justification |
Validates that every domain-pattern recommendation cites a real requirement with plausible term overlap. |
provenance_control |
Assigns provenance tags deterministically from the orchestrator; maintains append-only audit log. |
traceability |
Verifies every fact cites a real requirement fragment and every requirement is covered (surjectivity). |
z3_guard_checker |
Uses Z3 SMT queries for guard exclusivity and completeness, strengthening Prolog R2/R10. |
All inter-layer communication uses JSON with explicit schemas.
| Artifact | Producer | Consumer | Schema |
|---|---|---|---|
extracted_facts_N.json |
LLM (Layer 1) | Prolog (Layer 2) | cbl-elicit/schema/extracted_facts.schema.json |
verdict_N.json |
Prolog (Layer 2) | OCaml (Layer 3) | cbl-elicit/schema/verdict.schema.json |
spec.cbl |
OCaml (Layer 3) | Engineer / MBPD | CBL grammar (EBNF in compiler) |
provenance_audit.json |
Orchestrator | Auditor | Append-only log; timestamped entries |
Every semantic value in extracted_facts.json is wrapped as {value, provenance}. The orchestrator (not the LLM) assigns these tags:
| Tag | Meaning | Assigned By |
|---|---|---|
user_stated |
Engineer wrote this in original requirements | Orchestrator (NL fragment match) |
user_confirmed |
Engineer explicitly approved | Orchestrator (confirmation tracking) |
llm_inferred |
LLM proposed, not yet confirmed | Orchestrator (default for all LLM output) |
default_assumed |
System default (e.g., initial values) | Orchestrator |
rule_derived |
Prolog derived from confirmed premises | Prolog engine |
rule_derived_pending |
Prolog derived from unconfirmed premises | Prolog engine |
user_rejected |
Engineer rejected | Orchestrator |
A well-posed CBL specification defines a Mealy machine:
| Symbol | Meaning |
|---|---|
| Finite set of modes (states) | |
Input space (all Assumes variables) |
|
Internal state (Variables + synthesized counters) |
|
Output space (all Guarantees variables) |
|
| Initial mode | |
| Initial internal state | |
| Transition function | |
| Output function |
Well-posedness conditions (all enforced by the OCaml checker):
- Guard exclusivity: At most one guard fires per mode and input.
- Guard completeness: At least one guard fires (including Otherwise).
- Action totality: Every transition assigns all guaranteed outputs.
Theorem: A well-posed CBL specification yields a deterministic, total transition system.
- LLM is untrusted: It may hallucinate facts, invent requirements, or assign false provenance. All output is gated.
- Provenance enforcement is unconditional: No configuration can bypass it. The orchestrator overwrites all LLM provenance tags on every iteration.
- OCaml compiler is a qualified tool: Its acceptance is the final structural gate before downstream certified code generation.
- Engineer confirmation is authoritative: Recorded in the append-only audit log. Prolog treats confirmed facts as committed.
- Downstream MBPD tools are independently certified: Simulink Coder and SCADE KCG carry their own DO-178C / ISO 26262 qualification.
- Pre-enforcement JSON is preserved:
extracted_facts_N.jsonwritten towork_dirbefore provenance enforcement, enabling post-hoc audit of raw LLM output.
- Replace the Mermaid container diagram with a manually drawn image and embed it here.
- Prolog input validation:
bridge.pldoes not schema-validate its JSON input. Not exploitable through the orchestrator (Python validates first), but matters if Prolog is invoked standalone. - OCaml schema validation:
nlp_bridge.mluses ad-hoc field parsing rather than JSON Schema. The well-posedness checker provides stronger semantic validation. - Language consolidation: The repo currently uses three in-repo languages (Python, Prolog, OCaml). The 13 Prolog consistency rules could be absorbed into the OCaml compiler as a
reasoningmodule, reducing to two languages aligned with the trust boundary: Python (orchestrator, untrusted/mitigated layer) and OCaml (everything deterministic, verified/qualified layer).
The following shows one complete iteration through the pipeline, including all validation gates and the engineer interaction points.
sequenceDiagram
participant E as Engineer
participant O as Orchestrator
participant L as LLM (untrusted)
participant S as Schema Gate
participant P as Provenance Gate
participant R as Prolog (13 rules)
participant V as Verdict Gate
participant C as Compiler (OCaml)
participant M as LLM Hallucination Mitigations
E->>O: NL requirements
rect rgba(248,215,218,0.4)
Note over L: UNTRUSTED
O->>L: extract or revise
L-->>O: extracted_facts
end
rect rgba(255,243,205,0.4)
Note over S,P: VALIDATION
O->>S: validate(facts)
alt invalid (up to 3 retries)
S-->>O: errors
O->>L: revise
L-->>O: corrected facts
O->>S: re-validate
end
S-->>O: valid
O->>P: enforce_provenance
Note over P: Overwrite all tags · append audit log
P-->>O: tagged facts
end
rect rgba(209,236,241,0.4)
Note over R: VERIFIED
O->>R: extracted_facts_N.json
R-->>O: verdict.json
end
rect rgba(255,243,205,0.4)
Note over V: VALIDATION
O->>V: validate(verdict)
alt invalid
V-->>O: fatal · iteration stops
end
V-->>O: valid
end
alt questions or repairs
O->>E: diagnostics, questions, repairs
E-->>O: confirmations
Note over O: Loop to next iteration
end
alt pass
rect rgba(212,237,218,0.4)
Note over C: QUALIFIED
O->>C: verdict.json
C-->>O: spec.cbl + JSON IR
end
alt checker passes
rect rgba(232,218,239,0.4)
Note over M: LLM HALLUCINATION MITIGATIONS
O->>M: run all checks
M-->>O: reports
end
O->>E: spec.cbl + reports
else errors
Note over O: Loop with diagnostics
end
end
- Extract: LLM produces
extracted_facts(or revises based on prior diagnostics). - Schema validate: Python
jsonschemaenforces structure. Up to 3 retries on failure; abort if exhausted. - Provenance enforce: Orchestrator overwrites all provenance tags. Audit log appended.
- Prolog reason: 13 consistency rules fire. Repairs and questions generated. Facts partitioned into committed vs. pending.
- Verdict validate: Schema enforced. Fatal on failure.
- Engineer interact: Diagnostics, questions, and repair proposals presented. Engineer confirms or rejects.
- OCaml compile (on pass): Well-posedness checker runs. If errors, diagnostics feed back into next iteration.
- LLM Hallucination Mitigations: Back-translation, traceability, Z3 guard analysis, mutation testing, pattern justification.
- Stall detection: If 2 consecutive iterations show no improvement, engineer offered manual resolution or abort.
- Termination: Success when OCaml passes, or max iterations (default 10) exhausted.