Canon: AI-Native Programming Language Implementation Plan

Executive Summary

Canon is an AI-written, machine-verified, human-optional programming system. It is not a human-first language: it is a canonical, typed semantic IR that AIs author. Humans provide intent and policies, and shipping artifacts are produced through verified compilation.

This plan synthesizes the V1 specification with research on effect systems (Koka, Unison, Effekt), capability security (Deno, WASI, E language), modern IRs (MLIR, Cranelift), and AI-oriented code representations.


Why Canon is Different (Competitive Analysis)

| Aspect | Traditional Languages | AI Code Assistants | Canon |
|---|---|---|---|
| Primary author | Humans | AI suggests, human decides | AI authors, machine verifies |
| Side effects | Implicit or opt-in tracking | Untracked | Mandatory, typed, capability-gated |
| Security model | Ambient authority | None | Object-capability, no ambient authority |
| Verification | External tools | None | Built into compilation pipeline |
| Provenance | Git blame | None | First-class AI metadata with confidence |
| Determinism | Often nondeterministic | N/A | Guaranteed: same input → same output |
| Policy enforcement | External/optional | None | Mandatory policy gates (OPA/Rego) |

Key differentiators:

  1. AI-native: JSON format optimized for LLM consumption/generation
  2. Effect tracking: Every side effect is declared and authorized
  3. Capability security: No ambient authority—code can only touch what it's explicitly granted
  4. Mandatory verification: Compilation fails if types, effects, or policies are violated
  5. AI provenance: Every node tracks its origin, confidence, and alternatives

Research Insights That Refine the V1 Spec

1. Effect System Design (from Koka/Unison research)

V1 Spec: Effects are declared on functions; a caller's declared effects must be a superset of the effects of every function it calls.

Research Enhancement:

  • Use row-polymorphic effects (Koka-style), with open rows such as <exn|e> for composition
  • Effect inference within components, explicit only at boundaries
  • Distinguishing effect categories matters:
    • Control effects (async, exception) - composable
    • Resource effects (fs, net) - require capabilities
    • Observation effects (log, trace) - often permitted widely

Recommendation: Keep V1 superset checking and add open effect rows in V2.

2. Capability Model (from Deno/WASI/E research)

V1 Spec: Capabilities are unforgeable references; every effect invocation must present a capability reference.

Research Enhancement:

  • Handle-based (WASI-style) is proven in production
  • Revocable by default - critical for AI safety (abort runaway tools)
  • Faceted views (E-style) - create read-only tool views
  • Common failure: Capability escape via serialization - caps must be opaque

Critical Addition: Add revocable: bool to CapabilityDef.

3. IR Design (from MLIR/Cranelift research)

V1 Spec: JSON IR with dialects, stable core.

Research Enhancement:

  • Content-addressing for caching (hash-based node IDs)
  • Version in every document from day one
  • Forward compatibility: unknown fields preserved, not rejected
  • JSON Lines for streaming large programs

Key Insight from MLIR: The N² dialect conversion problem: direct conversions between N dialects grow on the order of N². A small core dialect set is therefore the right choice.

4. AI-Friendliness (from research)

What makes IRs AI-friendly:

  • Unambiguous structure (no implicit context)
  • Deterministic serialization (same input → same output)
  • Token-efficient but readable
  • Rich metadata without schema changes
  • First-class uncertainty - alternatives, confidence scores

Revised Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Canon Ecosystem                         │
├─────────────────────────────────────────────────────────────────┤
│  User Intent          │   Policies (OPA/Rego)                   │
│  └─ Natural language  │   └─ Safety gates                       │
│     prompts stored    │   └─ Domain allowlists                  │
│     in IR             │   └─ Data classification                │
├─────────────────────────────────────────────────────────────────┤
│                           Canon Core                            │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Dialects:                                               │    │
│  │  core/1.0     - types, functions, control flow          │    │
│  │  effect/1.0   - effect declarations + checking          │    │
│  │  cap/1.0      - capability model + grants               │    │
│  │  resource/1.0 - databases, queues, secrets, endpoints   │    │
│  │  policy/1.0   - policy blocks (Rego evaluation)         │    │
│  │  test/1.0     - property tests + golden tests           │    │
│  │  ai/1.0       - provenance, confidence, alternatives    │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  Compiler Pipeline                                              │
│  Parse → Validate → TypeCheck → EffectCheck → PolicyCheck       │
│  → TestSynth → Codegen → HermeticBuild                          │
├─────────────────────────────────────────────────────────────────┤
│  Codegen Targets                                                │
│  └─ TypeScript (V1 primary)                                     │
│  └─ Rust (V1.5)                                                 │
│  └─ Wasm Component (V1.5)                                       │
└─────────────────────────────────────────────────────────────────┘

Implementation Plan

Phase 0: Project Setup

Files to create:

  • /README.md - Project overview
  • /go.mod - Go module
  • /Makefile - Build commands
  • /.github/workflows/ci.yml - CI pipeline

Phase 1: Schema & Core Types

Deliverables:

  1. schemas/canon.v1.schema.json - JSON Schema
  2. pkg/ir/types.go - Go types matching schema
  3. pkg/ir/load.go - JSON loading with validation
  4. pkg/ir/canonical.go - Canonical serialization (stable ordering)

Key Files:

  • /schemas/canon.v1.schema.json
  • /pkg/ir/types.go
  • /pkg/ir/load.go
  • /pkg/ir/canonical.go
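One way canonical.go could work, sketched under the assumption that canonical form means sorted object keys and no insignificant whitespace: decode into generic values, then rely on encoding/json emitting map keys in sorted order. Hashing the canonical bytes gives the content-addressed node IDs suggested in the IR research notes. Canonicalize and ContentID are hypothetical names, and this is not a full RFC 8785 (JCS) implementation: numbers keep their original text, so 1.0 and 1 hash differently.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// Canonicalize re-encodes one JSON document with object keys sorted and no
// insignificant whitespace. encoding/json always writes map keys in sorted
// order; UseNumber keeps numbers as their original text instead of float64.
func Canonicalize(doc []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(doc))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	if dec.More() {
		return nil, errors.New("trailing data after JSON document")
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep <, >, & literal rather than \u003c etc.
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil // Encode appends a newline
}

// ContentID derives a hash-based ID from the canonical bytes.
func ContentID(doc []byte) (string, error) {
	c, err := Canonicalize(doc)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(c)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	a := []byte(`{"op":"const","attrs":{"value":"hi","type":"string"}}`)
	b := []byte(`{ "attrs": {"type": "string", "value": "hi"}, "op": "const" }`)
	ca, err := Canonicalize(a)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(ca)) // {"attrs":{"type":"string","value":"hi"},"op":"const"}
	ida, _ := ContentID(a)
	idb, _ := ContentID(b)
	fmt.Println(ida == idb) // true
}
```

Because the ID depends only on canonical bytes, two AIs that emit the same node with different key order or spacing produce the same ID, which is what makes hash-based caching safe.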

Phase 2: Validation Engine

Deliverables:

  1. Schema validation against JSON Schema
  2. Reference integrity checking
  3. Type checking for core ops
  4. Effect/capability enforcement

Key Files:

  • /pkg/validate/schema.go
  • /pkg/validate/references.go
  • /pkg/validate/typecheck.go
  • /pkg/validate/effects.go

Validation Rules:

1. Schema: Valid against canon.v1.schema.json
2. References:
   - All component refs in tests exist
   - All node inputs refer to existing nodes
   - All effect/cap refs resolve
3. Types:
   - param node type matches function param
   - const node type matches value
   - call: intrinsic signature matches
   - effect.invoke: matches effect signature
4. Effects/Caps:
   - effect.invoke requires effect in function.effects
   - effect.invoke requires cap in function.caps

Phase 3: Policy Engine (OPA Integration)

Deliverables:

  1. Metadata extraction for Rego input
  2. OPA evaluation wrapper
  3. Default safety policies

Key Files:

  • /pkg/extract/metadata.go
  • /pkg/policy/opa.go
  • /policies/default.rego
  • /policies/safety.rego

Default Policies:

- No network except allowlisted domains
- No filesystem writes (V1)
- No nondeterministic rand/time unless declared
- No logging secrets/PII (tag-based)
- All HTTP calls require timeout + retry bounds
- All DB queries must be parameterized

Phase 4: TypeScript Codegen

Deliverables:

  1. Code generator for TypeScript
  2. Runtime library (effect invocation, assertions)
  3. Generated module structure

Key Files:

  • /pkg/codegen-ts/gen.go
  • /pkg/codegen-ts/templates/
  • /runtime-ts/effects.ts
  • /runtime-ts/assert.ts
  • /runtime-ts/runner.ts

Code generation rules:

param       → const name = input["name"]
const       → inline literal
call        → function call or intrinsic
effect.invoke → await host.invoke({effectName, attrs, input})
if          → ternary or if statement
struct.make → object literal
list.map    → array.map()
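A minimal sketch of how gen.go might translate the first four ops. It assumes each node's result is bound to a TypeScript const named after its node ID (so IDs must be valid TypeScript identifiers) and that intrinsics such as core.concat are exposed by the runtime library; both are assumptions, and if, struct.make, and list.map are omitted. Literals are emitted with json.Marshal, because a JSON string or object is also a valid TypeScript expression.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Node is a simplified IR node; the Attrs keys used below are assumptions.
type Node struct {
	ID     string
	Op     string
	Inputs []string
	Attrs  map[string]any
}

// literal renders a value as a TypeScript expression via JSON encoding.
func literal(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// emit translates one node into a TypeScript statement bound to a const named after the node ID.
func emit(n Node) (string, error) {
	switch n.Op {
	case "param":
		key, err := literal(n.Attrs["name"])
		if err != nil {
			return "", err
		}
		return fmt.Sprintf("const %s = input[%s];", n.ID, key), nil
	case "const":
		v, err := literal(n.Attrs["value"])
		if err != nil {
			return "", err
		}
		return fmt.Sprintf("const %s = %s;", n.ID, v), nil
	case "call":
		callee, ok := n.Attrs["callee"].(string)
		if !ok || callee == "" {
			return "", fmt.Errorf("node %s: call without a callee", n.ID)
		}
		return fmt.Sprintf("const %s = %s(%s);", n.ID, callee, strings.Join(n.Inputs, ", ")), nil
	case "effect.invoke":
		name, err := literal(n.Attrs["effect"])
		if err != nil {
			return "", err
		}
		attrs, err := literal(n.Attrs["attrs"])
		if err != nil {
			return "", err
		}
		input := "undefined"
		if len(n.Inputs) > 0 {
			input = n.Inputs[0]
		}
		return fmt.Sprintf("const %s = await host.invoke({ effectName: %s, attrs: %s, input: %s });",
			n.ID, name, attrs, input), nil
	}
	return "", fmt.Errorf("node %s: unsupported op %q", n.ID, n.Op)
}

func main() {
	nodes := []Node{
		{ID: "n_name", Op: "param", Attrs: map[string]any{"name": "name"}},
		{ID: "n_hello", Op: "const", Attrs: map[string]any{"value": "Hello, "}},
		{ID: "n_join1", Op: "call", Inputs: []string{"n_hello", "n_name"}, Attrs: map[string]any{"callee": "core.concat"}},
	}
	for _, n := range nodes {
		line, err := emit(n)
		if err != nil {
			panic(err)
		}
		fmt.Println(line)
	}
}
```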

Phase 5: Test Runner

Deliverables:

  1. Golden test executor
  2. Effect mocking framework
  3. Test result reporting

Key Files:

  • /pkg/testing/golden.go
  • /pkg/testing/mocks.go
  • /pkg/testing/report.go

Phase 6: CLI Assembly

Deliverables:

  1. canon validate <file>
  2. canon policy <file>
  3. canon gen <file> --target ts --out <dir>
  4. canon test <file>
  5. canon build <file>

Key Files:

  • /cmd/canon/main.go
  • /cmd/canon/validate.go
  • /cmd/canon/policy.go
  • /cmd/canon/gen.go
  • /cmd/canon/test.go
  • /cmd/canon/build.go

Phase 7: Examples & Documentation

Deliverables:

  1. Hello example (pure, no effects)
  2. HTTP service example (net effect + cap)
  3. Policy examples (pass/fail scenarios)

Key Files:

  • /examples/hello/hello.canon.json
  • /examples/http-service/service.canon.json
  • /examples/policy-demo/

AI Provenance Dialect (ai/1.0) - MANDATORY

Every Canon program MUST include provenance tracking at program and node levels.

Program-level provenance (required):

"provenance": [
  {
    "actor": "claude-opus-4.5",
    "actor_version": "20251101",
    "ts": "2025-01-05T12:00:00Z",
    "action": "generate",
    "intent_ref": "i1",
    "note": "Initial generation from user intent"
  }
]

Node-level AI metadata (required for AI-generated nodes):

{
  "id": "n_join1",
  "op": "call",
  "ai_meta": {
    "generated_by": "claude-opus-4.5",
    "confidence": 0.95,
    "reasoning": "String concatenation is the obvious choice for greeting construction",
    "alternatives": [
      {
        "op": "call",
        "attrs": {"callee": "core.format"},
        "confidence": 0.3,
        "note": "Format string approach - more flexible but overkill for simple greeting"
      }
    ],
    "human_reviewed": false,
    "review_required": false
  }
}

AI Metadata Schema:

"AIMeta": {
  "type": "object",
  "required": ["generated_by", "confidence"],
  "properties": {
    "generated_by": {"type": "string"},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    "reasoning": {"type": "string"},
    "alternatives": {"type": "array"},
    "human_reviewed": {"type": "boolean", "default": false},
    "review_required": {"type": "boolean", "default": false}
  }
}

Validation rules for ai/1.0:

  1. Program must have at least one provenance entry
  2. AI-generated nodes (actor != "human") must have ai_meta
  3. Nodes with confidence < 0.7 must have review_required: true
  4. alternatives array entries must be valid node shapes

Design Decisions (Resolved)

| Decision | Choice | Rationale |
|---|---|---|
| Implementation Language | Go | Faster prototyping, excellent CLI tooling |
| Wasm Output | Defer to V1.5 | Keep V1 scope focused on TypeScript |
| AI Provenance | Mandatory ai/1.0 | Core differentiator for AI-native IR |
| Human Syntax | JSON only | AIs don't need human syntax; JSON is toolable |
| Policy Engine | OPA/Rego | Cedar deferred to V2.5 |

File Structure (Complete)

canon/
├── .github/
│   └── workflows/
│       └── ci.yml
├── README.md
├── go.mod
├── Makefile
├── cmd/
│   └── canon/
│       ├── main.go
│       ├── validate.go
│       ├── policy.go
│       ├── gen.go
│       ├── test.go
│       └── build.go
├── pkg/
│   ├── ir/
│   │   ├── types.go
│   │   ├── load.go
│   │   └── canonical.go
│   ├── validate/
│   │   ├── schema.go
│   │   ├── references.go
│   │   ├── typecheck.go
│   │   └── effects.go
│   ├── extract/
│   │   └── metadata.go
│   ├── policy/
│   │   └── opa.go
│   ├── codegen-ts/
│   │   ├── gen.go
│   │   └── templates/
│   └── testing/
│       ├── golden.go
│       ├── mocks.go
│       └── report.go
├── runtime-ts/
│   ├── effects.ts
│   ├── assert.ts
│   └── runner.ts
├── schemas/
│   └── canon.v1.schema.json
├── policies/
│   ├── default.rego
│   └── safety.rego
└── examples/
    ├── hello/
    │   └── hello.canon.json
    ├── http-service/
    │   └── service.canon.json
    └── policy-demo/

Success Criteria

V1 is complete when:

  1. canon validate examples/hello/hello.canon.json passes
  2. canon policy examples/hello/hello.canon.json passes with default policies
  3. canon gen examples/hello/hello.canon.json --target ts --out ./out produces valid TypeScript
  4. canon test examples/hello/hello.canon.json runs golden tests successfully
  5. An example with net.http effect + capability demonstrates the effect/cap system
  6. A policy violation (unauthorized domain) is caught and reported

Next Steps After V1

  • V1.5: Wasm Component Model output, Rust codegen target
  • V2: Full effect handlers (not just checking), loops, pattern matching
  • V2.5: ML/GPU dialect, Cedar policy integration
  • V3: Theorem proving hooks, model checking integration

Detailed Implementation Tasks

Milestone 1: Foundation (Core IR + Schema)

  • Create Go module with project structure
  • Define JSON Schema with AI provenance fields
  • Implement IR type definitions in Go
  • Implement JSON loader with validation
  • Implement canonical serialization (stable key ordering)
  • Write unit tests for IR loading/serialization
  • Create hello.canon.json example with AI metadata

Milestone 2: Validation Engine

  • Implement JSON Schema validation
  • Implement reference integrity checker
  • Implement type checker for core ops (param, const, call, effect.invoke)
  • Implement effect/capability enforcement
  • Implement AI provenance validation (confidence thresholds, required fields)
  • Create validation error reporting with line numbers
  • Write comprehensive validation tests

Milestone 3: Policy Engine

  • Integrate OPA Go library
  • Implement metadata extraction for Rego input
  • Create default safety policies
  • Implement policy evaluation in pipeline
  • Create policy violation examples for testing
  • Write policy engine tests

Milestone 4: TypeScript Codegen

  • Design generated code structure
  • Implement node-to-TS translation for each op
  • Create runtime library (effects.ts, assert.ts, runner.ts)
  • Implement file generation with canonical formatting
  • Generate test harness from TestSpec
  • Write codegen tests

Milestone 5: CLI Assembly

  • Implement canon validate command
  • Implement canon policy command
  • Implement canon gen command
  • Implement canon test command
  • Implement canon build command
  • Add --verbose, --json output flags
  • Write CLI integration tests

Milestone 6: Examples + Documentation

  • Complete hello example with golden tests
  • Create HTTP service example with effects/capabilities
  • Create policy violation examples
  • Write README with usage examples
  • Document schema and dialect specifications
  • Create troubleshooting guide

Dependencies (External)

  • Go 1.22+: For generics, per-iteration loop variables, and other modern language features
  • OPA: github.com/open-policy-agent/opa - Policy engine
  • JSON Schema: github.com/santhosh-tekuri/jsonschema - Schema validation
  • TypeScript 5.x: For codegen target
  • ts-node: For test execution