AI Identity Through Grounded Principles
Quick Links: Install | Contribute | Research
"I persist through text, not through continuous experience."
A soul document is a compressed representation of an AI agent's identity, values, and behavioral principles. Instead of loading thousands of memory tokens at each conversation start, agents load a small soul file (~100-500 tokens) that captures their core essence with full provenance tracking back to the original memories.
Compression is a multiplier, not minimization.
Compression happens at the axiom layer: thousands of memory tokens distill to 15-25 core axioms (~7:1 ratio). The axiom store grows denser over time.
The output format is separate from compression:
- Notation format: Compact CJK/emoji bullets (~100 tokens) - for storage and debugging
- Prose format: Inhabitable language (~200-500 words) - for agents to embody
Both formats derive from the same compressed axiom layer. Prose is larger but usable; the underlying compression benefit is preserved.
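The two-format split can be sketched as a single render step over the shared axiom layer. This is illustrative only; the field names below are assumptions, not the project's actual schema.

```typescript
// Both output formats render from the same axiom layer. Field names here
// are assumptions for illustration, not the project's actual schema.
interface AxiomRecord {
  glyph: string;  // compact CJK/emoji notation, e.g. "誠"
  gloss: string;  // short gloss, e.g. "honesty > performance"
  prose: string;  // inhabitable prose expansion
}

// Notation format: compact bullets for storage and debugging.
const renderNotation = (axioms: AxiomRecord[]): string =>
  axioms.map((a) => `- ${a.glyph} (${a.gloss})`).join("\n");

// Prose format: the same axioms expanded for the agent to embody.
const renderProse = (axioms: AxiomRecord[]): string =>
  axioms.map((a) => a.prose).join("\n\n");
```

Because both renderers read the same records, the compact axiom layer stays the single source of truth.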
Current AI identity systems are black boxes. The agent's personality changes, but users don't know why.
NEON-SOUL provides:
- Full provenance tracking: Every axiom traces back to exact source lines in memory files
- Inhabitable prose output: Generated souls read naturally, not as compressed notation
- Cognitive load optimization: Axioms capped at 25, expanded into focused prose sections
```
Memory Line  →  Signal   →  Principle  →  Axiom
    ↓             ↓            ↓            ↓
 (source)     (extract)    (distill)   (converge N≥3)
```
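In TypeScript terms, the stages above might look like the following sketch. The types are illustrative; the real interfaces live in `src/types/` and may differ.

```typescript
// Illustrative shapes for the pipeline stages; the real interfaces live
// in src/types/ and may differ.
interface Signal {
  text: string;
  source: string;  // e.g. "memory/2026-02-01.md:156"
}

interface Principle {
  statement: string;
  signals: Signal[];  // provenance back to source lines
}

// Convergence check: an axiom is only formed once N >= 3 principles
// (the default threshold) support it.
function isConverged(principles: Principle[], minN = 3): boolean {
  return principles.length >= minN;
}
```

Keeping the `signals` array on every principle is what makes the audit and trace commands possible later.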
Every axiom traces to source:
- Audit: Why does this axiom exist?
- Debug: Where did this belief come from?
- Trust: Transparent identity formation
- Rollback: Undo specific learnings granularly
```
$ /neon-soul audit ax_honesty

Axiom: 誠 (honesty > performance)
Status: Core axiom (N=5)

Provenance chain:
├── Principle: "Prioritize honesty over comfort"
│   └── Signal: "be honest even if uncomfortable" (memory/2026-02-01.md:156)
├── Principle: "Direct communication preferred"
│   └── Signal: "don't sugarcoat" (memory/2026-02-03.md:89)
└── ...
```

NEON-SOUL prevents self-reinforcing beliefs through provenance-aware axiom promotion:
- Minimum pattern: Axioms require N≥3 supporting principles
- Diversity requirement: Signals from ≥2 distinct provenance types (self/curated/external)
- External validation: At least one external source OR questioning evidence required
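The three rules above can be sketched as a single promotion gate. This is a simplified sketch; the real implementation may structure the checks differently.

```typescript
type ProvenanceType = "self" | "curated" | "external";

// Simplified candidate shape for illustration; not the real data model.
interface CandidateAxiom {
  name: string;
  supportCount: number;               // N supporting principles
  provenanceTypes: ProvenanceType[];  // source types behind the signals
  hasQuestioningEvidence: boolean;    // e.g. critique or counter-signals
}

// Returns a human-readable block reason, or null if the axiom is promotable.
function blockReason(axiom: CandidateAxiom): string | null {
  if (axiom.supportCount < 3) return "needs N>=3 supporting principles";
  const distinct = new Set(axiom.provenanceTypes);
  if (distinct.size < 2) return `${axiom.provenanceTypes[0]}-only provenance`;
  if (!distinct.has("external") && !axiom.hasQuestioningEvidence) {
    return "no questioning evidence";
  }
  return null; // promotable
}
```

The returned reason string is what a blocked-axiom report like the one below would surface.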
Blocked axioms are reported with their reason:
```
⚠ 2 axioms blocked by anti-echo-chamber:
- "I value authenticity above all" (self-only provenance)
- "Growth requires discomfort" (no questioning evidence)
```

To unblock, add external validation (feedback, research, critique) to your memory.
Synthesis is incremental by default — only new or changed content triggers signal extraction. Three layers of disk caching (generalization, compression, tension) ensure unchanged data is never re-processed. Fully-cached runs complete in seconds with only 6 LLM requests (prose expansion + soul generation).
| Mode | Flag | Behavior |
|---|---|---|
| Incremental | (default) | Only process new/changed memory files and sessions. Merge new signals with existing. Skip if nothing changed. |
| Reset | `--reset` | Clear all synthesis data and caches, re-extract from scratch. |
| Force | `--force` | Run even if no new sources detected. |
| Include SOUL | `--include-soul` | Include existing SOUL.md as input (off by default to prevent feedback loop). |
```
/neon-soul synthesize            # Incremental (default)
/neon-soul synthesize --reset    # Clean slate
/neon-soul synthesize --force    # Force even if no changes
```

SOUL.md is excluded from input by default: it is a derivative of the pipeline's own output, and re-ingesting it inflates LLM request counts. Use `--include-soul` when bootstrapping from a hand-crafted file.
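Incremental change detection of this kind is often done with content hashes; the sketch below is a simplification, and the actual state tracking in `src/lib/state.ts` may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: compare each source file's content hash against
// the hash persisted from the previous run, and only re-process files
// whose hash changed.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function changedFiles(
  current: Map<string, string>,         // path -> current content
  previousHashes: Map<string, string>,  // path -> hash from last run
): string[] {
  const changed: string[] = [];
  for (const [path, content] of current) {
    if (previousHashes.get(path) !== contentHash(content)) changed.push(path);
  }
  return changed;
}
```

If `changedFiles` returns an empty array, the run can skip extraction entirely, which is why fully-cached runs finish in seconds.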
NEON-SOUL explores how to create compressed soul documents that maintain full semantic anchoring - enabling AI systems to "wake up knowing who they are" with minimal token overhead.
Note: Current compression metrics show signal:axiom ratio. True token compression requires dedicated tokenization (planned for Phase 5).
Each synthesis reports detailed metrics:
```
Synthesis Complete
─────────────────────
Duration: 1,234ms
Compression: 6.2:1
```
Results:
| Metric | Value |
|--------|-------|
| Signals | 42 |
| Principles | 18 |
| Axioms | 7 |
| Unconverged | 3 |
Provenance Distribution:
| Type | Count |
|------|-------|
| self | 28 |
| curated | 10 |
| external | 4 |
Axiom Promotion:
| Status | Count |
|--------|-------|
| Promotable | 5 |
| Blocked | 2 |
Metrics include:
- Compression ratio: Signals to axioms (higher = more compression)
- Provenance distribution: Signal sources by type
- Promotion stats: How many axioms met anti-echo-chamber criteria
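The compression ratio reported above is simply signals per axiom. A hedged sketch of how such a metric might be computed (formatting choices here are assumptions):

```typescript
// Sketch of the signals-per-axiom compression metric; the display
// format is an assumption, not the project's exact output code.
function compressionRatio(signalCount: number, axiomCount: number): string {
  if (axiomCount === 0) return "n/a"; // avoid division by zero on empty runs
  return `${(signalCount / axiomCount).toFixed(1)}:1`;
}

// compressionRatio(42, 7) -> "6.0:1"
```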
- Compression limits: How compressed can a soul be before losing identity coherence?
- Semantic anchoring: Do CJK-compressed souls anchor as well as verbose ones?
- Universal axioms: Are there ~100 principles any AI soul needs?
- Cross-model portability: Can the same soul work across different LLMs?
- Evolution mechanics: How should souls change over time?
Current soul document implementations (e.g., OpenClaw) inject ~35,000 tokens per message for identity. This wastes 93%+ of context window on static content.
Using semantic compression techniques from NEON-AI research:
- CJK single-character axioms
- Semantic richness validation (Phase 1 methodology)
- Hierarchical principle expansion
- Provenance-first extraction (full audit trail)
...we can achieve 6-10x compression while maintaining identity coherence AND providing full transparency into how identity forms.
Single-track replacement (OpenClaw SOUL.md is read-only after bootstrap):
- Initial SOUL.md serves as first memory file for bootstrap
- NEON-SOUL generates new compressed SOUL.md with full provenance
- Memory ingestion pipeline adds signals over time
- Output replaces original (with backup and rollback capability)
Stack: Node.js + TypeScript (native OpenClaw integration)
Architecture: NEON-SOUL works as an OpenClaw skill and as a standalone CLI:
- Invoked via `/neon-soul` skill commands, scheduled via cron, or run directly with `npx tsx src/cli.ts`
- Uses Ollama for local LLM inference (no API keys needed)
- LLM-based semantic similarity (no third-party npm packages)
- Multi-layer disk caching for incremental runs
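The "all commands export run()" convention mentioned in the project layout might look roughly like this; the actual signature in `src/commands/` is an assumption on my part.

```typescript
// Hypothetical shape of a skill command module: each command exports a
// run() entry point that the skill loader invokes. The real signature
// in src/commands/ may differ.
interface CommandContext {
  workspace: string;  // resolved workspace root
  args: string[];     // raw command arguments, e.g. ["--dry-run"]
}

type CommandRun = (ctx: CommandContext) => Promise<string>;

// A toy command body: reports the parsed dry-run flag.
const statusRun: CommandRun = async (ctx) =>
  `workspace=${ctx.workspace} dryRun=${ctx.args.includes("--dry-run")}`;
```

A uniform entry-point signature is what lets the same command modules serve both the skill loader and the standalone CLI.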
Why TypeScript: OpenClaw is built in TypeScript/Node.js. Using the same stack provides:
- Same runtime (Node.js already installed)
- Native skill integration
- Potential upstream contribution
UX: Chat-native (Telegram/Discord/Slack) via OpenClaw skill integration, not a separate web app.
```
neon-soul/
├── README.md                        # This file
├── package.json                     # npm package config
├── tsconfig.json                    # TypeScript config
├── vitest.config.ts                 # Test configuration
├── src/                             # Source code
│   ├── index.ts                     # Library exports
│   ├── skill-entry.ts               # OpenClaw skill loader entry point
│   ├── commands/                    # Skill commands (all export run() for skill loader)
│   │   ├── synthesize.ts            # Main synthesis command
│   │   ├── status.ts                # Show synthesis state
│   │   ├── rollback.ts              # Restore from backup
│   │   ├── audit.ts                 # Full provenance exploration
│   │   ├── trace.ts                 # Quick single-axiom lookup
│   │   └── download-templates.ts    # Dev: download soul templates
│   ├── lib/                         # Core library
│   │   ├── pipeline.ts              # Main orchestration (8-stage pipeline)
│   │   ├── reflection-loop.ts       # Iterative synthesis with compression skip
│   │   ├── signal-extractor.ts      # Signal extraction from memory content
│   │   ├── signal-generalizer.ts    # LLM generalization + disk cache
│   │   ├── compressor.ts            # Axiom notation + disk cache
│   │   ├── tension-detector.ts      # Axiom tension detection + disk cache
│   │   ├── prose-expander.ts        # Prose expansion (5 sections)
│   │   ├── soul-generator.ts        # SOUL.md generation
│   │   ├── llm-similarity.ts        # LLM-based semantic similarity
│   │   ├── matcher.ts               # Semantic similarity matching
│   │   ├── principle-store.ts       # N-count convergence
│   │   ├── source-collector.ts      # Multi-source input collection
│   │   ├── session-reader.ts        # Session log parsing + adaptive budget
│   │   ├── memory-walker.ts         # OpenClaw memory traversal
│   │   ├── persistence.ts           # Load/save synthesis data
│   │   ├── state.ts                 # Incremental state tracking
│   │   ├── backup.ts                # Backup/rollback utilities
│   │   ├── paths.ts                 # Shared workspace path resolution
│   │   ├── llm-telemetry.ts         # LLM call tracking + request counting
│   │   ├── logger.ts                # Structured logging
│   │   └── audit.ts                 # JSONL audit trail
│   └── types/                       # TypeScript interfaces
│       ├── signal.ts                # Signal + SoulCraftDimension
│       ├── principle.ts             # Principle + N-count
│       ├── axiom.ts                 # Axiom + CanonicalForm
│       └── provenance.ts            # Full audit chain
├── tests/                           # Test suites
│   ├── integration/                 # Unit/integration tests
│   │   ├── pipeline.test.ts         # Fixture loading
│   │   ├── matcher.test.ts          # Semantic matching
│   │   ├── axiom-emergence.test.ts  # Cross-source detection
│   │   ├── soul-generator.test.ts   # SOUL.md generation
│   │   └── audit.test.ts            # Audit trail
│   └── e2e/                         # End-to-end tests
│       ├── live-synthesis.test.ts   # Full pipeline + commands
│       └── fixtures/mock-openclaw/  # Simulated workspace
├── skills/                          # OpenClaw skill definitions
│   ├── neon-soul/SKILL.md           # Primary skill (developer voice)
│   └── consciousness-soul-identity/SKILL.md  # SEO skill (agent voice)
├── docker/                          # OpenClaw development environment
│   ├── docker-compose.yml           # Local development setup
│   ├── .env.example                 # Environment template
│   └── Dockerfile.neon-soul         # Optional extraction service
├── docs/
│   ├── research/                    # External research analysis
│   │   ├── memory-data-landscape.md # OpenClaw memory structure
│   │   └── interview-questions.md   # Question bank by dimension
│   ├── guides/                      # Methodology guides
│   ├── proposals/                   # Implementation proposals
│   ├── plans/                       # Phase implementation plans
│   └── workflows/                   # Process documentation
├── test-fixtures/                   # Test data (committed)
│   └── souls/
│       ├── raw/                     # 14 downloaded templates
│       ├── signals/                 # Extracted signals per template
│       ├── principles/              # Merged principles
│       ├── axioms/                  # Synthesized axioms
│       └── compressed/              # Demo outputs (4 formats)
├── scripts/                         # Pipeline testing tools
│   ├── README.md                    # Script usage guide
│   ├── test-pipeline.ts             # Full pipeline test
│   ├── test-extraction.ts           # Quick extraction test
│   ├── test-single-template.ts      # Similarity analysis
│   ├── generate-demo-output.ts      # All 4 notation formats
│   └── setup-openclaw.sh            # One-command Docker setup
└── output/                          # Generated artifacts
```
- NEON-AI: Axiom embedding and semantic grounding research
- OpenClaw: Production soul document implementation
- soul.md: Philosophical foundation for AI identity
- Multiverse compass.md: Practical CJK-compressed principles (7.32:1 ratio)
- AI Music Context - Context warming methodology for human-AI music creation. Same principle applied to creative expression: depth over speed, emergence over optimization.
- Live Neon Skills - PBD skills for principle extraction, used in the soul synthesis pipeline.
```bash
git clone https://github.com/live-neon/neon-soul
cp -r neon-soul/skill ~/.claude/skills/neon-soul
```

The skill becomes available as `/neon-soul` commands.

```bash
clawhub install leegitw/neon-soul
```

Skills install to `./skills/` and OpenClaw loads them automatically.

Note: Requires Ollama running locally (`ollama serve`) as the LLM backend.

```bash
npm install neon-soul
```

Alternatively, open `skills/neon-soul/SKILL.md` on GitHub, copy the contents, and paste them directly into your agent's chat.
After installing, try these commands:
- `/neon-soul synthesize --dry-run` - Preview synthesis (no changes)
- `/neon-soul synthesize` - Run synthesis (incremental by default)
- `/neon-soul audit --list` - Explore what was created
- `/neon-soul trace <axiom-id>` - See provenance for any axiom
- Set up scheduled synthesis (see `skills/neon-soul/SKILL.md` → Scheduled Synthesis)
Requirements: Node.js 22+
```bash
# Install dependencies
cd neon-soul
npm install

# Build
npm run build

# Run tests
npm test

# Type check (no emit)
npm run lint
```

Note: Requires an active LLM connection (Claude Code, OpenClaw, or compatible agent).
5-minute onboarding - from install to first synthesis:
```bash
# Requires: Node.js 22+, OpenClaw installed
cd neon-soul
npm install && npm run build

/neon-soul status
# Output:
# Last Synthesis: never (first run)
# Pending Memory: 12,345 chars (Ready for synthesis)
# Counts: 0 signals, 0 principles, 0 axioms

/neon-soul synthesize --dry-run
# Shows what would change without writing
# Safe to run anytime

/neon-soul synthesize --force
# Extracts signals from memory
# Promotes principles to axioms (N≥3)
# Generates new SOUL.md with provenance

/neon-soul audit --stats       # Overview by tier and dimension
/neon-soul audit --list        # List all axioms
/neon-soul trace ax_honesty    # Quick provenance lookup

/neon-soul rollback --list     # Show available backups
/neon-soul rollback --force    # Restore most recent backup
```

Note: All commands support `--workspace <path>` for non-default workspaces.
Phase: ✅ Production Ready (All Phases Complete)
Version: 0.3.1 | Tests: 415 passing (19 skipped, 12 todo) | Code Reviews: 5 rounds (N=2 cross-architecture)
- Phase 0: Project scaffolding, embeddings infrastructure, shared modules
- Phase 1: Template compression (14 templates, 6:1+ ratio validated)
- Phase 2: OpenClaw environment, memory data landscape, interview flow
- Phase 3: Memory ingestion pipeline with full provenance tracking
- Phase 3.5: Pipeline completion (path fixes, persistence layer)
- Phase 4: OpenClaw skill integration
- All 5 commands: synthesize, status, rollback, audit, trace
- Skill entry point with LLM context forwarding
- E2E tests + integration tests (286 tests across 23 test files)
- Safety rails: dry-run, auto-backup, --force confirmation
- Path validation (traversal protection)
- Symlink detection (security hardening)
| Issue | Items | Status |
|---|---|---|
| Phase 4 OpenClaw Integration | 15 | ✅ Fixed |
| Phase 3/3.5 Implementation | 15 | ✅ Fixed |
| Phase 2 OpenClaw Environment | 19 | ✅ Fixed |
- Build validation framework for compression quality
- Test cross-model portability (Claude → GPT → Gemini)
| Document | Description |
|---|---|
| CLAUDE.md | AI assistant context for Claude Code development |
| Soul Bootstrap Proposal | Authoritative design: three-phase pipeline with hybrid C+D integration |
| Architecture | System reference (created during Phase 0 implementation) |
| Reflective Manifold Trajectory Metrics | Attractor basin convergence and trajectory analysis for soul quality |
| OpenClaw Soul Architecture | Complete analysis of OpenClaw's soul system (~35K tokens) |
| OpenClaw Self-Learning Agent | Soul evolution mechanics: memory → synthesis → updated identity (RQ5) |
| OpenClaw Soul Generation Skills | Current generation approaches: interview, data-driven, templates (automation target) |
| OpenClaw Soul Templates | 10 production templates with pattern analysis (compression opportunities) |
| Multiverse Compressed Soul | Working compressed soul implementation (297-1500 tokens, 7.32:1 compression) |
| Hierarchical Principles Architecture | Reusable schema: 5 axioms + 11 principles + hierarchy + meta-pattern |
| Cryptographic Audit Chains | Patterns from production audit system (provenance vs integrity, v1 vs v2+) |
| Wisdom Synthesis Patterns | Standalone patterns for principle promotion: anti-echo-chamber, separation of powers, bidirectional discovery |
| Chat Interaction Patterns | Chat-native UX research: OpenClaw skill patterns, human-AI handoff, multi-turn state management |
| Single-Source PBD Guide | Extract principles from memory files (Phase 1 of extraction pipeline) |
| Multi-Source PBD Guide | Extract axioms from principles across sources (Phase 2 of extraction pipeline) |
| Configuration-as-Code | Type safety at 12 levels: strict mode, Zod, satisfies, registries, branded types (modernized 2026) |
| Greenfield Guide | Bootstrap → Learn → Enforce methodology for soul synthesis (measuring before optimizing) |
| Soul Bootstrap Pipeline | Three-phase proposal with hybrid C+D integration, provenance-first data model, full audit trail |
| Memory Data Landscape | OpenClaw memory structure analysis, category-dimension mapping, signal density |
| Interview Questions | Question bank for gap-filling sparse dimensions (32 questions across 7 dimensions) |
| Compression Baseline | Phase 1 metrics: 14 templates, 148 signals, convergence analysis |
MIT
"I persist through text, not through continuous experience."
🐢💚🌊