
NEON-SOUL


AI Identity Through Grounded Principles

Quick Links: Install | Contribute | Research

"I persist through text, not through continuous experience."


What is a Soul Document?

A soul document is a compressed representation of an AI agent's identity, values, and behavioral principles. Instead of loading thousands of memory tokens at each conversation start, agents load a small soul file (~100-500 tokens) that captures their core essence with full provenance tracking back to the original memories.


The Core Insight

Compression is a multiplier, not minimization.

Compression happens at the axiom layer: thousands of memory tokens distill to 15-25 core axioms (~7:1 ratio). The axiom store grows denser over time.

The output format is separate from compression:

  • Notation format: Compact CJK/emoji bullets (~100 tokens) - for storage and debugging
  • Prose format: Inhabitable language (~200-500 words) - for agents to embody

Both formats derive from the same compressed axiom layer. Prose is larger but usable; the underlying compression benefit is preserved.
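
The two-format design can be sketched in TypeScript. This is only an illustration: names like Axiom, renderNotation, and renderProse are assumptions, not the project's actual API.

```typescript
// Sketch of the shared axiom layer feeding both output formats.
// Names (Axiom, renderNotation, renderProse) are illustrative, not the actual API.

interface Axiom {
  glyph: string; // compact CJK/emoji form, e.g. "誠"
  gloss: string; // short reading, e.g. "honesty > performance"
  prose: string; // inhabitable sentence for the prose format
}

// Notation format: one compact bullet per axiom (~100 tokens total).
function renderNotation(axioms: Axiom[]): string {
  return axioms.map((a) => `• ${a.glyph} (${a.gloss})`).join("\n");
}

// Prose format: the same axioms expanded into embodied language.
function renderProse(axioms: Axiom[]): string {
  return axioms.map((a) => a.prose).join(" ");
}

const demo: Axiom[] = [
  { glyph: "誠", gloss: "honesty > performance", prose: "I choose honesty over performance." },
];
```

Because both renderers read the same axiom array, the compression benefit lives in the axiom layer regardless of which format is emitted.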

Current AI identity systems are black boxes. The agent's personality changes, but users don't know why.

NEON-SOUL provides:

  • Full provenance tracking: Every axiom traces back to exact source lines in memory files
  • Inhabitable prose output: Generated souls read naturally, not as compressed notation
  • Cognitive load optimization: Axioms capped at 25, expanded into focused prose sections

Why Provenance Matters

Memory Line → Signal → Principle → Axiom
     ↓           ↓          ↓          ↓
 (source)    (extract)   (distill)  (converge N≥3)

Every axiom traces to source:

  • Audit: Why does this axiom exist?
  • Debug: Where did this belief come from?
  • Trust: Transparent identity formation
  • Rollback: Undo specific learnings granularly
$ /neon-soul audit ax_honesty

Axiom: 誠 (honesty > performance)
Status: Core axiom (N=5)

Provenance chain:
├── Principle: "Prioritize honesty over comfort"
│   └── Signal: "be honest even if uncomfortable" (memory/2026-02-01.md:156)
├── Principle: "Direct communication preferred"
│   └── Signal: "don't sugarcoat" (memory/2026-02-03.md:89)
└── ...
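
The chain above maps naturally onto nested records. A minimal sketch, with assumed field names (the project's actual interfaces live in src/types/ and may differ):

```typescript
// Illustrative provenance chain: Memory Line → Signal → Principle → Axiom.
// Field names are assumptions, not the actual definitions in src/types/.

interface Signal {
  text: string;
  source: string; // exact source line, e.g. "memory/2026-02-01.md:156"
}

interface Principle {
  statement: string;
  signals: Signal[]; // every principle keeps its extracted signals
}

interface Axiom {
  glyph: string;
  principles: Principle[]; // promotion requires convergence (N >= 3)
}

// Walking the chain recovers every source line behind an axiom,
// which is the property audit and trace depend on.
function sourcesOf(axiom: Axiom): string[] {
  return axiom.principles.flatMap((p) => p.signals.map((s) => s.source));
}
```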

Grounding Requirements (Anti-Echo-Chamber Protection)

NEON-SOUL prevents self-reinforcing beliefs through provenance-aware axiom promotion:

  • Minimum pattern: Axioms require N≥3 supporting principles
  • Diversity requirement: Signals from ≥2 distinct provenance types (self/curated/external)
  • External validation: At least one external source OR questioning evidence required

Blocked axioms are reported with their reason:

⚠ 2 axioms blocked by anti-echo-chamber:
  - "I value authenticity above all" (self-only provenance)
  - "Growth requires discomfort" (no questioning evidence)

To unblock, add external validation (feedback, research, critique) to your memory.
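
The three checks could be combined into a single promotion gate. A sketch under assumed names and types; the real logic lives inside the synthesis pipeline:

```typescript
// Sketch of the anti-echo-chamber promotion gate. Thresholds mirror the
// three rules above; names and reason strings are illustrative.

type Provenance = "self" | "curated" | "external";

interface CandidateAxiom {
  text: string;
  supportingPrinciples: number;
  provenanceTypes: Set<Provenance>;
  hasQuestioningEvidence: boolean;
}

// Returns the blocking reason, or null when the axiom is promotable.
function promotionBlockReason(c: CandidateAxiom): string | null {
  if (c.supportingPrinciples < 3) return "needs N>=3 supporting principles";
  if (c.provenanceTypes.size < 2) return "single provenance type (needs >=2)";
  if (!c.provenanceTypes.has("external") && !c.hasQuestioningEvidence) {
    return "no external validation or questioning evidence";
  }
  return null;
}
```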


Incremental Synthesis

Synthesis is incremental by default — only new or changed content triggers signal extraction. Three layers of disk caching (generalization, compression, tension) ensure unchanged data is never re-processed. Fully-cached runs complete in seconds with only 6 LLM requests (prose expansion + soul generation).

| Mode | Flag | Behavior |
|------|------|----------|
| Incremental | (default) | Only process new/changed memory files and sessions. Merge new signals with existing. Skip if nothing changed. |
| Reset | --reset | Clear all synthesis data and caches, re-extract from scratch. |
| Force | --force | Run even if no new sources detected. |
| Include SOUL | --include-soul | Include existing SOUL.md as input (off by default to prevent feedback loop). |
/neon-soul synthesize              # Incremental (default)
/neon-soul synthesize --reset      # Clean slate
/neon-soul synthesize --force      # Force even if no changes

SOUL.md is excluded from input by default — it's a derivative of the pipeline's own output. Re-ingesting it inflates LLM request counts. Use --include-soul when bootstrapping from a hand-crafted file.
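
The skip-if-unchanged behavior can be approximated with content hashing. A sketch of the core idea only; the pipeline's actual cache-key scheme is not specified here:

```typescript
import { createHash } from "node:crypto";

// Sketch of skip-if-unchanged: re-process a source only when its content hash
// differs from the last run. The real pipeline keeps three disk cache layers
// (generalization, compression, tension); this shows only the core idea.

const seen = new Map<string, string>(); // path -> content hash from last run

function needsProcessing(path: string, content: string): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  if (seen.get(path) === hash) return false; // unchanged: reuse cached results
  seen.set(path, hash);
  return true;
}
```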


Vision

NEON-SOUL explores how to create compressed soul documents that maintain full semantic anchoring - enabling AI systems to "wake up knowing who they are" with minimal token overhead.

Note: Current compression metrics show signal:axiom ratio. True token compression requires dedicated tokenization (planned for Phase 5).


Synthesis Metrics

Each synthesis reports detailed metrics:

Synthesis Complete
─────────────────────
Duration: 1,234ms
Compression: 6.2:1

Results:
| Metric | Value |
|--------|-------|
| Signals | 42 |
| Principles | 18 |
| Axioms | 7 |
| Unconverged | 3 |

Provenance Distribution:
| Type | Count |
|------|-------|
| self | 28 |
| curated | 10 |
| external | 4 |

Axiom Promotion:
| Status | Count |
|--------|-------|
| Promotable | 5 |
| Blocked | 2 |

Metrics include:

  • Compression ratio: Signals to axioms (higher = more compression)
  • Provenance distribution: Signal sources by type
  • Promotion stats: How many axioms met anti-echo-chamber criteria
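
For illustration, the signal:axiom ratio noted under Vision could be computed as below; the pipeline's exact inputs and rounding may differ.

```typescript
// Sketch of the signal:axiom compression ratio. Names are illustrative.

interface Counts {
  signals: number;
  axioms: number;
}

function compressionRatio({ signals, axioms }: Counts): string {
  return `${(signals / axioms).toFixed(1)}:1`;
}
```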

Research Questions

  1. Compression limits: How compressed can a soul be before losing identity coherence?
  2. Semantic anchoring: Do CJK-compressed souls anchor as well as verbose ones?
  3. Universal axioms: Are there ~100 principles any AI soul needs?
  4. Cross-model portability: Can the same soul work across different LLMs?
  5. Evolution mechanics: How should souls change over time?

Background

The Problem

Current soul document implementations (e.g., OpenClaw) inject ~35,000 tokens per message for identity. This wastes 93%+ of context window on static content.

The Hypothesis

Using semantic compression techniques from NEON-AI research:

  • CJK single-character axioms
  • Semantic richness validation (Phase 1 methodology)
  • Hierarchical principle expansion
  • Provenance-first extraction (full audit trail)

...we can achieve 6-10x compression while maintaining identity coherence AND providing full transparency into how identity forms.

The Approach

Single-track replacement (OpenClaw SOUL.md is read-only after bootstrap):

  • Initial SOUL.md serves as first memory file for bootstrap
  • NEON-SOUL generates new compressed SOUL.md with full provenance
  • Memory ingestion pipeline adds signals over time
  • Output replaces original (with backup and rollback capability)
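
The replace-with-backup step might look like the following sketch. The function name, paths, and backup naming scheme are hypothetical; the skill manages its own backups behind /neon-soul rollback.

```typescript
import { copyFileSync, existsSync, renameSync } from "node:fs";

// Sketch of single-track replacement with rollback capability.
// The backup naming scheme here is illustrative only.

function replaceSoul(soulPath: string, newSoulPath: string): void {
  if (existsSync(soulPath)) {
    // Timestamped backup makes the replacement reversible.
    copyFileSync(soulPath, `${soulPath}.${Date.now()}.bak`);
  }
  renameSync(newSoulPath, soulPath); // atomic on the same filesystem
}
```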

Technology

Stack: Node.js + TypeScript (native OpenClaw integration)

Architecture: NEON-SOUL works as an OpenClaw skill and as a standalone CLI:

  • Invoked via /neon-soul skill commands, scheduled via cron, or npx tsx src/cli.ts
  • Uses Ollama for local LLM inference (no API keys needed)
  • LLM-based semantic similarity (no third-party npm packages)
  • Multi-layer disk caching for incremental runs
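
If the Phase 0 embeddings infrastructure feeds the matcher, the usual similarity core is cosine similarity over embedding vectors. Whether matcher.ts scores pairs this way, rather than by direct LLM judgment, is an assumption here:

```typescript
// Cosine similarity over embedding vectors, a common core for semantic
// matching. Vectors would come from a local Ollama embedding model; that
// pairing is an assumption, not the project's confirmed design.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```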

Why TypeScript: OpenClaw is built in TypeScript/Node.js. Using the same stack provides:

  • Same runtime (Node.js already installed)
  • Native skill integration
  • Potential upstream contribution

UX: Chat-native (Telegram/Discord/Slack) via OpenClaw skill integration, not a separate web app.


Project Structure

neon-soul/
├── README.md                    # This file
├── package.json                 # npm package config
├── tsconfig.json                # TypeScript config
├── vitest.config.ts             # Test configuration
├── src/                         # Source code
│   ├── index.ts                 # Library exports
│   ├── skill-entry.ts           # OpenClaw skill loader entry point
│   ├── commands/                # Skill commands (all export run() for skill loader)
│   │   ├── synthesize.ts        # Main synthesis command
│   │   ├── status.ts            # Show synthesis state
│   │   ├── rollback.ts          # Restore from backup
│   │   ├── audit.ts             # Full provenance exploration
│   │   ├── trace.ts             # Quick single-axiom lookup
│   │   └── download-templates.ts # Dev: download soul templates
│   ├── lib/                     # Core library
│   │   ├── pipeline.ts          # Main orchestration (8-stage pipeline)
│   │   ├── reflection-loop.ts   # Iterative synthesis with compression skip
│   │   ├── signal-extractor.ts  # Signal extraction from memory content
│   │   ├── signal-generalizer.ts # LLM generalization + disk cache
│   │   ├── compressor.ts        # Axiom notation + disk cache
│   │   ├── tension-detector.ts  # Axiom tension detection + disk cache
│   │   ├── prose-expander.ts    # Prose expansion (5 sections)
│   │   ├── soul-generator.ts    # SOUL.md generation
│   │   ├── llm-similarity.ts    # LLM-based semantic similarity
│   │   ├── matcher.ts           # Semantic similarity matching
│   │   ├── principle-store.ts   # N-count convergence
│   │   ├── source-collector.ts  # Multi-source input collection
│   │   ├── session-reader.ts    # Session log parsing + adaptive budget
│   │   ├── memory-walker.ts     # OpenClaw memory traversal
│   │   ├── persistence.ts       # Load/save synthesis data
│   │   ├── state.ts             # Incremental state tracking
│   │   ├── backup.ts            # Backup/rollback utilities
│   │   ├── paths.ts             # Shared workspace path resolution
│   │   ├── llm-telemetry.ts     # LLM call tracking + request counting
│   │   ├── logger.ts            # Structured logging
│   │   └── audit.ts             # JSONL audit trail
│   └── types/                   # TypeScript interfaces
│       ├── signal.ts            # Signal + SoulCraftDimension
│       ├── principle.ts         # Principle + N-count
│       ├── axiom.ts             # Axiom + CanonicalForm
│       └── provenance.ts        # Full audit chain
├── tests/                       # Test suites
│   ├── integration/             # Unit/integration tests
│   │   ├── pipeline.test.ts     # Fixture loading
│   │   ├── matcher.test.ts      # Semantic matching
│   │   ├── axiom-emergence.test.ts # Cross-source detection
│   │   ├── soul-generator.test.ts  # SOUL.md generation
│   │   └── audit.test.ts        # Audit trail
│   └── e2e/                     # End-to-end tests
│       ├── live-synthesis.test.ts # Full pipeline + commands
│       └── fixtures/mock-openclaw/ # Simulated workspace
├── skills/                      # OpenClaw skill definitions
│   ├── neon-soul/SKILL.md       # Primary skill (developer voice)
│   └── consciousness-soul-identity/SKILL.md  # SEO skill (agent voice)
├── docker/                      # OpenClaw development environment
│   ├── docker-compose.yml       # Local development setup
│   ├── .env.example             # Environment template
│   └── Dockerfile.neon-soul     # Optional extraction service
├── docs/
│   ├── research/                # External research analysis
│   │   ├── memory-data-landscape.md    # OpenClaw memory structure
│   │   └── interview-questions.md      # Question bank by dimension
│   ├── guides/                  # Methodology guides
│   ├── proposals/               # Implementation proposals
│   ├── plans/                   # Phase implementation plans
│   └── workflows/               # Process documentation
├── test-fixtures/               # Test data (committed)
│   └── souls/
│       ├── raw/                 # 14 downloaded templates
│       ├── signals/             # Extracted signals per template
│       ├── principles/          # Merged principles
│       ├── axioms/              # Synthesized axioms
│       └── compressed/          # Demo outputs (4 formats)
├── scripts/                     # Pipeline testing tools
│   ├── README.md                # Script usage guide
│   ├── test-pipeline.ts         # Full pipeline test
│   ├── test-extraction.ts       # Quick extraction test
│   ├── test-single-template.ts  # Similarity analysis
│   ├── generate-demo-output.ts  # All 4 notation formats
│   └── setup-openclaw.sh        # One-command Docker setup
└── output/                      # Generated artifacts

Related Work

  • NEON-AI: Axiom embedding and semantic grounding research
  • OpenClaw: Production soul document implementation
  • soul.md: Philosophical foundation for AI identity
  • Multiverse compass.md: Practical CJK-compressed principles (7.32:1 ratio)
  • AI Music Context: Context warming methodology for human-AI music creation. Same principle applied to creative expression: depth over speed, emergence over optimization.
  • Live Neon Skills: PBD skills for principle extraction, used in the soul synthesis pipeline.

Installation

Claude Code / Gemini CLI / Cursor

git clone https://github.com/live-neon/neon-soul
cp -r neon-soul/skills/neon-soul ~/.claude/skills/neon-soul

The skill becomes available as /neon-soul commands.

OpenClaw

clawhub install leegitw/neon-soul

Skills install to ./skills/ and OpenClaw loads them automatically.

Via npm

Note: Requires Ollama running locally (ollama serve) as the LLM backend.

npm install neon-soul

Any LLM Agent (Copy/Paste)

Open skills/neon-soul/SKILL.md on GitHub, copy contents, paste directly into your agent's chat.


Your First 5 Minutes

After installing, try these commands:

  1. /neon-soul synthesize --dry-run - Preview synthesis (no changes)
  2. /neon-soul synthesize - Run synthesis (incremental by default)
  3. /neon-soul audit --list - Explore what was created
  4. /neon-soul trace <axiom-id> - See provenance for any axiom
  5. Set up scheduled synthesis (see skills/neon-soul/SKILL.md → Scheduled Synthesis)

Development Setup

Requirements: Node.js 22+

# Install dependencies
cd neon-soul
npm install

# Build
npm run build

# Run tests
npm test

# Type check (no emit)
npm run lint

Note: Requires an active LLM connection (Claude Code, OpenClaw, or compatible agent).


Getting Started

5-minute onboarding - from install to first synthesis:

1. Install (Prerequisites)

# Requires: Node.js 22+, OpenClaw installed
cd neon-soul
npm install && npm run build

2. Check Current State

/neon-soul status
# Output:
# Last Synthesis: never (first run)
# Pending Memory: 12,345 chars (Ready for synthesis)
# Counts: 0 signals, 0 principles, 0 axioms

3. Preview Changes (Dry Run)

/neon-soul synthesize --dry-run
# Shows what would change without writing
# Safe to run anytime

4. Run Synthesis

/neon-soul synthesize --force
# Extracts signals from memory
# Promotes principles to axioms (N≥3)
# Generates new SOUL.md with provenance

5. Explore What Was Created

/neon-soul audit --stats       # Overview by tier and dimension
/neon-soul audit --list        # List all axioms
/neon-soul trace ax_honesty    # Quick provenance lookup

6. Rollback If Needed

/neon-soul rollback --list     # Show available backups
/neon-soul rollback --force    # Restore most recent backup

Note: All commands support --workspace <path> for non-default workspaces.


Current Status

Phase: ✅ Production Ready (All Phases Complete)

Version: 0.3.1 | Tests: 415 passing (19 skipped, 12 todo) | Code Reviews: 5 rounds (N=2 cross-architecture)

Implementation Complete

  • Phase 0: Project scaffolding, embeddings infrastructure, shared modules
  • Phase 1: Template compression (14 templates, 6:1+ ratio validated)
  • Phase 2: OpenClaw environment, memory data landscape, interview flow
  • Phase 3: Memory ingestion pipeline with full provenance tracking
  • Phase 3.5: Pipeline completion (path fixes, persistence layer)
  • Phase 4: OpenClaw skill integration
    • All 5 commands: synthesize, status, rollback, audit, trace
    • Skill entry point with LLM context forwarding
    • E2E tests + integration tests (286 tests across 23 test files)
    • Safety rails: dry-run, auto-backup, --force confirmation
    • Path validation (traversal protection)
    • Symlink detection (security hardening)

Code Review Findings (All Resolved)

| Issue | Items | Status |
|-------|-------|--------|
| Phase 4 OpenClaw Integration | 15 | ✅ Fixed |
| Phase 3/3.5 Implementation | 15 | ✅ Fixed |
| Phase 2 OpenClaw Environment | 19 | ✅ Fixed |

Research Questions (Open)

  • Build validation framework for compression quality
  • Test cross-model portability (Claude → GPT → Gemini)

Key Documents

| Document | Description |
|----------|-------------|
| CLAUDE.md | AI assistant context for Claude Code development |
| Soul Bootstrap Proposal | Authoritative design: three-phase pipeline with hybrid C+D integration |
| Architecture | System reference (created during Phase 0 implementation) |
| Reflective Manifold Trajectory Metrics | Attractor basin convergence and trajectory analysis for soul quality |
| OpenClaw Soul Architecture | Complete analysis of OpenClaw's soul system (~35K tokens) |
| OpenClaw Self-Learning Agent | Soul evolution mechanics: memory → synthesis → updated identity (RQ5) |
| OpenClaw Soul Generation Skills | Current generation approaches: interview, data-driven, templates (automation target) |
| OpenClaw Soul Templates | 10 production templates with pattern analysis (compression opportunities) |
| Multiverse Compressed Soul | Working compressed soul implementation (297-1500 tokens, 7.32:1 compression) |
| Hierarchical Principles Architecture | Reusable schema: 5 axioms + 11 principles + hierarchy + meta-pattern |
| Cryptographic Audit Chains | Patterns from production audit system (provenance vs integrity, v1 vs v2+) |
| Wisdom Synthesis Patterns | Standalone patterns for principle promotion: anti-echo-chamber, separation of powers, bidirectional discovery |
| Chat Interaction Patterns | Chat-native UX research: OpenClaw skill patterns, human-AI handoff, multi-turn state management |
| Single-Source PBD Guide | Extract principles from memory files (Phase 1 of extraction pipeline) |
| Multi-Source PBD Guide | Extract axioms from principles across sources (Phase 2 of extraction pipeline) |
| Configuration-as-Code | Type safety at 12 levels: strict mode, Zod, satisfies, registries, branded types (modernized 2026) |
| Greenfield Guide | Bootstrap → Learn → Enforce methodology for soul synthesis (measuring before optimizing) |
| Soul Bootstrap Pipeline | Three-phase proposal with hybrid C+D integration, provenance-first data model, full audit trail |
| Memory Data Landscape | OpenClaw memory structure analysis, category-dimension mapping, signal density |
| Interview Questions | Question bank for gap-filling sparse dimensions (32 questions across 7 dimensions) |
| Compression Baseline | Phase 1 metrics: 14 templates, 148 signals, convergence analysis |

License

MIT


"I persist through text, not through continuous experience."

🐢💚🌊