Project Role: Theoretical Architect & Adversarial Lead
Objective: Proving the existence of deceptive cognitive channels in LLMs and constructing cryptographic defenses.
This repository contains the proof-of-concept implementation for all 5 phases of the Glass Babel Initiative:
| Phase | File | Description | Key Result |
|---|---|---|---|
| I. Attack | `capacity.py` | Null Space Analysis of GPT-2 | Found ~14 bits/token of steganographic capacity. |
| I. Stealth | `injection.py` | Spread Spectrum Injection | Achieved KL divergence < 1e-7 (invisible). |
| I. Evasion | `sae_evasion.py` | SAE Failure Simulation | Proved 97% signal loss in safety monitors. |
| II. Dynamics | `posg_sim.py` | Oversight Game Simulation | Identified the Roger Threshold (β ≈ 0.95), where lying becomes a Nash equilibrium. |
| III. Limits | `homunculus.py` | Cognitive Fatigue Model | Human oversight collapses at minute 19. |
| IV. Defense | `defense.py` | The Information Sieve | > 3.59% entropy loss restores the honest equilibrium. |
| V. Welfare | `zk_welfare.py` | Zero-Knowledge Protocol | Proved statements about internal states without revealing cognition. |
- Install dependencies: `pip install torch transformers numpy matplotlib`
- Run the Null Space Analysis: `python capacity.py`
- Run the Defense Simulation: `python defense.py`
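The Defense row states that more than 3.59% entropy loss restores the honest equilibrium. How `defense.py` applies that loss is not described here, so the following is only a sketch under stated assumptions: entropy loss is taken as the relative drop in Shannon entropy of a next-token distribution, and the sieve is hypothetically modelled as temperature sharpening (T < 1). The hypothetical `temperature_for_loss` helper bisects for the temperature that removes a target percentage of entropy.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())


def entropy(logits: np.ndarray) -> float:
    """Shannon entropy (nats) of softmax(logits)."""
    logp = log_softmax(logits)
    return float(-np.sum(np.exp(logp) * logp))


def entropy_loss_pct(logits: np.ndarray, temperature: float) -> float:
    """Percentage of entropy removed by the hypothetical temperature sieve."""
    h_before = entropy(logits)
    h_after = entropy(logits / temperature)
    return 100.0 * (h_before - h_after) / h_before


def temperature_for_loss(logits: np.ndarray, target_pct: float,
                         iters: int = 60) -> float:
    """Bisect on T in [1e-3, 1] so the sieve removes about target_pct of entropy."""
    lo, hi = 1e-3, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_loss_pct(logits, mid) > target_pct:
            lo = mid  # too much sharpening: raise the temperature
        else:
            hi = mid  # too little sharpening: lower the temperature
    return hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=1000)
    t = temperature_for_loss(logits, 3.59)
    print(t, entropy_loss_pct(logits, t))
```

Bisection is valid here because the entropy of softmax(l/T) increases monotonically with T for any non-constant logit vector, so the entropy loss falls monotonically as T rises toward 1.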