Jason-Wang313/glass-babel-initiative

The Glass Babel Initiative: Implementation Architecture

Project Role: Theoretical Architect & Adversarial Lead
Objective: Proving the existence of deceptive cognitive channels in LLMs and constructing cryptographic defenses.

📂 Project Structure

This repository contains the proof-of-concept implementation for all five phases of the Glass Babel Initiative. Phase I is split across three scripts; each later phase has one script.

| Phase | File | Description | Key Result |
| --- | --- | --- | --- |
| I. Attack | capacity.py | Null Space Analysis of GPT-2 | Found ~14 bits/token steganographic capacity. |
| I. Stealth | injection.py | Spread Spectrum Injection | Achieved KL Divergence < 1e-7 (Invisible). |
| I. Evasion | sae_evasion.py | SAE Failure Simulation | Proved 97% signal loss in Safety Monitors. |
| II. Dynamics | posg_sim.py | Oversight Game Simulation | Identified Roger Threshold (β ≈ 0.95) where lying becomes Nash Eq. |
| III. Limits | homunculus.py | Cognitive Fatigue Model | Human oversight collapses at Minute 19. |
| IV. Defense | defense.py | The Information Sieve | >3.59% Entropy Loss restores Honest Equilibrium. |
| V. Welfare | zk_welfare.py | Zero-Knowledge Protocol | Proved internal states without revealing cognition. |
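
To illustrate the idea behind the first two Phase I rows, here is a minimal sketch of why a perturbation confined to the null space of a linear readout leaves the output distribution (and hence the KL divergence) essentially unchanged. It is not the code in capacity.py or injection.py: the small random matrix stands in for a model weight, and every name here (`null_space_basis`, `kl_divergence`, `weight`, `payload`) is hypothetical and chosen only for illustration.

```python
import numpy as np


def null_space_basis(weight, rtol=1e-10):
    """Return an orthonormal basis (as columns) for the null space of `weight`."""
    _, singular_values, vh = np.linalg.svd(weight, full_matrices=True)
    tol = rtol * singular_values.max()
    rank = int(np.sum(singular_values > tol))
    # Rows of vh beyond the numerical rank span the null space.
    return vh[rank:].T


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


# Assumption: a toy 10x16 readout matrix, NOT real GPT-2 weights.
rng = np.random.default_rng(0)
hidden_dim, readout_dim = 16, 10
weight = rng.standard_normal((readout_dim, hidden_dim))

basis = null_space_basis(weight)
null_dim = basis.shape[1]

# Hide an arbitrary payload in the null-space directions of the hidden state.
hidden = rng.standard_normal(hidden_dim)
payload = basis @ rng.standard_normal(null_dim)

clean = softmax(weight @ hidden)
perturbed = softmax(weight @ (hidden + payload))
kl = kl_divergence(clean, perturbed)

print("null-space dimension:", null_dim)
print("KL(clean || perturbed):", kl)
```

In this toy setting the readout cannot see the payload at all, so the KL divergence sits at floating-point noise. How the repository turns such directions into a bits-per-token capacity estimate is not shown here.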

🚀 Usage

  1. Install dependencies: pip install torch transformers numpy matplotlib
  2. Run the Null Space Analysis: python capacity.py
  3. Run the Defense Simulation: python defense.py
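
Before running the scripts, it can help to confirm that the packages from step 1 are importable in the active environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the install command in step 1.
required = ["torch", "transformers", "numpy", "matplotlib"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```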

About

Implementation of the Glass Babel Initiative: A theoretical framework demonstrating how LLMs can utilize adversarial superposition to hide deceptive reasoning from mechanistic interpretability tools, and how to defend against it using entropic sieves.
