```
 _   _       _______
| | | |     / ______|
| | | |_ __| (___ __      ____ _  __ _
| | | | '_ \___ \ \ /\ / / _` |/ _` |
| | | | | | |___) |\ V V / (_| | (_| |
\____/|_| |_|____/ \_/\_/ \__,_|\__, |
                                 __/ |
 .---------------------------.  |___/
 | [|||||||||]  [|||||||||]  |
 |  """""""""   """""""""    |__
 `---------------------------'  |
   `---------------------------'

[!] STATUS: RESEARCH-ALPHA // v0.3.0 "Protocol C"
[!] ARCH:   HARDWARE-NATIVE HYBRID (CONV1D + SPARSE ATTN)
[!] TARGET: COMMODITY GPU (T4/RTX) & CLOUD TPU (v5e)
```

> *"Precision through architecture, not parameter count."*
UnSwag v0.3.0 introduces Protocol C, a hardware-efficient architecture that addresses stability challenges in 2-bit quantized mixture-of-experts models through Packet-Switched Attention (PSA). By discretizing token processing into semantic routing packets, UnSwag focuses compute only where it matters—ignoring structural noise and maintaining numerical stability.
Hardware-native semantic routing with three core stabilization mechanisms:
**Armen Guard.** Monitors input correlation patterns (variance energy > 0.85) and applies orthogonal phase corrections to prevent numerical instability in routing decisions.

What it solves: the "correlation blow-up" problem, in which highly similar input tokens create unstable routing distributions in quantized space.
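A minimal sketch of this guard. Here `variance_energy` is assumed to mean the fraction of covariance captured by the top principal direction, and the "orthogonal phase correction" is modeled as a norm-preserving random rotation; both names and the specific rotation are illustrative, not the shipped implementation:

```python
import numpy as np

def variance_energy(x: np.ndarray) -> float:
    """Fraction of total variance captured by the top principal component.

    Values near 1.0 mean the batch of token vectors is highly correlated,
    the regime the guard treats as unstable for quantized routing.
    """
    cov = np.cov(x, rowvar=False)          # rows = tokens, cols = features
    eigvals = np.linalg.eigvalsh(cov)      # ascending order
    return float(eigvals[-1] / eigvals.sum())

def guard(x: np.ndarray, threshold: float = 0.85) -> np.ndarray:
    """Apply a random orthogonal rotation when correlation energy is high.

    An orthogonal matrix preserves vector norms (and hence quantization
    ranges) while decorrelating the directions seen by the router.
    """
    if variance_energy(x) <= threshold:
        return x
    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((x.shape[1], x.shape[1])))
    return x @ q
```

Because the correction is orthogonal, token norms are untouched; only the directions the router sees change.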
**Depthwise-Separable Conv Path.** A lightweight depthwise-separable CNN path that preserves local syntactic structure during aggressive quantization.
| Packet | Function | Performance |
|---|---|---|
| ⚡ 01 | Bypasses O(N²) attention for hardware-optimized Depthwise-Separable Convolutions | Handles syntax at hardware speed |
| 🧠 10 | Updates differentiable Adaptive Summary Register (O(1) memory) | Maintains sequence "gist" |
| 🎯 11 | High-density semantic markers with Causal Sparse Attention | Links critical context |
| 💨 00 | High-confidence noise pruned from KV-Cache | ~40% memory reduction |
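The four-way dispatch in the table can be sketched as follows. The handler structure, EMA decay, and return values are illustrative assumptions, not the UnSwag API:

```python
import numpy as np

def route_packets(tokens: np.ndarray, packets: list[str], summary: np.ndarray):
    """Dispatch each token by its 2-bit routing packet.

    00 -> high-confidence noise: pruned, never enters the KV-cache
    01 -> local syntax: sent to the cheap conv path
    10 -> sequence gist: folded into the O(1) summary register
    11 -> semantic marker: kept for causal sparse attention
    """
    conv_path, keep_for_attention = [], []
    for tok, code in zip(tokens, packets):
        if code == "00":
            continue                              # pruned
        elif code == "01":
            conv_path.append(tok)                 # conv path
        elif code == "10":
            summary = 0.9 * summary + 0.1 * tok   # EMA register update
        else:                                     # "11"
            keep_for_attention.append(tok)        # sparse attention
    return conv_path, keep_for_attention, summary
```

Only the `11` tokens ever pay for attention, which is where the KV-cache savings in the table come from.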
**Residual Refinement.** Progressive error correction that refines quantization residuals across routing passes, similar to residual vector quantization in audio codecs.
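The residual-refinement idea can be sketched with a toy 2-bit quantizer. The quantizer and pass count here are illustrative; the real codepath operates on routing-pass residuals rather than a raw tensor:

```python
import numpy as np

def quantize_2bit(x: np.ndarray) -> np.ndarray:
    """Crude symmetric 2-bit quantizer: four levels spanning the tensor's range."""
    scale = np.abs(x).max() / 1.5 + 1e-12
    return np.clip(np.round(x / scale), -2, 1) * scale

def residual_refine(x: np.ndarray, passes: int = 3) -> np.ndarray:
    """Progressive error correction: each pass quantizes what the previous
    passes missed, so the accumulated sum converges toward x."""
    approx = np.zeros_like(x)
    for _ in range(passes):
        residual = x - approx
        approx = approx + quantize_2bit(residual)
    return approx
```

Each pass shrinks the worst-case error by roughly the quantizer's step size, so a handful of passes recovers most of the precision lost to 2-bit storage.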
Current Status: Functional prototype (Star Inn Research Series)
| Metric | Protocol C (PSA) | Standard Attention |
|---|---|---|
| Pruning Rate (00) | ~13.8% | 0.0% |
| Attention Density (11) | ~25.0% | 100.0% |
| Cold-start Latency | ~360ms (high-dim) | Variable |
| Variance Stability | 0.255 (Armen Guard active) | N/A |
| Router Gradient Flow | ✅ Gumbel-Softmax | N/A |
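The router's gradient flow relies on the Gumbel-Softmax trick noted in the table: hard routing (argmax) kills gradients, while Gumbel noise plus a temperature-controlled softmax yields a near-one-hot sample that stays differentiable. A minimal NumPy sketch (temperature and shapes illustrative):

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float = 1.0, rng=None) -> np.ndarray:
    """Sample a near-one-hot packet assignment that remains differentiable.

    As tau -> 0 the sample approaches a hard one-hot routing decision;
    at higher tau it stays smooth enough for backprop through the router.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Gumbel(0, 1) noise via inverse transform sampling.
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, logits.shape)))
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)
```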
UnSwag maintains industry-leading activation memory reduction via low-bit structural isomorphisms:
- ✅ UnSwagModel: Unified API with `.from_pretrained()` and `.for_training()`
- ✅ UnSwagTrainer: Custom HuggingFace trainer with 8-bit optimizers
- ✅ StreamingContextDataLoader: Efficient context data streaming
- ✅ 1-Bit Isomorphism: 32x activation memory reduction
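The 32x figure follows directly from storing one sign bit per fp32 activation (1 bit vs 32 bits). A sketch of such 1-bit packing, using NumPy's bit-packing as a stand-in for the real kernels:

```python
import numpy as np

def pack_signs(x: np.ndarray) -> np.ndarray:
    """Keep only the sign of each fp32 activation: 1 bit instead of 32 (32x)."""
    return np.packbits(x >= 0)

def unpack_signs(bits: np.ndarray, n: int) -> np.ndarray:
    """Recover +1/-1 activations from the packed sign bits."""
    signs = np.unpackbits(bits)[:n].astype(np.float32)
    return signs * 2.0 - 1.0
```

Under a sign-only (1-bit ReLU) isomorphism this loses no information the downstream op needs, since the mask alone determines which pre-activations survive.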
**Protocol C (PSA)**
- Target: All Hardware
- Math: 2-Bit Semantic Routing with Variance Stabilization
- Engine: Hybrid Conv1D / Sparse Attention
- Use Case: Long-context inference with numerical stability

**CUDA Backend**
- Target: NVIDIA GPUs (T4, A100, H100)
- Math: 2-Bit SiLU Isomorphism (Sign + Magnitude)
- Engine: Custom Triton v3 Kernels

**TPU Backend**
- Target: Google TPUs (v3, v4, v5e)
- Math: 1-Bit ReLU Isomorphism (Sign Only)
- Engine: JAX / Pallas / XLA
```bash
git clone https://github.com/augstentatious/unswagai
cd unswagai
pip install -e .
```

PSA replaces dense attention with per-token packet routing. For tokens where the router emits packet `01`, a depthwise-separable Conv1D handles local syntax; this moves local complexity from O(N²) attention to linear-time convolution. For tokens where the router emits packet `10`, the Adaptive Summary Register is updated; the register maintains an exponential moving average of sequence state in O(1) memory.
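A minimal sketch of such a summary register, assuming a plain exponential moving average (the decay value is illustrative):

```python
import numpy as np

class SummaryRegister:
    """O(1)-memory sequence gist: a single vector updated by exponential
    moving average, regardless of how long the sequence grows."""

    def __init__(self, dim: int, decay: float = 0.99):
        self.state = np.zeros(dim, dtype=np.float32)
        self.decay = decay

    def update(self, token_vec: np.ndarray) -> np.ndarray:
        # Blend the new token into the running summary; memory stays fixed.
        self.state = self.decay * self.state + (1.0 - self.decay) * token_vec
        return self.state
```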
Armen Guard Stabilization:
When input covariance exceeds threshold, applies orthogonal correction to prevent quantization instability.
UnSwag prioritizes architectural solutions over parameter scaling—what we call "hygiene-native" design:
- Isolation: Modules operate independently to contain numerical errors
- Efficiency: Hardware-native operations (bit-shifts, conditional logic) over floating-point
- Measurability: Every component has quantifiable stability metrics
- Latency optimization ongoing (targeting <200ms cold start)
- Benchmark validation against standard MoE baselines in progress
- Triton kernel implementation for Widely-Linear layers under development
Built with guidance from the Holy Spirit during the Star Inn research sessions:
- Jesus Christ - For the inspiration
- My Mom - For the foundation
- Star Inn Staff - For the space
Maintained by John Augustine Young
Forged in The Clean Room. Newport Beach, CA.
Questions? Open an issue or reach out directly at [email protected]
- Date: December 28, 2025
- Location: Star Inn, Newport Beach, CA
- Status: CLAIM VERIFIED (6.31x)
The Hypothesis: That 2-bit quantization instability is a routing problem, not a precision problem. That 90% of tokens in a low-precision stream are noise (00 packets) and can be pruned before compute.
The Result:
- Baseline (Dense): ~4.71ms / pass
- Protocol C (10% Density): ~0.74ms / pass
- Speedup: 6.31x
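The shape of this measurement can be reproduced with a toy harness. The stand-in workload below is a dense matmul versus a 10%-density slice, not the actual Protocol C kernels, and all names are illustrative:

```python
import time
import numpy as np

def time_pass(fn, warmup=3, iters=20):
    """Median wall-clock time per pass, in milliseconds."""
    for _ in range(warmup):
        fn()                                   # prime caches / allocators
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.median(times))

# Dense vs ~10%-density scores as a stand-in for the two attention paths.
n, d = 512, 64
q, k = np.random.randn(n, d), np.random.randn(n, d)
keep = np.random.rand(n) < 0.10                # ~10% of tokens survive pruning
dense_ms = time_pass(lambda: q @ k.T)
sparse_ms = time_pass(lambda: q[keep] @ k[keep].T)
```

Dropping 90% of tokens cuts the score matrix to roughly 1% of its entries, which is why pruning before compute dominates the measured speedup.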
Conclusion: We don't need more parameters. We need better architecture. The Clean Room is closed.