omnipkg v2.0.0 - The Python Hypervisor Release
Release Date: 2025-12-08
This release marks a fundamental paradigm shift from "Package Loader" to "Distributed Runtime Architecture." OmniPkg 2.0 introduces a persistent daemon kernel, universal GPU IPC, and hardware-level isolation, effectively functioning as an Operating System for Python environments.
We have shattered the performance barrier. What once took 2 seconds now takes 60 milliseconds. What once crashed due to ABI conflicts now runs concurrently on the same GPU.
🚀 Major Architectural Breakthroughs
- **Universal GPU IPC (Pure Python/ctypes):**
  - Implemented a custom, framework-agnostic CUDA IPC protocol (`UniversalGpuIpc`) using raw `ctypes`.
  - Performance: achieved ~1.5ms latency for tensor handoffs, beating PyTorch's native IPC by ~30% and the v1.x Hybrid SHM path by ~800% (about 9x).
  - Enables true zero-copy data transfer between isolated processes without relying on framework-specific hooks.
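The core trick behind a `ctypes`-based IPC protocol is that `cudaIpcMemHandle_t` is a fixed-size opaque struct, so it can be flattened to bytes and shipped over any transport. The sketch below shows only that serialization layer; `CudaIpcMemHandle`, `export_handle`, and `import_handle` are illustrative names, not omnipkg's actual API, and the CUDA runtime calls that would fill and open the handle are shown as comments.

```python
import ctypes

CUDA_IPC_HANDLE_SIZE = 64  # cudaIpcMemHandle_t is a fixed 64-byte opaque struct

class CudaIpcMemHandle(ctypes.Structure):
    """Mirror of the opaque cudaIpcMemHandle_t from the CUDA runtime API."""
    _fields_ = [("reserved", ctypes.c_ubyte * CUDA_IPC_HANDLE_SIZE)]

def export_handle(handle: CudaIpcMemHandle) -> bytes:
    """Flatten the handle so it can travel over any byte transport
    (Unix socket, pipe) to a peer process."""
    return ctypes.string_at(ctypes.byref(handle), ctypes.sizeof(handle))

def import_handle(raw: bytes) -> CudaIpcMemHandle:
    """Rebuild the struct on the receiving side; a real receiver would then
    pass it to cudaIpcOpenMemHandle to map the same device allocation."""
    handle = CudaIpcMemHandle()
    ctypes.memmove(ctypes.byref(handle), raw, ctypes.sizeof(handle))
    return handle

# With a GPU present, the sender would fill the handle via:
#   libcudart.cudaIpcGetMemHandle(ctypes.byref(handle), device_ptr)
# and the receiver would map it via cudaIpcOpenMemHandle, giving both
# processes zero-copy access to the same device buffer.
```

Because only 64 bytes cross the process boundary, the handoff cost is dominated by the socket round trip rather than tensor size, which is what makes millisecond-scale latency plausible.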
- **Persistent Worker Daemon ("The Kernel"):**
  - Replaced ad-hoc subprocess spawning with a persistent, self-healing worker pool (`WorkerPoolDaemon`).
  - Reduces environment context-switching time from ~2000ms (process spawn) to ~60ms (warm activation).
  - Implements an "Elastic Lung" architecture: workers morph into required environments on demand and purge themselves back to a clean slate.
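The warm-pool pattern above can be sketched in a few lines: a long-lived worker loops on a task queue, "morphs" per request, and purges its state afterward. This is a minimal illustration using threads for portability; omnipkg's actual `WorkerPoolDaemon` uses OS processes, and all names here are illustrative.

```python
import queue
import threading

def _worker_loop(tasks: "queue.Queue", results: "queue.Queue") -> None:
    """Long-lived worker: stays warm between requests instead of paying
    a full interpreter spawn (~2000ms) per activation."""
    state: dict = {}                          # the worker's "morphed" environment
    while True:
        msg = tasks.get()
        if msg is None:                       # shutdown sentinel
            break
        env_name, payload = msg
        state["active"] = env_name            # morph into the requested environment
        results.put((env_name, payload * 2))  # stand-in for real work
        state.clear()                         # purge back to a clean slate

class WarmPool:
    """Minimal persistent-worker sketch (illustrative, not omnipkg's API)."""

    def __init__(self) -> None:
        self.tasks: queue.Queue = queue.Queue()
        self.results: queue.Queue = queue.Queue()
        self.worker = threading.Thread(
            target=_worker_loop, args=(self.tasks, self.results), daemon=True
        )
        self.worker.start()

    def run(self, env_name: str, payload: int):
        self.tasks.put((env_name, payload))
        return self.results.get()

    def shutdown(self) -> None:
        self.tasks.put(None)
        self.worker.join()
```

The speedup comes entirely from the loop: activation is a queue put/get against an already-initialized worker rather than a fresh interpreter start.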
- **Selective Hardware Virtualization (CUDA Hotswapping):**
  - Implemented dynamic `LD_LIBRARY_PATH` injection at the worker level.
  - The daemon now scans active bubbles to inject the exact CUDA runtime libraries required by the specific framework version (e.g., loading CUDA 11 libs for TF 2.13 while the host runs CUDA 12).
  - Result: successfully ran TensorFlow 2.12 (CPU), TF 2.13 (CPU), and TF 2.20 (GPU) simultaneously in a single orchestration flow without crashing.
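`LD_LIBRARY_PATH` injection amounts to prepending the bubble's library directories to the child process's environment before the dynamic loader runs. A minimal sketch, assuming a hypothetical helper name and example paths (not omnipkg's internal API):

```python
import os

def inject_cuda_env(bubble_lib_dirs: list) -> dict:
    """Build a child-process environment whose LD_LIBRARY_PATH puts a
    bubble's CUDA runtime libraries ahead of the host's."""
    env = os.environ.copy()
    injected = ":".join(bubble_lib_dirs)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend, so the bubble's libcudart/libcublas shadow the host's.
    env["LD_LIBRARY_PATH"] = injected + (":" + existing if existing else "")
    return env

# A worker for TF 2.13 could then be launched with CUDA 11 libs first on its
# loader search path, even while the host ships CUDA 12, e.g.:
#   subprocess.run([sys.executable, "-c", "import tensorflow"],
#                  env=inject_cuda_env(["/bubbles/tf213/nvidia/cuda11/lib64"]))
```

Ordering is the whole mechanism: the dynamic loader resolves shared libraries left to right, so whichever CUDA runtime appears first wins for that worker only, leaving the host and sibling workers untouched.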
⚡ Core Enhancements
- **Fail-Safe Cloaking:** Added `_force_restore_owned_cloaks()` to guarantee filesystem restoration even during catastrophic process failures or OOM events. No more "zombie" cloaked files.
- **Global Shutdown Silencer:** Implemented an `atexit` hook that synchronizes CUDA contexts and redirects `stderr` to `/dev/null` during final interpreter shutdown, eliminating harmless but noisy C++ "driver shutting down" warnings.
- **Composite Bubble Injection:** The loader now automatically constructs "Meta-Bubbles" at runtime, merging the requested package bubble with its binary dependencies (NVIDIA libs, Triton) on the fly.
🐛 Critical Fixes
- **PyTorch 1.13+ Compatibility:** Patched the worker daemon to handle `TypedStorage` serialization changes in newer PyTorch versions, preventing crashes during native IPC.
- **Deadlock Prevention:** Implemented a `ThreadPoolExecutor` in the daemon manager to allow recursive worker calls (Worker A calling Worker B) without deadlocking the socket.
- **Lazy Loading:** Made `psutil` and `torch` imports lazy within the daemon to prevent "poisoning" the process with default environment versions before isolation takes effect.
📊 Benchmarks (vs v1.x)
| Metric | v1.x (Hybrid) | v2.0 (Universal) | Speedup |
|---|---|---|---|
| IPC Tensor Handoff | 14ms | 1.5ms | 9.3x |
| Context Switch (Cold) | ~2500ms | ~2500ms | 1.0x |
| Context Switch (Warm) | ~2000ms | ~60ms | 33x |
| Recursive Depth | 5 levels | Unlimited | ∞ |
📦 Upgrade
```shell
# Via pip
pip install --upgrade omnipkg

# Via omnipkg itself
8pkg upgrade
```

Welcome to the Singularity.