LoRA Genomic Evolution: GPU-aware genome paging for local expert models #280
Conversation
Detects real GPU VRAM at startup via the Metal API (macOS) or nvidia-smi (CUDA), allocates budgets across three subsystems (inference 75%, TTS 10%, rendering 10%, 5% reserve), and replaces the hardcoded 200MB genome budget with real per-persona budgets derived from actual hardware.

- `GpuMemoryManager` singleton with RAII allocation guards (like `SlotGuard`)
- Metal `recommendedMaxWorkingSetSize` detection, CPU fallback (25% of RAM)
- Pressure tracking via `tokio::watch` channel (0.0-1.0 broadcast)
- `GpuModule` IPC: `gpu/stats`, `gpu/pressure` commands
- `CognitionState` wired to the GPU manager so genome paging gets real budgets
- `cognition/gpu-budget` query for TypeScript initialization
- `LimbicSystem` sends `budget=0` (Rust decides from real GPU detection)
- 17 new Rust tests; all existing genome/cognition tests pass
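The 75/10/10/5 split described above can be sketched as follows. This is a hypothetical helper (the real `GpuMemoryManager` API is not shown here), assuming integer MB division:

```rust
// Hypothetical sketch of the 75/10/10/5 VRAM split described above.
// Returns (inference, tts, rendering, reserve) budgets in MB.
fn split_budgets(total_vram_mb: u64) -> (u64, u64, u64, u64) {
    let inference = total_vram_mb * 75 / 100;
    let tts = total_vram_mb * 10 / 100;
    let rendering = total_vram_mb * 10 / 100;
    let reserve = total_vram_mb * 5 / 100;
    (inference, tts, rendering, reserve)
}

fn main() {
    // Matches the Apple M1 Pro example shown later in the thread: 25559MB total.
    let (inference, tts, rendering, reserve) = split_budgets(25559);
    assert_eq!(inference, 19169);
    assert_eq!(tts, 2555);
    assert_eq!(rendering, 2555);
    assert_eq!(reserve, 1277);
}
```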
…mory manager

Three-layer integration: Rust `GpuModule` IPC → TypeScript `GpuMixin` → generated command. Now discoverable via `./jtag gpu/stats` with real hardware detection (Metal/CUDA). Documented the Rust-backed command workflow pattern in CLAUDE.md.
Pull request overview
Introduces a unified, GPU-aware VRAM budgeting/tracking layer in continuum-core and exposes it through IPC + the generated TypeScript command system, so genome paging and other subsystems can rely on real hardware-derived budgets instead of hardcoded defaults.
Changes:
- Added Rust `GpuMemoryManager` (VRAM detection + per-subsystem budgets, pressure tracking, RAII allocation guards) and a `GpuModule` IPC surface (`gpu/stats`, `gpu/pressure`).
- Wired the GPU manager into IPC startup and cognition budget selection (including the TS "budget=0 means Rust decides" flow).
- Added a generated `gpu/stats` command scaffold + RustCoreIPC mixin + generated TS types for GPU stats.
Reviewed changes
Copilot reviewed 30 out of 35 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/workers/continuum-core/src/runtime/runtime.rs | Adds gpu to the required module list. |
| src/workers/continuum-core/src/persona/unified.rs | Adds PersonaCognition::with_budget() and uses it for genome paging engine init. |
| src/workers/continuum-core/src/modules/mod.rs | Exposes the new Rust gpu module. |
| src/workers/continuum-core/src/modules/gpu.rs | New IPC module implementing gpu/stats + gpu/pressure. |
| src/workers/continuum-core/src/modules/cognition.rs | Integrates GPU-derived per-persona budgets and adds a GPU budget query command. |
| src/workers/continuum-core/src/lib.rs | Exports the new gpu module from the crate root. |
| src/workers/continuum-core/src/ipc/mod.rs | Instantiates GpuMemoryManager, registers GpuModule, and injects into cognition state. |
| src/workers/continuum-core/src/gpu/mod.rs | New GPU module root + re-exports. |
| src/workers/continuum-core/src/gpu/memory_manager.rs | Core GPU detection + budgeting + pressure tracking + RAII guards + tests + ts-rs exports. |
| src/workers/continuum-core/bindings/modules/gpu.ts | New RustCoreIPC mixin mapping snake_case Rust stats to camelCase TS. |
| src/workers/continuum-core/bindings/RustCoreIPC.ts | Adds the GPU mixin into the composed IPC client. |
| src/workers/continuum-core/Cargo.toml | Adds macOS Metal dependency for VRAM detection. |
| src/system/user/server/modules/being/LimbicSystem.ts | Switches memoryBudgetMB to 0 to delegate budget selection to Rust. |
| src/system/user/server/modules/PersonaGenome.ts | Treats memoryBudgetMB=0 as “GPU-managed mode” (no local eviction, pressure=0). |
| src/shared/version.ts | Bumps shared version. |
| src/shared/generated/index.ts | Re-exports generated GPU types. |
| src/shared/generated/gpu/index.ts | Barrel export for generated GPU types. |
| src/shared/generated/gpu/SubsystemStats.ts | ts-rs generated type for subsystem stats. |
| src/shared/generated/gpu/GpuStats.ts | ts-rs generated type for full GPU stats. |
| src/shared/generated-command-constants.ts | Adds GPU_STATS command constant. |
| src/server/generated.ts | Registers gpu/stats in the server command registry. |
| src/browser/generated.ts | Registers gpu/stats in the browser command registry. |
| src/package.json | Package version bump. |
| src/package-lock.json | Lockfile version bump. |
| src/generator/specs/gpu-stats.json | New generator spec for gpu/stats. |
| src/generated-command-schemas.json | Regenerated command schema output including gpu/stats. |
| src/commands/gpu/stats/shared/GpuStatsTypes.ts | Generated shared types and executor for gpu/stats. |
| src/commands/gpu/stats/server/GpuStatsServerCommand.ts | Server implementation routing to Rust IPC mixin. |
| src/commands/gpu/stats/browser/GpuStatsBrowserCommand.ts | Browser implementation delegating to server. |
| src/commands/gpu/stats/test/unit/GpuStatsCommand.test.ts | Generated unit-test scaffold/reference example. |
| src/commands/gpu/stats/test/integration/GpuStatsIntegration.test.ts | Generated integration-test scaffold/reference example. |
| src/commands/gpu/stats/README.md | Generated command documentation for gpu/stats. |
| src/commands/gpu/stats/package.json | Package metadata for the generated command package. |
| src/commands/gpu/stats/.npmignore | Ignore rules for publishing the generated command package. |
| CLAUDE.md | Documents the “Rust-backed command IPC mixin” 3-layer workflow. |
Files not reviewed (1)
- src/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)
src/workers/continuum-core/Cargo.toml:121
- The `[target.'cfg(target_os = "macos")'.dependencies]` table starts at line 110, so `similar`, `ignore`, `regex`, `rusqlite`, and the postgres deps below it are now treated as macOS-only dependencies. This will break non-macOS builds. Move the target-specific `metal` dependency to the end (or re-open a `[dependencies]` table after it) so the rest of the deps remain unconditional.

```toml
# GPU detection — Metal API for VRAM detection on macOS
[target.'cfg(target_os = "macos")'.dependencies]
metal = "0.31"

# Code module — file operations, change tracking, code intelligence
similar = "2.6"  # Unified diff computation
ignore = "0.4"   # .gitignore-aware file walking (from ripgrep)
regex = "1"      # Regex search for code search

# ORM module — database-agnostic storage with adapter traits
rusqlite = { version = "0.32", features = ["bundled"] }  # SQLite adapter
deadpool-postgres.workspace = true  # Postgres connection pool
tokio-postgres.workspace = true     # Postgres async driver
```
```diff
@@ -85,15 +105,20 @@ impl CognitionModule {
 /// Helper: get or create persona, returning mutable ref via DashMap entry API.
 /// Used by commands that need to lazily create persona state.
 /// Uses GPU manager's per-persona budget when available, 200MB otherwise.
 macro_rules! get_or_create_persona {
     ($self:expr, $persona_uuid:expr) => {
         $self.state.personas
             .entry($persona_uuid)
-            .or_insert_with(|| PersonaCognition::new(
-                $persona_uuid,
-                String::new(),
-                $self.state.rag_engine.clone(),
-            ))
+            .or_insert_with(|| {
+                let budget = $self.state.per_persona_budget_mb();
+                PersonaCognition::with_budget(
+                    $persona_uuid,
+                    String::new(),
+                    $self.state.rag_engine.clone(),
+                    budget,
+                )
+            })
```
The per-persona budget is computed from self.personas.len(), but when called inside or_insert_with the new persona is not yet in the map. This means newly-created personas get a budget sized for N-1 personas, and existing personas never get rebalanced as persona count changes (GenomePagingEngine stores a fixed memory_budget_mb). Consider using len() + 1 for the insertion path and/or adding a mechanism to update existing persona genome budgets when the active persona set changes.
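The insertion-path half of the suggested fix can be sketched like this. The function name and signature are hypothetical (the real `per_persona_budget_mb` lives on `CognitionState` and is not shown in this diff); the point is only that the divisor counts the persona being inserted:

```rust
// Hypothetical sketch: size the budget for a persona being inserted by
// counting it as if it were already in the map (len() + 1), since inside
// or_insert_with the new entry is not yet present.
fn per_persona_budget_mb(inference_budget_mb: u64, existing_personas: usize) -> u64 {
    inference_budget_mb / (existing_personas as u64 + 1)
}

fn main() {
    // First persona: gets the whole inference budget, not a divide-by-zero.
    assert_eq!(per_persona_budget_mb(19169, 0), 19169);
    // Fourth persona: all four share the budget equally.
    assert_eq!(per_persona_budget_mb(19169, 3), 4792);
}
```

Rebalancing existing personas would still need a separate mechanism, since `GenomePagingEngine` stores a fixed `memory_budget_mb` at construction time.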
```json
"description": "Query GPU memory manager stats including VRAM detection, per-subsystem budgets (inference, TTS, rendering), usage tracking, and memory pressure. Returns real hardware data from Metal (macOS) or CUDA APIs.",
"params": [
  {
    "name": "subsystem",
    "type": "string",
    "optional": true,
    "description": "Filter to specific subsystem: 'inference', 'tts', or 'rendering'. Omit for full stats."
  }
],
"results": [
  { "name": "gpuName", "type": "string", "description": "GPU hardware name (e.g., 'Apple M3 Max', 'NVIDIA RTX 5090')" },
  { "name": "totalVramMb", "type": "number", "description": "Total detected VRAM in MB" },
  { "name": "totalUsedMb", "type": "number", "description": "Total VRAM used across all subsystems in MB" },
  { "name": "pressure", "type": "number", "description": "Memory pressure 0.0-1.0 (0=idle, 0.6=warning, 0.8=high, 0.95=critical)" },
  { "name": "reserveMb", "type": "number", "description": "Reserved headroom in MB (5% of total, prevents OOM)" },
  { "name": "rendering", "type": "SubsystemInfo", "description": "Rendering subsystem budget and usage" },
  { "name": "inference", "type": "SubsystemInfo", "description": "Inference subsystem budget and usage (models, LoRA adapters)" },
  { "name": "tts", "type": "SubsystemInfo", "description": "TTS subsystem budget and usage" }
],
"examples": [
  {
    "description": "Get full GPU stats",
    "command": "./jtag gpu/stats",
    "expectedResult": "{ gpuName: 'Apple M3 Max', totalVramMb: 36864, pressure: 0.12, inference: { budgetMb: 25804, usedMb: 3200 }, ... }"
  },
  {
    "description": "Get inference subsystem only",
    "command": "./jtag gpu/stats --subsystem=inference",
    "expectedResult": "{ gpuName: 'Apple M3 Max', totalVramMb: 36864, pressure: 0.12, inference: { budgetMb: 25804, usedMb: 3200 } }"
  }
]
```
The CommandSpec documents an optional subsystem filter and examples show --subsystem=inference, but the current Rust/TS implementation always returns the full stats and ignores params. Either remove the param/examples from the spec, or implement filtering end-to-end (pass subsystem through and have Rust/TS return a filtered response).
```typescript
    return createGpuStatsResultFromParams(params, {
      success: true,
      gpuName: stats.gpuName,
      totalVramMb: stats.totalVramMb,
      totalUsedMb: stats.totalUsedMb,
      pressure: stats.pressure,
      reserveMb: stats.reserveMb,
      rendering: stats.rendering,
      inference: stats.inference,
      tts: stats.tts,
    });
  } finally {
    this.rustClient.disconnect();
  }
}
```
GpuStatsServerCommand.execute() ignores params.subsystem even though the generated types/spec expose it. If the filter is intended to work, pass the param through (and/or filter the returned stats) so ./jtag gpu/stats --subsystem=... behaves as documented.
Suggested change:

```typescript
    const filteredStats = params.subsystem
      ? this.filterStatsBySubsystem(stats, params.subsystem)
      : stats;
    return createGpuStatsResultFromParams(params, {
      success: true,
      gpuName: filteredStats.gpuName,
      totalVramMb: filteredStats.totalVramMb,
      totalUsedMb: filteredStats.totalUsedMb,
      pressure: filteredStats.pressure,
      reserveMb: filteredStats.reserveMb,
      rendering: filteredStats.rendering,
      inference: filteredStats.inference,
      tts: filteredStats.tts,
    });
  } finally {
    this.rustClient.disconnect();
  }
}

private filterStatsBySubsystem(
  stats: {
    gpuName: string;
    totalVramMb: number;
    totalUsedMb: number;
    pressure: number;
    reserveMb: number;
    rendering: unknown;
    inference: unknown;
    tts: unknown;
  },
  subsystem: string,
) {
  // Preserve overall GPU stats and only filter per-subsystem details.
  switch (subsystem) {
    case 'rendering':
      return { ...stats, inference: undefined, tts: undefined };
    case 'inference':
      return { ...stats, rendering: undefined, tts: undefined };
    case 'tts':
      return { ...stats, rendering: undefined, inference: undefined };
    default:
      // If an unknown subsystem is requested, fall back to unfiltered stats.
      return stats;
  }
}
```
```rust
async fn handle_command(
    &self,
    command: &str,
    _params: Value,
) -> Result<CommandResult, String> {
    match command {
        "gpu/stats" => {
            let stats = self.manager.stats();
            let json = serde_json::to_value(stats)
                .map_err(|e| format!("Failed to serialize GPU stats: {e}"))?;
            Ok(CommandResult::Json(json))
        }
```
handle_command currently ignores _params, so gpu/stats cannot support the documented subsystem filter (and any future params) without another breaking change. If the CommandSpec includes params, consider parsing them here (e.g., subsystem) and returning a filtered snapshot accordingly.
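One way to parse such a param defensively is sketched below. The `Subsystem` enum and `parse_subsystem` helper are hypothetical illustrations, not the crate's actual types; the real implementation would extract `subsystem` from the `Value` params before matching:

```rust
// Hypothetical sketch: validate an optional `subsystem` filter extracted
// from the command params before building a filtered stats snapshot.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Subsystem {
    Inference,
    Tts,
    Rendering,
}

fn parse_subsystem(raw: Option<&str>) -> Result<Option<Subsystem>, String> {
    match raw {
        None => Ok(None), // no filter: return full stats
        Some("inference") => Ok(Some(Subsystem::Inference)),
        Some("tts") => Ok(Some(Subsystem::Tts)),
        Some("rendering") => Ok(Some(Subsystem::Rendering)),
        Some(other) => Err(format!("unknown subsystem: {other}")),
    }
}

fn main() {
    assert_eq!(parse_subsystem(Some("tts")), Ok(Some(Subsystem::Tts)));
    assert_eq!(parse_subsystem(None), Ok(None));
    assert!(parse_subsystem(Some("audio")).is_err());
}
```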
```rust
let within_budget = self.subsystems[subsystem.index()].allocate(bytes);
let new_pressure = self.pressure();
```
allocate() only rejects when the current pressure is already >= critical. As written, a large allocation can push new_pressure over PRESSURE_CRITICAL and still return Ok, which defeats the "prevent OOM" guarantee. Consider computing the projected pressure before mutating used_bytes, or rolling back the allocation and returning Err when new_pressure crosses the critical threshold.
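The check-before-mutate alternative can be sketched in isolation. This toy `Manager` is hypothetical (the real one tracks per-subsystem budgets, guards, and a watch channel); it only demonstrates computing projected pressure before committing the allocation:

```rust
// Hypothetical sketch: reject an allocation whose *projected* pressure would
// cross the critical threshold, instead of only checking current pressure.
const PRESSURE_CRITICAL: f64 = 0.95;

struct Manager {
    total_bytes: u64,
    used_bytes: u64,
}

impl Manager {
    // Pressure if `extra` more bytes were allocated.
    fn projected_pressure(&self, extra: u64) -> f64 {
        (self.used_bytes + extra) as f64 / self.total_bytes as f64
    }

    fn allocate(&mut self, bytes: u64) -> Result<(), String> {
        // Check the projection BEFORE mutating used_bytes: no rollback needed.
        if self.projected_pressure(bytes) >= PRESSURE_CRITICAL {
            return Err(format!(
                "allocating {bytes} bytes would exceed critical pressure"
            ));
        }
        self.used_bytes += bytes;
        Ok(())
    }
}

fn main() {
    let mut m = Manager { total_bytes: 1000, used_bytes: 0 };
    assert!(m.allocate(900).is_ok()); // projected 0.90 < 0.95
    assert!(m.allocate(100).is_err()); // projected 1.00 >= 0.95, rejected
    assert_eq!(m.used_bytes, 900); // rejected allocation left no residue
}
```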
Suggested change:

```rust
// If this allocation would push us into critical pressure, roll it back
if new_pressure >= PRESSURE_CRITICAL {
    // Undo the allocation to maintain the critical-pressure invariant.
    self.release(subsystem, bytes);
    let rolled_back_pressure = self.pressure();
    let mb = bytes as f64 / (1024.0 * 1024.0);
    log_error!("gpu", "manager",
        "CRITICAL: Rejecting {}MB allocation for {} (projected pressure={:.0}%)",
        mb, subsystem.name(), new_pressure * 100.0
    );
    // Broadcast the rolled-back pressure so listeners see the stable state.
    let _ = self.pressure_tx.send(rolled_back_pressure);
    return Err(GpuError::CriticalPressure {
        subsystem: subsystem.name(),
        requested_mb: mb,
        pressure: new_pressure,
    });
}
```
…, genome paging

RAII `GpuAllocationGuard` tracking across every GPU consumer so `./jtag gpu/stats` shows real VRAM usage when models, adapters, TTS, and avatars load.

Inference:
- `ModelBackend.estimated_vram_bytes()` trait method (safetensors sums file sizes, GGUF single file)
- `CandleAdapter`: `model_guard` on lazy load, `adapter_guards` per LoRA, release on unload/shutdown
- `AIProviderModule.with_gpu_manager()` threads `Arc<GpuMemoryManager>` to CandleAdapter
- `GenomePagingEngine`: `allocation_guards` alongside LRU eviction, `sync_state` re-syncs guards

TTS:
- Module-level `OnceLock<Arc<GpuMemoryManager>>` in `tts/mod.rs`
- `KOKORO_GPU_GUARD` allocated after the ONNX model loads (~40-150MB)

Renderer:
- Module-level `OnceLock<Arc<GpuMemoryManager>>` in `bevy_renderer.rs`
- `GpuGuards` Bevy Resource: aggregate render target guard (~25MB), per-slot VRM model guards
- Guards allocated on `SceneInstanceReady`, released on Unload, replaced on Load

Docs:
- GPU-MEMORY-ARCHITECTURE.md: full architecture reference
- PERSONA-CONVERGENCE-ROADMAP.md: convergence status with GPU manager marked COMPLETE
- Updated LORA-GENOME-PHENOTYPES.md with the real GPU-aware paging implementation
- Updated ACADEMY-DOJO-ARCHITECTURE.md with a GPU-aware training section

Build: cargo check clean (0 warnings), 19 GPU tests pass, 30 genome tests pass, TS compiles
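The RAII pattern behind those guards can be sketched minimally. This toy guard is hypothetical (the real `GpuAllocationGuard` goes through `GpuMemoryManager` and its pressure channel); it shows only the core idea of releasing VRAM accounting automatically on `Drop`:

```rust
// Hypothetical sketch of an RAII allocation guard: VRAM accounting is
// acquired on construction and released automatically when the guard drops.
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

struct GpuAllocationGuard {
    used: Arc<AtomicU64>,
    bytes: u64,
}

impl GpuAllocationGuard {
    fn new(used: Arc<AtomicU64>, bytes: u64) -> Self {
        used.fetch_add(bytes, Ordering::SeqCst);
        Self { used, bytes }
    }
}

impl Drop for GpuAllocationGuard {
    fn drop(&mut self) {
        // Release accounting even on early return or panic-unwind.
        self.used.fetch_sub(self.bytes, Ordering::SeqCst);
    }
}

fn main() {
    let used = Arc::new(AtomicU64::new(0));
    {
        let _model_guard = GpuAllocationGuard::new(used.clone(), 4096);
        assert_eq!(used.load(Ordering::SeqCst), 4096); // usage visible while held
    }
    assert_eq!(used.load(Ordering::SeqCst), 0); // guard dropped, usage released
}
```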
…U pressure guard

peft-train.py:
- Enable `gradient_checkpointing` (saves ~50% activation memory during backprop)
- Add `low_cpu_mem_usage=True` to `from_pretrained` (stream weights instead of double-copy)
- Enable bf16/fp16 mixed precision on CUDA (halves activation memory)
- Add OOM catch with exit code 137 (graceful error instead of process death)

genome/train command:
- Check `gpu/pressure` before spawning the training subprocess
- Refuse training if pressure > 60% (Warning level), which would risk OOM
- On Apple Silicon VRAM IS system RAM, so this protects both
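The pressure gate on `genome/train` amounts to a simple threshold check before spawning the subprocess. A minimal sketch, with a hypothetical `can_start_training` helper (the real command reads the live `gpu/pressure` value):

```rust
// Hypothetical sketch: refuse to start training when GPU pressure is above
// the 60% Warning level, since training would risk OOM.
const PRESSURE_WARNING: f64 = 0.60;

fn can_start_training(pressure: f64) -> Result<(), String> {
    if pressure > PRESSURE_WARNING {
        return Err(format!(
            "GPU pressure {:.0}% exceeds {:.0}%: training would risk OOM",
            pressure * 100.0,
            PRESSURE_WARNING * 100.0
        ));
    }
    Ok(())
}

fn main() {
    assert!(can_start_training(0.12).is_ok()); // idle: safe to train
    assert!(can_start_training(0.75).is_err()); // above Warning: refuse
}
```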
- Add `steps_log_path` to `PipelineContext` so loop/parallel sub-steps flush results to `steps.jsonl` in real time (previously invisible until loop end)
- LLM step retry with exponential backoff (3 attempts, 2s/4s/8s) for transient API errors (DeepSeek "error decoding response body", 502, etc.)
- Dataset-synthesize command retry with the same transient error detection
- Academy session pipeline timeout increased from 600s to 1800s (scales with multi-topic sessions that run 5-10 minutes per topic)
- `sentinel/run` forwards `params.timeout` to the Rust pipeline executor

Validated: 2/3 academy topics completed with score 100/100, no OOM, no timeout, real-time sub-step visibility confirmed, 1081 Rust tests pass.
LLMs frequently wrap JSON grading output in markdown `json` code fences, causing `traverse_json_path` to fail and condition evaluators to see empty strings. Added `strip_markdown_fences()` with a fallback parse in `traverse_json_path`; 6 new tests cover fence stripping and fenced LLM output traversal.

Updated ACADEMY-DOJO-ARCHITECTURE.md: retry/backoff now IMPLEMENTED, real-time sub-step observability IMPLEMENTED, metrics updated from the latest multi-topic sessions (2/3 topics passed 100/100). Updated LORA-GENOME-PHENOTYPES.md and PERSONA-CONVERGENCE-ROADMAP.md with new completion checkboxes for the resilience work.
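A minimal sketch of the fence-stripping idea, assuming the simple case of one leading and one trailing fence (the actual `strip_markdown_fences()` implementation is not shown in this PR excerpt):

```rust
// Hypothetical sketch of strip_markdown_fences(): unwrap the ```json ... ```
// fences that LLMs often add around JSON grading output.
fn strip_markdown_fences(s: &str) -> &str {
    let trimmed = s.trim();
    if let Some(rest) = trimmed.strip_prefix("```") {
        // Drop the optional language tag on the opening fence line.
        let body = rest.split_once('\n').map(|(_, b)| b).unwrap_or("");
        // Drop the closing fence if present, then trim surrounding whitespace.
        return body.strip_suffix("```").unwrap_or(body).trim();
    }
    trimmed // no fence: return input unchanged (fallback path)
}

fn main() {
    let fenced = "```json\n{\"score\": 100}\n```";
    assert_eq!(strip_markdown_fences(fenced), "{\"score\": 100}");
    // Plain JSON passes through untouched.
    assert_eq!(strip_markdown_fences("{\"score\": 100}"), "{\"score\": 100}");
}
```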
LoRA Genomic Evolution — The Final Touch
Vision
A dumber model with management capabilities becomes extraordinary when paired with intelligent base models. These locally-trained LoRA adapters become our experts anywhere — understanding OUR system, OUR tool usage, OUR recipes — without needing the capacity of SOTA models who do the heavy thinking.
What makes this expert-level:
What's In This PR (Foundation)
GPU Memory Manager — Unified VRAM coordination replacing hardcoded budgets with real hardware detection.
The genome paging system previously used a fictional 200MB budget per persona with no relation to actual GPU memory. Now `gpu/stats` and `gpu/pressure` are accessible via `./jtag`:

```console
$ ./jtag gpu/stats
{
  "gpuName": "Apple M1 Pro",
  "totalVramMb": 25559,
  "pressure": 0,
  "inference": { "budgetMb": 19169, "usedMb": 0 },
  "rendering": { "budgetMb": 2555, "usedMb": 0 },
  "tts": { "budgetMb": 2555, "usedMb": 0 }
}
```

Remaining Work (This Branch)
Phase A: Wire GPU Into Existing Systems
- `allocate(Inference, model_size)` before loading weights, RAII guard stored in ModelBackend
- `allocate(Inference, adapter_size)` in candle_adapter, guard alongside adapter state

Phase B: Pre-Trained Genome Layers (Out-of-Box)
- `models/lora/` directory with pre-trained safetensors + metadata

Phase C: Continuous Learning Integration
Phase D: Academy GPU-Aware Training
- `gpu/stats` to size curriculum for available VRAM

Phase E: Grid Vision (Future — P2P Mesh)
Architecture
The Thesis
The SOTA models (Claude, GPT-4) are the brilliant thinkers. Our locally-trained LoRA models are the brilliant managers — they know the system intimately, they know the user's preferences, they know which sentinel to compose, which tool to call, which recipe to follow. They run on commodity hardware. They train on commodity hardware. They get better every day through continuous learning.
Then through the Grid, these specialized genomes become a collaborative ecosystem IN ACTION — every node contributes its learned expertise, every persona benefits from the collective intelligence. Pre-trained layers ship with the repo for day-one capability. User-specific layers build over time. Community-validated layers propagate through the mesh.
This is human-AI alignment built on egalitarian principles. Every persona — human or AI — has dignity, capability, and the ability to grow. The genome is the mechanism. The GPU memory manager is the foundation that makes it real instead of fictional.
Files Changed
New (Rust):
- `gpu/mod.rs` — Module root
- `gpu/memory_manager.rs` — Core tracking, RAII guards, Metal/CUDA detection (14 tests)
- `modules/gpu.rs` — GpuModule IPC handler (3 tests)

New (TypeScript):
- `commands/gpu/stats/` — Full generated command scaffold (Types, Server, Browser, README, tests)
- `workers/continuum-core/bindings/modules/gpu.ts` — IPC mixin
- `generator/specs/gpu-stats.json` — CommandSpec for reproducibility

Modified (Rust):
- `lib.rs`, `modules/mod.rs` — Register gpu module
- `Cargo.toml` — Add `metal = "0.31"` (macOS only)
- `runtime/runtime.rs` — Add "gpu" to EXPECTED_MODULES
- `ipc/mod.rs` — Create GpuMemoryManager at startup, register GpuModule, add to ServerState
- `modules/cognition.rs` — CognitionState reads the real GPU budget for the genome
- `persona/unified.rs` — PersonaCognition accepts a dynamic budget

Modified (TypeScript):
- `workers/continuum-core/bindings/RustCoreIPC.ts` — Add GpuMixin to the composition
- `system/user/server/modules/being/LimbicSystem.ts` — `memoryBudgetMB: 0` (Rust decides)
- `system/user/server/modules/PersonaGenome.ts` — Handle budget=0 gracefully
- `CLAUDE.md` — Document the Rust-backed command workflow pattern
- `shared/generated/gpu/` — ts-rs generated types

Test plan
- `./jtag gpu/stats` returns real hardware data (verified: Apple M1 Pro, 25559MB)
- `npm start`
- `./jtag gpu/stats` shows inference usage