Status: Accepted
Date: 2026-02-25
Training graph transformers involves thousands of gradient steps, each of which modifies model weights. In safety-critical applications, we need guarantees that training did not introduce pathological behavior: unbounded loss spikes, conservation law violations, equivariance breakage, or adversarial vulnerability. Post-hoc auditing of trained models is expensive and often misses subtle training-time regressions.
The RuVector workspace provides the building blocks for verified training:
- `ruvector-gnn` provides `Optimizer` (SGD, Adam), `ElasticWeightConsolidation` (EWC), `LearningRateScheduler`, `ReplayBuffer`, and a training loop with `TrainConfig` in `crates/ruvector-gnn/src/training.rs`
- `ruvector-verified` provides `ProofEnvironment`, `ProofAttestation` (82 bytes), `FastTermArena` for high-throughput proof allocation, and tiered verification via `ProofTier`
- `ruvector-coherence` provides `SpectralCoherenceScore` and `SpectralTracker` (behind the `spectral` feature) for monitoring model quality during training
- `ruvector-mincut-gated-transformer` provides `EnergyGate` in `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs` for energy-based decision making
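For orientation, a minimal import sketch of these building blocks; the exact re-export paths are assumptions inferred from the file locations listed above and may differ from the actual crate APIs:

```rust
// Import sketch only; module paths are inferred from the crate layout
// named in this ADR, not confirmed against the actual re-exports.
use ruvector_gnn::ewc::ElasticWeightConsolidation;
use ruvector_gnn::replay::ReplayBuffer;
use ruvector_gnn::scheduler::LearningRateScheduler;
use ruvector_gnn::training::{Optimizer, TrainConfig};
use ruvector_verified::fast_arena::FastTermArena;
use ruvector_verified::gated::ProofTier;
use ruvector_verified::proof_store::ProofAttestation;
use ruvector_verified::ProofEnvironment;
use ruvector_coherence::spectral::{SpectralCoherenceScore, SpectralTracker}; // behind the `spectral` feature
use ruvector_mincut_gated_transformer::energy_gate::EnergyGate;
```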
However, there is no mechanism for issuing per-step invariant proofs during training, no `TrainingCertificate` that attests to the training run's integrity, and no integration between the proof system and the gradient update loop.

We will implement a `verified_training` module in `ruvector-graph-transformer` that wraps `ruvector-gnn`'s training infrastructure with proof gates, producing per-step invariant proofs and a final `TrainingCertificate` that attests to the entire training run.

```rust
/// A training wrapper that issues proof attestations per gradient step.
///
/// Wraps ruvector_gnn::training::Optimizer and composes with
/// ruvector_verified::ProofEnvironment for per-step invariant verification.
pub struct VerifiedTrainer {
/// The underlying GNN optimizer (SGD or Adam).
optimizer: Optimizer,
/// EWC for continual learning (optional).
ewc: Option<ElasticWeightConsolidation>,
/// Learning rate scheduler.
scheduler: LearningRateScheduler,
/// Proof environment for generating attestations.
proof_env: ProofEnvironment,
/// Fast arena for high-throughput proof allocation.
arena: FastTermArena,
/// Per-step invariant specifications.
invariants: Vec<TrainingInvariant>,
/// Accumulated attestations for the training run.
ledger: MutationLedger,
/// Configuration.
config: VerifiedTrainerConfig,
    /// Energy gate evaluated before each gradient application (optional).
    energy_gate: Option<EnergyGate>,
    /// Proof tier override forced by the energy gate for the current step.
    current_tier_override: Option<ProofTier>,
    /// Count of steps committed via OverrideProof (degraded mode).
    override_count: u64,
}
```

Each gradient step is bracketed by invariant checks. The `TrainingInvariant` enum defines what is verified:

```rust
pub enum TrainingInvariant {
/// Loss stability: loss stays within a bounded envelope relative to
/// a moving average. Raw loss is NOT monotonic in SGD — this invariant
/// captures what is actually enforceable: bounded deviation from trend.
///
/// **This is a true invariant**, not a heuristic: the proof certifies
/// that loss_t <= moving_avg(loss, window) * (1 + spike_cap).
LossStabilityBound {
/// Maximum spike relative to moving average (e.g., 0.10 = 10% above MA).
spike_cap: f64,
/// Window size for exponential moving average.
window: usize,
/// Gradient norm cap: reject step if ||grad|| > this value.
max_gradient_norm: f64,
/// Step size cap: reject step if effective lr * ||grad|| > this value.
max_step_size: f64,
},
/// Weight norm conservation: ||W_t|| stays within bounds per layer.
/// Prevents gradient explosion/vanishing.
///
/// Rollback strategy: **delta-apply** — gradients are applied to a
/// scratch buffer, norms checked, then committed only if bounds hold.
/// This avoids doubling peak memory via full snapshots.
WeightNormBound {
/// Maximum L2 norm per layer.
max_norm: f64,
/// Minimum L2 norm per layer (prevents collapse).
min_norm: f64,
/// Rollback strategy.
rollback: RollbackStrategy,
},
/// Equivariance: model output is equivariant to graph permutations.
/// **This is a statistical test, not a formal proof.** The certificate
/// records the exact scope: rng seed, sample count, permutation ID hashes.
/// A verifier can replay the exact same permutations to confirm.
PermutationEquivariance {
/// Number of random permutations to test per check.
samples: usize,
/// Maximum allowed deviation (L2 distance / output norm).
max_deviation: f64,
/// RNG seed for reproducibility. Bound into the proof scope.
rng_seed: u64,
},
/// Lipschitz bound: **estimated** Lipschitz constant stays below threshold.
/// Verified per-layer via spectral norm power iteration.
///
/// **Attestation scope:** The certificate records that the estimated bound
/// (via K power iterations with tolerance eps) stayed below max_lipschitz.
/// This does NOT certify the true Lipschitz constant — it certifies
/// that the estimate with stated parameters was within bounds.
LipschitzBound {
/// Maximum Lipschitz constant per layer.
max_lipschitz: f64,
/// Power iteration steps for spectral norm estimation.
power_iterations: usize,
/// Convergence tolerance for power iteration.
tolerance: f64,
},
/// Coherence: spectral coherence score stays above threshold.
/// Uses ruvector-coherence::spectral::SpectralCoherenceScore.
///
/// **Attestation scope:** Like Lipschitz, this is an estimate based on
/// sampled eigenvalues. The certificate records the estimation parameters.
CoherenceBound {
/// Minimum coherence score.
min_coherence: f64,
/// Number of eigenvalue samples for estimation.
eigenvalue_samples: usize,
},
/// Energy gate: compute energy or coherence proxy BEFORE applying
/// gradients. If below threshold, require a stronger proof tier,
/// reduce learning rate, or refuse the step entirely.
///
/// Integrates with ruvector-mincut-gated-transformer::EnergyGate
/// to make training behave like inference gating.
EnergyGate {
/// Minimum energy threshold for standard-tier step.
min_energy: f64,
/// If energy < min_energy, force this tier for verification.
escalation_tier: ProofTier,
/// If energy < critical_energy, refuse the step entirely.
critical_energy: f64,
},
/// Custom invariant with a user-provided verification function.
Custom {
/// Name for logging and attestation.
name: String,
/// Estimated proof complexity (for tier routing).
complexity: u32,
},
}
/// Rollback strategy for failed invariant checks.
pub enum RollbackStrategy {
/// Apply gradients to a scratch buffer, check invariants, then commit.
/// Peak memory: weights + one layer's gradients. No full snapshot.
DeltaApply,
/// Store per-layer deltas, revert only modified layers on failure.
/// Peak memory: weights + delta buffer (typically < 10% of weights).
ChunkedRollback,
/// Full snapshot (doubles peak memory). Use only when other strategies
/// are insufficient (e.g., cross-layer invariants).
FullSnapshot,
}
```

```rust
impl VerifiedTrainer {
    /// Execute one verified training step.
    ///
    /// 1. Verify pre-step invariants (gradient bounds, loss stability)
    /// 2. Evaluate the energy gate before any mutation; escalate or refuse
    /// 3. Compute the proposed weight delta via the underlying optimizer
    /// 4. Verify post-step invariants on the proposed (weights + delta)
    /// 5. On success, commit the delta and issue an attestation for this step
    /// 6. On failure, discard the delta (fail-closed) or, in degraded mode,
    ///    commit with an explicit OverrideProof
pub fn step(
&mut self,
loss: f64,
gradients: &Gradients,
weights: &mut Weights,
) -> Result<StepAttestation> {
// 1. Pre-step: verify gradient bounds and loss stability
let pre_proofs = self.verify_invariants(
InvariantPhase::PreStep,
loss, weights,
)?;
// 2. Energy gate: compute energy/coherence proxy BEFORE mutation.
// If below threshold, escalate proof tier or refuse step.
if let Some(energy_gate) = &self.energy_gate {
let energy = energy_gate.evaluate(weights, gradients);
if energy < energy_gate.critical_energy {
return Err(GraphTransformerError::MutationRejected {
reason: format!("energy {} < critical {}", energy, energy_gate.critical_energy),
});
}
if energy < energy_gate.min_energy {
// Force escalation to stronger proof tier
self.current_tier_override = Some(energy_gate.escalation_tier);
}
}
// 3. Apply gradients via delta-apply strategy (default).
// Gradients go into a scratch buffer, not directly into weights.
let delta = self.optimizer.compute_delta(gradients, weights)?;
// 4. Post-step verification on proposed (weights + delta).
// No mutation has occurred yet.
match self.verify_invariants_on_proposed(
InvariantPhase::PostStep, loss, weights, &delta
) {
Ok(post_proofs) => {
// 5. Commit: apply delta to actual weights.
weights.apply_delta(&delta);
// 6. Compose attestation and append to ledger.
let attestation = self.compose_step_attestation(
pre_proofs, post_proofs,
);
self.ledger.append(attestation.clone());
self.scheduler.step();
self.current_tier_override = None;
Ok(StepAttestation {
step: self.ledger.len() as u64,
attestation,
loss,
invariants_checked: self.invariants.len(),
overridden: false,
})
}
Err(e) if self.config.allow_override => {
// Degraded mode: step proceeds with OverrideProof.
// The override is visible in the certificate.
let override_proof = self.create_override_proof(&e)?;
weights.apply_delta(&delta);
self.ledger.append(override_proof.clone());
self.override_count += 1;
Ok(StepAttestation {
step: self.ledger.len() as u64,
attestation: override_proof,
loss,
invariants_checked: self.invariants.len(),
overridden: true,
})
}
Err(e) => {
// Fail-closed: delta is discarded, weights unchanged.
// Refusal is recorded in the ledger.
let refusal = self.create_refusal_attestation(&e);
self.ledger.append(refusal);
Err(e)
}
}
}
}
```

Training invariant verification uses the same three-tier routing as ADR-047:
| Invariant | Typical Tier | Rationale | Formally Proven? |
|---|---|---|---|
| `LossStabilityBound` | Reflex | Moving avg comparison + gradient norm check, < 10 ns | Yes — bounded comparison |
| `WeightNormBound` | Standard(100) | L2 norm computation, < 1 us | Yes — exact computation |
| `PermutationEquivariance` | Deep | Random permutation + forward pass, < 100 us | No — statistical test with bound scope |
| `LipschitzBound` | Standard(500) | Power iteration spectral norm, < 10 us | No — estimate with stated tolerance |
| `CoherenceBound` | Standard(200) | Spectral coherence from sampled eigenvalues, < 5 us | No — estimate with stated sample count |
| `EnergyGate` | Reflex/Standard | Energy proxy evaluation, < 100 ns | Yes — threshold comparison |
| `Custom` | Routed by `complexity` field | User-defined | Depends on implementation |
**Distinction between proven and estimated invariants:** The certificate explicitly records which invariants are formally proven (exact computation within the proof system) and which are statistical estimates with bound scope (`rng_seed`, `sample_count`, `iterations`, `tolerance`). A verifier knows exactly what was tested and can replay it.
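To make the replay property concrete, here is a hedged sketch of how a verifier could re-run the equivariance test from the recorded scope. The per-node model function and vector layout are illustrative stand-ins, not existing workspace APIs:

```rust
use rand::rngs::StdRng;
use rand::seq::SliceRandom;
use rand::SeedableRng;

/// Replays a PermutationEquivariance check from the attested scope.
/// The model is abstracted as a function over node features; equivariance
/// means permuting the input permutes the output identically.
fn replay_equivariance_check(
    model: impl Fn(&[f64]) -> Vec<f64>, // illustrative stand-in for a forward pass
    node_features: &[f64],
    rng_seed: u64,      // recorded in the certificate
    samples: usize,     // recorded in the certificate
    max_deviation: f64, // recorded in the certificate
) -> bool {
    // Seeding with the recorded rng_seed reproduces the exact permutations.
    let mut rng = StdRng::seed_from_u64(rng_seed);
    let baseline = model(node_features);
    let norm: f64 = baseline.iter().map(|x| x * x).sum::<f64>().sqrt();
    (0..samples).all(|_| {
        // Sample a node permutation deterministically from the seeded RNG.
        let mut perm: Vec<usize> = (0..node_features.len()).collect();
        perm.shuffle(&mut rng);
        // Run the model on the permuted input...
        let permuted_in: Vec<f64> = perm.iter().map(|&i| node_features[i]).collect();
        let out = model(&permuted_in);
        // ...and compare against the permuted baseline output.
        let expected: Vec<f64> = perm.iter().map(|&i| baseline[i]).collect();
        let dist: f64 = out
            .iter()
            .zip(&expected)
            .map(|(a, b)| (a - b).powi(2))
            .sum::<f64>()
            .sqrt();
        // L2 distance normalized by output norm, as in the invariant spec.
        dist / norm <= max_deviation
    })
}
```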
The routing decision is made by converting each `TrainingInvariant` into a `ProofKind` and calling `ruvector_verified::gated::route_proof`. For example, `LossStabilityBound` maps to `ProofKind::DimensionEquality` (literal comparison), while `PermutationEquivariance` maps to `ProofKind::Custom { estimated_complexity: samples * 100 }`.
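A sketch of that mapping, assuming the `ProofKind` variants named above. Only the `LossStabilityBound` and `PermutationEquivariance` mappings are specified by this ADR; the remaining arms use the Standard-tier complexities from the routing table as illustrative defaults:

```rust
/// Maps a training invariant to a proof obligation for tier routing.
fn to_proof_kind(inv: &TrainingInvariant) -> ProofKind {
    match inv {
        // Literal bound comparison: routes to the Reflex tier.
        TrainingInvariant::LossStabilityBound { .. } => ProofKind::DimensionEquality,
        // Statistical test: complexity scales with the sample count.
        TrainingInvariant::PermutationEquivariance { samples, .. } => ProofKind::Custom {
            estimated_complexity: (samples * 100) as u32,
        },
        // User-defined invariants carry their own complexity estimate.
        TrainingInvariant::Custom { complexity, .. } => ProofKind::Custom {
            estimated_complexity: *complexity,
        },
        // Threshold comparison: cheap, Reflex-routable (illustrative).
        TrainingInvariant::EnergyGate { .. } => ProofKind::DimensionEquality,
        // Norm and spectral checks: Standard-tier complexities from the table.
        TrainingInvariant::WeightNormBound { .. } => ProofKind::Custom { estimated_complexity: 100 },
        TrainingInvariant::CoherenceBound { .. } => ProofKind::Custom { estimated_complexity: 200 },
        TrainingInvariant::LipschitzBound { .. } => ProofKind::Custom { estimated_complexity: 500 },
    }
}
```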
For models that require adversarial robustness certification, the `verified_training` module provides an IBP (Interval Bound Propagation) / DeepPoly integration:

```rust
pub struct RobustnessCertifier {
/// Perturbation radius (L-infinity norm).
epsilon: f64,
/// Certification method.
method: CertificationMethod,
}
pub enum CertificationMethod {
/// Interval Bound Propagation -- fast but loose.
IBP,
/// DeepPoly -- tighter but slower.
DeepPoly,
/// Combined: IBP for initial bound, DeepPoly for refinement.
Hybrid { ibp_warmup_epochs: usize },
}
impl RobustnessCertifier {
/// Certify that the model's output is stable within epsilon-ball.
/// Returns a ProofGate<RobustnessCertificate> with the certified radius.
pub fn certify(
&self,
model: &GraphTransformer<impl GraphRepr>,
input: &GraphBatch,
env: &mut ProofEnvironment,
) -> Result<ProofGate<RobustnessCertificate>>;
}
pub struct RobustnessCertificate {
/// Certified perturbation radius.
pub certified_radius: f64,
/// Fraction of nodes certified robust.
pub certified_fraction: f64,
/// Method used.
pub method: CertificationMethod,
/// Attestation.
pub attestation: ProofAttestation,
}
```

At the end of a training run, a `TrainingCertificate` is produced by composing all step attestations:

```rust
pub struct TrainingCertificate {
/// Total training steps completed.
pub total_steps: u64,
/// Total invariant violations (zero if fully verified).
pub violations: u64,
/// Number of steps that proceeded via OverrideProof (degraded mode).
pub overridden_steps: u64,
/// Composed attestation over all steps via compose_chain.
pub attestation: ProofAttestation,
/// Final loss value.
pub final_loss: f64,
/// Final coherence score (if CoherenceBound invariant was active).
pub final_coherence: Option<f64>,
/// Robustness certificate (if adversarial certification was run).
pub robustness: Option<RobustnessCertificate>,
/// Epoch at which the certificate was sealed.
pub epoch: u64,
/// Per-invariant statistics.
pub invariant_stats: Vec<InvariantStats>,
// --- Artifact binding (hardening move #7) ---
/// BLAKE3 hash of the final model weights. Binds certificate to
/// the exact model artifact. Cannot be separated.
pub weights_hash: [u8; 32],
/// BLAKE3 hash of the VerifiedTrainerConfig (serialized).
pub config_hash: [u8; 32],
/// BLAKE3 hash of the dataset manifest (or RVF manifest root).
/// None if no dataset manifest was provided.
pub dataset_manifest_hash: Option<[u8; 32]>,
/// BLAKE3 hash of the code (build hash / git commit).
/// None if not provided.
pub code_build_hash: Option<[u8; 32]>,
}
pub struct InvariantStats {
/// Invariant name.
pub name: String,
/// Whether this invariant is formally proven or a statistical estimate.
pub proof_class: ProofClass,
/// Number of times checked.
pub checks: u64,
/// Number of times satisfied.
pub satisfied: u64,
/// Number of times overridden (degraded mode).
pub overridden: u64,
/// Average verification latency.
pub avg_latency_ns: u64,
/// Proof tier distribution: [reflex_count, standard_count, deep_count].
pub tier_distribution: [u64; 3],
}
pub enum ProofClass {
/// Formally proven: exact computation within the proof system.
Formal,
/// Statistical estimate with bound scope. Certificate records
/// the estimation parameters (rng_seed, iterations, tolerance).
Statistical {
rng_seed: Option<u64>,
iterations: usize,
tolerance: f64,
},
}
impl VerifiedTrainer {
/// Seal the training run and produce a certificate.
///
/// 1. Compacts the mutation ledger (proof-gated: compaction itself
/// produces a composed attestation + witness that the compacted
/// chain corresponds exactly to the original sequence).
/// 2. Computes BLAKE3 hashes of weights, config, and optional manifests.
/// 3. Composes all attestations into the final certificate.
///
/// The sealed certificate is a product artifact: verifiable by
/// third parties without trusting training logs.
pub fn seal(self, weights: &Weights) -> TrainingCertificate;
}
```

The target is proof overhead below 5% of the training step time. For a typical GNN training step of ~10 ms (on CPU):
- `LossStabilityBound` (Reflex): < 10 ns = 0.0001%
- `WeightNormBound` (Standard): < 1 us = 0.01%
- `LipschitzBound` (Standard): < 10 us = 0.1%
- `CoherenceBound` (Standard): < 5 us = 0.05%
- `PermutationEquivariance` (Deep, sampled): < 100 us = 1%
- Attestation composition: < 1 us = 0.01%
- Total: < 120 us = 1.2% (well within 5% budget)
For GPU-accelerated training (step time ~1 ms), `PermutationEquivariance` with many samples may exceed 5%. Mitigation: reduce the sample count or check equivariance every N steps (configurable via `expensive_check_interval` in `VerifiedTrainerConfig`).
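A sketch of that scheduling policy; `should_check` is a hypothetical helper, and the `ProofTier` variants follow the routing table above:

```rust
/// Hypothetical scheduling helper: expensive Deep-tier checks run every
/// `expensive_check_interval` steps; cheaper tiers run on every step.
fn should_check(tier: ProofTier, step: u64, expensive_check_interval: u64) -> bool {
    match tier {
        ProofTier::Deep => {
            expensive_check_interval > 0 && step % expensive_check_interval == 0
        }
        // Reflex and Standard checks are within budget at every step.
        _ => true,
    }
}
```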
The `VerifiedTrainer` composes with `ruvector-gnn`'s continual learning primitives:

```rust
pub struct VerifiedTrainerConfig {
/// Optimizer type (from ruvector-gnn).
pub optimizer: OptimizerType,
/// EWC lambda (0.0 = disabled). Uses ruvector_gnn::ElasticWeightConsolidation.
pub ewc_lambda: f64,
/// Replay buffer size (0 = disabled). Uses ruvector_gnn::ReplayBuffer.
pub replay_buffer_size: usize,
/// Scheduler type (from ruvector-gnn).
pub scheduler: SchedulerType,
/// Invariants to verify per step.
pub invariants: Vec<TrainingInvariant>,
/// Check interval for expensive invariants (e.g., equivariance).
/// Cheap invariants (Reflex tier) run every step.
pub expensive_check_interval: usize,
/// Warmup steps during which invariant violations are logged but
/// do not trigger rollback. After warmup, fail-closed applies.
pub warmup_steps: usize,
/// Robustness certification config (None = disabled).
pub robustness: Option<RobustnessCertifier>,
/// Energy gate config (None = disabled).
/// If enabled, energy is evaluated before every gradient application.
pub energy_gate: Option<EnergyGateConfig>,
/// Default rollback strategy for invariant failures.
pub rollback_strategy: RollbackStrategy,
/// Allow degraded mode: if true, failed invariant checks produce
/// an OverrideProof and increment a visible violation counter
/// instead of stopping the step. Default: false (fail-closed).
pub allow_override: bool,
/// Optional dataset manifest hash for binding to the certificate.
pub dataset_manifest_hash: Option<[u8; 32]>,
/// Optional code build hash for binding to the certificate.
pub code_build_hash: Option<[u8; 32]>,
}
```

When EWC is enabled, the `WeightNormBound` invariant is automatically adjusted to account for the EWC penalty term. When the replay buffer is active, replayed samples also pass through invariant verification.
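As a usage illustration, a minimal fail-closed configuration with EWC and two invariants. The field names follow the struct above; the `OptimizerType` and `SchedulerType` variant names are assumptions:

```rust
// Sketch only: variant names for OptimizerType/SchedulerType are assumed.
let config = VerifiedTrainerConfig {
    optimizer: OptimizerType::Adam,
    ewc_lambda: 0.4,              // EWC enabled
    replay_buffer_size: 1024,     // replayed samples are also verified
    scheduler: SchedulerType::Cosine,
    invariants: vec![
        TrainingInvariant::LossStabilityBound {
            spike_cap: 0.10,      // reject >10% spikes over the moving average
            window: 64,
            max_gradient_norm: 10.0,
            max_step_size: 0.5,
        },
        TrainingInvariant::WeightNormBound {
            max_norm: 100.0,
            min_norm: 1e-3,
            rollback: RollbackStrategy::DeltaApply,
        },
    ],
    expensive_check_interval: 50, // Deep checks every 50 steps
    warmup_steps: 200,            // log-only during warmup
    robustness: None,
    energy_gate: None,
    rollback_strategy: RollbackStrategy::DeltaApply,
    allow_override: false,        // fail-closed (default)
    dataset_manifest_hash: None,
    code_build_hash: None,
};
```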
- Every training run produces a `TrainingCertificate` bound to the exact model weights via BLAKE3 hash — portable, verifiable by third parties without trusting logs
- Per-step invariant proofs catch regressions immediately — loss spikes, norm explosions, equivariance breaks become training-stopping events, not evaluation surprises
- Clear distinction between formally proven invariants and statistical estimates — the certificate is defensible because it states exactly what was proven and what was estimated
- `EnergyGate` integration makes training behave like inference gating — consistent proof-gated mutation across the full lifecycle
- Delta-apply rollback strategy avoids doubling peak memory while preserving proof-gated semantics
- Fail-closed by default with explicit `OverrideProof` for degraded mode — violations are visible, not silent
- `PermutationEquivariance` is a statistical test, not a formal proof — the certificate is honest about this, but it means equivariance is not guaranteed, only tested with bound scope
- `LipschitzBound` via power iteration is an estimate — the certificate attests the estimate was within bounds, not the true Lipschitz constant
- The `TrainingCertificate` is only as strong as the invariants specified — missing invariants are not caught
- Robustness certification (IBP/DeepPoly) produces loose bounds for deep models; the certified radius may be conservative
- Over-conservative invariants can stop learning — mitigated by check intervals, warmup periods, and adaptive thresholds (which are themselves bounded)
- **Proof cache hit rate drops:** a high learning rate causes diverse weight states, so Standard/Deep proofs dominate and exceed the 5% budget. Mitigated by caching invariant structure (not values) — proof terms depend on structure, values are parameters. Monitor `ProofStats::cache_hit_rate` and alert below 80%
- **GPU steps dominated by Deep checks:** schedule deep checks asynchronously with a two-phase commit: provisional update, finalize after the deep check, revert if it failed. This mitigation preserves proof-gated semantics without blocking the training loop
- **EWC Fisher information:** O(n_params^2) in the naive case. The existing diagonal approximation may miss cross-parameter interactions. Mitigated by periodic full Fisher computation (every K epochs) as a Deep-tier invariant
- **Attestation chain growth:** 82 bytes per step * 100,000 steps ≈ 8 MB. Mitigated by `MutationLedger::compact` — compaction is itself proof-gated: it produces a composed attestation plus a witness that the compacted chain corresponds exactly to the original sequence under the current epoch algebra
- **Certificate separation:** without artifact binding, the certificate can be detached from the model. Mitigated by BLAKE3 hashes of weights, config, dataset manifest, and code build hash in the certificate (see the hashing sketch below)
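A sketch of that binding at seal time, assuming the `blake3` crate; `Weights::as_bytes` is a hypothetical serialization accessor:

```rust
/// Computes the artifact-binding hashes recorded in TrainingCertificate.
/// `weights.as_bytes()` is a hypothetical accessor, not an existing API.
fn bind_artifacts(weights: &Weights, config_bytes: &[u8]) -> ([u8; 32], [u8; 32]) {
    // blake3::hash returns a Hash whose as_bytes() is a &[u8; 32].
    let weights_hash = *blake3::hash(weights.as_bytes()).as_bytes();
    let config_hash = *blake3::hash(config_bytes).as_bytes();
    (weights_hash, config_hash)
}
```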
Train 200 steps with invariants enabled, then intentionally inject one bad gradient update that would push a layer norm above `max_norm`. The system must:
- Reject the step (fail-closed)
- Emit a refusal attestation to the ledger
- Leave weights unchanged (delta-apply was not committed)
- The sealed `TrainingCertificate` must show exactly one violation with the correct step index and invariant name
- The `weights_hash` in the certificate must match the actual final weights
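A hedged sketch of the assertions this test would make, assuming a hypothetical `train_with_injected_violation` helper that runs the 200-step scenario; the field names follow the `TrainingCertificate` and `InvariantStats` definitions above:

```rust
#[test]
fn injected_norm_violation_is_refused_and_certified() {
    // Hypothetical helper: runs 200 steps and injects one bad update.
    let (cert, final_weights_hash) = train_with_injected_violation();

    // Exactly one violation, and no degraded-mode overrides.
    assert_eq!(cert.violations, 1);
    assert_eq!(cert.overridden_steps, 0);

    // The certificate is bound to the final weights artifact.
    assert_eq!(cert.weights_hash, final_weights_hash);

    // Per-invariant stats identify the violated invariant by name.
    let stats = cert
        .invariant_stats
        .iter()
        .find(|s| s.name == "WeightNormBound")
        .expect("WeightNormBound stats present");
    assert_eq!(stats.checks - stats.satisfied, 1);
}
```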
- Define the `TrainingInvariant` enum and `VerifiedTrainerConfig` in `crates/ruvector-graph-transformer/src/verified_training/invariants.rs`
- Implement `VerifiedTrainer` wrapping `ruvector_gnn::training::Optimizer` in `crates/ruvector-graph-transformer/src/verified_training/pipeline.rs`
- Implement the invariant-to-`ProofKind` mapping for tier routing
- Implement `RobustnessCertifier` with IBP and DeepPoly in `crates/ruvector-graph-transformer/src/verified_training/mod.rs`
- Implement `TrainingCertificate` and the `seal()` method
- Add benchmarks: verified training step overhead on a 3-layer GNN (128-dim, 10K nodes)
- Integration test: train a small GNN for 100 steps with all invariants, verify the certificate
- ADR-045: Lean-Agentic Integration (`ProofEnvironment`, `FastTermArena`)
- ADR-046: Graph Transformer Unified Architecture (module structure)
- ADR-047: Proof-Gated Mutation Protocol (`ProofGate<T>`, `MutationLedger`, `compose_chain`)
- `crates/ruvector-gnn/src/training.rs`: `Optimizer`, `OptimizerType`, `TrainConfig`, `sgd_step`
- `crates/ruvector-gnn/src/ewc.rs`: `ElasticWeightConsolidation`
- `crates/ruvector-gnn/src/scheduler.rs`: `LearningRateScheduler`, `SchedulerType`
- `crates/ruvector-gnn/src/replay.rs`: `ReplayBuffer`, `ReplayEntry`
- `crates/ruvector-verified/src/gated.rs`: `ProofTier`, `route_proof`, `verify_tiered`
- `crates/ruvector-verified/src/proof_store.rs`: `ProofAttestation`, `create_attestation`
- `crates/ruvector-verified/src/fast_arena.rs`: `FastTermArena`
- `crates/ruvector-coherence/src/spectral.rs`: `SpectralCoherenceScore`, `SpectralTracker`
- `crates/ruvector-mincut-gated-transformer/src/energy_gate.rs`: `EnergyGate`
- Gowal et al., "Scalable Verified Training for Provably Robust Image Classification" (ICCV 2019) — IBP training
- Singh et al., "An Abstract Domain for Certifying Neural Networks" (POPL 2019) — DeepPoly
- Singh et al., "Abstract Interpretation with DeepPoly" (POPL 2019)