Status: Proposed
Date: 2026-02-08
Parent: ADR-017 Temporal Tensor Compression, ADR-018 Block-Based Storage Engine
Author: System Architecture Team
Note: Delta compression is now implemented via RVF OVERLAY_SEG as part of ADR-029. See the overlay epochs specification (docs/research/rvf/spec/05-overlay-epochs.md).
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-08 | Architecture Team | Initial proposal |
This ADR defines delta compression, reconstruction policies, and the associated
read/write data paths for the Temporal Tensor Store. It extends the tiered
quantization system from ADR-017 with a fourth logical tier -- Tier0 -- that
compresses data to zero resident bits while preserving the ability to
reconstruct on demand via delta chains or low-rank factor decomposition. The
design adds sparse delta encoding for incremental writes, bounded-depth delta
chain management with automatic compaction, and three explicit reconstruction
policies (None, Delta, Factor) that control what happens when a reader
requests a block that has been evicted to Tier0.
All structures target Rust with `#![no_std]` compatibility (plus `alloc` for `Vec`) on the WASM path, consistent with the zero-dependency constraint established in ADR-017.
ADR-017 introduced three quantization tiers (8-bit hot, 7/5-bit warm, 3-bit cold) that trade precision for storage. However, it provides no mechanism for tensors that have become completely stale -- data that has not been accessed in a long time and whose storage cost exceeds its value. Today the only option is full deletion, which is irreversible.
Production workloads produce tensor streams where the vast majority of blocks become irrelevant within minutes but a small fraction are needed hours or days later for debugging, auditing, or replay. We need a tier that retains the ability to reconstruct without paying any per-block storage cost during steady state.
The current write path (ADR-017 push_frame) always stores a full quantized
representation. When a tensor block changes by only a few elements --
common during fine-tuning steps or incremental embedding updates -- writing the
entire block wastes bandwidth and storage. Delta encoding captures only the
changed elements as sparse pairs.
- Zero-cost eviction: Tier0 blocks consume zero data bytes; only metadata survives.
- Configurable reconstruction: Callers choose whether evicted blocks are reconstructable, and by which method.
- Bounded delta chains: Delta reconstruction applies at most K deltas, where K is a small, configurable constant (default 8), rather than the full history length.
- Sparse delta writes: Incremental changes below a threshold are stored as sparse vectors, saving up to 90% over full-block rewrites.
- WASM-safe: All structures use fixed-size integers and simple layouts compatible with wasm32-unknown-unknown.
The tier model from ADR-017 is extended with Tier0:
Tier1 (Hot) -- 8-bit quantized -- full fidelity, fast access
Tier2 (Warm) -- 7/5-bit quantized -- reduced fidelity, moderate access
Tier3 (Cold) -- 3-bit quantized -- low fidelity, infrequent access
Tier0 (Zero) -- 0-bit evicted -- metadata only, reconstructable on demand
Tier0 is reached when the tier score from TierPolicy::select_bits falls
below a new configurable threshold evict_min_score (default: 4), or when the
storage engine triggers explicit eviction under memory pressure.
/// Controls how a Tier0 (evicted) block is handled on read.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum ReconstructPolicy {
/// No reconstruction. Reads return an error or zeros depending on
/// `zero_fill_on_evict` in the global config.
None = 0,
/// Reconstruct from a base block plus a bounded-depth delta chain.
/// The base is stored in the factor file or an older tier snapshot.
Delta = 1,
/// Reconstruct from stored low-rank factors (SVD decomposition).
/// Factors are stored in a dedicated factor file: U, S, V matrices.
Factor = 2,
}
/// Error returned when a Tier0 block cannot be read.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReadError {
/// Block has been evicted and the reconstruction policy is None.
TensorEvicted,
/// Delta chain is corrupted or a link is missing.
DeltaChainBroken { depth: u16 },
/// Factor file is missing or corrupt.
FactorMissing,
/// Block metadata not found.
BlockNotFound,
/// Supplied output buffer is too small.
BufferTooSmall { needed: usize, provided: usize },
}

| Policy | Storage Cost | Read Latency | Quality | Best For |
|---|---|---|---|---|
| None | 0 | N/A (error) | N/A | Truly disposable data |
| Delta | O(K * nnz) | O(K * N) | Exact at base tier | Audit trails, debugging replay |
| Factor | O(k*(m+n)) | O(km + kn) | Bounded by truncation rank | Attention weight matrices |
Delta Record (variable length):
Offset Size Field Description
------ ----- ------------- ------------------------------------------
0 16 tensor_id u128 LE - identifies the tensor
16 4 block_index u32 LE - block within the tensor
20 8 base_epoch u64 LE - epoch of the base this delta applies to
28 2 nnz u16 LE - number of non-zero delta entries
30 4 delta_scale f32 LE - scale factor for i16 delta values
34 nnz*4 pairs Array of (index: u16, value: i16) pairs
Total size per delta: 34 + 4 * nnz bytes.
For WASM targets, delta values are stored as i16 with a shared delta_scale
(f32) to keep the arithmetic simple and avoid f64 in the critical path.
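A minimal round-trip sketch of the i16-plus-shared-scale scheme just described. The helper names (`delta_scale`, `encode_delta_value`, `decode_delta_value`) are illustrative, not part of the store's API:

```rust
// Illustrative sketch: quantize a delta to i16 with a shared f32 scale,
// as described above, then recover it. Not the store's actual API.
fn delta_scale(max_abs_delta: f32) -> f32 {
    if max_abs_delta == 0.0 {
        1.0
    } else {
        max_abs_delta / i16::MAX as f32
    }
}

fn encode_delta_value(diff: f32, scale: f32) -> i16 {
    (diff / scale).round() as i16
}

fn decode_delta_value(q: i16, scale: f32) -> f32 {
    q as f32 * scale
}

fn main() {
    let diffs = [0.5f32, -0.031, 0.002];
    let max_abs = diffs.iter().fold(0.0f32, |m, d| m.max(d.abs()));
    let scale = delta_scale(max_abs);
    for &d in &diffs {
        let back = decode_delta_value(encode_delta_value(d, scale), scale);
        // Round-trip error is bounded by half a quantization step.
        assert!((back - d).abs() <= 0.5 * scale + f32::EPSILON);
    }
}
```

Because the scale is shared across the whole record, the worst-case per-element error is half a quantization step of the largest delta in the block.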
/// On-disk header for a single delta record.
#[derive(Clone, Debug)]
#[repr(C, packed)]
pub struct DeltaHeader {
pub tensor_id: u128,
pub block_index: u32,
pub base_epoch: u64,
pub nnz: u16,
pub delta_scale: f32,
}
/// A single sparse delta entry: position and quantized value.
#[derive(Clone, Copy, Debug)]
#[repr(C, packed)]
pub struct DeltaPair {
pub index: u16,
pub value: i16,
}
/// In-memory representation of a delta record.
#[derive(Clone, Debug)]
pub struct DeltaRecord {
pub header: DeltaHeader,
pub pairs: Vec<DeltaPair>,
}
impl DeltaRecord {
/// Serialise to bytes (little-endian, WASM-safe).
pub fn to_bytes(&self) -> Vec<u8> {
let mut buf = Vec::with_capacity(34 + self.pairs.len() * 4);
buf.extend_from_slice(&self.header.tensor_id.to_le_bytes());
buf.extend_from_slice(&self.header.block_index.to_le_bytes());
buf.extend_from_slice(&self.header.base_epoch.to_le_bytes());
buf.extend_from_slice(&self.header.nnz.to_le_bytes());
buf.extend_from_slice(&self.header.delta_scale.to_le_bytes());
for p in &self.pairs {
buf.extend_from_slice(&p.index.to_le_bytes());
buf.extend_from_slice(&p.value.to_le_bytes());
}
buf
}
/// Deserialise from bytes. Returns None on truncated input.
pub fn from_bytes(data: &[u8]) -> Option<Self> {
if data.len() < 34 {
return None;
}
let tensor_id = u128::from_le_bytes(data[0..16].try_into().ok()?);
let block_index = u32::from_le_bytes(data[16..20].try_into().ok()?);
let base_epoch = u64::from_le_bytes(data[20..28].try_into().ok()?);
let nnz = u16::from_le_bytes(data[28..30].try_into().ok()?);
let delta_scale = f32::from_le_bytes(data[30..34].try_into().ok()?);
let pairs_len = nnz as usize;
if data.len() < 34 + pairs_len * 4 {
return None;
}
let mut pairs = Vec::with_capacity(pairs_len);
let mut off = 34;
for _ in 0..pairs_len {
let index = u16::from_le_bytes(data[off..off + 2].try_into().ok()?);
let value = i16::from_le_bytes(data[off + 2..off + 4].try_into().ok()?);
pairs.push(DeltaPair { index, value });
off += 4;
}
Some(Self {
header: DeltaHeader {
tensor_id,
block_index,
base_epoch,
nnz,
delta_scale,
},
pairs,
})
}
}

The per-block metadata from ADR-018 is extended with reconstruction fields:
/// Extended block metadata supporting Tier0 and reconstruction.
#[derive(Clone, Debug)]
pub struct BlockMeta {
pub tensor_id: u128,
pub block_index: u32,
pub epoch: u64,
/// Current storage tier: 0 = evicted, 1 = hot, 2 = warm, 3 = cold.
pub tier: u8,
/// Bit width of the stored representation (0 for Tier0).
pub bits: u8,
/// Reconstruction policy when tier == 0.
pub reconstruct_policy: ReconstructPolicy,
/// Number of deltas chained on top of the base for this block.
pub delta_chain_len: u16,
/// Epoch of the base block at the root of the delta chain.
pub base_epoch: u64,
/// Byte offset into the tier data file (unused when tier == 0).
pub data_offset: u64,
/// Byte length in the tier data file (0 when tier == 0).
pub data_len: u32,
/// Access tracking for tier policy.
pub access_count: u32,
pub last_access_ts: u32,
}

Caller    BlockStore    TierDataFile    DeltaStore    FactorStore
| | | | |
|-- read_block(id) -->| | | |
| |-- lookup_meta(id) ->| | |
| |<--- BlockMeta ------| | |
| | | | |
| [tier 1/2/3?] | | |
| |-- read_bytes ------>| | |
| |<--- quantized ------| | |
| |-- dequantize ------>| | |
|<-- f32 buffer ------| | | |
| | | | |
| [tier 0, policy=None?] | | |
|<-- Err(TensorEvicted) | | |
| | | | |
| [tier 0, policy=Delta?] | | |
| |-- load_base ------->| | |
| |<--- base block -----| | |
| |-- load_deltas ------|----------------->| |
| |<--- delta chain ----|------------------| |
| |-- apply_chain ----->| | |
|<-- reconstructed ---| | | |
| | | | |
| [tier 0, policy=Factor?] | | |
| |-- load_factors -----|------------------|---------------->|
| |<--- U, S, V --------|------------------|-----------------|
| |-- reconstruct_svd ->| | |
|<-- reconstructed ---| | | |
/// Result of reading a block. Contains the f32 data or an error.
pub type ReadResult = Result<Vec<f32>, ReadError>;
/// Read a block, performing reconstruction if necessary.
pub fn read_block(
meta: &BlockMeta,
tier_files: &TierDataFiles,
delta_store: &DeltaStore,
factor_store: &FactorStore,
zero_fill_on_evict: bool,
out: &mut Vec<f32>,
) -> Result<(), ReadError> {
match meta.tier {
// --- Tier 1/2/3: quantized data present ---
1 | 2 | 3 => {
let raw = tier_files
.read_range(meta.tier, meta.data_offset, meta.data_len)
.map_err(|_| ReadError::BlockNotFound)?;
// Dequantize into caller buffer using the segment decode path
// from ADR-017. The raw bytes include the TQTC segment header.
out.clear();
crate::segment::decode(&raw, out);
if out.is_empty() {
return Err(ReadError::BlockNotFound);
}
Ok(())
}
// --- Tier 0: evicted, attempt reconstruction ---
0 => match meta.reconstruct_policy {
ReconstructPolicy::None => {
if zero_fill_on_evict {
// Return a zero-filled buffer of the expected size.
// The block_size is derived from tensor metadata.
out.clear();
out.resize(block_size_from_meta(meta), 0.0);
Ok(())
} else {
Err(ReadError::TensorEvicted)
}
}
ReconstructPolicy::Delta => {
reconstruct_via_delta(meta, tier_files, delta_store, out)
}
ReconstructPolicy::Factor => {
reconstruct_via_factor(meta, factor_store, out)
}
},
_ => Err(ReadError::BlockNotFound),
}
}
/// Reconstruct a Tier0 block by loading the base and applying the
/// delta chain up to the target epoch.
fn reconstruct_via_delta(
meta: &BlockMeta,
tier_files: &TierDataFiles,
delta_store: &DeltaStore,
out: &mut Vec<f32>,
) -> Result<(), ReadError> {
// 1. Load the base block (stored in an older tier or factor file).
let base_raw = tier_files
.read_base(meta.tensor_id, meta.base_epoch)
.map_err(|_| ReadError::DeltaChainBroken { depth: 0 })?;
out.clear();
crate::segment::decode(&base_raw, out);
if out.is_empty() {
return Err(ReadError::DeltaChainBroken { depth: 0 });
}
// 2. Load and apply deltas sequentially (oldest to newest).
let deltas = delta_store
.load_chain(meta.tensor_id, meta.block_index, meta.base_epoch, meta.epoch)
.map_err(|_| ReadError::DeltaChainBroken {
depth: meta.delta_chain_len,
})?;
for (i, delta) in deltas.iter().enumerate() {
apply_delta(out, delta).map_err(|_| ReadError::DeltaChainBroken {
depth: i as u16 + 1,
})?;
}
Ok(())
}
/// Apply a single sparse delta to a mutable f32 buffer.
fn apply_delta(buf: &mut [f32], delta: &DeltaRecord) -> Result<(), ReadError> {
let scale = delta.header.delta_scale;
for pair in &delta.pairs {
let idx = pair.index as usize;
if idx >= buf.len() {
return Err(ReadError::BufferTooSmall {
needed: idx + 1,
provided: buf.len(),
});
}
buf[idx] += (pair.value as f32) * scale;
}
Ok(())
}
/// Reconstruct a Tier0 block from stored SVD factors.
fn reconstruct_via_factor(
meta: &BlockMeta,
factor_store: &FactorStore,
out: &mut Vec<f32>,
) -> Result<(), ReadError> {
let factors = factor_store
.load(meta.tensor_id, meta.block_index)
.map_err(|_| ReadError::FactorMissing)?;
// factors.u: [m x k], factors.s: [k], factors.v: [k x n]
// Reconstruct: out[i][j] = sum_r( U[i][r] * S[r] * V[r][j] )
let m = factors.m;
let n = factors.n;
let k = factors.k;
out.clear();
out.resize(m * n, 0.0);
for r in 0..k {
let s_r = factors.s[r];
for i in 0..m {
let u_ir = factors.u[i * k + r];
let u_s = u_ir * s_r;
for j in 0..n {
out[i * n + j] += u_s * factors.v[r * n + j];
}
}
}
Ok(())
}

Caller    BlockStore    Quantizer    TierDataFile
| | | |
|-- write_block(data) ->| | |
| |-- select_tier ---->| |
| |<-- bits, tier -----| |
| |-- quantize ------->| |
| |<-- segment bytes --| |
| |-- write_segment ---|-------------------->|
| |-- update_meta ---->| |
|<-- Ok ----------------| | |
/// Write a full block replacement. Quantizes at the current tier and
/// stores the complete representation, discarding any prior data.
pub fn write_block_full(
meta: &mut BlockMeta,
data: &[f32],
policy: &TierPolicy,
tier_files: &mut TierDataFiles,
now_ts: u32,
) -> Result<(), WriteError> {
// 1. Determine tier from access pattern.
let bits = policy.select_bits(meta.access_count, meta.last_access_ts, now_ts);
let tier = tier_from_bits(bits);
// 2. Quantize via ADR-017 segment encoding.
let group_len = policy.group_len as usize;
let scales = crate::quantizer::compute_scales(data, group_len, bits);
let mut packed = Vec::new();
crate::quantizer::quantize_and_pack(data, &scales, group_len, bits, &mut packed);
let mut segment = Vec::new();
crate::segment::encode(
bits,
policy.group_len,
data.len() as u32,
1, // single frame
&scales,
&packed,
&mut segment,
);
// 3. Write segment bytes to the appropriate tier data file.
let (offset, len) = tier_files.append(tier, &segment)?;
// 4. Update metadata.
meta.tier = tier;
meta.bits = bits;
meta.data_offset = offset;
meta.data_len = len as u32;
meta.epoch += 1;
meta.delta_chain_len = 0;
meta.base_epoch = meta.epoch;
Ok(())
}

Caller    BlockStore    DeltaEncoder    DeltaStore
| | | |
|-- write_delta(data) ->| | |
| |-- diff vs current->| |
| |<-- changed_frac ---| |
| | | |
| [changed_frac < p?] | |
| |-- encode_sparse -->| |
| |<-- DeltaRecord ----| |
| |-- store_delta -----|-------------------->|
| |-- update_meta ---->| |
|<-- Ok(DeltaStored) ---| | |
| | | |
| [changed_frac >= p?] | |
| |-- write_block_full (see 7.1) |
|<-- Ok(FullReplace) ---| | |
/// Decision thresholds for delta vs full write.
#[derive(Clone, Copy, Debug)]
pub struct DeltaPolicy {
/// Maximum fraction of changed elements to use delta encoding.
/// If the fraction exceeds this, a full write is performed instead.
pub max_changed_fraction: f32, // default: 0.10 (10%)
/// Maximum L2 norm of the delta relative to the block norm.
/// Prevents delta encoding when the change is large in magnitude.
pub max_relative_delta_norm: f32, // default: 0.05 (5%)
/// Maximum number of deltas in a chain before compaction is forced.
pub max_delta_chain: u16, // default: 8
}
impl Default for DeltaPolicy {
fn default() -> Self {
Self {
max_changed_fraction: 0.10,
max_relative_delta_norm: 0.05,
max_delta_chain: 8,
}
}
}
/// Outcome of a write operation.
#[derive(Debug)]
pub enum WriteOutcome {
DeltaStored,
FullReplace,
}
/// Attempt a delta write. Falls back to full replace when the change is
/// too large or the delta chain has reached its maximum depth.
pub fn write_block_delta(
meta: &mut BlockMeta,
old_data: &[f32],
new_data: &[f32],
delta_policy: &DeltaPolicy,
tier_policy: &TierPolicy,
tier_files: &mut TierDataFiles,
delta_store: &mut DeltaStore,
now_ts: u32,
) -> Result<WriteOutcome, WriteError> {
assert_eq!(old_data.len(), new_data.len());
let n = old_data.len();
// 1. Compute diff statistics.
let mut changed_count: usize = 0;
let mut delta_norm_sq: f64 = 0.0;
let mut block_norm_sq: f64 = 0.0;
for i in 0..n {
let diff = (new_data[i] - old_data[i]) as f64;
block_norm_sq += (old_data[i] as f64) * (old_data[i] as f64);
if diff.abs() > 1e-9 {
changed_count += 1;
delta_norm_sq += diff * diff;
}
}
let changed_frac = changed_count as f32 / n as f32;
let relative_norm = if block_norm_sq > 0.0 {
(delta_norm_sq / block_norm_sq).sqrt() as f32
} else {
f32::MAX
};
// 2. Decision: delta or full replace?
let chain_full = meta.delta_chain_len >= delta_policy.max_delta_chain;
let change_too_large = changed_frac > delta_policy.max_changed_fraction
|| relative_norm > delta_policy.max_relative_delta_norm;
if chain_full || change_too_large {
write_block_full(meta, new_data, tier_policy, tier_files, now_ts)?;
return Ok(WriteOutcome::FullReplace);
}
// 3. Encode sparse delta.
let max_abs_delta = old_data
.iter()
.zip(new_data.iter())
.map(|(a, b)| (b - a).abs())
.fold(0.0f32, f32::max);
let delta_scale = if max_abs_delta == 0.0 {
1.0
} else {
max_abs_delta / i16::MAX as f32
};
let inv_scale = 1.0 / delta_scale;
let mut pairs = Vec::with_capacity(changed_count);
for i in 0..n {
let diff = new_data[i] - old_data[i];
if diff.abs() > 1e-9 {
let quantized = (diff * inv_scale).round() as i16;
pairs.push(DeltaPair {
index: i as u16,
value: quantized,
});
}
}
let record = DeltaRecord {
header: DeltaHeader {
tensor_id: meta.tensor_id,
block_index: meta.block_index,
base_epoch: meta.base_epoch,
nnz: pairs.len() as u16,
delta_scale,
},
pairs,
};
// 4. Store delta and update metadata.
delta_store.append(&record)?;
meta.epoch += 1;
meta.delta_chain_len += 1;
Ok(WriteOutcome::DeltaStored)
}

The max_delta_chain parameter (default: 8) bounds the number of deltas that
can be chained before compaction. This guarantees that delta-based
reconstruction is bounded by O(K * N) where K <= max_delta_chain and N is
the block size.
At 8 deltas with an average sparsity of 10%, the pair-data read amplification is:

base_read + 8 * 0.10 * N * 4 bytes = base_read + 3.2 * N bytes

For a 512-element block this is base_read + ~1.6 KB of pair data (about 1.9 KB
once the 34-byte delta headers are counted), well within acceptable latency.
DeltaStore Compactor TierDataFile MetadataStore
| | | |
|-- chain_len > K? | | |
| | | |
|-- load_base ---->| | |
|<-- base f32 -----| | |
| | | |
|-- load_deltas -->| | |
|<-- [d0..dK] -----| | |
| | | |
| [apply d0, d1, ..., dK] | |
| | | |
|-- quantize ----->| | |
|<-- new segment --| | |
| |-- write_segment -->| |
| |-- delete_deltas -->| |
| |-- update_meta -----|-------------------->|
|<-- compacted ----| | |
/// Compact a delta chain into a new base block.
///
/// This is the primary mechanism for bounding read latency. When
/// `meta.delta_chain_len` exceeds `max_delta_chain`, the compactor:
/// 1. Loads the base block and decodes it to f32.
/// 2. Applies all deltas in epoch order.
/// 3. Re-quantizes at the current tier.
/// 4. Stores the result as a new base, deletes old deltas.
pub fn compact_delta_chain(
meta: &mut BlockMeta,
tier_policy: &TierPolicy,
tier_files: &mut TierDataFiles,
delta_store: &mut DeltaStore,
now_ts: u32,
) -> Result<(), CompactionError> {
// 1. Load and decode the base block.
let base_raw = tier_files
.read_base(meta.tensor_id, meta.base_epoch)
.map_err(|_| CompactionError::BaseMissing)?;
let mut buffer = Vec::new();
crate::segment::decode(&base_raw, &mut buffer);
if buffer.is_empty() {
return Err(CompactionError::BaseDecodeFailed);
}
// 2. Load and apply all deltas in order.
let deltas = delta_store
.load_chain(
meta.tensor_id,
meta.block_index,
meta.base_epoch,
meta.epoch,
)
.map_err(|_| CompactionError::DeltaLoadFailed)?;
for delta in &deltas {
let scale = delta.header.delta_scale;
for pair in &delta.pairs {
let idx = pair.index as usize;
if idx < buffer.len() {
buffer[idx] += (pair.value as f32) * scale;
}
}
}
// 3. Re-quantize at the current tier.
let bits = tier_policy.select_bits(meta.access_count, meta.last_access_ts, now_ts);
let tier = tier_from_bits(bits);
let group_len = tier_policy.group_len as usize;
let scales = crate::quantizer::compute_scales(&buffer, group_len, bits);
let mut packed = Vec::new();
crate::quantizer::quantize_and_pack(&buffer, &scales, group_len, bits, &mut packed);
let mut segment = Vec::new();
crate::segment::encode(
bits,
tier_policy.group_len,
buffer.len() as u32,
1,
&scales,
&packed,
&mut segment,
);
let (offset, len) = tier_files.append(tier, &segment)?;
// 4. Delete old deltas and the old base.
delta_store.delete_chain(
meta.tensor_id,
meta.block_index,
meta.base_epoch,
meta.epoch,
)?;
// 5. Update metadata to reflect the new base.
meta.tier = tier;
meta.bits = bits;
meta.data_offset = offset;
meta.data_len = len as u32;
meta.base_epoch = meta.epoch;
meta.delta_chain_len = 0;
Ok(())
}
/// Map bit width to tier number.
fn tier_from_bits(bits: u8) -> u8 {
match bits {
8 => 1,
7 | 5 => 2,
3 => 3,
0 => 0,
_ => 3, // conservative fallback
}
}

When a block is evicted to Tier0:
- The data bytes in the tier data file are logically deleted (marked free for reuse or physically removed during compaction).
- `meta.bits` is set to 0 and `meta.tier` is set to 0. `meta.data_len` is set to 0.
- The reconstruction policy determines whether a base snapshot and/or delta chain are preserved.
/// Evict a block to Tier0. Optionally preserves reconstruction data.
pub fn evict_to_tier0(
meta: &mut BlockMeta,
policy: ReconstructPolicy,
tier_files: &mut TierDataFiles,
) -> Result<(), EvictionError> {
// Delete the data from the tier file.
if meta.data_len > 0 {
tier_files.mark_free(meta.tier, meta.data_offset, meta.data_len)?;
}
meta.tier = 0;
meta.bits = 0;
meta.data_offset = 0;
meta.data_len = 0;
meta.reconstruct_policy = policy;
// When policy is None, also delete any delta chain and factors
// to reclaim storage immediately.
// When policy is Delta or Factor, the associated stores are preserved.
Ok(())
}

FactorRecord:
Offset Size Field Description
------ -------- -------- ------------------------------------------
0 16 id u128 LE - tensor_id
16 4 block u32 LE - block_index
20 4 m u32 LE - rows of U
24 4 n u32 LE - cols of V
28 4 k u32 LE - truncation rank
32 m*k*4 u_data f32 LE - U matrix (row-major)
32+m*k*4 k*4 s_data f32 LE - singular values
... k*n*4 v_data f32 LE - V matrix (row-major)
/// Stored low-rank factors for SVD-based reconstruction.
#[derive(Clone, Debug)]
pub struct FactorRecord {
pub tensor_id: u128,
pub block_index: u32,
pub m: usize, // rows
pub n: usize, // cols
pub k: usize, // truncation rank, k << min(m, n)
pub u: Vec<f32>, // m x k, row-major
pub s: Vec<f32>, // k singular values
pub v: Vec<f32>, // k x n, row-major
}
impl FactorRecord {
/// Storage cost in bytes (excluding header overhead).
pub fn storage_bytes(&self) -> usize {
(self.m * self.k + self.k + self.k * self.n) * 4
}
/// Reconstruction error bound: sum of discarded singular values
/// (Eckart-Young theorem). The caller computes the full SVD and
/// provides only the top-k factors.
pub fn is_worthwhile(&self, full_block_bytes: usize) -> bool {
self.storage_bytes() < full_block_bytes / 2
}
}

Factor reconstruction is most effective for tensors with low effective rank, such as attention weight matrices where the top 32-64 singular values capture over 95% of the Frobenius norm.
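The break-even arithmetic behind `is_worthwhile` can be checked concretely. The sizes below are illustrative, for a hypothetical 512x512 block truncated to rank 32:

```rust
// Storage cost of truncated SVD factors (U: m x k, S: k, V: k x n, all f32),
// matching FactorRecord::storage_bytes above. Block sizes are illustrative.
fn factor_bytes(m: usize, n: usize, k: usize) -> usize {
    (m * k + k + k * n) * 4
}

fn main() {
    let (m, n, k) = (512, 512, 32);
    let full_block_bytes = m * n * 4; // 1,048,576 bytes of f32
    let factored = factor_bytes(m, n, k); // (16384 + 32 + 16384) * 4
    assert_eq!(factored, 131_200); // ~12.5% of the full block
    assert!(factored < full_block_bytes / 2); // passes the 50% threshold
}
```

At rank 32 the factors cost about an eighth of the full block, so the policy is clearly worthwhile; at rank 256 they would exceed the 50% threshold and a full write would be preferred.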
Symptom: Reads become progressively slower as chains grow.
Root cause: Compaction not triggered, or max_delta_chain set too high.
Mitigation: The write path checks delta_chain_len >= max_delta_chain
before every delta write and forces a full replace (which resets the chain).
Background compaction runs when chain_len > max_delta_chain / 2 to stay
ahead of the threshold.
Monitoring: Expose max_chain_len and avg_chain_len as metrics on the
BlockStore. Alert when max_chain_len approaches 80% of max_delta_chain.
Symptom: Quality drops sharply on blocks with outlier values, particularly
at 3-bit quantization where qmax = 3.
Root cause: A single outlier in a group inflates the scale, crushing the dynamic range available for all other values.
Mitigation:

- Outlier clamping: Before computing scales, clamp values at the 99.9th percentile of absolute values within each group. Outliers beyond the clamp are stored separately as sparse corrections (same format as delta pairs).
- Two-level scale for 3-bit: Use a per-block coarse scale and a per-group fine scale. The fine scale is a 4-bit multiplier (0.25x to 4.0x) applied on top of the coarse scale. This provides 16 sub-ranges within the block's dynamic range.
- Per-group scale inside block: Already implemented in ADR-017. Groups of 64 elements each get their own scale, limiting outlier blast radius to 64 values.
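A sketch of the outlier-clamping mitigation, assuming clamped residuals are emitted as sparse (index, residual) corrections; the function name and percentile handling are illustrative:

```rust
// Clamp values at a high absolute-value percentile and return the clipped
// residuals as sparse corrections (same (index, value) shape as delta pairs).
fn clamp_outliers(values: &mut [f32], percentile: f32) -> Vec<(u16, f32)> {
    let mut mags: Vec<f32> = values.iter().map(|v| v.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((mags.len() as f32 - 1.0) * percentile) as usize;
    let clamp = mags[idx];
    let mut corrections = Vec::new();
    for (i, v) in values.iter_mut().enumerate() {
        if v.abs() > clamp {
            let clipped = clamp.copysign(*v);
            corrections.push((i as u16, *v - clipped)); // part beyond the clamp
            *v = clipped;
        }
    }
    corrections
}

fn main() {
    let mut vals = vec![0.1f32; 999];
    vals.push(100.0); // single outlier that would inflate the scale
    let corrections = clamp_outliers(&mut vals, 0.999);
    assert_eq!(corrections.len(), 1);
    assert_eq!(vals[999], 0.1); // outlier clamped to the percentile value
}
```

After clamping, scale computation proceeds on the clamped buffer, so one outlier no longer crushes the dynamic range of the remaining values.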
Symptom: Delta reconstruction fails with DeltaChainBroken { depth: 0 }.
Root cause: The base block referenced by the delta chain was deleted or corrupted.
Mitigation: Base blocks referenced by active delta chains are pinned and
cannot be freed by tier file compaction. The eviction path must verify that no
active delta chains reference a base before releasing it. The metadata field
base_epoch serves as the foreign key for this reference check.
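The pinning check can be sketched as a scan over block metadata, keyed by `base_epoch` as described above. The `MetaRef` struct here is trimmed to the relevant `BlockMeta` fields, and `can_free_base` is an illustrative name:

```rust
// Trimmed view of BlockMeta: just the fields the reference check needs.
struct MetaRef {
    tensor_id: u128,
    base_epoch: u64,
    delta_chain_len: u16,
}

// A base may only be freed when no block with an active delta chain
// still references it through base_epoch.
fn can_free_base(tensor_id: u128, base_epoch: u64, metas: &[MetaRef]) -> bool {
    !metas.iter().any(|m| {
        m.tensor_id == tensor_id && m.base_epoch == base_epoch && m.delta_chain_len > 0
    })
}

fn main() {
    let metas = vec![MetaRef { tensor_id: 7, base_epoch: 3, delta_chain_len: 2 }];
    assert!(!can_free_base(7, 3, &metas)); // active chain pins the base
    assert!(can_free_base(7, 9, &metas)); // different base epoch: safe to free
}
```

A production implementation would presumably index this by (tensor_id, base_epoch) rather than scanning, but the invariant is the same.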
All parameters described in this ADR are consolidated into DeltaPolicy and
ReconstructPolicy, both attached to the per-tensor or per-collection
TierPolicy. The full configuration surface:
| Parameter | Location | Default | Description |
|---|---|---|---|
| evict_min_score | TierPolicy | 4 | Score threshold for Tier0 eviction |
| reconstruct_policy | BlockMeta | None | Per-block reconstruction strategy |
| zero_fill_on_evict | Global config | false | Return zeros instead of error for Tier0/None |
| max_changed_fraction | DeltaPolicy | 0.10 | Fraction threshold for delta vs full write |
| max_relative_delta_norm | DeltaPolicy | 0.05 | Norm threshold for delta vs full write |
| max_delta_chain | DeltaPolicy | 8 | Maximum chain depth before compaction |
Rejected. Periodic checkpoints (every N epochs regardless of chain length) waste storage when the tensor is not being modified. Bounded chains with on-demand compaction are more space-efficient and simpler to reason about.
Rejected. For tensors changing by less than 10% per update, COW quadruples write amplification compared to sparse deltas. The delta path reduces write volume by 80-90% for typical incremental updates.
Rejected. General-purpose compression does not exploit the semantic structure of tensor updates (sparse changes, known value distributions). Delta encoding provides better compression for the specific access pattern, and avoids adding external dependencies to the WASM-compatible core.
Rejected. Storing factors with rank k = min(m, n) provides exact
reconstruction but offers no compression. The truncation rank must be bounded
such that factor_bytes < 0.5 * full_block_bytes for the factor policy to be
worthwhile.
- Tier0 eviction reduces per-block storage to metadata only (0 data bytes)
- Delta reconstruction produces correct output for chain depths 1 through max_delta_chain
- Factor reconstruction matches SVD reference within floating-point tolerance
- Delta writes with <10% change use <20% of the bytes of a full write
- Compaction reduces chain length to 0 and produces a valid base block
- Read latency for delta reconstruction at chain depth 8 is under 50us for 512-dim blocks
- All structures serialise/deserialise correctly on both native and WASM targets
- `ReconstructPolicy::None` with `zero_fill_on_evict = false` returns a `TensorEvicted` error
- `ReconstructPolicy::None` with `zero_fill_on_evict = true` returns a zero-filled buffer
- ADR-017: Temporal Tensor Compression with Tiered Quantization (2026-02-06)
- ADR-018: Block-Based Storage Engine (parent, in progress)
- Eckart, C. and Young, G. "The approximation of one matrix by another of lower rank." Psychometrika 1(3), 1936.
- Pelkonen, T., et al. "Gorilla: A Fast, Scalable, In-Memory Time Series Database." VLDB 2015.
- Frantar, E., et al. "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers." ICLR 2023.
- Liu, Z., et al. "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache." ICML 2024.