rvl

Reveal the smallest set of numeric changes that explain what actually changed.

No AI. No inference. Pure deterministic arithmetic.

brew install cmdrvl/tap/rvl

TL;DR

The Problem: Comparing CSV exports by hand is slow and noisy — Excel hell, brittle scripts, eyeballing numbers. When two files differ, you need to know what actually changed and whether it matters.

The Solution: One command, one verdict. rvl finds the smallest ranked set of numeric deltas that explain the change — or proves nothing changed — using deterministic arithmetic. Never probabilistic. Never ambiguous.

Why Use rvl?

Feature	What It Does
Ranked explanations	Finds the fewest cells that account for ≥95% of total numeric change
Three clear outcomes	REAL CHANGE, NO REAL CHANGE, or REFUSAL — never a partial answer
Tolerance-aware	Ignores floating-point noise below your threshold — no false positives
Machine-readable	`--json` output for pipelines, CI gates, and automation
Zero config	Auto-detects delimiters, numeric formats, currency symbols, accounting parens
Deterministic	Same inputs always produce the same output — no sampling, no heuristics

Quick Example

$ rvl old.csv new.csv --key id

RVL

REAL CHANGE

Compared: old.csv -> new.csv
Alignment: key=id
Columns: common=15 old_only=2 new_only=1
Checked: 4,183 rows, 12 numeric columns (50,196 cells)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Ranking: abs(delta) (unscaled)
Settings: threshold=95.0% tolerance=1e-9

3 cells explain 95.2% of total numeric change (threshold 95.0%):

1. NVDA.market_value  +1842100  (123 -> 1842223)
2. UST10Y.price       -0.37     (4.21 -> 3.84)
3. EURUSD.fx_rate     +0.0013   (1.0842 -> 1.0855)

Everything else in common numeric columns is <= tolerance or in the tail (not required to reach threshold).

Out of 50,196 cells, 3 cells explain 95.2% of all numeric change. That's the whole answer.

# No change? Proof:
$ rvl old.csv old_copy.csv
# → NO REAL CHANGE (exit 0), max delta 7e-10

# Machine-readable:
$ rvl old.csv new.csv --json | jq '.contributors[0]'

# Exit code only (for scripts):
$ rvl old.csv new.csv > /dev/null 2>&1
$ echo $?  # 0 = no change, 1 = changed, 2 = refused

The Three Outcomes

rvl always produces exactly one of three outcomes. There are no partial results, "and N more" buckets, or probabilistic scores.

1. REAL CHANGE

Printed when the top contributors (up to 25) together explain at least the threshold share (default 95%) of total numeric change.

RVL

REAL CHANGE

Compared: old.csv -> new.csv
Alignment: key=id
Columns: common=15 old_only=2 new_only=1
Checked: 4,183 rows, 12 numeric columns (50,196 cells)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Ranking: abs(delta) (unscaled)
Settings: threshold=95.0% tolerance=1e-9

3 cells explain 95.2% of total numeric change (threshold 95.0%):

1. NVDA.market_value  +1842100  (123 -> 1842223)
2. UST10Y.price       -0.37     (4.21 -> 3.84)
3. EURUSD.fx_rate     +0.0013   (1.0842 -> 1.0855)

Everything else in common numeric columns is <= tolerance or in the tail (not required to reach threshold).

How to read this:

3 cells explain 95.2% — only 3 numeric cells (out of 50,196) account for 95.2% of all numeric change.
Contributors — ranked by abs(delta), largest first. Each shows the cell label (row_id.column), signed delta, and old → new values.
Coverage — cumulative share of total change (L1 distance). rvl prints the smallest prefix of contributors whose cumulative coverage reaches the threshold.
Threshold — if the top 25 contributors can't reach 95%, rvl refuses (E_DIFFUSE) instead of printing a misleading partial list.

2. NO REAL CHANGE

Printed when all numeric deltas are within tolerance.

RVL

NO REAL CHANGE

Compared: old.csv -> new.csv
Alignment: row-order (no key)
Columns: common=15 old_only=2 new_only=1
Checked: 4,183 rows, 12 numeric columns (50,196 cells)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Ranking: abs(delta) (unscaled)
Settings: threshold=95.0% tolerance=1e-9
Max abs delta: 7e-10 (<= tolerance 1e-9).
No numeric deltas above tolerance in common numeric columns.

How to read this:

Max abs delta — the largest absolute difference observed across all cells (before tolerance zeroing). Proves nothing slipped through.
This is a deterministic guarantee: every common numeric cell was checked.

3. REFUSAL

Printed when rvl cannot produce a deterministic verdict. Always includes a concrete next step.

RVL ERROR (E_KEY_DUP)

Compared: old.csv -> new.csv
Alignment: key=id
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Settings: threshold=95.0% tolerance=1e-9

Cannot align rows: key "id" is not unique in old.csv (first duplicate: "A123" at data record 184).
Next: choose a unique key column or dedupe the data, then rerun.

How to read this:

Error code — machine-stable identifier (e.g., E_KEY_DUP). See Refusal Codes.
Example — first concrete instance of the problem (file, record number, value).
Next — a concrete rerun command or remediation step. Refusals are operator handoffs, never dead ends.

How It Works

Alignment

Row-order mode (no --key): rows align by position. Requires identical non-blank row counts. If rvl detects that rows are shuffled (via key discovery), it refuses with E_NEED_KEY and suggests a --key to use.

Key mode (--key <column>): rows align by matching key values. Key values are ASCII-trimmed, must be non-empty and unique within each file, and must match exactly between files. Any violation produces a specific refusal (E_NO_KEY, E_KEY_EMPTY, E_KEY_DUP, E_KEY_MISMATCH).
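The key-mode rules above can be sketched roughly as follows. This is an illustration in Python, not rvl's implementation: the `align_by_key` helper, its return shape, and the choice of space and tab as the trimmed ASCII characters are assumptions. Only the refusal codes come from this README.

```python
def align_by_key(old_rows, new_rows, key):
    """Return ("OK", pairs) or (refusal_code, detail).

    old_rows / new_rows are lists of dicts, one per data record.
    Hypothetical helper for illustration only.
    """
    indexed = []
    for name, rows in (("old", old_rows), ("new", new_rows)):
        if rows and key not in rows[0]:
            return "E_NO_KEY", f"{key!r} not found in {name} file"
        seen = {}
        for record_no, row in enumerate(rows, start=1):
            value = row[key].strip(" \t")  # ASCII trim (assumed: space and tab)
            if value == "":
                return "E_KEY_EMPTY", f"{name} file, data record {record_no}"
            if value in seen:
                return "E_KEY_DUP", f"{value!r} in {name} file at data record {record_no}"
            seen[value] = row
        indexed.append(seen)
    old_idx, new_idx = indexed
    if old_idx.keys() != new_idx.keys():
        only_one_side = sorted(old_idx.keys() ^ new_idx.keys())
        return "E_KEY_MISMATCH", f"keys present in only one file: {only_one_side}"
    # Row order in the files does not matter once keys are matched.
    return "OK", [(k, old_idx[k], new_idx[k]) for k in sorted(old_idx)]
```

The checks run in a fixed order, so the first violation found decides which refusal is reported.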

Numeric Columns

Only columns present in both files are compared, and only numeric columns are diffed. A column is numeric if, in every aligned row, its cell is either missing on both sides or parses as a finite number on both sides.

Supported numeric formats:

Plain: 123, -123.45, 1e6, -1.2E-3
Thousands separators: 1,234, -1,234,567.89 (US-style, 3-digit groups)
Currency prefix: $123.45, -$1,234.56, $-100
Accounting parentheses: (123.45) → parsed as -123.45
Leading + is allowed: +123, +$1,234.56

Missing tokens (case-insensitive): empty string, -, NA, N/A, NULL, NAN, NONE.
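As a rough illustration of the formats listed above, here is a minimal Python sketch of a cell classifier. The `parse_cell` name, the regular expression, the whitespace trimming, and the rejection of doubled signs are assumptions made for this example; the spec may define edge cases differently.

```python
import re

MISSING_TOKENS = {"", "-", "NA", "N/A", "NULL", "NAN", "NONE"}

# Optional sign, optional "$", optional sign, then digits (plain or with
# US-style 3-digit comma groups), optional fraction, optional exponent.
_NUMBER = re.compile(
    r"^(?P<sign1>[+-]?)\$?(?P<sign2>[+-]?)"
    r"(?P<body>(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d*)?|\.\d+)"
    r"(?P<exp>[eE][+-]?\d+)?$"
)

def parse_cell(text):
    """Classify a cell as ("missing", None), ("number", float) or ("text", None)."""
    token = text.strip()  # assumption: surrounding whitespace is ignored
    if token.upper() in MISSING_TOKENS:
        return "missing", None
    negative = False
    if token.startswith("(") and token.endswith(")"):
        negative = True  # accounting parentheses mean a negative value
        token = token[1:-1]
    m = _NUMBER.match(token)
    if m is None:
        return "text", None
    if m.group("sign1") and m.group("sign2"):
        return "text", None  # two signs, e.g. "-$-1"; assumed invalid
    if "-" in (m.group("sign1"), m.group("sign2")):
        negative = not negative
    value = float(m.group("body").replace(",", "") + (m.group("exp") or ""))
    if value in (float("inf"), float("-inf")):
        return "text", None  # only finite numbers count as numeric
    return "number", -value if negative else value
```

For example, `parse_cell("(123.45)")` yields `("number", -123.45)` and `parse_cell("n/a")` yields `("missing", None)`.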

Tolerance

Absolute noise floor applied per-cell. If abs(new - old) <= tolerance, the delta is treated as zero (no contribution). Default: 1e-9. There is no relative/percentage tolerance in v0.

max_abs_delta in the output tracks the largest raw delta observed (before zeroing) for transparency.
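A small Python sketch of the per-cell rule, assuming already-parsed float values. The `apply_tolerance` helper and the sample cells are made up for illustration.

```python
def apply_tolerance(old, new, tolerance=1e-9):
    """Return (contribution, raw_abs_delta) for one aligned numeric cell."""
    raw = abs(new - old)
    contribution = 0.0 if raw <= tolerance else raw
    return contribution, raw

# Three illustrative cells: float noise, a real move, and no change.
cells = [(100.0, 100.0 + 7e-10), (4.21, 3.84), (1.0, 1.0)]
results = [apply_tolerance(o, n) for o, n in cells]

max_abs_delta = max(raw for _, raw in results)         # tracked before zeroing
total_change = sum(contrib for contrib, _ in results)  # only deltas above tolerance
```

The first cell contributes nothing to `total_change`, but its raw delta would still be reported if it were the largest one observed.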

Threshold and Coverage

Total change = sum of all abs(delta) values above tolerance (L1 distance across all common numeric cells).
Contribution = abs(delta) for a single cell (after tolerance).
Coverage = sum of the top contributors' contributions / total change.
Threshold (default 0.95) = minimum coverage required for a REAL CHANGE verdict.
MAX_CONTRIBUTORS = 25 (hard cap, not configurable in v0).

If the top 25 contributors can't reach the threshold, rvl refuses with E_DIFFUSE rather than printing an incomplete explanation. Lower the threshold explicitly if needed: --threshold 0.80.
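The definitions above can be put together in a short Python sketch. The `verdict` function and its return tuple are hypothetical; the arithmetic follows the definitions in this section, and the input is assumed to be the list of per-cell contributions after tolerance zeroing.

```python
MAX_CONTRIBUTORS = 25  # hard cap stated above

def verdict(contributions, threshold=0.95):
    """Return (outcome, prefix_len, coverage) for per-cell contributions."""
    ranked = sorted((c for c in contributions if c > 0), reverse=True)
    total = sum(ranked)  # total change (L1 distance)
    if total == 0:
        return "NO_REAL_CHANGE", 0, None
    running = 0.0
    for k, c in enumerate(ranked[:MAX_CONTRIBUTORS], start=1):
        running += c
        if running / total >= threshold:
            # Smallest prefix whose cumulative coverage reaches the threshold.
            return "REAL_CHANGE", k, running / total
    # Even the top MAX_CONTRIBUTORS cells fall short of the threshold.
    return "E_DIFFUSE", None, running / total
```

With contributions of 90, 6, 2 and 2, the first two cells cover 96% of the total, so the prefix length is 2. With 100 cells that each changed by 1, the top 25 cover only 25%, which gives `E_DIFFUSE`.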

Contributor Ranking

Contributors are ranked by abs(delta) descending (unscaled — large-magnitude columns dominate by design). Ties are broken by row ID ascending, then column name ascending (byte order). rvl prints only the smallest prefix of contributors whose cumulative coverage reaches the threshold.
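The ordering described above corresponds to a three-part sort key. A minimal Python sketch, assuming contributors are `(row_id, column, delta)` tuples with string identifiers (the tuple layout is made up for this example):

```python
def rank(contributors):
    """Sort by abs(delta) descending, then row ID and column in byte order."""
    return sorted(
        contributors,
        key=lambda c: (-abs(c[2]), c[0].encode("utf-8"), c[1].encode("utf-8")),
    )

ranked = rank([
    ("B", "x", -5.0),
    ("A", "y", 5.0),
    ("A", "x", 5.0),
    ("C", "x", 9.0),
])
```

Comparing the encoded bytes rather than the strings keeps the tie-break independent of locale or collation settings.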

How rvl Compares

Capability	rvl	Excel / Sheets	`diff` / `csvdiff`	Custom pandas script
Ranked numeric explanation	✅ Top-K with coverage proof	❌ Manual	❌ Row-level only	⚠️ You write it
Deterministic verdict	✅ Always	❌ Human judgment	⚠️ Text diff only	⚠️ You write it
Tolerance handling	✅ Built-in	❌ Manual rounding	❌ None	⚠️ You write it
Refusal on ambiguity	✅ Never wrong, refuses instead	❌ Silent errors	❌ Garbage in/out	❌ Crashes
Auto-detects delimiters	✅	N/A	❌	❌
Setup time	✅ One curl command	N/A	⚠️ Minutes	❌ Hours
Machine-readable output	✅ `--json`	❌	⚠️ Text only	✅

When to use rvl:

Monthly/quarterly reconciliation of CSV exports (holdings, positions, balances)
CI gate: did the pipeline output actually change?
Audit trail: prove what changed and by how much

When rvl might not be ideal:

Non-numeric diffs (text columns, schema changes) — use shape for structural checks first
Files that don't fit in memory
Diffs where you need relative (percentage) tolerance — not yet supported in v0

Installation

Homebrew (Recommended)

brew install cmdrvl/tap/rvl

Shell Script

curl -fsSL https://raw.githubusercontent.com/cmdrvl/rvl/main/scripts/install.sh | bash

Windows (PowerShell)

Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/cmdrvl/rvl/main/scripts/install.ps1'))

From Source

cargo build --release
./target/release/rvl --help

Prebuilt binaries are available for Linux and macOS (x86_64 and ARM64) and for Windows (x86_64). Each release includes SHA256 checksums, cosign signatures, and an SBOM.

CLI Reference

rvl <old.csv> <new.csv> [OPTIONS]

Flags

Flag	Type	Default	Description
`--key <column>`	string	(none)	Align rows by key column value. Without this, rows align by position (1st↔1st, 2nd↔2nd, etc.).
`--threshold <float>`	float	`0.95`	Coverage target (0 < x ≤ 1.0). The minimum fraction of total numeric change that the top contributors must explain.
`--tolerance <float>`	float	`1e-9`	Per-cell noise floor (x ≥ 0). Absolute deltas ≤ this value are treated as zero.
`--delimiter <delim>`	string	(auto-detect)	Force CSV delimiter for both files. See Delimiter.
`--capsule-out <dir>`	string	(disabled)	Write deterministic replay capsule artifacts (`manifest.json`, `old.csv`, `new.csv`, `output.txt`, `replay.sh`) to `<dir>/capsule-<id>/`.
`--json`	flag	`false`	Emit a single JSON object on stdout instead of human-readable output.

Invalid --threshold or --tolerance values are CLI argument errors (exit 2).

Exit Codes

Code	Meaning
`0`	NO REAL CHANGE
`1`	REAL CHANGE
`2`	REFUSAL or CLI error

Output Routing

Mode	REAL CHANGE	NO REAL CHANGE	REFUSAL
Human (default)	stdout	stdout	stderr
`--json`	stdout	stdout	stdout

In --json mode, stderr is reserved for process-level failures only (CLI parse errors, panics).
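For callers that wrap rvl from a script, the exit codes and routing above suggest a pattern like the following Python sketch. `run_rvl` and `describe` are hypothetical helper names, and the sketch assumes `rvl` is on `PATH`.

```python
import json
import subprocess

EXIT_MEANING = {0: "NO REAL CHANGE", 1: "REAL CHANGE", 2: "REFUSAL or CLI error"}

def run_rvl(old_path, new_path, *extra_args):
    """Run rvl in --json mode; return (exit_code, parsed_json_or_None)."""
    proc = subprocess.run(
        ["rvl", old_path, new_path, "--json", *extra_args],
        capture_output=True,
        text=True,
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Process-level failure (e.g. CLI parse error): nothing usable on stdout.
        report = None
    return proc.returncode, report

def describe(exit_code):
    """Map an exit code to the meaning given in the Exit Codes table."""
    return EXIT_MEANING.get(exit_code, f"unexpected exit code {exit_code}")
```

Because exit code 2 covers both refusals and CLI errors, a wrapper can tell them apart by whether a JSON object was emitted: a refusal carries one, a CLI parse error may not.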

Delimiter

Auto-Detection (default)

Each file's delimiter is detected independently by sampling the header plus up to 200 data records (or ~64KB). Candidate delimiters are tried in order: , → \t → ; → | → ^. The candidate with the best score (most records parsed, most consistent field count, most fields) wins.

If multiple candidates tie and produce different parsed output, rvl refuses with E_DIALECT. If they produce identical output, the tie breaks by candidate order (comma first).

If auto-detection yields only 1 column, rvl refuses with E_DIALECT (the file may use an unsupported delimiter).
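The detection rules above might be sketched as follows in Python. The exact scoring formula is not given in this README, so the score tuple here (records parsed, records at the most common width, that width) is an assumption, as are the `score` and `detect` names. Sampling limits are omitted.

```python
import csv
from collections import Counter

CANDIDATES = [",", "\t", ";", "|", "^"]  # order given above

def score(sample_lines, delimiter):
    """Return ((records, records_at_modal_width, modal_width), parsed_rows)."""
    rows = [r for r in csv.reader(sample_lines, delimiter=delimiter) if r]
    if not rows:
        return (0, 0, 0), rows
    width, hits = Counter(len(r) for r in rows).most_common(1)[0]
    return (len(rows), hits, width), rows

def detect(sample_lines):
    """Return the winning delimiter, or "E_DIALECT" when detection must refuse."""
    scored = [(score(sample_lines, d), d) for d in CANDIDATES]
    best = max(s for (s, _rows), _d in scored)
    tied = [(rows, d) for (s, rows), d in scored if s == best]
    if best[2] <= 1:
        return "E_DIALECT"  # only one column detected
    if any(rows != tied[0][0] for rows, _d in tied[1:]):
        return "E_DIALECT"  # tie between candidates with different parsed output
    return tied[0][1]       # identical output: tie broken by candidate order
```

A header such as `a,b;c` splits into two fields under both `,` and `;` but with different contents, which is the ambiguous case that leads to a refusal.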

`sep=` Directive

If the first non-blank line of a file is sep=<char> (e.g., sep=;), rvl uses that delimiter for the file (unless --delimiter overrides it). The sep= line is skipped during parsing.
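A minimal Python sketch of recognising the directive, assuming the file has already been split into lines. The `read_sep_directive` helper is hypothetical, and it matches `sep=` exactly as written here; whether rvl accepts other spellings is not stated.

```python
def read_sep_directive(lines):
    """Return (delimiter_or_None, lines_to_parse)."""
    for i, line in enumerate(lines):
        content = line.rstrip("\r\n")  # keep tabs: the delimiter may be a tab
        if content.strip() == "":
            continue                   # skip leading blank lines
        if content.startswith("sep=") and len(content) == 5:
            return content[4], lines[i + 1:]  # directive line is not data
        break                          # first non-blank line is ordinary content
    return None, lines
```

A forced `--delimiter` would simply ignore the returned character while still skipping the directive line.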

`--delimiter` (forced)

Overrides both auto-detection and sep= directives for both files. Accepted values:

Format	Examples
Named	`comma`, `tab`, `semicolon`, `pipe`, `caret` (case-insensitive)
Hex	`0x09` (tab), `0x1f` (unit separator), `0x2c` (comma)
Single ASCII char	`,`, `|`, `;`

Valid range: ASCII 0x01–0x7F, excluding " (0x22), \r (0x0D), \n (0x0A). Invalid values are CLI argument errors (exit 2). Use tab or 0x09, not \t (no escape sequences).
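The accepted forms and the valid range can be captured in a few lines of Python. This is a sketch under the rules listed above; the `parse_delimiter` name and the error messages are invented, and case-insensitivity of the `0x` prefix is an assumption.

```python
NAMED = {"comma": ",", "tab": "\t", "semicolon": ";", "pipe": "|", "caret": "^"}
FORBIDDEN = {0x22, 0x0D, 0x0A}  # double quote, CR, LF

def parse_delimiter(arg):
    """Return the delimiter character, or raise ValueError (a CLI error, exit 2)."""
    lowered = arg.lower()
    if lowered in NAMED:
        return NAMED[lowered]
    if lowered.startswith("0x"):
        code = int(lowered[2:], 16)  # raises ValueError on malformed hex
    elif len(arg) == 1:
        code = ord(arg)
    else:
        # Two-character inputs such as a literal backslash-t land here.
        raise ValueError(f"not a delimiter: {arg!r} (escape sequences are not accepted)")
    if not 0x01 <= code <= 0x7F or code in FORBIDDEN:
        raise ValueError(f"delimiter out of range: {arg!r}")
    return chr(code)
```

So `parse_delimiter("TAB")` and `parse_delimiter("0x09")` both give a tab character, while a backslash followed by `t` is rejected.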

Agent / CI Integration

Both rvl and shape are designed to be consumed by agents and pipelines, not just humans.

Agent workflow: shape → rvl

# 1. Structural gate (is comparison even valid?)
shape old.csv new.csv --key id --json > shape.json
if [ $? -ne 0 ]; then
  # INCOMPATIBLE or REFUSAL — read .reasons or .refusal for why
  jq '.reasons // .refusal' shape.json
  exit 1
fi

# 2. Numeric explanation (only if structurally compatible)
rvl old.csv new.csv --key id --json > rvl.json

# 3. Agent extracts the verdict
outcome=$(jq -r '.outcome' rvl.json)
if [ "$outcome" = "REAL_CHANGE" ]; then
  jq -r '.contributors[] | "\(.row_id).\(.column): \(.delta)"' rvl.json
fi

What makes this agent-friendly

Exit codes — 0/1/2 map directly to pass/fail/error branching
--json — structured output an agent can parse without regex
Refusals have next steps — an agent can read .refusal.code and decide whether to retry with different flags or escalate
shape --describe — prints the tool's operator.json contract so an agent can discover invocation, flags, and exit codes without reading docs

Capsule replay workflow (agent swarms)

Use capsules when you need a deterministic handoff between agents, CI jobs, or debugging sessions:

# 1. Produce the normal verdict and write a replay capsule sidecar
rvl old.csv new.csv --key id --json --capsule-out ./capsules > run.json

# 2. Inspect generated capsule
ls ./capsules/capsule-*/
# manifest.json old.csv new.csv output.txt replay.sh

# 3. Re-run exactly from the capsule payload
cd ./capsules/capsule-<id>
./replay.sh > replay.json

manifest.json includes:

original invocation args (key, threshold, tolerance, delimiter, json)
outcome and refusal code (if any)
contributor summary for REAL_CHANGE
replay command plus artifact hashes for integrity checks

For troubleshooting, compare run.json vs replay.json outcome/refusal code first; if they differ, the environment or binary changed.

Scripting Examples

Check if files changed (exit code only):

rvl old.csv new.csv > /dev/null 2>&1
echo $?  # 0 = no change, 1 = changed, 2 = refused

Extract top contributor from JSON:

rvl old.csv new.csv --json | jq '.contributors[0]'

Get total change magnitude:

rvl old.csv new.csv --json | jq '.metrics.total_change'

Handle refusals programmatically:

rvl old.csv new.csv --json | jq 'select(.outcome == "REFUSAL") | .refusal'

Force a tab-delimited comparison with relaxed threshold:

rvl old.tsv new.tsv --delimiter tab --key account_id --threshold 0.80

Gate a pipeline (shape before rvl):

shape old.csv new.csv --key loan_id --json > shape.json \
  && rvl old.csv new.csv --key loan_id --json > rvl.json

Refusal Codes

Every refusal includes the error code, first concrete example, and a Next: remediation step.

Code	Meaning	Next Step
`E_IO`	File read error	Check file path and permissions
`E_ENCODING`	Unsupported encoding (UTF-16/32 BOM or NUL bytes)	Convert/re-export as UTF-8
`E_CSV_PARSE`	CSV parse failure (invalid quoting/escaping)	Re-export as standard RFC4180 CSV
`E_HEADERS`	Missing header, duplicate headers, or rows wider than header	Fix headers or re-export
`E_DIALECT`	Delimiter ambiguous or undetectable	Use `--delimiter <delim>` or add `sep=<char>` to file
`E_NO_KEY`	`--key` column not found in one or both files	Use a column name that exists in both files
`E_KEY_EMPTY`	Empty key value in a non-blank row	Choose a key column with no empty values, or fill missing keys
`E_KEY_DUP`	Duplicate key values within a file	Choose a unique key column or dedupe the data
`E_KEY_MISMATCH`	Key sets differ between files (missing/extra keys)	Export comparable scopes or fix the join key
`E_ROWCOUNT`	Row count mismatch (row-order mode)	Use `--key <column>` for a missing/extra-keys report
`E_NEED_KEY`	Detected row reorder without `--key`	Use `--key <suggested>` (rvl prints candidates)
`E_MIXED_TYPES`	Column has both numeric and non-numeric values	Normalize column values to numeric or exclude the column
`E_NO_NUMERIC`	No numeric columns in common	Ensure both files share at least one numeric column
`E_MISSINGNESS`	Numeric value vs. missing token in aligned cell	Fill missing values or exclude the column
`E_DIFFUSE`	Top 25 contributors can't reach threshold	Use `--threshold 0.80` (or lower) to accept less coverage

Troubleshooting

"E_NEED_KEY" even though rows look the same

Your rows are in a different order between files. rvl detected this and refuses rather than silently comparing wrong row pairs. Use the --key it suggests:

rvl old.csv new.csv --key loan_id

"E_DIFFUSE" — can't reach threshold

Changes are spread across too many cells for the top 25 to explain 95%. This usually means a broad recalculation (e.g., FX revaluation). Lower the threshold:

rvl old.csv new.csv --threshold 0.80

"E_MIXED_TYPES" on a column that looks numeric

A cell in that column has a value rvl can't parse as a number (check for stray text, #N/A variants not in the missing list, or locale-specific formatting). The error message shows the first offending cell.

"E_DIALECT" — delimiter detection failed

Your file uses an uncommon delimiter or has inconsistent field counts. Force the delimiter:

rvl old.csv new.csv --delimiter pipe      # for |
rvl old.csv new.csv --delimiter 0x09      # for tab
rvl old.csv new.csv --delimiter semicolon # for ;

Large files are slow

rvl loads both files into memory. For very large files (millions of rows), ensure sufficient RAM. There is no streaming mode in v0.

Limitations

Limitation	Detail
Numeric columns only	rvl compares numbers. Text column changes are ignored — use `diff` or `shape` for structural checks.
Absolute tolerance only	No relative/percentage tolerance in v0. A $0.01 delta is treated the same whether the balance is $1M or $0.01.
MAX_CONTRIBUTORS = 25	Hard cap, not configurable in v0. If the 25 largest contributors can't reach the threshold, rvl refuses (`E_DIFFUSE`).
In-memory	Both files are loaded fully into memory. No streaming mode yet.
Two files only	No multi-file or directory comparison.
No column filtering	All common numeric columns are compared. You can't exclude specific columns in v0.

FAQ

Why "rvl"?

Short for reveal. The tool reveals what actually changed, cutting through the noise.

Is this just `diff` for CSVs?

No. diff shows you every line that's different. rvl tells you which numeric changes matter — the smallest set that explains the change. It's an explanation, not a diff.

What if my files have different columns?

rvl compares only columns present in both files. Extra columns on either side are reported in the header but don't affect the verdict.

Can I use this in CI/CD?

Yes. Exit codes (0/1/2) and --json output are designed for automation. Gate on exit code, or parse the JSON for richer assertions.

What about non-US number formats (e.g., `1.234,56`)?

Not supported in v0. rvl assumes US-style formatting (comma as thousands separator, period as decimal).

How does rvl relate to shape?

shape checks structural compatibility (do columns match? is the key valid?). rvl checks numeric content (what changed and by how much?). Run shape first to validate structure, then rvl to explain changes.

JSON Output Reference

A single JSON object on stdout. If the process fails before domain evaluation (e.g., invalid CLI args), JSON may not be emitted.

{
  "version": "rvl.v0",
  "outcome": "REAL_CHANGE",            // "REAL_CHANGE" | "NO_REAL_CHANGE" | "REFUSAL"
  "files": {
    "old": "old.csv",
    "new": "new.csv"
  },
  "alignment": {
    "mode": "key",                      // "key" | "row_order"
    "key_column": "u8:id"              // encoded identifier, or null
  },
  "dialect": {
    "old": { "delimiter": ",", "quote": "\"", "escape": null },
    "new": { "delimiter": ",", "quote": "\"", "escape": null }
  },
  "threshold": 0.95,
  "tolerance": 1e-9,
  "counts": {
    "rows_old": 4183,
    "rows_new": 4183,
    "rows_aligned": 4183,
    "columns_old": 17,
    "columns_new": 16,
    "columns_common": 15,
    "columns_old_only": 2,
    "columns_new_only": 1,
    "numeric_columns": 12,
    "numeric_cells_checked": 50196,
    "numeric_cells_changed": 3
  },
  "metrics": {
    "total_change": 1842100.3713,       // L1 distance (sum of abs deltas above tolerance)
    "max_abs_delta": 1842100.0,         // largest abs(delta) observed (pre-zeroing)
    "top_k_coverage": 0.952             // coverage of top MAX_CONTRIBUTORS
  },
  "limits": {
    "max_contributors": 25
  },
  "contributors": [                     // empty unless REAL_CHANGE
    {
      "row_id": "u8:NVDA",
      "column": "u8:market_value",
      "old": 123.0,
      "new": 1842223.0,
      "delta": 1842100.0,
      "contribution": 1842100.0,
      "share": 0.9998,                  // contribution / total_change
      "cumulative_share": 0.9998
    }
    // ... more contributors, ranked by contribution desc
  ],
  "refusal": null                       // null unless REFUSAL
  // When REFUSAL:
  // "refusal": {
  //   "code": "E_KEY_DUP",
  //   "message": "duplicate key values",
  //   "detail": { "file": "old.csv", "key_samples": ["A123"], ... }
  // }
}

Identifier Encoding (JSON)

Row IDs and column names in JSON use unambiguous encoding:

u8:<string> — valid UTF-8 with no ASCII control bytes (e.g., u8:NVDA, u8:market_value)
hex:<hex-bytes> — anything else (e.g., hex:ff00ab)

Copy the encoded identifier directly into --key to avoid ambiguity.
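A consumer that needs the raw identifier back can reverse the encoding. The Python sketch below follows the two rules above; the function names are hypothetical, and treating DEL (0x7F) as a control byte alongside 0x00 to 0x1F is an assumption.

```python
def encode_identifier(raw: bytes) -> str:
    """Encode raw identifier bytes as u8:<string> or hex:<hex-bytes>."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "hex:" + raw.hex()
    if any(b < 0x20 or b == 0x7F for b in raw):  # ASCII control bytes
        return "hex:" + raw.hex()
    return "u8:" + text

def decode_identifier(encoded: str) -> bytes:
    """Recover the raw bytes from an encoded identifier."""
    if encoded.startswith("u8:"):
        return encoded[3:].encode("utf-8")
    if encoded.startswith("hex:"):
        return bytes.fromhex(encoded[4:])
    raise ValueError(f"unknown identifier prefix: {encoded!r}")
```

For instance, `encode_identifier(b"NVDA")` gives `u8:NVDA`, and the bytes `ff 00 ab` give `hex:ff00ab`.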

Nullable Fields

On REFUSAL, counts and metrics fields may be null if they couldn't be computed (e.g., rows_aligned is null for E_ROWCOUNT; all metrics are null for E_NEED_KEY).
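Code that reads the JSON should therefore treat these fields as optional on the refusal path. A minimal Python sketch, using the field names from the reference above (the `summarize` helper itself is hypothetical, and it assumes `report` is the parsed JSON object):

```python
def summarize(report):
    """One-line summary of a parsed rvl --json object (a dict)."""
    outcome = report["outcome"]
    if outcome == "REFUSAL":
        code = report["refusal"]["code"]
        # counts, or individual fields inside it, may be null on refusal.
        aligned = (report.get("counts") or {}).get("rows_aligned")
        rows = "unknown" if aligned is None else str(aligned)
        return f"REFUSAL {code} (rows aligned: {rows})"
    metrics = report["metrics"]
    if outcome == "REAL_CHANGE":
        top = report["contributors"][0]
        return f"REAL_CHANGE total={metrics['total_change']} top={top['row_id']}.{top['column']}"
    return f"NO_REAL_CHANGE max_abs_delta={metrics['max_abs_delta']}"
```

Branching on `outcome` first keeps the null handling confined to the refusal case.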

Spec

The full specification is docs/PLAN_RVL.md. This README covers everything needed to use the tool; the spec adds implementation details, edge-case definitions, and testing requirements.

Development

cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test

License
