| `--threshold <float>` | float | `0.95` | Coverage target (0 < x ≤ 1.0). The minimum fraction of total numeric change that the top contributors must explain. |
| `--tolerance <float>` | float | `1e-9` | Per-cell noise floor (x ≥ 0). Absolute deltas ≤ this value are treated as zero. |
| `--delimiter <delim>` | string | *(auto-detect)* | Force CSV delimiter for both files. See [Delimiter](#delimiter). |
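
The interaction between `--threshold` and `--tolerance` can be illustrated with a minimal sketch. It assumes that deltas at or below the tolerance are dropped before ranking and that contributors are then taken in descending order of absolute delta until the coverage target is met. The function name `top_contributors` and its signature are hypothetical and are not taken from the tool's source.

```rust
/// Hypothetical helper (not the tool's actual API): returns the indices of the
/// largest absolute deltas needed to explain at least `threshold` of the total
/// change, after zeroing out deltas at or below `tolerance`.
fn top_contributors(deltas: &[f64], threshold: f64, tolerance: f64) -> Vec<usize> {
    // Apply the per-cell noise floor: anything <= tolerance counts as zero.
    // NaN values also fail this comparison and are dropped.
    let mut ranked: Vec<(usize, f64)> = deltas
        .iter()
        .enumerate()
        .map(|(i, d)| (i, d.abs()))
        .filter(|&(_, a)| a > tolerance)
        .collect();

    let total: f64 = ranked.iter().map(|&(_, a)| a).sum();
    if total == 0.0 {
        return Vec::new(); // nothing changed beyond the noise floor
    }

    // Largest contributors first; the stable sort keeps ties in input order.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));

    let mut covered = 0.0;
    let mut picked = Vec::new();
    for (i, a) in ranked {
        picked.push(i);
        covered += a;
        if covered / total >= threshold {
            break; // coverage target reached
        }
    }
    picked
}
```

Under these assumptions, raising the threshold toward 1.0 pulls in more contributors, while raising the tolerance removes small deltas from both the ranking and the total.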
Compared against expected parse/refusal sets in `tests/corpus_parse.rs`:
+- **Arrow mismatches (5):**
+- Expected parse-ok but skipped: `extra_fields_empty.csv`, `extra_trailing_empty_fields.csv`, `ragged_rows_long_empty.csv`, `wide_row_extra_empty.csv`
+- Expected refusal but parsed: `duplicate_headers.csv`
+- **Polars mismatches (6):**
+- Expected parse-ok but skipped: `backslash_escape.csv`, `extra_fields_empty.csv`, `extra_trailing_empty_fields.csv`, `ragged_rows_long_empty.csv`, `wide_row_extra_empty.csv`
+- Expected refusal but parsed: `duplicate_headers.csv`
+
+Note on forced-delimiter fixtures:
+- `control_byte_header.csv` and `delim_0x1f.csv` are parse-ok fixtures that require forced delimiter in the corpus spec.
+- Targeted runs with `RVL_BAKEOFF_DELIMITER=0x01` and `RVL_BAKEOFF_DELIMITER=0x1f` succeed for both Arrow and Polars, so these were excluded from mismatch counts.
+
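
The two mismatch categories listed above amount to set operations between the expected sets and what a parser actually accepted. The sketch below shows one way such a comparison could be written. The `Mismatches` struct and the `classify` function are hypothetical names for illustration and are not taken from `tests/corpus_parse.rs`.

```rust
use std::collections::BTreeSet;

/// Hypothetical result type for comparing one parser against the corpus spec.
#[derive(Debug, PartialEq)]
struct Mismatches {
    expected_ok_but_skipped: Vec<String>,
    expected_refusal_but_parsed: Vec<String>,
}

impl Mismatches {
    /// Total mismatch count across both categories.
    fn total(&self) -> usize {
        self.expected_ok_but_skipped.len() + self.expected_refusal_but_parsed.len()
    }
}

/// Compares the set of fixtures a parser accepted (`parsed`) against the
/// expected parse-ok and expected-refusal sets. Results are sorted by name
/// because `BTreeSet` iterates in order.
fn classify<'a>(
    expected_ok: &BTreeSet<&'a str>,
    expected_refusal: &BTreeSet<&'a str>,
    parsed: &BTreeSet<&'a str>,
) -> Mismatches {
    Mismatches {
        // Fixtures that should parse but were not accepted.
        expected_ok_but_skipped: expected_ok
            .difference(parsed)
            .map(|s| s.to_string())
            .collect(),
        // Fixtures that should be refused but were accepted anyway.
        expected_refusal_but_parsed: expected_refusal
            .intersection(parsed)
            .map(|s| s.to_string())
            .collect(),
    }
}
```

Fixtures that need a forced delimiter would have to be filtered out of `expected_ok` before calling `classify`, mirroring the exclusion described in the note above.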
### Throughput / Memory
| Parser | Rows/sec | MB/sec | Peak RSS | Notes |
| --- | --- | --- | --- | --- |
@@ -108,7 +129,8 @@ Note: the bakeoff harness is in-memory and does not include disk I/O.
Baseline Rust `csv` passes the corpus (0 mismatches). simd-csv is ~18.9% faster
in the parser-only bakeoff but skips backslash-escape cases in the harness and
does not meet the >=25% throughput gate. Arrow and Polars are both slower than
-the baseline on the same large inputs. Keep Rust `csv` for v0.
+the baseline on the same large inputs and fail corpus compatibility checks.
+Keep Rust `csv` for v0.
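
The decision above combines two conditions: a candidate parser must pass the corpus and must beat the baseline by at least 25% on throughput. A minimal sketch of that check follows. The function names are hypothetical, and it is an assumption that the gate is computed as a relative speedup over the baseline's measured throughput.

```rust
/// Relative speedup of a candidate over the baseline, in percent.
/// Both arguments must use the same unit (for example MB/sec).
fn speedup_pct(baseline_throughput: f64, candidate_throughput: f64) -> f64 {
    (candidate_throughput / baseline_throughput - 1.0) * 100.0
}

/// Hypothetical gate: a candidate replaces the baseline only if it has no
/// corpus mismatches and meets the minimum speedup.
fn passes_gate(
    baseline_throughput: f64,
    candidate_throughput: f64,
    corpus_mismatches: usize,
    min_speedup_pct: f64,
) -> bool {
    corpus_mismatches == 0
        && speedup_pct(baseline_throughput, candidate_throughput) >= min_speedup_pct
}
```

With a baseline normalized to 100, a candidate at 118.9 (the ~18.9% figure reported for simd-csv) fails a 25% gate even with zero mismatches, and any candidate with corpus mismatches fails regardless of speed.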
## Next Steps
- If needed, evaluate Arrow/Polars CSV readers and record results.