SmarterCSV was born from a StackOverflow question in 2011 about importing CSV data into MongoDB. The answer involved processing CSV rows as hashes — which turned out to be so useful that it became a gem.
The original write-up is preserved in the original post.
The first gem release was v1.0.1 on 2012-07-30.
| Version | Date | Highlight |
|---|---|---|
| 1.0.1 | 2012-07-30 | First release: CSV → array of hashes, batch processing, key mapping |
| 1.0.17 | 2014-01-13 | row_sep: :auto — automatic row separator detection |
| 1.0.18 | 2014-10-27 | Multi-line / embedded-newline field support |
| 1.1.0 | 2015-07-26 | value_converters — custom per-column type parsing (dates, money, …) |
| 1.4.0 | 2022-02-11 | Experimental col_sep: :auto detection; switched to MIT-only licence |
| 1.5.1 | 2022-04-27 | duplicate_header_suffix for CSV files with repeated headers |
| 1.6.0 | 2022-05-03 | Complete rewrite of the pure-Ruby line parser |
| 1.7.0 | 2022-06-26 | First C extension — >10× speedup over 1.6.x announced |
| 1.8.0 | 2023-03-18 | col_sep: :auto and row_sep: :auto made the default |
| 1.9.0 | 2023-09-04 | Structured error objects with programmatic key access |
| 1.10.0 | 2023-12-31 | Performance & memory improvements; stricter user_provided_headers |
| 1.11.0 | 2024-07-02 | SmarterCSV::Writer — CSV generation from hashes |
| 1.12.0 | 2024-07-09 | Thread-safe SmarterCSV::Reader class; docs site added |
| 1.13.0 | 2024-11-06 | Auto-generation of extra column names; improved quote robustness |
| 1.14.0 | 2025-04-07 | Advanced Writer options; header_converter |
| 1.14.3 | 2025-05-04 | C-extension fast path for unquoted fields; inline whitespace stripping |
| 1.15.0 | 2026-02-04 | Major C-extension rewrite — ~5× faster than 1.14.4; 39% less memory |
| 1.15.1 | 2026-02-17 | Fix for backslash in quoted fields (quote_escaping: option) |
| 1.15.2 | 2026-02-20 | Further C-path optimisations; 5.4×–37.4× faster than 1.14.4 |
| 1.16.0 | 2026-03-12 | New each/each_chunk enumerator API; SmarterCSV.parse; bad row quarantine; column selection headers: { only: }; 1.8×–8.6× faster than Ruby CSV.read; new features for Reader and Writer; minor breaking: quote_boundary: :standard |
| 1.16.1 | 2026-03-16 | SmarterCSV.errors class-level error access; fix col_sep in quoted headers (#325); fix quoted numeric conversion |
Measured on Apple M1, Ruby 3.4.7. Best of 2 sessions × 30 runs.
All times are C-accelerated except the 1.6.1 column (no C extension existed).
— = not measured for that version.
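One way to read "best of 2 sessions × 30 runs" is as the minimum wall-clock time over every run in every session. The sketch below is an illustration under that assumption, not the project's actual benchmark harness: `best_of` and the in-memory `SAMPLE` workload are hypothetical names, and the "sessions" here are loops inside one process rather than separate invocations.

```ruby
# Returns the fastest wall-clock time (in seconds) observed across
# `sessions` x `runs` executions of the given block.
def best_of(sessions: 2, runs: 30)
  times = []
  sessions.times do
    runs.times do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
      times << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
  times.min
end

# Hypothetical stand-in workload: split a small in-memory CSV string into fields.
SAMPLE = Array.new(1_000) { |i| "#{i},name#{i},#{i * 2}" }.join("\n")

BEST = best_of(sessions: 2, runs: 30) do
  SAMPLE.each_line.map { |line| line.chomp.split(",") }
end

puts format("best of 2 x 30 runs: %.6f s", BEST)
```

Taking the minimum rather than the mean filters out runs disturbed by garbage collection or other system activity, which is a common reason for reporting "best of N" timings.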
| File | Rows | 1.6.1 Rb (s) | 1.7.1 C (s) | 1.14.4 C (s) | 1.15.2 C (s) | 1.16.0 C (s) | total gain |
|---|---|---|---|---|---|---|---|
| PEOPLE_IMPORT_B.csv | 50k | 3.793 | 1.083 | 1.656 | 0.101 | 0.087 | 43.6× |
| PEOPLE_IMPORT_C.csv | 50k | 21.612 | 2.763 | 8.172 | 0.207 | 0.169 | 127.8× |
| PEOPLE_IMPORT_NB.csv | 50k | 3.746 | 1.053 | 1.605 | 0.086 | 0.080 | 46.9× |
| PEOPLE_IMPORT_NC.csv | 50k | 3.831 | 1.018 | 1.495 | 0.076 | 0.063 | 60.8× |
| uscities.csv | 31k | — | — | 1.058 | 0.113 | 0.108 | — |
| uszips.csv | 34k | — | — | 1.277 | 0.111 | 0.102 | — |
| worldcities.csv | 48k | — | — | 1.070 | 0.116 | 0.097 | — |
| fmap.csv | 50k | 2.130 | 0.873 | — | — | — | — |
| zipcode.csv | 44k | 1.572 | 0.797 | — | — | — | — |
| sample_10M.csv | 50k | 1.291 | 0.661 | 0.459 | 0.053 | 0.046 | 28.0× |
| sensor_data_50krows_50cols.csv | 50k | — | — | 3.985 | 0.272 | 0.264 | — |
| embedded_newlines_20k.csv | 80k | 0.716 | 0.366 | 0.540 | 0.056 | 0.054 | 13.2× |
| embedded_separators_20k.csv | 20k | 0.714 | 0.333 | 0.278 | 0.032 | 0.025 | 28.6× |
| heavy_quoting_20k.csv | 20k | 1.309 | 0.484 | 0.522 | 0.054 | 0.036 | 36.5× |
| long_fields_20k.csv | 20k | 5.698 | 1.112 | 2.960 | 0.110 | 0.045 | 126.6× |
| many_empty_fields_20k.csv | 20k | 1.149 | 0.420 | 0.395 | 0.031 | 0.025 | 45.8× |
| multi_char_separator_20k.csv | 20k | — | — | 0.539 | 0.033 | 0.026 | — |
| tab_separated_20k.tsv | 20k | — | — | 0.462 | 0.034 | 0.025 | — |
| utf8_multibyte_20k.csv | 20k | 0.709 | 0.305 | 0.228 | 0.020 | 0.017 | 41.7× |
| whitespace_heavy_20k.csv | 20k | 1.335 | 0.393 | 0.536 | 0.036 | 0.028 | 47.5× |
| wide_500_cols_20k.csv | 20k | 39.755 | 9.532 | 17.658 | 1.419 | 1.352 | 29.4× |
total gain = v1.6.1 Ruby time / v1.16.0 C-accelerated time (files without 1.6.1 data show —)
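As a quick check of the formula, the ratio can be recomputed from the rounded timings in the table. This is a minimal sketch: `TIMINGS`, `total_gain`, and `GAINS` are illustrative names, and only a few rows are copied over.

```ruby
# Timings in seconds copied from the table: [1.6.1 pure Ruby, 1.16.0 C-accelerated].
# nil marks a version that was not measured for that file.
TIMINGS = {
  "PEOPLE_IMPORT_B.csv"   => [3.793, 0.087],
  "long_fields_20k.csv"   => [5.698, 0.045],
  "wide_500_cols_20k.csv" => [39.755, 1.352],
  "uscities.csv"          => [nil, 0.108]
}.freeze

# Returns the speedup factor rounded to one decimal place,
# or nil when either timing is missing.
def total_gain(ruby_seconds, c_seconds)
  return nil if ruby_seconds.nil? || c_seconds.nil?

  (ruby_seconds / c_seconds).round(1)
end

GAINS = TIMINGS.transform_values { |(rb, c)| total_gain(rb, c) }

GAINS.each do |file, gain|
  puts format("%-24s %s", file, gain ? "#{gain}x" : "not measured")
end
```

Recomputing from the rounded table values can differ from the published figure by 0.1 on some rows (for example, PEOPLE_IMPORT_NB.csv gives 46.8 rather than 46.9), presumably because the published gains were derived from unrounded timings.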
Highlights:
- long_fields_20k (long quoted fields): 126.6×. memchr-based field scanning makes long quoted fields essentially free to skip.
- PEOPLE_IMPORT_C (116 columns): 127.8×. Wide rows multiply every per-field saving across all columns.
- PEOPLE_IMPORT_NC (17 columns): 60.8×. Ruby-path optimisations #10 & #11 provide an extra boost on moderately wide files.
- wide_500_cols_20k went from 39.8 seconds to 1.35 seconds, and with headers: { only: } keeping just 2 of those 500 columns it drops further to ~0.1 seconds (an additional ~16× on top).
- embedded_newlines shows the smallest gain (13.2×): multi-line stitching is bounded by I/O and the line-counting loop, not field parsing.
- Parsing CSV Files in Ruby with SmarterCSV
- SmarterCSV 1.15.2 — Faster than raw CSV arrays
- Processing 1.4 Million CSV Records in Ruby, fast
- Faster Parsing CSV with Parallel Processing by Jack Lin