
SmarterCSV Basic API

Let's explore the basic APIs for reading and writing CSV files. There is a simplified API (backwards compatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the reader or writer instance after processing.

Reading CSV

SmarterCSV has convenient defaults for automatically detecting row and column separators based on the given data. This provides more robust parsing of input files when you have no control over the data, e.g. when users upload CSV files. Learn more about this in this section.

Simplified Interface

The simplified call to read CSV files is:

  ```
     array_of_hashes = SmarterCSV.process(file_or_input, options)
  ```

To parse a CSV string directly (no file needed), use SmarterCSV.parse:

  ```
     array_of_hashes = SmarterCSV.parse(csv_string, options)
  ```

This is equivalent to SmarterCSV.process(StringIO.new(csv_string), options) and is the idiomatic replacement for CSV.parse(str, headers: true, header_converters: :symbol). See Migrating from Ruby CSV for a full comparison.

It can also be used with a block. The block always receives an array of hashes and an optional chunk index:

  ```
     SmarterCSV.process(file_or_input, options) do |array_of_hashes|
       # without chunk_size, each yield contains a one-element array (one row)
     end
  ```

or

  ```
     SmarterCSV.process(file_or_input, options) do |array_of_hashes, chunk_index|
        # the chunk_index can be used to track chunks for parallel processing
     end
  ```

When processing batches of rows, use the chunk_size option. The block receives an array of up to chunk_size hashes per yield:

  ```
     SmarterCSV.process(file_or_input, {chunk_size: 100}) do |array_of_hashes, chunk_index|
        # process one chunk of up to 100 rows of CSV data
        puts "Processing chunk #{chunk_index}..."
     end
  ```

Full Interface

The simplified API works in most cases, but if you need access to the internal state and detailed results of the CSV parsing, use this form:

  ```
    reader = SmarterCSV::Reader.new(file_or_input, options)
    data = reader.process

    puts reader.raw_headers
  ```

It can also be used with a block. The block always receives an array of hashes and an optional chunk index:

  ```
    reader = SmarterCSV::Reader.new(file_or_input, options)
    data = reader.process do |array_of_hashes, chunk_index|
       # do something here
    end

    puts reader.raw_headers
  ```

This allows you access to the internal state of the reader instance after processing.

Modern Enumerator API — each

Reader#each is the modern, idiomatic way to read CSV rows one at a time. It always yields a single Hash per row and includes Enumerable, so every standard Ruby enumerable method works out of the box.

Simplified form

```
SmarterCSV.each('data.csv', options) do |hash|
  MyModel.upsert(hash)
end
```

Full form (recommended — retains reader state after processing)

```
reader = SmarterCSV::Reader.new('data.csv', options)

reader.each do |hash|
  MyModel.upsert(hash)
end

puts reader.headers       # accessible after processing
puts reader.errors.inspect
```

Returns an Enumerator when called without a block

```
enum = SmarterCSV.each('data.csv', options)
enum.to_a   # => [{ name: "Alice", ... }, { name: "Bob", ... }, ...]
```

Enumerable methods work directly

Because Reader includes Enumerable, all standard Ruby enumerable methods work:

```
reader = SmarterCSV::Reader.new('data.csv', options)

# Filter rows
us_users = reader.select { |h| h[:country] == 'US' }

# Transform
names = reader.map { |h| h[:name] }

# Count good rows
reader.count

# Row index (0-based count of successfully parsed rows, excluding bad rows)
reader.each_with_index do |hash, i|
  puts "Row #{i}: #{hash[:name]}"
end

# Free chunking via Enumerable — no chunk_size needed
reader.each_slice(100) do |batch|
  MyModel.insert_all(batch)
end
```

Lazy evaluation

lazy lets you stop early without reading the entire file:

```
# Read only the first 10 rows matching a condition
reader = SmarterCSV::Reader.new('big.csv', options)
result = reader.lazy.select { |h| h[:status] == 'active' }.first(10)
```

each ignores chunk_size

If chunk_size is set in options, each ignores it and always yields individual Hash objects. Use each_chunk for chunked batch processing.

Interaction with on_bad_row

each respects all on_bad_row options. Bad rows are skipped (or routed to your handler) and never yielded:

```
reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
reader.each { |hash| MyModel.upsert(hash) }
reader.errors[:bad_rows].each { |rec| puts "Bad row: #{rec[:error_message]}" }
```

Value Transformation Pipeline

After each row is parsed, SmarterCSV applies transformations to field values in this order:

| Step | Option | Default | Description |
|------|--------|---------|-------------|
| 1 | strip_whitespace | true | Strips leading/trailing whitespace from all values (and headers) at parse time |
| 2 | nil_values_matching | nil | Sets values matching the regexp to nil |
| 3 | remove_empty_values | true | Removes keys whose value is nil or blank |
| 4 | remove_zero_values | false | Removes keys whose value is numeric zero |
| 5 | convert_values_to_numeric | true | Converts numeric-looking strings to Integer or Float |
| 6 | value_converters | nil | Applies per-key custom converter lambdas or classes |
| 7 | remove_empty_hashes | true | Drops rows that are entirely empty after all transformations |

Steps 2–6 run per field, in that order, for every key/value pair in the row. value_converters receive the value after numeric conversion — guard against Integer/Float input if needed.

See Data Transformations and Value Converters for details.


Header Transformation Pipeline

Before any data rows are processed, the header line passes through these steps:

```
comment_regexp → strip_chars_from_headers → split on col_sep → strip quote_char
    → strip_whitespace → [gsub spaces/dashes→_ → downcase_header]
    → disambiguate_headers → symbolize → key_mapping
```

user_provided_headers bypasses the file header and all transformation steps — your array is used as-is.

See Header Transformations for the full step-by-step table and options.


Rescue from Exceptions

While SmarterCSV uses sensible defaults to process the most common CSV files, it raises exceptions if it cannot auto-detect col_sep or row_sep, or if it encounters other problems. Therefore, rescue from SmarterCSV::Error and handle outliers according to your requirements.

If you encounter unusual CSV files, please follow the tips in the Troubleshooting section below. You can use the options below to accommodate unusual formats.

Troubleshooting

In case your CSV file is not being parsed correctly, try to examine it in a text editor. For closer inspection, a tool like hexdump can help find otherwise hidden control characters or byte sequences like BOMs.

```
$ hexdump -C spec/fixtures/bom_test_feff.csv
00000000  fe ff 73 6f 6d 65 5f 69  64 2c 74 79 70 65 2c 66  |..some_id,type,f|
00000010  75 7a 7a 62 6f 78 65 73  0d 0a 34 32 37 36 36 38  |uzzboxes..427668|
00000020  30 35 2c 7a 69 7a 7a 6c  65 73 2c 31 32 33 34 0d  |05,zizzles,1234.|
00000030  0a 33 38 37 35 39 31 35  30 2c 71 75 69 7a 7a 65  |.38759150,quizze|
00000040  73 2c 35 36 37 38 0d 0a                           |s,5678..|
```

Assumptions / Limitations

  • By default, quote escaping uses :auto mode — SmarterCSV tries backslash-escape (\") first and falls back to RFC 4180 doubled-quotes (""). Use quote_escaping: :double_quotes or :backslash to fix the mode explicitly. See Parsing Strategy.
  • Quote characters around fields are expected to be balanced, e.g. valid: "field", invalid: "field\" — an escaped quote_char does not denote the end of a field.

NOTES about File Encodings:

  • If you have a CSV file which contains unicode characters, you can process it as follows:

    ```
    File.open(filename, "r:bom|utf-8") do |f|
      data = SmarterCSV.process(f)
    end
    ```

  • If the CSV file with unicode characters is in a remote location, you similarly need to pass the encoding as an option to the open call:

    ```
    require 'open-uri'
    file_location = 'http://your.remote.org/sample.csv'
    URI.open(file_location, 'r:utf-8') do |f|   # don't forget to specify the UTF-8 encoding!
      data = SmarterCSV.process(f)
    end
    ```

PREVIOUS: Parsing Strategy | NEXT: The Basic Write API | UP: README