Commit 11f1c6e

Make delimiter optional and fix bug when reusing options table (#42)
1 parent 527620a commit 11f1c6e

6 files changed

+222
-59
lines changed

README.md

Lines changed: 35 additions & 18 deletions
@@ -17,16 +17,16 @@ luarocks install ftcsv
1717
There are two main parsing methods: `ftcsv.parse` and `ftcsv.parseLine`.
1818
`ftcsv.parse` loads the entire file and parses it, while `ftcsv.parseLine` is an iterator that parses one line at a time.
1919

20-
### `ftcsv.parse(fileName, delimiter [, options])`
21-
`ftcsv.parse` will load the entire csv file into memory, then parse it in one go, returning a lua table with the parsed data and a lua table containing the column headers. It has only two required parameters - a file name and delimiter (limited to one character). A few optional parameters can be passed in via a table (examples below).
20+
### `ftcsv.parse(fileName [, options])`
21+
`ftcsv.parse` will load the entire csv file into memory, then parse it in one go, returning a lua table with the parsed data and a lua table containing the column headers. It has only one required parameter - the file name. A few optional parameters can be passed in via a table (examples below).
2222

2323
Just loading a csv file:
2424
```lua
2525
local ftcsv = require('ftcsv')
26-
local zipcodes, headers = ftcsv.parse("free-zipcode-database.csv", ",")
26+
local zipcodes, headers = ftcsv.parse("free-zipcode-database.csv")
2727
```
2828

29-
### `ftcsv.parseLine(fileName, delimiter, [, options])`
29+
### `ftcsv.parseLine(fileName [, options])`
3030
`ftcsv.parseLine` will open a file and read `options.bufferSize` bytes of the file. `bufferSize` defaults to 2^16 bytes (which provides the fastest parsing on most unix-based systems), or can be specified in the options. `ftcsv.parseLine` is an iterator and returns one line at a time. When all the lines in the buffer are read, it will read in another `bufferSize` bytes of a file and repeat the process until the entire file has been read.
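
As a rough illustration of the buffering described above, the sketch below reads a source `bufferSize` bytes at a time and hands out one line per iteration, carrying any partial line over to the next read. It is not ftcsv's implementation: `chunkedLines` and `stringReader` are hypothetical helpers, the reader stands in for `file:read(n)`, and the sketch splits on `\n` only, so it ignores quoted multi-line fields that a real CSV parser has to handle.

```lua
-- Hypothetical helper (not part of ftcsv): iterate over lines while only
-- ever asking the source for `bufferSize` bytes at a time.
local function chunkedLines(read, bufferSize)
    bufferSize = bufferSize or 2^16
    local buffer = ""       -- unconsumed bytes from previous reads
    local finished = false  -- true once the source is exhausted

    return function()
        while true do
            -- a complete line is already buffered: hand it out
            local newline = buffer:find("\n", 1, true)
            if newline then
                local line = buffer:sub(1, newline - 1)
                buffer = buffer:sub(newline + 1)
                return line
            end
            if finished then
                -- flush a final line that has no trailing newline
                if buffer ~= "" then
                    local line = buffer
                    buffer = ""
                    return line
                end
                return nil
            end
            -- otherwise pull in the next chunk and try again
            local chunk = read(bufferSize)
            if chunk == nil or chunk == "" then
                finished = true
            else
                buffer = buffer .. chunk
            end
        end
    end
end

-- Stand-in for file:read(n): serves a string in pieces of at most n bytes.
local function stringReader(data)
    local position = 1
    return function(n)
        if position > #data then return nil end
        local piece = data:sub(position, position + n - 1)
        position = position + n
        return piece
    end
end

-- A deliberately tiny buffer (4 bytes) forces lines to span several reads.
local lines = {}
for line in chunkedLines(stringReader("a,b,c\n1,2,3\n4,5,6"), 4) do
    lines[#lines + 1] = line
end
```

A small buffer means more reads and more string concatenation, while a large one holds more of the file in memory at once, which is the trade-off the `bufferSize` option exposes.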
3131

3232
If specifying `bufferSize` there are a couple of things to remember:
@@ -37,7 +37,7 @@ If specifying `bufferSize` there are a couple of things to remember:
3737
Parsing through a csv file:
3838
```lua
3939
local ftcsv = require("ftcsv")
40-
for index, zipcode in ftcsv.parseLine("free-zipcode-database.csv", ",") do
40+
for index, zipcode in ftcsv.parseLine("free-zipcode-database.csv") do
4141
print(zipcode.Zipcode)
4242
print(zipcode.State)
4343
end
@@ -48,11 +48,18 @@ end
4848
The options are the same for `parseLine` and `parse`, with the exception of `loadFromString` and `bufferSize`. `loadFromString` only works with `parse` and `bufferSize` can only be specified for `parseLine`.
4949

5050
The following are optional parameters passed in via the third argument as a table.
51+
- `delimiter`
52+
53+
If your file doesn't use the comma character as the delimiter, you can specify your own. It is limited to one character and defaults to `,`
54+
```lua
55+
ftcsv.parse("a>b>c\r\n1>2>3", {loadFromString=true, delimiter=">"})
56+
```
57+
5158
- `loadFromString`
5259

5360
If you want to load a csv from a string instead of a file, set `loadFromString` to `true` (default: `false`)
5461
```lua
55-
ftcsv.parse("a,b,c\r\n1,2,3", ",", {loadFromString=true})
62+
ftcsv.parse("a,b,c\r\n1,2,3", {loadFromString=true})
5663
```
5764

5865
- `rename`
@@ -63,7 +70,7 @@ The following are optional parameters passed in via the third argument as a tabl
6370

6471
```lua
6572
local options = {loadFromString=true, rename={["a"] = "d", ["b"] = "e", ["c"] = "f"}}
66-
local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot", ",", options)
73+
local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot", options)
6774
```
6875

6976
- `fieldsToKeep`
@@ -74,7 +81,7 @@ The following are optional parameters passed in via the third argument as a tabl
7481

7582
```lua
7683
local options = {loadFromString=true, fieldsToKeep={"a","f"}, rename={["c"] = "f"}}
77-
local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot\r\n", ",", options)
84+
local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot\r\n", options)
7885
```
7986

8087
Also Note: If you apply a function to the headers via headerFunc, and want to select fields from fieldsToKeep, you need to have what the post-modified header would be in fieldsToKeep.
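
The ordering in that note can be sketched in a few lines: headers are transformed first, and only then compared against `fieldsToKeep`. This is an assumption-laden illustration rather than ftcsv's code; `selectHeaders` is a hypothetical helper, although the list-to-set conversion mirrors the loop in `parseOptions`.

```lua
-- Hypothetical helper, not ftcsv's implementation: transform the headers
-- first, then keep only the names listed in fieldsToKeep.
local function selectHeaders(rawHeaders, headerFunc, fieldsToKeep)
    -- turn the list into a set for constant-time lookups
    local keep = {}
    for i = 1, #fieldsToKeep do
        keep[fieldsToKeep[i]] = true
    end

    local selected = {}
    for i = 1, #rawHeaders do
        local name = headerFunc(rawHeaders[i])
        if keep[name] then
            selected[#selected + 1] = name
        end
    end
    return selected
end

-- Post-transformation names match, so "A" and "C" are kept.
local matched = selectHeaders({"a", "b", "c"}, string.upper, {"A", "C"})

-- Pre-transformation names no longer exist after string.upper, so nothing is kept.
local unmatched = selectHeaders({"a", "b", "c"}, string.upper, {"a", "c"})
```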
@@ -85,7 +92,7 @@ The following are optional parameters passed in via the third argument as a tabl
8592

8693
```lua
8794
local options = {loadFromString=true, ignoreQuotes=true}
88-
local actual = ftcsv.parse('a,b,c\n"apple,banana,carrot', ",", options)
95+
local actual = ftcsv.parse('a,b,c\n"apple,banana,carrot', options)
8996
```
9097

9198
- `headerFunc`
@@ -95,23 +102,23 @@ The following are optional parameters passed in via the third argument as a tabl
95102
Ex: making all fields uppercase
96103
```lua
97104
local options = {loadFromString=true, headerFunc=string.upper}
98-
local actual = ftcsv.parse("a,b,c\napple,banana,carrot", ",", options)
105+
local actual = ftcsv.parse("a,b,c\napple,banana,carrot", options)
99106
```
100107

101108
- `headers`
102109

103110
Set `headers` to `false` if the file you are reading doesn't have any headers. This will cause ftcsv to create indexed tables rather than key-value tables for the output.
104111

105112
```lua
106-
local options = {loadFromString=true, headers=false}
107-
local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
113+
local options = {loadFromString=true, headers=false, delimiter=">"}
114+
local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", options)
108115
```
109116

110117
Note: Header-less files can still use the `rename` option, and after a field has been renamed, it can be specified as a field to keep. The `rename` syntax changes a little bit:
111118

112119
```lua
113-
local options = {loadFromString=true, headers=false, rename={"a","b","c"}, fieldsToKeep={"a","b"}}
114-
local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
120+
local options = {loadFromString=true, headers=false, rename={"a","b","c"}, fieldsToKeep={"a","b"}, delimiter=">"}
121+
local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", options)
115122
```
116123

117124
In the above example, the first field becomes 'a', the second field becomes 'b' and so on.
@@ -120,7 +127,7 @@ For all tested examples, take a look in /spec/feature_spec.lua
120127

121128
The options can be strung together. For example, if you wanted to `loadFromString` and not use `headers`, you could use the following:
122129
```lua
123-
ftcsv.parse("apple,banana,carrot", ",", {loadFromString=true, headers=false})
130+
ftcsv.parse("apple,banana,carrot", {loadFromString=true, headers=false})
124131
```
125132

126133
## Encoding
@@ -137,7 +144,7 @@ file:close()
137144
### Options
138145
- `fieldsToKeep`
139146

140-
if `fieldsToKeep` is set in the encode process, only the fields specified will be written out to a file.
147+
if `fieldsToKeep` is set in the encode process, only the fields specified will be written out to a file. The `fieldsToKeep` will be written out in the order that is specified.
141148

142149
```lua
143150
local output = ftcsv.encode(everyUser, ",", {fieldsToKeep={"Name", "Phone", "City"}})
@@ -148,7 +155,7 @@ file:close()
148155
if `onlyRequiredQuotes` is set to `true`, the output will only include quotes around fields that contain quotes, have newlines, or contain the delimiter.
149156

150157
```lua
151-
local output = ftcsv.encode(everyUser, ",", {noQuotes=true})
158+
local output = ftcsv.encode(everyUser, ",", {onlyRequiredQuotes=true})
152159
```
153160

154161

@@ -184,7 +191,7 @@ NOTE: times are measured using `os.clock()`, so they are in CPU seconds. Each te
184191
Benchmarks were run under ftcsv 1.2.0
185192

186193
## Performance
187-
I did some basic testing and found that in lua, if you want to iterate over a string character-by-character and compare chars, `string.byte` performs faster than `string.sub`. As such, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters and then generates a table from it. When using vanilla lua, it proved faster to use `string.find` instead of iterating character by character (which is faster in LuaJIT), so ftcsv accounts for that and will perform the fastest option that is availble. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
194+
I did some basic testing and found that in lua, if you want to iterate over a string character-by-character and compare chars, `string.byte` performs faster than `string.sub`. As such, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters and then generates a table from it. When using vanilla lua, it proved faster to use `string.find` instead of iterating character by character (which is faster in LuaJIT), so ftcsv accounts for that and will perform the fastest option that is available. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
188195

189196

190197
## Contributing
@@ -200,6 +207,16 @@ Feel free to create a new issue for any bugs you've found or help you need. If y
200207
8. Enjoy the changes made!
201208

202209

210+
## Delimiter no longer required as of 1.4.0!
211+
Starting with version 1.4.0, the delimiter is no longer required as the second argument. **But don't worry,** ftcsv remains backwards compatible! We check the argument types and adjust parsing as necessary. There is no intention to remove this backwards compatibility layer, so you can always enjoy your up-to-date lightning fast CSV parser!
212+
213+
So this works just fine:
214+
```lua
215+
ftcsv.parse("a>b>c\r\n1>2>3", ">", {loadFromString=true})
216+
```
217+
218+
The delimiter as the second argument will always take precedence if both are provided.
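
The rule that a second-argument delimiter wins follows from the argument-type check this commit adds to `ftcsv.lua`. The function below is reproduced from that change in standalone form so it can be run without the library; the three calls underneath are illustrative inputs, not tests from the repository.

```lua
-- Standalone copy of the dispatch added in this commit.
local function determineArgumentOrder(delimiter, options)
    if type(delimiter) == "string" then
        -- legacy form: parse(file, delimiter, options)
        return delimiter, options
    elseif type(delimiter) == "table" then
        -- new form: parse(file, options), delimiter read from the table
        return delimiter.delimiter or ",", delimiter
    else
        -- nothing supplied: fall back to a comma
        return ",", nil
    end
end

-- Legacy form: the string argument is returned as-is, so options.delimiter is not consulted here.
local legacyDelimiter = determineArgumentOrder(">", {loadFromString=true, delimiter=";"})

-- New form: the options table sits in the second position.
local newDelimiter, newOptions = determineArgumentOrder({loadFromString=true, delimiter=";"})

-- No second argument at all.
local defaultDelimiter, defaultOptions = determineArgumentOrder(nil, nil)
```

Here `legacyDelimiter` is `">"`, `newDelimiter` is `";"`, and `defaultDelimiter` is `","`.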
219+
203220

204221
## Licenses
205222
- The main library is licensed under the MIT License. Feel free to use it!
Lines changed: 3 additions & 3 deletions
@@ -1,16 +1,16 @@
11
package = "ftcsv"
2-
version = "1.3.0-1"
2+
version = "1.4.0-1"
33

44
source = {
55
url = "git://github.com/FourierTransformer/ftcsv.git",
6-
tag = "1.3.0"
6+
tag = "1.4.0"
77
}
88

99
description = {
1010
summary = "A fast pure lua csv library (parser and encoder)",
1111
detailed = [[
1212
ftcsv is a fast and easy to use csv library for lua. It can read in CSV files,
13-
do some basic transformations (rename fields) and can create the csv format.
13+
do some basic transformations (rename fields, retain, etc) and can create a CSV file.
1414
It supports UTF-8, header-less CSVs, and maintaining correct line endings for
1515
multi-line fields.
1616

ftcsv.lua

Lines changed: 58 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
local ftcsv = {
2-
_VERSION = 'ftcsv 1.3.0',
2+
_VERSION = 'ftcsv 1.4.0',
33
_DESCRIPTION = 'CSV library for Lua',
44
_URL = 'https://github.com/FourierTransformer/ftcsv',
55
_LICENSE = [[
@@ -90,7 +90,7 @@ end
9090

9191

9292
-- determine the real headers as opposed to the header mapping
93-
local function determineRealHeaders(headerField, fieldsToKeep)
93+
local function determineRealHeaders(headerField, fieldsToKeep)
9494
local realHeaders = {}
9595
local headerSet = {}
9696
for i = 1, #headerField do
@@ -396,6 +396,22 @@ local function initializeInputFromStringOrFile(inputFile, options, amount)
396396
return inputString, file
397397
end
398398

399+
local function determineArgumentOrder(delimiter, options)
400+
-- backwards compatible layer
401+
if type(delimiter) == "string" then
402+
return delimiter, options
403+
404+
-- the new format for parseLine
405+
elseif type(delimiter) == "table" then
406+
local realDelimiter = delimiter.delimiter or ","
407+
return realDelimiter, delimiter
408+
409+
-- if nothing is specified, assume "," delimited and call it a day!
410+
else
411+
return ",", nil
412+
end
413+
end
414+
399415
local function parseOptions(delimiter, options, fromParseLine)
400416
-- delimiter MUST be one character
401417
assert(type(delimiter) == "string" and #delimiter == 1, "the delimiter must be of string type and exactly one character")
@@ -404,50 +420,54 @@ local function parseOptions(delimiter, options, fromParseLine)
404420

405421
if options then
406422

407-
if options.headers ~= nil then
408-
assert(type(options.headers) == "boolean", "ftcsv only takes the boolean 'true' or 'false' for the optional parameter 'headers' (default 'true'). You passed in '" .. tostring(options.headers) .. "' of type '" .. type(options.headers) .. "'.")
409-
end
423+
if options.headers ~= nil then
424+
assert(type(options.headers) == "boolean", "ftcsv only takes the boolean 'true' or 'false' for the optional parameter 'headers' (default 'true'). You passed in '" .. tostring(options.headers) .. "' of type '" .. type(options.headers) .. "'.")
425+
end
410426

411-
if options.rename ~= nil then
412-
assert(type(options.rename) == "table", "ftcsv only takes in a key-value table for the optional parameter 'rename'. You passed in '" .. tostring(options.rename) .. "' of type '" .. type(options.rename) .. "'.")
413-
end
427+
if options.rename ~= nil then
428+
assert(type(options.rename) == "table", "ftcsv only takes in a key-value table for the optional parameter 'rename'. You passed in '" .. tostring(options.rename) .. "' of type '" .. type(options.rename) .. "'.")
429+
end
414430

415-
if options.fieldsToKeep ~= nil then
416-
assert(type(options.fieldsToKeep) == "table", "ftcsv only takes in a list (as a table) for the optional parameter 'fieldsToKeep'. You passed in '" .. tostring(options.fieldsToKeep) .. "' of type '" .. type(options.fieldsToKeep) .. "'.")
417-
local ofieldsToKeep = options.fieldsToKeep
418-
if ofieldsToKeep ~= nil then
419-
fieldsToKeep = {}
420-
for j = 1, #ofieldsToKeep do
421-
fieldsToKeep[ofieldsToKeep[j]] = true
422-
end
423-
end
424-
if options.headers == false and options.rename == nil then
425-
error("ftcsv: fieldsToKeep only works with header-less files when using the 'rename' functionality")
431+
if options.fieldsToKeep ~= nil then
432+
assert(type(options.fieldsToKeep) == "table", "ftcsv only takes in a list (as a table) for the optional parameter 'fieldsToKeep'. You passed in '" .. tostring(options.fieldsToKeep) .. "' of type '" .. type(options.fieldsToKeep) .. "'.")
433+
local ofieldsToKeep = options.fieldsToKeep
434+
if ofieldsToKeep ~= nil then
435+
fieldsToKeep = {}
436+
for j = 1, #ofieldsToKeep do
437+
fieldsToKeep[ofieldsToKeep[j]] = true
426438
end
427439
end
428-
429-
if options.loadFromString ~= nil then
430-
assert(type(options.loadFromString) == "boolean", "ftcsv only takes a boolean value for optional parameter 'loadFromString'. You passed in '" .. tostring(options.loadFromString) .. "' of type '" .. type(options.loadFromString) .. "'.")
440+
if options.headers == false and options.rename == nil then
441+
error("ftcsv: fieldsToKeep only works with header-less files when using the 'rename' functionality")
431442
end
443+
end
432444

433-
if options.headerFunc ~= nil then
434-
assert(type(options.headerFunc) == "function", "ftcsv only takes a function value for optional parameter 'headerFunc'. You passed in '" .. tostring(options.headerFunc) .. "' of type '" .. type(options.headerFunc) .. "'.")
435-
end
445+
if options.loadFromString ~= nil then
446+
assert(type(options.loadFromString) == "boolean", "ftcsv only takes a boolean value for optional parameter 'loadFromString'. You passed in '" .. tostring(options.loadFromString) .. "' of type '" .. type(options.loadFromString) .. "'.")
447+
end
448+
449+
if options.headerFunc ~= nil then
450+
assert(type(options.headerFunc) == "function", "ftcsv only takes a function value for optional parameter 'headerFunc'. You passed in '" .. tostring(options.headerFunc) .. "' of type '" .. type(options.headerFunc) .. "'.")
451+
end
452+
453+
if options.ignoreQuotes == nil then
454+
options.ignoreQuotes = false
455+
else
456+
assert(type(options.ignoreQuotes) == "boolean", "ftcsv only takes a boolean value for optional parameter 'ignoreQuotes'. You passed in '" .. tostring(options.ignoreQuotes) .. "' of type '" .. type(options.ignoreQuotes) .. "'.")
457+
end
436458

437-
if options.ignoreQuotes == nil then
438-
options.ignoreQuotes = false
459+
if fromParseLine == true then
460+
if options.bufferSize == nil then
461+
options.bufferSize = 2^16
439462
else
440-
assert(type(options.ignoreQuotes) == "boolean", "ftcsv only takes a boolean value for optional parameter 'ignoreQuotes'. You passed in '" .. tostring(options.ignoreQuotes) .. "' of type '" .. type(options.ignoreQuotes) .. "'.")
463+
assert(type(options.bufferSize) == "number", "ftcsv only takes a number value for optional parameter 'bufferSize'. You passed in '" .. tostring(options.bufferSize) .. "' of type '" .. type(options.bufferSize) .. "'.")
441464
end
442465

443-
if options.bufferSize == nil then
444-
options.bufferSize = 2^16
445-
else
446-
assert(type(options.bufferSize) == "number", "ftcsv only takes a number value for optional parameter 'bufferSize'. You passed in '" .. tostring(options.bufferSize) .. "' of type '" .. type(options.bufferSize) .. "'.")
447-
if fromParseLine == false then
448-
error("ftcsv: bufferSize can only be specified using 'parseLine'. When using 'parse', the entire file is read into memory")
449-
end
466+
else
467+
if options.bufferSize ~= nil then
468+
error("ftcsv: bufferSize can only be specified using 'parseLine'. When using 'parse', the entire file is read into memory")
450469
end
470+
end
451471

452472
else
453473
options = {
@@ -539,6 +559,8 @@ end
539559

540560
-- runs the show!
541561
function ftcsv.parse(inputFile, delimiter, options)
562+
local delimiter, options = determineArgumentOrder(delimiter, options)
563+
542564
local options, fieldsToKeep = parseOptions(delimiter, options, false)
543565

544566
local inputString = initializeInputFromStringOrFile(inputFile, options, "*all")
@@ -573,6 +595,7 @@ local function initializeInputFile(inputString, options)
573595
end
574596

575597
function ftcsv.parseLine(inputFile, delimiter, userOptions)
598+
local delimiter, userOptions = determineArgumentOrder(delimiter, userOptions)
576599
local options, fieldsToKeep = parseOptions(delimiter, userOptions, true)
577600
local inputString, file = initializeInputFile(inputFile, options)
578601
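
The commit title mentions a bug when reusing an options table, and the reshuffled `bufferSize` branch in `parseOptions` appears to be the relevant change. The sketch below is one reading of that branch before and after, reduced to two hypothetical functions (`oldBufferCheck` and `newBufferCheck` are not names from ftcsv) with shortened error messages.

```lua
-- Hypothetical reductions of the bufferSize branch in parseOptions;
-- neither function name exists in ftcsv.
local function oldBufferCheck(options, fromParseLine)
    if options.bufferSize == nil then
        options.bufferSize = 2^16  -- default written into the caller's table
    else
        assert(type(options.bufferSize) == "number")
        if fromParseLine == false then
            error("bufferSize can only be specified using 'parseLine'")
        end
    end
end

local function newBufferCheck(options, fromParseLine)
    if fromParseLine == true then
        if options.bufferSize == nil then
            options.bufferSize = 2^16
        else
            assert(type(options.bufferSize) == "number")
        end
    elseif options.bufferSize ~= nil then
        error("bufferSize can only be specified using 'parseLine'")
    end
end

-- Old behaviour: the first parse-style call succeeds but leaves a default
-- behind, so the second call with the same table is rejected.
local sharedOld = {loadFromString = true}
local oldFirst = pcall(oldBufferCheck, sharedOld, false)
local oldSecond = pcall(oldBufferCheck, sharedOld, false)

-- New behaviour: parse-style calls no longer touch bufferSize.
local sharedNew = {loadFromString = true}
local newFirst = pcall(newBufferCheck, sharedNew, false)
local newSecond = pcall(newBufferCheck, sharedNew, false)
```

Judging only from the lines in this hunk, a table first passed to `parseLine` would still pick up the default `bufferSize` and then be rejected by `parse`; copying the options table before each call avoids relying on either behaviour.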

spec/dynamic_features_spec.lua

Lines changed: 3 additions & 3 deletions
@@ -61,7 +61,7 @@ describe("csv features", function()
6161
end
6262

6363
local options = {loadFromString=true, rename={["a"] = "d", ["b"] = "e", ["c"] = "f"}}
64-
local actual, actualHeaders = ftcsv.parse(defaultString, ",", options)
64+
local actual, actualHeaders = ftcsv.parse(defaultString, options)
6565
assert.are.same(expected, actual)
6666
assert.are.same(expectedHeaders, actualHeaders)
6767
end)
@@ -123,7 +123,7 @@ describe("csv features", function()
123123
end
124124

125125
local options = {loadFromString=true, fieldsToKeep={"a", "b"}}
126-
local actual, actualHeaders = ftcsv.parse(defaultString, ",", options)
126+
local actual, actualHeaders = ftcsv.parse(defaultString, options)
127127
assert.are.same(expected, actual)
128128
assert.are.same(expectedHeaders, actualHeaders)
129129
end)
@@ -347,7 +347,7 @@ describe("csv features", function()
347347
end
348348

349349
local options = {loadFromString=true, headers=false}
350-
local actual, actualHeaders = ftcsv.parse(defaultString, ",", options)
350+
local actual, actualHeaders = ftcsv.parse(defaultString, options)
351351
assert.are.same(expected, actual)
352352
assert.are.same(expectedHeaders, actualHeaders)
353353
end)
