Commit bc90c5a

Benchmarks scaffolding & Nil/Bool strategies
1 parent 5ae5412 commit bc90c5a

8 files changed (+161, -28 lines)


Package.swift

Lines changed: 2 additions & 0 deletions
```diff
@@ -8,10 +8,12 @@ let package = Package(
     ],
     products: [
         .library(name: "CodableCSV", targets: ["CodableCSV"]),
+        // .executable(name: "CodableCSV-Benchmarks", targets: ["CodableCSV-Benchmarks"])
     ],
     dependencies: [],
     targets: [
         .target(name: "CodableCSV", dependencies: [], path: "sources"),
         .testTarget(name: "CodableCSVTests", dependencies: ["CodableCSV"], path: "tests"),
+        // .target(name: "CodableCSV-Benchmarks", dependencies: ["CodableCSV"], path: "benchmarks"),
     ]
 )
```

README.md

Lines changed: 32 additions & 5 deletions
````diff
@@ -99,7 +99,7 @@ A `CSVReadder` parses CSV data from a given input (`String`, or `Data`, or file)
 """
 let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
 
-let headers = reader.headers // ["numA", "numB", "numC"]
+let headers = reader.headers // ["numA", "numB", "numC"]
 let rowA = try reader.readRow() // ["1", "2", "3"]
 let rowB = try reader.readRow() // ["4", "5", "6"]
 ```
@@ -316,7 +316,7 @@ let result = try decoder.decode(CustomType.self, from: data)
 
 ```swift
 let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
-let content: [Student] = try decoder([Student].self, from: URL("~/Desktop/Student.csv"))
+let content: [Student] = try decoder.decode([Student].self, from: URL("~/Desktop/Student.csv"))
 ```
 
 If you are dealing with a big CSV file, it is preferred to used direct file decoding, a `.sequential` or `.unrequested` buffering strategy, and set *presampling* to false; since then memory usage is drastically reduced.
@@ -325,17 +325,21 @@ If you are dealing with a big CSV file, it is preferred to used direct file deco
 
 The decoding process can be tweaked by specifying configuration values at initialization time. `CSVDecoder` accepts the [same configuration values as `CSVReader`](#Reader-configuration) plus the following ones:
 
+- `nilStrategy` (default: `.empty`) indicates how the `nil` *concept* (absence of value) is represented on the CSV.
+
+- `boolStrategy` (default: `.insensitive`) defines how strings are decoded to `Bool` values.
+
 - `floatStrategy` (default `.throw`) defines how to deal with non-conforming floating-point numbers (e.g. `NaN`).
 
 - `decimalStrategy` (default `.locale`) indicates how strings are decoded to `Decimal` values.
 
-- `dataStrategy` (default `.deferredToDate`) specify how strings are decoded to `Date` values.
+- `dateStrategy` (default `.deferredToDate`) specify how strings are decoded to `Date` values.
 
 - `dataStrategy` (default `.base64`) indicates how strings are decoded to `Data` values.
 
 - `bufferingStrategy` (default `.keepAll`) controls the behavior of `KeyedDecodingContainer`s.
 
-Selecting a buffering strategy directly affect the the decoding performance and the amount of memory used during the process. For more information check this README's [Tips using `Codable`](#Tips-using-codable) section and the [`Strategy.DecodingBuffer` definition](sources/Codable/Decodable/DecodingStrategy.swift).
+Selecting a buffering strategy affects the the decoding performance and the amount of memory used during the process. For more information check this README's [Tips using `Codable`](#Tips-using-codable) section and the [`Strategy.DecodingBuffer` definition](sources/Codable/Decodable/DecodingStrategy.swift).
 
 The configuration values can be set during `CSVDecoder` initialization or at any point before the `decode` function is called.
 
@@ -377,6 +381,10 @@ If you are dealing with a big CSV content, it is preferred to use direct file en
 
 The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the [same configuration values as `CSVWriter`](#Writer-configuration) plus the following ones:
 
+- `nilStrategy` (default: `.empty`) indicates how the `nil` *concept* (absence of value) is represented on the CSV.
+
+- `boolStrategy` (default: `.deferredToString`) defines how Boolean values are encoded to `String` values.
+
 - `floatStrategy` (default `.throw`) defines how to deal with non-conforming floating-point numbers (e.g. `NaN`).
 
 - `decimalStrategy` (default `.locale`) indicates how decimal numbers are encoded to `String` values.
@@ -412,7 +420,7 @@ encoder.dataStrategy = .custom { (data, encoder) in
 </p></details>
 </ul>
 
-## Tips using `Codable`
+### Tips using `Codable`
 
 `Codable` is fairly easy to use and most Swift standard library types already conform to it. However, sometimes it is tricky to get custom types to comply to `Codable` for specific functionality. That is why I am leaving here some tips and advices concerning its usage:
 
@@ -597,3 +605,22 @@ struct Student: Codable {
 <p align="center">
 <img src="docs/Assets/Roadmap.svg" alt="Roadmap"/>
 </p>
+
+The library has been heavily documented and any contribution is welcome. Please take a look at the [How to contribute](docs/CONTRIBUTING.md) document or peer into a more detailed roadmap on the [Github projects](https://github.com/dehesa/CodableCSV/projects).
+
+### Community
+
+If `CodableCSV` is not of your liking, the Swift community has other CSV solutions:
+- [CSV.swift](https://github.com/yaslab/CSV.swift) is a simpler library with a focus on conforming to the [RFC4180](https://tools.ietf.org/html/rfc4180) standard.
+
+It offers a great imperative CSV reader/writer and a row decoder. However, it lacks configurability (such as custom field/row delimiters, escaping scalar selection, presampling, etc.) and a CSV encoder. It doesn't support whole CSV file decoding and it doesn't mirror Foundation's JSON & PLIST decoder/encoder APIs.
+
+- [SwiftCSV](https://github.com/swiftcsv/SwiftCSV) is an older/popular CSV parse-only library.
+
+It offers a well-tested imperative CSV parser with a slower development cycle. It lacks an imperative writer, an encoder, a decoder, and parsing configuration values.
+
+- [SwiftCSVExport](https://github.com/vigneshuvi/SwiftCSVExport) reads/writes CSV imperatively with great Objective-C support.
+
+It offers an imperative CSV reader/writer relying on the Objective-C toolchain, which makes it great to use on Objective-C project.
+
+There are many good tools outside the Swift community. Since writing them all would be a hard task, I will just point you to the great [AwesomeCSV](https://github.com/secretGeek/awesomeCSV) github repo. Take it a look! There are a lot of treasures to be found there.
````

benchmarks/main.swift

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+import Foundation
+import CodableCSV
+
+print("Benchmarks go here")
```
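The benchmark target is only a placeholder in this commit. As an illustration of what could eventually replace the `print` call, here is a minimal timing-harness sketch. `measureMean(iterations:_:)` and the naive comma-splitting workload are hypothetical stand-ins, not part of CodableCSV: a real benchmark would time `CSVReader`/`CSVDecoder` calls instead, and the split below ignores quoting and escaping.

```swift
import Dispatch

/// Hypothetical helper (not part of CodableCSV): runs `body` `iterations` times
/// and returns the mean wall-clock duration in nanoseconds.
func measureMean(iterations: Int, _ body: () throws -> Void) rethrows -> Double {
    precondition(iterations > 0, "At least one iteration is required")
    var total: UInt64 = 0
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        total += DispatchTime.now().uptimeNanoseconds - start
    }
    return Double(total) / Double(iterations)
}

// Stand-in workload: a synthetic 1,000-row, 3-column CSV split naively
// (no quoting/escaping support), in place of a real CodableCSV call.
let csv = (0..<1_000).map { "\($0),\($0 * 2),\($0 * 3)" }.joined(separator: "\n")
var fieldCount = 0
let meanNanoseconds = measureMean(iterations: 10) {
    fieldCount = csv.split(separator: "\n").reduce(0) { $0 + $1.split(separator: ",").count }
}
print("fields: \(fieldCount), mean: \(meanNanoseconds) ns")
```

A mean over a handful of runs is crude; adding warm-up iterations or reporting the minimum are common refinements.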

sources/Codable/Decodable/Containers/SingleValueDecodingContainer.swift

Lines changed: 16 additions & 6 deletions
```diff
@@ -58,16 +58,26 @@ extension ShadowDecoder.SingleValueContainer {
     }
 
     func decodeNil() -> Bool {
-        (try? self.lowlevelDecode { $0.isEmpty }) ?? false
+        switch self.decoder.source.configuration.nilStrategy {
+        case .empty: return (try? self.lowlevelDecode { $0.isEmpty }) ?? false
+        case .custom(let closure): return closure(self.decoder)
+        }
     }
 
     func decode(_ type: Bool.Type) throws -> Bool {
-        try self.lowlevelDecode {
-            switch $0.uppercased() {
-            case "TRUE", "YES": return true
-            case "FALSE", "NO", "": return false
-            default: return nil
+        switch self.decoder.source.configuration.boolStrategy {
+        case .deferredToBool:
+            return try self.lowlevelDecode { Bool($0) }
+        case .insensitive:
+            return try self.lowlevelDecode {
+                switch $0.uppercased() {
+                case "TRUE", "YES": return true
+                case "FALSE", "NO", "": return false
+                default: return nil
+                }
             }
+        case .custom(let closure):
+            return try closure(self.decoder)
         }
     }
 
```
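The new `decode(_: Bool.Type)` body dispatches on `boolStrategy`. The standalone sketch below reproduces that three-way dispatch on a plain `String` so it runs without CodableCSV. `BoolDecodingSketch`, `InvalidBool`, and `decodeBool(_:strategy:)` are hypothetical names; in the library the `.custom` closure receives a `Decoder` rather than the raw field, and unsupported values are reported by returning `nil` from the `lowlevelDecode` closure instead of throwing a custom error.

```swift
/// Hypothetical mirror of `Strategy.BoolDecoding`; the closure receives the raw field.
enum BoolDecodingSketch {
    case deferredToBool
    case insensitive
    case custom((String) throws -> Bool)
}

struct InvalidBool: Error { let field: String }

/// Decodes a CSV field to `Bool`, following the same three branches as the diff.
func decodeBool(_ field: String, strategy: BoolDecodingSketch) throws -> Bool {
    switch strategy {
    case .deferredToBool:
        // `Bool.init?(_:)` accepts only the exact strings "true" and "false".
        guard let value = Bool(field) else { throw InvalidBool(field: field) }
        return value
    case .insensitive:
        switch field.uppercased() {
        case "TRUE", "YES": return true
        case "FALSE", "NO", "": return false
        default: throw InvalidBool(field: field)
        }
    case .custom(let closure):
        return try closure(field)
    }
}

let relaxed = try decodeBool("Yes", strategy: .insensitive)
let emptyIsFalse = try decodeBool("", strategy: .insensitive)
let strict = try? decodeBool("TRUE", strategy: .deferredToBool)
let binary = try decodeBool("1", strategy: .custom { field in
    switch field {
    case "1": return true
    case "0": return false
    default: throw InvalidBool(field: field)
    }
})
print(relaxed, emptyIsFalse, strict as Any, binary)
```

Note that `.insensitive` maps an empty field to `false`, so under that strategy an empty cell in a non-optional `Bool` column is not an error.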

sources/Codable/Decodable/DecoderConfiguration.swift

Lines changed: 61 additions & 8 deletions
```diff
@@ -5,6 +5,10 @@ extension CSVDecoder {
     @dynamicMemberLookup public struct Configuration {
         /// The underlying `CSVReader` configurations.
         @usableFromInline private(set) internal var readerConfiguration: CSVReader.Configuration
+        /// The strategy to use when decoding a `nil` representation.
+        public var nilStrategy: Strategy.NilDecoding
+        /// The strategy to use when decoding Boolean values.
+        public var boolStrategy: Strategy.BoolDecoding
         /// The strategy to use when dealing with non-conforming numbers.
         public var floatStrategy: Strategy.NonConformingFloat
         /// The strategy to use when decoding decimal values.
@@ -19,6 +23,8 @@ extension CSVDecoder {
         /// Designated initializer setting the default values.
         public init() {
             self.readerConfiguration = .init()
+            self.nilStrategy = .empty
+            self.boolStrategy = .insensitive
             self.floatStrategy = .throw
             self.decimalStrategy = .locale(nil)
             self.dateStrategy = .deferredToDate
@@ -40,12 +46,49 @@ extension CSVDecoder.Configuration {
 // MARK: -
 
 extension Strategy {
+    /// The strategy to use for decoding `nil` representations.
+    public enum NilDecoding {
+        /// An empty string is considered a `nil` value.
+        ///
+        /// An empty string can be both the absence of characters between field delimiters and an empty escaped field (e.g. `""`).
+        case empty
+        /// Decodes the `nil` as a custom value decoded by the given closure.
+        /// - parameter decoding: Function receiving the CSV decoder used to parse a custom `nil` value.
+        /// - parameter decoder: The decoder on which to fetch a single value container to obtain the underlying `String` value.
+        /// - returns: Boolean indicating whether the encountered value was a `nil` representation. If the value is not supported, return `false`.
+        case custom(_ decoding: (_ decoder: Decoder) -> Bool)
+    }
+
+    /// The strategy to use for decoding `Bool` values.
+    public enum BoolDecoding {
+        /// Defer to `Bool`'s `LosslessStringConvertible` initializer.
+        ///
+        /// For a value to be considered `true` or `false`, it must be a string with the exact value of `"true"` or `"false"`.
+        case deferredToBool
+        /// Decodes a Boolean from an underlying string value by transforming `true`/`false` and `yes`/`no` disregarding case sensitivity.
+        ///
+        /// The value: `True`, `TRUE`, `TruE` or `YES`are accepted.
+        case insensitive
+        /// Decodes the `Bool` as a custom value decoded by the given closure.
+        ///
+        /// If the closure fails to decode a value from the given decoder, the error will be bubled up.
+        /// - parameter decoding: Function receiving the CSV decoder used to parse a custom `Bool` value.
+        /// - parameter decoder: The decoder on which to fetch a single value container to obtain the underlying `String` value.
+        /// - returns: Boolean value decoded from the underlying storage.
+        case custom(_ decoding: (_ decoder: Decoder) throws -> Bool)
+    }
+
     /// The strategy to use for decoding `Decimal` values.
     public enum DecimalDecoding {
         /// The locale used to interpret the number (specifically `decimalSeparator`).
         case locale(Locale? = nil)
         /// Decode the `Decimal` as a custom value decoded by the given closure.
-        case custom((_ decoder: Decoder) throws -> Decimal)
+        ///
+        /// If the closure fails to decode a value from the given decoder, the error will be bubled up.
+        /// - parameter decoding: Function receiving the CSV decoder used to parse a custom `Decimal` value.
+        /// - parameter decoder: The decoder on which to fetch a single value container to obtain the underlying `String` value.
+        /// - returns: `Decimal` value decoded from the underlying storage.
+        case custom(_ decoding: (_ decoder: Decoder) throws -> Decimal)
     }
 
     /// The strategy to use for decoding `Date` values.
@@ -61,7 +104,12 @@ extension Strategy {
         /// Decode the `Date` as a string parsed by the given formatter.
         case formatted(DateFormatter)
         /// Decode the `Date` as a custom value decoded by the given closure.
-        case custom((_ decoder: Decoder) throws -> Date)
+        ///
+        /// If the closure fails to decode a value from the given decoder, the error will be bubled up.
+        /// - parameter decoding: Function receiving the CSV decoder used to parse a custom `Date` value.
+        /// - parameter decoder: The decoder on which to fetch a single value container to obtain the underlying `String` value.
+        /// - returns: `Date` value decoded from the underlying storage.
+        case custom(_ decoding: (_ decoder: Decoder) throws -> Date)
     }
 
     /// The strategy to use for decoding `Data` values.
@@ -71,7 +119,12 @@ extension Strategy {
         /// Decode the `Data` from a Base64-encoded string.
         case base64
         /// Decode the `Data` as a custom value decoded by the given closure.
-        case custom((_ decoder: Decoder) throws -> Data)
+        ///
+        /// If the closure fails to decode a value from the given decoder, the error will be bubled up.
+        /// - parameter decoding: Function receiving the CSV decoder used to parse a custom `Data` value.
+        /// - parameter decoder: The decoder on which to fetch a single value container to obtain the underlying `String` value.
+        /// - returns: `Data` value decoded from the underlying storage.
+        case custom(_ decoding: (_ decoder: Decoder) throws -> Data)
     }
 
     /// Indication of how many rows are cached for reuse by the decoder.
@@ -86,15 +139,15 @@ extension Strategy {
         /// Forward/Backwards decoding jumps are allowed. A row that has been previously decoded can be decoded again.
         /// - remark: This strategy consumes the largest amount of memory from all the supported options.
         case keepAll
+        // /// Only CSV fields that have been decoded but not requested by the user are being kept in memory.
+        // ///
+        // /// *Keyed containers* can be used to read rows/fields unordered. However, previously requested rows cannot be requested again or an error will be thrown.
+        // /// - remark: This strategy tries to keep the cache to a minimum, but memory usage may be big if the user doesn't request intermediate rows.
+        // case unrequested
         /// No rows are kept in memory (except for the CSV row being decoded at the moment)
         ///
         /// *Keyed containers* can be used, but at a file-level any forward jump will discard the in-between rows. At a row-level *keyed containers* may still be used for random-order reading.
         /// - remark: This strategy provides the smallest usage of memory from them all.
         case sequential
-        /// Only CSV fields that have been decoded but not requested by the user are being kept in memory.
-        ///
-        /// *Keyed containers* can be used to read rows/fields unordered. However, previously requested rows cannot be requested again or an error will be thrown.
-        /// - remark: This strategy tries to keep the cache to a minimum, but memory usage may be big if the user doesn't request intermediate rows.
-        // case unrequested
     }
 }
```
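`Strategy.NilDecoding.custom` lets a CSV use sentinel values such as `NULL` for absent data. The sketch below mirrors that idea on plain strings, assuming a hypothetical `NilDecodingSketch` enum whose closure receives the raw field (the real closure receives a `Decoder` and reads the field through a single-value container). As in `Optional`'s `Decodable` conformance, the nil check runs before the wrapped value is parsed; `decodeOptionalInt` and `NotAnInteger` are illustrative names only.

```swift
/// Hypothetical mirror of `Strategy.NilDecoding`; the closure receives the raw field.
enum NilDecodingSketch {
    case empty
    case custom((String) -> Bool)

    /// Whether `field` represents the absence of a value.
    func isNil(_ field: String) -> Bool {
        switch self {
        case .empty: return field.isEmpty
        case .custom(let closure): return closure(field)
        }
    }
}

struct NotAnInteger: Error { let field: String }

/// Decodes an optional integer field, checking for `nil` before parsing the value.
func decodeOptionalInt(_ field: String, nilStrategy: NilDecodingSketch) throws -> Int? {
    if nilStrategy.isNil(field) { return nil }
    guard let value = Int(field) else { throw NotAnInteger(field: field) }
    return value
}

// Treat a few common sentinels as "no value".
let sentinels: Set<String> = ["NULL", "N/A", "-"]
let sentinelStrategy = NilDecodingSketch.custom { sentinels.contains($0) }

let column = ["12", "NULL", "7", "-"]
let values = try column.map { try decodeOptionalInt($0, nilStrategy: sentinelStrategy) }

// Under `.custom`, an empty field is no longer treated as nil, so parsing it fails.
var emptyFieldRejected = false
do {
    _ = try decodeOptionalInt("", nilStrategy: sentinelStrategy)
} catch {
    emptyFieldRejected = true
}
print(values, emptyFieldRejected)
```

Judging by the `decodeNil()` switch in this commit, `.custom` replaces the empty-string check rather than adding to it; the sketch reproduces that, so a closure that should also accept empty fields has to test for them explicitly.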

sources/Codable/Decodable/Internal/Source.swift

Lines changed: 2 additions & 2 deletions
```diff
@@ -59,10 +59,10 @@ extension ShadowDecoder {
             self.field = { [unowned buffer = self.buffer, unowned reader = self.reader] in
                 var result: [String]
                 var nextIndex = reader.rowIndex
-                // C.1. Is the requested row in the buffer? (only the row right before the writer's pointer shall be in teh buffer).
+                // C.1. Is the requested row in the buffer? (only the row right before the writer's pointer shall be in the buffer).
                 if $0 == nextIndex-1 {
                     result = try buffer.fetch(at: $0) ?! DecodingError.expiredCache(rowIndex: $0, fieldIndex: $1)
-                // C.2. Is the user trying to queried a previously decoded row?
+                // C.2. Is the user trying to query a previously decoded row?
                 } else if $0 < nextIndex-1 {
                     throw DecodingError.expiredCache(rowIndex: $0, fieldIndex: $1)
                 // C.3. Is the row further along?
```
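The comments in this hunk outline how a `.sequential` buffer answers a row request: serve the row just read (C.1), reject rows that were already discarded (C.2), or read forward (C.3). A minimal standalone sketch of that logic follows. `SequentialRowSource` and `SequentialBufferError` are hypothetical, an in-memory iterator stands in for `CSVReader`, and the C.3 branch simply discards the in-between rows, as the `.sequential` documentation in `DecoderConfiguration.swift` describes; the real implementation past C.3 is not shown in this hunk.

```swift
/// Hypothetical error mirroring the `expiredCache` case thrown in this hunk.
enum SequentialBufferError: Error, Equatable {
    case expiredCache(rowIndex: Int)
    case outOfBounds(rowIndex: Int)
}

/// Sketch of a `.sequential`-style row source: only the most recently read row
/// is cached, so rows before it can no longer be requested.
struct SequentialRowSource {
    private var rows: IndexingIterator<[[String]]>   // stand-in for a `CSVReader`
    private(set) var nextIndex = 0                  // index of the next unread row
    private var lastRow: [String]?                  // the row at `nextIndex - 1`, if any

    init(rows: [[String]]) { self.rows = rows.makeIterator() }

    mutating func row(at index: Int) throws -> [String] {
        if index == nextIndex - 1 {
            // C.1: the requested row is the one right before the reader's position.
            guard let row = lastRow else { throw SequentialBufferError.expiredCache(rowIndex: index) }
            return row
        } else if index < nextIndex - 1 {
            // C.2: earlier rows have already been discarded.
            throw SequentialBufferError.expiredCache(rowIndex: index)
        }
        // C.3: read forward, discarding the in-between rows.
        while nextIndex <= index {
            guard let row = rows.next() else { throw SequentialBufferError.outOfBounds(rowIndex: index) }
            lastRow = row
            nextIndex += 1
        }
        // The loop ran at least once because `index >= nextIndex` on entry.
        guard let row = lastRow else { throw SequentialBufferError.outOfBounds(rowIndex: index) }
        return row
    }
}

var source = SequentialRowSource(rows: [["1", "2"], ["3", "4"], ["5", "6"]])
let third = try source.row(at: 2)   // forward jump: row 1 is discarded
let again = try source.row(at: 2)   // C.1: still cached
var rewindError: SequentialBufferError?
do {
    _ = try source.row(at: 0)
} catch {
    rewindError = error as? SequentialBufferError
}
print(third, again, rewindError as Any)
```

Keeping a single cached row is what makes this strategy cheap in memory, and also why rewinding to row 0 fails after the forward jump.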

sources/Codable/Encodable/Containers/SingleValueEncodingContainer.swift

Lines changed: 8 additions & 2 deletions
```diff
@@ -50,11 +50,17 @@ extension ShadowEncoder.SingleValueContainer {
     }
 
     mutating func encodeNil() throws {
-        try self.lowlevelEncoding { String() }
+        switch self.encoder.sink.configuration.nilStrategy {
+        case .empty: try self.lowlevelEncoding { String() }
+        case .custom(let closure): try closure(self.encoder)
+        }
     }
 
     mutating func encode(_ value: Bool) throws {
-        try self.lowlevelEncoding { String(value) }
+        switch self.encoder.sink.configuration.boolStrategy {
+        case .deferredToString: try self.lowlevelEncoding { String(value) }
+        case .custom(let closure): return try closure(value, self.encoder)
+        }
     }
 
     mutating func encode(_ value: Int) throws {
```
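On the encoding side, `encodeNil()` and `encode(_: Bool)` now switch on `nilStrategy` and `boolStrategy`. Below is a standalone sketch of the same dispatch that produces the field string directly. `BoolEncodingSketch`, `NilEncodingSketch`, and `encodeField` are hypothetical; unlike the library's closures, which receive the `Encoder` and write through it, these closures simply return the text to write.

```swift
/// Hypothetical mirror of the encoder's `boolStrategy`; the closure returns the field text.
enum BoolEncodingSketch {
    case deferredToString
    case custom((Bool) -> String)
}

/// Hypothetical mirror of the encoder's `nilStrategy`.
enum NilEncodingSketch {
    case empty
    case custom(() -> String)
}

/// Produces the CSV field for an optional Boolean according to both strategies.
func encodeField(_ value: Bool?, boolStrategy: BoolEncodingSketch, nilStrategy: NilEncodingSketch) -> String {
    guard let value = value else {
        switch nilStrategy {
        case .empty: return ""                      // like `String()` in `encodeNil()`
        case .custom(let closure): return closure()
        }
    }
    switch boolStrategy {
    case .deferredToString: return String(value)    // "true" or "false"
    case .custom(let closure): return closure(value)
    }
}

let flags: [Bool?] = [true, false, nil]
let defaultRow = flags.map { encodeField($0, boolStrategy: .deferredToString, nilStrategy: .empty) }
let customRow = flags.map { flag in
    encodeField(flag, boolStrategy: .custom { $0 ? "1" : "0" }, nilStrategy: .custom { "NULL" })
}
print(defaultRow, customRow)
```

Pairing a custom Boolean encoding with a matching `.custom` decoding closure keeps a round trip lossless, since the default `.insensitive` decoder does not recognize `"1"`/`"0"`.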
