
Commit 4ed8f45

Decoding API rework and README restructure
1 parent d4fda6d commit 4ed8f45

15 files changed

+376
-193
lines changed

README.md

Lines changed: 235 additions & 39 deletions
Original file line number | Diff line number | Diff line change
@@ -27,63 +27,46 @@ There are two ways to use [CodableCSV](https://github.com/dehesa/CodableCSV):
2727
1. as an active row-by-row and field-by-field reader or writer.
2828
2. through Swift's `Codable` interface.
2929

30-
The _active entities_ (reference types) provide _imperative_ control on how to read or write CSV data.
30+
> `CodableCSV` can _encode to_ or _decode from_ `String`s, `Data` blobs, or CSV files (represented by `URL` addresses).
31+
32+
## Active Encoding/Decoding
33+
34+
The _active entities_ provide _imperative_ control on how to read or write CSV data.
3135

3236
<details><summary><code>CSVReader</code>.</summary><p>
3337

34-
A `CSVReadder` reads CSV data and lets you access each CSV row as an array of `String`s:
38+
A `CSVReader` parses CSV data from an input and returns each CSV row as an array of strings.
3539

36-
- row-by-row.
40+
- Row-by-row parsing.
3741

3842
```swift
39-
let reader = try CSVReader(fileURL: ...)
43+
let reader = try CSVReader(data: ...)
4044
while let row = try reader.parseRow() {
4145
// Do something with the row: [String]
4246
}
4347
```
4448

45-
- with `Sequence` syntax.
49+
- `Sequence` syntax parsing.
4650

4751
```swift
48-
let reader = try CSVReader(data: ...)
52+
let reader = try CSVReader(fileURL: ...)
4953
for row in reader {
5054
// Do something with the row: [String]
5155
}
5256
```
5357

5458
Please note the `Sequence` syntax (i.e. `IteratorProtocol`) doesn't throw errors; therefore if the CSV data is invalid, the previous code will crash your program. If you don't control the origin of the CSV data, use the `parseRow()` function instead.
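
The difference between the two styles can be sketched without the library. The `TinyReader` type below is a hypothetical stand-in (it is not how `CSVReader` is implemented, and its `fieldCount` check is an assumption made for illustration): its `parseRow()` throws on malformed input, while its `IteratorProtocol` conformance has no way to propagate the error and has to trap instead.

```swift
// Hypothetical stand-in for a row reader; not CodableCSV's implementation.
struct TinyReader: Sequence, IteratorProtocol {
    enum Error: Swift.Error { case invalidInput(line: Int) }

    private let lines: [Substring]
    private let fieldCount: Int
    private var index = 0

    init(string: String, fieldCount: Int) {
        self.lines = string.split(separator: "\n")
        self.fieldCount = fieldCount
    }

    /// Throwing API: the caller decides how to handle malformed rows.
    mutating func parseRow() throws -> [String]? {
        guard index < lines.count else { return nil }
        defer { index += 1 }
        let fields = lines[index]
            .split(separator: ",", omittingEmptySubsequences: false)
            .map(String.init)
        guard fields.count == fieldCount else { throw Error.invalidInput(line: index) }
        return fields
    }

    /// `IteratorProtocol.next()` cannot be marked `throws`, so an error becomes a crash.
    mutating func next() -> [String]? {
        try! parseRow()
    }
}

// The second line has one field instead of two, so `parseRow()` throws.
var untrusted = TinyReader(string: "a,b\nc\n", fieldCount: 2)
do {
    while let row = try untrusted.parseRow() { print(row) }
} catch {
    print("Invalid CSV:", error)
}
```

Iterating the same input with `for row in ...` would hit the `try!` and terminate the program, which is why the throwing function is the safer choice for data you don't control.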
5559

56-
### Reader Inputs
57-
58-
A `CSVReader` are able to read the following input sources:
59-
60-
- `String`.
61-
62-
```swift
63-
let reader = try CSVReader(string: "A,B,C\n D,E,F\n G,H,I\n")
64-
```
65-
66-
- `Data`.
67-
68-
```swift
69-
let reader = try CSVReader(data: Data(...))
70-
```
71-
72-
- A file `URL`.
60+
- Whole input parsing.
7361

7462
```swift
75-
let reader = try CSVReader(fileURL: URL(...))
63+
let file = try CSVReader.parse(string: ..., configuration: ...)
64+
// file is of type: (headers: [String], rows: [[String]])
7665
```
7766

78-
During initialization, an optional `Configuration` structure may be provided. These configuration values lets you tweak the parsing process.
79-
80-
```swift
81-
let reader = try CSVReader(data: ..., configuration: ...)
82-
```
83-
8467
### Reader Configuration
8568

86-
`CSVReader` accept the following configuration properties:
69+
`CSVReader` accepts the following configuration properties:
8770

8871
- `encoding` (default: `nil`) specifies the CSV file encoding.
8972

@@ -97,49 +80,262 @@ let reader = try CSVReader(data: ..., configuration: ...)
9780

9881
CSV files may contain an optional header row at the very beginning. This configuration value lets you specify whether the file has a header row or not, or whether you want the library to figure it out.
9982

100-
- `trimStrategy` (default: `.none`) trims the given characters at the beginning and end of each parsed row and field.
83+
- `trimStrategy` (default: empty set) trims the given characters at the beginning and end of each parsed field.
84+
85+
The trim characters are applied to both escaped and unescaped fields.
10186

10287
- `presample` (default: `false`) indicates whether the CSV data should be completely loaded into memory before parsing begins.
10388

10489
Loading all data into memory may provide faster iteration for small to medium size files, since you get rid of the overhead of managing an `InputStream`.
10590

106-
There is a convenience initializer letting you specify configuration values within a closure during initialization:
91+
The configuration values are only set during initialization and can be passed to the `CSVReader` instance through a structure or with a convenience closure syntax:
10792

10893
```swift
10994
let reader = try CSVReader(data: ...) {
11095
$0.encoding = .utf8
111-
$0.delimiters.row = "~"
96+
$0.delimiters.row = "\r\n"
11297
$0.headerStrategy = .firstLine
11398
$0.trimStrategy = .whitespaces
11499
}
115100
```
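
One way such a closure-based convenience initializer could be built is sketched below. `Settings` and `Parser` are hypothetical names used only for illustration; the library's actual types, property names, and defaults may differ.

```swift
// Hypothetical types illustrating the pattern; not the library's declarations.
struct Settings {
    var fieldDelimiter: String = ","
    var rowDelimiter: String = "\n"
    var hasHeader: Bool = false
}

final class Parser {
    /// Fixed once the instance has been initialized.
    let settings: Settings

    /// Designated initializer taking a fully formed structure.
    init(settings: Settings = Settings()) {
        self.settings = settings
    }

    /// Convenience initializer: start from the defaults and let the closure tweak them.
    convenience init(_ setter: (inout Settings) -> Void) {
        var settings = Settings()
        setter(&settings)
        self.init(settings: settings)
    }
}

// Only the values that differ from the defaults need to be mentioned.
let parser = Parser {
    $0.rowDelimiter = "\r\n"
    $0.hasHeader = true
}
```

Because the closure receives the structure as an `inout` parameter, both entry points end up storing the same kind of value, and nothing can change it after initialization.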
116101

117-
</details>
102+
</p></details>
118103

119104
<details><summary><code>CSVWriter</code>.</summary><p>
120105

121106
#warning("Complete me")
122107

123-
</details>
108+
</p></details>
109+
110+
## Swift's `Codable`
124111

125-
The CSV encoders/decoders provided by this library let you use Swift's `Codable` declarative approach.
112+
The encoders/decoders provided by this library let you use Swift's `Codable` declarative approach to encode/decode CSV data.
126113

127114
<details><summary><code>CSVDecoder</code>.</summary><p>
128115

116+
`CSVDecoder` transforms CSV data into a Swift type conforming to `Decodable`. Decoding only requires creating a decoder instance and calling its `decode` function with the `Decodable` type and the input data.
117+
129118
```swift
130119
let decoder = CSVDecoder()
131-
decoder.delimiters = (.comma, .lineFeed)
132120
let result = try decoder.decode(CustomType.self, from: data)
133121
```
134122

135-
</details>
123+
### Decoder Configuration
124+
125+
The decoding process can be tweaked by specifying configuration values at initialization time. `CSVDecoder` accepts the [same configuration values as `CSVReader`](#Reader-Configuration) plus the following ones:
126+
127+
- `floatStrategy` (default: `.throw`) defines how to deal with non-conforming floating-point numbers (such as `NaN`, or `+Infinity`).
128+
129+
- `decimalStrategy` (default: `.locale(nil)`) indicates how decimal numbers are decoded (from `String` to `Decimal` value).
130+
131+
- `dateStrategy` (default: `.deferredToDate`) specifies the strategy to use when decoding dates.
132+
133+
- `dataStrategy` (default: `.base64`) specifies the strategy to use when decoding data blobs.
134+
135+
- `bufferingStrategy` (default: `.keepAll`) tells the decoder how to cache previously decoded CSV rows.
136+
137+
Caching rows allows random access through `KeyedDecodingContainer`s.
138+
139+
The configuration values can be set during `CSVDecoder` initialization or at any point before the `decode` function is called.
140+
141+
```swift
142+
let decoder = CSVDecoder {
143+
$0.encoding = .utf8
144+
$0.delimiters.field = "\t"
145+
$0.headerStrategy = .firstLine
146+
$0.bufferingStrategy = .ordered
147+
}
148+
149+
decoder.decimalStrategy = .custom {
150+
let value = try Float(from: $0)
151+
return Decimal(value)
152+
}
153+
```
154+
155+
</p></details>
136156

137157
<details><summary><code>CSVEncoder</code>.</summary><p>
138158

139159
#warning("Complete me")
140160

161+
</p></details>
162+
163+
## Tips Using `Codable`
164+
165+
`Codable` is fairly easy to use and most Swift standard library types already conform to it. However, it is sometimes tricky to make custom types conform to `Codable` for very specific functionality. That is why I am leaving here some tips and advice concerning its usage:
166+
167+
<details><summary>Basic adoption.</summary><p>
168+
169+
`Codable` is just a type alias for `Decodable` and `Encodable`. When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to an external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for [JSON and Property Lists](https://developer.apple.com/documentation/foundation/archives_and_serialization), but the community provides many other formats, such as: [YAML](https://github.com/jpsim/Yams), [XML](https://github.com/MaxDesiatov/XMLCoder), [BSON](https://github.com/OpenKitten/BSON), and CSV (through this library).
170+
171+
Let's see a regular CSV encoding/decoding usage through `Codable`'s interface. Suppose we have a list of students formatted in a CSV file:
172+
173+
```swift
174+
let data = """
175+
name,age,hasPet
176+
John,22,true
177+
Marine,23,false
178+
Alta,24,true
179+
"""
180+
```
181+
182+
In Swift, a _student_ has the following structure:
183+
184+
```swift
185+
struct Student: Codable {
186+
var name: String
187+
var age: Int
188+
var hasPet: Bool
189+
}
190+
```
191+
192+
To decode the CSV data, we just need to create a decoder and call `decode` on it passing the given data.
193+
194+
```swift
195+
let decoder = CSVDecoder { $0.headerStrategy = .firstLine }
196+
let students = try decoder.decode([Student].self, from: data)
197+
```
198+
199+
The inverse process (from Swift to CSV) is very similar (and simple).
200+
201+
```swift
202+
let encoder = CSVEncoder { $0.headerStrategy = .firstLine }
203+
let newData = try encoder.encode(students)
204+
```
205+
206+
</p></details>
207+
208+
<details><summary>Specific behavior for CSV data.</summary><p>
209+
210+
When encoding/decoding CSV data, it is important to keep several points in mind:
211+
212+
</p>
213+
<ul>
214+
<details><summary>Default behavior requires a CSV with a headers row.</summary><p>
215+
216+
The default behavior (i.e. not including `init(from:)` and `encode(to:)`) relies on the existence of the synthesized `CodingKey`s whose `stringValue`s are the property names. For these properties to match any CSV field, the CSV data must contain a _headers row_ at the very beginning. If your CSV doesn't contain a _headers row_, you can specify coding keys with integer values representing the field index.
217+
218+
```swift
219+
struct Student: Codable {
220+
var name: String
221+
var age: Int
222+
var hasPet: Bool
223+
224+
private enum CodingKeys: Int, CodingKey {
225+
case name = 0
226+
case age = 1
227+
case hasPet = 2
228+
}
229+
}
230+
```
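
To see what the integer raw values provide, the standalone sketch below inspects the keys directly. For a `CodingKey` enum backed by `Int`, Swift provides both `intValue` (the raw value) and `stringValue` (the case name), so a decoder can look a field up by index when there is no headers row. `FieldKeys` is an illustrative name mirroring the `CodingKeys` above, and the manual lookup is only an assumption about how a decoder might use the index.

```swift
// Illustrative keys; the raw value is the zero-based field index.
enum FieldKeys: Int, CodingKey, CaseIterable {
    case name = 0
    case age = 1
    case hasPet = 2
}

// A row from a CSV without headers, already split into fields.
let row = ["John", "22", "true"]

for key in FieldKeys.allCases {
    // `intValue` is non-nil for Int-backed keys and equals the raw value.
    if let index = key.intValue {
        print("\(key.stringValue) -> \(row[index])")
    }
}
```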
231+
232+
</p></details>
233+
<details><summary>A CSV is a long list of records/rows.</summary><p>
234+
235+
CSV formatted data is commonly used with flat hierarchies (e.g. a list of students, a list of car models, etc.). Nested structures, such as the ones found in JSON files, are not supported by default in CSV implementations (e.g. a list of users, where each user has a list of services she uses, and each service has a list of the user's configuration values).
236+
237+
You can definitely support complex structures in CSV, but you would have to flatten the hierarchy in a single model or build a custom encoding/decoding process. This process would make sure there is always a maximum of two keyed/unkeyed containers.
238+
239+
As an example, we can create a nested structure for a school with students who own pets.
240+
241+
```swift
242+
struct School: Codable {
243+
let students: [Student]
244+
}
245+
246+
struct Student: Codable {
247+
var name: String
248+
var age: Int
249+
var pet: Pet
250+
}
251+
252+
struct Pet: Codable {
253+
var nickname: String
254+
var gender: Gender
255+
256+
enum Gender: Codable {
257+
case male, female
258+
}
259+
}
260+
```
261+
262+
By default the previous example wouldn't work. If you want to keep the nested structure, you need to write a custom `init(from:)` implementation (to support `Decodable`).
263+
264+
```swift
265+
extension School {
266+
init(from decoder: Decoder) throws {
267+
var container = try decoder.unkeyedContainer()
268+
var students: [Student] = []
268+
while !container.isAtEnd { students.append(try container.decode(Student.self)) }
269+
self.students = students
271+
}
272+
}
273+
274+
extension Student {
275+
init(from decoder: Decoder) throws {
276+
let container = try decoder.container(keyedBy: RowKeys.self)
277+
self.name = try container.decode(String.self, forKey: .name)
278+
self.age = try container.decode(Int.self, forKey: .age)
279+
self.pet = try decoder.singleValueContainer().decode(Pet.self)
280+
}
281+
}
282+
283+
extension Pet {
284+
init(from decoder: Decoder) throws {
285+
let container = try decoder.container(keyedBy: RowKeys.self)
286+
self.nickname = try container.decode(String.self, forKey: .nickname)
287+
self.gender = try container.decode(Gender.self, forKey: .gender)
288+
}
289+
}
290+
291+
extension Pet.Gender {
292+
init(from decoder: Decoder) throws {
293+
var container = try decoder.singleValueContainer()
294+
self = try container.decode(Int.self) == 1 ? .male : .female
295+
}
296+
}
297+
298+
private enum RowKeys: Int, CodingKey {
299+
case name = 0
300+
case age = 1
301+
case nickname = 2
302+
case gender = 3
303+
}
304+
```
305+
306+
You could avoid the overhead of writing these initializers by defining a flat structure such as:
307+
308+
```swift
309+
struct Student: Codable {
310+
var name: String
311+
var age: Int
312+
var nickname: String
313+
var gender: Gender
314+
315+
enum Gender: Int, Codable {
316+
case male = 1
317+
case female = 2
318+
}
319+
}
320+
```
321+
322+
</p></details>
323+
</ul>
324+
141325
</details>
142326

327+
<details><summary>Configuration values and encoding/decoding strategies.</summary><p>
328+
329+
#warning("Complete me")
330+
331+
</p></details>
332+
333+
<details><summary>Performance advice.</summary><p>
334+
335+
#warning("Complete me")
336+
337+
</p></details>
338+
143339
# Roadmap
144340

145341
<p align="center">

Sources/Active/Reader/Reader.swift

Lines changed: 1 addition & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,4 @@
11
import Foundation
2-
#warning("CSVReader.header was previously and optional. Verify changes up the chain")
32
/// Reads CSV text data row-by-row.
43
///
54
/// The `CSVReader` is a sequential reader. It reads each line only once (i.e. it cannot re-read a previous CSV row).
@@ -265,7 +264,7 @@ fileprivate extension CSVReader {
265264
///
266265
/// When this function is executed, the quote opening the "escaped field" has already been read.
267266
/// - parameter rowIndex: The index of the row being parsed.
268-
/// - throws: `CSVReader.Error.invalidInput` exclusively.
267+
/// - throws: `CSVReader.Error` exclusively.
269268
/// - returns: The parsed field and whether the row/file ending characters have been found.
270269
private func parseEscapedField(rowIndex: Int) throws -> (value: String, isAtEnd: Bool) {
271270
var field: String.UnicodeScalarView = .init()

Sources/Active/Reader/ReaderConfiguration.swift

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -44,6 +44,7 @@ extension CSVReader {
4444
/// - parameter configuration: The configuration values provided by the API user.
4545
/// - parameter iterator: The iterator providing `Unicode.Scalar` values.
4646
/// - parameter buffer: Small buffer used to store `Unicode.Scalar` values that have been read from the input, but haven't yet been processed.
47+
/// - throws: `CSVReader.Error` exclusively.
4748
init(configuration: Configuration, iterator: ScalarIterator, buffer: ScalarBuffer) throws {
4849
// 1. Figure out the field and row delimiters.
4950
switch (configuration.delimiters.field.rawValue, configuration.delimiters.row.rawValue) {

Sources/Active/Reader/ReaderEncodings.swift

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -23,6 +23,7 @@ extension String.Encoding {
2323
///
2424
/// This function looks for the Byte Order Mark (or [BOM](https://en.wikipedia.org/wiki/Byte_order_mark)) at the beginning of the file.
2525
/// - parameter stream: The input stream reading the data's bytes.
26+
/// - throws: `CSVReader.Error` exclusively.
2627
/// - returns: The inferred encoding (if any) and the bytes read from the input data (without the BOM bytes if any).
2728
internal static func infer(from stream: InputStream) throws -> (encoding: String.Encoding?, unusedBytes: [UInt8]) {
2829
var unusedBytes: [UInt8]? = nil
@@ -66,6 +67,7 @@ fileprivate extension String.Encoding {
6667
/// - parameter unusedBytes: The input data bytes that have been read, but are not part of the BOM.
6768
/// - parameter dataFetcher: Closure retrieving the input data up to the maximum supported by the given mutable buffer pointer. The closure returns the number of bytes actually read from the input data.
6869
/// - parameter buffer: The buffer where the input data bytes will be placed.
70+
/// - throws: Whatever `dataFetcher` may throw.
6971
private init?(unusedBytes: inout [UInt8]?, dataFetcher: (_ buffer: UnsafeMutableBufferPointer<UInt8>) throws -> Int) rethrows {
7072
// 1. Gather all BOMs and count what is the maximum number of bytes to represent any of them.
7173
let allEncodings = BOM.allCases

Sources/Active/Reader/ReaderInference.swift

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -24,6 +24,8 @@ internal extension CSVReader {
2424

2525
internal extension CSVReader {
2626
/// Closure accepting a scalar and returning a Boolean indicating whether the scalar (and subsequent unicode scalars) form a delimiter.
27+
/// - parameter scalar: The scalar that may start a delimiter.
28+
/// - throws: `CSVReader.Error` exclusively.
2729
typealias DelimiterChecker = (_ scalar: Unicode.Scalar) throws -> Bool
2830

2931
/// Creates a delimiter identifier closure.
