Commit 5f7d413

Introduces a DataclassWriter (dfurtado#38)
The addition of a DataclassWriter will allow users to create CSV out of dataclasses.
1 parent d0e0fde commit 5f7d413

12 files changed: 344 additions and 24 deletions


HISTORY.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,3 +61,9 @@
6161
* Handle properties with init set to False
6262
* Handle Option type annotation
6363

64+
### 1.2.0 (2021-03-02)
65+
66+
* Introduction of a DataclassWriter
67+
* Added type hinting to external API
68+
* Documentation updates
69+
* Bug fixes

README.md

Lines changed: 105 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,8 @@ Dataclass CSV makes working with CSV files easier and much better than working w
1919
- Familiar syntax. The `DataclassReader` is used almost the same way as the `DictReader` in the standard library.
2020
- It uses `dataclass` features that let you define metadata properties so the data can be parsed exactly the way you want.
2121
- Make the code cleaner. No more extra loops to convert data to the correct type, perform validation, set default values, the `DataclassReader` will do all this for you.
22+
- In addition to the `DataclassReader`, the library also provides a `DataclassWriter`, which enables creating a CSV file
23+
using a list of instances of a dataclass.
2224

2325

2426
## Installation
@@ -29,6 +31,8 @@ pipenv install dataclass-csv
2931

3032
## Getting started
3133

34+
## Using the DataclassReader
35+
3236
First, add the necessary imports:
3337

3438
```python
@@ -90,7 +94,7 @@ User(firstname='Edit', email='[email protected]', age=3)
9094
User(firstname='Ella', email='[email protected]', age=2)
9195
```
9296

93-
## Error handling
97+
### Error handling
9498

9599
One of the advantages of using the `DataclassReader` is that it makes it easy to detect when the type of data in the CSV file is not what your application's model is expecting. The `DataclassReader` also shows errors that help to identify the rows with problems in your CSV file.
96100

@@ -109,7 +113,7 @@ received a value of type <class 'str'>. [CSV Line number: 3]
109113

110114
Note that apart from telling what the error was, the `DataclassReader` will also show which line of the CSV file contains the data with errors.
111115

112-
## Default values
116+
### Default values
113117

114118
The `DataclassReader` also handles properties with default values. Let's modify the dataclass `User` and add a default value for the field `email`:
115119

@@ -154,7 +158,7 @@ class User:
154158
age: int
155159
```
156160

157-
## Mapping dataclass fields to columns
161+
### Mapping dataclass fields to columns
158162

159163
The mapping between a dataclass property and a column in the CSV file is done automatically if the names match. However, there are situations where the header name of a column is different. We can easily tell the `DataclassReader` how the mapping should be done using the method `map`. Assuming that we have a CSV file with the contents below:
160164

@@ -174,7 +178,7 @@ reader.map('First name').to('firstname')
174178

175179
Now the `DataclassReader` will know how to extract the data from the column **First name** and add it to the dataclass property **firstname**.
176180

177-
## Supported type annotation
181+
### Supported type annotation
178182

179183
At the moment the `DataclassReader` supports `int`, `str`, `float`, `complex`, `datetime`, and `bool`. When defining a `datetime` property, it is necessary to use the `dateformat` decorator, for example:
180184

@@ -214,7 +218,7 @@ The output would look like this:
214218
User(name='Edit', email='[email protected]', birthday=datetime.datetime(2018, 11, 23, 0, 0))
215219
```
216220

217-
## Fields metadata
221+
### Fields metadata
218222

219223
It is important to note that the `dateformat` decorator defines the date format used to parse dates for all properties
220224
in the class. Now there are situations where the data in a CSV file contains two or more columns with date values in different formats. It is possible
@@ -248,7 +252,7 @@ class User:
248252
Note that the format for the `birthday` field was not specified using the `field` metadata. In this case the format specified in the `dateformat`
249253
decorator will be used.
250254

251-
## Handling values with empty spaces
255+
### Handling values with empty spaces
252256

253257
When defining a property of type `str` in the `dataclass`, the `DataclassReader` will treat values with only white spaces as invalid. To change this
254258
behavior, there is a decorator called `@accept_whitespaces`. When decorating the class with the `@accept_whitespaces` all the properties in the class
@@ -279,7 +283,7 @@ class User:
279283
created_at: datetime
280284
```
281285

282-
## User-defined types
286+
### User-defined types
283287

284288
You can use any type for a field as long as its constructor accepts a string:
285289

@@ -300,6 +304,100 @@ class User:
300304
ssn: SSN
301305
```
302306

307+
308+
## Using the DataclassWriter
309+
310+
Reading a CSV file using the `DataclassReader` is great and gives us the type-safety of Python's dataclasses and type annotations. However, there are situations where we would like to use dataclasses to create CSV files. That's where the `DataclassWriter` comes in handy.
311+
312+
Using the `DataclassWriter` is quite simple. Given that we have a dataclass `User`:
313+
314+
```python
315+
from dataclasses import dataclass
316+
317+
318+
@dataclass
319+
class User:
320+
    firstname: str
321+
    lastname: str
322+
    age: int
323+
```
324+
325+
And in our program we have a list of users:
326+
327+
```python
328+
329+
users = [
330+
    User(firstname="John", lastname="Smith", age=40),
331+
    User(firstname="Daniel", lastname="Nilsson", age=10),
332+
    User(firstname="Ella", lastname="Fralla", age=4)
333+
]
334+
```
335+
336+
In order to create a CSV using the `DataclassWriter`, import it from `dataclass_csv`:
337+
338+
```python
339+
from dataclass_csv import DataclassWriter
340+
```
341+
342+
Initialize it with the required arguments and call the method `write`:
343+
344+
```python
345+
with open("users.csv", "w") as f:
346+
    w = DataclassWriter(f, users, User)
347+
    w.write()
348+
```
349+
350+
That's it! Let's break down the snippet above.
351+
352+
First, we open a file called `users.csv` for writing. After that, an instance of the `DataclassWriter` is created. To create a `DataclassWriter` we need to pass the `file`, the list of `User` instances, and lastly, the type, which in this case is `User`.
353+
354+
The type is required since the writer uses it when trying to figure out the CSV header. By default, it will use the names of the
355+
properties defined in the dataclass. In the case of the dataclass `User`, the title of each column
356+
will be `firstname`, `lastname` and `age`.
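
The way a header can be derived from the type can be sketched with the standard library alone. The snippet below is an illustration, not the library's implementation: `header_for` is a hypothetical helper, and it assumes the header is simply the dataclass field names in declaration order.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class User:
    firstname: str
    lastname: str
    age: int


def header_for(cls: type) -> List[str]:
    # Hypothetical helper: collect the field names in declaration order.
    return [field.name for field in fields(cls)]


header = header_for(User)
print(header)
```

This also hints at why the type is passed separately: the field names can be read from the class even when the list of instances is empty.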
357+
358+
See below the CSV created out of a list of `User`:
359+
360+
```text
361+
firstname,lastname,age
362+
John,Smith,40
363+
Daniel,Nilsson,10
364+
Ella,Fralla,4
365+
```
366+
367+
The `DataclassWriter` also takes `**fmtparams`, which accepts the same formatting parameters as `csv.writer`. For more
368+
information see: https://docs.python.org/3/library/csv.html#csv-fmt-params
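
Since these are the formatting parameters of `csv.writer`, their effect can be tried out with the standard library directly. The sketch below uses `csv.writer` itself rather than the `DataclassWriter`; the choice of `delimiter`, `quoting` and `lineterminator` values is only an example.

```python
import csv
import io

buffer = io.StringIO()

# delimiter, quoting and lineterminator are standard csv formatting parameters.
writer = csv.writer(
    buffer, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n"
)
writer.writerow(["firstname", "lastname", "age"])
writer.writerow(["John", "Smith", 40])

output = buffer.getvalue()
print(output)
```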
369+
370+
Now, there are situations where we don't want to write the CSV header. In this case, the method `write` of
371+
the `DataclassWriter` accepts an extra argument, called `skip_header`. The default value is `False` and when set to
372+
`True` it will skip the header.
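
One situation where skipping the header is useful is appending a second batch of rows to a file that already has one. The sketch below imitates that behavior with the standard library only; `write_rows` is a hypothetical stand-in for `DataclassWriter.write`, not the library's code.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Iterable, TextIO


@dataclass
class User:
    firstname: str
    lastname: str
    age: int


def write_rows(f: TextIO, rows: Iterable[User], skip_header: bool = False) -> None:
    # Hypothetical stand-in for DataclassWriter.write, for illustration only.
    writer = csv.writer(f, lineterminator="\n")
    if not skip_header:
        writer.writerow([field.name for field in fields(User)])
    for row in rows:
        writer.writerow(astuple(row))


buffer = io.StringIO()
write_rows(buffer, [User("John", "Smith", 40)])
# Second batch written to the same stream: the header is not repeated.
write_rows(buffer, [User("Ella", "Fralla", 4)], skip_header=True)

result = buffer.getvalue()
print(result)
```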
373+
374+
#### Modifying the CSV header
375+
376+
As previously mentioned the `DataclassWriter` uses the names of the properties defined in the dataclass as the CSV header titles, however,
377+
depending on your use case it may make sense to change them. The `DataclassWriter` has a `map` method just for this purpose.
378+
379+
Using the `User` dataclass with the properties `firstname`, `lastname` and `age`, the snippet below shows how to change `firstname` to `First name` and `lastname` to `Last name`:
380+
381+
```python
382+
with open("users.csv", "w") as f:
383+
    w = DataclassWriter(f, users, User)
384+
385+
    # Add mappings for firstname and lastname
386+
    w.map("firstname").to("First name")
387+
    w.map("lastname").to("Last name")
388+
389+
    w.write()
390+
```
391+
392+
The CSV output of the snippet above will be:
393+
394+
```text
395+
First name,Last name,age
396+
John,Smith,40
397+
Daniel,Nilsson,10
398+
Ella,Fralla,4
399+
```
400+
303401
## Copyright and License
304402

305403
Copyright (c) 2018 Daniel Furtado. Code released under BSD 3-clause license

dataclass_csv/__init__.py

Lines changed: 32 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,10 @@
66
`dataclasses`. It takes advantage of `dataclasses` features to perform
77
data validation and type conversion.
88
9-
Basic Usage:
9+
Basic Usage
10+
~~~~~~~~~~~~~
11+
12+
Read data from a CSV file:
1013
1114
>>> from dataclasses import dataclass
1215
>>> from dataclass_csv import DataclassReader
@@ -27,15 +30,42 @@
2730
User(firstname='User2', lastname='Test', age=34)
2831
]
2932
33+
Write dataclasses to a CSV file:
34+
35+
>>> from dataclasses import dataclass
36+
>>> from dataclass_csv import DataclassWriter
37+
38+
>>> @dataclass
39+
>>> class User:
40+
>>> firstname: str
41+
>>> lastname: str
42+
>>> age: int
43+
44+
>>> users = [
45+
>>> User(firstname='User1', lastname='Test', age=23),
46+
>>> User(firstname='User2', lastname='Test', age=34)
47+
>>> ]
48+
49+
>>> with open('users.csv', 'w') as f:
50+
>>> writer = DataclassWriter(f, users, User)
51+
>>> writer.write()
52+
3053
3154
:copyright: (c) 2018 by Daniel Furtado.
3255
:license: BSD, see LICENSE for more details.
3356
"""
3457

3558

3659
from .dataclass_reader import DataclassReader
60+
from .dataclass_writer import DataclassWriter
3761
from .decorators import dateformat, accept_whitespaces
3862
from .exceptions import CsvValueError
3963

4064

41-
__all__ = ['DataclassReader', 'dateformat', 'accept_whitespaces', 'CsvValueError']
65+
__all__ = [
66+
'DataclassReader',
67+
'DataclassWriter',
68+
'dateformat',
69+
'accept_whitespaces',
70+
'CsvValueError',
71+
]

dataclass_csv/dataclass_reader.py

Lines changed: 24 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33

44
from datetime import datetime
55
from distutils.util import strtobool
6-
from typing import Union, Type, Optional, Sequence, Dict, Any, List
6+
from typing import Union, Type, Optional, Sequence, Dict, Any, List, Iterable
77

88
from .field_mapper import FieldMapper
99
from .exceptions import CsvValueError
@@ -12,12 +12,12 @@
1212
class DataclassReader:
1313
def __init__(
1414
self,
15-
f,
15+
f: Any,
1616
cls: Type[object],
17-
fieldnames: Optional[Sequence[str]]=None,
18-
restkey: Optional[str]=None,
19-
restval: Optional[Any]=None,
20-
dialect: str='excel',
17+
fieldnames: Optional[Sequence[str]] = None,
18+
restkey: Optional[str] = None,
19+
restval: Optional[Any] = None,
20+
dialect: str = 'excel',
2121
*args: List[Any],
2222
**kwds: Dict[str, Any],
2323
):
@@ -59,7 +59,9 @@ def _get_default_value(self, field):
5959
)
6060

6161
def _get_possible_keys(self, fieldname, row):
62-
possible_keys = list(filter(lambda x: x.strip() == fieldname, row.keys()))
62+
possible_keys = list(
63+
filter(lambda x: x.strip() == fieldname, row.keys())
64+
)
6365
if possible_keys:
6466
return possible_keys[0]
6567

@@ -144,7 +146,9 @@ def _process_row(self, row):
144146
try:
145147
value = self._get_value(row, field)
146148
except ValueError as ex:
147-
raise CsvValueError(ex, line_number=self.reader.line_num) from None
149+
raise CsvValueError(
150+
ex, line_number=self.reader.line_num
151+
) from None
148152

149153
if not value and field.default is None:
150154
values.append(None)
@@ -159,15 +163,19 @@ def _process_row(self, row):
159163
or '__origin__' in field_type.__dict__
160164
and field_type.__origin__ is Union
161165
):
162-
real_types = [t for t in field_type.__args__ if t is not type(None)]
166+
real_types = [
167+
t for t in field_type.__args__ if t is not type(None)
168+
]
163169
if len(real_types) == 1:
164170
field_type = real_types[0]
165171

166172
if field_type is datetime:
167173
try:
168174
transformed_value = self._parse_date_value(field, value)
169175
except ValueError as ex:
170-
raise CsvValueError(ex, line_number=self.reader.line_num) from None
176+
raise CsvValueError(
177+
ex, line_number=self.reader.line_num
178+
) from None
171179
else:
172180
values.append(transformed_value)
173181
continue
@@ -180,7 +188,9 @@ def _process_row(self, row):
180188
else strtobool(str(value).strip()) == 1
181189
)
182190
except ValueError as ex:
183-
raise CsvValueError(ex, line_number=self.reader.line_num) from None
191+
raise CsvValueError(
192+
ex, line_number=self.reader.line_num
193+
) from None
184194
else:
185195
values.append(transformed_value)
186196
continue
@@ -211,5 +221,7 @@ def map(self, csv_fieldname: str) -> FieldMapper:
211221
:param csv_fieldname: The name of the CSV field
212222
"""
213223
return FieldMapper(
214-
lambda property_name: self._add_to_mapping(property_name, csv_fieldname)
224+
lambda property_name: self._add_to_mapping(
225+
property_name, csv_fieldname
226+
)
215227
)
