Open
Description
Current behaviour:
from omniduct.duct import Duct
duct = Duct.for_protocol(protocol='sqlalchemy')(...)
query = 'SELECT * FROM ...'
# 1: batched stream to memory
duct.stream(query, format='csv', batch=2)
# 2: batched stream to file
duct.stream_to_file(query, '.../data.csv', batch=2)
# 3: stream to file without batching (batch=None)
duct.stream_to_file(query, '.../data.csv')
1: A batched stream() to memory writes the column names again with every batch.
2: Because stream_to_file() wraps stream(), the column names are also written to the file once per batch.
E.g.:
State,City
California,San Francisco
Oregon,Portland
State,City
Texas,Houston
California,Los Angeles
3: When batch=None, stream(), and thus stream_to_file(), does not write the column names at all, so the output file has no header row.
E.g.:
California,San Francisco
Oregon,Portland
Texas,Houston
California,Los Angeles
In my opinion, the desired behaviour should be:
- When streaming to a CSV file, the column names should be written exactly once, as a header.
- When streaming to memory, the generator should return only row data (no column names), like a cursor would.
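A minimal sketch of the two behaviours proposed above, written with only the standard csv module rather than omniduct internals. The names iter_rows and stream_rows_to_csv and the in-memory batches are hypothetical stand-ins for illustration, not part of the omniduct API.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence, TextIO


def iter_rows(batches: Iterable[Sequence[Sequence[str]]]) -> Iterator[Sequence[str]]:
    """Yield row data only (no column names), however the rows were batched."""
    for batch in batches:
        for row in batch:
            yield row


def stream_rows_to_csv(
    columns: Sequence[str],
    batches: Iterable[Sequence[Sequence[str]]],
    fileobj: TextIO,
) -> int:
    """Write the header once, then every row; return the number of data rows."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(columns)  # header is written once, outside the batch loop
    count = 0
    for row in iter_rows(batches):
        writer.writerow(row)
        count += 1
    return count


# Hypothetical data standing in for a query result fetched with batch=2.
columns = ["State", "City"]
batches = [
    [("California", "San Francisco"), ("Oregon", "Portland")],
    [("Texas", "Houston"), ("California", "Los Angeles")],
]

buffer = io.StringIO()
rows_written = stream_rows_to_csv(columns, batches, buffer)
print(buffer.getvalue())
```

Keeping the header write in the file-level function, outside the batch loop, means the output is the same with or without batching, and the in-memory generator stays cursor-like.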
What do you think about this? I can open a PR to get this done.
Thanks.