Commit d05497d

Clarify CSV library speed tradeoffs
Expanded explanation of why the library can't match Sep/Sylvan parsing speed due to architectural differences and IDataReader requirements. Added context about workflow optimizations and tradeoffs for SQL Server imports.
1 parent: 4d493c6


content/post/new-csv-library.md

Lines changed: 7 additions & 1 deletion
@@ -69,7 +69,13 @@ Being honest: if pure parsing speed is your only concern, [Sep](https://github.c
 - **Progress reporting** - Know how far along your 10 million row import is
 - **dbatools integration** - Works seamlessly with Import-DbaCsv
 
-If you're doing `file.csv.gz → SqlBulkCopy → SQL Server`, our complete workflow may actually be faster than combining Sep + manual decompression + manual IDataReader wrapping.
+### The speed tradeoff
+
+I asked Claude to explain why we can't match Sep/Sylvan, and it comes down to architecture. Sep uses `Span<T>` and only creates actual strings when you explicitly ask for them. But the `IDataReader` interface that SqlBulkCopy uses requires returning real objects from `GetValue()`. For string columns, that means allocating actual `string` instances—we can't just hand back a span.
+
+Could we create a Sep-like API? Sure. But then you'd need to write your own IDataReader wrapper to use SqlBulkCopy, handle decompression yourself, implement progress reporting... you get the idea. We optimized for the complete workflow, not the micro-benchmark.
+
+For `file.csv.gz → SqlBulkCopy → SQL Server` workflows, Dataplat's integrated pipeline is often comparable to combining Sep + manual decompression + manual IDataReader wrapping, while being simpler to use.
 
 ### Why is it so much faster?
 
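The allocation point the new paragraph makes can be illustrated with a small C# sketch. The class, method names, and buffer layout below are made up for illustration, not the library's actual code: a span-style accessor can return a view over the parse buffer, while an `IDataReader`-style `GetValue()` has to return an `object`, which for a text field means materializing a real `string`.

```csharp
// Sketch only: hypothetical class and buffer layout, not the library's API.
// Contrasts span-based field access with the object-returning contract that
// SqlBulkCopy's IDataReader path requires.
using System;

public static class FieldAccessSketch
{
    // A parsed row sitting in a shared char buffer; a field is a (start, length) slice.
    private static readonly char[] RowBuffer = "42,hello,3.14".ToCharArray();

    // Sep-style access: a view over the buffer, no string allocated.
    public static ReadOnlySpan<char> GetFieldSpan(int start, int length)
        => RowBuffer.AsSpan(start, length);

    // IDataReader-style access: GetValue must return object, so a text field
    // has to be materialized as a real string instance (the allocation).
    public static object GetFieldValue(int start, int length)
        => new string(RowBuffer, start, length);

    public static void Main()
    {
        ReadOnlySpan<char> field = GetFieldSpan(3, 5); // "hello", no allocation
        object value = GetFieldValue(3, 5);            // "hello", allocates a string
        Console.WriteLine($"{field.ToString()} / {value}");
    }
}
```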

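The "write it yourself" route described in the added text looks roughly like the sketch below, assuming the Microsoft.Data.SqlClient package. The `makeReader` delegate stands in for the `IDataReader` wrapper you would have to implement around a span-based parser such as Sep; the `GZipStream` and `SqlBulkCopy` plumbing is standard .NET. Names and structure are illustrative, not the library's workflow.

```csharp
// Sketch of the manual wiring: decompress the .gz yourself, supply your own
// IDataReader wrapper (the makeReader delegate), and hook up progress yourself.
using System;
using System.Data;
using System.IO;
using System.IO.Compression;
using Microsoft.Data.SqlClient;

public static class ManualImportSketch
{
    public static void Import(
        string gzPath,
        string connectionString,
        string destinationTable,
        Func<Stream, IDataReader> makeReader) // the wrapper you would have to write
    {
        using FileStream file = File.OpenRead(gzPath);
        using GZipStream csv = new(file, CompressionMode.Decompress); // manual decompression
        using IDataReader reader = makeReader(csv);

        using SqlConnection connection = new(connectionString);
        connection.Open();

        using SqlBulkCopy bulkCopy = new(connection) { DestinationTableName = destinationTable };
        bulkCopy.NotifyAfter = 100_000; // manual progress reporting
        bulkCopy.SqlRowsCopied += (_, e) => Console.WriteLine($"{e.RowsCopied:N0} rows copied");
        bulkCopy.WriteToServer(reader);
    }
}
```

The integrated pipeline the post describes folds the decompression, the reader, and the progress events into one call, which is the simplicity argument the diff is making.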