
Commit 4d493c6

Update CSV library post with new benchmarks and features
Revised the blog post to reflect updated benchmarks comparing Dataplat.Dbatools.Csv with Sep, Sylvan, CsvHelper, and LumenWorks. Added sections highlighting the library's strengths for SQL Server workflows and its new features, such as progress reporting and cancellation support, and clarified its positioning versus pure parsing-speed libraries.
1 parent f3b0904 commit 4d493c6

1 file changed (+62, -13 lines)


content/post/new-csv-library.md

Lines changed: 62 additions & 13 deletions
@@ -1,5 +1,5 @@
---
-title: "A New CSV Library: 6x Faster, 40x Less Memory"
+title: "A New CSV Library: Built for SQL Server"
date: 2025-11-30
author: "Chrissy LeMaire"
slug: "new-csv-library"
@@ -31,22 +31,45 @@ What came back was fast as heck and used several patterns (apparently `Span<T>`,

## The results

-Using Claude to figure out benchmarking, I ran some proper benchmarks and the new Dataplat.Dbatools.Csv library isn't just a little faster. It's in a completely different performance class.
+Using Claude to figure out benchmarking, I ran proper benchmarks comparing Dataplat.Dbatools.Csv against not just LumenWorks, but also the modern CSV libraries: Sep, Sylvan, and CsvHelper.

-| Scenario | Dataplat | LumenWorks | Speed Boost | Memory Savings |
-|----------|----------|------------|-------------|----------------|
-| **Small** (1K rows) | 0.83 ms | 3.26 ms | **3.9x faster** | **25x less** |
-| **Medium** (100K rows) | 65.3 ms | 364.5 ms | **5.6x faster** | **41x less** |
-| **Large** (1M rows) | 559 ms | 3,435 ms | **6.1x faster** | **40x less** |
-| **Wide** (100K×50 cols) | 277 ms | 493 ms | **1.8x faster** | **7.3x less** |
+**Benchmark: 100,000 rows × 10 columns (.NET 8, AVX-512)**

-Processing 1 million rows (96 MB CSV file):
-- **Dataplat**: 0.56 seconds using 420 MB RAM
-- **LumenWorks**: 3.4 seconds using 16.7 GB RAM
+Here's the interesting thing: performance varies dramatically depending on how you access the data.

-That's a **6.1x speed improvement** with **40x less memory allocation**. The memory difference is honestly the bigger deal here. LumenWorks creates so much garbage that large files can cause `OutOfMemoryException` on machines that should easily handle them and as a matter of fact, my benchmarking crashed my browser too.
+**Single column read (typical SqlBulkCopy/IDataReader pattern):**

-6.1x was the max of all the benchmarks that I ran, though 4.7x was the average.
+| Library | Time (ms) | vs Dataplat |
+|---------|-----------|-------------|
+| Sep | 19 ms | 3.8x faster |
+| Sylvan | 29 ms | 2.5x faster |
+| **Dataplat** | **74 ms** | **baseline** |
+| CsvHelper | 76 ms | ~same |
+| LumenWorks | 433 ms | **5.9x slower** |
+
+**All columns read (full row processing):**
+
+| Library | Time (ms) | vs Dataplat |
+|---------|-----------|-------------|
+| Sep | 35 ms | 2.1x faster |
+| Sylvan | 37 ms | 2.0x faster |
+| **Dataplat** | **73 ms** | **baseline** |
+| CsvHelper | 101 ms | 1.4x slower |
+| LumenWorks | 100 ms | 1.4x slower |
+
+For the single-column pattern (which is how SqlBulkCopy typically reads data), Dataplat is **~6x faster** than LumenWorks! For full row processing, we're still **~1.4x faster**.
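To make those two patterns concrete, here is a rough sketch of what each benchmark loop does when driven through an `IDataReader`. It is illustrative only and does not name any particular library's reader type:

```csharp
using System.Data;

// Illustrative sketch: the shape of the two benchmark access patterns.
// "reader" can be any IDataReader positioned over the CSV data.

static long SingleColumnRead(IDataReader reader)
{
    long rows = 0;
    while (reader.Read())
    {
        _ = reader.GetValue(0); // touch only the first field of each row
        rows++;
    }
    return rows;
}

static long AllColumnsRead(IDataReader reader)
{
    long rows = 0;
    while (reader.Read())
    {
        for (var i = 0; i < reader.FieldCount; i++)
        {
            _ = reader.GetValue(i); // materialize every field in the row
        }
        rows++;
    }
    return rows;
}
```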
+
+### Where we stand in 2025
+
+Being honest: if pure parsing speed is your only concern, [Sep](https://github.com/nietras/Sep/) is faster. Sep can hit 21 GB/s with AVX-512 SIMD. But our library isn't trying to be Sep. We're built for **database import workflows** where you need:
+
+- **IDataReader interface** - Stream directly to SqlBulkCopy without intermediate allocations
+- **Built-in compression** - Import `.csv.gz` files without extracting first
+- **Real-world data handling** - Lenient parsing for messy enterprise exports
+- **Progress reporting** - Know how far along your 10 million row import is
+- **dbatools integration** - Works seamlessly with Import-DbaCsv
+
+If you're doing `file.csv.gz → SqlBulkCopy → SQL Server`, our complete workflow may actually be faster than combining Sep + manual decompression + manual IDataReader wrapping.
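To picture that end to end, here is a minimal sketch of the `.csv.gz → SqlBulkCopy → SQL Server` path. `SqlBulkCopy` and its `WriteToServer(IDataReader)` overload are the standard Microsoft.Data.SqlClient pieces; the `CsvDataReader.Create` name, its signature, and the connection string are assumptions made purely for illustration, not the library's documented API:

```csharp
using Microsoft.Data.SqlClient;

// Assumption: the library exposes its CSV reader as an IDataReader.
// The CsvDataReader.Create(...) name and signature here are hypothetical.
using var reader = CsvDataReader.Create("exports/huge_export.csv.gz", new CsvReaderOptions());

var connectionString = "Server=sql01;Database=tempdb;Integrated Security=true;TrustServerCertificate=true";
using var connection = new SqlConnection(connectionString);
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection)
{
    DestinationTableName = "dbo.HugeExport",
    BatchSize = 50_000,
    BulkCopyTimeout = 0 // no timeout for very large imports
};

// Rows stream straight from the gzipped CSV into SQL Server:
// no intermediate DataTable, no extracted temp file on disk.
bulkCopy.WriteToServer(reader);
```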

### Why is it so much faster?

@@ -124,6 +147,32 @@ German CSV with comma as decimal separator? French dates? We got you:
Import-DbaCsv -Path german_data.csv -SqlInstance sql01 -Database tempdb -Culture "de-DE" -AutoCreateTable
```

+### Progress reporting (v1.1.0)
+
+For those big imports where you want to know what's happening:
+
+```csharp
+var options = new CsvReaderOptions
+{
+    ProgressReportInterval = 10000,
+    ProgressCallback = progress =>
+    {
+        Console.WriteLine($"Processed {progress.RecordsRead:N0} records ({progress.RowsPerSecond:N0}/sec)");
+    }
+};
+```
+
+### Cancellation support (v1.1.0)
+
+Got a long-running import you need to stop? CancellationToken support is built in:
+
+```csharp
+var options = new CsvReaderOptions
+{
+    CancellationToken = cancellationTokenSource.Token
+};
+```
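One way to wire that up, assuming the `CsvReaderOptions` shown above (the `CancellationTokenSource` timeout constructor is standard .NET): cancel automatically after a time limit, or call `Cancel()` from a Ctrl+C or UI handler.

```csharp
// Standard .NET cancellation wiring; only CsvReaderOptions comes from the library.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(30)); // auto-cancel after 30 minutes

var options = new CsvReaderOptions
{
    CancellationToken = cts.Token
};

// ... start the import with these options; calling cts.Cancel() stops it early.
```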
+
## A brand new command: Export-DbaCsv

This one's been requested for years ([GitHub issue #8646](https://github.com/dataplat/dbatools/issues/8646)). We finally have a proper Export-DbaCsv with compression support:
