Commit 549f7f7

Update csv_file_size.md
1 parent d43ccc1 commit 549f7f7

1 file changed: +10 -10 lines changed
docs/limitations/csv_file_size.md

Lines changed: 10 additions & 10 deletions
@@ -17,7 +17,7 @@ At the same time, Microsoft aims the 32-bit version of Excel is [limited to 4GB
1717

1818
Because the CSV interface works with strings, and [VBA uses 10 bytes + string length](https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/data-type-summary) to store this data type, a user's application can run out of memory. It is therefore crucial to set some boundary on the file size users can work with from the CSV interface class module.
1919

20-
To achieve this, the first step is to study the performance of the VBA CSV interface when parsing files of varying size, and then establish the recommended file size as a function of the RAM installed in the user’s computer. However, many assumptions are involved here, and the most dangerous one is: \[the results obtained in a test run on a Windows x64 OS with 8GB RAM can be used as a reference to infer that the percentage (%) of used memory controls the code performance when working with “big” files in VBA\].
20+
To achieve this, the first step is to study the performance of the VBA-CSV interface when parsing files of varying size, and then establish the recommended file size as a function of the RAM installed in the user’s computer. However, many assumptions are involved here, and the most dangerous one is: \[the results obtained in a test run on a Windows x64 OS with 8GB RAM can be used as a reference to infer that the percentage (%) of used memory controls the code performance when working with “big” files in VBA\].
2121

2222
The previous assumption leads to the following statement: as the peak memory used reaches the limit of the available memory, VBA suffers a performance loss.
2323

@@ -32,35 +32,35 @@ The below graphs show the benchmark results for a set of CSV files with size ran
3232

3333
![PapaParse-Fastcsv-Benchmark](PapaParse-Fastcsv-Benchmark.png)
3434

35-
Up to this point, we have found that the VBA CSV interface is a strong contender when working with CSV files ranging in size from nearly 5 MB to nearly 384 MB.
35+
Up to this point, we have found that the VBA-CSV interface is a strong contender when working with CSV files ranging in size from nearly 5 MB to nearly 384 MB.
3636

3737
Let's see what happens to the performance when the CSV file doubles in size.
3838

3939
![PapaParse-Fastcsv-Vrate](PapaParse-Fastcsv-Vrate.png)
4040

41-
The data shows that the size of a CSV file and the time required to parse it are directly proportional: doubling the size doubles the time required to parse the file. In the case of the VBA CSV interface, proportionality holds only for CSV files ranging in size from almost 5 MB to almost 96 MB. For CSV files with more than 96 MB of content, when the file size is doubled, the time required to parse the next file is almost tripled.
41+
The data shows that the size of a CSV file and the time required to parse it are directly proportional: doubling the size doubles the time required to parse the file. In the case of the VBA-CSV interface, proportionality holds only for CSV files ranging in size from almost 5 MB to almost 96 MB. For CSV files with more than 96 MB of content, when the file size is doubled, the time required to parse the next file is almost tripled.
4242

4343
Papa Parse was the fastest solution tested; of course, VBA imposes a limited file size range. But, broadly speaking, we can say that when a user needs to work with really big CSV files, Fast-csv can do the work as fast as Papa Parse. Let’s see the next graph.
4444

4545
![Agains-PapaParse-DPPrate](Agains-PapaParse-DPPrate.png)
4646

4747
## CSV file size considerations
4848

49-
Based on an analysis of the experimental results, using regression and direct proportionality ratios, the VBA CSV interface is expected to be suitable for use within the limits shown in the graph below.
49+
Based on an analysis of the experimental results, using regression and direct proportionality ratios, the VBA-CSV interface is expected to be suitable for use within the limits shown in the graph below.
5050

5151
![File-size-limits](File-size-limits.png)
5252

5353
The above experimental results make it easy to arrive at the following conclusions:
5454

55-
* The VBA CSV interface is suitable for almost any Microsoft Office Excel user. It can handle considerably large CSV files without any problem.
56-
* As the size of the CSV increases, the VBA CSV interface suffers a performance drop.
57-
* With each available GB of RAM, the VBA CSV interface can parse CSV files sized up to 107.61 MB, so the max file size for a specific machine can be estimated with the formula \[107.61 * Available RAM\] (result in MB).
58-
* The VBA CSV interface can work at high performance with files sized up to almost \[75.45 * Available RAM\] (result in MB).
59-
* Parsing files sized over \[107.61 * Available RAM\] (result in MB) makes the VBA CSV interface an unreliable alternative.
55+
* The VBA-CSV interface is suitable for almost any Microsoft Office Excel user. It can handle considerably large CSV files without any problem.
56+
* As the size of the CSV increases, the VBA-CSV interface suffers a performance drop.
57+
* With each available GB of RAM, the VBA-CSV interface can parse CSV files sized up to 107.61 MB, so the max file size for a specific machine can be estimated with the formula \[107.61 * Available RAM\] (result in MB).
58+
* The VBA-CSV interface can work at high performance with files sized up to almost \[75.45 * Available RAM\] (result in MB).
59+
* Parsing files sized over \[107.61 * Available RAM\] (result in MB) makes the VBA-CSV interface an unreliable alternative.
6060

6161
>⚠️**Caution**
6262
>{: .text-grey-lt-000 .bg-green-000 }
6363
>Every formula presented here is experimental and can be wrong due to the assumptions involved. There is a real chance of experiencing unexpected behavior. Users need to remember that parsing a 2 GB CSV file can require up to 20 GB of available RAM.
6464
>
65-
>The target CSV can't be larger than 2GB. This is because the VBA CSV interface uses the `LONG` datatype when parsing. So, if you want to process files larger than 2GB, you'll need to use another solution instead.
65+
>The target CSV can't be larger than 2GB. This is because the VBA-CSV interface uses the `LONG` datatype when parsing. So, if you want to process files larger than 2GB, you'll need to use another solution instead.
6666
{: .text-grey-dk-300 .bg-yellow-000 }
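
The sizing rules in the changed file can be sketched as a small calculation. This is only an illustration in Python, not part of the commit or of the VBA-CSV interface: the function names `size_limits_mb` and `classify` are hypothetical, the two coefficients (75.45 and 107.61 MB per GB of available RAM) and the 2GB ceiling are taken from the text above, and treating 1 GB as 1024 MB is an assumption.

```python
from typing import Tuple

# Coefficients quoted in the document (MB of CSV per GB of available RAM).
FAST_MB_PER_GB = 75.45    # upper bound for "high performance" parsing
MAX_MB_PER_GB = 107.61    # upper bound before parsing becomes unreliable

# Ceiling from the caution note: files larger than 2GB are not supported.
# Assumption: 1 GB is treated as 1024 MB here.
HARD_LIMIT_MB = 2 * 1024


def size_limits_mb(available_ram_gb: float) -> Tuple[float, float]:
    """Return (high-performance limit, maximum limit) in MB, capped at 2GB."""
    fast = min(FAST_MB_PER_GB * available_ram_gb, HARD_LIMIT_MB)
    maximum = min(MAX_MB_PER_GB * available_ram_gb, HARD_LIMIT_MB)
    return fast, maximum


def classify(file_size_mb: float, available_ram_gb: float) -> str:
    """Label a CSV file size against the experimental limits."""
    if file_size_mb > HARD_LIMIT_MB:
        return "unsupported"
    fast, maximum = size_limits_mb(available_ram_gb)
    if file_size_mb <= fast:
        return "high performance"
    if file_size_mb <= maximum:
        return "degraded"
    return "unreliable"


if __name__ == "__main__":
    for size in (100, 400, 500, 3000):
        print(size, "MB with 4 GB available:", classify(size, 4))
```

As the caution note says, these formulas are experimental, so the labels are rough guidance rather than guarantees.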
