README.md: 3 additions & 3 deletions (+3 -3)
@@ -6,20 +6,20 @@
 VBA CSV interface is the most complete, and open source, CSV/TSV VBA parser library nowadays. The library is RFC-4180 compliant and enables users to manipulate CSV content at the highest speed. All the modules were developed to accomplish the data exchange task with the greatest performance and to grant an easy use.
 
 ## Advantages
-*__Stable__. Fully Test Driven Developed (TDD) library, ([50/50 test passed](https://github.com/ws-garcia/VBA-CSV-interface/blob/master/testing/tests/results/)), that includes 500+ line of code for testing. See [VBA test library by Tim Hall](https://github.com/ws-garcia/vba-test).
+*__RFC-4180 specs compliant__.
+*__Stable__. Fully Test Driven Developed (TDD) library, ([60/60 test passed](https://github.com/ws-garcia/VBA-CSV-interface/blob/master/testing/tests/results/)), that includes 650+ line of code for testing. See [VBA test library by Tim Hall](https://github.com/ws-garcia/vba-test).
 *__Fast__. Writes and reads files at the highest speed.
 *__Memory-friendly__. CSV/[TSV](https://www.iana.org/assignments/media-types/text/tab-separated-values) files are processed using a custom stream technique, only 0.5MB are in memory at a time.
 *__Robust__. Parser and writer accept [Unix-style quotes escape sequences](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml#notes).
 *__Easy to use__. A few lines of code can do the work!
+*__Automatic delimiter guesser__. Don't worry if you forgot the file configuration. The interface has a solid strategy for guessing delimiters!
 *__Highly Configurable__. User can configure the parser to work with a wide range of CSV files.
 *__CSV data subsetting__. Split CSV data into a set of files with related data.
 *__Like SQL queries on CSV files__. Add your own logic to mimic SQL queries and filter data by criteria (=, <>, >=, <=, AND, OR).
-*__Automatic delimiter guesser__. Don't worry if you forgot the file configuration!
 *__Flexible__. Import only certain range of records from the given file, import fields (columns) by indexes or names, read records in sequential mode.
 *__Dynamic Typing support__. Turn CSV data field to a desired VBA data type.
 *__Data sorting__. Sort CSV imported data using the hyper-fast(100k records per second) [Yaroslavskiy Dual-Pivot Quicksort](https://web.archive.org/web/20151002230717/http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf) like Java.
 *__Microsoft Access compatible__. The library has a version for those who feel in comfort working through DAO databases, [download from here](https://github.com/ws-garcia/VBA-CSV-interface/raw/master/src/Access_version.zip).
The benchmark provided here is focused on the supposed most critical operation, this is the parse one for many authors.
 
-The class was tested against two solutions (the one from [@Senipah](https://github.com/Senipah/VBA-Better-Array) and the other from [@sdkn104](https://github.com/sdkn104/VBA-CSV)) using a laptop running `Win 10 Pro x64, Intel® Core™ i7-4500U CPU @1.80-2.40 GHz, 8 GB RAM, Excel 2019 x86`. We will call the import procedure over different files, increasing the file size, and the number of record per file, in each subsequent call. The CSV files are:
+The class was tested using a laptop running `Win 10 Pro x64, Intel® Core™ i7-4500U CPU @1.80-2.40 GHz, 8 GB RAM, Excel 2019 x86`. We will call the import procedure over different files, increasing the file size, and the number of record per file, in each subsequent call. The CSV files are:
 
 <table>
 <thead>
@@ -204,19 +204,9 @@ The images below shows the overall performance for the imports operations from t
 
 
 
-The benchmarks from the above charts are compared in the following chart:
+### Conclusion
 
-
-
-Finally, the below chart shows the overheat for the Sorting and Dynamic Typing operations. These features are available on the VBA CSV interface since its version 3.
-
-
-
-### Conclusions
-
--`ImportFromCSV` is the faster one import method.
-- The CSV syntax slow-down the performance. When the number of escaped fields are increased, the performance decrease, this is especially noticeable for the @sdkn104 solution.
-- The Dynamic Typing causes more overheat than the Sort operation. This can be explained by the great performance of the Yaroslavskiy sorting algorithm used.
+- The CSV syntax slow-down the performance. When the number of escaped fields are increased, the performance decrease.
docs/home/rules.md: 3 additions & 3 deletions (+3 -3)
@@ -30,23 +30,23 @@ In the table bellow all the rules of [RFC-4180](https://www.ietf.org/rfc/rfc4180
 </tr>
 <tr>
 <td style="text-align: left;"><em>There maybe an optional header line appearing as<br> the first line of the file with the same format<br> as normal record lines. This header will contain<br> names corresponding to the fields in the file and<br> should contain the same number of fields as the<br> records in the rest of the file.</em></td>
-<td style="text-align: left;">In the same way. The presence or absence of the<br> header line should be indicated via the optional<br> "HeadersOmission" parameter.</td>
+<td style="text-align: left;">In the same way. The presence or absence of the<br> header line should be indicated via the option<br> <code>headersOmission</code>.</td>
 </tr>
 <tr>
 <td style="text-align: left;"><em>Within the header and each record, there may be<br> one or more fields, separated by commas. Each<br> line should contain the same number of fields<br> throughout the file. Spaces are considered part<br> of a field and should not be ignored. The last<br> field in the record must not be followed by a<br> comma.</em></td>
 <td style="text-align: left;">The class accepts CSV files with different numbers<br> of fields per record. The spaces betwen the<br> fields separator char and a single field is ignored<br> only if that field is enclosed in double quotes.</td>
 </tr>
 <tr>
 <td style="text-align: left;"><em>Each field may or may not be enclosed in double<br> quotes (however some programs, such as Microsoft<br> Excel, do not use double quotes at all). If<br> fields are not enclosed with double quotes, then<br> double quotes may not appear inside the fields</em></td>
-<td style="text-align: left;">In the same way. The class accepts also the<br> apostrophe char for indicate fields needing to<br> be escaped. It's important to notice that a<br> single CSV record may have fields enclosed and<br> not enclosed by the escape char.</td>
+<td style="text-align: left;">In the same way. The class accepts also the<br> apostrophe and tilde char for indicate fields needing to<br> be escaped. It's important to<br> notice that a single CSV record may have fields enclosed and<br> not enclosed by the escape char.</td>
 </tr>
 <tr>
 <td style="text-align: left;"><em>Fields containing line breaks (CRLF), double<br> quotes, and commas should be enclosed in double<br> quotes</em></td>
 <td style="text-align: left;">In the same way. Also accepts fields enclosed by<br> the apostrophe char.</td>
 </tr>
 <tr>
 <td style="text-align: left;"><em>If double-quotes are used to enclose fields, then<br> a double-quote appearing inside a field must be<br> escaped by preceding it with another double quote.</em></td>
-<td style="text-align: left;">Ignored rule. The class accepts the apostrophe<br> as escape char, and follow the specs claims<br> may cause conflict with some abbreviate US<br> slangs (e.g.: "<strong>isn't</strong>").</td>
+<td style="text-align: left;">In the same way. The class also accepts a Unix-style quote escape by preceding the quote with a<br> backslash (<strong>"\"</strong>").</td>
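The last rule in the hunk above says the parser now follows RFC-4180 (a quote inside a quoted field is doubled) and also accepts a Unix-style backslash escape. A minimal sketch of that behaviour follows, written in Python rather than VBA. The function name `unescape_quoted_field` and its `quote` parameter are hypothetical and are not part of the VBA CSV interface API; the library's actual parsing logic is not shown in this diff and may differ.

```python
def unescape_quoted_field(raw: str, quote: str = '"') -> str:
    """Return the content of one enclosed CSV field.

    Hypothetical helper: resolves both the RFC-4180 doubled quote
    and the Unix-style backslash + quote sequence.
    """
    # A field that is not enclosed by the quote char is returned untouched.
    if len(raw) < 2 or raw[0] != quote or raw[-1] != quote:
        return raw

    body = raw[1:-1]  # strip the enclosing quote characters
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == quote and nxt == quote:
            # RFC-4180 style: two quotes stand for one literal quote
            out.append(quote)
            i += 2
        elif ch == "\\" and nxt == quote:
            # Unix style: backslash followed by a quote
            out.append(quote)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Passing `quote="'"` covers the apostrophe enclosure mentioned in the table. A field that mixes both escape styles is accepted here, which is a choice made for this sketch only.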
docs/index.md: 5 additions & 5 deletions (+5 -5)
@@ -13,17 +13,17 @@ VBA CSV interface is the most complete, and open source, CSV/TSV VBA parser libr
 {: .fs-6 .fw-300 }
 
 ## Advantages
-*__Stable__. Fully Test Driven Developed (TDD) library, ([50/50 test passed](https://github.com/ws-garcia/VBA-CSV-interface/blob/master/testing/tests/results/)), that includes 500+ line of code for testing. See [VBA test library by Tim Hall](https://github.com/ws-garcia/vba-test).
+*__RFC-4180 specs compliant__.
+*__Stable__. Fully Test Driven Developed (TDD) library, ([60/60 test passed](https://github.com/ws-garcia/VBA-CSV-interface/blob/master/testing/tests/results/)), that includes 650+ line of code for testing. See [VBA test library by Tim Hall](https://github.com/ws-garcia/vba-test).
 *__Fast__. Writes and reads files at the highest speed.
 *__Memory-friendly__. CSV/[TSV](https://www.iana.org/assignments/media-types/text/tab-separated-values) files are processed using a custom stream technique, only 0.5MB are in memory at a time.
-*__Robust__. Parser and writer accept [Unix-style quotes escape sequences](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml#notes).
+*__Robust__. Parser and writer accept [Unix-style quotes escape sequences](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml#notes).
 *__Easy to use__. A few lines of code can do the work!
+*__Automatic delimiter guesser__. Don't worry if you forgot the file configuration. The interface has a solid strategy for guessing delimiters!
 *__Highly Configurable__. User can configure the parser to work with a wide range of CSV files.
 *__CSV data subsetting__. Split CSV data into a set of files with related data.
 *__Like SQL queries on CSV files__. Add your own logic to mimic SQL queries and filter data by criteria (=, <>, >=, <=, AND, OR).
-*__Automatic delimiter guesser__. Don't worry if you forgot the file configuration!
 *__Flexible__. Import only certain range of records from the given file, import fields (columns) by indexes or names, read records in sequential mode.
 *__Dynamic Typing support__. Turn CSV data field to a desired VBA data type.
 *__Data sorting__. Sort CSV imported data using the hyper-fast(100k records per second) [Yaroslavskiy Dual-Pivot Quicksort](https://web.archive.org/web/20151002230717/http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf) like Java.
-*__Microsoft Access compatible__. The library has a version for those who feel in comfort working through DAO databases, [download from here](https://github.com/ws-garcia/VBA-CSV-interface/raw/master/src/Access_version.zip).
-*__RFC-4180 specs compliant__.
+*__Microsoft Access compatible__. The library has a version for those who feel in comfort working through DAO databases, [download from here](https://github.com/ws-garcia/VBA-CSV-interface/raw/master/src/Access_version.zip).
docs/limitations/csv_file_size.md: 1 addition & 1 deletion (+1 -1)
@@ -15,7 +15,7 @@ VBA is the version of Visual Basic shipped with Microsoft Office. At this time,
 
 At the same time, Microsoft aims the 32-bit version of Excel is [limited to 4GB of RAM](https://docs.microsoft.com/en-us/office/troubleshoot/excel/laa-capability-change), on x64 OS’s, for most recent versions and to 2GB for versions up to 2013. Again, VBA can’t avoid this limitation.
 
-By the fact CSV interface works, primarily, with strings and [VBA uses 10 bytes + string length](https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/data-type-summary) for store this data type, a huge working load can let application run out of memory. Then, it’s crucial to set some boundary over the file size users can work from the CSV interface library.
+By the fact CSV interface works, primarily, with `Variant/Strings` and [VBA uses 10 bytes + string length](https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/data-type-summary) for store this data type, a huge working load can let application run out of memory. Then, it’s crucial to set some boundary over the file size users can work from the CSV interface library.
 
 To achieve this, the first step is to study the performance of the VBA-CSV interface when parsing 5K records from the top and from the end of files with varying size, and then establish the recommended maximum amount of data that can be handled from the interface.
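The memory argument in the hunk above can be made concrete with a little arithmetic. The sketch below applies the quoted "10 bytes + string length" rule to a list of field lengths. It is written in Python rather than VBA, and the function name and sample workload are hypothetical. Treating the string length as a byte count follows the quoted rule literally and is an assumption, so real usage (for example with `Variant` wrappers) may be higher.

```python
def estimate_string_bytes(field_lengths, overhead=10):
    """Estimate memory for a set of strings.

    Assumption: each string costs `overhead` bytes plus its length,
    per the '10 bytes + string length' rule quoted in the text.
    """
    return sum(overhead + n for n in field_lengths)


# Hypothetical workload: 1,000,000 fields of 20 characters each
total = estimate_string_bytes([20] * 1_000_000)
print(total, "bytes, about", round(total / 2**20, 1), "MiB")
# 30000000 bytes, about 28.6 MiB
```

Under this estimate, the per-string overhead is a third of the total for short fields, which is why files with many small fields exhaust memory sooner than their size on disk suggests.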