
Commit 98f87bf

Mentioning prior works in base jumping article
1 parent a235ec0 commit 98f87bf

File tree

1 file changed

+7
-0
lines changed


docs/blog/csv_base_jumping.md

Lines changed: 7 additions & 0 deletions
@@ -277,6 +277,13 @@ Note however that the specifics of the used hardware and filesystem must be take
Finding the optimal number of threads can also be a balancing act since using too many of them might put too much pressure on IO, counter-intuitively. Inter-thread communication and synchronization might also become a problem with too many threads.
*Prior work*

As pointed out on [Lobste.rs](https://lobste.rs/s/tbsdd4/cursed_engineering_jumping_randomly), some other libraries and tools use a technique similar to the one described in this article:
* [CSV.jl](https://csv.juliadata.org), a CSV parsing library for `Julia`, has [CSV.Chunks](https://csv.juliadata.org/stable/reading.html#CSV.Chunks)
* MySQL Shell has a parallel table import feature, documented [here](https://dev.mysql.com/doc/mysql-shell/8.4/en/mysql-shell-utilities-parallel-table.html)
*Regarding grep*

Funnily enough, this logic (fast segmentation + parallelization) can easily be ported to `grep`-like tools. Finding the next line in a stream is way easier than finding the next CSV row (unless you jumped right in the middle of a `CRLF` pair, but I don't think this is such an issue). In fact, you don't even need to collect a sample at the beginning of the file, since you don't need to mind thorny CSV quotation rules. This could also provide a nice boost when processing newline-delimited JSON files etc.
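To make the idea concrete, here is a minimal Python sketch of that segmentation scheme (the function names are hypothetical, not from any existing tool): jump to evenly spaced byte offsets, scan forward to the next newline to align each chunk on a line boundary, then search the chunks in parallel.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunk_offsets(path, n_chunks):
    """Split `path` into byte ranges whose starts fall on line boundaries.

    Jump to an evenly spaced offset, then scan forward to the next
    newline; unlike CSV, no quoting rules complicate the scan.
    """
    size = os.path.getsize(path)
    approx = max(size // n_chunks, 1)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * approx)
            f.readline()  # skip the partial line we may have landed in
            pos = f.tell()
            if offsets[-1] < pos < size:
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets, offsets[1:]))

def grep_chunk(path, start, end, needle):
    """Return the lines within [start, end) that contain `needle`."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [line for line in data.splitlines() if needle in line]

def parallel_grep(path, needle, n_threads=4):
    """Grep each chunk in its own thread; `map` preserves chunk order."""
    ranges = chunk_offsets(path, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda r: grep_chunk(path, r[0], r[1], needle), ranges)
    return [line for part in parts for line in part]
```

Because the chunk starts are aligned before any searching happens, no line can straddle two chunks, so the per-thread results concatenate cleanly with no deduplication step.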
