Thanks, missed that. R is single-threaded; if you're writing in parallel, you could try writing in a way that allows you to recombine the data later. To avoid rewriting the entire Parquet file, can you write the deltas and compute the final dataset in a postprocessing step? I have a writeup at https://github.com/cynkra/historian .
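A minimal sketch of that delta approach (the directory layout and the `id`/`updated_at` columns are assumptions, not part of the writeup):

```r
library(DBI)
library(duckdb)

# Each parallel worker writes its own small Parquet "delta" file instead
# of touching the main dataset (paths and column names are made up).
write_delta <- function(result_df, worker_id) {
  con <- dbConnect(duckdb::duckdb())
  on.exit(dbDisconnect(con, shutdown = TRUE))
  duckdb::duckdb_register(con, "result", result_df)
  path <- sprintf("deltas/delta_%s_%s.parquet",
                  worker_id, format(Sys.time(), "%Y%m%d_%H%M%OS3"))
  dbExecute(con, sprintf("COPY result TO '%s' (FORMAT PARQUET)", path))
}

# Postprocessing step: combine the base data with all deltas, keep the
# newest row per key, and write the final dataset once.
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "
  COPY (
    SELECT *
    FROM read_parquet(['base/*.parquet', 'deltas/*.parquet'])
    QUALIFY row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
  ) TO 'final.parquet' (FORMAT PARQUET)
")
dbDisconnect(con, shutdown = TRUE)
```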
Hello!
The Problem
I'm trying to do parallel computations in duckdb and then export the results (in R, on Windows). My actual dataset is quite large, and I need to run hundreds of computations every day, each result containing around 30k rows.
I am aware that duckdb can have either one read-write process or multiple read-only processes. However, concurrency is supported within a single process via threading (https://duckdb.org/docs/guides/python/multiple_threads). I assume this is not possible in R?
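For illustration, this is what I mean by a read-only connection from R (the file and table names are made up):

```r
library(DBI)
library(duckdb)

# Several processes can open the same database file read-only.
con <- dbConnect(duckdb::duckdb(), dbdir = "my.duckdb", read_only = TRUE)
dbGetQuery(con, "SELECT count(*) FROM results")
dbDisconnect(con, shutdown = TRUE)
```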
I could write to Parquet in parallel, but I often need to do backfills, which is a heavy operation: with Parquet I have to read and rewrite the whole dataset because of a change in a single column.
The second option is exporting to SQLite using the extension, but for that to work I need to pull the duckdb result into memory and then export it to SQLite. In my reprex, the second method gives lock errors.
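For illustration, the kind of export I mean via the sqlite extension (file and table names are made up; this is a simplified sketch, not my actual reprex):

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb::duckdb())
dbExecute(con, "INSTALL sqlite")
dbExecute(con, "LOAD sqlite")
dbExecute(con, "ATTACH 'results.sqlite' AS sq (TYPE SQLITE)")

# Pull the duckdb result into R first ...
res <- dbGetQuery(con, "SELECT * FROM read_parquet('data/*.parquet') LIMIT 100")

# ... then write it into the attached SQLite file.
duckdb::duckdb_register(con, "res_tmp", res)
dbExecute(con, "CREATE TABLE sq.results AS SELECT * FROM res_tmp")

dbDisconnect(con, shutdown = TRUE)
```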
This is quite an ambiguous problem and probably the wrong place to ask, but here you go.
Reproducible Example
Created on 2025-01-03 with reprex v2.0.2