Replies: 10 comments 1 reply
-
Side note for this: if you have more than the main process writing to the database, you will need to deal with access contention. DuckDB supports either multiple read-only readers or a single writable connection; you cannot have one connection write while any other connection is reading. See the DuckDB documentation on Concurrency.
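The single-writer constraint can be seen directly with the duckdb R package (a minimal sketch; the file and table names here are illustrative, not anything from targets):

```r
library(DBI)

db <- tempfile(fileext = ".duckdb")

# Only one writable connection may exist at a time.
con_w <- dbConnect(duckdb::duckdb(), dbdir = db)
dbWriteTable(con_w, "meta", data.frame(target = "x", seconds = 1.2))
dbDisconnect(con_w, shutdown = TRUE)

# Once the writer has disconnected, multiple read-only connections are fine.
con_r1 <- dbConnect(duckdb::duckdb(), dbdir = db, read_only = TRUE)
con_r2 <- dbConnect(duckdb::duckdb(), dbdir = db, read_only = TRUE)
res <- dbGetQuery(con_r1, "SELECT * FROM meta")
dbDisconnect(con_r1, shutdown = TRUE)
dbDisconnect(con_r2, shutdown = TRUE)
res
```

Attempting to open a second writable connection (or a writer while a reader is open, from another process) is where the contention shows up.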
-
Thanks for pointing that out. This may be a deal-breaker for metadata. Redis seems like another good option if the profiling data shows a speedup, though I don't think I'd make Redis the default because it can be a burden to install.
-
I have considered DuckDB for many larger-scale projects where having a local quasi-permanent store would be good, but the concurrency issue is a deal-breaker, and not one they seem eager to remedy, unfortunately. Redis would be easy enough for many, but I agree that, easy as it is, running a Redis instance "just for this" might be more than some people prefer. If you're considering Redis but don't want the server overhead (and perhaps would like on-disk persistence), rlite might be worth a look. I haven't benchmarked it or verified its concurrency behavior (other than finding seppo0010/rlite#13 (comment)). Admittedly, it hasn't seen commits in many years; I don't know whether that means it's awesome-stable or not. richfitz is the author/maintainer (also maintains …).
-
I have been profiling example pipelines, and the bottleneck seems to be reopening the metadata file on every append.
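A quick way to see the cost of reopening a file for every append, versus holding one connection open for the whole loop (a hypothetical microbenchmark; the file format and names are illustrative, not targets internals):

```r
path <- tempfile()
n <- 5000L

# Open, append one record, close -- repeated for every write.
t_reopen <- system.time(
  for (i in seq_len(n)) {
    con <- file(path, open = "at")
    writeLines(sprintf("target_%d|done", i), con)
    close(con)
  }
)

# One connection held open for the entire loop.
t_persist <- system.time({
  con <- file(path, open = "at")
  for (i in seq_len(n)) writeLines(sprintf("target_%d|done", i), con)
  close(con)
})

c(reopen = t_reopen[["elapsed"]], persist = t_persist[["elapsed"]])
```

On most systems the open/close overhead dominates the reopen variant, which matches the persistent-connection speedup reported below.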
-
Maintaining a persistent connection seems to reduce execution time from around 60 seconds down to around 23 seconds on an M2 Mac in the following 10000-target pipeline:

```r
library(targets)
tar_option_set(
  controller = crew::crew_controller_local(workers = 25L)
)
list(
  tar_target(datasets, seq_len(1e4), memory = "persistent"),
  tar_target(models, datasets, pattern = map(datasets), retrieval = "main")
)
```
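For anyone wanting to reproduce a timing like the one above on a smaller scale, a run might be set up with `tar_dir()` and `tar_script()` (both real targets helpers for self-contained examples); the pipeline here is a trivial stand-in, not the benchmark pipeline itself:

```r
library(targets)

elapsed <- NULL
tar_dir({                                        # run everything in a temp directory
  tar_script(list(tar_target(x, seq_len(100))))  # write a minimal _targets.R
  elapsed <<- system.time(tar_make())[["elapsed"]]  # time a cold run
})
elapsed
```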
-
There's still a ~30% bottleneck in …
-
Negligible improvement with …
-
This only shows up when targets complete instantaneously. Moving to a database may be necessary, but on reflection, it is a bit extreme. Converting to a discussion.
-
Since you had considered Redis, have you looked at …?
-
I'm encountering a hard-to-track bug while running a targets pipeline on a Shiny server hosted inside Azure App Service. Moving the metadata file to a proper database might be a solution.
-
`targets` uses simple text files for metadata (pipe-separated values). These files can get large in pipelines with many targets (#1390), and appending to them creates overhead in `tar_make()`. Maybe `targets` can instead use a DuckDB database for metadata.
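For context, the append-style pipe-separated metadata pattern described above can be sketched like this (illustrative only; the function and field names are made up, not targets internals):

```r
meta_path <- tempfile(fileext = ".txt")

# Hypothetical helper: append one pipe-separated metadata record per target.
append_meta <- function(path, name, seconds) {
  line <- paste(name, format(seconds), sep = "|")
  cat(line, "\n", file = path, sep = "", append = TRUE)
}

append_meta(meta_path, "dataset_1", 0.42)
append_meta(meta_path, "model_1", 1.7)
lines <- readLines(meta_path)
lines
```

Each completed target adds one line, so both the file size and the per-append cost grow with the number of targets, which is the overhead the discussion is about.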