This is a bit fragile, but here is one way to do it within a single pipeline:

```r
library(targets)
tar_script(
  {
    library(targets)
    library(tarchetypes)
    tar_option_set(packages = c("dplyr"))
    list(
      tar_group_by(
        data,
        tibble(group = rep(letters[1:10], each = 10), x = rnorm(100)),
        group,
        iteration = "group"
      ),
      tar_target(
        huge_results,
        data |>
          summarise(group = group[1], really_big = list(x^2)),
        pattern = map(data)
      ),
      tar_target(
        tiny_summaries,
        huge_results |>
          mutate(really_small = mean(unlist(really_big))) |>
          select(group, really_small),
        pattern = map(huge_results)
      ),
      tar_target(
        select_index,
        order(tiny_summaries$really_small)[1:2]
      ),
      tar_target(
        huge_subset,
        {
          # Pass `select_index` as a symbol (not a string) so targets also
          # detects it as a dependency and builds it before this target.
          index <- readRDS(tar_path_target(select_index))
          objects <- tar_branches(huge_results)$huge_results[index]
          lapply(objects, function(x) {
            # Using this internal because I can't figure out how to deal with `tar_path_target()`'s non-standard eval
            readRDS(tar_runtime_object()$meta$get_record(x)$path)
          }) |>
            bind_rows()
        },
        # Don't load dependencies into memory; only the selected branch files are read above.
        retrieval = "none"
      )
    )
  },
  ask = FALSE
)
tar_make()
tar_read(huge_subset)
```


Description
I'm trying to find a way to efficiently retrieve a subset of results from a large target in a pipeline without having to load the entire target into memory. Specifically, I have a target that produces list columns with a lot of data, and I want to filter specific elements based on some criteria computed in a much smaller summary target. I then want additional targets downstream to work on the filtered subset.
It's straightforward to do this using `tar_read()` outside of the pipeline. But that approach would mean splitting the project into multiple `targets` projects, which is inconvenient. Another approach might be switching to static branching, but that would become unwieldy with my actual data. Is there any way to achieve this within a single `targets` pipeline?

Here's a toy example to clarify what I'm after:
Created on 2025-11-17 with reprex v2.1.1
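For comparison, the out-of-pipeline approach mentioned above can be sketched with the `branches` argument of `tar_read()`, which loads only the selected branches of a dynamic target. This is a minimal sketch under assumptions: the pipeline is a hypothetical stand-in for the toy example (targets `group`, `huge_results`, and `tiny_summaries` built with plain dynamic branching), and `run_demo()` is a hypothetical helper that builds it in a temporary directory.

```r
library(targets)

# Hypothetical helper: build a small stand-in pipeline in a temporary
# directory, then read back only the selected branches of the big target.
run_demo <- function(n_keep = 2) {
  dir <- tempfile("targets_demo_")
  dir.create(dir)
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)

  tar_script(
    {
      library(targets)
      tar_option_set(packages = "tibble")
      list(
        tar_target(group, letters[1:10]),
        # One "huge" branch per group (a list column standing in for big data).
        tar_target(
          huge_results,
          tibble(group = group, really_big = list(rnorm(10)^2)),
          pattern = map(group)
        ),
        # One small summary row per branch of huge_results.
        tar_target(
          tiny_summaries,
          tibble(
            group = huge_results$group,
            really_small = mean(unlist(huge_results$really_big))
          ),
          pattern = map(huge_results)
        )
      )
    },
    ask = FALSE
  )
  tar_make(callr_function = NULL, reporter = "silent")

  # Outside the pipeline: choose branches from the small summary target...
  index <- order(tar_read(tiny_summaries)$really_small)[seq_len(n_keep)]
  # ...then load only those branches of the large target.
  list(index = index, subset = tar_read(huge_results, branches = index))
}

result <- run_demo()
print(result$subset)
```

Because each branch of `huge_results` is stored as a separate object, only the selected branches are read from disk. The drawback is the one described above: this runs outside the pipeline, so any downstream work on the subset is not tracked by `targets`.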