-
Notifications
You must be signed in to change notification settings - Fork 185
Description
bind_rows() in dplyr allows efficiently combining a list of data frames into a single data frame by wrapping vec_rbind. This does not work for dbplyr tbl objects.
This is a common pattern so it would be good to implement it without needing to collect the data locally. The SQL equivalent is UNION, so a way to implement it would be to reduce the list using the existing union or union_all (I think dbplyr's implementation would match union_all) translation, e.g.
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.2.3
library(purrr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Create random slices of mtcars
tables <-
1:5 |>
map(function(n) {
tbl_name <- paste0("mtcars_", n)
copy_to(con, slice_sample(mtcars, n = 3), tbl_name)
tbl(con, tbl_name)
})
reduce(tables, union) |>
show_query()
#> <SQL>
#> SELECT *
#> FROM `mtcars_1`
#> UNION
#> SELECT *
#> FROM `mtcars_2`
#> UNION
#> SELECT *
#> FROM `mtcars_3`
#> UNION
#> SELECT *
#> FROM `mtcars_4`
#> UNION
#> SELECT *
#> FROM `mtcars_5`Created on 2023-08-02 with reprex v2.0.2
This would be a straightforward way to implement it, and it produces a good translation that is similar to what someone would manually write. I think some checks would need to be done when using select * to ensure that column order matches, as some SQL engines check only that column types align, not column names.
It would be good also to support the .id argument to append a column with the list name for each item. This could be achieved by running mutate on each item prior to the reduce operation.