Open
Labels
bug: an unexpected problem or unintended behavior
Description
Hello, it appears that dplyr's grouped distinct() evaluates new expressions over the whole data frame, while dbplyr translates them to a window function with PARTITION BY, so the calculation is applied within each group.
reprex using dplyr v1.0.10:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))
df %>% group_by(g) %>% distinct(avg = mean(x))
#> # A tibble: 2 × 2
#> # Groups: g [2]
#> g avg
#> <chr> <dbl>
#> 1 a 2
#> 2 b 2
memdb_frame(df) %>% group_by(g) %>% distinct(avg = mean(x, na.rm=TRUE)) %>% show_query()
#> <SQL>
#> SELECT DISTINCT `g`, AVG(`x`) OVER (PARTITION BY `g`) AS `avg`
#> FROM `dbplyr_001`
Created on 2022-10-06 by the reprex package (v2.0.1)
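For comparison, the two semantics can be spelled out explicitly in dplyr without relying on how distinct() handles groups. This is only a sketch: the names `within_groups` and `whole_frame` are made up for illustration, and it uses mutate() followed by distinct() as a stand-in for each behavior, not as a description of how either package is implemented.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# Per-group mean, i.e. what AVG(x) OVER (PARTITION BY g) computes.
# (`within_groups` is a hypothetical name used only in this sketch.)
within_groups <- df %>%
  group_by(g) %>%
  mutate(avg = mean(x)) %>%
  ungroup() %>%
  distinct(g, avg)

# Whole-frame mean, matching the dplyr output in the reprex above.
# (`whole_frame` is likewise a hypothetical name.)
whole_frame <- df %>%
  mutate(avg = mean(x)) %>%
  distinct(g, avg)

within_groups$avg
#> [1] 1.5 3.0
whole_frame$avg
#> [1] 2 2
```

If the within-group result is the intended one, the first form gives the same answer on a local data frame as the SQL that dbplyr generates.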