dplyr grouped distinct does not split-apply-combine calculations, but dbplyr does #1081

@machow

Hello, it appears that dplyr's grouped `distinct()` evaluates its expressions over the whole data frame, while dbplyr translates them to a window function with `PARTITION BY`, so they are evaluated within each group.

Reprex using dplyr v1.0.10:

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))
df %>% group_by(g) %>% distinct(avg = mean(x))
#> # A tibble: 2 × 2
#> # Groups:   g [2]
#>   g       avg
#>   <chr> <dbl>
#> 1 a         2
#> 2 b         2

memdb_frame(df) %>% group_by(g) %>% distinct(avg = mean(x, na.rm=TRUE)) %>% show_query()
#> <SQL>
#> SELECT DISTINCT `g`, AVG(`x`) OVER (PARTITION BY `g`) AS `avg`
#> FROM `dbplyr_001`

Created on 2022-10-06 by the reprex package (v2.0.1)
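
For comparison, here is a minimal sketch of the per-group result that the generated SQL corresponds to, written for the local data frame. It assumes that computing the mean with a grouped `mutate()` and then deduplicating is an acceptable stand-in for the grouped `distinct()` call. The name `per_group` is only for illustration, and this is a workaround sketch, not a statement of the intended dplyr behavior.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# Compute the mean within each group first (grouped mutate),
# then drop the grouping and keep the unique (g, avg) pairs.
per_group <- df %>%
  group_by(g) %>%
  mutate(avg = mean(x)) %>%
  ungroup() %>%
  distinct(g, avg)

# avg is 1.5 for g = "a" and 3 for g = "b",
# rather than the whole-frame mean of 2 in both rows.
per_group
```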

Labels: bug
