
[BUG] Incorrect dtplyr translation of inner_join(multiple = "any") #488

@Nick-Eagles


Hello,

In short, dtplyr silently ignores the multiple = "any" argument to inner_join() when it translates the dplyr code to data.table. The result is wrong, and no warning or error is raised.

Here's a minimal reprex:

library(dtplyr)
library(dplyr)
library(data.table)

a = tibble(barcode_id = c(3, 1))
b = tibble(
    gene_id = c("gene1", "gene2", "gene1"),
    barcode_id = c(1, 1, 3),
    cluster_id = c(2, 2, 5)
)

expected_result = a |>
    inner_join(b, by = 'barcode_id', multiple = 'any')
dtplyr_result = a |>
    data.table() |>
    lazy_dt() |>
    inner_join(b, by = 'barcode_id', multiple = 'any') |>
    as_tibble()

Here's expected_result:

  barcode_id gene_id cluster_id
       <dbl> <chr>        <dbl>
1          3 gene1            5
2          1 gene1            2

In contrast, the actual output, dtplyr_result, incorrectly keeps every matching row of b, so barcode_id 1 appears twice:

  barcode_id gene_id cluster_id
       <dbl> <chr>        <dbl>
1          1 gene1            2
2          1 gene2            2
3          3 gene1            5
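Until the translation handles multiple, one possible workaround is to write the join directly in data.table, whose [ method has a mult argument. This is a sketch under one assumption: that mult = "first" is an acceptable stand-in for multiple = "any". dplyr makes no guarantee about which match "any" returns, so returning the first match is one valid choice. It is not what dtplyr itself generates.

```r
library(data.table)

a <- data.table(barcode_id = c(3, 1))
b <- data.table(
    gene_id = c("gene1", "gene2", "gene1"),
    barcode_id = c(1, 1, 3),
    cluster_id = c(2, 2, 5)
)

# For each row of `a`, keep only the first matching row of `b`.
# nomatch = NULL drops rows of `a` without a match (inner-join semantics).
res <- b[a, on = "barcode_id", mult = "first", nomatch = NULL]

# Reorder columns to match the dplyr output layout.
setcolorder(res, c("barcode_id", "gene_id", "cluster_id"))
res
```

The rows come back in the order of a, giving the same two rows as expected_result: (3, gene1, 5) and (1, gene1, 2).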

It's understandable that dtplyr does not yet implement every argument of every dplyr verb. What concerns me is that there seems to be no check that warns or errors when an unsupported argument (here, multiple) is passed. Because the failure is silent, dtplyr output becomes hard to trust in general, especially when the starting dplyr code is complex.
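A user-level version of the kind of check described above might look like the sketch below. checked_inner_join() is a hypothetical helper, not part of dtplyr or dplyr. It assumes that lazy tables created by lazy_dt() inherit from class "dtplyr_step", and it refuses to forward a non-default multiple to dtplyr instead of letting it be dropped.

```r
library(dplyr)
library(dtplyr)

# Hypothetical wrapper (not part of any package): error instead of letting
# dtplyr silently ignore `multiple`.
checked_inner_join <- function(x, y, ...) {
    dots <- list(...)
    multiple <- dots[["multiple"]]  # exact-name lookup, NULL if not supplied
    # Assumption: lazy_dt() objects inherit from "dtplyr_step".
    # multiple = "all" is the default behaviour, so it is allowed through.
    if (inherits(x, "dtplyr_step") &&
        !is.null(multiple) && !identical(multiple, "all")) {
        stop(
            "`multiple = \"", multiple, "\"` is not translated by dtplyr; ",
            "collect with as_tibble() first or use data.table's `mult`.",
            call. = FALSE
        )
    }
    inner_join(x, y, ...)
}
```

With this wrapper, the dtplyr pipeline from the reprex fails loudly rather than returning three rows, while plain tibbles still go through dplyr unchanged.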

Thanks for developing this package; it clearly serves an important purpose: getting performant dplyr code with very little extra effort. I confirmed this behaviour with dtplyr 1.3.1.

Best,
-Nick
