Skip to content

join on: d1.a IS NULL OR d1.a = d2.a #7439

@jan-swissre

Description

@jan-swissre

There does not seem to be easy way to perform join on conditions like those. We are therefore forced to use duckdb. It would be nice to provide data.table functionality to cover use cases like that.

library(data.table)
d1 = data.table(a = c(1:2,NA,4L), b = c(NA,2:4), x = 1:4)
d2 = data.table(a = c(1L,NA,3:4), b = c(1L,NA,3:4), y = 1:4)

# d1[d2] ??

## duckdb way
conn = duckdb::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
duckdb::duckdb_register(conn, name = "d1", df = d1)
duckdb::duckdb_register(conn, name = "d2", df = d2)
ans = DBI::dbGetQuery(
  conn,
  "SELECT *
FROM d1
LEFT JOIN d2 ON (
  (d1.a IS NULL OR d1.a = d2.a)
  AND
  (d1.b IS NULL OR d1.b = d2.b)
)")
ans
#   a  b x  a  b  y
#1  1 NA 1  1  1  1
#2 NA  3 3  3  3  3
#3  4  4 4  4  4  4
#4  2  2 2 NA NA NA

Metadata

Metadata

Assignees

No one assigned

    Labels

    feature requestjoinsUse label:"non-equi joins" for rolling, overlapping, and non-equi joins

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions