Semi join #103

@lucasxteixeira

Hi,

I recently ran into an issue while using dplyr::semi_join with ClickHouse. The SQL generated by default uses a correlated subquery (WHERE EXISTS ...), which ClickHouse does not seem to support (or am I wrong?). However, ClickHouse does support LEFT SEMI JOIN and LEFT ANTI JOIN, so I wrote the following method to use them instead:

#' @export
#' @importFrom dbplyr sql_query_semi_join
sql_query_semi_join.ClickhouseConnection <- function(con, x, y, anti, by, where, vars, ..., lvl = 0) {

  x <- dbplyr:::dbplyr_sql_subquery(con, x, name = by$x_as, lvl = lvl)
  y <- dbplyr:::dbplyr_sql_subquery(con, y, name = by$y_as, lvl = lvl)

  on <- dbplyr:::sql_join_tbls(con, by)

  # `anti` is a scalar flag, so use if/else; ifelse() would drop the sql class
  JOIN <- if (isTRUE(anti)) dplyr::sql("ANTI LEFT JOIN") else dplyr::sql("SEMI LEFT JOIN")

  # Wrap with SELECT since callers assume a valid query is returned
  clauses <- list(
    dbplyr:::sql_clause_select(con, vars),
    dbplyr:::sql_clause_from(x),
    dbplyr:::sql_clause(JOIN, y),
    dbplyr:::sql_clause("ON", on, sep = " AND", parens = TRUE, lvl = 1)
  )
  dbplyr:::sql_format_clauses(clauses, lvl, con)
}

Nonetheless, I'm aware that my function relies on internal dbplyr functions (accessed via :::), and I'm not sure whether that is acceptable in this package. Could someone give me some direction on how to refine it for a potential PR?
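
For reference, the default translation can be inspected without a ClickHouse server. This is a minimal sketch, assuming dplyr and dbplyr are installed; the table names `orders` and `customers` and their columns are hypothetical, and `simulate_dbi()` stands in for a real connection, so the exact SQL text may differ from what a ClickHouse connection produces.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical lazy tables backed by a simulated connection;
# no database is contacted.
orders    <- lazy_frame(id = 1L, amount = 10, con = simulate_dbi(), .name = "orders")
customers <- lazy_frame(id = 1L, name = "a", con = simulate_dbi(), .name = "customers")

# Build the semi join lazily and render the SQL dbplyr would send.
query <- semi_join(orders, customers, by = "id")
default_sql <- sql_render(query)

# The rendered query filters with WHERE EXISTS (...) on a subquery
# that refers back to the outer table, i.e. a correlated subquery.
print(default_sql)
```

Swapping `semi_join()` for `anti_join()` in the sketch shows the corresponding NOT EXISTS form.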

Thank you in advance.
