Skip to content

Multiple joins of same table #148

@rgreminger

Description

@rgreminger

I think I found another edge case. Below are two contrived examples for cases where I first want to extract some data/variable from dt1 and merge it into dt2. Then I want to do something with that variable in dt2 (e.g., define another variable) and merge the result back to the original main table dt1.

using DataFrames
using TidierData
import TidierDB as DB

df1 = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9], 
                          value0 = [i % 2 == 0 ? "aa" : "bb" for i in 1:10], 
                          value1 = repeat(1:5, 2)) 

df2 = DataFrame(id = df1.id, value2 = rand(10))

db = DB.connect(DB.duckdb())

DB.copy_to(db, df1, "t1");
DB.copy_to(db, df2, "t2");

dt1 = @chain DB.dt(db, "t1") begin
    DB.@group_by(value0)
    DB.@summarize(id = first(id), value1 = sum(value1))
end

dt2 = @chain DB.dt(db, "t2") begin
    DB.@inner_join(DB.t(dt1), id == id)
    DB.@distinct() 
end

df_merged = @chain DB.dt(db, "t1") begin
    DB.@select(id, value0) 
    DB.@inner_join(dt2, id == id)
    @aside DB.@show_query _
    DB.@collect()
end

WITH j1j1cte_1 AS (
SELECT value0, first(id) AS id, SUM(value1) AS value1
        FROM  t1
        GROUP BY value0),
j1cte_1 AS (
SELECT  DISTINCT value2, value0, value1
        FROM t2
        INNER JOIN j1cte_1 ON t2.id = j1cte_1.id),
j1cte_2 AS (
SELECT *
        FROM  j1cte_1),
cte_1 AS (
SELECT  COALESCE(t1.id, j1cte_2.id) AS id, t1.value0, j1cte_2.value2, j1cte_2.value0, j1cte_2.value1
        FROM t1
        INNER JOIN j1cte_2 ON t1.id = j1cte_2.id)  
SELECT *
        FROM cte_1
ERROR: Binder Error: There is a WITH item named "j1cte_1", but it cannot be referenced from this part of the query.
Stacktrace:
....
df1 = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9], 
                          value0 = [i % 2 == 0 ? "aa" : "bb" for i in 1:10], 
                          value1 = repeat(1:5, 2)) 

df2 = DataFrame(id = df1.id, value2 = rand(10))

db = DB.connect(DB.duckdb())

DB.copy_to(db, df1, "t1");
DB.copy_to(db, df2, "t2");

dt1 = @chain DB.dt(db, "t1") begin
    DB.@distinct() 
end

dt2 = @chain DB.dt(db, "t2") begin
    DB.@inner_join(DB.t(dt1), id)
end

df_merged = @chain DB.dt(db, "t1") begin
    DB.@inner_join(DB.t(dt2), id == id)
    @aside DB.@show_query _
    DB.@collect()
end

WITH j1j1cte_1 AS (
SELECT  DISTINCT id, value0, value1
        FROM t1),
j1j1cte_2 AS (
SELECT *
        FROM  j1cte_1),
j1cte_1 AS (
SELECT COALESCE(t2.id, j1cte_2.id) AS id, t2.value2, j1cte_2.value0, j1cte_2.value1
        FROM  j1cte_0)  
SELECT COALESCE(t1.id, j1cte_1.id) AS id, t1.value0, t1.value1, j1cte_1.value2, j1cte_1.value0, j1cte_1.value1
        FROM t1
        INNER JOIN j1cte_1 ON t1.id = j1cte_1.id
ERROR: Catalog Error: Table with name j1cte_0 does not exist!
Did you mean "sqlite_master"?

LINE 1: ....id) AS id, t2.value2, j1cte_2.value0, j1cte_2.value1 FROM  j1cte_0)  SELECT COALESCE(t1.id, j1cte_1.id) AS id, t1.value0...
                                                                       ^
Stacktrace:
....

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions