Skip to content

Comments

prelim unnesting#117

Merged
drizk1 merged 16 commits intomainfrom
unnesting
Mar 9, 2025
Merged

prelim unnesting#117
drizk1 merged 16 commits intomainfrom
unnesting

Conversation

@drizk1
Copy link
Member

@drizk1 drizk1 commented Feb 26, 2025

Some experimenting with unnesting has lead me here (wider only supports SQL STRUCTS for the time being, longer is more versatile)
Would still like to add

  • multiple column support
  • unnest_longer
  • tidy selection for multiple columns
  • docs
julia> @chain DB.db_table(db, "test/data.json")  begin 
            DB.@mutate(og = pos)
            DB.@select !name
            DB.@unnest_wider(pos)
            DB.@collect
       end
3×3 DataFrame
 Row │ lat      lon        og                          
     │ Float64  Float64?   NamedTup…?                  
─────┼─────────────────────────────────────────────────
   1 │    10.1       30.3  (lat = 10.1, lon = 30.3)
   2 │    10.2       30.2  (lat = 10.2, lon = 30.2)
   3 │    10.3  missing    (lat = 10.3, lon = missing)

Also pr adds dt

@drizk1
Copy link
Member Author

drizk1 commented Feb 27, 2025

julia> @chain db_table(db, :nt) begin 
          @unnest_wider data  
          @unnest_longer a b 
          @collect
       end
WITH cte_1 AS (
SELECT id, data.a AS a, data.b AS b
        FROM nt)  
SELECT id, unnest(a) AS a, unnest(b) AS b
        FROM cte_1
7×3 DataFrame
 Row │ id     a        b     
     │ Int32  Int32?   Int32 
─────┼───────────────────────
   1 │     1        1      3
   2 │     1        2      4
   3 │     2        5      7
   4 │     2        6      8
   5 │     2  missing      9
   6 │     3       10     12
   7 │     3       11     13

@drizk1
Copy link
Member Author

drizk1 commented Mar 2, 2025

stumbled across this blog post unnesting and joining data in duckdb and decided to see how much i could recreate w tidierdb. I ended up running into a strange edge case that was more on the duckdb side than julia side it think (couldnt unnest part of a struct because of the type?).

Anyway, part of this blog, explodes (?) a nested struct column into a new table by leveraging duckdbs unnest to create a new table. with minimal changes, i was able to make @transmute support this.

This is a bit of deviation from duckplyr, duckdb, or ibis ( while still maintaining the unnest_wider/unnest_longer features), but i think it might be a useful feature to have nonetheless (with documentation)

CREATE TABLE stations AS 
    SELECT u.* 
    FROM (SELECT UNNEST(items) AS u FROM stations_stg);

CREATE TABLE measures AS 
    SELECT u.* 
    FROM (SELECT UNNEST(items) AS u FROM measures_stg);

CREATE TABLE readings AS 
    SELECT u.* 
    FROM (SELECT UNNEST(items) AS u FROM readings_stg);

SELECT
      "r_\0": COLUMNS(r.*),
      "m_\0": COLUMNS(m.*),
      "s_\0": COLUMNS(s.*) 
       FROM
      readings r
      INNER JOIN m:measures ON r.measure = m."@id"
      INNER JOIN s:stations ON m.station = s."@id"
    LIMIT 1;

tidierdb equivalent below (used views for simplicity)

@chain DB.db_table(db, :measures_stg) begin
    DB.@transmute(unnest(items))
    DB.@create_view(meas_view)
end

@chain DB.db_table(db,:stations_stg ) begin
    DB.@transmute(unnest(items))
    DB.@create_view(stat_view, true)
end

@chain DB.db_table(db, :readings_stg) begin
    DB.@transmute(unnest(items))
    DB.@inner_join("meas_view", measure = "@id") 
    DB.@inner_join("stat_view", station = "@id") 
    @aside DB.@show_query  _
    DB.@collect
end

@drizk1 drizk1 merged commit b23a0f5 into main Mar 9, 2025
5 checks passed
@drizk1 drizk1 deleted the unnesting branch March 9, 2025 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant