@mchav mchav commented Nov 20, 2025

Also tested out that this works end to end on a c6id.4xlarge instance.

Collaborator

@Tmonster Tmonster left a comment

Thank you for this! Looks pretty good to me. I tried to test it on my own c6id.4xlarge instance and was getting errors at the build step. Potentially the dataframe library has been updated recently?

This is the error I saw:

Downloading the latest package list from hackage.haskell.org
Package list of hackage.haskell.org is up to date.
The index-state is set to 2026-01-02T18:38:39Z.
Build profile: -w ghc-9.4.7 -O2
In order, the following will be built (use -v for more details):
 - haskell-benchmark-0.1.0.0 (exe:groupby-haskell) (first run)
 - haskell-benchmark-0.1.0.0 (exe:join-haskell) (first run)
Preprocessing executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Preprocessing executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
[1 of 1] Compiling Main             ( groupby-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/groupby-haskell/opt/build/groupby-haskell/groupby-haskell-tmp/Main.o )
[1 of 1] Compiling Main             ( join-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/join-haskell/opt/build/join-haskell/join-haskell-tmp/Main.o )

join-haskell.hs:126:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘T.Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack name)’
      In the expression: D.columnAsDoubleVector (T.pack name) df
      In the expression:
        case D.columnAsDoubleVector (T.pack name) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
126 |     case D.columnAsDoubleVector (T.pack name) df of
    |                                  ^^^^^^^^^^^

groupby-haskell.hs:175:31: error:
    • Couldn't match expected type ‘D.Expr Int’ with actual type ‘Text’
    • In the first argument of ‘D.columnAsIntVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsIntVector (T.pack col) df
      In the expression:
        case D.columnAsIntVector (T.pack col) df of
          Right vec -> fromIntegral $ VU.sum vec
          Left _ -> 0.0
    |
175 |     case D.columnAsIntVector (T.pack col) df of
    |                               ^^^^^^^^^^

groupby-haskell.hs:181:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsDoubleVector (T.pack col) df
      In the expression:
        case D.columnAsDoubleVector (T.pack col) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
181 |     case D.columnAsDoubleVector (T.pack col) df of
    |                                  ^^^^^^^^^^
Error: [Cabal-7125]
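
The three errors above share one shape: a function that is now typed to take a column expression (`D.Expr Double` or `D.Expr Int`) is being handed a bare `Text` name. The sketch below reproduces that shape with toy stand-ins only. `Expr`, `Col`, `DataFrame`, `columnByName`, and `columnByExpr` are hypothetical names invented for illustration and are not the dataframe library's real definitions; the actual way to build an expression from a column name should be taken from the library's documentation.

```haskell
module Main where

-- Toy stand-ins for illustration only; NOT the dataframe library's API.
-- The phantom parameter 'a' records the element type of the column.
newtype Expr a = Col String

type DataFrame = [(String, [Double])]

-- Old-style lookup: the column is identified by a bare name.
columnByName :: String -> DataFrame -> Either String [Double]
columnByName name df =
  maybe (Left ("missing column: " ++ name)) Right (lookup name df)

-- New-style lookup: the column is identified by a typed expression.
columnByExpr :: Expr Double -> DataFrame -> Either String [Double]
columnByExpr (Col name) = columnByName name

-- Sum a column, falling back to 0.0 when the column is missing.
-- Passing 'name' directly to 'columnByExpr' would be rejected by the
-- type checker, so the call site wraps the name in an expression first.
sumColumn :: String -> DataFrame -> Double
sumColumn name df =
  case columnByExpr (Col name) df of
    Right xs -> sum xs
    Left _   -> 0.0

main :: IO ()
main = print (sumColumn "v1" [("v1", [1.5, 2.5]), ("v2", [10])])
```

Under these assumptions, the fix at each failing call site is to wrap the column name in whatever expression constructor the library provides, rather than to change the surrounding `case` logic.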

@mchav
Author

mchav commented Jan 3, 2026

@Tmonster I updated the implementation. I also pinned the dependency to a major version so that future releases don't break the build.

@Tmonster
Collaborator

Hi @mchav, it seems some other package got updated, which caused the regression tests to start failing. I'm going to try to fix that first, then I'll go ahead and merge this. Also, the DuckDB release was pushed back a week, so the results will also be about a week later.

@mchav
Author

mchav commented Jan 19, 2026

@Tmonster alright. I noticed the failures in the last CI check were about trailing commas I had left in some R files. I made sure to fix those as well.

@Tmonster
Collaborator

Hi @mchav,

Thanks, was going to mention an issue with ver-haskell.hs but looks like you solved it. I think something may be wrong with the join script though? I ran it myself and got the following errors.

Seems like something is wrong with how the join data file names are read/parsed?

ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ cat out/run_haskell_join_J1_1e7_NA_0_0.err
join-haskell: ./data/J1_1e7_10e4_0_0.csv: openBinaryFile: does not exist (No such file or directory)
ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ ls data
G1_1e7_1e2_0_0.csv  J1_1e7_1e1_0_0.csv  J1_1e7_1e4_0_0.csv  J1_1e7_1e7_0_0.csv  J1_1e7_NA_0_0.csv

@mchav
Author

mchav commented Jan 19, 2026

@Tmonster that was a small bug in inferring how to replace the NA in the data file name. It should be fixed now.
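
For readers following along, the file names in the `ls data` listing suggest how the `NA` placeholder is meant to be filled in: for a left table of size `1e7`, the three right-hand tables use `1e1`, `1e4`, and `1e7`, which is the same exponent reduced by 6, 3, and 0. The following is a minimal sketch of that rule, not the code from this PR. `joinToTables`, `parseSize`, and `splitOn` are hypothetical helpers, and the sketch assumes the name has exactly five `_`-separated fields with a size of the form `<mantissa>e<exponent>`.

```haskell
module Main where

import Data.Char (isDigit)
import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- Split a string on a delimiter character (base-only helper).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Parse a size such as "1e7" into its mantissa text and integer exponent.
parseSize :: String -> Maybe (String, Int)
parseSize size = case break (== 'e') size of
  (m, 'e' : digits)
    | not (null m), not (null digits), all isDigit digits -> Just (m, read digits)
  _ -> Nothing

-- Hypothetical helper: derive the small, medium, and big right-hand table
-- names from a left table name whose third field is the "NA" placeholder.
joinToTables :: String -> Maybe [String]
joinToTables name =
  case splitOn '_' name of
    [prefix, size, "NA", k, s] -> do
      (mantissa, expo) <- parseSize size
      -- Keep exponent notation; only the exponent changes (7 -> 1, 4, 7).
      let sizes = [mantissa ++ "e" ++ show (expo - d) | d <- [6, 3, 0]]
      pure [intercalate "_" [prefix, size, y, k, s] | y <- sizes]
    _ -> Nothing

main :: IO ()
main = mapM_ putStrLn (fromMaybe [] (joinToTables "J1_1e7_NA_0_0"))
```

Working on the exponent as an integer, instead of formatting a floating-point value, avoids producing a string such as `10e4` where `1e4` is expected.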
