Add Haskell implementation to benchmark #144

mchav · 2025-11-20T10:47:22Z

Also tested out that this works end to end on a c6id.4xlarge instance.

* Add Haskell dataframe benchmark entry --------- Co-authored-by: Claude <noreply@anthropic.com>

Tmonster

Thank you for this! Looks pretty good to me. I tried to test it on my own c6id.4xlarge instance and was getting errors at the build step. Potentially the dataframe library has been updated recently?

This is the error I saw:

Downloading the latest package list from hackage.haskell.org
Package list of hackage.haskell.org is up to date.
The index-state is set to 2026-01-02T18:38:39Z.
Build profile: -w ghc-9.4.7 -O2
In order, the following will be built (use -v for more details):
 - haskell-benchmark-0.1.0.0 (exe:groupby-haskell) (first run)
 - haskell-benchmark-0.1.0.0 (exe:join-haskell) (first run)
Preprocessing executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Preprocessing executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
[1 of 1] Compiling Main             ( groupby-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/groupby-haskell/opt/build/groupby-haskell/groupby-haskell-tmp/Main.o )
[1 of 1] Compiling Main             ( join-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/join-haskell/opt/build/join-haskell/join-haskell-tmp/Main.o )

join-haskell.hs:126:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘T.Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack name)’
      In the expression: D.columnAsDoubleVector (T.pack name) df
      In the expression:
        case D.columnAsDoubleVector (T.pack name) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
126 |     case D.columnAsDoubleVector (T.pack name) df of
    |                                  ^^^^^^^^^^^

groupby-haskell.hs:175:31: error:
    • Couldn't match expected type ‘D.Expr Int’ with actual type ‘Text’
    • In the first argument of ‘D.columnAsIntVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsIntVector (T.pack col) df
      In the expression:
        case D.columnAsIntVector (T.pack col) df of
          Right vec -> fromIntegral $ VU.sum vec
          Left _ -> 0.0
    |
175 |     case D.columnAsIntVector (T.pack col) df of
    |                               ^^^^^^^^^^

groupby-haskell.hs:181:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsDoubleVector (T.pack col) df
      In the expression:
        case D.columnAsDoubleVector (T.pack col) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
181 |     case D.columnAsDoubleVector (T.pack col) df of
    |                                  ^^^^^^^^^^
Error: [Cabal-7125]

mchav · 2026-01-03T16:18:26Z

@Tmonster updated the implementation. I pinned it to a major version so it doesn't get broken by version updates.

Tmonster · 2026-01-19T06:30:50Z

Hi @mchav, seems like some other package got updated, causing the regression tests to start failing. I'm gonna try and fix that first, then I'll go ahead and merge this. Also, the DuckDB release was pushed back a week, so the results will also be about a week later.

mchav · 2026-01-19T06:46:06Z

@Tmonster alright. I noticed the failures in the last CI check were about trailing commas I had left in some R files. I made sure to fix those as well.

Tmonster · 2026-01-19T20:54:21Z

Hi @mchav,

Thanks, was going to mention an issue with ver-haskell.hs but looks like you solved it. I think something may be wrong with the join script though? I ran it myself and got the following errors.

Seems like something is wrong with how the join data file names are read/parsed?

ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ cat out/run_haskell_join_J1_1e7_NA_0_0.err
join-haskell: ./data/J1_1e7_10e4_0_0.csv: openBinaryFile: does not exist (No such file or directory)
ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ ls data
G1_1e7_1e2_0_0.csv  J1_1e7_1e1_0_0.csv  J1_1e7_1e4_0_0.csv  J1_1e7_1e7_0_0.csv  J1_1e7_NA_0_0.csv

mchav · 2026-01-19T21:24:46Z

@Tmonster it was a small bug in inferring what to replace the NA with. Should be fixed now.
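For readers following along: the mismatch reported above (`J1_1e7_10e4_0_0.csv` requested, `J1_1e7_1e4_0_0.csv` on disk) comes down to how the `NA` placeholder in the join data name is expanded into the right-hand table sizes. The following is a minimal sketch, not the code from this PR. It assumes the three right-hand tables have n/1e6, n/1e3 and n rows, which is consistent with the `ls data` listing above, and the helper names `splitOn`, `toSci` and `joinDataNames` are hypothetical. It uses only `base`.

```haskell
module Main where

import Data.List (dropWhileEnd, intercalate)
import Text.Read (readMaybe)

-- | Split a string on a delimiter character (hypothetical helper).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Render a positive integer in compact scientific notation by moving
-- trailing zeroes into the exponent, e.g. 10000 becomes "1e4".
-- Non-positive inputs fall back to plain decimal rendering.
toSci :: Integer -> String
toSci n
  | n <= 0    = show n
  | otherwise =
      let digits   = show n
          mantissa = dropWhileEnd (== '0') digits
          expo     = length digits - length mantissa
      in mantissa ++ "e" ++ show expo

-- | Expand a join data name such as "J1_1e7_NA_0_0" into the names of the
-- three right-hand tables. ASSUMPTION: sizes are n/1e6, n/1e3 and n.
-- Returns Nothing if the row-count field is missing or not numeric.
joinDataNames :: String -> Maybe [String]
joinDataNames name = do
  let fields = splitOn '_' name
  field <- case fields of
    (_ : f : _) -> Just f
    _           -> Nothing
  rows <- readMaybe field :: Maybe Double
  let n         = round rows :: Integer
      sizes     = [n `div` 1000000, n `div` 1000, n]
      fill size = intercalate "_" [if f == "NA" then size else f | f <- fields]
  pure (map (fill . toSci) sizes)

main :: IO ()
main =
  case joinDataNames "J1_1e7_NA_0_0" of
    Nothing    -> putStrLn "could not parse data name"
    Just names -> mapM_ (\nm -> putStrLn ("./data/" ++ nm ++ ".csv")) names
```

Counting trailing zeroes on the integer's decimal digits avoids going through floating-point formatting, so there is no padded exponent or extra mantissa digit to strip afterwards.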

mchav and others added 7 commits November 19, 2025 13:03

Add Haskell implementation (#1)

75f608c

* Add Haskell dataframe benchmark entry --------- Co-authored-by: Claude <noreply@anthropic.com>

feat: Add flushed file writing.

e20c882

chore: Reformat files to be more readable.

ac21b23

chore: Remove stray comments

efaf2ba

fix: Force evaluation of the dataframe before running script.

96f1bb9

fix: Remove strict since it means we do more unnecessary work.

5447f71

Add results for G1_1e7_1e2_0_0 groupby

808a4c8

mchav mentioned this pull request Nov 29, 2025

Create a benchmark in duckdblabs db-benchmark DataHaskell/dataframe#115

Open

Tmonster reviewed Jan 2, 2026

haskell/setup-haskell.sh (resolved)

time.csv (outdated, resolved)

mchav added 5 commits January 3, 2026 07:50

fix: Pin to a major version of dataframe.

58e1a8d

fix: Make file run relative to root instead of CDing into haskell directory.

1ee6f49

fix: Source ghcup environment before calling it.

b7cc815

fix: Merge conflicts

bfe6d0e

Merge branch 'main' into main

09f16a0

fix: Remove trailing commas

76a5094

fix: Upgrade script was stale.

09ec2a0

fix: Properly strip zeroes when converting to scientific.

2965de1

fix: Properly write csv file if none exists.

472dd0f
