@mchav mchav commented Jan 22, 2026

There were a few sources of memory inefficiency in the encode function:

  • converting everything to an intermediate list rather than working with the vector.
  • creating a builder, converting it to a bytestring, just to make it a builder again.

This PR removes those allocations and also borrows the approach to efficient bytestring building used in hsthrift.
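To make the two inefficiencies concrete, here is a minimal sketch of the pattern described above, not the library's actual code. The `Field` and `Record` types and every function name below are hypothetical stand-ins, and quoting/escaping of fields is deliberately left out. The "slow" version goes through an intermediate list and round-trips through a lazy ByteString; the "fast" version folds over the vector and stays in `Builder` throughout.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)
import qualified Data.Vector as V

-- Hypothetical stand-ins for illustration only.
type Field = B.ByteString
type Record = V.Vector Field

-- Slow shape: converts the vector to a list, then runs the Builder to a
-- lazy ByteString only to wrap the result in a Builder again.
encodeRecordSlow :: Record -> BB.Builder
encodeRecordSlow r =
  BB.lazyByteString (BB.toLazyByteString joined)
  where
    joined = mconcat (intersperse (BB.char8 ',') (map BB.byteString (V.toList r)))

-- Fast shape: folds over the vector directly and never leaves Builder.
encodeRecordFast :: Record -> BB.Builder
encodeRecordFast = V.ifoldr step mempty
  where
    step i field rest
      | i == 0 = BB.byteString field <> rest
      | otherwise = BB.char8 ',' <> BB.byteString field <> rest

-- Encode all records, one per line; the Builder is run exactly once.
encodeAll :: V.Vector Record -> BL.ByteString
encodeAll =
  BB.toLazyByteString
    . V.foldr (\r acc -> encodeRecordFast r <> BB.char8 '\n' <> acc) mempty

-- A small sample record for experimenting in GHCi.
sample :: Record
sample = V.fromList (map BC.pack ["a", "b", "c"])
```

Both record encoders produce the same bytes; the difference is only in how many intermediate structures are allocated along the way.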

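Benchmarks: encoding improves by roughly 10 to 14% in the runs below (time estimate: 15.21 ms to 13.13 ms positional, 22.10 ms to 19.80 ms named).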

I had to fix the benchmark so that it is actually meaningful for encode, since it was always returning immediately. I also augmented the presidents dataset by replicating the columns until we had 20k of them, to make sure the improvements weren't noise.
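A rough sketch of what such a benchmark setup can look like with criterion follows. The comment above does not say why the old benchmark returned immediately; the sketch assumes one common cause, namely that a lazily produced output was only forced to weak head normal form, and uses `nf` to force the whole result. `encodeRows`, `replicateTo`, `smallRows` and `benchMain` are hypothetical names, and the integer rows stand in for the real dataset.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf)
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V

-- Hypothetical encoder standing in for the real encode function.
encodeRows :: V.Vector (V.Vector Int) -> BL.ByteString
encodeRows =
  BB.toLazyByteString
    . V.foldr (\row acc -> rowBuilder row <> BB.char8 '\n' <> acc) mempty
  where
    rowBuilder = V.ifoldr cell mempty
    cell i x rest =
      (if i == 0 then mempty else BB.char8 ',') <> BB.intDec x <> rest

-- Repeat a small vector until it holds exactly n elements, so that the
-- measured work is large enough to stand out from noise.
replicateTo :: Int -> V.Vector a -> V.Vector a
replicateTo n v
  | V.null v = v
  | otherwise = V.take n (V.concat (replicate copies v))
  where
    copies = (n + V.length v - 1) `div` V.length v

-- Tiny made-up dataset; the real benchmark uses the presidents data.
smallRows :: V.Vector (V.Vector Int)
smallRows = V.fromList [V.fromList [1, 2], V.fromList [3, 4]]

-- Benchmark entry point; wire this up as `main` in a benchmark executable.
benchMain :: IO ()
benchMain =
  defaultMain
    [ bgroup
        "encode"
        [ -- nf forces the entire lazy ByteString, not just its first chunk.
          bench "replicated rows" (nf encodeRows (replicateTo 20000 smallRows))
        ]
    ]
```

Using `nf` rather than `whnf` matters whenever the result is a lazy structure, since otherwise only the outermost constructor is evaluated and the timing says little about the encoder itself.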

Before:

benchmarking positional/encode/presidents/with conversion
time                 15.21 ms   (14.52 ms .. 15.82 ms)
                     0.989 R²   (0.983 R² .. 0.995 R²)
mean                 13.57 ms   (13.26 ms .. 14.00 ms)
std dev              943.0 μs   (723.5 μs .. 1.047 ms)
variance introduced by outliers: 33% (moderately inflated)

benchmarking named/encode/presidents/with conversion
time                 22.10 ms   (21.98 ms .. 22.22 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 22.15 ms   (22.04 ms .. 22.29 ms)
std dev              280.2 μs   (204.9 μs .. 370.6 μs)

After:

benchmarking positional/encode/presidents/with conversion
time                 13.13 ms   (13.07 ms .. 13.23 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 13.14 ms   (13.10 ms .. 13.18 ms)
std dev              110.9 μs   (83.62 μs .. 159.7 μs)

benchmarking named/encode/presidents/with conversion
time                 19.80 ms   (19.62 ms .. 19.95 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 20.15 ms   (20.02 ms .. 20.47 ms)
std dev              416.2 μs   (173.1 μs .. 772.1 μs)


@andreasabel andreasabel left a comment

I think the CI failures can be ignored. The nightly snapshot on Stackage is currently broken because it uses GHC 9.12.3, which is buggy and thus not supported by GHCup.

mchav commented Jan 22, 2026

I'll try to fix that, i.e. remove it from CI, in a follow-up change of course.

andreasabel commented Jan 22, 2026

> I'll try to fix that, i.e. remove it from CI, in a follow-up change of course.

I guess the best fix would be to pin nightly to a snapshot from when it still used GHC 9.12.2; that would be nightly-2025-12-30, see https://www.stackage.org/snapshots.

@mchav mchav merged commit 8b83e06 into haskell-hvr:master Jan 22, 2026
21 of 24 checks passed