perf: implement fast Get for integral types #216

TerrorJack · 2025-11-02T14:23:40Z

This patch implements fast `Get` logic for integral types based on two ideas:

- Use a single load operation when loading with the same endianness as
  the host, otherwise do a host load and a `byteSwap`. This avoids the
  overhead of multiple single-byte loads in the previous
  implementation.
- Use the unaligned `Addr#` load/store primops added in GHC 9.10 when
  available, otherwise do a plain `peek`. This ensures the GHC backends
  see the right `AlignmentSpec` at the Cmm level and can correctly emit
  unaligned load instructions.

There's no need to change the `Put` logic: it is backed by the `FixedPrim`
logic in `Data.ByteString.Builder.Prim.Binary`, which already does a
similar optimization.
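
The single-load idea in the description can be sketched as follows. This is a minimal illustration and not the patch's actual code: it covers only the plain `peek` fallback path, it selects the byte order with `GHC.ByteOrder.targetByteOrder` instead of CPP, and the names `hostWord32`, `word32be`, and `word32le` are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word32, byteSwap32)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- | Read the first four bytes with one host-order load.
-- The length check keeps this sketch safe; a real parser would have
-- checked the remaining input already.
-- Assumption: the platform tolerates an unaligned 'peek'.
hostWord32 :: B.ByteString -> Word32
hostWord32 bs
  | B.length bs < 4 = error "hostWord32: need at least 4 bytes"
  | otherwise =
      unsafeDupablePerformIO $
        BU.unsafeUseAsCString bs $ \p -> peek (castPtr p)

-- | Big-endian decode: a plain load on big-endian hosts,
-- a load plus one byte swap on little-endian hosts.
word32be :: B.ByteString -> Word32
word32be bs = case targetByteOrder of
  BigEndian -> hostWord32 bs
  LittleEndian -> byteSwap32 (hostWord32 bs)

-- | Little-endian decode: the mirror image of 'word32be'.
word32le :: B.ByteString -> Word32
word32le bs = case targetByteOrder of
  LittleEndian -> hostWord32 bs
  BigEndian -> byteSwap32 (hostWord32 bs)
```

For the bytes `0x12 0x34 0x56 0x78`, `word32be` yields `0x12345678` and `word32le` yields `0x78563412` on either kind of host, since the swap is applied only when the requested order differs from the host's.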

Closes #215.


Bodigrim

(I'm not a maintainer here)

Bodigrim · 2025-11-02T15:19:40Z

binary.cabal


 name:            binary
-version:         0.8.9.2
+version:         0.8.9.3


The fourth digit is for packaging patches and such. Substantial implementation changes warrant a bump of the third digit, so that downstream can distinguish them with `MIN_VERSION_binary(x,y,z)`.
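
To illustrate the point: the `MIN_VERSION_binary` macro takes only three version components, so a change shipped as a fourth-digit bump cannot be detected with it. A hedged sketch of a downstream check follows; the version `0.8.10` and the name `hasFastGet` are hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE CPP #-}

-- Fallback so this file still compiles when the macro is not defined
-- by the build tool. With the fallback, the check below is always false.
#ifndef MIN_VERSION_binary
#define MIN_VERSION_binary(x,y,z) 0
#endif

-- | True when building against a (hypothetical) binary release that
-- bumped the third digit for the new Get implementation.
hasFastGet :: Bool
#if MIN_VERSION_binary(0,8,10)
hasFastGet = True
#else
hasFastGet = False
#endif
```

A release numbered `0.8.9.3` would be indistinguishable from `0.8.9.2` under this kind of check, because both satisfy exactly the same three-component comparisons.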

This commit comes from the GHC GitLab mirror's master branch, which is already ahead of this repo. I'm fine with a bump, though IMO it's better left to a future patch before we push another Hackage release.

Bodigrim · 2025-11-02T15:23:44Z

src/Data/Binary/Get.hs

-        (fromIntegral (s `B.unsafeIndex` 1))
-{-# INLINE[2] getWord16be #-}
-{-# INLINE word16be #-}
+#if defined(WORDS_BIGENDIAN)


Is it feasible to add a s390x job to CI? See https://github.com/haskell/bytestring/blob/master/.github/workflows/ci.yml#L121 for instance. Otherwise #if defined(WORDS_BIGENDIAN) tends to bit rot really quickly.
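
For context, a guard of this kind usually follows the pattern sketched below: only one branch is ever compiled on a given host, so the big-endian branch goes untested unless some CI job runs on a big-endian target. This is a minimal sketch, assuming GHC's `MachDeps.h` header defines `WORDS_BIGENDIAN` on big-endian targets; `hostIsBigEndian` and `agreesWithBase` are hypothetical names.

```haskell
{-# LANGUAGE CPP #-}

#include "MachDeps.h"

import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- | Compile-time view of the host byte order. On a little-endian
-- builder the first branch is never seen by the compiler.
hostIsBigEndian :: Bool
#if defined(WORDS_BIGENDIAN)
hostIsBigEndian = True
#else
hostIsBigEndian = False
#endif

-- | Cross-check against the value exported by base.
agreesWithBase :: Bool
agreesWithBase = hostIsBigEndian == (targetByteOrder == BigEndian)
```

The cross-check only confirms that the macro and `targetByteOrder` agree on the current host; it does not exercise the branch that was compiled out.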

That'll be an extra source of flakiness until https://gitlab.haskell.org/ghc/ghc/-/issues/25541 is sorted out.

bgamari and others added 2 commits January 28, 2025 11:33

Release 0.8.9.3

a625eee

Bodigrim reviewed Nov 2, 2025
