Skip to content

Conversation

@TerrorJack
Copy link

This patch implements fast Get logic for integral types based on:

  • Use a single load operation when loading with same endianness of the
    host, otherwise do a host load and a byteSwap. This avoids the
    overhead of multiple single-byte loads in the previous
    implementation.
  • Use the unaligned Addr# load/store primops added since GHC 9.10 when
    available, otherwise do a plain peek. This ensures the GHC backends
    see the right AlignmentSpec at the Cmm level and can correctly emit
    unaligned load instructions.

There's no need for changing Put logic they're backed by FixedPrim
logic in Data.ByteString.Builder.Prim.Binary that already does
similar optimization.

Closes #215.

bgamari and others added 2 commits January 28, 2025 11:33
This patch implements fast `Get` logic for integral types based on:

- Use a single load operation when loading with same endianness of the
  host, otherwise do a host load and a byteSwap. This avoids the
  overhead of multiple single-byte loads in the previous
  implementation.
- Use the unaligned Addr# load/store primops added since GHC 9.10 when
  available, otherwise do a plain peek. This ensures the GHC backends
  see the right AlignmentSpec at the Cmm level and can correctly emit
  unaligned load instructions.

There's no need for changing `Put` logic they're backed by `FixedPrim`
logic in `Data.ByteString.Builder.Prim.Binary` that already does
similar optimization.
Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not a maintainer here)


name: binary
version: 0.8.9.2
version: 0.8.9.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fourth digit is for packaging patches and such. Substantial implementation changes warrant the third digit (to allow downstream to distinguish with MIN_VERSION_binary(x,y,z).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit is introduced from the ghc gitlab mirror's master branch which already is ahead of this repo. i'm fine with a bump though imo it's better left in a future patch before we push another hackage release

(fromIntegral (s `B.unsafeIndex` 1))
{-# INLINE[2] getWord16be #-}
{-# INLINE word16be #-}
#if defined(WORDS_BIGENDIAN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it feasible to add a s390x job to CI? See https://github.com/haskell/bytestring/blob/master/.github/workflows/ci.yml#L121 for instance. Otherwise #if defined(WORDS_BIGENDIAN) tends to bit rot really quickly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'll be an extra source of flakiness before https://gitlab.haskell.org/ghc/ghc/-/issues/25541 is sorted out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use unaligned Addr# primops for performant loads/stores?

3 participants