-
Notifications
You must be signed in to change notification settings - Fork 68
perf: implement fast Get for integral types #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This patch implements fast `Get` logic for integral types based on: - Use a single load operation when loading with same endianness of the host, otherwise do a host load and a byteSwap. This avoids the overhead of multiple single-byte loads in the previous implementation. - Use the unaligned Addr# load/store primops added since GHC 9.10 when available, otherwise do a plain peek. This ensures the GHC backends see the right AlignmentSpec at the Cmm level and can correctly emit unaligned load instructions. There's no need for changing `Put` logic they're backed by `FixedPrim` logic in `Data.ByteString.Builder.Prim.Binary` that already does similar optimization.
Bodigrim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not a maintainer here)
|
|
||
| name: binary | ||
| version: 0.8.9.2 | ||
| version: 0.8.9.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fourth digit is for packaging patches and such. Substantial implementation changes warrant the third digit (to allow downstream to distinguish with MIN_VERSION_binary(x,y,z).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this commit is introduced from the ghc gitlab mirror's master branch which already is ahead of this repo. i'm fine with a bump though imo it's better left in a future patch before we push another hackage release
| (fromIntegral (s `B.unsafeIndex` 1)) | ||
| {-# INLINE[2] getWord16be #-} | ||
| {-# INLINE word16be #-} | ||
| #if defined(WORDS_BIGENDIAN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it feasible to add a s390x job to CI? See https://github.com/haskell/bytestring/blob/master/.github/workflows/ci.yml#L121 for instance. Otherwise #if defined(WORDS_BIGENDIAN) tends to bit rot really quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'll be an extra source of flakiness before https://gitlab.haskell.org/ghc/ghc/-/issues/25541 is sorted out
This patch implements fast
Getlogic for integral types based on:host, otherwise do a host load and a byteSwap. This avoids the
overhead of multiple single-byte loads in the previous
implementation.
available, otherwise do a plain peek. This ensures the GHC backends
see the right AlignmentSpec at the Cmm level and can correctly emit
unaligned load instructions.
There's no need for changing
Putlogic they're backed byFixedPrimlogic in
Data.ByteString.Builder.Prim.Binarythat already doessimilar optimization.
Closes #215.