Skip to content

Comments

Optimize Integer <-> ByteString conversions#7439

Merged
zliu41 merged 4 commits intoIntersectMBO:masterfrom
mlabs-haskell:koz/optimize-cip121
Dec 15, 2025
Merged

Optimize Integer <-> ByteString conversions#7439
zliu41 merged 4 commits intoIntersectMBO:masterfrom
mlabs-haskell:koz/optimize-cip121

Conversation

@kozross
Copy link
Contributor

@kozross kozross commented Nov 20, 2025

This optimizes the conversions between Integer and ByteString, as we can now rely on Integer internals, which lets us use direct copying instead of digit-by-digit extraction and reconstruction.

I did not changelog this, as this change is designed to be invisible at the API level. This will require these operations to be re-costed, as this change means we should see an algorithmic improvement (from $\Theta(n^2)$ to $\Theta(n)$ ).

@basetunnel basetunnel requested a review from kwxm November 21, 2025 13:00
@zliu41 zliu41 requested a review from a team December 5, 2025 01:44
Copy link
Member

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite low level stuff, but I've gone through the logic and it all checks out.

LittleEndian -> pure ()
BigEndian -> reverseBuffer ptr desiredLength
goSmallLE :: Ptr Word8 -> Int -> Int# -> IO ()
goSmallLE ptr offset remaining#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a bang to offset? Ditto for all other Int arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, no idea how I missed that.

copyByteArrayToAddr ptr (ByteArray ba#) 0 minLength
case requestedByteOrder of
LittleEndian -> pure ()
BigEndian -> reverseBuffer ptr desiredLength
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how much overhead reverseBuffer has?

Copy link
Contributor Author

@kozross kozross Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short answer: not very much.

The slightly longer answer: In the 'big endian' case, we have to do a reverse copy. We can do this in one of two ways:

  • Manually perform a reverse copy by reading bytes from the source, then writing them to the destination with flipped indices; or
  • Copy normally, then reverse in-place (which is what I chose to do).

A reverse copy is always going to have overheads compared to copyByteArrayToAddr, for several reasons. Some of these are inherent: the way caches work on current-day CPUs means that, in general, traversing an array in ascending order of indices will be much faster than in descending order, for example. However, in our case, the main reason is that copyByteArrayToAddr is implemented using memcpy, which the runtime calls without an FFI penalty. Among other things (also for reasons of caching), for arrays that aren't significantly larger than a memory page (8KiB for all the platforms we care about), copyByteArrayToAddr might as well be a constant-time operation. Nothing I could implement would even come close to that.

Reversing in-place, while still carrying a penalty, is less bad, for two reasons:

  • The number of loop iterations we must perform to reverse an array in-place is only half of what would be required to do a reverse copy. In line with what I mentioned above with regard to copyByteArrayToAddr's performance, we're actually winning as a result.
  • The implementation, especially when accelerated with loop sectioning, for an in-place reversal is a lot less confusing than for a reverse copy: reverse copies require 'reading behind yourself', which is written weirdly, and if you want to loop section, you have to worry about byte order as well.

Unsurprisingly, I found reversing in-place was slightly faster, or at least no worse at the upper end of what I benchmarked with. When compared with the 'little endian' case, it's about a factor of 3 overhead at worst.

@kwxm kwxm added the No Changelog Required Add this to skip the Changelog Check label Dec 15, 2025
@kwxm
Copy link
Contributor

kwxm commented Dec 15, 2025

I don't understand all of the details yet, but I'm going to merge this to get it out of the way and update the costs. I'll come back with a pencil and paper and check the logic of the conversions after that.

Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to come back and work through the details, but let's merge it.

@kozross
Copy link
Contributor Author

kozross commented Dec 15, 2025

@kwxm - if you need any assistance understanding what's going on here, I'm happy to explain.

@zliu41 zliu41 merged commit ffc6471 into IntersectMBO:master Dec 15, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Builtins No Changelog Required Add this to skip the Changelog Check Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants