Reduce frequency of memory overallocation in some int additions and subtractions

When adding two multidigit (in the `PyLong` sense of "digit") ints `a` and `b` with matching signs, we [currently allocate](https://github.com/python/cpython/blob/9dee9731663d670652586c929190f227ab56bd8f/Objects/longobject.c#L3397) `max(size_a, size_b) + 1` digits for the result. But in "most" cases (see below), we only need `max(size_a, size_b)` digits, so we're allocating more space than we need. The same applies to subtraction of multidigit ints with opposite signs, which ends up in the same codepath.

The effect on RAM usage is complicated by the fact that memory allocations are (typically, on a 64-bit platform) aligned to multiples of 16 bytes. So for example if `max(size_a, size_b)` is `3`, then the overallocation costs nothing: assuming a typical 64-bit machine, it causes us to ask for `40` bytes instead of `36` (refcount + type pointer + size field = 24 bytes; add 4 bytes per digit), and both those values round up to `48`. But if `max(size_a, size_b)` is `2` then that same alignment means that we end up allocating `48` bytes of RAM instead of `32`.

For that use of "most" above: in the case that `size_a == size_b`, if we were to assume that the top digits of `a` and `b` were independent of one another and uniformly distributed in `[1, PyLong_BASE)`, we'd be overallocating around 50% of the time. But that's a bad assumption; a more realistic model would be something along the lines of [Benford's law](https://en.wikipedia.org/wiki/Benford%27s_law), where the probability of the top digit having value `d` is `log(1+(1/d)) / log(PyLong_BASE)`; under that model, the extra digit is needed less than 0.2% of the time, so the current code ends up overallocating a touch over 99.8% of the time. The "true" model (if such a thing exists) is likely somewhere between the two extremes.

In the case that `size_a != size_b`, the extra digit is almost always unnecessary.

Proposed change: if we were to check the sum of the topmost digits of `a` and `b` before entering the main loop in `x_add`, the vast majority of cases of overallocation could be avoided: we'd only end up overallocating in some of the (negligibly rare) cases where that sum was exactly `PyLong_BASE - 1`.

I'll open a PR and post some benchmarks shortly.



### Linked PRs
* gh-100688

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Reduce frequency of memory overallocation in some int additions and subtractions #100687

Linked PRs

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

Reduce frequency of memory overallocation in some int additions and subtractions #100687

Description

Linked PRs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions