-
-
Couldn't load subscription status.
- Fork 33.2k
Description
When adding two multidigit (in the PyLong sense of "digit") ints a and b with matching signs, we currently allocate max(size_a, size_b) + 1 digits for the result. But in "most" cases (see below), we only need max(size_a, size_b) digits, so we're allocating more space than we need. The same applies to subtraction of multidigit ints with opposite signs, which ends up in the same codepath.
The effect on RAM usage is complicated by the fact that memory allocations are (typically, on a 64-bit platform) aligned to multiples of 16 bytes. So for example if max(size_a, size_b) is 3, then the overallocation costs nothing: assuming a typical 64-bit machine, it causes us to ask for 40 bytes instead of 36 (refcount + type pointer + size field = 24 bytes; add 4 bytes per digit), and both those values round up to 48. But if max(size_a, size_b) is 2 then that same alignment means that we end up allocating 48 bytes of RAM instead of 32.
For that use of "most" above: in the case that size_a == size_b, if we were to assume that the top digits of a and b were independent of one another and uniformly distributed in [1, PyLong_BASE), we'd be overallocating around 50% of the time. But that's a bad assumption; a more realistic model would be something along the lines of Benford's law, where the probability of the top digit having value d is log(1+(1/d)) / log(PyLong_BASE); under that model, the extra digit is needed less than 0.2% of the time, so the current code ends up overallocating a touch over 99.8% of the time. The "true" model (if such a thing exists) is likely somewhere between the two extremes.
In the case that size_a != size_b, the extra digit is almost always unnecessary.
Proposed change: if we were to check the sum of the topmost digits of a and b before entering the main loop in x_add, the vast majority of cases of overallocation could be avoided: we'd only end up overallocating in some of the (negligibly rare) cases where that sum was exactly PyLong_BASE - 1.
I'll open a PR and post some benchmarks shortly.