Description
Problem
Pure Python __len__ methods can return values larger than sys.maxsize:
from sys import maxsize
class A:
    def __len__(self):
        return maxsize * 2
>>> A().__len__()
18446744073709551614
However, the builtin len() function unnecessarily fails:
>>> len(A())
Traceback (most recent call last):
...
OverflowError: cannot fit 'int' into an index-sized integer
The builtin range() type added support for ranges larger than sys.maxsize. Larger indices work; negative indices work; forward iteration works; reverse iteration works; and access to the attributes works:
>>> s = range(maxsize*10)
>>> s[maxsize * 2]
18446744073709551614
>>> s[-1]
92233720368547758069
>>> next(iter(s))
0
>>> next(reversed(s))
92233720368547758069
>>> s.start, s.stop, s.step
(0, 92233720368547758070, 1)
However, len() unnecessarily fails:
>>> len(s)
Traceback (most recent call last):
...
OverflowError: Python int too large to convert to C ssize_t
The random.sample() and random.choice() functions both depend on the builtin len() function, so they unnecessarily fail when used with large range objects or with large user-defined sequence objects. Users have reported this issue on multiple occasions. We closed those issues because there was no practical way to fix them short of repairing the builtin len() function:
>>> import random
>>> random.choice(range(maxsize * 5))
Traceback (most recent call last):
...
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py", line 371, in choice
return seq[self._randbelow(len(seq))]
OverflowError: Python int too large to convert to C ssize_t
>>> random.sample(range(maxsize * 5), k=10)
Traceback (most recent call last):
...
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py", line 438, in sample
n = len(population)
OverflowError: Python int too large to convert to C ssize_t
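For the range case only, a caller can sidestep len() today by handing the range's own attributes to random.randrange(), which works with arbitrarily large integers. The sketch below is a workaround under that assumption, not part of the proposal; choice_from_range is a hypothetical helper name, and nothing comparable exists for random.sample() or for user-defined sequences.

```python
import random
from sys import maxsize


def choice_from_range(r):
    """Pick a random element of a range without calling len() on it.

    Hypothetical helper for illustration. Unlike random.choice(), an empty
    range raises ValueError (from randrange) rather than IndexError.
    """
    if not isinstance(r, range):
        raise TypeError("choice_from_range() only supports range objects")
    # randrange() does its own arithmetic on start/stop/step, so the
    # Py_ssize_t limit of len() never comes into play.
    return random.randrange(r.start, r.stop, r.step)


big = range(maxsize * 5)
picked = choice_from_range(big)
```

Membership tests such as `picked in big` also work on large ranges, so the result can be checked without len().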
Proposal
Make the builtin len() function smarter. Let it continue to first try the C PyObject_Size() function, which is restricted to Py_ssize_t. Then add two new fallbacks, one for range objects and the other for calling the __len__ method through the C API, allowing arbitrary objects to be returned.
Rough sketch:
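def builtin_len(obj):
    # Fast path: the existing C-level protocol, limited to Py_ssize_t.
    try:
        ssize_t_int = PyObject_Size(obj)
        return PyLong_FromSsize_t(ssize_t_int)
    except OverflowError:
        pass
    # Fallback 1: compute the length of a large range arithmetically.
    if isinstance(obj, range):
        start, stop, step = obj.start, obj.stop, obj.step
        assert step != 0
        if step > 0:
            return (stop - start + step - 1) // step
        return (start - stop - step - 1) // -step
    # Fallback 2: call __len__ directly so an arbitrarily large int can be returned.
    return PyObject_CallMethod(obj, '__len__', NULL)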
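The sketch above mixes Python syntax with C API names, so it cannot be run as written. A pure-Python approximation of the same three steps might look like the following. The names big_len and range_len are hypothetical, the builtin len() stands in for PyObject_Size(), and the clamp to zero is only there because range_len is also usable on its own with empty ranges.

```python
from sys import maxsize


def range_len(r):
    """Length of a range computed with ordinary int arithmetic."""
    start, stop, step = r.start, r.stop, r.step
    if step > 0:
        n = (stop - start + step - 1) // step
    else:
        n = (start - stop - step - 1) // -step
    # An empty range (e.g. range(5, 0)) would otherwise give a negative count.
    return max(0, n)


def big_len(obj):
    """Approximation of the proposed len(): fast path, then two fallbacks."""
    try:
        return len(obj)  # stand-in for PyObject_Size(), limited to Py_ssize_t
    except OverflowError:
        pass
    if isinstance(obj, range):
        return range_len(obj)
    # Look __len__ up on the type, as the interpreter does for special methods.
    return type(obj).__len__(obj)


class A:
    def __len__(self):
        return maxsize * 2


huge_range_len = big_len(range(maxsize * 10))  # equals maxsize * 10
huge_obj_len = big_len(A())                    # equals maxsize * 2
```

Objects whose lengths already fit in Py_ssize_t never reach the fallbacks, so lists, strings, and small ranges behave exactly as they do with len().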
Bug or Feature Request
Traditionally, extending support for sizes beyond Py_ssize_t has been considered a new feature, as it was for range() and itertools.count().
In this case though, arguably it is a bug because the range() support was only 90% complete, leaving off the ability to call len(). It could also be considered a bug because users could always write a __len__ method returning values larger than Py_ssize_t and could access that value with obj.__len__(), but the len() function inexplicably failed due to an unnecessary and implementation-dependent range restriction.
One other thought: maxsize varies across builds, so it is easily possible to get code tested and working on one Python and have it fail on another. All 32-bit builds and all Windows builds are affected.
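A small illustration of that portability hazard, assuming a hypothetical class whose length is 2**40: the same call succeeds on builds where sys.maxsize is at least 2**40 and raises OverflowError where it is smaller.

```python
import sys


class Big:
    """Hypothetical sequence-like class reporting a length of 2**40."""

    def __len__(self):
        return 2 ** 40


def try_len(obj):
    """Return len(obj), or None if the build's Py_ssize_t cannot hold it."""
    try:
        return len(obj)
    except OverflowError:
        return None


result = try_len(Big())
# Whether this succeeds depends on the build, not on the code under test.
fits_in_ssize_t = sys.maxsize >= 2 ** 40
```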
It would be easy for us to remove the artificial limitation for range objects and for objects that define __len__ directly rather than through sq_length or mp_length. That includes all pure Python classes and any C classes that want to support large lengths.