`__len__` documentation should state size restriction #5843

The [documentation for the `__len__` magic method](https://github.com/PyO3/pyo3/blob/main/guide/src/class/protocols.md#mapping--sequence-types) says the method must return a `usize`, and PyO3 does enforce that restriction. However, the [CPython type is `Py_ssize_t`, further restricted to non-negative values](https://docs.python.org/3/reference/datamodel.html#object.__len__), so on a 64-bit machine `__len__` must return a value in $[0, 2^{63}-1]$. Although this is documented in Python's data model, I think it's worth adding a note to PyO3's docs, since the `usize` explanation (and the introductory notes) seem to imply that values in $[2^{63}, 2^{64}-1]$ are also OK.
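To make the bound concrete, here is a small std-only sketch of the check CPython effectively applies (this is not PyO3 code). It assumes `Py_ssize_t` has the same width as Rust's `isize`; `MAX_PY_LEN` and `fits_py_ssize_t` are hypothetical names.

```rust
/// Largest value CPython accepts from `__len__`, assuming `Py_ssize_t`
/// is the same width as `isize` (2^63 - 1 on a 64-bit target).
const MAX_PY_LEN: usize = isize::MAX as usize;

/// Hypothetical helper: true if `len` can be returned from `__len__`
/// without `len()` raising `OverflowError`.
fn fits_py_ssize_t(len: usize) -> bool {
    len <= MAX_PY_LEN
}
```

Every `usize` above `MAX_PY_LEN` (the upper half of the `usize` range) is rejected by Python even though the Rust return type admits it.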

This came up for me in a "virtual sequence" whose `__len__` and `__getitem__` results are calculated on demand from values supplied when the object is constructed.

The [conversion code](https://github.com/PyO3/pyo3/blob/86c48d3ece40f5c2c9a3cbe338c4b03b4bdbe63e/src/impl_/callback.rs#L130) does check for out-of-bounds values and raises `OverflowError` in the cases Python would reject, but the error is hard to trace back to its source. In the example below, note that the equivalent pure-Python error (`cannot fit 'int' into an index-sized integer`) says more about the cause than PyO3's, whose message is empty, so it may be worth matching Python's message.
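A std-only sketch of what a more descriptive check might look like, reusing CPython's wording. This is not PyO3's actual conversion code: `len_to_py_ssize_t` is a hypothetical helper, `Py_ssize_t` is modelled as `isize`, and in PyO3 the error string would presumably be wrapped in `pyo3::exceptions::PyOverflowError::new_err`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a Rust length to a `Py_ssize_t`-sized value
/// (modelled here as `isize`), reporting overflow with CPython's wording plus
/// a hint that the value came from `__len__`.
fn len_to_py_ssize_t(len: usize) -> Result<isize, String> {
    isize::try_from(len).map_err(|_| {
        format!("cannot fit 'int' into an index-sized integer (__len__ returned {len})")
    })
}
```

Including the offending value in the message would make it easier to see that the problem lies in `__len__` rather than in the caller.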

Alternatively, PyO3 could require `__len__` to return `isize` and document that Python requires the value to be non-negative, but that would be a breaking change for an edge case that probably doesn't affect most users. The advantage is that, given the semantics of the method, people are unlikely to try returning a negative value anyway.
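A std-only sketch of the validation an `isize`-returning `__len__` would still need. The function name is hypothetical, and the error text is copied from the `ValueError` that the pure-Python tests below expect.

```rust
/// Hypothetical check for an `isize`-returning `__len__`: negative values are
/// rejected (CPython raises `ValueError` for them), and any non-negative
/// `isize` already fits in `Py_ssize_t`, so no overflow case remains.
fn validate_isize_len(len: isize) -> Result<usize, String> {
    if len < 0 {
        // Message text taken from the pure-Python test cases in this issue.
        Err(String::from("__len__ should return >= 0"))
    } else {
        // A non-negative `isize` always fits in `usize`.
        Ok(len as usize)
    }
}
```

With this signature the overflow case disappears entirely; only the sign needs checking.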

----

Here is some example code that demonstrates the differences in error messages. Overall, PyO3 appears to allow the same behavior one could get in pure Python; the mismatch is really between Rust's `usize` and the actual maximum Python allows for `__len__`.

Rust-based extension:

```rust
use pyo3::{prelude::*, types::PySequence};

#[pyclass(sequence)]
struct MyRustSeq {
    length: usize,
}

#[pymethods]
impl MyRustSeq {
    #[new]
    fn __new__(length: usize) -> Self {
        Self { length }
    }

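    // PyO3 converts this `usize` to `Py_ssize_t`; values above `isize::MAX`
    // raise an `OverflowError` with an empty message (see `test_too_big`).
    fn __len__(&self) -> usize {
        self.length
    }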

    #[allow(unused_variables)]
    fn __getitem__(&self, index: Bound<'_, PyAny>) -> i32 {
        42
    }
}

#[pyclass(sequence)]
struct WrapperSeq {
    inner: Py<PySequence>,
}

#[pymethods]
impl WrapperSeq {
    #[new]
    fn __new__(inner: Py<PySequence>) -> Self {
        Self { inner }
    }

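    // Delegates to Python's builtin `len()`, so the inner sequence's own
    // `__len__` checks (and error messages) apply before the result is extracted.
    fn __len__<'py>(slf: PyRef<'_, Self>, py: Python<'py>) -> PyResult<usize> {
        py.import("builtins")?.getattr("len")?.call1((slf.inner.bind_borrowed(py),))?.extract()
    }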

    #[allow(unused_variables)]
    fn __getitem__(&self, index: Bound<'_, PyAny>) -> i32 {
        42
    }
}

#[pymodule]
pub(crate) fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<MyRustSeq>()?;
    m.add_class::<WrapperSeq>()?;
    Ok(())
}
```

Python code and test cases (using `pytest`):

```python
from collections.abc import Sequence

import pytest

import my_module


class PySeq(Sequence):
    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        return 42


class TestRustSeq:
    def test_ok(self):
        s = my_module.MyRustSeq(pow(2, 63)-1)
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        assert len(s) == pow(2, 63)-1

    def test_too_big(self):
        s = my_module.MyRustSeq(pow(2, 63))
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        # PyO3's OverflowError currently carries an empty message.
        with pytest.raises(OverflowError, match=("^$")):
            assert len(s) == pow(2, 63)

    def test_neg_len(self):
        with pytest.raises(OverflowError, match=("^can't convert negative int to unsigned$")):
            _ = my_module.MyRustSeq(-1)


class TestPySeq:
    def test_ok(self):
        s = PySeq(pow(2, 63)-1)
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        assert len(s) == pow(2, 63)-1

    def test_too_big(self):
        s = PySeq(pow(2, 63))
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        with pytest.raises(OverflowError, match=("^cannot fit 'int' into an index-sized integer$")):
            assert len(s) == pow(2, 63)

    def test_neg_len(self):
        s = PySeq(-1)
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        with pytest.raises(ValueError, match=(r"^__len__ should return >= 0$")):
            assert len(s) == -1


class TestWrappedSeq:
    def test_ok(self):
        s = my_module.WrapperSeq(PySeq(pow(2, 63)-1))
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        assert len(s) == pow(2, 63)-1

    def test_too_big(self):
        s = my_module.WrapperSeq(PySeq(pow(2, 63)))
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        with pytest.raises(OverflowError, match=("^cannot fit 'int' into an index-sized integer$")):
            assert len(s) == pow(2, 63)

    def test_neg_len(self):
        s = my_module.WrapperSeq(PySeq(-1))
        assert s[0] == s[pow(2, 128)] == s[-1] == 42
        with pytest.raises(ValueError, match=(r"^__len__ should return >= 0$")):
            assert len(s) == -1
```

