Align default empty buffers of bytearray and array.array #140557

@jakelishman

Description

Feature or enhancement

Proposal:

At the moment, the Py_buffer view onto an empty bytearray() or array.array(typecode) has arbitrary alignment. This is fine in C, since the buffers have length 0. On Linux x86-64, with at least CPython 3.14 and with 3.15a1 built from source, I found that the buffer backing bytearray reliably sits at an odd address. For example, given the script

import array
from ctypes import *

class py_buffer(Structure):
    _fields_ = [
        ("buf", c_void_p),
        ("obj", py_object),
        ("len", c_ssize_t),
        ("itemsize", c_ssize_t),
        ("readonly", c_int),
        ("ndim", c_int),
        ("format", c_char_p),
        ("shape", POINTER(c_ssize_t)),
        ("strides", POINTER(c_ssize_t)),
        ("suboffsets", POINTER(c_ssize_t)),
        ("internal", c_void_p),
    ]

def buf_ptr(buffer):
    buf = py_buffer()
    assert not pythonapi.PyObject_GetBuffer(py_object(buffer), byref(buf), 0)
    ptr = buf.buf
    pythonapi.PyBuffer_Release(byref(buf))
    return ptr

ptr = buf_ptr(bytearray())
print(f"bytearray()     : {ptr:#x}")
ptr = buf_ptr(bytes())
print(f"bytes()         : {ptr:#x}")
for t in "bBwhHiIlLqQfd":
    ptr = buf_ptr(array.array(t))
    print(f"array.array('{t}'): {ptr:#x}")

I got, for example

bytearray()     : 0x586d06546869
bytes()         : 0x586d064fe218
array.array('b'): 0x7ba46a224b99
array.array('B'): 0x7ba46a224b99
array.array('w'): 0x7ba46a224b99
array.array('h'): 0x7ba46a224b99
array.array('H'): 0x7ba46a224b99
array.array('i'): 0x7ba46a224b99
array.array('I'): 0x7ba46a224b99
array.array('l'): 0x7ba46a224b99
array.array('L'): 0x7ba46a224b99
array.array('q'): 0x7ba46a224b99
array.array('Q'): 0x7ba46a224b99
array.array('f'): 0x7ba46a224b99
array.array('d'): 0x7ba46a224b99

Some other languages (Rust, for example) have stricter alignment requirements for producing a slice-like object from a raw pointer, even if the number of elements you're allowed to read from it is 0. Concretely, in Rust any &[T] has to be backed by a non-null pointer that is aligned for type T, even if the slice is empty.
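As a rough illustration of that requirement, the sketch below checks two of the addresses from the sample output above against power-of-two alignments. The helper name `is_aligned` is hypothetical, and using each typecode's `itemsize` as the required alignment is an assumption that holds for these C types on typical x86-64 Linux builds, not a general rule.

```python
import array


def is_aligned(address: int, alignment: int) -> bool:
    """Return True if `address` is a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0


# Addresses copied from the sample output above.
BYTES_PTR = 0x586D064FE218
ARRAY_PTR = 0x7BA46A224B99

for typecode in "bhiqd":
    # Assumption: the required alignment equals the item size.
    alignment = array.array(typecode).itemsize
    print(
        f"array.array('{typecode}') needs {alignment}-byte alignment; "
        f"aligned: {is_aligned(ARRAY_PTR, alignment)}"
    )
```

Because the array address is odd, it satisfies only 1-byte alignment, so under this assumption only the single-byte typecodes would be acceptable as an empty slice of their element type.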

It would be convenient for interop if the default empty buffers of bytearray and array.array were aligned for any type, so that this fast-path default case doesn't need special handling when taking out direct views onto the data buffer.
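For context, the special handling a consumer needs today might look roughly like the following. This is only a sketch: `pointer_for_view` is a hypothetical helper, not part of any existing binding, and it borrows the idea of substituting a dangling but well-aligned address for an empty buffer, since that address is never dereferenced.

```python
def pointer_for_view(ptr: int, length: int, alignment: int) -> int:
    """Return an address suitable for an alignment-strict consumer.

    Hypothetical helper: `ptr` and `length` describe a buffer, and
    `alignment` is the power-of-two alignment the consumer requires.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if length == 0:
        # Nothing will be read, so any non-null aligned address works.
        # The alignment value itself is non-null and trivially aligned.
        return alignment
    if ptr % alignment:
        raise ValueError("non-empty buffer is misaligned for this type")
    return ptr


# The odd bytearray address from the sample output, viewed as 8-byte items.
print(hex(pointer_for_view(0x586D06546869, 0, 8)))
```

If the default empty buffers were already suitably aligned, the `length == 0` branch would be unnecessary and the pointer could be passed through unchanged.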

I came across this in numpy/numpy#30062, where the pickle protocol 5 implementation for an empty array causes an empty bytearray to be loaded from the pickle stream (assuming inline buffers) and then used directly to back the NumPy array, which was detectable in a Rust extension module I own.
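The standard library alone can show how an empty bytearray comes out of a protocol 5 pickle stream. The sketch below does not reproduce NumPy's code path; it assumes only that a writable `pickle.PickleBuffer`, serialized without a `buffer_callback` (so the data stays inline), is restored as a bytearray.

```python
import pickle

# Wrap an empty, writable buffer. No buffer_callback is given to dumps(),
# so the buffer contents are written inline in the pickle stream.
payload = pickle.dumps(pickle.PickleBuffer(bytearray()), protocol=5)

# Loading the stream creates a fresh object to hold the (empty) data.
restored = pickle.loads(payload)

print(type(restored).__name__, len(restored))
```

A consumer that takes a direct view onto `restored` then sees whatever pointer the empty bytearray exposes through the buffer protocol.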

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Labels

extension-modules (C modules in the Modules dir), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)
