Summary
PyBuffer heap-allocates the Py_buffer struct via Box<RawPyBuffer>, which adds significant overhead for short-lived buffer access patterns. In profiling a non-cryptographic hash function, PyBuffer::get accounts for ~850ns per call, with malloc/free alone contributing ~780ns. For a hash that completes in ~60ns on small inputs, this overhead dominates the total runtime.
Profiling
CodSpeed flamegraph comparing &[u8] (baseline) vs PyBuffer<u8>:
| Source | Self Time |
|---|---|
| `<pyo3::buffer::PyBuffer<u8>>::get` | 507ns |
| `malloc` | 642ns |
| `free` | 143ns |
| `new_uninit<pyo3::buffer::RawPyBuffer>` | 89ns |
Replacing PyBuffer<u8> with a raw FFI stack-allocated Py_buffer + PyBUF_SIMPLE eliminated the regression almost entirely (~9ns overhead vs ~850ns).
Current: ~850ns overhead per call

```rust
fn hash(&self, data: PyBuffer<u8>) -> u128 {
    hasher(data.as_bytes(), self.seed)
}
```

Raw FFI workaround: ~9ns overhead per call
```rust
fn hash(&self, data: Bound<'_, PyAny>) -> u128 {
    let mut view = std::mem::MaybeUninit::<pyo3::ffi::Py_buffer>::uninit();
    unsafe {
        // error handling omitted here for brevity
        pyo3::ffi::PyObject_GetBuffer(data.as_ptr(), view.as_mut_ptr(), pyo3::ffi::PyBUF_SIMPLE);
        let mut view = view.assume_init();
        let result = hasher(
            std::slice::from_raw_parts(view.buf as *const u8, view.len as usize),
            self.seed,
        );
        pyo3::ffi::PyBuffer_Release(&mut view);
        result
    }
}
```

Flamegraph after implementing the workaround
Proposal
A stack-allocated buffer type for the common "oneshot" pattern.
Scoped closure

```rust
PyBuffer::with(obj, PyBUF_SIMPLE, |buf, len| {
    // Py_buffer lives on the stack, released when the closure returns
})?;
```

Non-Send stack type
```rust
let buffer = PyBufferRef::get(obj)?; // stack-allocated + non-Send
let slice = buffer.as_bytes();
// PyBuffer_Release called on drop
```
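To make the ownership story concrete, here is a minimal stdlib-only sketch of the Drop-based release pattern the non-Send stack type implies. Everything in it is a hypothetical stand-in: `PyBufferRef`, its `get` constructor, the `RELEASES` counter (standing in for `PyBuffer_Release`), and the `&[u8]` backing (standing in for a stack `ffi::Py_buffer`) are illustrations, not the real PyO3 API.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for PyBuffer_Release: counts how many times release ran.
static RELEASES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical shape of the proposed stack-allocated buffer guard.
/// The real type would hold an ffi::Py_buffer; a borrowed slice stands in here.
struct PyBufferRef<'a> {
    data: &'a [u8],
    // A raw-pointer marker makes the type neither Send nor Sync,
    // so the buffer is released on the thread that acquired it.
    _not_send: PhantomData<*mut ()>,
}

impl<'a> PyBufferRef<'a> {
    fn get(data: &'a [u8]) -> Self {
        // Real impl: PyObject_GetBuffer into a MaybeUninit<Py_buffer> on the stack.
        PyBufferRef { data, _not_send: PhantomData }
    }

    fn as_bytes(&self) -> &[u8] {
        self.data
    }
}

impl Drop for PyBufferRef<'_> {
    fn drop(&mut self) {
        // Real impl: pyo3::ffi::PyBuffer_Release(&mut self.view)
        RELEASES.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let backing = vec![1u8, 2, 3];
    {
        let buf = PyBufferRef::get(&backing);
        assert_eq!(buf.as_bytes(), &[1, 2, 3]);
    } // guard dropped here; release runs exactly once
    println!("releases: {}", RELEASES.load(Ordering::SeqCst));
}
```

The key property is that no heap allocation occurs anywhere on the acquire/release path: the guard lives on the caller's stack and release is tied to scope exit, which is what removes the malloc/free cost measured above.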