Possible UB in bytes_characters #106693

@pablogsal

In https://github.com/python/cpython/blob/main/Include/internal/pycore_runtime_init.h we initialize bytes_characters like this:

.bytes_characters = _Py_bytes_characters_INIT, \

where

#define _Py_bytes_characters_INIT { \
    _PyBytes_CHAR_INIT(0), \
    _PyBytes_CHAR_INIT(1), \
    _PyBytes_CHAR_INIT(2), \
    _PyBytes_CHAR_INIT(3), \
    _PyBytes_CHAR_INIT(4), \
    _PyBytes_CHAR_INIT(5), \
    _PyBytes_CHAR_INIT(6), \
    _PyBytes_CHAR_INIT(7), \
    _PyBytes_CHAR_INIT(8), \
    _PyBytes_CHAR_INIT(9), \
...
    _PyBytes_CHAR_INIT(255), \

with

#define _PyBytes_CHAR_INIT(CH) \
    { \
        _PyBytes_SIMPLE_INIT((CH), 1) \
    }

#define _PyBytes_SIMPLE_INIT(CH, LEN) \
    { \
        _PyVarObject_HEAD_INIT(&PyBytes_Type, (LEN)) \
        .ob_shash = -1, \
        .ob_sval = { (CH) }, \
    }

but

struct {
    PyBytesObject ob;
    char eos;
} bytes_characters[256];

and

typedef struct {
    PyObject_VAR_HEAD
    Py_DEPRECATED(3.11) Py_hash_t ob_shash;
    char ob_sval[1];
    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the byte string or -1 if not computed yet.
     */
} PyBytesObject;

I think this is UB because, for the entries above 127, we are basically assigning values up to 255 to:

https://github.com/python/cpython/blob/e4b88c1e4ac129b36f99a534387d64f7b8cda8ef/Include/cpython/bytesobject.h#L8C10-L8C17

That member should be an unsigned char, because the range of char can be 0 to 255 or -128 to 127 depending on the platform.
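
To make the concern concrete, here is a minimal sketch that is not taken from CPython. The struct demo_bytes and the demo_* functions are hypothetical stand-ins that only mimic the ob_sval member. As far as I can tell, the C standard describes converting an out-of-range value to a signed integer type as implementation-defined (or as raising an implementation-defined signal) rather than undefined, so the sketch only shows how the stored value differs between platforms, assuming a compiler that wraps the value the way GCC and Clang do.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the tail of PyBytesObject: a single payload
   byte held in a plain char, as in the struct quoted above. */
struct demo_bytes {
    char ob_sval[1];
};

/* Store ch (0..255) by converting it to plain char, which is what the
   initializer effectively does for each table entry. */
struct demo_bytes demo_make(int ch)
{
    assert(ch >= 0 && ch <= UCHAR_MAX);
    struct demo_bytes b = { .ob_sval = { (char)ch } };
    return b;
}

/* Read the byte back as plain char. Where char is signed, inputs above
   CHAR_MAX come back negative (assuming the usual wrapping conversion). */
int demo_as_plain_char(int ch)
{
    struct demo_bytes b = demo_make(ch);
    return b.ob_sval[0];
}

/* Read the byte back through unsigned char, which recovers the original
   0..255 value under the same wrapping assumption. */
int demo_as_unsigned(int ch)
{
    struct demo_bytes b = demo_make(ch);
    return (unsigned char)b.ob_sval[0];
}
```

Under these assumptions, demo_as_plain_char(255) depends on whether CHAR_MIN is negative on the target, while demo_as_unsigned(255) gives 255 either way. That is the argument for declaring the member as unsigned char, or for always reading it through an unsigned char cast.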
