Skip to content
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Speed up :class:`bytes` creation from :class:`list` and :class:`tuple` of integers. Benchmarks show that from a list with 1000000 random numbers the time to create a bytes object is reduced by around 31%, or 30% with 10000 numbers, or 27% with 100 numbers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the pyperf benchmarks on the PR as well? (namely, the nice table with two columns and the diffs as well as the benchmark script? thanks)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, NEWS should not contain an empty line.

Patch by Ben Hsing
101 changes: 33 additions & 68 deletions Objects/bytesobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#include "pycore_bytesobject.h" // _PyBytes_Find(), _PyBytes_Repeat()
#include "pycore_call.h" // _PyObject_CallNoArgs()
#include "pycore_ceval.h" // _PyEval_GetBuiltin()
#include "pycore_critical_section.h" // Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST()
#include "pycore_format.h" // F_LJUST
#include "pycore_global_objects.h"// _Py_GET_GLOBAL_OBJECT()
#include "pycore_initconfig.h" // _PyStatus_OK()
Expand Down Expand Up @@ -2810,82 +2811,44 @@
}

static PyObject*
_PyBytes_FromList(PyObject *x)
_PyBytes_FromSequence(PyObject *x)
{
Py_ssize_t i, size = PyList_GET_SIZE(x);
Py_ssize_t value;
char *str;
PyObject *item;
_PyBytesWriter writer;

_PyBytesWriter_Init(&writer);
str = _PyBytesWriter_Alloc(&writer, size);
if (str == NULL)
Py_ssize_t size = PySequence_Fast_GET_SIZE(x);
PyObject *bytes = _PyBytes_FromSize(size, 0);
if (bytes == NULL) {
return NULL;
writer.overallocate = 1;
size = writer.allocated;

for (i = 0; i < PyList_GET_SIZE(x); i++) {
item = PyList_GET_ITEM(x, i);
Py_INCREF(item);
value = PyNumber_AsSsize_t(item, NULL);
Py_DECREF(item);
if (value == -1 && PyErr_Occurred())
goto error;

if (value < 0 || value >= 256) {
PyErr_SetString(PyExc_ValueError,
"bytes must be in range(0, 256)");
goto error;
}

if (i >= size) {
str = _PyBytesWriter_Resize(&writer, str, size+1);
if (str == NULL)
return NULL;
size = writer.allocated;
}
*str++ = (char) value;
}
return _PyBytesWriter_Finish(&writer, str);

error:
_PyBytesWriter_Dealloc(&writer);
return NULL;
}

static PyObject*
_PyBytes_FromTuple(PyObject *x)
{
PyObject *bytes;
Py_ssize_t i, size = PyTuple_GET_SIZE(x);
Py_ssize_t value;
char *str;
PyObject *item;

bytes = PyBytes_FromStringAndSize(NULL, size);
if (bytes == NULL)
return NULL;
str = ((PyBytesObject *)bytes)->ob_sval;

for (i = 0; i < size; i++) {
item = PyTuple_GET_ITEM(x, i);
value = PyNumber_AsSsize_t(item, NULL);
if (value == -1 && PyErr_Occurred())
char *str = PyBytes_AS_STRING(bytes);
PyObject *const *items = PySequence_Fast_ITEMS(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going for performance, then we can do even better. PySequence_Fast_ITEMS will call PyList_Check, but we already know that it's exactly a list or tuple here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all we know is that it is either a list or a tuple, but we still don't know which of the two it is, so a PyList_Check call is still in order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but PyList_Check is extra work because that will check for a subclass. It's not really going to be noticable, but it's something to think about :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can just duplicate the functions as we did previously. This is not really an issue IMO, and we couold use the fact that tuples are immutables for instance to avoid INCREF/DECREF values.

Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to acquire the critical section before calling PySequence_Fast_ITEMS and PySequence_Fast_GET_SIZE. Attempting to read mutable data without a lock isn't thread safe.

Comment on lines +2822 to +2823
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PyObject *const *items = PySequence_Fast_ITEMS(x);
Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST(x);
Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST(x);
PyObject *const *items = PySequence_Fast_ITEMS(x);

for (Py_ssize_t i = 0; i < size; i++) {
if (!PyLong_Check(items[i])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyLong_CheckExact probably fits better here, and is faster!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The previous code wasn't doing an exact check so we shouldn't change it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it wouldn't be a breaking change, just a performance loss for the (very niche!) set of cases that use special ints. I'm also slightly worried that non-exact ints might have some nasty side effects that we aren't anticipating here (e.g. can they mess with the Py_ssize_t value?)

Copy link
Member

@picnixz picnixz Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can mess with Py_ssize_t values but only through __index__ and we already check that with PyNumber_AsSsize_t, but the problem is wider. For example, np.int32() values can be passed to bytes([...]) but without this check, we will impact performances of numpy-related code which is something we don't want to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is casting to bytes a common thing to do with numpy integers, or is that speculation? (I see your point, I'm just sort of gauging what the cost-benefit would be here.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, if we're considering serialization or introspection. I imagine that we can have such things when people are working with images because their arrays won't necessarily be pure Python lists but np.ndarray objects which may have non-primitive data types. I'm not aware of a wild opensource usage though but IMO, since we want to improve performance overall, we shouldn't penalize existing users. If we can slightly decrease our performance gain so to have a stable result, it's good.

Nonetheless, if we want to fallback to a slow path for non-exact ints, benchmarks should show how performances are impacted (a simple timeit could be sufficient but I'd be more comfortable with a much precise benchmark).

Py_DECREF(bytes);
/* Py_None as a fallback sentinel to the slow path */
bytes = Py_None;
goto done;
}
int overflow;
long value = PyLong_AsLongAndOverflow(items[i], &overflow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're still assuming a long object,. we can just use PyLong_AsInt instead. We don't care about an overflow as we're only interested in values in [0, 255].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I copied this code from bytearray but you're absolutely right that PyLong_AsInt is a much better fit here.

Copy link
Contributor Author

@blhsing blhsing Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An existing test with -sys.maxsize failed with PyLong_AsInt. A fix would be to check for OverflowError and reraise ValueError instead, but I think it's easier to simply revert to PyLong_AsLongAndOverflow and let the range check that follows raise ValueError when out of the byte range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we used PyNumber_AsSsize_t (which invokes _PyNumber_Index, hence possibly creating side effects). This will be a small behavioral change. While I understand Guido's comment, I'm wondering whether we should keep the old behaviour (though I don't know how it could be useful in production, namely making bytes() have side-effects on lists if one of their element being converted invokes __index__).

Note that the current code also prevented crashes by temporarily increfing item before calling PyNumber_AsSsize_t.

Copy link
Contributor Author

@blhsing blhsing Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is not supposed to be about altering behaviors, I'll keep the current behavior for this PR and file a separate bug against the current behavior then. It should be considered a bug because like Guido pointed out, a list can potentially be mutated inside __index__, resulting in the item copy loop accessing freed memory. Despite the unlikelihood that such code exists in the real world, it can still potentially happen and cause a crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have a crash because the list size is checked at every iteration (though what should be checked perhaps is that the list pointer is not NULL). We incref the item before calling __index__ on it so it shouldn't cause crashes. INCREFing before is a trick you also use for use-after-free issues and evil mutations (but we can easily check if this is crashing or not as follows:

class EvilInt:
	def __index__(self):
		x.clear()
		return 0

x = [1,2, EvilInt(), 4]
bytes(x)

and this does not crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh now I see what INCREF and DECREF are doing there. I'll add them back in a bit too then. Thanks.

if (value == -1 && PyErr_Occurred()) {
goto error;

}
if (value < 0 || value >= 256) {
/* this includes an overflow in converting to C long */
PyErr_SetString(PyExc_ValueError,
"bytes must be in range(0, 256)");
goto error;
}
*str++ = (char) value;
}
return bytes;

goto done;
error:
Py_DECREF(bytes);
return NULL;
bytes = NULL;
done:

Check failure on line 2848 in Objects/bytesobject.c

View workflow job for this annotation

GitHub Actions / Address sanitizer (ubuntu-24.04)

label at end of compound statement
/* both success and failure need to end the critical section */
Py_END_CRITICAL_SECTION_SEQUENCE_FAST();

Check failure on line 2850 in Objects/bytesobject.c

View workflow job for this annotation

GitHub Actions / Windows / build and test (arm64)

syntax error: missing ';' before '}' [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]

Check failure on line 2850 in Objects/bytesobject.c

View workflow job for this annotation

GitHub Actions / Windows (free-threading) / build and test (arm64)

syntax error: missing ';' before '}' [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]

Check failure on line 2850 in Objects/bytesobject.c

View workflow job for this annotation

GitHub Actions / Windows / build and test (x64)

syntax error: missing ';' before '}' [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]

Check failure on line 2850 in Objects/bytesobject.c

View workflow job for this annotation

GitHub Actions / Windows (free-threading) / build and test (x64)

syntax error: missing ';' before '}' [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]
return bytes;
}

static PyObject *
Expand Down Expand Up @@ -2968,11 +2931,13 @@
if (PyObject_CheckBuffer(x))
return _PyBytes_FromBuffer(x);

if (PyList_CheckExact(x))
return _PyBytes_FromList(x);

if (PyTuple_CheckExact(x))
return _PyBytes_FromTuple(x);
if (PyList_CheckExact(x) || PyTuple_CheckExact(x)) {
PyObject *bytes = _PyBytes_FromSequence(x);
/* Py_None as a fallback sentinel to the slow path */
if (bytes != Py_None) {
return bytes;
}
}

if (!PyUnicode_Check(x)) {
it = PyObject_GetIter(x);
Expand Down
Loading