Conversation

@vstinner (Member) commented Oct 11, 2025

PyObject* PyDict_FromItems(
    PyObject *const *keys,
    Py_ssize_t keys_offset,
    PyObject *const *values,
    Py_ssize_t values_offset,
    Py_ssize_t length)

📚 Documentation preview 📚: https://cpython-previews--139963.org.readthedocs.build/

@vstinner (Member Author)

cc @scoder @davidhewitt

@davidhewitt (Contributor) commented Oct 11, 2025

For consuming from PyO3 / Rust, I can see this function being obviously useful for cases of small dictionaries with statically known keys (think producing things that look like TypedDict to Python code). Many functions produce such dictionaries and we could generate optimal code which creates arrays on the stack for each of keys and items before performing the insert.

I think for arbitrary-sized collections, it's probably the case (in Rust) that either:

  • it's easier to create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)
  • or we just want to pass a hint to how large we expect the dictionary to be and then pass all the items in, e.g. to consume a Rust iterator (PyDictWriter? 😉)

@vstinner (Member Author)

> it's easier to create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)

Do you mean producing an array of (key1, value1, key2, value2, ..., keyN, valueN) and then use an offset of 2?

@vstinner (Member Author)

Adding this function would avoid having to make the private _PyStack_AsDict() function public, since its code is simple. It's basically a call to _PyDict_FromItems():

PyObject *
_PyStack_AsDict(PyObject *const *values, PyObject *kwnames)
{
    Py_ssize_t nkwargs;

    assert(kwnames != NULL);
    nkwargs = PyTuple_GET_SIZE(kwnames);
    return _PyDict_FromItems(&PyTuple_GET_ITEM(kwnames, 0), 1,
                             values, 1, nkwargs);
}

@davidhewitt (Contributor)

I was thinking more like 2-tuples, the type might be written in Rust as Vec<(*mut PyObject, *mut PyObject)>. I think in practice it would be laid out in memory like the array of alternating key/value you propose but I think that's not necessarily guaranteed. With the API taking offsets I could query the actual layout information from the Rust compiler and use that to calculate the offsets correctly.

The 2-tuples are quite a natural structure for Rust producers of "items" (it's what they would expect when iterating a mapping type, for example).

But maybe the more common case would be the second one I suggest - a rust iterator producing item 2-tuples with a size hint. At the moment we just start from PyDict_New and call PyDict_SetItem for each value produced by the iterator. It would be nice to have an API which enables using the size hint well to minimise allocations.

I could of course use the PyDict_FromItems proposed in this PR, just would need to collect two temporary allocations for all the items produced by the iterator first.

@scoder (Contributor) commented Oct 12, 2025

>   • create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)

Or name the function in this PR PyDict_FromKeysAndValues() and add a PyDict_FromItems() that takes a single pointer and two strides, one for the items and one for the values in the items. (EDIT: Actually, a single pointer won't suffice due to the initial irregular offset of the values, so that brings us basically to the interface of the current _PyDict_FromItems().)

Note that the current private _PyDict_FromItems() calculates the offsets in pointer-sized steps, so if that remains the implementation, then the C structures would need to be pointer-aligned. That seems OK from my side, but I can't reason about Rust here.

The case of building small literal dicts could also use a PyDict_FromItems() with alternating keys and values, BTW, so maybe that's generally the better API?

>   • or we just want to pass a hint to how large we expect the dictionary to be and then pass all the items in, e.g. to consume a Rust iterator (PyDictWriter? 😉)

That's really not much different from _PyDict_NewPresized() plus repeated PyDict_SetItem() calls.

@davidhewitt (Contributor)

> Note that the current private _PyDict_FromItems() calculates the offsets in pointer-sized steps, so if that remains the implementation, then the C structures would need to be pointer-aligned. That seems OK from my side, but I can't reason about Rust here.

I think it's OK: the individual tuple items are pointers and so will be aligned appropriately. AFAIK Rust is allowed to reorder tuple fields to improve packing, but it guarantees that all elements are properly aligned for their type.

> That's really not much different from _PyDict_NewPresized() plus repeated PyDict_SetItem() calls.

True, but we try not to use private APIs at all in PyO3, so having a public API for this would open up the possibility of using it there. I understand there's a question about what to do about the unicode optimization with the "presized" API; I suggest we just make it roughly match what a normal Python dict would do if created empty and then had items added repeatedly (with the exception that the storage is preallocated).

@vstinner (Member Author)

@scoder:

> Or name the function in this PR PyDict_FromKeysAndValues() and add a PyDict_FromItems() that takes a single pointer and two strides, one for the items and one for the values in the items. (EDIT: Actually, a single pointer won't suffice due to the initial irregular offset of the values, so that brings us basically to the interface of the current _PyDict_FromItems().)

So you would prefer this API?

PyObject *
PyDict_FromItems(PyObject *const *keys, Py_ssize_t keys_offset,
                 PyObject *const *values, Py_ssize_t values_offset, 
                 Py_ssize_t length)

@vstinner (Member Author)

@davidhewitt:

> But maybe the more common case would be the second one I suggest - a rust iterator producing item 2-tuples with a size hint. At the moment we just start from PyDict_New and call PyDict_SetItem for each value produced by the iterator. It would be nice to have an API which enables using the size hint well to minimise allocations.

In short, you would prefer the #139773 API?

@davidhewitt (Contributor)

I think so, yes (will comment on that thread).

@vstinner (Member Author)

Benchmark comparing three ways to build a dict: PyDict_New() + PyDict_SetItem() ("new"), _PyDict_NewPresized() + PyDict_SetItem() ("presized"), and PyDict_FromItems() ("fromitems"):

Benchmark            | new     | presized              | fromitems
---------------------|---------|-----------------------|----------------------
dict-1               | 278 ns  | 281 ns: 1.01x slower  | 319 ns: 1.15x slower
dict-10              | 2.69 us | 2.62 us: 1.03x faster | 2.48 us: 1.08x faster
dict-100             | 29.6 us | 27.5 us: 1.08x faster | 24.4 us: 1.21x faster
dict-1,000           | 301 us  | 283 us: 1.06x faster  | 244 us: 1.23x faster
dict-10,000          | 3.51 ms | 3.18 ms: 1.11x faster | 2.84 ms: 1.24x faster
Geometric mean       | (ref)   | 1.05x faster          | 1.12x faster

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 4e73be20e1b..adae3fa2dc3 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2562,6 +2562,90 @@ toggle_reftrace_printer(PyObject *ob, PyObject *arg)
     Py_RETURN_NONE;
 }
 
+
+static PyObject *
+bench_dict_new(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_New();
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
+static PyObject *
+bench_dict_fromitems(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject **keys = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+        if (keys == NULL) {
+            return NULL;
+        }
+        PyObject **values = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+        if (values == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            keys[i] = key;
+            values[i] = value;
+        }
+
+        PyObject *d = PyDict_FromItems(keys, values, size);
+        assert(d != NULL);
+        Py_DECREF(d);
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            Py_DECREF(keys[i]);
+            Py_DECREF(values[i]);
+        }
+        PyMem_Free(keys);
+        PyMem_Free(values);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -2656,6 +2740,8 @@ static PyMethodDef TestMethods[] = {
     {"test_atexit", test_atexit, METH_NOARGS},
     {"code_offset_to_line", _PyCFunction_CAST(code_offset_to_line), METH_FASTCALL},
     {"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
+    {"bench_dict_new", bench_dict_new, METH_VARARGS},
+    {"bench_dict_fromitems", bench_dict_fromitems, METH_VARARGS},
     {NULL, NULL} /* sentinel */
 };

@scoder (Contributor) commented Oct 19, 2025

Regarding the benchmark numbers, internal loops are obviously faster than a long series of repeated API calls, but I doubt that a PyDict_FromItems() would be used for large dicts. There may be a few use cases where keys and values really are structurally available as Python object pointers in a C-array-compatible memory layout already, and for those use cases the performance difference will be visible. But deliberately laying out a lot of objects in a memory array, just to pass them into dict creation, seems a waste of resources and unlikely to be faster overall than step-by-step creation. Good performance for large dicts shouldn't be the guiding factor here.

@encukou (Member) commented Nov 11, 2025

What about a PyDict_MergeItems() with a similar signature to _PyDict_FromItems(), but taking a dict to update? A new dict would be created if you pass in NULL.

@vstinner (Member Author)

> What about a PyDict_MergeItems() with a similar signature to _PyDict_FromItems(), but taking a dict to update? A new dict would be created if you pass in NULL.

PyDict_FromItems() scans the keys up front to initialize unicode_keys before creating the dict. We would lose this optimization if the API expected an existing dict.

@vstinner marked this pull request as draft on November 17, 2025 at 17:00.
@vstinner (Member Author)

I wrote #141682 to add PyDict_FromKeysAndValues() and PyDict_FromItems() functions.
