-
Notifications
You must be signed in to change notification settings - Fork 2
Proposed API: add PyDict_FromItems() #55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
vstinner
wants to merge
1
commit into
capi-workgroup:main
Choose a base branch
from
vstinner:pydict_fromitems
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,190 @@ | ||
| ++++++++++++++++++++++++++++++++++++++++++++++++ | ||
| Add ``PyDict_FromItems()`` function to the C API | ||
| ++++++++++++++++++++++++++++++++++++++++++++++++ | ||
|
|
||
| Rationale | ||
| ========= | ||
|
|
||
| For historical reasons, Python exposes private functions in its public C API. | ||
| These functions are not tested, not documented, can change or even be removed | ||
| anytime without notifying users. Private functions used by 3rd party projects | ||
| should be promoted to public functions to add tests, documentation and | ||
| backward compatibility warranties. | ||
|
|
||
| The private ``_PyDict_NewPresized()`` function is used by 11 projects of the | ||
| PyPI top 15,000. It has a *min_used* argument to preallocate enough items for | ||
| the created dictionary. Dictionaries can be optimized for Unicode keys, but | ||
| ``_PyDict_NewPresized()`` doesn't support that. | ||
|
|
||
| Specification | ||
| ============= | ||
|
|
||
| Add ``PyDict_FromItems()`` function to the C API:: | ||
|
|
||
| PyObject* PyDict_FromItems( | ||
| PyObject *const *keys, | ||
| Py_ssize_t keys_stride, | ||
| PyObject *const *values, | ||
| Py_ssize_t values_stride, | ||
| Py_ssize_t length) | ||
|
|
||
| Usage: | ||
|
|
||
| * ``PyDict_FromItems(items, 2, items + 1, 2, length)``: *items* is an array of | ||
| keys and values (``key1, value1, key2, value2, ..., keyN, valueN``). | ||
| * ``PyDict_FromItems(keys, 1, values, 1, length)``: *keys* is an array of keys | ||
| and *values* is an array of values. | ||
| * ``PyDict_FromItems(keys, 1, &value, 0, length)``: *keys* is an array of keys | ||
| and *value* is copied for all items. | ||
|
|
||
| On CPython, the function preallocates *length* items and scans *keys* to check | ||
| if keys are all Unicode strings. | ||
|
|
||
| Calls to the private ``PyObject* _PyStack_AsDict(PyObject *const *values, | ||
| PyObject *kwnames)`` function can be replaced with:: | ||
|
|
||
| PyDict_FromItems(&PyTuple_GET_ITEM(kwnames, 0), 1, | ||
| values, 1, | ||
| PyTuple_GET_SIZE(kwnames)); | ||
|
|
||
|
|
||
| Benchmark | ||
| ========= | ||
|
|
||
| `Benchmark | ||
| <https://github.com/python/cpython/pull/139963#issuecomment-3412864991>`__ on a | ||
| regular Python release build with Unicode keys and integer values: | ||
|
|
||
| =============== ======= ===================== | ||
| Benchmark setitem fromitems | ||
| =============== ======= ===================== | ||
| dict-1 278 ns 319 ns: 1.15x slower | ||
| dict-10 2.69 us 2.48 us: 1.08x faster | ||
| dict-100 29.6 us 24.4 us: 1.21x faster | ||
| dict-1,000 301 us 244 us: 1.23x faster | ||
| dict-10,000 3.51 ms 2.84 ms: 1.24x faster | ||
| Geometric mean (ref) **1.12x faster** | ||
| =============== ======= ===================== | ||
|
|
||
| `Benchmark | ||
| <https://github.com/python/cpython/pull/139963#issuecomment-3612465437>`__ on | ||
| **Free Threaded** build with Unicode keys and integer values: | ||
|
|
||
| ============== ======= ===================== | ||
| Benchmark setitem fromitems | ||
| ============== ======= ===================== | ||
| dict-1 467 ns 447 ns: 1.04x faster | ||
| dict-5 1.95 us 1.64 us: 1.19x faster | ||
| dict-10 3.71 us 3.22 us: 1.15x faster | ||
| dict-25 9.05 us 7.78 us: 1.16x faster | ||
| dict-50 17.9 us 15.2 us: 1.18x faster | ||
| dict-100 35.4 us 30.0 us: 1.18x faster | ||
| dict-500 164 us 137 us: 1.20x faster | ||
| dict-1,000 330 us 274 us: 1.20x faster | ||
| Geometric mean (ref) **1.16x faster** | ||
| ============== ======= ===================== | ||
|
|
||
|
|
||
| PyPI projects using ``_PyDict_NewPresized()`` | ||
| ============================================= | ||
|
|
||
| * Nuitka (2.7.16) | ||
| * cython (3.1.4): see `issue <https://github.com/cython/cython/issues/7201>`__. | ||
| * frozendict (2.4.6) | ||
| * gevent (25.9.1) | ||
| * mercurial (7.1.1): see `issue | ||
| <https://foss.heptapod.net/mercurial/mercurial-devel/-/issues/10029>`__. | ||
| * mypy (1.18.2) | ||
| * mypy_dev-1.19.0a2 | ||
| * numba (0.62.0) | ||
| * orjson (3.11.3): see `comment | ||
| <https://github.com/python/cpython/issues/139772#issuecomment-3393830209>`__. | ||
| * ormsgpack (1.10.0) | ||
| * serpyco_rs (1.17.1) | ||
|
|
||
|
|
||
| Rejected Ideas | ||
| ============== | ||
|
|
||
| Add ``PyDict_NewPresized()`` function | ||
| ------------------------------------- | ||
|
|
||
| Add ``PyDict_NewPresized()`` function to the C API:: | ||
|
|
||
| PyObject* PyDict_NewPresized(Py_ssize_t size, int unicode_keys) | ||
|
|
||
| If *unicode_keys* is true, optimize the created dictionary for keys which are | ||
| only Unicode strings. | ||
|
|
||
| The problem of this API is that it's too low-level: it exposes *unicode_keys* | ||
| optimization which is an implementation detail. | ||
|
|
||
| INADA-san wrote that most users either overestimate its effectiveness or don't | ||
| fully understand how it operates. | ||
|
|
||
|
|
||
| Add ``PyDict_FromKeysAndValues()`` and ``PyDict_FromItems()`` | ||
| ------------------------------------------------------------- | ||
|
|
||
| Add ``PyDict_FromKeysAndValues()`` and ``PyDict_FromItems()`` functions to the | ||
| C API:: | ||
|
|
||
| PyObject* PyDict_FromKeysAndValues( | ||
| PyObject *const *keys, | ||
| PyObject *const *values, | ||
| Py_ssize_t length) | ||
|
|
||
| PyObject* PyDict_FromItems( | ||
| PyObject *const *items, | ||
| Py_ssize_t length) | ||
|
|
||
| These functions are very close to the API proposed in this PEP, but have no | ||
| "stride" argument. They are less error-prone since *stride* arguments don't | ||
| exist and so cannot be misused. | ||
|
|
||
| But these functions are less generic and don't support ``values_stride=0`` to | ||
| reuse the same value for all items, or strides greater than ``2`` for more | ||
| complex arrays. | ||
|
|
||
| Add ``PyDict_MergeItems()`` function | ||
| ------------------------------------ | ||
|
|
||
| Add ``PyDict_MergeItems()`` function to the C API:: | ||
|
|
||
| PyObject* PyDict_MergeItems( | ||
| PyObject *dict, | ||
| PyObject *const *keys, | ||
| Py_ssize_t keys_stride, | ||
| PyObject *const *values, | ||
| Py_ssize_t values_stride, | ||
| Py_ssize_t length) | ||
|
|
||
| Create a new dictionary if *dict* is ``NULL``, or update an existing dictionary | ||
| otherwise. | ||
|
|
||
| Such function lacks an *override* argument to decide how to deal with | ||
| overridden keys on updating an existing dictionary. | ||
|
Comment on lines
+165
to
+166
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So if we add an |
||
|
|
||
|
|
||
| Discussions | ||
| =========== | ||
|
|
||
| * Issue `Add PyDict_NewPresized() function | ||
| <https://github.com/python/cpython/issues/139772>`_ | ||
| * PR `Add PyDict_NewPresized() function | ||
| <https://github.com/python/cpython/pull/139773>`__ | ||
| * PR `Add PyDict_FromItems() function | ||
| <https://github.com/python/cpython/pull/139963>`__ | ||
| * PR `Add PyDict_FromKeysAndValues() function | ||
| <https://github.com/python/cpython/pull/141682>`__ | ||
| * C API Working Group decision issue `Add PyDict_NewPresized() function | ||
| <https://github.com/capi-workgroup/decisions/issues/80>`__ | ||
| * C API Working Group decision issue `Add PyDict_FromItems() function | ||
| <https://github.com/capi-workgroup/decisions/issues/90>`__ | ||
|
|
||
|
|
||
| Copyright | ||
| ========= | ||
|
|
||
| This document is placed in the public domain or under the | ||
| CC0-1.0-Universal license, whichever is more permissive. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this apply to some other proposals as well?
How do the other functions behave if input contains duplicate keys?