@BobTheBuidler
Contributor

This PR microoptimizes the usage of _Py_IDENTIFIER and _PyUnicode_FromId in various C files.

Changes:

  • For each identifier (e.g., setdefault, update, keys, values, items, clear, copy), a static cache variable (e.g., setdefault_id_unicode) is used to store the result of _PyUnicode_FromId.
  • The _Py_IDENTIFIER(name) macro is now declared only inside the conditional block that runs the first time a function is called (i.e., when the corresponding cache variable is NULL).
  • This ensures that the interned unicode object is initialized only once, and only if needed.
  • All subsequent calls reuse the cached unicode object, avoiding repeated calls to _PyUnicode_FromId and repeated static identifier declarations.

Example pattern after refactor:

static PyObject *setdefault_id_unicode = NULL;

PyObject *CPyDict_SetDefault(PyObject *dict, PyObject *key, PyObject *value) {
    if (PyDict_CheckExact(dict)) {
        PyObject* ret = PyDict_SetDefault(dict, key, value);
        Py_XINCREF(ret);
        return ret;
    }
    if (setdefault_id_unicode == NULL) {
        _Py_IDENTIFIER(setdefault);
        setdefault_id_unicode = _PyUnicode_FromId(&PyId_setdefault); /* borrowed */
        if (setdefault_id_unicode == NULL) {
            return NULL;
        }
    }
    return PyObject_CallMethodObjArgs(dict, setdefault_id_unicode, key, value, NULL);
}

@BobTheBuidler changed the title from "[mypyc] feat: cache ids for fallback pythonic method lookups" to "[mypyc] feat: cache ids for fallback pythonic method lookups [1/1]" on Oct 1, 2025
return NULL;
if (join_id_unicode == NULL) {
_Py_IDENTIFIER(join);
join_id_unicode = _PyUnicode_FromId(&PyId_join); /* borrowed */
Collaborator

This is not thread-safe on free-threaded builds: two threads can read and write the static cache variable concurrently without synchronization. I'm not sure what the best workaround is, though. Using a relaxed memory order read could be sufficient. If we intern the string, this seems like a thread-safe approach (as long as we don't use multiple subinterpreters, which are currently not supported).

Also, the _Py_IDENTIFIER API is no longer part of the public API: python/cpython#108593. If we are going to change these call sites, it would be good to have a replacement that uses public API as much as feasible.

Below I give some ideas.

Here's how the atomic read/store might work (didn't check, based on LLM output):

#include <stdatomic.h>

static _Atomic(PyObject *) join_id_unicode = ATOMIC_VAR_INIT(NULL); 

...
    if (atomic_load_explicit(&join_id_unicode, memory_order_relaxed) == NULL) { 
        ... atomic_store_explicit(...) ...
    }
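
For reference, a self-contained version of the load/store pattern might look like the sketch below. It uses plain C11 atomics with no Python dependency. The names `create_name`, `cached_name`, and `get_cached_name` are hypothetical, and `create_name` only stands in for whatever creates the cached object. The sketch uses acquire/release ordering, which is a more conservative choice than the relaxed ordering mentioned above, and it initializes with plain `NULL` because `ATOMIC_VAR_INIT` is deprecated as of C17.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Demonstration-only counter of how often the creation function ran. */
static atomic_int create_calls = 0;

/* Hypothetical stand-in for the real creation step (for example,
 * interning a string). It must be idempotent: every call has to
 * return the same pointer for the racy initialization to be benign. */
static const char *create_name(void) {
    atomic_fetch_add(&create_calls, 1);
    return "join";
}

/* The cache slot. Static storage with a plain NULL initializer. */
static _Atomic(const char *) cached_name = NULL;

/* Returns the cached pointer, creating it on first use. */
static const char *get_cached_name(void) {
    const char *name =
        atomic_load_explicit(&cached_name, memory_order_acquire);
    if (name == NULL) {
        name = create_name();
        if (name == NULL) {
            return NULL;  /* creation failed; leave the cache empty */
        }
        /* Publish the pointer so later loads see it. */
        atomic_store_explicit(&cached_name, name, memory_order_release);
    }
    return name;
}
```

If two threads race on the first call, both may run `create_name` and both will store a pointer. That is only harmless when the creation step returns the same pointer each time, which is the property interning would provide.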

Use PyUnicode_InternFromString to create a unicode object (once), instead of _PyUnicode_FromId.

Only update one use case first, and once we've agreed on a good approach, create a follow-up PR that migrates remaining use cases. This minimizes extra effort required to iteratively update based on review feedback.
