- 
          
 - 
                Notifications
    
You must be signed in to change notification settings  - Fork 33.3k
 
gh-116738: Make _suggestions module thread-safe #140321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Py_ssize_t size = PyList_Size(candidates); | ||
| for (Py_ssize_t i = 0; i < size; ++i) { | ||
| PyObject *elem = PyList_GetItem(candidates, i); | ||
| PyObject *elem = PyList_GET_ITEM(candidates, i); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using PyList_GetItemRef with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@corona10, Thank you so much for your comment!
Just to make sure I understand your suggestion correctly, did you mean:
- To use 
PyList_GetItemRef()instead ofPyList_GET_ITEM()within the currentcritical_section? - Or to remove the 
critical_sectionand usePyList_GetItemRef()in both loops I mentioned in the description? 
Regarding 1: I replaced PyList_GetItem() with PyList_GET_ITEM() because, in the with-GIL build, it’s already assumed that PyList_GetItem() cannot fail, so the returned value isn't checked. Within the critical_section, the assumption is that the candidates list cannot be mutated, and PyList_GET_ITEM() avoids extra atomic operations.
Regarding 2: I think this could be implemented with PyList_GetItemRef() and no critical_section, as long as errors returned from PyList_GetItemRef() are properly handled. However, in my opinion, using a critical_section makes the code a bit easier to read and reason about, since the candidates list cannot be mutated in this context. This means we don’t need to consider update cases between blocks or iterations.
What do you think, would it be better to use PyList_GetItemRef() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yoney
Hmm, even if you use PyList_GET_ITEM, it won’t check whether the returned value is NULL, so it won’t actually fix the UB you mentioned. What do you think?
cpython/Include/cpython/listobject.h
Line 40 in f11ec6e
| #define PyList_GET_ITEM(op, index) (_PyList_CAST(op)->ob_item[(index)]) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@corona10
Sorry, my previous comment wasn't clear before. I was referring to cases where PyList_GetItem() could fail due to PyList_Check() or valid_index(), particularly in the free-threading build. I was trying to explain why we need a critical_section.
Lines 384 to 397 in f11ec6e
| PyObject * | |
| PyList_GetItem(PyObject *op, Py_ssize_t i) | |
| { | |
| if (!PyList_Check(op)) { | |
| PyErr_BadInternalCall(); | |
| return NULL; | |
| } | |
| if (!valid_index(i, Py_SIZE(op))) { | |
| _Py_DECLARE_STR(list_err, "list index out of range"); | |
| PyErr_SetObject(PyExc_IndexError, &_Py_STR(list_err)); | |
| return NULL; | |
| } | |
| return ((PyListObject *)op) -> ob_item[i]; | |
| } | 
These properties are checked in _suggestions__generate_suggestions_impl() earlier, before calling PyList_GetItem(), and the GIL or the critical_section "protects" them. That's why I replaced PyList_GetItem() with PyList_GET_ITEM(). However, the actual element of the list could still be NULL. I had assumed this wasn’t possible, but should we check if the element is NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I get it, looks good to me. Thank you for explain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
          
 | 
    
Added a critical section to protect the
candidateslist in the free-threading build, addressing two possible issues below:PyList_GetItem()could fail and returnNULL. Passing it toPyUnicode_Check()results in UB in the loop below.cpython/Modules/_suggestions.c
Lines 31 to 33 in bedaea0
_Py_CalculateSuggestions().cpython/Python/suggestions.c
Lines 150 to 153 in bedaea0
cc: @mpage @colesbury
Note: I considered placing the critical section inside
_Py_CalculateSuggestions(). However, due to the loop in_suggestions__generate_suggestions_impl(), the critical section needs to be earlier. I also checked other places where_Py_CalculateSuggestions()is called, and all of them use local list (candidates) variables in C code, so they are safe.