- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-128679: fix race condition in tracemalloc #128695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Fix race condition in tracemalloc causing abort. | 
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -538,11 +538,20 @@ tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize) | |
| return NULL; | ||
|  | ||
| TABLES_LOCK(); | ||
| if (ADD_TRACE(ptr, nelem * elsize) < 0) { | ||
| /* Failed to allocate a trace for the new memory block */ | ||
| TABLES_UNLOCK(); | ||
| alloc->free(alloc->ctx, ptr); | ||
| return NULL; | ||
| /* This operation can be executed outside of GIL protection due to | ||
| allocations needed for an initial PyGILState_Ensure(). Because of this, | ||
| tracing may be turned off concurrently in another thread which currently | ||
| holds the GIL between the initial check of `tracing` variable and here. | ||
| This is a sanity check to account for this possibility and avoid a | ||
| potential abort where it can be handled gracefully instead. | ||
| See gh-128679. */ | ||
| if (tracemalloc_config.tracing) { | ||
| if (ADD_TRACE(ptr, nelem * elsize) < 0) { | ||
| /* Failed to allocate a trace for the new memory block */ | ||
| TABLES_UNLOCK(); | ||
| alloc->free(alloc->ctx, ptr); | ||
| return NULL; | ||
| } | ||
| } | ||
| TABLES_UNLOCK(); | ||
| return ptr; | ||
|  | @@ -963,8 +972,11 @@ _PyTraceMalloc_Stop(void) | |
| if (!tracemalloc_config.tracing) | ||
| return; | ||
|  | ||
| /* stop tracing Python memory allocations */ | ||
| /* stop tracing Python memory allocations, | ||
| but not while something might be in the middle of an operation */ | ||
| TABLES_LOCK(); | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're going to lock writes, we need to lock reads as well, but see my other comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Reading without lock like the first check in  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't really mix and match atomics. If something relies on a lock for thread-safety, then you must hold that lock in order to read or write to it. It's not safe to rely on the GIL for some reads, and then rely on a different lock for others. It might be OK in some cases with the GIL because of implicit lock-ordering, but that can break very easily. In this specific case, this would race with  I would either go with atomic reads and writes, or a compare-exchange if there ends up being some problems with inconsistent state. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not convinced that adding TABLES_LOCK/TABLES_UNLOCK while setting tracing avoids a race condition. At least, I don't understand how. What are your trying to protect? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 
 I'm not sure either, I was practicing defensive programming. Dunno if setting this to 0 here would cause a problem for an operation simultaneously taking place in another thread in a  
 I am assuming @vstinner you are familiar with the codebase so if you say this would not cause a problem then I can certainly remove it. 
 Neither of those would solve the OP problem by themselves and just changing the  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, switching it to completely atomic won't fix it, but it's a good start because again, this approach is error-prone. It might pass on the test case, but that doesn't necessarily mean it's the correct approach. Imagine a case where: 
 If you think that atomic operations will be too difficult to use here, then you can use a dedicated  | ||
| tracemalloc_config.tracing = 0; | ||
| TABLES_UNLOCK(); | ||
|  | ||
| /* unregister the hook on memory allocators */ | ||
| #ifdef TRACE_RAW_MALLOC | ||
|  | @@ -1317,6 +1329,12 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr, | |
|  | ||
| gil_state = PyGILState_Ensure(); | ||
|  | ||
| if (!tracemalloc_config.tracing) { | ||
| /* tracing may have been turned off as we were acquiring the GIL */ | ||
| PyGILState_Release(gil_state); | ||
| return -2; | ||
| } | ||
|  | ||
| TABLES_LOCK(); | ||
| res = tracemalloc_add_trace(domain, ptr, size); | ||
| TABLES_UNLOCK(); | ||
|  | ||
Uh oh!
There was an error while loading. Please reload this page.