Fix Coverity defects: Resolve data race conditions and implement timeout-based thread join #193
…er synchronization Co-authored-by: shibu-kv <89052442+shibu-kv@users.noreply.github.com>
Copyright scan failure
Co-authored-by: shibu-kv <89052442+shibu-kv@users.noreply.github.com>
… mutex before pthread_join Co-authored-by: shibu-kv <89052442+shibu-kv@users.noreply.github.com>
pthread_mutex_unlock(&plMutex);
pthread_join(singleProfile->reportThread, NULL);
pthread_join(threadToJoin, NULL);
reportThreadExits = false;
Coverity Issue - Data race condition
Accessing "reportThreadExits" without holding lock "plMutex". Elsewhere, "reportThreadExits" is written to with "plMutex" held 2 out of 3 times.
Medium Impact, CWE-366
MISSING_LOCK
Fixed the race condition by adding mutex protection around reportThreadExits = false assignment. The variable is now accessed consistently with proper locking. Commit: 7d78766
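A minimal sketch of what consistent locking around this flag could look like. The accessor functions are hypothetical and not taken from the PR; only plMutex and reportThreadExits are names from the review.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the globals named in the review (assumed declarations). */
static pthread_mutex_t plMutex = PTHREAD_MUTEX_INITIALIZER;
static bool reportThreadExits = false;

/* Hypothetical helper: write the flag only while holding plMutex. */
static void set_report_thread_exits(bool value)
{
    pthread_mutex_lock(&plMutex);
    reportThreadExits = value;
    pthread_mutex_unlock(&plMutex);
}

/* Hypothetical helper: read the flag under the same lock, so every
 * access site is covered and MISSING_LOCK has nothing left to flag. */
static bool get_report_thread_exits(void)
{
    pthread_mutex_lock(&plMutex);
    bool value = reportThreadExits;
    pthread_mutex_unlock(&plMutex);
    return value;
}
```

Routing every read and write through helpers like these is one way to keep the "held 2 out of 3 times" pattern from creeping back in.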
source/bulkdata/profilexconf.c
Outdated
pthread_t threadToJoin = reportThread;
pthread_mutex_unlock(&plMutex);
pthread_join(singleProfile->reportThread, NULL);
pthread_join(threadToJoin, NULL);
@copilot Instead of join and wait forever use strategy similar to trylock
Implemented timeout-based thread join using pthread_timedjoin_np with 5-second timeout instead of blocking indefinitely. This prevents hanging during uninit operations while maintaining proper cleanup. Commit: 7d78766
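A minimal sketch of a bounded join, assuming glibc on Linux, where pthread_timedjoin_np is available as a GNU extension. The helper name join_with_timeout and the worker function are hypothetical and not from the PR. The call takes an absolute CLOCK_REALTIME deadline, so the timeout is added to the current time.

```c
#define _GNU_SOURCE            /* pthread_timedjoin_np is a GNU extension */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical worker that returns immediately; used only for illustration. */
static void *quick_worker(void *arg)
{
    return arg;
}

/* Hypothetical helper: join `thread`, giving up after `timeout_sec` seconds.
 * Returns 0 on success, ETIMEDOUT if the thread is still running at the
 * deadline, or another error number on failure. */
static int join_with_timeout(pthread_t thread, int timeout_sec)
{
    struct timespec deadline;

    assert(timeout_sec >= 0);
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += timeout_sec;   /* absolute deadline, not a duration */

    return pthread_timedjoin_np(thread, NULL, &deadline);
}
```

If the call returns ETIMEDOUT the thread is still joinable, so the caller has to decide whether to retry, detach, or log and move on.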
shibu-kv left a comment
Incorporate the review comments to avoid indefinite wait / deadlocks
…ad join Co-authored-by: shibu-kv <89052442+shibu-kv@users.noreply.github.com>
pthread_mutex_lock(&plMutex);
reportThreadExits = false;
singleProfile->reportInProgress = false ;
singleProfile->reportInProgress = false;
Coverity Issue - Check of thread-shared field evades lock acquisition
Thread1 sets "reportInProgress" to a new value. Now the two threads have an inconsistent view of "reportInProgress" and updates to fields correlated with "reportInProgress" may be lost.
High Impact, CWE-543
LOCK_EVASION
How to fix
Guard the modification of "reportInProgress" and the read used to decide whether to modify "reportInProgress" with the same set of locks.
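A minimal sketch of the suggested fix, in which the check and the update happen inside one critical section. The functions try_begin_report and end_report are hypothetical, and reportInProgress is modeled here as a plain global rather than a ProfileXConf field to keep the example self-contained.

```c
#include <pthread.h>
#include <stdbool.h>

/* Assumed declarations standing in for the real ones. */
static pthread_mutex_t plMutex = PTHREAD_MUTEX_INITIALIZER;
static bool reportInProgress = false;

/* Hypothetical helper: test and set the flag under one lock acquisition.
 * Returns true if this caller claimed the report slot, false if a report
 * was already in progress. */
static bool try_begin_report(void)
{
    bool claimed = false;

    pthread_mutex_lock(&plMutex);
    if (!reportInProgress) {      /* the read that drives the decision */
        reportInProgress = true;  /* the write, under the same lock */
        claimed = true;
    }
    pthread_mutex_unlock(&plMutex);
    return claimed;
}

/* Hypothetical helper: clear the flag under the same lock. */
static void end_report(void)
{
    pthread_mutex_lock(&plMutex);
    reportInProgress = false;
    pthread_mutex_unlock(&plMutex);
}
```

Because the read and the write share one lock acquisition, two threads cannot both observe the flag as false and both proceed.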
Problem
Coverity static analysis identified critical data race conditions in the telemetry component:
- Accessing reportThread without holding lock plMutex
- pthread_join(reportThread, NULL) called without proper synchronization
- Accessing reportThreadExits without proper mutex protection
The root cause was that reportThread was a member of the ProfileXConf structure, which could be accessed concurrently during aggressive reloads, leading to race conditions and potential crashes.
Solution
Thread Management Refactor
- Moved reportThread from the ProfileXConf structure to a static global variable
- Removed the pthread_t reportThread field from the ProfileXConf structure definition
- Updated pthread_create and pthread_join calls to use the static variable
Race Condition Fix
The critical issue was:
- pthread_create(&reportThread, ...) called while holding plMutex
- pthread_join(reportThread, NULL) called without holding plMutex
Fixed by implementing proper synchronization:
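A minimal sketch of the pattern visible in the reviewed diff: copy the thread handle while holding plMutex, release the lock, then join the copy. The start/stop helpers, the worker, and the reportThreadStarted flag are assumptions for illustration, not code from the PR.

```c
#include <pthread.h>
#include <stdbool.h>

/* Assumed declarations standing in for the real ones. */
static pthread_mutex_t plMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t reportThread;            /* static, no longer a struct member */
static bool reportThreadStarted = false;  /* hypothetical bookkeeping flag */

/* Hypothetical worker; the real report thread does actual work. */
static void *report_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create the thread while holding plMutex, as in the PR description. */
static int start_report_thread(void)
{
    int rc = 0;

    pthread_mutex_lock(&plMutex);
    if (!reportThreadStarted) {
        rc = pthread_create(&reportThread, NULL, report_worker, NULL);
        reportThreadStarted = (rc == 0);
    }
    pthread_mutex_unlock(&plMutex);
    return rc;
}

/* Snapshot the handle under the lock, then join without holding it, so a
 * worker that needs plMutex on its way out cannot deadlock the joiner. */
static int stop_report_thread(void)
{
    pthread_mutex_lock(&plMutex);
    if (!reportThreadStarted) {
        pthread_mutex_unlock(&plMutex);
        return 0;                          /* nothing to join */
    }
    pthread_t threadToJoin = reportThread; /* read happens under plMutex */
    reportThreadStarted = false;
    pthread_mutex_unlock(&plMutex);

    return pthread_join(threadToJoin, NULL);
}
```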
This ensures:
Timeout-Based Thread Management
- Replaced pthread_join() with pthread_timedjoin_np() using a 5-second timeout
Additional Race Condition Fixes
- Fixed reportThreadExits variable access by adding proper mutex protection
Safety Enhancements
- Updated CollectAndReportXconf to handle profile reload scenarios
- Simplified the ProfileXConf_delete function by removing complex thread management
Testing
The changes address the specific race conditions that occur during aggressive reload scenarios. The static thread variable approach eliminates the concurrent access issues while maintaining proper thread lifecycle management. The timeout mechanism prevents indefinite blocking during cleanup operations.
Impact
Fixes: ISSUE-76
Resolves: Coverity MISSING_LOCK defects (CWE-366) for both the reportThread and reportThreadExits variables