Use NewThreadPool in dynamic mode. By default use only one instance of ThreadPool per device. #6254
Greptile Summary
This PR exposes Python bindings for
Key changes:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant EvalContext
participant ThreadPool as ThreadPool (Python)
participant DefaultTP as _DefaultThreadPool
participant NewTP as _NewThreadPool (C++)
participant Facade as PyThreadPoolFacade (C++)
participant Workspace as _Workspace (C++)
User->>EvalContext: EvalContext(thread_pool=tp)
EvalContext->>EvalContext: validate tp.device_id == device_id
EvalContext->>EvalContext: self._thread_pool = tp
alt thread_pool is None (default pool)
User->>EvalContext: ctx.thread_pool (property)
EvalContext->>DefaultTP: _get_default_thread_pool(device_id)
DefaultTP->>DefaultTP: get() — double-checked lock
DefaultTP->>ThreadPool: ThreadPool(num_threads, device_id)
ThreadPool->>NewTP: super().__init__(num_threads, device_id)
NewTP-->>DefaultTP: shared_ptr[ThreadPoolBase]
DefaultTP-->>EvalContext: ThreadPool instance
end
User->>EvalContext: operator.__call__(inputs)
EvalContext->>ThreadPool: ctx.thread_pool._create_facade()
ThreadPool->>Facade: PyThreadPoolFacade(shared_ptr[ThreadPoolBase])
Facade-->>EvalContext: shared_ptr[ThreadPool] (facade)
EvalContext->>Workspace: _Workspace(facade, cuda_stream)
Workspace->>Workspace: SetupAndRun(workspace, batch_size)
Workspace-->>EvalContext: outputs
User->>EvalContext: set_num_threads(n)
EvalContext->>DefaultTP: _set_num_threads(n) — nullify stale pools
DefaultTP->>DefaultTP: self._thread_pool = None
mod = sys.modules[__name__]
if nenv := os.environ.get("DALI_NUM_THREADS", None):
This should be documented.
It's not new (just moved here). We should document it, but that seems a bit out of scope for this PR.
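Since the variable is undocumented, here is a minimal sketch of how such an override is commonly read. Only the `os.environ.get("DALI_NUM_THREADS", None)` check comes from the snippet under review; the function name `resolve_num_threads`, the validation, and the fallback to `os.cpu_count()` are assumptions for illustration and may differ from what DALI actually does.

```python
import os


def resolve_num_threads(environ=None, fallback=None):
    """Pick a thread count from DALI_NUM_THREADS, else use a fallback (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable is falsy, so it falls through to the fallback.
    if nenv := environ.get("DALI_NUM_THREADS", None):
        n = int(nenv)  # raises ValueError on non-numeric input
        if n < 1:
            raise ValueError(f"DALI_NUM_THREADS must be positive, got {n}")
        return n
    # Assumed fallback: the caller's value, or the CPU count reported by the OS.
    return fallback if fallback is not None else (os.cpu_count() or 1)
```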
_global_default_thread_pool = {}
class _DefaultThreadPool:
The name is a bit confusing: it's not really a thread pool but a wrapper around one.
@greptileai the value of CPU_ONLY_DEVICE_ID is -99999, so there's no collision with _CURRENT_DEVICE_ID at -1.
You're absolutely right on both points:
The design trade-off makes sense - optimizing for the correct-use case rather than adding defensive overhead for documented misuse.
CI MESSAGE: [46421321]: BUILD STARTED
CI MESSAGE: [46421321]: BUILD FAILED
…f ThreadPool per device. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Add thread-safety test for set_num_threads and default thread pool. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [46508447]: BUILD STARTED
CI MESSAGE: [46508447]: BUILD PASSED
@greptileai CPU_ONLY_DEVICE_ID is -99999, so there's no collision with _CURRENT_DEVICE_ID.
Got it, thanks for the clarification. With
Category:
Refactoring (Redesign of existing code that doesn't affect functionality)
Description:
This change exposes Python bindings for NewThreadPool and ThreadPoolFacade (with owning pointer) and uses them in Dynamic Mode. The ThreadPool is no longer thread-local.
Also, since the new thread pool can be shared, it is now possible to pass a thread pool instance to the EvalContext constructor.
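The sharing described above could look roughly like the following sketch. It uses stand-in classes rather than the real DALI types: `SketchThreadPool`, `SketchEvalContext`, and the default of 4 threads are hypothetical. The device-ID check and the lazy `thread_pool` property mirror the steps shown in the sequence diagram, and the default-pool lookup here is deliberately simplified (no locking).

```python
class SketchThreadPool:
    """Hypothetical stand-in for a shareable thread pool bound to one device."""

    def __init__(self, num_threads, device_id):
        self.num_threads = num_threads
        self.device_id = device_id


_default_pools = {}  # one default pool per device (simplified, not thread-safe)


class SketchEvalContext:
    """Hypothetical stand-in for an evaluation context that accepts a pool."""

    def __init__(self, device_id, thread_pool=None):
        # Reject a pool created for a different device than this context.
        if thread_pool is not None and thread_pool.device_id != device_id:
            raise ValueError(
                f"thread pool is bound to device {thread_pool.device_id}, "
                f"but the context uses device {device_id}"
            )
        self.device_id = device_id
        self._thread_pool = thread_pool

    @property
    def thread_pool(self):
        # An explicitly supplied pool wins; otherwise use the per-device default.
        if self._thread_pool is not None:
            return self._thread_pool
        if self.device_id not in _default_pools:
            _default_pools[self.device_id] = SketchThreadPool(4, self.device_id)
        return _default_pools[self.device_id]
```

Two contexts built with the same pool instance then run on the same worker threads instead of each creating its own.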
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4640