Eager tag send inline (ucp_tag_send_inline) is not called on systems with CUDA/ROCm by default #4275
Description
UCP keeps two max-short thresholds, one per memory-type mode (memtype_on and memtype_off). They are set here:
Line 1085 in b987de5
    static void ucp_ep_config_set_memtype_thresh(ucp_memtype_thresh_t *max_eager_short,
                                                 ssize_t max_short, int num_mem_type_mds)
    {
        if (!num_mem_type_mds) {
            max_eager_short->memtype_off = max_short;
        }

        max_eager_short->memtype_on = max_short;
    }

max_eager_short->memtype_off is -1 if there are CUDA or ROCm MDs.
UCP checks these values when it tries to perform a tag send using UCT AM Short or UCT TAG offload Eager Short:
Line 148 in 0815086
    static UCS_F_ALWAYS_INLINE int
    ucp_tag_eager_is_inline(ucp_ep_h ep, const ucp_memtype_thresh_t *max_eager_short,
                            ssize_t length)
    {
        return (ucs_likely(length <= max_eager_short->memtype_off) ||
                (length <= max_eager_short->memtype_on &&
                 ucp_memory_type_cache_is_empty(ep->worker->context)));
    }

If there is at least one non-host MD (CUDA or ROCm), the first condition (length <= max_eager_short->memtype_off) is always false.
UCP then falls back to the second condition: length <= max_eager_short->memtype_on and the memory type cache is empty. The assumption is that the memory type cache is non-empty only when the user does UCP tag sends from CUDA memory. However, the memory type cache is now never empty: since #4022, the UCS memtype cache initializes its UCM event handler with the UCM_EVENT_FLAG_EXISTING_ALLOC flag, so all inaccessible mappings and Nvidia file mappings are added to the memory type cache with memtype == UCS_MEMORY_TYPE_LAST. As a result, neither UCT AM Short nor UCT TAG offload Eager Short is used for tag sends, even for host memory.
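The interaction between the two thresholds and the cache check can be reproduced in a small standalone model. This is a minimal sketch, not UCX code: the `model_*` names are hypothetical, the memory type cache is reduced to a single boolean, and the initial value of -1 for both thresholds is an assumption based on the observation that memtype_off is -1 when CUDA or ROCm MDs are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for ucp_memtype_thresh_t; not the real UCX type. */
typedef struct {
    ssize_t memtype_on;
    ssize_t memtype_off;
} model_memtype_thresh_t;

/* Assumption: both thresholds start at -1, i.e. short sends disabled. */
model_memtype_thresh_t model_thresh_init(void)
{
    model_memtype_thresh_t thresh = { -1, -1 };
    return thresh;
}

/* Same logic as ucp_ep_config_set_memtype_thresh() in the snippet above:
 * memtype_off is only raised when there are no memory-type MDs. */
void model_set_memtype_thresh(model_memtype_thresh_t *thresh,
                              ssize_t max_short, int num_mem_type_mds)
{
    assert(thresh != NULL);

    if (!num_mem_type_mds) {
        thresh->memtype_off = max_short;
    }

    thresh->memtype_on = max_short;
}

/* Same condition as ucp_tag_eager_is_inline(), with the cache query
 * replaced by a plain boolean argument. */
bool model_is_inline(const model_memtype_thresh_t *thresh, ssize_t length,
                     bool memtype_cache_is_empty)
{
    assert(thresh != NULL);

    return (length <= thresh->memtype_off) ||
           ((length <= thresh->memtype_on) && memtype_cache_is_empty);
}
```

With `num_mem_type_mds = 1` and a non-empty cache, `model_is_inline()` returns false for an 8-byte host message even when `max_short` is 64. Passing `memtype_cache_is_empty = true`, or `num_mem_type_mds = 0`, makes the same call return true.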
Setting UCX_TLS without the CUDA (or ROCm) transports works around the issue, since only host MDs are then used. Setting UCX_MEMTYPE_CACHE=n also works around it: it disables the memory type cache, and a disabled cache is considered empty. These short sends are called after checking this:
ucx/src/ucp/proto/proto_am.inl
Line 404 in b987de5
    static UCS_F_ALWAYS_INLINE ssize_t
    ucp_proto_get_short_max(const ucp_request_t *req,
                            const ucp_ep_msg_config_t *msg_config)
    {
        return (!UCP_DT_IS_CONTIG(req->send.datatype) ||
                (req->flags & UCP_REQUEST_FLAG_SYNC) ||
                (!UCP_MEM_IS_HOST(req->send.mem_type))) ?
               -1 : msg_config->max_short;
    }

This check applies when the send uses a UCP request allocated in ucp_tag_send_req.
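For comparison, the request-based check can be modeled the same way. The sketch below is only an illustration of the condition in ucp_proto_get_short_max(): the `model_*` types and names are hypothetical, and the datatype, sync flag and memory type are reduced to plain fields. It shows that this path decides from the memory type recorded in the request and does not depend on whether the memory type cache is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical, reduced memory type enumeration (not the UCS enum). */
typedef enum {
    MODEL_MEM_HOST,
    MODEL_MEM_CUDA
} model_mem_type_t;

/* Hypothetical stand-in for the fields of ucp_request_t used by the check. */
typedef struct {
    bool             is_contig; /* models UCP_DT_IS_CONTIG(datatype) */
    bool             is_sync;   /* models UCP_REQUEST_FLAG_SYNC */
    model_mem_type_t mem_type;  /* models req->send.mem_type */
} model_send_req_t;

/* Returns max_short only for contiguous, non-sync, host-memory sends;
 * otherwise returns -1, which disables the short protocol. */
ssize_t model_get_short_max(const model_send_req_t *req, ssize_t max_short)
{
    assert(req != NULL);

    return (!req->is_contig || req->is_sync ||
            (req->mem_type != MODEL_MEM_HOST)) ? -1 : max_short;
}
```

In this model a contiguous, non-sync host request keeps the full `max_short`, while a CUDA request gets -1, with no reference to the cache state.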