Eager tag send inline (ucp_tag_send_inline) is not called on systems with CUDA/ROCm by default #4275

@dmitrygx

UCP keeps two max-short thresholds for eager sends, one for each memory type mode (ON and OFF), stored as memtype_on and memtype_off. They are set here:

static void ucp_ep_config_set_memtype_thresh(ucp_memtype_thresh_t *max_eager_short,
                                             ssize_t max_short, int num_mem_type_mds)
{
    if (!num_mem_type_mds) {
        max_eager_short->memtype_off = max_short;
    }

    max_eager_short->memtype_on = max_short;
}

max_eager_short->memtype_off is therefore -1 (it is never set to max_short) if there are CUDA or ROCm MDs.

UCP checks those values when it tries to do a tag send operation using UCT AM Short or UCT TAG offload Eager Short:

static UCS_F_ALWAYS_INLINE int
ucp_tag_eager_is_inline(ucp_ep_h ep, const ucp_memtype_thresh_t *max_eager_short,
                        ssize_t length)
{
    return (ucs_likely(length <= max_eager_short->memtype_off) ||
            (length <= max_eager_short->memtype_on &&
             ucp_memory_type_cache_is_empty(ep->worker->context)));
}

If there is at least one non-host MD (CUDA or ROCm), the first condition (length <= max_eager_short->memtype_off) is always false.
UCP then checks that length <= max_eager_short->memtype_on and that the memory type cache is empty, on the assumption that the cache is non-empty only if the user does a UCP tag send from CUDA memory. But the memory type cache is never empty: since #4022, the UCS memtype cache initializes its UCM event handler with the UCM_EVENT_FLAG_EXISTING_ALLOC flag, so all inaccessible mappings and Nvidia file mappings are added to the cache with memtype == UCS_MEMORY_TYPE_LAST. As a result, neither UCT AM Short nor UCT TAG offload Eager Short is used for tag sends, even for host memory.
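
The behavior described above can be reproduced with a small standalone model. This is only a sketch, not UCX code: `model_thresh_t`, `model_set_thresh`, `model_is_inline` and `model_case` are hypothetical stand-ins for the functions quoted in this issue, the `cache_is_empty` argument replaces the call to ucp_memory_type_cache_is_empty(), and the initial threshold value of -1 and the max_short value of 128 are assumptions made for illustration.

```c
#include <stdbool.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for ucp_memtype_thresh_t. */
typedef struct {
    ssize_t memtype_on;
    ssize_t memtype_off;
} model_thresh_t;

/* Same logic as ucp_ep_config_set_memtype_thresh() quoted in this issue. */
static void model_set_thresh(model_thresh_t *thresh, ssize_t max_short,
                             int num_mem_type_mds)
{
    if (!num_mem_type_mds) {
        thresh->memtype_off = max_short;
    }

    thresh->memtype_on = max_short;
}

/* Same logic as ucp_tag_eager_is_inline(); the cache state is passed in
 * explicitly instead of being queried from the UCP context. */
static bool model_is_inline(const model_thresh_t *thresh, ssize_t length,
                            bool cache_is_empty)
{
    return (length <= thresh->memtype_off) ||
           ((length <= thresh->memtype_on) && cache_is_empty);
}

/* Build the thresholds for a given MD count and evaluate the inline check.
 * Assumptions: both thresholds start at -1 and max_short is 128 bytes. */
static bool model_case(int num_mem_type_mds, bool cache_is_empty,
                       ssize_t length)
{
    model_thresh_t thresh = { -1, -1 };

    model_set_thresh(&thresh, 128, num_mem_type_mds);
    return model_is_inline(&thresh, length, cache_is_empty);
}
```

In this model, `model_case(0, false, 8)` takes the inline path because memtype_off is set, while `model_case(1, false, 8)` does not: with one memory type MD and a non-empty cache, both conditions fail for any length, which matches the report.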

Setting UCX_TLS without the CUDA (or ROCm) transports works around the issue, since only host MDs are then used. Setting UCX_MEMTYPE_CACHE=n also works around it: it disables the memory type cache, which is then considered empty. We call the short protocols after checking this

static UCS_F_ALWAYS_INLINE ssize_t
ucp_proto_get_short_max(const ucp_request_t *req,
                        const ucp_ep_msg_config_t *msg_config)
{
    return  (!UCP_DT_IS_CONTIG(req->send.datatype) ||
            (req->flags & UCP_REQUEST_FLAG_SYNC) ||
            (!UCP_MEM_IS_HOST(req->send.mem_type))) ?
           -1 : msg_config->max_short;
}

when the send is done with a UCP request allocated in ucp_tag_send_req.
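
For comparison, the request-path check can be modeled the same way. Again this is a hedged sketch rather than UCX code: `model_req_t`, `model_get_short_max` and `model_req_is_short` are hypothetical names, the three boolean fields stand in for the UCP_DT_IS_CONTIG, UCP_REQUEST_FLAG_SYNC and UCP_MEM_IS_HOST tests, and the comparison of the length against the returned value is an assumption about how the caller uses it. The point it illustrates is that this path decides from the request's own memory type and does not consult the memory type cache.

```c
#include <stdbool.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical summary of the request fields that the check looks at. */
typedef struct {
    bool is_contig;   /* stands in for UCP_DT_IS_CONTIG(req->send.datatype) */
    bool is_sync;     /* stands in for req->flags & UCP_REQUEST_FLAG_SYNC */
    bool is_host_mem; /* stands in for UCP_MEM_IS_HOST(req->send.mem_type) */
} model_req_t;

/* Same decision as ucp_proto_get_short_max(): return -1 (short disabled)
 * unless the send is contiguous, non-sync, and from host memory. */
static ssize_t model_get_short_max(const model_req_t *req, ssize_t max_short)
{
    return (!req->is_contig || req->is_sync || !req->is_host_mem) ?
           -1 : max_short;
}

/* Assumption: the caller uses a short protocol when the length does not
 * exceed the value returned above. */
static bool model_req_is_short(const model_req_t *req, ssize_t max_short,
                               ssize_t length)
{
    return length <= model_get_short_max(req, max_short);
}
```

With a contiguous, non-sync host request, an 8-byte send against a 128-byte limit is short in this model regardless of any cache state, whereas the same send from non-host memory is not.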
