Commit ccc7528

akkart-aws authored and bwbarrett committed
config: Enable DMA-BUF by default (except old EFA)
Change the default value of OFI_NCCL_DISABLE_DMABUF to 0 so that DMA-BUF is enabled by default on platforms that have Libfabric 1.20, kernel 5.12 or later, and CUDA 11.7 with DMA-BUF support.

At the same time, disable DMA-BUF support on 1st, 2nd, and 3rd generation EFA, due to an issue with operations using very large page entry count MRs and an issue with page merging in the RDMA subsystem (see comment in nccl_ofi_net.c for more details).

Signed-off-by: Arun Karthik <[email protected]>
Signed-off-by: Brian Barrett <[email protected]>
1 parent eb4bb6b commit ccc7528

2 files changed: +34 -3 lines changed

include/nccl_ofi_param.h

Lines changed: 2 additions & 2 deletions
@@ -273,9 +273,9 @@ OFI_NCCL_PARAM_INT(disable_gdr_required_check, "DISABLE_GDR_REQUIRED_CHECK", 0);
  * the plugin has no freedom to renegotiate DMABUF support with NCCL, and so it
  * is fatal. Under those conditions, users should ensure that they have set this
  * environment variable to '1' to force NCCL to avoid providing dmabuf file
- * desciptors. This is the default, pending perf investigations.
+ * desciptors.
  */
-OFI_NCCL_PARAM_INT(disable_dmabuf, "DISABLE_DMABUF", 1);
+OFI_NCCL_PARAM_INT(disable_dmabuf, "DISABLE_DMABUF", 0);
 
 /*
  * Messages sized larger than this threshold will be striped across multiple rails
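
The effect of flipping the default can be illustrated with a small sketch of how an integer parameter backed by an environment variable might be resolved. This is not the plugin's OFI_NCCL_PARAM_INT macro: `read_int_param` is a hypothetical helper, and the fallback-on-invalid-input behavior is an assumption made for the example. Only the variable name OFI_NCCL_DISABLE_DMABUF and the new default of 0 come from the commit.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not part of the plugin): resolve an integer parameter
// from the environment, falling back to the compiled-in default when the
// variable is unset, empty, or not a valid base-10 integer.
long read_int_param(const char *env_name, long default_value)
{
	const char *raw = std::getenv(env_name);
	if (raw == nullptr || *raw == '\0') {
		return default_value;
	}

	errno = 0;
	char *end = nullptr;
	const long value = std::strtol(raw, &end, 10);
	if (errno != 0 || end == raw || *end != '\0') {
		// Assumption for this sketch: invalid input keeps the default.
		return default_value;
	}
	return value;
}

// With this commit the default is 0, so DMA-BUF stays enabled unless the
// user exports OFI_NCCL_DISABLE_DMABUF=1.
bool dmabuf_disabled_by_user()
{
	return read_int_param("OFI_NCCL_DISABLE_DMABUF", 0) != 0;
}
```

Under these assumptions, a user who hits the fatal condition described in the comment above would export OFI_NCCL_DISABLE_DMABUF=1 before launching the job, and `dmabuf_disabled_by_user()` would then return true.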

src/nccl_ofi_net.cpp

Lines changed: 32 additions & 1 deletion
@@ -578,11 +578,42 @@ int nccl_net_ofi_info_properties(nccl_net_ofi_plugin_t *plugin, struct fi_info *
 
 	props->max_mr_key_size = nic_prov->domain_attr->mr_key_size;
 
-
 	props->dmabuf_support = ((nic_prov->caps & FI_HMEM) != 0) &&
 		FI_VERSION_GE(nic_prov->fabric_attr->api_version, FI_VERSION(1, 20)) &&
 		nccl_ofi_dmabuf_viable()
 		;
+	if (props->dmabuf_support && strncmp("efa", nic_prov->fabric_attr->prov_name, strlen("efa")) == 0) {
+		// Generations 1-3 of EFA have a firmware issue that can result
+		// in communication failures with MRs that cover a large number
+		// of page entries. This is not usually a problem, because page
+		// merging greatly reduces the number of page entries in the MR.
+		// However, the RDMA subsystem in the Linux kernel did not
+		// properly execute page merging for dmabuf entries until a
+		// recent patch
+		// (https://web.git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=486055f5e09df9),
+		// and the lack of page merging increased the probability of
+		// hitting the EFA issue. Testing for the fixed kernel version
+		// is effectively impossible (the issue can also be fixed in the
+		// EFA kmod itself, and backports are likely, so a simple kernel
+		// version check is insufficient), so instead we only support
+		// dmabuf by default in Generation 4 of EFA. When the
+		// communication failure issue is resolved in previous
+		// generations, this code will be removed and dmabuf will be
+		// available by default everywhere.
+		if (nic_prov->nic == NULL || nic_prov->nic->device_attr == NULL) {
+			NCCL_OFI_TRACE(NCCL_INIT | NCCL_NET,
+				       "DMA-BUF disabled due to missing nic data");
+			props->dmabuf_support = false;
+		} else if (strcmp("efa0", nic_prov->nic->device_attr->device_id) == 0 ||
+			   strcmp("efa1", nic_prov->nic->device_attr->device_id) == 0 ||
+			   strcmp("efa2", nic_prov->nic->device_attr->device_id) == 0) {
+			NCCL_OFI_TRACE(NCCL_INIT | NCCL_NET,
+				       "DMA-BUF disabled due to EFA device id %s",
+				       nic_prov->nic->device_attr->device_id);
+			props->dmabuf_support = false;
+		}
+	}
+
 	if (props->dmabuf_support) {
 		NCCL_OFI_TRACE(NCCL_INIT | NCCL_NET, "DMA-BUF support is advertised in properties.");
 	}
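
The gating added in this hunk can be read as a pure predicate over the provider name and the device id. The following is a minimal standalone sketch of that decision, not the plugin's code: `NicInfo` and `dmabuf_allowed_for_nic` are hypothetical names that stand in for the `struct fi_info` fields the diff reads, and treating a null provider name as "not EFA" is an assumption added for safety in the example.

```cpp
#include <cstring>

// Hypothetical stand-in for the two fields the diff inspects.
struct NicInfo {
	const char *prov_name; // mirrors fabric_attr->prov_name
	const char *device_id; // mirrors nic->device_attr->device_id; nullptr if nic data is missing
};

// Returns true if DMA-BUF may stay advertised, assuming the earlier checks
// (FI_HMEM capability, API version, viability) have already passed.
bool dmabuf_allowed_for_nic(const NicInfo &info)
{
	// Providers whose name does not start with "efa" are not affected.
	if (info.prov_name == nullptr ||
	    std::strncmp("efa", info.prov_name, std::strlen("efa")) != 0) {
		return true;
	}

	// Without device data the generation cannot be determined, so the
	// conservative choice is to disable DMA-BUF.
	if (info.device_id == nullptr) {
		return false;
	}

	// Device ids matched by the diff for EFA generations 1-3.
	static const char *const blocked_ids[] = {"efa0", "efa1", "efa2"};
	for (const char *id : blocked_ids) {
		if (std::strcmp(id, info.device_id) == 0) {
			return false;
		}
	}
	return true;
}
```

Expressing the check as a deny-list of device ids, rather than an allow-list, matches the comment's stated intent: any later EFA generation gets DMA-BUF by default without a further code change.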
