Releases: mercury-hpc/mercury
mercury 2.4.1
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- Add -K option to increment key based on rank (used for testing)
- Verify source handle matching on RMA when -v option is passed
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [HG Core]
- Check for mismatching builds when using checksums
- [HG Core/Bulk]
- Print destination address string in error messages
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- Add na_context parameter to context_create plugin callback
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- Refactor msg_send/msg_recv calls and add debug info
- Fix compilation under FreeBSD
- Disable RNR protocol by default when using prov/cxi
- Prevent the use of FI_AV_AUTH_KEY with prov/cxi when number of auth keys is 1
- Fix tx/rx sizes to appropriate values with prov/tcp and prov/cxi
- Add NA_OFI_TX_SIZE/NA_OFI_RX_SIZE env vars to manually control sizes
- Ensure rx message ordering is set to FI_ORDER_NONE
- Improve error and debug logs
- [NA UCX]
- Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
- Switch to using ucp_ep_close_nbx()
- Rework address EP close to be async and check on address close list during progress
- Ensure address is resolved on RMA
- Queue up pending connection if address exists and reject connection after timeout if no progress is made
- Enable UCS_LOG_LEVEL_PRINT as info log
- Set UCP_ERR_HANDLING_MODE_PEER for all endpoint types
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
- Fix error return of hg_mem_pool_extend()
- Fix kqueue implementation
- Ensure parent log is registered first
- Fix rare case where log was not being printed even if environment variables were set
- Fix dlog to use tail queue
- Bump max log buffer size
- [CMake]
- Fix tirpc to be an external dependency
- Add MERCURY_LIB_DEBUG_NAME_IS_RELEASE option to set OUTPUT_NAME_DEBUG to LIB_RELEASE_NAME
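As a rough illustration of the NA_OFI_TX_SIZE/NA_OFI_RX_SIZE variables listed under [NA OFI] above, they could be exported in the environment before launching a Mercury process. This is only a sketch: the value 1024 is a placeholder, and it is an assumption (not stated in these notes) that the variables accept plain integer sizes.

```shell
# Placeholder sizes; the accepted value format is assumed to be a plain integer.
export NA_OFI_TX_SIZE=1024
export NA_OFI_RX_SIZE=1024
```

Leaving both variables unset keeps the sizes that the plugin selects on its own.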
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
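A minimal sketch of working around the known issue above: FI_UNIVERSE_SIZE is the libfabric environment variable that describes the expected number of peers, so it can be raised before starting a job with more than 256 peers. The PEERS variable and its value are hypothetical and only serve the example.

```shell
# Hypothetical peer count for this job; replace with the real number of peers.
PEERS=512

# Only raise the libfabric universe size when the 256-peer limit is exceeded.
if [ "$PEERS" -gt 256 ]; then
    export FI_UNIVERSE_SIZE="$PEERS"
fi
```

The variable has to be present in the environment of every process that initializes the tcp or verbs provider.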
mercury 2.4.1rc5
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- Add -K option to increment key based on rank (used for testing)
- Verify source handle matching on RMA when -v option is passed
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [HG Core]
- Check for mismatching builds when using checksums
- [HG Core/Bulk]
- Print destination address string in error messages
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- Add na_context parameter to context_create plugin callback
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- Refactor msg_send/msg_recv calls and add debug info
- Fix compilation under FreeBSD
- Disable RNR protocol by default when using prov/cxi
- Prevent the use of FI_AV_AUTH_KEY with prov/cxi when number of auth keys is 1
- Fix tx/rx sizes to appropriate values with prov/tcp and prov/cxi
- Add NA_OFI_TX_SIZE/NA_OFI_RX_SIZE env vars to manually control sizes
- Ensure rx message ordering is set to FI_ORDER_NONE
- Improve error and debug logs
- [NA UCX]
- Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
- Switch to using ucp_ep_close_nbx()
- Rework address EP close to be async and check on address close list during progress
- Ensure address is resolved on RMA
- Queue up pending connection if address exists and reject connection after timeout if no progress is made
- Enable UCS_LOG_LEVEL_PRINT as info log
- Set UCP_ERR_HANDLING_MODE_PEER for all endpoint types
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
- Fix error return of hg_mem_pool_extend()
- Fix kqueue implementation
- Ensure parent log is registered first
- Fix rare case where log was not being printed even if environment variables were set
- Fix dlog to use tail queue
- Bump max log buffer size
- [CMake]
- Fix tirpc to be an external dependency
- Add MERCURY_LIB_DEBUG_NAME_IS_RELEASE option to set OUTPUT_NAME_DEBUG to LIB_RELEASE_NAME
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.1rc4
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- Add -K option to increment key based on rank (used for testing)
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [HG Core]
- Check for mismatching builds when using checksums
- [HG Core/Bulk]
- Print destination address string in error messages
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- Add na_context parameter to context_create plugin callback
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- Refactor msg_send/msg_recv calls and add debug info
- Fix compilation under FreeBSD
- Disable RNR protocol by default when using prov/cxi
- Prevent the use of FI_AV_AUTH_KEY with prov/cxi when number of auth keys is 1
- [NA UCX]
- Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
- Switch to using ucp_ep_close_nbx()
- Rework address EP close to be async and check on address close list during progress
- Ensure address is resolved on RMA
- Queue up pending connection if address exists and reject connection after timeout if no progress is made
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
- Fix error return of hg_mem_pool_extend()
- Fix kqueue implementation
- Ensure parent log is registered first
- Fix rare case where log was not being printed even if environment variables were set
- [CMake]
- Fix tirpc to be an external dependency
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.1rc3
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- Add -K option to increment key based on rank (used for testing)
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [HG Core]
- Check for mismatching builds when using checksums
- [HG Core/Bulk]
- Print destination address string in error messages
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- Add na_context parameter to context_create plugin callback
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- Refactor msg_send/msg_recv calls and add debug info
- Fix compilation under FreeBSD
- [NA UCX]
- Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
- Switch to using ucp_ep_close_nbx()
- Rework address EP close to be async and check on address close list during progress
- Ensure address is resolved on RMA
- Queue up pending connection if address exists and reject connection after timeout if no progress is made
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
- Fix error return of hg_mem_pool_extend()
- Fix kqueue implementation
- Ensure parent log is registered first
- Fix rare case where log was not being printed even if environment variables were set
- [CMake]
- Fix tirpc to be an external dependency
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.1rc2
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- Add -K option to increment key based on rank (used for testing)
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [HG Core]
- Check for mismatching builds when using checksums
- [HG Core/Bulk]
- Print destination address string in error messages
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- Add na_context parameter to context_create plugin callback
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- Refactor msg_send/msg_recv calls and add debug info
- Fix compilation under FreeBSD
- [NA UCX]
- Use ucp_worker_query() instead of deprecated ucp_worker_get_address()
- Switch to using ucp_ep_close_nbx()
- Rework address EP close to be async and check on address close list during progress
- Ensure address is resolved on RMA
- Queue up pending connection if address exists and reject connection after timeout if no progress is made
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active and hg_log_get_subsys(void)
- Fix error return of hg_mem_pool_extend()
- Fix kqueue implementation
- [CMake]
- Fix tirpc to be an external dependency
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.1rc1
Summary
This new version brings both bug fixes and feature updates to mercury.
New features
- [NA]
- Remove NA_DEFAULT_PLUGIN_PATH and use NA_PLUGIN_RELATIVE_PATH instead
- Use relative path for NA plugin search
- Calculate relative path at build time and use it at runtime to find the plugin directory
- [NA OFI]
- Fix compatibility with libfabric 2.0
- Pass down NA flags for firewall support in prov/tcp
- Indicate if client bulk address is behind firewall by using address deserialization callback functions
- [HG/NA perf]
- Add -N option to keep perf server up after client exits
- Remove barrier by default in perf loop and add --barrier as an option to use the barrier again
- Add min/max measurements when barrier is not used
- Print only first and last targets when reading config
- Re-organize and clean up printed fields
- [HG Util]
- Add fatal and info log levels
- This replaces the previous fatal log subsys; the default log level is now fatal
Bug fixes
- [HG]
- Ensure that one-way RPCs can overflow
- Use existing ack notifications to ensure send buffer remains available
- Fix handling of multi-recv operations returning NULL buffers and repost multi-recv buffer if released
- Fix possible erroneous refcount when bulk create/transfer fails
- Enable diagnostic counters outside of debug builds
- Enable HG proc overflow when using XDR
- Fix hg_proc_save_ptr() error handling and allocation with XDR
- Multiple proc fixes for XDR encoding
- [NA]
- Fix plugin scan to continue if one plugin cannot load
- [NA OFI]
- Check against FI_REMOTE_CQ_DATA before accessing cq_event->data
- Fix case of FI_MULTI_RECV event returned without buffer
- Fix completion of multi-recv cancelation with prov/cxi
- Only complete in error path when FI_MULTI_RECV is set
- Multi-recv operations may still be used even after an error has occurred
- Improve logging of canceled events
- Add missing op type from op completed error log
- Fix compile error on older prov/cxi platforms
- Attempt to use ip_subnet with FI_SOCKADDR_IN format
- [NA BMI]
- Do not BMI_initialize() servers with address 0.0.0.0 and detect address to use
- [HG/NA Perf]
- Fix potential race when re-using exp op ID
- Add spin_flag to prevent excessive sleeping
- Reduce overhead of hg_poll_wait()
- [HG util]
- Fix global buffer overflow in hg_log_outlet_active
- Fix error return of hg_mem_pool_extend()
- [CMake]
- Fix tirpc to be an external dependency
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.0
Summary
This new version brings both bug fixes and feature updates to mercury. Notable are the addition of a new progress mechanism, new initialization parameters for the handling of multi-recv buffers and the support of cxi with HPE SHS 11.0.
New features
- [HG]
- Add HG_Get_input_payload_size()/HG_Get_output_payload_size()
- Add the ability to query input / output payload sizes
- Add HG_Diag_dump_counters() to dump diagnostic counters
- Add rpc_req_recv_active_count and rpc_multi_recv_copy_count counters
- Add HG_Class_get_counters() to retrieve internal counters
- Add multi_recv_copy_threshold init parameter
- Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
- Add multi_recv_op_max init parameter
- This allows users to control number of multi-recv buffers posted (libfabric plugin only)
- Add no_overflow init option to prevent use of overflow buffers
- Improve multi-recv buffer warning messages
- Associate handle to HG proc
- hg_proc_get_handle() can be used to retrieve handle within proc functions
- Add HG_Event_get_wait_fd() to retrieve internal wait object
- Add HG_Event_ready()/HG_Event_progress()/HG_Event_trigger() to support wait fd progress model
- Simplify progress mechanism and remove use of internal timers
- Always make NA progress when HG_Event_progress() is called
- Update HG progress to use new NA progress routines
- Add missing HG_WARN_UNUSED_RESULT to HG calls
- Switch to using standard types and align with NA
- Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
- Add HG_IO_ERROR return code
- [NA]
- Bump NA version to v5.0.0
- Add NA_Poll() and NA_Poll_wait() routines
- Deprecate NA_Progress() in favor of poll routines
- Add NA_Context_get_completion_count() to retrieve size of completion queue
- Update plugins to use new poll and poll_wait callbacks
- poll_wait plugin callback remains for compatibility
- Fix documentation of NA_Poll_get_fd()
- Add missing NA_WARN_UNUSED_RESULT qualifiers
- Remove deprecated CCI plugin
- Return last known error when plugin loading fails
- Add init info version compatibility wrappers
- Add support for traffic_class init info (only supported by ofi plugin)
- Add NA_IO_ERROR return code for generic I/O errors
- Update OFI and UCX plugins to use new code
- [NA OFI]
- Support use of cxi provider with SHS 11.0
- Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
- Add runtime check for cxi provider version
- Setting multiple auth keys disables FI_DIRECTED_RECV
- Separate opening of AV and auth key insertion
- Parse auth key range when FI_AV_AUTH_KEY is available
- Encode/decode auth key when serializing addrs
- Add support for FI_AV_USER_ID
- Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
- Clean up handling of FI_SOURCE_ERR
- Remove support of FI_SOURCE without FI_SOURCE_ERR
- Add support for new CXI address format
- Attempt to distribute multi-NIC domains based on selected CPU ID
- Support selection of traffic classes (single class per NA class)
- Add support for FI_PROTO_CXI_RNR
- Add NA_OFI_SKIP_DOMAIN_OPS env variable to skip cxi domain ops
- Remove unused NA_OFI_DOM_SHARED flag
- [NA UCX]
- Add ucx log outlet and redirect UCX log
- Use default HG log level if UCX_LOG_LEVEL is not set
- [HG/NA perf]
- Add hg_first perf test to measure cost of initial RPC
- Add -u option to control number of multi-recv ops (server only)
- Add -i option to control number of handles posted (server only)
- Add -f/--hostfile option to select hostfile to write to / read from
- Add -T/--tclass option to select traffic class
- Autodetect MPI implementation in perf utilities
- MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.
- Print registration and deregistration times when -R option is used
- Update to use new HG/NA progress routines and remove use of hg_request
- Support forced registration in hg_bw_read/hg_bw_write
- [HG Util]
- Add hg_log_vwrite() to write log from va_list
- Add hg_log_level_to_string()
- Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
- Add const on atomic gets
- Switch to using sys/queue.h directly
- Remove HG_QUEUE and HG_LIST definitions
- Add hg_dl_error() to return last error
Bug fixes
- [HG]
- Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
- Fix behavior of request_post_incr init parameter
- request_post_incr cannot be disabled (set to -1) with multi-recv
- [HG/NA]
- HG NA init info is fixed to v4.0 for now and duplicates tclass info
- [NA]
- Fix missing free of dynamic plugin entries
- [NA BMI/MPI]
- Return actual msg size through cb info
- [NA OFI]
- Fix cxi domain ops settings and disable PROV_KEY_CACHE
- Fix shm provider flags
- Remove excessive MR count warning message
- [NA UCX]
- Fix hg_info not filtering protocol
- Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
- Fix context thread mode to default to UCS_THREAD_MODE_MULTI
- [HG/NA Perf]
- Ensure NA perf tests wait on send completion
- Fix bulk permission flag in hg_bw_read
- Add some missing error checks in mercury_perf
- [HG util]
- Multiple logging fixes:
- Fix dlog_free not called when parent/child have separate dlogs
- Fix mercury log to correctly generate outlet names
- Fix log outlets to use prefixed subsys name
- Fix use of macros in debug log
- Use destructor to free log outlets
- Add missing prototype to hg_atomic_fence() definition
- [CMake]
- Fix cmake_minimum_required() warning
- Update kwsys and mchecksum dependencies
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.0rc5
Summary
This is a preview release of the v2.4.0 release.
New features
Added in rc5
- [HG]
- Add HG_Get_input_payload_size()/HG_Get_output_payload_size()
- Add the ability to query input / output payload sizes
- Add HG_Diag_dump_counters() to dump diagnostic counters
- Add rpc_req_recv_active_count and rpc_multi_recv_copy_count counters
- Add HG_Class_get_counters() to retrieve internal counters
Added in rc4
- [HG]
- Add multi_recv_copy_threshold init parameter
- Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
- Associate handle to HG proc
- hg_proc_get_handle() can be used to retrieve handle within proc functions
Added in rc3
- [HG]
- Add multi_recv_op_max init parameter
- This allows users to control number of multi-recv buffers posted (libfabric plugin only)
- Add no_overflow init option to prevent use of overflow buffers
- Improve multi-recv buffer warning messages
- Add HG_Event_get_wait_fd() to retrieve internal wait object
- Add HG_Event_ready()/HG_Event_progress()/HG_Event_trigger() to support wait fd progress model
- Simplify progress mechanism and remove use of internal timers
- Always make NA progress when HG_Event_progress() is called
- Update HG progress to use new NA progress routines
- Add missing HG_WARN_UNUSED_RESULT to HG calls
- Switch to using standard types and align with NA
- Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
- [NA]
- Add NA_Poll() and NA_Poll_wait() routines
- Deprecate NA_Progress() in favor of poll routines
- Add NA_Context_get_completion_count() to retrieve size of completion queue
- Update plugins to use new poll and poll_wait callbacks
- poll_wait plugin callback remains for compatibility
- Fix documentation of NA_Poll_get_fd()
- Add missing NA_WARN_UNUSED_RESULT qualifiers
- Bump NA version to 5.0.0
- Remove deprecated CCI plugin
- Return last known error when plugin loading fails
- [NA OFI]
- Remove unused NA_OFI_DOM_SHARED flag
- Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
- [NA UCX]
- Add ucx log outlet and redirect UCX log
- Use default HG log level if UCX_LOG_LEVEL is not set
- [HG Util]
- Add hg_log_vwrite() to write log from va_list
- Add hg_log_level_to_string()
- Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
- Add const on atomic gets
- Switch to using sys/queue.h directly
- Remove HG_QUEUE and HG_LIST definitions
- Add hg_dl_error() to return last error
- [HG/NA Perf Test]
- Add -u option to control number of multi-recv ops (server only)
- Add -i option to control number of handles posted (server only)
- Update to use new HG/NA progress routines and remove use of hg_request
Added in rc2
- [NA OFI]
- Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
- Add runtime check for cxi provider version
- Setting multiple auth keys disables FI_DIRECTED_RECV
- Separate opening of AV and auth key insertion
- Parse auth key range when FI_AV_AUTH_KEY is available
- Encode/decode auth key when serializing addrs
- Add support for FI_AV_USER_ID
- Clean up handling of FI_SOURCE_ERR
- Remove support of FI_SOURCE without FI_SOURCE_ERR
- Add support for new CXI address format
Added in rc1
- [NA]
- Add init info version compatibility wrappers
- Bump NA version to v4.1.0
- Add support for traffic_class init info (only supported by ofi plugin)
- [NA OFI]
- Attempt to distribute multi-NIC domains based on selected CPU ID
- Support selection of traffic classes (single class per NA class)
- [HG/NA Perf Test]
- Add -f/--hostfile option to select hostfile to write to / read from
- Add -T/--tclass option to select traffic class
- Autodetect MPI implementation in perf utilities
  - MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.
Bug fixes
Added in rc5
- [HG]
- Make HG_Core_event_ready() non-inline to fix NA dependency and remove HG_Core_event_ready_loopback() from public API
- Fix NA init info not correctly set from HG
- [NA BMI/MPI]
- Return actual msg size through cb info
Added in rc4
- [HG]
- Fix couple of type changes introduced in rc1 that could have broken ABI
- Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
- [HG Util]
- Fix dlog_free not called when parent/child have separate dlogs
- [HG/NA]
- Fix init info changes made in previous rcs to prevent ABI breakage
- HG NA init info is fixed to v4.0 for now and duplicates tclass info
Added in rc3
- [HG]
- Fix behavior of request_post_incr init parameter
  - request_post_incr cannot be disabled (set to -1) with multi-recv
- [HG Util]
- Fix mercury log to correctly generate outlet names
- Fix log outlets to use prefixed subsys name
- Fix use of macros in debug log
- [CMake]
- Fix cmake_minimum_required() warning
- Update kwsys and mchecksum dependencies
Added in rc2
- [HG Util]
- Use destructor to free log outlets
- [NA]
- Fix missing free of dynamic plugin entries
- [NA UCX]
- Fix hg_info not filtering protocol
  - Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
- [NA OFI]
- Fix shm provider flags
- [NA Test]
- Remove could not find MPI message
Added in rc1
- [HG Util]
- Add missing prototype to hg_atomic_fence() definition
- [NA OFI]
- Remove excessive MR count warning message
- [NA Perf]
- Ensure perf tests wait on send completion
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
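As a minimal sketch of working around the known issue above, FI_UNIVERSE_SIZE can be exported in the launching shell, or set from the program itself, presumably before Mercury and libfabric are initialized. The helper name set_default_universe_size and the value 1024 used in the usage note are illustrative assumptions; the appropriate value depends on the expected number of peers.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: set FI_UNIVERSE_SIZE unless it is already present
 * in the environment. Returns 0 on success, -1 on failure (see setenv(3)). */
static int
set_default_universe_size(const char *peer_count)
{
    assert(peer_count != NULL);
    /* overwrite = 0 keeps any value the user already exported */
    return setenv("FI_UNIVERSE_SIZE", peer_count, 0);
}
```

For example, calling set_default_universe_size("1024") early in main() leaves a user-provided setting untouched and only supplies a default when none exists.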
mercury 2.4.0rc4
Summary
This is a preview release of the v2.4.0 release.
New features
Added in rc4
- [HG]
- Add multi_recv_copy_threshold init parameter
  - Use this new parameter to fall back to memcpy to prevent starvation of multi-recv buffers
- Associate handle to HG proc
  - hg_proc_get_handle() can be used to retrieve handle within proc functions
Added in rc3
- [HG]
- Add multi_recv_op_max init parameter
  - This allows users to control number of multi-recv buffers posted (libfabric plugin only)
- Add no_overflow init option to prevent use of overflow buffers
- Improve multi-recv buffer warning messages
- Add HG_Event_get_wait_fd() to retrieve internal wait object
- Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
  - Simplify progress mechanism and remove use of internal timers
  - Always make NA progress when HG_Event_progress() is called
  - Update HG progress to use new NA progress routines
- Add missing HG_WARN_UNUSED_RESULT to HG calls
- Switch to using standard types and align with NA
  - Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
- [NA]
- Add NA_Poll() and NA_Poll_wait() routines
- Deprecate NA_Progress() in favor of poll routines
- Add NA_Context_get_completion_count() to retrieve size of completion queue
- Update plugins to use new poll and poll_wait callbacks
  - poll_wait plugin callback remains for compatibility
- Fix documentation of NA_Poll_get_fd()
- Add missing NA_WARN_UNUSED_RESULT qualifiers
- Bump NA version to 5.0.0
- Remove deprecated CCI plugin
- Return last known error when plugin loading fails
- [NA OFI]
- Remove unused NA_OFI_DOM_SHARED flag
- Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
- [NA UCX]
- Add ucx log outlet and redirect UCX log
  - Use default HG log level if UCX_LOG_LEVEL is not set
- [HG Util]
- Add hg_log_vwrite() to write log from va_list
- Add hg_log_level_to_string()
- Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
- Add const on atomic gets
- Switch to using sys/queue.h directly
- Remove HG_QUEUE and HG_LIST definitions
- Add hg_dl_error() to return last error
- [HG/NA Perf Test]
- Add -u option to control number of multi-recv ops (server only)
- Add -i option to control number of handles posted (server only)
- Update to use new HG/NA progress routines and remove use of hg_request
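The [HG] entries above describe a wait fd progress model: the application retrieves a file descriptor, blocks on it in its own event loop, and then makes progress. The sketch below illustrates only the generic shape of such a loop with poll(2); a pipe stands in for the descriptor that HG_Event_get_wait_fd() would return, and the Mercury calls are named in comments without their signatures. The function names wait_for_events and event_loop_step are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until `wait_fd` becomes readable or `timeout_ms` elapses.
 * Returns 1 if the fd is readable, 0 otherwise, -1 on poll() error. */
static int
wait_for_events(int wait_fd, int timeout_ms)
{
    struct pollfd pfd = {.fd = wait_fd, .events = POLLIN, .revents = 0};
    int rc;

    assert(wait_fd >= 0);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* One iteration of a hypothetical application event loop. */
static int
event_loop_step(int wait_fd, int timeout_ms)
{
    int ready = wait_for_events(wait_fd, timeout_ms);

    if (ready == 1) {
        /* In a Mercury application, this is where HG_Event_progress() and
         * HG_Event_trigger() would be called. This stand-in only consumes
         * one byte from the pipe. */
        char byte;

        if (read(wait_fd, &byte, 1) != 1)
            return -1;
    }
    return ready;
}
```

The appeal of this model is that the same descriptor can be registered alongside sockets or timers in an existing poll or epoll loop, rather than dedicating a thread to progress.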
Added in rc2
- [NA OFI]
- Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
  - Add runtime check for cxi provider version
  - Setting multiple auth keys disables FI_DIRECTED_RECV
  - Separate opening of AV and auth key insertion
  - Parse auth key range when FI_AV_AUTH_KEY is available
  - Encode/decode auth key when serializing addrs
- Add support for FI_AV_USER_ID
- Clean up handling of FI_SOURCE_ERR
- Remove support of FI_SOURCE w/o FI_SOURCE_ERR
- Add support for new CXI address format
Added in rc1
- [NA]
- Add init info version compatibility wrappers
- Bump NA version to v4.1.0
- Add support for traffic_class init info (only supported by ofi plugin)
- [NA OFI]
- Attempt to distribute multi-NIC domains based on selected CPU ID
- Support selection of traffic classes (single class per NA class)
- [HG/NA Perf Test]
- Add -f/--hostfile option to select hostfile to write to / read from
- Add -T/--tclass option to select traffic class
- Autodetect MPI implementation in perf utilities
  - MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.
Bug fixes
Added in rc4
- [HG]
- Fix couple of type changes introduced in rc1 that could have broken ABI
- Fix shared-memory path that was previously disabled in conjunction with libfabric transports that use the multi-recv capability
- [HG Util]
- Fix dlog_free not called when parent/child have separate dlogs
- [HG/NA]
- Fix init info changes made in previous rcs to prevent ABI breakage
- HG NA init info is fixed to v4.0 for now and duplicates tclass info
Added in rc3
- [HG]
- Fix behavior of request_post_incr init parameter
  - request_post_incr cannot be disabled (set to -1) with multi-recv
- [HG Util]
- Fix mercury log to correctly generate outlet names
- Fix log outlets to use prefixed subsys name
- Fix use of macros in debug log
- [CMake]
- Fix cmake_minimum_required() warning
- Update kwsys and mchecksum dependencies
Added in rc2
- [HG Util]
- Use destructor to free log outlets
- [NA]
- Fix missing free of dynamic plugin entries
- [NA UCX]
- Fix hg_info not filtering protocol
  - Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
- [NA OFI]
- Fix shm provider flags
- [NA Test]
- Remove could not find MPI message
Added in rc1
- [HG Util]
- Add missing prototype to hg_atomic_fence() definition
- [NA OFI]
- Remove excessive MR count warning message
- [NA Perf]
- Ensure perf tests wait on send completion
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
mercury 2.4.0rc3
Summary
This is a preview release of the v2.4.0 release.
New features
Added in rc3
- [HG]
- Add multi_recv_op_max init parameter
  - This allows users to control number of multi-recv buffers posted (libfabric plugin only)
- Add no_overflow init option to prevent use of overflow buffers
- Improve multi-recv buffer warning messages
- Add HG_Event_get_wait_fd() to retrieve internal wait object
- Add HG_Event_ready() / HG_Event_progress() / HG_Event_trigger() to support wait fd progress model
  - Simplify progress mechanism and remove use of internal timers
  - Always make NA progress when HG_Event_progress() is called
  - Update HG progress to use new NA progress routines
- Add missing HG_WARN_UNUSED_RESULT to HG calls
- Switch to using standard types and align with NA
  - Keep some uint8_t instances instead of hg_bool_t for ABI compatibility
- [NA]
- Add NA_Poll() and NA_Poll_wait() routines
- Deprecate NA_Progress() in favor of poll routines
- Add NA_Context_get_completion_count() to retrieve size of completion queue
- Update plugins to use new poll and poll_wait callbacks
  - poll_wait plugin callback remains for compatibility
- Fix documentation of NA_Poll_get_fd()
- Add missing NA_WARN_UNUSED_RESULT qualifiers
- Bump NA version to 5.0.0
- Remove deprecated CCI plugin
- Return last known error when plugin loading fails
- [NA OFI]
- Remove unused NA_OFI_DOM_SHARED flag
- Always use FI_SOURCE and FI_SOURCE_ERR when both are supported
- [NA UCX]
- Add ucx log outlet and redirect UCX log
  - Use default HG log level if UCX_LOG_LEVEL is not set
- [HG Util]
- Add hg_log_vwrite() to write log from va_list
- Add hg_log_level_to_string()
- Clean up mercury_event code and add const qualifier to hg_poll_get_fd()
- Add const on atomic gets
- Switch to using sys/queue.h directly
- Remove HG_QUEUE and HG_LIST definitions
- Add hg_dl_error() to return last error
- [HG/NA Perf Test]
- Add -u option to control number of multi-recv ops (server only)
- Add -i option to control number of handles posted (server only)
- Update to use new HG/NA progress routines and remove use of hg_request
Added in rc2
- [NA OFI]
- Add support for FI_AV_AUTH_KEY (requires libfabric >= 1.20)
  - Add runtime check for cxi provider version
  - Setting multiple auth keys disables FI_DIRECTED_RECV
  - Separate opening of AV and auth key insertion
  - Parse auth key range when FI_AV_AUTH_KEY is available
  - Encode/decode auth key when serializing addrs
- Add support for FI_AV_USER_ID
- Clean up handling of FI_SOURCE_ERR
- Remove support of FI_SOURCE w/o FI_SOURCE_ERR
- Add support for new CXI address format
Added in rc1
- [NA]
- Add init info version compatibility wrappers
- Bump NA version to v4.1.0
- Add support for traffic_class init info (only supported by ofi plugin)
- [NA OFI]
- Attempt to distribute multi-NIC domains based on selected CPU ID
- Support selection of traffic classes (single class per NA class)
- [HG/NA Perf Test]
- Add -f/--hostfile option to select hostfile to write to / read from
- Add -T/--tclass option to select traffic class
- Autodetect MPI implementation in perf utilities
  - MPI can now be autodetected and dynamically loaded in utilities, even if MERCURY_TESTING_ENABLE_PARALLEL was turned off. If MERCURY_TESTING_ENABLE_PARALLEL is turned on, tests remain manually linked against MPI as they used to be.
Bug fixes
Added in rc3
- [HG]
- Fix behavior of request_post_incr init parameter
  - request_post_incr cannot be disabled (set to -1) with multi-recv
- [HG Util]
- Fix mercury log to correctly generate outlet names
- Fix log outlets to use prefixed subsys name
- Fix use of macros in debug log
- [CMake]
- Fix cmake_minimum_required() warning
- Update kwsys and mchecksum dependencies
Added in rc2
- [HG Util]
- Use destructor to free log outlets
- [NA]
- Fix missing free of dynamic plugin entries
- [NA UCX]
- Fix hg_info not filtering protocol
  - Allow na_ucx_get_protocol_info() to resolve ucx tl name aliases
- [NA OFI]
- Fix shm provider flags
- [NA Test]
- Remove could not find MPI message
Added in rc1
- [HG Util]
- Add missing prototype to hg_atomic_fence() definition
- [NA OFI]
- Remove excessive MR count warning message
- [NA Perf]
- Ensure perf tests wait on send completion
⚠️ Known Issues
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.