|
39 | 39 | <li> \ref miscobjs
|
40 | 40 | <li> \ref attributes
|
41 | 41 | <li> \ref topoattrs
|
| 42 | + <li> \ref heteromem |
42 | 43 | <li> \ref xml
|
43 | 44 | <li> \ref synthetic
|
44 | 45 | <li> \ref interoperability
|
@@ -1117,6 +1118,31 @@ following environment variables.
|
1117 | 1118 | Bit 2 enables the use of target/initiator information.
|
1118 | 1119 | </dd>
|
1119 | 1120 |
|
| 1121 | +<dt>HWLOC_MEMTIERS_GUESS=none</dt> |
| 1122 | +<dt>HWLOC_MEMTIERS_GUESS=all</dt> |
| 1123 | + <dd>Disable or enable all heuristics to guess memory subtypes and tiers. |
| 1124 | + By default, hwloc only uses heuristics that are likely correct |
| 1125 | + and disables those that are unlikely. |
| 1126 | + </dd> |
| 1127 | +<!-- since 2.10, not stable yet, hence not documented |
| 1128 | + HWLOC_MEMTIERS_GUESS=spm_is_hbm,node0_is_dram |
| 1129 | + assume all SPM nodes are HBM, assume node0 is in the DRAM tier |
| 1130 | +--> |
| 1131 | + |
| 1132 | +<dt>HWLOC_MEMTIERS=0x0f=HBM;0xf=DRAM</dt> |
| 1133 | + <dd>Enforce the memory tiers from the given semicolon-separated list. |
| 1134 | + Each entry specifies a bitmask (nodeset) of NUMA nodes and their subtype. |
| 1135 | + Nodes not listed in any entry are not placed in any tier. |
| 1136 | + |
| 1137 | + If an empty value or <tt>none</tt> is given, tiers are entirely disabled. |
| 1138 | + </dd> |
| 1139 | + |
| 1140 | +<dt>HWLOC_MEMTIERS_REFRESH=1</dt> |
| 1141 | + <dd>Force the rebuilding of memory tiers. |
| 1142 | + This is mostly useful when importing an XML topology from an old hwloc |
| 1143 | + version which was not able to guess memory subtypes and tiers. |
| 1144 | + </dd> |
| 1145 | + |
1120 | 1146 | <dt>HWLOC_GROUPING=1</dt>
|
1121 | 1147 | <dd>enables or disables objects grouping based on distances.
|
1122 | 1148 | By default, hwloc uses distance matrices between objects (either read
|
@@ -1925,8 +1951,13 @@ subtype <tt>DRAM</tt> (for usual main memory),
|
1925 | 1951 | <tt>HBM</tt> (high-bandwidth memory),
|
1926 | 1952 | <tt>SPM</tt> (specific-purpose memory, usually reserved for some custom applications),
|
1927 | 1953 | <tt>NVM</tt> (non-volatile memory when used as main memory),
|
1928 |
| -<tt>MCDRAM</tt> (on KNL) |
1929 |
| -or <tt>GPUMemory</tt> (on POWER architecture with NVIDIA GPU memory shared over NVLink). |
| 1954 | +<tt>MCDRAM</tt> (on KNL), |
| 1955 | +<tt>GPUMemory</tt> (on POWER architecture with NVIDIA GPU memory shared over NVLink), |
| 1956 | +<tt>CXL-DRAM</tt> or <tt>CXL-NVM</tt> for CXL DRAM or non-volatile memory. |
| 1957 | +Note that some of these subtypes are guessed by the library |
| 1958 | +and might be missing or slightly wrong in some corner cases. |
| 1959 | +See \ref heteromem for details, and HWLOC_MEMTIERS and HWLOC_MEMTIERS_GUESS |
| 1960 | +in \ref envvar for tuning these. |
1930 | 1961 | </li>
|
1931 | 1962 | <li>Groups:
|
1932 | 1963 | subtype <tt>Cluster</tt>, <tt>Module</tt>, <tt>Tile</tt>, <tt>Compute Unit</tt>,
|
@@ -2258,6 +2289,12 @@ and GID #1 of port #3.
|
2258 | 2289 | These info attributes are attached to objects specified in parentheses.
|
2259 | 2290 |
|
2260 | 2291 | <dl>
|
| 2292 | +<dt>MemoryTier (NUMA Nodes)</dt> |
| 2293 | +<dd>The rank of the memory tier of this node. |
| 2294 | +Ranks start from 0 for highest bandwidth nodes. |
| 2295 | +The attribute is only set if multiple tiers are found. |
| 2296 | +See \ref heteromem. |
| 2297 | +</dd> |
2261 | 2298 | <dt>CXLDevice (NUMA Nodes or DAX Memory OS devices)</dt>
|
2262 | 2299 | <dd>The PCI/CXL bus ID of a device whose CXL Type-3 memory is exposed here.
|
2263 | 2300 | If multiple devices are interleaved, their bus IDs are separated by commas,
|
@@ -2465,6 +2502,11 @@ The memory attributes API is located in hwloc/memattrs.h,
|
2465 | 2502 | see \ref hwlocality_memattrs and \ref hwlocality_memattrs_manage for details.
|
2466 | 2503 | See also an example in doc/examples/memory-attributes.c in the source tree.
|
2467 | 2504 |
|
| 2505 | +Memory attributes are the low-level solution to selecting target |
| 2506 | +memory. hwloc uses them internally to build Memory Tiers which provide |
| 2507 | +an easy way to distinguish NUMA nodes of different kinds, as explained |
| 2508 | +in \ref heteromem. |
| 2509 | + |
2468 | 2510 |
|
2469 | 2511 | \htmlonly
|
2470 | 2512 | </div><div class="section" id="topoattrs_cpukinds">
|
@@ -2524,6 +2566,186 @@ See \ref hwlocality_cpukinds for details.
|
2524 | 2566 |
|
2525 | 2567 |
|
2526 | 2568 |
|
| 2569 | +\page heteromem Heterogeneous Memory |
| 2570 | + |
| 2571 | +\htmlonly |
| 2572 | +<div class="section"> |
| 2573 | +\endhtmlonly |
| 2574 | + |
| 2575 | +Heterogeneous memory hardware exposes different NUMA nodes for |
| 2576 | +different memory technologies. |
| 2577 | +On the image below, a dual-socket server has both HBM (high bandwidth |
| 2578 | +memory) and usual DRAM connected to each socket, as well as some |
| 2579 | +CXL memory connected to the entire machine. |
| 2580 | + |
| 2581 | +\image html heteromem.png |
| 2582 | +\image latex heteromem.png "" width=\textwidth |
| 2583 | + |
| 2584 | +The hardware usually exposes "normal" memory first because it is |
| 2585 | +where "normal" data buffers should be allocated by default. |
| 2586 | +However, there is no guarantee about whether HBM, NVM or CXL will appear |
| 2587 | +second. |
| 2588 | +Hence memory technologies and performance must be made explicit |
| 2589 | +to help users decide where to allocate. |
| 2590 | + |
| 2591 | +\htmlonly |
| 2592 | +</div><div class="section" id="heteromem_memtiers"> |
| 2593 | +\endhtmlonly |
| 2594 | +\section heteromem_memtiers Memory Tiers |
| 2595 | + |
| 2596 | +hwloc builds <i>Memory Tiers</i> to identify different kinds of |
| 2597 | +NUMA nodes. |
| 2598 | +On the above machine, the first tier would contain both HBM NUMA nodes |
| 2599 | +(L\#1 and L\#3), while the second tier would contain both DRAM nodes |
| 2600 | +(L\#0 and L\#2), and the CXL memory (L\#4) would be in the third tier. |
| 2601 | +NUMA nodes are then annotated accordingly: |
| 2602 | +<ul> |
| 2603 | +<li> Each node object has its <tt>subtype</tt> field set to <tt>HBM</tt>, |
| 2604 | + <tt>DRAM</tt> or <tt>CXL-DRAM</tt> |
| 2605 | + (see other possible values in \ref attributes_normal). |
| 2606 | +<li> Each node also has a string info attribute with name |
| 2607 | +<tt>MemoryTier</tt> and value <tt>0</tt> for the first tier, |
| 2608 | +<tt>1</tt> for the second, etc. |
| 2609 | +</ul> |
| 2610 | + |
| 2611 | +Tiers are built using two kinds of information: |
| 2612 | +<ul> |
| 2613 | +<li>First hwloc looks into operating system information to find out |
| 2614 | +whether a node is non-volatile, CXL, special-purpose, etc. |
| 2615 | +<li>Then it combines that knowledge with performance metrics exposed |
| 2616 | +by the hardware to guess what's actually DRAM, HBM, etc. |
| 2617 | +These metrics are also exposed in hwloc Memory Attributes, for |
| 2618 | +instance bandwidth and latency, for read and write. |
| 2619 | +See \ref topoattrs_memattrs and \ref hwlocality_memattrs for more details. |
| 2620 | +</ul> |
| 2621 | + |
| 2622 | +Once nodes with similar or different characteristics are identified, |
| 2623 | +they are placed in tiers. |
| 2624 | +Tiers are then sorted by bandwidth so that the highest bandwidth |
| 2625 | +is ranked first, etc. |
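
The ranking step described above can be sketched as follows. This is only an
illustration, not hwloc's internal code: the <tt>mem_tier</tt> structure, the
<tt>rank_tiers()</tt> function and any bandwidth values passed to it are
hypothetical. It sorts tiers by decreasing bandwidth and assigns rank 0 to the
highest-bandwidth tier.

```c
#include <stdlib.h>

/* Hypothetical tier record; hwloc's internal representation may differ. */
struct mem_tier {
    const char *subtype;      /* e.g. "HBM", "DRAM", "CXL-DRAM" */
    unsigned long bandwidth;  /* illustrative value, any consistent unit */
    int rank;                 /* filled in by rank_tiers() */
};

/* qsort comparator: higher bandwidth sorts first. */
static int by_bandwidth_desc(const void *a, const void *b)
{
    const struct mem_tier *ta = a;
    const struct mem_tier *tb = b;
    if (ta->bandwidth > tb->bandwidth)
        return -1;
    if (ta->bandwidth < tb->bandwidth)
        return 1;
    return 0;
}

/* Sort tiers by decreasing bandwidth, then number them from 0. */
void rank_tiers(struct mem_tier *tiers, size_t n)
{
    qsort(tiers, n, sizeof(*tiers), by_bandwidth_desc);
    for (size_t i = 0; i < n; i++)
        tiers[i].rank = (int)i;
}
```

With three such records for DRAM, HBM and CXL-DRAM, where HBM has the largest
bandwidth and CXL-DRAM the smallest, the resulting ranks match the example
machine: HBM first, DRAM second, CXL-DRAM third.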
| 2626 | + |
| 2627 | +If hwloc fails to build tiers properly, see <tt>HWLOC_MEMTIERS</tt> |
| 2628 | +and <tt>HWLOC_MEMTIERS_GUESS</tt> in \ref envvar. |
| 2629 | + |
| 2630 | + |
| 2631 | +\htmlonly |
| 2632 | +</div><div class="section" id="heteromem_use_cli"> |
| 2633 | +\endhtmlonly |
| 2634 | +\section heteromem_use_cli Using Heterogeneous Memory from the command-line |
| 2635 | + |
| 2636 | +Tiers may be specified in location filters when using NUMA nodes |
| 2637 | +in hwloc command-line tools. |
| 2638 | +For instance, binding memory on the first HBM node (<tt>numa[hbm]:0</tt>) |
| 2639 | +is actually equivalent to binding on the second node (<tt>numa:1</tt>) |
| 2640 | +on our example platform: |
| 2641 | +\verbatim |
| 2642 | +$ hwloc-bind --membind 'numa[hbm]:0' -- myprogram |
| 2643 | +$ hwloc-bind --membind 'numa:1' -- myprogram |
| 2644 | +\endverbatim |
| 2645 | +To count DRAM nodes in the first CPU package, or all nodes: |
| 2646 | +\verbatim |
| 2647 | +$ hwloc-calc -N 'numa[dram]' package:0 |
| 2648 | +1 |
| 2649 | +$ hwloc-calc -N 'numa' package:0 |
| 2650 | +2 |
| 2651 | +\endverbatim |
| 2652 | +To list all the physical indexes of Tier-0 NUMA nodes (HBM P\#2 and P\#3 not shown on the figure): |
| 2653 | +\verbatim |
| 2654 | +$ hwloc-calc -I 'numa[tier=0]' -p all |
| 2655 | +2,3 |
| 2656 | +\endverbatim |
| 2657 | + |
| 2658 | +hwloc-calc and hwloc-bind also have options such as |
| 2659 | +<tt>\--local-memory</tt> and <tt>\--best-memattr</tt> |
| 2660 | +to select the best NUMA node among the local ones. |
| 2661 | +For instance, the following command-lines say that, |
| 2662 | +among nodes near node:0 (DRAM L\#0), |
| 2663 | +the best one for latency is itself |
| 2664 | +while the best one for bandwidth is node:1 (HBM L\#1). |
| 2665 | +\verbatim |
| 2666 | +$ hwloc-calc --best-memattr latency node:0 |
| 2667 | +0 |
| 2668 | +$ hwloc-calc --best-memattr bandwidth node:0 |
| 2669 | +1 |
| 2670 | +\endverbatim |
| 2671 | + |
| 2672 | + |
| 2673 | +\htmlonly |
| 2674 | +</div><div class="section" id="heteromem_use_api"> |
| 2675 | +\endhtmlonly |
| 2676 | +\section heteromem_use_api Using Heterogeneous Memory from the C API |
| 2677 | + |
| 2678 | +There are two major changes introduced by heterogeneous memory |
| 2679 | +when looking at the hierarchical tree of objects. |
| 2680 | +<ul> |
| 2681 | +<li> First, there may be multiple memory children attached at the same |
| 2682 | +place. |
| 2683 | +For instance, each Package in the above image has two memory children, |
| 2684 | +one for the DRAM NUMA node, and another one for the HBM node. |
| 2685 | +<li> Second, memory children may be attached at different levels. |
| 2686 | +In the above image, CXL memory is attached to the root Machine object |
| 2687 | +instead of below a Package. |
| 2688 | +</ul> |
| 2689 | + |
| 2690 | +Hence, one may have to rethink the way one selects NUMA nodes. |
| 2691 | + |
| 2692 | +\subsection heteromem_use_api_iterate Iterating over the list of (heterogeneous) NUMA nodes |
| 2693 | + |
| 2694 | +A common need consists in iterating over the list of NUMA nodes |
| 2695 | +(e.g. using hwloc_get_next_obj_by_type()). |
| 2696 | +This is useful for counting some domains before partitioning a job, |
| 2697 | +or for finding a node that is local to some objects. |
| 2698 | +With heterogeneous memory, one should remember that multiple nodes may |
| 2699 | +now have the same locality (HBM and DRAM above) or overlapping localities |
| 2700 | +(e.g. DRAM and CXL above). |
| 2701 | +Checking NUMA node subtype or tier attributes is a good way to avoid |
| 2702 | +this issue by ignoring nodes of different kinds. |
| 2703 | + |
| 2704 | +Another solution consists in ignoring nodes whose cpuset overlaps the |
| 2705 | +previously selected ones. |
| 2706 | +For instance, in the above example, one could first select DRAM L\#0 |
| 2707 | +but ignore HBM L\#1 (because it overlaps with DRAM L\#0), |
| 2708 | +then select DRAM L\#2 but ignore HBM L\#3 and CXL L\#4 |
| 2709 | +(overlap with DRAM L\#2). |
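
The overlap-based selection can be sketched as follows. To stay
self-contained, this sketch replaces hwloc bitmaps with plain 64-bit masks;
the <tt>node_cpus</tt> structure and the
<tt>select_non_overlapping()</tt> function are hypothetical. Real code would
instead walk NUMA node objects and compare their <tt>cpuset</tt> fields, for
instance with hwloc_bitmap_intersects().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NUMA node object. */
struct node_cpus {
    unsigned logical_index;  /* L# of the node */
    uint64_t cpuset;         /* one bit per CPU local to this node */
};

/*
 * Walk the nodes in order and keep those whose cpuset does not overlap
 * any previously kept node. Logical indexes of kept nodes are stored in
 * 'selected', which must have room for n entries.
 * Returns the number of kept nodes.
 */
size_t select_non_overlapping(const struct node_cpus *nodes, size_t n,
                              unsigned *selected)
{
    uint64_t covered = 0;  /* union of the cpusets kept so far */
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].cpuset & covered)
            continue;  /* overlaps an already selected node */
        covered |= nodes[i].cpuset;
        selected[count++] = nodes[i].logical_index;
    }
    return count;
}
```

If the two packages are modeled with masks 0x0f and 0xf0 (made-up values),
DRAM L\#0 and HBM L\#1 both get 0x0f, DRAM L\#2 and HBM L\#3 both get 0xf0,
and CXL L\#4 gets 0xff. The function then keeps only L\#0 and L\#2.
Note that the result depends on the iteration order: the first node found for
each locality wins.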
| 2710 | + |
| 2711 | +<br/> |
| 2712 | + |
| 2713 | +It is also possible to iterate over the memory parents (e.g. Packages |
| 2714 | +in our example) and select only one memory child for each of them. |
| 2715 | +hwloc_get_memory_parents_depth() may be used to find the depth |
| 2716 | +of these parents. |
| 2717 | +However this method only works if all memory parents are at the same level. |
| 2718 | +It would fail in our example: the root Machine object |
| 2719 | +also has a memory child (CXL), hence hwloc_get_memory_parents_depth() |
| 2720 | +would return ::HWLOC_TYPE_DEPTH_MULTIPLE. |
| 2721 | + |
| 2722 | + |
| 2723 | +\subsection heteromem_use_api_vertical Iterating over local (heterogeneous) NUMA nodes |
| 2724 | + |
| 2725 | +Another common need is to find NUMA nodes that are local to some |
| 2726 | +objects (e.g. a Core). |
| 2727 | +A basic solution consists in looking at the Core nodeset and iterating |
| 2728 | +over NUMA nodes to select those whose nodeset is included. |
| 2729 | +A nicer solution is to walk up the tree to find ancestors with a |
| 2730 | +memory child. |
| 2731 | +With heterogeneous memory, multiple such ancestors may exist |
| 2732 | +(Package and Machine in our example) and they may have multiple memory |
| 2733 | +children. |
| 2734 | + |
| 2735 | +Both these methods may be replaced with hwloc_get_local_numanode_objs() |
| 2736 | +which provides a convenient and flexible way to retrieve local NUMA nodes. |
| 2737 | +One may then iterate over the returned array to select the appropriate one(s) |
| 2738 | +depending on their subtype, tier or performance attributes. |
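
As an illustration of the basic nodeset-inclusion method combined with a
tier-based choice, here is a minimal sketch. It uses plain 64-bit masks
instead of hwloc bitmaps, and the <tt>local_node</tt> structure and
<tt>best_local_node()</tt> function are hypothetical. Real code would rather
call hwloc_get_local_numanode_objs(), or test inclusion with
hwloc_bitmap_isincluded(), and read the <tt>MemoryTier</tt> info attribute of
each node.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NUMA node object. */
struct local_node {
    uint64_t nodeset;  /* single bit identifying this node */
    int tier;          /* MemoryTier rank, 0 = highest bandwidth */
};

/*
 * Among the nodes whose nodeset is included in 'location_nodeset',
 * return the array position of the one with the lowest tier rank,
 * or -1 if no node is local. Ties are resolved in favor of the first
 * node found.
 */
int best_local_node(const struct local_node *nodes, size_t n,
                    uint64_t location_nodeset)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        /* Skip empty nodesets and nodes not included in the location. */
        if (nodes[i].nodeset == 0
            || (nodes[i].nodeset & ~location_nodeset) != 0)
            continue;
        if (best < 0 || nodes[i].tier < nodes[best].tier)
            best = (int)i;
    }
    return best;
}
```

For a Core whose nodeset covers its Package's DRAM and HBM nodes as well as
the machine-wide CXL node, this returns the HBM node since it is in tier 0.
Choosing by tier is only one possible policy; a latency-sensitive application
may prefer a different node.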
| 2739 | + |
| 2740 | +<br/> |
| 2741 | + |
| 2742 | +hwloc_memattr_get_best_target() is also a convenient way to select |
| 2743 | +the best local NUMA node according to performance metrics. |
| 2744 | +See also \ref hwlocality_memattrs. |
| 2745 | + |
| 2746 | + |
| 2747 | + |
| 2748 | + |
2527 | 2749 | \page xml Importing and exporting topologies from/to XML files
|
2528 | 2750 |
|
2529 | 2751 | \htmlonly
|
|