|
39 | 39 | <li> \ref miscobjs
|
40 | 40 | <li> \ref attributes
|
41 | 41 | <li> \ref topoattrs
|
| 42 | + <li> \ref heteromem |
42 | 43 | <li> \ref xml
|
43 | 44 | <li> \ref synthetic
|
44 | 45 | <li> \ref interoperability
|
@@ -1955,7 +1956,7 @@ subtype <tt>DRAM</tt> (for usual main memory),
|
1955 | 1956 | <tt>CXL-DRAM</tt> or <tt>CXL-NVM</tt> for CXL DRAM or non-volatile memory.
|
1956 | 1957 | Note that some of these subtypes are guessed by the library,
|
1957 | 1958 | they might be missing or slightly wrong in some corner cases.
|
1958 |
| -See HWLOC_MEMTIERS and HWLOC_MEMTIERS_GUESS |
| 1959 | +See \ref heteromem for details, and HWLOC_MEMTIERS and HWLOC_MEMTIERS_GUESS |
1959 | 1960 | in \ref envvar for tuning these.
|
1960 | 1961 | </li>
|
1961 | 1962 | <li>Groups:
|
@@ -2292,6 +2293,7 @@ These info attributes are attached to objects specified in parentheses.
|
2292 | 2293 | <dd>The rank of the memory tier of this node.
|
2293 | 2294 | Ranks start from 0 for highest bandwidth nodes.
|
2294 | 2295 | The attribute is only set if multiple tiers are found.
|
| 2296 | +See \ref heteromem. |
2295 | 2297 | </dd>
|
2296 | 2298 | <dt>CXLDevice (NUMA Nodes or DAX Memory OS devices)</dt>
|
2297 | 2299 | <dd>The PCI/CXL bus ID of a device whose CXL Type-3 memory is exposed here.
|
@@ -2500,6 +2502,11 @@ The memory attributes API is located in hwloc/memattrs.h,
|
2500 | 2502 | see \ref hwlocality_memattrs and \ref hwlocality_memattrs_manage for details.
|
2501 | 2503 | See also an example in doc/examples/memory-attributes.c in the source tree.
|
2502 | 2504 |
|
| 2505 | +Memory attributes are the low-level solution to selecting target |
| 2506 | +memory. hwloc uses them internally to build Memory Tiers which provide |
| 2507 | +an easy way to distinguish NUMA nodes of different kinds, as explained |
| 2508 | +in \ref heteromem. |
| 2509 | + |
2503 | 2510 |
|
2504 | 2511 | \htmlonly
|
2505 | 2512 | </div><div class="section" id="topoattrs_cpukinds">
|
@@ -2559,6 +2566,186 @@ See \ref hwlocality_cpukinds for details.
|
2559 | 2566 |
|
2560 | 2567 |
|
2561 | 2568 |
|
| 2569 | +\page heteromem Heterogeneous Memory |
| 2570 | + |
| 2571 | +\htmlonly |
| 2572 | +<div class="section"> |
| 2573 | +\endhtmlonly |
| 2574 | + |
| 2575 | +Heterogeneous memory hardware exposes different NUMA nodes for |
| 2576 | +different memory technologies. |
| 2577 | +On the image below, a dual-socket server has both HBM (high bandwidth |
| 2578 | +memory) and usual DRAM connected to each socket, as well as some |
| 2579 | +CXL memory connected to the entire machine. |
| 2580 | + |
| 2581 | +\image html heteromem.png |
| 2582 | +\image latex heteromem.png "" width=\textwidth |
| 2583 | + |
| 2584 | +The hardware usually exposes "normal" memory first because it is |
| 2585 | +where "normal" data buffers should be allocated by default. |
| 2586 | +However, there is no guarantee about whether HBM, NVM or CXL will appear |
| 2587 | +second. |
| 2588 | +Hence memory technologies and performance must be made explicit |
| 2589 | +to help users decide where to allocate. |
| 2590 | + |
| 2591 | +\htmlonly |
| 2592 | +</div><div class="section" id="heteromem_memtiers"> |
| 2593 | +\endhtmlonly |
| 2594 | +\section heteromem_memtiers Memory Tiers |
| 2595 | + |
| 2596 | +hwloc builds <i>Memory Tiers</i> to identify different kinds of |
| 2597 | +NUMA nodes. |
| 2598 | +On the above machine, the first tier would contain both HBM NUMA nodes |
| 2599 | +(L\#1 and L\#3), while the second tier would contain both DRAM nodes |
| 2600 | +(L\#0 and L\#2), and the CXL memory (L\#4) would be in the third tier. |
| 2601 | +NUMA nodes are then annotated accordingly: |
| 2602 | +<ul> |
| 2603 | +<li> Each node object has its <tt>subtype</tt> field set to <tt>HBM</tt>, |
| 2604 | + <tt>DRAM</tt> or <tt>CXL-DRAM</tt> |
| 2605 | + (see other possible values in \ref attributes_normal). |
| 2606 | +<li> Each node also has a string info attribute with name |
| 2607 | +<tt>MemoryTier</tt> and value <tt>0</tt> for the first tier, |
| 2608 | +<tt>1</tt> for the second, etc. |
| 2609 | +</ul> |
| 2610 | + |
| 2611 | +Tiers are built using two kinds of information: |
| 2612 | +<ul> |
| 2613 | +<li>First hwloc looks into operating system information to find out |
| 2614 | +whether a node is non-volatile, CXL, special-purpose, etc. |
| 2615 | +<li>Then it combines that knowledge with performance metrics exposed |
| 2616 | +by the hardware to guess what's actually DRAM, HBM, etc. |
| 2617 | +These metrics are also exposed in hwloc Memory Attributes, for |
| 2618 | +instance bandwidth and latency, for read and write. |
| 2619 | +See \ref topoattrs_memattrs and \ref hwlocality_memattrs for more details. |
| 2620 | +</ul> |
| 2621 | + |
| 2622 | +Once nodes with similar or different characteristics are identified, |
| 2623 | +they are placed in tiers. |
| 2624 | +Tiers are then sorted by bandwidth so that the highest bandwidth |
| 2625 | +is ranked first, etc. |
| 2626 | + |
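The ranking step described above can be sketched in plain C. This is only an illustration, not hwloc's internal code: <tt>struct tier</tt>, <tt>rank_tiers()</tt> and the bandwidth numbers used with them are hypothetical, and the sketch assumes each tier has already been reduced to a single representative bandwidth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tier record for illustration only; not an hwloc type. */
struct tier {
  const char *subtype;     /* e.g. "HBM", "DRAM" or "CXL-DRAM" */
  unsigned long bandwidth; /* assumed representative bandwidth of the tier */
  unsigned rank;           /* filled in by rank_tiers() */
};

/* qsort() comparator: the tier with the higher bandwidth sorts first. */
static int cmp_bandwidth_desc(const void *a, const void *b)
{
  const struct tier *ta = a;
  const struct tier *tb = b;
  if (ta->bandwidth > tb->bandwidth) return -1;
  if (ta->bandwidth < tb->bandwidth) return 1;
  return 0;
}

/* Sort tiers by decreasing bandwidth, then number them starting from 0. */
void rank_tiers(struct tier *tiers, size_t nr)
{
  size_t i;
  if (nr == 0)
    return;
  assert(tiers != NULL);
  qsort(tiers, nr, sizeof(*tiers), cmp_bandwidth_desc);
  for (i = 0; i < nr; i++)
    tiers[i].rank = (unsigned) i;
}
```

With three tiers whose made-up bandwidths are highest for HBM, intermediate for DRAM and lowest for CXL-DRAM, <tt>rank_tiers()</tt> assigns ranks 0, 1 and 2 respectively, which matches the <tt>MemoryTier</tt> values described above for the example machine.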
| 2627 | +If hwloc fails to build tiers properly, see <tt>HWLOC_MEMTIERS</tt> |
| 2628 | +and <tt>HWLOC_MEMTIERS_GUESS</tt> in \ref envvar. |
| 2629 | + |
| 2630 | + |
| 2631 | +\htmlonly |
| 2632 | +</div><div class="section" id="heteromem_use_cli"> |
| 2633 | +\endhtmlonly |
| 2634 | +\section heteromem_use_cli Using Heterogeneous Memory from the command-line |
| 2635 | + |
| 2636 | +Tiers may be specified in location filters when using NUMA nodes |
| 2637 | +in hwloc command-line tools. |
| 2638 | +For instance, binding memory on the first HBM node (<tt>numa[hbm]:0</tt>) |
| 2639 | +is actually equivalent to binding on the second node (<tt>numa:1</tt>) |
| 2640 | +on our example platform: |
| 2641 | +\verbatim |
| 2642 | +$ hwloc-bind --membind 'numa[hbm]:0' -- myprogram |
| 2643 | +$ hwloc-bind --membind 'numa:1' -- myprogram |
| 2644 | +\endverbatim |
| 2645 | +To count DRAM nodes in the first CPU package, or all nodes: |
| 2646 | +\verbatim |
| 2647 | +$ hwloc-calc -N 'numa[dram]' package:0 |
| 2648 | +1 |
| 2649 | +$ hwloc-calc -N 'numa' package:0 |
| 2650 | +2 |
| 2651 | +\endverbatim |
| 2652 | +To list all the physical indexes of Tier-0 NUMA nodes (HBM P\#2 and P\#3 not shown on the figure): |
| 2653 | +\verbatim |
| 2654 | +$ hwloc-calc -I 'numa[tier=0]' -p all |
| 2655 | +2,3 |
| 2656 | +\endverbatim |
| 2657 | + |
| 2658 | +hwloc-calc and hwloc-bind also have options such as |
| 2659 | +<tt>\--local-memory</tt> and <tt>\--best-memattr</tt> |
| 2660 | +to select the best NUMA node among the local ones. |
| 2661 | +For instance, the following command-lines say that, |
| 2662 | +among nodes near node:0 (DRAM L\#0), |
| 2663 | +the best one for latency is itself |
| 2664 | +while the best one for bandwidth is node:1 (HBM L\#1). |
| 2665 | +\verbatim |
| 2666 | +$ hwloc-calc --best-memattr latency node:0 |
| 2667 | +0 |
| 2668 | +$ hwloc-calc --best-memattr bandwidth node:0 |
| 2669 | +1 |
| 2670 | +\endverbatim |
| 2671 | + |
| 2672 | + |
| 2673 | +\htmlonly |
| 2674 | +</div><div class="section" id="heteromem_use_api"> |
| 2675 | +\endhtmlonly |
| 2676 | +\section heteromem_use_api Using Heterogeneous Memory from the C API |
| 2677 | + |
| 2678 | +There are two major changes introduced by heterogeneous memory |
| 2679 | +when looking at the hierarchical tree of objects. |
| 2680 | +<ul> |
| 2681 | +<li> First, there may be multiple memory children attached at the same |
| 2682 | +place. |
| 2683 | +For instance, each Package in the above image has two memory children, |
| 2684 | +one for the DRAM NUMA node, and another one for the HBM node. |
| 2685 | +<li> Second, memory children may be attached at different levels. |
| 2686 | +In the above image, CXL memory is attached to the root Machine object |
| 2687 | +instead of below a Package. |
| 2688 | +</ul> |
| 2689 | + |
| 2690 | +Hence, one may have to rethink the way one selects NUMA nodes. |
| 2691 | + |
| 2692 | +\subsection heteromem_use_api_iterate Iterating over the list of (heterogeneous) NUMA nodes |
| 2693 | + |
| 2694 | +A common need consists in iterating over the list of NUMA nodes |
| 2695 | +(e.g. using hwloc_get_next_obj_by_type()). |
| 2696 | +This is useful for counting some domains before partitioning a job, |
| 2697 | +or for finding a node that is local to some objects. |
| 2698 | +With heterogeneous memory, one should remember that multiple nodes may |
| 2699 | +now have the same locality (HBM and DRAM above) or overlapping localities |
| 2700 | +(e.g. DRAM and CXL above). |
| 2701 | +Checking NUMA node subtype or tier attributes is a good way to avoid |
| 2702 | +this issue by ignoring nodes of different kinds. |
| 2703 | + |
| 2704 | +Another solution consists in ignoring nodes whose cpusets overlap the |
| 2705 | +previously selected ones. |
| 2706 | +For instance, in the above example, one could first select DRAM L\#0 |
| 2707 | +but ignore HBM L\#1 (because it overlaps with DRAM L\#0), |
| 2708 | +then select DRAM L\#2 but ignore HBM L\#3 and CXL L\#4 |
| 2709 | +(overlap with DRAM L\#2). |
| 2710 | + |
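The overlap-based selection above can be sketched without hwloc, modelling each cpuset as a 64-bit mask. <tt>struct node</tt> and <tt>select_disjoint_nodes()</tt> are hypothetical names used for illustration only. In a real program the same test would presumably be written with hwloc_bitmap_intersects() on the objects' <tt>cpuset</tt> fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NUMA node: one bit per CPU in a 64-bit mask. */
struct node {
  const char *name;
  uint64_t cpuset;
};

/* Walk the nodes in order and keep each one whose cpuset does not
 * intersect the union of the cpusets kept so far.
 * Indexes of kept nodes are written to selected[], which must have room
 * for nr entries. Returns the number of kept nodes. */
size_t select_disjoint_nodes(const struct node *nodes, size_t nr,
                             size_t *selected)
{
  uint64_t covered = 0;
  size_t i, count = 0;
  assert(nodes != NULL && selected != NULL);
  for (i = 0; i < nr; i++) {
    if (nodes[i].cpuset & covered)
      continue; /* overlaps a previously selected node, ignore it */
    covered |= nodes[i].cpuset;
    selected[count++] = i;
  }
  return count;
}
```

Assuming, for illustration, that each Package of the example machine holds four CPUs, the five nodes L\#0 to L\#4 would have masks 0x0F, 0x0F, 0xF0, 0xF0 and 0xFF, and the function keeps only indexes 0 and 2 (the two DRAM nodes). The result depends on the iteration order: the first node seen for a given locality wins.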
| 2711 | +<br/> |
| 2712 | + |
| 2713 | +It is also possible to iterate over the memory parents (e.g. Packages |
| 2714 | +in our example) and select only one memory child for each of them. |
| 2715 | +hwloc_get_memory_parents_depth() may be used to find the depth |
| 2716 | +of these parents. |
| 2717 | +However this method only works if all memory parents are at the same level. |
| 2718 | +It would fail in our example: the root Machine object |
| 2719 | +also has a memory child (CXL), hence hwloc_get_memory_parents_depth() |
| 2720 | +would return ::HWLOC_TYPE_DEPTH_MULTIPLE. |
| 2721 | + |
| 2722 | + |
| 2723 | +\subsection heteromem_use_api_vertical Iterating over local (heterogeneous) NUMA nodes |
| 2724 | + |
| 2725 | +Another common need is to find NUMA nodes that are local to some |
| 2726 | +objects (e.g. a Core). |
| 2727 | +A basic solution consists in looking at the Core nodeset and iterating |
| 2728 | +over NUMA nodes to select those whose nodesets are included. |
| 2729 | +A nicer solution is to walk up the tree to find ancestors with a |
| 2730 | +memory child. |
| 2731 | +With heterogeneous memory, multiple such ancestors may exist |
| 2732 | +(Package and Machine in our example) and they may have multiple memory |
| 2733 | +children. |
| 2734 | + |
| 2735 | +Both these methods may be replaced with hwloc_get_local_numanode_objs() |
| 2736 | +which provides a convenient and flexible way to retrieve local NUMA nodes. |
| 2737 | +One may then iterate over the returned array to select the appropriate one(s) |
| 2738 | +depending on their subtype, tier or performance attributes. |
| 2739 | + |
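Selecting among the returned local nodes by tier could look like the following sketch. It deliberately avoids hwloc calls: <tt>struct local_node</tt> and <tt>pick_best_tier()</tt> are hypothetical, and the sketch assumes the caller has already copied each node's subtype and parsed its <tt>MemoryTier</tt> info attribute into an integer, using -1 when the attribute is not set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one local NUMA node. In real code these values
 * would come from the node's subtype and MemoryTier info attribute. */
struct local_node {
  const char *subtype;
  int tier; /* MemoryTier rank, or -1 if the attribute is not set */
};

/* Return the index of the node with the lowest non-negative tier rank
 * (rank 0 is the highest-bandwidth tier), or -1 if no node has a tier. */
int pick_best_tier(const struct local_node *nodes, size_t nr)
{
  int best = -1;
  size_t i;
  assert(nodes != NULL || nr == 0);
  for (i = 0; i < nr; i++) {
    if (nodes[i].tier < 0)
      continue; /* no tier information for this node */
    if (best < 0 || nodes[i].tier < nodes[best].tier)
      best = (int) i;
  }
  return best;
}
```

For a Core in the first Package of the example machine, the local nodes would be DRAM (tier 1), HBM (tier 0) and CXL (tier 2), so the function returns the index of the HBM entry. The -1 result covers the case mentioned earlier where the attribute is not set because only one tier was found.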
| 2740 | +<br> |
| 2741 | + |
| 2742 | +hwloc_memattr_get_best_target() is also a convenient way to select |
| 2743 | +the best local NUMA node according to performance metrics. |
| 2744 | +See also \ref hwlocality_memattrs. |
| 2745 | + |
| 2746 | + |
| 2747 | + |
| 2748 | + |
2562 | 2749 | \page xml Importing and exporting topologies from/to XML files
|
2563 | 2750 |
|
2564 | 2751 | \htmlonly
|