Commit a474591

doxy: add a new section about heterogeneous memory
talks about tiers, subtypes, memory attrs, and how to iterate/select NUMA nodes

Signed-off-by: Brice Goglin <[email protected]>
1 parent 1b089e9 commit a474591

5 files changed

+258
-1
lines changed

doc/hwloc.doxy

Lines changed: 188 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@
3939
<li> \ref miscobjs
4040
<li> \ref attributes
4141
<li> \ref topoattrs
42+
<li> \ref heteromem
4243
<li> \ref xml
4344
<li> \ref synthetic
4445
<li> \ref interoperability
@@ -1955,7 +1956,7 @@ subtype <tt>DRAM</tt> (for usual main memory),
19551956
<tt>CXL-DRAM</tt> or <tt>CXL-NVM</tt> for CXL DRAM or non-volatile memory.
19561957
Note that some of these subtypes are guessed by the library,
19571958
they might be missing or slightly wrong in some corner cases.
1958-
See HWLOC_MEMTIERS and HWLOC_MEMTIERS_GUESS
1959+
See \ref heteromem for details, and HWLOC_MEMTIERS and HWLOC_MEMTIERS_GUESS
19591960
in \ref envvar for tuning these.
19601961
</li>
19611962
<li>Groups:
@@ -2292,6 +2293,7 @@ These info attributes are attached to objects specified in parentheses.
22922293
<dd>The rank of the memory tier of this node.
22932294
Ranks start from 0 for highest bandwidth nodes.
22942295
The attribute is only set if multiple tiers are found.
2296+
See \ref heteromem.
22952297
</dd>
22962298
<dt>CXLDevice (NUMA Nodes or DAX Memory OS devices)</dt>
22972299
<dd>The PCI/CXL bus ID of a device whose CXL Type-3 memory is exposed here.
@@ -2500,6 +2502,11 @@ The memory attributes API is located in hwloc/memattrs.h,
25002502
see \ref hwlocality_memattrs and \ref hwlocality_memattrs_manage for details.
25012503
See also an example in doc/examples/memory-attributes.c in the source tree.
25022504

2505+
Memory attributes are the low-level solution to selecting target
2506+
memory. hwloc uses them internally to build Memory Tiers which provide
2507+
an easy way to distinguish NUMA nodes of different kinds, as explained
2508+
in \ref heteromem.
2509+
25032510

25042511
\htmlonly
25052512
</div><div class="section" id="topoattrs_cpukinds">
@@ -2559,6 +2566,186 @@ See \ref hwlocality_cpukinds for details.
25592566

25602567

25612568

2569+
\page heteromem Heterogeneous Memory
2570+
2571+
\htmlonly
2572+
<div class="section">
2573+
\endhtmlonly
2574+
2575+
Heterogeneous memory hardware exposes different NUMA nodes for
2576+
different memory technologies.
2577+
In the image below, a dual-socket server has both HBM (high bandwidth
2578+
memory) and usual DRAM connected to each socket, as well as some
2579+
CXL memory connected to the entire machine.
2580+
2581+
\image html heteromem.png
2582+
\image latex heteromem.png "" width=\textwidth
2583+
2584+
The hardware usually exposes "normal" memory first because it is
2585+
where "normal" data buffers should be allocated by default.
2586+
However, there is no guarantee about whether HBM, NVM or CXL will appear
2587+
second.
2588+
Hence there is a need to make memory technologies and performance explicit
2589+
to help users decide where to allocate.
2590+
2591+
\htmlonly
2592+
</div><div class="section" id="heteromem_memtiers">
2593+
\endhtmlonly
2594+
\section heteromem_memtiers Memory Tiers
2595+
2596+
hwloc builds <i>Memory Tiers</i> to identify different kinds of
2597+
NUMA nodes.
2598+
On the above machine, the first tier would contain both HBM NUMA nodes
2599+
(L\#1 and L\#3), while the second tier would contain both DRAM nodes
2600+
(L\#0 and L\#2), and the CXL memory (L\#4) would be in the third tier.
2601+
NUMA nodes are then annotated accordingly:
2602+
<ul>
2603+
<li> Each node object has its <tt>subtype</tt> field set to <tt>HBM</tt>,
2604+
<tt>DRAM</tt> or <tt>CXL-DRAM</tt>
2605+
(see other possible values in \ref attributes_normal).
2606+
<li> Each node also has a string info attribute with name
2607+
<tt>MemoryTier</tt> and value <tt>0</tt> for the first tier,
2608+
<tt>1</tt> for the second, etc.
2609+
</ul>
2610+
2611+
Tiers are built using two kinds of information:
2612+
<ul>
2613+
<li>First hwloc looks into operating system information to find out
2614+
whether a node is non-volatile, CXL, special-purpose, etc.
2615+
<li>Then it combines that knowledge with performance metrics exposed
2616+
by the hardware to guess what's actually DRAM, HBM, etc.
2617+
These metrics are also exposed in hwloc Memory Attributes, for
2618+
instance bandwidth and latency, for read and write.
2619+
See \ref topoattrs_memattrs and \ref hwlocality_memattrs for more details.
2620+
</ul>
2621+
2622+
Once nodes with similar or different characteristics are identified,
2623+
they are placed in tiers.
2624+
Tiers are then sorted by bandwidth so that the highest bandwidth
2625+
is ranked first, etc.
2626+
2627+
If hwloc fails to build tiers properly, see <tt>HWLOC_MEMTIERS</tt>
2628+
and <tt>HWLOC_MEMTIERS_GUESS</tt> in \ref envvar.
2629+
2630+
2631+
\htmlonly
2632+
</div><div class="section" id="heteromem_use_cli">
2633+
\endhtmlonly
2634+
\section heteromem_use_cli Using Heterogeneous Memory from the command-line
2635+
2636+
Tiers may be specified in location filters when using NUMA nodes
2637+
in hwloc command-line tools.
2638+
For instance, binding memory on the first HBM node (<tt>numa[hbm]:0</tt>)
2639+
is actually equivalent to binding on the second node (<tt>numa:1</tt>)
2640+
on our example platform:
2641+
\verbatim
2642+
$ hwloc-bind --membind 'numa[hbm]:0' -- myprogram
2643+
$ hwloc-bind --membind 'numa:1' -- myprogram
2644+
\endverbatim
2645+
To count DRAM nodes in the first CPU package, or all nodes:
2646+
\verbatim
2647+
$ hwloc-calc -N 'numa[dram]' package:0
2648+
1
2649+
$ hwloc-calc -N 'numa' package:0
2650+
2
2651+
\endverbatim
2652+
To list all the physical indexes of Tier-0 NUMA nodes (HBM P\#2 and P\#3 not shown on the figure):
2653+
\verbatim
2654+
$ hwloc-calc -I 'numa[tier=0]' -p all
2655+
2,3
2656+
\endverbatim
2657+
2658+
hwloc-calc and hwloc-bind also have options such as
2659+
<tt>\--local-memory</tt> and <tt>\--best-memattr</tt>
2660+
to select the best NUMA node among the local ones.
2661+
For instance, the following command-lines say that,
2662+
among nodes near node:0 (DRAM L\#0),
2663+
the best one for latency is itself
2664+
while the best one for bandwidth is node:1 (HBM L\#1).
2665+
\verbatim
2666+
$ hwloc-calc --best-memattr latency node:0
2667+
0
2668+
$ hwloc-calc --best-memattr bandwidth node:0
2669+
1
2670+
\endverbatim
2671+
2672+
2673+
\htmlonly
2674+
</div><div class="section" id="heteromem_use_api">
2675+
\endhtmlonly
2676+
\section heteromem_use_api Using Heterogeneous Memory from the C API
2677+
2678+
There are two major changes introduced by heterogeneous memory
2679+
when looking at the hierarchical tree of objects.
2680+
<ul>
2681+
<li> First, there may be multiple memory children attached at the same
2682+
place.
2683+
For instance, each Package in the above image has two memory children,
2684+
one for the DRAM NUMA node, and another one for the HBM node.
2685+
<li> Second, memory children may be attached at different levels.
2686+
In the above image, CXL memory is attached to the root Machine object
2687+
instead of below a Package.
2688+
</ul>
2689+
2690+
Hence, one may have to rethink the way one selects NUMA nodes.
2691+
2692+
\subsection heteromem_use_api_iterate Iterating over the list of (heterogeneous) NUMA nodes
2693+
2694+
A common need consists in iterating over the list of NUMA nodes
2695+
(e.g. using hwloc_get_next_obj_by_type()).
2696+
This is useful for counting some domains before partitioning a job,
2697+
or for finding a node that is local to some objects.
2698+
With heterogeneous memory, one should remember that multiple nodes may
2699+
now have the same locality (HBM and DRAM above) or overlapping localities
2700+
(e.g. DRAM and CXL above).
2701+
Checking NUMA node subtype or tier attributes is a good way to avoid
2702+
this issue by ignoring nodes of different kinds.
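
The filtering described above can be sketched as follows. This is only a
minimal model, not hwloc code: <tt>struct tier_node</tt>,
<tt>example_tier_nodes</tt> and both counting functions are hypothetical
names that mirror the five nodes of the example platform. Real code would
presumably iterate with hwloc_get_next_obj_by_type() and read the
<tt>subtype</tt> field and the <tt>MemoryTier</tt> info attribute
(e.g. with hwloc_obj_get_info_by_name()) of each node object instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields of a NUMA node object that
 * matter here; real code would read them from a hwloc_obj_t. */
struct tier_node {
  unsigned logical_index;
  const char *subtype; /* e.g. "DRAM", "HBM", "CXL-DRAM"; may be NULL */
  int tier;            /* value of the MemoryTier info attribute */
};

/* The five nodes of the example platform, in logical order. */
static const struct tier_node example_tier_nodes[] = {
  { 0, "DRAM",     1 },
  { 1, "HBM",      0 },
  { 2, "DRAM",     1 },
  { 3, "HBM",      0 },
  { 4, "CXL-DRAM", 2 },
};
#define EXAMPLE_TIER_NR (sizeof(example_tier_nodes) / sizeof(example_tier_nodes[0]))

/* Count nodes of one subtype, ignoring nodes of other kinds. */
size_t count_nodes_by_subtype(const struct tier_node *nodes, size_t nr,
                              const char *subtype)
{
  size_t i, count = 0;
  assert(nodes != NULL && subtype != NULL);
  for (i = 0; i < nr; i++)
    if (nodes[i].subtype && !strcmp(nodes[i].subtype, subtype))
      count++;
  return count;
}

/* Count nodes that belong to one memory tier. */
size_t count_nodes_by_tier(const struct tier_node *nodes, size_t nr, int tier)
{
  size_t i, count = 0;
  assert(nodes != NULL);
  for (i = 0; i < nr; i++)
    if (nodes[i].tier == tier)
      count++;
  return count;
}
```

Counting only <tt>DRAM</tt> nodes (or only one tier) yields one domain per
Package on the example platform, which is usually what a job partitioner
expects.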
2703+
2704+
Another solution consists in ignoring nodes whose cpuset overlaps the
2705+
previously selected ones.
2706+
For instance, in the above example, one could first select DRAM L\#0
2707+
but ignore HBM L\#1 (because it overlaps with DRAM L\#0),
2708+
then select DRAM L\#2 but ignore HBM L\#3 and CXL L\#4
2709+
(overlap with DRAM L\#2).
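
A minimal sketch of this overlap-based selection is given below, assuming
cpusets fit in a single 64-bit mask. <tt>struct cpuset_node</tt>,
<tt>example_cpuset_nodes</tt> and <tt>select_non_overlapping()</tt> are
hypothetical names used for illustration only. The masks are those of the
example platform. Real code would presumably use the <tt>cpuset</tt> field
of each node object together with hwloc_bitmap_intersects() and
hwloc_bitmap_or() rather than plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a NUMA node: logical index and local CPUs. */
struct cpuset_node {
  unsigned logical_index;
  uint64_t cpuset;
};

/* Nodes of the example platform, in logical order. */
static const struct cpuset_node example_cpuset_nodes[] = {
  { 0, 0x0f }, /* DRAM, Package 0 */
  { 1, 0x0f }, /* HBM, Package 0 */
  { 2, 0xf0 }, /* DRAM, Package 1 */
  { 3, 0xf0 }, /* HBM, Package 1 */
  { 4, 0xff }, /* CXL, entire machine */
};

/* Keep a node only if its cpuset does not overlap the cpusets of the
 * nodes selected before it. Stores the logical indexes of the selected
 * nodes in selected[] (which must have room for nr entries) and
 * returns how many were selected. */
size_t select_non_overlapping(const struct cpuset_node *nodes, size_t nr,
                              unsigned *selected)
{
  uint64_t covered = 0; /* union of the cpusets selected so far */
  size_t i, count = 0;
  assert(nodes != NULL && selected != NULL);
  for (i = 0; i < nr; i++) {
    if (nodes[i].cpuset & covered)
      continue; /* overlaps a previously selected node, ignore it */
    selected[count++] = nodes[i].logical_index;
    covered |= nodes[i].cpuset;
  }
  return count;
}
```

Note that the result depends on the iteration order: the sketch keeps the
first node found for each locality, which is DRAM here only because DRAM
nodes come first in logical order on the example platform.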
2710+
2711+
<br/>
2712+
2713+
It is also possible to iterate over the memory parents (e.g. Packages
2714+
in our example) and select only one memory child for each of them.
2715+
hwloc_get_memory_parents_depth() may be used to find the depth
2716+
of these parents.
2717+
However this method only works if all memory parents are at the same level.
2718+
It would fail in our example: the root Machine object
2719+
also has a memory child (CXL), hence hwloc_get_memory_parents_depth()
2720+
would return ::HWLOC_TYPE_DEPTH_MULTIPLE.
2721+
2722+
2723+
\subsection heteromem_use_api_vertical Iterating over local (heterogeneous) NUMA nodes
2724+
2725+
Another common need is to find NUMA nodes that are local to some
2726+
objects (e.g. a Core).
2727+
A basic solution consists in looking at the Core nodeset and iterating
2728+
over NUMA nodes to select those whose nodeset is included.
2729+
A nicer solution is to walk up the tree to find ancestors with a
2730+
memory child.
2731+
With heterogeneous memory, multiple such ancestors may exist
2732+
(Package and Machine in our example) and they may have multiple memory
2733+
children.
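
The tree walk can be sketched on a simplified model of the example
platform. <tt>struct tree_obj</tt>, the <tt>example_*</tt> objects and
<tt>collect_local_nodes()</tt> are hypothetical names, not part of hwloc.
Real code would presumably follow the <tt>parent</tt> pointer of each
object and walk its memory children (<tt>memory_first_child</tt>, then
<tt>next_sibling</tt>) instead of a fixed-size array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMORY_CHILDREN 2

/* Hypothetical model of a normal object and its memory children. */
struct tree_obj {
  const char *name;
  const struct tree_obj *parent;                 /* NULL for the root */
  unsigned memory_children[MAX_MEMORY_CHILDREN]; /* NUMA logical indexes */
  size_t nr_memory_children;
};

/* Part of the example platform: Machine (CXL L#4) above Package 0
 * (DRAM L#0 and HBM L#1) above a Core without any memory child. */
static const struct tree_obj example_machine  = { "Machine", NULL, { 4 }, 1 };
static const struct tree_obj example_package0 = { "Package0", &example_machine, { 0, 1 }, 2 };
static const struct tree_obj example_core0    = { "Core0", &example_package0, { 0 }, 0 };

/* Walk up from obj to the root and gather the memory children of every
 * ancestor (and of obj itself), closest ones first.
 * Returns the number of logical indexes stored in local[] (at most max). */
size_t collect_local_nodes(const struct tree_obj *obj, unsigned *local,
                           size_t max)
{
  size_t i, count = 0;
  assert(local != NULL);
  for (; obj != NULL; obj = obj->parent)
    for (i = 0; i < obj->nr_memory_children && count < max; i++)
      local[count++] = obj->memory_children[i];
  return count;
}
```

Starting from the Core, the walk finds both memory children of the Package
before the CXL node of the Machine, so a caller still has to pick among
several candidates.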
2734+
2735+
Both these methods may be replaced with hwloc_get_local_numanode_objs()
2736+
which provides a convenient and flexible way to retrieve local NUMA nodes.
2737+
One may then iterate over the returned array to select the appropriate one(s)
2738+
depending on their subtype, tier or performance attributes.
2739+
2740+
<br>
2741+
2742+
hwloc_memattr_get_best_target() is also a convenient way to select
2743+
the best local NUMA node according to performance metrics.
2744+
See also \ref hwlocality_memattrs.
2745+
2746+
2747+
2748+
25622749
\page xml Importing and exporting topologies from/to XML files
25632750

25642751
\htmlonly

doc/images/HACKING

Lines changed: 4 additions & 0 deletions
@@ -10,3 +10,7 @@ done
1010
for f in ppc64-without-smt ppc64-with-smt ppc64-full-with-smt ; do
1111
LANG=C lstopo -i ${f}.xml --horiz --no-legend --logical --no-index --index=pu --index=numa --index=core --index=pack --no-factorize -f ${f}.png ;
1212
done
13+
14+
for f in heteromem ; do
15+
LANG=C lstopo -i ${f}.xml --horiz --no-legend --logical --no-index --ignore pu --index=numa --index=core --index=pack --no-factorize -f ${f}.png ;
16+
done

doc/images/heteromem.png

9.71 KB

doc/images/heteromem.xml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!DOCTYPE topology SYSTEM "hwloc2.dtd">
3+
<topology version="3.0">
4+
<object type="Machine" os_index="0" cpuset="0x000000ff" complete_cpuset="0x000000ff" allowed_cpuset="0x000000ff" nodeset="0x0000001f" complete_nodeset="0x0000001f" allowed_nodeset="0x0000001f" gp_index="1" id="obj1">
5+
<object type="NUMANode" os_index="4" cpuset="0x000000ff" complete_cpuset="0x000000ff" nodeset="0x00000010" complete_nodeset="0x00000010" gp_index="24" id="obj24" local_memory="1073741824" subtype="CXL-DRAM">
6+
<page_type size="4096" count="262144"/>
7+
<info name="MemoryTier" value="2"/>
8+
</object>
9+
<object type="Package" os_index="0" cpuset="0x0000000f" complete_cpuset="0x0000000f" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="10" id="obj10">
10+
<object type="NUMANode" os_index="0" cpuset="0x0000000f" complete_cpuset="0x0000000f" nodeset="0x00000001" complete_nodeset="0x00000001" gp_index="11" id="obj11" local_memory="1073741824" subtype="DRAM">
11+
<page_type size="4096" count="262144"/>
12+
<info name="MemoryTier" value="1"/>
13+
</object>
14+
<object type="NUMANode" os_index="1" cpuset="0x0000000f" complete_cpuset="0x0000000f" nodeset="0x00000002" complete_nodeset="0x00000002" gp_index="12" id="obj12" local_memory="1073741824" subtype="HBM">
15+
<page_type size="4096" count="262144"/>
16+
<info name="MemoryTier" value="0"/>
17+
</object>
18+
<object type="Core" os_index="0" cpuset="0x00000001" complete_cpuset="0x00000001" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="3" id="obj3">
19+
<object type="PU" os_index="0" cpuset="0x00000001" complete_cpuset="0x00000001" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="2" id="obj2"/>
20+
</object>
21+
<object type="Core" os_index="1" cpuset="0x00000002" complete_cpuset="0x00000002" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="5" id="obj5">
22+
<object type="PU" os_index="1" cpuset="0x00000002" complete_cpuset="0x00000002" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="4" id="obj4"/>
23+
</object>
24+
<object type="Core" os_index="2" cpuset="0x00000004" complete_cpuset="0x00000004" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="7" id="obj7">
25+
<object type="PU" os_index="2" cpuset="0x00000004" complete_cpuset="0x00000004" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="6" id="obj6"/>
26+
</object>
27+
<object type="Core" os_index="3" cpuset="0x00000008" complete_cpuset="0x00000008" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="9" id="obj9">
28+
<object type="PU" os_index="3" cpuset="0x00000008" complete_cpuset="0x00000008" nodeset="0x00000013" complete_nodeset="0x00000013" gp_index="8" id="obj8"/>
29+
</object>
30+
</object>
31+
<object type="Package" os_index="1" cpuset="0x000000f0" complete_cpuset="0x000000f0" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="21" id="obj21">
32+
<object type="NUMANode" os_index="2" cpuset="0x000000f0" complete_cpuset="0x000000f0" nodeset="0x00000004" complete_nodeset="0x00000004" gp_index="22" id="obj22" local_memory="1073741824" subtype="DRAM">
33+
<page_type size="4096" count="262144"/>
34+
<info name="MemoryTier" value="1"/>
35+
</object>
36+
<object type="NUMANode" os_index="3" cpuset="0x000000f0" complete_cpuset="0x000000f0" nodeset="0x00000008" complete_nodeset="0x00000008" gp_index="23" id="obj23" local_memory="1073741824" subtype="HBM">
37+
<page_type size="4096" count="262144"/>
38+
<info name="MemoryTier" value="0"/>
39+
</object>
40+
<object type="Core" os_index="4" cpuset="0x00000010" complete_cpuset="0x00000010" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="14" id="obj14">
41+
<object type="PU" os_index="4" cpuset="0x00000010" complete_cpuset="0x00000010" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="13" id="obj13"/>
42+
</object>
43+
<object type="Core" os_index="5" cpuset="0x00000020" complete_cpuset="0x00000020" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="16" id="obj16">
44+
<object type="PU" os_index="5" cpuset="0x00000020" complete_cpuset="0x00000020" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="15" id="obj15"/>
45+
</object>
46+
<object type="Core" os_index="6" cpuset="0x00000040" complete_cpuset="0x00000040" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="18" id="obj18">
47+
<object type="PU" os_index="6" cpuset="0x00000040" complete_cpuset="0x00000040" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="17" id="obj17"/>
48+
</object>
49+
<object type="Core" os_index="7" cpuset="0x00000080" complete_cpuset="0x00000080" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="20" id="obj20">
50+
<object type="PU" os_index="7" cpuset="0x00000080" complete_cpuset="0x00000080" nodeset="0x0000001c" complete_nodeset="0x0000001c" gp_index="19" id="obj19"/>
51+
</object>
52+
</object>
53+
</object>
54+
<support name="discovery.pu"/>
55+
<support name="discovery.numa"/>
56+
<support name="discovery.numa_memory"/>
57+
<support name="custom.exported_support"/>
58+
<info name="Backend" value="Synthetic"/>
59+
<info name="SyntheticDescription" value="[numa] pack:2 [numa] [numa] core:4 1"/>
60+
<info name="hwlocVersion" value="3.0.0a1-git"/>
61+
<info name="ProcessName" value="lstopo"/>
62+
</topology>

include/hwloc/memattrs.h

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,10 @@ extern "C" {
5454
* Attribute values for these nodes, if any, may then be obtained with
5555
* hwloc_memattr_get_value() and manually compared with the desired criteria.
5656
*
57+
* Memory attributes are also used internally to build Memory Tiers which provide
58+
* an easy way to distinguish NUMA nodes of different kinds, as explained
59+
* in \ref heteromem.
60+
*
5761
* \sa An example is available in doc/examples/memory-attributes.c in the source tree.
5862
*
5963
* \note The API also supports specific objects as initiator,
