# In Progress Metrics

These metrics are considered under development (and likely need more eyes) before they are fully working.

## Network

### network-chatterbug

 - [Standalone Metric Set](user-guide.md#application-metric-set)
 - *[network-chatterbug](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/network-chatterbug)*

Chatterbug provides a [suite of communication proxy applications](https://github.com/hpcgroup/chatterbug) for HPC.
We use a launcher/worker design.

| Name | Description | Type | Default |
|-----|--------------|------|---------|
| mpirun | The options to give to mpirun (includes tasks) | string | `-N 8` |
| command | The chatterbug command (subdirectory) to run, see options below | string | stencil3d |
| args | Arguments for the command | string | `1 2 2 10 10 10 4 1` |
| sole-tenancy | Require sole tenancy | string ("true" or "false") | "true" |

By default, we require sole-tenancy, but you can disable this. Note that the best place to look for "documentation"
on the commands seems to be [the source code](https://github.com/hpcgroup/chatterbug). The following command options
are available for `command`:
| 25 | + |
| 26 | +- pairs |
| 27 | +- ping-ping |
| 28 | +- spread |
| 29 | +- stencil3d |
| 30 | +- stencil4d |
| 31 | +- subcom2d-coll |
| 32 | +- subcom2d-a2a |
| 33 | +- unstr-mesh |
| 34 | + |
We have mostly tested stencil3d. Note that the mpirun command is assembled as follows:

```bash
$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/${command}/${executable} ${args}
```

Thus for the defaults, you'd get this command (on one pod):

```bash
$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/stencil3d/stencil3d.x 1 2 2 10 10 10 4 1
```

See the example linked in the header for a full metrics.yaml.
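
Below is a rough sketch of how the parameters above might appear in a metrics.yaml. The surrounding structure (apiVersion, kind, pods) and the exact option syntax shown here are assumptions on our part, so treat the linked example as the authoritative reference.

```yaml
# Hypothetical sketch only -- the linked network-chatterbug example is authoritative
apiVersion: flux-framework.org/v1alpha2  # assumed API version, check your installed CRD
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  pods: 4                        # launcher plus workers
  metrics:
    - name: network-chatterbug
      options:
        command: stencil3d       # one of the command options listed above
        args: 1 2 2 10 10 10 4 1
        mpirun: -N 4
        sole-tenancy: "false"    # disable the default sole tenancy requirement
```

Any option you omit falls back to the default shown in the table above.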

## Standalone

### app-hpl

 - [Standalone Metric Set](user-guide.md#application-metric-set)
 - *[app-hpl](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-hpl)*

The [Linpack](https://ulhpc-tutorials.readthedocs.io/en/production/parallel/mpi/HPL/) benchmark is used for the [Top500](https://www.top500.org/project/linpack/),
and generally solves a dense system of linear equations. Arguments to customize include the following:

| Name | Description | Type | Default |
|-----|-------------|------|---------|
| mpiargs | Arguments to give to MPI | string | empty string |
| tasks | Number of tasks per node | int32 | detected using nproc |
| ratio | Target memory occupation | string (but as a float, e.g., "0.3") | "0.3" |
| memory | Memory in GiB | int32 | detected from /proc |
| blocksize | The NBs ("number of blocks") value | int32 | |
| pfact | PFACTs (0=left, 1=Crout, 2=Right) | int32 | |
| nbmin | NBMINs (>= 1) | int32 | |
| ndiv | NDIVs | int32 | |
| row_or_colmajor_pmapping | PMAP process mapping (0=Row-,1=Column-major) | int32 | 0 |
| rfact | RFACTs (0=left, 1=Crout, 2=Right) | int32 | 0 |
| bcast | BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) | int32 | 0 |
| depth | Lookahead depth (DEPTHs) | int32 | 0 |
| swap | SWAP (0=bin-exch,1=long,2=mix) | int32 | 0 |
| swappingThreshold | Swapping threshold | int32 | 64 |
| l1transposed | L1 form (0=transposed,1=no-transposed) | int32 | 0 |
| utransposed | U form (0=transposed,1=no-transposed) | int32 | 0 |
| memAlignment | Memory alignment in double (> 0) (4,8,16) | int32 | |

For the meaning of each of these, see [this documentation](https://ulhpc-tutorials.readthedocs.io/en/production/parallel/mpi/HPL/#hpl-main-parameters)
and how they are used in [hpl.go](https://github.com/converged-computing/metrics-operator/tree/main/pkg/metrics/app/hpl.go).
I made an effort to define them above, but you should consult the documentation linked here, because I don't fully
understand all of these yet.

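To make the mapping from the table to a configuration concrete, here is a rough metrics.yaml sketch for app-hpl. As with the chatterbug sketch above, the surrounding structure and option syntax are assumptions on our part, and the example linked in the header is the authoritative reference.

```yaml
# Hypothetical sketch only -- the linked app-hpl example is authoritative
apiVersion: flux-framework.org/v1alpha2  # assumed API version, check your installed CRD
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  pods: 2
  metrics:
    - name: app-hpl
      options:
        ratio: "0.3"     # target memory occupation, as a string float
        tasks: 8         # tasks per node (otherwise detected with nproc)
        memory: 16       # memory in GiB (otherwise detected)
        blocksize: 192   # the NBs value
```

Options you leave out fall back to the defaults (or detected values) in the table above.
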
We provide a simple build here, as vendors typically spend a lot of time custom-compiling the code
for their architectures (and we are compiling for general use). We will use a script `compute_N` from the ULHPC Tutorials to generate input data for a particular
problem size, and you can vary the input to this script via the `computeArgs` parameters. We use a default, and you can inspect the
script help below:

<details>

<summary>`compute_N --help`</summary>

```console
# compute_N -h
Compute N for HPL runs.

SYNOPSIS
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-r <RATIO>] [-NB <NB>]
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-p <PERCENTAGE_MEM>] [-NB <NB>]

  The following formulae is used (when using '-r <ratio>'):
    N = <ratio>*SQRT( Total Memory Size in bytes / sizeof(double) )
      = <ratio>*SQRT( <nnodes> * <ram_size> / 8)

  Alternatively you may wish to specify a memory usage ratio (with -p <percentage_mem>),
  in which case the following formulae is used:
    N = SQRT( <percentage_mem>/100 * Total Memory Size in bytes / sizeof(doubl)

OPTIONS
  -m --mem --ramsize <SIZE>
      Specify the total memory size per node, in GiB.
      Default RAM size consider (yet in KiB): 16051112 KiB
  -N --nodes <N>
      Number of compute nodes
  -NB <NB>
      NB parameters to use. Default: 192 (384 for skylake)
  -p --memshare <PERCENTAGE_MEM>
      Percentage of the total memory size to use.
      Derived from the below global ratio (i.e. 0% since RATIO=0.8)
  -r --ratio <RATIO>
      Global ratio to apply. Default: 0.8

EXAMPLE
  For 2 broadwell nodes on iris cluster, using 30% of the total memory per node:
      compute_N -N 2 -p 30 -m 128 -NB 192
  For 4 skylake nodes on iris cluster, using 85% of the total memory per node:
      compute_N -N 4 -p 85 -m 128 -NB 384

AUTHORS
  Sebastien Varrette <[email protected]> and UL HPC Team

COPYRIGHT
  This is free software; see the source for copying conditions. There is
  NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

</details>

The following examples are [provided](https://ulhpc-tutorials.readthedocs.io/en/production/parallel/mpi/HPL/) to generate the HPL.dat for the analysis:

```bash
/opt/tutorials/benchmarks/HPL/scripts/compute_N -h
# 1 Broadwell node, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 1
# 2 Skylake (regular) nodes, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 384 -r 0.3 -N 2
# 4 bigmem (skylake) nodes, beta = 0.85
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 3072 -NB 384 -p 85 -N 4
```

Here is a tiny setup I created for a test case:

```bash
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 2
```
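
As a rough sanity check on these arguments (2 nodes, 128 GiB per node, ratio 0.3), the formula from the script help above works out to approximately the problem size below; the script may round or otherwise adjust the final value, so treat this only as a ballpark estimate:

```math
N = 0.3 \sqrt{\frac{2 \times 128 \times 2^{30}}{8}} \approx 0.3 \times 185364 \approx 55609
```
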
| 156 | + |
| 157 | +Next, you might care about the input data, a file called `hpl.dat`. By default we use |
| 158 | +a template that is populated by the above variables, and here is another example that I found |
| 159 | +in the repository: |

<details>

<summary>Default HPL.dat</summary>

```console
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
24650 Ns
1 # of NBs
192 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
2 # of process grids (P x Q)
2 4 Ps
14 7 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0 Number of additional problem sizes for PTRANS
1200 10000 30000 values of N
0 number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64 values of NB
```

</details>

If there is something above that is not properly exposed, please [let us know](https://github.com/converged-computing/metrics-operator/issues).