Commit 975eb46

Author: Matt Pryor

Updates to README

Parent: 1aa9ae8

1 file changed: 135 additions, 39 deletions

README.md

@@ -4,11 +4,14 @@
 [Kubernetes](https://kubernetes.io/) clusters.
 
 - [Installation](#installation)
+- [Network selection](#network-selection)
+- [Benchmark set](#benchmark-set)
 - [Benchmarks](#benchmarks)
   - [iperf](#iperf)
-  - [Intel MPI Benchmarks (IMB) MPI1 PingPong](#intel-mpi-benchmarks-imb-mpi1-pingpong)
+  - [MPI PingPong](#mpi-pingpong)
   - [OpenFOAM](#openfoam)
-- [Benchmark set](#benchmark-set)
+  - [RDMA Bandwidth](#rdma-bandwidth)
+  - [RDMA Latency](#rdma-latency)
 
 ## Installation
 
@@ -26,6 +29,75 @@ helm upgrade \
 
 For most use cases, no customisations to the Helm values will be necessary.
 
+## Network selection
+
+All the benchmarks are capable of running using the Kubernetes pod network or the host network
+(using `hostNetwork: true` on the benchmark pods).
+
+Benchmarks are also able to run on accelerated networks where available, using
+[Multus](https://github.com/k8snetworkplumbingwg/multus-cni) to attach multiple CNI networks and
+[device plugins](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)
+to request network resources.
+
+This allows benchmarks to leverage technologies such as
+[SR-IOV](https://en.wikipedia.org/wiki/Single-root_input/output_virtualization)
+(via the [SR-IOV network device plugin](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin)),
+[macvlan](https://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/) (via the
+[macvlan CNI plugin](https://www.cni.dev/plugins/current/main/macvlan/)) and
+[RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
+(e.g. via the [RDMA shared device plugin](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin)).
+
+The networking is configured using the following properties of the benchmark `spec`:
+
+```yaml
+spec:
+  # Indicates whether to use host networking or not
+  # If true, networkName is not used
+  hostNetwork: false
+  # The name of a Multus network to use
+  # Only used if hostNetwork is false
+  # If not given, the Kubernetes pod network is used
+  networkName: namespace/netname
+  # The resources for benchmark pods
+  resources:
+    limits:
+      # E.g. requesting a share of an RDMA device
+      rdma/hca_shared_devices_a: 1
+  # The MTU to set on the interface *inside* the container
+  # If not given, the default MTU is used
+  mtu: 9000
+```
+
+## Benchmark set
+
+The `kube-perftest` operator provides a `BenchmarkSet` resource that can be used to run
+the same benchmark over a sweep of parameters:
+
+```yaml
+apiVersion: perftest.stackhpc.com/v1alpha1
+kind: BenchmarkSet
+metadata:
+  name: iperf
+spec:
+  # The template for the fixed parts of the benchmark
+  template:
+    apiVersion: perftest.stackhpc.com/v1alpha1
+    kind: IPerf
+    spec:
+      duration: 30
+  # Defines the permutations for the set
+  # Each permutation is merged into the spec of the template
+  permutations:
+    # Permutations are calculated as a cross-product of the specified names and values
+    product:
+      hostNetwork: [true, false]
+      streams: [1, 2, 4, 8, 16, 32, 64]
+    # A list of explicit permutations to include
+    explicit:
+      - hostNetwork: true
+        streams: 128
+```
+
 ## Benchmarks
 
 Currently, the following benchmarks are supported:
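
To illustrate how the network selection properties described above fit into a complete manifest, here is a sketch of an `IPerf` benchmark that runs over a Multus network instead of the pod network. It assumes a network called `default/macvlan-net` already exists (a hypothetical name in the `namespace/netname` form, for example one created with the macvlan CNI plugin) and that the underlying network supports a 9000-byte MTU:

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: IPerf
metadata:
  name: iperf-macvlan
spec:
  # Use a Multus network rather than the host network
  hostNetwork: false
  # Hypothetical Multus network, in namespace/netname form
  networkName: default/macvlan-net
  # Assumes jumbo frames are supported end-to-end
  mtu: 9000
  # The number of parallel streams to use
  streams: 8
  # The duration of the test
  duration: 30
```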
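
As a worked example of the `BenchmarkSet` expansion: the `product` block yields 2 × 7 = 14 permutations, and the `explicit` list adds one more, so the iperf set should produce 15 `IPerf` benchmarks. Assuming each permutation is merged into the template `spec` as the comments describe, three of the resulting specs would look like this (an illustration, not output captured from the operator):

```yaml
# From product: hostNetwork=true, streams=1
duration: 30
hostNetwork: true
streams: 1
---
# From product: hostNetwork=false, streams=64
duration: 30
hostNetwork: false
streams: 64
---
# From explicit
duration: 30
hostNetwork: true
streams: 128
```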
@@ -35,92 +107,116 @@ Currently, the following benchmarks are supported:
 Runs the [iperf](https://en.wikipedia.org/wiki/Iperf) network performance tool to measure bandwidth
 for a transfer between two pods.
 
-Can be run using CNI or host networking, using a Kubernetes `Service` or connecting
-directly to the server pod and using a configurable number of client streams.
-
 ```yaml
 apiVersion: perftest.stackhpc.com/v1alpha1
 kind: IPerf
 metadata:
   name: iperf
 spec:
-  # Indicates whether to use the host network or the pod network
-  hostNetwork: true
   # The number of parallel streams to use
   streams: 8
   # The duration of the test
   duration: 30
 ```
 
-### Intel MPI Benchmarks (IMB) MPI1 PingPong
+### MPI PingPong
 
 Runs the
-[IMB-MPI1 PingPong](https://www.intel.com/content/www/us/en/develop/documentation/imb-user-guide/top/mpi-1-benchmarks/single-transfer-benchmarks/pingpong-pingpongspecificsource-pingponganysource.html)
+[Intel MPI Benchmarks (IMB) MPI1 PingPong](https://www.intel.com/content/www/us/en/develop/documentation/imb-user-guide/top/mpi-1-benchmarks/single-transfer-benchmarks/pingpong-pingpongspecificsource-pingponganysource.html)
 benchmark to measure the average round-trip time and bandwidth for MPI messages of different sizes
 between two pods.
 
-Currently uses MPI over TCP, initialised over SSH, and can be run using CNI or host networking.
+Uses [Open MPI](https://www.open-mpi.org/) initialised over SSH. The data plane can use TCP
+or, hardware and network permitting, [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
+via [UCX](https://openucx.org/).
 
 ```yaml
 apiVersion: perftest.stackhpc.com/v1alpha1
 kind: MPIPingPong
 metadata:
   name: mpi-pingpong
 spec:
-  # Indicates whether to use the host network or the pod network
-  hostNetwork: true
+  # The MPI transport to use - one of TCP, RDMA
+  transport: TCP
 ```
 
 ### OpenFOAM
 
+[OpenFOAM](https://www.openfoam.com/) is a toolbox for solving problems in
+[computational fluid dynamics (CFD)](https://en.wikipedia.org/wiki/Computational_fluid_dynamics).
+It is included here as an example of a "real world" workload.
+
 This benchmark runs the
 [3-D Lid Driven cavity flow benchmark](https://develop.openfoam.com/committees/hpc#3-d-lid-driven-cavity-flow)
-from the OpenFOAM(https://www.openfoam.com/) benchmark suite.
+from the OpenFOAM benchmark suite.
 
-Currently uses MPI over TCP, initialised over SSH, and can be run using CNI or host networking.
+Uses [Open MPI](https://www.open-mpi.org/) initialised over SSH. The data plane can use TCP
+or, hardware and network permitting, [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
+via [UCX](https://openucx.org/).
 
 ```yaml
 apiVersion: perftest.stackhpc.com/v1alpha1
 kind: OpenFOAM
 metadata:
   name: openfoam
 spec:
-  # Indicates whether to use the host network or the pod network
-  hostNetwork: false
-  # The problem size to use (S, M, XL, XXL)
+  # The MPI transport to use - one of TCP, RDMA
+  transport: TCP
+  # The problem size to use - one of S, M, XL, XXL
   problemSize: S
   # The number of MPI processes to use
   numProcs: 16
   # The number of MPI pods to launch
   numNodes: 8
 ```
 
-## Benchmark set
+### RDMA Bandwidth
 
-The `kube-perftest` operator provides a `BenchmarkSet` resource that can be used to run
-the same benchmark over a sweep of parameters:
+Runs the RDMA bandwidth benchmarks (i.e. `ib_{read,write}_bw`) from the
+[perftest collection](https://github.com/linux-rdma/perftest).
+
+This benchmark requires an RDMA-capable network to be specified.
 
 ```yaml
 apiVersion: perftest.stackhpc.com/v1alpha1
-kind: BenchmarkSet
+kind: RDMABandwidth
 metadata:
-  name: iperf
+  name: rdma-bandwidth
 spec:
-  # The template for the fixed parts of the benchmark
-  template:
-    apiVersion: perftest.stackhpc.com/v1alpha1
-    kind: IPerf
-    spec:
-      duration: 30
-  # Defines the permutations for the set
-  # Each permutation is merged into the spec of the template
-  permutations:
-    # Permutations are calculated as a cross-product of the specified names and values
-    product:
-      hostNetwork: [true, false]
-      streams: [1, 2, 4, 8, 16, 32, 64]
-    # A list of explicit permutations to include
-    explicit:
-      - hostNetwork: true
-        streams: 128
+  # The mode for the test - read or write
+  mode: read
+  # The number of iterations to do at each message size
+  # Defaults to 1000 if not given
+  iterations: 1000
+  # The number of queue pairs to use
+  # Defaults to 1 if not given
+  # A higher number of queue pairs can help to spread traffic,
+  # e.g. over NICs in a bond when using RoCE-LAG
+  qps: 1
+  # Extra arguments to be added to the command
+  extraArgs:
+    - --tclass=96
+```
+
+### RDMA Latency
+
+Runs the RDMA latency benchmarks (i.e. `ib_{read,write}_lat`) from the
+[perftest collection](https://github.com/linux-rdma/perftest).
+
+This benchmark requires an RDMA-capable network to be specified.
+
+```yaml
+apiVersion: perftest.stackhpc.com/v1alpha1
+kind: RDMALatency
+metadata:
+  name: rdma-latency
+spec:
+  # The mode for the test - read or write
+  mode: read
+  # The number of iterations to do at each message size
+  # Defaults to 1000 if not given
+  iterations: 1000
+  # Extra arguments to be added to the command
+  extraArgs:
+    - --tclass=96
 ```
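
The MPI benchmarks can use RDMA only where the hardware and network permit it, so in practice the RDMA transport goes together with the network selection properties. A sketch of `MPIPingPong` with the RDMA transport; the network name `default/rdma-net` is hypothetical, and the `rdma/hca_shared_devices_a` resource assumes the RDMA shared device plugin is configured to expose it:

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: MPIPingPong
metadata:
  name: mpi-pingpong-rdma
spec:
  # Use RDMA (via UCX) for the data plane instead of TCP
  transport: RDMA
  # Hypothetical Multus network attached to RDMA-capable NICs
  networkName: default/rdma-net
  resources:
    limits:
      # Assumes the RDMA shared device plugin exposes this resource
      rdma/hca_shared_devices_a: 1
```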
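
The RDMA benchmarks can also be swept with a `BenchmarkSet`. A sketch, assuming `RDMABandwidth` is accepted as a template kind, that compares read and write bandwidth at several queue-pair counts; the cross-product gives 2 × 3 = 6 benchmarks. The network name `default/rdma-net` is hypothetical, and the resource name assumes the RDMA shared device plugin is in use:

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: BenchmarkSet
metadata:
  name: rdma-bandwidth-sweep
spec:
  template:
    apiVersion: perftest.stackhpc.com/v1alpha1
    kind: RDMABandwidth
    spec:
      # Hypothetical Multus network attached to RDMA-capable NICs
      networkName: default/rdma-net
      resources:
        limits:
          # Assumes the RDMA shared device plugin exposes this resource
          rdma/hca_shared_devices_a: 1
      iterations: 1000
  permutations:
    # 2 modes x 3 queue-pair counts = 6 benchmarks
    product:
      mode: [read, write]
      qps: [1, 2, 4]
```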
