[Kubernetes](https://kubernetes.io/) clusters.

- [Installation](#installation)
- [Network selection](#network-selection)
- [Benchmark set](#benchmark-set)
- [Benchmarks](#benchmarks)
  - [iperf](#iperf)
  - [MPI PingPong](#mpi-pingpong)
  - [OpenFOAM](#openfoam)
  - [RDMA Bandwidth](#rdma-bandwidth)
  - [RDMA Latency](#rdma-latency)

## Installation

For most use cases, no customisations to the Helm values will be necessary.

## Network selection

All of the benchmarks can run over either the Kubernetes pod network or the host network
(by setting `hostNetwork: true` on the benchmark pods).

Benchmarks are also able to run on accelerated networks where available, using
[Multus](https://github.com/k8snetworkplumbingwg/multus-cni) for multiple CNIs and
[device plugins](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)
to request network resources.

This allows benchmarks to leverage technologies such as
[SR-IOV](https://en.wikipedia.org/wiki/Single-root_input/output_virtualization)
(via the [SR-IOV network device plugin](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin)),
[macvlan](https://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/) (via the
[macvlan CNI plugin](https://www.cni.dev/plugins/current/main/macvlan/)) and
[RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
(e.g. via the [RDMA shared device plugin](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin)).

The networking is configured using the following properties of the benchmark `spec`:

```yaml
spec:
  # Indicates whether to use host networking or not
  # If true, networkName is not used
  hostNetwork: false
  # The name of a Multus network to use
  # Only used if hostNetwork is false
  # If not given, the Kubernetes pod network is used
  networkName: namespace/netname
  # The resources for benchmark pods
  resources:
    limits:
      # E.g. requesting a share of an RDMA device
      rdma/hca_shared_devices_a: 1
  # The MTU to set on the interface *inside* the container
  # If not given, the default MTU is used
  mtu: 9000
```
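
A `networkName` of the form `namespace/netname` refers to a Multus `NetworkAttachmentDefinition`
in that namespace. As a minimal sketch, a macvlan network suitable for use here might look like
the following (the master interface, IPAM plugin and address range are illustrative assumptions,
not values shipped with the operator):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: netname
  namespace: namespace
spec:
  # CNI configuration handed to the macvlan plugin
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "10.0.0.0/24"
      }
    }
```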

## Benchmark set

The `kube-perftest` operator provides a `BenchmarkSet` resource that can be used to run
the same benchmark over a sweep of parameters:

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: BenchmarkSet
metadata:
  name: iperf
spec:
  # The template for the fixed parts of the benchmark
  template:
    apiVersion: perftest.stackhpc.com/v1alpha1
    kind: IPerf
    spec:
      duration: 30
  # Defines the permutations for the set
  # Each permutation is merged into the spec of the template
  permutations:
    # Permutations are calculated as a cross-product of the specified names and values
    product:
      hostNetwork: [true, false]
      streams: [1, 2, 4, 8, 16, 32, 64]
    # A list of explicit permutations to include
    explicit:
      - hostNetwork: true
        streams: 128
```
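
The `product` above expands to 2 × 7 = 14 permutations and the `explicit` list contributes one
more, so this set runs 15 benchmarks in total.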

## Benchmarks

Currently, the following benchmarks are supported:

### iperf

Runs the [iperf](https://en.wikipedia.org/wiki/Iperf) network performance tool to measure bandwidth
for a transfer between two pods.

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: IPerf
metadata:
  name: iperf
spec:
  # The number of parallel streams to use
  streams: 8
  # The duration of the test
  duration: 30
```

### MPI PingPong

Runs the
[Intel MPI Benchmarks (IMB) MPI1 PingPong](https://www.intel.com/content/www/us/en/develop/documentation/imb-user-guide/top/mpi-1-benchmarks/single-transfer-benchmarks/pingpong-pingpongspecificsource-pingponganysource.html)
benchmark to measure the average round-trip time and bandwidth for MPI messages of different sizes
between two pods.

Uses [Open MPI](https://www.open-mpi.org/) initialised over SSH. The data plane can use TCP
or, hardware and network permitting, [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
via [UCX](https://openucx.org/).

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: MPIPingPong
metadata:
  name: mpi-pingpong
spec:
  # The MPI transport to use - one of TCP, RDMA
  transport: TCP
```
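
To exercise the RDMA data plane, `transport: RDMA` is combined with the network selection
properties described above. A sketch of such a spec, in which the network name and device-plugin
resource are assumptions for the cluster at hand:

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: MPIPingPong
metadata:
  name: mpi-pingpong-rdma
spec:
  # Use RDMA (via UCX) for the MPI data plane
  transport: RDMA
  # An RDMA-capable Multus network (assumed name)
  networkName: default/rdma-net
  resources:
    limits:
      # Assumed resource advertised by an RDMA device plugin
      rdma/hca_shared_devices_a: 1
```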

### OpenFOAM

[OpenFOAM](https://www.openfoam.com/) is a toolbox for solving problems in
[computational fluid dynamics (CFD)](https://en.wikipedia.org/wiki/Computational_fluid_dynamics).
It is included here as an example of a "real world" workload.

This benchmark runs the
[3-D Lid Driven cavity flow benchmark](https://develop.openfoam.com/committees/hpc#3-d-lid-driven-cavity-flow)
from the OpenFOAM benchmark suite.

Uses [Open MPI](https://www.open-mpi.org/) initialised over SSH. The data plane can use TCP
or, hardware and network permitting, [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)
via [UCX](https://openucx.org/).

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: OpenFOAM
metadata:
  name: openfoam
spec:
  # The MPI transport to use - one of TCP, RDMA
  transport: TCP
  # The problem size to use - one of S, M, XL, XXL
  problemSize: S
  # The number of MPI processes to use
  numProcs: 16
  # The number of MPI pods to launch
  numNodes: 8
```
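
With the values above, the 16 MPI processes are distributed across the 8 launched pods
(two processes per pod, assuming an even spread).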

### RDMA Bandwidth

Runs the RDMA bandwidth benchmarks (i.e. `ib_{read,write}_bw`) from the
[perftest collection](https://github.com/linux-rdma/perftest).

This benchmark requires an RDMA-capable network to be specified.

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: RDMABandwidth
metadata:
  name: rdma-bandwidth
spec:
  # The mode for the test - read or write
  mode: read
  # The number of iterations to do at each message size
  # Defaults to 1000 if not given
  iterations: 1000
  # The number of queue pairs to use
  # Defaults to 1 if not given
  # A higher number of queue pairs can help to spread traffic,
  # e.g. over NICs in a bond when using RoCE-LAG
  qps: 1
  # Extra arguments to be added to the command
  extraArgs:
    - --tclass=96
```

### RDMA Latency

Runs the RDMA latency benchmarks (i.e. `ib_{read,write}_lat`) from the
[perftest collection](https://github.com/linux-rdma/perftest).

This benchmark requires an RDMA-capable network to be specified.

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: RDMALatency
metadata:
  name: rdma-latency
spec:
  # The mode for the test - read or write
  mode: read
  # The number of iterations to do at each message size
  # Defaults to 1000 if not given
  iterations: 1000
  # Extra arguments to be added to the command
  extraArgs:
    - --tclass=96
```