Commit 83c03c0

doc/dev/crimson: Add BalanceCPUCrimson with clarifications as per conversation/review
Signed-off-by: Jose J Palacios-Perez <[email protected]>
1 parent 16e04f5 commit 83c03c0

5 files changed, 75 additions and 0 deletions

@@ -0,0 +1,75 @@
# Balance CPU Crimson

----------

We introduced the following utilities to help analyse the performance impact of two strategies for
allocating CPU cores to Seastar reactor threads. This is limited to single-host deployments at the moment.

- OSD-based: this consists of allocating CPU cores from the same NUMA socket to the same OSD.
  For simplicity, if the OSD id is even, all its reactor threads are allocated to NUMA socket 0,
  and if the OSD id is odd, all its reactor threads are allocated to NUMA socket 1.

- NUMA socket-based: this consists of allocating CPU cores evenly from each NUMA socket to the reactors, so
  every OSD ends up with reactors on both NUMA sockets.

A new option `--crimson-balance-cpu <osd|socket>` has been implemented in `vstart.sh` to support these strategies.

Note that, counting the default, there are *three* CPU allocation strategies:

- default (the new flag is not specified): Seastar reactors use CPUs in ascending contiguous order (unbalanced across sockets),
- osd: distribute across sockets uniformly, don't split within an OSD,
- socket: distribute across sockets uniformly, split within an OSD.

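The three strategies above can be sketched in a few lines of Python. This is a minimal illustration only, not the logic of `balance-cpu.py`: the function name `allocate`, its arguments, and the round-robin order used for the `socket` strategy are assumptions made for the example, and HT-siblings are ignored.

```python
def allocate(sockets, num_osd, reactors, strategy="default"):
    """Return {osd_id: [cpu ids]} for one of the three strategies.

    sockets  -- list of CPU id lists, one per NUMA socket (hypothetical input)
    num_osd  -- number of OSDs
    reactors -- number of Seastar reactors per OSD
    """
    # Work on sorted copies so the caller's lists are not consumed.
    free = [sorted(cpus) for cpus in sockets]
    if num_osd * reactors > sum(len(cpus) for cpus in free):
        raise ValueError("not enough CPU cores for the requested reactors")

    if strategy == "default":
        # Ascending contiguous order, regardless of socket boundaries.
        flat = sorted(cpu for cpus in free for cpu in cpus)
        return {osd: flat[osd * reactors:(osd + 1) * reactors]
                for osd in range(num_osd)}

    result = {}
    if strategy == "osd":
        # All reactors of an OSD come from one socket: even ids use
        # socket 0 and odd ids use socket 1 on a two-socket host.
        for osd in range(num_osd):
            pool = free[osd % len(free)]
            if len(pool) < reactors:
                raise ValueError(f"socket exhausted for OSD {osd}")
            result[osd] = [pool.pop(0) for _ in range(reactors)]
        return result

    if strategy == "socket":
        # Reactors alternate between sockets, so an OSD with more than
        # one reactor ends up with reactors on several sockets.
        turn = 0
        for osd in range(num_osd):
            cpus = []
            for _ in range(reactors):
                pool = free[turn % len(free)]
                if not pool:
                    raise ValueError(f"socket exhausted for OSD {osd}")
                cpus.append(pool.pop(0))
                turn += 1
            result[osd] = cpus
        return result

    raise ValueError(f"unknown strategy: {strategy}")
```

For a hypothetical host with two sockets of eight cores each (ids 0-7 and 8-15), three OSDs and three reactors per OSD, `allocate(sockets, 3, 3, "osd")` places OSD 0 on CPUs 0-2, OSD 1 on CPUs 8-10 and OSD 2 on CPUs 3-5, while `"default"` lets OSD 2 straddle both sockets (CPUs 6, 7 and 8).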
The utilities introduced are:

- `balance-cpu.py`: a stand-alone script that produces the list of CPU core ids used by `vstart.sh` when allocating
  Seastar reactor threads. It takes as input the .json file created by `lscpu --json` (the file that `lscpu.py` parses).
- `lscpu.py`: a Python module to parse the .json file created by `lscpu --json`. It produces a Python dictionary
  with the NUMA details, that is, the number of sockets and the ranges of CPU core ids (physical and HT-siblings).
- `tasksetcpu.py`: a stand-alone script that prints a grid showing the current CPU allocation, useful to quickly
  visualise the allocation strategy.

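To give an idea of the kind of parsing involved, the following sketch extracts the CPU ids of each NUMA node from the output of `lscpu --json`. It is not the code of `lscpu.py`: the function names are hypothetical, and the JSON layout (a top-level `lscpu` list of `field`/`data` entries, possibly nested under `children`) is an assumption based on typical util-linux output that may vary between versions. It also does not separate physical cores from HT-siblings.

```python
import json
import re


def parse_cpu_list(spec):
    """Expand a CPU list such as "0-3,8-11" into a list of integers."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(lscpu_json):
    """Return {node index: [cpu ids]} from parsed `lscpu --json` output."""
    nodes = {}

    def walk(entries):
        for entry in entries:
            # Assumed field name, e.g. "NUMA node0 CPU(s):".
            match = re.match(r"NUMA node(\d+) CPU\(s\):", entry.get("field", ""))
            if match:
                nodes[int(match.group(1))] = parse_cpu_list(entry["data"])
            # Newer lscpu versions nest related fields under "children".
            walk(entry.get("children") or [])

    walk(lscpu_json.get("lscpu", []))
    return nodes


def load_numa_nodes(path):
    """Read a file created with `lscpu --json > path`."""
    with open(path) as handle:
        return numa_nodes(json.load(handle))
```

For example, `load_numa_nodes("/tmp/numa_nodes.json")` would return one entry per NUMA node found in the file.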
## Usage

The following is a typical example of creating a cluster with three OSDs and three reactors per OSD, using
the desired CPU allocation policy:

```
# MDS=0 MON=1 OSD=3 MGR=1 /ceph/src/vstart.sh --new -x --localhost --without-dashboard --cyanstore --redirect-output --crimson --crimson-smp 3 --no-restart --crimson-balance-cpu osd
```

The following is the corresponding CPU distribution:

![cyan_3osd_3react_bal_osd](./cyan_3osd_3react_bal_osd.png)

The following snippet shows the typical usage of the `balance-cpu.py` script:

```
# Capture the CPU topology of the host as JSON
lscpu --json > /tmp/numa_nodes.json
# Compute the CPU ids for the reactors using the chosen strategy
python3 "${CEPH_DIR}/../src/tools/contrib/balance-cpu.py" -o "$CEPH_NUM_OSD" -r "$crimson_smp" \
    -b "$balance_strategy" -u /tmp/numa_nodes.json > /tmp/numa_args.out
```
* The accepted balance strategies are "osd" and "socket".
* The file produced, `/tmp/numa_args.out`, contains the list of CPU ids that `vstart.sh` consumes to issue the corresponding Ceph configuration commands.

The grid can be printed as follows:

```
# Create the topology file only if it does not exist yet
[ ! -f "${NUMA_NODES_OUT}" ] && lscpu --json > "${NUMA_NODES_OUT}"
python3 /ceph/src/tools/contrib/tasksetcpu.py -c "$TEST_NAME" -u "${NUMA_NODES_OUT}" -d "${RUN_DIR}"
```

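As a rough idea of what such a grid conveys, the sketch below renders one row per NUMA socket and marks each CPU with the id of the OSD that owns it. It does not reproduce the output format of `tasksetcpu.py`: the function `render_grid` and its inputs are hypothetical and exist only for illustration.

```python
def render_grid(sockets, allocation):
    """Return a text grid with one row per socket.

    sockets    -- list of CPU id lists, one per NUMA socket (hypothetical input)
    allocation -- {osd_id: [cpu ids]} describing which OSD uses which CPU
    """
    # Invert the allocation so each CPU id maps to its owning OSD.
    owner = {cpu: osd for osd, cpus in allocation.items() for cpu in cpus}
    rows = []
    for index, cpus in enumerate(sockets):
        # Unallocated CPUs are shown as a dot.
        cells = " ".join(str(owner[cpu]) if cpu in owner else "." for cpu in cpus)
        rows.append(f"socket {index}: {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Two sockets of four cores each; OSD 0 on socket 0, OSD 1 on socket 1.
    print(render_grid([[0, 1, 2, 3], [4, 5, 6, 7]], {0: [0, 1], 1: [4, 5]}))
```

With an OSD-based allocation each OSD id appears in a single row, whereas with a socket-based allocation the same id appears in several rows.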
## Performance

The following charts compare the IOPS of the three CPU allocation policies: default
(contiguous allocation, no balance), OSD-based, and NUMA socket-based. It is interesting to note that
there does not seem to be any significant throughput degradation for this small configuration
(3 OSDs, 3 reactors per OSD). However, the OSD-based allocation requires higher memory utilisation than the other
two configurations, which is an interesting finding that requires further investigation.

![cyan_3osd_3react_bal_vs_unbal_4krandread_iops_vs_lat](./cyan_3osd_3react_bal_vs_unbal_4krandread_iops_vs_lat.png)

![cyan_3osd_3react_bal_vs_unbal_4krandread_osd_cpu](./cyan_3osd_3react_bal_vs_unbal_4krandread_osd_cpu.png)

![cyan_3osd_3react_bal_vs_unbal_4krandread_osd_mem](./cyan_3osd_3react_bal_vs_unbal_4krandread_osd_mem.png)