|
| 1 | +# Balance CPU Crimson. |
| 2 | + |
| 3 | +---------- |
| 4 | + |
| 5 | +We introduced the following utilities to help analyse the performance impact of two strategies for |
| 6 | +allocating CPU cores to Seastar reactor threads. For now, this is limited to single-host deployments. |
| 7 | + |
| 8 | +- OSD-based: this consists of allocating CPU cores from the same NUMA socket to the same OSD. |
| 9 | + For simplicity, if the OSD id is even, all its reactor threads are allocated to NUMA socket 0, and |
| 10 | + conversely, if the OSD id is odd, all its reactor threads are allocated to NUMA socket 1. |
| 11 | + |
| 12 | +- NUMA socket-based: this consists of allocating CPU cores evenly from each NUMA socket to the reactors, so |
| 13 | + every OSD ends up with reactors on both NUMA sockets. |
| 14 | + |
| 15 | +A new option `--crimson-balance-cpu <osd|socket>` has been implemented in `vstart.sh` to support these strategies. |
| 16 | + |
| 17 | +Counting the default behaviour, there are *three* CPU allocation strategies: |
| 18 | + |
| 19 | +- default (the new flag is not specified): Seastar reactors use CPUs in ascending contiguous order (unbalanced across sockets), |
| 20 | +- osd: distribute across sockets uniformly, don't split within an OSD, |
| 21 | +- socket: distribute across sockets uniformly, split within an OSD. |
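
The three strategies above can be illustrated with a minimal sketch. It is not the code in `balance-cpu.py`: the function name `allocate`, the `sockets` argument and the 16-core, two-socket layout are assumptions made only for illustration.

```python
from itertools import cycle


def allocate(strategy, num_osd, num_reactors, sockets):
    """Return {osd_id: [cpu ids]} for a hypothetical host.

    sockets: one list of free CPU ids per NUMA socket (assumed layout).
    """
    free = [list(s) for s in sockets]  # copy so the caller's lists stay intact
    plan = {}
    if strategy == "default":
        # Ascending contiguous order, ignoring socket boundaries.
        flat = sorted(cpu for s in free for cpu in s)
        for osd in range(num_osd):
            plan[osd] = flat[osd * num_reactors:(osd + 1) * num_reactors]
    elif strategy == "osd":
        # All reactors of an OSD come from one socket: even ids use
        # socket 0 and odd ids use socket 1 on a two-socket host.
        for osd in range(num_osd):
            pool = free[osd % len(free)]
            plan[osd] = [pool.pop(0) for _ in range(num_reactors)]
    elif strategy == "socket":
        # Consecutive reactors alternate between sockets, so every OSD
        # ends up with reactors on both sockets.
        turn = cycle(range(len(free)))
        for osd in range(num_osd):
            plan[osd] = [free[next(turn)].pop(0) for _ in range(num_reactors)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return plan


# Hypothetical host: socket 0 owns CPUs 0-7, socket 1 owns CPUs 8-15.
SOCKETS = [list(range(0, 8)), list(range(8, 16))]
for name in ("default", "osd", "socket"):
    print(name, allocate(name, num_osd=3, num_reactors=3, sockets=SOCKETS))
```

On real hardware the CPU ids per socket are rarely this regular, which is why the utilities read the topology from `lscpu --json` rather than assuming it.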
| 22 | + |
| 23 | +The utilities introduced are: |
| 24 | + |
| 25 | +- `balance-cpu.py`: a stand-alone script that produces the list of CPU core ids used by `vstart.sh` when allocating |
| 26 | + Seastar reactor threads. Its input is the .json file produced by `lscpu --json` (parsed by `lscpu.py`). |
| 27 | +- `lscpu.py`: a Python module that parses the .json file created by `lscpu --json` into a Python dictionary |
| 28 | + with the NUMA details, that is, the number of sockets and the ranges of CPU core ids (physical and HT-siblings). |
| 29 | +- `tasksetcpu.py`: a stand-alone script to produce a grid showing the current CPU allocation, useful to quickly |
| 30 | + visualise the allocation strategy. |
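
As a rough illustration of the kind of parsing `lscpu.py` performs, the sketch below extracts the CPU ids of each NUMA node from `lscpu --json` output. The helper names (`expand`, `walk`, `numa_nodes`) and the sample data are hypothetical, and the real module may organise its dictionary differently.

```python
import json
import re

# Trimmed, made-up sample of `lscpu --json` output for a two-socket host.
SAMPLE = """
{"lscpu": [
  {"field": "Architecture:", "data": "x86_64"},
  {"field": "Socket(s):", "data": "2"},
  {"field": "NUMA node0 CPU(s):", "data": "0-3,8-11"},
  {"field": "NUMA node1 CPU(s):", "data": "4-7,12-15"}
]}
"""


def expand(cpu_list):
    """Expand a list such as "0-3,8-11" into individual CPU ids."""
    cpus = []
    for part in cpu_list.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def walk(entries):
    """Yield every entry, descending into the nested "children" lists
    that some util-linux versions emit."""
    for entry in entries:
        yield entry
        yield from walk(entry.get("children", []))


def numa_nodes(doc):
    """Return {numa_node_id: [cpu ids]} from parsed lscpu JSON."""
    nodes = {}
    for entry in walk(doc["lscpu"]):
        match = re.match(r"NUMA node(\d+) CPU\(s\):", entry.get("field", ""))
        if match:
            nodes[int(match.group(1))] = expand(entry["data"])
    return nodes


nodes = numa_nodes(json.loads(SAMPLE))
print(nodes)
```

In layouts like the sample, each node lists two ranges; these often correspond to the physical cores and their HT-siblings, but that mapping is an assumption and should be checked on the target host.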
| 31 | + |
| 32 | +## Usage |
| 33 | + |
| 34 | +The following is a typical example of creating a cluster with three OSDs and three reactors per OSD, using |
| 35 | +the desired CPU allocation policy (here `osd`): |
| 36 | + |
| 37 | +``` |
| 38 | +# MDS=0 MON=1 OSD=3 MGR=1 /ceph/src/vstart.sh --new -x --localhost --without-dashboard --cyanstore --redirect-output --crimson --crimson-smp 3 --no-restart --crimson-balance-cpu osd |
| 39 | +``` |
| 40 | + |
| 41 | +The following is the corresponding CPU distribution: |
| 42 | + |
| 43 | + |
| 44 | + |
| 45 | +The following snippet shows the typical usage of the `balance-cpu.py` script: |
| 46 | + |
| 47 | +``` |
| 48 | +lscpu --json > /tmp/numa_nodes.json |
| 49 | +python3 ${CEPH_DIR}/../src/tools/contrib/balance-cpu.py -o $CEPH_NUM_OSD -r $crimson_smp \ |
| 50 | + -b $balance_strategy -u /tmp/numa_nodes.json > /tmp/numa_args.out |
| 51 | +``` |
| 52 | +* The accepted balance strategies are "osd" and "socket". |
| 53 | +* The output file, `/tmp/numa_args.out`, contains the list of CPU ids that `vstart.sh` consumes to issue the corresponding Ceph configuration commands. |
| 54 | + |
| 55 | +The grid can be printed as follows: |
| 56 | + |
| 57 | +``` |
| 58 | + [ ! -f "${NUMA_NODES_OUT}" ] && lscpu --json > ${NUMA_NODES_OUT} |
| 59 | + python3 /ceph/src/tools/contrib/tasksetcpu.py -c $TEST_NAME -u ${NUMA_NODES_OUT} -d ${RUN_DIR} |
| 60 | +``` |
| 61 | + |
| 62 | +## Performance |
| 63 | + |
| 64 | +The following charts compare the IOPS of the three CPU allocation policies: default |
| 65 | +(contiguous allocation, no balance), OSD-based, and NUMA socket-based. For this small configuration |
| 66 | +(3 OSDs, 3 reactors each), there does not seem to be any significant throughput degradation under any policy. |
| 67 | +However, the OSD-based allocation shows higher memory utilisation than the other |
| 68 | +two configurations, a finding that requires further investigation. |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | + |
| 74 | + |
| 75 | + |