|
| 1 | +Sysbench workflow on LBS |
| 2 | +======================== |
| 3 | + |
| 4 | +[Sysbench](https://github.com/akopytov/sysbench) |
| 5 | +([FOSDEM 2017 PDF slides](https://archive.fosdem.org/2017/schedule/event/sysbench/attachments/slides/1519/export/events/attachments/sysbench/slides/1519/sysbench.pdf)) |
| 6 | +is a scriptable multi-threaded benchmark tool used for benchmarking databases. |
| 7 | + |
| 8 | +kdevops supports automating tests with sysbench. The first test it supports |
| 9 | +focuses on quantifying TPS variability when the MySQL double write buffer is |
| 10 | +disabled, across a series of different filesystems. kdevops |
| 11 | +makes adding support for testing TPS variability on different filesystems |
| 12 | +easy, and allows you to easily run these tests on bare metal, virtualized |
| 13 | +guests (optionally with |
| 14 | +[kdevops PCIe passthrough](../libvirt-pcie-passthrough.md)) |
| 15 | +and on different cloud providers which kdevops supports. |
| 16 | + |
| 17 | +Support was based on porting an initial simple shell and docker proof of |
| 18 | +concept, |
| 19 | +[plot-sysbench](https://github.com/mcgrof/plot-sysbench), |
| 20 | +into kdevops. |
| 21 | +The results below were collected with |
| 22 | +[plot-sysbench](https://github.com/mcgrof/plot-sysbench) |
| 23 | +on an AWS i4i.4xlarge instance using a debian-12 image and docker MySQL and |
| 24 | +sysbench images. The plots are from a 12 hour run and demonstrate what |
| 25 | +you should expect with automation from kdevops. Initial testing of the TPS |
| 26 | +variability workflow on kdevops has been done with VMs only, but enough work is |
| 27 | +in place to easily verify / extend it with |
| 28 | +[kdevops PCIe passthrough](../libvirt-pcie-passthrough.md) |
| 29 | +and cloud support. It should just be a matter of a few Kconfig changes and, |
| 30 | +for cloud, making sure we move some folders, like the docker folders, into |
| 31 | +partitions with more space, as some cloud instances use a small root disk. For |
| 32 | +demonstration purposes image results from |
| 33 | +[plot-sysbench](https://github.com/mcgrof/plot-sysbench) are used for now but |
| 34 | +the same should easily be possible with kdevops on the cloud as well. |
| 35 | + |
| 36 | +# Running TPS variability tests |
| 37 | + |
| 38 | +Just use: |
| 39 | + |
| 40 | +```bash |
| 41 | +make defconfig-sysbench-mysql-atomic-tps-variability |
| 42 | +make -j$(nproc) |
| 43 | +make bringup |
| 44 | +make sysbench |
| 45 | +make sysbench-test |
| 46 | +``` |
| 47 | + |
| 48 | +The results will be placed in `workflows/sysbench/results/`. |
| 49 | + |
| 50 | +## Bringup tests |
| 51 | + |
| 52 | +Below are the bringup methods tested so far, which should work well: |
| 53 | + |
| 54 | + * VMs with guestfs |
| 55 | + |
| 56 | +## TODO |
| 57 | + |
| 58 | + * Test VMs with PCI passthrough |
| 59 | + * Test at least one cloud provider |
| 60 | + * Add PostgreSQL support |
| 61 | + |
| 62 | +# Definitions |
| 63 | + |
| 64 | +## TPS variability |
| 65 | + |
| 66 | +We define the TPS variability as the square of the standard deviation, that is, the variance. |
| 67 | + |
| 68 | +## TPS outliers |
| 69 | + |
| 70 | +Outliers are TPS values more than 1.5 times the |
| 71 | +[IQR](https://en.wikipedia.org/wiki/Interquartile_range) |
| 72 | +below the first quartile or above the third quartile. There is likely a better |
| 73 | +value than 1.5; a database expert should provide input here. |
| 74 | + |
| 75 | +# Example image results from kdevops |
| 76 | + |
| 77 | +## TPS changes |
| 78 | + |
| 79 | +### xfs 16k vs ext4 bigalloc 16k - 12 hour MySQL run |
| 80 | + |
| 81 | +This compares an XFS filesystem with a 16k block size against ext4 using a 4k |
| 82 | +block size and a 16k bigalloc cluster size. |
| 83 | + |
| 84 | +<img src="ext4-bigalloc-16k-Vs-xfs-16k-reflink-24-tables-512-threads-aws-i4i-4xlarge.png" align=center alt="xfs 16k vs ext4 bigalloc"> |
| 85 | + |
| 86 | +### xfs 16k the effects of disabling the double write buffer |
| 87 | + |
| 88 | +This compares XFS with a 16k block size filesystem on two nodes: one node |
| 89 | +has the double write buffer enabled and the other node has the double |
| 90 | +write buffer disabled. |
| 91 | + |
| 92 | +TPS results: |
| 93 | + |
| 94 | +<img src="xfs-16k-Vs-xfs-16k-doublewrite-vs-nodoublewrite.png" align=center alt="xfs 16k double write buffer enabled vs disabled"> |
| 95 | + |
| 96 | +Visualizing TPS variability: since TPS variability is defined as the square of |
| 97 | +the standard deviation, this image shows the standard deviation |
| 98 | +using a bell curve. The standard deviation is the distance from the center |
| 99 | +peak of the curve to the same colored vertical dotted line on its left or |
| 100 | +right. |
| 101 | + |
| 102 | +<img src="combined_hist_bell_curve.png" align=center alt="TPS histogram and bell curve"> |
| 103 | + |
| 104 | +TPS variability factor change: this just squares the standard deviation, and |
| 105 | +visualizes the difference in TPS variability between the two runs as a |
| 106 | +factor. |
| 107 | + |
| 108 | +<img src="variance_bar.png" align=center alt="TPS variance bar chart"> |
| 109 | + |
| 110 | +Visualizing TPS outliers: using the definition of an outlier above, this |
| 111 | +visualizes the outliers, those outside the box. |
| 112 | + |
| 113 | +<img src="outliers_plot.png" align=center alt="TPS outliers plot"> |
| 114 | + |
| 115 | +kdevops has a way to do a factor analysis. |
| 116 | + |
| 117 | +# Hacking |
| 118 | + |
| 119 | +Things to know if you're going to hack on this. |
| 120 | + |
| 121 | +## Use of Kconfig for support |
| 122 | + |
| 123 | +Everything is defined through Kconfig, especially now that |
| 124 | +kdevops supports an extension of Kconfig which lets us be selective over |
| 125 | +which Kconfig symbols we want to propagate onto the kdevops extra_vars.yaml. |
| 126 | +Look for "output yaml" in Kconfig files. |
| 127 | + |
| 128 | +Adding different filesystems and filesystem configurations should mostly be |
| 129 | +a matter of Kconfig edits. |
| 130 | + |
| 131 | +To review or extend support for PCI passthrough or cloud, all you need to do is |
| 132 | +modify SYSBENCH_DEVICE in workflows/sysbench/Kconfig.fs so the correct drive |
| 133 | +is used. For example right now: |
| 134 | + |
| 135 | + * if libvirt and virtio are used, /dev/disk/by-id/virtio-kdevops1 is used |
| 136 | + * if libvirt and nvme are used, /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops1 is used |
| 137 | + * if the AWS m5ad_4xlarge instance is used, /dev/nvme2n1 is used |
| 138 | + * if OCI is used, the sparse volume defined in TERRAFORM_OCI_SPARSE_VOLUME_DEVICE_FILE_NAME |
| 139 | +   is used |
| 140 | + |
| 141 | +Experience shows that AWS, at least, also needs some pre-run work to ensure all |
| 142 | +extra data for docker is on a partition which won't fill /. Future work on |
| 143 | +kdevops should handle this for cloud providers, per type of target |
| 144 | +instance. |
| 145 | + |
| 146 | +Since we already have support for testing fstests with real NVMe drives with |
| 147 | +[kdevops PCIe passthrough](../libvirt-pcie-passthrough.md) support, it should |
| 148 | +easily be possible to leverage that to also support |
| 149 | +[kdevops PCIe passthrough](../libvirt-pcie-passthrough.md) for sysbench testing. |
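
The TPS variability and TPS outlier definitions from the Definitions section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code kdevops or plot-sysbench actually runs: the function names are hypothetical, the population variance is assumed (the document does not say whether the population or sample standard deviation is meant), and quartiles are taken with the default method of Python's `statistics.quantiles`.

```python
import statistics


def tps_variability(samples):
    """TPS variability: the square of the standard deviation (the variance).

    Assumption: population variance. Swap in statistics.variance() if the
    sample variance is wanted instead.
    """
    return statistics.pvariance(samples)


def tps_outliers(samples, k=1.5):
    """Return TPS values more than k * IQR below Q1 or above Q3."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if s < low or s > high]


def variability_factor(samples_a, samples_b):
    """How many times larger the TPS variability of run A is than run B."""
    return tps_variability(samples_a) / tps_variability(samples_b)


if __name__ == "__main__":
    # Made-up TPS samples, for illustration only.
    run_a = [100, 102, 98, 101, 99, 100, 40]
    run_b = [100, 101, 99, 100, 100, 101, 99]
    print("variability A:", tps_variability(run_a))
    print("outliers A:", tps_outliers(run_a))
    print("factor A/B:", variability_factor(run_a, run_b))
```

The `k` parameter is exposed because 1.5 is only the starting value suggested above and may need tuning.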