
Conversation


@roypat roypat commented Apr 23, 2025

fio emits latency metrics regarding how much time was spent inside the
guest operating system (submission latency, slat) or how much time was
spent in the device (clat). For firecracker, the latter could be
relevant, so emit them from the block performance tests.
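
As an illustration only (not the code added in this PR): fio's per-I/O latency logs are CSV-like, so the mean completion latency per direction could be pulled out of a clat log roughly like this, assuming fio's documented column layout (time, value, direction, block size, offset) and nanosecond latency values.

import csv
from statistics import mean

def mean_clat_us(log_path: str) -> dict[str, float]:
    # Rows in a *_clat.*.log file: time_ms, latency value (ns in recent fio),
    # direction (0=read, 1=write, 2=trim), block size, offset.
    lat = {"read": [], "write": []}
    with open(log_path) as f:
        for row in csv.reader(f):
            value, direction = int(row[1]), int(row[2])
            if direction in (0, 1):
                lat["read" if direction == 0 else "write"].append(value / 1000)
    return {d: mean(v) for d, v in lat.items() if v}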

Signed-off-by: Patrick Roy [email protected]

Changes

...

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.


roypat added 3 commits April 23, 2025 15:27
This option only affects the output to stdout, but we are ignoring fio's
stdout (we only work with the log files, which are separate). So drop
this parameter.

Signed-off-by: Patrick Roy <[email protected]>
This test just boots a VM, which a ton of other tests also do, so if
memory overhead really does change, we'll catch it in other tests. On
the other hand, having this test just crash if memory overhead goes
above 5MB is not very useful, because it prevents this test from being
run as an A/B-test in scenarios where memory overhead is indeed
increasing.

Signed-off-by: Patrick Roy <[email protected]>
Use the -ww option to ensure that `ps(1)` does not truncate the command,
which might otherwise result in the grep failing (if the jailer_id gets
truncated). While we're at it, also use -o cmd so that ps only prints the
command names and nothing else (as we're not using anything else from
this output).

Funnily enough, this causes false positives instead of false negatives,
because we're using check_output: if the grep doesn't find anything, we
fail the command (in the "everything works" scenario, firecracker is dead
but grep still matches the "ps | grep" process itself).
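
A rough sketch of the resulting check (hypothetical helper, not the actual test code):

import subprocess

def matching_processes(jailer_id: str) -> list[str]:
    # -e: all processes, -ww: never truncate the command line, -o cmd: print
    # only the command. check_output raises CalledProcessError if grep exits
    # non-zero, i.e. if nothing matched. Note that the grep in a "ps | grep"
    # pipeline matches itself, which is the false positive described above.
    out = subprocess.check_output(
        f"ps -eww -o cmd | grep {jailer_id}", shell=True, encoding="utf-8"
    )
    return out.splitlines()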

Signed-off-by: Patrick Roy <[email protected]>

codecov bot commented Apr 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.06%. Comparing base (52919c4) to head (f15271a).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5166      +/-   ##
==========================================
+ Coverage   83.01%   83.06%   +0.05%     
==========================================
  Files         250      250              
  Lines       26897    26897              
==========================================
+ Hits        22328    22342      +14     
+ Misses       4569     4555      -14     
Flag Coverage Δ
5.10-c5n.metal 83.56% <ø> (ø)
5.10-m5n.metal 83.56% <ø> (ø)
5.10-m6a.metal 82.79% <ø> (+<0.01%) ⬆️
5.10-m6g.metal 79.34% <ø> (ø)
5.10-m6i.metal 83.55% <ø> (ø)
5.10-m7a.metal-48xl 82.77% <ø> (?)
5.10-m7g.metal 79.34% <ø> (ø)
5.10-m7i.metal-24xl 83.52% <ø> (?)
5.10-m7i.metal-48xl 83.52% <ø> (?)
5.10-m8g.metal-24xl 79.34% <ø> (?)
5.10-m8g.metal-48xl 79.34% <ø> (?)
6.1-c5n.metal 83.61% <ø> (+<0.01%) ⬆️
6.1-m5n.metal 83.61% <ø> (+<0.01%) ⬆️
6.1-m6a.metal 82.83% <ø> (-0.01%) ⬇️
6.1-m6g.metal 79.34% <ø> (ø)
6.1-m6i.metal 83.59% <ø> (-0.01%) ⬇️
6.1-m7a.metal-48xl 82.82% <ø> (?)
6.1-m7g.metal 79.33% <ø> (-0.01%) ⬇️
6.1-m7i.metal-24xl 83.62% <ø> (?)
6.1-m7i.metal-48xl 83.62% <ø> (?)
6.1-m8g.metal-24xl 79.34% <ø> (?)
6.1-m8g.metal-48xl 79.34% <ø> (?)

Flags with carried forward coverage won't be shown.


@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Apr 23, 2025
kalyazin previously approved these changes Apr 23, 2025
@roypat roypat marked this pull request as draft April 23, 2025 15:30
@roypat roypat force-pushed the block-latency-test branch 2 times, most recently from fa0b2dd to 391331a on April 23, 2025 17:17
@roypat roypat marked this pull request as ready for review April 24, 2025 06:08
@roypat roypat force-pushed the block-latency-test branch from 391331a to 2b5b1f6 on April 24, 2025 06:14
Manciukic previously approved these changes Apr 24, 2025
roypat added 2 commits April 24, 2025 12:02
Currently, if something matches the A/B-testing ignore list, then all
metrics emitted from a test whose dimension set is a superset of an
ignored one are ignored. Refine this to allow ignoring only specific
metrics.

Realize this by synthesizing a fake dimension called 'metric' that
stores the metric name.

This will later be used when we introduce block latency tests, as we
will want to A/B-test throughput but ignore latency in scenarios where
fio's async workload generator is used.
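
A rough sketch of the idea, with hypothetical names (the real logic lives in the A/B-testing tooling):

def is_ignored(dimensions: dict, metric: str, ignore_list: list[dict]) -> bool:
    # Synthesize a fake "metric" dimension, so an ignore-list entry can name a
    # specific metric (e.g. {"io_engine": "Async", "metric": "clat_read"})
    # instead of suppressing every metric with a matching dimension set.
    dimensions = {**dimensions, "metric": metric}
    return any(entry.items() <= dimensions.items() for entry in ignore_list)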

Signed-off-by: Patrick Roy <[email protected]>
When an A/B-Test fails, it prints all dimensions associated with the
metric that changed. However, if some dimension is the same across
literally all metrics emitted (for example, instance name and host
kernel version will never change in the middle of a test run), then
that's arguably just noise, and makes it hard to parse potentially
interesting dimensions. So avoid printing all dimensions that are
literally the same across all metrics.

Note that this does _not_ mean that, in this example, the "read vs
write" dimension won't be printed anymore if cpu_utilization only
changes for reads. We only drop dimensions if they are the same across
_all_ metrics, regardless of whether they had a statistically
significant change. In this scenario, the "mode: write" metric still
exists, it simply didn't change, and so the "mode: read" line won't be
dropped from the output.

Before:

[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
  "cpu_model": "AMD EPYC 7R13 48-Core Processor",
  "fio_block_size": "4096",
  "fio_mode": "randrw",
  "guest_kernel": "linux-6.1",
  "guest_memory": "1024.0MB",
  "host_kernel": "linux-6.8",
  "instance": "m6a.metal",
  "io_engine": "Sync",
  "performance_test": "test_block_latency",
  "rootfs": "ubuntu-24.04.squashfs",
  "vcpus": "2"
}

After:

[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
  "guest_kernel": "linux-6.1",
  "io_engine": "Sync",
  "vcpus": "2"
}
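
A minimal sketch of the filtering step, with hypothetical names:

def varying_dimensions(dimension_sets: list[dict]) -> set[str]:
    # Keep only dimensions whose value differs between at least two emitted
    # metrics; dimensions that are identical everywhere (instance, host
    # kernel, ...) carry no information and are dropped from the report.
    first, *rest = dimension_sets
    return {
        key for key in first if any(other.get(key) != first[key] for other in rest)
    }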

Signed-off-by: Patrick Roy <[email protected]>
roypat added 3 commits April 24, 2025 13:02
Allow passing arbitrary pytest options through to the ab-testing script,
so that things like `-k` can be used for test selection.
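
A sketch of the forwarding mechanism (hypothetical code, not the actual script):

import argparse
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("--test-path", default="integration_tests/performance")
args, pytest_args = parser.parse_known_args()
# Anything argparse does not recognize (e.g. "-k test_block_latency") is
# forwarded to pytest unchanged, so it can be used for test selection.
subprocess.run(["pytest", args.test_path, *pytest_args], check=True)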

Signed-off-by: Patrick Roy <[email protected]>
This has two reasons:
- When adding block latency tests (e.g. duplicating all existing test
  cases to also run with fio's sync workload generator), the runtime
  will exceed 1 hour, which is the buildkite pipeline timeout.
- Having the sync and async cases in the same buildkite step means that
  the A/B-testing framework will try to cross-correct between the sync
  and async engine, but comparing results from these makes no sense
  because they are completely disjoint code paths in firecracker and
  the host kernel, so there is no reason to believe that their
  regressions should be correlated.

Signed-off-by: Patrick Roy <[email protected]>
fio emits latency metrics regarding how much time was spent inside the
guest operating system (submission latency, slat) or how much time was
spent in the device (clat). For firecracker, the latter could be
relevant, so emit these from our perf tests.

To get non-volatile latency numbers, we need to use a synchronous fio
worker. However, for throughput tests, the use of the async engine in
the guest is required to get maximum throughput.
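
A sketch of the kind of fio invocation this implies, using fio's own option names (the exact values and the /dev/vdb device are assumptions, not taken from the tests):

def fio_cmd(engine: str, mode: str, block_size: int) -> str:
    # engine would be "sync" for the latency tests (stable clat numbers) and
    # an async engine such as "libaio" for the throughput tests.
    # --write_lat_log makes fio produce separate slat/clat/lat log files.
    return (
        f"fio --name=block-perf --ioengine={engine} --rw={mode} "
        f"--bs={block_size} --filename=/dev/vdb --direct=1 --time_based "
        "--runtime=30 --write_lat_log=block-perf --log_avg_msec=1000"
    )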

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat force-pushed the block-latency-test branch from e6b81de to 9bb205f on April 24, 2025 12:02
@roypat roypat enabled auto-merge (rebase) April 24, 2025 14:29
@roypat roypat merged commit ae078ee into firecracker-microvm:main Apr 25, 2025
6 of 7 checks passed
@roypat roypat deleted the block-latency-test branch April 25, 2025 10:57
