test(perf): add block latency test #5166
Merged
Conversation
This option only affects the output to stdout, but we are ignoring fio's stdout (we only work with the log files, which are separate). So drop this parameter. Signed-off-by: Patrick Roy <[email protected]>
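A minimal sketch of the resulting invocation, assuming a generic fio job (the helper name and job options here are illustrative, not the test framework's actual code): samples are collected via fio's log files, so stdout can be discarded wholesale, and any flag that only shapes stdout output is redundant.

```python
import subprocess

def run_fio(logs_dir: str) -> None:
    # Illustrative fio job; real tests configure many more options.
    subprocess.run(
        [
            "fio",
            "--name=job",
            "--size=128M",
            "--rw=randread",
            "--runtime=30",
            # Samples land in per-job log files...
            f"--write_bw_log={logs_dir}/bw",
            f"--write_lat_log={logs_dir}/lat",
        ],
        stdout=subprocess.DEVNULL,  # ...while stdout is thrown away entirely
        check=True,
    )
```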
This test just boots a VM, which a ton of other tests also do, so if memory overhead really does change, we'll catch it in other tests. On the other hand, having this test just crash if memory overhead goes above 5MB is not very useful, because it prevents this test from being run as an A/B-test in scenarios where memory overhead is indeed increasing. Signed-off-by: Patrick Roy <[email protected]>
Ensure that `ps(1)` does not truncate the command, which might result in the grep failing (if the jailer_id gets truncated), by using the -ww option. While we're at it, also use -o cmd so that ps prints only the command names and nothing else (as we're not using anything else from this output). Funnily enough, this causes false positives instead of false negatives: because we're using check_output, the command fails if the grep doesn't find anything, yet in the "everything works" scenario, firecracker is dead but grep still matches the "ps | grep" process itself. Signed-off-by: Patrick Roy <[email protected]>
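A hedged sketch of the resulting check (the helper name and `jailer_id` parameter are stand-ins, not the framework's real API):

```python
from subprocess import check_output

def list_firecracker_processes(jailer_id: str) -> str:
    # -e lists all processes (added here so the sketch is self-contained),
    # -ww disables command-line truncation, -o cmd prints only the command.
    # check_output raises CalledProcessError on a non-zero exit, so a grep
    # with no matches fails the whole pipeline; the "ps | grep" process
    # itself always matches, hence false positives rather than negatives.
    return check_output(
        f"ps -eww -o cmd | grep {jailer_id}", shell=True, text=True
    )
```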
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #5166      +/-   ##
==========================================
+ Coverage   83.01%   83.06%   +0.05%
==========================================
  Files         250      250
  Lines       26897    26897
==========================================
+ Hits        22328    22342      +14
+ Misses       4569     4555      -14
kalyazin previously approved these changes on Apr 23, 2025
Manciukic reviewed on Apr 23, 2025
Force-pushed from fa0b2dd to 391331a
Force-pushed from 391331a to 2b5b1f6
Manciukic previously approved these changes on Apr 24, 2025
Currently, if something matches the A/B-testing ignore list, then all metrics emitted from a test whose dimension set is a superset of an ignored one are ignored. Refine this to allow ignoring only specific metrics. Realize this by synthesizing a fake dimension called 'metric' that stores the metric name. This will later be used when we introduce block latency tests, as we will want to A/B-test throughput but ignore latency in scenarios where fio's async workload generator is used. Signed-off-by: Patrick Roy <[email protected]>
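A minimal sketch of the matching logic, under the assumption that dimension sets and ignore-list entries are plain dicts (the framework's actual data model may differ):

```python
def is_ignored(dimensions: dict, metric: str, ignore_list: list[dict]) -> bool:
    # Synthesize the fake "metric" dimension holding the metric name...
    augmented = {**dimensions, "metric": metric}
    # ...then ignore iff the augmented set is a superset of some ignore entry.
    return any(entry.items() <= augmented.items() for entry in ignore_list)

# Hypothetical usage: ignore latency, but not throughput, for async runs.
ignore_list = [{"io_engine": "Async", "metric": "clat_read"}]
assert is_ignored({"io_engine": "Async", "vcpus": "2"}, "clat_read", ignore_list)
assert not is_ignored({"io_engine": "Async", "vcpus": "2"}, "throughput", ignore_list)
```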
When an A/B-test fails, it prints all dimensions associated with the metric that changed. However, if some dimension is the same across literally all metrics emitted (for example, instance name and host kernel version will never change in the middle of a test run), then that's arguably just noise, and it makes it hard to spot the potentially interesting dimensions. So avoid printing dimensions that are the same across all metrics. Note that this does _not_ mean, for that example, that if cpu_utilization only changes for read throughput, the "read vs write" dimension won't be printed anymore. We only drop dimensions if they are the same across _all_ metrics, regardless of whether those metrics had a statistically significant change. In this scenario, the "mode: write" metric still exists, it simply didn't change, and so the "mode: read" line won't be dropped from the output.

Before:

[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or -4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002. This means that observing a change of this magnitude or worse, assuming that performance characteristics did not change across the tested commits, has a probability of 0.02%. Tested Dimensions:
{
  "cpu_model": "AMD EPYC 7R13 48-Core Processor",
  "fio_block_size": "4096",
  "fio_mode": "randrw",
  "guest_kernel": "linux-6.1",
  "guest_memory": "1024.0MB",
  "host_kernel": "linux-6.8",
  "instance": "m6a.metal",
  "io_engine": "Sync",
  "performance_test": "test_block_latency",
  "rootfs": "ubuntu-24.04.squashfs",
  "vcpus": "2"
}

After:

[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or -4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002. This means that observing a change of this magnitude or worse, assuming that performance characteristics did not change across the tested commits, has a probability of 0.02%. Tested Dimensions:
{
  "guest_kernel": "linux-6.1",
  "io_engine": "Sync",
  "vcpus": "2"
}

Signed-off-by: Patrick Roy <[email protected]>
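A minimal sketch of the filtering, again assuming plain-dict dimension sets rather than the framework's real types:

```python
def varying_dimensions(dimension_sets: list[dict]) -> set[str]:
    """Keys whose value differs between at least two emitted metrics,
    regardless of whether those metrics showed a significant change."""
    keys = set().union(*dimension_sets)
    return {k for k in keys if len({ds.get(k) for ds in dimension_sets}) > 1}

def printable(dimensions: dict, dimension_sets: list[dict]) -> dict:
    # Drop dimensions that are identical across literally all metrics,
    # e.g. instance name or host kernel, which never change mid-run.
    keep = varying_dimensions(dimension_sets)
    return {k: v for k, v in dimensions.items() if k in keep}
```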
Force-pushed from 2b5b1f6 to e6b81de
Allow passing arbitrary pytest options through to the ab-testing script, so that things like `-k` can be used for test selection. Signed-off-by: Patrick Roy <[email protected]>
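A hedged sketch of the pass-through pattern (the option name is hypothetical and the real ab-testing script's CLI differs): `parse_known_args()` keeps the options the script itself understands and collects everything it doesn't recognize, such as `-k`, for forwarding to pytest verbatim.

```python
import argparse
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("--test-path", default="tests/")  # hypothetical option
args, pytest_opts = parser.parse_known_args()  # unknown options go to pytest
subprocess.run(["pytest", args.test_path, *pytest_opts], check=True)
```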
This has two reasons:

- When adding block latency tests (e.g. duplicating all existing test cases to also run with fio's sync workload generator), the runtime will exceed 1 hour, which is the buildkite pipeline timeout.
- Having the sync and async cases in the same buildkite step means that the A/B-testing framework will try to cross-correct between the sync and async engines, but comparing results from these makes no sense: they are completely disjoint code paths in firecracker and the host kernel, so there is no reason to believe that their regressions should be correlated.

Signed-off-by: Patrick Roy <[email protected]>
fio emits latency metrics regarding how much time was spent inside the guest operating system (submission latency, slat) and how much time was spent in the device (completion latency, clat). For firecracker, the latter could be relevant, so emit these from our perf tests. To get non-volatile latency numbers, we need to use a synchronous fio worker; for throughput tests, however, the use of the async engine in the guest is required to reach maximum throughput. Signed-off-by: Patrick Roy <[email protected]>
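A minimal sketch of how clat samples could be pulled out of a fio latency log, assuming fio's documented per-row log format ("time (ms), value, data direction, block size, offset"); `emit_metric` is a stand-in for whatever the perf tests actually use to record measurements:

```python
import csv

def emit_metric(name: str, value_us: float) -> None:
    # Stand-in for the perf tests' real metric emitter.
    print(f"{name}: {value_us:.2f} us")

def emit_clat(clat_log_path: str) -> None:
    with open(clat_log_path) as f:
        for row in csv.reader(f):
            value_ns = int(row[1])      # recent fio versions log latency in ns
            direction = row[2].strip()  # 0 = read, 1 = write (fio convention)
            name = "clat_read" if direction == "0" else "clat_write"
            emit_metric(name, value_ns / 1000)
```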
Force-pushed from e6b81de to 9bb205f
kalyazin approved these changes on Apr 25, 2025
Manciukic approved these changes on Apr 25, 2025
fio emits latency metrics regarding how much time was spent inside the
guest operating system (submission latency, slat) or how much time was
spent in the device (clat). For firecracker, the latter could be
relevant, so emit them from the block performance tests.
Signed-off-by: Patrick Roy [email protected]

Changes
...
Reason
...
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.
PR Checklist
- I have run tools/devtool checkstyle to verify that the PR passes the automated style checks.
- ... how they are solving the problem in a clear and encompassing way.
- ... in the PR.
- ... CHANGELOG.md.
- ... Runbook for Firecracker API changes.
- ... integration tests.
- ... TODO.
- ... rust-vmm.