- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2.1k
test(perf): add block latency test #5166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
      
    
                
     Merged
            
            
          Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    this option only affects the output to stdout, but we are ignoring fio's stdout (we only work with the log files, which are separate). So drop this parameter. Signed-off-by: Patrick Roy <[email protected]>
This test just boots a VM, which a ton of other tests also do, so if memory overhead really does change, we'll catch it in other tests. On the other hand, having this test just crash if memory overhead goes above 5MB is not very useful, because it prevent this test from being run as a A/B-test in scenarios where memory overhead is indeed increasing. Signed-off-by: Patrick Roy <[email protected]>
Ensure that `ps(1)` does not truncate the command, which might result in the grep failing (if the jailer_id gets truncated), using the -ww option. While we're at it, also use -o cmd so that ps only prints the command names and nothing else (as we're not using anything else from this output). This causes false-positives instead of false-negatives funnily enough, because we're using check_output, meaning if the grep doesnt find anything we fail the command (in the "everything works" scenario, firecracker is dead but grep still matches the "ps | grep" process itself). Signed-off-by: Patrick Roy <[email protected]>
| Codecov ReportAll modified and coverable lines are covered by tests ✅ 
 Additional details and impacted files@@            Coverage Diff             @@
##             main    #5166      +/-   ##
==========================================
+ Coverage   83.01%   83.06%   +0.05%     
==========================================
  Files         250      250              
  Lines       26897    26897              
==========================================
+ Hits        22328    22342      +14     
+ Misses       4569     4555      -14     
 Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
            
                  kalyazin
  
            
            previously approved these changes
            
                
                  Apr 23, 2025 
                
            
            
          
          
            
                  Manciukic
  
            
            reviewed
            
                
                  Apr 23, 2025 
                
            
            
          
          
fa0b2dd    to
    391331a      
    Compare
  
    391331a    to
    2b5b1f6      
    Compare
  
    
            
                  Manciukic
  
            
            previously approved these changes
            
                
                  Apr 24, 2025 
                
            
            
          
          
Currently, if something matches the A/B-testing ignore list, then all metrics emitted from a test with a dimension set that is a super set of an ignored one is ignored. Refine this to allow only ignoring specific metrics. Realize this by synthesizing a fake dimension called 'metric' that stores the metric. This will later be used when we introduce block latency tests, as we will want to A/B-test throughput but ignore latency in scenarios where fio's async workload generator is used. Signed-off-by: Patrick Roy <[email protected]>
When an A/B-Test fails, it prints all dimensions associated with the
metric that changed. However, if some dimension is the same across
literally all metrics emitted (for example, instance name and host
kernel version will never change in the middle of a test run), then
that's arguably just noise, and makes it hard to parse potentially
interesting dimensions. So avoid printing all dimensions that are
literally the same across all metrics.
Note that this does _not_ mean for that example if cpu_utilization only
changes to read throughput that the "read vs write" dimension won't be
printed anymore. We only drop dimensions if the are the same across
_all_ metrics, regardless of whether they had a statistically
significant change. In this scenario, the "mode: write" metric still
exists, it simply didn't change, and so the "mode: read" line won't be
dropped from the output.
Before:
[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
  "cpu_model": "AMD EPYC 7R13 48-Core Processor",
  "fio_block_size": "4096",
  "fio_mode": "randrw",
  "guest_kernel": "linux-6.1",
  "guest_memory": "1024.0MB",
  "host_kernel": "linux-6.8",
  "instance": "m6a.metal",
  "io_engine": "Sync",
  "performance_test": "test_block_latency",
  "rootfs": "ubuntu-24.04.squashfs",
  "vcpus": "2"
}
After:
[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
  "guest_kernel": "linux-6.1",
  "io_engine": "Sync",
  "vcpus": "2"
}
Signed-off-by: Patrick Roy <[email protected]>
    2b5b1f6    to
    e6b81de      
    Compare
  
    Allow passing arbitrary pytest options through to the ab-testing script, so that things like `-k` can be used for test selection. Signed-off-by: Patrick Roy <[email protected]>
This has two reasons: - When adding block latency tests (e.g. duplicating all existing test cases to also run with fio's sync workload generator), the runtime will exceed 1 hour, which is the buildkite pipeline timeout) - Having the sync and async cases in the same buildkite step means that the A/B-testing framework will try to cross-correct between the sync and async engine, but comparing results from these makes no sense because they are completely disjoint code paths in firecracker and the host kernel, so there is no reason to believe that their regressions should be correlated. Signed-off-by: Patrick Roy <[email protected]>
fio emits latency metrics regarding how much time was spent inside the guest operating system (submission latency, slat) or how much time was spent in the device (clat). For firecracker, the latter could be relevant, so emit these from our perf tests. To get non-volatile latency numbers, we need to use a synchronous fio worker to get non-volatile metrics. However, for throughput tests the use of the async engine in the guest is required to get maximum throughput. Signed-off-by: Patrick Roy <[email protected]>
e6b81de    to
    9bb205f      
    Compare
  
    
            
                  kalyazin
  
            
            approved these changes
            
                
                  Apr 25, 2025 
                
            
            
          
          
            
                  Manciukic
  
            
            approved these changes
            
                
                  Apr 25, 2025 
                
            
            
          
          
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
fio emits latency metrics regarding how much time was spent inside the
guest operating system (submission latency, slat) or how much time was
spent in the device (clat). For firecracker, the latter could be
relevant, so emit them from the block performance tests.
Signed-off-by: Patrick Roy [email protected]## Changes
...
Reason
...
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.this option only affects the output to stdout, but we are ignoring fio's
stdout (we only work with the log files, which are separate). So drop
this parameter.
Signed-off-by: Patrick Roy [email protected]