test: do not run criterion benchmarks in no_block_pr #5273

Manciukic · 2025-06-19T11:51:06Z

Changes

This change disables the criterion benchmarks in the PR - Optional step. The performance tests will still catch performance regressions, so we lose little test coverage.

Reason

These tests have been failing consistently in our CI for as long as I can remember. The false positive rate of each individual test is not high, but because we run many of them in a multitude of combinations, at least one fails in every CI run.
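
To illustrate why a low per-test false positive rate still turns almost every run red, here is a minimal sketch. It assumes the benchmark runs fail independently of each other, and the 2% rate and the run counts are hypothetical values picked for illustration, not measurements from our CI.

```python
def prob_at_least_one_false_positive(per_test_rate: float, num_runs: int) -> float:
    """Probability that at least one of num_runs runs fails spuriously.

    Assumes every run has the same false positive rate and that
    runs are independent of each other (a simplification).
    """
    return 1.0 - (1.0 - per_test_rate) ** num_runs


if __name__ == "__main__":
    # Hypothetical per-test false positive rate, not a measured value.
    rate = 0.02
    # Hypothetical numbers of (benchmark, instance, kernel) combinations.
    for runs in (1, 10, 50, 200):
        probability = prob_at_least_one_false_positive(rate, runs)
        print(f"{runs:4d} runs -> P(at least one failure) = {probability:.3f}")
```

Under these assumptions the failure probability of the whole step approaches 1 as the number of combinations grows, which matches what we see in practice.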

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

- I have read and understand CONTRIBUTING.md.
- I have run tools/devtool checkstyle to verify that the PR passes the automated style checks.
- I have described what is done in these changes, why they are needed, and how they are solving the problem in a clear and encompassing way.
- I have updated any relevant documentation (both in code and in the docs) in the PR.
- I have mentioned all user-facing changes in CHANGELOG.md.
- If a specific issue led to this PR, this PR closes the issue.
- When making API changes, I have followed the Runbook for Firecracker API changes.
- I have tested all new and changed functionalities in unit tests and/or integration tests.
- I have linked an issue to every new TODO.

This functionality cannot be added in rust-vmm.

codecov · 2025-06-19T11:55:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.91%. Comparing base (6a8838b) to head (0a9f3b1).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5273      +/-   ##
==========================================
+ Coverage   82.86%   82.91%   +0.05%     
==========================================
  Files         250      250              
  Lines       26902    26902              
==========================================
+ Hits        22292    22306      +14     
+ Misses       4610     4596      -14

Flag	Coverage Δ
5.10-c5n.metal	`83.35% <ø> (+<0.01%)`	⬆️
5.10-m5n.metal	`83.35% <ø> (+<0.01%)`	⬆️
5.10-m6a.metal	`82.56% <ø> (-0.01%)`	⬇️
5.10-m6g.metal	`79.17% <ø> (ø)`
5.10-m6i.metal	`83.34% <ø> (-0.01%)`	⬇️
5.10-m7a.metal-48xl	`82.55% <ø> (?)`
5.10-m7g.metal	`79.17% <ø> (ø)`
5.10-m7i.metal-24xl	`83.31% <ø> (?)`
5.10-m7i.metal-48xl	`83.31% <ø> (?)`
5.10-m8g.metal-24xl	`79.17% <ø> (?)`
5.10-m8g.metal-48xl	`79.17% <ø> (?)`
6.1-c5n.metal	`83.39% <ø> (-0.01%)`	⬇️
6.1-m5n.metal	`83.39% <ø> (-0.01%)`	⬇️
6.1-m6a.metal	`82.61% <ø> (ø)`
6.1-m6g.metal	`79.17% <ø> (+<0.01%)`	⬆️
6.1-m6i.metal	`83.38% <ø> (+<0.01%)`	⬆️
6.1-m7a.metal-48xl	`82.59% <ø> (?)`
6.1-m7g.metal	`79.17% <ø> (ø)`
6.1-m7i.metal-24xl	`83.41% <ø> (?)`
6.1-m7i.metal-48xl	`83.41% <ø> (?)`
6.1-m8g.metal-24xl	`79.17% <ø> (?)`
6.1-m8g.metal-48xl	`79.17% <ø> (?)`

Flags with carried forward coverage won't be shown.

kalyazin

Unusually green

roypat

I still think we should just delete the python test altogether

Manciukic · 2025-06-24T15:57:11Z

> I still think we should just delete the python test altogether

While I see your point, and how this test could become stale, I'm thinking about how we'd run these benchmarks on all instances if we needed to in the future. Keeping this test in the codebase would save some time, as it'd just be a manual test run through BK, and it's not trivial to replicate what this Python file does.

These tests have been failing consistently in our CI for as long as I can remember. While each individual test false positive rate is not high, the fact that we run many of them in a multitude of combinations means that at every CI run at least one fails.

With this change, we are removing them from the PR - Optional step. We will still be able to notice performance regressions from the performance tests, so we're not losing much test coverage.

Signed-off-by: Riccardo Mancini <[email protected]>

Manciukic marked this pull request as ready for review June 19, 2025 13:18

Manciukic added the Status: Awaiting review label Jun 19, 2025

kalyazin previously approved these changes Jun 19, 2025

roypat reviewed Jun 19, 2025

Manciukic dismissed kalyazin’s stale review via 87c1549 June 26, 2025 08:57

Manciukic force-pushed the disable-criterion branch from 036fd98 to 87c1549 Compare June 26, 2025 08:57

Manciukic linked an issue Jun 26, 2025 that may be closed by this pull request

Parametrize test_benchmarks.py test by criterion benchmarks #4832

Closed

roypat approved these changes Jun 26, 2025

Manciukic mentioned this pull request Jun 26, 2025

tests: parametrize bench mark tests #4974

Closed

roypat enabled auto-merge (rebase) June 26, 2025 09:02

kalyazin approved these changes Jun 26, 2025

Merge branch 'main' into disable-criterion

0a9f3b1

roypat merged commit 504b94d into firecracker-microvm:main Jun 26, 2025
7 checks passed
