Fix flaky test, add retry to testcloud #4717

Open
bajertom wants to merge 2 commits into main from tbajer-fix-flaky-tests

@bajertom
Contributor

Pull Request Checklist

  • implement the feature
  • extend the test coverage

Fixes #4557

The first test mentioned above sometimes fails because the timeout is too short: it cuts off right before /plans/multi/plan/one gets through. I've added another expect phrase that comes after that line and increased the timeout to 3 seconds.

The second test IMHO failed due to a botched download of the qcow2 image.

        progress: downloading...
Unknown download size.
        effective hardware: {}

The unindented "Unknown download size." error seems odd. I've checked successful runs of that test in other PRs and "Unknown download size" does not appear there. Another line says:

qemu-img: /tmp/tmp.9726GhCN02/testcloud/instances/tmt-001-RuKJKCkN/tmt-001-RuKJKCkN-local.qcow2: Image is not in qcow2 format

So I think that something did get downloaded but the file was corrupted.

I wrapped the responsible code in a function, prepare_image, and added validation of the qcow2 file. If validation fails, the downloaded file is removed and prepare_image is retried.
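For readers without the diff at hand, the validate-and-retry idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `download` and `validate` callables, the default `attempts` and `interval` values, the helper name `qemu_img_says_qcow2`, and the local `ProvisionError` stand-in are all assumptions made for the example. Only the use of `qemu-img info` for validation comes from the discussion.

```python
import json
import subprocess
import time
from pathlib import Path
from typing import Callable


class ProvisionError(Exception):
    """Local stand-in for tmt's ProvisionError, so the sketch is self-contained."""


def qemu_img_says_qcow2(path: Path) -> bool:
    """Ask `qemu-img info` whether the file is a qcow2 image (hypothetical helper)."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return False
    # A damaged download may be reported as some other format, so check it explicitly.
    return json.loads(result.stdout).get("format") == "qcow2"


def prepare_image(
    download: Callable[[Path], None],
    target: Path,
    validate: Callable[[Path], bool] = qemu_img_says_qcow2,
    attempts: int = 3,       # assumed value, not taken from the PR
    interval: float = 5.0,   # assumed value, not taken from the PR
) -> Path:
    """Download an image, validate it, and retry from scratch if it is corrupt."""
    for attempt in range(1, attempts + 1):
        download(target)
        if validate(target):
            return target
        # Remove the corrupt file so the next attempt starts with a fresh download.
        target.unlink(missing_ok=True)
        if attempt < attempts:
            time.sleep(interval)
    raise ProvisionError(
        f"Image '{target}' is not a valid qcow2 file after {attempts} attempts."
    )
```

Passing the validator as a parameter keeps the retry logic testable on machines without `qemu-img` installed.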

Hard to fix flaky tests when they're flaky and can't be reproduced at will 😐

Co-generated by Claude Code.

@bajertom bajertom requested a review from lbrabec as a code owner March 18, 2026 19:58
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request refactors the testcloud image preparation logic to include validation and retry mechanisms. It introduces new constants for image preparation retry attempts and interval, imports a retry utility, and wraps the image preparation and validation steps in a retry loop. A new validation step uses qemu-img info to verify the integrity of downloaded qcow2 images, removing corrupt ones and raising a ProvisionError if validation fails. Review comments suggest improving the descriptive comment for the new retry constants and enhancing the clarity and readability of the ProvisionError message when a corrupt image is encountered.

@bajertom bajertom removed the request for review from lbrabec March 18, 2026 20:13
@bajertom bajertom added this to the 1.70 milestone Mar 18, 2026
@github-project-automation github-project-automation bot moved this to backlog in planning Mar 18, 2026
@bajertom bajertom moved this from backlog to review in planning Mar 18, 2026
@happz happz added the ci | full test Pull request is ready for the full test execution label Mar 18, 2026
            f"'{self.testcloud_image_dirpath}' directory permissions."
        ) from error
    except KeyError as error:
        raise ProvisionError(f"Failed to prepare image '{self.image_url}'.") from error
Contributor

If any of these exceptions occur, the function will get retried by retry(). Do they have a chance of succeeding if retried? If not, only 15 seconds are wasted, so it doesn't seem like a huge deal. Just wondering if it was intended that these would also cause retries.
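One way to make the intent explicit would be to retry only on a dedicated exception type and let everything else propagate immediately. The sketch below is purely illustrative: `retry_on`, `CorruptImageError`, and the local `ProvisionError` stand-in are hypothetical names, and this is not the signature of the retry utility used in the PR.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


class ProvisionError(Exception):
    """Local stand-in for tmt's ProvisionError, so the sketch is self-contained."""


class CorruptImageError(ProvisionError):
    """Hypothetical subclass marking failures that are worth retrying."""


def retry_on(
    func: Callable[[], T],
    retriable: Type[Exception],
    attempts: int,
    interval: float,
) -> T:
    """Call func, retrying only when it raises the retriable exception type."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retriable:
            # Out of attempts: surface the last retriable error to the caller.
            if attempt == attempts:
                raise
            time.sleep(interval)
    raise AssertionError("unreachable")
```

With this shape, a corrupt download raised as `CorruptImageError` is retried, while any other `ProvisionError` fails on the first attempt. Whether the exceptions quoted above belong in the retriable set is the open question here.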

@psss psss self-assigned this Mar 20, 2026

Labels

ci | full test Pull request is ready for the full test execution

Projects

Status: review

Development

Successfully merging this pull request may close these issues.

Review flaky tests and make them reliable

4 participants