chore(windows): Fix tests CS stop reliability LS-381#771
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request aims to enhance the reliability of system tests on Windows by mitigating timing-related flakiness. It introduces a small delay after chunkserver shutdown operations to ensure that the system state is consistent before proceeding with subsequent test steps, thereby stabilizing the CI pipeline without altering core application logic. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
The pull request effectively addresses Windows CI test flakiness by introducing a short, conditional sleep after chunkserver stop operations. This is a practical solution for improving test stability. While fixed delays can sometimes mask underlying timing issues or introduce unnecessary slowdowns, for test scripts, this approach is often acceptable. To enhance maintainability and clarity, consider defining the sleep duration as a named variable.
tests/test_suites/ShortSystemTests/test_ec_read_combinations.sh
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR addresses test reliability issues on Windows by adding short sleep delays (0.1 seconds) after chunkserver stop commands in two specific test files. The changes target flaky tests that were failing on Windows CI due to the chunkserver stop command not taking effect immediately.
Changes:
- Added Windows-specific sleep delays after
saunafs_chunkserver_daemon stopcommands in two test files - Used the established
is_windows_systemfunction to conditionally apply the delays only on Windows systems
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_suites/ShortSystemTests/test_truncate_retry.sh | Added 0.1s sleep after two chunkserver stop commands (lines 16-18 and 26-28) |
| tests/test_suites/ShortSystemTests/test_ec_read_combinations.sh | Added 0.1s sleep inside loop after each chunkserver stop command (lines 18-20) |
There was a problem hiding this comment.
Code Review
This pull request introduces a pragmatic fix for flaky tests on Windows by adding a short sleep after stopping a chunkserver. This should help prevent race conditions where tests proceed before the stop operation is fully complete. My review includes a suggestion to refactor duplicated code in one of the test scripts to improve maintainability.
7a3fab4 to
93fcfb5
Compare
Recently, the test test_truncate_retry has been observed to fail on the Windows client CI due to chunkserver stop operations not taking effect immediately. This delay causes the test to become flaky. To mitigate the issue, a short sleep has been added after stopping the chunkserver to ensure that the stop command and its subsequent operations have fully taken effect before the test proceeds. Additionally, the test test_ec_read_combinations was flaky because insufficient time was given for the file-generate utility to write all file chunk parts. This timing issue also caused intermittent failures on the Windows client CI. Co-authored-by: Dave <dave@leil.io> Signed-off-by: rolysr <rolysr@leil.io>
93fcfb5 to
7ca3b75
Compare
Recently, the test test_truncate_retry has been observed to fail on the Windows client CI due to chunkserver stop operations not taking effect immediately. This delay causes the test to become flaky. To mitigate the issue, a short sleep has been added after stopping the chunkserver to ensure that the stop command and its subsequent operations have fully taken effect before the test proceeds.
Additionally, the test test_ec_read_combinations was flaky because insufficient time was given for the file-generate utility to write all file chunk parts. This timing issue also caused intermittent failures on the Windows client CI.
Co-authored-by: Dave dave@leil.io
Signed-off-by: rolysr rolysr@leil.io