@JonathanOppenheimer JonathanOppenheimer commented Jul 23, 2025

Why this should be merged

Currently, if a single test fails, the entire test suite is rerun (up to 4 times). This is extremely inefficient: if a flaky test fails, the whole suite reruns, but then a different flaky test may fail, so the whole suite is rerun again, and so on. This leads to long waits for CI jobs to pass on GitHub, which wastes a lot of time.

The prior claim that re-running all prior tests was necessary is copied below:

Note the absence of unexpected failures cannot be indicative that we only need to run the tests that failed; for example, a test may panic and cause subsequent tests in that package to not run.

However, a failure does not always prevent subsequent tests from running: if a single flaky test fails, there is no need to rerun the whole test suite. And even if a panic does prevent subsequent tests from running, it is enough to rerun the tests that failed and the tests that never ran, rather than the whole suite.

| Scenario | Before Behavior | After Behavior |
|---|---|---|
| All tests pass | Exit immediately | Exit immediately |
| Unexpected failures | Exit immediately | Exit immediately |
| All tests run, only known flakes fail | Rerun ALL tests | Retry only failed tests |
| Some tests don't run due to panics | Rerun ALL tests | Rerun all tests in packages in which tests panicked + failed flaky tests |
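
As a rough illustration of the "After Behavior" column, the decision could be sketched like this. The function name, its arguments, and the returned labels are all hypothetical, and the precedence (unexpected failures first, then tests that never ran, then flaky failures) is an assumption rather than something stated in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the next step after one test run.
# Arguments (all counts): unexpected failures, known-flake failures,
# tests that never ran (for example because of a panic).
retry_plan() {
  local unexpected="$1" flaky="$2" not_run="$3"
  if [ "$unexpected" -gt 0 ]; then
    # Assumed precedence: an unexpected failure ends the run immediately.
    echo "exit-failure"
  elif [ "$not_run" -gt 0 ]; then
    # Some tests never ran: rerun their packages plus the failed flaky tests.
    echo "rerun-panicked-packages-and-failed-flakes"
  elif [ "$flaky" -gt 0 ]; then
    # Everything ran and only known flakes failed: retry just those tests.
    echo "retry-failed-tests"
  else
    echo "exit-success"
  fi
}
```

For example, `retry_plan 0 2 0` prints `retry-failed-tests`, which corresponds to the third row of the table.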

How this works

  1. Get list of all expected tests for the package using and store for comparison
  2. Run tests with go test -json -shuffle=on for structured output
  3. Parse test output to extract failed tests using regex patterns for both failures and panics
  4. Compare expected vs ran tests to find tests that didn't run due to panics
  5. Categorize failures into known flakes vs unexpected failures
  6. Depending on whether tests failed or never ran, retry only the relevant tests.
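
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the helper names are hypothetical, and the regex assumes the compact one-event-per-line format that `go test -json` emits, with `Action`, `Package`, and `Test` appearing in that order. How the list of expected tests is produced is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Print "package TestName" for each top-level test with the given action
# ("run", "pass", "fail", ...) in a file of go test -json events.
tests_with_action() {
  local action="$1" file="$2"
  # Subtests (names containing "/") and package-level events (no "Test"
  # field) are deliberately not matched.
  sed -n 's|.*"Action":"'"$action"'".*"Package":"\([^"]*\)".*"Test":"\([^"/]*\)".*|\1 \2|p' "$file" | sort -u
}

# Print the lines of the expected-tests file ($1) that are missing from the
# ran-tests file ($2), i.e. tests that never started.
tests_not_run() {
  local expected_file="$1" ran_file="$2"
  # -F fixed strings, -x whole-line match, -v invert the match.
  # grep exits 1 when it prints nothing, hence "|| true".
  grep -Fxv -f "$ran_file" "$expected_file" || true
}
```

A caller would typically write `tests_with_action run test.json > ran.txt`, compare it against the expected list with `tests_not_run expected.txt ran.txt`, and use `tests_with_action fail test.json` to get the candidates for a targeted retry.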

How this was tested

This modifies the test framework directly.

Need to be documented?

No

Need to update RELEASES.md?

No

@JonathanOppenheimer JonathanOppenheimer requested a review from a team as a code owner July 23, 2025 19:22
@JonathanOppenheimer JonathanOppenheimer added the testing Anything testing-related label Jul 23, 2025
@JonathanOppenheimer JonathanOppenheimer marked this pull request as draft July 23, 2025 19:23
@JonathanOppenheimer JonathanOppenheimer marked this pull request as ready for review July 24, 2025 20:39
Comment on lines +99 to +100
# shellcheck disable=SC2046
go test -shuffle=on ${race:-} -timeout="${TIMEOUT:-600s}" -coverprofile=coverage.out -covermode=atomic "$@" $(go list ./... | grep -v github.com/ava-labs/coreth/tests) | tee test.out || command_status=$?
Member Author

Not sure if it's possible to write this line so it both
A. works
B. passes lint

Contributor

How about separating creation of the array from its use?

mapfile -t pkgs < <(go list ./... | grep -v github.com/ava-labs/coreth/tests)
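# ${race:+"$race"} expands to nothing when race is unset or empty, so go test never receives an empty "" argument
go test -shuffle=on ${race:+"$race"} -timeout="${TIMEOUT:-600s}" -coverprofile=coverage.out -covermode=atomic "$@" "${pkgs[@]}" | tee test.out || command_status=$?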

Member Author

If only! Writing this script was actually quite difficult. While bash is a decent programming language (imo), mapfile was added in Bash 4.0, and all supported macOS releases (including the ones behind GitHub Actions' macos-latest runner) still ship with Bash 3.2.x (currently 3.2.57). Apple never adopted the GPLv3-licensed Bash 4+ and instead moved its interactive shell to zsh, leaving /bin/bash frozen at 3.2.57 (see https://jmmv.dev/2019/11/macos-bash-baggage.html).

Thus all of our testing scripts need to be compatible with bash circa 2014, and I was hindered from using a lot of newer bash features that would have made this easier to write.
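
For illustration, one common way to fill an array from command output without mapfile is a `while read` loop, which should also work on Bash 3.2. This is only a sketch: the function name `read_pkgs` is hypothetical and is not taken from the script in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin line by line into the global array "pkgs".
# Uses only features available in Bash 3.2 (arrays, +=, read -r).
read_pkgs() {
  pkgs=()
  local line
  while IFS= read -r line; do
    pkgs+=("$line")
  done
}

# Example use (commented out so this block has no side effects):
#   read_pkgs < <(go list ./... | grep -v github.com/ava-labs/coreth/tests)
#   go test -shuffle=on "${pkgs[@]}"
```

The input is supplied through a redirection rather than a pipe (`... | read_pkgs`) because a pipe would run the function in a subshell and the array would be lost when it exits.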

If you endorse it, I can refactor the whole testing script to use Python (or GoLang) instead of bash but it would be a much more significant change.

Contributor

@maru-ava maru-ava Aug 6, 2025

There is no obligation to be compatible with macOS's ancient bash version. In avalanchego, we explicitly require a modern bash. One way to ensure the use of modern bash could be to run the command with scripts/dev_shell.sh, since the nix shell guarantees a compatible bash version.
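
Independently of how the shell is provided, a script can also fail fast when it is run by an older bash. The guard below is a hypothetical sketch, not the mechanism avalanchego uses. The helper name is made up, and the minimum version of 4 is an assumption based on mapfile having been added in Bash 4.0.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the major version in $1 is at least $2.
bash_major_at_least() {
  local have="$1" want="$2"
  [ "$have" -ge "$want" ]
}

# Example use at the top of a script run by bash, where BASH_VERSINFO is set
# (commented out so this block has no side effects):
#   bash_major_at_least "${BASH_VERSINFO[0]}" 4 || {
#     echo "bash >= 4 is required, found ${BASH_VERSION}" >&2
#     exit 1
#   }
```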

Member Author

Interesting! That's definitely something that is worth bringing over to the evm repositories in my opinion.

@JonathanOppenheimer JonathanOppenheimer requested a review from a team as a code owner August 5, 2025 18:15
@maru-ava
Contributor

maru-ava commented Aug 6, 2025

Why did you decide to write this in bash? The complexity involved would really benefit from a real programming language (i.e. golang).

@JonathanOppenheimer
Member Author

Why did you decide to write this in bash? The complexity involved would really benefit from a real programming language (i.e. golang).

Not really any particular reason besides sticking with the current structure (order remains the same, run_task.sh and so on). If you find this overly complex, I can rewrite it in Python (or golang if not Python).

@maru-ava
Contributor

maru-ava commented Aug 6, 2025

Not really any particular reason besides sticking with the current structure (order remains the same, run_task.sh and so on). If you find this overly complex, I can rewrite it in Python (or golang if not Python).

I'm not going to block merge of this - that's up to the coreth maintainers - so these comments are more food for thought when considering future changes.

  • Python would be a fine choice... if only our pool of reviewers was well-stocked with Python experts. Safer to choose a language for which both expertise and the required toolchain are readily available.
  • Non-trivial bash scripting is tech debt as soon as it is written. It's at best challenging to both test and review for anyone but a bash expert (and we have few of those). Future maintenance is thus guaranteed more costly than the same functionality implemented in a language that maintainers have expertise in (i.e. golang).
  • There is nothing preventing a non-bash implementation in the current structure. The indirection of the taskfile means task build-test could be invoking golang as easily as bash without having to change CI or user expectation.
