Description
Problem
I've been losing time debugging failing GitHub Actions runs. I can't restart them myself, so I try to confirm that the failures aren't caused by my changes. But some tests fail so often that it is hard to tell. We can't make the pipelines 100% reliable, but after some analysis on Friday I have an idea for a few adjustments.
Solution
- should not cache (API Batch 1 -> acceptance-workflow)
it('should not cache "safe" block in "eth_getBlockByNumber"', async function () {
  const blockResult = await relay.call(RelayCalls.ETH_ENDPOINTS.ETH_GET_BLOCK_BY_NUMBER, ['safe', false]);
  await Utils.wait(1000);
  const blockResult2 = await relay.call(RelayCalls.ETH_ENDPOINTS.ETH_GET_BLOCK_BY_NUMBER, ['safe', false]);
  expect(blockResult).to.not.deep.equal(blockResult2);
});
At least one of the "should not cache" tests fails often, and simply increasing the wait time should fix it. A single second sometimes isn't enough.
Failed here for example:
https://github.com/hiero-ledger/hiero-json-rpc-relay/actions/runs/19759973622/job/56628197548?pr=4613
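If a longer fixed wait turns out to slow the suite too much, a bounded poll might be an option: keep re-fetching until the result changes or a timeout expires. The sketch below is only an illustration. `waitForDifferentResult` is a hypothetical helper, not something in the repo; the timeout and interval values are guesses; and comparing via `JSON.stringify` assumes the results are plain JSON-RPC objects.

```typescript
// Hypothetical helper (not part of the relay test utilities):
// re-runs `fetch` until its result differs from `previous`,
// or throws once `timeoutMs` has elapsed.
async function waitForDifferentResult<T>(
  fetch: () => Promise<T>,
  previous: T,
  timeoutMs = 10_000, // assumed upper bound, tune as needed
  intervalMs = 500, // assumed polling interval
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await fetch();
    // Assumes plain JSON data, so string comparison is good enough here.
    if (JSON.stringify(current) !== JSON.stringify(previous)) {
      return current;
    }
    if (Date.now() >= deadline) {
      throw new Error(`result did not change within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the test above, this would replace the `Utils.wait(1000)` line and the second call: `blockResult2` would come from `waitForDifferentResult(() => relay.call(...), blockResult)`. A genuinely cached block would still fail the test, just by timeout instead of by the `deep.equal` assertion.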
- HBAR Limiter Batch 1 - large contract deployment
This fails for me in every full local run (but passes when run alone):
AssertionError: expected 6548219490 to be close to 6468385768 +/- 26580711.6
As a hotfix, maybe simply increase the tolerance ~4× to reduce the failure rate?
Failed here:
Acceptance Tests / HBar Limiter Batch 1 / acceptance-workflow (pull_request)
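To sanity-check the ~4× figure against the error message above, here is a small calculation using only the three numbers from that `AssertionError`. The variable names are mine. It covers this single failure only, so other runs may deviate more or less.

```typescript
// Values copied from the AssertionError quoted above.
const actual = 6548219490;
const expected = 6468385768;
const tolerance = 26580711.6;

// How far off the measured value was, in absolute terms.
const deviation = Math.abs(actual - expected);

// The same deviation expressed as a multiple of the current tolerance.
const ratio = deviation / tolerance;

// Smallest whole-number multiplier of the tolerance that would have passed.
const minMultiplier = Math.ceil(ratio);

console.log({ deviation, ratio: ratio.toFixed(3), minMultiplier });
```

For this run the deviation is 79833722, just over 3× the current tolerance, so a 3× tolerance would still have failed narrowly, while 4× leaves some headroom.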
I also hit many other random failures, but those are easy to spot (e.g. timeouts, 502s from remote APIs). The tests listed above are problematic because they look like real bugs rather than randomness. For example, a "wrongly cached block" failure on a cache-related PR was genuinely alarming. The same goes for incorrect HBAR usage on a tx-creation-related PR, at least until I realized that I'd had the same failures on my other PRs as well.
Alternatives
No response