Attempt to fix mac tests #770

BenHenning · 2025-11-20T21:38:27Z

Fixes #<issue_number_goes_here>

BenHenning · 2025-11-20T22:21:44Z

Looking at this scroll test first: 'Insert scrolls new block into view'. It's failing with an error that the controls_if block isn't in the viewport. Comparing my local Linux run of the test against the CI run makes the problem pretty clear:

Linux local:

block bounds: { bottom: 1016, left: 53.34375, right: 213.34375, top: 904 }
viewport: { bottom: 1265.5, left: -10, right: 561, top: 814.5 }

Mac on CI:

block bounds: {
  bottom: 1882,
  left: 130.41592407226562,
  right: 290.4159240722656,
  top: 1770
}
viewport: { bottom: 1166.5, left: -10, right: 571, top: 705.5 }

On Mac the block sits well below the viewport for some reason: its top edge (1770) is past the viewport's bottom edge (1166.5). It's also interesting that the viewport sizes differ between the two runs, which suggests discrepancies in test behavior across environments. That lack of cross-platform determinism will make this more difficult to debug.
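For reference, here is a minimal sketch of the kind of containment check that would produce this failure, fed with the numbers from the logs above. The real test's assertion isn't shown in this thread, so `isFullyWithin` and the "every edge inside" rule are assumptions for illustration only.

```typescript
// Bounds in workspace coordinates, shaped like the logged objects.
interface Bounds {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// Hypothetical helper: true only when every edge of `inner` lies inside `outer`.
function isFullyWithin(inner: Bounds, outer: Bounds): boolean {
  return (
    inner.top >= outer.top &&
    inner.bottom <= outer.bottom &&
    inner.left >= outer.left &&
    inner.right <= outer.right
  );
}

// Values copied from the Linux (local) and Mac (CI) logs.
const linuxBlock: Bounds = { top: 904, bottom: 1016, left: 53.34375, right: 213.34375 };
const linuxViewport: Bounds = { top: 814.5, bottom: 1265.5, left: -10, right: 561 };
const macBlock: Bounds = { top: 1770, bottom: 1882, left: 130.41592407226562, right: 290.4159240722656 };
const macViewport: Bounds = { top: 705.5, bottom: 1166.5, left: -10, right: 571 };

console.log(isFullyWithin(linuxBlock, linuxViewport)); // true
console.log(isFullyWithin(macBlock, macViewport)); // false
// Distance from the viewport's bottom edge down to the block's top edge on Mac.
console.log(macBlock.top - macViewport.bottom); // 603.5
```

The horizontal edges pass on both platforms; only the vertical position is off on Mac.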

Attempt to use BiDi and upgrade WIO so that viewport manipulation can be used instead of window management for better compatibility.

BenHenning · 2025-11-20T23:30:55Z

Oh hey. It actually passed when using viewport instead of window size. Cool, I didn't expect that. I guessed there could be some differences, but I wasn't expecting enough of one to cause that much of a coordinate discrepancy. I'm surprised, but I don't care about the why quite enough to dig into it.

BenHenning · 2025-11-20T23:33:00Z

For reference, here are the results running the changed test on each platform.

Linux (local):

block bounds: { top: 904, bottom: 1016, left: 53.34375, right: 213.34375 }
viewport: { top: 740, bottom: 1340, left: -10, right: 570 }

Mac (CI):

block bounds: {
  top: 904,
  bottom: 1016,
  left: 54.20796203613281,
  right: 214.2079620361328
}
viewport: { top: 740, bottom: 1340, left: -10, right: 571 }

They are basically identical now which is great. :D
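To put a number on "basically identical", this small sketch compares the two sets of bounds edge by edge. `maxEdgeDelta` is a hypothetical helper, not something from the test suite; the values are copied from the logs above.

```typescript
interface Bounds {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// Hypothetical helper: largest absolute difference across the four edges.
function maxEdgeDelta(a: Bounds, b: Bounds): number {
  return Math.max(
    Math.abs(a.top - b.top),
    Math.abs(a.bottom - b.bottom),
    Math.abs(a.left - b.left),
    Math.abs(a.right - b.right),
  );
}

const linuxBlock: Bounds = { top: 904, bottom: 1016, left: 53.34375, right: 213.34375 };
const macBlock: Bounds = { top: 904, bottom: 1016, left: 54.20796203613281, right: 214.2079620361328 };
const linuxViewport: Bounds = { top: 740, bottom: 1340, left: -10, right: 570 };
const macViewport: Bounds = { top: 740, bottom: 1340, left: -10, right: 571 };

// The block differs by less than one unit horizontally and not at all vertically.
console.log(maxEdgeDelta(linuxBlock, macBlock) < 1); // true
// The viewports differ only on the right edge.
console.log(maxEdgeDelta(linuxViewport, macViewport)); // 1
```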

Unfortunately, the other Mac failures are likely a different problem entirely. Will dig into those next.

This test suite seems to now pass consistently between Linux & Mac.

BenHenning · 2025-11-20T23:37:59Z

Aha--enabling BiDi caused a bunch of new failures! Interesting.

BenHenning · 2025-11-21T00:01:19Z

Updating to the latest WebdriverIO seems to fix a few more issues with BiDi and brings us back to basically where we were before in terms of failures (I think), plus a new failure that affects Linux (maybe only Linux?).

Edit: Actually interestingly the previously fixed test is either failing again, or is failing when run in conjunction with other tests.

It's failing either again (after WIO update) or when run with the rest of the test suite.

BenHenning · 2025-11-21T00:08:46Z

Either it's failing again after the upgrade or, maybe more likely, it was flaky and incidentally passed earlier. Latest dimensions with the failure:

block bounds: {
  top: 1770,
  bottom: 1882,
  left: 130.41592407226562,
  right: 290.4159240722656
}
viewport: { top: 636, bottom: 1236, left: -10, right: 571 }

This is more or less what we saw earlier. It seems that the BiDi change is probably not needed (and we could downgrade back if I can find fixes for everything without it). The flake seems to be that, for some reason, the block is being put in the wrong place.

BenHenning · 2025-11-21T00:31:21Z

Huh. This is surprisingly difficult to break. It seems much more consistent now, but it obviously can still fail per the two repros above.

BenHenning · 2025-11-21T01:11:05Z

It passed 50 times in a row. Okay...this might be really tricky actually. My current working theory is that the extra browser waits to perform the debugging are actually adding a pause that allows scrolling to stabilize, thus fixing the test.

BenHenning · 2025-11-21T01:26:09Z

The extra pause seems to fix it. Removing all debug logs successfully re-failed the test (though we did see a failure earlier w/o needing to remove the later logs in the test, but oh well).
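If the theory holds, a fixed pause could in principle be replaced by polling until the scroll position stops changing. The sketch below is one way to write that. `waitUntilStable` and its option names are hypothetical, and the reader here is a fake with illustrative values; in a real test it would be whatever call reports the workspace's scroll position, which this thread doesn't show.

```typescript
interface StableOptions {
  intervalMs?: number; // delay between reads
  requiredStableReads?: number; // consecutive equal reads needed
  timeoutMs?: number; // give up after this long
}

// Polls `read` until it returns the same value `requiredStableReads` times
// in a row, then resolves with that value. Throws if the timeout elapses first.
async function waitUntilStable(
  read: () => Promise<number>,
  { intervalMs = 50, requiredStableReads = 3, timeoutMs = 2000 }: StableOptions = {},
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let last = await read();
  let stableReads = 1;
  while (stableReads < requiredStableReads) {
    if (Date.now() > deadline) {
      throw new Error(`Value did not stabilize within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const next = await read();
    // Reset the streak whenever the value moves.
    stableReads = next === last ? stableReads + 1 : 1;
    last = next;
  }
  return last;
}

// Demo with a fake reader: the value changes twice, then settles at 740.
const samples = [705.5, 720, 740];
let reads = 0;
const fakeScrollTop = async (): Promise<number> =>
  samples[Math.min(reads++, samples.length - 1)] as number;

waitUntilStable(fakeScrollTop, { intervalMs: 10 }).then((settled) => {
  console.log(settled); // 740
});
```

Compared with a fixed pause, this waits only as long as needed and fails loudly if scrolling never settles, at the cost of a few extra round trips per check.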

BenHenning · 2025-11-21T01:30:02Z

Attempting a 50x run of all tests except for the move tests since those time out. I think this might take a while. Will follow up with a rough estimate for completion if none fail.

Edit: Approximately 65-70 seconds per run so I'm guessing about an hour to run through 50 times without failure.

Edit 2: Though that's based on WebdriverIO's reported times. The runner actually took 2.5 minutes to fail so I think there's a lot of overhead not being reported here. That likely means this will take more than 2 hours to run through. I think GitHub has a 6 hour limit on runners so we should be below that.
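The arithmetic behind those estimates, as a quick sketch. All inputs come from the two edits above; the 6-hour limit is the figure guessed there, not a verified one.

```typescript
const runs = 50;
// WebdriverIO-reported duration per run, in seconds (low and high estimate).
const reportedSecondsPerRun = [65, 70];
// Wall-clock time the runner actually took for one run, in minutes.
const observedMinutesPerRun = 2.5;
// Assumed runner limit, per the comment above.
const jobLimitHours = 6;

const reportedMinutes = reportedSecondsPerRun.map((s) => (s * runs) / 60);
const observedHours = (observedMinutesPerRun * runs) / 60;

console.log(reportedMinutes.map((m) => m.toFixed(1))); // [ '54.2', '58.3' ]
console.log(observedHours.toFixed(2)); // 2.08
console.log(observedHours < jobLimitHours); // true
```

So the reported times give a bit under an hour, while the observed wall-clock time gives just over two hours, still well within the assumed limit.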

BenHenning · 2025-11-21T01:35:09Z

Interestingly running the whole suite actually caused the earlier test to fail again. Will try re-adding all of the logs and see if we can glean anything interesting.

BenHenning · 2025-12-05T02:32:56Z

I've been able to partially isolate it. The move start test seems to consistently fail when run after the flyout/toolbox tests. None of the other suites seems to matter, but that one does.

BenHenning · 2025-12-05T02:55:21Z

I've isolated it to the flyout tests opening an alert dialog and not closing it. For some reason, triggering the failure requires the regular 'Start moving statement blocks' test to run and complete after the dialog has been opened in a previous test and before the 'Start moving value blocks' test begins.

BenHenning · 2025-12-05T02:57:20Z

I'm trying to reduce both of the move tests to just a single case rather than a loop and see if that still triggers the failure (in conjunction with the alert dialog).

This is a rather bizarre situation and it's not at all clear to me what's happening since the tests should be largely isolated between runs. My best guess is that webdriver or Chrome is being put into a semi-broken state here.

Edit: Isolating both move tests to 1 iteration seems to work. That means it's failing on the first iteration of the failing test and only 1 iteration is needed to get things into a broken state.

BenHenning · 2025-12-05T02:59:17Z

The screenshot for the latest failure doesn't seem particularly interesting or noteworthy:

This includes some dialog handling changes.

This reverts commit caf02c8.

BenHenning · 2025-12-05T03:24:49Z

Forcing session reloading fixes it. I may never fully understand the nuances that led to this particular issue, but I'm going to try reenabling the whole suite and also upping the timeout threshold for the Mac tests since there are a few that get close to 10s.

We are also going to see a bit of a slowdown with this change since there's a lot of overhead in reloading the entire browser for every suite, but it should tremendously improve stability, especially on Mac.
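A rough sketch of what per-suite session reloading can look like. This is not the code from the commit: `makeSuiteResetHook` and the `enabled` flag are hypothetical, and the `ReloadableBrowser` interface is declared locally to mirror WebdriverIO's `browser.reloadSession()` so the example runs without a real browser.

```typescript
// Minimal shape of the one browser method this sketch needs.
interface ReloadableBrowser {
  reloadSession(): Promise<void>;
}

// Returns a hook body that starts a fresh session, but only when `enabled`
// (for example, only in CI, where the extra startup cost is acceptable).
function makeSuiteResetHook(
  browser: ReloadableBrowser,
  enabled: boolean,
): () => Promise<void> {
  return async () => {
    if (enabled) {
      await browser.reloadSession();
    }
  };
}

// Stand-in browser that counts reloads, in place of a real session.
let reloadCount = 0;
const fakeBrowser: ReloadableBrowser = {
  async reloadSession(): Promise<void> {
    reloadCount += 1;
  },
};

const resetInCi = makeSuiteResetHook(fakeBrowser, true);
resetInCi().then(() => {
  console.log(reloadCount); // 1
});
```

In a Mocha-based suite, the returned function could be passed to a suite-level `before` hook so that each suite starts from a clean browser, which also discards any dialog left open by an earlier suite.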

BenHenning · 2025-12-05T03:29:12Z

Looks like most timeouts are already 30s so only needed to make a few adjustments there.

BenHenning · 2025-12-05T04:28:51Z

Looks like everything is passing, and honestly the runtime isn't terrible. I'm going to try kicking off 50 runs and see how far both Linux & Mac get to see how stable they are now.

BenHenning added 3 commits November 20, 2025 20:51

fix: Fix failing NPM build.

8101a16

chore: Disable workflows for investigation.

b85a4b1

chore: Debug & disable test for investigation.

8859f98

chore: More investigation work.

ae9ee30

Attempt to use BiDi and upgrade WIO so that viewport manipulation can be used instead of window management for better compatibility.

chore: Clean up test & re-enable all.

27785c6

This test suite seems to now pass consistently between Linux & Mac.

BenHenning added 3 commits November 20, 2025 23:47

chore: Try upgrading WebdriverIO.

d5497fe

Merge branch 'main' into update-npm-lock-file

3d7c478

Merge branch 'update-npm-lock-file' into attempt-to-fix-mac-tests

75d0c73

chore: Re-isolate scroll test.

8d56eba

It's failing either again (after WIO update) or when run with the rest of the test suite.

BenHenning added 2 commits November 21, 2025 00:12

chore: Add more debug logs.

af1dbaa

chore: Run tests up to 50 times.

2e210fe

BenHenning added 2 commits November 21, 2025 00:52

chore: Speed & reduce tests for CI investigating.

2187b2c

chore: Fix path changing.

dabaad5

BenHenning added 3 commits November 21, 2025 01:12

chore: Remove debug logs to test theory.

4b97659

chore: Try to remove more logs.

cefb286

chore: Try adding a pause.

28a32f4

chore: Add comment & enable most tests.

059092f

chore: Re-add removed comment.

531979e

chore: Re-add logs for investigation.

efecfd9

BenHenning added 6 commits December 5, 2025 02:11

chore: Skip some of the earlier suites.

8499089

chore: Re-disable more suites.

3a26ac5

chore: Re-enable one suite.

4d2a86e

chore: Try re-disabling other suites.

2750f66

chore: Try re-disabling big suite.

c8e3728

chore: Isolate to just 2 suites.

eceb74f

BenHenning added 8 commits December 5, 2025 02:33

chore: Drop inner suite.

adc0c98

chore: Disable first 8 & re-add sub-suite.

f015528

chore: Skip remaining tests.

a213520

chore: Disable immediate previous test.

cf3b5c4

chore: Swap disabled tests.

c8743df

chore: Try disabling alert dialog assertion.

0895aaf

chore: Don't open alert dialog.

e561389

chore: Re-open alert. Reduce previous test.

df7324a

chore: Try reducing failing test.

f7e7479

BenHenning added 3 commits December 5, 2025 03:13

chore: Attempt WDIO upgrade.

caf02c8

This includes some dialog handling changes.

Revert "chore: Attempt WDIO upgrade."

ff5ba58

This reverts commit caf02c8.

chore: Add cross-suite session reloading in CI.

339667e

BenHenning added 3 commits December 5, 2025 03:26

chore: Re-enable all suites and tests.

69896a2

chore: Re-enable Ubuntu tests in CI.

1122f65

chore: Increase timeouts.

e407c73

BenHenning added 2 commits December 5, 2025 04:29

chore: Run tests 50 times in CI.

1411d90

fix: Make mkdir resilient for multiple runs.

65e6c02
