@BenHenning
Collaborator

Fixes #<issue_number_goes_here>

@BenHenning
Collaborator Author

BenHenning commented Nov 20, 2025

Looking at this scroll test first: 'Insert scrolls new block into view'. It's failing with an error that the controls_if block isn't in the viewport. Comparing my local Linux run of the test vs. the CI run makes the problem pretty clear:

Linux local:

block bounds: { bottom: 1016, left: 53.34375, right: 213.34375, top: 904 }
viewport: { bottom: 1265.5, left: -10, right: 561, top: 814.5 }

Mac on CI:

block bounds: {
  bottom: 1882,
  left: 130.41592407226562,
  right: 290.4159240722656,
  top: 1770
}
viewport: { bottom: 1166.5, left: -10, right: 571, top: 705.5 }

The block is obviously way below the viewport on CI for some reason. It's also interesting that the viewport sizes differ, which suggests discrepancies in test behavior between environments. This lack of cross-platform or cross-environment determinism will make this more difficult to debug.
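
For reference, the failing check boils down to a rectangle containment test. Here is a minimal sketch using the logged values above. `Bounds` and `isFullyInViewport` are hypothetical names for illustration, and it assumes the test requires the block to be fully inside the viewport, which may not match the actual helper:

```typescript
// Hypothetical shape matching the logged objects above.
interface Bounds {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// Returns true only if the block lies entirely within the viewport.
// Assumption: full containment is required (not just an overlap).
function isFullyInViewport(block: Bounds, viewport: Bounds): boolean {
  return (
    block.top >= viewport.top &&
    block.bottom <= viewport.bottom &&
    block.left >= viewport.left &&
    block.right <= viewport.right
  );
}

// Values copied from the Linux (local) log.
const linuxLocal = isFullyInViewport(
  {bottom: 1016, left: 53.34375, right: 213.34375, top: 904},
  {bottom: 1265.5, left: -10, right: 561, top: 814.5},
);

// Values copied from the Mac (CI) log.
const macCi = isFullyInViewport(
  {bottom: 1882, left: 130.41592407226562, right: 290.4159240722656, top: 1770},
  {bottom: 1166.5, left: -10, right: 571, top: 705.5},
);
```

With these numbers `linuxLocal` is true and `macCi` is false: on CI the block's top (1770) is already past the viewport's bottom (1166.5).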

Attempt to use BiDi and upgrade WIO so that viewport manipulation can be
used instead of window management for better compatibility.
@BenHenning

BenHenning commented Nov 20, 2025

Oh hey. It actually passed when using viewport instead of window size. Cool--I didn't expect that. I was guessing there could be some differences, but I wasn't expecting them to be large enough to cause that much of a coordinate discrepancy. I'm surprised, but I don't care about the why quite enough to dig into it.
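
For anyone following along, the switch is roughly the following. This is a sketch rather than the exact diff: `ResizableBrowser` is a hypothetical stand-in for the subset of the WebdriverIO browser object involved (so the snippet is self-contained), and the 800x600 size is made up. One plausible reason for the difference is that a window size includes the browser's own chrome, which varies by platform, whereas a viewport size does not.

```typescript
// Hypothetical stand-in for the subset of the WebdriverIO browser object used
// here; in the real tests the object comes from the test runner.
interface ResizableBrowser {
  setWindowSize(width: number, height: number): Promise<void>;
  setViewport(options: {width: number; height: number}): Promise<void>;
}

// Before: sizes the OS window, so the page area also depends on how much
// space the platform's browser chrome takes up.
async function sizeByWindow(browser: ResizableBrowser): Promise<void> {
  await browser.setWindowSize(800, 600); // Made-up dimensions.
}

// After: sizes the page's viewport directly.
async function sizeByViewport(browser: ResizableBrowser): Promise<void> {
  await browser.setViewport({width: 800, height: 600}); // Made-up dimensions.
}
```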

@BenHenning

For reference, here are the results running the changed test on each platform.

Linux (local):

block bounds: { top: 904, bottom: 1016, left: 53.34375, right: 213.34375 }
viewport: { top: 740, bottom: 1340, left: -10, right: 570 }

Mac (CI):

block bounds: {
  top: 904,
  bottom: 1016,
  left: 54.20796203613281,
  right: 214.2079620361328
}
viewport: { top: 740, bottom: 1340, left: -10, right: 571 }

They are basically identical now which is great. :D

Unfortunately, the other Mac failures are likely a different problem entirely. Will dig into those next.

This test suite seems to now pass consistently between Linux & Mac.
@BenHenning

Aha--enabling BiDi caused a bunch of new failures! Interesting.

@BenHenning

BenHenning commented Nov 21, 2025

Updating to the latest Webdriver seems to fix a few more issues with BiDi and bring us back to basically where we were before in terms of failures (I think), plus a new failure affecting Linux (maybe Linux only?).

Edit: Interestingly, the previously fixed test is either failing again or only fails when run in conjunction with other tests.

It's failing either again (after WIO update) or when run with the rest
of the test suite.
@BenHenning

Either it's failing again after the upgrade or, maybe more likely, it was flaky and incidentally passed earlier. Latest dimensions with the failure:

block bounds: {
  top: 1770,
  bottom: 1882,
  left: 130.41592407226562,
  right: 290.4159240722656
}
viewport: { top: 636, bottom: 1236, left: -10, right: 571 }

This is more or less what we saw earlier. It seems that the BiDi change is probably not needed (and we could downgrade back if I can find fixes for everything without it). The flake seems to be that, for some reason, the block is being put in the wrong place.

@BenHenning

Huh. This is surprisingly difficult to break. It seems much more consistent now, but it obviously can still fail per the two repros above.

@BenHenning

BenHenning commented Nov 21, 2025

It passed 50 times in a row. Okay...this might be really tricky actually. My current working theory is that the extra browser waits to perform the debugging are actually adding a pause that allows scrolling to stabilize, thus fixing the test.
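
If that theory holds, an explicit wait would be more robust than relying on incidental pauses. A rough sketch of polling until a sampled value stops changing is below. `waitUntilStable` and its parameters are hypothetical, and in the test the sampler would presumably read something like the workspace scroll position or the block's top coordinate:

```typescript
// Polls `sample` until it returns the same value `requiredStableReads` times
// in a row, or throws once `timeoutMs` has elapsed.
async function waitUntilStable(
  sample: () => Promise<number>,
  requiredStableReads = 3,
  intervalMs = 50,
  timeoutMs = 5000,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let last = await sample();
  let stableReads = 1;
  while (stableReads < requiredStableReads) {
    if (Date.now() > deadline) {
      throw new Error(`Value did not stabilize within ${timeoutMs}ms`);
    }
    // Pause between reads so in-flight scrolling has a chance to progress.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const current = await sample();
    // Any change resets the count of consecutive identical reads.
    stableReads = current === last ? stableReads + 1 : 1;
    last = current;
  }
  return last;
}
```

The assertion on block bounds would then run only after this resolves, rather than immediately after the insert.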

@BenHenning

The extra pause seems to fix it. Removing all debug logs successfully re-failed the test (we did see a failure earlier w/o needing to remove the later logs in the test, but oh well).

@BenHenning

BenHenning commented Nov 21, 2025

Attempting a 50x run of all tests except for the move tests since those time out. I think this might take a while. Will follow up with a rough estimate for completion if none fail.

Edit: Approximately 65-70 seconds per run so I'm guessing about an hour to run through 50 times without failure.

Edit 2: Though that's based on WebdriverIO's reported times. The runner actually took 2.5 minutes to fail so I think there's a lot of overhead not being reported here. That likely means this will take more than 2 hours to run through. I think GitHub has a 6 hour limit on runners so we should be below that.
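
The back-of-the-envelope math, for reference. The 2.5 minute figure is the time one run took to fail, so treat the second estimate as a rough floor rather than a measurement:

```typescript
const runs = 50;

// Per-run time as reported by WebdriverIO (65-70 seconds), in total minutes.
const reportedMinutes = [65, 70].map((seconds) => (runs * seconds) / 60);

// Wall-clock time observed on the runner (about 2.5 minutes per run).
const observedMinutes = runs * 2.5;
const observedHours = observedMinutes / 60;

// Compare against the 6 hour runner limit mentioned above.
const fitsInRunnerLimit = observedHours < 6;
```

That works out to roughly 54-58 minutes by the reported times versus 125 minutes (just over 2 hours) by the observed wall-clock time.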

@BenHenning

Interestingly running the whole suite actually caused the earlier test to fail again. Will try re-adding all of the logs and see if we can glean anything interesting.

@BenHenning

I've been able to partially isolate it. The move start test seems to consistently fail when run after the flyout/toolbox tests. None of the other suites seems to matter, but that one does.

@BenHenning

I've isolated it to the flyout tests opening an alert dialog and not closing it. For some reason, the regular 'Start moving statement blocks' test needs to run and complete after the dialog has been opened in a previous test, and before the 'Start moving value blocks' test begins, in order to trigger the failure.
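
One option for the dangling dialog would be to dismiss it during teardown. A minimal sketch, not something this PR necessarily does: `AlertCapableBrowser` is a hypothetical stand-in for the browser object, and it assumes the driver rejects the dismiss command when no dialog is open:

```typescript
// Hypothetical stand-in for the one browser command used here.
interface AlertCapableBrowser {
  dismissAlert(): Promise<void>;
}

// Dismisses a dialog left open by a test. Returns true if one was dismissed.
// Assumption: the driver rejects (e.g. with "no such alert") when no dialog
// is open, so a rejection is treated as "nothing to clean up".
async function dismissLeftoverAlert(
  browser: AlertCapableBrowser,
): Promise<boolean> {
  try {
    await browser.dismissAlert();
    return true;
  } catch {
    return false;
  }
}
```

This could be called from a per-test teardown hook in the flyout suite so later suites never start with a dialog open.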

@BenHenning

BenHenning commented Dec 5, 2025

I'm trying to reduce both of the move tests to just a single case rather than a loop and see if that still triggers the failure (in conjunction with the alert dialog).

This is a rather bizarre situation and it's not at all clear to me what's happening since the tests should be largely isolated between runs. My best guess is that webdriver or Chrome is being put into a semi-broken state here.

Edit: Isolating both move tests to 1 iteration seems to work. That means it's failing on the first iteration of the failing test and only 1 iteration is needed to get things into a broken state.

@BenHenning

The screenshot for the latest failure doesn't seem particularly interesting or noteworthy:

image

@BenHenning

BenHenning commented Dec 5, 2025

Forcing session reloading fixes it. I may never fully understand the nuances that led to this particular issue, but I'm going to try reenabling the whole suite and also upping the timeout threshold for the Mac tests since there are a few that get close to 10s.

We are also going to see a bit of a slowdown with this change since there's a lot of overhead in reloading the entire browser for every suite, but it should tremendously improve stability, especially on Mac.
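
In rough terms, the per-suite reset looks something like this. It's a sketch rather than the exact change: `ReloadableBrowser`, `startFreshSession`, and the `testPageUrl` parameter are hypothetical names, with the interface standing in for the WebdriverIO browser object:

```typescript
// Hypothetical stand-in for the subset of the browser object used here.
interface ReloadableBrowser {
  reloadSession(): Promise<void>;
  url(path: string): Promise<unknown>;
}

// Starts a brand-new browser session and navigates to the test page so that
// no state (open dialogs, scroll position, etc.) leaks in from earlier suites.
async function startFreshSession(
  browser: ReloadableBrowser,
  testPageUrl: string,
): Promise<void> {
  await browser.reloadSession();
  await browser.url(testPageUrl);
}
```

Calling this from each suite's setup hook is where the extra runtime comes from, since every suite pays for a full browser restart.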

@BenHenning

Looks like most timeouts are already 30s, so I only needed to make a few adjustments there.

@BenHenning

Looks like everything is passing, and honestly the runtime isn't terrible. I'm going to try kicking off 50 runs and see how far both Linux & Mac get to see how stable they are now.
