Skip to content

[Issue #8942] Split staging E2E workflow: sharded no-auth, single-worker auth#8965

Open
Bhavna-Ramachandran wants to merge 15 commits intomainfrom
BR-8942-E2E-Staging-Run
Open

[Issue #8942] Split staging E2E workflow: sharded no-auth, single-worker auth#8965
Bhavna-Ramachandran wants to merge 15 commits intomainfrom
BR-8942-E2E-Staging-Run

Conversation

@Bhavna-Ramachandran
Copy link
Collaborator

@Bhavna-Ramachandran Bhavna-Ramachandran commented Mar 10, 2026

Summary

Work for : Task #8942

Changes proposed

  • Tests that require Login.gov authentication now contain logic to run only against Chrome when targeting staging.
  • The staging workflow has been split into two jobs:
    • e2e-tests-no-auth-sharded : Runs in parallel shards similar to local E2E runs.
    • e2e-tests-auth-single-worker : Runs auth-related tests in a single shard to avoid parallel Login.gov sessions.

Context for reviewers

Due to Staging Login.gov limitations - authentication tests cannot run in parallel. Splitting the workflow ensures non-auth tests still benefit from parallelization while auth tests run safely in a single worker.

During this PR, we noticed that - Even with serially run tests, there were few more challenges with Login.gov OTP codes, this PR mainly addresses the fix for these challenges + extra tuning on other tests.

  • Login.gov OTPs are valid for 30 seconds and single-use. In longer test runs or in serially run tests:
    • By the time a Test in series reaches the MFA step, the generated code becomes invalid before login.gov processes it or
    • If a test was faster the same code used in previous test login may be attempted again by another test (as the OTP was still valid), which login.gov rejects even if it is still within the same window.

This PR ensures a fresh OTP is generated and handles retry logic if the code is rejected, making the login flow more stable for the E2E test run.

Credit

Tagging approach inspired by @doug-s-nava's PR #8960 , where tags are passed via the test options object instead of embedding them in the test name.

@Bhavna-Ramachandran Bhavna-Ramachandran changed the title Split staging E2E workflow: sharded no-auth, single-worker auth; upda… [Issue #8942] Split staging E2E workflow: sharded no-auth, single-worker auth Mar 11, 2026
@Bhavna-Ramachandran
Copy link
Collaborator Author

Credit: Tagging approach inspired by @doug-s-nava's PR #8960 , where tags are passed via the test options object instead of embedding them in the test name.

run-id: ${{ inputs.run_id }}
pattern: blob-report-shard-*
# Merges both auth and no-auth test artifacts
pattern: blob-report*-shard-*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this need to track the change above that names these off the report_name_prefix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. I have made the update.

mdragon
mdragon previously approved these changes Mar 13, 2026
Copy link
Collaborator

@mdragon mdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Copy link
Collaborator

@doug-s-nava doug-s-nava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we solved this by opting out of anything but chrome within the affected tests rather than updating the jobs?

npm run test:e2e -- --grep "${{ inputs.playwright_tags }}"
shell: bash

- name: Run e2e tests excluding ${{ inputs.playwright_tags_invert }} (Shard ${{ inputs.current_shard }}/${{ inputs.total_shards }})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this in here there's a chance that if a user supplies both a positive list and a negative list, the same tests could run twice.

I think the ability to run a negative list is not a super high priority and outside the scope of this change, so I'd like to save implementing this to a separate PR if that's ok

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will remove this from action and handle the exclusion directly in staging workflow. Will make this update now.

total_shards: ${{ matrix.total_shards }}
current_shard: ${{ matrix.shard }}
playwright_tags: ${{ inputs.playwright_tags }}
playwright_tags_invert: "@auth"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't quite what I had in mind for this tag. The @auth tag in my conception is for tests of login specific behavior, or behavior that is tied to specific permissions, not just any functionality that requires the user to be logged in. It's not a big deal since we haven't really started implementing tags yet, but if we're using @auth to tag any test where a user logs in we may need to define a different tag to track tests of auth specific behavior

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our discussion. I am removing the auth tag.

throw new Error(`Unsupported env ${targetEnv}`);
}

if (isMobile) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is out of scope but why do we want to always open the mobile nav after logging in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some login tests, post login the flow is expected to open mobile nav (on mobile browser it is hidden behind hamburger menu) if needed, since this function is a shared login helper, we have it part of this. Based on need we can also move it part of each test too.

/**
* Navigates to a URL with retry logic to handle transient network errors.
*/
async function gotoWithRetry(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a general util rather than something tied to "create-application". Can we move to a more general file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. I have moved this to lifecycle-utils

const TIMEOUT_HOME = playwrightEnv.targetEnv === "staging" ? 180000 : 60000;
const TIMEOUT_MFA = 120000;

// TOTP codes are valid for 30s windows. If we're within this many seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool!

'button:has-text("Sign out"), a:has-text("Sign out")',
);
if (await existingSignOut.isVisible({ timeout: 3000 }).catch(() => false)) {
// console.log("performStagingLogin: already logged in, skipping login flow");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acknowledged.

test.afterEach(async ({ context }) => {
await context.close();
});
// Note: do NOT close context in afterEach — Playwright manages context lifecycle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we need this comment here - if this should be documented let's put it somewhere more general. Should we start a document for things like this either in /documentation or in Confluence?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is useful to document this. I saw failures in this test in Staging and I made this note. If its ok I would suggest to keep this in this spec too, it would serve as reminder when we make updates or debug this flow. Let me know your thoughts.

with:
version: ${{ inputs.version || github.ref }}
target: ${{ env.defaulted_target }}
total_shards: 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still going to run all browsers, just in 1 shard. The configurations are defined in the playwright config so unless we're overriding that or supplying a new config this doesn't really solve the problem

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. However, the browser execution is controlled in the test command we pass, where we had explicitly specified the Chrome project:
npx playwright test --config ./tests/playwright.config.ts --project=Chrome --workers=1 --reporter=list --grep @auth
So the auth workflow ran only against the Chrome project with a single worker, which avoided the parallel session in staging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants