Add DigitalOcean infrastructure e2e monitoring workflow#4259

Draft

jasbanza wants to merge 5 commits into stage from jason/do-endpoints-e2e

Member

jasbanza commented Jan 25, 2026

What is the purpose of the change:

WIP. This PR lets us run Playwright tests against Vercel preview links that use DigitalOcean endpoints.

NB: Endpoints need to be manually set for this specific branch in Vercel secrets!

Linear Task

MER-59: Set up DigitalOcean Migration monitoring

Brief Changelog

This branch adds workflows that are specifically meant to facilitate automated tests.

Testing and Verifying

Additional tweaks might be needed as we begin testing.
- Simplify the Playwright tests, as the existing ones are still flaky.

This is a work in progress; this PR will be updated once it is ready for review.


          Add DigitalOcean infrastructure e2e monitoring workflow

67792b9

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
osmosis-frontend	Ready	Preview, Comment	Jan 27, 2026 3:02pm

4 Skipped Deployments

Project	Deployment	Review	Updated (UTC)
osmosis-frontend-datadog	Ignored		Jan 27, 2026 3:02pm
osmosis-frontend-dev	Ignored		Jan 27, 2026 3:02pm
osmosis-frontend-edgenet	Ignored		Jan 27, 2026 3:02pm
osmosis-testnet	Ignored		Jan 27, 2026 3:02pm

github-advanced-security bot found potential problems

.github/workflows/monitoring-do-e2e-tests.yml

Comment on lines +29 to +53

+                  runs-on: ubuntu-latest
+                  outputs:
+                    environment_url: ${{ steps.vercel.outputs.environment_url }}
+                  steps:
+                    - name: Check out repository
+                      uses: actions/checkout@v4
+                    - name: Get DO branch Vercel preview URL
+                      id: vercel
+                      env:
+                        BRANCH_NAME: ${{ env.DO_BRANCH }}
+                        FILTER_BRANCH: ${{ env.DO_BRANCH }}
+                        VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
+                        VERCEL_PROJECT: ${{ secrets.VERCEL_PROJECT }}
+                      run: |
+                        echo "Looking for latest READY deployment for branch: $FILTER_BRANCH"
+                        cd .github && python await_deployment.py
+                    - name: Echo resolved URL
+                      env:
+                        environment_url: ${{ steps.vercel.outputs.environment_url }}
+                      run: |
+                        echo "DO Preview URL: https://$environment_url"
+                        echo "This preview is configured to use DO endpoints via Vercel branch env vars"
+                # Server synthetic tests against DO SQS
+                server-e2e-tests:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 28 days ago

To fix the issue, add an explicit permissions block that limits the GITHUB_TOKEN to the minimal required scope. From the provided snippet, the jobs only need to read the repository (for actions/checkout) and do not appear to perform any GitHub write operations. A safe minimal configuration is contents: read at the workflow root so it applies to all jobs that don’t override it.

Concretely, in .github/workflows/monitoring-do-e2e-tests.yml, add a top‑level permissions: block after the on: section (e.g., after line 16). This block should specify contents: read. No changes are needed to individual jobs unless some job (not shown) truly needs additional scopes; based on the provided snippet, that is not the case. No imports or external tools are required; this is purely a YAML configuration change within the workflow file.

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -15,6 +15,9 @@
                     type: boolean
               # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
+            permissions:
+              contents: read
             # DO endpoint configuration
             env:
               DO_BRANCH: "jason/do-endpoints-e2e"

Copilot is powered by AI and may make mistakes. Always verify output.
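
As a rough local approximation of what the warning above checks, the following Python sketch scans workflow text for a top-level `permissions:` key. The function name and the trimmed sample workflow are illustrative assumptions. This is not how CodeQL evaluates the rule, and it ignores the case where every job declares its own `permissions:` block.

```python
import re

def has_workflow_permissions(workflow_text: str) -> bool:
    """Return True if the text has a `permissions:` key in column 0.

    Assumption: a line-based scan is enough for a quick check. It does not
    parse YAML, so it only recognizes the workflow-level block.
    """
    for line in workflow_text.splitlines():
        # Top-level YAML keys start in column 0; nested keys are indented.
        if re.match(r"permissions\s*:", line):
            return True
    return False

# Hypothetical, heavily trimmed stand-in for monitoring-do-e2e-tests.yml.
BEFORE = """\
name: Synthetic DO Load Balanced Infrastructure Monitoring tests
on:
  workflow_dispatch:
env:
  DO_BRANCH: "jason/do-endpoints-e2e"
jobs:
  resolve-do-preview:
    runs-on: ubuntu-latest
"""

# Apply the suggested change: a root-level block placed before `env:`.
AFTER = BEFORE.replace("env:\n", "permissions:\n  contents: read\nenv:\n", 1)

# A job-level block only; the workflow-level check should still report False.
JOB_ONLY = BEFORE + "    permissions:\n      contents: read\n"

if __name__ == "__main__":
    print("before:", has_workflow_permissions(BEFORE))
    print("after:", has_workflow_permissions(AFTER))
```

A proper YAML parser would be more robust than the regular expression; the scan is used here only to keep the sketch dependency-free.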

.github/workflows/monitoring-do-e2e-tests.yml Fixed


          Fix YAML syntax error in DO monitoring workflow

3bd368b

github-advanced-security bot found potential problems

.github/workflows/monitoring-do-e2e-tests.yml Outdated

Comment on lines 54 to 90

+                  name: do-server-tests
+                  needs: resolve-do-preview
+                  runs-on: ubuntu-latest
+                  if: ${{ github.event.inputs.skip_server_tests != 'true' }}
+                  steps:
+                    - name: Checkout Repo
+                      uses: actions/checkout@v4
+                    - name: Setup Node.js
+                      uses: actions/setup-node@v4
+                      with:
+                        node-version: 22.x
+                    - name: Cache dependencies
+                      uses: actions/cache@v4
+                      with:
+                        path: "**/node_modules"
+                        key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
+                        restore-keys: |
+                          ${{ runner.OS }}-22.x-
+                    - name: Install Dependencies
+                      run: yarn install --frozen-lockfile
+                    - name: Echo DO SQS Server URL
+                      run: |
+                        echo "Testing SQS Server URL: ${{ env.DO_SQS_ENDPOINT }}"
+                    - name: Run Server Tests against DO SQS
+                      id: tests
+                      run: yarn test:e2e --filter=@osmosis-labs/server
+                      env:
+                        NEXT_PUBLIC_SIDECAR_BASE_URL: ${{ env.DO_SQS_ENDPOINT }}
+                        NEXT_PUBLIC_TIMESERIES_DATA_URL: https://data.app.osmosis.zone
+                # FE swap monitoring tests (US region - no proxy)
+                fe-swap-us:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 28 days ago

In general, the fix is to add an explicit permissions: block that grants the minimal necessary access to the GITHUB_TOKEN. This can be done at the workflow root (applies to all jobs) or for individual jobs. Since all visible jobs only need to read repository contents, the minimal safe baseline is permissions: { contents: read }.

The best fix here, without changing functionality, is to define a single workflow-level permissions: block near the top of .github/workflows/monitoring-do-e2e-tests.yml, after the on: section and before env: or jobs:. This will satisfy CodeQL’s complaint about the server-e2e-tests job (and all other jobs), while keeping functionality unchanged, because they only require read access to check out code and use caches/artifacts. No additional methods, imports, or definitions are needed; this is purely a YAML configuration change.

Specifically, in .github/workflows/monitoring-do-e2e-tests.yml, insert:

permissions:
  contents: read

between the on: block (ending at line 16) and the env: block (starting at line 19). No other changes are required.

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -15,6 +15,9 @@
                     type: boolean
               # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
+            permissions:
+              contents: read
             # DO endpoint configuration
             env:
               DO_BRANCH: "jason/do-endpoints-e2e"

.github/workflows/monitoring-do-e2e-tests.yml Fixed


          Simplify DO monitoring workflow - one test per type for load balanced endpoints

123d034

github-advanced-security bot found potential problems

.github/workflows/monitoring-do-e2e-tests.yml Outdated

Comment on lines 92 to 136

+                  timeout-minutes: 15
+                  name: do-fe-swap
+                  needs: resolve-do-preview
+                  runs-on: macos-14
+                  steps:
+                    - name: Echo IP
+                      run: curl -L "https://ipinfo.io" -s
+                    - name: Check out repository
+                      uses: actions/checkout@v4
+                      with:
+                        sparse-checkout: |
+                          packages/e2e
+                    - name: Setup Node.js
+                      uses: actions/setup-node@v4
+                      with:
+                        node-version: 22.x
+                    - name: Cache dependencies
+                      uses: actions/cache@v4
+                      with:
+                        path: "**/node_modules"
+                        key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
+                        restore-keys: |
+                          ${{ runner.OS }}-22.x-
+                    - name: Install Playwright
+                      run: |
+                        yarn --cwd packages/e2e install --frozen-lockfile && npx playwright install --with-deps chromium
+                    - name: Run Swap tests against DO preview
+                      env:
+                        BASE_URL: "https://${{ needs.resolve-do-preview.outputs.environment_url }}"
+                        REST_ENDPOINT: ${{ env.DO_LCD_ENDPOINT }}
+                        PRIVATE_KEY: ${{ secrets.TEST_PRIVATE_KEY_3 }}
+                        WALLET_ID: ${{ secrets.TEST_WALLET_ID_3 }}
+                      run: |
+                        echo "Testing DO preview at: $BASE_URL"
+                        cd packages/e2e
+                        npx playwright test monitoring.swap
+                    - name: upload test results
+                      if: failure()
+                      uses: actions/upload-artifact@v4
+                      with:
+                        name: do-swap-test-results-${{ github.run_id }}
+                        path: packages/e2e/playwright-report
+                # FE market order test
+                fe-trade:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 28 days ago

To fix the problem, explicitly set permissions for the workflow so the GITHUB_TOKEN has only the minimal scopes required. Since the jobs here only read repository contents and upload artifacts, we can safely set contents: read at the workflow level. This documents the requirement and ensures the workflow won’t gain excessive privileges if repo/org defaults change.

The best minimal change is to add a root-level permissions: block near the top of .github/workflows/monitoring-do-e2e-tests.yml, after the on: trigger (or before env: / jobs:). This will apply to all jobs (resolve-do-preview, server-e2e-tests, fe-swap, fe-limit, etc.) that don’t have their own permissions block. No additional imports, methods, or definitions are needed; it’s a pure YAML configuration change.

Concretely:

Edit .github/workflows/monitoring-do-e2e-tests.yml.
Insert:

permissions:
  contents: read

at the root level (same indentation as on: and env:), for example between the on: block ending at line 16 and the env: block starting at line 19. This constrains the GITHUB_TOKEN for every job and addresses the CodeQL warning.

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -15,6 +15,9 @@
                     type: boolean
               # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
+            permissions:
+              contents: read
             # DO endpoint configuration
             env:
               DO_BRANCH: "jason/do-endpoints-e2e"

.github/workflows/monitoring-do-e2e-tests.yml Fixed


          Remove failing fe-trade job from DO monitoring workflow

0cc1751

github-advanced-security bot found potential problems

.github/workflows/monitoring-do-e2e-tests.yml

Comment on lines +137 to +184

+                  timeout-minutes: 15
+                  needs: [resolve-do-preview, fe-swap]
+                  name: do-fe-limit
+                  runs-on: macos-14
+                  outputs:
+                    unexpected: ${{ steps.set-output.outputs.unexpected }}
+                  steps:
+                    - name: Check out repository
+                      uses: actions/checkout@v4
+                      with:
+                        sparse-checkout: |
+                          packages/e2e
+                    - name: Setup Node.js
+                      uses: actions/setup-node@v4
+                      with:
+                        node-version: 22.x
+                    - name: Cache dependencies
+                      uses: actions/cache@v4
+                      with:
+                        path: "**/node_modules"
+                        key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
+                        restore-keys: |
+                          ${{ runner.OS }}-22.x-
+                    - name: Install Playwright
+                      run: |
+                        yarn --cwd packages/e2e install --frozen-lockfile && npx playwright install --with-deps chromium
+                    - name: Run Limit tests against DO preview
+                      env:
+                        BASE_URL: "https://${{ needs.resolve-do-preview.outputs.environment_url }}"
+                        REST_ENDPOINT: ${{ env.DO_LCD_ENDPOINT }}
+                        PRIVATE_KEY: ${{ secrets.TEST_PRIVATE_KEY_3 }}
+                        WALLET_ID: ${{ secrets.TEST_WALLET_ID_3 }}
+                      run: |
+                        cd packages/e2e
+                        npx playwright test monitoring.limit --timeout 180000
+                    - name: set-output
+                      if: failure() || success()
+                      id: set-output
+                      run: echo "unexpected=$(jq -r '.stats.unexpected' packages/e2e/playwright-report/test-results.json)" >> $GITHUB_OUTPUT
+                    - name: upload test results
+                      if: failure()
+                      uses: actions/upload-artifact@v4
+                      with:
+                        name: do-limit-test-results-${{ github.run_id }}
+                        path: packages/e2e/playwright-report
+                # Alert on critical failures (2nd attempt with multiple failures)
+                fe-bot-alert:
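
The `set-output` step in the hunk above reads `.stats.unexpected` from `packages/e2e/playwright-report/test-results.json` with `jq`. The Python sketch below shows the same extraction with explicit handling for a missing or malformed report. The function names are illustrative assumptions and are not part of this PR.

```python
import json
from pathlib import Path
from typing import Optional, Union

def unexpected_from_json(report_text: str) -> Optional[int]:
    """Return stats.unexpected from Playwright JSON report text, else None."""
    try:
        report = json.loads(report_text)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if not isinstance(report, dict):
        return None
    stats = report.get("stats")
    if not isinstance(stats, dict):
        return None
    value = stats.get("unexpected")
    # Reject booleans explicitly because bool is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value

def read_unexpected(report_path: Union[str, Path]) -> Optional[int]:
    """Read the report file if it exists and return stats.unexpected."""
    path = Path(report_path)
    if not path.is_file():
        return None  # e.g. the test step crashed before writing a report
    return unexpected_from_json(path.read_text(encoding="utf-8"))

if __name__ == "__main__":
    # Path taken from the workflow step above.
    print(read_unexpected("packages/e2e/playwright-report/test-results.json"))
```

Returning `None` keeps "no report" distinct from "zero unexpected failures". In the shell version, a failing `jq` inside the command substitution would most likely leave the output as an empty string rather than fail the step.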

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 28 days ago

In general, to fix this issue you should explicitly define a permissions: block either at the root of the workflow (to apply to all jobs by default) or per-job, and restrict the GITHUB_TOKEN to the least privileges necessary. For workflows that only need to read code and interact with Actions features (like artifacts and cache) but do not push commits, modify releases, or change issues/PRs, contents: read is usually sufficient as a base.

For this specific workflow in .github/workflows/monitoring-do-e2e-tests.yml, none of the shown jobs perform write operations against the GitHub API or repository contents. They use actions/checkout, actions/cache, actions/upload-artifact, run Playwright tests, and call external services (Vercel, Datadog) using secrets. These all function with a read-only contents permission. The simplest, least-invasive fix is therefore:

Add a top-level permissions: block immediately after the name: (or before on:) that sets contents: read.
This will apply to all jobs in the workflow, including the fe-limit job at line 135 that CodeQL specifically flagged, without changing any functional behavior.

No new methods, definitions, or imports are needed; it’s just a YAML configuration change.

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -1,5 +1,8 @@
             name: Synthetic DO Load Balanced Infrastructure Monitoring tests
+            permissions:
+              contents: read
             # This workflow validates the DigitalOcean global infrastructure endpoints
             # by running e2e tests against a Vercel preview configured to use DO RPC/LCD/SQS.
             # Used for migration validation before shifting traffic from GCP to DO.

.github/workflows/monitoring-do-e2e-tests.yml

Comment on lines +185 to +223

+                  runs-on: ubuntu-latest
+                  needs: [server-e2e-tests, fe-limit]
+                  if: failure() && github.run_attempt == 2 && needs.fe-limit.outputs.unexpected > 1
+                  steps:
+                    - name: Install Datadog CI
+                      run: |
+                        echo "Installing Datadog CI..."
+                        curl -L --fail "https://github.com/DataDog/datadog-ci/releases/download/v4.1.2/datadog-ci_linux-x64" --output "/usr/local/bin/datadog-ci"
+                        chmod +x /usr/local/bin/datadog-ci
+                        echo "Datadog CI installed"
+                    - name: Verify Datadog CI Installation
+                      run: |
+                        echo "Verifying Datadog CI installation..."
+                        datadog-ci version
+                        echo "Datadog CI is ready to use"
+                    - name: Send Datadog alert for DO infrastructure failure
+                      env:
+                        DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
+                        DD_APP_KEY: ${{ secrets.DATADOG_APPLICATION_KEY }}
+                        DD_SITE: ${{ secrets.DATADOG_SITE }}
+                      run: |
+                        echo "Sending DO infrastructure failure metrics to Datadog..."
+                        # Tag as DO infrastructure failure
+                        datadog-ci tag --level pipeline \
+                          --tags "critical_failure:true" \
+                          --tags "infrastructure:digitalocean" \
+                          --tags "dd_gh_run_attempt:${{ github.run_attempt }}"
+                        # Add tags for unexpected failure counts
+                        datadog-ci tag --level pipeline \
+                          --tags "do_fe_limit_unexpected:${{ needs.fe-limit.outputs.unexpected }}"
+                        echo "Metrics sent to Datadog successfully"
+                # Clean up deployments
+                delete-deployments:
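
The `fe-bot-alert` condition in the hunk above compares a job output, which is always a string, with the number 1. The Python sketch below models that gate under the assumption that GitHub Actions coerces the string to a number before comparing: an empty string becomes 0 and non-numeric text becomes NaN. The helper names are hypothetical, `upstream_failed` stands in for `failure()`, and the coercion is only an approximation of the expression engine.

```python
import json
import math

def coerce_number(value: str) -> float:
    """Approximate string-to-number coercion in a workflow expression."""
    if value == "":
        return 0.0  # assumption: an empty string is treated as 0
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return math.nan
    # Only JSON numbers count; "null", "true" and quoted strings do not.
    if isinstance(parsed, bool) or not isinstance(parsed, (int, float)):
        return math.nan
    return float(parsed)

def should_alert(upstream_failed: bool, run_attempt: int, unexpected: str) -> bool:
    """Mirror: failure() && github.run_attempt == 2 && unexpected > 1."""
    # Any comparison with NaN is False, so unparsable output never alerts.
    return upstream_failed and run_attempt == 2 and coerce_number(unexpected) > 1

if __name__ == "__main__":
    for output in ("3", "1", "", "null"):
        print(repr(output), should_alert(True, 2, output))
```

Under this model, an empty or `null` output (for example, when the report file is missing) never triggers the alert, even on the second attempt.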

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix

AI 28 days ago

In general, fix this by adding an explicit permissions: block that scopes the GITHUB_TOKEN to the minimum required. Where jobs only need to read repository contents or metadata, use contents: read. Where a job must modify deployments (like delete-deployments), grant deployments: write for that job only. This documents the workflow’s needs and prevents accidental escalation if org/repo defaults change.

For this workflow, the simplest and least intrusive fix is:

Add a root-level permissions: block (just below on:) that sets the token to read-only: contents: read.
Add a job-level permissions: block to delete-deployments that grants the additional deployments: write permission that actions/github-script needs to list, mark inactive, and delete deployments.
Leave other jobs (e.g., resolve-do-preview, server-e2e-tests, fe-limit, fe-bot-alert) with the inherited read-only permissions, since they do not appear to use write operations on the GitHub API.

Concretely:

In .github/workflows/monitoring-do-e2e-tests.yml, insert:
```
permissions:
  contents: read
```
after the on: block (around lines 16-18, before env:).
In the delete-deployments job definition (line 221 onwards), insert:
```
  permissions:
    contents: read
    deployments: write
```
immediately under delete-deployments: and before runs-on:. This overrides the workflow default for that job only and grants the minimal required write scope.

No new imports or external libraries are needed, just YAML changes to the workflow.
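
The suggestion above repeats `contents: read` inside the `delete-deployments` job. A likely reason is that a job-level `permissions:` block replaces the workflow-level block rather than merging with it, and any scope not listed is set to none. The Python sketch below models that behavior in simplified form. The function names are hypothetical, and the model ignores details such as the always-readable metadata scope.

```python
from typing import Dict, Optional

Permissions = Dict[str, str]

def effective_permissions(
    workflow_level: Optional[Permissions],
    job_level: Optional[Permissions],
) -> Optional[Permissions]:
    """Return the block that applies to a job (simplified model)."""
    if job_level is not None:
        return dict(job_level)  # job-level block replaces, it does not merge
    if workflow_level is not None:
        return dict(workflow_level)
    return None  # no explicit block: repository or organization defaults apply

def access(perms: Optional[Permissions], scope: str) -> str:
    """Look up one scope; unlisted scopes in an explicit block are 'none'."""
    if perms is None:
        return "default"
    return perms.get(scope, "none")

WORKFLOW = {"contents": "read"}

# As suggested above: the job restates contents and adds deployments.
delete_job = effective_permissions(
    WORKFLOW, {"contents": "read", "deployments": "write"}
)

# What the job would get if contents were not restated.
write_only_job = effective_permissions(WORKFLOW, {"deployments": "write"})

# A job without its own block, such as fe-bot-alert.
inherited_job = effective_permissions(WORKFLOW, None)

if __name__ == "__main__":
    print(access(delete_job, "contents"), access(delete_job, "deployments"))
    print(access(write_only_job, "contents"))
    print(access(inherited_job, "deployments"))
```

In this model, dropping `contents: read` from the job-level block would leave that job with no contents access, which is why the suggested change lists both scopes.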

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -15,6 +15,9 @@
                     type: boolean
               # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
+            permissions:
+              contents: read
             # DO endpoint configuration
             env:
               DO_BRANCH: "jason/do-endpoints-e2e"
@@ -219,6 +222,9 @@
               # Clean up deployments
               delete-deployments:
+                permissions:
+                  contents: read
+                  deployments: write
                 runs-on: ubuntu-latest
                 if: always()
                 needs: [server-e2e-tests, fe-limit, fe-bot-alert]

.github/workflows/monitoring-do-e2e-tests.yml

Comment on lines +224 to +252

+                  runs-on: ubuntu-latest
+                  if: always()
+                  needs: [server-e2e-tests, fe-limit, fe-bot-alert]
+                  steps:
+                    - name: Delete Previous deployments
+                      uses: actions/github-script@v7
+                      with:
+                        debug: true
+                        script: |
+                          const deployments = await github.rest.repos.listDeployments({
+                             owner: context.repo.owner,
+                             repo: context.repo.repo,
+                             sha: context.sha
+                           });
+                           await Promise.all(
+                             deployments.data.map(async (deployment) => {
+                               await github.rest.repos.createDeploymentStatus({
+                                 owner: context.repo.owner,
+                                 repo: context.repo.repo,
+                                 deployment_id: deployment.id,
+                                 state: 'inactive'
+                               });
+                               return github.rest.repos.deleteDeployment({
+                                owner: context.repo.owner,
+                                repo: context.repo.repo,
+                                deployment_id: deployment.id
+                              });
+                             })
+                           );

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix

AI 28 days ago

At a high level, the fix is to define explicit GITHUB_TOKEN permissions for this workflow, reducing them to the minimum needed. The simplest and safest approach is to add a permissions: block at the workflow root that grants read-only access to repository contents and other resources by default, and then override permissions for specific jobs that need additional write scopes.

For this workflow, most jobs run tests and upload artifacts; they only need contents: read (plus id-token: write if they use OIDC, which is not shown here). The fe-bot-alert job only sends data to Datadog using API keys stored in secrets and does not call GitHub APIs, so it can use the workflow-level read-only permissions. The delete-deployments job uses the GitHub REST API (via actions/github-script) to list deployments and then update and delete them, which requires deployments: write. To implement the fix without changing existing behavior, we will:

Add a workflow-level permissions: block after the on: section, setting contents: read as the default (and you could add other read scopes if needed elsewhere).
Add a job-level permissions: block under delete-deployments: granting deployments: write (and keeping contents: read), so that this job has the rights it needs while other jobs stay read-only.

All changes are confined to .github/workflows/monitoring-do-e2e-tests.yml in the shown regions: one insertion near the top (after the on: block, around line 16–18), and one insertion under the delete-deployments job (around line 221–223). No new imports or external dependencies are required.

Suggested changeset 1

.github/workflows/monitoring-do-e2e-tests.yml

@@ -15,6 +15,9 @@
                     type: boolean
               # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
+            permissions:
+              contents: read
             # DO endpoint configuration
             env:
               DO_BRANCH: "jason/do-endpoints-e2e"
@@ -222,6 +225,9 @@
                 runs-on: ubuntu-latest
                 if: always()
                 needs: [server-e2e-tests, fe-limit, fe-bot-alert]
+                permissions:
+                  contents: read
+                  deployments: write
                 steps:
                   - name: Delete Previous deployments
                     uses: actions/github-script@v7


          Remove schedule trigger - will be dispatched from stage

641a1d3
