Add DigitalOcean infrastructure e2e monitoring workflow #4259

Draft
jasbanza wants to merge 5 commits into stage from jason/do-endpoints-e2e

@jasbanza
Member

What is the purpose of the change:

WIP. This PR lets us run Playwright tests against Vercel preview deployments that are configured to use DigitalOcean endpoints.

NB: The endpoints must be set manually in Vercel secrets, scoped to this specific branch.

Linear Task

MER-59: Set up DigitalOcean Migration monitoring

Brief Changelog

  • Adds workflows on this branch specifically to support automated testing.

Testing and Verifying

  • Additional tweaks may be needed once testing begins.
    • Simplify the Playwright tests, since the existing ones are still flaky.
  • This is a work in progress; the PR will be updated when it is ready for review.

@vercel

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
|---|---|---|---|
| osmosis-frontend | Ready | Preview, Comment | Jan 27, 2026 3:02pm |

4 Skipped Deployments

| Project | Deployment | Review | Updated (UTC) |
|---|---|---|---|
| osmosis-frontend-datadog | Ignored | | Jan 27, 2026 3:02pm |
| osmosis-frontend-dev | Ignored | | Jan 27, 2026 3:02pm |
| osmosis-frontend-edgenet | Ignored | | Jan 27, 2026 3:02pm |
| osmosis-testnet | Ignored | | Jan 27, 2026 3:02pm |

Comment on lines +29 to +53
    runs-on: ubuntu-latest
    outputs:
      environment_url: ${{ steps.vercel.outputs.environment_url }}
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Get DO branch Vercel preview URL
        id: vercel
        env:
          BRANCH_NAME: ${{ env.DO_BRANCH }}
          FILTER_BRANCH: ${{ env.DO_BRANCH }}
          VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
          VERCEL_PROJECT: ${{ secrets.VERCEL_PROJECT }}
        run: |
          echo "Looking for latest READY deployment for branch: $FILTER_BRANCH"
          cd .github && python await_deployment.py
      - name: Echo resolved URL
        env:
          environment_url: ${{ steps.vercel.outputs.environment_url }}
        run: |
          echo "DO Preview URL: https://$environment_url"
          echo "This preview is configured to use DO endpoints via Vercel branch env vars"

  # Server synthetic tests against DO SQS
  server-e2e-tests:
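The lookup itself lives in `.github/await_deployment.py`, whose source is not shown in this PR view. Below is a minimal sketch of the selection it presumably performs. It assumes that each Vercel deployment object carries `state`, `url` (a bare hostname, which matches the `https://$environment_url` echo above), `created` (epoch milliseconds) and `meta.githubCommitRef`. The HTTP request and any polling/retry loop are left out, and both function names are hypothetical.

```python
import os
from typing import Optional


def latest_ready_url(deployments: list, branch: str) -> Optional[str]:
    """Hypothetical helper: hostname of the newest READY deployment for `branch`.

    Assumed item shape (trimmed Vercel deployment object):
    {"url": "x.vercel.app", "state": "READY", "created": <epoch ms>,
     "meta": {"githubCommitRef": "<branch>"}}
    """
    candidates = [
        d for d in deployments
        if d.get("state") == "READY"
        and d.get("meta", {}).get("githubCommitRef") == branch
    ]
    if not candidates:
        return None  # caller decides whether to keep polling or fail the job
    newest = max(candidates, key=lambda d: d.get("created", 0))
    return newest["url"]


def write_step_output(name: str, value: str) -> None:
    """Hypothetical helper: append `name=value` to $GITHUB_OUTPUT."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path is None:
        # Outside GitHub Actions: print instead, for local debugging.
        print(f"{name}={value}")
        return
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```

In the job, calling `write_step_output("environment_url", url)` from the step with `id: vercel` is what would make `steps.vercel.outputs.environment_url` available to the job's `outputs` block.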

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 28 days ago)

To fix the issue, add an explicit permissions block that limits the GITHUB_TOKEN to the minimal required scope. From the provided snippet, the jobs only need to read the repository (for actions/checkout) and do not appear to perform any GitHub write operations. A safe minimal configuration is contents: read at the workflow root so it applies to all jobs that don’t override it.

Concretely, in .github/workflows/monitoring-do-e2e-tests.yml, add a top‑level permissions: block after the on: section (e.g., after line 16). This block should specify contents: read. No changes are needed to individual jobs unless some job (not shown) truly needs additional scopes; based on the provided snippet, that is not the case. No imports or external tools are required; this is purely a YAML configuration change within the workflow file.
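The warning fires for jobs where neither the workflow root nor the job itself declares `permissions`. The sketch below roughly approximates that check over a workflow that has already been parsed into a plain dict (for example with a YAML loader, which is assumed here). It is not CodeQL's actual query, and the helper name is hypothetical.

```python
def jobs_without_permissions(workflow: dict) -> list[str]:
    """Hypothetical helper: ids of jobs whose GITHUB_TOKEN uses repo/org defaults.

    Rough approximation of the CodeQL warning: a job is unrestricted when the
    workflow has no top-level `permissions` key and the job declares none.
    """
    if "permissions" in workflow:
        return []  # every job inherits (or overrides) the workflow-level block
    jobs = workflow.get("jobs", {})
    return [job_id for job_id, job in jobs.items() if "permissions" not in job]
```

Adding the single root-level block proposed here empties that list for every job at once. That is why one patch resolves all of the warnings on this file.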

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -15,6 +15,9 @@
         type: boolean
   # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
 
+permissions:
+  contents: read
+
 # DO endpoint configuration
 env:
   DO_BRANCH: "jason/do-endpoints-e2e"
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines 54 to 90
    name: do-server-tests
    needs: resolve-do-preview
    runs-on: ubuntu-latest
    if: ${{ github.event.inputs.skip_server_tests != 'true' }}
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 22.x

      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: "**/node_modules"
          key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.OS }}-22.x-

      - name: Install Dependencies
        run: yarn install --frozen-lockfile

      - name: Echo DO SQS Server URL
        run: |
          echo "Testing SQS Server URL: ${{ env.DO_SQS_ENDPOINT }}"

      - name: Run Server Tests against DO SQS
        id: tests
        run: yarn test:e2e --filter=@osmosis-labs/server
        env:
          NEXT_PUBLIC_SIDECAR_BASE_URL: ${{ env.DO_SQS_ENDPOINT }}
          NEXT_PUBLIC_TIMESERIES_DATA_URL: https://data.app.osmosis.zone

  # FE swap monitoring tests (US region - no proxy)
  fe-swap-us:
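The test jobs cache `**/node_modules` under a key built from the runner OS, the Node version and a hash of every `yarn.lock`. Below is a simplified model of how `actions/cache` resolves such a key. An exact key match wins. Otherwise each `restore-keys` prefix is tried in order, and the most recently created matching entry is restored. `approx_hash_files` only approximates `hashFiles` (which SHA-256-hashes each matched file and then hashes the combined result), and branch scoping is ignored. Both helpers are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def approx_hash_files(paths: list[Path]) -> str:
    """Hypothetical helper approximating hashFiles(): hash each file, then the digests.

    GitHub's exact ordering/concatenation may differ; the point is that any
    change to a yarn.lock yields a different cache key.
    """
    outer = hashlib.sha256()
    for path in sorted(paths):
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


def resolve_cache(key: str, restore_keys: list[str],
                  entries: dict[str, int]) -> Optional[str]:
    """Hypothetical helper: entry a simplified actions/cache would restore.

    `entries` maps existing cache keys to their creation time.
    """
    if key in entries:
        return key  # exact hit
    for prefix in restore_keys:  # restore-keys are tried in order
        matches = [k for k in entries if k.startswith(prefix)]
        if matches:
            # Several partial hits: the most recently created one wins.
            return max(matches, key=lambda k: entries[k])
    return None  # cold cache: yarn install starts from scratch
```

With `restore-keys: ${{ runner.OS }}-22.x-`, a lockfile change still restores the newest `node_modules` for the same OS. `yarn install --frozen-lockfile` then only has to reconcile the difference.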

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 28 days ago)

In general, the fix is to add an explicit permissions: block that grants the minimal necessary access to the GITHUB_TOKEN. This can be done at the workflow root (applies to all jobs) or for individual jobs. Since all visible jobs only need to read repository contents, the minimal safe baseline is permissions: { contents: read }.

The best fix here, without changing functionality, is to define a single workflow-level permissions: block near the top of .github/workflows/monitoring-do-e2e-tests.yml, after the on: section and before env: or jobs:. This will satisfy CodeQL’s complaint about the server-e2e-tests job (and all other jobs), while keeping functionality unchanged, because they only require read access to check out code and use caches/artifacts. No additional methods, imports, or definitions are needed; this is purely a YAML configuration change.

Specifically, in .github/workflows/monitoring-do-e2e-tests.yml, insert:

permissions:
  contents: read

between the on: block (ending at line 16) and the env: block (starting at line 19). No other changes are required.

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -15,6 +15,9 @@
         type: boolean
   # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
 
+permissions:
+  contents: read
+
 # DO endpoint configuration
 env:
   DO_BRANCH: "jason/do-endpoints-e2e"
EOF
Comment on lines 92 to 136
    timeout-minutes: 15
    name: do-fe-swap
    needs: resolve-do-preview
    runs-on: macos-14
    steps:
      - name: Echo IP
        run: curl -L "https://ipinfo.io" -s
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          sparse-checkout: |
            packages/e2e
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 22.x
      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: "**/node_modules"
          key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.OS }}-22.x-
      - name: Install Playwright
        run: |
          yarn --cwd packages/e2e install --frozen-lockfile && npx playwright install --with-deps chromium
      - name: Run Swap tests against DO preview
        env:
          BASE_URL: "https://${{ needs.resolve-do-preview.outputs.environment_url }}"
          REST_ENDPOINT: ${{ env.DO_LCD_ENDPOINT }}
          PRIVATE_KEY: ${{ secrets.TEST_PRIVATE_KEY_3 }}
          WALLET_ID: ${{ secrets.TEST_WALLET_ID_3 }}
        run: |
          echo "Testing DO preview at: $BASE_URL"
          cd packages/e2e
          npx playwright test monitoring.swap
      - name: upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: do-swap-test-results-${{ github.run_id }}
          path: packages/e2e/playwright-report

  # FE market order test
  fe-trade:

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 28 days ago)

To fix the problem, explicitly set permissions for the workflow so the GITHUB_TOKEN has only the minimal scopes required. Since the jobs here only read repository contents and upload artifacts, we can safely set contents: read at the workflow level. This documents the requirement and ensures the workflow won’t gain excessive privileges if repo/org defaults change.

The best minimal change is to add a root-level permissions: block near the top of .github/workflows/monitoring-do-e2e-tests.yml, after the on: trigger (or before env: / jobs:). This will apply to all jobs (resolve-do-preview, server-e2e-tests, fe-swap, fe-limit, etc.) that don’t have their own permissions block. No additional imports, methods, or definitions are needed; it’s a pure YAML configuration change.

Concretely:

  • Edit .github/workflows/monitoring-do-e2e-tests.yml.
  • Insert:
permissions:
  contents: read

at the root level (same indentation as on: and env:), for example between the on: block ending at line 16 and the env: block starting at line 19. This constrains the GITHUB_TOKEN for every job and addresses the CodeQL warning.

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -15,6 +15,9 @@
         type: boolean
   # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
 
+permissions:
+  contents: read
+
 # DO endpoint configuration
 env:
   DO_BRANCH: "jason/do-endpoints-e2e"
EOF
Comment on lines +137 to +184
    timeout-minutes: 15
    needs: [resolve-do-preview, fe-swap]
    name: do-fe-limit
    runs-on: macos-14
    outputs:
      unexpected: ${{ steps.set-output.outputs.unexpected }}
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          sparse-checkout: |
            packages/e2e
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 22.x
      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: "**/node_modules"
          key: ${{ runner.OS }}-22.x-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.OS }}-22.x-
      - name: Install Playwright
        run: |
          yarn --cwd packages/e2e install --frozen-lockfile && npx playwright install --with-deps chromium
      - name: Run Limit tests against DO preview
        env:
          BASE_URL: "https://${{ needs.resolve-do-preview.outputs.environment_url }}"
          REST_ENDPOINT: ${{ env.DO_LCD_ENDPOINT }}
          PRIVATE_KEY: ${{ secrets.TEST_PRIVATE_KEY_3 }}
          WALLET_ID: ${{ secrets.TEST_WALLET_ID_3 }}
        run: |
          cd packages/e2e
          npx playwright test monitoring.limit --timeout 180000
      - name: set-output
        if: failure() || success()
        id: set-output
        run: echo "unexpected=$(jq -r '.stats.unexpected' packages/e2e/playwright-report/test-results.json)" >> $GITHUB_OUTPUT
      - name: upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: do-limit-test-results-${{ github.run_id }}
          path: packages/e2e/playwright-report

  # Alert on critical failures (2nd attempt with multiple failures)
  fe-bot-alert:
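The `set-output` step reads `.stats.unexpected` from `packages/e2e/playwright-report/test-results.json`. This assumes the Playwright config (not shown) sends its JSON reporter output to that path. Playwright's JSON report keeps its counts under `stats` (`expected`, `unexpected`, `flaky`, `skipped`). Below is a Python equivalent of the `jq` extraction, written as a hypothetical helper. It returns `None` where the shell step would write an empty `unexpected=` value because `jq` found no file.

```python
import json
from pathlib import Path
from typing import Optional


def unexpected_count(report_path: Path) -> Optional[int]:
    """Hypothetical helper equivalent to `jq -r '.stats.unexpected' <report>`.

    Returns None when the report is missing, unparseable, or has no integer
    `stats.unexpected` value.
    """
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(report, dict):
        return None
    stats = report.get("stats")
    if not isinstance(stats, dict):
        return None
    value = stats.get("unexpected")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```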

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 28 days ago)

In general, to fix this issue you should explicitly define a permissions: block either at the root of the workflow (to apply to all jobs by default) or per-job, and restrict the GITHUB_TOKEN to the least privileges necessary. For workflows that only need to read code and interact with Actions features (like artifacts and cache) but do not push commits, modify releases, or change issues/PRs, contents: read is usually sufficient as a base.

For this specific workflow in .github/workflows/monitoring-do-e2e-tests.yml, none of the shown jobs perform write operations against the GitHub API or repository contents. They use actions/checkout, actions/cache, actions/upload-artifact, run Playwright tests, and call external services (Vercel, Datadog) using secrets. These all function with a read-only contents permission. The simplest, least-invasive fix is therefore:

  • Add a top-level permissions: block immediately after the name: (or before on:) that sets contents: read.
  • This will apply to all jobs in the workflow, including the fe-limit job at line 135 that CodeQL specifically flagged, without changing any functional behavior.

No new methods, definitions, or imports are needed; it’s just a YAML configuration change.

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -1,5 +1,8 @@
 name: Synthetic DO Load Balanced Infrastructure Monitoring tests
 
+permissions:
+  contents: read
+
 # This workflow validates the DigitalOcean global infrastructure endpoints
 # by running e2e tests against a Vercel preview configured to use DO RPC/LCD/SQS.
 # Used for migration validation before shifting traffic from GCP to DO.
EOF
Comment on lines +185 to +223
    runs-on: ubuntu-latest
    needs: [server-e2e-tests, fe-limit]
    if: failure() && github.run_attempt == 2 && needs.fe-limit.outputs.unexpected > 1
    steps:
      - name: Install Datadog CI
        run: |
          echo "Installing Datadog CI..."
          curl -L --fail "https://github.com/DataDog/datadog-ci/releases/download/v4.1.2/datadog-ci_linux-x64" --output "/usr/local/bin/datadog-ci"
          chmod +x /usr/local/bin/datadog-ci
          echo "Datadog CI installed"

      - name: Verify Datadog CI Installation
        run: |
          echo "Verifying Datadog CI installation..."
          datadog-ci version
          echo "Datadog CI is ready to use"

      - name: Send Datadog alert for DO infrastructure failure
        env:
          DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
          DD_APP_KEY: ${{ secrets.DATADOG_APPLICATION_KEY }}
          DD_SITE: ${{ secrets.DATADOG_SITE }}
        run: |
          echo "Sending DO infrastructure failure metrics to Datadog..."

          # Tag as DO infrastructure failure
          datadog-ci tag --level pipeline \
            --tags "critical_failure:true" \
            --tags "infrastructure:digitalocean" \
            --tags "dd_gh_run_attempt:${{ github.run_attempt }}"

          # Add tags for unexpected failure counts
          datadog-ci tag --level pipeline \
            --tags "do_fe_limit_unexpected:${{ needs.fe-limit.outputs.unexpected }}"

          echo "Metrics sent to Datadog successfully"

  # Clean up deployments
  delete-deployments:
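The `fe-bot-alert` guard fires only on a failed second attempt in which the limit suite reported more than one unexpected result. In a job-level `if`, `failure()` is true when any ancestor job failed. Job outputs are strings, and GitHub expressions coerce a string to a number for comparisons. An empty string becomes 0, and a non-numeric string becomes NaN, which never compares greater than 1. The sketch below models the guard with hypothetical helper names. Its coercion only approximates GitHub's JSON-number parsing.

```python
import math


def to_number(value: str) -> float:
    """Hypothetical helper approximating GitHub's string-to-number coercion."""
    if value == "":
        return 0.0  # documented: an empty string coerces to 0
    try:
        return float(value)
    except ValueError:
        return math.nan  # non-numeric strings become NaN


def should_alert(any_needed_job_failed: bool, run_attempt: int,
                 unexpected_output: str) -> bool:
    """Hypothetical model of the fe-bot-alert `if:` expression.

    failure() && github.run_attempt == 2 && needs.fe-limit.outputs.unexpected > 1
    """
    return (
        any_needed_job_failed
        and run_attempt == 2
        # NaN > 1 is False, so a non-numeric output never triggers an alert.
        and to_number(unexpected_output) > 1
    )
```

For example, a second-attempt failure with `unexpected` equal to `"3"` alerts. The same failure with `"1"`, or with an empty output, does not.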

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix (AI, 28 days ago)

In general, fix this by adding an explicit permissions: block that scopes the GITHUB_TOKEN to the minimum required. Where jobs only need to read repository contents or metadata, use contents: read. Where a job must modify deployments (like delete-deployments), grant deployments: write for that job only. This documents the workflow’s needs and prevents accidental escalation if org/repo defaults change.

For this workflow, the simplest and least intrusive fix is:

  • Add a root-level permissions: block (just below on:) that sets the token to read-only: contents: read.
  • Add a job-level permissions: block to delete-deployments that grants the additional deployments: write permission that actions/github-script needs to list, mark inactive, and delete deployments.
  • Leave other jobs (e.g., resolve-do-preview, server-e2e-tests, fe-limit, fe-bot-alert) with the inherited read-only permissions, since they do not appear to use write operations on the GitHub API.

Concretely:

  • In .github/workflows/monitoring-do-e2e-tests.yml, insert:

    permissions:
      contents: read

    after the on: block (around lines 16–18, before env:).

  • In the delete-deployments job definition (line 221 onwards), insert:

      permissions:
        contents: read
        deployments: write

    immediately under delete-deployments: and before runs-on:. This overrides the workflow default for that job only and grants the minimal required write scope.

No new imports or external libraries are needed, just YAML changes to the workflow.
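A job-level `permissions` block replaces the workflow-level block rather than merging with it. Once any block applies, every scope it does not list gets no access. That is why the suggested `delete-deployments` override repeats `contents: read`. The sketch below models that resolution. The helper names and the `override-only` job in the usage check are hypothetical, and the `read-all`/`write-all` shorthands are deliberately not modelled.

```python
from typing import Optional

_RANK = {"none": 0, "read": 1, "write": 2}


def effective_permissions(workflow: dict, job_id: str) -> Optional[dict]:
    """Hypothetical helper: scopes a job's GITHUB_TOKEN receives.

    Returns None when neither level declares `permissions`, meaning the
    repository/organization default applies. A job-level block wins outright.
    """
    job = workflow.get("jobs", {}).get(job_id, {})
    declared = job.get("permissions", workflow.get("permissions"))
    if declared is None:
        return None
    if not isinstance(declared, dict):
        raise ValueError("shorthands such as read-all/write-all are not modelled")
    return dict(declared)


def allows(perms: dict, scope: str, needed: str) -> bool:
    """Hypothetical helper: does `perms` grant at least `needed` on `scope`?

    Scopes missing from a declared block count as "none".
    """
    return _RANK[perms.get(scope, "none")] >= _RANK[needed]
```

Under this model, a job declaring only `deployments: write` would lose `contents: read` and could no longer run `actions/checkout` on a private repository.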

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -15,6 +15,9 @@
         type: boolean
   # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
 
+permissions:
+  contents: read
+
 # DO endpoint configuration
 env:
   DO_BRANCH: "jason/do-endpoints-e2e"
@@ -219,6 +222,9 @@
 
   # Clean up deployments
   delete-deployments:
+    permissions:
+      contents: read
+      deployments: write
     runs-on: ubuntu-latest
     if: always()
     needs: [server-e2e-tests, fe-limit, fe-bot-alert]
EOF
Comment on lines +224 to +252
    runs-on: ubuntu-latest
    if: always()
    needs: [server-e2e-tests, fe-limit, fe-bot-alert]
    steps:
      - name: Delete Previous deployments
        uses: actions/github-script@v7
        with:
          debug: true
          script: |
            const deployments = await github.rest.repos.listDeployments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              sha: context.sha
            });
            await Promise.all(
              deployments.data.map(async (deployment) => {
                await github.rest.repos.createDeploymentStatus({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  deployment_id: deployment.id,
                  state: 'inactive'
                });
                return github.rest.repos.deleteDeployment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  deployment_id: deployment.id
                });
              })
            );
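The cleanup script lists the deployments for `context.sha`, marks each one `inactive`, and then deletes it. GitHub's REST API only lets you delete an inactive deployment (unless it is the repository's only deployment), which is why the status update must come first. The sketch below expresses the same sequence as an ordered plan of REST calls, so it can be inspected without a token. `Request` and `plan_cleanup` are hypothetical. `listDeployments` returns only the first page (30 items by default), so a commit with more deployments would need pagination, for example via `github.paginate`.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical record of one GitHub REST call."""
    method: str
    path: str
    body: Optional[dict] = None


def plan_cleanup(owner: str, repo: str, sha: str,
                 deployment_ids: list[int]) -> list[Request]:
    """Hypothetical helper mirroring the github-script cleanup as REST calls."""
    base = f"/repos/{owner}/{repo}/deployments"
    plan = [Request("GET", f"{base}?sha={sha}")]  # listDeployments
    for dep_id in deployment_ids:
        # createDeploymentStatus: only inactive deployments may be deleted.
        plan.append(Request("POST", f"{base}/{dep_id}/statuses",
                            {"state": "inactive"}))
        # deleteDeployment
        plan.append(Request("DELETE", f"{base}/{dep_id}"))
    return plan
```

The script runs each deployment's pair of calls concurrently via `Promise.all`. Only the order within each pair matters, and the sketch preserves it.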

Code scanning / CodeQL warning: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix (AI, 28 days ago)

At a high level, the fix is to define explicit GITHUB_TOKEN permissions for this workflow, reducing them to the minimum needed. The simplest and safest approach is to add a permissions: block at the workflow root that grants read-only access to repository contents and other resources by default, and then override permissions for specific jobs that need additional write scopes.

For this workflow, most jobs run tests and upload artifacts; they need only contents: read (plus id-token: write if they use OIDC, which is not shown here). The fe-bot-alert job only sends data to Datadog using API keys stored in secrets and does not call GitHub APIs, so it can use the default read-only permissions. The delete-deployments job uses the GitHub REST API (via actions/github-script) to list deployments and then update and delete them, which requires deployments: write. To implement the fix without changing existing behavior, we will:

  • Add a workflow-level permissions: block after the on: section, setting contents: read as the default (and you could add other read scopes if needed elsewhere).
  • Add a job-level permissions: block under delete-deployments: granting deployments: write (and keeping contents: read), so that this job has the rights it needs while other jobs stay read-only.

All changes are confined to .github/workflows/monitoring-do-e2e-tests.yml in the shown regions: one insertion near the top (after the on: block, around line 16–18), and one insertion under the delete-deployments job (around line 221–223). No new imports or external dependencies are required.

Suggested changeset 1
.github/workflows/monitoring-do-e2e-tests.yml

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/monitoring-do-e2e-tests.yml b/.github/workflows/monitoring-do-e2e-tests.yml
--- a/.github/workflows/monitoring-do-e2e-tests.yml
+++ b/.github/workflows/monitoring-do-e2e-tests.yml
@@ -15,6 +15,9 @@
         type: boolean
   # Schedule removed - triggered by dispatch-do-monitoring.yml on stage branch
 
+permissions:
+  contents: read
+
 # DO endpoint configuration
 env:
   DO_BRANCH: "jason/do-endpoints-e2e"
@@ -222,6 +225,9 @@
     runs-on: ubuntu-latest
     if: always()
     needs: [server-e2e-tests, fe-limit, fe-bot-alert]
+    permissions:
+      contents: read
+      deployments: write
     steps:
       - name: Delete Previous deployments
         uses: actions/github-script@v7
EOF