Playwright automation for the UK NDR portal. The workflow signs in, cleans stale packages from Your Downloads, runs Data Discovery filtering, creates a Session Basket package, and downloads the resulting files to a local folder.
- Sign in to https://ndr.nstauthority.co.uk/.
- Open `Your Downloads` before every run and expire any visible `Expire Manually` packages.
- Open `Data Discovery`.
- Filter the `Project IDs Table` by `NDR_PROJECT_ID`.
- Open `View Files` for the matching project.
- Filter the `Files Table` by `NDR_INFO_TAG`.
- Select files using the Files Table header checkbox (left of `Classification Tags`).
- Add selected files to `Session Basket`.
- Open `Session Basket`, select the package header checkbox, and click `Create Download Package from Selection`.
- Return to `Your Downloads`.
- Wait until one or more green download buttons are clickable.
- Close the top `Information Message` banner immediately before each download click.
- Click all visible green download buttons one by one.
- Detect partial download files (`.crdownload`, `.part`, `.tmp`) and wait until they resolve into final files (`.zip`, `.sgy`, `.segy`, or other extensions).
- Save screenshots and logs under `artifacts/`, then print a success line with the downloaded file path.
The project creates these folders automatically if they do not already exist:
- `/Users/tuna/Documents/ndr-auto-download/artifacts`
- `/Users/tuna/Documents/ndr-auto-download/downloaded`
- `/Users/tuna/Documents/ndr-auto-download/browser-profile`
- `/Users/tuna/Documents/ndr-auto-download/checkpoints`
You do not need to create them manually after cloning.
- Node.js 18+ (Node 20+ recommended)
- npm
- A valid UK NDR account
- Enough local disk space for the files you plan to download, especially for large SEGY downloads
macOS (Homebrew):

```bash
brew install node
```

Debian/Ubuntu:

```bash
sudo apt-get update
sudo apt-get install -y nodejs npm
```

Verify the installation:

```bash
node -v
npm -v
```

From the project root:
```bash
cd /Users/tuna/Documents/ndr-auto-download
npm ci
```

If you are developing locally and intentionally need npm to refresh the lockfile, you can use:

```bash
npm install
```

This installs `node_modules/` locally. `node_modules/` is ignored by git.
The workflow uses Chromium through Playwright.
```bash
npx playwright install chromium
```

If your Linux machine allows package installation (local machine, VM, container, or cluster image you control), this is the most complete setup:

```bash
npx playwright install --with-deps chromium
```

The script will use this local macOS Playwright Chromium path if it exists:
```
/Users/tuna/Library/Caches/ms-playwright/chromium-1208/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing
```
If that path does not exist, the script falls back to Playwright's default Chromium installation.
You can force a specific Chromium or Chrome binary with:
```bash
export NDR_CHROMIUM_PATH='/absolute/path/to/chrome-or-chromium'
```

This is useful on clusters, shared Linux machines, or any machine where the browser is installed in a non-default location.
This is the normal mode for batch runs, remote machines, and clusters.
```bash
export NDR_HEADLESS='true'
```

If `NDR_HEADLESS` is unset, the script still runs headless by default.
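The headless default can be expressed as a one-line predicate: headless unless the variable is explicitly `'false'`. This is a sketch of the rule as described, not the script's actual code:

```javascript
// Sketch: headless unless NDR_HEADLESS is explicitly 'false' (illustrative).
// Unset or any other value (including 'true') means headless.
function isHeadless(env = process.env) {
  return env.NDR_HEADLESS !== 'false';
}
```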
Use this for local debugging when you want to watch the browser.
```bash
export NDR_HEADLESS='false'
```

Example visible run:
```bash
NDR_EMAIL='your-email' \
NDR_PASSWORD='your-password' \
NDR_HEADLESS='false' \
./run.sh SH__1984seis0004 FINAL_POST_STACK 4
```

A cluster or terminal-only machine is fine as long as you run headless.
- Install Node.js and npm.
- Install project dependencies: `npm install`
- Install Chromium for Playwright: `npx playwright install chromium`
- Run in headless mode (default).
- No visible desktop UI is required when running headless.
- The machine must have outbound network access to the NDR site.
- The machine must have enough writable disk space for:
- the persistent browser profile
- temporary browser-managed storage
- the final downloaded files
- If the cluster blocks GUI libraries, use a browser binary already configured for that environment and point to it with `NDR_CHROMIUM_PATH`.
- Non-headless runs on a cluster generally require a display server (or something like Xvfb). Unless you explicitly need visual debugging, keep cluster runs headless.
The workflow now uses a persistent browser profile in browser-profile/ instead of a fresh temporary browser context. This is important because some large downloads (especially SEGY downloads) rely on browser-managed temporary storage before the final file is written. A persistent profile behaves more like a normal desktop browser and avoids a class of download failures seen with temporary contexts.
Set these environment variables before running:
- `NDR_EMAIL`
- `NDR_PASSWORD`

Example:

```bash
export NDR_EMAIL='your-email'
export NDR_PASSWORD='your-password'
```

Default single-job run:
```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh
```

Explicit single-job run:

```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh <PROJECT_ID> <INFO_TAG> <ATTEMPTS>
```

Example:

```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh TT__1993seis0002 ACQUISITION_REPORT 4
```

Equivalent npm command:

```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:single -- TT__1993seis0002 ACQUISITION_REPORT 4
```

Batch run:

```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh --batch ./download_jobs2.json 4
```

Or:

```bash
NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:batch
```

`npm run download:batch` uses the batch file configured by `NDR_BATCH_FILE`, or defaults to `download_jobs.json` if present.
Supported shapes:
- a top-level object with a `jobs` array
- a top-level array
Example:

```json
{
  "jobs": [
    {
      "NDR_PROJECT_ID": "TT__1993seis0002",
      "NDR_INFO": "ACQUISITION_REPORT"
    },
    {
      "NDR_PROJECT_ID": "PU__2009seis0001",
      "NDR_INFO": "ACQUISITION_REPORT"
    }
  ]
}
```

Per item:

- `NDR_PROJECT_ID`
- `NDR_INFO`
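The two supported shapes can be normalized with a small helper before iterating jobs. This is an illustrative sketch, not the project's actual batch parser; `normalizeBatchFile` is a hypothetical name:

```javascript
// Sketch: accept either supported batch-file shape (illustrative only):
// a top-level array of jobs, or a top-level object with a "jobs" array.
function normalizeBatchFile(parsed) {
  if (Array.isArray(parsed)) return parsed;
  if (parsed && Array.isArray(parsed.jobs)) return parsed.jobs;
  throw new Error('Batch file must be a jobs array or an object with a "jobs" array');
}

const jobs = normalizeBatchFile({
  jobs: [{ NDR_PROJECT_ID: 'TT__1993seis0002', NDR_INFO: 'ACQUISITION_REPORT' }],
});
console.log(jobs.length); // prints 1
```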
- `NDR_EMAIL`: login email
- `NDR_PASSWORD`: login password
- `NDR_HEADLESS`: `true` or `false`
- `NDR_DOWNLOAD_TIMEOUT_MS`: override download polling timeout for large files
- `NDR_BATCH_FILE`: batch JSON path when using `npm run download:batch`
- `NDR_BATCH_MAX_ATTEMPTS`: attempts per batch job
- `NDR_BROWSER_PROFILE_DIR`: override the persistent browser profile path
- `NDR_CHROMIUM_PATH`: override the browser executable path
- Screenshots and failure logs: `/Users/tuna/Documents/ndr-auto-download/artifacts`
- Downloaded files: `/Users/tuna/Documents/ndr-auto-download/downloaded`
- Persistent browser profile: `/Users/tuna/Documents/ndr-auto-download/browser-profile`
- Checkpoint snapshots: `/Users/tuna/Documents/ndr-auto-download/checkpoints`
The workflow treats these as temporary in-progress files:
- `.crdownload`
- `.part`
- `.tmp`
A file is only considered complete when:
- the temporary companion file is gone, and
- the final file exists with size greater than zero
This avoids false positives such as `file.sgy` appearing while `file.sgy.crdownload` still exists.
This warning comes from the browser itself; it does not necessarily refer to the filesystem path where the final file is saved.
It usually means Chromium cannot allocate enough browser-managed temporary storage for the download staging process.
Fixes:
- Free disk space on the local drive.
- Remove old files from `downloaded/`.
- Remove old screenshots/logs from `artifacts/`.
- Reuse the persistent `browser-profile/` directory (already enabled).
- Ensure the profile directory lives on a disk with enough free space.
The Microsoft B2C login path sometimes returns transient server-side errors. The launcher already retries runs, so rerunning is usually sufficient.
For large packages, the site may still be preparing the browser-side transfer. The script waits for actual partial or completed download files and monitors the in-page Downloading modal before deciding the run is stalled.
These paths are gitignored and should stay local-only:
- `node_modules/`
- `artifacts/`
- `artifacts_old/`
- `downloaded/`
- `browser-profile/`
- `checkpoints/`
- local `.env` files
- local logs and machine-specific noise files
- Launcher: `/Users/tuna/Documents/ndr-auto-download/run.sh`
- Main workflow: `/Users/tuna/Documents/ndr-auto-download/scripts/session-basket-download-segy.js`
- Batch runner: `/Users/tuna/Documents/ndr-auto-download/scripts/run-download-batch.js`
- Example batch files: `/Users/tuna/Documents/ndr-auto-download/download_jobs1.json`, `/Users/tuna/Documents/ndr-auto-download/download_jobs2.json`