NDR Auto Download

Playwright automation for the UK NDR portal. The workflow signs in, cleans stale packages from Your Downloads, runs Data Discovery filtering, creates a Session Basket package, and downloads the resulting files to a local folder.

What The Automation Does

Sign in to https://ndr.nstauthority.co.uk/.
Open Your Downloads before every run and expire any visible Expire Manually packages.
Open Data Discovery.
Filter the Project IDs Table by NDR_PROJECT_ID.
Open View Files for the matching project.
Filter the Files Table by NDR_INFO_TAG.
Select files using the Files Table header checkbox (left of Classification Tags).
Add selected files to Session Basket.
Open Session Basket, select the package header checkbox, and click Create Download Package from Selection.
Return to Your Downloads.
Wait until one or more green download buttons are clickable.
Close the top Information Message banner immediately before each download click.
Click all visible green download buttons one by one.
Detect partial download files (.crdownload, .part, .tmp) and wait until they resolve into final files (.zip, .sgy, .segy, or other extensions).
Save screenshots and logs under artifacts/, then print a success line with the downloaded file path.

Runtime Folders

The project creates these folders automatically if they do not already exist:

/Users/tuna/Documents/ndr-auto-download/artifacts
/Users/tuna/Documents/ndr-auto-download/downloaded
/Users/tuna/Documents/ndr-auto-download/browser-profile
/Users/tuna/Documents/ndr-auto-download/checkpoints

You do not need to create them manually after cloning.
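A minimal sketch of how such automatic folder creation could look in Node.js. The folder names come from the list above; the function name `ensureRuntimeFolders` and the `projectRoot` parameter are hypothetical and are not taken from the project's scripts.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Folder names from the list above; everything else here is illustrative.
const RUNTIME_FOLDERS = ['artifacts', 'downloaded', 'browser-profile', 'checkpoints'];

// Creates each runtime folder under projectRoot and returns the full paths.
function ensureRuntimeFolders(projectRoot) {
  return RUNTIME_FOLDERS.map((name) => {
    const dir = path.join(projectRoot, name);
    // recursive: true makes this a no-op when the folder already exists.
    fs.mkdirSync(dir, { recursive: true });
    return dir;
  });
}
```

Calling it twice is safe, which is why no manual setup is needed after cloning.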

Requirements

Node.js 18+ (Node 20+ recommended)
npm
A valid UK NDR account
Enough local disk space for the files you plan to download, especially for large SEGY downloads

Install Node.js

macOS (Homebrew)

brew install node

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y nodejs npm

Verify

node -v
npm -v

If node -v reports a version older than 18 (distribution packages can lag behind), install a newer Node.js release before continuing.

Install Project Dependencies

From the project root:

cd /Users/tuna/Documents/ndr-auto-download
npm ci

If you are developing locally and intentionally need npm to refresh the lockfile, you can use:

npm install

This installs node_modules/ locally. node_modules/ is ignored by git.

Install Playwright Browsers

The workflow uses Chromium through Playwright.

Standard install (local machine)

npx playwright install chromium

Linux machine with package dependency install support

If your Linux machine allows package installation (local machine, VM, container, or cluster image you control), this is the most complete setup:

npx playwright install --with-deps chromium

Existing local browser cache

The script will use this local macOS Playwright Chromium path if it exists:

/Users/tuna/Library/Caches/ms-playwright/chromium-1208/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing

If that path does not exist, the script falls back to Playwright's default Chromium installation.

Override browser executable explicitly

You can force a specific Chromium or Chrome binary with:

export NDR_CHROMIUM_PATH='/absolute/path/to/chrome-or-chromium'

This is useful on clusters, shared Linux machines, or any machine where the browser is installed in a non-default location.
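
The browser selection described in this section can be sketched roughly as follows. This is an illustration, not the project's code: `resolveChromiumPath` is a hypothetical helper, and the precedence (explicit override first, then the local cache path, then Playwright's default) is an assumption based on the word "force" above.

```javascript
const fs = require('node:fs');

// Hypothetical helper. Returns an executable path, or undefined to let
// Playwright use its default Chromium installation.
// The `exists` parameter is injectable so the logic can be checked without a real browser.
function resolveChromiumPath(env, cachedPath, exists = fs.existsSync) {
  const override = (env.NDR_CHROMIUM_PATH || '').trim();
  if (override) return override; // explicit override wins (assumed precedence)
  if (cachedPath && exists(cachedPath)) return cachedPath; // local cache, if present
  return undefined; // fall back to Playwright's default
}
```

A value returned this way would typically be passed to Playwright as the `executablePath` launch option; leaving that option undefined keeps Playwright's bundled browser.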

Headless vs Non-Headless

Headless (default)

This is the normal mode for batch runs, remote machines, and clusters.

export NDR_HEADLESS='true'

If NDR_HEADLESS is unset, the script still runs headless by default.

Non-headless (visible browser)

Use this for local debugging when you want to watch the browser.

export NDR_HEADLESS='false'

Example visible run:

NDR_EMAIL='your-email' \
NDR_PASSWORD='your-password' \
NDR_HEADLESS='false' \
./run.sh SH__1984seis0004 FINAL_POST_STACK 4
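
One way the headless default could be derived from NDR_HEADLESS is sketched below. The helper name `parseHeadless` and the rule that only an explicit "false" disables headless mode are assumptions; the actual script may treat other values differently.

```javascript
// Hypothetical parser: unset means headless, and only "false"
// (case-insensitive, surrounding whitespace ignored) turns headless mode off.
function parseHeadless(value) {
  if (value === undefined || value === null) return true; // unset -> headless
  return String(value).trim().toLowerCase() !== 'false';
}
```

For example, `parseHeadless(process.env.NDR_HEADLESS)` yields `true` when the variable is not set.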

Cluster / Server Notes

A cluster or terminal-only machine is fine as long as you run headless.

Recommended cluster setup

Install Node.js and npm.
Install project dependencies:

npm ci

Install Chromium for Playwright:

npx playwright install chromium

Run in headless mode (default).

Important notes for clusters

No visible desktop UI is required when running headless.
The machine must have outbound network access to the NDR site.
The machine must have enough writable disk space for:
- the persistent browser profile
- temporary browser-managed storage
- the final downloaded files
If the cluster blocks GUI libraries, use a browser binary already configured for that environment and point to it with NDR_CHROMIUM_PATH.
Non-headless runs on a cluster generally require a display server (or something like Xvfb). Unless you explicitly need visual debugging, keep cluster runs headless.

Why the persistent browser profile matters

The workflow uses a persistent browser profile in browser-profile/ rather than a fresh temporary browser context. This matters because some large downloads (especially SEGY downloads) rely on browser-managed temporary storage before the final file is written. A persistent profile behaves more like a normal desktop browser and avoids a class of download failures seen with temporary contexts.

Credentials

Set these environment variables before running:

NDR_EMAIL
NDR_PASSWORD

Example:

export NDR_EMAIL='your-email'
export NDR_PASSWORD='your-password'
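
A small sketch of a fail-fast credential check, assuming the script reads both variables from the environment at startup. The function name `readCredentials` and the error wording are hypothetical.

```javascript
// Hypothetical startup check: reports every missing variable in one error
// instead of failing later in the middle of the login flow.
function readCredentials(env) {
  const missing = ['NDR_EMAIL', 'NDR_PASSWORD'].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return { email: env.NDR_EMAIL, password: env.NDR_PASSWORD };
}
```

In a real run this would be called as `readCredentials(process.env)`.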

Run The Workflow

Single job

Default single-job run:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh

Explicit single-job run:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh <PROJECT_ID> <INFO_TAG> <ATTEMPTS>

Example:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh TT__1993seis0002 ACQUISITION_REPORT 4

Equivalent npm command:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:single -- TT__1993seis0002 ACQUISITION_REPORT 4

Batch jobs from JSON

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh --batch ./download_jobs2.json 4

Or:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:batch

npm run download:batch uses the batch file named by NDR_BATCH_FILE. If that variable is unset, it falls back to download_jobs.json when that file exists.
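
The batch file lookup for npm run download:batch might be implemented along these lines. This is a sketch under assumptions: `resolveBatchFile` is a hypothetical name, the fallback is looked up relative to the working directory, and returning `null` when nothing is found is an illustrative choice.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical lookup: NDR_BATCH_FILE takes priority; otherwise use
// download_jobs.json from the working directory when that file exists.
function resolveBatchFile(env, cwd, exists = fs.existsSync) {
  if (env.NDR_BATCH_FILE) return path.resolve(cwd, env.NDR_BATCH_FILE);
  const fallback = path.join(cwd, 'download_jobs.json');
  return exists(fallback) ? fallback : null;
}
```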

Batch File Format

Supported shapes:

top-level object with jobs
top-level array

Example:

{
  "jobs": [
    {
      "NDR_PROJECT_ID": "TT__1993seis0002",
      "NDR_INFO": "ACQUISITION_REPORT"
    },
    {
      "NDR_PROJECT_ID": "PU__2009seis0001",
      "NDR_INFO": "ACQUISITION_REPORT"
    }
  ]
}

Per item:

NDR_PROJECT_ID
NDR_INFO
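
Both supported shapes can be reduced to one job list before processing. The sketch below is illustrative only: `normalizeJobs` and the output field names `projectId` and `infoTag` are hypothetical, while the input keys NDR_PROJECT_ID and NDR_INFO come from the example above.

```javascript
// Hypothetical normalizer: accepts either a top-level array or an object
// with a "jobs" array, and validates the two per-item keys.
function normalizeJobs(parsed) {
  const list = Array.isArray(parsed) ? parsed : parsed && parsed.jobs;
  if (!Array.isArray(list)) {
    throw new Error('Batch file must be an array or an object with a "jobs" array');
  }
  return list.map((job, index) => {
    if (!job || !job.NDR_PROJECT_ID || !job.NDR_INFO) {
      throw new Error(`Job ${index} needs NDR_PROJECT_ID and NDR_INFO`);
    }
    return { projectId: job.NDR_PROJECT_ID, infoTag: job.NDR_INFO };
  });
}
```

Typical use would be `normalizeJobs(JSON.parse(fs.readFileSync(batchPath, 'utf8')))`, where `batchPath` is whichever batch file was selected.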

Useful Environment Variables

NDR_EMAIL: login email
NDR_PASSWORD: login password
NDR_HEADLESS: true or false
NDR_DOWNLOAD_TIMEOUT_MS: download polling timeout in milliseconds; raise it for large files
NDR_BATCH_FILE: batch JSON path when using npm run download:batch
NDR_BATCH_MAX_ATTEMPTS: attempts per batch job
NDR_BROWSER_PROFILE_DIR: override the persistent browser profile path
NDR_CHROMIUM_PATH: override the browser executable path
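
Numeric variables such as NDR_DOWNLOAD_TIMEOUT_MS and NDR_BATCH_MAX_ATTEMPTS need parsing before use. A minimal sketch, assuming invalid or missing values should fall back to a caller-supplied default; the helper name `parsePositiveInt` is hypothetical, and the project's real defaults are not stated here.

```javascript
// Hypothetical helper: returns the parsed positive integer, or the fallback
// when the value is missing, non-numeric, zero, or negative.
function parsePositiveInt(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}
```

For instance, `parsePositiveInt(process.env.NDR_BATCH_MAX_ATTEMPTS, 4)` would use 4 attempts unless the variable holds a valid positive number; the 4 is only an example value.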

Output

Screenshots and failure logs:
- /Users/tuna/Documents/ndr-auto-download/artifacts
Downloaded files:
- /Users/tuna/Documents/ndr-auto-download/downloaded
Persistent browser profile:
- /Users/tuna/Documents/ndr-auto-download/browser-profile
Checkpoint snapshots:
- /Users/tuna/Documents/ndr-auto-download/checkpoints

Download Detection Rules

The workflow treats these as temporary in-progress files:

.crdownload
.part
.tmp

A file is considered complete only when:

the temporary companion file is gone, and
the final file exists with size greater than zero

This avoids false positives such as:

file.sgy appearing while file.sgy.crdownload still exists
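
These rules can be expressed as a small check over the download folder. The sketch below is an assumption-laden illustration: `isDownloadComplete` is a hypothetical name, and it assumes the temporary companion is named by appending the suffix to the final file name, as in the file.sgy.crdownload example above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Temporary suffixes listed above.
const TEMP_SUFFIXES = ['.crdownload', '.part', '.tmp'];

// Returns true only when no temporary companion exists and the final
// file is present with a size greater than zero.
function isDownloadComplete(dir, fileName) {
  // The candidate itself is a temporary file, so it cannot be a final file.
  if (TEMP_SUFFIXES.some((suffix) => fileName.endsWith(suffix))) return false;
  // A lingering companion such as file.sgy.crdownload means "still in progress".
  const hasTempCompanion = TEMP_SUFFIXES.some((suffix) =>
    fs.existsSync(path.join(dir, fileName + suffix)));
  if (hasTempCompanion) return false;
  const finalPath = path.join(dir, fileName);
  return fs.existsSync(finalPath) && fs.statSync(finalPath).size > 0;
}
```

A polling loop would call this repeatedly until it returns true or the download timeout elapses.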

Troubleshooting

`You do not have enough browser storage space to download this SEGY file`

This warning comes from the browser and does not necessarily reflect the free space at the filesystem path where the final file is saved.

It usually means Chromium cannot allocate enough browser-managed temporary storage for the download staging process.

Fixes:

Free disk space on the local drive.
Remove old files from downloaded/.
Remove old screenshots/logs from artifacts/.
Reuse the persistent browser-profile/ directory (already enabled).
Ensure the profile directory lives on a disk with enough free space.

Login occasionally fails with auth errors

The Microsoft B2C login path sometimes returns transient server-side errors. The launcher already retries runs, so rerunning is usually sufficient.

A green download button is visible but nothing downloads yet

For large packages, the site may still be preparing the browser-side transfer. The script waits for actual partial or completed download files and monitors the in-page Downloading modal before deciding the run is stalled.

Git / Repository Hygiene

These paths are gitignored and should stay local-only:

node_modules/
artifacts/
artifacts_old/
downloaded/
browser-profile/
checkpoints/
local .env files
local logs and machine-specific noise files

Main Files

Launcher: /Users/tuna/Documents/ndr-auto-download/run.sh
Main workflow: /Users/tuna/Documents/ndr-auto-download/scripts/session-basket-download-segy.js
Batch runner: /Users/tuna/Documents/ndr-auto-download/scripts/run-download-batch.js
Example batch files:
- /Users/tuna/Documents/ndr-auto-download/download_jobs1.json
- /Users/tuna/Documents/ndr-auto-download/download_jobs2.json
