NDR Auto Download

Playwright automation for the UK NDR portal. The workflow signs in, cleans stale packages from Your Downloads, runs Data Discovery filtering, creates a Session Basket package, and downloads the resulting files to a local folder.

What The Automation Does

  1. Sign in to https://ndr.nstauthority.co.uk/.
  2. Open Your Downloads before every run and expire any visible Expire Manually packages.
  3. Open Data Discovery.
  4. Filter the Project IDs Table by NDR_PROJECT_ID.
  5. Open View Files for the matching project.
  6. Filter the Files Table by NDR_INFO_TAG.
  7. Select files using the Files Table header checkbox (left of Classification Tags).
  8. Add selected files to Session Basket.
  9. Open Session Basket, select the package header checkbox, and click Create Download Package from Selection.
  10. Return to Your Downloads.
  11. Wait until one or more green download buttons are clickable.
  12. Close the top Information Message banner immediately before each download click.
  13. Click all visible green download buttons one by one.
  14. Detect partial download files (.crdownload, .part, .tmp) and wait until they resolve into final files (.zip, .sgy, .segy, or other extensions).
  15. Save screenshots and logs under artifacts/, then print a success line with the downloaded file path.

Runtime Folders

The project creates these folders automatically if they do not already exist:

  • /Users/tuna/Documents/ndr-auto-download/artifacts
  • /Users/tuna/Documents/ndr-auto-download/downloaded
  • /Users/tuna/Documents/ndr-auto-download/browser-profile
  • /Users/tuna/Documents/ndr-auto-download/checkpoints

You do not need to create them manually after cloning.
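
As an illustration of how the folders can be created on demand, here is a minimal sketch. The folder names come from the list above; the function name `ensureRuntimeDirs` and the idea of passing the project root as an argument are assumptions for this example, not necessarily how the project's scripts do it.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Folder names taken from the Runtime Folders list.
const RUNTIME_DIRS = ['artifacts', 'downloaded', 'browser-profile', 'checkpoints'];

// Hypothetical helper: creates each runtime folder under projectRoot and
// returns the absolute paths. Safe to call on every run.
function ensureRuntimeDirs(projectRoot) {
  return RUNTIME_DIRS.map((name) => {
    const dir = path.resolve(projectRoot, name);
    // recursive: true turns this into a no-op when the folder already exists.
    fs.mkdirSync(dir, { recursive: true });
    return dir;
  });
}
```

Calling `ensureRuntimeDirs(process.cwd())` from the project root would create any missing folders and leave existing ones untouched.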

Requirements

  • Node.js 18+ (Node 20+ recommended)
  • npm
  • A valid UK NDR account
  • Enough local disk space for the files you plan to download, especially for large SEGY downloads

Install Node.js

macOS (Homebrew)

brew install node

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y nodejs npm

Verify

node -v
npm -v

Install Project Dependencies

From the project root (the path below is from the author's machine; substitute the location of your own clone):

cd /Users/tuna/Documents/ndr-auto-download
npm ci

If you are developing locally and intentionally need npm to refresh the lockfile, you can use:

npm install

This installs node_modules/ locally. node_modules/ is ignored by git.

Install Playwright Browsers

The workflow uses Chromium through Playwright.

Standard install (local machine)

npx playwright install chromium

Linux machine with package dependency install support

If your Linux machine allows package installation (local machine, VM, container, or cluster image you control), this is the most complete setup:

npx playwright install --with-deps chromium

Existing local browser cache

The script will use this local macOS Playwright Chromium path if it exists:

  • /Users/tuna/Library/Caches/ms-playwright/chromium-1208/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing

If that path does not exist, the script falls back to Playwright's default Chromium installation.

Override browser executable explicitly

You can force a specific Chromium or Chrome binary with:

export NDR_CHROMIUM_PATH='/absolute/path/to/chrome-or-chromium'

This is useful on clusters, shared Linux machines, or any machine where the browser is installed in a non-default location.
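
The lookup order described above can be sketched as follows. This is a hedged illustration: the helper name `resolveChromiumPath` is hypothetical, and the precedence (explicit override first, then the cached binary, then Playwright's default) is an assumption based on the word "force", not a copy of the script's logic.

```javascript
const fs = require('node:fs');

// Hypothetical helper: returns the executable path to use, or undefined to
// let Playwright pick its default Chromium installation.
// `exists` is injectable so the function can be exercised without real files.
function resolveChromiumPath(env, cachedPath, exists = fs.existsSync) {
  const override = (env.NDR_CHROMIUM_PATH || '').trim();
  if (override) return override; // assumed: the explicit override always wins
  if (cachedPath && exists(cachedPath)) return cachedPath; // local cache hit
  return undefined; // fall back to Playwright's default
}
```

The result could then be passed as the `executablePath` launch option in Playwright; leaving that option undefined keeps the default browser.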

Headless vs Non-Headless

Headless (default)

This is the normal mode for batch runs, remote machines, and clusters.

export NDR_HEADLESS='true'

If NDR_HEADLESS is unset, the script still runs headless by default.

Non-headless (visible browser)

Use this for local debugging when you want to watch the browser.

export NDR_HEADLESS='false'

Example visible run:

NDR_EMAIL='your-email' \
NDR_PASSWORD='your-password' \
NDR_HEADLESS='false' \
./run.sh SH__1984seis0004 FINAL_POST_STACK 4
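
For readers wondering how a flag like NDR_HEADLESS is typically interpreted, here is a minimal sketch. The function name `isHeadless` is hypothetical, and treating every value other than `false` as headless is an assumption; the actual script may be stricter.

```javascript
// Hypothetical helper: headless unless NDR_HEADLESS is explicitly "false".
function isHeadless(env) {
  const raw = env.NDR_HEADLESS;
  // Unset or empty means headless, matching the documented default.
  if (raw === undefined || raw.trim() === '') return true;
  // Assumption: only the literal value "false" (any case) disables headless.
  return raw.trim().toLowerCase() !== 'false';
}
```

With this reading, `isHeadless(process.env)` is true on a machine where the variable was never exported.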

Cluster / Server Notes

A cluster or terminal-only machine is fine as long as you run headless.

Recommended cluster setup

  1. Install Node.js and npm.
  2. Install project dependencies:
npm install
  3. Install Chromium for Playwright:
npx playwright install chromium
  4. Run in headless mode (default).

Important notes for clusters

  • No visible desktop UI is required when running headless.
  • The machine must have outbound network access to the NDR site.
  • The machine must have enough writable disk space for:
    • the persistent browser profile
    • temporary browser-managed storage
    • the final downloaded files
  • If the cluster blocks GUI libraries, use a browser binary already configured for that environment and point to it with NDR_CHROMIUM_PATH.
  • Non-headless runs on a cluster generally require a display server (or something like Xvfb). Unless you explicitly need visual debugging, keep cluster runs headless.

Why the persistent browser profile matters

The workflow now uses a persistent browser profile in browser-profile/ instead of a fresh temporary browser context. This is important because some large downloads (especially SEGY downloads) rely on browser-managed temporary storage before the final file is written. A persistent profile behaves more like a normal desktop browser and avoids a class of download failures seen with temporary contexts.

Credentials

Set these environment variables before running:

  • NDR_EMAIL
  • NDR_PASSWORD

Example:

export NDR_EMAIL='your-email'
export NDR_PASSWORD='your-password'

Run The Workflow

Single job

Default single-job run:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh

Explicit single-job run:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh <PROJECT_ID> <INFO_TAG> <ATTEMPTS>

Example:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh TT__1993seis0002 ACQUISITION_REPORT 4

Equivalent npm command:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:single -- TT__1993seis0002 ACQUISITION_REPORT 4

Batch jobs from JSON

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' ./run.sh --batch ./download_jobs2.json 4

Or:

NDR_EMAIL='your-email' NDR_PASSWORD='your-password' npm run download:batch

npm run download:batch reads the batch file named by NDR_BATCH_FILE. If that variable is unset, it falls back to download_jobs.json when that file exists.

Batch File Format

Supported shapes:

  • top-level object with jobs
  • top-level array

Example:

{
  "jobs": [
    {
      "NDR_PROJECT_ID": "TT__1993seis0002",
      "NDR_INFO": "ACQUISITION_REPORT"
    },
    {
      "NDR_PROJECT_ID": "PU__2009seis0001",
      "NDR_INFO": "ACQUISITION_REPORT"
    }
  ]
}

Per item:

  • NDR_PROJECT_ID
  • NDR_INFO
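
Both supported shapes can be reduced to one job list before processing. The sketch below shows one way to do that; the function name `normalizeJobs`, the output field names `projectId` and `infoTag`, and the validation rules are assumptions for illustration, not the batch runner's actual code.

```javascript
// Hypothetical helper: accepts either a top-level array or an object with a
// "jobs" array and returns a uniform list of { projectId, infoTag } objects.
function normalizeJobs(parsed) {
  const list = Array.isArray(parsed) ? parsed : parsed && parsed.jobs;
  if (!Array.isArray(list)) {
    throw new Error('Batch file must be an array or an object with a "jobs" array');
  }
  return list.map((job, index) => {
    for (const key of ['NDR_PROJECT_ID', 'NDR_INFO']) {
      const value = job ? job[key] : undefined;
      // Assumed rule: both keys must be non-empty strings.
      if (typeof value !== 'string' || value.trim() === '') {
        throw new Error(`Job ${index}: missing ${key}`);
      }
    }
    return { projectId: job.NDR_PROJECT_ID.trim(), infoTag: job.NDR_INFO.trim() };
  });
}
```

A caller would typically feed it the parsed file contents, for example `normalizeJobs(JSON.parse(text))`, and fail early on a malformed entry instead of midway through a batch.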

Useful Environment Variables

  • NDR_EMAIL: login email
  • NDR_PASSWORD: login password
  • NDR_HEADLESS: true or false
  • NDR_DOWNLOAD_TIMEOUT_MS: override download polling timeout for large files
  • NDR_BATCH_FILE: batch JSON path when using npm run download:batch
  • NDR_BATCH_MAX_ATTEMPTS: attempts per batch job
  • NDR_BROWSER_PROFILE_DIR: override the persistent browser profile path
  • NDR_CHROMIUM_PATH: override the browser executable path

Output

  • Screenshots and failure logs:
    • /Users/tuna/Documents/ndr-auto-download/artifacts
  • Downloaded files:
    • /Users/tuna/Documents/ndr-auto-download/downloaded
  • Persistent browser profile:
    • /Users/tuna/Documents/ndr-auto-download/browser-profile
  • Checkpoint snapshots:
    • /Users/tuna/Documents/ndr-auto-download/checkpoints

Download Detection Rules

The workflow treats these as temporary in-progress files:

  • .crdownload
  • .part
  • .tmp

A file is considered complete only when:

  • the temporary companion file is gone, and
  • the final file exists with size greater than zero

This avoids false positives such as:

  • file.sgy appearing while file.sgy.crdownload still exists
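
The two completion conditions can be expressed as a small check. This is a sketch under stated assumptions: the function name `isDownloadComplete` is hypothetical, and it assumes the temporary companion is named as the final file name plus one of the suffixes (as in file.sgy.crdownload). Browsers sometimes use unrelated temporary names, which this sketch does not cover.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Suffixes the workflow treats as in-progress markers.
const TEMP_SUFFIXES = ['.crdownload', '.part', '.tmp'];

// Hypothetical helper: true only when finalName exists in dir with a size
// greater than zero and no "<finalName><suffix>" companion is present.
function isDownloadComplete(dir, finalName) {
  const names = fs.readdirSync(dir);
  const hasCompanion = TEMP_SUFFIXES.some((suffix) => names.includes(finalName + suffix));
  if (hasCompanion || !names.includes(finalName)) return false;
  return fs.statSync(path.join(dir, finalName)).size > 0;
}
```

A polling loop would call this repeatedly until it returns true or the download timeout elapses.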

Troubleshooting

You do not have enough browser storage space to download this SEGY file

The browser generates this warning, so it does not necessarily mean that the filesystem path where the final file is saved is full.

It usually means Chromium cannot allocate enough browser-managed temporary storage for the download staging process.

Fixes:

  • Free disk space on the local drive.
  • Remove old files from downloaded/.
  • Remove old screenshots/logs from artifacts/.
  • Reuse the persistent browser-profile/ directory (already enabled).
  • Ensure the profile directory lives on a disk with enough free space.

Login occasionally fails with auth errors

The Microsoft B2C login path sometimes returns transient server-side errors. The launcher already retries runs, so rerunning is usually sufficient.
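
For context, retrying a flaky step a fixed number of times usually looks like the sketch below. The name `withRetries` and the optional delay are hypothetical; this illustrates the general technique and is not the launcher's actual retry code.

```javascript
// Hypothetical helper: runs an async task up to `attempts` times and returns
// the first successful result; rethrows the last error if every attempt fails.
async function withRetries(task, attempts, delayMs = 0) {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError('attempts must be a positive integer');
  }
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Optionally pause between attempts so a transient error can clear.
      if (attempt < attempts && delayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The attempts argument plays the same role as the trailing number in the ./run.sh examples, assuming that number is the maximum attempt count.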

A green download button is visible but nothing downloads yet

For large packages, the site may still be preparing the browser-side transfer. The script waits for actual partial or completed download files and monitors the in-page Downloading modal before deciding the run is stalled.

Git / Repository Hygiene

These paths are gitignored and should stay local-only:

  • node_modules/
  • artifacts/
  • artifacts_old/
  • downloaded/
  • browser-profile/
  • checkpoints/
  • local .env files
  • local logs and machine-specific noise files

Main Files

  • Launcher: /Users/tuna/Documents/ndr-auto-download/run.sh
  • Main workflow: /Users/tuna/Documents/ndr-auto-download/scripts/session-basket-download-segy.js
  • Batch runner: /Users/tuna/Documents/ndr-auto-download/scripts/run-download-batch.js
  • Example batch files:
    • /Users/tuna/Documents/ndr-auto-download/download_jobs1.json
    • /Users/tuna/Documents/ndr-auto-download/download_jobs2.json
