
Add multithreaded S3 object mover script #295

Open
JordanLaserGit wants to merge 2 commits into main from s3mv

Conversation

@JordanLaserGit
Collaborator

This script moves objects between S3 prefixes with multithreading and pattern filtering.

WIP. Todo: add an example to the docs.
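As a rough illustration of the approach described above (not the script in this PR), a minimal sketch of a pattern-filtered, multithreaded move could look like the following. The function names `plan_moves`, `move_one`, and `move_all` are hypothetical, and the sketch assumes a single bucket, glob-style patterns, and a boto3-style S3 client passed in by the caller. S3 has no native rename, so each move is a copy followed by a delete.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from fnmatch import fnmatchcase


def plan_moves(keys, src_prefix, dst_prefix, pattern="*"):
    """Return (source_key, dest_key) pairs for keys under src_prefix whose
    prefix-relative path matches the glob pattern (assumed glob semantics)."""
    moves = []
    for key in keys:
        if not key.startswith(src_prefix):
            continue
        relative = key[len(src_prefix):]
        # fnmatchcase is case-sensitive on every platform, like S3 keys.
        if fnmatchcase(relative, pattern):
            moves.append((key, dst_prefix + relative))
    return moves


def move_one(client, bucket, src_key, dst_key):
    """Copy one object, then delete the source only if the copy succeeded."""
    # copy_object handles objects up to 5 GB; larger ones need a multipart copy.
    client.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    client.delete_object(Bucket=bucket, Key=src_key)
    return src_key, dst_key


def move_all(client, bucket, moves, workers=8):
    """Run the moves on a thread pool and return (done, failed) lists."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(move_one, client, bucket, src, dst): (src, dst)
            for src, dst in moves
        }
        for future in as_completed(futures):
            try:
                done.append(future.result())
            except Exception as exc:  # keep going; report failures at the end
                failed.append((futures[future], exc))
    return done, failed
```

With a real client, `client` would be `boto3.client("s3")` and `keys` would typically come from the `list_objects_v2` paginator. Passing the client in keeps the planning and threading logic testable without AWS credentials.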

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

This script moves objects between S3 prefixes with multithreading and pattern filtering.
@JordanLaserGit self-assigned this Feb 2, 2026
@JordanLaserGit added the P3 (Priority level. P0: Critical, P1: High, P2: Medium, P3: Low) and NRDS AWS Output Data (Related to NRDS NextGen outputs in AWS) labels Feb 2, 2026
Added functionality to track sample moves that match a specified pattern. Enhanced argument validation and updated the display of sample moves in the dry-run preview.
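For readers following along, here is a minimal sketch of what a dry-run preview with pattern-matched sample moves and basic argument validation might look like. It is an illustration under assumptions, not the code in this commit: the flag names (`--src-prefix`, `--dst-prefix`, `--sample-pattern`, `--sample-limit`, `--dry-run`) and the helpers `parse_args`, `sample_moves`, and `format_preview` are all hypothetical.

```python
import argparse
from fnmatch import fnmatchcase


def parse_args(argv=None):
    """Parse and validate CLI arguments (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Preview or run S3 prefix moves.")
    parser.add_argument("--src-prefix", required=True)
    parser.add_argument("--dst-prefix", required=True)
    parser.add_argument("--sample-pattern", default=None,
                        help="only show sample moves whose source key matches this glob")
    parser.add_argument("--sample-limit", type=int, default=5)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    # Validation: reject argument combinations that cannot be a meaningful move.
    if args.src_prefix == args.dst_prefix:
        parser.error("--src-prefix and --dst-prefix must differ")
    if args.sample_limit < 1:
        parser.error("--sample-limit must be at least 1")
    return args


def sample_moves(moves, sample_pattern=None, limit=5):
    """Return up to `limit` (src, dst) pairs, restricted to sources that
    match `sample_pattern` when one is given."""
    if sample_pattern is None:
        chosen = list(moves)
    else:
        chosen = [m for m in moves if fnmatchcase(m[0], sample_pattern)]
    return chosen[:limit]


def format_preview(moves, samples):
    """Build the dry-run text: a total count followed by the sample moves."""
    lines = [f"DRY RUN: {len(moves)} object(s) would be moved"]
    lines += [f"  {src} -> {dst}" for src, dst in samples]
    return "\n".join(lines)
```

In this sketch the sample pattern only narrows what the preview displays; the full list of planned moves is still counted in the first line.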
Member

@JoshCu left a comment

I don't know how much of a difference this makes here, since you already have external multithreading, but boto3 has a few options that make it a bit quicker, like these:
https://github.com/CIROH-UA/NGIAB_data_preprocess/blob/main/modules%2Fdata_sources%2Fsource_validation.py#L102
https://github.com/CIROH-UA/NGIAB_data_preprocess/blob/main/modules%2Fdata_sources%2Fsource_validation.py#L66

There's another one called something like max_queue that I've got set to 10,000 in my .aws/config, which makes aws s3 sync run a lot smoother, but I've not tried it in Python.
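The linked lines are not reproduced here, so the following is only a sketch of the kind of boto3 tuning that could apply when the script runs its own thread pool. It uses two settings that do exist: `max_pool_connections` on `botocore.config.Config` (default 10) and `max_concurrency` on `boto3.s3.transfer.TransferConfig` (default 10). The helper names and the sizing rule are assumptions. The setting mentioned above is probably the AWS CLI's `max_queue_size`, which as far as I know is read by the CLI only and not by boto3.

```python
def tuning_settings(workers, per_object_concurrency=2):
    """Return keyword settings sized to an external thread pool.

    Assumed sizing rule: every external worker may run up to
    `per_object_concurrency` requests at once, so the HTTP connection
    pool should hold at least workers * per_object_concurrency
    connections (and never fewer than botocore's default of 10).
    """
    return {
        "client": {"max_pool_connections": max(10, workers * per_object_concurrency)},
        "transfer": {"max_concurrency": per_object_concurrency},
    }


def make_tuned_client(workers=32):
    """Build an S3 client and a TransferConfig from the settings above."""
    # Imported lazily so tuning_settings() works without boto3 installed.
    import boto3
    from boto3.s3.transfer import TransferConfig
    from botocore.config import Config

    settings = tuning_settings(workers)
    client = boto3.client("s3", config=Config(**settings["client"]))
    # Pass this as Config=... to managed transfers such as client.copy(...).
    transfer_config = TransferConfig(**settings["transfer"])
    return client, transfer_config
```

The connection pool is the part most likely to matter with external threading: with more worker threads than pooled connections, urllib3 discards and reopens connections instead of reusing them.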


Labels

NRDS AWS Output Data (Related to NRDS NextGen outputs in AWS), P3 (Priority level. P0: Critical, P1: High, P2: Medium, P3: Low)

Projects

Status: In Progress

3 participants