# xfer: Scalable S3↔S3 Transfers with rclone, Slurm, and pyxis
xfer is a command-line tool for orchestrating large-scale S3 data transfers on HPC and cloud clusters using:
- rclone (inside a container)
- Slurm job arrays
- enroot + pyxis
- manifest-based sharding for reliability and resumability
It is designed for datasets with:
- Millions of objects
- Highly variable object sizes
- Long-running transfers that need retries, logging, and restartability
## Features

- 📄 Stable JSONL manifest (`xfer.manifest.v1`)
- 🧩 Byte-balanced sharding to avoid long-tail array tasks
- 🔁 Automatic retries with Slurm requeue
- ⏭ Skip-if-done semantics per shard
- 📦 Containerized rclone (no host rclone dependency)
- ⚙️ Configurable topology (array size, concurrency, CPU/mem)
- 🧪 Safe to re-run: idempotent by design
## Requirements

- Slurm
- enroot + pyxis enabled (`srun --container-image` works; see the check after this list)
- Network access from compute nodes to both S3 endpoints
- Python ≥ 3.10
- `uv`
- An rclone config file (`rclone.conf`) with your S3 remotes
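A quick way to confirm the container plumbing works is to run a containerized command through `srun`. A minimal sketch, assuming a partition named `transfer` (use any partition you can access):

```bash
# With enroot + pyxis enabled, this prints the rclone version from inside
# the container; the partition name here is only an example.
srun --partition=transfer --container-image=rclone/rclone:latest rclone version
```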
## Installation

Clone the repository:

```bash
git clone https://github.com/fluidnumerics/xfer.git ~/xfer
cd ~/xfer
```

Create and sync the virtual environment:

```bash
uv venv
uv sync
```

Run the CLI (no install required):

```bash
uv run xfer --help
```

Optional: install editable for convenience:

```bash
uv pip install -e .
xfer --help
```

## rclone configuration

You must have an rclone config on the submit host, e.g.:
```ini
[s3src]
type = s3
provider = Other
endpoint = https://objects.source.example.com
access_key_id = ...
secret_access_key = ...

[s3dst]
type = s3
provider = Other
endpoint = https://objects.dest.example.com
access_key_id = ...
secret_access_key = ...
```

This file is mounted read-only into the container at runtime.
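It can help to verify that both remotes resolve before submitting anything. A hedged example using standard rclone commands; if rclone isn't installed on the submit host, run the same commands through the container as shown above (bucket names are placeholders):

```bash
# List top-level prefixes on each remote to confirm endpoints and credentials.
rclone --config ~/.config/rclone/rclone.conf lsd s3src:mybucket
rclone --config ~/.config/rclone/rclone.conf lsd s3dst:mybucket
```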
## Quick start

This builds the manifest, shards it, renders Slurm scripts, and submits the job:

```bash
uv run xfer run \
  --run-dir run_001 \
  --source s3src:mybucket/dataset \
  --dest s3dst:mybucket/dataset \
  --num-shards 512 \
  --array-concurrency 96 \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf \
  --rclone-flags "--transfers 48 --checkers 96 --fast-list --stats 30s" \
  --partition transfer \
  --cpus-per-task 4 \
  --mem 8G \
  --submit
```

This submits a Slurm array job immediately.
## Build the manifest

Lists all objects using `rclone lsjson` (inside a container) and writes a stable JSONL manifest:

```bash
uv run xfer manifest build \
  --source s3src:mybucket/dataset \
  --dest s3dst:mybucket/dataset \
  --out run/manifest.jsonl \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf \
  --extra-lsjson-flags "--fast-list"
```

Output:

```
run/
  manifest.jsonl
```
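Each manifest line is one JSON record per object. The exact `xfer.manifest.v1` field names aren't reproduced here, but since the listing comes from `rclone lsjson`, each record carries at least a path and a size. A quick sanity peek:

```bash
# Pretty-print the first record and count total objects before sharding.
head -n 1 run/manifest.jsonl | python3 -m json.tool
wc -l run/manifest.jsonl
```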
## Shard the manifest

Splits the manifest into balanced shards (by total bytes):

```bash
uv run xfer manifest shard \
  --in run/manifest.jsonl \
  --outdir run/shards \
  --num-shards 512
```

Output:

```
run/
  shards/
    shard_000000.jsonl
    shard_000001.jsonl
    ...
    shards.meta.json
```
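Because shards are balanced by bytes rather than object count, line counts will vary across shards. A rough balance check, assuming records expose a `Size` field as `rclone lsjson` output does (adjust the field name if the `xfer.manifest.v1` schema differs):

```bash
# Total bytes per shard, largest five shown; well-balanced shards should
# land close to one another.
for f in run/shards/shard_*.jsonl; do
  printf '%s\t%s\n' \
    "$(python3 -c 'import json,sys; print(sum(json.loads(l)["Size"] for l in sys.stdin))' < "$f")" \
    "$f"
done | sort -n | tail -5
```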
## Render and submit Slurm scripts

Rendering creates:

- `worker.sh`: executed by each array task
- `sbatch_array.sh`: Slurm submission script
- `submit.sh`: convenience wrapper
- `config.resolved.json`: frozen run configuration

```bash
uv run xfer slurm render \
  --run-dir run \
  --num-shards 512 \
  --array-concurrency 96 \
  --job-name s3-xfer \
  --time-limit 24:00:00 \
  --partition transfer \
  --cpus-per-task 4 \
  --mem 8G \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf
```

Submit the rendered job:

```bash
uv run xfer slurm submit --run-dir run
```

Monitor it with:

```bash
squeue -j <jobid>
sacct -j <jobid> --format=JobID,State,Elapsed
```
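For a quick tally of array-task states (plain `sacct` usage; `<jobid>` as above):

```bash
# Count tasks per state, e.g. COMPLETED vs FAILED vs RUNNING.
sacct -j <jobid> --format=State --noheader --parsable2 | sort | uniq -c
```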
If you're seeing connection refused errors or a high number of copy retries, back off the number of simultaneous array tasks by lowering the `ArrayTaskThrottle`:

```bash
scontrol update ArrayTaskThrottle=<new array concurrency> JobId=<jobid>
```

## Logs and state

Each attempt writes a per-shard log:

```
run/
  logs/
    shard_12_attempt_1.log
    shard_12_attempt_2.log
```
Shard state is tracked with marker files:

```
run/
  state/
    shard_12.done
    shard_47.fail
```
- `.done` → shard completed successfully
- `.fail` → last attempt failed
- `.attempt` → retry counter
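The marker files make it easy to audit a run from the shell, for example:

```bash
# How many shards completed, and which failed on their last attempt.
ls run/state/*.done 2>/dev/null | wc -l
ls run/state/*.fail 2>/dev/null
```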
## Retries and resumability

- Failed shards are automatically requeued up to `MAX_ATTEMPTS`
- Completed shards are skipped on re-run
- You can safely re-submit the same array job

To re-submit manually:

```bash
sbatch run/sbatch_array.sh
```

## Tuning rclone flags

An example flag set to pass via `--rclone-flags` (see the full invocation after these examples):

```
--transfers 32
--checkers 64
--fast-list
--retries 10
--low-level-retries 20
--stats 30s
```
A lower-transfer, higher-checker variant (fewer simultaneous copies, more parallel checking, and less frequent stats output):

```
--transfers 16
--checkers 128
--fast-list
--progress --stats 600s
```
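These flag lines are joined into the single quoted string passed to `--rclone-flags`. For example, the quick-start invocation with the first set substituted (all other options as shown earlier; `run_002` is just a fresh run directory):

```bash
uv run xfer run \
  --run-dir run_002 \
  --source s3src:mybucket/dataset \
  --dest s3dst:mybucket/dataset \
  --num-shards 512 \
  --array-concurrency 96 \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf \
  --rclone-flags "--transfers 32 --checkers 64 --fast-list --retries 10 --low-level-retries 20 --stats 30s" \
  --partition transfer \
  --cpus-per-task 4 \
  --mem 8G \
  --submit
```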
## Run directory layout

```
run/
  manifest.jsonl
  shards/
    shard_000123.jsonl
    shards.meta.json
  logs/
  state/
  worker.sh
  sbatch_array.sh
  submit.sh
  config.resolved.json
```
## Design principles

- Manifest is immutable → enables reproducibility and auditing
- Shards are deterministic → re-runs don’t reshuffle work
- rclone handles object-level idempotency
- Slurm handles node-level failures
- xfer handles orchestration only
## Contributing

- To enable pre-commit `black` formatting, run `uv run pre-commit install`
- If necessary, you can format locally with `uv run black .`
- Name branches as either:
  - `<your name>/<branch name>` (e.g., `alice/update-readme`)
  - `<type of contribution>/<branch name>` (e.g., `feature/claude-integration`); contribution types are usually `feature`, `patch`, or `docs`
- Do NOT squash PRs into a single commit
