@krzysztof-palka-monogo krzysztof-palka-monogo commented Oct 3, 2025

Batch upload R2 cache using rclone

This PR adds optional support for batch uploading the R2 cache using rclone.
It is based on the solution discussed in Issue #866.

Details

Automatic selection between batch vs standard upload

The CLI now detects when all of R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_ACCOUNT_ID are set in the environment and switches to batch upload mode. Otherwise, it falls back to the standard (Wrangler-based) upload method.
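A minimal sketch of the selection rule described above. The function name `selectUploadMode` and the `UploadMode` type are hypothetical and not taken from the PR; only the three environment variable names come from the description. Treating an empty string as "not set" is also an assumption.

```typescript
// Hypothetical sketch: names are illustrative, not the PR's actual code.
type UploadMode = "batch" | "standard";

// The three variables named in the PR description.
const REQUIRED_VARS = ["R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_ACCOUNT_ID"] as const;

// Batch mode is chosen only when every required variable is set and non-empty
// (the non-empty check is an assumption).
function selectUploadMode(env: Record<string, string | undefined>): UploadMode {
  const allSet = REQUIRED_VARS.every((name) => (env[name] ?? "").trim() !== "");
  return allSet ? "batch" : "standard";
}
```

In a CLI this would typically be called as `selectUploadMode(process.env)`.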

Graceful fallback on errors

If batch upload fails (e.g. due to rclone errors), the flow will log a warning and transparently revert to the default upload approach.
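The fallback behaviour could be sketched roughly as follows. `batchUpload` and `standardUpload` are hypothetical stand-ins for the real upload routines, and the warning text is illustrative rather than the message the PR actually logs.

```typescript
// Hypothetical sketch: the two uploaders stand in for the real upload routines.
type Uploader = () => Promise<void>;

async function uploadWithFallback(
  batchUpload: Uploader,
  standardUpload: Uploader,
  warn: (message: string) => void = console.warn
): Promise<"batch" | "standard"> {
  try {
    await batchUpload();
    return "batch";
  } catch (error) {
    // Any batch failure (e.g. an rclone error) is logged, then the default path runs.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`Batch upload failed (${reason}); falling back to standard upload.`);
    await standardUpload();
    return "standard";
  }
}
```

Errors from the standard upload are deliberately not caught here, so a failure of both paths still surfaces to the caller.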

Staging and parallel transfers

Assets are copied into a temporary staging directory, then uploaded using rclone copy with concurrency settings (--transfers=32, --checkers=16) for performance.
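As an illustration, the argument list for such an `rclone copy` call might be assembled like this. The helper name and its parameters (`stagingDir`, `remote`, `bucket`, `configPath`) are hypothetical; the `--transfers` and `--checkers` values are the ones quoted above, and `--config` is rclone's standard flag for pointing at a config file.

```typescript
// Hypothetical sketch: builds the arguments for `rclone copy`.
// All parameter names are illustrative inputs, not the PR's actual variables.
function buildRcloneCopyArgs(
  stagingDir: string,
  remote: string,
  bucket: string,
  configPath: string
): string[] {
  return [
    "copy",
    stagingDir, // source: the temporary staging directory
    `${remote}:${bucket}`, // destination: <remote name>:<bucket>
    "--config",
    configPath,
    "--transfers=32", // number of parallel file transfers
    "--checkers=16", // number of parallel checkers
  ];
}
```

The resulting array can be handed to whatever mechanism launches rclone, for example a child process or a wrapper library.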

Related issues


changeset-bot bot commented Oct 3, 2025

🦋 Changeset detected

Latest commit: 32dc294

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@opennextjs/cloudflare Minor


@vicb
Contributor

vicb commented Oct 3, 2025

@krzysztof-palka-monogo thank you so much for this PR, it will please many of our users!

I only took a quick look and I'll do a full review on Monday.

I have some preliminary comments/thoughts:

  • I think we should rename --rcloneBatch. First, we do not want to expose the implementation (rclone), as it might change; we can keep mentions of rclone in the docs. Ideally we can re-use cacheChunkSize.
  • rclone.js was last released 3 years ago. Ideally we could find a more recent and maintained implementation (but we should not block the PR on that; it could be a follow-up, or we can wait until we actually have an issue).
  • We should log an info message when users are not using batch upload and have, e.g., more than 10 assets. Another option could be to batch upload by default and fall back to slow/safe on error (again, happy if this is addressed as a follow-up).

@krzysztof-palka-monogo
Author

Good morning @vicb

Here’s the justification for keeping an explicit --rcloneBatch (or similarly named) flag and retaining the existing upload mechanism as the default:

  1. Requires external configuration
    Rclone is not a drop-in replacement: it depends on a separate config file (the rclone config, credentials, remotes, etc). Users who have never used rclone must explicitly create and supply that config. That means it does not “just work” out of the box.

  2. Avoids a breaking change
    If we made rclone-based behavior the default, we would risk breaking setups for users who have not configured rclone. Keeping the current behavior as the default ensures backward compatibility.

  3. Explicit intent
    Having a named flag (e.g. --rcloneBatch) clearly signals the use of rclone and keeps any implementation coupling explicit (rather than hidden). It also gives future flexibility: if one day we swap in a different batch uploader (or abstraction), the explicit flag boundary helps isolate that change.

@vicb
Contributor

vicb commented Oct 6, 2025

Here’s the justification for keeping an explicit --rcloneBatch (or similarly named) flag and retaining the existing upload mechanism as the default:

We definitely don't want "rclone" in the flag. We could use another mechanism, e.g. remote bindings.

  1. Requires external configuration
    Rclone is not a drop-in replacement: it depends on a separate config file (the rclone config, credentials, remotes, etc). Users who have never used rclone must explicitly create and supply that config. That means it does not “just work” out of the box.

IMO we should drop the need for an external config in favor of using env vars - the CLI could dynamically create a temp config file for rclone.js as it doesn't seem to support env vars.
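A rough sketch of how such a config could be generated from credentials. The interface and function names are hypothetical, and the default remote name `r2` is an assumption; the key names follow rclone's S3 backend with the Cloudflare provider, and the endpoint follows the usual `<account id>.r2.cloudflarestorage.com` pattern.

```typescript
// Hypothetical sketch: renders an rclone config section for an R2 remote.
// Type and function names are illustrative, not the PR's actual code.
interface R2Credentials {
  accessKeyId: string;
  secretAccessKey: string;
  accountId: string;
}

function renderRcloneConfig(creds: R2Credentials, remoteName = "r2"): string {
  return [
    `[${remoteName}]`,
    "type = s3",
    "provider = Cloudflare",
    `access_key_id = ${creds.accessKeyId}`,
    `secret_access_key = ${creds.secretAccessKey}`,
    `endpoint = https://${creds.accountId}.r2.cloudflarestorage.com`,
    "acl = private",
    "", // trailing newline
  ].join("\n");
}
```

Keeping this as a pure function makes it easy to test without touching the file system; writing the result to disk can be handled separately.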

  2. Avoids a breaking change
    If we made rclone-based behavior the default, we would risk breaking setups for users who have not configured rclone. Keeping the current behavior as the default ensures backward compatibility.

We can use rclone when configured and fall back to the current mechanism, but log that there is a faster way and link to some docs.

  3. Explicit intent
    Having a named flag (e.g. --rcloneBatch) clearly signals the use of rclone and keeps any implementation coupling explicit (rather than hidden). It also gives future flexibility: if one day we swap in a different batch uploader (or abstraction), the explicit flag boundary helps isolate that change.

I disagree here.
Users want something that works (fast); they should not have to care about how it is implemented.

@krzysztof-palka-monogo
Author

krzysztof-palka-monogo commented Oct 6, 2025

Hi @vicb

I've updated the PR to align with your suggestions:

  • Removed --rcloneBatch flag and all rclone references from user-facing API
  • Implemented environment variable configuration (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_ACCOUNT_ID)
  • Batch upload now activates automatically when credentials are detected
  • Falls back to Wrangler uploads when credentials aren't provided
  • Temp config files are created dynamically with secure permissions and
    auto-cleanup
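The temp-config handling in the last bullet might look roughly like the following. `withTempConfig` is a hypothetical helper, not the PR's implementation: it writes the config with owner-only permissions into a fresh temp directory and removes that directory once the callback finishes, whether it succeeds or throws.

```typescript
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: write `contents` to a private temp file, run `fn`, always clean up.
async function withTempConfig<T>(
  contents: string,
  fn: (configPath: string) => Promise<T>
): Promise<T> {
  // A unique directory avoids clashes between concurrent runs.
  const dir = mkdtempSync(join(tmpdir(), "rclone-config-"));
  const configPath = join(dir, "rclone.conf");
  try {
    // 0o600: readable and writable by the owner only (on POSIX systems).
    writeFileSync(configPath, contents, { mode: 0o600 });
    return await fn(configPath);
  } finally {
    // Runs on success and on error, so credentials do not linger on disk.
    rmSync(dir, { recursive: true, force: true });
  }
}
```

The callback receives the config path, so the upload step never needs to know where the file lives or when it is deleted.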

Documentation now focuses on "batch upload" as an optional performance feature, not implementation details.

Please let me know if that meets the requirements and, if so, whether we could run the "Publish prereleases" action to properly test the changes in some real applications 😄


pkg-pr-new bot commented Oct 6, 2025


npm i https://pkg.pr.new/@opennextjs/cloudflare@925

commit: 0d53eac

@krzysztof-palka-monogo krzysztof-palka-monogo changed the title from "feat(cloudflare): add --rcloneBatch flag for faster R2 cache uploads using rclone" to "feat(cloudflare): add optional R2 batch uploads via rclone for cache population" on Oct 6, 2025
@krzysztof-palka-monogo
Author

Hi @vicb

I would appreciate a code review.

function createTempRcloneConfig(): string | null {
  const accessKey = process.env.R2_ACCESS_KEY_ID;
  const secretKey = process.env.R2_SECRET_ACCESS_KEY;
  const accountId = process.env.R2_ACCOUNT_ID;
Contributor


To avoid double access, maybe there could be a retrieveR2CredentialsFromEnv returning {R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, CLOUDFLARE_ACCOUNT_ID} (I don't think the account has anything to do with R2).
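One possible shape for the suggested helper is sketched below. Only the function name and the three keys come from the comment; the `null` return for missing values, the `env` parameter, and the interface name are assumptions.

```typescript
// Sketch of the suggested helper; everything beyond its name and keys is assumed.
interface R2EnvCredentials {
  R2_ACCESS_KEY_ID: string;
  R2_SECRET_ACCESS_KEY: string;
  CLOUDFLARE_ACCOUNT_ID: string;
}

// Reads each variable once and returns all three together, or null if any is missing.
function retrieveR2CredentialsFromEnv(
  env: Record<string, string | undefined>
): R2EnvCredentials | null {
  const { R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, CLOUDFLARE_ACCOUNT_ID } = env;
  if (!R2_ACCESS_KEY_ID || !R2_SECRET_ACCESS_KEY || !CLOUDFLARE_ACCOUNT_ID) {
    return null;
  }
  return { R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, CLOUDFLARE_ACCOUNT_ID };
}
```

Callers can then check for `null` once (for example with `retrieveR2CredentialsFromEnv(process.env)`) instead of re-reading the environment when deciding on the upload mode and again when building the config.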

Contributor

@vicb vicb left a comment


Not sure I'll have time for a full review today, sorry.

A couple comments:

  • R2_ACCOUNT_ID should rather be CLOUDFLARE_ACCOUNT_ID as it is not specific to R2

  • Thanks for implementing the env-vars-based solution for CI. I'm wondering if we can improve the story for local dev. It would be nice to get the vars from .env / .dev.vars files, but it would mean adding vars that are only used in local dev, and that might confuse users. Maybe we could look for a local <pjt>/rclone.conf and fall back to env vars when it's not there. What do you think?
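One way to read the lookup order proposed above, as a sketch only: prefer a project-local `rclone.conf`, otherwise use env vars, otherwise report that batch upload is unavailable. The `ConfigSource` type, the function name, and the injectable `fileExists` parameter are all hypothetical.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch of the proposed lookup order; names are illustrative.
type ConfigSource =
  | { kind: "file"; path: string } // a local <project>/rclone.conf was found
  | { kind: "env" } // credentials come from environment variables
  | { kind: "none" }; // neither is available: use the standard upload

function resolveConfigSource(
  projectDir: string,
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean = existsSync
): ConfigSource {
  const localConfig = join(projectDir, "rclone.conf");
  if (fileExists(localConfig)) {
    return { kind: "file", path: localConfig };
  }
  const hasEnv = Boolean(
    env.R2_ACCESS_KEY_ID && env.R2_SECRET_ACCESS_KEY && env.CLOUDFLARE_ACCOUNT_ID
  );
  return hasEnv ? { kind: "env" } : { kind: "none" };
}
```

Passing `fileExists` as a parameter keeps the decision logic testable without creating files on disk.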
