One option could be to change the format in buckets to be a concatenation of BSON objects for:

```ts
interface SyncBucketDataReference {
  bucket: string;
  url: string;
  // If set, points to the start index (in bytes) of a BSON object representing an oplog
  // entry with an id <= after
  start_offset: number | undefined;
  // If set, points to the end index (exclusive, in bytes) of a BSON object representing an oplog
  // entry with an id >= next_after
  end_offset: number | undefined;
  after: ProtocolOpId;
  next_after: ProtocolOpId;
}
```

That might also let us support appending to existing data in directory buckets.
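As an illustration of how a client could use those offsets (assuming the file host honours HTTP `Range` requests; the helper name is made up, and `SyncBucketDataReference` is the interface above):

```ts
// Sketch: fetch only the referenced slice of a concatenated-BSON file.
async function fetchReferencedSlice(ref: SyncBucketDataReference): Promise<Uint8Array> {
  const headers: Record<string, string> = {};
  if (ref.start_offset !== undefined && ref.end_offset !== undefined) {
    // HTTP ranges are inclusive; end_offset is exclusive in the interface above.
    headers['Range'] = `bytes=${ref.start_offset}-${ref.end_offset - 1}`;
  }
  const response = await fetch(ref.url, { headers });
  return new Uint8Array(await response.arrayBuffer());
}
```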
A decent way to handle backpressure could be to have clients push chunks of response data into the Rust client directly. The client could buffer data until the end of each BSON object, then deserialize and write into the database before returning. Because this process is asynchronous, awaiting that in the client would throttle the stream.
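A rough sketch of that idea (the `pushSyncData` method and `CoreExtension` interface are hypothetical; the point is that awaiting each push throttles the download):

```ts
// Hypothetical core-extension interface: pushSyncData resolves only once the
// buffered BSON documents have been deserialized and written to the database.
interface CoreExtension {
  pushSyncData(chunk: Uint8Array): Promise<void>;
}

// Awaiting each push means the next chunk is only read from the network once the
// previous one has been applied, which gives us backpressure for free.
async function pumpDownload(response: Response, core: CoreExtension): Promise<void> {
  const reader = response.body!.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    await core.pushSyncData(value);
  }
}
```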
One benefit of option 2 (skipping the old checkpoint) is that it is closer to the current behavior, and we rely on this for stream priorities (since new data in a higher-priority bucket while we're syncing a lower-priority bucket would interrupt the process). However, the protocol currently relies on the fact that the service always knows which oplog entries have already been synced (their ids are part of the original request, and the serialized nature of the protocol lets the service track sent lines). This is no longer possible, so we can't reliably determine whether a new checkpoint should interrupt an old one. Maybe this logic needs to be moved to the client.

I wonder if we really need something like a …
IMO, no. Realistically all clients need the core extension anyway (I think the test client for the service is the only exception, but when we migrate that to support object storage downloads, it's probably easier to adopt the Rust client for everything instead of implementing that logic twice). So since the core extension can deal with BSON, adding JSON support feels like a complication.
## Background
PowerSync currently syncs all data directly via either an HTTP stream or a WebSocket connection.
This proposal specifies an optional addition to the protocol that allows syncing bulk data "out-of-band", by including links to the data in the stream instead of the data itself.
The main goals are to improve server-side efficiency and to improve sync throughput, especially for initial sync.
## Status
- 2025-10-15: Initial ideas in place; need to dig into details on back-pressure and bucket priorities.
- 2025-10-16: Changed from one large BSON document to individual concatenated BSON documents. Clarified some details on bucket priorities.
## Proposal - protocol changes
The current protocol generally sends data in three steps:

1. `checkpoint` or `checkpoint_diff`: Sends a list of (changed) buckets, priorities and checksums.
2. `data` lines: Send the data inline.
3. A `checkpoint_complete` (or `partial_checkpoint_complete`) line to indicate that all data for the checkpoint has been sent.

The `data` line is in this format (in JSON or BSON):
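A rough sketch of that shape, assuming the fields referenced below (`has_more`, `after`, `next_after`); the exact definition may differ:

```ts
// Simplified placeholder types, for illustration only.
type ProtocolOpId = string;
interface OplogEntry { op_id: ProtocolOpId; [key: string]: unknown }

// Assumed shape of the current inline `data` line.
interface SyncBucketData {
  bucket: string;
  data: OplogEntry[]; // oplog entries, sent inline
  has_more: boolean;
  after: ProtocolOpId;
  next_after: ProtocolOpId;
}
```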
This replaces the `data` line with `data_references`:
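A sketch of the new line; only `url`, `after` and `last` are described below, so the wrapper shape and everything else is illustrative:

```ts
// Sketch only - not a pinned-down wire format. ProtocolOpId as in the sketch above.
interface SyncBucketDataReference {
  bucket: string;
  // Path on the current endpoint (starting with '/') or an absolute URL.
  url: string;
  // The client ignores referenced entries with op_id <= after.
  after: ProtocolOpId;
  // The client ignores referenced entries with op_id > last; `last` replaces `next_after`.
  last: ProtocolOpId;
}

interface DataReferencesLine {
  // One or more references per message, reducing message counts for large buckets.
  data_references: SyncBucketDataReference[];
}
```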
Each message may contain one or more references (we explicitly support multiple in a single message, to reduce the total number of messages when syncing large buckets). We remove `has_more` - this is generally unused by the client. We do keep `after`, and `last` replaces `next_after` - the client uses those to filter the referenced data.

Here, the URL is either a path on the current endpoint or an absolute URL. If the string is a path starting with `/`, it is joined with the endpoint configured on the client. Otherwise, it is interpreted as an absolute URL.
In both cases, the client is expected to make a GET request to the URL within 5 minutes. If the client makes a request after that period, the server may respond with a 401 or 403 response, in which case the client must re-establish the streaming connection to get a new URL.
When making the request, the client must not include any authorization headers. Authorization is performed by including a token in the URL path. The client must include the following headers:
- `User-Agent`
- `Accept-Encoding`, to indicate support for gzip or zstd compression.
- `Accept: application/bson`
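A sketch of such a request (in browsers the `User-Agent` and `Accept-Encoding` headers are managed by the runtime and can't be overridden; shown here for completeness, and the URL-joining shown is just one possible interpretation):

```ts
// Resolve the reference URL against the configured endpoint and fetch it with the
// required headers. Note: no Authorization header - the token is already in the URL.
async function fetchReferencedData(url: string, endpoint: string): Promise<Response> {
  const resolved = url.startsWith('/') ? new URL(url, endpoint).toString() : url;
  return fetch(resolved, {
    headers: {
      'User-Agent': 'powersync-sketch/0.0.0', // illustrative value
      'Accept-Encoding': 'gzip, zstd',
      Accept: 'application/bson',
    },
  });
}
```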
The server may then respond with:
Each response in the chain must have the same CORS headers as the service itself.
The server may respond with zstd or gzip-compressed data, if supported by the client.
The response is a stream (concatenated bytes) of BSON-encoded documents. The first document is a "header":
Every document after the first is an `OplogEntry` document. These are always sorted by op_id. The data here may include data outside the range indicated by the streamed SyncBucketData - the client must ignore any data entry with `op_id <= after`, or `op_id > last`. Since the data is sorted, the client may be able to skip some of the parsing for entries outside the range (although the data still has to be downloaded). The client may completely stop downloading and/or parsing when it reaches the entry matching `last`. If there is no entry matching `last`, this is considered an error, and the client should retry the download.
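For illustration, one way a client could split the concatenated documents and apply the range filter (using the `bson` npm package; treating op_ids as numeric strings is an assumption):

```ts
import { deserialize } from 'bson';

// Each BSON document begins with a little-endian int32 holding its total length,
// so a buffer of concatenated documents can be split without fully parsing them.
function* splitBsonDocuments(buffer: Uint8Array): Generator<Uint8Array> {
  let offset = 0;
  while (offset + 4 <= buffer.length) {
    const view = new DataView(buffer.buffer, buffer.byteOffset + offset, 4);
    const length = view.getInt32(0, true);
    if (length < 5 || offset + length > buffer.length) break; // incomplete or invalid tail
    yield buffer.subarray(offset, offset + length);
    offset += length;
  }
}

// Apply the (after, last] filter described above. Assumes the header document has
// already been consumed and that op_ids compare numerically.
function* entriesInRange(documents: Iterable<Uint8Array>, after: string, last: string) {
  for (const bytes of documents) {
    const entry = deserialize(bytes);
    const opId = BigInt(entry.op_id as string);
    if (opId <= BigInt(after)) continue; // before the requested range: skip
    if (opId > BigInt(last)) return;     // data is sorted: nothing further is needed
    yield entry;
  }
}
```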
### Indicating support

Older clients may not support this new response format. Clients must indicate support by including `supports_data_url: true` in the request. If that is not included in the request, the server must respond with the data inline.

Optionally, the server may disable support for clients that could negatively affect performance:
## Service implementation
The service will be modified to store bucket data on object storage, such as AWS S3. The `bucket_data` collection/table (or alternatively a new one) will still be used to store an "index" of this data. Each entry would include:

- `op_id`
- `op_id`
The service may choose to store the data inline, instead of as a path, especially in cases where the data is small. A reasonable size threshold should be used for this, e.g. around 10-100KB.
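Purely as an illustration of such an index entry (these field names are not part of the proposal):

```ts
// Hypothetical index entry: a range of op_ids that either points at an object
// storage file or stores small data inline.
interface BucketDataIndexEntry {
  bucket: string;
  start_op_id: string;
  end_op_id: string;
  byte_size: number;
  // Set when the data lives in object storage.
  storage_path?: string;
  // Set instead of storage_path when the data is small enough to keep inline.
  inline_data?: Uint8Array;
}
```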
The service may store batches of data directly in object storage when replicating, and/or move data to object storage when compacting. The compact process may also merge multiple object storage files into one.
When the client requests a file, the service will respond with a 307 redirect pointing directly to S3 or a CDN (CDN integration TBD).
For self-hosting purposes, the service will support any S3-compatible object storage (exact requirements TBD). To simplify the setup, the service may support proxying all requests, rather than redirecting. The service may also support using a local filesystem as object storage for simplicity in development.
Proposed initial service compression strategy:

- … `Accept-Encoding` header there.

There are some options to do the compression outside the main service itself. Some examples include Cloudflare Workers, Lambda@Edge, or even just a dedicated service process for this. Those are not important for the initial implementation, but could be options later.
## Client implementation
This will only be supported when using the Rust implementation.
The client will send the sync line to the Rust client as always. If the Rust client detects a reference, it will send a message indicating the client should download the URL.
The client should continue sending more sync lines to the Rust client while downloading the URL.
When the client has finished a download, it must send the data to the Rust client using a new message type.
If the request fails, the client should send the response code to the Rust client, which may update the status and/or ask the client to re-download.
Once the Rust client has received all the required data for a checkpoint, it will respond in the usual way.
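A sketch of the messages this implies between the host SDK and the Rust client (all names here are illustrative, not an actual API):

```ts
// Instructions emitted by the Rust client to the host SDK.
type RustInstruction =
  | { type: 'download'; request_id: number; url: string } // please fetch this reference
  | { type: 'checkpoint_applied' };                        // all data received, status updated

// Messages the host SDK sends to the Rust client.
type HostMessage =
  | { type: 'sync_line'; line: Uint8Array }                          // forwarded sync lines
  | { type: 'download_data'; request_id: number; data: Uint8Array }  // completed (or chunked) download
  | { type: 'download_failed'; request_id: number; status: number }; // Rust client may ask to retry
```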
TBD: How do we manage back-pressure here? If we reach a certain threshold of in-progress requests, we may want to pause streaming. And for example, if a client receives a new checkpoint while still downloading data for the last one, does it:
### Bucket priorities and interruptions
In the current protocol and service implementation, bulk data in low-priority buckets may be interrupted with a new checkpoint for high-priority buckets. When using data references, there may not be anything to interrupt in the service, since the service only sends references, which are sent pretty much instantly.
This means the checkpoint interruption logic will shift to the client: The client could receive one checkpoint containing bulk data in low-priority buckets, as well as data in high-priority buckets. Or alternatively, the high-priority buckets could be in a following checkpoint, while the client is still downloading data for the previous checkpoint.
Instead of relying on partial_checkpoint_complete messages from the service, the client may implement its own prioritized downloads based on the received bucket priorities.
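For example, a client could order pending downloads by bucket priority (assuming lower numbers mean higher priority), so high-priority buckets finish first even when they arrive in a later checkpoint:

```ts
// Hypothetical bookkeeping for prioritized downloads on the client side.
interface PendingDownload {
  bucket: string;
  priority: number; // assumption: lower value = higher priority
  url: string;
}

// Pick the highest-priority pending download to start next.
function nextDownload(pending: PendingDownload[]): PendingDownload | undefined {
  return pending.reduce<PendingDownload | undefined>(
    (best, candidate) => (best === undefined || candidate.priority < best.priority ? candidate : best),
    undefined,
  );
}
```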
## Other considerations

- `Range` headers? Or rather soft-limit files to smaller sizes, e.g. 10MB?
- `Long` for `op_id`, instead of a `String`?

This does not include data-level compression yet (see #330). Adding data-level compression would require another protocol change, but would make the project too big if we include it here.
## Performance advantages
There are a couple of advantages on the service side:
## Notes
We specifically do not use the original authentication token in the file download request, since authorization could be computationally expensive (it requires evaluating Sync Rules to determine whether the bucket may be synced). By including a signed token in the query parameters, only the original streaming request needs to perform that check.