
Conversation

@rkistner (Contributor) commented May 8, 2025

This is a rare edge case, but can be consistently triggered in some development setups.

Requirements to reproduce:

  1. Postgres storage.
  2. Sync rules that ignore specific rows (the data is replicated to current_data, but not written to bucket_data or synced to clients). This can be done by adding a data query with where false to the sync rules.
  3. Start with one or more records for the above table, and no other records.
  4. Sync one client.
  5. Now add a row that does get synced.
  6. The client will now go into a repeated "Checksum failed for ..." loop.

The issue is in how op_id_sequence is initialized in Postgres storage. We use select last_value from op_id_sequence to get the op_id for a checkpoint, and select nextval('op_id_sequence') for each op_id in bucket_data. last_value generally returns the largest op_id value in bucket_data, and nextval always generates a larger op_id. However, if no op_id has been generated yet when the first checkpoint is created, last_value returns 1, and that value is used for the first checkpoint. The first nextval call then also returns 1, producing a second checkpoint with the same op_id. Combined with checksum caching, this op_id conflict results in the "Checksum failed" loop.
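The sequence behavior can be illustrated with a small simulation (a sketch of Postgres sequence semantics, not the actual service code): on a fresh sequence, last_value is already 1 even though nextval has never been called, so the first nextval call returns that same 1.

```typescript
// Minimal simulation of Postgres sequence semantics (illustration only):
// a fresh sequence starts with last_value = 1 and is_called = false.
class PgSequence {
  lastValue = 1;
  isCalled = false;

  // Mirrors: SELECT last_value FROM op_id_sequence
  currentLastValue(): number {
    return this.lastValue;
  }

  // Mirrors: SELECT nextval('op_id_sequence'). The very first call returns
  // last_value itself; only subsequent calls increment it.
  nextval(): number {
    if (!this.isCalled) {
      this.isCalled = true;
      return this.lastValue; // returns 1 on a fresh sequence
    }
    this.lastValue += 1;
    return this.lastValue;
  }
}

const seq = new PgSequence();
const checkpointOpId = seq.currentLastValue(); // 1, used for the first checkpoint
const firstDataOpId = seq.nextval(); // also 1: the conflict described above
console.log(checkpointOpId === firstDataOpId); // true
```

Once nextval has been called at least once, last_value and nextval diverge as expected, which is why the bug only appears on a freshly initialized storage database.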

To see what's going on, run curl http://localhost:8080/sync/stream -X POST -H "Authorization: Bearer <token>". You should see no data initially:

{"checkpoint":{"last_op_id":"1","buckets":[{"bucket":"global_data[]","checksum":0,"count":0,"priority":3}]}}
{"checkpoint_complete":{"last_op_id":"1"}}

Then add a record to sync. The stream above does not receive a new checkpoint, but re-running the request gives this:

{"checkpoint":{"last_op_id":"1","buckets":[{"bucket":"global_data[]","checksum":0,"count":0,"priority":3}]}}
{"data":{"bucket":"global_data[]","after":"0","has_more":false,"data":[{"op_id":"1","op":"PUT","object_type":"test","object_id":"abc","checksum":287648289,"subkey":"...","data":{"id":"abc"}}],"next_after":"1"}}
{"checkpoint_complete":{"last_op_id":"1"}}

Note above:

  1. last_op_id is still 1.
  2. The checkpoint for the bucket still contains "checksum: 0" and "count: 0".
  3. There is however a data line for that bucket.

This causes the checksums to fail on the client.

The issue specifically happens because the count and checksum for op_id 1 are cached and never refreshed. Even when more data is added, the cached value is only updated incrementally, so it remains incorrect. The service recovers when it is restarted, since that clears the cache.
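A sketch of why the caching compounds the problem (a hypothetical cache, not the service's actual implementation): checksums are cached per checkpoint op_id and a cache hit short-circuits recomputation, so when an op_id is reused, the stale entry from the earlier, empty checkpoint keeps being served.

```typescript
// Hypothetical checksum cache keyed by checkpoint op_id (illustration only).
type Op = { opId: number; checksum: number };

class ChecksumCache {
  private cache = new Map<number, number>(); // checkpoint op_id -> checksum

  checksumAt(checkpointOpId: number, ops: Op[]): number {
    const cached = this.cache.get(checkpointOpId);
    if (cached !== undefined) return cached; // stale if the op_id was reused
    const sum = ops
      .filter((op) => op.opId <= checkpointOpId)
      .reduce((acc, op) => (acc + op.checksum) >>> 0, 0);
    this.cache.set(checkpointOpId, sum);
    return sum;
  }
}

const cache = new ChecksumCache();
// First checkpoint at op_id 1: the bucket is empty, so checksum 0 is cached.
cache.checksumAt(1, []);
// A row is later written, also with op_id 1 (the reused id). The next
// checkpoint is again at op_id 1, so the cached 0 is returned even though
// the client now receives a data line with checksum 287648289.
const stale = cache.checksumAt(1, [{ opId: 1, checksum: 287648289 }]);
console.log(stale); // 0: checkpoint checksum no longer matches the data
```

With distinct op_ids per checkpoint, the cache is sound; it is only the reuse of op_id 1 that turns a cache hit into a wrong answer.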

However, the cache is not the core issue here: re-using op_id 1 for two checkpoints is.

The fix

One simple fix is to call select nextval('op_id_sequence') once as part of the migrations, which initializes the sequence and avoids the issue completely. However, that would also change the op_id sequence in all our test fixtures. Rather than updating them all, and accounting for differences between MongoDB and Postgres storage, this fix checks is_called on the sequence and defaults to 0 instead of 1 in that case.
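The described fix can be sketched as follows. The helper name and the SequenceState shape are illustrative, not the exact code from this PR (the real method is getLastOpIdSequence() in PostgresBucketBatch): if the sequence has never been called, report 0 instead of the uninitialized last_value of 1, so the first checkpoint cannot collide with the op_id that the first nextval will return.

```typescript
// Row shape from: SELECT last_value, is_called FROM op_id_sequence
// (illustrative types, not the PR's actual code).
interface SequenceState {
  last_value: bigint;
  is_called: boolean;
}

// On a fresh sequence, last_value is 1 but nextval() has never been called,
// so no op_id has actually been issued yet. Defaulting to 0 keeps the
// checkpoint op_id strictly below every op_id nextval() will hand out.
function lastOpIdFromSequence(row: SequenceState): bigint {
  return row.is_called ? row.last_value : 0n;
}

// Fresh sequence: checkpoint op_id 0; first nextval() then issues 1.
console.log(lastOpIdFromSequence({ last_value: 1n, is_called: false })); // 0n
// After nextval() has been called: report last_value as before.
console.log(lastOpIdFromSequence({ last_value: 1n, is_called: true })); // 1n
```

This keeps the op_id sequence unchanged for existing fixtures while closing the window in which two consecutive checkpoints could both report op_id 1.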


changeset-bot bot commented May 8, 2025

🦋 Changeset detected

Latest commit: 90b55b8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages, all with a Patch bump:

  • @powersync/service-module-postgres-storage
  • @powersync/service-core-tests
  • @powersync/service-image
  • @powersync/service-module-mongodb
  • @powersync/service-module-mysql
  • @powersync/service-module-postgres
  • @powersync/service-module-mongodb-storage
  • @powersync/service-core
  • test-client


@rkistner rkistner requested a review from Copilot May 8, 2025 11:21
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR fixes an issue in Postgres storage where the op_id_sequence may be incorrectly initialized, leading to op_id conflicts during checkpoint creation. Key changes include:

  • Adding a new test to cover the op_id initialization edge case.
  • Refactoring PostgresBucketBatch to use a centralized getLastOpIdSequence() method.
  • Updating sync rules storage to check for a non-null last_flushed_op.
  • A changeset documenting the patch for Postgres storage.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • packages/service-core-tests/src/tests/register-data-storage-tests.ts: Added a test case for the op_id initialization edge case.
  • modules/module-postgres-storage/src/storage/batch/PostgresBucketBatch.ts: Refactored to use a shared function for reading the op_id_sequence.
  • modules/module-postgres-storage/src/storage/PostgresSyncRulesStorage.ts: Updated condition to explicitly check for a non-null last_flushed_op.
  • .changeset/sixty-melons-bow.md: Added changeset for the patch, with a spelling issue in the commit title.

@rkistner rkistner requested a review from stevensJourney May 8, 2025 11:30
@rkistner rkistner marked this pull request as ready for review May 8, 2025 11:30
@stevensJourney (Collaborator) left a comment


Thanks for this fix!

@rkistner rkistner merged commit 05b9593 into main May 8, 2025
21 checks passed
@rkistner rkistner deleted the fix-postgres-op-id-init branch May 8, 2025 13:42