safekeeper: lift decoding and interpretation of WAL to the safekeeper #9746

VladLazar · 2024-11-13T14:30:57Z

Problem

For any given tenant shard, pageservers receive all of the tenant's WAL from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth concerns and CPU overhead of filtering
out the records.

Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver into the safekeeper.

A customised PG replication protocol is used where instead of sending raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config: wal_receiver_protocol.
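
The receiver-driven selection described above could look roughly like this on the pageserver side. This is a minimal sketch: the enum name `WalReceiverProtocol`, its variant names, and the string values `vanilla` and `interpreted` are assumptions made for illustration. Only the config key `wal_receiver_protocol` and the fact that the new protocol is disabled by default come from this PR.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the `wal_receiver_protocol` pageserver setting.
/// Names and string values are illustrative, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WalReceiverProtocol {
    /// Raw WAL is streamed; the pageserver decodes and filters it itself.
    /// This is the default, matching "disabled by default" in the PR notes.
    #[default]
    Vanilla,
    /// The safekeeper sends filtered, interpreted records.
    Interpreted,
}

impl FromStr for WalReceiverProtocol {
    type Err = String;

    /// Parse the config value; unknown values are rejected rather than
    /// silently falling back to the default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "vanilla" => Ok(Self::Vanilla),
            "interpreted" => Ok(Self::Interpreted),
            other => Err(format!("unknown wal_receiver_protocol: {other}")),
        }
    }
}
```

Because the receiver picks the protocol, a safekeeper that understands both variants can serve old and new pageservers at the same time, which is what makes a gradual rollout possible.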

In more detail, the changes are:

- Optionally inject the protocol and shard identity into the arguments used for starting replication.
- On the safekeeper side, implement a new WAL sending primitive which decodes and interprets records
  before sending them over.
- On the pageserver side, implement the ingestion of this new replication message type. It's very similar
  to what we already have for raw WAL (minus decoding and interpreting).
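
To illustrate why the shard identity has to travel with the replication request, here is a minimal sketch of sender-side filtering. All types and the `owns` placement rule are hypothetical stand-ins: the real key-to-shard mapping is more involved, and dropping empty records entirely is a simplification made here, not something this PR states.

```rust
/// Hypothetical shard identity passed by the pageserver when it starts
/// replication. Field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardIdentity {
    pub number: u32,
    pub count: u32,
}

impl ShardIdentity {
    /// Placeholder placement rule (simple modulo). The real mapping from
    /// keys to shards is not described in this PR.
    fn owns(&self, key: u64) -> bool {
        key % u64::from(self.count.max(1)) == u64::from(self.number)
    }
}

/// Hypothetical decoded record: an LSN plus the page keys it touches.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DecodedRecord {
    pub lsn: u64,
    pub page_keys: Vec<u64>,
}

/// Keep only the parts of each record that belong to `shard`, and skip
/// records that end up with nothing relevant to it.
pub fn interpret_for_shard(records: &[DecodedRecord], shard: ShardIdentity) -> Vec<DecodedRecord> {
    records
        .iter()
        .filter_map(|r| {
            let keys: Vec<u64> = r
                .page_keys
                .iter()
                .copied()
                .filter(|k| shard.owns(*k))
                .collect();
            if keys.is_empty() {
                None
            } else {
                Some(DecodedRecord { lsn: r.lsn, page_keys: keys })
            }
        })
        .collect()
}
```

With filtering done before sending, each shard's pageserver receives only its own slice of the WAL instead of the whole tenant's stream, which is the bandwidth and CPU saving the Problem section is after.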

Notes

- This PR currently uses my branch of rust-postgres which includes the deserialization logic for the new replication message type. PR for that is open here.
- This PR contains changes for both pageservers and safekeepers. It's safe to merge because the new protocol is disabled by default on the pageserver side. We can gradually start enabling it in subsequent releases.
- CI tests are running on [WIP] pageserver: use interpreted wal proto by default #9747

Links

Related: #9336
Epic: #9329

github-actions · 2024-11-13T17:11:20Z

6941 tests run: 6633 passed, 0 failed, 308 skipped (full report)

Code coverage* (full report)

functions: 30.8% (7973 of 25845 functions)
lines: 48.7% (63310 of 130127 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results.
685df74 at 2024-11-25T17:12:43.488Z

erikgrinaker

I've done a high-level pass, flushing some comments. Looks good overall.

I need to do another pass to examine the details more closely, will do that when it's ready for final review.

libs/wal_decoder/src/models.rs

libs/pq_proto/src/lib.rs

libs/utils/src/postgres_client.rs

safekeeper/src/handler.rs

safekeeper/src/send_interpreted_wal.rs

libs/pq_proto/src/lib.rs

safekeeper/src/wal_reader_stream.rs

## Problem

We want to serialize interpreted records to send them over the wire from safekeeper to pageserver.

## Summary of changes

Make `InterpretedWalRecord` ser/de. This is a temporary change to get the bulk of the lift merged in #9746. For going to prod, we don't want to use bincode since we can't evolve the schema. Questions on serialization will be tackled separately.
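
The schema evolution concern with bincode can be illustrated with a small sketch. A hand-rolled positional layout stands in for bincode here (it is not bincode's actual byte format), and the structs `RecordV1` and `RecordV2` are hypothetical. The point is only that when fields are written back to back without tags, a reader built for a newer struct cannot tell that an older payload is missing a field.

```rust
/// Version 1 of a hypothetical wire struct.
pub struct RecordV1 {
    pub lsn: u64,
    pub flush: bool,
}

/// Positional encoding: fields are written back to back, with no field
/// tags or lengths to identify them.
pub fn encode_v1(r: &RecordV1) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    out.extend_from_slice(&r.lsn.to_le_bytes());
    out.push(r.flush as u8);
    out
}

/// Version 2 adds a trailing field.
#[derive(Debug, PartialEq)]
pub struct RecordV2 {
    pub lsn: u64,
    pub flush: bool,
    pub batch_len: u32,
}

/// A V2 reader expects exactly the V2 layout. Given V1 bytes it can only
/// fail: nothing in the payload marks `batch_len` as absent.
pub fn decode_v2(buf: &[u8]) -> Result<RecordV2, String> {
    if buf.len() < 13 {
        return Err(format!("short buffer: need 13 bytes, got {}", buf.len()));
    }
    let mut lsn_bytes = [0u8; 8];
    lsn_bytes.copy_from_slice(&buf[0..8]);
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[9..13]);
    Ok(RecordV2 {
        lsn: u64::from_le_bytes(lsn_bytes),
        flush: buf[8] != 0,
        batch_len: u32::from_le_bytes(len_bytes),
    })
}
```

Formats that tag each field, such as protobuf, let a newer reader treat a missing field as a default and let an older reader skip fields it does not know.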

erikgrinaker

LGTM, provided we replace the wire protocol encoding before production.

Let's also enable this protocol in a few tests, to get some rudimentary coverage.

safekeeper/src/wal_reader_stream.rs

safekeeper/src/send_interpreted_wal.rs

safekeeper/src/handler.rs

safekeeper/src/send_interpreted_wal.rs

safekeeper/src/wal_reader_stream.rs

safekeeper/src/send_interpreted_wal.rs

libs/pq_proto/src/lib.rs

arssher

Nice, LGTM! A few minor comments.

Also would be great to remove duplication between send_wal.rs and wal_reader_stream.rs (obviously ok separately).

safekeeper/src/wal_reader_stream.rs

…#9821)

## Problem

#9746 lifted decoding and interpretation of WAL to the safekeeper. This reduced the ingested amount on the pageservers by around 10x for a tenant with 8 shards, but doubled the ingested amount for single sharded tenants.

Also, #9746 uses bincode which doesn't support schema evolution. Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for the interpreted wal records and adding compression support. Compressed protobuf reduced the ingested amount by 100x on the 32 shards `test_sharded_ingest` case (compared to non-interpreted proto). For the 1 shard case the reduction is 5x.

Sister change to `rust-postgres` is [here](neondatabase/rust-postgres#33).

## Links

Related: #9336
Epic: #9329

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch 3 times, most recently from 474c03a to 55f5e89 Compare November 13, 2024 16:01

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 55f5e89 to a08f5f8 Compare November 14, 2024 10:51

VladLazar changed the title ~~safekeeper: lift decoding an interpretation of WAL to the safekeeper~~ safekeeper: lift decoding and interpretation of WAL to the safekeeper Nov 14, 2024

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from a08f5f8 to c4ae5f8 Compare November 14, 2024 13:58

VladLazar requested review from erikgrinaker and arssher November 14, 2024 14:09

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from c4ae5f8 to 25db978 Compare November 14, 2024 15:46

VladLazar mentioned this pull request Nov 14, 2024

backend: add interpreted record replication message type neondatabase/rust-postgres#32

Merged

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch 5 times, most recently from 7a7031f to f51ff0a Compare November 15, 2024 17:00

erikgrinaker reviewed Nov 15, 2024

VladLazar mentioned this pull request Nov 15, 2024

wal_decoder: make InterpretedWalRecord serde #9775

Merged

VladLazar added 11 commits November 18, 2024 11:16

wal_decoder: add an is empty method for interpreted record

d5d2e26

util: allow for different protocol flavours in PG connections

d64c60d

safekeeper: parse new replication connection arguments

260127e

safekeeper: abstract WAL reading into a stream

057138f

libs/pq_proto: add interpreted wal records type

1f043f4

safekeeper: add an interpreted wal sender

3cef53c

safekeeper: dispatch wal sending based on protocol

4f3583a

pageserver: ingest interpreted WAL records

3b4ffaf

pageserver: gate new protocol with pageserver config

e503db2

tests: parametrize sharded ingest test on protocol

dceab62

tests: measure received WAL and records for sharded ingest

a19dd5a

VladLazar added 2 commits November 18, 2024 12:06

wip: update rust-postgres

c370189

review: pass buffer size as argument to wal reader

8c4deee

VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from f51ff0a to 8c4deee Compare November 18, 2024 11:11

VladLazar marked this pull request as ready for review November 18, 2024 11:14

VladLazar requested a review from a team as a code owner November 18, 2024 11:14

VladLazar requested a review from erikgrinaker November 18, 2024 11:14

erikgrinaker approved these changes Nov 18, 2024

safekeeper/src/wal_reader_stream.rs (resolved)

safekeeper/src/send_interpreted_wal.rs (resolved)

VladLazar added 3 commits November 20, 2024 14:51

Merge branch 'main' into vlad/safekeeper-interpret-wal

a7c2d91

review: add comment on body struct twin

6a0fb94

review: enable interpreted protocol in a few tests

cc87a39

This was referenced Nov 20, 2024

utils: use ShardIdentity in `postgres_client.rs` #9823

Open

safekeeper: use WalReaderStreamBuilder for sk peer-to-peer wal replication #9824

Open

safekeeper: use protobuf for sending compressed records to pageserver #9821

Merged

arssher reviewed Nov 22, 2024

arssher approved these changes Nov 22, 2024

arssher reviewed Nov 22, 2024

safekeeper/src/wal_reader_stream.rs (outdated, resolved)

VladLazar added 5 commits November 25, 2024 15:12

review: update obsolete comment in wal_reader_stream

270e008

review: use end of last sent chunk for logging

b661827

review: use more generic name for commit_lsn

e659f81

build: update rust-postgres deps

e57f392

Merge remote-tracking branch 'origin/main' into vlad/safekeeper-inter…

685df74

…pret-wal Conflicts: pageserver/src/config.rs

VladLazar added this pull request to the merge queue Nov 25, 2024

VladLazar removed this pull request from the merge queue due to a manual request Nov 25, 2024

VladLazar added this pull request to the merge queue Nov 25, 2024

Merged via the queue into main with commit 7a2f0ed Nov 25, 2024
80 checks passed

VladLazar deleted the vlad/safekeeper-interpret-wal branch November 25, 2024 17:32

erikgrinaker mentioned this pull request Dec 11, 2024

safekeeper: optimize WAL decoding #10097

Closed

lmcrean mentioned this pull request Aug 7, 2025

utils: use ShardIdentity in postgres_client.rs for improved type safety #12806

Open

15 tasks
