Skip to content

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Nov 13, 2024

Problem

For any given tenant shard, pageservers receive all of the tenant's WAL from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth concerns and CPU overhead of filtering
out the records.

Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver into the safekeeper.

A customised PG replication protocol is used where instead of sending raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config: wal_receiver_protocol.

More granularly the changes are:

  1. Optionally inject the protocol and shard identity into the arguments used for starting replication
  2. On the safekeeper side, implement a new wal sending primitive which decodes and interprets records
    before sending them over
  3. On the pageserver side, implement the ingestion of this new replication message type. It's very similar
    to what we already have for raw wal (minus decoding and interpreting).

Notes

  • This PR currently uses my branch of rust-postgres which includes the deserialization logic for the new replication message type. PR for that is open here.
  • This PR contains changes for both pageservers and safekeepers. It's safe to merge because the new protocol is disabled by default on the pageserver side. We can gradually start enabling it in subsequent releases.
  • CI tests are running on [WIP] pageserver: use interpreted wal proto by default #9747

Links

Related: #9336
Epic: #9329

@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch 3 times, most recently from 474c03a to 55f5e89 Compare November 13, 2024 16:01
Copy link

github-actions bot commented Nov 13, 2024

6941 tests run: 6633 passed, 0 failed, 308 skipped (full report)


Code coverage* (full report)

  • functions: 30.8% (7973 of 25845 functions)
  • lines: 48.7% (63310 of 130127 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
685df74 at 2024-11-25T17:12:43.488Z :recycle:

@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 55f5e89 to a08f5f8 Compare November 14, 2024 10:51
@VladLazar VladLazar changed the title safekeeper: lift decoding an interpretation of WAL to the safekeeper safekeeper: lift decoding and interpretation of WAL to the safekeeper Nov 14, 2024
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from a08f5f8 to c4ae5f8 Compare November 14, 2024 13:58
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from c4ae5f8 to 25db978 Compare November 14, 2024 15:46
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch 5 times, most recently from 7a7031f to f51ff0a Compare November 15, 2024 17:00
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a high-level pass, flushing some comments. Looks good overall.

I need to do another pass to examine the details more closely, will do that when it's ready for final review.

VladLazar added a commit that referenced this pull request Nov 15, 2024
## Problem

We want to serialize interpreted records to send them over the wire from
safekeeper to pageserver.

## Summary of changes

Make `InterpretedWalRecord` ser/de. This is a temporary change to get
the bulk of the lift merged in
#9746. For going to prod, we
don't want to use bincode since we can't evolve the schema.
Questions on serialization will be tackled separately.
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from f51ff0a to 8c4deee Compare November 18, 2024 11:11
@VladLazar VladLazar marked this pull request as ready for review November 18, 2024 11:14
@VladLazar VladLazar requested a review from a team as a code owner November 18, 2024 11:14
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, provided we replace the wire protocol encoding before production.

Let's also enable this protocol in a few tests, to get some rudimentary coverage.

Copy link
Contributor

@arssher arssher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM! A few minor comments.

Also would be great to remove duplication between send_wal.rs and wal_reader_stream.rs (obviously ok separately).

@VladLazar VladLazar added this pull request to the merge queue Nov 25, 2024
@VladLazar VladLazar removed this pull request from the merge queue due to a manual request Nov 25, 2024
@VladLazar VladLazar added this pull request to the merge queue Nov 25, 2024
Merged via the queue into main with commit 7a2f0ed Nov 25, 2024
80 checks passed
@VladLazar VladLazar deleted the vlad/safekeeper-interpret-wal branch November 25, 2024 17:32
github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2024
…#9821)

## Problem

#9746 lifted decoding and
interpretation of WAL to the safekeeper.
This reduced the ingested amount on the pageservers by around 10x for a
tenant with 8 shards, but doubled
the ingested amount for single sharded tenants.

Also, #9746 uses bincode which
doesn't support schema evolution.
Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for
the interpreted wal records and adding compression support. Compressed
protobuf reduced the ingested amount by 100x on the 32 shards
`test_sharded_ingest` case (compared to non-interpreted proto). For the
1 shard case the reduction is 5x.

Sister change to `rust-postgres` is
[here](neondatabase/rust-postgres#33).

## Links

Related: #9336
Epic: #9329
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants