Is your feature request related to a problem? Please describe.
pw.io.postgres.read in "streaming" mode currently always creates a temporary replication slot internally. The slot is automatically dropped when the connection is closed. As stated in the documentation, this is an intentional guardrail against WAL accumulation. However, this design makes it impossible to use Pathway's persistence mechanism with the Postgres connector: since the slot is gone after restart, the connector cannot resume from where it left off and must perform a full re-snapshot on every startup.
Describe the solution you'd like
Add support for an externally managed (pre-created) replication slot in pw.io.postgres.read. When the user provides a slot name via a new parameter (e.g. replication_slot_name), the connector should:
- Use that slot as-is instead of creating a temporary one.
- Integrate with Pathway's persistence layer by storing the LSN of the last acknowledged WAL record as the connector's persistent offset. On restart, the connector should resume reading from that LSN rather than re-snapshotting.
- Emit a warning at startup that the lifecycle of the replication slot (creation, monitoring, and deletion) is the user's responsibility, and that an inactive or abandoned slot can cause unbounded WAL retention on the PostgreSQL server.
The existing behavior (temporary slot, no persistence) should remain the default when no slot name is provided.
Describe alternatives you've considered
- Using
pw.io.debezium.read with Kafka as a persistence-capable alternative — works, but introduces significant operational overhead (Kafka + Debezium) for use cases where a direct Postgres connection would suffice.
- Absorbing the re-snapshot cost: acceptable for small tables but not for large ones where a full re-read on every restart is prohibitive.
Additional context
The documentation for pw.io.postgres.read currently contains the following note:
"There is no need to create a replication slot manually, and doing so is strongly discouraged."
This feature request proposes making the externally-managed slot an opt-in mode with explicit user acknowledgment (via the warning), rather than removing the guardrail from the default path. The LSN-based offset aligns naturally with how PostgreSQL logical replication tracks progress via confirmed_flush_lsn in pg_replication_slots.
Is your feature request related to a problem? Please describe.
pw.io.postgres.readin"streaming"mode currently always creates a temporary replication slot internally. The slot is automatically dropped when the connection is closed. As stated in the documentation, this is an intentional guardrail against WAL accumulation. However, this design makes it impossible to use Pathway's persistence mechanism with the Postgres connector: since the slot is gone after restart, the connector cannot resume from where it left off and must perform a full re-snapshot on every startup.Describe the solution you'd like
Add support for an externally managed (pre-created) replication slot in
pw.io.postgres.read. When the user provides a slot name via a new parameter (e.g.replication_slot_name), the connector should:The existing behavior (temporary slot, no persistence) should remain the default when no slot name is provided.
Describe alternatives you've considered
pw.io.debezium.readwith Kafka as a persistence-capable alternative — works, but introduces significant operational overhead (Kafka + Debezium) for use cases where a direct Postgres connection would suffice.Additional context
The documentation for
pw.io.postgres.readcurrently contains the following note:This feature request proposes making the externally-managed slot an opt-in mode with explicit user acknowledgment (via the warning), rather than removing the guardrail from the default path. The LSN-based offset aligns naturally with how PostgreSQL logical replication tracks progress via
confirmed_flush_lsninpg_replication_slots.