@@ -6,17 +6,19 @@ authors: [nik, maxim]
66tags: [Postgres insights, PostgresMarathon, cloning, pgBackRest, performance]
77---
88
9- Suppose you need to create a replica for a 1 TiB database. You have a fast server with NVMe storage and 75 Gbps network, but pg_basebackup typically delivers only 300-500 MiB/s due to its single-threaded architecture — regardless of how powerful your hardware is (though PG18 brings a surprise we'll discuss later).
9+ Suppose you need to create a replica for a 1 TiB database. You have a fast server with NVMe storage and a 75 Gbps network, but `pg_basebackup` typically delivers only 300-500 MiB/s due to its single-threaded architecture, regardless of how powerful your hardware is (though PG18 brings a surprise we'll discuss later).
1010
11- The solution: replace pg_basebackup with pgBackRest and leverage parallel processing to achieve significantly faster replica creation, saturating (≈97% of) line rate on a 75 Gbps link.
11+ The solution: replace `pg_basebackup` with pgBackRest and leverage parallel processing to achieve significantly faster replica creation, reaching ≈97% of line rate on a 75 Gbps link.
1212
1313**Note:** This is an R&D-style exploration focused on performance benchmarking on idle systems, not a production-ready automation guide. Many considerations important for production environments (monitoring, retry logic, integration with orchestration tools, etc.) are intentionally omitted to focus on the core performance characteristics.
1414
1515<!--truncate-->
1616
17+ *// 2025-11-12: Reviewed and updated for accuracy*
18+
1719## pg_basebackup is single-threaded
1820
19- The standard approach to creating a Postgres replica uses pg_basebackup:
21+ The standard approach to creating a Postgres replica uses `pg_basebackup`:
2022
2123```bash
2224pg_basebackup \
@@ -30,13 +32,13 @@ pg_basebackup \
3032 --pgdata=$PG_DATA_DIR
3133```
3234
33- Despite having fast NVMe storage and 75 Gbps network capacity, pg_basebackup is fundamentally limited by its single-threaded design. On Postgres versions prior to 18, pg_basebackup typically delivers only 300-500 MiB/s regardless of hardware capabilities.
35+ Despite having fast NVMe storage and 75 Gbps network capacity, `pg_basebackup` is fundamentally limited by its single-threaded design. On Postgres versions prior to 18, `pg_basebackup` typically delivers only 300-500 MiB/s regardless of hardware capabilities.
3436
35- PostgreSQL 18's [ new io_uring support ](https://www.postgresql.org/docs/18/runtime-config-resource.html#GUC-IO-METHOD) can speed it up significantly. For our experiment on i4i.32xlarge machines with local NVMe disks, we managed to reach 1.08 GiB/s which is very impressive. But it's still limited. For large databases, this creates a significant operational bottleneck, especially in cases when disk IO and network capacity is high and we could have many gigabytes per second.
37+ PostgreSQL 18's new [asynchronous I/O](https://www.postgresql.org/docs/18/runtime-config-resource.html#GUC-IO-METHOD) (we left the settings at their defaults: [`io_method=worker`](https://postgresqlco.nf/doc/en/param/io_method/) and [`io_workers=3`](https://postgresqlco.nf/doc/en/param/io_workers/)) can speed it up significantly. In our experiment on `i4i.32xlarge` machines with local NVMe disks, we reached 1.08 GiB/s, which is very impressive but still limited. For large databases, this creates a significant operational bottleneck, especially when disk I/O and network capacity are high enough to sustain many gigabytes per second.
3638
37- Multi-threaded pg_basebackup has been a recurring topic on pgsql-hackers over the years ([one](https://www.postgresql.org/message-id/CAEHH7R5sBCRyiu5_qanE741VWGE-LPbVnCdxJZh2U1y1BRPW7A@mail.gmail.com), [two](https://www.postgresql.org/message-id/CADM=Jeg3ZN+kPQpiSfeWCXr=xgpLrq4cBQE5ZviUxygKq3VqiA@mail.gmail.com). However, the feature was never completed or merged. To this day, pg_basebackup remains single-threaded.
39+ Multi-threaded `pg_basebackup` has been a recurring topic on pgsql-hackers over the years ([one](https://www.postgresql.org/message-id/CAEHH7R5sBCRyiu5_qanE741VWGE-LPbVnCdxJZh2U1y1BRPW7A@mail.gmail.com), [two](https://www.postgresql.org/message-id/CADM=Jeg3ZN+kPQpiSfeWCXr=xgpLrq4cBQE5ZviUxygKq3VqiA@mail.gmail.com)). However, the feature was never completed or merged. To this day, `pg_basebackup` remains single-threaded.
3840
39- **Note:** PostgreSQL 18's io_uring support was a pleasant surprise, delivering 1.08 GiB/s compared to the typical 300-500 MiB/s from pre-18 versions. This represents a 2-3x improvement, but it's still single-threaded and leaves most of the available network and storage bandwidth unused.
41+ **Note:** The default performance of PostgreSQL 18, with its async I/O, was a pleasant surprise, delivering 1.08 GiB/s compared to the 300-500 MiB/s typical of pre-18 versions. This represents a 2-3x improvement (roughly matching the 3 default I/O workers), but `pg_basebackup` remains fundamentally single-threaded and leaves most of the available network and storage bandwidth unused.
4042
4143## Alternative: pgBackRest
4244
@@ -46,9 +48,9 @@ pgBackRest is primarily a backup and restore tool for Postgres, but nothing prev
4648
47491. Passwordless SSH access to `postgres@$PRIMARY` via private key
48502. Passwordless psql connection for `$REPLICATION_USER` to `$PRIMARY` (standard SQL connectivity for pgBackRest verification; replication wiring with slot + primary_conninfo happens at the end)
49- 3. pgBackRest installed on both primary and replica servers with the same version. Use a recent version (2.50 +) to ensure PostgreSQL 18 support and optimal performance features
51+ 3. pgBackRest installed on both primary and replica servers with the same version. Use a recent version (2.55+) to ensure PostgreSQL 18 support and optimal performance features
5052
51- All configuration is performed on the destination server, similar to pg_basebackup.
53+ All configuration is performed on the destination server, similar to `pg_basebackup`.
5254
5355### Configuration
5456
@@ -195,17 +197,17 @@ For production deployments with WAL archiving configured on the primary, you may
195197
196198Testing environment:
197199- **Database:** 1.023 TiB (70 pgbench databases, ~7 billion rows)
198- - ** Storage:** 8x 3,750 GB NVMe SSD in RAID0 (AWS i4i.32xlarge)
200+ - **Storage:** 8x 3,750 GB NVMe SSD in RAID0 (AWS `i4i.32xlarge`)
199201- **Network:** 75 Gbps = 9.375 GB/s (decimal) = 8.73 GiB/s (binary). We report GiB/s below
200202- **CPUs:** 128 vCPUs
201- - ** PostgreSQL:** 18.0 with io_uring support
203+ - **PostgreSQL:** 18.0 with default worker-based async I/O (`io_method=worker`, `io_workers=3`)
202204- **Testing:** no network compression (compress-type=none), no checksum verification for maximum throughput
203205- **Cold cache:** between tests, we restarted PostgreSQL on the replica and flushed OS page cache on both servers (`sync; echo 3 > /proc/sys/vm/drop_caches`). This is imperfect but ensures reasonably cold conditions for each run
204206
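The unit conversion in the network line above can be reproduced with a few lines of shell. This is a minimal sketch; the function name `gbps_to_rates` is ours and not part of any tool used in the benchmark.

```bash
#!/usr/bin/env bash
# Convert a link speed in Gbps to GB/s (decimal) and GiB/s (binary).
# gbps_to_rates is a hypothetical helper name used only for illustration.
gbps_to_rates() {
  LC_ALL=C awk -v gbps="$1" 'BEGIN {
    bytes_per_sec = gbps * 1e9 / 8              # bits per second -> bytes per second
    printf "%.3f GB/s %.2f GiB/s\n",
      bytes_per_sec / 1e9,                      # decimal gigabytes
      bytes_per_sec / (1024 * 1024 * 1024)      # binary gibibytes
  }'
}

gbps_to_rates 75   # prints: 9.375 GB/s 8.73 GiB/s
```

Keeping both units visible matters here because all throughput figures in the results are reported in GiB/s, while link speeds are quoted in decimal gigabits.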
205207Results (cold cache for each test):
206208
207209```bash
208- # pg_basebackup baseline (PostgreSQL 18 with io_uring)
210+ # pg_basebackup baseline (PostgreSQL 18 with worker-based async I/O)
209211time pg_basebackup \
210212 --pgdata="$PG_DATA_DIR" \
211213 --write-recovery-conf \
@@ -281,13 +283,13 @@ The actual on-wire throughput peaks at approximately 8.5 GiB/s (measured via net
281283
282284Key observations:
283285
284- - ** PostgreSQL 18 io_uring improvement:** pg_basebackup achieves 1.08 GiB/s, which is 2-3x faster than older PostgreSQL versions (typically 300-500 MiB/s)
286+ - **PostgreSQL 18 async I/O improvement:** `pg_basebackup` achieves 1.08 GiB/s with worker-based async I/O, which is 2-3x faster than older PostgreSQL versions (typically 300-500 MiB/s), roughly matching the 3 default I/O workers
285287- **Nearly linear scaling:** performance doubles with each doubling of parallelism up to 16 processes (96% efficiency)
286288- **Network saturation at 32 processes:** on-wire saturation at ~8.5 GiB/s (~97% of theoretical 8.73 GiB/s); logical throughput shows 10.13 GiB/s due to pgBackRest skipping sparse regions and zero-filled blocks
287- - ** Optimal configuration:** ` --process-max=32 ` provides best performance (9.4x faster than pg_basebackup)
289+ - **Optimal configuration:** `--process-max=32` provides best performance (9.4x faster than `pg_basebackup`)
288290- **Network compression minimal benefit:** `--compress-level-network=1` provides only 5-6% improvement on 75 Gbps network, not worth the CPU overhead for high-bandwidth environments
289291- **Diminishing returns beyond 32:** higher parallelism adds SSH overhead without throughput improvement
290- - ** For 1 TiB database:** replica creation time reduced from 15.8 minutes to 1.7 minutes (89% faster )
292+ - **For a 1 TiB database:** replica creation time reduced from 15.8 minutes to 1.7 minutes (an 89% reduction)
291293
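The headline ratios in the observations above follow directly from the reported numbers. A small sketch of the arithmetic, assuming the 9.4x figure is the ratio of logical throughputs (10.13 GiB/s vs 1.08 GiB/s); the helper names are ours:

```bash
#!/usr/bin/env bash
# Speedup as a ratio of two throughputs (baseline first).
speedup() {
  LC_ALL=C awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1fx\n", fast / base }'
}

# Percentage reduction in wall-clock time (before first).
time_reduction() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f%%\n", (1 - after / before) * 100 }'
}

speedup 1.08 10.13        # prints: 9.4x
time_reduction 15.8 1.7   # prints: 89%
```

Note the two ways of stating the same result: a 9.4x throughput ratio corresponds to an 89% reduction in elapsed time, not to "89% faster".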
292294## Practical considerations
293295
@@ -299,18 +301,17 @@ For production deployments:
2993014. Consider using `renice` on the source server if CPU contention affects production workload
3003025. Ensure sufficient I/O capacity on the destination to handle parallel writes
3013036. For high parallelism (64+), increase SSH limits: `MaxStartups=200:30:300` and `MaxSessions=200` in `/etc/ssh/sshd_config`. Alternatively, SSH ControlMaster creates a single persistent connection that all subsequent SSH sessions multiplex through — we haven't tested this approach yet, but it's worth exploring for reducing connection overhead at high parallelism levels
302- 7. For real-world scenarios, consider using `--delta` option to handle interrupted transfers or refresh an existing but stale replica. Delta mode copies only changed files instead of the entire database, which is particularly useful for resuming failed operations or re-syncing a replica that has diverged from the primary
303- 8. Enable checksum verification in production (`--checksum-page=y`) for data integrity validation, though this will reduce throughput. Our benchmarks omitted checksums to measure maximum raw transfer speed
304+ 7. Enable checksum verification in production (`--checksum-page=y`) for data integrity validation, though this will reduce throughput. Our benchmarks omitted checksums to measure maximum raw transfer speed
304305
305306The performance gain is substantial for large databases: reducing replica creation time from 15.8 minutes to 1.7 minutes enables more aggressive disaster recovery testing, faster environment provisioning, and reduced operational risk during failover scenarios.
306307
307308## When to use standard pg_basebackup instead
308309
309- While pgBackRest provides significant performance benefits for large databases with fast infrastructure, stick with standard pg_basebackup in these scenarios:
310+ While pgBackRest provides significant performance benefits for large databases with fast infrastructure, stick with standard `pg_basebackup` in these scenarios:
310311
311312- **Small databases (< 100 GiB):** setup overhead and complexity outweigh speed benefits. The time saved is minimal and not worth the additional configuration
312313- **Slow storage (non-NVMe):** disk I/O becomes the bottleneck before network utilization. Parallel processing won't help if storage can't keep up
313- - **Limited network (< 10 Gbps):** pg_basebackup can already saturate 1-8 Gbps networks. The single-threaded limitation isn't the bottleneck in these environments
314+ - **Limited network (< 10 Gbps):** `pg_basebackup` can already saturate 1-8 Gbps networks. The single-threaded limitation isn't the bottleneck in these environments
314315- **Resource-constrained primary:** limited CPU or memory makes parallel processing counterproductive. The overhead of managing multiple processes can degrade primary performance
315316- **Simple environments:** when 15 minutes vs 2 minutes doesn't justify the additional complexity. Sometimes operational simplicity is more valuable than raw speed
316317
@@ -338,17 +339,26 @@ While these benchmarks demonstrate significant performance improvements, there's
338339
339340- **Measure actual network throughput:** our current numbers show "effective" throughput (logical size ÷ time), which exceeds physical network limits due to pgBackRest's optimization of sparse files and metadata. For the next iteration, measure actual on-wire throughput using NIC counters or tools like `ifstat`/`sar -n DEV` during the run to distinguish between logical and physical transfer rates
340341
341- - **Testing on newer AWS instances:** AWS i7i and i7ie instances feature 100 Gbps network connectivity (vs. 75 Gbps on i4i), which could deliver approximately 30% better throughput — potentially reaching 13+ GiB/s with pgBackRest at `--process-max=32` or higher
342+ - **Testing on newer AWS instances:** AWS `i7i` and `i7ie` instances feature 100 Gbps network connectivity (vs. 75 Gbps on `i4i`), which could deliver approximately 30% better throughput — potentially reaching 13+ GiB/s with pgBackRest at `--process-max=32` or higher
343+
344+ - **PostgreSQL I/O method comparisons:** test `pg_basebackup` performance with different I/O configurations to understand the impact of each:
345+ - Higher `io_workers` values (default is 3, max is 32) to see if more workers improve throughput
346+ - `io_method=io_uring` for systems built with liburing support
347+ - `io_method=sync` on PostgreSQL 18 and PostgreSQL 17 as baselines to quantify the improvement from async I/O
342348
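One way to capture on-wire throughput from NIC counters, as suggested in the list above, is to sample the Linux `rx_bytes` counter before and after a run. This is a rough sketch rather than the method used for the numbers in this post: the function names are ours, and it assumes a Linux host where `/sys/class/net/<iface>/statistics/rx_bytes` is available.

```bash
#!/usr/bin/env bash
# Average throughput in GiB/s from two byte-counter samples taken some seconds apart.
throughput_gib_s() {
  local start_bytes=$1 end_bytes=$2 seconds=$3
  LC_ALL=C awk -v s="$start_bytes" -v e="$end_bytes" -v t="$seconds" \
    'BEGIN { printf "%.2f\n", (e - s) / t / (1024 * 1024 * 1024) }'
}

# Run a command and report the average receive rate on one interface.
# The interface name is supplied by the caller; nothing is assumed about it.
measure_rx() {
  local iface=$1; shift
  local counter="/sys/class/net/${iface}/statistics/rx_bytes"
  local start end t0 t1
  start=$(<"$counter"); t0=$(date +%s)
  "$@"
  end=$(<"$counter"); t1=$(date +%s)
  (( t1 > t0 )) || t1=$(( t0 + 1 ))   # avoid dividing by zero on very short runs
  throughput_gib_s "$start" "$end" "$(( t1 - t0 ))"
}
```

On the replica, something like `measure_rx <iface> <restore command>` would then print the average physical receive rate for the whole run, which can be compared against the logical figure (database size divided by elapsed time). An average hides peaks, so a sampling tool such as `sar -n DEV` is still useful alongside it.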
343349These optimizations may help push throughput even higher and reduce the overhead we observed at very high parallelism levels.
344350
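The I/O method comparison listed above could be organized as a small test matrix. The sketch below only prints the `ALTER SYSTEM` statements for each planned run and executes nothing. The intermediate `io_workers` values (8 and 16) are arbitrary choices of ours, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Emit the settings to apply before each pg_basebackup run, one run per line.
# io_workers values 8 and 16 are arbitrary intermediate points (assumption).
io_settings_matrix() {
  local workers
  echo "io_method=sync"
  for workers in 3 8 16 32; do
    echo "io_method=worker io_workers=${workers}"
  done
  echo "io_method=io_uring"
}

# Turn name=value pairs into ALTER SYSTEM statements (printed, not executed).
settings_to_sql() {
  local setting
  for setting in "$@"; do
    printf "ALTER SYSTEM SET %s = '%s';\n" "${setting%%=*}" "${setting#*=}"
  done
}

io_settings_matrix | while read -r line; do
  # Word splitting of $line is intended: one argument per name=value pair.
  settings_to_sql $line
  echo "-- restart PostgreSQL, drop caches, then time pg_basebackup"
done
```

Changing `io_method` takes effect only after a server restart, so each row of the matrix implies a restart of the server being backed up, followed by the same cold-cache procedure used in the benchmarks.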
345351## Key takeaways
346352
347- - pg_basebackup's single-threaded architecture limits throughput regardless of hardware capabilities (though PostgreSQL 18's io_uring provides significant improvement)
353+ - `pg_basebackup`'s single-threaded architecture limits throughput regardless of hardware capabilities (though PostgreSQL 18's worker-based async I/O provides significant improvement)
348354- pgBackRest with parallel processing can utilize full network and disk bandwidth
349355- it makes sense to expect much better throughput to start at `--process-max=8` or higher
350356- optimal process count depends on available CPU cores, network bandwidth, and disk I/O capacity
351357- for very large databases with fast infrastructure, pgBackRest can reduce replica creation time by 6-10x or even more
352358- the setup requires no special configuration on the source database
353- - all operations are performed from the destination server, similar to pg_basebackup workflow
359+ - all operations are performed from the destination server, similar to the `pg_basebackup` workflow
354360- storage location matters critically — always use fast storage (NVMe, not EBS) for pgBackRest's repo and spool directories to avoid 80x performance degradation
361+
362+ ## Acknowledgments
363+
364+ Thanks to David Steele (pgBackRest creator) and Michael Christofides (pgMustard) for corrections and feedback on this article. Our discussion with David also inspired [PR #2693](https://github.com/pgbackrest/pgbackrest/pull/2693) adding native process priority support to pgBackRest.