Commit 6596926

[Issue #60] Calculate compression ratio("zratio") during backup
1 parent 6062362 commit 6596926

11 files changed: +435 -27 lines changed

Documentation.md

Lines changed: 34 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -708,13 +708,14 @@ pg_probackup displays the list of all the available backups. For example:
708708

709709
```
710710
BACKUP INSTANCE 'node'
711-
============================================================================================================================================
712-
Instance Version ID Recovery time Mode WAL Current/Parent TLI Time Data Start LSN Stop LSN Status
713-
============================================================================================================================================
714-
node 10 P7XDQV 2018-04-29 05:32:59+03 DELTA STREAM 1 / 1 11s 19MB 0/15000060 0/15000198 OK
715-
node 10 P7XDJA 2018-04-29 05:28:36+03 PTRACK STREAM 1 / 1 21s 32MB 0/13000028 0/13000198 OK
716-
node 10 P7XDHU 2018-04-29 05:27:59+03 PAGE STREAM 1 / 1 31s 33MB 0/11000028 0/110001D0 OK
717-
node 10 P7XDHB 2018-04-29 05:27:15+03 FULL STREAM 1 / 0 11s 39MB 0/F000028 0/F000198 OK
711+
======================================================================================================================================
712+
Instance Version ID Recovery time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Status
713+
======================================================================================================================================
714+
node 10 PYSUE8 2019-10-03 15:51:48+03 FULL ARCHIVE 1/0 16s 9047kB 16MB 4.31 0/12000028 0/12000160 OK
715+
node 10 P7XDQV 2018-04-29 05:32:59+03 DELTA STREAM 1/1 11s 19MB 16MB 1.00 0/15000060 0/15000198 OK
716+
node 10 P7XDJA 2018-04-29 05:28:36+03 PTRACK STREAM 1/1 21s 32MB 32MB 1.00 0/13000028 0/13000198 OK
717+
node 10 P7XDHU 2018-04-29 05:27:59+03 PAGE STREAM 1/1 15s 33MB 16MB 1.00 0/11000028 0/110001D0 OK
718+
node 10 P7XDHB 2018-04-29 05:27:15+03 FULL STREAM 1/0 11s 39MB 16MB 1.00 0/F000028 0/F000198 OK
718719
```
719720

720721
For each backup, the following information is provided:
@@ -724,12 +725,14 @@ For each backup, the following information is provided:
724725
- ID — the backup identifier.
725726
- Recovery time — the earliest moment for which you can restore the state of the database cluster.
726727
- Mode — the method used to take this backup. Possible values: FULL, PAGE, DELTA, PTRACK.
727-
- WAL — the WAL delivery mode. Possible values: STREAM and ARCHIVE.
728-
- Current/Parent TLI — timeline identifiers of current backup and its parent.
728+
- WAL Mode — the WAL delivery mode. Possible values: STREAM and ARCHIVE.
729+
- TLI — timeline identifiers of current backup and its parent.
729730
- Time — the time it took to perform the backup.
730-
- Data — the size of the data files in this backup. This value does not include the size of WAL files.
731-
- Start LSN — WAL log sequence number corresponding to the start of the backup process.
732-
- Stop LSN — WAL log sequence number corresponding to the end of the backup process.
731+
- Data — the size of the data files in this backup. This value does not include the size of WAL files. For a STREAM backup, the total backup size can be calculated as 'Data' + 'WAL'.
732+
- WAL — the uncompressed size of the WAL files that the PostgreSQL recovery process must apply to reach consistency.
733+
- Zratio — compression ratio calculated as 'uncompressed-bytes' / 'data-bytes'.
734+
- Start LSN — WAL log sequence number corresponding to the start of the backup process. This is the REDO point from which the PostgreSQL recovery process starts.
735+
- Stop LSN — WAL log sequence number corresponding to the end of the backup process. This is the consistency point for the PostgreSQL recovery process.
733736
- Status — backup status. Possible values:
734737

735738
- OK — the backup is complete and valid.
@@ -774,11 +777,29 @@ recovery-xid = 597
774777
recovery-time = '2017-05-16 12:57:31'
775778
data-bytes = 22288792
776779
wal-bytes = 16777216
780+
uncompressed-bytes = 39961833
781+
pgdata-bytes = 39859393
777782
status = OK
778783
parent-backup-id = 'PT8XFX'
779784
primary_conninfo = 'user=backup passfile=/var/lib/pgsql/.pgpass port=5432 sslmode=disable sslcompression=1 target_session_attrs=any'
780785
```
781786

787+
Detailed output has additional attributes:
788+
- compress-alg — compression algorithm used during backup. Possible values: 'zlib', 'pglz', 'none'.
789+
- compress-level — compression level used during backup.
790+
- from-replica — whether the backup was taken from a standby server. Possible values: '1', '0'.
791+
- block-size — [block_size](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-BLOCK-SIZE) setting of PostgreSQL cluster at the moment of backup start.
792+
- wal-block-size — [wal_block_size](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-WAL-BLOCK-SIZE) setting of PostgreSQL cluster at the moment of backup start.
793+
- checksum-version — whether the PostgreSQL cluster from which the backup was taken has [data block checksums](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-DATA-CHECKSUMS) enabled. Possible values: '1', '0'.
794+
- program-version — full version of pg_probackup binary used to create backup.
795+
- start-time — the backup starting time.
796+
- end-time — the backup ending time.
797+
- uncompressed-bytes — size of the data files before adding page headers and applying compression. You can evaluate the effectiveness of compression by comparing 'uncompressed-bytes' to 'data-bytes' if compression is used.
798+
- pgdata-bytes — size of the PostgreSQL cluster data files at the time of backup. You can evaluate the effectiveness of incremental backup by comparing 'pgdata-bytes' to 'uncompressed-bytes'.
799+
- recovery-xid — current transaction id at the moment of backup ending.
800+
- parent-backup-id — backup ID of parent backup. Available only for incremental backups.
801+
- primary_conninfo — libpq conninfo used for connection to PostgreSQL cluster during backup. The password is not included.
802+
782803
To get more detailed information about the backup in json format, run the show with the backup ID:
783804

784805
pg_probackup show -B backup_dir --instance instance_name --format=json -i backup_id
@@ -851,7 +872,7 @@ For each backup, the following information is provided:
851872
- Max Segno — number of the last existing WAL segment belonging to the timeline.
852873
- N segments — number of WAL segments belonging to the timeline.
853874
- Size — the size files take on disk.
854-
- Zratio - compression ratio calculated as "N segments" * wal_seg_size / "Size".
875+
- Zratio - compression ratio calculated as 'N segments' * wal_seg_size / 'Size'.
855876
- N backups — number of backups belonging to the timeline. To get the details about backups, use json format.
856877
- Status — archive status for this exact timeline. Possible values:
857878
- OK — all WAL segments between Min and Max are present.
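As a worked example of the two Zratio formulas documented in this file, the following sketch parses `key = value` lines in the style of the sample backup details and computes 'uncompressed-bytes' / 'data-bytes', plus the WAL archive ratio 'N segments' * wal_seg_size / 'Size'. The helper names (`parse_control`, `archive_zratio`) are hypothetical and are not part of pg_probackup; the byte counts are copied from the sample output in this diff.

```python
def parse_control(text):
    """Parse 'key = value' lines into a dict of strings (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, so values such as conninfo stay intact.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip("'")
    return fields


def archive_zratio(n_segments, wal_seg_size, size_on_disk):
    """WAL archive Zratio: 'N segments' * wal_seg_size / 'Size'."""
    return n_segments * wal_seg_size / size_on_disk


# Values taken from the sample backup details shown in the documentation diff.
sample = """
data-bytes = 22288792
wal-bytes = 16777216
uncompressed-bytes = 39961833
pgdata-bytes = 39859393
status = OK
"""

fields = parse_control(sample)

# Backup Zratio: 'uncompressed-bytes' / 'data-bytes'.
zratio = int(fields["uncompressed-bytes"]) / int(fields["data-bytes"])
print(f"Zratio: {zratio:.2f}")  # prints "Zratio: 1.79"
```

A ratio of 1.00 means the stored data files take as much space as the uncompressed input, which is what the sample table shows for the backups taken without compression.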

src/backup.c

Lines changed: 17 additions & 0 deletions
@@ -547,6 +547,22 @@ do_backup_instance(PGconn *backup_conn, PGNodeInfo *nodeInfo)
547547
/* close ssh session in main thread */
548548
fio_disconnect();
549549

550+
/* Calculate pgdata_bytes */
551+
for (i = 0; i < parray_num(backup_files_list); i++)
552+
{
553+
pgFile *file = (pgFile *) parray_get(backup_files_list, i);
554+
555+
/* In case of FULL or DELTA backup we can trust read_size.
556+
* In case of PAGE or PTRACK we are forced to trust datafile size,
557+
* taken at the start of backup.
558+
*/
559+
if (current.backup_mode == BACKUP_MODE_FULL ||
560+
current.backup_mode == BACKUP_MODE_DIFF_DELTA)
561+
current.pgdata_bytes += file->read_size;
562+
else
563+
current.pgdata_bytes += file->size;
564+
}
565+
550566
/* Add archived xlog files into the list of files of this backup */
551567
if (stream_wal)
552568
{
@@ -1934,6 +1950,7 @@ pg_stop_backup(pgBackup *backup, PGconn *pg_startbackup_conn,
19341950
file->crc = pgFileGetCRC(file->path, true, false,
19351951
&file->read_size, FIO_BACKUP_HOST);
19361952
file->write_size = file->read_size;
1953+
file->uncompressed_size = file->read_size;
19371954
free(file->path);
19381955
file->path = strdup(PG_BACKUP_LABEL_FILE);
19391956
parray_append(backup_files_list, file);
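The pgdata_bytes rule added to do_backup_instance() can be restated as a small model. This is only a sketch of the accounting logic, assuming a simplified `FileEntry` record and string mode names; both are hypothetical stand-ins for `pgFile` and the `BACKUP_MODE_*` constants.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for the two pgFile fields used by the loop."""
    read_size: int  # bytes actually read while taking this backup
    size: int       # file size recorded at the start of the backup


def pgdata_bytes(files, backup_mode):
    """Sum the PGDATA size the way the new loop does.

    FULL and DELTA backups read every file, so read_size is trusted.
    PAGE and PTRACK backups read only changed blocks, so the size taken
    at the start of the backup is used instead.
    """
    total = 0
    for f in files:
        if backup_mode in ("FULL", "DELTA"):
            total += f.read_size
        else:
            total += f.size
    return total


files = [FileEntry(read_size=8192, size=16384), FileEntry(read_size=8192, size=8192)]
print(pgdata_bytes(files, "FULL"), pgdata_bytes(files, "PAGE"))
```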

src/catalog.c

Lines changed: 22 additions & 1 deletion
@@ -1052,6 +1052,12 @@ pgBackupWriteControl(FILE *out, pgBackup *backup)
10521052
if (backup->wal_bytes != BYTES_INVALID)
10531053
fio_fprintf(out, "wal-bytes = " INT64_FORMAT "\n", backup->wal_bytes);
10541054

1055+
if (backup->uncompressed_bytes >= 0)
1056+
fio_fprintf(out, "uncompressed-bytes = " INT64_FORMAT "\n", backup->uncompressed_bytes);
1057+
1058+
if (backup->pgdata_bytes >= 0)
1059+
fio_fprintf(out, "pgdata-bytes = " INT64_FORMAT "\n", backup->pgdata_bytes);
1060+
10551061
fio_fprintf(out, "status = %s\n", status2str(backup->status));
10561062

10571063
/* 'parent_backup' is set if it is incremental backup */
@@ -1121,6 +1127,7 @@ write_backup_filelist(pgBackup *backup, parray *files, const char *root,
11211127
char buf[BUFFERSZ];
11221128
size_t write_len = 0;
11231129
int64 backup_size_on_disk = 0;
1130+
int64 uncompressed_size_on_disk = 0;
11241131
int64 wal_size_on_disk = 0;
11251132

11261133
pgBackupGetPath(backup, path, lengthof(path), DATABASE_FILE_LIST);
@@ -1142,16 +1149,25 @@ write_backup_filelist(pgBackup *backup, parray *files, const char *root,
11421149
i++;
11431150

11441151
if (S_ISDIR(file->mode))
1152+
{
11451153
backup_size_on_disk += 4096;
1154+
uncompressed_size_on_disk += 4096;
1155+
}
11461156

11471157
/* Count the amount of the data actually copied */
11481158
if (S_ISREG(file->mode) && file->write_size > 0)
11491159
{
1150-
/* TODO: in 3.0 add attribute is_walfile */
1160+
/*
1161+
* Size of WAL files in 'pg_wal' is counted separately
1162+
* TODO: in 3.0 add attribute is_walfile
1163+
*/
11511164
if (IsXLogFileName(file->name) && (file->external_dir_num == 0))
11521165
wal_size_on_disk += file->write_size;
11531166
else
1167+
{
11541168
backup_size_on_disk += file->write_size;
1169+
uncompressed_size_on_disk += file->uncompressed_size;
1170+
}
11551171
}
11561172

11571173
/* for files from PGDATA and external files use rel_path
@@ -1235,6 +1251,7 @@ write_backup_filelist(pgBackup *backup, parray *files, const char *root,
12351251
/* use extra variable to avoid reset of previous data_bytes value in case of error */
12361252
backup->data_bytes = backup_size_on_disk;
12371253
backup->wal_bytes = wal_size_on_disk;
1254+
backup->uncompressed_bytes = uncompressed_size_on_disk;
12381255
}
12391256

12401257
/*
@@ -1269,6 +1286,8 @@ readBackupControlFile(const char *path)
12691286
{'t', 0, "recovery-time", &backup->recovery_time, SOURCE_FILE_STRICT},
12701287
{'I', 0, "data-bytes", &backup->data_bytes, SOURCE_FILE_STRICT},
12711288
{'I', 0, "wal-bytes", &backup->wal_bytes, SOURCE_FILE_STRICT},
1289+
{'I', 0, "uncompressed-bytes", &backup->uncompressed_bytes, SOURCE_FILE_STRICT},
1290+
{'I', 0, "pgdata-bytes", &backup->pgdata_bytes, SOURCE_FILE_STRICT},
12721291
{'u', 0, "block-size", &backup->block_size, SOURCE_FILE_STRICT},
12731292
{'u', 0, "xlog-block-size", &backup->wal_block_size, SOURCE_FILE_STRICT},
12741293
{'u', 0, "checksum-version", &backup->checksum_version, SOURCE_FILE_STRICT},
@@ -1513,6 +1532,8 @@ pgBackupInit(pgBackup *backup)
15131532

15141533
backup->data_bytes = BYTES_INVALID;
15151534
backup->wal_bytes = BYTES_INVALID;
1535+
backup->uncompressed_bytes = 0;
1536+
backup->pgdata_bytes = 0;
15161537

15171538
backup->compress_alg = COMPRESS_ALG_DEFAULT;
15181539
backup->compress_level = COMPRESS_LEVEL_DEFAULT;
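The size accounting that write_backup_filelist() performs after this change can be sketched as follows. The model is an assumption-laden simplification: `FileEntry` is a hypothetical record, non-directory entries are assumed to be regular files, and the regular expression is a simplified stand-in for `IsXLogFileName()` (24 uppercase hexadecimal characters).

```python
import re
from dataclasses import dataclass

DIR_SIZE = 4096  # bytes charged per directory, as in the diff

# Simplified stand-in for IsXLogFileName(): 24 uppercase hex characters.
XLOG_NAME = re.compile(r"^[0-9A-F]{24}$")


@dataclass
class FileEntry:
    """Hypothetical stand-in for the pgFile fields used by the accounting."""
    name: str
    is_dir: bool = False
    write_size: int = 0
    uncompressed_size: int = 0
    external_dir_num: int = 0


def summarize(files):
    """Return (data_bytes, wal_bytes, uncompressed_bytes) for a file list."""
    data = wal = uncompressed = 0
    for f in files:
        if f.is_dir:
            # Directories count toward both the stored and uncompressed totals.
            data += DIR_SIZE
            uncompressed += DIR_SIZE
        elif f.write_size > 0:
            if XLOG_NAME.match(f.name) and f.external_dir_num == 0:
                # WAL segments are counted separately from data files.
                wal += f.write_size
            else:
                data += f.write_size
                uncompressed += f.uncompressed_size
    return data, wal, uncompressed


files = [
    FileEntry(name="base", is_dir=True),
    FileEntry(name="16384", write_size=3000, uncompressed_size=8192),
    FileEntry(name="000000010000000000000001",
              write_size=16777216, uncompressed_size=16777216),
]
print(summarize(files))  # prints "(7096, 16777216, 12288)"
```

Because WAL segments are excluded from both `data` and `uncompressed`, the resulting ratio describes the data files only.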

src/data.c

Lines changed: 8 additions & 0 deletions
@@ -510,6 +510,7 @@ compress_and_backup_page(pgFile *file, BlockNumber blknum,
510510
}
511511

512512
file->write_size += write_buffer_size;
513+
file->uncompressed_size += BLCKSZ;
513514
}
514515

515516
/*
@@ -556,6 +557,7 @@ backup_data_file(backup_files_arg* arguments,
556557
/* reset size summary */
557558
file->read_size = 0;
558559
file->write_size = 0;
560+
file->uncompressed_size = 0;
559561
INIT_FILE_CRC32(true, file->crc);
560562

561563
/* open backup mode file for read */
@@ -625,6 +627,8 @@ backup_data_file(backup_files_arg* arguments,
625627
elog(ERROR, "Failed to read file \"%s\": %s",
626628
file->path, rc == PAGE_CHECKSUM_MISMATCH ? "data file checksum mismatch" : strerror(-rc));
627629
n_blocks_read = rc;
630+
631+
file->uncompressed_size = (n_blocks_read - n_blocks_skipped)*BLCKSZ;
628632
}
629633
else
630634
{
@@ -959,6 +963,7 @@ copy_file(fio_location from_location, const char *to_root,
959963
/* reset size summary */
960964
file->read_size = 0;
961965
file->write_size = 0;
966+
file->uncompressed_size = 0;
962967

963968
/* open backup mode file for read */
964969
in = fio_fopen(file->path, PG_BINARY_R, from_location);
@@ -1046,6 +1051,9 @@ copy_file(fio_location from_location, const char *to_root,
10461051
}
10471052

10481053
file->write_size = (int64) file->read_size;
1054+
1055+
if (file->write_size > 0)
1056+
file->uncompressed_size = file->write_size;
10491057
/* finish CRC calculation and store into pgFile */
10501058
FIN_FILE_CRC32(true, crc);
10511059
file->crc = crc;

src/dir.c

Lines changed: 1 addition & 0 deletions
@@ -1694,6 +1694,7 @@ write_database_map(pgBackup *backup, parray *database_map, parray *backup_files_
16941694
file->crc = pgFileGetCRC(database_map_path, true, false,
16951695
&file->read_size, FIO_BACKUP_HOST);
16961696
file->write_size = file->read_size;
1697+
file->uncompressed_size = file->read_size;
16971698
parray_append(backup_files_list, file);
16981699
}
16991700

src/pg_probackup.h

Lines changed: 13 additions & 1 deletion
@@ -140,6 +140,9 @@ typedef struct pgFile
140140
int64 write_size; /* size of the backed-up file. BYTES_INVALID means
141141
that the file existed but was not backed up
142142
because not modified since last backup. */
143+
int64 uncompressed_size; /* size of the backed-up file before compression
144+
* and adding block headers.
145+
*/
143146
/* we need int64 here to store '-1' value */
144147
pg_crc32 crc; /* CRC value of the file, regular file only */
145148
char *linked; /* path of the linked file */
@@ -317,8 +320,17 @@ struct pgBackup
317320
* BYTES_INVALID means nothing was backed up.
318321
*/
319322
int64 data_bytes;
320-
/* Size of WAL files needed to restore this backup */
323+
/* Size of WAL files needed to replay on top of this
324+
* backup to reach the consistency.
325+
*/
321326
int64 wal_bytes;
327+
/* Size of data files before applying compression and block header,
328+
* WAL files are not included.
329+
*/
330+
int64 uncompressed_bytes;
331+
332+
/* Size of data files in PGDATA at the moment of backup. */
333+
int64 pgdata_bytes;
322334

323335
CompressAlg compress_alg;
324336
int compress_level;

src/show.c

Lines changed: 35 additions & 5 deletions
@@ -31,6 +31,7 @@ typedef struct ShowBackendRow
3131
char duration[20];
3232
char data_bytes[20];
3333
char wal_bytes[20];
34+
char zratio[20];
3435
char start_lsn[20];
3536
char stop_lsn[20];
3637
const char *status;
@@ -385,6 +386,18 @@ print_backup_json_object(PQExpBuffer buf, pgBackup *backup)
385386
appendPQExpBuffer(buf, INT64_FORMAT, backup->wal_bytes);
386387
}
387388

389+
if (backup->uncompressed_bytes >= 0)
390+
{
391+
json_add_key(buf, "uncompressed-bytes", json_level);
392+
appendPQExpBuffer(buf, INT64_FORMAT, backup->uncompressed_bytes);
393+
}
394+
395+
if (backup->pgdata_bytes >= 0)
396+
{
397+
json_add_key(buf, "pgdata-bytes", json_level);
398+
appendPQExpBuffer(buf, INT64_FORMAT, backup->pgdata_bytes);
399+
}
400+
388401
if (backup->primary_conninfo)
389402
json_add_value(buf, "primary_conninfo", backup->primary_conninfo,
390403
json_level, true);
@@ -435,16 +448,16 @@ show_backup(const char *instance_name, time_t requested_backup_id)
435448
static void
436449
show_instance_plain(const char *instance_name, parray *backup_list, bool show_name)
437450
{
438-
#define SHOW_FIELDS_COUNT 13
451+
#define SHOW_FIELDS_COUNT 14
439452
int i;
440453
const char *names[SHOW_FIELDS_COUNT] =
441454
{ "Instance", "Version", "ID", "Recovery Time",
442455
"Mode", "WAL Mode", "TLI", "Time", "Data", "WAL",
443-
"Start LSN", "Stop LSN", "Status" };
456+
"Zratio", "Start LSN", "Stop LSN", "Status" };
444457
const char *field_formats[SHOW_FIELDS_COUNT] =
445458
{ " %-*s ", " %-*s ", " %-*s ", " %-*s ",
446-
" %-*s ", " %-*s ", " %-*s ", " %*s ", " %-*s ", " %-*s ",
447-
" %-*s ", " %-*s ", " %-*s "};
459+
" %-*s ", " %-*s ", " %-*s ", " %*s ", " %*s ", " %*s ",
460+
" %*s ", " %-*s ", " %-*s ", " %-*s "};
448461
uint32 widths[SHOW_FIELDS_COUNT];
449462
uint32 widths_sum = 0;
450463
ShowBackendRow *rows;
@@ -465,6 +478,7 @@ show_instance_plain(const char *instance_name, parray *backup_list, bool show_na
465478
pgBackup *backup = parray_get(backup_list, i);
466479
ShowBackendRow *row = &rows[i];
467480
int cur = 0;
481+
float zratio = 1;
468482

469483
/* Instance */
470484
row->instance = instance_name;
@@ -503,7 +517,6 @@ show_instance_plain(const char *instance_name, parray *backup_list, bool show_na
503517
cur++;
504518

505519
/* Current/Parent TLI */
506-
507520
if (backup->parent_backup_link != NULL)
508521
parent_tli = backup->parent_backup_link->tli;
509522

@@ -540,6 +553,19 @@ show_instance_plain(const char *instance_name, parray *backup_list, bool show_na
540553
widths[cur] = Max(widths[cur], strlen(row->wal_bytes));
541554
cur++;
542555

556+
/* Zratio (compression ratio) */
557+
if (backup->uncompressed_bytes != BYTES_INVALID &&
558+
(backup->uncompressed_bytes > 0 && backup->data_bytes > 0))
559+
{
560+
zratio = (float)backup->uncompressed_bytes / (backup->data_bytes);
561+
snprintf(row->zratio, lengthof(row->zratio), "%.2f", zratio);
562+
}
563+
else
564+
snprintf(row->zratio, lengthof(row->zratio), "%.2f", zratio);
565+
566+
widths[cur] = Max(widths[cur], strlen(row->zratio));
567+
cur++;
568+
543569
/* Start LSN */
544570
snprintf(row->start_lsn, lengthof(row->start_lsn), "%X/%X",
545571
(uint32) (backup->start_lsn >> 32),
@@ -630,6 +656,10 @@ show_instance_plain(const char *instance_name, parray *backup_list, bool show_na
630656
row->wal_bytes);
631657
cur++;
632658

659+
appendPQExpBuffer(&show_buf, field_formats[cur], widths[cur],
660+
row->zratio);
661+
cur++;
662+
633663
appendPQExpBuffer(&show_buf, field_formats[cur], widths[cur],
634664
row->start_lsn);
635665
cur++;
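The Zratio cell logic in show_instance_plain() can be illustrated with a short sketch: the value falls back to 1.00 whenever either size is missing or zero, and the column is right-aligned to the widest cell, mirroring the `" %*s "` format. The function names are hypothetical, and `BYTES_INVALID = -1` is an assumption based on the "'-1' value" comment in pg_probackup.h.

```python
BYTES_INVALID = -1  # assumed sentinel value; see the lead-in


def format_zratio(uncompressed_bytes, data_bytes):
    """Format the Zratio cell, defaulting to 1.00 when it cannot be computed."""
    zratio = 1.0
    if (uncompressed_bytes != BYTES_INVALID
            and uncompressed_bytes > 0 and data_bytes > 0):
        zratio = uncompressed_bytes / data_bytes
    return "%.2f" % zratio


def render_column(cells):
    """Right-align every cell to the widest one, like the ' %*s ' format."""
    width = max(len(cell) for cell in cells)
    return [" %*s " % (width, cell) for cell in cells]


cells = ["Zratio",
         format_zratio(39961833, 22288792),
         format_zratio(0, 22288792)]
for line in render_column(cells):
    print(repr(line))
```

Backups written before this commit carry no 'uncompressed-bytes' field, so under this logic they show 1.00 rather than an empty cell.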

src/util.c

Lines changed: 1 addition & 0 deletions
@@ -400,6 +400,7 @@ copy_pgcontrol_file(const char *from_root, fio_location from_location,
400400
file->crc = ControlFile.crc;
401401
file->read_size = size;
402402
file->write_size = size;
403+
file->uncompressed_size = size;
403404

404405
join_path_components(to_path, to_root, file->rel_path);
405406
writeControlFile(&ControlFile, to_path, to_location);

tests/archive.py

Lines changed: 2 additions & 2 deletions
@@ -49,12 +49,12 @@ def test_pgpro434_1(self):
4949
node.slow_start()
5050

5151
# Recreate backup catalog
52+
self.clean_pb(backup_dir)
5253
self.init_pb(backup_dir)
5354
self.add_instance(backup_dir, 'node', node)
5455

5556
# Make backup
56-
self.backup_node(
57-
backup_dir, 'node', node)
57+
self.backup_node(backup_dir, 'node', node)
5858
node.cleanup()
5959

6060
# Restore Database
