optimizing inspect_delete_snap() for significant speedups under certain workloads with many snapshots #64

@totalAldo

Hi Allan,

While trying to reduce zxfer's total execution time, I found that one of the bottlenecks is the logic within inspect_delete_snap(). The v1.1.7 implementation uses nested loops to determine which destination snapshots to delete, and this logic runs even when -d is not used. On some of my systems with hundreds of snapshots per dataset, it takes several minutes to execute. For example, if a source and a destination each contain 1000 snapshots, the loop body runs 1000 * 1000 = 1,000,000 times, and each iteration spawns at least 2 grep and 2 cut commands to compare the snapshot names.

I've refactored the function, replacing the nested loops with temp files and comm to determine snapshots that don't exist in the destination. This implementation has reduced the execution time from minutes to seconds.

Here's a possible implementation.

This has only been tested on FreeBSD 14.0.

https://github.com/totalAldo/zxfer/blob/9a7b4e1da5305952863ddbe518403b8c8c18521b/zxfer#L1655-L1789
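
For readers who want the idea without opening the link, the temp-file and comm approach described above can be sketched roughly as follows. This is a minimal illustration, not the code at the linked commit: the function name `snaps_only_in_dest`, its arguments, and the temp-file names are hypothetical, and it assumes each list holds bare snapshot names (the part after `@`), one per line.

```sh
#!/bin/sh
# Hypothetical helper (not zxfer's actual code): print the snapshot names
# that appear in the destination list ($2) but not in the source list ($1).
# POSIX sh has no local variables, so the names below are prefixed.
snaps_only_in_dest() {
    _sod_src=$(mktemp "${TMPDIR:-/tmp}/snap_src.XXXXXX") || return 1
    _sod_dst=$(mktemp "${TMPDIR:-/tmp}/snap_dst.XXXXXX") || {
        rm -f "$_sod_src"
        return 1
    }

    # comm requires both inputs sorted in the same collation order.
    # sed drops empty lines so an empty list compares as empty.
    printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u > "$_sod_src"
    printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u > "$_sod_dst"

    # -1 hides lines unique to the source, -3 hides lines common to both,
    # leaving only the lines unique to the destination.
    LC_ALL=C comm -13 "$_sod_src" "$_sod_dst"

    rm -f "$_sod_src" "$_sod_dst"
}

# Example: the destination still holds daily-00, which the source has dropped.
src_list=$(printf '%s\n' daily-01 daily-02 daily-03)
dst_list=$(printf '%s\n' daily-00 daily-01 daily-02)
snaps_only_in_dest "$src_list" "$dst_list"
```

Two sorts and one comm replace the per-pair grep and cut calls, so the number of spawned processes no longer grows with the product of the two snapshot counts. Swapping `-13` for `-23` would instead list names present only in the source.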
