Commit 239e992

Patrick Reynolds committed
Run uniq | sort | uniq
The data we're sorting has clusters of duplicates in the input, because `dirname` reduces all repos in the same network (i.e., forks) to the same network path. Running `uniq` before `sort` eliminates those duplicates, which means `sort` requires less CPU and RAM to do its thing. We still need `uniq` on the output end, because there's no guarantee that all duplicates in the input are clustered. I've run tests, and the cost of `uniq` is small enough that it does no harm if the input has no duplicates at all.
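The reasoning above can be checked on a small input. The following is a minimal sketch, not part of the commit: `sample_routes` and its route names are hypothetical stand-ins for the clustered network paths described in the message. It shows that `uniq` alone only collapses adjacent duplicates, and that `uniq | sort | uniq` produces the same result as `sort | uniq`.

```shell
#!/bin/sh
# Hypothetical sample input: duplicates arrive mostly in clusters,
# with one straggler ("a/1") that is not adjacent to its cluster.
sample_routes() {
  printf '%s\n' b/2 b/2 b/2 a/1 a/1 c/3 a/1
}

# uniq alone only collapses adjacent duplicates, so "a/1" survives twice.
uniq_only=$(sample_routes | uniq)

# The previous pipeline: sort everything, then deduplicate.
sort_uniq=$(sample_routes | sort | uniq)

# The pipeline from this commit: shrink the input first, sort the
# smaller stream, then drop duplicates that were not adjacent on input.
uniq_sort_uniq=$(sample_routes | uniq | sort | uniq)

printf 'uniq only:\n%s\n\n' "$uniq_only"
printf 'sort | uniq:\n%s\n\n' "$sort_uniq"
printf 'uniq | sort | uniq:\n%s\n' "$uniq_sort_uniq"
```

In this sample the first `uniq` reduces seven lines to four before `sort` sees them, while the final `uniq` removes the remaining non-adjacent duplicate.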
1 parent 01cda66 commit 239e992

2 files changed: 4 additions, 4 deletions

share/github-backup-utils/ghe-backup-repositories

Lines changed: 2 additions & 2 deletions
@@ -366,8 +366,8 @@ bm_end "$(basename $0) - Special Data Directories Sync"

 if [ -z "$GHE_SKIP_ROUTE_VERIFICATION" ]; then
 bm_start "$(basename $0) - Verifying Routes"
-cat $tempdir/*.rsync | sort | uniq > $tempdir/source_routes
-(cd $backup_dir/ && find * -mindepth 5 -maxdepth 6 -type d -name \*.git | fix_paths_for_ghe_version | sort | uniq) > $tempdir/destination_routes
+cat $tempdir/*.rsync | uniq | sort | uniq > $tempdir/source_routes
+(cd $backup_dir/ && find * -mindepth 5 -maxdepth 6 -type d -name \*.git | fix_paths_for_ghe_version | uniq | sort | uniq) > $tempdir/destination_routes

 git --no-pager diff --unified=0 --no-prefix -- $tempdir/source_routes $tempdir/destination_routes || echo "Warning: One or more repository networks and/or gists were not found on the source appliance. Please contact GitHub Enterprise Support for assistance."

share/github-backup-utils/ghe-backup-storage

Lines changed: 2 additions & 2 deletions
@@ -141,8 +141,8 @@ bm_end "$(basename $0) - Storage object sync"
 if [ -z "$GHE_SKIP_ROUTE_VERIFICATION" ]; then
 bm_start "$(basename $0) - Verifying Routes"

-cat $tempdir/*.rsync | sort | uniq > $tempdir/source_routes
-(cd $backup_dir/ && find * -mindepth 3 -maxdepth 3 -type f -print | sort | uniq) > $tempdir/destination_routes
+cat $tempdir/*.rsync | uniq | sort | uniq > $tempdir/source_routes
+(cd $backup_dir/ && find * -mindepth 3 -maxdepth 3 -type f -print | uniq | sort | uniq) > $tempdir/destination_routes

 git --no-pager diff --unified=0 --no-prefix -- $tempdir/source_routes $tempdir/destination_routes || echo "Warning: One or more storage objects were not found on the source appliance. Please contact GitHub Enterprise Support for assistance."
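Both hunks feed two sorted, deduplicated route lists into a comparison. As a rough illustration of why the lists must be sorted and unique before comparing, here is a minimal sketch. It uses `comm` in place of the `git diff` invocation from the scripts, and the scratch directory and route names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory and route names, for illustration only.
tempdir=$(mktemp -d)

# Source routes arrive with clustered duplicates; destination routes
# arrive in a different order. Both go through the commit's pipeline.
printf '%s\n' b/2 b/2 a/1 c/3 a/1 | uniq | sort | uniq > "$tempdir/source_routes"
printf '%s\n' c/3 a/1 | uniq | sort | uniq > "$tempdir/destination_routes"

# comm requires sorted input; -23 keeps lines found only in the first file.
missing=$(comm -23 "$tempdir/source_routes" "$tempdir/destination_routes")

if [ -n "$missing" ]; then
  echo "Routes present in source but not in destination:"
  echo "$missing"
fi

rm -rf "$tempdir"
```

Because both files are sorted and free of duplicates, differences in input order or repetition do not show up as spurious mismatches.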
