Skip to content

Commit a6d4183

Browse files
committed
Auto-GarbageCollect on PurgeOldBackups and DeleteBackup (facebook#6015)
Summary: Only if there is a crash, power failure, or I/O error in DeleteBackup, shared or private files from the backup might be left behind that are not cleaned up by PurgeOldBackups or DeleteBackup-- only by GarbageCollect. This makes the BackupEngine API "leaky by default." Even if it means a modest performance hit, I think we should make Delete and Purge do as they say, with ongoing best effort: i.e. future calls will attempt to finish any incomplete work from earlier calls. This change does that by having DeleteBackup and PurgeOldBackups do a GarbageCollect, unless (to minimize performance hit) this BackupEngine has already done a GarbageCollect and there have been no deletion-related I/O errors in that GarbageCollect or since then. Rejected alternative 1: remove meta file last instead of first. This would in theory turn partially deleted backups into corrupted backups, but code changes would be needed to allow the missing files and consider it acceptably corrupt, rather than failing to open the BackupEngine. This might be a reasonable choice, but I mostly rejected it because it doesn't solve the legacy problem of cleaning up existing lingering files. Rejected alternative 2: use a deletion marker file. If deletion started with creating a file that marks a backup as flagged for deletion, then we could reliably detect partially deleted backups and efficiently finish removing them. In addition to not solving the legacy problem, this could be precarious if there's a disk full situation, and we try to create a new file in order to delete some files. Ugh. Pull Request resolved: facebook#6015 Test Plan: Updated unit tests Differential Revision: D18401333 Pulled By: pdillinger fbshipit-source-id: 12944e372ce6809f3f5a4c416c3b321a8927d925
1 parent cb1dc29 commit a6d4183

File tree

4 files changed

+207
-73
lines changed

4 files changed

+207
-73
lines changed

HISTORY.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
### Bug Fixes
44
* Fix a assertion failure in MultiGe4t() when BlockBasedTableOptions::no_block_cache is true and there is no compressed block cache
55
* Fix a buffer overrun problem in BlockBasedTable::MultiGet() when compression is enabled and no compressed block cache is configured.
6+
* If a call to BackupEngine::PurgeOldBackups or BackupEngine::DeleteBackup suffered a crash, power failure, or I/O error, files could be left over from old backups that could only be purged with a call to GarbageCollect. Any call to PurgeOldBackups, DeleteBackup, or GarbageCollect should now suffice to purge such files.
67

78
## 6.5.1 (10/16/2019)
89
### Bug Fixes

include/rocksdb/utilities/backupable_db.h

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -325,7 +325,10 @@ class BackupEngine {
325325

326326
// Will delete all the files we don't need anymore
327327
// It will do the full scan of the files/ directory and delete all the
328-
// files that are not referenced.
328+
// files that are not referenced. PurgeOldBackups() and DeleteBackup()
329+
// will do a similar operation as needed to clean up from any incomplete
330+
// deletions, so this function is not really needed if calling one of
331+
// those.
329332
virtual Status GarbageCollect() = 0;
330333
};
331334

utilities/backupable/backupable_db.cc

Lines changed: 98 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -9,24 +9,10 @@
99

1010
#ifndef ROCKSDB_LITE
1111

12-
#include "rocksdb/utilities/backupable_db.h"
13-
#include "file/filename.h"
14-
#include "logging/logging.h"
15-
#include "port/port.h"
16-
#include "rocksdb/rate_limiter.h"
17-
#include "rocksdb/transaction_log.h"
18-
#include "test_util/sync_point.h"
19-
#include "util/channel.h"
20-
#include "util/coding.h"
21-
#include "util/crc32c.h"
22-
#include "util/file_reader_writer.h"
23-
#include "util/string_util.h"
24-
#include "utilities/checkpoint/checkpoint_impl.h"
25-
26-
#include <cinttypes>
2712
#include <stdlib.h>
2813
#include <algorithm>
2914
#include <atomic>
15+
#include <cinttypes>
3016
#include <functional>
3117
#include <future>
3218
#include <limits>
@@ -39,6 +25,20 @@
3925
#include <unordered_set>
4026
#include <vector>
4127

28+
#include "file/filename.h"
29+
#include "logging/logging.h"
30+
#include "port/port.h"
31+
#include "rocksdb/rate_limiter.h"
32+
#include "rocksdb/transaction_log.h"
33+
#include "rocksdb/utilities/backupable_db.h"
34+
#include "test_util/sync_point.h"
35+
#include "util/channel.h"
36+
#include "util/coding.h"
37+
#include "util/crc32c.h"
38+
#include "util/file_reader_writer.h"
39+
#include "util/string_util.h"
40+
#include "utilities/checkpoint/checkpoint_impl.h"
41+
4242
namespace rocksdb {
4343

4444
void BackupStatistics::IncrementNumberSuccessBackup() {
@@ -120,6 +120,7 @@ class BackupEngineImpl : public BackupEngine {
120120

121121
private:
122122
void DeleteChildren(const std::string& dir, uint32_t file_type_filter = 0);
123+
Status DeleteBackupInternal(BackupID backup_id);
123124

124125
// Extends the "result" map with pathname->size mappings for the contents of
125126
// "dir" in "env". Pathnames are prefixed with "dir".
@@ -456,6 +457,10 @@ class BackupEngineImpl : public BackupEngine {
456457
std::mutex byte_report_mutex_;
457458
channel<CopyOrCreateWorkItem> files_to_copy_or_create_;
458459
std::vector<port::Thread> threads_;
460+
// Certain operations like PurgeOldBackups and DeleteBackup will trigger
461+
// automatic GarbageCollect (true) unless we've already done one in this
462+
// session and have not failed to delete backup files since then (false).
463+
bool might_need_garbage_collect_ = true;
459464

460465
// Adds a file to the backup work queue to be copied or created if it doesn't
461466
// already exist.
@@ -559,6 +564,9 @@ Status BackupEngineImpl::Initialize() {
559564
options_.Dump(options_.info_log);
560565

561566
if (!read_only_) {
567+
// we might need to clean up from previous crash or I/O errors
568+
might_need_garbage_collect_ = true;
569+
562570
// gather the list of directories that we need to create
563571
std::vector<std::pair<std::string, std::unique_ptr<Directory>*>>
564572
directories;
@@ -928,8 +936,8 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata(
928936
ROCKS_LOG_INFO(options_.info_log, "Backup Statistics %s\n",
929937
backup_statistics_.ToString().c_str());
930938
// delete files that we might have already written
939+
might_need_garbage_collect_ = true;
931940
DeleteBackup(new_backup_id);
932-
GarbageCollect();
933941
return s;
934942
}
935943

@@ -957,6 +965,10 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata(
957965
Status BackupEngineImpl::PurgeOldBackups(uint32_t num_backups_to_keep) {
958966
assert(initialized_);
959967
assert(!read_only_);
968+
969+
// Best effort deletion even with errors
970+
Status overall_status = Status::OK();
971+
960972
ROCKS_LOG_INFO(options_.info_log, "Purging old backups, keeping %u",
961973
num_backups_to_keep);
962974
std::vector<BackupID> to_delete;
@@ -966,17 +978,44 @@ Status BackupEngineImpl::PurgeOldBackups(uint32_t num_backups_to_keep) {
966978
itr++;
967979
}
968980
for (auto backup_id : to_delete) {
969-
auto s = DeleteBackup(backup_id);
981+
auto s = DeleteBackupInternal(backup_id);
970982
if (!s.ok()) {
971-
return s;
983+
overall_status = s;
972984
}
973985
}
974-
return Status::OK();
986+
// Clean up after any incomplete backup deletion, potentially from
987+
// earlier session.
988+
if (might_need_garbage_collect_) {
989+
auto s = GarbageCollect();
990+
if (!s.ok() && overall_status.ok()) {
991+
overall_status = s;
992+
}
993+
}
994+
return overall_status;
975995
}
976996

977997
Status BackupEngineImpl::DeleteBackup(BackupID backup_id) {
998+
auto s1 = DeleteBackupInternal(backup_id);
999+
auto s2 = Status::OK();
1000+
1001+
// Clean up after any incomplete backup deletion, potentially from
1002+
// earlier session.
1003+
if (might_need_garbage_collect_) {
1004+
s2 = GarbageCollect();
1005+
}
1006+
1007+
if (!s1.ok()) {
1008+
return s1;
1009+
} else {
1010+
return s2;
1011+
}
1012+
}
1013+
1014+
// Does not auto-GarbageCollect
1015+
Status BackupEngineImpl::DeleteBackupInternal(BackupID backup_id) {
9781016
assert(initialized_);
9791017
assert(!read_only_);
1018+
9801019
ROCKS_LOG_INFO(options_.info_log, "Deleting backup %u", backup_id);
9811020
auto backup = backups_.find(backup_id);
9821021
if (backup != backups_.end()) {
@@ -997,6 +1036,10 @@ Status BackupEngineImpl::DeleteBackup(BackupID backup_id) {
9971036
corrupt_backups_.erase(corrupt);
9981037
}
9991038

1039+
// After removing meta file, best effort deletion even with errors.
1040+
// (Don't delete other files if we can't delete the meta file right
1041+
// now.)
1042+
10001043
if (options_.max_valid_backups_to_open == port::kMaxInt32) {
10011044
std::vector<std::string> to_delete;
10021045
for (auto& itr : backuped_file_infos_) {
@@ -1005,6 +1048,10 @@ Status BackupEngineImpl::DeleteBackup(BackupID backup_id) {
10051048
ROCKS_LOG_INFO(options_.info_log, "Deleting %s -- %s",
10061049
itr.first.c_str(), s.ToString().c_str());
10071050
to_delete.push_back(itr.first);
1051+
if (!s.ok()) {
1052+
// Trying again later might work
1053+
might_need_garbage_collect_ = true;
1054+
}
10081055
}
10091056
}
10101057
for (auto& td : to_delete) {
@@ -1023,6 +1070,10 @@ Status BackupEngineImpl::DeleteBackup(BackupID backup_id) {
10231070
Status s = backup_env_->DeleteDir(GetAbsolutePath(private_dir));
10241071
ROCKS_LOG_INFO(options_.info_log, "Deleting private dir %s -- %s",
10251072
private_dir.c_str(), s.ToString().c_str());
1073+
if (!s.ok()) {
1074+
// Full gc or trying again later might work
1075+
might_need_garbage_collect_ = true;
1076+
}
10261077
return Status::OK();
10271078
}
10281079

@@ -1505,8 +1556,15 @@ Status BackupEngineImpl::InsertPathnameToSizeBytes(
15051556

15061557
Status BackupEngineImpl::GarbageCollect() {
15071558
assert(!read_only_);
1559+
1560+
// We will make a best effort to remove all garbage even in the presence
1561+
// of inconsistencies or I/O failures that inhibit finding garbage.
1562+
Status overall_status = Status::OK();
1563+
// If all goes well, we don't need another auto-GC this session
1564+
might_need_garbage_collect_ = false;
1565+
15081566
ROCKS_LOG_INFO(options_.info_log, "Starting garbage collection");
1509-
if (options_.max_valid_backups_to_open == port::kMaxInt32) {
1567+
if (options_.max_valid_backups_to_open != port::kMaxInt32) {
15101568
ROCKS_LOG_WARN(
15111569
options_.info_log,
15121570
"Garbage collection is limited since `max_valid_backups_to_open` "
@@ -1533,7 +1591,9 @@ Status BackupEngineImpl::GarbageCollect() {
15331591
s = Status::OK();
15341592
}
15351593
if (!s.ok()) {
1536-
return s;
1594+
overall_status = s;
1595+
// Trying again later might work
1596+
might_need_garbage_collect_ = true;
15371597
}
15381598
}
15391599
for (auto& child : shared_children) {
@@ -1553,6 +1613,10 @@ Status BackupEngineImpl::GarbageCollect() {
15531613
ROCKS_LOG_INFO(options_.info_log, "Deleting %s -- %s",
15541614
rel_fname.c_str(), s.ToString().c_str());
15551615
backuped_file_infos_.erase(rel_fname);
1616+
if (!s.ok()) {
1617+
// Trying again later might work
1618+
might_need_garbage_collect_ = true;
1619+
}
15561620
}
15571621
}
15581622
}
@@ -1563,7 +1627,9 @@ Status BackupEngineImpl::GarbageCollect() {
15631627
auto s = backup_env_->GetChildren(GetAbsolutePath(GetPrivateDirRel()),
15641628
&private_children);
15651629
if (!s.ok()) {
1566-
return s;
1630+
overall_status = s;
1631+
// Trying again later might work
1632+
might_need_garbage_collect_ = true;
15671633
}
15681634
}
15691635
for (auto& child : private_children) {
@@ -1587,14 +1653,23 @@ Status BackupEngineImpl::GarbageCollect() {
15871653
ROCKS_LOG_INFO(options_.info_log, "Deleting %s -- %s",
15881654
(full_private_path + subchild).c_str(),
15891655
s.ToString().c_str());
1656+
if (!s.ok()) {
1657+
// Trying again later might work
1658+
might_need_garbage_collect_ = true;
1659+
}
15901660
}
15911661
// finally delete the private dir
15921662
Status s = backup_env_->DeleteDir(full_private_path);
15931663
ROCKS_LOG_INFO(options_.info_log, "Deleting dir %s -- %s",
15941664
full_private_path.c_str(), s.ToString().c_str());
1665+
if (!s.ok()) {
1666+
// Trying again later might work
1667+
might_need_garbage_collect_ = true;
1668+
}
15951669
}
15961670

1597-
return Status::OK();
1671+
assert(overall_status.ok() || might_need_garbage_collect_);
1672+
return overall_status;
15981673
}
15991674

16001675
// ------- BackupMeta class --------

0 commit comments

Comments
 (0)