Rework of FDB Wipe + Remote FDB Wipe #184

Open
ChrisspyB wants to merge 121 commits into develop from wipe-changes
Member

@ChrisspyB ChrisspyB commented Oct 20, 2025

Description

  • Complete rework of the wipe visitor mechanism. We now visit the Catalogue in order to produce a WipeState object, which contains URIs to be deleted/marked as safe. The client then forwards this information to the relevant Stores.
  • This abstraction should allow for mixed Catalogue/Store backends.
  • This PR also adds the corresponding functionality for Wiping in Remote FDB.
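
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `URI` is a plain string standing in for `eckit::URI`, and `WipeState`, `Store`, and `forwardToStores` are hypothetical simplifications of the real classes. The point is only that the Catalogue visit yields a list of URIs, and the client then routes each one to whichever Store owns it, which is what would allow mixed backends.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: the real code uses eckit::URI.
using URI = std::string;

// Simplified result of visiting the Catalogue.
struct WipeState {
    std::set<URI> toDelete;  // URIs selected for deletion
    std::set<URI> safe;      // URIs that must not be touched
};

// Toy Store that only accepts URIs carrying its own scheme.
class Store {
public:
    explicit Store(std::string scheme) : scheme_(std::move(scheme)) {}

    bool owns(const URI& uri) const { return uri.rfind(scheme_ + "://", 0) == 0; }
    void wipe(const URI& uri) { wiped_.push_back(uri); }
    const std::vector<URI>& wiped() const { return wiped_; }

private:
    std::string scheme_;
    std::vector<URI> wiped_;
};

// Client side: forward every deletable URI to the Store that owns it,
// skipping anything marked safe. Returns the number of URIs forwarded.
std::size_t forwardToStores(const WipeState& state, std::vector<Store>& stores) {
    std::size_t forwarded = 0;
    for (const auto& uri : state.toDelete) {
        if (state.safe.count(uri) != 0) {
            continue;
        }
        for (auto& store : stores) {
            if (store.owns(uri)) {
                store.wipe(uri);
                ++forwarded;
                break;
            }
        }
    }
    return forwarded;
}
```

Because the Catalogue side only produces URIs and never deletes Store data itself, a Store with a different scheme can be added without changing the visitor in this sketch.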

Some caveats:

  • There are a couple of TODOs in the code. I will continue chipping away at these.
  • DAOS wipe functionality is broken. This is currently preventing build on the CI. I will fix things so that they build, but further work will be required to reimplement the DAOS wipe.
  • --unsafe-wipe-all will hit an assert(false) on remote FDB, as I do not want this functionality supported without better testing / some discussion.
  • I am currently trying to test the remote functionality.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

🌈🌦️📖🚧 Documentation 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-184

🌈🌦️📖🚧 Documentation Z3FDB 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/z3fdb/pull-requests/PR-184

🌈🌦️📖🚧 Documentation FDB 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-184

FDB-508 Fix merge of daos wipe changes
@ChrisspyB ChrisspyB marked this pull request as ready for review February 9, 2026 09:34
danovaro and others added 4 commits February 12, 2026 00:33
Tests have had to be further updated to account for the bug reported in FDB-633 being fixed, as this has increased the number of index files written when subtocs are used.

format
Contributor

@simondsmart simondsmart left a comment

Super happy with this. Only very minor comments, and happy for it to be merged without my re-review after this.

Can you make sure that we include the diagrams that you drew of the overall wipe process/structure, and reference these in an obvious comment at the wipe entry point? There is a lot of implicit knowledge about the overall flow and state which is needed to work on this code, and we want to give any future developers a heads up.

[](const auto& pair) { return !pair.second.empty(); });

if (doit && unclean && !unsafeWipeAll) {
eckit::Log::warning() << "Unclean FDB database has the following unknown URIs:" << std::endl;
Contributor

If we are going to error out here, throwing an exception, then this should probably go to Log::error()
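
A rough sketch of what this suggestion amounts to, using only the standard library. The helper name `reportUnclean`, the exception type, and the exception message are assumptions made for illustration; the `std::ostream&` parameter stands in for whichever eckit log channel is chosen.

```cpp
#include <cassert>
#include <ostream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: list the unknown URIs on the error channel and
// then abort the wipe by throwing. errorLog stands in for the log stream.
void reportUnclean(const std::set<std::string>& unknownURIs, bool doit, bool unsafeWipeAll,
                   std::ostream& errorLog) {
    const bool unclean = !unknownURIs.empty();
    if (doit && unclean && !unsafeWipeAll) {
        errorLog << "Unclean FDB database has the following unknown URIs:" << std::endl;
        for (const auto& uri : unknownURIs) {
            errorLog << "  " << uri << std::endl;
        }
        // The message that explains the failure and the exception that
        // reports it now share the same severity.
        throw std::runtime_error("Cannot wipe: database contains unknown URIs");
    }
}
```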

CatalogueWipeState catalogueWipeState;
while (it.next(catalogueWipeState)) {

auto elements = coordinator.wipe(catalogueWipeState, doit, unsafeWipeAll);
Contributor

The elements here are constructed inside the coordinator. Consider whether we should just pass the queue to coordinator, and have it push directly? (This may be a bad idea, but it looks to me like the output elements are all constructed after the action has taken place so it should be functionally equivalent?)

Member Author

I think this is a sensible suggestion
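
For reference, the two shapes being compared look roughly like this. It is only a sketch: `WipeElement` is reduced to a string, `std::queue` stands in for the real output queue, and `wipeReturning` / `wipeInto` are hypothetical names, not functions in FDB.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the real output element type.
using WipeElement = std::string;

struct Coordinator {
    // Current shape: perform the action, then build and return the elements.
    std::vector<WipeElement> wipeReturning(const std::vector<std::string>& uris) const {
        std::vector<WipeElement> out;
        for (const auto& uri : uris) {
            out.push_back("wiped " + uri);
        }
        return out;
    }

    // Suggested shape: the caller hands over the queue and the coordinator
    // pushes each element directly, avoiding the intermediate container.
    void wipeInto(const std::vector<std::string>& uris, std::queue<WipeElement>& queue) const {
        for (const auto& uri : uris) {
            queue.push("wiped " + uri);
        }
    }
};
```

In this sketch both variants yield the same elements in the same order, which matches the observation that the outputs are only constructed after the action has taken place.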

#include "eckit/filesystem/PathName.h"

template <>
struct std::hash<eckit::URI> {
Contributor

This probably belongs in eckit in URI.h

Member Author

It's definitely a bit random to see it in WipeVisitor.h... I wonder if it is still used.

#include "eckit/filesystem/PathName.h"

template <>
struct std::hash<eckit::URI> {
Contributor

Note that specialising std::hash (or anything else) is the one thing that should be done inside namespace std!!!

#include "eckit/filesystem/PathName.h"

template <>
struct std::hash<eckit::URI> {
Contributor

This should almost certainly be in eckit, in URI.h

Note that specialising a template in namespace std is the one thing that you should be doing inside namespace std.
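
A minimal sketch of the form being suggested, with the specialisation written inside `namespace std`. `demo::URI` is a hypothetical stand-in for `eckit::URI`, and hashing the string form is just one simple choice, not necessarily what eckit would do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

namespace demo {

// Hypothetical stand-in for eckit::URI.
struct URI {
    std::string scheme;
    std::string path;

    bool operator==(const URI& other) const { return scheme == other.scheme && path == other.path; }
    std::string asString() const { return scheme + ":" + path; }
};

}  // namespace demo

namespace std {

// Specialising a standard template for a user-defined type is permitted
// inside namespace std. Here the URI is hashed via its string form.
template <>
struct hash<demo::URI> {
    size_t operator()(const demo::URI& uri) const { return hash<string>{}(uri.asString()); }
};

}  // namespace std
```

With the specialisation visible wherever the type is declared, `std::unordered_set<demo::URI>` and `std::unordered_map` keyed on it work without naming a custom hasher.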

catalogueWipeState_.catalogue(catalogue.config());

// Build the initial control state (is there really not a function for this?)
ControlIdentifiers id;
Contributor

It doesn't look like it. But do have a look at the constructor ControlElement::ControlElement(const Catalogue&) - which should probably be abstracted to somewhere else.

catalogueWipeState_.excludeData(dataURI);
}
}
return true;
Contributor

This returns true, but I don't see us having (or needing) calls to visitDatum anywhere?

bool TocStore::doWipeUnknowns(const std::set<eckit::URI>& unknownURIs) const {
for (const auto& uri : unknownURIs) {
if (uri.path().exists()) {
remove(uri, eckit::Log::info(), eckit::Log::info(), true);
Contributor

Note that we have "using namespace eckit" at the top of this file. We can tidy quite a lot of eckit:: in this file.


//----------------------------------------------------------------------------------------------------------------------

StdDir::StdDir(const eckit::PathName& p) : path_(p), d_(opendir(p.localPath())) {
Contributor

This is copied from elsewhere, no? Is it not possible to put this somewhere sane for common use?

Member Author

IIRC FDB's StdDir is confusingly functionally distinct from eckit's StdDir. But I should check.

Adds [CATALOGUE] and [STORE] to the relevant server logs to allow for better filtering when debugging.
Also enable FDB_DEBUG for the server tests, as the serverside logs are otherwise quite sparse and not useful for debugging.
@ChrisspyB ChrisspyB removed the request for review from Ozaq February 17, 2026 12:39
Also a minor change to the remote_api test logging, and modify the test such that
the client sleeps while the server flushes the consolidated index.
7 participants