24 commits
8650224
Initial plan
Copilot Mar 3, 2026
d2e6ba3
Add backup snapshot fetch feature: config, task, hook, and e2e test
Copilot Mar 3, 2026
3a9e396
Add config schema and CHANGELOG for backup snapshot fetch feature
Copilot Mar 3, 2026
5a1c404
Fix file handle leak in test; address code review feedback
Copilot Mar 3, 2026
3b9c163
Remove max_size from backup snapshot fetch configuration
Copilot Mar 3, 2026
388c734
Add target_rpc_interface config to backup snapshot fetch
Copilot Mar 4, 2026
59f6b7d
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 4, 2026
420b307
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 5, 2026
df810b9
Fix C++ formatting (clang-format) in configuration.h and node_state.h
Copilot Mar 5, 2026
3aaff93
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 5, 2026
8368add
fix
achamayou Mar 5, 2026
d80036a
fmt
achamayou Mar 5, 2026
df0eafc
..
achamayou Mar 5, 2026
fbd9b2f
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 6, 2026
7bdd619
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 6, 2026
383c5a5
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 6, 2026
ccfa6b8
logging
achamayou Mar 6, 2026
44f9d78
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 6, 2026
f5428ff
fmt
achamayou Mar 6, 2026
2c8c44e
test size cap
achamayou Mar 6, 2026
fc189dc
Merge branch 'main' into copilot/add-snapshot-fetching-feature
achamayou Mar 6, 2026
15b1189
.
achamayou Mar 6, 2026
38fa2b8
Apply suggestion from @achamayou
achamayou Mar 6, 2026
0bc4eaa
Update tests/e2e_operations.py
achamayou Mar 7, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- Backup nodes can now be configured to automatically fetch snapshots from the primary when snapshot evidence is detected. This is controlled by the `snapshots.backup_fetch` configuration section, with `enabled`, `max_attempts`, `retry_interval`, `max_size` and `target_rpc_interface` options. Note that the target RPC interface selected must have the `SnapshotRead` operator feature enabled.
Copilot AI Mar 6, 2026

The changelog entry lists the new snapshots.backup_fetch options but omits max_size, which is present in the config struct/schema/template in this PR. Please either document max_size here as well, or remove max_size support from the implementation/schema so the changelog matches actual behaviour.

- Added `ccf::IdentityHistoryNotFetched` exception type to distinguish identity-history-fetching errors from other logic errors in the network identity subsystem (#7708).
- Added `ccf::describe_cose_receipt_v1(receipt)` to obtain COSE receipts with Merkle proof in unprotected header for non-signature TXs, and empty unprotected header for signature TXs (#7700).
- `NetworkIdentitySubsystemInterface` now exposes `get_trusted_keys()`, returning all trusted network identity keys as a `TrustedKeys` map (#7690).
33 changes: 33 additions & 0 deletions doc/host_config_schema/cchost_config.json
@@ -498,6 +498,39 @@
"read_only_directory": {
"type": ["string", "null"],
"description": "Path to read-only snapshots directory"
},
"backup_fetch": {
"type": "object",
"properties": {
"enabled": {
"type": "boolean",
"default": false,
"description": "If true, backup nodes will automatically fetch snapshots from the primary when snapshot evidence is detected"
},
"max_attempts": {
"type": "integer",
"default": 3,
"description": "Maximum number of fetch attempts before giving up",
"minimum": 1
},
"retry_interval": {
"type": "string",
"default": "1000ms",
"description": "Delay between retry attempts"
},
"target_rpc_interface": {
"type": "string",
"default": "primary_rpc_interface",
"description": "Name of the RPC interface on the primary node to use for downloading snapshots. Must have the SnapshotRead feature enabled."
},
"max_size": {
"type": "string",
"default": "200MB",
"description": "Maximum size of snapshot this node is willing to fetch"
}
},
"description": "Configuration for automatic snapshot fetching by backup nodes",
"additionalProperties": false
}
},
"description": "This section includes configuration for the snapshot directories and files",
12 changes: 12 additions & 0 deletions include/ccf/node/startup_config.h
@@ -99,6 +99,18 @@ namespace ccf
size_t tx_count = 10'000;
std::optional<std::string> read_only_directory = std::nullopt;

struct BackupFetch
{
bool enabled = false;
size_t max_attempts = 3;
ccf::ds::TimeString retry_interval = {"1000ms"};
std::string target_rpc_interface = "primary_rpc_interface";
ccf::ds::SizeString max_size = {"200MB"};

bool operator==(const BackupFetch&) const = default;
};
BackupFetch backup_fetch = {};

bool operator==(const Snapshots&) const = default;
};
Snapshots snapshots = {};
16 changes: 15 additions & 1 deletion src/common/configuration.h
@@ -93,10 +93,24 @@ namespace ccf
snp_uvm_endorsements_file,
snp_endorsements_file);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCFConfig::Snapshots::BackupFetch);
DECLARE_JSON_REQUIRED_FIELDS(CCFConfig::Snapshots::BackupFetch);
DECLARE_JSON_OPTIONAL_FIELDS(
CCFConfig::Snapshots::BackupFetch,
enabled,
max_attempts,
retry_interval,
target_rpc_interface,
max_size);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCFConfig::Snapshots);
DECLARE_JSON_REQUIRED_FIELDS(CCFConfig::Snapshots);
DECLARE_JSON_OPTIONAL_FIELDS(
- CCFConfig::Snapshots, directory, tx_count, read_only_directory);
+ CCFConfig::Snapshots,
+ directory,
+ tx_count,
+ read_only_directory,
+ backup_fetch);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCFConfig);
DECLARE_JSON_REQUIRED_FIELDS(CCFConfig, network);
192 changes: 192 additions & 0 deletions src/node/node_state.h
@@ -66,6 +66,7 @@
#include <algorithm>
#include <atomic>
#include <chrono>
#include <limits>
#define FMT_HEADER_ONLY
#include <fmt/format.h>
#include <nlohmann/json.hpp>
@@ -202,6 +203,153 @@ namespace ccf
}
};

struct BackupSnapshotFetch : public ccf::tasks::BaseTask
{
const ccf::CCFConfig::Snapshots snapshot_config;
ccf::kv::Version since_seqno;
NodeState* owner;

BackupSnapshotFetch(
ccf::CCFConfig::Snapshots snapshot_config_,
ccf::kv::Version since_seqno_,
NodeState* owner_) :
snapshot_config(std::move(snapshot_config_)),
since_seqno(since_seqno_),
owner(owner_)
{}

void do_task_implementation() override
{
struct ClearOnExit
{
NodeState* owner;
~ClearOnExit()
{
std::lock_guard<pal::Mutex> guard(owner->lock);
owner->backup_snapshot_fetch_task = nullptr;
}
} clear_on_exit{owner};

// Resolve the primary's RPC address
std::string primary_address;
std::vector<uint8_t> service_cert;
{
auto primary_id = owner->consensus->primary();
if (!primary_id.has_value())
{
LOG_INFO_FMT(
"BackupSnapshotFetch: No known primary, skipping fetch");
return;
}

auto tx = owner->network.tables->create_read_only_tx();
auto* nodes = tx.ro<ccf::Nodes>(Tables::NODES);
auto node_info = nodes->get(primary_id.value());
if (!node_info.has_value())
{
LOG_INFO_FMT(
"BackupSnapshotFetch: Could not find primary node {} in nodes "
"table",
primary_id.value());
return;
}

// Use the configured RPC interface to find the primary's address
const auto& target_interface =
snapshot_config.backup_fetch.target_rpc_interface;
auto iface_it = node_info->rpc_interfaces.find(target_interface);
if (iface_it == node_info->rpc_interfaces.end())
{
LOG_INFO_FMT(
"BackupSnapshotFetch: Primary node {} does not have RPC "
"interface '{}' configured",
primary_id.value(),
target_interface);
return;
}
primary_address = iface_it->second.published_address;

if (owner->network.identity == nullptr)
{
LOG_INFO_FMT(
"BackupSnapshotFetch: No service identity available, cannot "
"construct TLS credentials for fetching snapshot");
return;
}

service_cert = owner->network.identity->cert.raw();
}

LOG_INFO_FMT(
"BackupSnapshotFetch: Attempting to fetch snapshot from primary at "
"{}",
primary_address);

const auto& bf = snapshot_config.backup_fetch;

auto latest_peer_snapshot = snapshots::fetch_from_peer(
primary_address,
service_cert,
bf.max_attempts,
bf.retry_interval.count_ms(),
bf.max_size.count_bytes(),
since_seqno);

if (latest_peer_snapshot.has_value())
{
LOG_INFO_FMT(
"BackupSnapshotFetch: Received snapshot {} from primary (size: "
"{})",
latest_peer_snapshot->snapshot_name,
latest_peer_snapshot->snapshot_data.size());

const auto snapshot_path =
std::filesystem::path(latest_peer_snapshot->snapshot_name);

if (
snapshot_path.empty() || snapshot_path.is_absolute() ||
snapshot_path.has_parent_path() ||
snapshot_path.filename() != snapshot_path)
{
LOG_FAIL_FMT(
"BackupSnapshotFetch: Rejecting snapshot with invalid name "
"'{}' from primary",
latest_peer_snapshot->snapshot_name);
return;
}

const auto dst_path =
std::filesystem::path(snapshot_config.directory) / snapshot_path;

if (files::exists(dst_path))
{
LOG_INFO_FMT(
"BackupSnapshotFetch: Snapshot {} already exists locally, "
"skipping write",
dst_path.string());
return;
}

files::dump(latest_peer_snapshot->snapshot_data, dst_path);
LOG_INFO_FMT(
"BackupSnapshotFetch: Wrote snapshot {} ({} bytes)",
dst_path.string(),
latest_peer_snapshot->snapshot_data.size());
}
else
{
LOG_INFO_FMT(
"BackupSnapshotFetch: No snapshot available from primary");
}
}

[[nodiscard]] const std::string& get_name() const override
{
static const std::string name = "BackupSnapshotFetch";
return name;
}
};

private:
//
// this node's core state
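
The snapshot-name check in `do_task_implementation()` can be tried out in isolation. Below is a minimal sketch, assuming a hypothetical free function `is_plain_file_name` (not part of the PR) that applies the same four `std::filesystem::path` conditions. It also rejects `.` and `..`, which pass those four conditions on their own; that extra check is an addition in this sketch, not something the PR does.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper, not part of the PR: returns true only when `name` is a
// single path component that is safe to join onto the snapshot directory.
inline bool is_plain_file_name(const std::string& name)
{
  const std::filesystem::path p(name);

  // Same four conditions as the PR: reject empty names, absolute paths, and
  // anything containing a directory component.
  if (p.empty() || p.is_absolute() || p.has_parent_path() || p.filename() != p)
  {
    return false;
  }

  // Extra check (an assumption of this sketch): "." and ".." are single
  // components, so the conditions above do not reject them.
  return name != "." && name != "..";
}
```

With this helper, a name such as `../snapshot` or `/tmp/snapshot` is rejected before any destination path is built, which is the property the PR relies on when it joins the name onto `snapshot_config.directory`.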
@@ -280,6 +428,7 @@ namespace ccf

ccf::tasks::Task join_periodic_task;
ccf::tasks::Task snapshot_fetch_task;
ccf::tasks::Task backup_snapshot_fetch_task;

std::shared_ptr<ccf::kv::AbstractTxEncryptor> make_encryptor()
{
@@ -2924,6 +3073,49 @@ namespace ccf
return {nullptr};
}));

network.tables->set_global_hook(
network.snapshot_evidence.get_name(),
SnapshotEvidence::wrap_commit_hook(
[this](
[[maybe_unused]] ccf::kv::Version version,
const SnapshotEvidence::Write& w) {
if (!w.has_value())
{
return;
}

auto snapshot_evidence = w.value();

// If backup snapshot fetching is enabled and this node is a
// backup, schedule a fetch task
if (
config.snapshots.backup_fetch.enabled && consensus != nullptr &&
!consensus->is_primary())
{
std::lock_guard<pal::Mutex> guard(lock);
if (
backup_snapshot_fetch_task != nullptr &&
!backup_snapshot_fetch_task->is_cancelled())
{
LOG_DEBUG_FMT(
"Backup snapshot fetch already in progress, skipping");
}
else
{
LOG_INFO_FMT(
"Snapshot evidence detected on backup - scheduling "
"snapshot fetch from primary (since seqno: {})",
snapshot_evidence.version);
backup_snapshot_fetch_task =
std::make_shared<BackupSnapshotFetch>(
config.snapshots,
snapshot_evidence.version - 1 /* YIKES */,
this);
ccf::tasks::add_task(backup_snapshot_fetch_task);
}
}
}));

setup_basic_hooks();
}
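
The hook and the task together keep at most one backup fetch in flight: the hook schedules only when the task slot is free, and the task's `ClearOnExit` guard frees the slot on every exit path. The following is a simplified sketch of that pattern, assuming hypothetical stand-ins `Task` and `Owner` for the real task and `NodeState` types, with `std::mutex` in place of `pal::Mutex`. It leaves out the `is_cancelled()` check that the PR also performs.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the scheduled task type.
struct Task
{};

// Hypothetical stand-in for NodeState: owns the lock and the task slot.
struct Owner
{
  std::mutex lock;
  std::shared_ptr<Task> in_flight;

  // Returns true if a new task was recorded, false if one is already in
  // flight (the "already in progress, skipping" branch of the hook).
  bool try_schedule()
  {
    std::lock_guard<std::mutex> guard(lock);
    if (in_flight != nullptr)
    {
      return false;
    }
    in_flight = std::make_shared<Task>();
    return true;
  }

  // Runs the task body; the slot is cleared when the scope exits, whether
  // the body returns early, returns normally, or throws.
  template <typename F>
  void run(F&& body)
  {
    struct ClearOnExit
    {
      Owner* owner;
      ~ClearOnExit()
      {
        std::lock_guard<std::mutex> guard(owner->lock);
        owner->in_flight = nullptr;
      }
    } clear_on_exit{this};

    body();
  }
};
```

Clearing the slot in a destructor rather than at the end of the function matters here because the task in the PR has many early `return` statements; each of them would otherwise need its own reset.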

24 changes: 18 additions & 6 deletions src/snapshots/fetch.h
@@ -207,7 +207,8 @@ namespace snapshots
static std::optional<SnapshotResponse> try_fetch_from_peer(
const std::string& peer_address,
const std::vector<uint8_t>& peer_ca,
- size_t max_size)
+ size_t max_size,
+ std::optional<size_t> since_seqno = std::nullopt)
{
try
{
@@ -222,8 +223,16 @@ namespace snapshots
// redirects terminate, the final response is likely to be extremely
// large so is fetched over multiple requests for a sub-range, returning
// PARTIAL_CONTENT each time.
- std::string snapshot_url =
- fmt::format("https://{}/node/snapshot", peer_address);
+ std::string snapshot_url;
+ if (since_seqno.has_value())
+ {
+ snapshot_url = fmt::format(
+ "https://{}/node/snapshot?since={}", peer_address, *since_seqno);
+ }
+ else
+ {
+ snapshot_url = fmt::format("https://{}/node/snapshot", peer_address);
+ }

// Fetch 4MB chunks at a time
constexpr size_t range_size = 4L * 1024 * 1024;
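
The URL selection in this hunk can be checked on its own. Here is a minimal sketch, assuming a hypothetical helper `make_snapshot_url` (not in the PR) and plain string concatenation in place of `fmt::format`, so that it compiles with the standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper, not part of the PR: builds the snapshot URL, appending
// the `since` query parameter only when a seqno is supplied.
inline std::string make_snapshot_url(
  const std::string& peer_address,
  std::optional<size_t> since_seqno = std::nullopt)
{
  std::string url = "https://" + peer_address + "/node/snapshot";
  if (since_seqno.has_value())
  {
    url += "?since=" + std::to_string(*since_seqno);
  }
  return url;
}
```

Because `since_seqno` defaults to `std::nullopt`, callers that do not pass it get the same URL as before the change.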
@@ -440,13 +449,15 @@ namespace snapshots
const std::vector<uint8_t>& peer_ca,
size_t max_attempts,
size_t retry_delay_ms,
- size_t max_size)
+ size_t max_size,
+ std::optional<size_t> since_seqno = std::nullopt)
{
for (size_t attempt = 0; attempt < max_attempts; ++attempt)
{
LOG_INFO_FMT(
- "Fetching snapshot from {} (attempt {}/{})",
+ "Fetching snapshot from {} since {} (attempt {}/{})",
peer_address,
+ since_seqno.has_value() ? std::to_string(*since_seqno) : "any",
attempt + 1,
max_attempts);

@@ -455,7 +466,8 @@ namespace snapshots
std::this_thread::sleep_for(std::chrono::milliseconds(retry_delay_ms));
}

- auto response = try_fetch_from_peer(peer_address, peer_ca, max_size);
+ auto response =
+ try_fetch_from_peer(peer_address, peer_ca, max_size, since_seqno);
if (response.has_value())
{
return response;
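
The retry loop in `fetch_from_peer` follows a common bounded-retry shape. Below is a generic sketch, assuming a hypothetical `fetch_with_retries` template (not in the PR) that takes any callable returning a `std::optional`. The condition guarding the sleep is collapsed in this diff view, so the `attempt > 0` test is an assumption, as is returning an empty optional once every attempt has failed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical generic helper, not part of the PR: calls `try_once` up to
// `max_attempts` times and returns the first non-empty result.
template <typename F>
auto fetch_with_retries(
  F&& try_once, size_t max_attempts, std::chrono::milliseconds retry_delay)
  -> decltype(try_once())
{
  for (size_t attempt = 0; attempt < max_attempts; ++attempt)
  {
    // Assumption: the delay applies before every attempt except the first.
    if (attempt > 0)
    {
      std::this_thread::sleep_for(retry_delay);
    }

    auto result = try_once();
    if (result.has_value())
    {
      return result;
    }
  }

  // Assumption: an empty optional signals that all attempts failed.
  return std::nullopt;
}
```

In the PR, the callable corresponds to `try_fetch_from_peer(peer_address, peer_ca, max_size, since_seqno)`, with `max_attempts` and the delay taken from the `backup_fetch` configuration.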
9 changes: 8 additions & 1 deletion tests/config.jinja
@@ -63,7 +63,14 @@
{
"directory": "{{ snapshots_dir }}",
"tx_count": {{ snapshot_tx_interval }},
- "read_only_directory": {{ read_only_snapshots_dir|tojson }}
+ "read_only_directory": {{ read_only_snapshots_dir|tojson }}{% if backup_snapshot_fetch_enabled %},
+ "backup_fetch": {
+ "enabled": true{% if backup_snapshot_fetch_max_attempts %},
+ "max_attempts": {{ backup_snapshot_fetch_max_attempts }}{% endif %}{% if backup_snapshot_fetch_retry_interval %},
+ "retry_interval": "{{ backup_snapshot_fetch_retry_interval }}"{% endif %}{% if backup_snapshot_fetch_target_rpc_interface %},
+ "target_rpc_interface": "{{ backup_snapshot_fetch_target_rpc_interface }}"{% endif %}{% if backup_snapshot_fetch_max_size %},
+ "max_size": "{{ backup_snapshot_fetch_max_size }}"{% endif %}
+ }{% endif %}
},
"logging":
{