Skip to content

Conversation

@tnull
Copy link
Contributor

@tnull tnull commented Aug 22, 2025

Previously, we omitted setting ChaCha20Poly1305RFC's aad field, which had the tag not commit to any particular key. This would allow a malicious VSS provider to substitute blobs stored under a different key without the client noticing.

Here, we allow to set the aad (which usually should be the full key under which we expect the Storable to get stored). This change is backwards compatible if users provide a static aad: &[] argument. Otherwise, users will need to find an upgrade path client-side.

Previously, we omitted setting `ChaCha20Poly1305RFC`'s `aad` field,
which had the `tag` not commit to any particular key. This would allow a
malicious VSS provider to substitute blobs stored under a different key
without the client noticing.

Here, we allow to set the `aad` (which usually should be the full key
under which we expect the `Storable` to get stored). This change is
backwards compatible if users provide a static `aad: &[]` argument.
Otherwise, users will need to find an upgrade path client-side.
@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Aug 22, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

tnull added 2 commits August 22, 2025 11:23
Previously, this `unwrap` could have us panic if the service wouldn't
provide a message including the `EncryptionMetadata`. Here we fix this
by simply erroring out if the expected metadata is not present.
There is now reason `StorableBuilder` needs to permanently own the
`data_encryption_key`, which just risks that the key will linger around
in memory after `StorableBuilder` has been dropped. Here, we switch the
API to simply take `data_encryption_key` by reference for the
`build`/`deconstruct` operations.
@tnull tnull force-pushed the 2025-08-include-key-aad branch 2 times, most recently from 3c62a9f to 03ca9d9 Compare August 22, 2025 09:48
@jurvis
Copy link

jurvis commented Aug 22, 2025

the core of the issue of not having the AAD is that we can no longer authenticate the storage context of the encrypted blob.

another attack that is still possible is that the VSS provider can claim thing that an encrypted blob came from a different store_id, despite the underlying blob being the same. Works something like this:

  • Since one DEK is reused across tenants and stores (i.e. S1, S2)
  • A client can write value to S2 with a valid triple (nonce, ciphertext, tag)
  • But a malicious VSS can return the S2 blob, but label it as coming from S1
  • Client verifies AEAD tag successfully and accepts whatever S2

Not really sure how this becomes bad in practice, but also something worth keeping in mind.

@jurvis
Copy link

jurvis commented Aug 22, 2025

something that just came to mind while thinking about backwards-compatibility.

A server could also:

  1. Return an older triple for the same key OR pick any version it wants for PlaintextBlob
  2. And since AAD is now empty, we don't know if it's actually the latest version and the client cannot verify and could get stale state

I'm not sure adding something to AAD alone would help here at all. The App would need to store some state, which runs counter to VSS' use-case as a backup store (how do you verify something you don't have information about?)

///
/// [`PutObjectRequest`]: crate::types::PutObjectRequest
pub fn build(&self, input: Vec<u8>, version: i64) -> Storable {
pub fn build(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we could be opinionated here for future writes and have the AAD commit to:

  1. The key-value key (lol) [fixes the issue mentioned in pr]
  2. store_id [fixes potential multi-tenancy issues]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: 2.:

  1. The key-value key (lol) [fixes the issue mentioned in pr]

Hmm, that might be an option, but (especially for backwards compat) the current approach where the 'user' determines the aad (given that they also build the actual storable) might be more flexible?

  1. store_id [fixes potential multi-tenancy issues]

I guess we also could consider committing to the store id, but it would mean it can never change, e.g., that services can't ever move data between stores for the user.

Also, as noted below, I doubt the described replay attacks can be fully mitigated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On whether we should commit to store_id:

Since a VSS store is:

pub struct VssStore {
	client: VssClient<CustomRetryPolicy>,
	store_id: String,
	runtime: Arc<Runtime>,
	data_encryption_key: [u8; KEY_LENGTH],
	key_obfuscator: KeyObfuscator,
}

We already have the notion that each store_id should have a unique DEK.

So as long as the user uses a different DEK for each store_id, the server won't be able to move data to a different store_id without being transparent with the user.

If I understand things correctly here, I would rather not commit to the store_id in the aad, and suggest users use a different DEK for each store_id - which seems to fit well with the current VssStore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for committing to the key-value key, yea I think letting the user set the aad so that we have an easier time with backwards compat makes the most sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the notion that each store_id should have a unique DEK.

Huh, why? Users currently can change the store_id when setting up LDK with VSS, without changing their entropy, and the store_id doesn't influence how the DEK is derived?

If I understand things correctly here, I would rather not commit to the store_id in the aad, and suggest users use a different DEK for each store_id - which seems to fit well with the current VssStore.

Right, we could change the DEK derivation scheme to include the store_id, but a) we'll need to figure out how to maintain backwards compatibility for the 'legacy' derivation and b) this would mean that we can't easily migrate between stores/store ids. Might be debatable if that's what we want, but we'd def. block off that road.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users currently can change the store_id when setting up LDK with VSS, without changing their entropy, and the store_id doesn't influence how the DEK is derived?

Ah yes you are right.

this would mean that we can't easily migrate between stores/store ids. Might be debatable if that's what we want, but we'd def. block off that road.

Should migrations between store_id of this kind be handled by the user, not by the server ? Ie list all keys, get all the values from the old store, and put them in the new store.

Right, we could change the DEK derivation scheme to include the store_id

In that case I would rather just commit to the store_id in the aad instead of including it in the DEK.

we'll need to figure out how to maintain backwards compatibility for the 'legacy' derivation

1- How important is it to maintain backwards compat at this point ? This feature is alpha after all.
2- If we want to maintain backwards compat, could we create a migration routine on startup that triggers for old VSS stores, and transitions them to the new store + scheme ? We'd have to support both schemes at once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should migrations between store_id of this kind be handled by the user, not by the server ? Ie list all keys, get all the values from the old store, and put them in the new store.

Yes, that sounds reasonable. Just saying we're blocking off the other way and need to take that step consciously. Committing to the store id also manifests that it needs to remain the users' choice I believe?

How important is it to maintain backwards compat at this point ? This feature is alpha after all.

'Alpha' or not, we have users in production with it. So at the very least we need to offer some migration path for them.

2- If we want to maintain backwards compat, could we create a migration routine on startup that triggers for old VSS stores, and transitions them to the new store + scheme ? We'd have to support both schemes at once.

I think this discussion is better had over on the LDK Node PR, but I was thinking to introduce a vss_version (or similar) key that is read first thing and default to 0 if it's absent.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @jkczyz! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tankyleo tankyleo requested review from tankyleo and removed request for jkczyz August 25, 2025 06:08
@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 2nd Reminder

Hey @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 3rd Reminder

Hey @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull
Copy link
Contributor Author

tnull commented Sep 1, 2025

another attack that is still possible is that the VSS provider can claim thing that an encrypted blob came from a different store_id, despite the underlying blob being the same. Works something like this:

Hmm, but both stores would have to be created by the same user, right?

something that just came to mind while thinking about backwards-compatibility.

A server could also:

  1. Return an older triple for the same key OR pick any version it wants for PlaintextBlob

Not sure I'm following here? The server would never learn the version included in the encrypted data, no?
Yes, they could replay an older value, but our key-level versioning should (eventually) detect if we have have more current data. If we're syncing from scratch, I'm not sure how we'd avoid replaying older state.

  1. And since AAD is now empty, we don't know if it's actually the latest version and the client cannot verify and could get stale state

The idea is not to upgrade v0 clients but rather find a way to detect the storage version, and then use a non-empty aad for for all v1+ clients.

@ldk-reviews-bot
Copy link

🔔 4th Reminder

Hey @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 5th Reminder

Hey @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tankyleo
Copy link
Contributor

tankyleo commented Sep 7, 2025

Previously, we omitted setting ChaCha20Poly1305RFC's aad field, which had the tag not commit to any particular key. This would allow a malicious VSS provider to substitute blobs stored under a different key without the client noticing.

Hmm if I substitute a blob stored under a different data_encryption_key, it looks like the client would notice ?

Ah ok we are talking about the key produced here in ldk-node, I understand now:

		let store_key = self.build_key(primary_namespace, secondary_namespace, key)?;

Copy link
Contributor

@tankyleo tankyleo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the comment about our drop implementation, everything else LGTM

@tnull tnull requested review from jurvis and tankyleo September 9, 2025 08:53

impl Drop for KeyObfuscator {
fn drop(&mut self) {
// Zeroize the owned keys

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work - Rust structs are by definition movable within memory via memcpy so overwriting the current copy won't accomplish much. Also, the whole point of zeroizing is because C bugs allow for arbitrary memory reading due to unsafety, this does not (materially) apply in Rust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, I recall a similar discussion over at rust-bitcoin. IIRC the opinions were mixed, but if we're not positive zeroizing doesn't gain us anything here, probably best to just leave it out for now to not create a false sense of security. Dropping the commit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, IMO its not worth bothering in Rust.

@tnull tnull force-pushed the 2025-08-include-key-aad branch from 0abb475 to 6873274 Compare September 10, 2025 09:47
@tnull tnull requested a review from TheBlueMatt September 10, 2025 09:47
/// [`PutObjectRequest`]: crate::types::PutObjectRequest
pub fn build(&self, input: Vec<u8>, version: i64) -> Storable {
pub fn build(
&self, input: Vec<u8>, version: i64, data_encryption_key: &[u8; 32], aad: &[u8],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the aad is required to have a specific (set of?) lengths or we panic, we shouldn't expose a generic any-length API publicly as a result, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the aad is required to have a specific (set of?) lengths or we panic, we shouldn't expose a generic any-length API publicly as a result, IMO.

No, AFAICT that's only the key and the nonce?:

			assert!(key.len() == 16 || key.len() == 32);
			assert!(nonce.len() == 12);

For the aad we even include the length in the mac after padding has been added:

			self.mac.input(&self.aad_len.to_le_bytes());
			self.mac.input(&(self.data_len as u64).to_le_bytes());

Or maybe I'm missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, it might still be debatable if we simply want to take the storage_key as an argument here and prescribe the specific commitment scheme. Just seemed a bit more flexible to leave it to the user, in case they wanted not to do it for some reason or needed something else committed to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry no you're right, I was thinking of the nonce. I think this API is fine.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @jurvis @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @jurvis @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull tnull merged commit 28976f4 into lightningdevkit:main Sep 12, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants