-
Notifications
You must be signed in to change notification settings - Fork 299
context: introduce new global context API with rerandomization #806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
cc @Kixunil @TheBlueMatt @dpc @JeremyRubin (if you still care about this) @tcharding This PR is a bit nasty but I scoped it to "just the hard parts" and the rest of it should be cathartic API changes that Tobin and I should be able to power through on our own. |
879205e
to
0026677
Compare
I don't particularly care anymore, but from memory: part of what makes this confusing is in WASM contexts we explicitly want "perfect" determinism, so we don't want to give any external observables. probably the only way to do this is to hijack getrandom() and give a entropy transcript through it, and be sure we're not running multi-threaded anything, or we want to e.g. call to a host API to perform signature operations. what we largely want to avoid is that an initialization somewhere from some far-flung library code, or by enabling a feature, means that all of the sudden we do a getrandom call for a context initialization when we're just doing verification work. cheers, jeremy |
0026677
to
7e26f33
Compare
Force-pushed to update tons of unit tests, and to update the recovery API. The essential code is unchanged. @JeremyRubin this new code only ever calls |
a66b75a
to
dbb164f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On dbb164f successfully ran local tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mad, I enjoyed reviewing that. Feels good to have to think hard for a change. Only problem I found was a few commas.
src/context.rs
Outdated
/// Borrows the global context and do some operation on it. | ||
/// | ||
/// If provided, after the operation is complete, [`rerandomize_global_context`] | ||
/// is called on the context. If you have some random data available, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// is called on the context. If you have some random data available, | |
/// is called on the context. If you have some random data available. |
In 9330ccd and again in the previous commit for the std
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, this sentence is just totally broken. It should say "If some random data is provided, then after the operation is complete, [rerandomize_global_context
] is called."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, hopefully one of the other ones only needed the full stop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK dbb164f
dbb164f
to
d2cbb91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On dbb164f successfully ran local tests
d2cbb91
to
fc83079
Compare
src/context.rs
Outdated
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`] | ||
/// the context after the operation is complete. If it is not provided, randomization | ||
/// is skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lolz, looks like you've done a Tobin here and put something different in the editor to what was in your brain.
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`] | |
/// the context after the operation is complete. If it is not provided, randomization | |
/// is skipped. | |
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`] | |
/// to rerandomize the context after the operation is complete. If it is not provided, | |
/// randomization is skipped. |
fc83079
to
8870a13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 8870a13
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On fc83079 successfully ran local tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 8870a13 successfully ran local tests
Before I review this, just a note regarding entropy: I've recently learned that So what we really want is something like: #[cfg(all("getrandom", not("std"))]
let rng = rand::rngs::OsRng;
#[cfg("std")]
let rng = rand::thread_rng(); I'm well aware that this might be slower but as I understand it this only needs to be done once, when initializing the library. |
@Kixunil ok, awesome! I would like to improve the entropy generation in a followup PR, and still review this one as-is so we can get the synchronization logic and the API nailed down. Great point that if we have slow sources of entropy, it's OK to do them once to get 32 bytes or so and then we can stretch it forever. |
@Kixunil I hit "resolve" on the bulk of your comments (everything except those that have open questions that aren't related to this code). I refactored everything and basically rewrote the entire PR. The logic is the same but it will probably require a total re-review. |
Oh, and I added miri tests on the spinlock you can run with I guess I should add those to CI. I'd like to do that in a followup or separate PR because I need to go through the whole library's unit tests and explicitly whitelist/blacklist the ones that work with MIri. |
59bd1b8
to
4ce914b
Compare
This introduces the new global context API when std is enabled, using thread locals to allow rerandomizing the context after sensitive operations. As you can see, even the simple case involves some unsafe code and is a bit tricky to implement.
Introduces a spinlocking mutex that only offers access to its internals via a "try_unlock" method which spins a small finite number of times before unlocking. We use a spinlock because, in the minimal dependency set we support, there are no synchronization primitives except atomics, so that's the only form of mutex we can create. However, there are a number of problems with spinlocks -- see this article (from Kix in rust-bitcoin#346) for some of them: https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html To avoid these problems, we give up after a few spins. The way we will use this in the context object is: 1. When initializing the global context, if we can't get the lock, we just initialize a new stack-local context and use that. (A parallel thread must be initializing the context, which is wasteful but harmless.) 2. Once we unlock the context, we copy it onto the stack and re-lock it in order to minimize the time holding the lock. (The exception is during initialization where we hold the lock for the whole initialization, in the hopes that other threads will block on us instead of doing their own initialization.) If we rerandomize, we do this on the stack-local copy and then only re-lock to copy it back. 3. If we fail to get the lock to copy the rerandomized context back, we just don't copy it. The result is that we wasted some time rerandomizing without any benefit, which is not the end of the world. The spinlock was implemented with help from ChatGPT o3 and the unit tests with help from Claude 4 (though in both cases I did significant refactoring and review by hand).
4ce914b
to
cdbcf6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On d5a61b8 successfully ran local tests
See the previous commit description for a high-level overview of the spinlocking logic used in this commit. Next steps are: 1. Update the API to use this logic everywhere; on validation functions we don't need to rerandomize and on signing/keygen functions we should rerandomize using our secret key material. 2. Remove the existing "no context" API, along with the global-context and global-context-less-secure features. 3. Improve our entropy story on nostd by scraping system time or CPU jitter or something and hashing that into our rerandomization. We don't need to do a great job here -- if we can get even a bit or two per signature, that will completely BTFO a timing attacker.
…d FromStr Since we have a no-feature-gate global context now, we can remove the feature gates from these things. No API change (other than an expansion of the API for users without features enabled).
Sometihng like half the tests in this crate are gated on "rand", most of which are for dumb reasons (we are generating random keys from the thread rng). By adding a non-feature=rand "random key generator" we can enable these tests even without the rand feature. We typically also have a gate on "std", which is needed to get the thread rng, but in some cases this is the *only* reason to have a std gate. So by eliminating the rand requirement we can make tests work in nostd. We do this by implementing a parallel LCG which is obviously not cryptographic but is fine for testing. In the LLM-generated tests in musig2.rs we have some rand feature gates for literally no reason at all :/. My bad. In addition to dramatically increasing nostd test coverage, the new "generate random keys" function also gives us an opportunity to use the new global context API including rerandomization.
This updates a couple functions, and their associated unit tests (which no longer need any std/alloc/global-context feature gates). This runs clean in valgrind, providing some evidence that my new code is sound.
This API is basically unused except for some niche or legacy applications, so I feel comfortable breaking it pretty dramatically. Move all the Secp256k1 functions onto RecoverableSignature and use self/Self as appropriate. Leave the stupid ecdsa_recoverable names even though they are even more redundant, because this module is basically in maintenance mode. We only do these changes since we'll be forced to once we drop the Secp256k1 object.
cdbcf6c
to
45c264e
Compare
OK, I'll try to review it tomorrow. (I'm not sure if you get notifications that I replied to some of your hidden comments; I didn't but found them by chance.) |
Thanks! I did not. Checked them now. In future I won't try hiding conversations like this, even when I do a total PR rewrite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 45c264e successfully ran local tests
} | ||
|
||
impl<T> SpinLock<T> { | ||
/// Blocks until the lock is acquired, then returns an RAII guard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Blocks until the lock is acquired, then returns an RAII guard. | |
/// Blocks until the lock is acquired, then returns an RAII guard. | |
/// It will spin up to a fixed number of iterations (currently 128) or | |
/// return None if the lock isn’t acquired in that time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify the behavior of SpinLock::try_lock()
in the docs.
It doesn’t strictly “block until acquired” – it might give up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, I should fix that comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..and rebase.
<= core::mem::size_of::<[AlignedType; MAX_PREALLOC_SIZE]>(), | ||
"prealloc size exceeds our guessed compile-time upper bound", | ||
); | ||
ffi::secp256k1_context_preallocated_clone(other, self.buf()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it necessary to set self.1
?
ffi::secp256k1_context_preallocated_clone(other, self.buf()); | |
let new_ctx = ffi::secp256k1_context_preallocated_clone(other, self.buf()); | |
self.1 = Some(new_ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This does not even type check.
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but how does this add to the entire library's worth of unit tests which exercise the nostd context object?
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the following tests useful ?
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } | |
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } | |
#[cfg(all(test, not(feature = "std")))] | |
mod nostd_tests { | |
use super::*; | |
use crate::{Message, SecretKey, Secp256k1}; | |
#[test] | |
fn test_spinlock_fallback() { | |
// If we didn't get it, another test has it - that's fine! | |
// likely due to test parallelism | |
// we're already testing the fallback path! | |
let lock_acquired = SECP256K1.try_lock().is_some(); | |
// Generate test data | |
let msg = Message::from_digest(crate::test_random_32_bytes()); | |
let sk = SecretKey::test_random(); | |
// This should work whether or not we got the lock | |
let sig = crate::with_global_context( | |
|secp: &Secp256k1<crate::SignOnly>| secp.sign_ecdsa(msg, &sk), | |
Some(&sk.secret_bytes()), | |
); | |
let pk = crate::with_global_context( | |
|secp: &Secp256k1<crate::SignOnly>| crate::key::PublicKey::from_secret_key(secp, &sk), | |
None, | |
); | |
let result = crate::with_global_context( | |
|secp: &Secp256k1<crate::VerifyOnly>| secp.verify_ecdsa(&sig, msg, &pk), | |
None, | |
); | |
assert!(result.is_ok(), "operations should work with or without lock"); | |
if lock_acquired { | |
// If we got the lock, let's test that operations still work | |
// even while we're holding it (they'll use fallback) | |
let msg2 = Message::from_digest([42u8; 32]); | |
let sk2 = SecretKey::from_byte_array([2u8; 32]).unwrap(); | |
let sig2 = crate::with_global_context( | |
|secp: &Secp256k1<crate::SignOnly>| secp.sign_ecdsa(msg2, &sk2), | |
None, | |
); | |
assert_eq!(sig2.serialize_compact(), sig2.serialize_compact()); | |
} | |
} | |
#[test] | |
fn test_explicit_fallback() { | |
// A more direct test: try to acquire the lock multiple times | |
// to force at least one fallback | |
let mut guards = Vec::new(); | |
// Try to acquire the lock multiple times | |
for _ in 0..3 { | |
if let Some(guard) = SECP256K1.try_lock() { | |
guards.push(guard); | |
} | |
} | |
// At most one guard should succeed | |
assert!(guards.len() <= 1, "spinlock should prevent multiple locks"); | |
// Now do operations - if we have a guard, these will use fallback | |
let msg = Message::from_digest(crate::test_random_32_bytes()); | |
let sk = SecretKey::test_random(); | |
let sig = crate::with_global_context( | |
|secp: &Secp256k1<crate::SignOnly>| secp.sign_ecdsa(msg, &sk), | |
Some(&sk.secret_bytes()), | |
); | |
let pk = crate::with_global_context( | |
|secp: &Secp256k1<crate::SignOnly>| crate::key::PublicKey::from_secret_key(secp, &sk), | |
None, | |
); | |
let result = crate::with_global_context( | |
|secp: &Secp256k1<crate::VerifyOnly>| secp.verify_ecdsa(&sig, msg, &pk), | |
None, | |
); | |
assert!(result.is_ok()); | |
// guards drop here | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, neat! Yeah, I like these ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add to src/key.rs
to test new APIs that don't require context ?
#[test]
fn test_api_without_context() {
// Test PublicKey::negate without context
let (_sk, pk) = crate::test_random_keypair();
let negated = pk.negate();
let double_negated = negated.negate();
assert_eq!(pk, double_negated);
// Test PublicKey::add_exp_tweak without context
let tweak = Scalar::from_be_bytes([3u8; 32]).unwrap();
let tweaked = pk.add_exp_tweak(&tweak).unwrap();
assert_ne!(pk, tweaked);
// Test Keypair::from_seckey_byte_array without context
let kp = Keypair::from_seckey_byte_array([4u8; 32]).unwrap();
assert_eq!(kp.secret_bytes(), [4u8; 32]);
}
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but how does this add to the entire library's worth of unit tests which exercise the nostd context object?
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name says that it checks that rerandomization "actually changes the context" but the test does not verify this, and there is no way to verify it.
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary. clone_into
is a one-liner wrapping an upstream method which is already tested there.
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from your above "are these tests useful?"
/// | ||
/// The provided data will be mixed with the entropy from previous calls in a timing | ||
/// analysis resistant way. It is safe to directly pass secret data to this function. | ||
pub fn rerandomize_global_context(seed: &[u8; 32]) { with_raw_global_context(|_| {}, Some(seed)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
....there are multiple assertions for this already in the code.
@@ -468,22 +427,20 @@ mod tests { | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already tests for this.
Did you do this review with a LLM? Please don't do this. I am going to start ignoring your review comments if they are consistently wrong and take time to check.
Thanks for specifying the commit ID you were reviewing in this one. Please do that for all your comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly not sure I understood all the rerandomization comments. It seems weird that now any of the methods that are internally using the global context are doing some non-trivial locking, waiting, rerandomizing, cloning, etc as "side-effects". Just two general comments on the commits:
-
The commit message for
context: introduce global rerandomizable context (std only)
mentions unsafe code, but there is no unsafe code in the commit. -
You mention "Improve our entropy story on nostd by scraping system time or CPU jitter or something and hashing that into our rerandomization." as next step in a commit message, but I can't find that anywhere.
Other than that, I read everything, didn't see anything that looked problematic, but I would feel hesitant to give it a full-on ACK. Maybe this would be a good release to make a RC for first.
|
||
#[cfg(test)] | ||
impl SpinLock<u64> { | ||
pub const fn new() -> Self { Self { flag: AtomicBool::new(false), data: UnsafeCell::new(100) } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this 100? just a test value? Couldn't you just do a new(v: u64)
and keep the 100 for within the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's not a bad idea. Would make the tests more self-contained.
Ran local tests on 45c264e. |
As discussed in #388 and its parent issues, when
std
is enabled we have a fairly straightforward way to enable global contexts. We use thread-local variables and on every access we rerandomize them. When therand
crate is also available the situation is even better, because we don't need to think too hard about where to get entropy from.In the nostd case things are harder. We have no thread locals and basically no synchronization primitives except atomics, which can be used to implement spinlocks but nothing else. Kix has argued strongly against spinlocks but in the following several messages we came to a solution in which do a "soft spinlock" where after a couple iterations we just give up and don't rerandomize.
Kix suggested adding some logging and debugging facilities, which I did not include in my solution here. We can add those in a followup.
Kix also suggested setting the maximum spin count to 0, on the theory that in most cases there will never be any contention except in cases of reentrancy, and in that case spinning is pointless. I think it should be higher than zero to help in situations where there really are multiple threads. I set it to 128 which shouldn't be a noticable (or even measurable) burden even in the case where the spinning is pointless.
This mostly resolves #388. To completely resolve that issue, we need to:
Once we've done that, we will be much better-equipped to address #346. To do that, we should attempt to scrape together some entropy even on nostd without the rand crate. I believe we can do this by reading the system time and CPU jitter. We don't need to do a very good job for this to work; even a bit or two of entropy on each signature will BTFO an attacker attempting to learn timing information from multiple signatures.