apoelstra
Member

As discussed in #388 and its parent issues, when std is enabled we have a fairly straightforward way to enable global contexts. We use thread-local variables and on every access we rerandomize them. When the rand crate is also available the situation is even better, because we don't need to think too hard about where to get entropy from.
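A minimal sketch of the std approach described above, assuming a toy context type: `ToyContext`, `with_thread_context` and the XOR-based `rerandomize` are hypothetical stand-ins, not the crate's API. Unlike the real implementation, which uses some unsafe code, this version relies on `RefCell`, so it illustrates the idea only.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a secp256k1 context: just 32 bytes of blinding state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyContext {
    blinding: [u8; 32],
}

impl ToyContext {
    const fn new() -> Self {
        ToyContext { blinding: [0u8; 32] }
    }

    /// Placeholder for real rerandomization: XOR the seed into the state.
    fn rerandomize(&mut self, seed: &[u8; 32]) {
        for (b, s) in self.blinding.iter_mut().zip(seed.iter()) {
            *b ^= *s;
        }
    }
}

thread_local! {
    // One context per thread, so no cross-thread synchronization is needed.
    static CONTEXT: RefCell<ToyContext> = RefCell::new(ToyContext::new());
}

/// Runs `op` against this thread's context, then rerandomizes it if a seed is given.
/// Note: a reentrant call with a seed from inside `op` would panic in this sketch,
/// because the outer shared borrow is still alive.
fn with_thread_context<R>(seed: Option<&[u8; 32]>, op: impl FnOnce(&ToyContext) -> R) -> R {
    CONTEXT.with(|cell| {
        let result = op(&cell.borrow());
        if let Some(seed) = seed {
            cell.borrow_mut().rerandomize(seed);
        }
        result
    })
}
```

Calling `with_thread_context(Some(&seed), ...)` lets the operation see the old state and leaves a rerandomized state behind for the next caller on the same thread; passing `None` skips rerandomization, as a verification function would.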

In the nostd case things are harder. We have no thread locals and basically no synchronization primitives except atomics, which can be used to implement spinlocks but nothing else. Kix has argued strongly against spinlocks but in the following several messages we came to a solution in which we do a "soft spinlock" where after a couple iterations we just give up and don't rerandomize.

Kix suggested adding some logging and debugging facilities, which I did not include in my solution here. We can add those in a followup.

Kix also suggested setting the maximum spin count to 0, on the theory that in most cases there will never be any contention except in cases of reentrancy, and in that case spinning is pointless. I think it should be higher than zero to help in situations where there really are multiple threads. I set it to 128 which shouldn't be a noticeable (or even measurable) burden even in the case where the spinning is pointless.

This mostly resolves #388. To completely resolve that issue, we need to:

  1. Update the API to use this logic everywhere; on validation functions we don't need to rerandomize and on signing/keygen functions we should rerandomize using our secret key material.
  2. Remove the existing "no context" API, along with the global-context and global-context-less-secure features.

Once we've done that, we will be much better-equipped to address #346. To do that, we should attempt to scrape together some entropy even on nostd without the rand crate. I believe we can do this by reading the system time and CPU jitter. We don't need to do a very good job for this to work; even a bit or two of entropy on each signature will BTFO an attacker attempting to learn timing information from multiple signatures.

@apoelstra
Member Author

cc @Kixunil @TheBlueMatt @dpc @JeremyRubin (if you still care about this) @tcharding

This PR is a bit nasty but I scoped it to "just the hard parts" and the rest of it should be cathartic API changes that Tobin and I should be able to power through on our own.

@apoelstra apoelstra force-pushed the 2025-06_context branch 2 times, most recently from 879205e to 0026677 Compare June 21, 2025 14:25
@JeremyRubin

I don't particularly care anymore, but from memory:

part of what makes this confusing is in WASM contexts we explicitly want "perfect" determinism, so we don't want to give any external observables. probably the only way to do this is to hijack getrandom() and give an entropy transcript through it, and be sure we're not running multi-threaded anything, or we want to e.g. call to a host API to perform signature operations.

what we largely want to avoid is that an initialization somewhere from some far-flung library code, or by enabling a feature, means that all of a sudden we do a getrandom call for a context initialization when we're just doing verification work.

cheers,

jeremy

@apoelstra
Member Author

Force-pushed to update tons of unit tests, and to update the recovery API. The essential code is unchanged.

@JeremyRubin this new code only ever calls getrandom if the user enables the rand feature. Maybe we want to treat "compiling for wasm" the same as "rand not enabled". But I think we can deal with that in a followup PR.

@apoelstra apoelstra force-pushed the 2025-06_context branch 3 times, most recently from a66b75a to dbb164f Compare June 21, 2025 22:19
Member Author

@apoelstra apoelstra left a comment

On dbb164f successfully ran local tests

Member

@tcharding tcharding left a comment

Mad, I enjoyed reviewing that. Feels good to have to think hard for a change. Only problem I found was a few commas.

src/context.rs Outdated
/// Borrows the global context and do some operation on it.
///
/// If provided, after the operation is complete, [`rerandomize_global_context`]
/// is called on the context. If you have some random data available,
Member

Suggested change
/// is called on the context. If you have some random data available,
/// is called on the context. If you have some random data available.

In 9330ccd and again in the previous commit for the std version.

Member Author

Heh, this sentence is just totally broken. It should say "If some random data is provided, then after the operation is complete, [rerandomize_global_context] is called."

Member

lol, hopefully one of the other ones only needed the full stop.

tcharding previously approved these changes Jun 25, 2025
Member

@tcharding tcharding left a comment

ACK dbb164f

Member Author

@apoelstra apoelstra left a comment

On dbb164f successfully ran local tests

src/context.rs Outdated
Comment on lines 112 to 114
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
/// the context after the operation is complete. If it is not provided, randomization
/// is skipped.
Member

lolz, looks like you've done a Tobin here and put something different in the editor to what was in your brain.

Suggested change
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
/// the context after the operation is complete. If it is not provided, randomization
/// is skipped.
/// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
/// to rerandomize the context after the operation is complete. If it is not provided,
/// randomization is skipped.

tcharding previously approved these changes Jun 26, 2025
Member

@tcharding tcharding left a comment

ACK 8870a13

Member Author

@apoelstra apoelstra left a comment

On fc83079 successfully ran local tests

Member Author

@apoelstra apoelstra left a comment

On 8870a13 successfully ran local tests

@Kixunil
Collaborator

Kixunil commented Jun 26, 2025

Before I review this, just a note regarding entropy: I've recently learned that std is not actually needed. The getrandom crate can get entropy even on bare-metal if you provide it with a handler. For JS, it specifically has the js feature.

So what we really want is something like:

#[cfg(all(feature = "getrandom", not(feature = "std")))]
let rng = rand::rngs::OsRng;
#[cfg(feature = "std")]
let rng = rand::thread_rng();

I'm well aware that this might be slower but as I understand it this only needs to be done once, when initializing the library.

@apoelstra
Member Author

@Kixunil ok, awesome! I would like to improve the entropy generation in a followup PR, and still review this one as-is so we can get the synchronization logic and the API nailed down.

Great point that if we have slow sources of entropy, it's OK to do them once to get 32 bytes or so and then we can stretch it forever.
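A rough sketch of the "gather once, stretch forever" idea, under loud assumptions: `StretchedSeed` and `next_seed` are hypothetical names, and the standard library's `DefaultHasher` is used purely as a stand-in mixing function. It is not cryptographic; a real implementation would presumably use a cryptographic hash or stream cipher here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Holds 32 bytes of entropy gathered once from a slow source, plus a counter.
struct StretchedSeed {
    seed: [u8; 32],
    counter: u64,
}

impl StretchedSeed {
    fn new(seed: [u8; 32]) -> Self {
        StretchedSeed { seed, counter: 0 }
    }

    /// Derives a fresh 32-byte value from (seed, counter) and bumps the counter.
    /// ASSUMPTION: DefaultHasher stands in for a cryptographic hash.
    fn next_seed(&mut self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for (lane, chunk) in out.chunks_mut(8).enumerate() {
            let mut hasher = DefaultHasher::new();
            hasher.write(&self.seed);
            hasher.write_u64(self.counter);
            hasher.write_u64(lane as u64);
            chunk.copy_from_slice(&hasher.finish().to_le_bytes());
        }
        self.counter = self.counter.wrapping_add(1);
        out
    }
}
```

The slow entropy source is only touched when constructing the value; every later rerandomization seed comes from `next_seed`, which never goes back to the source.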

@apoelstra
Member Author

Addressed Tobin's and Steven's comments and rebased.

@apoelstra apoelstra force-pushed the 2025-06_context branch 2 times, most recently from da9fbef to 44d9a51 Compare August 24, 2025 23:52
Member Author

@apoelstra apoelstra left a comment

On 44d9a51 successfully ran local tests

@stevenroose
Contributor

ack diff from my last comment to 44d9a51 and ran local tests successfully

@stevenroose
Contributor

@Kixunil @tcharding or who else could review this PR?

@stevenroose
Contributor

it's kinda unfortunate that musig2 stuff is blocked on this (cuz I think it's not really related)

Member

@tcharding tcharding left a comment

ACK 44d9a51

@apoelstra
Member Author

@tcharding I'm going to start merging things here with only your ACK from now on. I will try to be conscientious about crypto-heavy things where you're not an expert and we should pull somebody else in. But it's not reasonable for this crate to be basically stalled due to lack of maintainers.

This PR in particular I think is fine to go in (especially since Kix reviewed an early revision of it) -- and same with the followups which will clean up the API using the new context stuff. Which will be much less technically heavy.

Although I am going to rebase it on master since #840 to make sure CI passes, so I'll need a re-ACK.

Will make it easier to introduce submodules.

This introduces the new global context API when std is enabled, using
thread locals to allow rerandomizing the context after sensitive
operations. As you can see, even the simple case involves some unsafe
code and is a bit tricky to implement.

Introduces a spinlocking mutex that only offers access to its internals
via a "try_unlock" method which spins a small finite number of times
before unlocking. We use a spinlock because, in the minimal dependency
set we support, there are no synchronization primitives except atomics,
so that's the only form of mutex we can create.
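
A minimal sketch of such a give-up-after-a-few-spins lock, built only on `core` atomics. `SoftSpinLock`, `SoftGuard` and `MAX_SPINS` are hypothetical names; only the method name "try_unlock" and the spin count of 128 come from this PR, and the real implementation may differ in every other detail.

```rust
use core::cell::UnsafeCell;
use core::hint::spin_loop;
use core::ops::{Deref, DerefMut};
use core::sync::atomic::{AtomicBool, Ordering};

/// Upper bound on attempts; 128 is the figure given in the PR description.
const MAX_SPINS: usize = 128;

/// A lock that gives up instead of spinning forever.
pub struct SoftSpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: access to `value` is only handed out while `locked` is held.
unsafe impl<T: Send> Sync for SoftSpinLock<T> {}

/// Grants access to the protected value; releases the lock on drop.
pub struct SoftGuard<'a, T> {
    lock: &'a SoftSpinLock<T>,
}

impl<T> SoftSpinLock<T> {
    pub const fn new(value: T) -> Self {
        SoftSpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Tries to acquire the lock at most `MAX_SPINS` times, then returns `None`.
    pub fn try_unlock(&self) -> Option<SoftGuard<'_, T>> {
        for _ in 0..MAX_SPINS {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return Some(SoftGuard { lock: self });
            }
            spin_loop();
        }
        None
    }
}

impl<T> Deref for SoftGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: a guard exists only while the lock is held.
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for SoftGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above; `&mut self` gives exclusive use of the guard.
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for SoftGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

In the reentrant case (the same thread asking again while it already holds a guard) the second `try_unlock` burns its 128 iterations and returns `None` rather than deadlocking, which is the behaviour the spin-count discussion is about.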

However, there are a number of problems with spinlocks -- see this article
(from Kix in rust-bitcoin#346) for some of them:

https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html

To avoid these problems, we give up after a few spins. The way we will
use this in the context object is:

1. When initializing the global context, if we can't get the lock, we
   just initialize a new stack-local context and use that. (A parallel
   thread must be initializing the context, which is wasteful but
   harmless.)
2. Once we unlock the context, we copy it onto the stack and re-lock it
   in order to minimize the time holding the lock. (The exception is
   during initialization where we hold the lock for the whole
   initialization, in the hopes that other threads will block on us
   instead of doing their own initialization.) If we rerandomize, we do
   this on the stack-local copy and then only re-lock to copy it back.
3. If we fail to get the lock to copy the rerandomized context back, we
   just don't copy it. The result is that we wasted some time
   rerandomizing without any benefit, which is not the end of the world.
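
The three steps above can be sketched roughly as follows. To keep the example short it uses `std::sync::Mutex::try_lock` as a stand-in for the bounded spinlock (the nostd code cannot do this), and `ToyContext`, `GLOBAL` and `with_global_context` are hypothetical names. It also skips the special case of holding the lock during initialization.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a secp256k1 context.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyContext {
    blinding: [u8; 32],
}

impl ToyContext {
    const fn new() -> Self {
        ToyContext { blinding: [0u8; 32] }
    }

    /// Placeholder for real rerandomization: XOR the seed into the state.
    fn rerandomize(&mut self, seed: &[u8; 32]) {
        for (b, s) in self.blinding.iter_mut().zip(seed.iter()) {
            *b ^= *s;
        }
    }
}

// ASSUMPTION: Mutex::try_lock stands in for the give-up-early spinlock.
static GLOBAL: Mutex<ToyContext> = Mutex::new(ToyContext::new());

fn with_global_context<R>(seed: Option<&[u8; 32]>, op: impl FnOnce(&ToyContext) -> R) -> R {
    // Steps 1 and 2: copy the context onto the stack and release the lock at once;
    // if the lock is unavailable, fall back to a fresh stack-local context.
    let mut local = match GLOBAL.try_lock() {
        Ok(guard) => *guard,
        Err(_) => ToyContext::new(),
    };

    // The operation itself runs without holding the lock.
    let result = op(&local);

    if let Some(seed) = seed {
        // Rerandomize the stack-local copy, still without the lock.
        local.rerandomize(seed);
        // Step 3: re-lock only to copy back; on failure the work is simply dropped.
        if let Ok(mut guard) = GLOBAL.try_lock() {
            *guard = local;
        }
    }
    result
}
```

The lock is only ever held for the duration of a copy, so a contended or reentrant caller loses at most one rerandomization, never correctness.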

The spinlock was implemented with help from ChatGPT o3 and the unit
tests with help from Claude 4 (though in both cases I did significant
refactoring and review by hand).

See the previous commit description for a high-level overview of the
spinlocking logic used in this commit.

Next steps are:

1. Update the API to use this logic everywhere; on validation functions
   we don't need to rerandomize and on signing/keygen functions we
   should rerandomize using our secret key material.
2. Remove the existing "no context" API, along with the global-context
   and global-context-less-secure features.
3. Improve our entropy story on nostd by scraping system time or CPU
   jitter or something and hashing that into our rerandomization. We
   don't need to do a great job here -- if we can get even a bit or two
   per signature, that will completely BTFO a timing attacker.

…d FromStr

Since we have a no-feature-gate global context now, we can remove the
feature gates from these things. No API change (other than an expansion
of the API for users without features enabled).

Something like half the tests in this crate are gated on "rand", most of
which are for dumb reasons (we are generating random keys from the
thread rng). By adding a non-feature=rand "random key generator" we can
enable these tests even without the rand feature.

We typically also have a gate on "std", which is needed to get the
thread rng, but in some cases this is the *only* reason to have a std
gate. So by eliminating the rand requirement we can make tests work in
nostd. We do this by implementing a parallel LCG which is obviously not
cryptographic but is fine for testing.
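
For illustration, a test-only key-byte generator along these lines could look like the sketch below. `TestLcg` and `next_32_bytes` are hypothetical names, the constants are Knuth's MMIX LCG parameters (an assumption, not necessarily what the crate uses), and a real helper would additionally have to retry whenever the bytes are not a valid secret key.

```rust
/// Minimal 64-bit linear congruential generator.
/// NOT cryptographically secure; suitable for deterministic tests only.
struct TestLcg {
    state: u64,
}

impl TestLcg {
    fn new(seed: u64) -> Self {
        TestLcg { state: seed }
    }

    /// Advances the state: state = state * a + c (mod 2^64).
    fn next_u64(&mut self) -> u64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }

    /// Produces 32 bytes that a test could try to parse as a secret key.
    fn next_32_bytes(&mut self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for chunk in out.chunks_mut(8) {
            chunk.copy_from_slice(&self.next_u64().to_be_bytes());
        }
        out
    }
}
```

Because it needs neither `std` nor `rand`, a generator like this can back tests in any feature configuration, and a fixed seed keeps test failures reproducible.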

In the LLM-generated tests in musig2.rs we have some rand feature gates
for literally no reason at all :/. My bad.

In addition to dramatically increasing nostd test coverage, the new
"generate random keys" function also gives us an opportunity to use the
new global context API including rerandomization.

This updates a couple functions, and their associated unit tests (which
no longer need any std/alloc/global-context feature gates). This runs
clean in valgrind, providing some evidence that my new code is sound.

This API is basically unused except for some niche or legacy
applications, so I feel comfortable breaking it pretty dramatically.

Move all the Secp256k1 functions onto RecoverableSignature and use
self/Self as appropriate.

Leave the stupid ecdsa_recoverable names even though they are even more
redundant, because this module is basically in maintenance mode. We only
do these changes since we'll be forced to once we drop the Secp256k1
object.

Member Author

@apoelstra apoelstra left a comment

On 19cfe16 successfully ran local tests

Member

@tcharding tcharding left a comment

ACK 19cfe16

@tcharding
Member

I'm going to start merging things here with only your ACK from now on. I will try to be conscientious about crypto-heavy things where you're not an expert and we should pull somebody else in.

No worries, and like I mentioned before if it would make my review better for some particular PR to first read up on some cryptography theory just let me know, happy to dig in if there is a benefit.

@apoelstra apoelstra merged commit 6ca9f58 into rust-bitcoin:master Sep 1, 2025
28 checks passed
@apoelstra apoelstra deleted the 2025-06_context branch September 1, 2025 13:39
@tcharding
Member

Props man, this one has been a while in the making huh.

chain-forgexcr45 added a commit to chain-forgexcr45/rust-secp256k1 that referenced this pull request Sep 28, 2025
…context API with rerandomization

19cfe160d85d5ed385e5aed4779c27890e749801 recovery: rewrite API to not use context objects (Andrew Poelstra)
4f600dbcce04fe8b0285449d464afd3fc0059db8 key: update a couple arbitrary API functions to no longer take a context (Andrew Poelstra)
362495bf3efbef7746b63d527e4cc9fe1a753f1a test: remove a ton of rand feature-gating (Andrew Poelstra)
5fa32b33e2ed3c72ed1c53a04f998ce6dcb06b01 key: remove std/alloc/global-context gates from serde::deserialize and FromStr (Andrew Poelstra)
0e429502b2e99eeb187347c4b88c5a89244f6347 context: add nostd version of global context (Andrew Poelstra)
979aa1a0a76822cc82a97cd0cfe1976c73f2a993 context: introduce spinlock that gives up after a few iterations (Andrew Poelstra)
9d548728e84687a8a650843e8cdc2b9f46ea3dd7 context: introduce global rerandomizable context (std only) (Andrew Poelstra)
1e45d4cd93ae848c614db76595526cd601a424ee context: rename src/context.rs to src/context/mod.rs (Andrew Poelstra)

Pull request description:

  As discussed in #388 and its parent issues, when `std` is enabled we have a fairly straightforward way to enable global contexts. We use thread-local variables and on every access we rerandomize them. When the `rand` crate is also available the situation is even better, because we don't need to think too hard about where to get entropy from.
  
  In the nostd case things are harder. We have no thread locals and basically no synchronization primitives except atomics, which can be used to implement spinlocks but nothing else. [Kix has argued strongly against spinlocks](rust-bitcoin/rust-secp256k1#346 (comment)) but in the [following several messages](rust-bitcoin/rust-secp256k1#346 (comment)) we came to a solution in which we do a "soft spinlock" where after a couple iterations we just give up and don't rerandomize.
  
  Kix suggested adding some logging and debugging facilities, which I did not include in my solution here. We can add those in a followup.
  
  Kix also suggested setting the maximum spin count to 0, on the theory that in most cases there will never be any contention except in cases of reentrancy, and in that case spinning is pointless. I think it should be higher than zero to help in situations where there really are multiple threads. I set it to 128 which shouldn't be a noticeable (or even measurable) burden even in the case where the spinning is pointless.
  
  This mostly resolves #388. To completely resolve that issue, we need to:
  
  1. Update the API to use this logic everywhere; on validation functions we don't need to rerandomize and on signing/keygen functions we should rerandomize using our secret key material.
  2. Remove the existing "no context" API, along with the global-context and global-context-less-secure features.
  
  Once we've done that, we will be much better-equipped to address #346. To do *that*, we should attempt to scrape together some entropy even on nostd without the rand crate. I believe we can do this by reading the system time and CPU jitter. We don't need to do a very good job for this to work; even a bit or two of entropy on each signature will BTFO an attacker attempting to learn timing information from multiple signatures.


ACKs for top commit:
  tcharding:
    ACK 19cfe160d85d5ed385e5aed4779c27890e749801


Tree-SHA512: 5b0be1472ef7a52221a01c141ac58f080c85f954515c567e2ecba6549f2d970996a0f7ce3c5349c2391b1eee3b504b695efdddf86a5cc70ab411dd5f3a40704b