Conversation
bb4db82 to
e63e5d0
Compare
e63e5d0 to
892eeea
Compare
5486961 to
451a5dc
Compare
preston-evans98
left a comment
There was a problem hiding this comment.
I think this high level approach is probably workable, but there are some scenarios to think.
Analysis
-
What happens if the timing oracle goes offline (i.e. because the preferred sequencer breaks)? In that case, the buckets will fill without limit until the sequencer comes back online. This will either DOS the rollup (if gas charges for storage are zero or too low) or spike gas usage for honest user txs. Honest users can circumvent a gas spike by switching to an alternate uniqueness strategy, but there is no substitute for good gas pricing to prevent DOS.
-
What happens if the timing oracle is malicious? The timing oracle could theoretically pull off three attacks. It could rewind the clock, it could freeze or slow time, or it could jump forward in time.
- Rewinding the clock would be devastating, since it would allow replay of all historical txs. This is a new attack vector introduced in this PR. To mitigate it, we MUST enforce that the timestamp oracle is monotonic.
- Freezing time was analyzed above. That would degrade the utility of this uniquness strategy, but honest users could switch to a different strategy with only short term impacts.
- The oracle could wind the clock forward aggressively. This would essentially brick this uniqueness vector (all timestamps would be expired) but honest users could switch to a different uniqueness vector.
-
What can a malicious user do if the timing oracle behaves correctly? They can fill up buckets with as many txs as possible in 60 seconds by mining collisions (with only 16 bits of entropy, this is trivial).
-
DOS: Since the compute cost of colliding transactions is quadratic, this would allow them to "hog" compute on the chain if the paymaster was active, especially if rate limits were not in place at the API layer. Sophisticated users can probably circumvent the rate limits with sybils. The only effective defense for this attack is to disallow paymaster transactions above a certain gas cost.
-
Griefing: If users rely on standard gas costs, it may be possible to grief some transactions by filling up the bucket that their deduplicator would fall into. This is not very effective because there are
2^16buckets.
Required Mitigations
This PR introduces a few new attack vectors that need to be carefully mitigated.
- The Timing Oracle must enforce that timestamps are monotonic. IMO, that should be done before this PR is merged
- Gas costs for deduplication must be set appropriately. To mitigate DOS concerns, it's probably best to add an extra gas charge for deduplication beyond the state access charge. This charge should be super-linear in the size of the bucket and must be large enough to (1) make the cost of malicious transactions larger than the cost of any ordinary transaction AND (2) make sustating a DOS attack expensive in real world terms. That will ensure that any coarse-grained paymaster limits can reject malicious txs and that those paymaster restrictions are effective at preventing the attack
- The paymaster must enforce limits on the gas cost that it will cover to avoid subsidizing DOS attacks through this vector.
Conclusion
The tools we have available in the SDK to prevent abuse are very limited. With the new DOS vectors added in this PR, I think it's somewhat unlikely that non-experts will be able to set the gas parameters and paymaster limits correctly in this environment.
We should think very carefully about whether the tradeoffs here are worth the risk. If we do want to go ahead, this PR adds significant urgency to gas constant calibration.
| UniquenessData::Timestamp(ts) => { | ||
| self.mark_timestamp_tx_attempted(None, ts, transaction_hash, state) | ||
| } | ||
| UniquenessData::TimestampNonce(ts) => { | ||
| self.mark_timestamp_tx_attempted(Some(credential_id), ts, transaction_hash, state) | ||
| } |
There was a problem hiding this comment.
(non-blocking): flagging that this will be a chain hash breaking change
| /// Use the first two bytes as bucket value. | ||
| fn hash_to_bucket(transaction_hash: &TxHash) -> u16 { | ||
| let ptr: &[u8] = transaction_hash.as_ref(); | ||
| unsafe { core::ptr::read_unaligned(ptr.as_ptr() as *const u16) } |
There was a problem hiding this comment.
We have a strong norm of avoiding unsafe unless absolutely necessary. It looks to me like we can just copy_from_slice and from_le_bytes to get the same behavior but with no unsoundness on big-endian platforms (admittedly, those are very rare!) and no headaches around alignment
| "Time-expired transaction for credential_id {credential_id}: hash {transaction_hash:}" | ||
| ); | ||
|
|
||
| anyhow::ensure!((expires as u128) < current_time + 60_000_000, |
There was a problem hiding this comment.
nit: let's extract this into a const
| ); | ||
|
|
||
| anyhow::ensure!((expires as u128) < current_time + 60_000_000, | ||
| "Future transaction for credential_id {credential_id}: hash {transaction_hash:} needs to wait." |
There was a problem hiding this comment.
nit: it's probably helpful to include the two timestamps that lead to the rejection in this message
| ); | ||
| anyhow::ensure!( | ||
| expires > self.nonces.get(credential_id, state)?.unwrap_or_default(), | ||
| "Nonce-expired transaction for credential_id {credential_id}: hash {transaction_hash:} is invalidated" |
There was a problem hiding this comment.
nit: it's probably helpful to include the two timestamps that lead to the rejection in this message
| if let Some(credential_id) = credential_id { | ||
| self.nonces.set(credential_id, &expires, state)?; | ||
| } |
There was a problem hiding this comment.
Anyone who uses this functionality is vulnerable to replay attacks. What's the design goal here?
There was a problem hiding this comment.
Oh, I see this is only intended to increment the nonce. this is probably fine then, but let's add documentation explaining why this is safe
| /// Buckets of transactions with their expiry timestamp. The | ||
| /// buckets are taken from the first bytes of the TxHash. | ||
| #[state] | ||
| pub(crate) timestamps: StateMap<u16, Vec<(TxHash, u64)>>, |
There was a problem hiding this comment.
(non-blocking): Noting for other reviewers that this looks non-breaking since it's the last #[state] item in the module
Description
Summary of the changes...
CHANGELOG.mdwith a new entry if my PR makes any breaking changes or fixes a bug. If my PR removes a feature or changes its behavior, I provide help for users on how to migrate to the new behavior.Cargo.tomlchanges before opening the PRs. (Are all new dependencies necessary? Is any module dependency leaked into the full-node (hint: it shouldn't)?)Linked Issues
Testing
Describe how these changes were tested. If you've added new features, have you added unit tests?
Docs
Describe where this code is documented. If it changes a documented interface, have the docs been updated?
Rendered docs are available at
https://sovlabs-ci-rustdoc-artifacts.us-east-1.amazonaws.com/<BRANCH_NAME>/index.html