SIMD-0363: Simple Alpenglow Clock #363
rogerANZA wants to merge 25 commits into solana-foundation:main
removed this paragraph (see discussion with ashwin)
Could you articulate how these clock conditions interact with fast leader handover / a changing parent + the fact that your approach still works under these conditions?
Separately - with this SIMD, it may be worth including a "parent block (slot, hash)" field in the block footer in addition to the existing clock. This would be the "final parent" of the block, taking fast leader handover into account. Two good things happen as a result of this:
(1) given the final parent, we can skip a block with a bad clock quickly on shred ingest rather than after replay, since the first shred of the last
(2) repair in a block with an UpdateParent marker gets slightly better because now there are two shreds with the "final parent" that race each other
EDIT: fixed an issue with comment (1) above.
| In Alpenglow, the current block leader includes an updated integer clock value | ||
| (Rust system time in nanoseconds) in its block *b* in slot *s* in the block | ||
| footer. This value is bounded by the clock value in the parent block. |
At what point should we get the system clock time? When it is the time to create our leader slot? When we actually send out the block footer? Or just refer to what the current clock sysvar says?
You insert your system time when you produce the slice with the clock info in the block marker.
I mean - what precisely do you mean by "system time when you produce the slice?"
E.g. is this:
- The beginning of block production for a slot?
- The end of block production for a slot, post block building?
- When we actually create the block footer, post block building, and pre-shredding for turbine dissemination?
The specific choice isn't really enforceable, although, for default implementation purposes / consistency, it's probably worth opining on the specifics in the SIMD.
All of these are possible (and enforceable). This depends on how we organize meta-data in a block (i.e., our discussion in Chicago). If we may change the parent, then the header (the first slice) is the wrong place, but the footer (the last slice) should be alright.
For now, it looks like nothing prevents a validator from reporting a timestamp that is off by +/- 50 ms, equating to an honest validator reporting a timestamp that is at the beginning of block production vs. maybe even somewhere in the middle of block production or even the end of block production.
To somewhat make the notion of a clock uniform across validator implementations, we should probably specify roughly at what point in a leader's production of a block the timestamp should be captured.
If this can somehow be enforced, that's even better imho.
For now, it looks like nothing prevents a validator from reporting a timestamp that is off by +/- 50 ms...
Well, I would say it can even be up to 800 ms wrongly reported. Nothing we can do here.
But we established that the clock should be in the block footer, and we established that it should be captured before putting it there, so what's still unclear?
All of these are possible (and enforceable).
Well, I would say it can even be up to 800 ms wrongly reported. Nothing we can do here.
This part is a bit unclear - these two statements seem a bit contradictory. I agree with the second statement you're making re. being off by up to 800 ms, though.
But we established that the clock should be in the block footer, and we established that it should be captured before putting it there, so what's still unclear?
The part that's unclear - at what exact part of the leader block production phase should we have leaders set the block timestamp?
Should the timestamp be associated with when the leader starts producing their block? When the leader conclusively knows what the block parent is? When the leader actually constructs the footer itself?
I'm aware that it's impossible to enforce the particular point in time, as you pointed out re. the 800 ms piece. This being said, it's worth having a sane default.
"We assume that a correct leader inserts its correct local time..."
Maybe this is not clear enough? When you insert the time value in the footer, you insert your local time at that point. This way we have the least (additional) skew. Note that we will still have a systematic lag, since the slice then has to be encoded, sent through Rotor/Turbine, and decoded on the receiver side. We might consider taking that systematic lag into account.
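To make the "sample the clock when you build the footer" default concrete, here is a minimal sketch. `BlockFooterSketch`, `local_time_ns`, and `build_footer` are hypothetical names used only for illustration, and the `u64` width is an assumption; the actual footer layout is the one defined in SIMD 0307.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical footer type for illustration only; the real layout is
/// specified in SIMD 0307. The u64 width is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockFooterSketch {
    /// Leader's local time in nanoseconds since the UNIX epoch.
    pub clock_ns: u64,
}

/// Reads the local system clock as nanoseconds since the UNIX epoch.
pub fn local_time_ns() -> u64 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the UNIX epoch")
        .as_nanos();
    u64::try_from(nanos).expect("nanosecond timestamp does not fit in u64")
}

/// Builds the footer after block building has finished, sampling the
/// clock at exactly that moment (not at the start of the slot).
pub fn build_footer() -> BlockFooterSketch {
    BlockFooterSketch {
        clock_ns: local_time_ns(),
    }
}
```

Sampling inside `build_footer` rather than at the start of block production keeps the additional skew small; the encode, Rotor/Turbine, and decode lag mentioned above is not compensated for in this sketch.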
This should be mentioned in the fast leader handover SIMD in my opinion. (1) If a parent is changed and we consider everything before the parent change irrelevant, then we must include the clock again. (2) If the metadata of the block before the parent change is still relevant even after the parent change, then we don't include it again. In any case, we must have exactly 1 valid clock entry per block. But mentioning it in the fast handover SIMD is more natural because it's the same exception for all the cases.
We could decide to have the clock always in the last slice (if we have other information that always goes in the last slice). If this is the way to go, then we can describe it here as well. Thanks for these good questions. |
What do you mean by "we must include the clock again?" The block footer, which is where the clock value will reside, will appear exactly once per block, after the final entry batch (see SIMD 0307). Could we mention SIMD 0307 + specify that it's specifically the
The clock won't always be in the last slice / FEC Set, but rather in the last block marker, which will eventually span multiple FEC Sets. This is why I'm suggesting that we place parent block information into the footer as part of this SIMD - if (1) the clock and (2) the final parent are both included within the same shred, we can run the clock check in this SIMD exactly once, directly in shred ingest, prior to replay. |
Sure, footer is okay. In fact, line 40 already says footer. |
I think there's a bit of confusion here. Yes, we're in agreement that the clock should go into the footer; e.g., SIMD 0307 specifies this. This isn't what I'm referring to, though. To clarify - at the moment, the only fields included in the block footer are:
I'm saying that, in this SIMD, we should consider proposing a new third field to the block footer of type |
I don't disagree. But this should rather be in SIMD 307. |
FYI - after a few conversations, looks like we'll be punting on placing a third field in the block footer denoting the parent. We plan on accomplishing this via other means (can elaborate if there's interest) in later work. |
Okay, now I understand your argument. Why only include the parent and the slot of the parent, and not also the actual time of that parent slot? Then our check is even easier because all the data is already there... Would you agree that this is a slippery slope? Including the same information a second time is problematic in my opinion. Now we additionally would have to check that the second inclusion of the information is equal to the first inclusion of the information. What if not? Then the block is still skipped? |
@OliverNChalk @qkniep - any further updates on this? In the interest of getting this SIMD across the finish line, I'd strongly prefer coming up with a solution to this in a follow-up SIMD, while keeping this SIMD about the design for when the chain is online. |
Works for me - happy for @qkniep to propose exposing additional information to DeFi apps in a subsequent proposal |
#### Problem and Summary of Changes
Now that we can process `BlockComponent`s during replay, update the clock sysvar when we observe the block footer. This PR exposes the Alpenglow clock to `BlockComponentProcessor`, which we'll next modify to enforce the Alpenglow clock bounds outlined in solana-foundation/solana-improvement-documents#363.
FYI, we're implementing the clock checks in this PR: anza-xyz/alpenglow#597.
@rogerANZA additional implementation details to include in this SIMD:
The clock sysvar, as currently implemented, has second resolution. This SIMD proposes a clock with greater resolution (nanoseconds). Accordingly, we introduce an off-curve account where we store the clock value upon processing the block footer (at the end of a block). In particular, we calculate the off-curve account as follows:
    /// The off-curve account where we store the Alpenglow clock. The clock sysvar has seconds
    /// resolution while the Alpenglow clock has nanosecond resolution.
    static NANOSECOND_CLOCK_ACCOUNT: LazyLock<Pubkey> = LazyLock::new(|| {
        let (pubkey, _) =
            Pubkey::find_program_address(&[b"alpenclock"], &agave_feature_set::alpenglow::id());
        pubkey
    });
where
The clock sysvar is updated at the end of each block by simply dividing the nanosecond timestamp by
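The seconds-resolution update described in this comment can be sketched as follows. This is an illustration only: the comment leaves the divisor unstated, so the sketch assumes it is the number of nanoseconds per second, and `sysvar_seconds_from_nanos` is a hypothetical helper name. The `i64` return type matches the clock sysvar's `unix_timestamp` field.

```rust
/// Number of nanoseconds in one second (assumed divisor; the comment
/// above does not state it explicitly).
const NANOS_PER_SECOND: u64 = 1_000_000_000;

/// Hypothetical helper: derives a seconds-resolution timestamp from the
/// nanosecond Alpenglow clock. Integer division drops the sub-second part.
pub fn sysvar_seconds_from_nanos(clock_ns: u64) -> i64 {
    // u64::MAX / 1e9 is about 1.8e10, which always fits in an i64.
    (clock_ns / NANOS_PER_SECOND) as i64
}
```

Because the division truncates, a nanosecond clock that advances by less than one second between two blocks can leave the derived seconds value unchanged.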
Might be worth mentioning somewhere that for program developers using the clock sysvar the semantics have slightly changed: I hope no one is using the clock at this resolution to be impacted by this, but would be good to call it out anyway. |
#### Problem and Summary of Changes
We enforce Alpenglow clock checks, outlined in detail here: solana-foundation/solana-improvement-documents#363.
ok. (for the record: I was told to go for nanoseconds because it's "easier." just computing with seconds would have been possible too.)
| Since the clock sysvar computation is incompatible with Alpenglow, we need a new | ||
| design. In this document we suggest a simple replacement solution, which should | ||
| be accurate enough for all use cases. |
Just throwing this out there - should we consider deprecating the unix_timestamp and epoch_start_timestamp clock sysvar fields entirely? We could mark them as deprecated in the next release, and set them to zero in a feature gate a little while after that.
If we are considering changing the semantics of the timestamps, we should also consider getting rid of them entirely.
Is there a good reason not to do this? Will this break a lot of use-cases today? Genuinely asking. If removing these fields would break a lot of important use-cases then we should probably keep them, if not then it's worth considering.
unix_timestamp is used very broadly within defi, it's probably not feasible to ever remove that. would there be another trustless way to access the current time on chain? a notion of time is required for things like interest calculations.
Unfortunately I don't know how many applications are using the clock (and how). I asked people about this before I wrote this SIMD, but nobody was able to tell me. The consensus was then to have some reasonably good solution no matter what. This is basically what this SIMD is about.
@OliverNChalk if the timestamp fields are removed, then slot number could be used as a rough proxy for time. I'm not sure if this is sufficient.
This would require us to expose the configured slot time target and would likely drift fairly significantly over time? Not to mention this still breaks marginfi, kamino, and all other money markets, not to mention meme coin launchpads/AMMs that use time as a trigger for pool unlock.
In an ideal world we have decent accuracy in all cases. In an okay world we have this proposal, which gives us decent accuracy in the average case. I don't think it's possible to remove the concept of time from DeFi, as time is very fundamental to finance (i.e. funding/interest rates). If we remove time, then all of DeFi will have to ask Pyth for the current time, which introduces another trust assumption and makes all of Solana finance dependent on a 3rd party chain.
the intended uses of today's block timestamp are these
- enforcing paper contract dates on chain (eg. stake account lockups)
- real-world (tax) accounting
all other uses are a misuse of this timestamp as it is not what the consumer believes it to be. that said, its misuse is prevalent, so we must retain its availability as its removal would be extremely disruptive to the ecosystem today
attempts to use slot_height * target_block_time will fall apart since they lack corrective inputs. each time we drive down the block time target would need treatment. we've already accumulated several hours worth of slot-time/wall-time skew due to outages
that the clock sysvar timestamp is unsuitable for alpenglow is fine, so long as the solution to alpenglow clock is capable of emitting a timestamp suitable for populating the clock sysvar timestamp
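The point about `slot_height * target_block_time` lacking corrective inputs can be illustrated with a small sketch. The function names and all numbers (a 400 ms target, a two-hour outage) are hypothetical and chosen only to show how the skew accumulates.

```rust
/// Estimates wall-clock time purely from the slot number. There is no
/// corrective input: outages or slow slots are invisible to this estimate.
pub fn slot_based_estimate_ms(genesis_ms: u64, slot: u64, target_block_time_ms: u64) -> u64 {
    genesis_ms + slot * target_block_time_ms
}

/// Signed skew between real wall time and the slot-based estimate.
pub fn skew_ms(wall_ms: u64, estimate_ms: u64) -> i64 {
    wall_ms as i64 - estimate_ms as i64
}
```

For example, with a hypothetical 400 ms target, 9,000 slots correspond to one hour of estimated time. If the chain then halts for two hours without producing slots, the estimate stays at one hour while wall time is three hours, so the skew is two hours and never shrinks afterwards.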
@rogerANZA any updates here? In particular, please see: |
sure, feel free to integrate these two comments in the doc? |
Would be good to get a review here, so that we can finally merge this back. |
| Upon receiving a block footer from a leader, non-leader validators will: | ||
| - Locally update the clock sysvar associated with the bank | ||
| - During replay, validate the bounds of this clock sysvar with respect to the | ||
| parent bank's clock sysvar and proceed as outlined in "Detailed Design" |
Quick question about this - how does replay work? Do we update the clock sysvar from the block footer and then start replaying? If so, doesn't that mean we can't start replaying until we've received the block footer?
no, it happens when we replay the block footer. So effectively we are setting the clock sysvar at the end of the block for use in the child block
"during replay" should be more precise?
Yeah, useful to clarify this.
@rogerANZA - I updated the doc. The relevant part:
Non-leader validators must apply the following block footer logic after transactions in a block have been replayed:
- Locally update the clock sysvar associated with the bank
- Validate the bounds of this clock sysvar with respect to the parent bank's clock sysvar as outlined in "Detailed Design"
For avoidance of doubt - while replaying transactions in a block with slot S, the clock sysvar has value specified in S.parent’s block footer.
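The ordering described here (transactions first, footer logic afterwards) can be sketched roughly as below. `BankSketch`, `FooterError`, and `replay_block` are hypothetical names, not the actual Agave or Alpenglow types. The concrete clock bounds live in the SIMD's "Detailed Design" and are not reproduced here, so the sketch takes them as a caller-supplied predicate `in_bounds`.

```rust
/// Hypothetical, simplified bank that only tracks the clock value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BankSketch {
    pub clock_ns: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FooterError {
    ClockOutOfBounds { parent_ns: u64, footer_ns: u64 },
}

/// Replays a block in slot S on top of `parent`.
/// Returns the clock value that S's transactions observed and the child bank.
/// `in_bounds(parent_ns, footer_ns)` stands in for the SIMD's actual bounds.
pub fn replay_block(
    parent: &BankSketch,
    footer_clock_ns: u64,
    in_bounds: impl Fn(u64, u64) -> bool,
) -> Result<(u64, BankSketch), FooterError> {
    // While transactions in slot S are replayed, the clock still holds the
    // value from S.parent's block footer.
    let clock_seen_by_transactions = parent.clock_ns;

    // Footer logic runs only after the transactions have been replayed.
    // Checking before storing has the same outcome as updating and then
    // validating: a block with an out-of-bounds clock is rejected.
    if !in_bounds(parent.clock_ns, footer_clock_ns) {
        return Err(FooterError::ClockOutOfBounds {
            parent_ns: parent.clock_ns,
            footer_ns: footer_clock_ns,
        });
    }

    Ok((
        clock_seen_by_transactions,
        BankSketch { clock_ns: footer_clock_ns },
    ))
}
```

With a placeholder predicate such as "strictly later than the parent", a footer clock of 1,400 on a parent clock of 1,000 is accepted and becomes visible only to the child block, while a footer clock of 900 is rejected.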