SIMD-0503: Static Sysvars#503

Open
deanmlittle wants to merge 4 commits into solana-foundation:main from blueshift-gg:static-sysvars

Conversation

@deanmlittle
Contributor

No description provided.

@simd-bot

simd-bot bot commented Mar 26, 2026

Hello deanmlittle! Welcome to the SIMD process. By opening this PR you are affirming that your SIMD has been thoroughly discussed and vetted in the SIMD discussion section. The SIMD PR section should only be used to submit a final technical specification for review. If your design / idea still needs discussion, please close this PR and create a new discussion here.

This PR requires the following approvals before it can be merged:

Once all requirements are met, you can merge this PR by commenting /merge.

@deanmlittle changed the title from "Static sysvars" to "SIMD-0503: Static Sysvars" on Mar 26, 2026
@ptaffet-jump
Contributor

I don't think it's a good idea to overload memory translation like this. This leads to some strange behavior where

```asm
lddw r3, 0x494df715 // SOL_RENT_SYSVAR
ldxdw r3, [r3+0]
```

loads from a sysvar, but

```asm
lddw r3, 0x494df714 // SOL_RENT_SYSVAR - 1
add  r3, 1
ldxdw r3, [r3+0]
```

OR

```asm
lddw r3, 0x494df714 // SOL_RENT_SYSVAR - 1
ldxdw r3, [r3+1]
```

gives a different value (if I'm understanding correctly). An ISA where these snippets do not have identical behavior seems cursed to me.

I'm less opposed to putting these sysvars at a known, fixed, appropriately aligned location in an existing read-only data segment, but it doesn't seem better enough than passing in the account to justify a change.
Can't you just require the sysvar accounts to be passed in first, and then use pre-computed offsets into the input section? Checking the pubkey takes like 16 CUs, no?
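A toy model (illustrative only — the key constant and the translation rule below are my reading of the proposal, not the VM's actual implementation) of why the snippets diverge: translation keys on the exact base register value, so a base one off from the magic constant faults even when base plus offset names the same effective address.

```rust
// Hypothetical key value taken from the snippets in this thread.
const SOL_RENT_SYSVAR: u64 = 0x494d_f715;

// Toy translation rule: an exact match on the key maps the relative offset
// into the sysvar's bytes; any other low-32-bit address stays unmapped.
fn translate_load(sysvar_bytes: &[u8], base: u64, offset: i64) -> Result<u64, &'static str> {
    if base == SOL_RENT_SYSVAR {
        // Bounds-checked read of 8 bytes at the relative offset.
        let off = usize::try_from(offset).map_err(|_| "negative offset")?;
        let end = off.checked_add(8).ok_or("out of bounds")?;
        let chunk = sysvar_bytes.get(off..end).ok_or("out of bounds")?;
        Ok(u64::from_le_bytes(chunk.try_into().unwrap()))
    } else if base < 0x1_0000_0000 {
        Err("memory access violation")
    } else {
        Err("normal translation not modeled here")
    }
}

fn main() {
    let rent = 42u64.to_le_bytes(); // stand-in for the first field of Rent
    // lddw r3, KEY; ldxdw r3, [r3+0]  -> reads the sysvar
    assert_eq!(translate_load(&rent, SOL_RENT_SYSVAR, 0), Ok(42));
    // lddw r3, KEY - 1; ldxdw r3, [r3+1]  -> same effective address, but the
    // base register does not match the key, so it faults instead of aliasing.
    assert!(translate_load(&rent, SOL_RENT_SYSVAR - 1, 1).is_err());
}
```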

@deanmlittle
Contributor Author

deanmlittle commented Mar 26, 2026

You have perfectly demonstrated why this technique is safe and doesn't interfere with any existing APIs or instructions. It simply makes dereferencing certain 32-bit addresses that currently point to nothing resolve to a global variable instead of raising a memory access error.

I am not sure you have considered the performance benefits. We can load all sysvars once per slot for less than the cost of loading in the largest one just once, and consume them N times within a slot.

Beyond this, it makes writing onchain programs incredibly ergonomic and performant. There really are no downsides and some rather extreme performance and devex benefits. I am yet to see a developer whose mind wasn't blown by how easy it could be for them to write programs with this method.

@alnoki

alnoki commented Mar 26, 2026

I am yet to see a developer whose mind wasn't blown by how easy it could be for them to write programs with this method

+1, please land this

@ptaffet-jump
Contributor

How do you access the other fields of the sysvar? E.g. the epoch value from the clock sysvar?

Don't forget that making address translation in the VM more expensive makes every program slower, even if that's not measurable in CUs.

@deanmlittle
Contributor Author

deanmlittle commented Mar 27, 2026

How do you access the other fields of the sysvar? E.g. the epoch value from the clock sysvar?

It's actually super simple. As you've demonstrated, all existing behavior remains unmodified. The only change is that a previously invalid dereference to this address is now valid. Due to the fact that:

  • We are utilizing an address in the 32-bit range (extended to 64 bits with upper 32 bits masked out in this case, as our wonderful Rust/LLVM/BPF doesn't currently have a good way to produce ALU32 instructions without inline assembly)
  • All dereferences for any address under 0x0100000000 are currently invalid
  • Write (stx) instructions remain unchanged and continue to fail as they do now
  • All dereferences (ldx) require a relative offset
  • The translated address is bound-checked

We simply treat any dereference to this address as the base, and its offset as a relative offset to that address.

In other words, if 0xff395088u64 is the start address of our Clock, then:

```asm
lddw r3, 0xff395088 // SOL_CLOCK_SYSVAR
ldxdw r4, [r3+0x00] // load current slot into r4
ldxdw r5, [r3+0x20] // load current timestamp into r5
```

By virtue of none of our existing sysvars having a length in excess of i16::MAX (32,767 bytes), relative offsets from a known base address are sufficient to encapsulate all existing sysvar account values.
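For concreteness, the offsets in the snippet line up with the Rust SDK's Clock field order; a sketch (the repr(C) layout here is my assumption for illustration — every field is 8 bytes, so there is no padding):

```rust
// Clock layout implied by the offsets above (field order per the Rust SDK).
#[repr(C)]
struct Clock {
    slot: u64,                  // +0x00
    epoch_start_timestamp: i64, // +0x08
    epoch: u64,                 // +0x10
    leader_schedule_epoch: u64, // +0x18
    unix_timestamp: i64,        // +0x20
}

fn main() {
    assert_eq!(core::mem::offset_of!(Clock, slot), 0x00);
    assert_eq!(core::mem::offset_of!(Clock, unix_timestamp), 0x20);
    // Well under i16::MAX, so a 16-bit relative offset reaches every field.
    assert!(core::mem::size_of::<Clock>() <= i16::MAX as usize);
}
```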

Don't forget that making address translation in the VM more expensive makes every program slower, even if that's not measurable in CUs.

This only happens globally once per slot. The overhead per transaction is a single fat pointer to the sysvar cache. Absolutely negligible considering how much we already copy just to process a transaction. Consider that by comparison today, a single sysvar account load in a single transaction (of which there are MANY more than one per slot) requires serializing >10kb of data, the majority of which is never even used in the VM, whereas the total aggregate of all sysvars today is <30kb – less than the cost of serializing our single largest sysvar account one time, or our smallest sysvar account 3 times in any given slot.
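The arithmetic can be made concrete. The sizes below are my own assumptions for illustration (a 512-entry cap on SlotHashes with 40-byte entries and an 8-byte length prefix), not figures stated in this thread:

```rust
// Assumed serialized size of SlotHashes, the largest sysvar:
// 8-byte vector length + 512 entries of (u64 slot, 32-byte hash).
fn slot_hashes_size() -> usize {
    8 + 512 * (8 + 32)
}

fn main() {
    let largest = slot_hashes_size();
    assert_eq!(largest, 20_488); // ~20 KiB
    // Per the comment above, all sysvars together are < 30 KiB, so copying
    // everything once per slot costs less than serializing just this one
    // sysvar into two separate transactions.
    assert!(30 * 1024 < 2 * largest);
}
```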

Contributor

@Lichtso Lichtso left a comment


By opening this PR you are affirming that your SIMD has been thoroughly discussed and vetted in the SIMD discussion section.

Did I miss anything?


## Summary

Leverage existing static linking infrastructure of JIT compilation to enable
Contributor

@Lichtso Lichtso Mar 27, 2026


static linking

Nit on terminology: I know we called them "static syscalls" but they are not static linking (that would imply we would inline their implementation into the SBPF ELF at or before deployment), instead they are still dynamic linking but relocation-less.

existing ... infrastructure

The infrastructure for "static syscalls" is for syscalls only, as the name implies. Linking of (in this case readonly) data is a different process and there is no existing infrastructure to utilize.

JIT compilation

I know we have used JIT-only mode on MNB in Agave for years but with the next version we could have tiered compilation (Interpreter & JIT hybrid). Also Firedancer is using an Interpreter.


1. Invoke a specific getter syscall
2. Invoke the get_sysvar syscall, or
3. Include a Sysvar account in their program.
Contributor


Obligatory https://xkcd.com/927/.

That said, yes we could improve upon sysvars and the sysvar syscalls are not great.

other sysvar accounts, such as the murmur hash of `0xff395088` for `Clock`.

Ergo, we can safely and performantly expose any available global variable to
the VM without the overhead of additional account loads or syscall invocation.
Contributor


Again, this thinks about the JIT only, which is not protocol, it is one possible implementation. What about an interpreter? It would have to do this in address translation on every memory access.

Also, while maybe not that impactful, it would also make the JIT implementation more complex and thus compilation slower as this is a form of macro-op fusion which requires state to be tracked between the instructions. Currently we can compile every instruction independently.

The downsides of all of these approaches are threefold:

1. Sysvar values are globals that are always available to the validator, but
aren't exposed as globals during execution. This is a clunky anti-pattern
Contributor

@Lichtso Lichtso Mar 27, 2026


Sysvar accounts are already loaded for all transactions even if their message does not mention the sysvar key.

Scratch that, it only applies to the sysvar cache but it should be easy to expand to transaction accounts too.

resulting in degraded developer experience.
2. Invoking a syscall to access a global requires both a stack allocation and
halting execution – an immense amount of overhead just to read a value that
is already readily available.
Contributor


Agreed, sysvar syscalls are overkill for reading constant data.

halting execution – an immense amount of overhead just to read a value that
is already readily available.
3. Having to pass in an account to access a global results in a 10kb penalty
to data serialization and has negative implications for composability.
Contributor


Additionally, depending on the design of the entrypoint a program must also find the instruction account of the sysvar, but that will be fixed by SIMD-0449.

10kb penalty to data serialization

That part will be gone after direct mapping and the CU charging adjustments of SIMD-0452, which I still have to rewrite.

negative implications for composability

I think that is the remaining actual downside of sysvar accounts: They waste key space in the instruction invocation.

## Alternatives Considered

- Leverage JIT intrinsics to provide similar syscall functionality.
- Don't improve the existing design of sysvars/syscalls.
Contributor


I think taking another look at solving this problem using sysvar accounts is worthwhile. The gap to bridge there might be relatively small (see https://github.com/solana-foundation/solana-improvement-documents/pull/503/changes#r3002020193).

@Lichtso
Contributor

Lichtso commented Mar 27, 2026

All dereferences for any address under 0x0100000000 are currently invalid

In SIMD-0189 we moved the readonly-data down into the 32 bit address range as most 64 bit load immediate (lddw) only want to load addresses to that and can thus be shrunken to 32 bit load immediate.

So, the murmur hash and references to readonly-data could collide.

@deanmlittle
Contributor Author

deanmlittle commented Mar 27, 2026

All dereferences for any address under 0x0100000000 are currently invalid

In SIMD-0189 we moved the readonly-data down into the 32 bit address range as most 64 bit load immediate (lddw) only want to load addresses to that and can thus be shrunken to 32 bit load immediate.

So, the murmur hash and references to readonly-data could collide.

It's always a fun time discovering new and exciting reasons why the runtime has continued to trend from "functional" to "borderline unusable" via SIMD trivia. I can guarantee you that devs would unanimously prefer to maintain the viability of the method laid out in this SIMD over curing you of your irrational hatred of lddw. That should be a sign to anyone who takes their end customers seriously to revert the change.


```asm
lddw r3, 0x494df715 // SOL_RENT_SYSVAR
ldxdw r3, [r3+0]
```
Contributor


This would benefit from explaining that sysvars are mapped in as byte slices starting at the murmur hash as address. Otherwise it invites the interpretation that all sysvars are supposed to be accessed via a single u64 value, which would limit them to 8 bytes.
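A sketch of that byte-slice reading model (illustrative; `read_u64` and the offsets are my own framing, with the field order taken from the Rust SDK's Rent: u64 lamports_per_byte_year, f64 exemption_threshold, u8 burn_percent):

```rust
// Each sysvar is exposed as a bounds-checked byte slice based at its hash
// address, so any field is reachable with a relative offset -- the sysvar is
// not limited to a single 8-byte value at the base.
fn read_u64(sysvar: &[u8], offset: usize) -> Option<u64> {
    let bytes = sysvar.get(offset..offset.checked_add(8)?)?;
    Some(u64::from_le_bytes(bytes.try_into().ok()?))
}

fn main() {
    // Hand-built 17-byte Rent image with example values.
    let mut rent = [0u8; 17];
    rent[..8].copy_from_slice(&3_480u64.to_le_bytes()); // lamports_per_byte_year
    rent[8..16].copy_from_slice(&2.0f64.to_le_bytes()); // exemption_threshold
    rent[16] = 50;                                      // burn_percent
    assert_eq!(read_u64(&rent, 0), Some(3_480));
    // Reads past the slice fail the bounds check instead of aliasing into
    // neighboring memory.
    assert_eq!(read_u64(&rent, 16), None);
}
```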

1. We must ensure no intra-slot mutability of any exposed globals.
2. We must ensure static sysvar pointers remain synchronized with SysvarCache
at the slot boundary.
3. Despite being 32-bit hashes, it is important that we cast to u64 first as
Contributor


Alternatively one could also only use the 31 LSBs of the hash / mask out the 1 MSB and always load a i32 immediate value.
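A sketch of the sign-extension hazard behind this suggestion, using the hash value quoted earlier in the thread (purely illustrative):

```rust
// A 32-bit immediate with its top bit set sign-extends when widened to 64
// bits, so the register no longer holds the intended low-32-bit address.
// Masking the hash to 31 bits keeps the top bit clear and sidesteps this.
const CLOCK_HASH: u32 = 0xff39_5088; // hypothetical key from the thread

fn main() {
    // Sign-extending widen (what an i32 immediate load does):
    assert_eq!(CLOCK_HASH as i32 as i64 as u64, 0xffff_ffff_ff39_5088);
    // Zero-extending widen (what the proposal needs):
    assert_eq!(CLOCK_HASH as u64, 0x0000_0000_ff39_5088);
    // 31-bit variant: sign- and zero-extension now agree.
    let masked = CLOCK_HASH & 0x7fff_ffff;
    assert_eq!(masked as i32 as i64 as u64, masked as u64);
}
```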

@LucasSte
Contributor

We are utilizing an address in the 32-bit range (extended to 64 bits with upper 32 bits masked out in this case, as our wonderful Rust/LLVM/BPF doesn't currently have a good way to produce ALU32 instructions without inline assembly)

BPF upstream has the +alu32 target feature which achieves that. If generating LDDW, instead of mov32, is a problem, please open an issue in our repo, and we'll fix that.

@LucasSte
Contributor

It's always a fun time discovering new and exciting reasons why the runtime has continued to trend from "functional" to "borderline unusable" via SIMD trivia.

I don't think this is an appropriate comment for this discussion. You have read and approved the SIMD yourself: #189 (review).

@Lichtso
Contributor

Lichtso commented Mar 27, 2026

It's always a fun time discovering new and exciting reasons why the runtime has continued to trend from "functional" to "borderline unusable" via SIMD trivia.

First of all, you approved the SIMD yourself, it did not pass you by unnoticed. You simply, like anybody else, did not have the foresight that there could be more interactions with future ideas.

I can guarantee you that devs would unanimously prefer to maintain the viability of the method laid out in this SIMD

Devs would prefer any method that achieves the same low-cost direct access to sysvars, independent of whether it is this proposal or some alternative.

over curing you of your irrational hatred of lddw.

Which you seem to be infected with too, reciting this very proposal:

"this would be more ideal, as it would save 8 bytes of binary size."

That should be a sign to anyone who takes their end customers seriously to revert the change.

What if I told you there is an alternative which achieves the same interface and CU costs but does not need to revert SBPFv3?

@deanmlittle
Contributor Author

deanmlittle commented Mar 27, 2026

We are utilizing an address in the 32-bit range (extended to 64 bits with upper 32 bits masked out in this case, as our wonderful Rust/LLVM/BPF doesn't currently have a good way to produce ALU32 instructions without inline assembly)

BPF upstream has the +alu32 target feature which achieves that. If generating LDDW, instead of mov32, is a problem, please open an issue in our repo, and we'll fix that.

Yeah, we can do it with upstream already too. The issue is that if we tried to force 32 bits, the feature being safe becomes inherently tied to a compiler version, which doesn't sound great. If v3 was a feature branch until the feature gates activated and only shipped to master when it was fully baked, it would have been fine to assume alu32. In the absence of that, lddw works just fine here. The main issue is just managing sign extension.

@deanmlittle
Contributor Author

First of all, you approved the SIMD yourself, it did not pass you by unnoticed. You simply, like anybody else, did not have the foresight that there could be more interactions with future ideas.

It has now become apparent why you wanted to sneak this unrelated design choice into the header restrictions. I wish I had understood your motivation more clearly at the time instead of rubber stamping it after ripping out all of the bloat.

What if I told you there is an alternative which achieves the same interface and CU costs but does not need to revert SBPFv3?

I'd say you're about to shill me on a new memory address prefix in the upper 32-bits, locking us into 64-bit targets for good. Sad if it has to come to that. Also, was absolutely not suggesting to revert all of V3, just the silly 32-bit mapping change. Ironically, in your quest to kill lddw, you've just made it more useful 🙃
