We perform scope checking before evaluating a program on the chain to ensure that the program is well-scoped and doesn't contain free variables.
If scope checking fails, this is a phase-2 validation error, i.e. exactly the same outcome as if we had let the program be evaluated (which would produce a free variable error at runtime, unless the free variable happened not to be on the evaluation path).
I.e. the scope check saves us some work when it comes to handling incorrect scripts. But given that the node still distributes the incorrect program to other nodes and records it on the blockchain (since scope checking is a phase-2 check), it doesn't save us that much.
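For concreteness, here is a minimal sketch of what such a check amounts to, over a toy de-Bruijn-indexed term type (the real types live in untyped-plutus-core and differ in detail):

```haskell
import Data.Word (Word64)

-- Toy de-Bruijn-indexed term type, made up for this example.
data Term
  = Var Word64        -- de Bruijn index, 1-based
  | LamAbs Term       -- binds one variable
  | Apply Term Term

-- Walk the term counting enclosing binders; an index exceeding the number
-- of binders in scope denotes a free variable.
checkScope :: Word64 -> Term -> Either String ()
checkScope depth (Var i)
  | i <= depth = Right ()
  | otherwise  = Left ("free variable with index " ++ show i)
checkScope depth (LamAbs body) = checkScope (depth + 1) body
checkScope depth (Apply f a)   = checkScope depth f *> checkScope depth a

-- e.g. checkScope 0 (LamAbs (Var 1)) == Right ()
--      checkScope 0 (LamAbs (Var 2)) == Left "free variable with index 2"
```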
What does the scope check cost us, on the other hand? A lot. Adding the scope check to the validation-decode benchmarks, which measure the cost of decoding scripts, increases runtime by 25% (see this PR). I.e. 20% of the script preparation stage (decoding + version check + scope check) is wasted on the scope check (and 3% of total validation time for the full-validation benchmarks, see this PR removing the scope check).
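To spell out the arithmetic: a 25% increase over the baseline means the check's share of the resulting total is

$$\frac{0.25}{1 + 0.25} = 0.2,$$

i.e. 20% of the preparation stage.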
I assume this includes reference scripts as well, since the scope check isn't a part of deserialization, but rather something that we do right before evaluation.
According to @colll78:
the revenue of the top dApps (including Minswap, SundaeSwap and Liqwid) was nearly cut in half by the introduction of the cost per reference script bytes parameter.
I.e. the useless scope check is responsible for squeezing 10% (20% * 50%) out of the margins of businesses running on the Cardano blockchain. This is absolutely ridiculous in my opinion.
How did we add this check in the first place? By accident, to quote @michaelpj:
The only reason we have this behaviour is because we had the de Bruijn conversion pass in there which incidentally enforced this property. It wasn't a conscious choice.
So how come it's still there?
Originally, it wasn't removed because of an idea that maybe doing scope checking once before evaluation is cheaper than performing it during evaluation over and over again. That idea was wrong: I investigated it (resulting in this massive performance improvement as a side effect), and I strongly believe that performing the scope check upfront does not and cannot save us anything at runtime.
The GHC Core for the new code handling Var in the CEK machine is available here -- check for yourself and let me know if you see how assuming that all variables are in scope can save us anything in there.
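To illustrate the point with a toy version of that code path (all names and types here are made up for this example, not the actual plutus-core API): the environment lookup has to distinguish "found" from "not found" anyway, so a prior guarantee that the "not found" branch is unreachable doesn't remove it from the compiled code.

```haskell
import Data.Word (Word64)

data Term = Var Word64 | LamAbs Term | Apply Term Term

-- Closures pair a lambda body with the environment it was captured in.
data Value = VLam Term Env
newtype Env = Env [Value]  -- toy list; the real machine uses index-envs

-- The Var clause of the machine: whether or not a scope check ran upfront,
-- the lookup still pattern-matches on "found" vs "not found", so the
-- free-variable branch is present either way.
evalVar :: Env -> Word64 -> Either String Value
evalVar (Env vs) i =              -- i is a 1-based de Bruijn index
  case drop (fromIntegral i - 1) vs of
    v : _ -> Right v
    []    -> Left "free variable: the term is open"
```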
This seems fairly uncontroversial, so again, how come the scope check is still there?
Well, our Agda formalization, plutus-metatheory, relies on it being there; quoting @jmchapman:
The current semantics of the language is for closed terms. So, moving to open terms would lead to undefined behaviour. Weakening this guarantee would make it easier to make a mistake in the implementation of the CEK machine.
My arguments against that are:
- There's no way we could ignore handling the free variable case in the Haskell evaluator. It does not operate on intrinsically-scoped terms, so we have to somehow handle a free variable in the implementation, regardless of how sure we are that this code path can never be triggered. Consequently, we have to specify and verify that case simply because it's in there.
- The implementation of the CEK machine does not consult the formalization in any way; we do not track what the formalization is up to in order to adapt the implementation accordingly. I've always assumed the flow would be the opposite: the formalization builds a model of what the implementation does, since it's the latter that is constrained by the real-world requirements, not the former. 10% of revenue is a hell of a lot to pay so that the metatheory can pretend that it formalizes the real thing.
- The current approach is this: instead of formalizing the implementation of the CEK machine, we assume that certain clauses in it cannot happen due to a check performed in plutus-ledger-api, which makes the implementation of the CEK machine not self-contained from the point of view of the metatheory. And there's quite a distance to travel from plutus-ledger-api to untyped-plutus-core to index-envs to conclude that it's all fine together, just not individually. Instead, the metatheory could just formalize what's in the implementation of the CEK machine, which is perfectly sound on its own.
- The implementation of variable lookup is quite non-trivial and it's hard to convince oneself that it's correct. And it's used on the critical path: this is how we do variable lookup in the CEK machine (see the sketch after this list for the flavour of data structure involved). So how come it's not formalized while being critical, non-trivial and perfectly formalizable? Well, the metatheory does a different thing on the basis of there being a scope check pass somewhere upstream; this creates divergence between the formalization and the implementation, and the entirety of the variable lookup logic gets sucked up into that divergence.
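To give a flavour of that non-triviality, here is the classic Okasaki-style skew binary random-access list, the kind of structure such an environment can be built on, with O(1) cons and O(log n) lookup. This is an illustrative sketch only; the actual structure in index-envs may differ in detail:

```haskell
-- Complete binary trees paired with their sizes, in increasing size order.
data Tree a = Leaf a | Node a (Tree a) (Tree a)
newtype RAList a = RAList [(Int, Tree a)]

-- O(1): merge the two front trees when they have equal size.
cons :: a -> RAList a -> RAList a
cons x (RAList ((s1, t1) : (s2, t2) : rest))
  | s1 == s2 = RAList ((1 + s1 + s2, Node x t1 t2) : rest)
cons x (RAList ts) = RAList ((1, Leaf x) : ts)

-- O(log n): skip whole trees by size, then descend into a single tree.
lookupRA :: Int -> RAList a -> Maybe a
lookupRA _ (RAList []) = Nothing
lookupRA i (RAList ((s, t) : rest))
  | i < s     = lookupTree s i t
  | otherwise = lookupRA (i - s) (RAList rest)

-- Within a complete tree of size s, the root is index 0, the left subtree
-- holds the next (s `div` 2) indices, the right subtree the rest.
lookupTree :: Int -> Int -> Tree a -> Maybe a
lookupTree _ 0 (Leaf x) = Just x
lookupTree _ _ (Leaf _) = Nothing
lookupTree s i (Node x l r)
  | i == 0    = Just x
  | i <= half = lookupTree half (i - 1) l
  | otherwise = lookupTree half (i - 1 - half) r
  where half = s `div` 2
```

Convincing yourself that the index arithmetic here is correct takes real effort, which is exactly the kind of thing a formalization is for.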
My conclusion is that not only is the scope check very expensive while being literally useless for on-chain validation, but it also makes the implementation less safe, because we assume safety where there's in fact a huge unformalized gap between the implementation and the formalization.
We should just remove the scope check.