Skip to content

Conversation

szuend
Copy link
Collaborator

@szuend szuend commented Apr 9, 2025

Preview

Rebased on top of #193 and #194.

This is an early draft of the "scopes" proposal spec text. Sharing it already to get some early feedback and to write it out in the open :)

Done

  • Record types for scopes and ranges
  • Grammar for scopes and ranges
  • Decoding abstract operations/SDOs for scopes and ranges

ToDo

  • Add "optionally report an error" handling throughout the decoding where it makes sense.
  • Add a section that defines bounds of generated ranges w.r.t. to Ecma262 AST.
  • Add normative (or probably informative) section how to correctly utilize the scopes info (similar to Spec: add algorithm for retrieving original position #195):
    • How to retrieve original function names
    • How to expand stack frames with inlined functions
    • How to condense stack traces with outlined functions

@szuend szuend force-pushed the scopes branch 2 times, most recently from 26d4baf to 115f0b0 Compare April 24, 2025 09:27
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I only went through the data structures so far)

spec.emu Outdated
</tr>
</table>
</emu-table>
<p>A root Decoded Generated Record describes a distinct part of the generated code. Each Generated Range Record may be associated with a Original Scope Record as indicated by the presence of the [[Definition]] field.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Decoded Generated Record" is not auto-linked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ecmarkup gets confused as there is currently a definition for both the terms "generated range" and "generated range record". Maybe we have to rename it to "range record", or remove the definition for "generated range". Wdyt?

* Combine 'IsStackFrame' and 'IsHidden' of generated range into a single
  enum field.
* Get rid of Sub-Range Binding Record in favor of a list of list of
  binding records.
<emu-alg>
1. Let _relativeName_ be the VLQSignedValue of |Vlq|.
1. Set _state_.[[ScopeNameIndex]] to _state_.[[ScopeNameIndex]] + _relativeName_.
1. Set _scope_.[[Name]] to _names_[_state_.[[ScopeNameIndex]]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check names length?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how lists are defined. If out-of-bound reads results in *null*, this would be fine. But we probably want to "optionally report an error" in such cases anyway.

Comment on lines +1944 to +1945
1. If |RangeLine| is present, perform DecodeGeneratedRangeItem of |RangeLine| with arguments _range_, _state_ and _names_.
1. Set _range_.[[Start]].[[Line]] to _state_.[[RangeLine]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto this creates ambiguity in the parser.

spec.emu Outdated
Comment on lines 1796 to 1797
1. Assert: |ScopeName| is present if and only if, _flags_ & 0x1 = 0x1.
1. If |ScopeName| is present, perform DecodeOriginalScopeItem of |ScopeName| with arguments _scope_, _state_ and _names_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can assert this because ScopeName and ScopeKind are ambiguous. We could pass different args based on the value, though, if both were defined as Vlq instead of special named productions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the grammar currently is not context-free. If we define both as Vlq, how do I reference them from the SDO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reference them as the first Vlq or the second Vlq(eg https://tc39.es/ecma262/multipage/abstract-operations.html#sec-runtime-semantics-stringnumericvalue:~:text=Let%20a%20be%20the%20MV%20of%20the%20first). We'd need to call a differently named SDO instead of reusing DecodeOriginalScopeItem. Or we could define an AO that operates on Parse Nodes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a shot. See commit f15b006 for the diff. I'm not a huge fan as it makes the production/decoding more complicated. Does it really need to be unambiguous?

Note that we should probably turn these assertions into "optionally report an error".

szuend and others added 3 commits July 14, 2025 07:36
  * Include 'scopes' in the overall source map example
  * Clarify pre-order for scope/range start/end items
  * Clarify ~none~ for StackFrametype
  * Introduce 'BindingExpression' as a dedicated name for sub-range
    binding VLQs
  * Don't reserve 'A' for as an item tag
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar review

Comment on lines +1558 to +1559
Tag
Tag UnknownVlqList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tag
Tag UnknownVlqList
Tag UnknownVlqList?

Comment on lines +1320 to +1323
Scopes :
OriginalScopeTreeList
TopLevelItemList
OriginalScopeTreeList `,` TopLevelItemList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to have either a comment saying that OriginalScopeTreeList is guaranteed to only start with ,/B or to be empty, and that TopLevelItemList is guaranteed to not be empty and not start with ,/B.

It's not obvious that there is no ambiguity here otherwise, since it requires going down into the grammar multiple levels across different sections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to that, I don't really like that an eager parser cannot parse this Scopes production. When parsing the string ,,X, a parser would:

  1. parse ,, as a OriginalScopeTreeList
  2. see that ,, ends with a , is followed by something that is a valid TopLevelItemList, and backtrack by 1 so that now the OriginalScopeTreeList is only ,
  3. Consume the now available ,
  4. Parse TopLevelItemList

I also slightly dislike that in ,,X we have two original scope trees, while in ,, we have three.

What if the last trailing comma for OriginalScopeTreeList was always ignored, like it is inside array literals in JS, so that the grammar for Scope would become

Scope :
  OriginalScopeTreeList ScopeCommas?
  OriginalScopeTreeList ScopeCommas TopLevelItemList

OriginalScopeTreeList :
  [empty]
  OriginalScopeTree
  OriginalScopeTreeList ScopeCommas OriginalScopeTree

ScopeCommas :
  `,`
  ScopeCommas `,`

and:

  • would fall in the first case, with an empty OriginalScopeTreeList and no ScopeCommas
  • , would have an empty OriginalScopeTreeList followed by a ScopeCommas with one comma
  • ,, would have an empty OriginalScopeTreeList followed by a ScopeCommas with two commas
  • ,,X would have an empty OriginalScopeTreeList followed by a ScopeCommas with two commas
  • B...,,B...,,X would have a B...,,B... OriginalScopeTreeList followed by a ScopeCommas with two commas and then TopLevelItemList

When decoding, for we would first decode OriginalScopeTreeList and then add as many nulls as the number of commas in Scope's ScopeCommas minus 1.

<p>In grammar form:</p>
<emu-grammar type="definition">
OriginalScopeTree :
OriginalScopeStart OriginalScopeVariablesItem? OriginalScopeItemList? `,` OriginalScopeEnd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very slight preference here for doing OriginalScopeStart `,` OriginalScopeVariablesItem? OriginalScopeItemList? OriginalScopeEnd and giving OriginalScopeVariablesItem and OriginalScopeItemList a trailing rather than leading comma, so that the disambiguation for whether they are present or not is on the first character and not on the second.

Comment on lines +1385 to +1387
ScopeVariableList :
ScopeVariable
ScopeVariableList ScopeVariable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked yet the SDOs, but it might make sense to have a generic VlqList production that we can reuse in multiple places.

<p>In grammar form:</p>
<emu-grammar type="definition">
GeneratedRangeTree :
GeneratedRangeStart GeneratedRangeBindingsItem? GeneratedRangeCallSiteItem? GeneratedRangeItemList? `,` GeneratedRangeEnd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about using trailing rather than leading commas

</h1>
<dl class="header"></dl>
<emu-alg>
1. Let _scope_ be a new Original Scope Record with all fields default initialized.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like [[Start]] do not have a clear default, it'd be better to explicitly initialize the record.

@nicolo-ribaudo
Copy link
Member

I find the decoding algorithms very hard to follow, with multiple mutable data structures used at the same time. I'm gong to submit a PR to your branch with a potential alternative, if I can find a way to make it clearer.

@nicolo-ribaudo
Copy link
Member

For ScopeFlags and ScopeNameOrKind/ScopeKind, what if instead of having a bit that says "is the first one present" and one that says "is the second one present" (leading to potentially invalid states, such as saying that both are present but then they are not), we had a single bit that says "if both are present, then it's the name/kind".

By doing so if neither or both are present we don't even need to check the bit.

1. Set _scope_.[[Start]].[[Column]] to _state_.[[ScopeColumn]].
1. Let _flags_ be the VLQUnsignedValue of |ScopeFlags|.
1. If _flags_ & 0x3 = 0x1, then
1. Assert: |ScopeNameOrKind| is present and |ScopeKind| is not present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually guaranteed? What if I set 0x1 in the flags, and then put both |ScopeNameOrKind| and |ScopeKind| in it? Should we optionally report an error and return instead? Or maybe have an "early error" in the grammar that ensures it?

1. Let _flags_ be the VLQUnsignedValue of |ScopeFlags|.
1. If _flags_ & 0x3 = 0x1, then
1. Assert: |ScopeNameOrKind| is present and |ScopeKind| is not present.
1. DecodeOriginalScopeName(|ScopeNameOrKind|, _scope_, _state_, _names_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. DecodeOriginalScopeName(|ScopeNameOrKind|, _scope_, _state_, _names_).
1. Perform DecodeOriginalScopeName(|ScopeNameOrKind|, _scope_, _state_, _names_).

(and the others)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants