Skip to content

Refactor BindingPattern to remove weirdness #11489

@overlookmotel

Description

@overlookmotel

Our JS-side AST for TypeScript files is now aligned with TS-ESLint (#9705).

However, the generated JSON isn't quite right - it contains repeated properties.

e.g. the AST for let [x] = y; contains repeated optional and typeAnnotation properties:

{
  "type": "ArrayPattern",
  "start": 4,
  "end": 7,
  "decorators": [],
  "elements": [
    {
      "type": "Identifier",
      "start": 5,
      "end": 6,
      "decorators": [],
      "name": "x",
      "optional": false,
      "typeAnnotation": null,
      "typeAnnotation": null,
      "optional": false
    }
  ],
  "optional": false,
  "typeAnnotation": null,
  "typeAnnotation": null,
  "optional": false
}

Playground (tick the "Raw" box to see the raw JSON)

Currently the TS-ESTree conformance tests are designed to ignore this. But the problem is widespread - when the duplicate fields are not ignored, conformance pass rate goes down from 99.9% to only 20%.

This is not a critical problem because JSON.parse deals with the duplicate properties, so everything works. However it very likely is making JSON.parse slower than it should be.

When using raw transfer, the same kind of thing is happening - properties are being initialized twice. This is very likely slowing down raw transfer a lot.

The cause

The main cause is BindingPattern.

It has its own optional and type_annotation fields:

pub struct BindingPattern<'a> {
// estree(flatten) the attributes because estree has no `BindingPattern`
#[estree(
flatten,
ts_type = "(BindingIdentifier | ObjectPattern | ArrayPattern | AssignmentPattern)"
)]
#[span]
pub kind: BindingPatternKind<'a>,
#[ts]
pub type_annotation: Option<Box<'a, TSTypeAnnotation<'a>>>,
#[ts]
pub optional: bool,
}

However, it also merges in the properties of BindingPatternKind:

pub enum BindingPatternKind<'a> {
/// `const a = 1`
BindingIdentifier(Box<'a, BindingIdentifier<'a>>) = 0,
/// `const {a} = 1`
ObjectPattern(Box<'a, ObjectPattern<'a>>) = 1,
/// `const [a] = 1`
ArrayPattern(Box<'a, ArrayPattern<'a>>) = 2,
/// A defaulted binding pattern, i.e.:
/// `const {a = 1} = 1`
/// the assignment pattern is `a = 1`
/// it has an inner left that has a BindingIdentifier
AssignmentPattern(Box<'a, AssignmentPattern<'a>>) = 3,
}

Each of the variants of BindingPatternKind also contains optional and typeAnnotation fields.

#[estree(
rename = "Identifier",
add_fields(decorators = TsEmptyArray, optional = TsFalse, typeAnnotation = TsNull),
field_order(span, decorators, name, optional, typeAnnotation),
)]
pub struct BindingIdentifier<'a> {
pub span: Span,
/// The identifier name being bound.
#[estree(json_safe)]
pub name: Atom<'a>,
/// Unique identifier for this binding.
///
/// This gets initialized during [`semantic analysis`] in the bind step. If
/// you choose to skip semantic analysis, this will always be [`None`].
///
/// [`semantic analysis`]: <https://docs.rs/oxc_semantic/latest/oxc_semantic/struct.SemanticBuilder.html>
pub symbol_id: Cell<Option<SymbolId>>,
}

We can't remove these fields from BindingIdentifier, because they need to be present in the other places where BindingIdentifier is used e.g. Function::id, Class::id, ImportSpecifier::local.

There may also be other places with duplicate fields, but BindingPattern is definitely the main cause.

The problems with BindingPattern

BindingPattern is a real oddity in the AST:

  • It's the only AST node which doesn't have its own Span.
  • It's the only case where an enum has to be flattened in ESTree AST.
  • It results in unintutive spans e.g. the BindingIdentifier for x in let x: T = foo(); has a span of length 4 (covering x: T, not just x).

In short, it's weird! It behaves unlike any other type in the AST.

Working around these oddities complicates code in various places in the codebase, and has been a source of bugs in the past. It'll probably continue to be a source of bugs in future.

Additionally, there are also places in the AST where we're using BindingPattern but it's not legal to have type annotations. e.g. this is not legal syntax:

let [x: number, ...y: Array<number>] = arr;

TS playground

Because BindingPattern is so common in AST, if we can avoid the extra 16 bytes for type_annotation and optional fields where they're always None and false, it will be a measurable memory saving.

How to fix?

We could add custom serializers for BindingPattern, BindingIdentifier, ObjectPattern, ArrayPattern etc to solve the problem in TS-ESTree AST. However, because they're so common in the AST, this would be really complicated and messy.

I feel this would be treating the symptom rather than the cause.

In my opinion, the best approach is to bite the bullet and alter the Rust types - remove the weirdness at source, rather than adding a further layer of workarounds on top of it.

What would this look like?

I'm not sure exactly what would be required, but possibly we'd need something like:

  • Separate BindingIdentifier and TypedBindingIdentifer AST types.
  • Only TypedBindingIdentifer would have type_annotation and optional fields.
  • Other AST types use whichever type is legal in each situation.

It may also be possible to avoid the need for a TypedBindingIdentifer type and instead move these fields to types which contain a BindingPattern e.g. VariableDeclarator.

What's the downside?

BindingPattern and friends are widely used. Changing them will require updating a lot of downstream code.

When?

As noted above, this is not a critical problem for ESTree serialization, just a perf hit.

However, it may prove a blocker to implementing an AST walker with lazy deserialization, because there you need to be able to deserialize each field individually. So having 2 possible sources for the value of a field isn't workable.

I am hoping I can find another way around that problem. Assuming I can, I propose we push this change back to a later date, unless it's unavoidable.

Metadata

Metadata

Assignees

Labels

A-astArea - AST

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions