_posts/2025-10-01-AST-me-anything.md
Lines changed: 30 additions & 31 deletions
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,15 @@
1
1
---
2
2
layout: post
3
3
title: C4 - AST me anything
4
-
date: 2025-04-16 00:00:00 -0500
4
+
date: 2025-10-01 00:00:00 -0500
5
5
categories: general
6
6
---
7
7
8
-
Alternative title: _Analyze this!_
9
8
We give a quick overview of all the AST node types used in the Clojure compiler.
10
9
We finish with a quick look at `Compiler.Analyze`, which generates an AST from a form.
11
10
11
+
Alternative title: _Analyze this!_
12
+
12
13
13
14
## The roots
14
15
@@ -17,20 +18,20 @@ To be an an AST node, a class must implement the `clojure.lang.Compiler.Expr` in
17
18
```C#
18
19
public interface Expr
19
20
{
20
-
// provides typing information
21
-
bool HasClrType { get; }
22
-
Type ClrType { get; }
23
-
24
21
// Supports direct evaluation and code generation
25
22
object Eval();
26
23
void Emit(RHC rhc, ObjExpr objx, CljILGen ilg);
24
+
25
+
// provides typing information
26
+
bool HasClrType { get; }
27
+
Type ClrType { get; }
27
28
28
29
// Technical detail -- more later.
29
30
bool HasNormalExit();
30
31
}
31
32
```
32
33
33
-
In addition, some node types support the possibility of generating unboxed primitive values and have special code generation code for this. These types implement `MaybePrimitiveExpr`:
34
+
In addition, some node types support the possibility of generating unboxed primitive values and have special code generation code for this. These types implement `MaybePrimitiveExpr` instead of just `Expr`:
34
35
35
36
```C#
36
37
public interface MaybePrimitiveExpr : Expr
@@ -91,7 +92,7 @@ And two node types relate specifically to `Var`s:
91
92
|`VarExpr`| A symbol in the code text maps to a `Var`. |
92
93
93
94
94
-
__Cluster #2 -- Control flow__: There are a few node types related to what we might call the flow of control:
95
+
__Cluster #2 -- Control flow__: There are a few node types related to the flow of control:
95
96
96
97
97
98
| Expr type | Description |
@@ -103,41 +104,43 @@ __Cluster #2 -- Control flow__: There are a few node types related to what we mi
103
104
|`TryExpr`| This has the main body, catch clauses (can be none), and an optional finally clause. A `try` form without any `catch`es and no `finally` will be output as a `BodyExpr`.|
104
105
105
106
106
-
__Cluster #3 -- Miscellaneous__: The things that I can't figure out another home for.
107
+
__Cluster #3 -- Miscellaneous__: The things that I couldn't figure out a better place for.
107
108
108
109
| Expr type | Description |
109
110
|:----------|:------------|
110
111
|`AssignExpr`| Comes from a `(set! target value)` expression |
111
112
|`DefExpr`| Comes from a `(def name value)` expression. This includes `defn`, `defmacro` and other forms that macroexpand to a `def`|
112
113
|`ImportExpr`| Yep, `(import* ...)` gets its own node type |
|`MetaExpr`| When we need to attach metadata to some other construct as we eval/code-gen. Primiarly used for functions and the collections (map, set, vector) |
115
+
|`MetaExpr`| When we need to attach metadata to some other construct as we eval/code-gen. Primarily used for functions and the collections (map, set, vector) |
115
116
|`MonitorEnterExpr`| Yep. |
116
117
|`MonitorExitExpr`| Just what you think. |
117
-
|`UnresolvedVarExpr`| There is a compiler mode that allows a symbol that we can resolve to something (local binding, `Var`, typename, etc., i.e., does not throw an error when encountered. We create this node for one of those. It is supposed to allow an extension point. No known uses of it. |
118
+
|`UnresolvedVarExpr`| There is a compiler mode that prevents an error from being thrown when we run into a symbol we can't resolve to anything (local binding, `Var`, type name, etc.). We create this node for such a symbol. It is supposed to be an extension point for the compiler. No known uses of it. I've never turned on the flag allowing this, so I have never created a node of this type other than in tests. |
118
119
|`UntypedExpr`| This is just an abstract base class for certain other node types that don't have a return type: `MonitorEnterExpr`, `MonitorExitExpr`, `ThrowExpr`|
119
120
120
121
__Cluster #4 -- Host interop__:
122
+
121
123
The subtypes of `HostExpr` are all related to platform calls: methods, properties, fields, etc. There are a few node types similarly oriented that are not under `HostExpr`. I may have gone a little overboard with the class hierarchy, but there was common code that seemed to make this structure sensible. Things are a little more complicated on the CLR vs the JVM because the CLR has properties as first-class objects. That also increases the confusion level for no-argument constructs: Is `(.m x)` a field access, a property get, or a call to a 0-arity method?
124
+
122
125
Ignoring some of the intermediate abstract classes, we can organize the majority of the concrete classes as follows:
|`InstanceZeroArityCallExpr`| An interop call with no arguments, but we can't figure out at analysis time if it is a field, property, or expression. Reflection is going to be happening. |
136
+
|`InstanceZeroArityCallExpr`| An interop call with no arguments, but we can't figure out at analysis time if it is a field, property, or zero-arity method. There will be reflection happening when we actually evaluate this node. |
134
137
135
138
There are two other node types I think go into this cluster:
136
139
137
140
138
141
| Expr type | Description |
139
142
|:----------|:------------|
140
-
|`QualifiedMethodExpr`| This results from the recently-introduced qualified method expressions. They look like `Type/.Name` for instance method calls, `Type/new` for constructor calls, and `Type/name` for static calls. During analysis, this node type sometimes resolves into one of the `HostExpr` types, but sometimes it generates a value (a 'thunk') to be passed as an argumment. |
143
+
|`QualifiedMethodExpr`| This results from the recently-introduced qualified method expressions. They look like `Type/.Name` for instance method calls, `Type/new` for constructor calls, and `Type/Name` for static calls. During analysis, this node type sometimes resolves into one of the `HostExpr` types, but sometimes it generates a value (a 'thunk') to be passed as an argument. |
141
144
|`NewExpr`| A constructor call to 'new' a type. |
142
145
143
146
@@ -150,12 +153,10 @@ __Cluster #5 -- Functions and local binding scopes__: These node types related t
150
153
|`FnExpr`| From `(fn* name ...)` forms |
151
154
|`NewInstanceExpr`| from `deftype` and `reify` definitions |
152
155
153
-
I have no idea why `ObjExpr` and `NewInstanceExpr` are so named.
154
-
These create a scope for the definition of and reference to local bindings.
156
+
I have no idea why `ObjExpr` and `NewInstanceExpr` are so named. (I get confused every time I see them in the code -- and `ObjExpr` is _everywhere_.) These create a scope for the definition of and reference to local bindings.
155
157
They cause the creation of types that implement the `IFn` interface of `invoke` methods.
156
158
157
-
158
-
Though not AST nodes themselves, the types `FnMethod` and `NewInstanceMethod` encode the individual 'invoke' method definitions.
159
+
Though not AST nodes themselves, the types `FnMethod` and `NewInstanceMethod` encode the individual `invoke` method definitions.
159
160
160
161
Also creating scopes are:
161
162
@@ -178,7 +179,7 @@ into
178
179
( (fn* [] (let [x 12] (inc x))) )
179
180
```
180
181
181
-
This need only happen at the top level. If a `let` form was inside a function definition already, this wrapping would not be necessary.
182
+
This need only happen at the top level. If a `let` form was inside a function definition already, this wrapping would not be necessary. (This is one of the places the 'context' flag mentioned in the previous post comes into play.)
182
183
183
184
The `loop` special form explicitly creates a scope for iteration via `recur`; function bodies implicitly create such a scope. A `recur` form in such a context will be represented by
184
185
@@ -233,18 +234,16 @@ The main body of the `Compiler.Analyze` method just steps through a series of te
233
234
| a keyword | Create a `KeywordExpr`|
234
235
| a number | Create a `NumberExpr` (if an `int`, `long`, or `double`) or a `ConstantExpr`|
235
236
| a string | Create a `StringExpr`|
236
-
| an `IPersistentCollection` <br/> not an `IRecord` or `IType` <br/> has no elements |create an `EmptyExpr`|
237
-
| an `ISeq`|call the `ISeq` analyzer (see below) |
238
-
| an `IRecord` or `IType`|createa a `ConstantExpr`|
239
-
|`IPersistentVector`|create a `VectorExpr` or a `ConstantExpr`|
240
-
|`IPersistentMap`|create a `MapExpr` or a `ConstantExpr`|
241
-
|`IPersistentSet`|create a `SetExpr` or a `ConstantExpr`|
242
-
| otherwise |create a `ConstantExpr`. If we are eval'ing, we just return it. If we are compiling, let's hope we can figure out how to handle it. |
237
+
| an `IPersistentCollection` <br/> not an `IRecord` or `IType` <br/> has no elements |Create an `EmptyExpr`|
238
+
| an `ISeq`|Call the `ISeq` analyzer (see below) |
239
+
| an `IRecord` or `IType`|Create a `ConstantExpr`|
240
+
|`IPersistentVector`|Create a `VectorExpr` or a `ConstantExpr`|
241
+
|`IPersistentMap`|Create a `MapExpr` or a `ConstantExpr`|
242
+
|`IPersistentSet`|Create a `SetExpr` or a `ConstantExpr`|
243
+
| otherwise |Create a `ConstantExpr`. If we are eval'ing, we just return it. If we are compiling, let's hope we can figure out how to handle it. |
243
244
244
245
The node types mentioned in this list form a very small subset of all the node types.
245
246
Here we see pretty much just a few data-oriented node types. Clearly the `Symbol` and `ISeq` analyzers are doing the heavy lifting. Enough that each gets its own post:
246
247
247
-
-[Symbolic of what?]({{site.baseurl}}{% post_url 2025-04-17-symbolic-of-what})
_posts/2025-10-02-symbolic-of-what.md
Lines changed: 12 additions & 16 deletions
Original file line number
Diff line number
Diff line change
@@ -1,7 +1,7 @@
1
1
---
2
2
layout: post
3
3
title: C4 - Symbolic of what?
4
-
date: 2023-04-17 00:00:00 -0500
4
+
date: 2025-10-02 00:00:00 -0500
5
5
categories: general
6
6
---
7
7
@@ -10,14 +10,12 @@ We look at the interpretation of symbols in Clojure code.
10
10
11
11
## Introduction
12
12
13
-
Symbols but are given meaning by a complex web of interactions among the Lisp reader,
13
+
Symbols are given meaning by a complex web of interactions among the Lisp reader,
14
14
namespaces, the Clojure compiler, and the Clojure runtime.
15
15
16
16
We'll skip the reader, though the interpretation of symbols as discussed below does come into play just a bit in the reading of syntax-quote (` `` `) forms. But that's a bit off the path we need to travel.
17
17
18
-
The code for resolving symbols and translating them in context into nodes in the abstract syntax tree (AST) is complex.
19
-
There are appear to be some reduncancies that could be eliminated, along with a few other simplifications.
20
-
But for that, I needed more clarity on the rules for symbol interpretation. What follows is not complete, by any means, but it is a starting point.
18
+
The code for resolving symbols and translating them into nodes in the abstract syntax tree (AST) is complex. In fact, there appear to be some redundancies that could be eliminated, along with a few other simplifications. But let us proceed with the code we have.
21
19
22
20
## An example
23
21
@@ -53,32 +51,30 @@ Within the call to `f`, we must interpret each symbol that occurs in the form, i
53
51
54
52
55
53
```Clojure
56
-
f x y ns2/g namespace.with.a.long.name/h System.Int64String/ToUpper System.Text.StringBuilder
54
+
f x y ns2/g namespace.with.a.long.name/h Int64/MaxValue String/.ToUpper System.Text.StringBuilder
57
55
```
58
56
59
57
`x` and `y` are easy. They do not have a namespace, so they could be local bindings. Local binding takes precedence over other possible interpretations. Indeed, the current context has local bindings for those symbols. The analyzer will produce `LocalBindingExpr` nodes for them.
60
58
61
59
`f` also does not have a namespace. However, it is not bound in the current lexical scope.
62
-
It does not have a namespace, so it does not refer to directly or indirectly (via an alias) to a namespace.
60
+
It does not have a namespace, so we don't need to figure out what its namespace actually is.
63
61
The remaining option is that it has a mapping in the current namespace. It does, to a `Var`, and that is what we use. The analyzer will produce a `VarExpr` node for it.
64
62
65
-
`ns2/g` is a bit more complicated. It has a namespace, so it can't be a local binding. We need to determine what namespace `ns2` stands for. This requires looking up `ns2` in the current namespace. The current namespace is `ns1`, which has an alias for `namespace.with.a.long.name`. So we look up `g` in `namespace.with.a.long.name`, finding a `Var`. We also check to see if `g` is private. It is not, so we can use it. The analyzer will also produce a `VarExpr` node.
63
+
`ns2/g` is a bit more complicated. It has a namespace, so it can't be a local binding. We need to determine what namespace `ns2` stands for. This requires looking up `ns2` in the current namespace. The current namespace is `ns1`, which has `ns2` as an alias for `namespace.with.a.long.name`. We look up `g` in `namespace.with.a.long.name`, finding a `Var`. We also check to see if `g` is private. It is not, so we can use it. The analyzer will produce a `VarExpr` node.
66
64
67
65
`namespace.with.a.long.name/h` is also easy. `namespace.with.a.long.name` is not an alias but the name of an existing namespace. And `h` is a public `Var` in that namespace. So we can use it. The analyzer will produce a `VarExpr` node for it.
68
66
69
67
Next consider `Int64/MaxValue`. It does have a namespace, so it can't be a local. We check whether `Int64` is a namespace alias; it is not.
70
-
However, the `ns1` namespace does have a mapping from the symbol `Int64` to the type `System.Int64`. (By default, all namespaces are set up with mappings to 'system' types from their unqualified names.) So we have a symbol with the namespace mapping to a type. We must check to see if the name of the symbol, in this case `MaxValue` is a property or field in that type. There is such a property in the type `System.Int64`, so we can use it. The analyzer will produce a `StaticFieldExpr` node for it.
68
+
However, the `ns1` namespace does have a mapping from the symbol `Int64` to the type `System.Int64`. (By default, all namespaces are set up with mappings to 'system' types from their unqualified names.) So we have a symbol with the namespace mapping to a type. We must check to see if the name of the symbol, in this case `MaxValue`, is a property or field in that type. `System.Int64.MaxValue` exists. The analyzer will produce a `StaticFieldExpr` node.
71
69
72
-
`String/.ToUpper` is similar. In this case, because this symbol appears in the functional position of function invocation,
73
-
given that `String` maps to `System.String`, we look for methods also. Beacause the name starts with a period, we look for an instance method, and find one. In this case, there will not be a node separately for `String/.ToUpper`; rather, the analyzer will create an `InstanceMethodExpr` node for the entire expression.
70
+
`String/.ToUpper` is similar. In this case, because this symbol appears in the functional position of a function invocation and given that `String` maps to `System.String`, we also look for methods. Because the name starts with a period, we look for an instance method, and find one. In this case, there will not be a separate node for `String/.ToUpper`; rather, the analyzer will create an `InstanceMethodExpr` node for the entire expression.
74
71
75
-
Finally, we have `System.Text.StringBuilder`. When we have a symbol with no namespace and periods in the name, we look for a type.
76
-
In this case, we do find a type. If it didn't name a type, we would go on and treat the same as a symbol with no periods. (And probably fail). To express the type in the AST, the analyzer will create a `ConstantExpr` node.
72
+
Finally, we have `System.Text.StringBuilder`. When we have a symbol with no namespace and periods in the name, we look for a type. In this case, we do find a type. If it didn't name a type, we would go on and treat it the same as a symbol with no periods. (And probably fail.) To express the type in the AST, the analyzer will create a `ConstantExpr` node.
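
The resolution order walked through above can be sketched roughly as follows. This is a toy model, not the code in `Compiler.AnalyzeSymbol`: `Env`, `SymbolSketch`, and `Classify` are hypothetical names, namespaces are plain dictionaries, private `Var`s and method symbols such as `String/.ToUpper` are left out, and node types are returned as strings.

```C#
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the walkthrough above.
// Env and SymbolSketch are illustrative names, not ClojureCLR APIs.
public sealed class Env
{
    public string CurrentNs = "ns1";
    public HashSet<string> Locals = new HashSet<string>();
    // alias -> full namespace name, as seen from the current namespace
    public Dictionary<string, string> Aliases = new Dictionary<string, string>();
    // namespace name -> names of its public Vars
    public Dictionary<string, HashSet<string>> PublicVars =
        new Dictionary<string, HashSet<string>>();
    // unqualified type name -> type, e.g. Int64 -> System.Int64
    public Dictionary<string, Type> TypeMappings = new Dictionary<string, Type>();
}

public static class SymbolSketch
{
    static bool HasVar(Env env, string ns, string name)
    {
        return env.PublicVars.TryGetValue(ns, out var vars) && vars.Contains(name);
    }

    // ns is null for a symbol without a namespace part.
    public static string Classify(Env env, string ns, string name)
    {
        if (ns == null)
        {
            // Local bindings take precedence over everything else.
            if (env.Locals.Contains(name)) return "LocalBindingExpr";
            // A dotted name with no namespace may name a type.
            if (name.Contains(".") && Type.GetType(name) != null) return "ConstantExpr";
            // Otherwise, look for a mapping in the current namespace.
            return HasVar(env, env.CurrentNs, name) ? "VarExpr" : "unresolved";
        }

        // The namespace part is either an alias or a full namespace name.
        string target = env.Aliases.TryGetValue(ns, out var full) ? full : ns;
        if (env.PublicVars.ContainsKey(target))
            return HasVar(env, target, name) ? "VarExpr" : "unresolved";

        // Not a namespace: maybe it maps to a type with a matching field or property.
        if (env.TypeMappings.TryGetValue(ns, out var type)
            && (type.GetField(name) != null || type.GetProperty(name) != null))
            return "StaticFieldExpr";

        return "unresolved";
    }
}

public static class Program
{
    public static void Main()
    {
        var env = new Env();
        env.Locals.Add("x");
        env.Locals.Add("y");
        env.Aliases["ns2"] = "namespace.with.a.long.name";
        env.PublicVars["ns1"] = new HashSet<string> { "f" };
        env.PublicVars["namespace.with.a.long.name"] = new HashSet<string> { "g", "h" };
        env.TypeMappings["Int64"] = typeof(long);

        Console.WriteLine(SymbolSketch.Classify(env, null, "x"));
        Console.WriteLine(SymbolSketch.Classify(env, null, "f"));
        Console.WriteLine(SymbolSketch.Classify(env, "ns2", "g"));
        Console.WriteLine(SymbolSketch.Classify(env, "Int64", "MaxValue"));
        Console.WriteLine(SymbolSketch.Classify(env, null, "System.Text.StringBuilder"));
    }
}
```

The point of the sketch is only the ordering of the checks: locals first, then types for dotted names, then the current namespace; for qualified symbols, namespaces and aliases before type mappings.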
77
73
78
74
79
75
## A look at the code
80
76
81
-
We can profitably take a look at the actual C# code for `Compiler.AnalyzeSymbol`.
77
+
Now that we are warmed up, we can profitably look at the actual C# code for `Compiler.AnalyzeSymbol`.
82
78
83
79
```C#
84
80
private static Expr AnalyzeSymbol(Symbol symbol)
@@ -262,7 +258,7 @@ private static object ResolveIn(Namespace n, Symbol symbol, bool allowPrivate)
262
258
263
259
To finish off this code, some brief comments on a few of the auxiliary methods mentioned above.
264
260
265
-
`Compiler.ReferenceLocal` is called when we have identified a reference to a local binding. It does some bookkeeping needed for code-gen. Specifically, it notes the usage of the local binding in the containing function (if there is one) and any functions above that is might be nested in. This is so that we know to close over those variables when creating an instance of the function. It also notes if the local variable is the `this` variable; reference to `this` precludes static linking. But more about that in [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-04-19-functional-anatomy}).
261
+
`Compiler.ReferenceLocal` is called when we have identified a reference to a local binding. It does some bookkeeping needed for code-gen. Specifically, it notes the usage of the local binding in the containing function (if there is one) and any functions above that it might be nested in. This is so that we know to close over those variables when creating an instance of the function. It also notes if the local variable is the `this` variable; reference to `this` precludes static linking. But more about that in [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-10-04-functional-anatomy}).
266
262
267
263
`Compiler.RegisterVar` is similar. It just notes the reference to the `Var` in the containing function (if there is one). A field in the class implementing the function will be created and initialized to the `Var` in question.
268
264
@@ -292,7 +288,7 @@ These are when the symbol does not have a namespace:
292
288
- `ns` -- treated as a special case -- always found
293
289
- name found in current namespace (return var) (there are variants in the resolve/lookup code that will create the `Var` if not found)
294
290
295
-
Several kinds of AST nodes can be created from symbols. The details of node types are covered in [C4: AST me anything]({{site.baseurl}}{% post_url 2025-04-16-AST-me-anything}). For symbols with a namespace:
291
+
Several kinds of AST nodes can be created from symbols. The details of node types are covered in [C4: AST me anything]({{site.baseurl}}{% post_url 2025-10-01-AST-me-anything}). For symbols with a namespace:
296
292
297
293
- ns/name, ns names a `Type`, that type has a field or property with the given name => `StaticFieldExpr` or `StaticPropertyExpr`
298
294
- ns/name, ns names a `Type`, no field or property found, name does not start with a period => `QualifiedMethodExpr`, Static