Commit dd94be2

Add functional anatomy draft. Publish first five posts of the C4 series
1 parent a2dae8c commit dd94be2

8 files changed, +539 -113 lines changed

_drafts/2025-04-19-functional-anatomy.md

Lines changed: 0 additions & 6 deletions
This file was deleted.

_drafts/2025-04-15-classic-clojure-compiler-contemplation.md renamed to _posts/2025-09-30-classic-clojure-compiler-contemplation.md

Lines changed: 42 additions & 39 deletions
Large diffs are not rendered by default.

_drafts/2025-04-16-AST-me-anything.md renamed to _posts/2025-10-01-AST-me-anything.md

Lines changed: 30 additions & 31 deletions
````diff
@@ -1,14 +1,15 @@
 ---
 layout: post
 title: C4 - AST me anything
-date: 2025-04-16 00:00:00 -0500
+date: 2025-10-01 00:00:00 -0500
 categories: general
 ---
 
-Alternative title: _Analyze this!_
 We give a quick overview of all the AST node types used in the Clojure compiler.
 We finish with a quick look at `Compiler.Analyze`, which generates an AST from a form.
 
+Alternative title: _Analyze this!_
+
 
 ## The roots
 
````
````diff
@@ -17,20 +18,20 @@ To be an AST node, a class must implement the `clojure.lang.Compiler.Expr` in
 ```C#
 public interface Expr
 {
-    // provides typing information
-    bool HasClrType { get; }
-    Type ClrType { get; }
-
     // Supports direct evaluation and code generation
     object Eval();
     void Emit(RHC rhc, ObjExpr objx, CljILGen ilg);
+
+    // provides typing information
+    bool HasClrType { get; }
+    Type ClrType { get; }
 
     // Technical detail -- more later.
     bool HasNormalExit();
 }
 ```
 
-In addition, some node types support the possibility of generating unboxed primitive values and have special code generation code for this. These types implement `MaybePrimitiveExpr`:
+In addition, some node types can generate unboxed primitive values and have special code-generation code for this. These types implement `MaybePrimitiveExpr` instead of just `Expr`:
 
 ```C#
 public interface MaybePrimitiveExpr : Expr
````
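A node serves two roles: direct evaluation through `Eval` and code generation through `Emit`. It can also report type information. The minimal Python sketch below mirrors the interface to make those roles concrete. It is not ClojureCLR code. `emit` appends stand-in tuples to a list rather than writing IL. `can_emit_primitive` is a hypothetical name, because the members of `MaybePrimitiveExpr` are cut off in this hunk.

```python
from abc import ABC, abstractmethod


class Expr(ABC):
    """Hypothetical Python mirror of the C# Expr interface."""

    @abstractmethod
    def eval(self):
        """Interpreter path: compute the node's value directly."""

    @abstractmethod
    def emit(self, il: list) -> None:
        """Code-generation path: append stand-in instructions to il."""

    @property
    def has_clr_type(self) -> bool:
        # Most nodes cannot promise a static type.
        return False

    @property
    def clr_type(self) -> type:
        raise TypeError("this node has no static type")


class MaybePrimitiveExpr(Expr):
    @property
    @abstractmethod
    def can_emit_primitive(self) -> bool:
        """Hypothetical name: could this node produce an unboxed value?"""


class NumberExpr(MaybePrimitiveExpr):
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value

    def emit(self, il):
        il.append(("load-constant", self.value))

    @property
    def has_clr_type(self) -> bool:
        return True

    @property
    def clr_type(self) -> type:
        return type(self.value)

    @property
    def can_emit_primitive(self) -> bool:
        return isinstance(self.value, (int, float))
```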
````diff
@@ -91,7 +92,7 @@ And two node types relate specifically to `Var`s:
 | `VarExpr` | A symbol in the code text maps to a `Var`. |
 
 
-__Cluster #2 -- Control flow__: There are a few node types related to what we might call the flow of control:
+__Cluster #2 -- Control flow__: There are a few node types related to the flow of control:
 
 
 | Expr type | Description |
````
````diff
@@ -103,41 +104,43 @@ __Cluster #2 -- Control flow__: There are a few node types related to what we mi
 | `TryExpr` | This has the main body, catch clauses (can be none), and an optional finally clause. A `try` form without any `catch`es and no `finally` will be output as a `BodyExpr`.|
 
 
-__Cluster #3 -- Miscellaneous__: The things that I can't figure out another home for.
+__Cluster #3 -- Miscellaneous__: The things that I couldn't find a better place for.
 
 | Expr type | Description |
 |:----------|:------------|
 | `AssignExpr` | Comes from a `(set! target value)` expression |
 | `DefExpr` | Comes from a `(def name value)` expression. This includes `defn`, `defmacro` and other forms that macroexpand to a `def` |
 | `ImportExpr` | Yep, `(import* ...)` gets its own node type |
 | `InstanceOfExpr` | Yep, `(instance-of ...)` ... |
-| `MetaExpr` | When we need to attach metadata to some other construct as we eval/code-gen. Primiarly used for functions and the collections (map, set, vector) |
+| `MetaExpr` | When we need to attach metadata to some other construct as we eval/code-gen. Primarily used for functions and the collections (map, set, vector) |
 | `MonitorEnterExpr` | Yep. |
 | `MonitorExitExpr` | Just what you think. |
-| `UnresolvedVarExpr` | There is a compiler mode that allows a symbol that we can resolve to something (local binding, `Var`, typename, etc., i.e., does not throw an error when encountered. We create this node for one of those. It is supposed to allow an extension point. No known uses of it. |
+| `UnresolvedVarExpr` | There is a compiler mode that prevents an error from being thrown when we run into a symbol we can't resolve to something. We create this node for such a symbol. It is supposed to be an extension point for the compiler. There are no known uses of it. I've never turned on the flag allowing this, so I have never created a node of this type other than in tests. |
 | `UntypedExpr` | This is just an abstract base class for certain other node types that don't have a return type: `MonitorEnterExpr`, `MonitorExitExpr`, `ThrowExpr` |
 
 __Cluster #4 -- Host interop__:
+
 The subtypes of `HostExpr` are all related to platform calls: methods, properties, fields, etc. There are a few node types similarly oriented that are not under `HostExpr`. I may have gone a little overboard with the class hierarchy, but there was common code that seemed to make this structure sensible. Things are a little more complicated on the CLR than on the JVM because the CLR has properties as first-class objects. That also increases the confusion level for no-argument constructs: Is `(.m x)` a field access, a property get, or a call to a 0-arity method?
+
 Ignoring some of the intermediate abstract classes, we can organize the majority of the concrete classes as follows:
 
-| | Field | Property | Method |
+| Mode | Field | Property | Method |
 |:-------:|:-------:|:-------:|:-------:|
 | Instance | `InstanceFieldExpr` | `InstancePropertyExpr` | `InstanceMethodExpr` |
-| Static | `StaticFieldExpr` | `StaticPropertyExpr` | `StaticInvokeExpr` | `StaticMethodExpr` |
+| Static | `StaticFieldExpr` | `StaticPropertyExpr` | `StaticMethodExpr` |
 
 The remaining `HostExpr`-derived class is:
 
 | Expr type | Description |
 |:----------|:------------|
-| `InstanceZeroArityCallExpr` | An interop call with no arguments, but we can't figure out at analysis time if it is a field, property, or expression. Reflection is going to be happening. |
+| `InstanceZeroArityCallExpr` | An interop call with no arguments, where we can't figure out at analysis time whether it is a field, a property, or a 0-arity method. Reflection will happen when we actually evaluate this node. |
 
 There are two other node types I think go into this cluster:
 
 
 | Expr type | Description |
 |:----------|:------------|
-| `QualifiedMethodExpr` | This results from the recently-introduced qualified method expressions. They look like `Type/.Name` for instance method calls, `Type/new` for constructor calls, and `Type/name` for static calls. During analysis, this node type sometimes resolves into one of the `HostExpr` types, but sometimes it generates a value (a 'thunk') to be passed as an argumment. |
+| `QualifiedMethodExpr` | This results from the recently introduced qualified method expressions. They look like `Type/.Name` for instance method calls, `Type/new` for constructor calls, and `Type/Name` for static calls. During analysis, this node type sometimes resolves into one of the `HostExpr` types, but sometimes it generates a value (a 'thunk') to be passed as an argument. |
 | `NewExpr` | A constructor call to 'new' a type. |
 
 
````
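The Mode by member-kind table, together with the zero-arity ambiguity, amounts to a small decision procedure. The Python sketch below encodes it. `classify_host_call` and its parameters are hypothetical. Treating an unknown-type call that has arguments as an `InstanceMethodExpr` is an assumption (only a method can take arguments). The post does not state it.

```python
from typing import Optional

# The Mode x member-kind grid from the table above.
NODE_FOR = {
    ("instance", "field"): "InstanceFieldExpr",
    ("instance", "property"): "InstancePropertyExpr",
    ("instance", "method"): "InstanceMethodExpr",
    ("static", "field"): "StaticFieldExpr",
    ("static", "property"): "StaticPropertyExpr",
    ("static", "method"): "StaticMethodExpr",
}


def classify_host_call(is_static: bool, member_kind: Optional[str], n_args: int) -> str:
    """member_kind is 'field', 'property', 'method', or None when the
    target's type (and so the member) is unknown at analysis time."""
    mode = "static" if is_static else "instance"
    if member_kind is not None:
        return NODE_FOR[(mode, member_kind)]
    if is_static:
        # Type/name always names a type; this sketch only handles resolved members.
        raise ValueError("unresolved static member is outside this sketch")
    if n_args == 0:
        # (.m x): field, property, or 0-arity method? Settled by reflection at run time.
        return "InstanceZeroArityCallExpr"
    # Assumption: with arguments, only a method call is possible.
    return "InstanceMethodExpr"
```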
````diff
@@ -150,12 +153,10 @@ __Cluster #5 -- Functions and local binding scopes__: These node types related t
 | `FnExpr` | From `(fn* name ...)` forms |
 | `NewInstanceExpr` | from `deftype` and `reify` definitions |
 
-I have no idea why `ObjExpr` and `NewInstanceExpr` are so named.
-These create a scope for the definition of and reference to local bindings.
+I have no idea why `ObjExpr` and `NewInstanceExpr` are so named. (I get confused every time I see them in the code -- and `ObjExpr` is _everywhere_.) These create a scope for the definition of and reference to local bindings.
 They cause the creation of types that implement the `invoke` methods of the `IFn` interface.
 
-
-Though not AST nodes themselves, the types `FnMethod` and `NewInstanceMethod` encode the individual 'invoke' method definitions.
+Though not AST nodes themselves, the types `FnMethod` and `NewInstanceMethod` encode the individual `invoke` method definitions.
 
 Also creating scopes are:
 
````
````diff
@@ -178,7 +179,7 @@ into
 ( (fn* [] (let [x 12] (inc x))) )
 ```
 
-This need only happen at the top level. If a `let` form was inside a function definition already, this wrapping would not be necessary.
+This need only happen at the top level. If a `let` form were already inside a function definition, this wrapping would not be necessary. (This is one of the places the 'context' flag mentioned in the previous post comes into play.)
 
 The `loop` special form explicitly creates a scope for iteration via `recur`; function bodies implicitly create such a scope. A `recur` form in such a context will be represented by
 
````
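The top-level rewrite of `let` in this hunk is a purely syntactic transformation. Below is a minimal Python sketch. It assumes forms are modeled as tuples (lists), Python lists (vectors), and plain strings (symbols). `inside_fn` is a hypothetical stand-in for the compiler's context flag, and only `let` is handled.

```python
def wrap_toplevel_let(form, inside_fn: bool):
    """Rewrite (let ...) as ((fn* [] (let ...))) unless already inside a fn.

    inside_fn is a hypothetical stand-in for the compiler's context flag.
    """
    is_let = isinstance(form, tuple) and len(form) > 0 and form[0] == "let"
    if is_let and not inside_fn:
        # Wrap in a zero-argument function, then invoke that function.
        return (("fn*", [], form),)
    return form


def to_clj(form) -> str:
    """Print a modeled form back as Clojure text."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_clj(f) for f in form) + ")"
    if isinstance(form, list):
        return "[" + " ".join(to_clj(f) for f in form) + "]"
    return str(form)


form = ("let", ["x", 12], ("inc", "x"))
print(to_clj(wrap_toplevel_let(form, inside_fn=False)))  # prints ((fn* [] (let [x 12] (inc x))))
```

With `inside_fn=True` the form comes back unchanged, which matches the point that a `let` already inside a function needs no wrapping.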
````diff
@@ -233,18 +234,16 @@ The main body of the `Compiler.Analyze` method just steps through a series of te
 | a keyword | Create a `KeywordExpr` |
 | a number | Create a `NumberExpr` (if an `int`, `long`, or `double`) or a `ConstantExpr` |
 | a string | Create a `StringExpr` |
-| an `IPersistentCollection` <br/> not an `IRecord` or `IType` <br/> has no elements | create an `EmptyExpr` |
-| an `ISeq` | call the `ISeq` analyzer (see below) |
-| an `IRecord` or `IType` | createa a `ConstantExpr` |
-| `IPersistentVector` | create a `VectorExpr` or a `ConstantExpr` |
-| `IPersistentMap` | create a `MapExpr` or a `ConstantExpr` |
-| `IPersistentSet` | create a `SetExpr` or a `ConstantExpr` |
-| otherwise | create a `ConstantExpr`. If we are eval'ing, we just return it. If we are compiling, let's hope we can figure out how to handle it. |
+| an `IPersistentCollection` <br/> not an `IRecord` or `IType` <br/> has no elements | Create an `EmptyExpr` |
+| an `ISeq` | Call the `ISeq` analyzer (see below) |
+| an `IRecord` or `IType` | Create a `ConstantExpr` |
+| `IPersistentVector` | Create a `VectorExpr` or a `ConstantExpr` |
+| `IPersistentMap` | Create a `MapExpr` or a `ConstantExpr` |
+| `IPersistentSet` | Create a `SetExpr` or a `ConstantExpr` |
+| otherwise | Create a `ConstantExpr`. If we are eval'ing, we just return it. If we are compiling, let's hope we can figure out how to handle it. |
 
 The node types mentioned in this list form a very small subset of all the node types.
 Here we see mostly data-oriented node types. Clearly the `Symbol` and `ISeq` analyzers are doing the heavy lifting, so much so that each gets its own post:
 
-- [Symbolic of what?]({{site.baseurl}}{% post_url 2025-04-17-symbolic-of-what})
-- [ISeq clarity]({{site.baseurl}}{% post_url 2025-04-18-iseq-clarity})
-
-
+- [Symbolic of what?]({{site.baseurl}}{% post_url 2025-10-02-symbolic-of-what})
+- [ISeq clarity]({{site.baseurl}}{% post_url 2025-10-03-iseq-clarity})
````
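The dispatch in `Compiler.Analyze` can be sketched as an ordered chain of type tests. In this Python sketch:

- `Symbol` and `Keyword` are hypothetical `str` subclasses.
- Tuples model `ISeq` forms, and lists, dicts, and sets model vectors, maps, and sets.
- The `nil` and boolean rows are assumptions modeled on JVM Clojure, because those rows fall outside this hunk.
- The rule that an all-literal collection becomes a `ConstantExpr` is likewise an assumption from JVM Clojure.
- `IRecord`/`IType` values are not modeled.

```python
from numbers import Number


class Symbol(str):
    """Hypothetical stand-in for clojure.lang.Symbol."""


class Keyword(str):
    """Hypothetical stand-in for clojure.lang.Keyword."""


# Node types counted as literals when deciding if a collection is constant.
LITERALS = {"NilExpr", "BooleanExpr", "KeywordExpr", "NumberExpr",
            "StringExpr", "ConstantExpr"}


def _collection(node_type, elements):
    # Assumption (modeled on JVM Clojure): an all-literal collection
    # collapses into a single ConstantExpr.
    if all(analyze(e) in LITERALS for e in elements):
        return "ConstantExpr"
    return node_type


def analyze(form) -> str:
    """Name the node type, or sub-analyzer, chosen for a form."""
    if form is None:
        return "NilExpr"              # assumed row (not in this hunk)
    if isinstance(form, bool):        # test before Number: bool is an int
        return "BooleanExpr"          # assumed row (not in this hunk)
    if isinstance(form, Symbol):
        return "AnalyzeSymbol"
    if isinstance(form, Keyword):
        return "KeywordExpr"
    if isinstance(form, Number):
        # int and float stand in for long and double
        return "NumberExpr" if isinstance(form, (int, float)) else "ConstantExpr"
    if isinstance(form, str):
        return "StringExpr"
    if isinstance(form, (tuple, list, dict, set, frozenset)) and not form:
        return "EmptyExpr"
    if isinstance(form, tuple):       # tuples model ISeq list forms
        return "AnalyzeSeq"
    if isinstance(form, list):        # lists model vectors
        return _collection("VectorExpr", form)
    if isinstance(form, dict):
        return _collection("MapExpr", [*form.keys(), *form.values()])
    if isinstance(form, (set, frozenset)):
        return _collection("SetExpr", form)
    return "ConstantExpr"             # otherwise
```

Order matters here just as in the table. The empty-collection test must come before the `ISeq` test, so `()` yields an `EmptyExpr` rather than going to the `ISeq` analyzer.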

_drafts/2025-04-17-symbolic-of-what.md renamed to _posts/2025-10-02-symbolic-of-what.md

Lines changed: 12 additions & 16 deletions
````diff
@@ -1,7 +1,7 @@
 ---
 layout: post
 title: C4 - Symbolic of what?
-date: 2023-04-17 00:00:00 -0500
+date: 2025-10-02 00:00:00 -0500
 categories: general
 ---
 
````
````diff
@@ -10,14 +10,12 @@ We look at the interpretation of symbols in Clojure code.
 
 ## Introduction
 
-Symbols but are given meaning by a complex web of interactions among the Lisp reader,
+Symbols are given meaning by a complex web of interactions among the Lisp reader,
 namespaces, the Clojure compiler, and the Clojure runtime.
 
 We'll skip the reader, though the interpretation of symbols as discussed below does come into play a bit in the reading of syntax-quote (` `` `) forms. But that's a bit off the path we need to travel.
 
-The code for resolving symbols and translating them in context into nodes in the abstract syntax tree (AST) is complex.
-There are appear to be some reduncancies that could be eliminated, along with a few other simplifications.
-But for that, I needed more clarity on the rules for symbol interpretation. What follows is not complete, by any means, but it is a starting point.
+The code for resolving symbols and translating them into nodes in the abstract syntax tree (AST) is complex. In fact, there appear to be some redundancies that could be eliminated, along with a few other simplifications. But let us proceed with the code we have.
 
 ## An example
 
````
````diff
@@ -53,32 +51,30 @@ Within the call to `f`, we must interpret each symbol that occurs in the form, i
 
 
 ```Clojure
-f x y ns2/g namespace.with.a.long.name/h System.Int64 String/ToUpper System.Text.StringBuilder
+f x y ns2/g namespace.with.a.long.name/h Int64/MaxValue String/.ToUpper System.Text.StringBuilder
 ```
 
 `x` and `y` are easy. They do not have a namespace, so they could be local bindings. Local binding takes precedence over other possible interpretations. Indeed, the current context has local bindings for those symbols. The analyzer will produce `LocalBindingExpr` nodes for them.
 
 `f` also does not have a namespace. However, it is not bound in the current lexical scope.
-It does not have a namespace, so it does not refer to directly or indirectly (via an alias) to a namespace.
+It does not have a namespace, so we don't need to figure out what its namespace actually is.
 The remaining option is that it has a mapping in the current namespace. It does, to a `Var`, and that is what we use. The analyzer will produce a `VarExpr` node for it.
 
-`ns2/g` is a bit more complicated. It has a namespace, so it can't be a local binding. We need to determine what namespace `ns2` stands for. This requires looking up `ns2` in the current namespace. The current namespace is `ns1`, which has an alias for `namespace.with.a.long.name`. So we look up `g` in `namespace.with.a.long.name`, finding a `Var`. We also check to see if `g` is private. It is not, so we can use it. The analyzer will also produce a `VarExpr` node.
+`ns2/g` is a bit more complicated. It has a namespace, so it can't be a local binding. We need to determine what namespace `ns2` stands for. This requires looking up `ns2` in the current namespace. The current namespace is `ns1`, which has `ns2` as an alias for `namespace.with.a.long.name`. We look up `g` in `namespace.with.a.long.name`, finding a `Var`. We also check to see if `g` is private. It is not, so we can use it. The analyzer will produce a `VarExpr` node.
 
 `namespace.with.a.long.name/h` is also easy. `namespace.with.a.long.name` is not an alias but the name of an existing namespace. And `h` is a public `Var` in that namespace. So we can use it. The analyzer will produce a `VarExpr` node for it.
 
 Next consider `Int64/MaxValue`. It does have a namespace, so it can't be a local. We check whether `Int64` is a namespace alias; it is not.
-However, the `ns1` namespace does have a mapping from the symbol `Int64` to the type `System.Int64`. (By default, all namespaces are set up with mappings to 'system' types from their unqualified names.) So we have a symbol with the namespace mapping to a type. We must check to see if the name of the symbol, in this case `MaxValue` is a property or field in that type. There is such a property in the type `System.Int64`, so we can use it. The analyzer will produce a `StaticFieldExpr` node for it.
+However, the `ns1` namespace does have a mapping from the symbol `Int64` to the type `System.Int64`. (By default, all namespaces are set up with mappings to 'system' types from their unqualified names.) So we have a symbol with the namespace mapping to a type. We must check to see if the name of the symbol, in this case `MaxValue`, is a property or field in that type. `System.Int64.MaxValue` exists, so the analyzer will produce a `StaticFieldExpr` node.
 
-`String/.ToUpper` is similar. In this case, because this symbol appears in the functional position of function invocation,
-given that `String` maps to `System.String`, we look for methods also. Beacause the name starts with a period, we look for an instance method, and find one. In this case, there will not be a node separately for `String/.ToUpper`; rather, the analyzer will create an `InstanceMethodExpr` node for the entire expression.
+`String/.ToUpper` is similar. In this case, because this symbol appears in the function position of an invocation and `String` maps to `System.String`, we also look for methods. Because the name starts with a period, we look for an instance method, and find one. In this case, there will not be a separate node for `String/.ToUpper`; rather, the analyzer will create an `InstanceMethodExpr` node for the entire expression.
 
-Finally, we have `System.Text.StringBuilder`. When we have a symbol with no namespace and periods in the name, we look for a type.
-In this case, we do find a type. If it didn't name a type, we would go on and treat the same as a symbol with no periods. (And probably fail). To express the type in the AST, the analyzer will create a `ConstantExpr` node.
+Finally, we have `System.Text.StringBuilder`. When we have a symbol with no namespace and periods in the name, we look for a type. In this case, we do find a type. If it didn't name a type, we would go on and treat it the same as a symbol with no periods (and probably fail). To express the type in the AST, the analyzer will create a `ConstantExpr` node.
 
 
 ## A look at the code
 
-We can profitably take a look at the actual C# code for `Compiler.AnalyzeSymbol`.
+Now that we are warmed up, we can profitably look at the actual C# code for `Compiler.AnalyzeSymbol`.
 
 ```C#
 private static Expr AnalyzeSymbol(Symbol symbol)
````
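The symbol walk-through in this hunk is essentially a lookup order. Here is a minimal Python sketch of that order. It uses hypothetical `Var`, `ClrType`, and `Namespace` records in place of the real runtime types, and it returns the name of the node kind rather than building nodes. It skips the `ns` special case and does not model function-position handling, where `String/.ToUpper` becomes part of an `InstanceMethodExpr`.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    ns: str
    name: str
    private: bool = False


@dataclass
class ClrType:
    full_name: str
    static_fields: set = field(default_factory=set)
    static_properties: set = field(default_factory=set)


@dataclass
class Namespace:
    name: str
    mappings: dict = field(default_factory=dict)  # unqualified name -> Var or ClrType
    aliases: dict = field(default_factory=dict)   # alias -> namespace name


def resolve_symbol(sym, local_names, current, namespaces, types):
    """Return the kind of node the walk-through would produce for sym."""
    if "/" in sym and sym != "/":
        ns_part, _, name = sym.partition("/")
    else:
        ns_part, name = None, sym

    if ns_part is None:
        if name in local_names:                   # locals win
            return "LocalBindingExpr"
        if "." in name and name in types:         # dotted name of a type
            return "ConstantExpr"
        target = current.mappings.get(name)       # mapping in the current ns
        if isinstance(target, Var):
            return "VarExpr"
        if isinstance(target, ClrType):
            return "ConstantExpr"
        raise NameError(f"Unable to resolve symbol: {sym}")

    # A namespace part: try an alias first, then a namespace's own name.
    ns_name = current.aliases.get(ns_part, ns_part if ns_part in namespaces else None)
    if ns_name is not None:
        var = namespaces[ns_name].mappings.get(name)
        if not isinstance(var, Var):
            raise NameError(f"No such var: {sym}")
        if var.private and ns_name != current.name:
            raise PermissionError(f"var: {sym} is not public")
        return "VarExpr"

    # Otherwise the namespace part may map to a type.
    owner = current.mappings.get(ns_part) or types.get(ns_part)
    if isinstance(owner, ClrType):
        if name in owner.static_fields:
            return "StaticFieldExpr"
        if name in owner.static_properties:
            return "StaticPropertyExpr"
        if name == "new":
            return "QualifiedMethodExpr (constructor)"
        kind = "instance" if name.startswith(".") else "static"
        return f"QualifiedMethodExpr ({kind})"
    raise NameError(f"Unable to resolve symbol: {sym}")


# The namespace setup from the example in the post.
int64 = ClrType("System.Int64", static_fields={"MaxValue"})
string = ClrType("System.String")
builder = ClrType("System.Text.StringBuilder")
long_ns = Namespace("namespace.with.a.long.name",
                    mappings={"g": Var("namespace.with.a.long.name", "g"),
                              "h": Var("namespace.with.a.long.name", "h")})
ns1 = Namespace("ns1",
                mappings={"f": Var("ns1", "f"), "Int64": int64, "String": string},
                aliases={"ns2": long_ns.name})
namespaces = {ns1.name: ns1, long_ns.name: long_ns}
types = {t.full_name: t for t in (int64, string, builder)}

for s in ["f", "x", "ns2/g", "namespace.with.a.long.name/h",
          "Int64/MaxValue", "String/.ToUpper", "System.Text.StringBuilder"]:
    print(s, "=>", resolve_symbol(s, {"x", "y"}, ns1, namespaces, types))
```

Run against the example namespace, the sketch gives these results, which match the walk-through:

- `x` gives `LocalBindingExpr`.
- `f`, `ns2/g`, and `namespace.with.a.long.name/h` give `VarExpr`.
- `Int64/MaxValue` gives `StaticFieldExpr`.
- `System.Text.StringBuilder` gives `ConstantExpr`.

On its own, `String/.ToUpper` comes back as an instance `QualifiedMethodExpr`.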
````diff
@@ -262,7 +258,7 @@ private static object ResolveIn(Namespace n, Symbol symbol, bool allowPrivate)
 
 To finish off this code, some brief comments on a few of the auxiliary methods mentioned above.
 
-`Compiler.ReferenceLocal` is called when we have identified a reference to a local binding. It does some bookkeeping needed for code-gen. Specifically, it notes the usage of the local binding in the containing function (if there is one) and any functions above that is might be nested in. This is so that we know to close over those variables when creating an instance of the function. It also notes if the local variable is the `this` variable; reference to `this` precludes static linking. But more about that in [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-04-19-functional-anatomy}).
+`Compiler.ReferenceLocal` is called when we have identified a reference to a local binding. It does some bookkeeping needed for code-gen. Specifically, it notes the usage of the local binding in the containing function (if there is one) and in any enclosing functions it might be nested in. This is so that we know to close over those variables when creating an instance of the function. It also notes if the local variable is the `this` variable; reference to `this` precludes static linking. But more about that in [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-10-04-functional-anatomy}).
 
 `Compiler.RegisterVar` is similar. It just notes the reference to the `Var` in the containing function (if there is one). A field in the class implementing the function will be created and initialized to the `Var` in question.
 
````
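The bookkeeping described for `Compiler.ReferenceLocal` and `Compiler.RegisterVar` can be sketched in a few lines of Python. `FnScope`, `reference_local`, and `register_var` are hypothetical. The sketch tracks captures per function, which may be coarser than the real compiler's bookkeeping. It also treats `this` as a local that every function binds for itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FnScope:
    """Hypothetical model of one function under analysis."""
    name: str
    parent: Optional["FnScope"] = None
    params: set = field(default_factory=set)       # locals bound by this function
    closes: set = field(default_factory=set)       # outer locals it must capture
    var_slots: dict = field(default_factory=dict)  # Var name -> field index
    uses_this: bool = False

    def __post_init__(self):
        # Sketch convention: every function binds its own `this`.
        self.params.add("this")


def reference_local(fn: FnScope, local: str) -> None:
    """Record a reference to `local` from inside `fn`."""
    scope = fn
    # Every function between the use and the binding must close over it.
    while scope is not None and local not in scope.params:
        scope.closes.add(local)
        scope = scope.parent
    if local == "this":
        fn.uses_this = True  # per the post, this rules out static linking


def register_var(fn: FnScope, var_name: str) -> int:
    """Give each distinct Var referenced by fn one field slot."""
    return fn.var_slots.setdefault(var_name, len(fn.var_slots))
```

For example, if `innermost` is nested in `inner`, which is nested in `outer`, then a reference to `outer`'s parameter `a` from `innermost` marks both `innermost` and `inner` as closing over `a`. `outer` itself is left unchanged.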
````diff
@@ -292,7 +288,7 @@ These are when the symbol does not have a namespace:
 - `ns` -- treated as a special case -- always found
 - name found in current namespace (return var) (there are variants in the resolve/lookup code that will create the `Var` if not found)
 
-Several kinds of AST nodes can be created from symbols. The details of node types are covered in [C4: AST me anything]({{site.baseurl}}{% post_url 2025-04-16-AST-me-anything}). For symbols with a namespace:
+Several kinds of AST nodes can be created from symbols. The details of node types are covered in [C4: AST me anything]({{site.baseurl}}{% post_url 2025-10-01-AST-me-anything}). For symbols with a namespace:
 
 - ns/name, ns names a `Type`, that type has a field or property with the given name => `StaticFieldExpr` or `StaticPropertyExpr`
 - ns/name, ns names a `Type`, no field or property found, name does not start with a period => `QualifiedMethodExpr`, Static
````
