DLR Expression Trees status #3296

miloszkukla · 2020-03-22T20:37:44Z


I wanted to ask what the relation/status is between "Expression Trees v2" described in [1] and the changes proposed in #158 and partially implemented by @bartdesmet.

I wasn't able to read all of that nearly 200-page document, so maybe someone knows: are some of the Expression Trees described there specific to the DLR, or could all of them (in theory) be implemented for C#?

[1] https://github.com/IronLanguages/dlr/blob/master/Docs/expr-tree-spec.pdf

Answered by bartdesmet

May 4, 2020


bartdesmet · 2020-05-04T23:50:02Z


It's complicated :-). Some historical context may be useful.

Expression trees were initially introduced in .NET Framework 3.5 with C# 3.0 and VB 9.0 as part of the LINQ feature set. There are two parts to it:

An expression tree API, in System.Linq.Expressions.
Language support to convert lambda expressions to Expression<TDelegate> (quotation).

For example:

Expression<Func<int, int>> f = x => x * 2;

is turned into

var x = Expression.Parameter(typeof(int), "x");
Expression<Func<int, int>> f = Expression.Lambda<Func<int, int>>(Expression.Multiply(x, Expression.Constant(2)), x);
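To make the equivalence concrete, here is a small top-level C# program (a sketch; the variable names are ours) that builds the tree both ways, via quotation and via the factory methods, and checks that they have the same shape and compile to the same behavior:

```csharp
using System;
using System.Linq.Expressions;

// Tree produced by the compiler from the lambda (quotation).
Expression<Func<int, int>> quoted = x => x * 2;

// The same tree assembled by hand with the System.Linq.Expressions factory methods.
var xParam = Expression.Parameter(typeof(int), "x");
Expression<Func<int, int>> manual = Expression.Lambda<Func<int, int>>(
    Expression.Multiply(xParam, Expression.Constant(2)), xParam);

Console.WriteLine(quoted.Body.NodeType);                   // Multiply
Console.WriteLine(quoted.ToString() == manual.ToString()); // True
Console.WriteLine(manual.Compile()(21));                   // 42
```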

With very few exceptions (multi-dimensional array initializers and assignment expressions come to mind), pretty much all expressions in the C# language were supported for use in expression tree lambdas. Examples include literals, variables, unary operators, binary operators, the conditional operator, member access, indexing, method and delegate invocations, new operators, etc. Statements, on the other hand, were not supported, so the following would not work:

Expression<Func<int, int>> f = x => { return x * 2; };

Thus, no blocks, or if, switch, while, for, do, foreach, using, lock, goto, etc. statements. This limitation existed both at the level of the API and in the language support.

Arguably, statements were not needed for LINQ query providers, where the lambdas converted to expression trees originate from the lowering of query expressions, or appear in fluent interface patterns with the LINQ standard query operators:

from x in xs where x > 0 select x * 2

becomes

xs.Where(x => x > 0).Select(x => x * 2)

where the lambdas may get converted to a delegate type or an expression tree type, depending on the query source (e.g. IEnumerable<T> versus IQueryable<T>). Given that query expressions do not support statements within clauses (such as where and select), statement lambdas do not occur here. But a manual attempt to use them with the fluent interface pattern will obviously be rejected by the compiler when bound to an expression tree-based signature:

xs.Where(x => { return x > 0; }).Select(x => { return x * 2; })

Prior to continuing the story, I should point out that expression trees not only represent code as data structures, but also support runtime compilation (using System.Reflection.Emit under the hood). This functionality was particularly important in .NET Framework 3.5 to implement query providers in order to run parts of the query execution locally, after fetching results from a database. That is, you take a query expression (through IQueryable<T> and IQueryProvider), translate as much as you can to some query DSL (e.g. SQL), ship it off to a database, get results back, and then materialize those to obtain an IEnumerable<T>. That typically involves running the final projection after some rewrites have been applied to it (e.g. turn a Func<Product, string> to a Func<Row, string> where Row is some data provider type that represents a row retrieved from a database).

Either way, this support is available behind the Compile method:

Expression<Func<int, int>> f = x => x * 2;
Func<int, int> g = f.Compile();
int answer = g(21);
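The "run the final projection locally" step described above can be sketched as follows. This is a hedged illustration, not how any particular provider does it: a value tuple stands in for the hypothetical Product type, an object[] stands in for the provider's Row type, and instead of rewriting member accesses with an ExpressionVisitor, the sketch composes the user's projection with an expression that rebuilds the entity from the row, then compiles the result.

```csharp
using System;
using System.Linq.Expressions;

// The user's projection, written against the entity type.
// (string Name, int Quantity) stands in for a hypothetical Product type.
Expression<Func<(string Name, int Quantity), string>> projection =
    p => p.Name + " x" + p.Quantity;

// Assumption: the provider hands back each row as an object[] of column values.
var row = Expression.Parameter(typeof(object[]), "row");

// Rebuild the entity from the row: new ValueTuple<string, int>((string)row[0], (int)row[1]).
var newEntity = Expression.New(
    typeof(ValueTuple<string, int>).GetConstructor(new[] { typeof(string), typeof(int) })!,
    Expression.Convert(Expression.ArrayIndex(row, Expression.Constant(0)), typeof(string)),
    Expression.Convert(Expression.ArrayIndex(row, Expression.Constant(1)), typeof(int)));

// Compose row => projection(newEntity) into a Func<object[], string> and compile it.
var rowProjection = Expression.Lambda<Func<object[], string>>(
    Expression.Invoke(projection, newEntity), row);
Func<object[], string> materialize = rowProjection.Compile();

Console.WriteLine(materialize(new object[] { "Widget", 3 })); // Widget x3
```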

So much for C# 3.0; on to C# 4.0 and .NET Framework 4.0. Two things happened here:

The Dynamic Language Runtime was born, with Iron* languages.
C# 4.0 added dynamic.

The DLR project extended System.Linq.Expressions by adding support for a variety of nodes, inspired by dynamic languages such as Python and Ruby. Where the original APIs were pretty much modeled after similarities between VB and C#, the new APIs ended up with some dynamic language-isms, e.g. where all statements are expressions, thus can have a value. For example, a node introduced in .NET Framework 4.0 is SwitchExpression, which is very similar to switch expressions that now exist in C#, years later. At the time, switch in C# was nothing but a statement (thus void returning). Similarly, things like LoopExpression are expressions: a loop can produce a value, namely the one passed along with the break that exits it.
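A short sketch of these "statements as expressions" nodes, using only the standard factory methods (the lambdas themselves are made up for illustration): a SwitchExpression that yields a string, and a LoopExpression whose value is supplied through a typed break label.

```csharp
using System;
using System.Linq.Expressions;

var n = Expression.Parameter(typeof(int), "n");

// SwitchExpression yields a value, much like a C# 8 switch expression.
var describe = Expression.Lambda<Func<int, string>>(
    Expression.Switch(
        n,
        Expression.Constant("many"), // default body
        Expression.SwitchCase(Expression.Constant("zero"), Expression.Constant(0)),
        Expression.SwitchCase(Expression.Constant("one"), Expression.Constant(1))),
    n).Compile();

// LoopExpression: its value is whatever is passed to Break on the int-typed break label.
var sum = Expression.Variable(typeof(int), "sum");
var done = Expression.Label(typeof(int), "done");
var sumDown = Expression.Lambda<Func<int, int>>(
    Expression.Block(
        new[] { sum },
        Expression.Assign(sum, Expression.Constant(0)),
        Expression.Loop(
            Expression.IfThenElse(
                Expression.GreaterThan(n, Expression.Constant(0)),
                Expression.AddAssign(sum, Expression.PostDecrementAssign(n)), // sum += n--
                Expression.Break(done, sum)),                                 // exit with sum
            done)),
    n).Compile();

Console.WriteLine(describe(1)); // one
Console.WriteLine(describe(7)); // many
Console.WriteLine(sumDown(4));  // 10
```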

Moreover, the nodes that were added turned out to be more primitive building blocks rather than capturing higher-level intent. So, rather than modeling the union of language constructs available in all target languages, they capture a common intersection. That is, there is no such thing as a node representing a while, do, for, or foreach loop (or whatever loop some higher level language may have). Instead, there's a LoopExpression with break and continue labels. Higher level languages that have specific loop constructs would "lower" into a LoopExpression with associated LabelTarget nodes that are used to support break and continue (if the body of the loop has a need for those).

This is quite different from the .NET 3.5 expression trees which were modeled closer to language constructs in a WYSIWYG fashion. If a language like C# were to target these expression tree APIs to support statement bodies, something like:

Expression<Action<int>> f = (int i) =>
{
  while (i > 0)
  {
    Console.WriteLine(i--);
  }
};

would not look like a LambdaExpression whose Body contains some WhileExpression node. Instead, it'd look something like the equivalent of:

Expression<Action<int>> f = (int i) =>
{
  C: // continue label

  {
    if (!(i > 0))
      goto B;
    Console.WriteLine(i--);
    goto C;
  }

  B: // break label
    ;
};

represented by a LoopExpression with two LabelTargets for the break and continue operations that govern the while loop execution behavior.
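For reference, the lowered while loop can be built directly with the .NET 4.0 APIs. This is a sketch rather than what any compiler emits: a List<int> parameter replaces Console.WriteLine (our assumption) so the output can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Equivalent of: (int i, List<int> log) => { while (i > 0) log.Add(i--); }
var i = Expression.Parameter(typeof(int), "i");
var log = Expression.Parameter(typeof(List<int>), "log");
var breakLabel = Expression.Label("break");
var continueLabel = Expression.Label("continue");

var loop = Expression.Loop(
    Expression.IfThenElse(
        Expression.GreaterThan(i, Expression.Constant(0)),
        // log.Add(i--): PostDecrementAssign yields the old value, then decrements i.
        Expression.Call(log, typeof(List<int>).GetMethod("Add")!, Expression.PostDecrementAssign(i)),
        // Condition false: jump to the break label, ending the loop.
        Expression.Break(breakLabel)),
    breakLabel,
    continueLabel);

var countDown = Expression.Lambda<Action<int, List<int>>>(loop, i, log).Compile();

var seen = new List<int>();
countDown(3, seen);
Console.WriteLine(string.Join(", ", seen)); // 3, 2, 1
```

Note that nothing in the resulting tree says "while"; a consumer only sees a LoopExpression, a conditional, and a goto to a label.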

That is, expression trees have become a code generation target, rather than a quotation mechanism that preserves the original user intent. If C# were to support statement trees using just the new APIs, runtime libraries analyzing such trees would have to implement decompiler logic to make sense of the original intent (e.g. when transpiling to another language that does have similar loop constructs).

In fact, in C# 3.0, there was already some lossy conversion, so expression trees were never really "pure" quotations. Two examples come to mind:

int x = 1;
Expression<Func<int>> f = () => x;

won't look at all like a LambdaExpression whose Body simply represents a variable. Instead, it has become a MemberExpression that looks up a field called x in some DisplayClass value captured through a ConstantExpression. That is, the "lowered" form of the closure has leaked into the expression tree. (This could have been avoided if LambdaExpression had an Environment property that maps ParameterExpression nodes onto accessor expressions.)
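The leak is easy to observe by inspecting the tree. In this sketch, the display class name is compiler-generated, so we only print it without predicting it:

```csharp
using System;
using System.Linq.Expressions;

int x = 1;
Expression<Func<int>> f = () => x;

// The body is a field access (MemberExpression) on a compiler-generated closure object,
// and that closure object is embedded as a ConstantExpression.
var member = (MemberExpression)f.Body;
var closure = (ConstantExpression)member.Expression!;

Console.WriteLine(f.Body.NodeType);    // MemberAccess
Console.WriteLine(member.Member.Name); // x
Console.WriteLine(closure.Type.Name);  // compiler-generated display class name

// The closure captures the variable, not its value, so later writes are observed.
x = 42;
Console.WriteLine(f.Compile()());      // 42
```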

The second example is nested query expressions:

Expression<Func<IEnumerable<int>, IEnumerable<int>>> f = xs => from x in xs where x > 0 select x + 1;

This won't look like some QueryExpression but rather a set of nested MethodCallExpression nodes, as if you had written:

Expression<Func<IEnumerable<int>, IEnumerable<int>>> f = xs => xs.Where(x => x > 0).Select(x => x + 1);

While this is straightforward for simple queries, it becomes more cumbersome when operators like SelectMany are involved (introducing "transparent identifiers").
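A quick way to confirm this is to walk the body of the quoted query; in this sketch, the outermost node is the Select call and its first argument is the Where call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

Expression<Func<IEnumerable<int>, IEnumerable<int>>> f =
    xs => from x in xs where x > 0 select x + 1;

// Enumerable.Select(Enumerable.Where(xs, x => x > 0), x => x + 1)
var selectCall = (MethodCallExpression)f.Body;
var whereCall = (MethodCallExpression)selectCall.Arguments[0];

Console.WriteLine(selectCall.Method.Name);           // Select
Console.WriteLine(whereCall.Method.Name);            // Where
Console.WriteLine(selectCall.Method.DeclaringType);  // System.Linq.Enumerable
Console.WriteLine(whereCall.Arguments[1].NodeType);  // Lambda (a Quote for IQueryable<T> sources)
```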

There are more examples to do with implicit conversions sneaking into expression trees as UnaryExpression nodes of type Convert, which turned out to be tricky (for compat reasons) when the Roslyn compiler was built.
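A minimal example of such a conversion node: an implicit int-to-long widening, which has no syntax at all in the source, still shows up explicitly in the tree.

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, long>> widen = i => i;

// The implicit conversion is materialized as a UnaryExpression of type Convert.
var convert = (UnaryExpression)widen.Body;
Console.WriteLine(convert.NodeType);     // Convert
Console.WriteLine(convert.Operand.Type); // System.Int32
Console.WriteLine(convert.Type);         // System.Int64
```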

More recently, support for interpolated strings ended up in expression trees, and shows up as a lowered string.Format call:

Expression<Func<int, string>> f = x => $"The answer is {x}";

looks like

Expression<Func<int, string>> f = x => string.Format("The answer is {0}", x);

This may not look like a big deal if the primary goal is to compile and evaluate the expression at runtime. However, if you're writing a query provider or some other form of expression tree transpiler (e.g. to some DSL) at runtime, where the target language supports interpolation, you're now faced with a decompilation task, at runtime, to turn string.Format style format strings back into interpolations. (Say you were compiling to C++ for some reason, you'd have to "raise" the format string back to some interpolation before "lowering" it to e.g. %d in printf format string syntax.)
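The lowered shape can be inspected directly. The exact lowering is a compiler implementation detail that may vary across compiler versions; this sketch assumes the string.Format form described above for a non-string hole:

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, string>> f = x => $"The answer is {x}";

// The body is a plain call to string.Format; the interpolation itself is gone.
var call = (MethodCallExpression)f.Body;
var format = (ConstantExpression)call.Arguments[0];

Console.WriteLine(call.Method.Name); // Format
Console.WriteLine(format.Value);     // The answer is {0}
Console.WriteLine(f.Compile()(42));  // The answer is 42
```

A transpiler targeting a language with native interpolation would have to parse "The answer is {0}" back into segments before emitting anything.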

The DLR extensions were primarily meant to support Iron* languages, producing expression trees at runtime, and having them get compiled - at runtime - into efficient IL code. They're the runtime backend for such languages.

At the same time, this enabled C# 4.0 to introduce dynamic. For example, when writing something like this:

dynamic Add(dynamic a, dynamic b)
{
  return a + b;
}

the C# compiler generates code using Microsoft.CSharp libraries to represent the dynamic + operation, something akin to:

Microsoft.CSharp.RuntimeBinder.Binder.BinaryOperation(flags, ExpressionType.Add, context, new[] { arg1, arg2 })

where arg1 and arg2 are CSharpArgumentInfo objects describing the operands. Note the use of ExpressionType.Add here; the expression tree API shows up here as well.

This whole binder business gets wrapped in a System.Runtime.CompilerServices.CallSite<T> object, which effectively contains a delegate implementing the operation at runtime. Initially, this delegate is filled in with something along these lines:

// I got two objects, a and b, please ask the binder (here, the C# compiler, at runtime) what to do

When invoked (because Add got called, e.g. with two int arguments), this delegate dispatches to Microsoft.CSharp's runtime implementation of the C# compiler (pretty much the binder, dealing with things like conversions, overload resolution, etc.), and it responds - at runtime - with some expression tree fragments that provide a test and an action, something like this:

Expression<Func<object, object, bool>> test = (a, b) => a is int && b is int;
Expression<Func<object, object, object>> action = (a, b) => (int)a + (int)b;

The CallSite<T> object then stitches these fragments together using expression tree APIs and recompiles the delegate, resulting in a "polymorphic inline cache" that speeds up subsequent invocations of the same code with arguments of the same type, e.g.:

if (a is int && b is int)
{
  return (int)a + (int)b;
}
else
{
  // I got two objects, a and b, please ask the binder (here, the C# compiler, at runtime) what to do
}

When called again, e.g. with two string operands, the "cry for help" path would be called again, and the C# compiler would respond with a way to test for strings and how to concatenate them (e.g. a MethodCallExpression targeting string.Concat). This whole exchange is a dance between a CallSite<T>, a CallSiteBinder, and a DynamicMetaObject, in case you want to dig deeper into details.
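The stitching step can be approximated by hand. This is only a sketch of the idea: the real CallSite machinery builds its rules and fallback differently, and here the fallback simply throws an exception (our placeholder) where a real call site would go back to the binder.

```csharp
using System;
using System.Linq.Expressions;

// Fragments as a binder might return them for the int + int case.
Expression<Func<object, object, bool>> test = (a, b) => a is int && b is int;
Expression<Func<object, object, object>> action = (a, b) => (int)a + (int)b;

var pa = Expression.Parameter(typeof(object), "a");
var pb = Expression.Parameter(typeof(object), "b");

// Placeholder fallback: a real call site would ask the binder for a new rule here.
var fallback = Expression.Throw(
    Expression.Constant(new InvalidOperationException("cache miss: ask the binder")),
    typeof(object));

// (a, b) => test(a, b) ? action(a, b) : fallback
var rule = Expression.Lambda<Func<object, object, object>>(
    Expression.Condition(
        Expression.Invoke(test, pa, pb),
        Expression.Invoke(action, pa, pb),
        fallback),
    pa, pb).Compile();

Console.WriteLine(rule(1, 2)); // 3
try { rule("x", "y"); }
catch (InvalidOperationException e) { Console.WriteLine(e.Message); } // cache miss: ask the binder
```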

Thus, C# 4.0 dynamic really boils down to metaprogramming at runtime. The call site is built from expression trees that are retrieved from the language-specific binder upon encountering new cases (based on the dynamic runtime type of operands). These expressions get compiled at runtime. The details are a bit more complicated, but it boils down to this in principle.
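The same machinery can be driven without the dynamic keyword by creating the call site by hand. This is a hand-written approximation of what the compiler emits for a + b on two dynamic operands; the compiler caches the site in a static field and passes the enclosing type as the context, whereas this sketch passes typeof(object) as an assumed stand-in.

```csharp
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

var binder = Microsoft.CSharp.RuntimeBinder.Binder.BinaryOperation(
    CSharpBinderFlags.None,
    ExpressionType.Add,
    typeof(object), // context type for accessibility checks (assumed stand-in)
    new[]
    {
        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
    });

var site = CallSite<Func<CallSite, object, object, object>>.Create(binder);

// First call binds the int + int case; the second misses and binds string + string.
object sum = site.Target(site, 1, 2);
object text = site.Target(site, "con", "cat");

Console.WriteLine(sum);  // 3
Console.WriteLine(text); // concat
```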

In all of this, the support for converting lambda expressions to expression trees was not enhanced in C# 4.0 (or beyond), even though new expression types were added. For example, C# 4.0 added support for named and optional parameters. "Support" for this feature in expression trees was implemented using a diagnostic check, rejecting the use in expression trees (see https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/DiagnosticsPass_ExpressionTrees.cs#L304).

So, we ended up with C# moving along (async lambdas, conditional access operators, pattern matching, switch expressions, etc. etc.), while expression tree support effectively continued to capture the state of "expressions" in the C# 3.0 timeframe. So, an older version of the language is buried within the language. One argument that was made early on, in C# 4.0 days, was compatibility concerns for LINQ providers when they'd end up seeing new node types that they were not prepared to handle.

For those interested in gory details, the introduction of a Reduce mechanism on Expression doesn't quite help with that, because not all newer language constructs can be lowered into the original C# 3.0 subset of nodes. For example:

Expression<Func<int, int, int>> f = (a, b) => Foo(bar: 100 / a, qux: 200 / b);

When named parameters are used, we need a new way to represent a MethodCallExpression with some form of ArgumentInfo child nodes that represent the binding of names (or ParameterInfo objects) to expressions. Existing libraries would not know about this new form of MethodCallExpression, or the new node would need to lower into something that already exists. The Reduce method can be used to do so, but we'd need something known as a Comma that didn't exist in the APIs before. The .NET 4.0 APIs added BlockExpression to be exactly that, so the above could become:

// a and b are the lambda's ParameterExpressions; fooMethod (hypothetical) is the MethodInfo for Foo.
var tmp1 = Expression.Parameter(typeof(int));
var tmp2 = Expression.Parameter(typeof(int));
Expression.Block(
  new[] { tmp1, tmp2 },
  Expression.Assign(tmp1, Expression.Divide(Expression.Constant(100), a)),
  Expression.Assign(tmp2, Expression.Divide(Expression.Constant(200), b)),
  Expression.Call(fooMethod, tmp1, tmp2)
);

This is needed to preserve the evaluation order of the arguments. Obviously, existing libraries would never have seen a BlockExpression (nor assignments etc.).
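A runnable variant of this lowering, using Math.Pow(double x, double y) as a stand-in for the hypothetical Foo so that the named arguments really are out of parameter order: the temporaries are assigned in source order (y's argument first) and then passed in parameter order (x, y).

```csharp
using System;
using System.Linq.Expressions;

// Lowered form of (a, b) => Math.Pow(y: 100.0 / a, x: 200.0 / b).
var a = Expression.Parameter(typeof(double), "a");
var b = Expression.Parameter(typeof(double), "b");
var tmpY = Expression.Variable(typeof(double), "tmpY");
var tmpX = Expression.Variable(typeof(double), "tmpX");
var pow = typeof(Math).GetMethod(nameof(Math.Pow), new[] { typeof(double), typeof(double) })!;

var body = Expression.Block(
    new[] { tmpY, tmpX },
    Expression.Assign(tmpY, Expression.Divide(Expression.Constant(100.0), a)), // evaluated first
    Expression.Assign(tmpX, Expression.Divide(Expression.Constant(200.0), b)), // evaluated second
    Expression.Call(pow, tmpX, tmpY));                                          // parameter order

var lowered = Expression.Lambda<Func<double, double, double>>(body, a, b).Compile();

Console.WriteLine(lowered(50, 50));                         // 16
Console.WriteLine(Math.Pow(y: 100.0 / 50, x: 200.0 / 50));  // 16
```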

And that's where we stand today. An API that was originally intended to support some form of translation of (query) expressions to DSLs, with runtime compilation support, that further evolved into a backend for the DLR and dynamic operations. But without co-evolution with front-end languages such as C# and VB.

The work I've been doing over at https://github.com/bartdesmet/roslyn/tree/ExpressionTrees is showcasing one way expression trees could capture a bigger set of language constructs (new expression types, but also statements), targeting a library built at https://github.com/bartdesmet/ExpressionFutures/tree/master/CSharpExpressions.

Furthermore, the work over at https://github.com/bartdesmet/roslyn/blob/ExpressionTreeLikeTypes/docs/features/expression-types.md shows another approach where custom quotation types could be supported (similar to there being "task-like" types for async methods).

While there have been discussions with the LDM on and off, it's unclear at this point whether expression tree evolution or more general quotation mechanisms are features worth pursuing, relative to other areas of investment. Examples of new killer applications beyond query providers would be useful to consider this once again (e.g. "code shipping" mechanisms, translation to other DSLs, maybe scientific computing, ML, etc.).

FWIW, my original interest in this originates from an "expression shipping" big data event processing system built internally at Microsoft; it needed support for newer language features, async lambdas, etc. so I decided to prototype the work at the library and compiler level to unblock that effort.


miloszkukla (Author) · 2020-05-05T19:43:42Z

In the context of LoopExpression, this repo by @jbevain may also interest someone: https://github.com/jbevain/mono.linq.expressions


bartdesmet · 2020-05-05T19:53:07Z


Sure, that's one of the possible implementations of additional expression or statement nodes, just like the work at https://github.com/bartdesmet/ExpressionFutures/tree/master/CSharpExpressions/Microsoft.CSharp.Expressions/Microsoft/CSharp/Expressions, which covers all expressions and statements up to C# 6.0, and some of the C# 7.0 and 8.0 features as a work in progress. The node types enum pretty much reflects the status, see https://github.com/bartdesmet/ExpressionFutures/blob/master/CSharpExpressions/Microsoft.CSharp.Expressions/Microsoft/CSharp/Expressions/CSharpExpressionType.cs. The corresponding Roslyn fork over at https://github.com/bartdesmet/roslyn/tree/ExpressionTrees binds to these node types when the library is referenced.
