Completing CLR interop

Generally, CLR interop works the same as JVM interop. However, there are some unique aspects of the CLR that ClojureCLR does not yet address. Namely,

ref and out parameters
Type references (including generic types, nullable types, and assembly-qualified type names)
Assembly references
Multi-dimensional arrays

My recommendations, but open to suggestions

Below you will find some analysis of each of the points outlined above. I wanted to state each problem clearly and lay out competing options where they exist. However, I do have some preferences, so let me state them clearly here.

`ref` and `out` parameters

Introduce syntactic forms for marking ref and out parameters in calls to methods.
When at least one ref or out parameter is involved in a host expression, the call is set up to return a vector whose first element is the return value and whose successive elements are the values returned through the ref and out parameters.
(Possible, not convinced) Introduce another binding form designed just for host expressions like these to avoid the overhead of vector creation.

Examples:

(let [[p q r] (.method x 12 (refparam y) (outparam)) ...] ...)
(with-results [[p q r] (.method x 12 (refparam y) (outparam)) ...] ...)

Type references

Introduce a new Lisp reader construct, probably |…| to introduce arbitrary strings into symbol names directly. These would be used on either side of a / to separately deal with namespace name versus symbol name. This could be designed so that ab|$&*()|def creates a symbol with name "ab$&*()def" or so that |…| must surround the whole name, as in |ab$&*()def|. This would allow constructs such as:

(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)

Assembly references

I’m not sure this is a complete answer, but something like the following extension to import would be nice:

(import 
  ; establishes mapping from Class1 to Some.Namespace.Class1, etc.
  '(Some.Namespace  Class1 Class2)                     
  ;  establishes mapping from Class3 to  |Some.Namespace.Class3, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] Class3 Class4)   
  ;  establishes mapping from SomeSymbol  to  |Some.Namespace.Class5, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] [Class5  SomeSymbol] [Class6 AnotherSymbol] ))

And now for the details
`ref` and `out` parameters

CLR methods allow ref and out parameters. Two problems arise:

We need to indicate in some way that ref or out parameters are being used in certain positions in method calls.
We need to assign the changed/new values to something.

Indication

Having a positive indication is necessary: CLR allows overloading on ref/out parameters. In other words, the following is legal:

class Test
  {
    static void m(int x) { ... }
    static void m(ref int x) { ... }
  }

At present, a host expression of the form (.m v x) is ambiguous. Providing type hints, as in (.m v (int x)), still does not suffice. Even with reflection at runtime, there is no way to indicate that a ref parameter is being passed.

Two solutions jump to mind:

metadata tagging: (.m v #^{:ref true} x)
A special syntactic form, in the same sense as (int x) is used: (.m v (out x)). Unfortunately, the obvious counterpart (ref x) is not available, because ref is already in use as a core Clojure function.

Metadata tagging might work. It is not possible to attach metadata to primitive values (among others), as in (.m v #^{:ref true} 12), but that is not a real limitation: a ref or out parameter needs a variable anyway. However, metadata tagging might not be compatible with solutions to the assignment problem discussed next.

Assignment semantics

ref and out parameters imply an assignment of a value. Clojure does not have an assignment semantics for local variable names. It has been suggested that we allow ref and out parameters to be filled only by atoms/vars/refs, but that precludes the efficiency of local variable bindings and unnecessarily conflates this problem with the problems being solved by those mechanisms.

Possible solutions:

Introduce a new mutable binding type solely for this application — one hesitates to do so
Multiple return values
Introduce a new binding form into the language
Something I haven’t thought of — I hope

New mutable binding type

Let’s forget I mentioned it.

Multiple return values

One way to handle ref and out parameters is to treat the changed values as multiple return values. In languages that don’t support multiple returns, this is typically done by returning a vector of values. We could just use the destructuring bind of a let to handle this. Calling a method with signature String m(Object x, ref Int32 y, out String z) would look like:

(let [ [p q r] (.m x (refparam y) (outparam)) ] ... )

This requires explicit indication of ref and out positions so that the destructuring bind for multiple return values can be distinguished from a destructuring of a seq return value from a regular method call.

Advantage: no new mechanisms are required. Host expression analysis can detect the (refparam y) or (outparam) forms syntactically and arrange for the appropriate machinery to be inserted around this call.
Disadvantage: We force a vector to be created on each call. In a tight loop, this could have a non-trivial performance impact. Small vectors can be handled fairly cheaply, but there is still a cost.
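
The shape of this approach can be tried out today without any compiler support. The sketch below is only an emulation: m-emulated is a hypothetical stand-in written in plain Clojure, not a real CLR method, and it simply returns the vector that the proposed host-expression machinery would have to build.

```clojure
;; Hypothetical stand-in for a CLR method with signature
;;   String m(Object x, ref Int32 y, out String z)
;; Instead of mutating anything, it takes the incoming value of the ref
;; parameter and returns [return-value new-y new-z].
(defn m-emulated [x y]
  (let [new-y (inc y)            ; the method "updates" the ref parameter
        new-z (str "seen " x)]   ; the method "fills in" the out parameter
    [(str x ":" y) new-y new-z]))

;; The caller picks the three values apart with ordinary destructuring.
(def result
  (let [[p q r] (m-emulated "item" 41)]
    {:return p :ref q :out r}))
```

The destructuring let is all the caller needs; the vector allocated on every call is the cost described in the disadvantage above.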

New binding form

We need to provide a scope for the variables that receive the values.

(in-out [[p q r] (.m x (refparam y) (outparam)) ...] ...)

This follows the pattern of let. Only host expressions would be allowed in the value positions. The only point would be to avoid the vector creation. A host expression not occurring within this special form could either return just the return value of the call or operate as multiple return values via a vector. The latter is most likely preferable, as it coexists with the let solution.

Something else

Please. Go for it.

Type references

Clojure uses symbols to name types in two ways:

a package-qualified symbol (one containing periods internally) is taken to name the Java class with the same character sequence
a namespace may contain a mapping from a symbol to a Java class, via import.

Resolving a symbol is the process of determining the value of a symbol during evaluation. Relevant pieces of code from Compiler.resolveIn:

 ...
	else if(sym.name.indexOf('.') > 0 || sym.name.charAt(0) == '[')
		{
		return RT.classForName(sym.name);
		}
	...
	else {
		Object o = n.getMapping(sym);
		if(o == null) {
			if(RT.booleanCast(RT.ALLOW_UNRESOLVED_VARS.deref())) {
				return sym;
			} 
			else {
				throw new Exception("Unable to resolve symbol: " + sym + " in this context");
			}
		}
		return o;
	}
}

There is similar code used by the syntax-quote processor in the Lisp reader.

Identifying types with symbol names works reasonably well for Java because package-qualified class names are syntactically compatible with symbols.

Not so for the CLR. Typenames can contain arbitrary characters. Backslashes can escape characters that do have special meaning in the typename syntax (comma, plus, ampersand, asterisk, left and right square bracket, left and right angle bracket, backslash). Fully-qualified type names can contain an assembly identifier, which involves spaces and commas. Thus, fully-qualified type names cannot be represented as symbols.

I do not see a way we can just use strings. We can use (symbol s) to construct a symbol from an arbitrary string but only by wrapping all interop statements with a syntax-quote. That can get nasty when trying to do a Type/member construct, such as:

`(~(symbol "com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434" "DoSomething") x y)
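
To make the limitation concrete, here is a small sketch in plain Clojure of what the reader does with such a name and what (symbol ...) does instead. The type name is the illustrative one used above, and the var names (type-name, read-result, built, member-sym) are made up for this example.

```clojure
(def type-name
  "com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434")

;; The reader treats commas as whitespace, so reading the text stops at
;; the first comma and yields only the leading part as a symbol.
(def read-result (read-string type-name))

;; Building the symbol programmatically keeps the whole string as its name.
(def built (symbol type-name))

;; The two-argument form puts the type name in the namespace position,
;; which is the Type/member shape used by host expressions.
(def member-sym (symbol type-name "DoSomething"))
```

The symbols exist, but there is no way to write them literally in source text, which is why the syntax-quote workaround above is needed.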

One solution would be to add new Lisp reader functionality that would allow arbitrary names and namespace names for symbols. It could be a special macro character, or a #-macro. This could be the rough equivalent to the Common Lisp |:

|

Vertical bars are used in pairs to surround the name (or part of the name) of a symbol that has many special characters in it. It is roughly equivalent to putting a backslash in front of every character so surrounded. For example, |A(B)|, A|(|B|)|, and A\(B\) all mean the symbol whose name consists of the four characters A, (, B, and ).

We would only need to do this for the namespace name and name parts, leaving the / separating namespace from name in the open. We could also simplify by surrounding the whole name and not only part of a name. The code above would become

(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)

I would recommend either |…| or #|…| as the convention.

Assembly references

As shown in the previous example, fully-qualifying type names with assembly names is uuuggly. And, in fact, we can’t do it at the moment. So how does ClojureCLR currently deal with type references? It looks for the type name in the current assembly and mscorlib (the default behavior of Type.GetType(String)). It then looks for the type name in all loaded assemblies. If there is a unique type with that name, it takes it. If there is not, then it fails.

Clojure on the JVM uses class loaders and classpath hacking to achieve type uniqueness.

ClojureCLR at the moment is not robust in handling type identity. A piece of code that evaluates properly one moment can be hosed on the next evaluation by the loading of an assembly between the two evals.
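
The fragility is easy to see in a toy model. The sketch below is not ClojureCLR's actual lookup code: it models only the "search all loaded assemblies" step, represents each assembly as a plain set of type names, and resolve-type, LibA and LibB are hypothetical names invented for the illustration.

```clojure
(defn resolve-type
  "Returns the name of the single assembly that defines type-name among
  loaded-assemblies (a map of assembly name -> set of type names).
  Throws when the type is missing or defined in more than one assembly."
  [loaded-assemblies type-name]
  (let [owners (for [[asm types] loaded-assemblies
                     :when (contains? types type-name)]
                 asm)]
    (if (= 1 (count owners))
      (first owners)
      (throw (ex-info "Type not found or ambiguous"
                      {:type type-name :owners (vec owners)})))))

;; With one assembly loaded, the name resolves.
(def before {"LibA" #{"Some.Namespace.Class1"}})

;; Loading a second assembly that defines the same name breaks the very
;; same lookup, although the calling code has not changed.
(def after (assoc before "LibB" #{"Some.Namespace.Class1"}))

(def resolved-before (resolve-type before "Some.Namespace.Class1"))
```

Calling (resolve-type after "Some.Namespace.Class1") throws, which mirrors the "evaluates properly one moment, fails the next" behavior described above.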

I don’t have a definitive answer to this. One solution is to extend namespace mapping of types to deal with this. I’m open to suggestions on the syntactical details, but something like:

(import 
  ; establishes mapping from Class1 to Some.Namespace.Class1, etc.
  '(Some.Namespace  Class1 Class2)                     
  ;  establishes mapping from Class3 to  |Some.Namespace.Class3, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] Class3 Class4)   
  ;  establishes mapping from SomeSymbol  to  |Some.Namespace.Class5, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] [Class5  SomeSymbol] [Class6 AnotherSymbol] ))

I’m guessing this would handle most cases of potential ambiguity and greatly simplify user code.
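
As a rough illustration of the mappings such an import would establish, the sketch below expands one spec into a map from local symbol to type-name string. Everything here is an assumption made for the example: expand-import-spec is a hypothetical helper, the spec is written as an ordinary vector [namespace assembly-id-or-nil & entries] with the assembly id as a string (since the |…| syntax does not exist yet), and the real import would register types rather than strings.

```clojure
(defn expand-import-spec
  "Expands one spec of the hypothetical form
     [namespace-name assembly-id-or-nil & entries]
  where each entry is a class symbol or a [class local-symbol] pair,
  into a map of local symbol -> type-name string."
  [[ns-name assembly-id & entries]]
  (into {}
        (for [entry entries
              :let [[cls local-sym] (if (vector? entry) entry [entry entry])
                    full-name (str ns-name "." cls)]]
          [local-sym (if assembly-id
                       (str full-name ", " assembly-id)
                       full-name)])))

;; No assembly: Class1 -> "Some.Namespace.Class1", and so on.
(def plain (expand-import-spec ["Some.Namespace" nil 'Class1 'Class2]))

;; Assembly plus renaming: SomeSymbol -> "Some.Namespace.Class5, Assembly id".
(def renamed
  (expand-import-spec ["Some.Namespace" "Assembly id" '[Class5 SomeSymbol]]))
```

Keeping the assembly id in the mapping rather than at each call site is what would let user code keep using short symbols.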

Multi-dimensional arrays

The JVM does not have true multi-dimensional arrays, just ragged arrays. The core Clojure functions that manipulate multi-dimensional arrays assume raggedness.

The CLR of course has ragged arrays, but it also supports true (rectangular) multi-dimensional arrays. In the implementation of the core Clojure functions on the CLR, we assumed ragged arrays. Thus, we have no support for true multi-dimensional arrays.

The functions of interest are:

(aget array idx+) — Returns the value at the index/indices. Works on arrays of all types.
(aset array idx+ val) — Sets the value at the index/indices. Works on arrays of reference types. Returns val.
(make-array class dim+) — Creates and returns an array of instances of the specified class of the specified dimension(s).
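
A short sketch of what "ragged" means for these functions in practice. It uses only existing core functions and assumes the array-of-arrays behavior described above; Object is used as the element type so that no platform-specific primitive type name is needed, and the var names are made up for the example.

```clojure
;; make-array with several dimensions builds an array of arrays.
(def grid (make-array Object 2 3))

;; Multi-index aset and aget walk one nested array per index.
(aset grid 1 2 :corner)
(def corner (aget grid 1 2))

;; Each row is an independent array, so a row can be replaced by one of
;; a different length. A rectangular array could not do this.
(aset grid 0 (object-array 5))
(def row-lengths [(alength (aget grid 0)) (alength (aget grid 1))])
```

After the replacement the two rows have lengths 5 and 3, which a true multi-dimensional CLR array would never allow.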

We could easily overload make-array to take a second argument of a vector of ints specifying the dimensions. Thus:

(make-array Int32 4 5 6)  ; => a ragged array
(make-array Int32 [4 5 6])  ; => a multi-dimensional array

Or we could just have a new function called make-multidim-array.

For aget and aset, I think overloading them in this way would not be advised due to performance implications. We can expect these functions to be called in tight loops. Better to introduce new functions:

(aget-md array idx+)
(aset-md array idx+ val)

We would also need to introduce equivalents to aset-int, etc.

I’m open to suggestions on names.
