
Completing CLR interop

dmiller edited this page Sep 13, 2010 · 6 revisions

Generally, CLR interop works the same as JVM interop. However, there are some unique aspects of the CLR that ClojureCLR does not yet address. Namely,

  • ref and out parameters (implemented)
  • Type references (including generic types, nullable types, and assembly-qualified type names)
  • Assembly references
  • Multi-dimensional arrays

My recommendations, but open to suggestions

Below you will find some analysis of each of the points outlined above. I wanted to state each problem clearly and lay out the competing options where they exist. However, I do have some preferences, so let me state them here.

Type references

Introduce a new Lisp reader construct, probably |…| to introduce arbitrary strings into symbol names directly. These would be used on either side of a / to separately deal with namespace name versus symbol name. This could be designed so that ab|$&*()|def creates a symbol with name "ab$&*()def" or so that |…| must surround the whole name, as in |ab$&*()def|. (Allowing |-quoting in the middle of a symbol seems to require simpler modifications to the existing reader code. There are still some decisions regarding \-escaping.) This would allow constructs such as:

(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)

Assembly references

I’m not sure this is a complete answer, but something like the following extension to import would be nice:

(import 
  ; establishes mapping from Class1 to Some.Namespace.Class1, etc.
  '(Some.Namespace  Class1 Class2)                     
  ;  establishes mapping from Class3 to  |Some.Namespace.Class3, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] Class3 Class4)   
  ;  establishes mapping from SomeSymbol  to  |Some.Namespace.Class5, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] [Class5  SomeSymbol] [Class6 AnotherSymbol] ))   

And now for the details.

Type references

Clojure uses symbols to name types in two ways:

  • a package-qualified symbol (one containing periods internally) is taken to name the Java class with the same character sequence
  • a namespace may contain a mapping from a symbol to a Java class, via import.

Resolving a symbol is the process of determining the value of a symbol during evaluation. Relevant pieces of code from Compiler.resolveIn:

 ...
	else if(sym.name.indexOf('.') > 0 || sym.name.charAt(0) == '[')
		{
		return RT.classForName(sym.name);
		}
	...
	else {
		Object o = n.getMapping(sym);
		if(o == null) {
			if(RT.booleanCast(RT.ALLOW_UNRESOLVED_VARS.deref())) {
				return sym;
			} 
			else {
				throw new Exception("Unable to resolve symbol: " + sym + " in this context");
			}
		}
		return o;
	}
}

There is similar code used by the syntax-quote processor in the Lisp reader.
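To make the decision order in the excerpt easier to experiment with, here is a minimal, self-contained Java sketch. It is not the actual Compiler.resolveIn: the class name SymbolResolutionSketch, the plain Map standing in for a namespace's mappings, and the allowUnresolved flag (standing in for ALLOW_UNRESOLVED_VARS) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the real clojure.lang.Compiler.
final class SymbolResolutionSketch {

    // nsMappings stands in for a namespace's symbol -> class mappings (as set up by import).
    static Object resolve(Map<String, Object> nsMappings, String sym, boolean allowUnresolved) {
        // A dotted name (or a JVM array descriptor such as "[I") is treated as a class name.
        if (sym.indexOf('.') > 0 || sym.charAt(0) == '[') {
            try {
                return Class.forName(sym);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("No such class: " + sym, e);
            }
        }
        // Otherwise the symbol is looked up in the namespace mapping.
        Object o = nsMappings.get(sym);
        if (o == null) {
            if (allowUnresolved) {
                return sym; // hand the symbol name back unresolved
            }
            throw new IllegalStateException("Unable to resolve symbol: " + sym + " in this context");
        }
        return o;
    }

    public static void main(String[] args) {
        Map<String, Object> mappings = new HashMap<>();
        mappings.put("StringBuilder", StringBuilder.class); // as if imported
        System.out.println(resolve(mappings, "java.util.ArrayList", false));
        System.out.println(resolve(mappings, "StringBuilder", false));
    }
}
```

The point to notice is that the first branch hands the symbol's character sequence straight to the class loader, which only works when type names are legal symbol names.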

Identifying types with symbol names works reasonably well for Java because package-qualified class names are syntactically compatible with symbols.

Not so for the CLR. Typenames can contain arbitrary characters. Backslashes can escape characters that do have special meaning in the typename syntax (comma, plus, ampersand, asterisk, left and right square bracket, left and right angle bracket, backslash). Fully-qualified type names can contain an assembly identifier, which involves spaces and commas. Thus, fully-qualified type names cannot be represented as symbols.
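As a rough illustration of why these names do not fit into symbols, the following Java sketch splits an assembly-qualified name into its type part and its assembly part. It is a simplification under stated assumptions: it only honors backslash escapes and square-bracket nesting, and the class and method names (AssemblyQualifiedNameSketch, split) are hypothetical, not part of ClojureCLR or the CLR.

```java
// Hypothetical sketch of splitting "TypeName, AssemblyIdentity"; simplified, not a full parser.
final class AssemblyQualifiedNameSketch {

    // Returns {typeName, assemblyName}; assemblyName is null when no assembly part is present.
    static String[] split(String aqn) {
        int depth = 0; // nesting depth of square brackets (generic arguments, arrays)
        for (int i = 0; i < aqn.length(); i++) {
            char c = aqn.charAt(i);
            if (c == '\\') {
                i++;            // skip the escaped character, whatever it is
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // First unescaped, top-level comma separates type name from assembly identity.
                return new String[] { aqn.substring(0, i).trim(), aqn.substring(i + 1).trim() };
            }
        }
        return new String[] { aqn.trim(), null };
    }

    public static void main(String[] args) {
        String[] parts = split("com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, "
                + "Culture=neutral, PublicKeyToken=b14a123334343434");
        System.out.println("type:     " + parts[0]);
        System.out.println("assembly: " + parts[1]);
    }
}
```

Both halves contain characters (spaces, commas, equals signs) that the Lisp reader treats as delimiters or otherwise cannot place inside a symbol token.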

I do not see a way we can just use strings. We can use (symbol s) to construct a symbol from an arbitrary string but only by wrapping all interop statements with a syntax-quote. That can get nasty when trying to do a Type/member construct, such as:

 `( ~(symbol "com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434" "DoSomething") x y)

One solution would be to add new Lisp reader functionality that would allow arbitrary names and namespace names for symbols. It could be a special macro character, or a #-macro. This could be the rough equivalent to the Common Lisp |:

|

Vertical bars are used in pairs to surround the name (or part of the name) of a symbol that has many special characters in it. It is roughly equivalent to putting a backslash in front of every character so surrounded. For example, |A(B)|, A|(|B|)|, and A\(B\) all mean the symbol whose name consists of the four characters A, (, B, and ).

We would only need to do this for the namespace name and name parts, leaving the / separating namespace from name in the open. We could also simplify by surrounding the whole name and not only part of a name. The code above would become

(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)

I would recommend either |…| or #|…| as the convention.
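To show how little machinery the |…| convention needs, here is a minimal Java sketch of reading one symbol token with bar quoting allowed anywhere in the token. It is an assumption-laden illustration, not the ClojureCLR reader: the class BarQuotedSymbolSketch and its parse method are hypothetical, \-escaping is deliberately left out since that decision is still open, and the first / outside bars is taken as the namespace separator.

```java
// Hypothetical sketch; not ClojureCLR's actual reader.
final class BarQuotedSymbolSketch {

    // Returns {namespace, name}; namespace is null when there is no unquoted '/'.
    static String[] parse(String token) {
        if (token.equals("/")) {
            return new String[] { null, "/" }; // the bare division symbol
        }
        StringBuilder current = new StringBuilder();
        String ns = null;
        boolean quoted = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '|') {
                quoted = !quoted;           // bars delimit quoting and are dropped
            } else if (c == '/' && !quoted && ns == null) {
                ns = current.toString();    // first unquoted '/' ends the namespace part
                current.setLength(0);
            } else {
                current.append(c);          // everything else is taken literally
            }
        }
        if (quoted) {
            throw new IllegalArgumentException("Unterminated | in symbol: " + token);
        }
        return new String[] { ns, current.toString() };
    }

    public static void main(String[] args) {
        String[] s = parse("|com.myco.mytype+nested, MyAssembly|/DoSomething");
        System.out.println("namespace: " + s[0]);
        System.out.println("name:      " + s[1]);
    }
}
```

Requiring the bars to surround the whole name would only change the validation, not the basic loop.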

Assembly references

As shown in the previous example, fully qualifying type names with assembly names is uuuggly. And, in fact, we can’t do it at the moment. So how does ClojureCLR currently deal with type references? It looks for the type name in the current assembly and mscorlib (the default behavior of Type.GetType(String)). It then looks for the type name in all loaded assemblies. If there is a unique type with that name, it takes it. If there is not, then it fails.

Clojure on the JVM uses class loaders and classpath hacking to achieve type uniqueness.

ClojureCLR at the moment is not robust in handling type identity. A piece of code that evaluates properly one moment can break on the next evaluation if an assembly loaded between the two evals introduces a second type with the same name.
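The fragility can be reproduced with a toy model. The Java sketch below is only a simulation under stated assumptions: assemblies are modeled as named sets of type names, only the "search all loaded assemblies" step is modeled (not the initial current-assembly/mscorlib lookup), and LoadedAssembliesSketch with its load and resolve methods is a hypothetical name, not ClojureCLR code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of "take the type if exactly one loaded assembly defines it".
final class LoadedAssembliesSketch {
    private final Map<String, Set<String>> assemblies = new LinkedHashMap<>();

    // Simulates loading an assembly that defines the given type names.
    void load(String assemblyName, String... typeNames) {
        assemblies.put(assemblyName, new HashSet<>(Arrays.asList(typeNames)));
    }

    // Returns "TypeName, AssemblyName" if the name is unique across loaded assemblies.
    String resolve(String typeName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : assemblies.entrySet()) {
            if (e.getValue().contains(typeName)) {
                hits.add(typeName + ", " + e.getKey());
            }
        }
        if (hits.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one type named " + typeName + ", found " + hits.size());
        }
        return hits.get(0);
    }

    public static void main(String[] args) {
        LoadedAssembliesSketch loaded = new LoadedAssembliesSketch();
        loaded.load("AssemblyA", "Some.Namespace.Class1");
        System.out.println(loaded.resolve("Some.Namespace.Class1")); // unique, so it resolves

        loaded.load("AssemblyB", "Some.Namespace.Class1");           // same name, second assembly
        try {
            loaded.resolve("Some.Namespace.Class1");
        } catch (IllegalStateException e) {
            System.out.println("now fails: " + e.getMessage());
        }
    }
}
```

The same lookup succeeds before the second load and fails after it, even though the calling code has not changed.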

I don’t have a definitive answer to this. One solution is to extend namespace mapping of types to deal with this. I’m open to suggestions on the syntactical details, but something like:

(import 
  ; establishes mapping from Class1 to Some.Namespace.Class1, etc.
  '(Some.Namespace  Class1 Class2)                     
  ;  establishes mapping from Class3 to  |Some.Namespace.Class3, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] Class3 Class4)   
  ;  establishes mapping from SomeSymbol  to  |Some.Namespace.Class5, assembly id|, etc.
  '([ |Assembly id|  Some.Namespace] [Class5  SomeSymbol] [Class6 AnotherSymbol] ))   

I’m guessing this would handle most cases of potential ambiguity and greatly simplify user code.
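To spell out which mappings each import spec is meant to establish, here is a small Java sketch that computes them. It is purely illustrative and assumes the semantics given in the comments of the example: ImportMappingSketch and mappingsFor are hypothetical names, a null assembly id stands for the plain (Some.Namespace Class1 Class2) form, and each entry is either a class name or a class name followed by the symbol it should be mapped from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the symbol -> type-name mappings the proposed import forms would create.
final class ImportMappingSketch {

    // assemblyId may be null. Each entry is {className} or {className, alias}.
    static Map<String, String> mappingsFor(String assemblyId, String namespace, String[]... entries) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String[] entry : entries) {
            String className = entry[0];
            String alias = entry.length > 1 ? entry[1] : className; // default: map from the class name
            String target = namespace + "." + className;
            if (assemblyId != null) {
                target = target + ", " + assemblyId;                // assembly-qualified form
            }
            result.put(alias, target);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mappingsFor(null, "Some.Namespace",
                new String[] { "Class1" }, new String[] { "Class2" }));
        System.out.println(mappingsFor("Assembly id", "Some.Namespace",
                new String[] { "Class5", "SomeSymbol" }, new String[] { "Class6", "AnotherSymbol" }));
    }
}
```

Once such a mapping is in the namespace, user code can refer to the short symbol and never has to spell out the assembly-qualified name.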

Multi-dimensional arrays

The JVM does not have true multi-dimensional arrays, just ragged arrays. The core Clojure functions that manipulate multi-dimensional arrays assume raggedness.

The CLR of course has ragged arrays, but it also supports true (rectangular) multi-dimensional arrays. In the implementation of the core Clojure functions on the CLR, we assumed ragged arrays. Thus, we have no support for true multi-dimensional arrays.
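The difference is easy to see on the JVM side. In the Java sketch below, a "multi-dimensional" array created through java.lang.reflect.Array.newInstance turns out to be an array of arrays whose rows can be replaced independently, which is what raggedness means. The flat buffer with row-major indexing is only a stand-in used here to illustrate what a rectangular array guarantees; it is not how the CLR is claimed to implement them, and RaggedVsRectangularSketch and offset are hypothetical names.

```java
import java.lang.reflect.Array;

// Hypothetical sketch contrasting JVM arrays-of-arrays with a rectangular layout.
final class RaggedVsRectangularSketch {

    // Row-major offset into a flat buffer for a rectangular array with the given dimensions.
    static int offset(int[] dims, int... idx) {
        if (idx.length != dims.length) {
            throw new IllegalArgumentException("Expected " + dims.length + " indices");
        }
        int off = 0;
        for (int d = 0; d < dims.length; d++) {
            if (idx[d] < 0 || idx[d] >= dims[d]) {
                throw new IndexOutOfBoundsException("Index " + idx[d] + " out of range for dimension " + d);
            }
            off = off * dims[d] + idx[d];
        }
        return off;
    }

    public static void main(String[] args) {
        // JVM: a 4x5x6 array is an array of arrays of arrays.
        int[][][] ragged = (int[][][]) Array.newInstance(int.class, 4, 5, 6);
        ragged[0] = new int[2][3]; // legal: one slice now has a different shape
        System.out.println(ragged[0].length + " vs " + ragged[1].length);

        // Rectangular stand-in: one buffer, every index tuple maps to exactly one slot.
        int[] dims = { 4, 5, 6 };
        int[] flat = new int[4 * 5 * 6];
        flat[offset(dims, 1, 2, 3)] = 42;
        System.out.println(flat[offset(dims, 1, 2, 3)]);
    }
}
```

A rectangular array has a fixed shape and a single index computation per access, while the ragged form needs one array dereference per dimension.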

The functions of interest are:

  • (aget array idx+) — Returns the value at the index/indices. Works on arrays of all types.
  • (aset array idx+ val) — Sets the value at the index/indices. Works on arrays of reference types. Returns val.
  • (make-array class dim+) — Creates and returns an array of instances of the specified class of the specified dimension(s).

We could easily overload make-array to take a second argument of a vector of ints specifying the dimensions. Thus:

(make-array Int32 4 5 6)  ; => a ragged array
(make-array Int32 [4 5 6])  ; => a multi-dimensional array

Or we could just have a new function called make-multidim-array.

For aget and aset, I think overloading them in this way would not be advised due to performance implications. We can expect these functions to be called in tight loops. Better to introduce new functions:

(aget-md array idx+)
(aset-md array idx+ val)

We would also need to introduce equivalents to aset-int, etc.

I’m open to suggestions on names.
