Completing CLR interop
Generally, CLR interop works the same as JVM interop. However, there are some unique aspects of the CLR that ClojureCLR does not yet address. Namely,
- Type references (including generic types, nullable types, and assembly-qualified type names)
- Multi-dimensional arrays
Clojure uses symbols to name types in two ways:
- a package-qualified symbol (one containing periods internally) is taken to name the Java class with the same character sequence
- a namespace may contain a mapping from a symbol to a Java class, via import.
Resolving a symbol is the process of determining the value of a symbol during evaluation.
Identifying types with symbol names works reasonably well for Java because package-qualified class names are syntactically compatible with symbols.
Not so for the CLR. Typenames can contain arbitrary characters. Backslashes can escape characters that do have special meaning in the typename syntax (comma, plus, ampersand, asterisk, left and right square bracket, left and right angle bracket, backslash). Fully-qualified type names can contain an assembly identifier, which involves spaces and commas. Thus, fully-qualified type names cannot be represented as symbols.
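To make the mismatch concrete, here is a minimal sketch that names a closed generic type by string instead of by symbol. It assumes ClojureCLR's ordinary static-method interop and the standard .NET method System.Type.GetType(String); the var names are invented for illustration, and going through strings is not the solution this page proposes.

```clojure
;; A closed generic type name in CLR type-name syntax. Written as a bare
;; symbol it would not read: the backquote is the reader's syntax-quote
;; character and the square brackets delimit a vector.
(def list-of-int-name "System.Collections.Generic.List`1[System.Int32]")

;; System.Type.GetType(String) is a standard .NET method. It returns nil
;; when the name cannot be resolved.
(def list-of-int (System.Type/GetType list-of-int-name))

(.IsGenericType list-of-int) ; true for a generic type such as List<int>
(.Name list-of-int)          ; simple name, still carrying the `1 arity marker
```

An assembly-qualified name adds spaces and commas on top of this, which is why a reader-level escape mechanism is being considered.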
One solution would be to add new Lisp reader functionality that would allow arbitrary names and namespace names for symbols. It could be a special macro character, or a #-macro. This could be the rough equivalent to the Common Lisp |:
|
Vertical bars are used in pairs to surround the name (or part of the name) of a symbol that has many special characters in it. It is roughly equivalent to putting a backslash in front of every character so surrounded. For example, |A(B)|, A|(|B|)|, and A\(B\) all mean the symbol whose name consists of the four characters A, (, B, and ).
We would only need to do this for the namespace name and name parts, leaving the / separating namespace from name in the open. We could also simplify by surrounding the whole name and not only part of a name. The code above would become
(|com.myco.mytype+nested, MyAssembly, Version=1.3.0.0, Culture=neutral, PublicKeyToken=b14a123334343434|/DoSomething x y)
The JVM does not have true multi-dimensional arrays, just ragged arrays. The core Clojure functions that manipulate multi-dimensional arrays assume raggedness.
The CLR of course has ragged arrays, but it also supports true (rectangular) multi-dimensional arrays. In the implementation of the core Clojure functions on the CLR, we assumed ragged arrays. Thus, we have no support for true multi-dimensional arrays.
The functions of interest are:
- (aget array idx+): Returns the value at the index/indices. Works on arrays of all types.
- (aset array idx+ val): Sets the value at the index/indices. Works on arrays of reference types. Returns val.
- (make-array class dim+): Creates and returns an array of instances of the specified class of the specified dimension(s).
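As a small illustration of the ragged behavior these functions assume today, the sketch below uses only core functions. It assumes the symbol Object resolves through the default imports (java.lang.Object on the JVM, System.Object on the CLR); the var name is made up.

```clojure
;; Two rows of three slots each: an array whose elements are arrays.
(def ragged (make-array Object 2 3))

;; Multi-index aset/aget walk the nested arrays one level at a time.
(aset ragged 1 2 :corner)
(aget ragged 1 2)            ; => :corner

;; Each row is an independent array, so it can be swapped for one of a
;; different length. A rectangular array could not do this.
(aset ragged 0 (object-array 5))
(alength (aget ragged 0))    ; => 5
(alength (aget ragged 1))    ; => 3
```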
We could easily overload make-array to take a second argument of a vector of ints specifying the dimensions. Thus:
(make-array Int32 4 5 6) ; => a ragged array
(make-array Int32 [4 5 6]) ; => a multi-dimensional array
Or we could just have a new function called make-multidim-array.
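A rough sketch of what such a function might look like follows. The name make-multidim-array comes from the text; the body is an assumption that delegates to the standard .NET method System.Array.CreateInstance(Type, params Int32[]), and it has not been checked against the ClojureCLR sources.

```clojure
;; Hypothetical implementation: builds a true rectangular array whose
;; dimension lengths are given as a vector (or any seq) of integers.
(defn make-multidim-array
  [element-type dims]
  (System.Array/CreateInstance element-type (int-array dims)))

(def cube (make-multidim-array System.Int32 [4 5 6]))

(.Rank cube)    ; number of dimensions: 3
(.Length cube)  ; total element count: 4 * 5 * 6 = 120
```

The overloaded form of make-array could dispatch to the same call whenever its second argument is a vector.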
For aget and aset, I think overloading them in this way would not be advised due to performance implications. We can expect these functions to be called in tight loops. Better to introduce new functions:
(aget-md array idx+)
(aset-md array idx+ val)
We would also need to introduce equivalents to aset-int, etc.
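For discussion, the two proposed functions could be sketched as below. The names are the ones suggested above; the bodies are an assumption built on the standard .NET methods Array.GetValue(params Int32[]) and Array.SetValue(Object, params Int32[]). The variadic, reflective form shown here would be too slow for tight loops, so a real implementation would presumably provide fixed-arity, type-hinted variants.

```clojure
;; Hypothetical sketches, not the ClojureCLR implementation.
(defn aget-md
  "Returns the element of a rectangular array at the given indices."
  [array & idxs]
  (.GetValue array (int-array idxs)))

(defn aset-md
  "Sets the element at the given indices to the last argument. Returns it."
  [array & idxs-and-val]
  (let [v    (last idxs-and-val)
        idxs (butlast idxs-and-val)]
    (.SetValue array v (int-array idxs))
    v))

;; Usage: a 4 x 5 x 6 rectangular array of Int64, chosen so that ordinary
;; Clojure integers (Int64) can be stored without a narrowing conversion.
(def cells (System.Array/CreateInstance System.Int64 (int-array [4 5 6])))
(aset-md cells 1 2 3 42)
(aget-md cells 1 2 3)  ; => 42
```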
I’m open to suggestions on names.