Extend operators of `T` to operators of `T[]` #2656

ja72 · 2019-07-15T15:20:44Z

ja72
Jul 15, 2019

Make arrays first-class citizens by allowing functions that are defined for the element type to be applied to collections and slices. Imagine if you could write code like this:

static class Program
{
    static void Main(string[] args)
    {
        double[] A = new[] { 1.0, 2.0, 3.0, 4.0, 5.0 };
        double[] B = 1.0 + A/2.0;
        double[] C = A - B;
    }
}

and have the compiler handle every CS0019 error, such as `Operator '-' cannot be applied to operands of type 'double[]' and 'double[]'` or `Operator '/' cannot be applied to operands of type 'double[]' and 'double'`, by generating code of the form

        double[] B = A.Select((x) => 1.0 + x/2.0).ToArray();
        double[] C = A.Zip(B, (x, y) => x-y).ToArray();

My proposal extends the definitions of functions, methods, and operators on type T to T[] as well, by applying them to every element of the array. In addition, the compiler could produce numerically optimized code (loop unrolling, vectorization). Although the inspiration comes from numerical simulation and languages like Fortran and Matlab, as well as Python's map(operator.add, a, b), the usage can extend to any array of structures.
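As a rough illustration of the element-wise lowering described above, here is a minimal sketch using plain loops instead of LINQ. The `Elementwise` class and its `Map` and `Zip` methods are hypothetical names invented for this example; they are not part of the proposal or of any existing library, and the length check is an assumption about how mismatched arrays might be handled.

```csharp
using System;

// Hypothetical helpers, for illustration only.
static class Elementwise
{
    // Applies f to every element of source and returns a new array.
    public static TResult[] Map<T, TResult>(T[] source, Func<T, TResult> f)
    {
        var result = new TResult[source.Length];
        for (int i = 0; i < source.Length; i++)
            result[i] = f(source[i]);
        return result;
    }

    // Combines two arrays element by element.
    // Assumption: arrays of different length are rejected rather than truncated.
    public static TResult[] Zip<T1, T2, TResult>(T1[] left, T2[] right, Func<T1, T2, TResult> f)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new TResult[left.Length];
        for (int i = 0; i < left.Length; i++)
            result[i] = f(left[i], right[i]);
        return result;
    }
}
```

With these helpers, `double[] B = 1.0 + A/2.0;` would correspond to `Elementwise.Map(A, x => 1.0 + x / 2.0)` and `double[] C = A - B;` to `Elementwise.Zip(A, B, (x, y) => x - y)`.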

Fortran can do this with elemental functions (no side effects), and so the C# compiler should be able to distinguish which operators and functions can be used in this way.

YairHalberstadt · 2019-07-15T15:24:45Z

YairHalberstadt
Jul 15, 2019
Collaborator

@ja72

Arrays are not commonly used as types for doing calculations on. IEnumerable is more commonly used, especially given that it has the advantage of not allocating many intermediate arrays when processing a collection in a pipeline.
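To make the pipeline point concrete, here is a small sketch. `LazyPipeline` and `HalveThenAddOne` are hypothetical names used only for illustration. The chained `Select` calls build a lazy sequence, so no intermediate array exists between the two steps; an array is allocated only if the caller materializes the result.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical example type, for illustration only.
static class LazyPipeline
{
    // Computes 1.0 + x / 2.0 in two lazy steps.
    // Nothing is evaluated, and no array is allocated, until the caller enumerates.
    public static IEnumerable<double> HalveThenAddOne(IEnumerable<double> source)
        => source.Select(x => x / 2.0)
                 .Select(x => 1.0 + x);
}
```

Calling `LazyPipeline.HalveThenAddOne(A).ToArray()` allocates a single result array at the end. The enumerators and delegates still have a cost of their own, which this sketch does not measure.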

Is there any reason this should apply just to arrays?

YairHalberstadt · 2019-07-15T15:34:23Z

YairHalberstadt
Jul 15, 2019
Collaborator

> Fortran can do this with elemental functions (no side effects), and so the C# compiler should be able to distinguish which operators and functions can be used in this way.

There is currently no way for the C# compiler to know if a method has side effects, since it can't see inside a method which is defined in another assembly. Changing this would be a huge task in and of itself, and would make this feature request pale in comparison.

scalablecory · 2019-07-15T16:46:16Z

scalablecory
Jul 15, 2019

I don't feel that C# needs to be extended to adequately support this. Maybe extend the Vector class to have static methods that operate on primitive arrays.
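One way to read this suggestion, as a sketch only: `System.Numerics.Vector<T>` already provides SIMD-width arithmetic, so a static helper over arrays can be written without any language change. `VectorArrayMath.Subtract` is a hypothetical name, not an existing API, and writing into a caller-supplied `result` array is an assumption made here to avoid allocating.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
static class VectorArrayMath
{
    // Computes result[i] = a[i] - b[i] using Vector<float> where possible.
    public static void Subtract(float[] a, float[] b, float[] result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All arrays must have the same length.");

        int width = Vector<float>.Count; // number of floats per hardware vector
        int i = 0;

        // Process full vector-width chunks.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            (va - vb).CopyTo(result, i);
        }

        // Process the remaining elements one at a time.
        for (; i < a.Length; i++)
            result[i] = a[i] - b[i];
    }
}
```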

brabebhin · 2019-07-15T19:17:39Z

brabebhin
Jul 15, 2019

Wouldn't extension operators (i.e., extension everything) partially fix this problem? I might be mistaken, though.

msedi · 2019-07-15T21:28:06Z

msedi
Jul 15, 2019

@ja72: We had the same problem. The first issue was that no math operations are defined on T, since there is no common numeric type. This has been discussed here very often.

The other issue, and the reason we got rid of operators on arrays, is that with very large arrays the intermediate arrays that get created put load on the GC (depending, of course, on the size of the arrays). Our data is something like 512x512, 1024x1024, 2048x2048 or even larger. Sometimes we have float volume data of 512x512x512. Let's say you have volumes A, B and C of that size and you need to write something like this:

D = (A-B)/2 + (A+B)/2 + C

Every step creates an intermediate data volume that lands in the LOH, so you put a lot of pressure on the GC. What we did, and I personally still don't like the approach, is to collapse the whole expression into a single pass by compiling a formula given as a string (because I haven't found a better way) with the C# runtime compiler. I won't go into details now, but to give you the idea, our method looks like this:

ArrayMath.ApplyFormula(string formula, params IVolume[] volume);

For the equation above you would write:

ArrayMath.ApplyFormula("D = (A-B)/2 + (A+B)/2 + C", A, B, C);

If you resolve it via runtime code compilation, you allocate and run through the data only once instead of 6 times. The performance gain and the reduction in memory consumption are enormous.

For all other primitive math operations we created methods. Assume capital letters are arrays and lowercase letters are scalar values:
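To show what a single pass means here, the following is a hand-written sketch of the fused loop for `D = (A-B)/2 + (A+B)/2 + C`. It is not the string-formula approach described above, which generates such code at runtime; `FusedFormula.Evaluate` is a hypothetical name, and flat `float[]` arrays stand in for the volume type.

```csharp
using System;

// Hypothetical example type, for illustration only.
static class FusedFormula
{
    // Evaluates D = (A - B) / 2 + (A + B) / 2 + C in one pass.
    // Only the result array is allocated; no intermediate arrays are created.
    public static float[] Evaluate(float[] a, float[] b, float[] c)
    {
        if (a.Length != b.Length || a.Length != c.Length)
            throw new ArgumentException("All arrays must have the same length.");

        var d = new float[a.Length];
        for (int i = 0; i < d.Length; i++)
            d[i] = (a[i] - b[i]) / 2 + (a[i] + b[i]) / 2 + c[i];
        return d;
    }
}
```

Evaluating the same expression operator by operator would instead allocate one full-size temporary for each intermediate result.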

float[] ArrayMath.Sub<T> (T[] A, T[] B);
float[] ArrayMath.Sub<T> (T[] A, T b);
float[] ArrayMath.Sub<T> (T a, T[] B);

For performance reasons, we also have methods that write the result into an already allocated array:

float[] ArrayMath.Sub<T> (T[] A, T[] B, T[] C);
float[] ArrayMath.Sub<T> (T[] A, T b, T[] C);
float[] ArrayMath.Sub<T> (T a, T[] B, T[] C);

As you can see, we managed to use T, which covers most primitive value types, by writing the necessary math in CIL. As you can imagine, the library has a lot of lines, since +, -, *, / all have to be implemented. And there are a lot of additional operations, e.g. those that System.Math supports: think of Pow, Log, Sqrt and many more. I absolutely hate this approach, but for performance reasons I currently see no better way.

msedi · 2019-07-15T21:32:39Z

msedi
Jul 15, 2019

@YairHalberstadt

> Is there any reason this should apply just to arrays?

You are suggesting IEnumerable instead of regular "native" arrays. Have you benchmarked the difference between using IEnumerable and Array? It would be interesting to hear your results or see your approach. My tests always ended up with the best performance working with unsafe code and pointers.

Although IEnumerable would be the most generic approach, I found it to be the slowest.

YairHalberstadt · 2019-07-15T21:38:55Z

YairHalberstadt
Jul 15, 2019
Collaborator

@msedi

It is possible to use generics to remove the performance costs that are associated with Linq and IEnumerable. See #2482. Whilst that is pretty ugly without language support, if you only wanted this to work with arrays used as the backing collection it would be possible to make it much simpler.

msedi · 2019-07-15T21:48:01Z

msedi
Jul 15, 2019

@YairHalberstadt: I have read the exploration in your post; that was excellent work. I was not really sure whether what you describe works today or is only a look ahead at what could be possible.

On the other hand, since Span came along, I'm not sure whether using Span for these things wouldn't be better, at least when it comes to slicing.
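A minimal sketch of the slicing idea, assuming a hypothetical `SpanMath.Subtract` helper (not an existing API): because the method takes spans, the same code works on whole arrays and on slices without copying.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class SpanMath
{
    // Computes result[i] = a[i] - b[i] over spans of equal length.
    public static void Subtract(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] - b[i];
    }
}
```

For example, `SpanMath.Subtract(a.AsSpan(1, 3), b.AsSpan(1, 3), result.AsSpan(1, 3))` touches only elements 1 through 3 of each array and leaves the rest of `result` unchanged.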

Thaina · 2019-07-17T05:06:39Z

Thaina
Jul 17, 2019

I agree that this is a good feature, but I strongly disagree with extending operators like that automatically.

We should have some other syntax that marks an intentional element-wise operation on an array (and its parallelization).

Maybe:

double[] A = new[] { 1.0, 2.0, 3.0, 4.0, 5.0 };
double[] B = 1.0 + foreach(A) / 2.0; // proposed syntax: foreach(collection) becomes Select(item => func(item))

Another problem is:

double[] C = A - B;

What happens if A and B are not the same length?
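The question matters because the `Zip`-based lowering from the original post would silently truncate: `Enumerable.Zip` stops at the end of the shorter sequence. The sketch below contrasts that with a strict variant that throws. `LengthMismatch`, `SubtractTruncating` and `SubtractStrict` are hypothetical names, and which behavior the feature should choose is left open.

```csharp
using System;
using System.Linq;

// Hypothetical example type, for illustration only.
static class LengthMismatch
{
    // Uses Enumerable.Zip, which stops at the end of the shorter input.
    public static double[] SubtractTruncating(double[] a, double[] b)
        => a.Zip(b, (x, y) => x - y).ToArray();

    // Rejects inputs of different length instead of truncating.
    public static double[] SubtractStrict(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new double[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] - b[i];
        return result;
    }
}
```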

ronnygunawan · 2019-09-04T19:16:44Z

ronnygunawan
Sep 4, 2019

Hardware intrinsics are out: https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in-net-core/
