Proposal: Removing C# stack copies syntactically via (Out-Returns) #8786

zezba9000 · 2019-02-12T19:23:19Z

Removing C# stack copies syntactically

A proposal to eliminate stack copies at the heart of C-style languages like C# when structs (aka value types) are used heavily, as is done in image manipulation, graphics in general, physics, games, Unity3D, etc. (I'm sure there are many more fields as well).

Key arguments for this feature

- A 40% performance increase on .NET Core and .NET Framework (Intel i3-7100) using this benchmark with 'USE_OUT' enabled: Link
  - The performance increase could be even higher with more complex code.
  - Not yet tested, but I'm guessing the gain is even bigger on ARM SoCs.
- Allows vector-based algorithms to be written in C# the way they are in HLSL, GLSL, Cg, etc., without performance loss due to stack copies.
- Operator precedence in vector math can be maintained, just as it is with primitive types in C#.
- Doesn't break older C# or .NET runtime versions.
- .NET Native / UWP: tablets and laptops save battery life.
- CoreRT: even more performance gains.
- Mono, and in turn WASM: major performance gains.
- IL2CPP, and in turn Unity3D: major performance gains.
- No static analysis needed.

The issue!

- C# has no syntactic way to create an operator method that doesn't needlessly copy stack memory ('out' cannot be used in operators).
  - This causes a huge performance loss that could otherwise be avoided.
- C# has no elegant way to create methods, as used heavily in vector math, that return a non-primitive struct result.
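For comparison, this is roughly what can be written in C# today (a minimal sketch; this `Vec3` with uppercase fields and the named `Add` method are illustrative, not an existing API). The operator has to return its result by value, while an ordinary static method can write into an `out` parameter owned by the caller, at the cost of losing operator syntax at the call site.

```csharp
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Operators must return by value; an 'out' parameter is not allowed here.
    public static Vec3 operator +(in Vec3 a, in Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // A regular method can write straight into caller-provided storage.
    public static void Add(in Vec3 a, in Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}
```

With these two members, `var sum = a + b;` and `Vec3.Add(in a, in b, out var sum);` compute the same value; the proposal is about giving the second form the call-site shape of the first.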

A potential solution!

Use an existing or new keyword or attribute that tells the compiler an 'out' parameter can be used like a return value at the call site. Don't let your hypothalamus get in your way; there is a reason!

Here is some initial pseudocode

struct Mat3
{
    public Vec3 x, y, z;
}

struct Vec3
{
    public float X, Y, Z;

    // existing by-value operator
    /*public static Vec3 operator+(Vec3 p1, in Vec3 p2)// SLOW
    {
        p1.X += p2.X;
        p1.Y += p2.Y;
        p1.Z += p2.Z;
        return p1;
    }*/

    // 'result' functions as an 'out' parameter
    public static void operator+(in Vec3 p1, in Vec3 p2, return Vec3 result)// FAST
    {
        result.X = p1.X + p2.X;
        result.Y = p1.Y + p2.Y;
        result.Z = p1.Z + p2.Z;
    }

    // ALTERNATIVE: 'value' like used in property setter could be used here
    public static out Vec3 operator+(in Vec3 p1, in Vec3 p2)// FAST
    {
        value.X = p1.X + p2.X;
        value.Y = p1.Y + p2.Y;
        value.Z = p1.Z + p2.Z;
    }

    public static void operator*(in Vec3 p1, float p2, return Vec3 result) {...}
    public float Dot(in Vec3 vec) {...}// primitives have no performance gain using 'out'
    public void Transform(in Mat3 matrix, return Vec3 result) {...}
}

void Foo(in Vec3 a, in Vec3 b, in Vec3 c, in Mat3 m)
{
    var v = a.Transform(m) + b * c.Dot(c);// methods / operators can be invoked in a single easy to read line.
}

If the 'return' keyword isn't the best choice, there are other options; this is more of a minor syntax issue.

// example version:
  public static void operator+(in Vec3 p1, in Vec3 p2, return Vec3 result) {...}

// example version 2:
  public static out Vec3 operator+(in Vec3 p1, in Vec3 p2) {...}

// OR:
  public static void operator+(in Vec3 p1, in Vec3 p2, [return] out Vec3 result) {...}

// OR:
  public static void operator+(in Vec3 p1, in Vec3 p2, [OutReturn] out Vec3 result) {...}

// OR:
  [return:OutReturn(???)]
  public static void operator+(in Vec3 p1, in Vec3 p2, out Vec3 result) {...}

// OR: Others ???

HLSL Comparison

Say we wanted to run a color manipulation algorithm. Here is some standard code that might be used in HLSL; now imagine a similar algorithm written in C#. I'm showing this only to give a frame of reference for why this kind of syntax should work this way while still achieving maximum performance.

float brightness;
float alpha;

void main(in v2p IN, out float4 OUT : COLOR)
{
    float3 color0 = tex2D(tex[0], IN.Texcoord0);
    float3 color1 = tex2D(tex[1], IN.Texcoord1);
    float3 mask   = tex2D(tex[2], IN.Texcoord2);

    OUT.rgb = brightness * (color0 * mask + color1 * (1.0 - mask));
    OUT.a = 1.0;
}

So let's take this line: "OUT.rgb = brightness * (color0 * mask + color1 * (1.0 - mask));"
If I want the best performance here in C#, this becomes very verbose and hard to read.
As you can see below, writing performant vector-based code in C# isn't fun.

float brightness;
float alpha;

void GetColor(in Vec4 color0, in Vec4 color1, in Vec4 mask, out Vec4 result)
{
    color0.Mul(color0, mask, out var color0MulResult);
    mask.Sub(1.0f, mask, out var maskSubResult);
    color1.Mul(color1, maskSubResult, out var color1MulResult);
    color0MulResult.Add(color0MulResult, color1MulResult, out var someResult);
    someResult.Mul(brightness, someResult, out result);

    // Now compare this single line below with the confusion above (ugh).
    // NOTE: In C# this line currently runs MUCH SLOWER than the crazy lines above.
    result = brightness * (color0 * mask + color1 * (1.0f - mask));
}
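To make the comparison concrete, here is a sketch of both styles side by side. `Vec4`, its operators, and the static `Add`/`Mul`/`Sub` helpers are hypothetical stand-ins for a vector library (the snippet above calls instance methods; static ones are used here for clarity). Both functions compute the same blend; only the second writes every intermediate into caller-provided storage.

```csharp
// Hypothetical 4-component vector used only to compare the two coding styles.
public struct Vec4
{
    public float X, Y, Z, W;

    public Vec4(float x, float y, float z, float w) { X = x; Y = y; Z = z; W = w; }

    // Operator style: readable, each operator returns a new value.
    public static Vec4 operator +(in Vec4 a, in Vec4 b) => new Vec4(a.X + b.X, a.Y + b.Y, a.Z + b.Z, a.W + b.W);
    public static Vec4 operator *(in Vec4 a, in Vec4 b) => new Vec4(a.X * b.X, a.Y * b.Y, a.Z * b.Z, a.W * b.W);
    public static Vec4 operator *(float s, in Vec4 v) => new Vec4(s * v.X, s * v.Y, s * v.Z, s * v.W);
    public static Vec4 operator -(float s, in Vec4 v) => new Vec4(s - v.X, s - v.Y, s - v.Z, s - v.W);

    // 'out' style: each helper writes into storage supplied by the caller.
    public static void Add(in Vec4 a, in Vec4 b, out Vec4 r) { r.X = a.X + b.X; r.Y = a.Y + b.Y; r.Z = a.Z + b.Z; r.W = a.W + b.W; }
    public static void Mul(in Vec4 a, in Vec4 b, out Vec4 r) { r.X = a.X * b.X; r.Y = a.Y * b.Y; r.Z = a.Z * b.Z; r.W = a.W * b.W; }
    public static void Mul(float s, in Vec4 v, out Vec4 r) { r.X = s * v.X; r.Y = s * v.Y; r.Z = s * v.Z; r.W = s * v.W; }
    public static void Sub(float s, in Vec4 v, out Vec4 r) { r.X = s - v.X; r.Y = s - v.Y; r.Z = s - v.Z; r.W = s - v.W; }
}

public static class ColorBlend
{
    // One readable expression, mirroring the HLSL line.
    public static Vec4 WithOperators(float brightness, in Vec4 c0, in Vec4 c1, in Vec4 mask)
        => brightness * (c0 * mask + c1 * (1.0f - mask));

    // Same math, one named temporary per step.
    public static void WithOut(float brightness, in Vec4 c0, in Vec4 c1, in Vec4 mask, out Vec4 result)
    {
        Vec4.Mul(in c0, in mask, out var c0Masked);
        Vec4.Sub(1.0f, in mask, out var inverseMask);
        Vec4.Mul(in c1, in inverseMask, out var c1Masked);
        Vec4.Add(in c0Masked, in c1Masked, out var sum);
        Vec4.Mul(brightness, in sum, out result);
    }
}
```

Any speed difference between the two depends on the runtime and JIT version; the sketch only shows that the results match while the readability gap is large.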

Why does this happen?

Every time anything is returned in C#, its value must first be stored on the callee's stack and then copied back to the caller's stack before the callee unwinds. If you use an 'out' parameter, however, this extra copy is explicitly avoided. A quick look at the IL difference is very telling.

public struct Vec3
{
    public float x, y, z;

    public void Foo(out Vec3 result)// FAST
    {
        result = new Vec3();
    }
    /*.method public hidebysig 
    instance void Foo (
    [out] valuetype Vec3& result
    ) cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 9 (0x9)
        .maxstack 8

        IL_0000: nop
        IL_0001: ldarg.1
        IL_0002: initobj Vec3
        IL_0008: ret
    } // end of method Vec3::Foo*/

    public Vec3 Foo2()// SLOW
    {
        return new Vec3();
    }
    /*.method public hidebysig 
    instance valuetype Vec3 Foo2 () cil managed 
    {
        // Method begins at RVA 0x205c
        // Code size 15 (0xf)
        .maxstack 1
        .locals init (
        [0] valuetype Vec3,
        [1] valuetype Vec3
        )

        IL_0000: nop
        IL_0001: ldloca.s 0
        IL_0003: initobj Vec3
        IL_0009: ldloc.0
        IL_000a: stloc.1
        IL_000b: br.s IL_000d

        IL_000d: ldloc.1
        IL_000e: ret
    } // end of method Vec3::Foo2*/
}

How to handle older C# or .NET versions

Take the example below

// C# pseudocode
void Foo(return Vec3 result)
{
    result = new Vec3();
}

// ALTERNATIVE: C# pseudocode
out Vec3 Foo()
{
    value = new Vec3();
}

// IL pseudocode
/*.method public hidebysig 
instance void Foo (
[out, return] valuetype Vec3& result// <<< <<< NOTE: 'return' attribute <<< <<<
) cil managed 
{
    // Method begins at RVA 0x2050
    // Code size 9 (0x9)
    .maxstack 8

    IL_0000: nop
    IL_0001: ldarg.1
    IL_0002: initobj Vec3
    IL_0008: ret
} // end of method Vec3::Foo*/

To call the example code above in older C# versions one must do:

void Main()
{
    Foo(out var result);
}

To call the example code in newer C# versions one can do:

void Main()
{
    var result = Foo();
    // OR: Foo(out var result);
}

And finally, just as the compiler gives an error for methods that differ only in return type, it would do the same for 'out-return' types. All the example methods below conflict with one another if defined in the same type.

void Foo(return Mat3 result) {...}
void Foo(return Vec3 result) {...}
Vec3 Foo() {...}

Final thoughts

Given the major performance gains and the relatively small changes needed, I see this as a big win.

HaloFour · 2019-02-12T19:58:38Z

A hacky way to implement these non-copy operators as fluent methods in C# today would be to use ref returns combined with an out parameter which is used to allow the consumer to provide a safe reference which can be returned by the method:

https://gist.github.com/HaloFour/5bf3f2f3a48183aeaab26a3c5aad25ab

Works great on .NET Core, not so much on .NET Framework.

Of course that's not remotely as nice to use as actual proper operators.

HaloFour · 2019-02-12T20:12:27Z

Theoretically supporting out for the "return value" of an operator wouldn't require any new syntax, just that the compiler recognize a new form of signature for the operator, both when defining the operator and when consuming the operator:

public struct BigStruct {
    public double x, y, z;

    public static void operator +(in BigStruct p1, in BigStruct p2, out BigStruct result) {
        result.x = p1.x + p2.x;
        result.y = p1.y + p2.y;
        result.z = p1.z + p2.z;
    }
}

BigStruct a, b, c;
// init structs here

BigStruct result = a + b + c;
// compiled into the equivalent of
a.op_Addition(b, out BigStruct $temp1);
$temp1.op_Addition(c, out BigStruct result);
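The lowering suggested above can already be written by hand, using a named static method in place of the not-yet-allowed `out` operator (a sketch; `BigStruct.Add` and `Lowering.AddThree` are hypothetical names). Each intermediate result gets its own temporary, just like `$temp1` in the comment above.

```csharp
public struct BigStruct
{
    public double x, y, z;

    // Stand-in for the suggested 'out' operator; C# operators cannot declare 'out' parameters today.
    public static void Add(in BigStruct p1, in BigStruct p2, out BigStruct result)
    {
        result.x = p1.x + p2.x;
        result.y = p1.y + p2.y;
        result.z = p1.z + p2.z;
    }

    // Conventional by-value operator, kept for comparison.
    public static BigStruct operator +(in BigStruct p1, in BigStruct p2)
    {
        Add(in p1, in p2, out BigStruct result);
        return result;
    }
}

public static class Lowering
{
    // Hand-written equivalent of 'BigStruct result = a + b + c;' under the suggested lowering:
    // one temporary per intermediate value, then the final call writes into the caller's variable.
    public static void AddThree(in BigStruct a, in BigStruct b, in BigStruct c, out BigStruct result)
    {
        BigStruct.Add(in a, in b, out BigStruct temp1);
        BigStruct.Add(in temp1, in c, out result);
    }
}
```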

zezba9000 · 2019-02-12T20:22:05Z · Author

> A hacky way to implement these non-copy operators as fluent methods in C# today would be to use ref returns combined with an out parameter

Just pointing it out, but keep in mind that's still slower, which will start to add up in more complex code.

> Theoretically supporting out for the "return value" of an operator wouldn't require any new syntax

True, and while operators are a major issue here, I should point out there are many other non-operator methods used on vectors that should be able to work the same way. As the HLSL example shows, these algorithms are normally a combination of operator and non-operator methods strung together, so ideally the proposed solution should work for both.

theunrepentantgeek · 2019-02-12T20:26:12Z

I'm wondering whether this needs any syntax change at all.

Rather, I think this should be taken care of by the compiler, perhaps with assistance from the runtime - they have all the context to do the requisite analysis of the methods and could rewrite things to achieve this speedup without imposing any burden on either the C# language or developers.

HaloFour · 2019-02-12T20:26:15Z

@zezba9000

> Just pointing it out but keep in mind thats still slower which will start to add up in more complex code situations.

The benchmarks I ran showed it to be as fast as the out parameter, at least on .NET Core. On .NET Framework it was definitely slower, as that JIT has not been optimized to understand ref returns.

zezba9000 · 2019-02-12T20:41:05Z · Author

@theunrepentantgeek

> I'm wondering whether this needs any syntax change at all.

> Rather, I think this should be taken care of by the compiler, perhaps with assistance from the runtime - they have all the context to do the requisite analysis of the methods and could rewrite things to achieve this speedup without imposing any burden on either the C# language nor on developers.

A couple of issues I see with this approach:

- A lot more work is needed, and it still won't solve the issue in many situations. C++ optimizers don't even optimize well here; I get a 30%-40% increase even in C++.
- If you don't guarantee the designed outcome syntactically, you're likely not going to get the results you want.
- It's not optimized for older runtimes like .NET Framework.
- It's not optimized for Mono and, in turn, WASM.
- It's not optimized for IL2CPP in Unity3D.

I'm not saying optimizing at the IL level is bad; I'm just saying it's not ideal in this situation. Being explicit is usually better for optimizing in this area.

@HaloFour
Oops, I guess I misread regarding .NET Core. However, solving the issue syntactically guarantees performant results on all runtimes. As you put it, .NET Framework has issues with that approach. Having a language that lets you be explicit in this area is a lot less work in the long run (more bang for your buck).

scalablecory · 2019-02-12T21:55:22Z

I'm also in favor of CLR doing RVO automatically, similar to C++, with rules that would be easy for a dev to reason about.

DarthVella · 2019-02-12T22:23:14Z

As much as I would have wanted an implementation of this to be automatic, the fact that doing so would silently change the call signature raised some red flags. It's probably best to have new syntax. It can be backed up/enforced by an analyser if the performance boost is paramount.

tannergooding · 2019-02-13T00:04:25Z · Collaborator

I've commented a bit on the Gitter channel but most of this seems like a runtime issue.

From an ABI perspective, static MyStruct Method() is effectively treated as static MyStruct* Method(MyStruct* pResult). That is (on Windows), the caller allocates space on the stack and passes the address of that local as the first parameter (ECX); on exit, it also returns the same address (EAX).

If there are codegen problems here, they are likely in the JIT and they should be updated there so that it ends up improving all existing APIs.

From the language perspective, I think allowing in and ref parameters for operators would be useful. It would allow a reasonable perf boost and elide copies that can't otherwise be elided outside of the runtime inlining the method.
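The ABI shape described above can be mimicked in safe C# to make it concrete (a sketch only; the JIT does this implicitly for structs that don't fit in return registers, and `MyStruct`, `Method`, and `MethodLowered` are illustrative names). The caller supplies the destination; the callee fills it in place and hands back a reference to that same storage.

```csharp
public struct MyStruct
{
    public long A, B, C;
}

public static class ReturnBuffer
{
    // What the source says: return a struct by value.
    public static MyStruct Method()
    {
        return new MyStruct { A = 1, B = 2, C = 3 };
    }

    // Roughly what that becomes at the ABI level: the caller passes the
    // destination, the callee writes into it and returns the same address.
    public static ref MyStruct MethodLowered(ref MyStruct pResult)
    {
        pResult.A = 1;
        pResult.B = 2;
        pResult.C = 3;
        return ref pResult;
    }
}
```

Whether a copy inside the callee is then elided is exactly the JIT question debated in this thread; the sketch only shows where the hidden pointer lives.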

zezba9000 · 2019-02-13T01:34:49Z · Author

@tannergooding I know we talked on Gitter but will post my disagreement with this solution here.

> If there are codegen problems here, they are likely in the JIT and they should be updated there so that it ends up improving all existing APIs.

This requires every relevant .NET runtime to make the same optimization for what is actually a semantic issue. Given that this will probably never happen, and that even C++ can't optimize this correctly, it may not be the right approach.

I agree the JIT should optimize many things, but not this. The primary reason is that there is no way to handle the multitude of different situations that come up if it's left up to the JIT. Fixing this syntactically, however, enforces common-sense rules just as the 'out' keyword does, guaranteeing a performance increase.

The calling convention isn't that related to the problem: "static MyStruct* Method(MyStruct* pResult)"
The issue comes from not being able to access "MyStruct* pResult" so you can set its memory directly or in steps.
You're forced into the situation below:

static Vec3 operator+(Vec3 a, Vec3 b)
{
  Vec3 result;
  result.x = a.x + b.x;
  result.y = a.y + b.y;
  result.z = a.z + b.z;
  return result;// This COPIES "Vec3 result;" back to the caller's stack memory (the copy is the issue)
}

You also can't rely on the approach below, as there is no guarantee there will be a suitable constructor.

static ThirdPartyVec3 operator+(Vec3 a, ThirdPartyVec3 b)
{
  // this constructor doesn't exist, so we can't avoid the stack copy
  return new ThirdPartyVec3(a.x + b.x, a.y + b.y, a.z + b.z); // compiler error
}

// must use
static ThirdPartyVec3 operator+(Vec3 a, ThirdPartyVec3 b)
{
  ThirdPartyVec3 result;
  result.x = a.x + b.x;
  result.y = a.y + b.y;
  result.z = a.z + b.z;
  return result;// again a copy
}

There are many other situations the JIT will be confused about.

zezba9000 · 2019-02-13T04:25:19Z · Author

@tannergooding How about the approach below? It lets the JIT do everything, BUT lets the framework developer give a hint as to how this is expected to be optimized.

Example:

[return:OutReturn]
public static Vec3 operator+(in Vec3 a, in Vec3 b)
{
	return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

When the JIT sees that attribute, it could convert it to something like this, in C terms:

void Vec3_add_Operator(Vec3* result, Vec3* a, Vec3* b)
{
	Vec3_INIT(result, a->x + b->x, a->y + b->y, a->z + b->z);
	return;
}

This at least gets rid of stack copies / memory duplication, with less flexibility, BUT it can be done entirely in the JIT while making it easier for Mono, IL2CPP, etc. to know what is expected. Everything just works. Seems like a fair trade-off to me?
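The attribute half of this idea is expressible today; only the JIT behavior would be new. A sketch of the declaration (`OutReturnAttribute` is hypothetical; no runtime or compiler recognizes it), applied to an operator's return value and visible in metadata, checked here via reflection:

```csharp
using System;

// Hypothetical marker attribute; nothing in any runtime or compiler acts on it today.
[AttributeUsage(AttributeTargets.ReturnValue, AllowMultiple = false)]
public sealed class OutReturnAttribute : Attribute
{
}

public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // The hint is attached to the return value, so existing callers and compilers are unaffected.
    [return: OutReturn]
    public static Vec3 operator +(in Vec3 a, in Vec3 b)
    {
        return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
}
```

Because the attribute is plain metadata, a JIT or an AOT toolchain could choose to honor it or ignore it without breaking the code.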

tannergooding · 2019-02-13T04:41:09Z · Collaborator

Like I said above, that is already done implicitly by the calling convention. See https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2017#return-values and https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf for more details on how each calling convention deals with this.

The CoreCLR runtime also already does this kind of optimization in some simple cases for the latest .NET Core 3.0 Preview. For example:

public struct Vector3 {
    public float X;
    public float Y;
    public float Z;
    
    public Vector3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }
    
    public static Vector3 One()
    {
        return new Vector3(1.0f, 1.0f, 1.0f);
    }
}

Vector3.One() is compiled to:

00007FFD7786F650  vzeroupper  
00007FFD7786F653  xchg        ax,ax  
00007FFD7786F655  vmovss      xmm0,dword ptr [7FFD7786F680h]  
00007FFD7786F65D  vmovss      xmm1,dword ptr [7FFD7786F684h]  
00007FFD7786F665  vmovss      dword ptr [rcx],xmm0  
00007FFD7786F669  vmovss      dword ptr [rcx+4],xmm1  
00007FFD7786F66E  vmovss      dword ptr [rcx+8],xmm1  
00007FFD7786F673  mov         rax,rcx  
00007FFD7786F676  ret

You can also see that native compilers do the same for cases they can rationalize: https://godbolt.org/z/ZuUOBz

tannergooding · 2019-02-13T04:41:45Z · Collaborator

Some cases where the JIT wasn't doing a great job are tracked by https://github.com/dotnet/coreclr/issues/19522.

mikedn · 2019-02-13T07:07:36Z

> That is (on Windows), the caller allocates space on the stack and passes the address of that local as the first parameter (ECX), on exit it also returns the same address (EAX).

That's not true, and it's one of the reasons why optimizing away the copy that the return introduces is a bit complicated. The caller doesn't allocate any stack space; it just passes the address of the destination variable, whatever that is: a local variable, a class instance/static field, etc. That means the copy can only be elided if no exceptions can be thrown that would leave the destination variable partially assigned.

But all the ABI discussion isn't really relevant in this case because operator+ does inline and its code is trivial (no exceptions, no loops, no anything). That the JIT can't optimize away the return copy even after inlining is a different story and something that really should be fixed in the JIT. It has all kinds of problems with structs and they're not all ABI related. And many are fixable, it's just that someone has to do that work.

> just as an 'out' keyword does guaranteeing a performance increase.

No, it doesn't guarantee a performance increase. Supporting in/out parameters on operators is a double-edged sword. If the operator does inline, then they simply shouldn't be needed; the JIT should improve its struct handling story. If it doesn't inline, then you're left with address-exposed variables, which can easily kill performance in a large method such as the benchmark's trace. And if it doesn't inline, it's probably because the code is big enough that the return copy won't matter that much anyway.
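The exception constraint mentioned above (the copy can only be elided if the destination can't be left partially assigned) can be observed from C# itself. A minimal sketch with hypothetical names: when the value is built in a local and returned, a throw leaves the destination field untouched; when the callee writes straight into the destination, as an elided copy or a `ref`/`out` parameter would, the same throw leaves it half-updated.

```csharp
using System;

public struct Vec3 { public float X, Y, Z; }

public sealed class Holder
{
    public Vec3 Position = new Vec3 { X = 9, Y = 9, Z = 9 };
}

public static class PartialAssignment
{
    // Builds the value in a local; the destination is only written after a normal return.
    public static Vec3 MakeByValue(bool fail)
    {
        Vec3 r;
        r.X = 1;
        r.Y = 2;
        if (fail) throw new InvalidOperationException("failed midway");
        r.Z = 3;
        return r;
    }

    // Writes straight into the caller's storage, as an elided return copy would.
    public static void MakeInPlace(ref Vec3 destination, bool fail)
    {
        destination.X = 1;
        destination.Y = 2;
        if (fail) throw new InvalidOperationException("failed midway");
        destination.Z = 3;
    }
}
```

C# guarantees the first behavior for `h.Position = MakeByValue(true);`, so a JIT may only write directly into the field when it can prove nothing in between can throw.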

zezba9000 · 2019-02-13T07:13:55Z · Author

@mikedn

> No, it doesn't guarantee performance increase. Supporting in/out parameters on operators is a two edged sword. If the operator does inline then they simply shouldn't be needed, this JIT should improve its struct handling story. If it doesn't inline then you're left with address exposed variables, that can easily kill performance in a large method such as the benchmark's trace. And if it doesn't inline it's probably because the code is big enough that the return copy won't matter that much anyway.

I meant you can guarantee a performance increase if x64 is your target and you're optimizing for a larger struct like Vec3, as opposed to relying on runtime updates that may or may not handle struct copies or inlining correctly. It's explicit. So yes, it can be slower, but if you explicitly test that it's faster, you can guarantee it. This is why I make a point of non-primitive structs; primitives, for example, are slower with this method.

mikedn · 2019-02-13T07:20:32Z

> So yes it can be slower but if you explicitly test its faster you can guarantee it

It may be guaranteed on the particular runtime version you test with. It's not necessarily guaranteed on other runtimes or future versions of the same runtime. While I would expect a new version of the runtime not to regress code-gen quality, this has happened occasionally in the past, either as a result of an oversight or of a bug fix that made the JIT more conservative.

> Vs relaying on runtime updates that may or may not handle struct copies correctly

Using ref/out parameters also relies on the JIT to do its job correctly. If I go delete a certain piece of code from the JIT, your out version will be much slower.

mikedn · 2019-02-13T08:21:29Z

> this concept of 'out' at its core is pretty standard and doesn't require so much specialization

I'm not sure what you mean by that. The concept of out implies taking the address of the variable. Dealing with address-taken variables is not a trivial, standard matter, and different compilers may have different capabilities in this regard.

> How the calling conversions work for return types seem like something that is much more likely to differ depending on the platform and architecture.

Again, the ABI doesn't have anything to do with your case. Pretty much all the interesting methods in your benchmark are inlined.

> There is a balance between the syntax and the code generation of a lang.

Sure, but there's also a balance in how much stuff you can add to a language's syntax, especially when what you want to add isn't guaranteed to always be an improvement.

zezba9000 · 2019-02-13T08:36:04Z · Author

> Sure, but there's also a balance in how much stuff you can add to a language's syntax. Especially when what you want to add isn't guaranteed to be always an improvement.

Ignore the other stuff I said; I'm tired and going to bed after this. But if I were going to make a game targeting Xbox One using .NET Core 3 or C++, I could guarantee its performance on that system. Again, it's generally explicit (you're telling the compiler what you think should happen when it may not know what's best). Think in terms of AOT targets. This just gives devs more language tools to enforce the concepts they expect when trying to micro-manage memory at a higher level. Anyway, I hope that makes sense.

mikedn · 2019-02-13T18:43:11Z

> but if I was going to make a game and target Xbox One using .NET Core 3

OK, but does it make sense to complicate a general-purpose language like C# because you find that a certain approach happens to work well on a certain target? This sounds more like a hack than a well-designed solution.

Anyway, I took a look at the JIT-generated code. It certainly looks horrible in the non-USE_OUT case. And one of the reasons it is horrible is a certain compiler issue I already mentioned. Can you guess which one? 😁

zezba9000 · 2019-02-13T19:35:33Z · Author

C# is more than just a general-purpose language at this point (or at least it's reaching beyond that and being used for more). It may have started out that way, but by that logic you could call SIMD a hack versus just letting the compiler auto-vectorize things. Being explicit isn't a hack, just as using out explicitly isn't a hack and using SIMD isn't. Being able to simplify an expression syntactically isn't a hack; it's syntactic sugar, something C# seems to have a lot of... much of which I consider bad shorthands that make code harder to read (shorter isn't always better).

If you have to manually modify the IL in a post-build step to get this working correctly on non-.NET Core runtimes... now that's a hack (and that's what is going to happen). The JIT is basically being used as a hack for flaws in the C# language that can actually be expressed in languages like Nim, which can explicitly avoid these issues... and don't tell me the C# / .NET goals haven't changed over the years. What seems best on the surface from a classical perspective isn't always correct just because it's always been done a particular way.

A language feature and IL optimizations can live together; these principles aren't mutually exclusive.

mikedn · 2019-02-13T19:55:35Z

> Being explicit isn't a hack

It depends. What's certain is that implementing a feature based on poor/incomplete analysis is a hack. And that's exactly what you're proposing here. You have no idea why the non-USE_OUT version of your code is significantly slower than the USE_OUT version. You looked at the IL, missed the facts, jumped to conclusions, and sketched up a solution for non-existent and/or different problems.

zezba9000 · 2019-02-13T19:58:07Z · Author

> Being explicit isn't a hack

> It depends. What's for sure is that implementing a feature based on poor/incomplete analysis is a hack. And that's exactly what you're proposing here. You have no idea why the non-USE_OUT version of your code is significantly slower than the USE_OUT version. You looked at the IL, missed the facts, jumped to conclusions and sketched up solution for non-existing and/or different problems.

And I'm the one making fallacies? You're arguing a straw-man... good luck.

xen2 · 2019-02-15T03:41:52Z

> Theoretically supporting out for the "return value" of an operator wouldn't require any new syntax, just that the compiler recognize a new form of signature for the operator, both when defining the operator and when consuming the operator:

@HaloFour That would be great!
Should it be a separate issue/proposal? Or keep it as part of this thread?

CyrusNajmabadi · 2019-02-15T23:49:37Z · Collaborator

I'm with the group thinking this makes far more sense as a runtime optimization. @zezba9000 Do you want to port this to dotnet/coreclr (and maybe provide a sample impl)?

zezba9000 · 2019-02-16T19:34:35Z · Author

@CyrusNajmabadi By 'this', do you mean the C# syntax prototype idea and topic? That would be more on the Roslyn side, correct? Or did you mean solely a runtime optimization?

CyrusNajmabadi · 2019-02-18T08:06:20Z · Collaborator

@zezba9000 I meant solely as a runtime optimization.

Krakean · 2019-03-25T11:14:00Z

@zezba9000 just curious, any progress on this?

zezba9000 · 2019-03-25T23:33:44Z · Author

@Krakean I haven't looked at it beyond IL generation.

If you want to avoid the bad IL generation in C# that causes the issue, make sure you write your operator methods like so:

    public static Vec3 operator+(Vec3 p1, Vec3 p2)
    {
        return new Vec3(p1.x + p2.x, p1.y + p2.y, p1.z + p2.z);
    }

If you don't write your operator methods like the example above, Roslyn starts referencing and then dereferencing parameter values for no reason, which I can only guess makes it very hard for the JIT to understand what to optimize.

Krakean · 2019-03-26T15:24:28Z

@zezba9000 Curious then: does using in/ref/out, as shown in @HaloFour's example (https://gist.github.com/HaloFour/5bf3f2f3a48183aeaab26a3c5aad25ab), make sense, or should it be exactly return new Vec3(...)?

zezba9000 · 2019-03-26T19:05:09Z · Author

@Krakean .NET Core (and I think .NET Framework, but I can't remember) will optimize it to be the same. The example I gave with "return new Vec3(...)" is the simplest to read and, for the most part, yields the same results as the out method. That's the one you should be using.
