Proposal: Mutable methods to reduce or avoid branching #695
Replies: 18 comments
-
To me, this looks like a lot of complicated machinery and syntax, just to avoid a single field load every time the method is called. Especially considering that, if the version really doesn't change often, branch prediction should work well for such code. Also, how would this work for instance methods? You can't modify the code, since all instances share it. |
Beta Was this translation helpful? Give feedback.
-
This looks like a coroutine. C# iterator methods already provide coroutines with a similar state machine mechanism. The first block of code could be replaced with this today: private string _name;
public IEnumerable<string> Name
{
get
{
if (_name == null)
{
_name = GetRandomName();
}
while (true) yield return _name;
}
}
...
// Calling the iterator
var n = Name.GetEnumerator();
// First time, executes the if block
n.MoveNext();
Console.WriteLine(n.Current);
// Subsequent calls execute after the previous "yield", i.e. loops
n.MoveNext();
Console.WriteLine(n.Current);
n.MoveNext();
Console.WriteLine(n.Current); The call site is a bit awkward, which is where this proposal would work better. I would like to propose a tweak to the iterator syntax to allow for non-iterator coroutines. This would give us more natural call syntax, with the familiar C# private string _name;
public coroutine string Name
{
get
{
if (_name == null)
{
_name = GetRandomName();
}
while (true) yield return _name;
}
}
...
// Calling the coroutine
// First time, executes the if block
Console.WriteLine(Name);
// Subsequent calls execute after the previous "yield", i.e. loops
Console.WriteLine(Name);
Console.WriteLine(Name); |
Beta Was this translation helpful? Give feedback.
-
@brandon942 for cases the state may be cached in static readonly field you may implement the desired behavior right now. The JIT is smart enough to promote the value of the static readonly field to constant and therefore it may eliminate unneeded branches completely. Prooflink. |
Beta Was this translation helpful? Give feedback.
-
That post only specifies that it works for primitives, aka, the types that can already be expressed as |
Beta Was this translation helpful? Give feedback.
-
@svick That is a good point. Unless the state used by the instance method is static it will have to be a register-indirect jump. Instances would have to store the address offset in a field and pass it to the method. Static classes would not need that. @bondsbw yield methods return a reference-type Enumerator that contains at least 4 fields. The branching code is moved into the |
Beta Was this translation helpful? Give feedback.
-
So how is that better than just using the original code with normal |
Beta Was this translation helpful? Give feedback.
-
@svick You avoid the checks. |
Beta Was this translation helpful? Give feedback.
-
I think @Opiumtm is right, this should be proposed to the JIT team. And in order for that, you would need to have some cold hard data about why your proposal would perform better in all situations than what it does currently. For sheer example, what if branching is actually a really cheap thing to do and loading a field takes much more time overall (relatively speaking)? Then it would be useless to try and eliminate checks when it's not the main source of your code running slower than you'd like (though personally, it's all pretty negligible). Without actual numbers of how much time each operation takes individually, it's not exactly good form to just claim "do this to make the language more performant!" Plus (again) most of performance should be handled at the CLR-level before needing a feature in the language that'll expose it. |
Beta Was this translation helpful? Give feedback.
-
Btw, expanding on the idea and the good point made by @svick, it might be necessary to differentiate syntactically between local state dependent behavior and static state dependent behavior. For that, the version of methods marked as |
Beta Was this translation helpful? Give feedback.
-
And then it's not any better than the original |
Beta Was this translation helpful? Give feedback.
-
Another concern would be the reliability. Your opening post starts with the following claim:
The "only once" part is a bit different from the rest: you'd be best off setting that state in your object constructor, so that the field can be But if the field is mutable, and will change over the object lifetime, how exactly do you intend to keep track of when a check should be done or not for subsequent invocations of the method? From your own example:
Once a method has been set to jump to that label immediately, and something else changes the Remember, the onus is on you to prove (likely by way of some example IL/ASM that would have to be emitted) that your proposal will make code perform so much better, yet remains reliable in all cases that it is worth the time and money for Microsoft to implement. |
Beta Was this translation helpful? Give feedback.
-
I like this, but I don't think your syntax would be valid. The way you presented it suggests that the first version of the method shares its scope with the second version, which is not possible. private string _name;
public static string Name{
mutable get{
int x = 2; //new variable
if (_name == null){
_name = GetRandomName();
mutate(Name.get, initialized);
}
initialized:
x++; //out of scope
return _name;
}
} |
Beta Was this translation helpful? Give feedback.
-
I have some crude benchmark on this proposal. Baseline "if" scenario private void RunBenchmark()
{
gcnt = 0;
int c = 0;
bool state = false;
for (var i = 0; i < Consts.RunCount; i++)
{
c++;
if (c >= 100)
{
c = 0;
state = !state;
}
if (state)
{
gcnt += 1;
}
else
{
gcnt += 2;
}
}
} "Delegate" scenario (emulating change of routine address when state is changed at 1/100 rate to calls) using static delegates. private void RunBenchmark()
{
gcnt = 0;
int c = 0;
bool state = false;
BranchDelegate branch = FalseBranch;
for (var i = 0; i < Consts.RunCount; i++)
{
c++;
if (c >= 100)
{
c = 0;
state = !state;
branch = state ? TrueDelegate : FalseDelegate;
}
branch(this);
}
}
private static readonly BranchDelegate TrueDelegate = TrueBranch;
private static readonly BranchDelegate FalseDelegate = FalseBranch;
private static void TrueBranch(DelegateBranchCallScenario thisObj)
{
thisObj.gcnt += 1;
}
private static void FalseBranch(DelegateBranchCallScenario thisObj)
{
thisObj.gcnt += 2;
}
private delegate void BranchDelegate(DelegateBranchCallScenario thisObj);
No performance benefits. Actual results for such "optimizations" are much worse. Tested on x64 release build, .NET Native UWP, Core i7 CPU. Benchmark results for JIT-ed x64 debug build:
Not so dramatically worse, but anyway no benefits at all. JIT and .NET Native are already good at optimizations of simple and trivial branching. Update public static ulong gcnt = 0;
private static readonly BranchDelegate TrueDelegate = TrueBranch;
private static readonly BranchDelegate FalseDelegate = FalseBranch;
private static void TrueBranch()
{
gcnt += 1;
}
private static void FalseBranch()
{
gcnt += 2;
}
private delegate void BranchDelegate(); Here are results on x64 release .NET Native UWP:
Some final thoughts on this. Any "jump rewrite" scenario obviously can not be inlined by compiler or JIT, so there is much harm than benefits because of it. |
Beta Was this translation helpful? Give feedback.
-
@Joe4evr It's up to the programmer to decide when a state has changed. In the mentioned case you can switch the getter version back in the setter. Can mutable methods be inlined? They can if you store the addresses of every call site in a list. Changing the method version means changing the code on every call site in the same way. Despite the memory cost of inlined code I believe that the performance gains add up and become very significant in the end. |
Beta Was this translation helpful? Give feedback.
-
@brandon942 Again, some benchmarks on direct method call (inlined), direct method call (not inlined), via delegate call (static and instance). .NET Native Release x64
JIT-ed x64:
On JIT-ed runtime, invoke via delegate time is almost same as direct call (3.89% for static delegates isn't a number to worry about). On static compiled .NET Native call inlining is playing major role (almost 2x worse with "no inline" attribute on called method). Delegate costs are lesser than "no inline" costs. |
Beta Was this translation helpful? Give feedback.
-
Delegate calls are certainly costlier than the simple jump that was suggested. It's not the delegate call itself that it's the problem, it's the fact that you're actually calling a method and that involves argument passing and frame setup. Still, in the case of instance methods where the jump has to be indirect there's little chance that doing this would be faster than the original This is a rather "strange" optimization that's very costly to implement and has very little value compared to a zillion other optimizations that could be added to the JIT. |
Beta Was this translation helpful? Give feedback.
-
My proposed syntax does not rely on an enumerator. Why not optimize it like we are discussing here? It could be compiled into pretty much the same IL as the main proposal. |
Beta Was this translation helpful? Give feedback.
-
I do believe the ability to hotpatch methods is an interesting one, but I agree with others that this won't help the problem of optimization. Consider that you're only going to save two instructions. The
While the mutable version would be:
Both versions perform the same number of memory loads, and both will have zero latency if branch prediction / branch target prediction gets a hit. Want to benchmark? Write something in C. An optimizing compiler will translate a function pointer call with identical arguments & calling convention into a |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Method behavior is often determined by states that need to be polled by the code within the method each time the method is called. In most cases states change less frequently than the method that checks them is called, or only once. Those situations can be optimized and the performance improved with the introduction of mutable methods.
Implementation
Let me start with the implementation because it is very simple. A method marked as
mutable
always starts with ajmp
instruction that leads to the currently set version of the method. Changing the version of the method is done by overwriting the jmp instruction in memory. Since the write is atomic it is thread safe. Threads that are still executing the previous version of the method are unaffected.Syntax
Coming up with a syntax is harder. Let's look at a property getter that initializes a value when called for the first time:
Here
mutate
is a keyword,Name.get
is the method andinitialized
is the label to jump to or version name of the method.Having multiple versions of a method can lead to duplicate code in the source. To avoid code duplication (let the compiler do that) we'd need some syntactic sugar. Let's have another example.
could be rewritten as:
GetValue()
has 3 versions: default (unassigned name), versionA and versionB. The code for each is inferred.Beta Was this translation helpful? Give feedback.
All reactions