Proposal: Annotate pure methods #1157

GeirGrusom · 2017-11-29T09:11:06Z

GeirGrusom
Nov 29, 2017

I recently learned (in #1155 ) that C# will create a copy of structs if you use in since it can't trust that the code it calls on them really does not mutate the struct.

So I have a proposal: make C# respect ~~System.ComponentModel.PureAttribute~~ System.Diagnostics.Contracts.PureAttribute (or similar), and have the compiler add this attribute itself to methods that do not mutate state (i.e. that do not write to variables, with the exception of constructors, and do not call other non-pure methods). The advantage here is that the user can additionally annotate with [Pure] even if the method does mutate (such as caching the result).

Advantages:

No tooling change required
No user code changes required

Disadvantage:

Tons of code has been written without it, but all they would have to do is recompile to get it.

Why would this help?
If methods are pure, there is no reason for the compiler to produce a copy of the struct. It can just use regular good old ldloca instead of copying the value. It only needs to produce a copy if the method that gets called is non-pure.
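To make the copying concrete, here is a minimal sketch, assuming a C# 7.2 or later compiler. The names `Counter`, `Demo` and `CallTwice` are made up for illustration and do not come from #1155. Because nothing tells the compiler that `Increment` leaves the struct alone, each call through the `in` parameter runs on a hidden copy, so the mutations never reach the caller.

```csharp
using System;

// Hypothetical mutable struct, used only for illustration.
struct Counter
{
    public int Value;

    // Nothing marks this method as non-mutating, so the compiler
    // has to assume it may write to `this`.
    public int Increment() => ++Value;
}

static class Demo
{
    // `in` passes the struct by readonly reference.
    public static int CallTwice(in Counter c)
    {
        c.Increment();        // runs on a defensive copy of `c`
        return c.Increment(); // runs on another fresh copy of `c`
    }

    static void Main()
    {
        var counter = new Counter();
        int result = CallTwice(in counter);
        Console.WriteLine(result);        // 1, not 2: each call saw Value == 0
        Console.WriteLine(counter.Value); // 0: the caller's struct is untouched
    }
}
```

If the compiler knew `Increment` did not mutate, it could call it directly on the reference instead of copying first.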

DavidArno · 2017-11-29T09:14:16Z

DavidArno
Nov 29, 2017

At the risk of being a pedant, the other requirement to be a pure method is that it must not be void. But, I like the idea of the compiler auto-annotating methods that meet these requirements and thus avoiding copying structs when readonly ref is used.

0 replies

yaakov-h · 2017-11-29T11:49:50Z

yaakov-h
Nov 29, 2017

The only PureAttribute I know of is in System.Diagnostics.Contracts and I really don’t like the idea of reviving that namespace.

Perhaps make a new attribute.

0 replies

HaloFour · 2017-11-29T12:07:02Z

HaloFour
Nov 29, 2017

@yaakov-h

That attribute is already used extensively in the BCL. I don't think it makes much sense to introduce another one.

0 replies

yaakov-h · 2017-11-29T12:16:35Z

yaakov-h
Nov 29, 2017

It’s used about 600 times in .NET Framework, but virtually unused in .NET Core outside of System.Collections.Immutable.

0 replies

iam3yal · 2017-11-29T13:23:28Z

iam3yal
Nov 29, 2017

I feel like pure should be a language construct and not an attribute. Beyond optimization, the compiler should report an error when a pure function tries to access a field, IO or some other global state. Finally, pure functions cannot call non-pure functions.

In terms of implementation, it's probably obvious that pure functions are static with additional constraints.

In terms of APIs, I guess that many APIs today follow the semantics of pure so there shouldn't be a problem changing them.

Finally, it might be backward compatible with static functions that are marked with [Pure].

0 replies

GeirGrusom · 2017-11-29T14:21:30Z

GeirGrusom
Nov 29, 2017
Author

Why do pure functions have to be static? The only extra thing to consider is that any base call also has to be pure.

0 replies

DavidArno · 2017-11-29T14:36:51Z

DavidArno
Nov 29, 2017

> Why do pure functions have to be static?

If it's not static, then surely it has state (this) and thus isn't pure?

0 replies

HaloFour · 2017-11-29T14:39:04Z

HaloFour
Nov 29, 2017

@DavidArno

Does that matter terribly if the state itself is readonly/immutable? An instance method is really just syntax candy for a static method that accepts the instance as the first parameter, which would qualify for being pure.
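A small sketch of that equivalence, with hypothetical names (`Celsius`, `CelsiusFunctions`) that are not from any real API: the instance method and the static method below compute the same thing from the same immutable state, and neither has side effects.

```csharp
using System;

// Hypothetical immutable type, used only for illustration.
sealed class Celsius
{
    private readonly double _degrees;

    public Celsius(double degrees) { _degrees = degrees; }

    public double Degrees => _degrees;

    // Instance form: reads only immutable state reachable from `this`.
    public double ToFahrenheit() => _degrees * 9.0 / 5.0 + 32.0;
}

static class CelsiusFunctions
{
    // Static form: the receiver becomes an explicit first parameter.
    public static double ToFahrenheit(Celsius self) => self.Degrees * 9.0 / 5.0 + 32.0;
}

static class Program
{
    static void Main()
    {
        var boiling = new Celsius(100.0);
        Console.WriteLine(boiling.ToFahrenheit());                  // 212
        Console.WriteLine(CelsiusFunctions.ToFahrenheit(boiling));  // 212
    }
}
```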

0 replies

DavidArno · 2017-11-29T14:43:28Z

DavidArno
Nov 29, 2017

@HaloFour,

Probably not. I've just always assumed pure functions have to be static without giving much thought to why.

0 replies

GeirGrusom · 2017-11-29T14:44:55Z

GeirGrusom
Nov 29, 2017
Author

@DavidArno I would claim that the only thing of importance is that member functions can be virtual, and that's easy to fix: just require that base calls are also pure. Aside from that, the difference between calling a member function and calling a static function with that object passed as the first argument is the access level you get (i.e. the member function will have access to private and protected variables on that object) but that hardly matters for purity.

0 replies

alrz · 2017-11-29T14:56:25Z

alrz
Nov 29, 2017

@DavidArno

String.Length is an instance member and is pure (or constexpr). What makes it pure? It only returns a readonly member. What makes it constexpr? The type string can be a constant, e.g. a literal. Of course if you call String.Length off of a non-constant, e.g. a variable, it's no longer constexpr, but it's still pure.

I think these two concepts are fairly similar, but we need to annotate a whole bunch of operators/members for it to be actually useful. For example, built-in operators on primitives are all both pure and constexpr. As an example, int Member() => readonlyField + 1; is pure but can't be constexpr because we don't have "user-defined constable types" beyond some built-in primitives.

An interesting example that came up on roslyn repo is when you're using an impure type like StringBuilder

public string M() {
    var builder = new StringBuilder();
    builder.Append("Hello world!");
    return builder.ToString();
}

What rules do we need to define to allow this as a pure function? dotnet/roslyn#7561 (comment)

0 replies

DavidArno · 2017-11-29T16:27:25Z

DavidArno
Nov 29, 2017

@alrz,

> An interesting example that came up on roslyn repo is when you're using an impure type like StringBuilder

That's the challenge with working out what is pure. That method M is deterministic and has no side effects. Sure it uses StringBuilder, but only as a local builder to build a string. So I'd say that is a pure method.

(caveat: "has no side effects" ignores heap allocations, which may be a mistake on my part).

0 replies

iam3yal · 2017-11-29T17:19:56Z

iam3yal
Nov 29, 2017

@GeirGrusom

> Why do pure functions have to be static? The only extra thing to consider is that any base call also has to be pure.

Sure, you're right.

0 replies

iam3yal · 2017-11-29T17:36:00Z

iam3yal
Nov 29, 2017

@alrz Why would you think that Length is pure?

It's implemented externally.
I didn't see any guarantees in the specification about it being pure.
It's certainly not annotated with a Pure attribute.
Most if not all of the methods in the framework that are annotated with the Pure attribute are static, at least from what I've seen. I could be wrong (it wouldn't be the first time), and I agree with @GeirGrusom that maybe it doesn't have to be so.
Isn't Length cached, which would make it impure by definition? Just an assumption.

0 replies

bondsbw · 2017-11-29T18:06:58Z

bondsbw
Nov 29, 2017

What if a dependency is taken on a particular method being [Pure]? That could break in future releases of the API, though the API author never intended to make that part of the method contract.

0 replies

tannergooding · 2017-11-30T17:17:31Z

tannergooding
Nov 30, 2017
Collaborator

> implementing something like constexpr in C#/.NET is sort of impossible

Could you elaborate? Much like readonly struct or in, both constexpr and pure would conceivably be part of the method/type/field contract. It would be entirely possible for a non-C# language to create a method/type/field that lies about that contract, but that is not normally expected (just like it isn't normally expected that a string can be modified, but you can using unsafe code).

0 replies

mikedn · 2017-11-30T17:22:18Z

mikedn
Nov 30, 2017

> I recently learned (in #1155 ) that C# will create a copy of structs if you use in since it can't trust that the code it calls on them really does not mutate the struct.

For this particular nail, "pure" is just a big hammer. All that is needed is "readonly" instance methods - methods that promise not to modify the object on which they're invoked. Not that it would be easier to add "readonly" methods...

And as @tannergooding already mentioned above, examples involving Vector3 and similar types aren't convincing. In general, there shouldn't be any need to pass such types by ref unless you really want to change the location the ref points to. The fact that MS failed to produce a JIT of adequate quality in more than 15 years IMO doesn't justify hacking languages and libraries in an attempt to work around JIT-specific problems.

0 replies

mikedn · 2017-11-30T17:26:30Z

mikedn
Nov 30, 2017

> Could you elaborate?

C++ constexpr usually requires compile time evaluation to be actually useful. That's in general not achievable in .NET (unless you're willing to give up using constexpr methods across projects).

0 replies

tannergooding · 2017-11-30T17:34:54Z

tannergooding
Nov 30, 2017
Collaborator

There are certainly additional concerns when you have JIT'd code, but for methods that:

Don't P/Invoke
Don't throw exceptions
<Insert other similar limitations>

The rules of how the IL gets executed are fairly explicit and shouldn't have inputs/outputs differing between runtimes or hosts.

So constexpr int RotateLeft(int value, int bits) should be possible, as should something like static constexpr Vector4 Add(Vector4 left, Vector4 right), since in both cases the operations they would perform are well defined both in the VM and in the language.
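As a rough sketch of what such a method body looks like today: C# has no constexpr, so the version below is an ordinary static method, and the class name `BitOps` is made up for illustration. It only uses shifts and a bitwise or, whose results the language defines precisely (shift counts on 32-bit operands use only the low five bits of the count), which is the property that would make compile-time evaluation plausible.

```csharp
using System;

static class BitOps
{
    // Rotates the 32 bits of `value` left by `bits` positions.
    // No P/Invoke, no exceptions, no state: only well-defined integer operations.
    public static int RotateLeft(int value, int bits)
    {
        unchecked
        {
            uint v = (uint)value;
            // C# masks shift counts on uint to the low five bits,
            // so bits == 0 (and 32 - bits == 32) is handled correctly.
            return (int)((v << bits) | (v >> (32 - bits)));
        }
    }

    static void Main()
    {
        Console.WriteLine(RotateLeft(1, 1));                        // 2
        Console.WriteLine(RotateLeft(int.MinValue, 1));             // 1
        Console.WriteLine(RotateLeft(0x12345678, 4).ToString("X")); // 23456781
    }
}
```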

I'll leave it at that, since further discussion should probably be moved to the constexpr issue (#504)

0 replies

mikedn · 2017-11-30T17:43:28Z

mikedn
Nov 30, 2017

> The rules of how the IL gets executed are fairly explicit and shouldn't have inputs/outputs differing between runtimes or hosts.

If the method is not in the project you're compiling then there is no IL. That's also why something like pure/readonly needs to be part of the method definition in metadata (an attribute on the method). Otherwise the C# compiler could just look at method's IL and figure out that it doesn't change the object (though having the C# compiler automatically determine that isn't the best way, it's too easy to cause unexpected breaking changes when changing the method).

0 replies

CyrusNajmabadi · 2017-12-01T00:30:22Z

CyrusNajmabadi
Dec 1, 2017
Collaborator

> This is the common case I want to solve:

Why would i write a mutable vector using properties? That doesn't seem to make sense to me. Especially properties that are no-ops...

0 replies

CyrusNajmabadi · 2017-12-01T00:31:10Z

CyrusNajmabadi
Dec 1, 2017
Collaborator

> @CyrusNajmabadi but we are getting wishy-washy nullable reference types. 😉

Yes. That's why i literally said: So, if you wanted this, it would likely be wishywashy, just like nullable reference types :)

0 replies

CyrusNajmabadi · 2017-12-01T00:33:00Z

CyrusNajmabadi
Dec 1, 2017
Collaborator

> For this particular nail, "pure" is just a big hammer. All that is needed is "readonly" instance methods - methods that promise not to modify the object on which they're invoked. Not that it would be easier to add "readonly" methods...

Yup. And this was considered and not ruled out. It simply was made out of scope for the initial work being done here. It was felt that the majority of cases early on would suffice with a "readonly struct". That made things simpler overall, and helped the feature fit into tight schedules. More fine-grained readonly-ness is certainly something possible in the future, without going whole-hog into 'purity'.

0 replies

GeirGrusom · 2017-12-01T14:22:33Z

GeirGrusom
Dec 1, 2017
Author

I got bitten by this today.

I had a ResettableValueLazy<T> struct, and by mistake (and habit) the field holding it was declared readonly in the class. So in practice it would call the factory on every invocation (since every assignment would happen on the implied copy).

It would have been great if C# could warn me about it rather than just silently voiding mutations.
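A minimal reconstruction of that kind of bug, as a sketch only: `ValueLazy<T>` and `Holder` below are hypothetical stand-ins, not the actual ResettableValueLazy<T> code. Reading `Value` through the readonly field runs the getter on a copy, so the cached result is thrown away and the factory runs again on every access.

```csharp
using System;

// Hypothetical stand-in for a lazily initialized struct.
struct ValueLazy<T>
{
    private readonly Func<T> _factory;
    private T _value;
    private bool _hasValue;

    public ValueLazy(Func<T> factory)
    {
        _factory = factory;
        _value = default(T);
        _hasValue = false;
    }

    public T Value
    {
        get
        {
            if (!_hasValue)
            {
                _value = _factory(); // mutation: lost when `this` is a copy
                _hasValue = true;
            }
            return _value;
        }
    }
}

class Holder
{
    public int FactoryCalls;

    private readonly ValueLazy<int> _readonlyLazy; // the mistake
    private ValueLazy<int> _mutableLazy;           // the intended declaration

    public Holder()
    {
        _readonlyLazy = new ValueLazy<int>(() => ++FactoryCalls);
        _mutableLazy = new ValueLazy<int>(() => ++FactoryCalls);
    }

    public int ReadReadonly() => _readonlyLazy.Value; // getter runs on a copy
    public int ReadMutable() => _mutableLazy.Value;   // getter runs on the field itself
}

static class Program
{
    static void Main()
    {
        var holder = new Holder();
        holder.ReadReadonly();
        holder.ReadReadonly();
        Console.WriteLine(holder.FactoryCalls); // 2: the factory ran on both reads

        holder.ReadMutable();
        holder.ReadMutable();
        Console.WriteLine(holder.FactoryCalls); // 3: the factory ran only once more
    }
}
```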

0 replies

HaloFour · 2017-12-01T14:24:40Z

HaloFour
Dec 1, 2017

@GeirGrusom

You'd get that warning for literally every instance method you invoke on a readonly struct because the compiler doesn't know what methods are mutators. That includes all properties. Combined with a way to annotate/enforce method purity I agree that would be useful.

0 replies

ufcpp · 2017-12-02T02:13:11Z

ufcpp
Dec 2, 2017

If this proposal were championed, could `in` extension methods be allowed?

In https://github.com/dotnet/csharplang/pull/1165/files

> in extension methods exist specifically to reduce implicit copying. However any use of an in T parameter will have to be done through an interface member. Since all interface members are considered mutating, any such use would require a copy.

A pure interface method could solve this issue.

0 replies

Richiban · 2017-12-04T18:04:32Z

Richiban
Dec 4, 2017

@alrz

> An interesting example that came up on roslyn repo is when you're using an impure type like StringBuilder

That's a pretty tricky case, because we need to introduce two new concepts--in addition to pure.

We need the concept of (and I'm sorry, but I can't come up with a better name for this right now) semi-pure. A semi-pure method is one that does not have side effects but does mutate one or more of the method's arguments (or, in OO, the receiver of the instance method). In your example .Append( is semi-pure under this definition.

Then we need the concept of ownership. A pure method M can call semi-pure methods and properties but only on objects that are owned by M, that is to say that no reference to that object will ever escape this method in an impure way.

Our method M can therefore be called pure because:

public string M() {
    var builder = new StringBuilder();  // Creating objects is okay in a pure method as long as the constructor is pure
    builder.Append("Hello world!");     // This method is semi-pure. We are allowed to call it only if we can prove that we **own** `builder`
    return builder.ToString();          // StringBuilder.ToString() is already pure and this is fine
}

I'm pretty sure that this causes the feature to blow up out of all proportion and is probably not going to be implemented, but it's worth thinking about.

0 replies

DavidArno · 2017-12-04T18:41:31Z

DavidArno
Dec 4, 2017

@Richiban,

That is similar to my thoughts. "Ownership" is a good term. If a type has local mutation only (ie mutates only its own state) and the scope of an instance of that type is confined to the method (ie, the method owns that instance) and its final output before going out of scope is an immutable "value", then that method can still be viewed as pure.

0 replies

RedFox20 · 2018-01-11T02:39:53Z

RedFox20
Jan 11, 2018

So what's wrong with using C++'s concept of const methods? It solves pretty much 95% of all use cases that you might encounter. Consider the every-day problem of having two different threads accessing shared state at the same time.

In C#, there is no practical way to guarantee whether a method modifies shared state or not. Even if you create an IReadonlyMyClass for every class you have, it still doesn't solve the problem since there is no compile-time enforcement (note: I don't care about CLR-level validation or restricting reflection, I care about basic compile-time validation when the programmer is doing the obvious thing).

class Ship
{
    Universe* Owner; // Ships have access to Universe
public:
    // does Draw use Owner to cause side effects?? what about other member fields??
    // No way to tell unless you examine hundreds of lines of code
    void Draw()
    { ... } 
};

class Universe
{
    mutex Sync;
    vector<Ship*> Ships;
public:
    void KillShips(Predicate<Ship*> shouldKillShip)  // called by simulation thread
    {
        lock_guard<mutex> writeLock {Sync};
        utils::erase_if(Ships, shouldKillShip);
    }
    void DrawShips()  // called by UI thread
    {
        lock_guard<mutex> writeLock {Sync};
        for (Ship* ship : Ships)
            ship->Draw(); // will this modify Ship state? Will this modify Universe state?
    }
};

Because of multi-threaded access, of course, the class requires synchronization. Since there is no way to prove whether DrawShips() will modify Universe state, the code is forced to use exclusive locking. This pretty much kills any performance gains one would hope from multithreading. In some games, this has ended up in worse performance than single-threaded.

So C++ has an extremely simple solution to this. It doesn't try to be perfectly pure. Most applications don't care about purely pure functions. Developers think "If I call this method, will it change the internal state? I don't want that to happen, otherwise, this other thread will crash."

Methods declared const shall not modify instance fields, except when those fields are explicitly declared mutable.
Methods declared const can only call const methods on this or on its instance fields.
Any variable declared const cannot be reassigned.
Any variable declared const can only have const methods called on it.

Since const is a part of the method declaration, the const correctness check is trivial.

Now, what are the benefits? We can greatly improve performance:

class Ship
{
    const Universe* Owner; // Ships have read-only access to Universe
public:
    // Owner is declared `const Universe*` (the constness of `this` does not extend through
    // a plain pointer), so Draw can only call const methods on the Universe: a very good
    // guarantee that it won't cause dangerous side effects or dead-locks
    // With just one glance at `const`, you can be reassured
    void Draw() const
    { ... }
};

class Universe
{
    mutable shared_mutex Sync; // we don't care if mutex state is modified (lock, unlock)
    vector<Ship*> Ships;
public:
    void KillShips(Predicate<Ship*> shouldKillShip)
    {
        lock_guard<shared_mutex> writeLock {Sync}; // exclusive lock
        utils::erase_if(Ships, shouldKillShip);
    }
    void DrawShips() const
    {
        shared_lock<shared_mutex> readLock {Sync}; // allow multiple threads to read
        // const propagation: Draw must be const.
        // due to that, we know it's safe to parallelize this code! this gives a huge perf boost!
        parallel_for(Ships, [](const Ship* ship) {
            ship->Draw(); // safe and enforced by the compiler!
        });
    }
};

Huge performance boost thanks to immutability guarantees. If you do this in a language without const or other immutability guarantees, your program will explode in production because the new intern Jimmy accidentally modified Ship state in Draw() and the compiler didn't care.

To sum up, this whole discussion about perfect pureness is counterproductive to real-world use cases. Simply having const from C++ is enough to solve most problems. That said, const is a huge part of the C++ type system, so C# would also need a way to decorate parameters as const. Only newer languages have realized that the correct default for methods, variables and arguments is immutable.

0 replies

roji · 2018-05-29T10:44:12Z

roji
May 29, 2018
Collaborator

A big plus one on the C++ const method approach - I think this is severely lacking in C#. I'd only change the keyword to be readonly to better align with existing C# keyword usage and meaning.

One example of where this is useful is the defensive copy problem when invoking a method on a readonly field which is a struct. If methods could be flagged as readonly, this would allow us to avoid producing the defensive copy (and possibly warn on all non-readonly method invocations on readonly fields which have a non-readonly struct type).

Similarly, if a non-readonly struct is passed via an in parameter, a defensive copy would not be created if the function only invokes readonly methods on it. This allows us more flexibility and is a finer-grained approach to immutability; we're currently stuck in an all-or-nothing situation where structs are either mutable or immutable. Instead, we could have the same struct used mutably by some, and immutably by others via flagging which methods mutate and which don't.
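For reference, C# 8.0 later added readonly members on structs, which work much like this. The sketch below assumes a C# 8 or later compiler, and the names `Point` and `Geometry` are made up for illustration. The readonly `Length` can be called on an `in` parameter without a defensive copy, while the unmarked `Scale` still runs on a copy, so its effect is lost.

```csharp
using System;

// Hypothetical mutable struct with one readonly member and one mutator.
struct Point
{
    public double X, Y;

    public Point(double x, double y) { X = x; Y = y; }

    // Promises not to modify `this`; the compiler enforces it.
    public readonly double Length() => Math.Sqrt(X * X + Y * Y);

    // Not readonly: the compiler has to assume it mutates.
    public void Scale(double factor) { X *= factor; Y *= factor; }
}

static class Geometry
{
    // Only a readonly member is invoked, so no defensive copy is needed.
    public static double Measure(in Point p) => p.Length();

    public static double ScaleThenMeasure(in Point p)
    {
        p.Scale(2);        // mutates a hidden copy; `p` itself is untouched
        return p.Length(); // still measures the original point
    }

    static void Main()
    {
        var p = new Point(3, 4);
        Console.WriteLine(Measure(in p));          // 5
        Console.WriteLine(ScaleThenMeasure(in p)); // 5, not 10
        Console.WriteLine(p.X);                    // 3
    }
}
```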

I'm sure there are other possible advantages for this - hope it gets picked up.

0 replies


Replies: 39 comments
