[Discussion] Collection initializer support for fixed size collections #3763

SamPruden · 2020-07-31T18:01:19Z

SamPruden
Jul 31, 2020

#3707 (Remaining design work in and around records) talks about adding more initialization functionality. There's mention of expanding functionality around collection initializers.

> For init-only we have focused on object initializers. It's likely to be worth thinking through what can and should be done with collection initializers - after all they also function via mutation, and might equally benefit from something morally similar to init-only properties. Init-only Add methods?

The intent there is to add initializer support for immutable collections that don't want to expose an Add method after they've been created. However, I don't believe that an init-only Add method is a good solution, because the incremental building model is poorly suited to immutable collections that need a fixed size allocation.

As a design goal, it should be possible to support new MyCustomFixedLengthArray<int>() {1, 2, 3}. It's not currently possible to do this efficiently, as the constructor does not know how much memory to allocate, because the initializer repeatedly invokes the Add method after construction.

A workaround using the Add method is passing the length to the constructor like new MyCustomFixedLengthArray<int>(3) {1, 2, 3}. This is clumsy as information is repeated unnecessarily and it's easy to accidentally have the lengths become out of sync when maintaining the code. The arguments in favour of allowing the omission are the same as the ones for allowing the explicit length omission in new int[] {1, 2, 3}.
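To make the workaround concrete, here is a minimal sketch that compiles with today's C#. FixedArray<T> is a hypothetical stand-in for MyCustomFixedLengthArray<T>, and the choice to throw on overflow is an assumption of this example rather than anything the proposal prescribes.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical fixed-length collection: the length must be passed to the
// constructor because the initializer only ever calls Add afterwards.
public sealed class FixedArray<T> : IEnumerable<T>
{
    private readonly T[] buffer;
    private int count;

    public FixedArray(int length) => buffer = new T[length];

    public int Length => buffer.Length;
    public T this[int index] => buffer[index];

    // Invoked once per element by the collection initializer.
    public void Add(T item)
    {
        if (count == buffer.Length)
            throw new InvalidOperationException("More items than the declared length.");
        buffer[count++] = item;
    }

    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)buffer).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this type, new FixedArray<int>(3) { 1, 2, 3 } works, new FixedArray<int>(2) { 1, 2, 3 } only fails at run time, and new FixedArray<int>(4) { 1, 2, 3 } silently leaves a default-valued slot. Those are the out-of-sync cases described above.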

I propose that some construct be added to allow collection initialization to pass in all of the items at once, like init(ReadOnlySpan<T> items). new MyCustomFixedLengthArray<int>() {1, 2, 3} would be transformed into something like new MyCustomFixedLengthArray<int>().init(stackalloc[] {1, 2, 3}). init would then be able to read the Length property and allocate memory accordingly.

@svick pointed out below that this use of stackalloc creates a problem for managed types such as string. Two alternative approaches to achieving this goal are discussed in the comments. One option is some compiler magic to achieve something similar to stackalloc for this case; the other is transforming the initializer into a chain of calls like .SetSize(2); .SetValue(0, "hello"); .SetValue(1, "world");

#3707 talks about potentially adding factory tagging.

> Methods (and other members?) could be designated as factories. They are restricted to return a fresh object. In return, their clients can apply object initializers to their result.

There would need to be some kind of restriction on collection initializers with factories, as the collection initializer would have to be prevented from being called multiple times.

A custom array implementation using this approach might look something like this.

struct MyCustomArray<T>
{
    public int Length { get; init; }
    // Custom array backed by a built-in array for simplicity of example
    private T[] buffer;

    // Optional because structs implicitly have parameterless constructors
    public MyCustomArray(int length) => this.Length = length;

    // Collection initializer (proposed syntax) - only runs if used
    // Runs at most once
    public init(ReadOnlySpan<T> items)
    {
        if (this.Length != 0 && items.Length > this.Length)
            throw new ArgumentException("Collection initializer has more items than manually specified length.");

        // Default length when not manually specified
        if (this.Length == 0)
            this.Length = items.Length;

        // Allocate and initialize with items
        this.buffer = new T[this.Length];
        for (var i = 0; i < items.Length; i++)
        {
            this.buffer[i] = items[i];
        }
    }

    // Final initializer (proposed syntax) - always runs, and after the collection initializer
    // Mentioned in #3707
    init
    {
        if (this.Length == 0)
            throw new Exception("Length must either be specified or a collection initializer must be used.");

        // If no allocation was done by a collection initializer
        if (this.buffer == null)
            this.buffer = new T[this.Length];
    }
}

There's a significant real use case for this. The Unity game engine is currently pushing its experimental DOTS (Data Oriented Technology Stack), which prohibits managed types. It therefore has custom "native collection" types, such as NativeArray<T>, which are structs with manually managed memory. These don't support collection initializers at the moment, but could if this change were made. This is becoming a core API in a fairly major C# product, so many people would benefit from this change.

The proposal around exact syntax here is vague. I'm focusing on the functionality and leaving the specifics as part of the whole collection of work to be done around #3707, which should all be designed in tandem.

SamPruden · 2020-07-31T18:24:14Z

SamPruden
Jul 31, 2020
Author

An alternative way of realising this goal would be to add AddRange(ReadOnlySpan<T> items) support for collection initializers throughout the language. These could optionally be annotated with the init keyword the same as everything else.

This approach may better mirror the existing Add functionality, and it avoids creating a new special construct for collection initializers. I see two issues with this approach, but they may be solvable.

1. In order to support the allocation scenario, it would need to be possible to ensure that an init AddRange method is not invoked more than once. I suppose that this could be done using only runtime checks, but I don't think that's a desirable situation. The conventions around the AddRange name imply that multiple invocations are allowed, so this restriction would be a bit ugly. Adding some keyword that means an initializer may only run once would have to be useful in other situations too in order to justify its inclusion. Perhaps it is.
2. In order to avoid breaking backward compatibility, Add would need to take precedence over AddRange for collection initializers. The problem is that we would want AddRange to take precedence. The collection initializer that receives all items at once is likely to be more efficient, so we would prefer its use when both are available. Imagine for example an expandable list type that wants to initially allocate its size according to the length of the initialization span but that also provides an Add method for optional expansion.
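As a rough illustration of the efficiency argument in the second point, the sketch below shows an expandable list that offers both methods. GrowableList<T>, its doubling growth policy, and the Allocations counter are all assumptions made for this example; nothing here is part of the BCL or of the proposal.

```csharp
using System;

// Hypothetical expandable list that counts how often it reallocates.
public sealed class GrowableList<T>
{
    private T[] buffer = Array.Empty<T>();

    public int Count { get; private set; }
    public int Allocations { get; private set; }

    // Incremental path: what a collection initializer calls today.
    public void Add(T item)
    {
        EnsureCapacity(Count + 1);
        buffer[Count++] = item;
    }

    // Bulk path: the total size is known before anything is copied.
    public void AddRange(ReadOnlySpan<T> items)
    {
        EnsureCapacity(Count + items.Length);
        items.CopyTo(buffer.AsSpan(Count));
        Count += items.Length;
    }

    private void EnsureCapacity(int needed)
    {
        if (needed <= buffer.Length) return;
        // Grow to at least double, or to the requested size if that is larger.
        Array.Resize(ref buffer, Math.Max(needed, buffer.Length * 2));
        Allocations++;
    }
}
```

Under this growth policy, four Add calls on an empty list reallocate three times (capacity 1, 2, then 4), while a single AddRange of four items allocates once.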

0 replies

svick · 2020-07-31T22:57:03Z

svick
Jul 31, 2020
Collaborator

> new MyCustomFixedLengthArray<int>() {1, 2, 3} would be transformed into something like new MyCustomFixedLengthArray<int>().init(stackalloc[] {1, 2, 3}).

stackalloc can't be used with managed types, which means e.g. new MyCustomFixedLengthArray<string>() {"foo"} wouldn't work like this. I think that's a very serious downside.
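A small sketch of the restriction, for readers who have not run into it: stackalloc requires an unmanaged element type, so the int case compiles while the string case does not. StackallocDemo is a hypothetical name used only for this illustration.

```csharp
using System;

public static class StackallocDemo
{
    // Compiles: int is an unmanaged type, so the items can live on the stack.
    public static int Sum()
    {
        ReadOnlySpan<int> items = stackalloc[] { 1, 2, 3 };
        var total = 0;
        foreach (var item in items)
            total += item;
        return total;
    }

    // Does not compile: string is a managed type.
    // ReadOnlySpan<string> names = stackalloc[] { "foo" };
}
```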

0 replies

SamPruden · 2020-08-01T01:37:50Z

SamPruden
Aug 1, 2020
Author

> stackalloc can't be used with managed types, which means e.g. new MyCustomFixedLengthArray<string>() {"foo"} wouldn't work like this. I think that's a very serious downside.

That's a very good point. I confess that I missed that restriction because I've only ever used stackalloc in high performance contexts where everything is already unmanaged.

@svick Do you happen to know whether there may be another way of passing a stack allocated list of managed references for this special case? Bearing in mind that the size is known at compile time because the collection initializer is hard coded, and that the limitation is what the CLR can be made to do, not necessarily the confines of existing language features.

I think something like the following should work on a conceptual level, demonstrating that a solution is possible with sufficient compiler magic.

interface ICollectionInit<T> {
    int Length { get; }
    T this[int i] { get; }
}

struct CustomArray<T>
{
    public init void CollectionInit<TInit>(TInit items) where TInit : ICollectionInit<T> => throw new NotImplementedException();
}

Because the initializer sizes are known at compile time, the compiler can generate hidden structs of specific lengths as needed.

new CustomArray<string>() { "hello", "world" };

could become something like

new CustomArray<string>().CollectionInit(new HiddenMangledCollectionInitializerStructLength2<string>("hello", "world"));

// ...

struct HiddenMangledCollectionInitializerStructLength2<T> : ICollectionInit<T>
{
    private readonly T Item0;
    private readonly T Item1;

    public HiddenMangledCollectionInitializerStructLength2(T item0, T item1) => (this.Item0, this.Item1) = (item0, item1);

    public int Length => 2;

    public T this[int i] {
        get {
            switch (i) {
                case 0: return this.Item0;
                case 1: return this.Item1;
                default: throw new IndexOutOfRangeException();
            }
        }
    }
}

I think that this demonstrates that it's a solvable problem, even if this isn't a very good solution. This is only a haphazard composition of existing ideas.
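For anyone who wants to try the shape of this idea with the language as it stands, the following sketch replaces the hypothetical init method with an ordinary static factory. IInitItems<T>, InitItems2<T> and PackedArray<T> are illustrative names, and the struct is written by hand here where the idea above has the compiler generating it.

```csharp
using System;

// Plays the role of ICollectionInit<T> from the sketch above.
public interface IInitItems<T>
{
    int Length { get; }
    T this[int i] { get; }
}

// Hand-written stand-in for the compiler-generated two-element struct.
public readonly struct InitItems2<T> : IInitItems<T>
{
    private readonly T item0;
    private readonly T item1;

    public InitItems2(T item0, T item1)
    {
        this.item0 = item0;
        this.item1 = item1;
    }

    public int Length => 2;

    public T this[int i] => i switch
    {
        0 => item0,
        1 => item1,
        _ => throw new IndexOutOfRangeException(),
    };
}

public sealed class PackedArray<T>
{
    private readonly T[] buffer;

    private PackedArray(T[] buffer) => this.buffer = buffer;

    public int Length => buffer.Length;
    public T this[int i] => buffer[i];

    // The struct constraint means items is passed by value and never boxed.
    public static PackedArray<T> Create<TInit>(TInit items)
        where TInit : struct, IInitItems<T>
    {
        var buffer = new T[items.Length];
        for (var i = 0; i < buffer.Length; i++)
            buffer[i] = items[i];
        return new PackedArray<T>(buffer);
    }
}
```

A call then looks like PackedArray<string>.Create(new InitItems2<string>("hello", "world")), and it works for managed element types because the items live in ordinary struct fields rather than in a stackalloc buffer.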

0 replies

SamPruden · 2020-08-01T02:06:17Z

SamPruden
Aug 1, 2020
Author

We could also approach the problem from a completely different angle.

struct MyCustomArray<T>
{
    private T[] buffer;

    // Some special construct visible only to collection initializers
    public init SetSize(int size) {
        this.buffer = new T[size];
    }

    // Some special construct visible only to collection initializers
    // SetSize is guaranteed to have been called first
    public init SetValue(int index, T value) {
        this.buffer[index] = value;
    }
}

Where

new MyCustomArray<string>() { "hello", "world" };

is transformed into

var temp = new MyCustomArray<string>();
temp.SetSize(2);
temp.SetValue(0, "hello");
temp.SetValue(1, "world");

This is much more like the existing incremental build functionality of Add, but the size is made available first, and SetValue is guaranteed to only be called with valid indices. It looks a bit more complicated, but it's ultimately simpler and there isn't a barrier to implementation.
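The lowering can be tried out by hand today if the hypothetical init members are replaced with ordinary public methods. In the sketch below, SizedArray<T> and LoweringDemo are illustrative names, and the guarantees described above (SetSize called first, valid indices only) are simply assumed, since no compiler enforces them here.

```csharp
using System;

// Hypothetical collection; SetSize and SetValue are plain methods here,
// standing in for the initializer-only members sketched above.
public struct SizedArray<T>
{
    private T[] buffer;

    public int Length => buffer?.Length ?? 0;
    public T this[int index] => buffer[index];

    public void SetSize(int size) => buffer = new T[size];
    public void SetValue(int index, T value) => buffer[index] = value;
}

public static class LoweringDemo
{
    // Hand-written equivalent of: new SizedArray<string>() { "hello", "world" }
    public static SizedArray<string> Build()
    {
        var temp = new SizedArray<string>();
        temp.SetSize(2);            // a single allocation of the exact size
        temp.SetValue(0, "hello");
        temp.SetValue(1, "world");
        return temp;
    }
}
```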

0 replies

thargy · 2020-08-07T15:46:48Z

thargy
Aug 7, 2020

> In order to avoid breaking backward compatibility, Add would need to take precedence over AddRange for collection initializers. The problem is that we would want AddRange to take precedence. The collection initializer that receives all items at once is likely to be more efficient, so we would prefer its use when both are available. Imagine for example an expandable list type that wants to initially allocate its size according to the length of the initialization span but that also provides an Add method for optional expansion.

I love this idea. There are loads of examples where AddRange is significantly more performant than Add (it's the main reason for adding an AddRange method, after all). I accept that changing the behaviour to use AddRange in preference to Add is technically a breaking change, but does it occur in practice? That is, are there any good examples where calling AddRange instead of Add would actually break behaviour? (I know this is a 'how long is a piece of string' question at this point.) Ultimately, an AddRange method that didn't work as an efficient Add method would be very much an anti-pattern.

The upside of changing the behaviour en masse would be potentially large performance benefits when utilizing fundamental classes such as Lists, and particularly in Concurrent collections. Where code does break (which is practically unlikely), it would be trivial to work around from either end (manually call Add instead of a collection initialiser). I've experienced many scenarios where Collection Initializers are just 'bad' because of their reliance on Add over AddRange, to the point where I've removed the Add method to stop them.

Once used with records, having an init keyword applied, together with your suggestion of a compile-time check to prevent calls outside of initialisation, seems very useful.

0 replies

SamPruden · 2020-08-07T16:06:30Z

SamPruden
Aug 7, 2020
Author

I don't believe that any kind of breaking change is on the table. It simply won't be considered. I agree that if this were a breaking change in somebody's codebase that would probably indicate that it's bad code, but breaking is breaking and they won't do it.

It should be possible to make init AddRange take precedence over init Add because those are both new constructs without legacy behaviour. That doesn't help with things like List<T> though.

If we introduced a completely new init construct - as is already proposed with final initializers - then it could be added to the existing collections and get the same benefit. It would be okay for that to take precedence.

0 replies

thargy · 2020-08-07T17:02:14Z

thargy
Aug 7, 2020

> I don't believe that any kind of breaking change is on the table. It simply won't be considered. I agree that if this were a breaking change in somebody's codebase that would probably indicate that it's bad code, but breaking is breaking and they won't do it.

I don't disagree that it's unlikely they'll do it, though it is worth noting that breaking changes are introduced in every major release of the compiler. Though the majority are 'bug fixes', that's not always the case. For example, the auto-calling of a Dispose() method in a ref struct (even when IDisposable is not implemented) and similar enhancements are equally breaking changes in line with what is proposed (auto calling a method that was not previously called).

Considering that one of the core drivers for .NET 5 is performance, it may be something they consider if benchmarking shows a big improvement in ASP.NET performance (the favourite love child).

> If we introduced a completely new init construct - as is already proposed with final initializers - then it could be added to the existing collections and get the same benefit. It would be okay for that to take precedence.

Yeah, this would be a fair compromise, but it's a shame not to 'fix' the collection initialiser, whose initial implementation always felt a bit rushed.

An alternative may be to allow opt-in use of AddRange during collection initialisation (e.g. via an attribute). That would allow libraries (including framework code) that can see performance benefits from collection initialisers using AddRange to be recompiled. This is an approach that has been adopted before.
For example, List<T>.AddRange could be recompiled as

        // Adds the elements of the given collection to the end of this list. If
        // required, the capacity of the list is increased to twice the previous
        // capacity or the new size, whichever is larger.
        //
        [CollectionInitializer]
        public void AddRange(IEnumerable<T> collection)
            => InsertRange(_size, collection);

Giving preference to AddRange over Add where the attribute is present would not be a breaking change, and the addition of the attribute would not break existing code.
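To show why adding the attribute is harmless to existing code, here is a minimal sketch. CollectionInitializerAttribute is purely hypothetical: no such attribute exists in the BCL and the compiler does not recognise it, so the initializer below still binds to Add. Bag<T> is likewise an illustrative type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical opt-in marker; today it is inert metadata.
[AttributeUsage(AttributeTargets.Method)]
public sealed class CollectionInitializerAttribute : Attribute { }

public sealed class Bag<T> : IEnumerable<T>
{
    private readonly List<T> items = new List<T>();

    public int Count => items.Count;

    // What collection initializers bind to today.
    public void Add(T item) => items.Add(item);

    // What an attribute-aware compiler could prefer instead.
    [CollectionInitializer]
    public void AddRange(IEnumerable<T> collection) => items.AddRange(collection);

    public IEnumerator<T> GetEnumerator() => items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Existing call sites such as new Bag<int> { 1, 2, 3 } keep compiling and behaving as before, while the marker remains discoverable through reflection (or by an analyser) for tooling that wants to act on it.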

0 replies

thargy · 2020-08-07T17:04:49Z

thargy
Aug 7, 2020

An analyser could recommend adding the attribute.

0 replies

SamPruden · 2020-08-07T17:32:50Z

SamPruden
Aug 7, 2020
Author

What would you be passing into these AddRange methods? We can't be instantiating arrays for this, we really want the initializer elements to be a stack allocation only. Even if we made a stack allocated struct that implements IEnumerable<T>, we would have a boxing conversion. That's not good for performance, and both of those cases would be incompatible with the Unity use case. That's GC churn, which we definitely don't want.

I think that even if the AddRange name were adopted for this pattern for symmetry with the existing Add behaviour, a new overload would be needed anyway. At that point, there's not much point trying to make this work with the existing structure - just add a new construct and backport it to existing collections.

0 replies

thargy · 2020-08-07T18:33:03Z

thargy
Aug 7, 2020

It's possible the compiler can optimise away a lot of the IEnumerable overhead during collection initialisation, including by loop unrolling, removing the need for boxing or array allocation; there are even a few proposals out there for that. But I don't dispute your point that adding a new construct would be simpler.

0 replies

SamPruden · 2020-08-07T21:02:53Z

SamPruden
Aug 7, 2020
Author

> what about this?
>
> struct MyArray<T, n>
> {
>     T[n] _buffer;
> }
>
> // array of length 7
> var array = new MyArray<int, 7>();

That value parameterised types functionality is not in the language, and I'm not aware of any plans to add it in the immediate future. That would be a major feature itself, so I doubt that it would be quickly added just to support this.

It's quite similar to what I proposed above about having the compiler spit out hidden types. If the AddRange equivalent has a signature like void AddRange<TContainer>(TContainer items) where TContainer : ICollectionInitializerItems<T> then the compiler could produce hidden structs with fixed sizes to call it with. That's the same result without actually adding value templating to the language.

0 replies

SamPruden · 2020-08-07T21:12:38Z

SamPruden
Aug 7, 2020
Author

I've changed the title from [Proposal] to [Discussion] as this is about an aim rather than a tight proposal for how to achieve that aim. I'm not sure whether this was a sensible change. I can change it back if people think that's preferable.

0 replies

vierlijner · 2024-03-13T21:06:44Z

vierlijner
Mar 13, 2024

I see this is only about arrays, not what the title suggests. I had hoped it was a wider discussion.
I would like a collection with a maximum capacity, like some of the custom examples in here. I can't find this on the discussion page. Should I start a new discussion for this?

5 replies

CyrusNajmabadi Mar 13, 2024
Collaborator

@vierlijner it's unclear to me what you're asking for.

HaloFour Mar 13, 2024

I would think that collection literals do satisfy this ask.

TahirAhmadov Mar 14, 2024

I think this is an API request, not a language request.

vierlijner Mar 15, 2024

Say I want a collection with a maximum of 10 items; when I add the eleventh, one is automatically removed because the size is ten.

TahirAhmadov Mar 16, 2024

@vierlijner sure, this would be a collection type with the behavior that you need. You can either 1) develop one of your own, 2) request that https://github.com/dotnet/runtime/ add one to the BCL, or even better, 3) develop one and offer it to them for inclusion in the BCL.

PS. What it is not is a language change. This discussion board is for language changes.
