[Performance] Proposal - aligned new and stackalloc with alignas(x) for arrays of primitive types and less primitive as well
#8758
Replies: 18 comments
-
What's a "less primitive type"?
-
What IL code should be generated for alignas?
-
What would the stack alignment look like? Would the compiler generate some "dummy"
-
I don't think this is something that C# can or should control and is, instead, probably something that the GC/Runtime should handle and expose as an API on CoreFX. For both of these, C# has no existing mechanism to relay this information to the GC/Runtime and would likely need a
-
My impression is that you have misunderstood the proposal. First of all, this is not an API proposal but a language proposal: it asks for a language feature that could be implemented via an additional runtime API, but could equally be implemented at the IL level. It is not the intention of the proposal to show how the feature would be implemented by the runtime, because it is up to the runtime implementation to decide how this will work. Therefore, comment To explain the problem better, I would like to compare it with a language implementation that uses only runtime API support:
Runtime.DeclareStructure()
    .AddAllocatingParameterlessConstructor()
    .AddAllocatingConstructor(new Type[] {Object, Int32, Int64})
    .AddDeallocatingDestructor()
    .AddMethod( ......
We could handle allocations and deallocations similarly, but that is not what is proposed. The preference is for language-based memory management features.
-
This goes against one of the core features of C#, which is that you don't manage memory yourself: https://github.com/dotnet/csharplang/blob/98043cdc889303d956d540d7ab3bc4f5044a9d3b/spec/basic-concepts.md#automatic-memory-management
-
Other languages are other languages, not all features fit into C#. To be honest, I would love having the ability to specify the alignment of my types (hence issues I opened like: https://github.com/dotnet/corefx/issues/22790). But I also don't think this is a feature the current C# team (or a future team) would be likely to take:
-
Manual memory management is generally not something that is baked into a language (aside from basic support for object creation and deletion, and rules for where those allocations end up).
My point of view (and I would speculate the same response you would get from the Language Design Team) would be that this is best handled by some Framework/Runtime methods:
public static class Unsafe
{
    public static void* StackAlloc(IntPtr size);
    public static void* StackAlloc(IntPtr size, IntPtr alignment);
}
public sealed class StructAlignmentAttribute : Attribute
{
    public IntPtr Alignment { get; }
}
-
I fully understand your points and arguments; however, if nobody tries to push C# into areas that are critical to some applications, then important changes will never happen. Would you ever have thought, at the time of .NET Framework v2.0, that a developer using C# would be able to use intrinsic methods and write pseudo-inline assembly code with them? If the first step is to have the API you proposed above, that would be good. If we also had some language syntactic sugar for using it, that would be great.
-
I fully understand, and treat it as the current C# paradigm. The nature of every paradigm is, however, that it may change 😃
-
That was true in the past. These days we have HW intrinsics, we try to write highly optimized code, and we are even looking into 'safe' manual memory management. The suggested feature is optional, and clearly aimed at those who understand why they need it. PS: Should the allocated array probably be pinned?
-
I'm with @tannergooding on this, I think that it's more appropriate for the CLR/BCL to supply helper methods to allocate this memory on a specific alignment. It seems that the runtime would have to expose a helper to allow this anyway, at which point you can already use it without requiring additional language changes. HW intrinsics are already exposed as APIs and not special language syntax.
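For readers arriving later: a helper of this kind can be sketched with `NativeMemory.AlignedAlloc` and `NativeMemory.AlignedFree`, which, as far as I know, are available in `System.Runtime.InteropServices` from .NET 6 onwards (so after this thread was opened). The class name, the element count, and the 32-byte alignment below are assumptions chosen for illustration only, and the code needs unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeAlignedDemo
{
    // Allocates 20 floats (80 bytes) on a 32-byte boundary in native memory,
    // writes to them, and reports whether the returned address is aligned.
    public static unsafe bool Run()
    {
        const int count = 20;   // illustrative size, not taken from any API
        nuint alignment = 32;   // must be a power of two

        float* data = (float*)NativeMemory.AlignedAlloc(
            (nuint)(count * sizeof(float)), alignment);
        try
        {
            for (int i = 0; i < count; i++)
            {
                data[i] = i;
            }
            return ((nuint)data % alignment) == 0;
        }
        finally
        {
            // Memory from AlignedAlloc must be released with AlignedFree.
            NativeMemory.AlignedFree(data);
        }
    }
}
```

This memory lives outside the GC heap, so it is never moved, but the caller owns its lifetime.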
-
I'd also like to second the interest in this, or in a standard-library object that lets a user allocate an aligned array of memory. Not being able to allocate aligned memory easily makes the new intrinsics harder to use, or causes massive performance penalties for basic operations built on Vector256 loads. Vector256 loads in particular perform very poorly on most CPUs when memory is not aligned. https://www.agner.org/optimize/instruction_tables.pdf
Unaligned parallel memory loads are penalized by 233% compared to aligned parallel loads (VMOVUPS vs VMOVAPS). For many basic operations, such as multiplying or adding arrays or taking dot products, this can completely dwarf the gains of AVX over SSE2.
Loading only from the aligned points of an unaligned array, as many trivial examples I've seen do, doesn't really work with more than one input array either, because there is no guarantee that multiple arrays are offset by the same amount. Having people roll their own pinned memory structures to maybe take full advantage of these new features seems kind of backwards, given that the features were brought into the core runtime.
Luckily, SSE2 no longer has any real performance penalty for unaligned access, so you can often get a 2-4x speedup with SSE2 out of the box without aligned memory, at least according to the day-1 benchmarks I'm working up. My current solution is to overallocate an array, pin it, and do all my operations through a Span that starts at the proper boundary, since memory should be aligned to at least 4 bytes.
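The overallocate-pin-slice workaround described above could look roughly like the following. This is only a sketch of the general idea, not the commenter's actual code: `AlignedBuffer`, `ElementOffsetToAlign`, and `AllocateAligned` are hypothetical names, and the code assumes the alignment is a power of two and a multiple of `sizeof(float)`, and that the array data starts on at least a 4-byte boundary.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AlignedBuffer
{
    // Hypothetical helper: number of elements to skip so that `address`
    // advances to the next multiple of `alignment` (a power of two).
    public static int ElementOffsetToAlign(long address, int alignment, int elementSize)
    {
        long aligned = (address + alignment - 1) & ~((long)alignment - 1);
        return (int)((aligned - address) / elementSize);
    }

    // Over-allocates a float array, pins it, and returns a span whose first
    // element sits on an `alignment`-byte boundary. The caller must call
    // handle.Free() once the span is no longer in use.
    public static Span<float> AllocateAligned(int length, int alignment, out GCHandle handle)
    {
        // Enough spare elements to cover the worst-case shift.
        float[] backing = new float[length + alignment / sizeof(float)];
        handle = GCHandle.Alloc(backing, GCHandleType.Pinned);

        long address = handle.AddrOfPinnedObject().ToInt64();
        int offset = ElementOffsetToAlign(address, alignment, sizeof(float));
        return backing.AsSpan(offset, length);
    }
}
```

A caller would write something like `var span = AlignedBuffer.AllocateAligned(20, 32, out var handle);` and release the pin with `handle.Free()` afterwards. Each array gets its own offset, which is why slicing has to be done per array.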
-
Any comments from the stakeholders regarding this issue? It has already been a year and a half since this issue was opened, yet no decision has been made. That is quite strange, because in the meantime .NET Core 3.0 has been released with intrinsics support, where this feature would be extremely helpful, not to mention the scenario I described above.
-
Tanner has pointed out that we wouldn't likely take this as a language change. I tend to agree. He did open an issue on corefx; I'm not sure what the status of that is.
-
@tannergooding Could you please share a link to this ticket on corefx?
-
It was linked above, but here it is again for convenience 😄
-
Problem statement
One of the pain points when working with hardware intrinsics is keeping data aligned to the boundaries required by the instructions used. The usual expectation is that data is aligned on 16-byte or 32-byte boundaries. There may be many ways to do this with attributes for non-primitive data types; however, one place where that does not work is the allocation of arrays composed of primitive types (byte, int, float, double).
For instance, the following expression may not allocate the array on a 16-byte or 32-byte boundary:
The size of the array is 80 bytes, which is divisible by 16 but not by 32. Allocations on the stack may happen with an alignment smaller than 16 bytes. A similar problem arises for Array allocations on the heap.
Essentially, the developer has no control over the allocation alignment of the buffer that the Array class points to. Similar problems arise when developers would like to store initialization data for arrays in the assembly: there is no control over how blob data in the assembly will be mapped into memory.
Essentially, the only ways to control the alignment of primitive arrays (I could be wrong) are either to use sequential/fixed layout in structures with fixed arrays and the StructLayout.Pack property, or to use pointer-aligning functions. IMO this places too much burden on developers who need to write high-performing code with hardware intrinsics, or even with simple loop-blocking optimization techniques.
Proposal
One of the simplest solutions, from the perspective of developers using such a feature, would be the introduction of a new keyword alignas (yes, exactly the same as in the C++ standard), which could precede or follow allocation declarations with the new or stackalloc keywords.
Alternatives
One of the ugly approaches used to get stackalloc with tricked alignment is to allocate an amount of memory very slightly below the memory page size, which usually leads to allocation on a memory page boundary; another is to use functions that adjust the native pointer to the array.
There are no effective methods to control Array buffer alignment on the heap.
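The pointer-adjusting alternative for stackalloc can be sketched as follows. This is an illustrative example, not code from the proposal: the class name is hypothetical, and the 20-float (80-byte) buffer and 32-byte alignment are assumptions picked to match the sizes discussed in the problem statement. It requires unsafe code to be enabled.

```csharp
using System;

public static class StackAlignDemo
{
    // Over-allocates on the stack, rounds the pointer up to a 32-byte
    // boundary, and uses the aligned region as a float buffer.
    public static unsafe bool Run()
    {
        const int Count = 20;       // 20 floats = 80 bytes (assumed size)
        const int Alignment = 32;   // must be a power of two

        // Reserve Alignment extra bytes so the rounded-up pointer still has
        // Count * sizeof(float) bytes of room behind it.
        byte* raw = stackalloc byte[Count * sizeof(float) + Alignment];

        ulong mask = Alignment - 1;
        float* aligned = (float*)(((ulong)raw + mask) & ~mask);

        for (int i = 0; i < Count; i++)
        {
            aligned[i] = i;
        }

        return ((ulong)aligned % Alignment) == 0 && aligned[Count - 1] == Count - 1;
    }
}
```

The cost is up to `Alignment` wasted bytes per allocation plus the manual pointer arithmetic, which is the burden the proposed alignas keyword is meant to remove.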