Idea: stackalloc from caller's stack frame #1817
Replies: 5 comments
-
I don't think this is a good idea. Being able to stackalloc a buffer on the caller's stack space would add a lot of complexity for what looks to be relatively little benefit (IMO). If calculating the number of results is trivial, it is infinitely better to have some function which tells the consumer how many results they can expect. If calculating the number of results is non-trivial, you likely won't know the total size of the buffer needed until after you finish processing the string, in which case the output buffer will likely need to be resized multiple times to make things "right". In that case, I believe the better solution is to have some kind of stream class, which tracks the current stream position and parses data into the user-provided buffer until said buffer is full. That also allows the user to determine the appropriately sized buffer and where to allocate it.
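A minimal sketch of the stream-style idea described above, assuming comma-separated, well-formed integers. `NumberReader`, `Read`, `IsDone`, and `NumberReaderDemo` are hypothetical names chosen for illustration, not an existing API:

```csharp
using System;

// Hypothetical stream-style parser. It only tracks a position in the input;
// the caller owns the output buffer and decides where that buffer lives.
public struct NumberReader
{
    private readonly string _text;
    private int _position;

    public NumberReader(string text)
    {
        _text = text;
        _position = 0;
    }

    // True once the whole input has been consumed.
    public bool IsDone => _position >= _text.Length;

    // Parses values into 'destination' until it is full or the input ends.
    // Returns the number of values written.
    public int Read(Span<int> destination)
    {
        int written = 0;
        while (written < destination.Length && _position < _text.Length)
        {
            ReadOnlySpan<char> rest = _text.AsSpan(_position);
            int comma = rest.IndexOf(',');
            int length = comma < 0 ? rest.Length : comma;
            destination[written++] = int.Parse(rest.Slice(0, length));
            _position += comma < 0 ? length : length + 1; // skip the comma
        }
        return written;
    }
}

public static class NumberReaderDemo
{
    // The caller picks a small stack buffer and drains the reader in chunks.
    public static int SumAll(string text)
    {
        var reader = new NumberReader(text);
        Span<int> buffer = stackalloc int[4]; // size chosen by the caller
        int sum = 0;
        while (!reader.IsDone)
        {
            int count = reader.Read(buffer);
            for (int i = 0; i < count; i++) sum += buffer[i];
        }
        return sum;
    }
}
```

With this shape the callee never needs to know the total number of results; the caller can reuse one fixed-size buffer no matter how long the input is.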
-
If we know the number of results, we can allocate a span of the required size in the caller method. Another scenario is that, at the point where a reallocation is needed, we immediately reallocate on the heap. Something similar happens in the internal ValueStringBuilder.
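The "start on the stack, move to the heap when the buffer fills up" scenario could look roughly like the sketch below. `ValueIntList` and `ValueIntListDemo` are hypothetical types written for this example; the sketch only imitates the general pattern and is not the actual ValueStringBuilder code:

```csharp
using System;
using System.Buffers;

// Hypothetical growable list that starts in a caller-provided (stack) buffer
// and switches to a pooled heap array the first time it runs out of room.
public ref struct ValueIntList
{
    private Span<int> _items;  // current storage: stack buffer or pooled array
    private int[] _rented;     // empty until we have moved to the heap
    private int _count;

    public ValueIntList(Span<int> initialBuffer)
    {
        _items = initialBuffer;
        _rented = Array.Empty<int>();
        _count = 0;
    }

    public bool IsOnHeap => _rented.Length > 0;

    public ReadOnlySpan<int> AsSpan() => _items.Slice(0, _count);

    public void Add(int value)
    {
        if (_count == _items.Length) Grow();
        _items[_count++] = value;
    }

    // Reallocation point: copy the existing values into a pooled array.
    private void Grow()
    {
        int[] bigger = ArrayPool<int>.Shared.Rent(Math.Max(_items.Length * 2, 4));
        _items.Slice(0, _count).CopyTo(bigger);
        if (_rented.Length > 0) ArrayPool<int>.Shared.Return(_rented);
        _rented = bigger;
        _items = bigger;
    }

    // Returns the pooled array, if any. The list must not be used afterwards.
    public void Dispose()
    {
        if (_rented.Length > 0) ArrayPool<int>.Shared.Return(_rented);
        _rented = Array.Empty<int>();
        _items = default;
        _count = 0;
    }
}

public static class ValueIntListDemo
{
    public static (int Sum, bool UsedHeap) Parse(string text)
    {
        var list = new ValueIntList(stackalloc int[4]);
        // Split allocates; it is used here only to keep the sketch short.
        foreach (string token in text.Split(',')) list.Add(int.Parse(token));
        int sum = 0;
        foreach (int value in list.AsSpan()) sum += value;
        bool usedHeap = list.IsOnHeap;
        list.Dispose();
        return (sum, usedHeap);
    }
}
```

Short inputs never touch the heap, while longer ones pay for one or more rentals from the shared pool. The resulting span still cannot be returned to the caller, which is the limitation the original post is about.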
-
What about two other possibilities:
-
I'm not sure about that. I think implementations of
I don't think that would work well. One of the great things about the stack is that you can use very simple machine code with it, e.g. on x86 you can use
-
One case where the runtime needs to be able to do this is for this API: dotnet/runtime#25423, where the runtime decides whether or not to StackAlloc an array. This API is necessary for
-
I think one of the current issues with `stackalloc` is that it cannot be hidden behind a layer of abstraction. More specifically, when a method wants to return stack allocated data, its caller has to manually allocate the right amount of space on its own stack before calling it.
As a somewhat convoluted motivating example, consider a method to parse a string that contains a sequence of numbers like `1,2,3`:
The above code obviously won't compile, because it cannot return a stack allocated `Span` to its caller.
When implementing such a method today, it would have to take `Span<int> values` as another parameter, and it would be the caller's responsibility to decide whether it should stack allocate it and how much it should allocate. I think this means the caller has to know too much about implementation details of the called method and it will often mean that the caller allocates either too much or too little.
Being able to somehow stack allocate from the caller's frame would solve this issue. The big obvious problem with that is that the callee's stack frame is in the way. I can think of several ways of working around that:
Approach 1: Coroutines
With this approach, the callee would be implemented as a coroutine. Whenever the callee wanted to stack allocate from the caller's frame, it would yield, the caller would then allocate the required buffer and finally resume the callee. This would require that any callee state that needs to persist during a yield would have to be stored in the caller's frame (probably in a `ref struct`).
The disadvantages of this approach are that the callee could not stack allocate from its own frame (because that allocation would not survive a yield) and that its state would take up space on the stack even after it returns.
The advantage is that this could be implemented in the C# compiler itself, no CLR changes necessary.
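To make the yield/resume idea more concrete, here is a hand-written approximation of what such a coroutine might boil down to. Everything in it (`ParseCoroutine`, `Start`, `Resume`, `ParseCoroutineDemo`) is hypothetical. It assumes well-formed input such as `1,2,3`, and it uses a plain struct rather than a `ref struct` for the persisted state to keep the sketch simple:

```csharp
using System;

// Hypothetical hand-written "coroutine". Its state is a struct that lives in
// the caller's frame, so it survives between Start (the yield) and Resume.
public struct ParseCoroutine
{
    private readonly string _text;
    private int _requested; // state that must persist across the yield

    public ParseCoroutine(string text)
    {
        _text = text;
        _requested = 0;
    }

    // Runs until the callee needs a buffer, then "yields" the length it wants.
    public int Start()
    {
        int count = _text.Length == 0 ? 0 : 1;
        foreach (char c in _text)
        {
            if (c == ',') count++;
        }
        _requested = count;
        return count;
    }

    // Resumes the callee with the buffer the caller allocated in its own frame.
    // Returns the number of values written.
    public int Resume(Span<int> buffer)
    {
        if (buffer.Length < _requested)
            throw new ArgumentException("Buffer is smaller than requested.", nameof(buffer));

        ReadOnlySpan<char> text = _text.AsSpan();
        int written = 0, start = 0;
        while (written < _requested)
        {
            int comma = text.Slice(start).IndexOf(',');
            int end = comma < 0 ? text.Length : start + comma;
            buffer[written++] = int.Parse(text.Slice(start, end - start));
            start = end + 1;
        }
        return written;
    }
}

public static class ParseCoroutineDemo
{
    public static int Sum(string text)
    {
        var coroutine = new ParseCoroutine(text);
        int needed = coroutine.Start();            // callee yields a size request
        Span<int> buffer = stackalloc int[needed]; // caller allocates in its own frame
        int count = coroutine.Resume(buffer);      // callee continues with the buffer
        int sum = 0;
        for (int i = 0; i < count; i++) sum += buffer[i];
        return sum;
    }
}
```

In the proposal the compiler would generate the equivalent of the `Start`/`Resume` split and the caller-side allocation automatically. A real design would also need to bound the requested size, since an unchecked `stackalloc` driven by input length can overflow the stack.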
A picture of how this process would work (not to scale):
Approach 2: Make the CLR understand
With this approach, the C# compiler would give some signal to the CLR (probably using an intrinsic) and the CLR would modify the stack as required.
I can think of two ways of how exactly the CLR could make this work:
Approach 2a: Expand the caller's frame
When the callee returns, whatever was in its frame will become part of caller's frame, including any buffers it's returning (which is what we want) and any other state (which we don't).
The disadvantage of this is that the callee state still takes up stack space even after it returns. It would also require changes to the CLR (but maybe those wouldn't be too big?).
The advantage is that this would not require creating a state machine for a coroutine, which would likely make this more efficient, and possibly easier to implement.
A picture:
Another approach that would achieve a similar result would be to force inlining of the callee, but this comes with its own set of disadvantages (e.g. it likely couldn't be used in virtual methods).
Approach 2b: Move the callee's frame
When the callee wants to stack allocate from the caller's frame, the CLR first moves the callee's frame by the required amount and then performs the allocation directly from the caller's frame.
The main disadvantage is that it would require moving the callee's frame, which can take some time (especially if it contains its own stack allocations). This would also require adjusting any references to variables from the callee's frame, which might not be easy. Out of the three approaches, this one would likely require the biggest CLR changes.
The main advantage, compared with the other approaches, is that when the callee returns, its state does not stay on the stack.
A picture:
Closing thoughts
One thing to note is that the immediate caller is not special, so it might make sense to have some way of deciding from which frame to allocate, not limited to just the immediate caller.
Finally, I do realize that all the approaches I suggested above are complicated and unorthodox, especially when considering what problem they're trying to solve. So it's likely they're not going to be implemented, at least not anytime soon. But I do think it's a problem worth solving, so I wanted to start this discussion, even if the eventual solution looked completely different.