Commit a47f93b

[Lazy] Update docs for memory allocation (#440)
1 parent 56fe2f6 commit a47f93b

2 files changed, with 84 additions and 0 deletions.

docs/docs.cn/Lazy.md

Lines changed: 42 additions & 0 deletions
@@ -274,6 +274,48 @@ void func(int x, Executor *e) {

In short, when we need to specify an executor for a task chain made up of multiple Lazy tasks, we only need to specify it at the start of the chain.

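## Memory Allocation

### User-Defined Allocator

async_simple lets users define a memory allocator for each Lazy function. The convention is that the first parameter of the Lazy function is `std::allocator_arg_t` and the second is an object that provides the member functions `void *allocate(unsigned)` and `void deallocate(void*, unsigned)`, for example `std::pmr::polymorphic_allocator<>`.

For concrete usage, see `demo_example/pmr_lazy.cpp`.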
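
### Compiler-Merged Memory Allocation

async_simple supports clang's `[[clang::coro_await_elidable]]` attribute. When async_simple is compiled with a compiler that supports `[[clang::coro_await_elidable]]`, the memory needed by a Lazy call that directly follows `co_await` is automatically folded into the coroutine frame of the current coroutine. For example:
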
```
Lazy<int> foo() { ... }
Lazy<int> bar() {
    auto f = co_await foo();
    ...
}
```
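
In this example, when the `bar()` coroutine calls `foo()`, no memory allocation is triggered for the `foo()` coroutine. Instead, `bar()` requests a larger coroutine frame and hands part of it to `foo()`. The lifetime of `bar()`'s own frame is the responsibility of `bar()`'s caller: if that caller also calls `bar()` directly after `co_await`, then `bar()`'s frame is not allocated either and reuses part of the coroutine frame of its calling context. This process is recursive.

Note that this strategy is not always a win. Consider the following case:
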
```
Lazy<int> foo() { ... }
Lazy<int> bar(bool cond) {
    if (cond) {
        co_await foo();
        ...
    }
    ...
}
```
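
Here, with the `[[clang::coro_await_elidable]]` optimization enabled, `bar()`'s coroutine frame is always larger so that it can contain `foo()`'s frame. If `cond` is always `false` at run time, the extra space is never used and the optimization is a net loss.

To mitigate this, we have implemented a smarter optimization in our internal compiler: it uses hot/cold information from the surrounding context to decide whether to transform a given call site, which avoids this kind of pessimization.
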
# LazyLocals

LazyLocals is similar to `thread_local` in a threaded environment. Users can define their own LazyLocals by deriving from LazyLocals and implementing the static function `T::classof(const LazyLocalBase*)`.

docs/docs.en/Lazy.md

Lines changed: 42 additions & 0 deletions
@@ -272,6 +272,48 @@ In the above example, `task1...task4` represents a task chain consists of Lazy.

So we can simply assign the executor at the root of the task chain.

## Memory Allocation

### User-Defined Allocator

async_simple lets users supply a memory allocator for each Lazy function. To do so, the Lazy function takes `std::allocator_arg_t` as its first parameter and, as its second parameter, an allocator object that provides the member functions `void *allocate(unsigned)` and `void deallocate(void*, unsigned)`, for example `std::pmr::polymorphic_allocator<>`.

For a complete usage example, see `demo_example/pmr_lazy.cpp`.

### Compiler-Merged Memory Allocation

async_simple supports clang's `[[clang::coro_await_elidable]]` attribute. When async_simple is compiled with a compiler that supports this attribute, the memory needed by a Lazy call that appears directly as the operand of `co_await` is automatically merged into the coroutine frame of the current coroutine. For example:

```
Lazy<int> foo() { ... }
Lazy<int> bar() {
    auto f = co_await foo();
    ...
}
```

In this example, when the `bar()` coroutine calls `foo()`, no memory allocation is triggered for the `foo()` coroutine. Instead, `bar()` allocates a larger coroutine frame and gives part of it to `foo()`. The lifetime of `bar()`'s own coroutine frame is managed by `bar()`'s caller. If that caller in turn calls `bar()` directly as the operand of `co_await`, then no separate frame is allocated for `bar()` either; it reuses part of its caller's coroutine frame. This process is recursive.

Note that this strategy is not always beneficial. Consider the following scenario:

```
Lazy<int> foo() { ... }
Lazy<int> bar(bool cond) {
    if (cond) {
        co_await foo();
        ...
    }
    ...
}
```

In this case, with the `[[clang::coro_await_elidable]]` optimization enabled, `bar()`'s coroutine frame is always larger so that it can hold `foo()`'s coroutine frame. However, if `cond` is always `false` at run time, the extra space is never used and the optimization becomes a pessimization.

To mitigate this issue, we have implemented a smarter optimization in our internal compiler. The compiler uses hot/cold information from the surrounding context to decide whether to transform each call site, which avoids such pessimizations.

# LazyLocals

LazyLocals is similar to `thread_local` in a threaded environment. Users can define their own LazyLocals by deriving from LazyLocals and implementing the static function `T::classof(const LazyLocalBase*)`.
