Commit 0ed5a55

"update doc" (#5682)
1 parent dc78f3c commit 0ed5a55

paddle/memory/README.md

Lines changed: 139 additions & 2 deletions
@@ -1,4 +1,141 @@
# Region-based Heterogeneous Memory Management
## Design

3-
Please check out the [design documentation](http://gangliao.me) to find out more details about
4-
buddy memory allocator for both CPU and GPU.
### Usage

To allocate 4 KB of CPU memory:

```cpp
void* p = memory::Alloc(platform::CPUPlace(), 4 * 1024);
```

To allocate 4 KB of memory on the third GPU (GPU IDs are zero-based, so its ID is 2):

```cpp
void* p = memory::Alloc(platform::GPUPlace(2), 4 * 1024);
```

To check the amount of memory used so far on a place and then free the block:

```cpp
auto pl = platform::GPUPlace(0);
void* p = memory::Alloc(pl, 4 * 1024);
std::cout << memory::Used(pl);
memory::Free(pl, p);
```

### API

`paddle/memory/memory.h` declares:

```cpp
namespace memory {
template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> size_t Used(Place);
}  // namespace memory
```

These function templates are specialized for `platform::CPUPlace` and `platform::GPUPlace`:

```cpp
template <>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}
```

and

```cpp
template <>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}
```

Similar specializations exist for `Free` and `Used`.
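
As a rough illustration of what the `Free` and `Used` specializations might look like, here is a minimal, self-contained sketch for the CPU case only. The `BuddyAllocator` below is a hypothetical stand-in that forwards to `malloc`/`free` and keeps sizes in a map; the member names `Free` and `Used` on the allocator are assumptions, not confirmed by this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical stand-in for the real BuddyAllocator: it forwards to
// malloc/free and records block sizes so that Used() can be reported.
class BuddyAllocator {
 public:
  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
      sizes_[p] = size;
      used_ += size;
    }
    return p;
  }
  void Free(void* p) {
    auto it = sizes_.find(p);
    if (it == sizes_.end()) return;  // unknown pointer: ignored in this sketch
    used_ -= it->second;
    sizes_.erase(it);
    std::free(p);
  }
  std::size_t Used() const { return used_; }

 private:
  std::unordered_map<void*, std::size_t> sizes_;
  std::size_t used_ = 0;
};

// Simplified place tag; the README uses platform::CPUPlace.
struct CPUPlace {};

BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator allocator;
  return &allocator;
}

namespace memory {
template <typename Place> void* Alloc(Place, std::size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> std::size_t Used(Place);

template <>
void* Alloc<CPUPlace>(CPUPlace, std::size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}

template <>
void Free<CPUPlace>(CPUPlace, void* p) {
  GetCPUBuddyAllocator()->Free(p);
}

template <>
std::size_t Used<CPUPlace>(CPUPlace) {
  return GetCPUBuddyAllocator()->Used();
}
}  // namespace memory
```

Because the place type is deduced from the first argument, a call such as `memory::Used(CPUPlace())` selects the CPU specialization without any explicit template argument.
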
### Implementation

`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` return singletons:

```cpp
BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /* backup allocator */, ...);
  }
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  static BuddyAllocator** as = NULL;  // one allocator per GPU
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
    }
  }
  return as[gpu_id];
}
```
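
The null-check pattern above is not safe if two threads make the first call at the same time. One possible alternative, shown here only as a sketch, relies on the C++11 guarantee that a function-local static is initialized exactly once. The `BuddyAllocator` constructor taking a device id and the free function `NumGPUs()` returning 4 are hypothetical simplifications made so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal allocator type, standing in for the real BuddyAllocator.
class BuddyAllocator {
 public:
  explicit BuddyAllocator(int device_id) : device_id_(device_id) {}
  int device_id() const { return device_id_; }

 private:
  int device_id_;
};

// Hypothetical device count; the README calls platform::NumGPUs().
int NumGPUs() { return 4; }

BuddyAllocator* GetCPUBuddyAllocator() {
  // Constructed once, on the first call; -1 marks "no GPU" in this sketch.
  static BuddyAllocator allocator(-1);
  return &allocator;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  // The lambda runs once and builds one allocator per GPU.
  static const std::vector<std::unique_ptr<BuddyAllocator>> allocators = [] {
    std::vector<std::unique_ptr<BuddyAllocator>> v;
    for (int gpu = 0; gpu < NumGPUs(); ++gpu) {
      v.push_back(std::unique_ptr<BuddyAllocator>(new BuddyAllocator(gpu)));
    }
    return v;
  }();
  // at() throws std::out_of_range for an invalid gpu_id.
  return allocators.at(static_cast<std::size_t>(gpu_id)).get();
}
```

Repeated calls with the same `gpu_id` return the same pointer, and distinct ids return distinct allocators.
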
#### `BuddyAllocator`

`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:

```cpp
BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
  ...
}
```

Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned on 32 bytes, which is enough to hold a `BuddyAllocator::Block` object:

```cpp
class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block *left, *right;
    size_t index;  // allocator id
  };
  ...
};
```

Because `BuddyAllocator` keeps the metadata of each block, it can track used memory: it adds the amount returned by `Alloc` and subtracts the amount released by `Free`. In contrast, `CPUAllocator` and `GPUAllocator` do not know the size of a freed memory block and cannot do this tracking.
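
The tracking idea can be sketched with a header stored in front of each payload. This is an illustrative simplification, not the buddy algorithm itself: `TrackingAllocator`, `BlockHeader`, and `kHeaderSlot` are hypothetical names, the header holds only the size, and `malloc` stands in for the real memory source (so the payload is not guaranteed to be 32-byte aligned here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical header; the README's Block also holds buddy links and an index.
struct BlockHeader {
  std::size_t size;  // payload size requested by the caller
};

// One 32-byte slot is reserved in front of every payload for the header.
constexpr std::size_t kHeaderSlot = 32;
static_assert(sizeof(BlockHeader) <= kHeaderSlot, "header must fit in its slot");

class TrackingAllocator {
 public:
  void* Alloc(std::size_t size) {
    void* raw = std::malloc(kHeaderSlot + size);
    if (raw == nullptr) return nullptr;
    new (raw) BlockHeader{size};  // write the metadata into the slot
    used_ += size;
    return static_cast<char*>(raw) + kHeaderSlot;
  }

  void Free(void* p) {
    if (p == nullptr) return;
    void* raw = static_cast<char*>(p) - kHeaderSlot;
    // The header tells us how much memory is being returned.
    used_ -= static_cast<BlockHeader*>(raw)->size;
    std::free(raw);
  }

  std::size_t Used() const { return used_; }

 private:
  std::size_t used_ = 0;
};
```

A plain system allocator receives only the pointer in its free call, which is why it cannot report the amount released without metadata of this kind.
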
#### System Allocators

`GPUAllocator` and `CPUAllocator` are called *system allocators*. They serve as the fallback allocators of `BuddyAllocator`.
## Justification

I drew inspiration from Majel and Caffe2, though the design above looks different from both.
### Caffe2

In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).

There are two implementations of `Context`:

1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.

1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202). This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member. `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.
### Majel

In Majel, there are basically two allocator types:

1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.

However, programs do not allocate memory through these two allocators directly. Instead, the allocators are defined in hidden namespaces.

Majel also has hidden global variables such as:

1. `cpu::SystemAllocator g_cpu_allocator`, and
1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.

Programs allocate memory via a `BuddyAllocator`, which can take `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if the `BuddyAllocator` cannot find a block in its memory pool, it extends the pool by calling the fallback allocator's `New(size_t)`.
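
The fallback relationship can be sketched as follows. This is a deliberately simplified pool of fixed-size blocks, not a buddy allocator and not Majel's actual code: `SystemAllocator`, `CPUSystemAllocator`, `PoolAllocator`, and the `calls()` counter are hypothetical names introduced for illustration. Only the idea of calling the fallback's `New(size_t)` when the pool is empty comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fallback interface, modeled on the New(size_t) call named above.
class SystemAllocator {
 public:
  virtual ~SystemAllocator() = default;
  virtual void* New(std::size_t size) = 0;
  virtual void Delete(void* p) = 0;
};

class CPUSystemAllocator : public SystemAllocator {
 public:
  void* New(std::size_t size) override {
    ++calls_;
    return std::malloc(size);
  }
  void Delete(void* p) override { std::free(p); }
  int calls() const { return calls_; }  // how often the pool had to grow

 private:
  int calls_ = 0;
};

// Simplified pool of fixed-size blocks; it only shows the fallback mechanism.
class PoolAllocator {
 public:
  PoolAllocator(SystemAllocator* fallback, std::size_t block_size,
                std::size_t blocks_per_chunk)
      : fallback_(fallback),
        block_size_(block_size),
        blocks_per_chunk_(blocks_per_chunk) {}

  ~PoolAllocator() {
    for (void* chunk : chunks_) fallback_->Delete(chunk);
  }

  void* Alloc() {
    // Pool exhausted: extend it through the fallback allocator.
    if (free_.empty() && !Extend()) return nullptr;
    void* p = free_.back();
    free_.pop_back();
    return p;
  }

  void Free(void* p) { free_.push_back(p); }  // block goes back to the pool

 private:
  bool Extend() {
    char* chunk =
        static_cast<char*>(fallback_->New(block_size_ * blocks_per_chunk_));
    if (chunk == nullptr) return false;
    chunks_.push_back(chunk);
    for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
      free_.push_back(chunk + i * block_size_);
    }
    return true;
  }

  SystemAllocator* fallback_;
  std::size_t block_size_;
  std::size_t blocks_per_chunk_;
  std::vector<void*> chunks_;  // memory obtained from the fallback
  std::vector<void*> free_;    // blocks currently available
};
```

With two blocks per chunk, the first two allocations cost one call to the fallback, the third triggers a second call, and a block that has been freed is reused without touching the fallback again.
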