@@ -154,6 +154,93 @@ that no processes or threads escape the cgroups. This sync is
154154done via a pipe ( specified in the runtime section below ) that the container's
155155init process will block waiting for the parent to finish setup.
156156
157+ **intelRdt**:
158+ Intel platforms with newer Xeon CPUs support Intel Resource Director Technology
159+ (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT that
160+ currently supports L3 cache resource allocation.
161+
162+ This feature provides a way for software to restrict cache allocation to a
163+ defined 'subset' of the L3 cache, which may overlap with other 'subsets'.
164+ The different subsets are identified by a class of service (CLOS), and each
165+ CLOS has a capacity bitmask (CBM).
166+
167+ It can be used to handle L3 cache resource allocation for containers if both
168+ the hardware and the kernel support Intel RDT/CAT.
169+
170+ `intelRdt` is implemented as the `intel_rdt` cgroup subsystem in libcontainer
171+ even though the Linux kernel interface is not a real cgroup. When `intelRdt` is
172+ joined, statistics can be collected from the `intel_rdt` cgroup subsystem.
173+
174+ In the Linux kernel, it is exposed via the "resource control" filesystem,
175+ which is a "cgroup-like" interface.
176+
177+ Compared with cgroups, it has a similar process management lifecycle and
178+ similar interfaces in a container. But unlike the cgroups hierarchy, it has a
179+ single-level filesystem layout.
180+
181+ Intel RDT "resource control" filesystem hierarchy:
182+ ```
183+ mount -t resctrl resctrl /sys/fs/resctrl
184+ tree /sys/fs/resctrl
185+ /sys/fs/resctrl/
186+ |-- info
187+ |   |-- L3
188+ |       |-- cbm_mask
189+ |       |-- num_closids
190+ |-- cpus
191+ |-- schemata
192+ |-- tasks
193+ |-- <container_id>
194+     |-- cpus
195+     |-- schemata
196+     |-- tasks
197+
198+ ```
199+
200+ For runc, we can make use of the `tasks` and `schemata` files to configure L3
201+ cache resource constraints.
202+
203+ The file `tasks` has a list of tasks that belong to this group (e.g., the
204+ "<container_id>" group). Tasks can be added to a group by writing the task ID
205+ to the `tasks` file (which will automatically remove them from the previous
206+ group to which they belonged). New tasks created by fork(2) and clone(2) are
207+ added to the same group as their parent. If a PID is not in any sub-group, it
208+ is in the root group.
209+
210+ The file `schemata` holds the allocation masks for the L3 cache on each
211+ socket; each entry contains an L3 cache id and a capacity bitmask (CBM).
212+ ```
213+ Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
214+ ```
215+ For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`,
216+ which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
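
To make the schema format concrete, here is a minimal Go sketch that splits such a line into a map from L3 cache id to CBM. The function name `parseL3Schema` is hypothetical and is not taken from libcontainer; it only handles the `L3:` line format described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses a line such as "L3:0=ff;1=c0" into a map from L3
// cache id to capacity bitmask (CBM). Hypothetical helper, not a runc API.
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		cacheID, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", entry, err)
		}
		// CBMs are written in hexadecimal without a "0x" prefix.
		cbm, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %w", entry, err)
		}
		masks[cacheID] = cbm
	}
	return masks, nil
}

func main() {
	masks, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", masks[0], masks[1])
}
```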
217+
218+ A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits
219+ that can be set cannot exceed the maximum CBM length. The maximum CBM length
220+ varies among supported Intel Xeon platforms. In the Intel RDT "resource
221+ control" filesystem layout, the CBM in a group should be a subset of the CBM
222+ in root. The kernel checks whether the value is valid when it is written. For
223+ example, 0xfffff in root indicates that the maximum CBM length is 20 bits,
224+ which maps to the entire L3 cache capacity. Some valid CBM values to set in a
225+ group: 0xf, 0xf0, 0x3ff, 0x1f00, and so on.
225+
226+ For more information about the Intel RDT/CAT kernel interface:
227+ https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/cache&id=f20e57892806ad244eaec7a7ae365e78fee53377
228+
229+ An example for runc:
230+ ```
231+ There are two L3 caches in the two-socket machine; the default CBM is 0xfffff
232+ and the max CBM length is 20 bits. This configuration assigns 4/5 of L3 cache
233+ id 0 and the whole of L3 cache id 1 to the container:
234+
235+ "linux": {
236+     "resources": {
237+         "intelRdt": {
238+             "l3CacheSchema": "L3:0=ffff0;1=fffff"
239+         }
240+     }
241+ }
242+ ```
243+
157244### Security
158245
159246The standard set of Linux capabilities that are set in a container