Commit 2bc9c04
drm/doc/rfc: i915 DG1 uAPI
Add an entry for the new uAPI needed for DG1. Also add the overall
upstream plan, including some notes for the TTM conversion.

v2(Daniel):
  - include the overall upstreaming plan
  - add a note for mmap, there are differences here for TTM vs i915
  - bunch of other suggestions from Daniel

v3:
 (Daniel)
  - add a note for set/get caching stuff
  - add some more docs for existing query and extensions stuff
  - add an actual code example for regions query
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
    - try a simpler design with the placements extension
    - rather than have a generic setparam which can cover multiple
      use cases, have each extension be responsible for one thing
      only

v4:
 (Daniel)
  - add some more notes for ttm conversion
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
    - drop all the extra rsvd members for the region_query and
      region_info, just keep the bare minimum needed for padding

v5:
 (Jason)
  - for the upstream plan, add a requirement that we send the uAPI
    bits again for final sign off before turning it on for real
  - document how we intend to extend the rsvd bits for the region
    query
 (Kenneth)
  - improve the comment for the smem+lmem mmap mode and caching

Signed-off-by: Matthew Auld <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Cc: Lionel Landwerlin <[email protected]>
Cc: Jon Bloomfield <[email protected]>
Cc: Jordan Justen <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Daniel Vetter <[email protected]>
Acked-by: Dave Airlie <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Acked-by: Jon Bloomfield <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
1 parent 0333ec8 commit 2bc9c04

File tree

3 files changed, +372 -0 lines changed

Documentation/gpu/rfc/i915_gem_lmem.h

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
/**
 * enum drm_i915_gem_memory_class - Supported memory classes
 */
enum drm_i915_gem_memory_class {
	/** @I915_MEMORY_CLASS_SYSTEM: System memory */
	I915_MEMORY_CLASS_SYSTEM = 0,
	/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
	I915_MEMORY_CLASS_DEVICE,
};

/**
 * struct drm_i915_gem_memory_class_instance - Identify particular memory region
 */
struct drm_i915_gem_memory_class_instance {
	/** @memory_class: See enum drm_i915_gem_memory_class */
	__u16 memory_class;

	/** @memory_instance: Which instance */
	__u16 memory_instance;
};

/**
 * struct drm_i915_memory_region_info - Describes one region as known to the
 * driver.
 *
 * Note that we reserve some stuff here for potential future work. As an
 * example we might want to expose the capabilities for a given region, which
 * could include things like if the region is CPU mappable/accessible, what
 * are the supported mapping types etc.
 *
 * Note that to extend struct drm_i915_memory_region_info and struct
 * drm_i915_query_memory_regions in the future the plan is to do the following:
 *
 * .. code-block:: C
 *
 *	struct drm_i915_memory_region_info {
 *		struct drm_i915_gem_memory_class_instance region;
 *		union {
 *			__u32 rsvd0;
 *			__u32 new_thing1;
 *		};
 *		...
 *		union {
 *			__u64 rsvd1[8];
 *			struct {
 *				__u64 new_thing2;
 *				__u64 new_thing3;
 *				...
 *			};
 *		};
 *	};
 *
 * With this things should remain source compatible between versions for
 * userspace, even as we add new fields.
 *
 * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
 * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
 * at &drm_i915_query_item.query_id.
 */
struct drm_i915_memory_region_info {
	/** @region: The class:instance pair encoding */
	struct drm_i915_gem_memory_class_instance region;

	/** @rsvd0: MBZ */
	__u32 rsvd0;

	/** @probed_size: Memory probed by the driver (-1 = unknown) */
	__u64 probed_size;

	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
	__u64 unallocated_size;

	/** @rsvd1: MBZ */
	__u64 rsvd1[8];
};

/**
 * struct drm_i915_query_memory_regions
 *
 * The region info query enumerates all regions known to the driver by filling
 * in an array of struct drm_i915_memory_region_info structures.
 *
 * Example for getting the list of supported regions:
 *
 * .. code-block:: C
 *
 *	struct drm_i915_query_memory_regions *info;
 *	struct drm_i915_query_item item = {
 *		.query_id = DRM_I915_QUERY_MEMORY_REGIONS,
 *	};
 *	struct drm_i915_query query = {
 *		.num_items = 1,
 *		.items_ptr = (uintptr_t)&item,
 *	};
 *	int err, i;
 *
 *	// First query the size of the blob we need, this needs to be large
 *	// enough to hold our array of regions. The kernel will fill out the
 *	// item.length for us, which is the number of bytes we need.
 *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
 *	if (err) ...
 *
 *	info = calloc(1, item.length);
 *	// Now that we allocated the required number of bytes, we call the ioctl
 *	// again, this time with the data_ptr pointing to our newly allocated
 *	// blob, which the kernel can then populate with all the region info.
 *	item.data_ptr = (uintptr_t)info;
 *
 *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
 *	if (err) ...
 *
 *	// We can now access each region in the array
 *	for (i = 0; i < info->num_regions; i++) {
 *		struct drm_i915_memory_region_info mr = info->regions[i];
 *		u16 class = mr.region.memory_class;
 *		u16 instance = mr.region.memory_instance;
 *
 *		....
 *	}
 *
 *	free(info);
 */
struct drm_i915_query_memory_regions {
	/** @num_regions: Number of supported regions */
	__u32 num_regions;

	/** @rsvd: MBZ */
	__u32 rsvd[3];

	/** @regions: Info about each supported region */
	struct drm_i915_memory_region_info regions[];
};

#define DRM_I915_GEM_CREATE_EXT		0xdeadbeaf
#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)

/**
 * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
 * extension support using struct i915_user_extension.
 *
 * Note that in the future we want to have our buffer flags here, at least for
 * the stuff that is immutable. Previously we would have two ioctls, one to
 * create the object with gem_create, and another to apply various parameters,
 * however this creates some ambiguity for the params which are considered
 * immutable. Also in general we're phasing out the various SET/GET ioctls.
 */
struct drm_i915_gem_create_ext {
	/**
	 * @size: Requested size for the object.
	 *
	 * The (page-aligned) allocated size for the object will be returned.
	 *
	 * Note that for some devices we might have further minimum page-size
	 * restrictions (larger than 4K), like for device local-memory.
	 * However in general the final size here should always reflect any
	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
	 * extension to place the object in device local-memory.
	 */
	__u64 size;
	/**
	 * @handle: Returned handle for the object.
	 *
	 * Object handles are nonzero.
	 */
	__u32 handle;
	/** @flags: MBZ */
	__u32 flags;
	/**
	 * @extensions: The chain of extensions to apply to this object.
	 *
	 * This will be useful in the future when we need to support several
	 * different extensions, and we need to apply more than one when
	 * creating the object. See struct i915_user_extension.
	 *
	 * If we don't supply any extensions then we get the same old gem_create
	 * behaviour.
	 *
	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
	 * struct drm_i915_gem_create_ext_memory_regions.
	 */
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
	__u64 extensions;
};

/**
 * struct drm_i915_gem_create_ext_memory_regions - The
 * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
 *
 * Set the object with the desired set of placements/regions in priority
 * order. Each entry must be unique and supported by the device.
 *
 * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
 * an equivalent layout of class:instance pair encodings. See struct
 * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
 * query the supported regions for a device.
 *
 * As an example, on discrete devices, if we wish to set the placement as
 * device local-memory we can do something like:
 *
 * .. code-block:: C
 *
 *	struct drm_i915_gem_memory_class_instance region_lmem = {
 *		.memory_class = I915_MEMORY_CLASS_DEVICE,
 *		.memory_instance = 0,
 *	};
 *	struct drm_i915_gem_create_ext_memory_regions regions = {
 *		.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
 *		.regions = (uintptr_t)&region_lmem,
 *		.num_regions = 1,
 *	};
 *	struct drm_i915_gem_create_ext create_ext = {
 *		.size = 16 * PAGE_SIZE,
 *		.extensions = (uintptr_t)&regions,
 *	};
 *
 *	int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
 *	if (err) ...
 *
 * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
 * along with the final object size in &drm_i915_gem_create_ext.size, which
 * should account for any rounding up, if required.
 */
struct drm_i915_gem_create_ext_memory_regions {
	/** @base: Extension link. See struct i915_user_extension. */
	struct i915_user_extension base;

	/** @pad: MBZ */
	__u32 pad;
	/** @num_regions: Number of elements in the @regions array. */
	__u32 num_regions;
	/**
	 * @regions: The regions/placements array.
	 *
	 * An array of struct drm_i915_gem_memory_class_instance.
	 */
	__u64 regions;
};
Documentation/gpu/rfc/i915_gem_lmem.rst

Lines changed: 131 additions & 0 deletions

@@ -0,0 +1,131 @@
=========================
I915 DG1/LMEM RFC Section
=========================

Upstream plan
=============
For upstream the overall plan for landing all the DG1 stuff and turning it on
for real, with all the uAPI bits, is:

* Merge basic HW enabling of DG1 (still without pciid)
* Merge the uAPI bits behind a special CONFIG_BROKEN (or so) flag

  * At this point we can still make changes, but importantly this lets us
    start running IGTs which can utilize local-memory in CI

* Convert over to TTM, make sure it all keeps working. Some of the work items:

  * TTM shrinker for discrete
  * dma_resv_lockitem for full dma_resv_lock, i.e. not just trylock
  * Use TTM CPU pagefault handler
  * Route shmem backend over to TTM SYSTEM for discrete
  * TTM purgeable object support
  * Move i915 buddy allocator over to TTM
  * MMAP ioctl mode (see `I915 MMAP`_)
  * SET/GET ioctl caching (see `I915 SET/GET CACHING`_)

* Send RFC (with mesa-dev on cc) for final sign off on the uAPI
* Add pciid for DG1 and turn on uAPI for real

New object placement and region query uAPI
==========================================
Starting from DG1 we need to give userspace the ability to allocate buffers from
device local-memory. Currently the driver supports gem_create, which can place
buffers in system memory via shmem, and the usual assortment of other
interfaces, like dumb buffers and userptr.

To support this new capability, while also providing a uAPI which will work
beyond just DG1, we propose to offer three new bits of uAPI:

DRM_I915_QUERY_MEMORY_REGIONS
-----------------------------
New query ID which allows userspace to discover the list of supported memory
regions (like system-memory and local-memory) for a given device. We identify
each region with a class and instance pair, which should be unique. The class
here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
like DG1.

Side note: The class/instance design is borrowed from our existing engine uAPI,
where we describe every physical engine in terms of its class, and the
particular instance, since we can have more than one per class.

In the future we also want to expose more information which can further
describe the capabilities of a region.

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions

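For illustration, here is a sketch (not part of the patch) of how userspace
might consume the query output to find a device local-memory region, assuming
info has already been populated by DRM_IOCTL_I915_QUERY as in the example in
the header above:

.. code-block:: C

    // Sketch only: scan the returned regions for device local-memory.
    bool found_lmem = false;
    struct drm_i915_gem_memory_class_instance lmem;

    for (i = 0; i < info->num_regions; i++) {
        if (info->regions[i].region.memory_class == I915_MEMORY_CLASS_DEVICE) {
            // class:instance pair which can later be fed into the
            // I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
            lmem = info->regions[i].region;
            found_lmem = true;
            break;
        }
    }
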
GEM_CREATE_EXT
--------------
New ioctl which is basically just gem_create but now allows userspace to provide
a chain of possible extensions. Note that if we don't provide any extensions and
set flags=0 then we get the exact same behaviour as gem_create, as the sketch
below illustrates.

Side note: We also need to support PXP[1] in the near future, which is also
applicable to integrated platforms, and adds its own gem_create_ext extension,
which basically lets userspace mark a buffer as "protected".

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_create_ext

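As a rough sketch (not part of the patch), the no-extension case really is
just gem_create with a different ioctl number, while extensions are chained
through the next_extension member of struct i915_user_extension:

.. code-block:: C

    // Sketch only: no extensions, behaves exactly like gem_create.
    struct drm_i915_gem_create_ext create_ext = {
        .size = 4096,
        .flags = 0,
        .extensions = 0,
    };

    int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
    if (err) ...

    // With extensions, each one embeds a struct i915_user_extension as
    // its first member; .next_extension points to the next link in the
    // chain, or is zero for the last one.
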
I915_GEM_CREATE_EXT_MEMORY_REGIONS
----------------------------------
Implemented as an extension for gem_create_ext, we would now allow userspace to
optionally provide an immutable list of preferred placements at creation time,
in priority order, for a given buffer object. For the placements we expect
them each to use the class/instance encoding, as per the output of the regions
query. Having the list in priority order will be useful in the future when
placing an object, say during eviction.

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_create_ext_memory_regions

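To illustrate the priority ordering, here is a sketch (not part of the patch)
of a buffer which prefers device local-memory but can also be placed in system
memory, with the most preferred entry first:

.. code-block:: C

    struct drm_i915_gem_memory_class_instance placements[] = {
        { .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
        { .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
    };
    struct drm_i915_gem_create_ext_memory_regions regions = {
        .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
        .num_regions = 2, /* both entries, in priority order */
        .regions = (uintptr_t)placements,
    };
    struct drm_i915_gem_create_ext create_ext = {
        .size = 16 * PAGE_SIZE,
        .extensions = (uintptr_t)&regions,
    };

    int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
    if (err) ...
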
One fair criticism here is that this seems a little over-engineered[2]. If we
just consider DG1 then yes, a simple gem_create.flags or something is totally
all that's needed to tell the kernel to allocate the buffer in local-memory or
whatever. However, looking to the future we need uAPI which can also support
the upcoming Xe HP multi-tile architecture in a sane way, where there can be
multiple local-memory instances for a given device, and so using both class and
instance in our uAPI to describe regions is desirable, although specifically
for DG1 it's uninteresting, since we only have a single local-memory instance.

Existing uAPI issues
====================
Some potential issues we still need to resolve.

I915 MMAP
---------
In i915 there are multiple ways to mmap a GEM object, including mapping the
same object using different mapping types (WC vs WB), i.e. multiple active
mmaps per object. TTM expects at most one mmap for the lifetime of the object.
If it turns out that we have to backpedal here, there might be some potential
userspace fallout. A sketch of the current behaviour follows below.

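As a sketch (not part of this patch), using the existing mmap_offset uAPI and
assuming handle and size refer to an existing object, the behaviour in
question is two simultaneous mappings of one object with different mapping
types:

.. code-block:: C

    // Sketch only: two active mmaps of one object, WC and WB, which is
    // something TTM as-is cannot express.
    struct drm_i915_gem_mmap_offset wc = {
        .handle = handle,
        .flags = I915_MMAP_OFFSET_WC,
    };
    struct drm_i915_gem_mmap_offset wb = {
        .handle = handle,
        .flags = I915_MMAP_OFFSET_WB,
    };

    ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &wc);
    ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &wb);

    void *ptr_wc = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        fd, wc.offset);
    void *ptr_wb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        fd, wb.offset);
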
I915 SET/GET CACHING
--------------------
In i915 we have the set/get_caching ioctls (see the sketch at the end of this
section). TTM doesn't let us change this, but DG1 doesn't support non-snooped
pcie transactions, so we can just always allocate as WB for smem-only buffers.
If/when our hw gains support for non-snooped pcie transactions then we must
fix this mode at allocation time as a new GEM extension.

This is related to the mmap problem, because in general (meaning, when we're
not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
the allocation mode.

A possible idea is to let the kernel pick the mmap mode for userspace from the
following table:

smem-only: WB. Userspace does not need to call clflush.

smem+lmem: We only ever allow a single mode, so simply allocate this as uncached
memory, and always give userspace a WC mapping. GPU still does snooped access
here (assuming we can't turn it off like on DG1), which is a bit inefficient.

lmem-only: always WC.

This means on discrete you only get a single mmap mode, all others must be
rejected. That's probably going to be a new default mode or something like
that.

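For reference, the ioctl in question lets userspace change the caching mode
after the object already exists, which is exactly the kind of after-the-fact
change TTM does not allow. A sketch (not part of this patch), assuming handle
refers to an existing object:

.. code-block:: C

    struct drm_i915_gem_caching arg = {
        .handle = handle,
        .caching = I915_CACHING_CACHED,
    };

    // Works on current integrated platforms; on discrete the caching
    // and mmap mode would instead be fixed at allocation time, so a
    // call like this would have to be rejected.
    int err = ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
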
Links
=====
[1] https://patchwork.freedesktop.org/series/86798/

[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791

Documentation/gpu/rfc/index.rst

Lines changed: 4 additions & 0 deletions
@@ -15,3 +15,7 @@ host such documentation:

 * Once the code has landed move all the documentation to the right places in
   the main core, helper or driver sections.
+
+.. toctree::
+
+    i915_gem_lmem.rst
