Commit 25fed6b

Merge tag 'drm-intel-gt-next-2021-08-06-1' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-next
UAPI Changes:

- Add I915_MMAP_OFFSET_FIXED

  On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid
  type. On devices without local memory, this caching mode is invalid.
  As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
  be used, depending on the object placement on creation. WB will be used
  when the object can only exist in system memory, WC otherwise.

  Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888

- Reinstate the mmap ioctl for (already released) integrated Gen12 platforms

  Rationale: Otherwise media driver breaks e.g. for ADL-P. Long term goal is
  still to sunset the IOCTL even for integrated and require using mmap_offset.

- Reject caching/set_domain IOCTLs on discrete

  Expected to become immutable property of the BO

- Disallow changing context parameters after first use on Gen12 and earlier
- Require setting context parameters at creation on platforms after Gen12

  Rationale (for both): Allow less dynamic changes to the context to simplify
  the implementation and avoid user shooting themselves in the foot.

- Drop I915_CONTEXT_PARAM_RINGSIZE

  Userspace PR for compute-driver has not been merged

- Drop I915_CONTEXT_PARAM_NO_ZEROMAP

  Userspace PR for libdrm / Beignet was never landed

- Drop CONTEXT_CLONE API

  Userspace PR for Mesa was never landed

- Drop getparam support for I915_CONTEXT_PARAM_ENGINES

  Only existed for symmetry wrt. setparam, never used.

- Disallow bonding of virtual engines

  Drop the prep work, no hardware has been released needing it.

- (Implicit) Disable gpu relocations

  Media userspace was the last userspace to still use them. They
  have converted so performance can be regained with an update.

Core Changes:

- Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten)
- Merge topic branch 'topic/revid_steppings' (from Matt R)
- Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R)
- Backmerges drm-next (Rodrigo)

Driver Changes:

- Initial workarounds for ADL-P (Clint)
- Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram,
  Venkata, Akeem, Tvrtko, John, Lucas)
- Fix ADL-S DMA mask size to 39 bits (Tejas)
- Remove code for CNL (Lucas)
- Add ADL-P GuC/HuC firmwares (John)
- Update HuC to 7.9.3 for TGL/ADL-S/RKL (John)
- Fix -EDEADLK handling regression (Ville)
- Implement Wa_1508744258 for DG1 and Gen12 iGFX (Jose)
- Extend Wa_1406941453 to ADL-S (Jose)
- Drop unnecessary workarounds per stepping for SKL/BXT/ICL (Matt R)
- Use fuse info to enable SFC on Gen12 (Venkata)
- Unconditionally flush the pages on acquire on EHL/JSL (Matt A)
- Probe existence of backing struct pages upon userptr creation (Chris, Matt A)
- Add an intermediate GEM proto-context to delay real context creation (Jason)
- Implement SINGLE_TIMELINE with a syncobj (Jason)
- Set the watchdog timeout directly in intel_context_set_gem (Jason)
- Disallow userspace from creating contexts with too many engines (Jason)
- Revert "drm/i915/gem: Asynchronous cmdparser" (Jason)
- Revert "drm/i915: Propagate errors on awaiting already signaled fences" (Jason)
- Revert "drm/i915: Skip over MI_NOOP when parsing" (Jason)
- Revert "drm/i915: Shrink the GEM kmem_caches upon idling" (Daniel)
- Always let TTM handle object migration (Jason)
- Correct the locking and pin pattern for dma-buf (Thomas H, Michael R, Jason)
- Migrate to system at dma-buf attach time (Thomas, Michael R)
- MAJOR refactoring of the GuC backend code to allow for enabling on Gen11+
  (Matt B, John, Michal Wa., Fernando, Daniele, Vinay)
- Update GuC firmware interface to v62.0.0 (John, Michal Wa., Matt B)
- Add GuCRC feature to hand over the control of HW RC6 to the GuC on Gen12+
  when GuC submission is enabled (Vinay, Sujaritha, Daniele, John, Tvrtko)
- Use the correct IRQ during resume and eliminate DRM IRQ midlayer (Thomas Z)
- Add pipelined page migration and clearing (Chris, Thomas H)
- Use TTM for system memory on discrete (Thomas H)
- Implement object migration for display vs. dma-buf (Thomas H)
- Perform execbuffer object locking as a separate step (Thomas H)
- Add support for explicit L3BANK steering (Matt, Daniele)
- Remove duplicated call to ops->pread (Daniel)
- Fix pagefault disabling in the first execbuf slowpath (Daniel)
- Simplify userptr locking (Thomas H)
- Improvements to the GuC CTB code (Matt B, John)
- Make GT workaround upper bounds exclusive (Matt R)
- Check for nomodeset in i915_init() first (Daniel)
- Delete now unused gpu reloc code (Daniel)
- Document RFC plans for GuC submission, DRM scheduler and new parallel
  submit uAPI (Matt B)
- Reintroduce buddy allocator this time with TTM (Matt A)
- Support forcing page size with LMEM (Matt A)
- Add i915_sched_engine to abstract a submission queue between backends (Matt B)
- Use accelerated move in TTM (Ram)
- Fix memory leaks from TTM backend (Thomas H)
- Introduce WW transaction helper (Thomas H)
- Improve debug Kconfig texts a bit (Daniel)
- Unify user object creation code (Jason)
- Use a table for i915_init/exit (Jason)
- Move slabs to module init/exit (Daniel)
- Remove now unused i915_globals (Daniel)
- Extract i915_module.c (Daniel)
- Consistently use adl-p/adl-s in WA comments (Jose)
- Finish INTEL_GEN and friends conversion (Lucas)
- Correct variable/function namings (Lucas)
- Code checker fixes (Wan, Matt A)
- Tracepoint improvements (Matt B)
- Kerneldoc improvements (Tvrtko, Jason, Matt A, Maarten)
- Selftest improvements (Chris, Matt A, Tejas, Thomas H, John, Matt B,
  Rahul, Vinay)

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2 parents 59b9d6b + 927dfdd commit 25fed6b

208 files changed, +17964 -8053 lines changed

Documentation/gpu/i915.rst

Lines changed: 15 additions & 0 deletions
@@ -422,9 +422,16 @@ Batchbuffer Parsing
 User Batchbuffer Execution
 --------------------------
 
+.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+
 .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
    :doc: User command execution
 
+Scheduling
+----------
+.. kernel-doc:: drivers/gpu/drm/i915/i915_scheduler_types.h
+   :functions: i915_sched_engine
+
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
 
@@ -518,6 +525,14 @@ GuC-based command submission
 .. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
    :doc: GuC-based command submission
 
+GuC ABI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
+.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
+.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+
 HuC
 ---
 .. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_huc.c

Documentation/gpu/rfc/i915_parallel_execbuf.h

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+
+/**
+ * struct drm_i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Set up a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * to run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior, i.e. the user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also, we only allow contexts of the same engine class, and
+ * these contexts must be in logically contiguous order. Examples of the
+ * placement behavior are described below. Lastly, the default is to not allow
+ * BBs to be preempted mid-BB; instead, coordinated preemption is inserted on
+ * all hardware contexts between each set of BBs. Flags may be added in the
+ * future to change both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block:: none
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by a 2-D
+ *	array in the engines field, with bonds placed between each index of
+ *	the virtual engines, e.g. CS[0] is bonded to CS[1], CS[2] is bonded
+ *	to CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logically contiguous, return -EINVAL
+ */
+struct drm_i915_context_engines_parallel_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero; currently no flags are defined
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+
+} __packed;
+
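
The `@engines` indexing rule and the placement examples in the header above can be sketched in plain C. This is a minimal userspace illustration, not the kernel's validation code: `struct example_engine`, `example_engine_index()` and `example_check_placements()` are hypothetical names, the struct is only a local stand-in for `struct i915_engine_class_instance`, and `engine_instance` is assumed to already hold the logical instance.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the uAPI engine descriptor (hypothetical name). */
struct example_engine {
	uint16_t engine_class;
	uint16_t engine_instance; /* assumed to be the logical instance */
};

/* index = j + i * num_siblings, as documented for @engines. */
static size_t example_engine_index(size_t i, size_t j, size_t num_siblings)
{
	return j + i * num_siblings;
}

/*
 * A placement is one sibling index j taken across all contexts i.
 * Every placement must use one engine class and logically contiguous
 * instances. Returns 0 if all placements pass, -EINVAL otherwise.
 */
static int example_check_placements(const struct example_engine *engines,
				    size_t width, size_t num_siblings)
{
	if (!engines || width == 0 || num_siblings == 0)
		return -EINVAL;

	for (size_t j = 0; j < num_siblings; j++) {
		for (size_t i = 0; i < width; i++) {
			const struct example_engine *cur =
				&engines[example_engine_index(i, j, num_siblings)];

			if (cur->engine_class != engines[0].engine_class)
				return -EINVAL;

			if (i > 0) {
				const struct example_engine *prev =
					&engines[example_engine_index(i - 1, j, num_siblings)];

				if (cur->engine_instance != prev->engine_instance + 1)
					return -EINVAL;
			}
		}
	}

	return 0;
}
```

With `width=2, num_siblings=2`, the array from Example 2 (`CS[0],CS[2],CS[1],CS[3]`) passes this check, while the array from Example 3 (`CS[0],CS[1],CS[1],CS[3]`) fails on the second placement.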

Documentation/gpu/rfc/i915_scheduler.rst

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+=========================================
+I915 GuC Submission/DRM Scheduler Section
+=========================================
+
+Upstream plan
+=============
+For upstream the overall plan for landing GuC submission and integrating the
+i915 with the DRM scheduler is:
+
+* Merge basic GuC submission
+	* Basic submission support for all gen11+ platforms
+	* Not enabled by default on any current platforms but can be enabled via
+	  modparam enable_guc
+	* Lots of rework will need to be done to integrate with the DRM
+	  scheduler, so there is no need to nitpick everything in the code; it
+	  just should be functional, have no major coding style / layering
+	  errors, and not regress execlists
+	* Update IGTs / selftests as needed to work with GuC submission
+	* Enable CI on supported platforms for a baseline
+	* Rework / get CI healthy for GuC submission in place as needed
+* Merge new parallel submission uAPI
+	* Bonding uAPI is completely incompatible with GuC submission, plus it
+	  has severe design issues in general, which is why we want to retire
+	  it no matter what
+	* New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
+	  which configures a slot with N contexts
+	* After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
+	  a slot in a single execbuf IOCTL and the batches run on the GPU in
+	  parallel
+	* Initially only for GuC submission but execlists can be supported if
+	  needed
+* Convert the i915 to use the DRM scheduler
+	* GuC submission backend fully integrated with DRM scheduler
+		* All request queues removed from backend (e.g. all backpressure
+		  handled in DRM scheduler)
+		* Resets / cancels hook into DRM scheduler
+		* Watchdog hooks into DRM scheduler
+		* Lots of complexity of the GuC backend can be pulled out once
+		  integrated with DRM scheduler (e.g. state machine gets
+		  simpler, locking gets simpler, etc...)
+	* Execlists backend will do the minimum required to hook in the DRM scheduler
+		* Legacy interface
+		* Features like timeslicing / preemption / virtual engines would
+		  be difficult to integrate with the DRM scheduler and these
+		  features are not required for GuC submission as the GuC does
+		  these things for us
+		* ROI low on fully integrating into DRM scheduler
+		* Fully integrating would add lots of complexity to DRM
+		  scheduler
+	* Port i915 priority inheritance / boosting feature in DRM scheduler
+		* Used for i915 page flip, may be useful to other DRM drivers as
+		  well
+		* Will be an optional feature in the DRM scheduler
+	* Remove in-order completion assumptions from DRM scheduler
+		* Even when using the DRM scheduler the backends will handle
+		  preemption, timeslicing, etc... so it is possible for jobs to
+		  finish out of order
+	* Pull out i915 priority levels and use DRM priority levels
+	* Optimize DRM scheduler as needed
+
+TODOs for GuC submission upstream
+=================================
+
+* Need an update to GuC firmware / i915 to enable error state capture
+* Open source tool to decode GuC logs
+* Public GuC spec
+
+New uAPI for basic GuC submission
+=================================
+No major changes are required to the uAPI for basic GuC submission. The only
+change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
+This attribute indicates the 2k i915 user priority levels are statically mapped
+into 3 levels as follows:
+
+* -1k to -1 Low priority
+* 0 Medium priority
+* 1 to 1k High priority
+
+This is needed because the GuC only has 4 priority bands. The highest priority
+band is reserved for the kernel. This aligns with the DRM scheduler priority
+levels too.
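
The static mapping described above can be written as a small helper. This is only an illustrative sketch: the enum and function names are hypothetical and do not come from the i915 uAPI, and the helper classifies by sign alone without range-checking the user priority.

```c
/* Hypothetical names for the three user-visible bands described above. */
enum example_prio_band {
	EXAMPLE_PRIO_LOW,
	EXAMPLE_PRIO_MEDIUM,
	EXAMPLE_PRIO_HIGH,
};

/*
 * Map an i915 user priority onto the three bands:
 * negative -> low, zero -> medium, positive -> high.
 */
static enum example_prio_band example_map_user_priority(int user_prio)
{
	if (user_prio < 0)
		return EXAMPLE_PRIO_LOW;
	if (user_prio > 0)
		return EXAMPLE_PRIO_HIGH;
	return EXAMPLE_PRIO_MEDIUM;
}
```

Under this mapping two contexts with user priorities 5 and 500 land in the same band, which is the behavior a client would have to expect once the capability is reported.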
+
+Spec references:
+----------------
+* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
+* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
+* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t
+
+New parallel submission uAPI
+============================
+The existing bonding uAPI is completely broken with GuC submission because
+whether a submission is a single context submit or parallel submit isn't known
+until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit
+multiple contexts in parallel with the GuC, the context must be explicitly
+registered with N contexts, and all N contexts must be submitted in a single
+command to the GuC. The GuC interfaces do not support dynamically changing
+between N contexts as the bonding uAPI does, hence the need for a new parallel
+submission interface. Also, the legacy bonding uAPI is quite confusing and not
+intuitive at all. Furthermore, I915_SUBMIT_FENCE is by design a future fence,
+so it is not really something we should continue to support.
+
+The new parallel submission uAPI consists of 3 parts:
+
+* Export engines logical mapping
+* A 'set_parallel' extension to configure contexts for parallel
+  submission
+* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+
+Export engines logical mapping
+------------------------------
+Certain use cases require BBs to be placed on engine instances in logical order
+(e.g. split-frame on gen11+). The logical mapping of engine instances can change
+based on fusing. Rather than making UMDs be aware of fusing, simply expose the
+logical mapping with the existing query engine info IOCTL. Also, the GuC
+submission interface currently only supports submitting multiple contexts to
+engines in logical order, which is a new requirement compared to execlists.
+Lastly, all current platforms have at most 2 engine instances and the logical
+order is the same as uAPI order. This will change on platforms with more than 2
+engine instances.
+
+A single bit will be added to drm_i915_engine_info.flags indicating that the
+logical instance has been returned and a new field,
+drm_i915_engine_info.logical_instance, returns the logical instance.
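
A consumer of this proposed query field might read it as in the following sketch. Everything here is an assumption made for illustration: the RFC does not name the flag bit, so `EXAMPLE_HAS_LOGICAL_INSTANCE` and its bit position are placeholders, and `struct example_engine_info` is a reduced local stand-in rather than the real `drm_i915_engine_info` layout. The fallback relies on the statement above that logical order equals uAPI order on current platforms.

```c
#include <stdint.h>

/* Placeholder flag bit; the RFC text does not name or number it. */
#define EXAMPLE_HAS_LOGICAL_INSTANCE (1ull << 0)

/* Reduced stand-in for the engine info entry (hypothetical layout). */
struct example_engine_info {
	uint16_t engine_instance;  /* uAPI instance */
	uint16_t logical_instance; /* meaningful only if the flag is set */
	uint64_t flags;
};

/*
 * Return the logical instance when the kernel reported one, otherwise
 * fall back to the uAPI instance.
 */
static uint16_t example_logical_instance(const struct example_engine_info *info)
{
	if (info->flags & EXAMPLE_HAS_LOGICAL_INSTANCE)
		return info->logical_instance;
	return info->engine_instance;
}
```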
+
+A 'set_parallel' extension to configure contexts for parallel submission
+------------------------------------------------------------------------
+The 'set_parallel' extension configures a slot for parallel submission of N BBs.
+It is a setup step that must be called before using any of the contexts. See
+I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
+similar existing examples. Once a slot is configured for parallel submission the
+execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only
+GuC submission is supported. Execlists support can be added later if needed.
+
+Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
+drm_i915_context_engines_parallel_submit to the uAPI to implement this
+extension.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h
+        :functions: drm_i915_context_engines_parallel_submit
+
+Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+-------------------------------------------------------------------
+Contexts that have been configured with the 'set_parallel' extension can only
+submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
+in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
+set. The number of BBs is implicit based on the slot submitted and how it has
+been configured by 'set_parallel' or other extensions. No uAPI changes are
+required to the execbuf2 IOCTL.
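
The rule for locating the N batch buffers in the object list can be expressed as a short helper. This is a sketch under stated assumptions: `example_first_batch_index()` is a hypothetical name, and the `batch_first` parameter stands for whether I915_EXEC_BATCH_FIRST was set by the caller, so no flag values are assumed here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Given the number of objects in the execbuf list and the slot width N,
 * store the index of the first batch buffer in *first.
 * The batches are the first N objects when batch_first is true,
 * otherwise the last N. Returns 0 on success, -1 on invalid input.
 */
static int example_first_batch_index(size_t buffer_count, size_t num_batches,
				     bool batch_first, size_t *first)
{
	if (!first || num_batches == 0 || num_batches > buffer_count)
		return -1;

	*first = batch_first ? 0 : buffer_count - num_batches;
	return 0;
}
```

For a list of 5 objects and a slot configured with N = 2, the batches are objects 3 and 4 by default, or objects 0 and 1 when the batch-first behavior is requested.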

Documentation/gpu/rfc/index.rst

Lines changed: 4 additions & 0 deletions
@@ -19,3 +19,7 @@ host such documentation:
 .. toctree::
 
     i915_gem_lmem.rst
+
+.. toctree::
+
+    i915_scheduler.rst

drivers/gpu/drm/i915/Kconfig.debug

Lines changed: 6 additions & 0 deletions
@@ -207,6 +207,8 @@ config DRM_I915_LOW_LEVEL_TRACEPOINTS
 	  This provides the ability to precisely monitor engine utilisation
 	  and also analyze the request dependency resolving timeline.
 
+	  Recommended for driver developers only.
+
 	  If in doubt, say "N".
 
 config DRM_I915_DEBUG_VBLANK_EVADE
@@ -220,6 +222,8 @@ config DRM_I915_DEBUG_VBLANK_EVADE
 	  is exceeded, even if there isn't an actual risk of missing
 	  the vblank.
 
+	  Recommended for driver developers only.
+
 	  If in doubt, say "N".
 
 config DRM_I915_DEBUG_RUNTIME_PM
@@ -232,4 +236,6 @@ config DRM_I915_DEBUG_RUNTIME_PM
 	  runtime PM functionality. This may introduce overhead during
 	  driver loading, suspend and resume operations.
 
+	  Recommended for driver developers only.
+
 	  If in doubt, say "N"
