| 1 | +========================================= |
| 2 | +I915 GuC Submission/DRM Scheduler Section |
| 3 | +========================================= |
| 4 | + |
| 5 | +Upstream plan |
| 6 | +============= |
| 7 | +For upstream the overall plan for landing GuC submission and integrating the |
| 8 | +i915 with the DRM scheduler is: |
| 9 | + |
| 10 | +* Merge basic GuC submission |
| 11 | + * Basic submission support for all gen11+ platforms |
| 12 | + * Not enabled by default on any current platforms but can be enabled via |
| 13 | + modparam enable_guc |
| 14 | + * Lots of rework will be needed to integrate with the DRM scheduler, so |
| 15 | + there is no need to nitpick everything in the code; it just should be |
| 16 | + functional, have no major coding style / layering errors, and not regress |
| 17 | + execlists |
| 18 | + * Update IGTs / selftests as needed to work with GuC submission |
| 19 | + * Enable CI on supported platforms for a baseline |
| 20 | + * Rework / get CI healthy for GuC submission in place as needed |
| 21 | +* Merge new parallel submission uAPI |
| 22 | + * Bonding uAPI is completely incompatible with GuC submission, plus it has |
| 23 | + severe design issues in general, which is why we want to retire it no |
| 24 | + matter what |
| 25 | + * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step |
| 26 | + which configures a slot with N contexts |
| 27 | + * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to |
| 28 | + a slot in a single execbuf IOCTL and the batches run on the GPU in |
| 29 | + parallel |
| 30 | + * Initially only for GuC submission but execlists can be supported if |
| 31 | + needed |
| 32 | +* Convert the i915 to use the DRM scheduler |
| 33 | + * GuC submission backend fully integrated with DRM scheduler |
| 34 | + * All request queues removed from backend (e.g. all backpressure |
| 35 | + handled in DRM scheduler) |
| 36 | + * Resets / cancels hook into DRM scheduler |
| 37 | + * Watchdog hooks into DRM scheduler |
| 38 | + * Lots of complexity of the GuC backend can be pulled out once |
| 39 | + integrated with DRM scheduler (e.g. state machine gets |
| 40 | + simpler, locking gets simpler, etc...) |
| 41 | + * Execlists backend will do the minimum required to hook into the DRM scheduler |
| 42 | + * Legacy interface |
| 43 | + * Features like timeslicing / preemption / virtual engines would |
| 44 | + be difficult to integrate with the DRM scheduler and these |
| 45 | + features are not required for GuC submission as the GuC does |
| 46 | + these things for us |
| 47 | + * ROI low on fully integrating into DRM scheduler |
| 48 | + * Fully integrating would add lots of complexity to DRM |
| 49 | + scheduler |
| 50 | + * Port i915 priority inheritance / boosting feature in DRM scheduler |
| 51 | + * Used for i915 page flip, may be useful to other DRM drivers as |
| 52 | + well |
| 53 | + * Will be an optional feature in the DRM scheduler |
| 54 | + * Remove in-order completion assumptions from DRM scheduler |
| 55 | + * Even when using the DRM scheduler the backends will handle |
| 56 | + preemption, timeslicing, etc... so it is possible for jobs to |
| 57 | + finish out of order |
| 58 | + * Pull out i915 priority levels and use DRM priority levels |
| 59 | + * Optimize DRM scheduler as needed |
| 60 | + |
| 61 | +TODOs for GuC submission upstream |
| 62 | +================================= |
| 63 | + |
| 64 | +* Need an update to GuC firmware / i915 to enable error state capture |
| 65 | +* Open source tool to decode GuC logs |
| 66 | +* Public GuC spec |
| 67 | + |
| 68 | +New uAPI for basic GuC submission |
| 69 | +================================= |
| 70 | +No major changes are required to the uAPI for basic GuC submission. The only |
| 71 | +change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP. |
| 72 | +This attribute indicates the 2k i915 user priority levels are statically mapped |
| 73 | +into 3 levels as follows: |
| 74 | + |
| 75 | +* -1k to -1 Low priority |
| 76 | +* 0 Medium priority |
| 77 | +* 1 to 1k High priority |
| 78 | + |
| 79 | +This is needed because the GuC only has 4 priority bands. The highest priority |
| 80 | +band is reserved for the kernel. This aligns with the DRM scheduler priority |
| 81 | +levels too. |
| 82 | + |
| 83 | +Spec references: |
| 84 | +---------------- |
| 85 | +* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt |
| 86 | +* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority |
| 87 | +* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t |
| 88 | + |
| 89 | +New parallel submission uAPI |
| 90 | +============================ |
| 91 | +The existing bonding uAPI is completely broken with GuC submission because |
| 92 | +whether a submission is a single context submit or parallel submit isn't known |
| 93 | +until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit multiple |
| 94 | +contexts in parallel with the GuC the context must be explicitly registered with |
| 95 | +N contexts and all N contexts must be submitted in a single command to the GuC. |
| 96 | +The GuC interfaces do not support dynamically changing between N contexts as the |
| 97 | +bonding uAPI does. Hence the need for a new parallel submission interface. Also, |
| 98 | +the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore, |
| 99 | +I915_SUBMIT_FENCE is by design a future fence, so it is not really something we |
| 100 | +should continue to support. |
| 101 | + |
| 102 | +The new parallel submission uAPI consists of 3 parts: |
| 103 | + |
| 104 | +* Export engines logical mapping |
| 105 | +* A 'set_parallel' extension to configure contexts for parallel |
| 106 | + submission |
| 107 | +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL |
| 108 | + |
| 109 | +Export engines logical mapping |
| 110 | +------------------------------ |
| 111 | +Certain use cases require BBs to be placed on engine instances in logical order |
| 112 | +(e.g. split-frame on gen11+). The logical mapping of engine instances can change |
| 113 | +based on fusing. Rather than making UMDs be aware of fusing, simply expose the |
| 114 | +logical mapping with the existing query engine info IOCTL. Also, the GuC |
| 115 | +submission interface currently only supports submitting multiple contexts to |
| 116 | +engines in logical order, which is a new requirement compared to execlists. |
| 117 | +Lastly, all current platforms have at most 2 engine instances and the logical |
| 118 | +order is the same as uAPI order. This will change on platforms with more than 2 |
| 119 | +engine instances. |
| 120 | + |
| 121 | +A single bit will be added to drm_i915_engine_info.flags indicating that the |
| 122 | +logical instance has been returned and a new field, |
| 123 | +drm_i915_engine_info.logical_instance, returns the logical instance. |
| 124 | + |
| 125 | +A 'set_parallel' extension to configure contexts for parallel submission |
| 126 | +------------------------------------------------------------------------ |
| 127 | +The 'set_parallel' extension configures a slot for parallel submission of N BBs. |
| 128 | +It is a setup step that must be called before using any of the contexts. See |
| 129 | +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for |
| 130 | +similar existing examples. Once a slot is configured for parallel submission the |
| 131 | +execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially this only |
| 132 | +supports GuC submission. Execlists support can be added later if needed. |
| 133 | + |
| 134 | +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and |
| 135 | +drm_i915_context_engines_parallel_submit to the uAPI to implement this |
| 136 | +extension. |
| 137 | + |
| 138 | +.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h |
| 139 | + :functions: drm_i915_context_engines_parallel_submit |
| 140 | + |
| 141 | +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL |
| 142 | +------------------------------------------------------------------- |
| 143 | +Contexts that have been configured with the 'set_parallel' extension can only |
| 144 | +submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects |
| 145 | +in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is |
| 146 | +set. The number of BBs is implicit based on the slot submitted and how it has |
| 147 | +been configured by 'set_parallel' or other extensions. No uAPI changes are |
| 148 | +required to the execbuf2 IOCTL. |
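
The rule for locating the N BBs in the object list can be illustrated with a small sketch. This is not i915 code: `first_batch_index` and its parameters are hypothetical names chosen for illustration, and the boolean stands in for whether I915_EXEC_BATCH_FIRST is set in the execbuf2 flags. It assumes only what the text states: the BBs are the last N objects by default, or the first N when I915_EXEC_BATCH_FIRST is set.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of i915: returns the index of the first
 * batch buffer in an execbuf2 object list of `count` entries, for a slot
 * configured for `n_bbs` parallel batches. Returns -1 if the list cannot
 * hold n_bbs batches.
 */
long first_batch_index(size_t count, size_t n_bbs, bool batch_first)
{
	if (n_bbs == 0 || n_bbs > count)
		return -1;

	/* I915_EXEC_BATCH_FIRST set: the batches are the first N objects. */
	if (batch_first)
		return 0;

	/* Default: the batches are the last N objects. */
	return (long)(count - n_bbs);
}
```

For example, with 5 objects and a slot configured for 2 BBs, the batches would be objects 3 and 4 by default, or objects 0 and 1 when I915_EXEC_BATCH_FIRST is set.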