path: root/drivers/gpu/drm/i915/i915_scheduler.c
2020-05-25  drm/i915: Don't set queue-priority hint when suppressing the reschedule  (Chris Wilson, 1 file, -8/+8)
We recorded the execlists->queue_priority_hint update for the inflight request without kicking the tasklet. The next submitted request then failed to be scheduled as it had a lower priority than the hint, leaving the HW running with only the inflight request. Fixes: 6cebcf746f3f ("drm/i915: Tweak scheduler's kick_submission()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200519063123.20673-1-chris@chris-wilson.co.uk (cherry picked from commit b86fc6e5e89e5645b43f57171c26740ef38f9f4a) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
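A minimal sketch of the pattern this fix restores (execlists_active() and the execlists fields are real driver names; rq_prio() and the exact flow are simplified stand-ins, not the actual patch): the queue_priority_hint write and the tasklet kick travel together, so suppressing the kick also suppresses the hint update.

    static void kick_submission(struct intel_engine_cs *engine, int prio)
    {
            const struct i915_request *inflight;

            rcu_read_lock();
            inflight = execlists_active(&engine->execlists);
            if (!inflight || prio > rq_prio(inflight)) {
                    /* Record the hint and kick as a pair: a hint left
                     * behind without a kick can mask a later, lower
                     * priority submission. */
                    engine->execlists.queue_priority_hint = prio;
                    tasklet_hi_schedule(&engine->execlists.tasklet);
            }
            rcu_read_unlock();
    }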
2020-05-14  drm/i915: Drop no-semaphore boosting  (Chris Wilson, 1 file, -7/+4)
Now that we have fast timeslicing on semaphores, we no longer need to prioritise non-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
2020-05-07  drm/i915: Remove wait priority boosting  (Chris Wilson, 1 file, -11/+1)
Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as expected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
2020-05-07  drm/i915: Mark concurrent submissions with a weak-dependency  (Chris Wilson, 1 file, -3/+3)
We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as the pair of requests are allowed to run in parallel and not in strict succession. The real significance though is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along the single engine, and so can resolve user level inter-batch scheduling dependencies from user semaphores. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Testcase: igt/gem_exec_fence/submit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
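To illustrate the weak edge, a hedged sketch (I915_DEPENDENCY_WEAK is the flag this patch adds; must_follow() is a hypothetical helper standing in for the check inside the reordering walk):

    /* A weak WAIT_FOR_SUBMIT edge still feeds priority inheritance,
     * but ordering-sensitive walks are free to rearrange across it,
     * since the linked pair may legitimately run in parallel. */
    static bool must_follow(const struct i915_dependency *p)
    {
            if (p->flags & I915_DEPENDENCY_WEAK)
                    return false;   /* parallel pair: safe to rearrange */

            return true;            /* strong edge: keep strict order */
    }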
2020-03-31  drm/i915/gt: Include a few traces for timeslicing  (Chris Wilson, 1 file, -0/+6)
Add a few telltales to see when timeslicing is being enabled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331120502.14713-1-chris@chris-wilson.co.uk
2020-03-31  drm/i915: Defer kicking the tasklet until all rescheduling is complete  (Chris Wilson, 1 file, -0/+4)
Since we may kick more than one engine, and may kick each one a couple of times, coalesce the tasklet execution to the end. This also ensures that we have the chance to run the tasklet immediately after priority bumping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331114852.11583-1-chris@chris-wilson.co.uk
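The coalescing shape, as a hedged sketch (for_each_engine_masked() and tasklet_hi_schedule() are real kernel/driver APIs; kick_engines() and accumulating the mask during the walk are illustrative, not the exact patch):

    /* Accumulate a mask of engines touched by the priority walk, then
     * kick each tasklet exactly once after all priorities are final. */
    static void kick_engines(struct intel_gt *gt, intel_engine_mask_t mask)
    {
            struct intel_engine_cs *engine;
            intel_engine_mask_t tmp;

            for_each_engine_masked(engine, gt, mask, tmp)
                    tasklet_hi_schedule(&engine->execlists.tasklet);
    }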
2020-03-10  drm/i915: Tweak scheduler's kick_submission()  (Chris Wilson, 1 file, -1/+2)
Skip useless priority bumping on adding a new dependency by making sure that we do update the priority if we would have rescheduled the active context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310115947.6482-1-chris@chris-wilson.co.uk
2020-03-09  drm/i915/execlists: Mark up racy inspection of current i915_request priority  (Chris Wilson, 1 file, -1/+1)
[ 120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915] [ 120.176566] [ 120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3: [ 120.176792] __i915_schedule+0x63e/0x920 [i915] [ 120.177007] __bump_priority+0x63/0x80 [i915] [ 120.177220] __i915_sched_node_add_dependency+0x258/0x300 [i915] [ 120.177438] i915_sched_node_add_dependency+0x50/0xa0 [i915] [ 120.177654] i915_request_await_dma_fence+0x1da/0x530 [i915] [ 120.177867] i915_request_await_object+0x2fe/0x470 [i915] [ 120.178081] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 120.178292] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 120.178309] drm_ioctl_kernel+0xe4/0x120 [ 120.178322] drm_ioctl+0x297/0x4c7 [ 120.178335] ksys_ioctl+0x89/0xb0 [ 120.178348] __x64_sys_ioctl+0x42/0x60 [ 120.178361] do_syscall_64+0x6e/0x2c0 [ 120.178375] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 120.178387] [ 120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2: [ 120.178606] effective_prio+0x25/0xc0 [i915] [ 120.178812] process_csb+0xe8b/0x10a0 [i915] [ 120.179021] execlists_submission_tasklet+0x30/0x170 [i915] [ 120.179038] tasklet_action_common.isra.0+0x42/0xa0 [ 120.179053] __do_softirq+0xd7/0x2cd [ 120.179066] irq_exit+0xbe/0xe0 [ 120.179078] do_IRQ+0x51/0x100 [ 120.179090] ret_from_intr+0x0/0x1c [ 120.179104] cpuidle_enter_state+0x1b8/0x5d0 [ 120.179117] cpuidle_enter+0x50/0x90 [ 120.179131] do_idle+0x1a1/0x1f0 [ 120.179145] cpu_startup_entry+0x14/0x16 [ 120.179158] start_secondary+0x120/0x180 [ 120.179172] secondary_startup_64+0xa4/0xb0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk
2020-03-06  drm/i915: Always propagate the invocation to i915_schedule  (Chris Wilson, 1 file, -4/+1)
We only call i915_schedule() when we know we have changed the priority on a request and so need to propagate any change in priority to its signalers (for PI). By unconditionally checking all of our signalers, we avoid skipping changes made prior to construction of the request (as the request may be waited upon before submission when used in parallel). References: https://gitlab.freedesktop.org/drm/intel/issues/1318 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-2-chris@chris-wilson.co.uk
2020-02-20  drm/i915: Double check bumping after the spinlock  (Chris Wilson, 1 file, -0/+3)
In preparation for making GEM execbuf parallel, we need to be prepared to handle very early declaration of dependencies -- even before our signaler has itself been submitted. References: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200220123608.1666271-1-chris@chris-wilson.co.uk
2020-02-20  drm/i915/gt: Protect signaler walk with RCU  (Chris Wilson, 1 file, -3/+4)
While we know that the waiters cannot disappear as we walk our list (only that they might be added), the same cannot be said for our signalers as they may be completed by the HW and retired as we process this request. Ergo we need to use rcu to protect the list iteration and remember to mark up the list_del_rcu. v2: Mark the deps as safe-for-rcu Fixes: 793c22617367 ("drm/i915/gt: Protect execlists_hold/unhold from new waiters") Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200220075025.1539375-1-chris@chris-wilson.co.uk
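A hedged sketch of the reader side (signalers_list and signal_link are the driver's field names; max_signaler_prio() is an illustrative helper, not a function in the patch):

    /* Walk the signalers under RCU: removal uses list_del_rcu() and the
     * dependency slab is marked SLAB_TYPESAFE_BY_RCU, so a node we hold
     * may be recycled but never becomes a different object type under us. */
    static int max_signaler_prio(const struct i915_request *rq)
    {
            const struct i915_dependency *p;
            int prio = INT_MIN;

            rcu_read_lock();
            list_for_each_entry_rcu(p, &rq->sched.signalers_list, signal_link)
                    prio = max(prio, READ_ONCE(p->signaler->attr.priority));
            rcu_read_unlock();

            return prio;
    }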
2020-02-07  drm/i915/gt: Protect execlists_hold/unhold from new waiters  (Chris Wilson, 1 file, -1/+1)
As we may add new waiters to a request as it is being run, we need to mark the list iteration as being safe for concurrent addition. v2: Mika spotted that we used the same trick for signalers_list, so warn the compiler about the lockless walk there as well. Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk
2020-02-07  drm/i915/gt: Protect defer_request() from new waiters  (Chris Wilson, 1 file, -2/+4)
Mika spotted <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1 <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018 <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915] <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05 <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083 <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122 <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48 <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8 <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80 <4>[17436.705532] FS: 0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000 <4>[17436.705534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0 <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 <4>[17436.705542] Call Trace: <4>[17436.705545] <IRQ> <4>[17436.705603] execlists_submission_tasklet+0xc0/0x130 [i915] which is us consuming a partially initialised new waiter in defer_requests(). We can prevent this by initialising the i915_dependency prior to making it visible, and since we are using a concurrent list_add/iterator mark them up to the compiler. Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
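The initialise-then-publish rule, as a hedged sketch (list_add_rcu() is the real primitive; publish_dependency() and the exact field set are illustrative):

    static void publish_dependency(struct i915_sched_node *node,
                                   struct i915_sched_node *signal,
                                   struct i915_dependency *dep,
                                   unsigned long flags)
    {
            /* Fully initialise the node first... */
            dep->signaler = signal;
            dep->flags = flags;

            /* ...then publish: list_add_rcu() supplies the release
             * barrier ordering the stores above before the link becomes
             * visible to a lockless walker such as defer_request(),
             * so no walker can see the 5a5a5a5a poison above. */
            list_add_rcu(&dep->wait_link, &signal->waiters_list);
            list_add(&dep->signal_link, &node->signalers_list);
    }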
2020-01-16  drm/i915: Keep track of request among the scheduling lists  (Chris Wilson, 1 file, -12/+10)
If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue, we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list nor in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
2019-12-20  drm/i915: Drop GEM context as a direct link from i915_request  (Chris Wilson, 1 file, -1/+1)
Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
2019-12-11  drm/i915: Copy across scheduler behaviour flags across submit fences  (Chris Wilson, 1 file, -1/+0)
We want the bonded request to have the same scheduler properties as its master so that it is placed at the same depth in the queue. For example, consider we have requests A, B and B', where B & B' are a bonded pair to run in parallel on two engines.

    A -> B
     \-> B'

B will run after A and so may be scheduled on an idle engine and wait on A using a semaphore. B' sees B being executed and so enters the queue on the same engine as A. As B' did not inherit the semaphore-chain from B, it may have higher precedence than A and so preempts execution. However, B' then sits on a semaphore waiting for B, who is waiting for A, who is blocked by B'. Ergo B' needs to inherit the scheduler properties from B (i.e. the semaphore chain) so that it is scheduled with the same priority as B and will not be executed ahead of B's dependencies. Furthermore, to prevent the priorities changing via the exposed fence on B', we need to couple in the dependencies for PI. This requires us to relax our sanity-checks that dependencies are strictly in order. v2: Synchronise (B, B') execution on all platforms, regardless of using a scheduler; any no-op syncs should be elided. Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Closes: https://gitlab.freedesktop.org/drm/intel/issues/464 Testcase: igt/gem_exec_balancer/bonded-chain Testcase: igt/gem_exec_balancer/bonded-semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
2019-11-22  drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request  (Chris Wilson, 1 file, -0/+12)
As we start peeking into requests for longer and longer, e.g. incorporating use of spinlocks when only protected by an rcu_read_lock(), we need to be careful in how we reset the request when recycling and need to preserve any barriers that may still be in use as the request is reset for reuse. Quoting Linus Torvalds: > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU? .. because the object can be accessed (by RCU) after the refcount has gone down to zero, and the thing has been released. That's the whole and only point of SLAB_TYPESAFE_BY_RCU. That flag basically says: "I may end up accessing this object *after* it has been free'd, because there may be RCU lookups in flight" This has nothing to do with constructors. It's ok if the object gets reused as an object of the same type and does *not* get re-initialized, because we're perfectly fine seeing old stale data. What it guarantees is that the slab isn't shared with any other kind of object, _and_ that the underlying pages are free'd after an RCU quiescent period (so the pages aren't shared with another kind of object either during an RCU walk). And it doesn't necessarily have to have a constructor, because the thing that a RCU walk will care about is (a) guaranteed to be an object that *has* been on some RCU list (so it's not a "new" object) (b) the RCU walk needs to have logic to verify that it's still the *same* object and hasn't been re-used as something else. In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used immediately, but because it gets reused as the same kind of object, the RCU walker can "know" what parts have meaning for re-use, in a way it couldn't if the re-use was random. That said, it *is* subtle, and people should be careful. > So the re-use might initialize the fields lazily, not necessarily using a ctor. If you have a well-defined refcount, and use "atomic_inc_not_zero()" to guard the speculative RCU access section, and use "atomic_dec_and_test()" in the freeing section, then you should be safe wrt new allocations. If you have a completely new allocation that has "random stale content", you know that it cannot be on the RCU list, so there is no speculative access that can ever see that random content. So the only case you need to worry about is a re-use allocation, and you know that the refcount will start out as zero even if you don't have a constructor. So you can think of the refcount itself as always having a zero constructor, *BUT* you need to be careful with ordering. In particular, whoever does the allocation needs to then set the refcount to a non-zero value *after* it has initialized all the other fields. And in particular, it needs to make sure that it uses the proper memory ordering to do so. NOTE! One thing to be very worried about is that re-initializing whatever RCU lists means that now the RCU walker may be walking on the wrong list so the walker may do the right thing for this particular entry, but it may miss walking *other* entries. So then you can get spurious lookup failures, because the RCU walker never walked all the way to the end of the right list. That ends up being a much more subtle bug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
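The lookup discipline Linus describes, as a hedged sketch (kref_get_unless_zero() is the kernel's atomic_inc_not_zero wrapper for krefs; get_active_request() and the slot indirection are illustrative):

    /* With SLAB_TYPESAFE_BY_RCU the pointer we loaded may name a
     * recycled request: guard with a non-zero refcount, then re-check
     * identity, i.e. the (a)+(b) verification described above. */
    static struct i915_request *get_active_request(struct i915_request **slot)
    {
            struct i915_request *rq;

            rcu_read_lock();
            rq = READ_ONCE(*slot);
            if (rq && !kref_get_unless_zero(&rq->fence.refcount))
                    rq = NULL;              /* refcount hit zero: freed */
            if (rq && READ_ONCE(*slot) != rq) {
                    i915_request_put(rq);   /* recycled behind our back */
                    rq = NULL;
            }
            rcu_read_unlock();

            return rq;
    }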
2019-11-04  drm/i915: Protect request peeking with RCU  (Chris Wilson, 1 file, -2/+7)
Since the execlists_active() is no longer protected by the engine->active.lock, we need to protect the request pointer with RCU to prevent it being freed as we evaluate whether or not we need to preempt. Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Fixes: 13ed13a4dcbf ("drm/i915: Don't set queue_priority_hint if we don't kick the submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191104090158.2959-2-chris@chris-wilson.co.uk
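The shape of the fix, as a hedged sketch (execlists_active() is real; active_prio() and rq_prio() here are simplified stand-ins):

    /* Without engine->active.lock the inflight request may be retired
     * and freed mid-inspection; rcu_read_lock() keeps its memory valid
     * for the duration of this short, deliberately racy peek. */
    static int active_prio(const struct intel_engine_cs *engine)
    {
            const struct i915_request *rq;
            int prio = INT_MIN;

            rcu_read_lock();
            rq = execlists_active(&engine->execlists);
            if (rq)
                    prio = rq_prio(rq);
            rcu_read_unlock();

            return prio;
    }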
2019-10-21  drm/i915: Don't set queue_priority_hint if we don't kick the submission  (Chris Wilson, 1 file, -15/+22)
If we change the priority of the active context, then it has no impact on the decision of whether to preempt the active context -- we don't preempt the context with itself. In this situation, we elide the tasklet rescheduling and should *not* be marking up the queue_priority_hint as that may mask a later submission where we decide we don't have to kick the tasklet as a higher priority submission is pending (spoiler alert, it was not). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191021080226.537-1-chris@chris-wilson.co.uk
2019-10-18  drm/i915/execlists: Don't merely skip submission if maybe timeslicing  (Chris Wilson, 1 file, -2/+15)
Normally, we try and skip submission if ELSP[1] is filled. However, we may desire to enable timeslicing due to the queue priority, even if ELSP[1] itself does not require timeslicing. That is, the queue is of equal priority to ELSP[0] and higher priority than ELSP[1]. Previously, we would wait until the context switch to preempt the current ELSP[1], but with timeslicing, we want to preempt ELSP[0] and replace it with the queue. In writing the test case, it became quickly apparent that we were also suppressing the tasklet during promotion and so failing to notice when the queue started requiring timeslicing. Fixes: 2229adc81380 ("drm/i915/execlist: Trim immediate timeslice expiry") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk
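The adjusted test, as a hedged sketch (queue_priority_hint is the real field; need_timeslice() here is simplified and effective_prio() is a stand-in for the driver's helper):

    /* Timeslicing is wanted whenever anything of equal or higher
     * priority is waiting, whether that is ELSP[1] or the head of the
     * queue; the hint alone decides, not ELSP[1] occupancy. */
    static bool need_timeslice(const struct intel_engine_cs *engine,
                               const struct i915_request *rq)
    {
            return READ_ONCE(engine->execlists.queue_priority_hint) >=
                   effective_prio(rq);
    }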
2019-08-13  drm/i915: Push the wakeref->count deferral to the backend  (Chris Wilson, 1 file, -2/+1)
If the backend wishes to defer the wakeref parking, make it responsible for unlocking the wakeref (i.e. bumping the counter). This allows it to time the unlock much more carefully in case it happens to need the wakeref to be active during its deferral. For instance, during engine parking we may choose to emit an idle barrier (a request). To do so, we borrow the engine->kernel_context timeline and to ensure exclusive access we keep the engine->wakeref.count as 0. However, to submit that request to HW may require an intel_engine_pm_get() (e.g. to keep the submission tasklet alive) and before we allow that we have to rewake our wakeref to avoid a recursive deadlock. <4> [257.742916] IRQs not enabled as expected <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100 <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U W 5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1 <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100 <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3 <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086 <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302 <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001 <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008 <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430 <4> [257.743074] FS: 0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000 <4> [257.743082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0 <4> [257.743096] Call Trace: <4> [257.743104] <IRQ> <4> [257.743265] __i915_request_commit+0x240/0x5d0 [i915] <4> [257.743427] ? __i915_request_create+0x228/0x4c0 [i915] <4> [257.743584] __engine_park+0x64/0x250 [i915] <4> [257.743730] ____intel_wakeref_put_last+0x1c/0x70 [i915] <4> [257.743878] i915_sample+0x2ee/0x310 [i915] <4> [257.744030] ? i915_pmu_cpu_offline+0xb0/0xb0 [i915] <4> [257.744040] __hrtimer_run_queues+0x11e/0x4b0 <4> [257.744068] hrtimer_interrupt+0xea/0x250 <4> [257.744079] ? lockdep_hardirqs_off+0x79/0xd0 <4> [257.744101] smp_apic_timer_interrupt+0x96/0x280 <4> [257.744114] apic_timer_interrupt+0xf/0x20 <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae v2: Keep the priority_hint assert v3: That assert was desperately trying to point out my bug. Sorry, little assert. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
2019-06-20  drm/i915/execlists: Minimalistic timeslicing  (Chris Wilson, 1 file, -0/+1)
If we have multiple contexts of equal priority pending execution, activate a timer to demote the currently executing context in favour of the next in the queue when that timeslice expires. This enforces fairness between contexts (so long as they allow preemption -- forced preemption, in the future, will kick those who do not obey) and allows us to avoid userspace blocking forward progress with e.g. unbounded MI_SEMAPHORE_WAIT. For the starting point here, we use the jiffie as our timeslice so that we should be reasonably efficient wrt frequent CPU wakeups. Testcase: igt/gem_exec_scheduler/semaphore-resolve Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk
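A hedged sketch of the arming path (mod_timer() and the execlists timer are real; start_timeslice() is simplified and rq_prio() is a stand-in, not the exact patch):

    static void start_timeslice(struct intel_engine_cs *engine)
    {
            struct intel_engine_execlists *el = &engine->execlists;
            const struct i915_request *rq = execlists_active(el);

            /* Only slice when something of equal priority is waiting. */
            if (!rq || el->queue_priority_hint < rq_prio(rq))
                    return;

            /* On expiry, the handler kicks the tasklet to force a
             * switch to the next equal-priority contender. */
            mod_timer(&el->timer, jiffies + 1);     /* one-jiffy slice */
    }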
2019-06-20  drm/i915/execlists: Preempt-to-busy  (Chris Wilson, 1 file, -2/+1)
When using a global seqno, we required a precise stop-the-world event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. v2: Be a tinge paranoid and ensure we flush the write into the HWS page for the GPU semaphore to pick up in a timely fashion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
2019-06-14  drm/i915: Replace engine->timeline with a plain list  (Chris Wilson, 1 file, -19/+19)
To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
2019-05-22  drm/i915: Load balancing across a virtual engine  (Chris Wilson, 1 file, -3/+16)
Having allowed the user to define a set of engines that they will want to only use, we go one step further and allow them to bind those engines into a single virtual instance. Submitting a batch to the virtual engine will then forward it to any one of the set in a manner as best to distribute load. The virtual engine has a single timeline across all engines (it operates as a single queue), so it is not able to concurrently run batches across multiple engines by itself; that is left up to the user to submit multiple concurrent batches to multiple queues. Multiple users will be load balanced across the system. The mechanism used for load balancing in this patch is a late greedy balancer. When a request is ready for execution, it is added to each engine's queue, and when an engine is ready for its next request it claims it from the virtual engine. The first engine to do so, wins, i.e. the request is executed at the earliest opportunity (idle moment) in the system. As not all HW is created equal, the user is still able to skip the virtual engine and execute the batch on a specific engine, all within the same queue. It will then be executed in order on the correct engine, with execution on other virtual engines being moved away due to the load detection. A couple of areas for potential improvement left!

- The virtual engine always takes priority over equal-priority tasks. Mostly broken up by applying FQ_CODEL rules for prioritising new clients, and hopefully the virtual and real engines are not then congested (i.e. all work is via virtual engines, or all work is to the real engine).

- We require the breadcrumb irq around every virtual engine request. For normal engines, we eliminate the need for the slow round trip via interrupt by using the submit fence and queueing in order. For virtual engines, we have to allow any job to transfer to a new ring, and cannot coalesce the submissions, so require the completion fence instead, forcing the persistent use of interrupts.

- We only drip feed single requests through each virtual engine and onto the physical engines, even if there was enough work to fill all ELSP, leaving small stalls with an idle CS event at the end of every request. Could we be greedy and fill both slots? Being lazy is virtuous for load distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock contention, reducing dispatch overhead, looking at direct submission rather than bouncing around tasklets etc. sseu: Lift the restriction to allow sseu to be reconfigured on virtual engines composed of RENDER_CLASS (rcs). v2: macroize check_user_mbz() v3: Cancel virtual engines on wedging v4: Commence commenting v5: Replace 64b sibling_mask with a list of class:instance v6: Drop the one-element array in the uabi v7: Assert it is a virtual engine in to_virtual_engine() v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2) Link: https://github.com/intel/media-driver/pull/283 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
2019-05-17  drm/i915: Bump signaler priority on adding a waiter  (Chris Wilson, 1 file, -0/+11)
The handling of the no-preemption priority level imposes the restriction that we need to maintain the implied ordering even though preemption is disabled. Otherwise we may end up with an AB-BA deadlock across multiple engines due to a real preemption event reordering the no-preemption WAITs. To resolve this issue we currently promote all requests to WAIT on unsubmission, however this interferes with the timeslicing requirement that we do not apply any implicit promotion that will defeat the round-robin timeslice list. (If we automatically promote the active request it will go back to the head of the queue and not the tail!) So we need implicit promotion to prevent reordering around semaphores where we are not allowed to preempt, and we must avoid implicit promotion on unsubmission. So instead of at unsubmit, if we apply that implicit promotion on adding the dependency, we avoid the semaphore deadlock and we also reduce the gains made by the promotion for user space waiting. Furthermore, by keeping the earlier dependencies at a higher level, we reduce the search space for timeslicing without altering runtime scheduling too badly (no dependencies at all will be assigned a higher priority for rrul). v2: Limit the bump to external edges (as originally intended) i.e. between contexts and out to the user. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
2019-05-13  drm/i915: Check for no-op priority changes first  (Chris Wilson, 1 file, -2/+2)
In all likelihood, the priority and node are already in the CPU cache and by checking them first, we can avoid having to chase the *request->hwsp for the current breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-3-chris@chris-wilson.co.uk
2019-05-13  drm/i915: Pass i915_sched_node around internally  (Chris Wilson, 1 file, -17/+19)
To simplify the next patch, update bump_priority and schedule to accept the internal i915_sched_node directly and not expect a request pointer.

add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
Function                     old   new  delta
i915_schedule_bump_priority  109   113     +4
i915_schedule                 50    54     +4
__i915_schedule              922   907    -15

v2: Adopt node for the old rq local, since it no longer is a request but the origin node. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-2-chris@chris-wilson.co.uk
2019-05-13  drm/i915: Rearrange i915_scheduler.c  (Chris Wilson, 1 file, -105/+105)
To avoid pulling in a forward declaration in the next patch, move the i915_sched_node handling to after the main dfs of the scheduler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-1-chris@chris-wilson.co.uk
2019-05-07  drm/i915: Only reschedule the submission tasklet if preemption is possible  (Chris Wilson, 1 file, -15/+19)
If we couple the scheduler more tightly with the execlists policy, we can apply the preemption policy to the question of whether we need to kick the tasklet at all for this priority bump. v2: Rephrase it as a core i915 policy and not an execlists foible. v3: Pull the kick together. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
2019-04-11  drm/i915: Bump ready tasks ahead of busywaits  (Chris Wilson, 1 file, -10/+11)
Consider two tasks that are running in parallel on a pair of engines (vcs0, vcs1), but then must complete on a shared engine (rcs0). To maximise throughput, we want to run the first ready task on rcs0 (i.e. the first task that completes on either of vcs0 or vcs1). When using semaphores, however, we will instead queue onto rcs in submission order. To resolve this incorrect ordering, we want to re-evaluate the priority queue when each of the request is ready. Normally this happens because we only insert into the priority queue requests that are ready, but with semaphores we are inserting ahead of their readiness and to compensate we penalize those tasks with reduced priority (so that tasks that do not need to busywait should naturally be run first). However, given a series of tasks that each use semaphores, the queue degrades into submission fifo rather than readiness fifo, and so to counter this we give a small boost to semaphore users as their dependent tasks are completed (and so we no longer require any busywait prior to running the user task as they are then ready themselves). v2: Fixup irqsave for schedule_lock (Tvrtko) Testcase: igt/gem_exec_schedule/semaphore-codependency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
2019-04-02  drm/i915: Only emit one semaphore per request  (Chris Wilson, 1 file, -2/+3)
Ideally we only need one semaphore per ring to accommodate waiting on multiple engines in parallel. However, since we do not know which fences we will finally be waiting on, we emit a semaphore for every fence. It turns out to be quite easy to trick ourselves into exhausting our ringbuffer causing an error, just by feeding in a batch that depends on several thousand contexts. Since we never can be waiting on more than one semaphore in parallel (other than perhaps the desire to busywait on multiple engines), just pick the first fence for our semaphore. If we pick the wrong fence to busywait on, we just miss an opportunity to reduce latency. An adaptation might be to use sched.flags as either a semaphore counter, or to track the first busywait on each engine, converting it back to a single use bit prior to closing the request. v2: Track first semaphore used per-engine (this caters for our basic igt that semaphores are working). Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_exec_fence/long-history Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-06  drm/i915: Use i915_global_register()  (Chris Wilson, 1 file, -12/+20)
Rather than manually add every new global into each hook, use i915_global_register() function and keep a list of registered globals to invoke instead. However, I haven't found a way for random drivers to add an .init table to avoid having to manually add ourselves to i915_globals_init() each time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-01  drm/i915: Prioritise non-busywait semaphore workloads  (Chris Wilson, 1 file, -0/+6)
We don't want to busywait on the GPU if we have other work to do. If we give non-busywaiting workloads higher (initial) priority than workloads that require a busywait, we will prioritise work that is ready to run immediately. We then also have to be careful that we don't give earlier semaphores an accidental boost because later work doesn't wait on other rings, hence we keep a history of semaphore usage of the dependency chain. v2: Stop rolling the bits into a chain and just use a flag in case this request or any of our dependencies use a semaphore. The rolling around was contagious as Tvrtko was heard to fall off his chair. Testcase: igt/gem_exec_schedule/semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
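The flag-in-the-priority idea, as a hedged sketch (the bit layout and effective_priority() are illustrative, not the driver's exact I915_PRIORITY_* encoding):

    #define PRIORITY_NOSEMAPHORE    BIT(0)  /* illustrative internal bit */

    /* User priority is shifted up to leave room for internal bonus
     * bits; a chain that never needs a busywait gets the bonus and so
     * sorts ahead of equal-priority work that would spin on the GPU. */
    static int effective_priority(int user_prio, bool chain_uses_semaphore)
    {
            int prio = user_prio << 1;

            if (!chain_uses_semaphore)
                    prio |= PRIORITY_NOSEMAPHORE;

            return prio;
    }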
2019-02-28  drm/i915/execlists: Suppress mere WAIT preemption  (Chris Wilson, 1 file, -1/+0)
WAIT is occasionally suppressed by virtue of preempted requests being promoted to NEWCLIENT if they have not already received that boost. Make this consistent for all WAIT boosts: they are not allowed to preempt executing contexts and are merely granted the right to be at the front of the queue for the next execution slot. This is in keeping with the desire that the WAIT boost be a minor tweak that does not give excessive promotion to its user and open ourselves to trivial abuse. The problem with the inconsistent WAIT preemption becomes more apparent as the preemption is propagated across the engines, where one engine may preempt and the other not, and we would be relying on the exact execution order being consistent across engines (e.g. using HW semaphores to coordinate parallel execution). v2: Also protect GuC submission from false preemption loops. v3: Build bug safeguards and better debug messages for st. v4: Do the priority bumping in unsubmit (i.e. on preemption/reset unwind), applying it earlier during submit causes out-of-order execution combined with execute fences. v5: Call sw_fence_fini for our dummy request (Matthew) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
2019-02-28  drm/i915: Make request allocation caches global  (Chris Wilson, 1 file, -14/+52)
As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potentially has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
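The global-cache shape, as a hedged sketch (KMEM_CACHE() is the kernel macro; the init hook name is hypothetical, and the real patch also defers shrinking behind an RCU grace period as noted in v2):

    static struct kmem_cache *slab_dependencies;

    /* One cache shared by every i915 device, created at module init,
     * replacing the per-device caches hung off drm_i915_private. */
    static int __init i915_scheduler_slabs_init(void)
    {
            slab_dependencies = KMEM_CACHE(i915_dependency,
                                           SLAB_HWCACHE_ALIGN);
            return slab_dependencies ? 0 : -ENOMEM;
    }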
2019-02-27  drm/i915: Skip scanning for signalers if we are already inflight  (Chris Wilson, 1 file, -0/+9)
When a request has its priority changed, we traverse the graph of all of its signalers to raise their priorities to match (priority inheritance). If the request has already started executing its payload, we know that all of its signalers must have signaled and we do not need to process our list of signalers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-1-chris@chris-wilson.co.uk
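The early-out, as a hedged sketch (i915_request_started() is the real predicate; the helper wrapping is illustrative):

    /* A request that has started executing can only have signaled
     * signalers, so priority inheritance may stop descending here. */
    static bool need_signaler_walk(const struct i915_request *rq)
    {
            return !i915_request_started(rq);
    }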
2019-02-11  drm/i915: Reacquire priolist cache after dropping the engine lock  (Chris Wilson, 1 file, -10/+17)
If we drop the engine lock, we may run execlists_dequeue which may free the priolist. Therefore if we ever drop the execution lock on the engine, we have to discard our cache and refetch the priolist to ensure we do not use a stale pointer. [ 506.418935] [IGT] gem_exec_whisper: starting subtest contexts-priority [ 593.240825] general protection fault: 0000 [#1] SMP [ 593.240863] CPU: 1 PID: 494 Comm: gem_exec_whispe Tainted: G U 5.0.0-rc6+ #100 [ 593.240879] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016 [ 593.240965] RIP: 0010:__i915_schedule+0x1fe/0x320 [i915] [ 593.240981] Code: 48 8b 0c 24 48 89 c3 49 8b 45 28 49 8b 75 20 4c 89 3c 24 48 89 46 08 48 89 30 48 8b 43 08 48 89 4b 08 49 89 5d 20 49 89 45 28 <48> 89 08 45 39 a7 b8 03 00 00 7d 44 45 89 a7 b8 03 00 00 49 8b 85 [ 593.240999] RSP: 0018:ffffc90000057a60 EFLAGS: 00010046 [ 593.241013] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8882582d7870 RCX: ffff88826baba6f0 [ 593.241026] RDX: 0000000000000000 RSI: ffff8882582d6e70 RDI: ffff888273482194 [ 593.241049] RBP: ffffc90000057a68 R08: ffff8882582d7680 R09: ffff8882582d7840 [ 593.241068] R10: 0000000000000000 R11: ffffea00095ebe08 R12: 0000000000000728 [ 593.241105] R13: ffff88826baba6d0 R14: ffffc90000057a40 R15: ffff888273482158 [ 593.241120] FS: 00007f4613fb3900(0000) GS:ffff888277a80000(0000) knlGS:0000000000000000 [ 593.241133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 593.241146] CR2: 00007f57d3c66a84 CR3: 000000026e2b6000 CR4: 00000000001406e0 [ 593.241158] Call Trace: [ 593.241233] i915_schedule+0x1f/0x30 [i915] [ 593.241326] i915_request_add+0x1a9/0x290 [i915] [ 593.241393] i915_gem_do_execbuffer+0x45f/0x1150 [i915] [ 593.241411] ? init_object+0x49/0x80 [ 593.241425] ? ___slab_alloc.constprop.91+0x4b8/0x4e0 [ 593.241491] ? i915_gem_execbuffer2_ioctl+0x99/0x380 [i915] [ 593.241563] ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915] [ 593.241629] i915_gem_execbuffer2_ioctl+0x1bb/0x380 [i915] [ 593.241705] ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915] [ 593.241724] drm_ioctl_kernel+0x81/0xd0 [ 593.241738] drm_ioctl+0x1a7/0x310 [ 593.241803] ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915] [ 593.241819] ? __update_load_avg_se+0x1c9/0x240 [ 593.241834] ? pick_next_entity+0x7e/0x120 [ 593.241851] do_vfs_ioctl+0x88/0x5d0 [ 593.241880] ksys_ioctl+0x35/0x70 [ 593.241894] __x64_sys_ioctl+0x11/0x20 [ 593.241907] do_syscall_64+0x44/0xf0 [ 593.241924] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 593.241940] RIP: 0033:0x7f4615ffe757 [ 593.241952] Code: 00 00 90 48 8b 05 39 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 a7 0c 00 f7 d8 64 89 01 48 [ 593.241970] RSP: 002b:00007ffc1030ddf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 593.241984] RAX: ffffffffffffffda RBX: 00007ffc10324420 RCX: 00007f4615ffe757 [ 593.241997] RDX: 00007ffc1030e220 RSI: 0000000040406469 RDI: 0000000000000003 [ 593.242010] RBP: 00007ffc1030e220 R08: 00007f46160c9208 R09: 00007f46160c9240 [ 593.242022] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040406469 [ 593.242038] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 [ 593.242058] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers v2: Track the local engine cache and explicitly clear it when switching engine locks. Fixes: a02eb975be78 ("drm/i915/execlists: Cache the priolist when rescheduling") Testcase: igt/gem_exec_whisper/contexts-priority # rare! 
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: MichaƂ Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190211204647.26723-1-chris@chris-wilson.co.uk
2019-01-29  drm/i915: Replace global breadcrumbs with per-context interrupt tracking  (Chris Wilson, 1 file, -1/+1)
A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation. Before commit 688e6c725816, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c725816 introduced an rbtree so that only the earliest waiter on the global timeline would be woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.) The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case). Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perform direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio. v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts. v3: Check if the request is still active before busywaiting v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops v5: Add a missing pluralisation to a purely informative selftest message. References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29  drm/i915/execlists: Suppress preempting self  (Chris Wilson, 1 file, -4/+16)
In order to avoid preempting ourselves, we currently refuse to schedule the tasklet if we reschedule an inflight context. However, this glosses over a few issues such as what happens after a CS completion event and we then preempt the newly executing context with itself, or if something else causes a tasklet_schedule triggering the same evaluation to preempt the active context with itself. However, when we avoid preempting ELSP[0], we still retain the preemption value as it may match a second preemption request within the same time period that we need to resolve after the next CS event. However, since we only store the maximum preemption priority seen, it may not match the subsequent event and so we should double check whether or not we actually do need to trigger a preempt-to-idle by comparing the top priorities from each queue. Later, this gives us a hook for finer control over deciding whether the preempt-to-idle is justified. The sequence of events where we end up preempting to no avail is:

 1. Queue requests/contexts A, B
 2. Priority boost A; no preemption as it is executing, but keep hint
 3. After CS switch, B is less than hint, force preempt-to-idle
 4. Resubmit B after idling

v2: We can simplify a bunch of tests based on the knowledge that PI will ensure that earlier requests along the same context will have the highest priority. v3: Demonstrate the stale preemption hint with a selftest References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
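The double check, as a hedged sketch (queue_priority_hint is the real field; need_preempt(), rq_prio() and queue_prio() are simplified stand-ins for the execlists helpers):

    static bool need_preempt(const struct intel_engine_cs *engine,
                             const struct i915_request *rq)
    {
            const int hint = engine->execlists.queue_priority_hint;

            if (hint <= rq_prio(rq))
                    return false;   /* never preempt ourselves/equals */

            /* The hint may be stale (step 3 above): confirm against
             * the actual maximum waiting in the priority queues. */
            return queue_prio(&engine->execlists) > rq_prio(rq);
    }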
2019-01-29  drm/i915: Rename execlists->queue_priority to queue_priority_hint  (Chris Wilson, 1 file, -6/+5)
After noticing that we trigger preemption events for currently executing requests, as well as requests that complete before the preemption and attempting to suppress those preemption events, it is wise to not consider the queue_priority to be authoritative. As we only track the maximum priority seen between dequeue passes, if the maximum priority request is no longer available for dequeuing (it completed or is even executing on another engine), we have no knowledge of the previous queue_priority as it would require us to keep a full history of enqueued requests -- but we already have that history in the priolists! Rename the queue_priority to queue_priority_hint so that we do not confuse it as being exactly the maximum priority in the queue, but merely an indication that we have seen a new maximum priority value and as such we should check whether it should preempt the currently running request. v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it being only used for the singular task of preemption and not the wider question of waking up due to a change in the queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
2018-10-01  drm/i915: Priority boost for waiting clients  (Chris Wilson, 1 file, -6/+28)
Latency is in the eye of the beholder. In the case where a client stops and waits for the gpu, give that request chain a small priority boost (not so that it overtakes higher priority clients, to preserve the external ordering) so that ideally the wait completes earlier. v2: Tvrtko recommends to keep the boost-from-user-stall as small as possible and to allow new client flows to be preferred for interactivity over stalls. Testcase: igt/gem_sync/switch-default Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
2018-10-01  drm/i915: Pull scheduling under standalone lock  (Chris Wilson, 1 file, -0/+377)
Currently, the backend scheduling code abuses struct_mutex in order to have a global lock to manipulate a temporary list (without widespread allocation) and to protect against list modifications. This is an extraneous coupling to struct_mutex and further cannot extend beyond the local device. Pull all the code that needs to be under the one true lock into i915_scheduler.c, and make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk
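The one true lock, as a hedged sketch (the irq-safe locking shown reflects later fixes rather than this exact patch; __i915_schedule() is the internal walk it introduces):

    static DEFINE_SPINLOCK(schedule_lock);

    /* A single global lock serialises the whole priority-inheritance
     * walk and its temporary dfs list, decoupled from struct_mutex. */
    void i915_schedule(struct i915_request *rq,
                       const struct i915_sched_attr *attr)
    {
            spin_lock_irq(&schedule_lock);
            __i915_schedule(rq, attr);
            spin_unlock_irq(&schedule_lock);
    }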