path: root/drivers/gpu/drm/i915/intel_ringbuffer.h
Age | Commit message | Author | Files | Lines
2019-04-16 | drm/i915: Drop bool return from breadcrumbs signaler | Chris Wilson | 1 | -2/+2
Since removal of the "missed interrupt detection" nobody used the result of whether or not we signaled anybody during that invocation, so now remove the return value. References: 789659f4307a ("drm/i915: Drop fake breadcrumb irq") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190416085218.431-1-chris@chris-wilson.co.uk
2019-04-11 | drm/i915/guc: Implement reset locally | Chris Wilson | 1 | -1/+1
Before causing guc and execlists to diverge further (breaking guc in the process), take a copy of the current reset procedure and make it local to the guc submission backend. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-1-chris@chris-wilson.co.uk
2019-04-10 | drm/i915: Only reset the pinned kernel contexts on resume | Chris Wilson | 1 | -2/+1
On resume, we know that the only pinned contexts in danger of seeing corruption are the kernel contexts, and so we do not need to walk the list of all GEM contexts as we track them on each engine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk
2019-03-26 | drm/i915: take a reference to uncore in the engine and use it | Daniele Ceraolo Spurio | 1 | -12/+33
A few advantages:
- Prepares us for the planned split of display uncore from GT uncore.
- Improves our engine-centric view of the world in the engine code and allows us to avoid jumping back to dev_priv.
- Allows us to wrap accesses to engine registers in nice macros that automatically pick the right mmio base.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-10-daniele.ceraolospurio@intel.com
2019-03-18 | drm/i915: Hold a ref to the ring while retiring | Chris Wilson | 1 | -1/+12
As the final request on a ring may hold the reference to this ring (via retiring the last pinned context), we may find ourselves chasing a dangling pointer on completion of the list. A quick solution is to hold a reference to the ring itself as we retire along it so that we only free it after we stop dereferencing it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
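The fix is the classic pattern of pinning an object before walking a list whose final element may drop the last reference to it. A minimal standalone C sketch of the idea (the names and refcounting here are illustrative, not the actual i915 API):

    #include <stdio.h>
    #include <stdlib.h>

    struct ring {
        int refcount;
        int pending;        /* requests still outstanding on this ring */
    };

    static void ring_get(struct ring *r) { r->refcount++; }

    static void ring_put(struct ring *r)
    {
        if (--r->refcount == 0)
            free(r);        /* last reference frees the ring */
    }

    static void retire_request(struct ring *r)
    {
        if (--r->pending == 0)
            ring_put(r);    /* retiring the final request may drop
                             * what was the last reference... */
    }

    static void retire_all(struct ring *r)
    {
        ring_get(r);        /* ...so pin the ring across the walk */
        while (r->pending)
            retire_request(r);
        ring_put(r);        /* only free once we stop dereferencing */
    }

    int main(void)
    {
        struct ring *r = calloc(1, sizeof(*r));
        r->refcount = 1;    /* reference held by the pinned context */
        r->pending = 3;
        retire_all(r);
        return 0;
    }

Without the get/put pair in retire_all(), the `while (r->pending)` test would read freed memory after the last request retires.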
2019-03-18 | drm/i915: Switch to use HWS indices rather than addresses | Chris Wilson | 1 | -2/+2
If we use the STORE_DATA_INDEX function we can use a fixed offset and avoid having to look up the engine HWS address. A step closer to being able to emit the final breadcrumb during request_add rather than later in the submission interrupt handler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-9-chris@chris-wilson.co.uk
2019-03-08 | drm/i915: Split struct intel_context definition to its own header | Chris Wilson | 1 | -501/+1
This complex struct pulling in half the driver deserves its own isolation, in preparation for intel_context becoming an outright complicated class of its own. Splitting this beast into its own header also requires splitting several of its dependent types and their dependencies into their own headers as well.
v2: Add standalone compilation tests
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-2-chris@chris-wilson.co.uk
2019-03-08 | drm/i915: Remove has-kernel-context | Chris Wilson | 1 | -1/+0
We can no longer assume execution ordering, and in particular we cannot assume which context will execute last. One side-effect of this is that we cannot determine if the kernel-context is resident on the GPU, so remove the routines that claimed to do so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-4-chris@chris-wilson.co.uk
2019-03-05 | drm/i915: Move find_active_request() to the engine | Chris Wilson | 1 | -0/+3
To find the active request, we need only search along the individual engine for the right request. This does not require touching any global GEM state, so move it into the engine compartment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-3-chris@chris-wilson.co.uk
2019-03-05 | drm/i915: Store the BIT(engine->id) as the engine's mask | Chris Wilson | 1 | -16/+11
In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, let's store the full bitmask.
v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
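The distinction is easy to see in a standalone sketch (types and values illustrative): a physical engine keeps mask == BIT(id), while a virtual engine's mask covers every physical engine it may run on.

    #include <stdint.h>
    #include <stdio.h>

    #define BIT(n) (1u << (n))

    typedef uint32_t intel_engine_mask_t;

    struct engine {
        int id;                     /* -1 here for a virtual engine */
        intel_engine_mask_t mask;   /* no longer assumed to be BIT(id) */
    };

    int main(void)
    {
        struct engine vcs0 = { .id = 1, .mask = BIT(1) };
        struct engine veng = { .id = -1, .mask = BIT(1) | BIT(2) };

        printf("vcs0 mask=%#x, virtual mask=%#x\n",
               (unsigned)vcs0.mask, (unsigned)veng.mask);
        return 0;
    }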
2019-03-05 | drm/i915: Remove last traces of exec-id (GEM_BUSY) | Chris Wilson | 1 | -1/+0
As we now allow per-context engines, the legacy concept of I915_EXEC_RING no longer applies universally. We are still exposing the unrelated exec-id in GEM_BUSY, so transition this ioctl (once more slightly changing its ABI, but no one cares) over to only reporting the uabi-class (not instance, as we can not foreseeably fit those into the small bitmask). The only user of the extended ring information from GEM_BUSY is ddx/sna, which tries to use the non-rcs busyness information to guide which engine to use for subsequent operations on foreign bo. All that matters for it is the decision between rcs and !rcs, so it is unaffected by the change in higher bits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305162643.20243-1-chris@chris-wilson.co.uk
2019-03-01 | drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ | Chris Wilson | 1 | -0/+7
Having introduced per-context seqno, we now have a means to identify progress across the system without fear of rollback as befell the global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in advance of submission safe in the knowledge that our target seqno and address are stable.

However, since we are telling the GPU to busy-spin on the target address until it matches the signaling seqno, we only want to do so when we are sure that busy-spin will be completed quickly. To achieve this we only submit the request to HW once the signaler is itself executing (modulo preemption causing us to wait longer), and we only do so for default and above priority requests (so that idle priority tasks never themselves hog the GPU waiting for others).

As might be reasonably expected, HW semaphores excel in inter-engine synchronisation microbenchmarks (where the 3x reduced latency / increased throughput more than offsets the power cost of spinning on a second ring) and show significant improvement (can be up to ~10%, most see no change) for single clients that utilize multiple engines (typically media players and transcoders), without regressing multiple clients that can saturate the system or changing the power envelope dramatically.
v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.
Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
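Two details make the scheme safe: the wraparound-friendly seqno comparison (the same style as the driver's i915_seqno_passed()), and the fact that the wait is a poll on ordinary memory. A standalone C model of what a poll-greater-or-equal MI_SEMAPHORE_WAIT does:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* "a has passed b", robust across u32 wraparound */
    static bool seqno_passed(uint32_t a, uint32_t b)
    {
        return (int32_t)(a - b) >= 0;
    }

    /* CPU model of the GPU-side busy-spin: poll the signaler's HWSP
     * slot until it reaches the target seqno */
    static void semaphore_wait(const volatile uint32_t *hwsp, uint32_t target)
    {
        while (!seqno_passed(*hwsp, target))
            ; /* cheap only if the signaler is already executing */
    }

    int main(void)
    {
        volatile uint32_t hwsp = 5;

        semaphore_wait(&hwsp, 5);   /* already signaled: returns at once */
        printf("passed(0, UINT32_MAX - 1) = %d\n",
               seqno_passed(0, UINT32_MAX - 1)); /* 1: wraparound-safe */
        return 0;
    }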
2019-02-28 | drm/i915: Make request allocation caches global | Chris Wilson | 1 | -17/+0
As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potentially has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths.
v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
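In kernel terms the change is from per-device kmem_cache instances to module-lifetime globals. A simplified kernel-style sketch of that shape (the stub type and names are illustrative, not the actual patch):

    #include <linux/module.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct i915_request_stub { u64 payload[8]; }; /* stand-in type */

    static struct kmem_cache *global_request_cache;

    static int __init example_init(void)
    {
        /* one cache for every device instance, not one per device */
        global_request_cache = KMEM_CACHE(i915_request_stub,
                                          SLAB_HWCACHE_ALIGN);
        return global_request_cache ? 0 : -ENOMEM;
    }

    static void __exit example_exit(void)
    {
        /* the real patch defers teardown behind a grace period so
         * late consumers never see a destroyed cache */
        rcu_barrier();
        kmem_cache_destroy(global_request_cache);
    }

    module_init(example_init);
    module_exit(example_exit);
    MODULE_LICENSE("GPL");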
2019-02-28 | drm/i915: Compute the global scheduler caps | Chris Wilson | 1 | -0/+2
Do a pass over all the engines upon starting to determine the global scheduler capability flags (those that are agreed upon by all). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-7-chris@chris-wilson.co.uk
2019-02-26 | drm/i915: Remove i915_request.global_seqno | Chris Wilson | 1 | -2/+0
Having weaned the interrupt handling off using a single global execution queue, we no longer need to emit a global_seqno. Note that we still have a few assumptions about execution order along engine timelines, but this removes the most obvious artefact! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
2019-02-26 | drm/i915: Remove access to global seqno in the HWSP | Chris Wilson | 1 | -40/+0
Stop accessing the HWSP to read the global seqno, and stop tracking the mirror in the engine's execution timeline -- it is unused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
2019-02-26 | drm/i915: Replace global_seqno with a hangcheck heartbeat seqno | Chris Wilson | 1 | -1/+18
To determine whether an engine is 'stuck', we simply check whether or not it is still on the same seqno for several seconds. To keep this simple mechanism intact over the loss of a global seqno, we can simply add a new global heartbeat seqno instead. As we cannot know the sequence in which requests will then be completed, we use a primitive random number generator instead (with a cycle long enough to not matter over an interval of a few thousand requests between hangcheck samples). The alternative to using a dedicated seqno on every request is to issue a heartbeat request and query its progress through the system. Sadly this requires us to reduce struct_mutex so that we can issue requests without requiring that bkl.
v2: And without the extra CS_STALL for the hangcheck seqno -- we don't need strict serialisation with what comes later, we just need to be sure we don't write the hangcheck seqno before our batch is flushed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
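Any cheap generator with a long enough cycle works for the heartbeat; a full-period 32-bit LCG is one option. A standalone sketch of the sampling idea (the constants are from Numerical Recipes; this is not the driver's actual generator):

    #include <stdint.h>
    #include <stdio.h>

    /* full 2^32 period: increment odd, multiplier minus 1 divisible
     * by 4 (Hull-Dobell conditions) */
    static uint32_t next_heartbeat(uint32_t prev)
    {
        return prev * 1664525u + 1013904223u;
    }

    int main(void)
    {
        uint32_t hws = 42;          /* heartbeat slot in the HWSP */
        uint32_t last_sample = hws;

        hws = next_heartbeat(hws);  /* each request emits a new value */

        /* hangcheck only asks "did it change since the last sample?",
         * with no assumption about completion order */
        printf("engine %s\n", hws == last_sample ? "stuck" : "advancing");
        return 0;
    }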
2019-02-23 | drm/i915/pmu: Always sample an active ringbuffer | Chris Wilson | 1 | -1/+1
As we no longer have a precise indication of requests queued to an engine, make no presumptions and just sample the ring registers to see if the engine is busy.
v2: Report busy while the ring is idling on a semaphore/event.
v3: Give the struct a name!
v4: Always 0 outside the powerwell; trusting the powerwell is accurate enough for our sampling pmu.
v5: Protect against gen7 mmio madness and try to improve grammar
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190223000102.14290-1-chris@chris-wilson.co.uk
2019-02-07 | drm/i915: Hack and slash, throttle execbuffer hogs | Chris Wilson | 1 | -0/+12
Apply backpressure to hogs that emit requests faster than the GPU can process them by waiting for their ring to be less than half-full before proceeding with taking the struct_mutex. This is a gross hack to apply throttling backpressure, the long term goal is to remove the struct_mutex contention so that each client naturally waits, preferably in an asynchronous, nonblocking fashion (pipelined operations for the win), for their own resources and never blocks another client within the driver at least. (Realtime priority goals would extend to ensuring that resource contention favours high priority clients as well.) This patch only limits excessive request production and does not attempt to throttle clients that block waiting for eviction (either global GTT or system memory) or any other global resources, see above for the long term goal. No microbenchmarks are harmed (to the best of my knowledge). Testcase: igt/gem_exec_schedule/pi-ringfull-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
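The throttle itself is simple arithmetic on the ring's head and tail. A standalone sketch of the check (the half-full threshold is the heuristic described above; register layout details are omitted):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* free bytes in a power-of-two ring; one slot is kept empty so
     * that head == tail unambiguously means "empty" */
    static uint32_t ring_space(uint32_t head, uint32_t tail, uint32_t size)
    {
        return (head - tail - 1) & (size - 1);
    }

    /* before taking struct_mutex, make the hog wait until its ring
     * has drained to at most half-full */
    static bool may_submit(uint32_t head, uint32_t tail, uint32_t size)
    {
        return ring_space(head, tail, size) >= size / 2;
    }

    int main(void)
    {
        const uint32_t size = 4096;

        printf("empty ring:  %s\n", may_submit(0, 0, size) ? "go" : "wait");
        printf("nearly full: %s\n", may_submit(8, 0, size) ? "go" : "wait");
        return 0;
    }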
2019-02-06 | drm/i915/pmu: Fix enable count array size and bounds checking | Tvrtko Ursulin | 1 | -4/+5
The enable count array is supposed to have one counter for each possible engine sampler. As such, the array sizing and bounds checking are not correct and would blow up the asserts if more samplers were added. No ill-effect in the current code base but let's fix it for correctness. At the same time tidy the assert for readability and robustness.
v2: * One check per assert. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com
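The shape of the fix, in a standalone sketch (enum and struct names illustrative): size the array from the sampler enum so it grows automatically, and split the combined assert so a failure pinpoints the exact violated condition.

    #include <assert.h>
    #include <stdint.h>

    enum engine_sample { SAMPLE_BUSY, SAMPLE_WAIT, SAMPLE_SEMA, SAMPLE_MAX };

    struct engine_pmu {
        /* one enable counter per possible sampler */
        uint8_t enable_count[SAMPLE_MAX];
    };

    static void sampler_enable(struct engine_pmu *pmu, unsigned int sample)
    {
        /* one condition per assert */
        assert(sample < SAMPLE_MAX);                   /* in bounds */
        assert(pmu->enable_count[sample] < UINT8_MAX); /* no overflow */
        pmu->enable_count[sample]++;
    }

    int main(void)
    {
        struct engine_pmu pmu = {0};

        sampler_enable(&pmu, SAMPLE_BUSY);
        return 0;
    }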
2019-02-04 | drm/i915: Allow normal clients to always preempt idle priority clients | Chris Wilson | 1 | -1/+14
When first enabling preemption, we hesitated to make it a free-for-all where every higher priority client would force a preempt-to-idle cycle and take over from all lower priority clients. We hesitated because we were uncertain just how well preemption would work in practice, whether the preemption latency itself would detract from the latency gains for higher priority tasks, and whether it would work at all.

Since introducing preemption, we have been enabling it for more common tasks, even giving normal clients a small preemptive boost when they first start (to aid fairness and improve interactivity). Now let's take one step further and give permission for all normal (priority:0) clients to preempt any idle (priority:<0) task, so that users running long compute jobs do not overly impact other jobs (i.e. their desktop) and the system remains responsive under such idle loads.

References: f6322eddaff7 ("drm/i915/preemption: Allow preemption between submission ports") References: b16c765122f9 ("drm/i915: Priority boost for new clients") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com> Cc: "Stead, Alan" <alan.stead@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190204084116.3013-1-chris@chris-wilson.co.uk
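Reduced to a predicate, the newly permitted case looks like this (a sketch of just this rule; the real scheduler folds it into its wider priority comparisons):

    #include <stdbool.h>
    #include <stdio.h>

    #define PRIORITY_NORMAL 0   /* default priority; idle work is < 0 */

    /* the case this change enables: normal work may kick idle work
     * off the GPU (strictly higher priority could already preempt) */
    static bool normal_preempts_idle(int queued, int active)
    {
        return queued >= PRIORITY_NORMAL && active < PRIORITY_NORMAL;
    }

    int main(void)
    {
        printf("normal vs idle:   %d\n", normal_preempts_idle(0, -10)); /* 1 */
        printf("normal vs normal: %d\n", normal_preempts_idle(0, 0));   /* 0 */
        return 0;
    }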
2019-01-29 | drm/i915: Drop fake breadcrumb irq | Chris Wilson | 1 | -5/+0
Missed breadcrumb detection is defunct due to the tight coupling with dma_fence signaling and the myriad ways we may signal fences from everywhere but from an interrupt, i.e. we frequently signal a fence before we even see its interrupt. This means that even if we miss an interrupt for a fence, it still is signaled before our breadcrumb hangcheck fires, so simplify the breadcrumb hangchecking by moving it into the GPU hangcheck and forgo fake interrupts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-3-chris@chris-wilson.co.uk
2019-01-29 | drm/i915: Replace global breadcrumbs with per-context interrupt tracking | Chris Wilson | 1 | -75/+19
A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c725816 introduced an rbtree so that only the earliest waiter on the global timeline would be woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perform direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio.
v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29 | drm/i915/execlists: Suppress preempting self | Chris Wilson | 1 | -0/+1
In order to avoid preempting ourselves, we currently refuse to schedule the tasklet if we reschedule an inflight context. However, this glosses over a few issues such as what happens after a CS completion event and we then preempt the newly executing context with itself, or if something else causes a tasklet_schedule triggering the same evaluation to preempt the active context with itself. However, when we avoid preempting ELSP[0], we still retain the preemption value as it may match a second preemption request within the same time period that we need to resolve after the next CS event. However, since we only store the maximum preemption priority seen, it may not match the subsequent event and so we should double check whether or not we actually do need to trigger a preempt-to-idle by comparing the top priorities from each queue. Later, this gives us a hook for finer control over deciding whether the preempt-to-idle is justified.

The sequence of events where we end up preempting to no avail is:
1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling
v2: We can simplify a bunch of tests based on the knowledge that PI will ensure that earlier requests along the same context will have the highest priority.
v3: Demonstrate the stale preemption hint with a selftest
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
2019-01-29 | drm/i915: Rename execlists->queue_priority to queue_priority_hint | Chris Wilson | 1 | -2/+6
After noticing that we trigger preemption events for currently executing requests, as well as requests that complete before the preemption and attempting to suppress those preemption events, it is wise to not consider the queue_priority to be authoritative. As we only track the maximum priority seen between dequeue passes, if the maximum priority request is no longer available for dequeuing (it completed or is even executing on another engine), we have no knowledge of the previous queue_priority as it would require us to keep a full history of enqueued requests -- but we already have that history in the priolists! Rename the queue_priority to queue_priority_hint so that we do not confuse it as being exactly the maximum priority in the queue, but merely an indication that we have seen a new maximum priority value and as such we should check whether it should preempt the currently running request.
v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it being only used for the singular task of preemption and not the wider question of waking up due to a change in the queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
2019-01-29 | drm/i915: Identify active requests | Chris Wilson | 1 | -2/+4
To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.) There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
2019-01-28 | drm/i915: Allocate a status page for each timeline | Chris Wilson | 1 | -2/+4
Allocate a page for use as a status page by a group of timelines; as we only need a dword of storage for each (rounded up to a cacheline for safety), we can pack multiple timelines into the same page. Each timeline will then be able to track its own HW seqno.
v2: Reuse the common per-engine HWSP for the solitary ringbuffer timeline, so that we do not have to emit (using per-gen specialised vfuncs) the breadcrumb into the distinct timeline HWSP and instead can keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the sleight-of-hand for the global/per-context seqno switchover, we will store both temporarily (and so use a custom offset for the shared timeline HWSP until the switch over).
v3: Keep things simple and allocate a page for each timeline, page sharing comes next.
v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over again in selftests.
v5: And caught red handed copying create timeline + check.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
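The packing arithmetic the message alludes to, as a standalone sketch: one dword of seqno per timeline, padded to a cacheline so writers never false-share, giving 64 slots per 4KiB page.

    #include <stdio.h>

    #define PAGE_SIZE       4096
    #define CACHELINE_BYTES 64

    /* byte offset of a timeline's seqno dword within the shared page */
    static unsigned int hwsp_offset(unsigned int slot)
    {
        return slot * CACHELINE_BYTES;
    }

    int main(void)
    {
        printf("timelines per page: %d\n", PAGE_SIZE / CACHELINE_BYTES);
        printf("slot 3 offset:      %#x\n", hwsp_offset(3));
        return 0;
    }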
2019-01-28 | drm/i915: Always allocate an object/vma for the HWSP | Chris Wilson | 1 | -17/+6
Currently we only allocate an object and vma if we are using a GGTT virtual HWSP, and a plain struct page for a physical HWSP. For convenience later on with global timelines, it will be useful to always have the status page being tracked by a struct i915_vma. Make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-4-chris@chris-wilson.co.uk
2019-01-25 | drm/i915: Remove GPU reset dependence on struct_mutex | Chris Wilson | 1 | -8/+9
Now that the submission backends are controlled via their own spinlocks, with a wave of a magic wand we can lift the struct_mutex requirement around GPU reset. That is we allow the submission frontend (userspace) to keep on submitting while we process the GPU reset as we can suspend the backend independently. The major change is around the backoff/handoff strategy for performing the reset. With no mutex deadlock, we no longer have to coordinate with any waiter, and just perform the reset immediately. Testcase: igt/gem_mmap_gtt/hang # regresses Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk
2019-01-25 | drm/i915: Compute the HWS offsets explicitly | Chris Wilson | 1 | -5/+5
Simplify by using sizeof(u32) to convert from an index inside the HWSP to a byte offset. This has the advantage of not only being shorter (and so not upsetting checkpatch!) but that it matches usage where we are writing to byte addresses using commands other than MI_STORE_DWORD_IMM.
v2: Drop the now superfluous MI_STORE_DWORD_INDEX_SHIFT, it appears to be a local invention so keeping it after the final use does not help to clarify the GPU instruction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-2-chris@chris-wilson.co.uk
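The conversion in miniature (the index value here is illustrative): MI_STORE_DWORD_INDEX addresses the HWSP by dword index, while other commands take byte offsets, and sizeof(u32) translates between the two views.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t u32;

    #define EXAMPLE_HWS_INDEX 0x30  /* illustrative dword index */

    int main(void)
    {
        unsigned int byte_offset = EXAMPLE_HWS_INDEX * sizeof(u32);

        printf("index %#x -> byte offset %#x\n",
               EXAMPLE_HWS_INDEX, byte_offset);
        return 0;
    }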
2019-01-25 | drm/i915: Remove manual breadcrumb counting | Chris Wilson | 1 | -1/+1
Now that we know we measure the size of the engine->emit_breadcrumb() correctly, we can remove the previous manual counting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-1-chris@chris-wilson.co.uk
2019-01-25 | drm/i915: Measure the required reserved size for request emission | Chris Wilson | 1 | -1/+1
Instead of tediously and fragilely counting up the number of dwords required to emit the breadcrumb to seal a request, fake a request and measure it automatically once during engine setup. The downside is that this requires a fair amount of mocking to create a proper breadcrumb. Still, should be less error prone in future as the breadcrumb size fluctuates! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125100520.20163-1-chris@chris-wilson.co.uk
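The measurement trick in miniature: run the emitter once against scratch storage during setup and record how far it advanced, instead of hand-counting dwords. A standalone sketch with a placeholder emitter (the dword values are meaningless stand-ins):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t u32;

    /* stand-in for engine->emit_breadcrumb(): writes its dwords and
     * returns the advanced pointer */
    static u32 *emit_breadcrumb(u32 *cs)
    {
        *cs++ = 0xdead0001; /* placeholder: store-dword command */
        *cs++ = 0;          /* placeholder: address */
        *cs++ = 0;          /* placeholder: seqno */
        *cs++ = 0xdead0002; /* placeholder: user interrupt */
        return cs;
    }

    int main(void)
    {
        u32 scratch[64];
        unsigned int sz = emit_breadcrumb(scratch) - scratch;

        assert(sz <= 64);
        printf("emit_breadcrumb_sz = %u dwords\n", sz);
        return 0;
    }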
2019-01-18 | drm/i915: Use b->irq_enable() as predicate for mock engine | Chris Wilson | 1 | -1/+0
Since commit d4ccceb05591 ("drm/i915/icl: Ringbuffer interrupt handling") we have required a mechanism to avoid touching the interrupt hardware for breadcrumbs, superseding our mock interface for selftests. The residual problem (ideas welcome) is in probing the mock ring registers for ring_is_idle. Hmm, maybe we should just install mock handlers for i915->uncore.mmio__write and friends? The only problem being that we would need to truly mock some expected reads. :( References: d4ccceb05591 ("drm/i915/icl: Ringbuffer interrupt handling") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190118112225.13780-1-chris@chris-wilson.co.uk
2019-01-17 | drm/i915: small isolated c99 types to kernel types switch | Jani Nikula | 1 | -1/+1
Mixed C99 and kernel types use is getting ugly. Prefer kernel types.

    sed -i 's/\buint\(8\|16\|32\|64\)_t\b/u\1/g'

Minor checkpatch fixes sprinkled on top of the changed lines. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14ed72e7f04c9340a057855c5950b54811f8a477.1547629303.git.jani.nikula@intel.com
2019-01-03 | drm/i915: Always try to reset the GPU on takeover | Chris Wilson | 1 | -1/+1
When we first introduced the reset to sanitize the GPU on taking over from the BIOS and before returning control to third parties (the BIOS!), we restricted it to only systems utilizing HW contexts as we were uncertain of how stable our reset mechanism truly was. We now have reasonable coverage across all machines that expose a GPU reset method, and so we should be safe to sanitize the GPU state everywhere. v2: We _have_ to skip the reset if it would clobber the display. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190103112104.19561-1-chris@chris-wilson.co.uk
2019-01-02 | drm/i915: start moving runtime device info to a separate struct | Jani Nikula | 1 | -2/+2
First move the low hanging fruit, the fields that are only initialized runtime. Use RUNTIME_INFO() exclusively to access the fields. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c24fe7a4b0492a888690c46814c0ff21ce2f12b1.1546267488.git.jani.nikula@intel.com
2018-12-31 | drm/i915: Drop unused engine->irq_seqno_barrier w/a | Chris Wilson | 1 | -10/+0
Now that we have eliminated the CPU-side irq_seqno_barrier by moving the delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove the engine->irq_seqno_barrier infrastructure. Though intentionally slowing down the GPU is nasty, so is the code we can now remove! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk
2018-12-31 | drm/i915: Remove redundant trailing request flush | Chris Wilson | 1 | -10/+0
Now that we perform the request flushing inline with emitting the breadcrumb, we can remove the now redundant manual flush. And we can also remove the infrastructure that remained only for its purpose.
v2: emit_breadcrumb_sz is in dwords, but rq->reserved_space is in bytes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-1-chris@chris-wilson.co.uk
2018-12-28 | drm/i915/execlists: Pull the render flush into breadcrumb emission | Chris Wilson | 1 | -3/+2
In preparation for removing the manual EMIT_FLUSH prior to emitting the breadcrumb implement the flush inline with writing the breadcrumb for execlists. Using one command to both flush and write the breadcrumb is naturally a tiny bit faster than splitting it into two. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228153114.4948-1-chris@chris-wilson.co.uk
2018-12-28 | drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation | Chris Wilson | 1 | -55/+1
The writing is on the wall for the existence of a single execution queue along each engine, and as a consequence we will not be able to track dependencies along the HW queue itself, i.e. we will not be able to use HW semaphores on gen7 as they use a global set of registers (and unlike gen8+ we can not effectively target memory to keep per-context seqno and dependencies).

On the positive side, when we implement request reordering for gen7 we also can not presume a simple execution queue and would also require removing the current semaphore generation code. So this brings us another step closer to request reordering for ringbuffer submission!

The negative side is that using interrupts to drive inter-engine synchronisation is much slower (4us -> 15us to do a nop on each of the 3 engines on ivb). This is much better than it was at the time of introducing the HW semaphores, and, equally importantly, userspace weaned itself off intermixing dependent BLT/RENDER operations (the prime culprit was glyph rendering in UXA). So while we regress the microbenchmarks, it should not impact the user.

References: https://bugs.freedesktop.org/show_bug.cgi?id=108888 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
2018-12-18 | drm/i915: Apply missed interrupt after reset w/a to all ringbuffer gen | Chris Wilson | 1 | -0/+2
Having completed a test run of gem_eio across all machines in CI we also observe the phenomenon (of lost interrupts after resetting the GPU) on gen3 machines as well as the previously sighted gen6/gen7. Let's apply the same HWSTAM workaround that was effective for gen6+ for all, as although we haven't seen the same failure on gen4/5 it seems prudent to keep the code the same.

As a consequence we can remove the extra setting of HWSTAM and apply the register from a single site.
v2: Delazy and move the HWSTAM into its own function
v3: Mask off all HWSP writes on driver unload and engine cleanup.
v4: And what about the physical hwsp?
v5: No, engine->init_hw() is not called from driver_init_hw(), don't be daft. Really scrub HWSTAM as early as we can in driver_init_mmio()
v6: Rename set_hwsp as it was setting the mask not the hwsp register.
v7: Ville pointed out that although vcs(bsd) was introduced for g4x/ilk, per-engine HWSTAM was not introduced until gen6!
References: https://bugs.freedesktop.org/show_bug.cgi?id=108735 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181218102712.11058-1-chris@chris-wilson.co.uk
2018-12-12 | drm/i915: replace IS_GEN<N> with IS_GEN(..., N) | Lucas De Marchi | 1 | -2/+2
Define IS_GEN() similarly to our IS_GEN_RANGE(), but use gen instead of gen_mask to do the comparison. Now callers can pass the gen as a parameter, so we don't require one macro for each gen. The following spatch was used to convert the users of these macros:

    @@
    expression e;
    @@
    (
    - IS_GEN2(e)
    + IS_GEN(e, 2)
    |
    - IS_GEN3(e)
    + IS_GEN(e, 3)
    |
    - IS_GEN4(e)
    + IS_GEN(e, 4)
    |
    - IS_GEN5(e)
    + IS_GEN(e, 5)
    |
    - IS_GEN6(e)
    + IS_GEN(e, 6)
    |
    - IS_GEN7(e)
    + IS_GEN(e, 7)
    |
    - IS_GEN8(e)
    + IS_GEN(e, 8)
    |
    - IS_GEN9(e)
    + IS_GEN(e, 9)
    |
    - IS_GEN10(e)
    + IS_GEN(e, 10)
    |
    - IS_GEN11(e)
    + IS_GEN(e, 11)
    )

v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than using the bitmask
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com
2018-12-04 | drm/i915: Allocate a common scratch page | Chris Wilson | 1 | -5/+0
Currently we allocate a scratch page for each engine, but since we only ever write into it for post-sync operations, it is not exposed to userspace nor do we care for coherency. As we then do not care about its contents, we can use one page for all, reducing our allocations and avoiding complications by not assuming per-engine isolation.

For later use, it simplifies engine initialisation (by removing the allocation that required struct_mutex!) and means that we can always rely on there being a scratch page.
v2: Check that we allocated a large enough scratch for I830 w/a
Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.18.20+
2018-12-04 | drm/i915: Fuse per-context workaround handling with the common framework | Tvrtko Ursulin | 1 | -0/+1
Convert the per-context workaround handling code to run against the newly introduced common workaround framework, and fuse the two to use the existing smarter list-add helper, the one which does the sorted insert and merges registers where possible. This completes the migration of all four classes of workarounds onto the common framework. Existing macros are kept untouched for smaller code churn.
v2: * Rename the list to ctx_wa_list and move it from dev_priv to the engine.
v3: * API rename and parameters tweaking. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181203133357.10341-1-tvrtko.ursulin@linux.intel.com
2018-12-04 | drm/i915: Move register white-listing to the common workaround framework | Tvrtko Ursulin | 1 | -0/+1
Instead of having a separate list of white-listed registers we can trivially move this to the common workarounds framework. This brings us one step closer to the goal of driving all workaround classes using the same code.
v2: * Use GEM_DEBUG_WARN_ON for the sanity check. (Chris Wilson)
v3: * API rename. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181203125014.3219-6-tvrtko.ursulin@linux.intel.com
2018-12-04 | drm/i915: Introduce per-engine workarounds | Tvrtko Ursulin | 1 | -0/+2
We stopped re-applying the GT workarounds after engine reset since commit 59b449d5c82a ("drm/i915: Split out functions for different kinds of workarounds"). The issue with this is that some of the GT workarounds live in the MMIO space which gets lost during engine resets. So far the registers in the 0x2xxx and 0xbxxx address ranges have been identified to be affected. This loss of applied workarounds has obvious negative effects and can even lead to hard system hangs (see the linked Bugzilla).

Rather than just restoring this re-application, because we have also observed that it is not safe to just re-write all GT workarounds after engine resets (the GPU might be live and weird hardware states can happen), we introduce a new class of per-engine workarounds and move only the affected GT workarounds over. Using the framework introduced in the previous patch, after engine reset we therefore re-apply only the workarounds living in the affected MMIO address ranges.
v2:
* Move Wa_1406609255:icl to engine workarounds as well.
* Rename API. (Chris Wilson)
* Drop redundant IS_KABYLAKE. (Chris Wilson)
* Re-order engine wa/ init so latest platforms are first. (Rodrigo Vivi)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=107945 Fixes: 59b449d5c82a ("drm/i915: Split out functions for different kinds of workarounds") Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181203133341.10258-1-tvrtko.ursulin@linux.intel.com
2018-12-03 | drm/i915/vgpu: Disallow loading on old vGPU hosts | Chris Wilson | 1 | -16/+0
Since commit fd8526e50902 ("drm/i915/execlists: Trust the CSB") we actually broke the force-mmio mode for our execlists implementation. No one noticed, so ergo no one is actually using an old vGPU host (where we required the older method) and so we can simply remove the broken support.
v2: csb_read can go as well (Mika)
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: fd8526e50902 ("drm/i915/execlists: Trust the CSB") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181130125954.11924-1-chris@chris-wilson.co.uk
2018-10-31 | drm/i915/ringbuffer: change header SPDX identifier to MIT | Jonathan Gray | 1 | -1/+1
Commit b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license") added "SPDX-License-Identifier: GPL-2.0" to files which previously had no license, change this to MIT for intel_ringbuffer.h matching the license text of intel_ringbuffer.c. Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181031005331.20775-1-jsg@jsg.id.au
2018-10-29 | drm/i915: Prefer IS_GEN<n> check with bitmask. | Rodrigo Vivi | 1 | -2/+2
Whenever possible we should stick with IS_GEN<n> checks. The bitmask was introduced in commit ae7617f0ef18 ("drm/i915: Allow optimized platform checks") for efficiency. Let's stick with it whenever possible. This patch was generated with coccinelle:

    spatch -sp_file is_gen.cocci *{c,h} --in-place

is_gen.cocci:

    @gen2@
    expression e;
    @@
    -INTEL_GEN(e) == 2
    +IS_GEN2(e)

    @gen3@
    expression e;
    @@
    -INTEL_GEN(e) == 3
    +IS_GEN3(e)

    @gen4@
    expression e;
    @@
    -INTEL_GEN(e) == 4
    +IS_GEN4(e)

    @gen5@
    expression e;
    @@
    -INTEL_GEN(e) == 5
    +IS_GEN5(e)

    @gen6@
    expression e;
    @@
    -INTEL_GEN(e) == 6
    +IS_GEN6(e)

    @gen7@
    expression e;
    @@
    -INTEL_GEN(e) == 7
    +IS_GEN7(e)

    @gen8@
    expression e;
    @@
    -INTEL_GEN(e) == 8
    +IS_GEN8(e)

    @gen9@
    expression e;
    @@
    -INTEL_GEN(e) == 9
    +IS_GEN9(e)

    @gen10@
    expression e;
    @@
    -INTEL_GEN(e) == 10
    +IS_GEN10(e)

    @gen11@
    expression e;
    @@
    -INTEL_GEN(e) == 11
    +IS_GEN11(e)

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181026195143.20353-1-rodrigo.vivi@intel.com
2018-10-01 | drm/i915: Pull scheduling under standalone lock | Chris Wilson | 1 | -3/+2
Currently, the backend scheduling code abuses struct_mutex into order to have a global lock to manipulate a temporary list (without widespread allocation) and to protect against list modifications. This is an extraneous coupling to struct_mutex and further can not extend beyond the local device. Pull all the code that needs to be under the one true lock into i915_scheduler.c, and make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk