summaryrefslogtreecommitdiff
path: root/src/gallium/drivers/freedreno/freedreno_gmem.c
AgeCommit message (Collapse)AuthorFilesLines
2018-10-26freedreno: import libdrm_freedreno + redesign submitRob Clark1-5/+3
In the pursuit of lowering driver overhead, it became clear that some amount of redesign of how libdrm_freedreno constructs the submit ioctl would be needed. In particular, as the gallium driver is starting to make heavier use of CP_SET_DRAW_STATE state groups/objects, the over- head of tracking cmd buffers and relocs becomes too much. And for "streaming" state, which isn't ever reused (like uniform uploads) the overhead of allocating/freeing ringbuffer[1] objects is too high. This redesign makes two main changes: 1) Introduces a fd_submit object for tracking bos and cmds table for the submit ioctl, making ringbuffer objects more light- weight. This was previously done in the ringbuffer. But we have many ringbuffer instances involved in a submit (gmem + draw + potentially 1000's of state-group rbs), and only need a single bos and cmds table. (Reloc table is still per-rb) The submit is also a convenient place for a slab allocator for ringbuffer objects. Other options would have required locking because, while we can guarantee allocations will only happen on a single thread, free's could happen either on the application thread or the flush_queue thread. With the slab allocator in the submit object, any frees that happen on the flush_queue thread happen after we know that the application thread is done with the submit. 2) Introduce a new "softpin" msm_ringbuffer_sp implementation that does not use relocs and only has cmds table entries for IB1 (ie. the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER to from the RB). To do this properly will require some updates on the kernel side, so whether you get the softpin or legacy submit/ringbuffer implementation at runtime depends on your kernel version. To make all these changes in libdrm would basically require adding a libdrm_freedreno2, so this is a good point to just pull the libdrm code into mesa. Plus it allows for using mesa's hashtable, slab allocator, etc. And it lets us have asserts enabled for debug mesa buids but omitted for release builds. And it makes life easier if further API changes become necessary. At this point I haven't tried to pull in the kgsl backend. Although I left the level of vfunc indirection which would make it possible to have other backends. (And this was convenient to keep to allow for the "softpin" ringbuffer to coexist.) NOTE: if bisecting a build error takes you here, try a clean build. There are a bunch of ways things can go wrong if you still have libdrm_freedreno cflags. [1] "ringbuffer" is probably a bad name, the only level of cmdstream buffer that is actually a ring is RB managed by kernel. User- space cmdstream is all IB1/IB2 and state-groups. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17freedreno: Remove the Emacs mode linesNeil Roberts1-2/+0
These are not necessary because the corresponding settings are set via the .dir-locals.el file anyway. Most of them were missing a ‘:’ after “tab-width” which was making Emacs display an annoying warning whenever you open the file. This patch was made with: sed -ri '/-\*- mode:/,/^$/d' \ $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \ -exec grep -l -- '-\*- mode:' {} \+) Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-02freedreno/a6xx: hwbinningRob Clark1-7/+8
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-09-27freedreno: simplify pctx->clear()Rob Clark1-30/+0
This is defined to always clear the entire surface(s) specified, regardless of scissor state.. mesa/st will turn scissored clears into a draw. So rip about a bunch of unnecessary machinery. Also remove a comment that was obsolete since using u_blitter to turn clear into draw (for the cases where there isn't a hw blitter fast-path). Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-08-16freedreno: Add a6xx backendKristian H. Kristensen1-1/+1
This adds a freedreno backend for the a6xx generation GPUs, which at the time of this commit is about 98% GLES2 conformant. Much remains to be done - both performance work and feature work towards more recent GLES versions, but this is a good start. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-07-17freedreno: get rid of noop renderRob Clark1-15/+0
This was basically to avoid a zero-dword IB (indirect-branch), but instead just don't emit the IB packet in that case. Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-21freedreno/a5xx: MSAARob Clark1-3/+7
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-17freedreno: add non-draw batches for compute/blitRob Clark1-12/+11
Get rid of "gmem" (ie. tiling) ringbuffer, and just emit setup commands directly to "draw" ringbuffer for compute (and in future for blits not using the 3d pipe). This way we can have a simple flat cmdstream buffer and bypass setup related to 3d pipe. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-03freedreno: rework fence trackingRob Clark1-4/+3
ctx->last_fence isn't such a terribly clever idea, if batches can be flushed out of order. Instead, each batch now holds a fence, which is created before the batch is flushed (useful for next patch), that later gets populated after the batch is actually flushed. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-16freedreno/a5xx: ARB_framebuffer_no_attachments supportRob Clark1-0/+5
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-24freedreno: rename pipe -> vsc_pipeRob Clark1-2/+2
To add context priority support we need to have an fd_pipe per context, rather than per-screen. Which conflicts with existing ctx->pipe (which is actually a visibility stream pipe (hw resource). So just rename it. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-14Revert "freedreno: use bypass if only clears"Rob Clark1-4/+1
Causing issues with stk on a4xx.. still probably a good idea, but seems some debugging is needed first. This reverts commit 3ab072d3c8643c66d8e07e63df970b792728bac6.
2017-05-13freedreno/a5xx: hw binning supportRob Clark1-3/+4
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-13freedreno: use bypass if only clearsRob Clark1-1/+4
Some things trigger batches that only contain a clear (like glmark2 startup). No point to use GMEM for this. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-06freedreno/a3xx: fix hang w/ large render targets and small gmemRob Clark1-0/+3
Possibly other gen's have a similar limit. Fixes glmark2 -b shadow with larger resolutions on devices with small gmem (for example, fullscreen 1080p on 8x16/db410c). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-04freedreno: core compute state supportRob Clark1-0/+7
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-22freedreno: make hw-query a helperRob Clark1-4/+8
For a5xx (and actually some queries on a4xx) we can accumulate results in the cmdstream, so we don't need this elaborate mechanism of tracking per-tile query results. So make it into vfuncs so generation specific backend can use it when it makes sense. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18freedreno/a5xx: cargo-cult end-batch sequence more faithfullyRob Clark1-0/+3
Fixes some issues at least with GMEM bypass mode, where we'd sometimes end up with some FS quads not hitting memory. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06freedreno: pitch alignment should match gmem alignmentRob Clark1-8/+9
Deal w/ differing gmem tile size alignment between generations, and make sure texture pitch matches. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01freedreno: no-op render when we need a fenceRob Clark1-6/+29
If app tries to create a fence but there is no rendering to submit, we need a dummy/no-op submit. Use a string-marker for the purpose.. mostly since it avoids needing to realize that the packet format changes in later gen's (so one less place to fixup for a5xx). Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01freedreno: native fence fd supportRob Clark1-2/+5
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01freedreno: some fence cleanupRob Clark1-0/+4
Prep-work for next patch, mostly move to tracking last_fence as a pipe_fence_handle (created now only in fd_gmem_render_tiles()), and a bit of superficial renaming. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30freedreno/a5xx: initial supportRob Clark1-1/+4
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30freedreno: make gmem tile size alignment configurableRob Clark1-8/+10
a5xx seems to prefer 64 pixel alignment, in at least some cases. Make this configurable per generation. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: move needs_wfi into batchRob Clark1-5/+3
This is also used in gmem code, which executes from the "bottom half" (ie. from the flush_queue worker thread), so it cannot be in fd_context. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: drop mem2gmem/gmem2mem query stagesRob Clark1-9/+0
They weren't really used, and it gets somewhat more complicated to deal with if batches are flushed asynchronously (on another thread). So just drop them, and move _query_set_state(NULL) call into batch (so it is not happening on background thread). Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: threaded batch flushRob Clark1-2/+0
With the state accessed from GMEM+submit factored out of fd_context and into fd_batch, now it is possible to punt this off to a helper thread. And more importantly, since there are cases where one context might force the batch-cache to flush another context's batches (ie. when there are too many in-flight batches), using a per-context helper thread keeps various different flushes for a given context serialized. TODO as with batch-cache, there are a few places where we'll need a mutex to protect critical sections, which is completely missing at the moment. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: re-order support for hw queriesRob Clark1-9/+9
Push query state down to batch, and use the resource tracking to figure out which batch(es) need to be flushed to get the query result. This means we actually need to allocate the prsc up front, before we know the size. So we have to add a special way to allocate an un- backed resource, and then later allocate the backing storage. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: spiff up some debug tracesRob Clark1-2/+4
Make it easier to track batches, to ensure things happen properly when they are reordered. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: move more batch related tracking to fd_batchRob Clark1-50/+46
To flush batches out of order, the gmem code needs to not depend on state from fd_context (since that may apply to a more recent batch). So this all moves into batch. The one exception is the gmem/pipe/tile state itself. But this is only used from gmem code (and batches are flushed serially). The alternative would be having to re-calculate GMEM layout on every batch, even if the dimensions of the render targets are the same. Note: This opens up the possibility of pushing gmem/submit into a helper thread. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30freedreno: introduce fd_batchRob Clark1-12/+8
Introduce the batch object, to track a batch/submit's worth of ringbuffers and other bookkeeping. In this first step, just move the ringbuffers into batch, since that is mostly uninteresting churn. For now there is just a single batch at a time. Note that one outcome of this change is that rb's are allocated/freed on each use. But the expectation is that the bo pool in libdrm_freedreno will save us the GEM bo alloc/free which was the initial reason to implement a rb pool in gallium. The purpose of the batch is to eventually facilitate out-of-order rendering, with batches associated to framebuffer state, and tracking the dependencies on other batches. Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-01-18freedreno: per-generation OUT_IB packetRob Clark1-2/+2
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-04freedreno: small bit of cleanup about max rendertargetsRob Clark1-5/+10
We hard-coded 4 or 8 as the max in various places. Switch it all to a define since the limit will go up with a4xx (and maybe even again in the future?) Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-14freedreno: fix bug in tile/slot calculationRob Clark1-5/+4
This was causing corruption with hw binning on a306. Unlikely that it is a306 specific, but rather the smaller gmem size resulted in different tile configuration which was triggering the bug at certain resolutions. Signed-off-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
2015-04-27freedreno/a3xx: add support for S8 and Z32F_S8Ilia Mirkin1-10/+19
Enables ARB_depth_buffer_float. There is no sampling support for interleaved Z32F_S8, so we store the two textures separately, one as Z32F, the other as S8. As a result, we need a lot of additional logic for restores and transfers. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05freedreno: don't bother setting resource timestampsIlia Mirkin1-9/+0
Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02freedreno: add support for laying out MRTs in gmemIlia Mirkin1-14/+39
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02freedreno: add core infrastructure support for MRTsIlia Mirkin1-3/+4
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-12-13freedreno: add is_a3xx()/is_a4xx() helpersRob Clark1-1/+3
A bunch of open-coded 'gpu_id > 300's seems like it will eventually cause problems with future generations. There were already a few minor problems with caps for features that still need additional work on a4xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-25freedreno: rename a couple debug flagsRob Clark1-2/+2
dscis -> noscis dbypass -> nobypass a bit more consistant w/ nobin, etc. And IMO a bit more sensible names. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-21freedreno: clear vs scissorRob Clark1-2/+46
The optimization of avoiding restore (mem2gmem) if there was a clear falls down a bit if you don't have a fullscreen scissor. We need to make the decision logic a bit more clever to keep track of *what* was cleared, so that we can (a) completely skip mem2gmem if entire buffer was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that were completely cleared. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-12freedreno: "fix" problems with excessive flushesRob Clark1-24/+1
4f338c9b introduced logic to trigger a flush rather than overflowing cmdstream buffer. But the threshold was too low, triggering flushes where they were not needed. This caused problems with games like xonotic. Part of the problem is that we need to mark all state dirty between cmdstream submit ioctls, because we cannot rely on state being preserved across ioctls. But even with that, there are still some problems that are still being debugged. For now: 1) correctly mark all state dirty 2) introduce FD_MESA_DEBUG flush flag to force rendering to be flushed between each draw, to trigger problems (so that I can debug) 3) use a more reasonable threshold so for normal usecases we don't trigger the problems This at least corrects the regression, but there is still more debugging to do. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-06-13freedreno: try for more squarish tile dimensionsRob Clark1-3/+9
Worth about ~0.5fps in xonotic, for example. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-05-13freedreno: add support for hw queriesRob Clark1-1/+18
Real GPU queries need some infrastructure to track samples per tile and accumulate the results. But fortunately this can be shared across GPU generation. See: https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-05WIP: freedreno/a3xx: incorrect scissor for binning passRob Clark1-0/+2
If scissor optimization is used (to avoid bringing scissored portions of the render target into GMEM and then back out to system memory) in combination with hw binning pass, the result would be a scissor mismatch between binning pass and rendering pass. This would cause rendering bugs in some scenarios with (for example) gnome-shell. I would have expected that simply using the correct screen-scissor during the binning pass would be enough, but seems like there is something else missing. So for now disable binning pass if scissor optimization is used.
2014-02-01freedreno: better manage our WFI'sRob Clark1-1/+5
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there is potentially pending rendering. Track this better, and add fd_wfi() calls everywhere that might potentially need CP_WAIT_FOR_IDLE. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno: add basic query supportRob Clark1-0/+7
Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example: GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls The driver specific specific queries are: + draw-calls + batches - number of batches per second, sum of batches-sysmem plus batches-gmem + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second + batches-sysmem - N draws to system memory (GMEM bypass) per second + restores - number of GMEM batches that required restore per second Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno/a3xx: support for hw binning passRob Clark1-24/+76
The binning pass sorts vertices into which bins/tiles they apply to. The visibility information generated during the binning pass can be used to speed up the rendering pass by filtering out vertices which do not apply to the current tile. See: https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach This brings a significant fps boost. A rough assortment of tests (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems to yield a ~35-45% fps improvement. For now, to be conservative, the binning pass is not enabled yet by default. To enable it use: FD_MESA_DEBUG=binning So far I haven't found anything that breaks with binning enabled, but I'd like a bit more testing before I enable it as default. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno: be more clever about gmem usageRob Clark1-9/+17
Only need to leave room for depth/stencil if it is actually used, etc. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-12-26freedreno/a3xx: fix blend state corruption issueRob Clark1-0/+2
Using RMW on banked context registers is not safe. The value read could be the wrong one. So if there has been a DRAW_IDX launched, the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part of RMW sees the correct value. To avoid unnecessary WFI's, keep track if there is a need for WFI, and only emit one if needed. Furthermore, keep track if we even need to update the register in the first place. And to cut down on the amount of RMW to avoid excessive WFI's, at the tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the state at beginning of draw/clear cmds (which we IB to) is always undefined. In the draw/clear commands, we always still use RMW (with WFI if needed), but only if the register value actually changes. (At points where the current value cannot be known, the saved value is reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there never is chance for confusion.) Signed-off-by: Rob Clark <robclark@freedesktop.org>