summaryrefslogtreecommitdiff
path: root/src/gallium/drivers/freedreno/freedreno_gmem.c
AgeCommit message (Collapse)AuthorFilesLines
2014-05-20freedreno: add support for hw queriesRob Clark1-1/+18
Real GPU queries need some infrastructure to track samples per tile and accumulate the results. But fortunately this can be shared across GPU generation. See: https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-05WIP: freedreno/a3xx: incorrect scissor for binning passRob Clark1-0/+2
If scissor optimization is used (to avoid bringing scissored portions of the render target into GMEM and then back out to system memory) in combination with hw binning pass, the result would be a scissor mismatch between binning pass and rendering pass. This would cause rendering bugs in some scenarios with (for example) gnome-shell. I would have expected that simply using the correct screen-scissor during the binning pass would be enough, but seems like there is something else missing. So for now disable binning pass if scissor optimization is used.
2014-02-01freedreno: better manage our WFI'sRob Clark1-1/+5
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there is potentially pending rendering. Track this better, and add fd_wfi() calls everywhere that might potentially need CP_WAIT_FOR_IDLE. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno: add basic query supportRob Clark1-0/+7
Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example: GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls The driver specific specific queries are: + draw-calls + batches - number of batches per second, sum of batches-sysmem plus batches-gmem + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second + batches-sysmem - N draws to system memory (GMEM bypass) per second + restores - number of GMEM batches that required restore per second Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno/a3xx: support for hw binning passRob Clark1-24/+76
The binning pass sorts vertices into which bins/tiles they apply to. The visibility information generated during the binning pass can be used to speed up the rendering pass by filtering out vertices which do not apply to the current tile. See: https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach This brings a significant fps boost. A rough assortment of tests (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems to yield a ~35-45% fps improvement. For now, to be conservative, the binning pass is not enabled yet by default. To enable it use: FD_MESA_DEBUG=binning So far I haven't found anything that breaks with binning enabled, but I'd like a bit more testing before I enable it as default. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-08freedreno: be more clever about gmem usageRob Clark1-9/+17
Only need to leave room for depth/stencil if it is actually used, etc. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-12-26freedreno/a3xx: fix blend state corruption issueRob Clark1-0/+2
Using RMW on banked context registers is not safe. The value read could be the wrong one. So if there has been a DRAW_IDX launched, the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part of RMW sees the correct value. To avoid unnecessary WFI's, keep track if there is a need for WFI, and only emit one if needed. Furthermore, keep track if we even need to update the register in the first place. And to cut down on the amount of RMW to avoid excessive WFI's, at the tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the state at beginning of draw/clear cmds (which we IB to) is always undefined. In the draw/clear commands, we always still use RMW (with WFI if needed), but only if the register value actually changes. (At points where the current value cannot be known, the saved value is reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there never is chance for confusion.) Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-12-26freedreno: prepare for hw binningRob Clark1-29/+69
Actually assign VSC_PIPE's properly, which will be needed for tiling. And introduce fd_tile for per-tile state (including the assignment of tile to VSC_PIPE). This gives us the proper pipe setup that we'll need for hw binning pass, and also cleans things up a bit by not having to pass so many parameters around. And will also make it easier to introduce different tiling patterns (since we may no longer render tiles in a simple left-to-right top-to-bottom pattern). Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14freedreno: add debug option to disable GMEM bypassRob Clark1-1/+1
Useful for debugging. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24freedreno: fix segfault when no color buffer boundRob Clark1-7/+11
Don't crash when no color buffer bound. Something caught when starting to run piglit, fixes a hanful of piglit tests. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-08-24freedreno: add debug option to disable scissor optimizationRob Clark1-10/+16
Useful for testing and debugging. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-06-08freedreno: gmem bypassRob Clark1-15/+47
The GPU (at least a3xx, but I think also a2xx) can render directly to memory, bypassing tiling. Although it can't do this if blend, depth, and a few other features of the pipeline are enabled. This direct memory mode can be faster for some sorts of operations, such as simple blits. In particular, this significantly speeds up XA by avoiding to pull the entire dest pixmap into GMEM, render tiles, and write it all back out again. This should also speed up resource copy-region and blit. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-06-08freedreno: prepare for a3xxRob Clark1-317/+11
Split the parts that are specific to adreno a2xx series GPUs from the parts that will be in common with a3xx, so that a3xx support can be added more cleanly. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24freedreno: clear fixes and debuggingRob Clark1-0/+3
Set a few extra registers to make sure we are in proper state for clearing. And also add some debug options to mark all state dirty in clear and gmem operations to aid in debugging. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24freedreno: use u_math macros/helpers moreRob Clark1-7/+7
Get rid of a few self-defined macros: ALIGN() -> align() min() -> MIN2() max() -> MAX2() Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24freedreno: set SWAP bit based on formatRob Clark1-7/+19
Really this should be set based on buffer format, not on color vs depth/stencil. Probably there should be more formats that set the bit as we add support for more render target formats. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-05freedreno: use autogenerated register defsRob Clark1-123/+111
Switch to use the envytools generated headers for register/bitfield definitions. This is the first step in preparing to add a3xx support, since it avoids having conflicting names for a3xx and a2xx registers. And since I'm using envytools for a3xx it is simpler to just use it for everything. This shouldn't cause any functional change, it is really just a lot of renaming. Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-03-25freedreno: track maximal scissor boundsRob Clark1-84/+124
Optimize out parts of the render target that are scissored out by taking into account maximal scissor bounds in fd_gmem_render_tiles(). This is a big win on things like gnome-shell which frequently do partial screen updates. Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-03-19freedreno: clear fixesRob Clark1-1/+1
Some fixes for clearing only depth or only stencil. Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-03-11freedreno: gallium driver for adrenoRob Clark1-0/+491
Currently works on a220. Others in the a2xx family look pretty similar and should be pretty straightforward to support with the same driver. The a3xx has a new shader ISA, and while many registers appear similar, the register addresses have been completely shuffled around. I am not sure yet whether it is best to support with the same driver, but different compiler, or whether it should be split into a different driver. v1: original v2: build file updates from review comments, and remove GPL licensed header files from msm kernel v3: smarter temp/pred register assignment, fix clear and depth/stencil format issues, resource_transfer fixes, scissor fixes Signed-off-by: Rob Clark <robdclark@gmail.com>