Age | Commit message (Collapse) | Author | Files | Lines |
|
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
If scissor optimization is used (to avoid bringing scissored portions of
the render target into GMEM and then back out to system memory) in
combination with hw binning pass, the result would be a scissor mismatch
between binning pass and rendering pass. This would cause rendering
bugs in some scenarios with (for example) gnome-shell.
I would have expected that simply using the correct screen-scissor
during the binning pass would be enough, but seems like there is
something else missing. So for now disable binning pass if scissor
optimization is used.
|
|
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there
is potentially pending rendering. Track this better, and add fd_wfi()
calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Only need to leave room for depth/stencil if it is actually used, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Useful for debugging.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Don't crash when no color buffer bound. Something caught when starting
to run piglit, fixes a hanful of piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Useful for testing and debugging.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
The GPU (at least a3xx, but I think also a2xx) can render directly to
memory, bypassing tiling. Although it can't do this if blend, depth,
and a few other features of the pipeline are enabled. This direct
memory mode can be faster for some sorts of operations, such as simple
blits. In particular, this significantly speeds up XA by avoiding to
pull the entire dest pixmap into GMEM, render tiles, and write it all
back out again. This should also speed up resource copy-region and
blit.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Set a few extra registers to make sure we are in proper state for
clearing. And also add some debug options to mark all state dirty in
clear and gmem operations to aid in debugging.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Get rid of a few self-defined macros:
ALIGN() -> align()
min() -> MIN2()
max() -> MAX2()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Really this should be set based on buffer format, not on color vs
depth/stencil. Probably there should be more formats that set the bit
as we add support for more render target formats.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
Switch to use the envytools generated headers for register/bitfield
definitions. This is the first step in preparing to add a3xx support,
since it avoids having conflicting names for a3xx and a2xx registers.
And since I'm using envytools for a3xx it is simpler to just use it for
everything.
This shouldn't cause any functional change, it is really just a lot of
renaming.
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Some fixes for clearing only depth or only stencil.
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Currently works on a220. Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.
The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around. I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.
v1: original
v2: build file updates from review comments, and remove GPL licensed
header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
format issues, resource_transfer fixes, scissor fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>
|