Age | Commit message (Collapse) | Author | Files | Lines |
|
We have a special case where non-shadow comparison with LOD requires using a
SIMD16 vec4 in an 8-wide shader, which appears in the register allocator as a
size 8 vgrf.
Fixes assertions in various piglit tests and webgl conformance.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56521
|
|
I'm amazed that my usual warnings check didn't catch this, and that this
passed piglit.
|
|
Now that we've replaced all the variable settings other than reg_width, it's
easy to hang on to this (the expensive part of setting up the allocator).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This should also reduce register pressure on gen7+, like the previous commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
scenes to have all other scenes have 0 duration -- includes additional
rendering of scene description text that normally doesn't appear in that
scene)
v2: Allow allocation of all but g0/g1 of the payload.
v3: Pull count_to_loop_end() out to a helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, recommended v3)
|
|
For now, nothing else can get allocated over them, but that will change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This was to slot in the magic aligned pairs class, but it got moved to a
descriptive name later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Based on split_virtual_grfs(), we choose the same set every time, so set it in
stone. This will help us avoid regenerating the somewhat expensive
class/register set setup every compile.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Fixes an assertion failure when compiling certain shaders that need both
pull constants and register spilling:
brw_eu_emit.c:204: validate_reg: Assertion `execsize >= width' failed.
NOTE: This is a candidate for release branches.
Signed-off-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This allows the user to pass precomputed q values to the allocator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Previously, if we were spilling the result of a texture call, we would store
all 4 regs, then for each use of one of those regs as the source of an
instruction, we would unspill all 4 regs even though only one was needed.
In both lightsmark and l4d2 with my current graphics config, the shaders that
produce spilling do so on split GRFs, so this doesn't help them out. However,
in a capture of the l4d2 shaders with a different snapshot and playing the
game instead of using a demo, it reduced one shader from 2817 instructions to
2179, due to choosing a now-cheaper texture result to spill instead of piles
of texcoords.
v2: Fix comment noted by Ken, and fix the if condition associated with it for
the current state of what constitutes a partial write of the destination.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
|
|
"count" is a more useful name, since most of the time we're using it for
looping over the variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
It turns out the same messages work on gen7, we were just being paranoid.
Fixes the penumbra shadows mode of Lightsmark since the register
allocation fix.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Our only instruction with a 3rd source so far was linterp, and that
value was never register-allocated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
We were allocating registers into the MRF hack region, resulting in
sparkly renering in a few of the scenes. We could do better
allocation by making an MRF class, having MRFs conflict with the
corresponding GRFs, and tracking the live intervals of the "MRF"s and
setting up the conflicts. But this is way easier for the moment.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This fixes a memory leak on i965 context destruction.
NOTE: This is a candidate for the 8.0 branch.
|
|
This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible intepolation mode.
When setting up the WM, we intstruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Replace each occurence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
|
|
This should make gdbing more pleasant, and it might be used in sharing
part of the codegen between the VS and FS backends.
|
|
"reg" was set in only one case, virtual GRFs pre register allocation,
and would be unset and have hw_reg set after allocation. Since we
never bothered with looking at virtual GRF number after allocation
anyway, just use the same storage and avoid confusion.
|
|
Besides separating out a logical step of the giant register allocator
function, this now communicates a bunch of the allocator information
through entries in brw_context, which will make this code partially
reusable for caching the expensive allocator setup.
|
|
It's fewer pointers to track, and when we start caching the register
set, should be algorithmically better in the cache hit case (lookup in
a byte-per-register array, instead of a linear walk through
desctiption of register classes to find how to translate that class).
|
|
This was a debugging aid at one point -- virtual grf 0 should never be
allocated, and it would be used if undefined register access occurred
in codegen. However, it made the confusing register allocation code
even more confusing by indexing things off of 1 all over.
|
|
That code I wrote was impenetrable, and hard to write the first time.
This makes things a lot more obvious.
|
|
The old style has gone out of favor in the project, but I kept copy
and pasting from existing iterator code.
|
|
If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
|
|
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
|
|
|
|
|
|
These are already picked up by ir.h or glsl_types.h.
|
|
The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
|
|
Avoids GPU hang on glsl-fs-convolution-1.
|
|
Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
|
|
It can be tested with if (0) replaced with if (1) to force spilling for all
virtual GRFs. Some simple tests work, but large texturing tests fail.
|
|
|