path: root/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
Age | Commit message | Author | Files | Lines
2012-11-25 | i965/gen4: Fix LOD bias texturing since my fixed reg classes change. | Eric Anholt | 1 | -10/+18
We have a special case where non-shadow comparison with LOD requires using a SIMD16 vec4 in an 8-wide shader, which appears in the register allocator as a size 8 vgrf. Fixes assertions in various piglit tests and webgl conformance. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56521
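The size-8 figure follows from simple arithmetic: a vec4 has four components, and a SIMD16 value needs two 8-channel registers per component. A minimal sketch of that calculation, assuming one hardware register covers 8 channels; `regs_for_payload` is a hypothetical name, not a function in the driver:

```cpp
// Hypothetical helper, not part of Mesa: number of hardware registers
// needed for `components` values at a given SIMD width, assuming one
// register holds 8 channels.
constexpr int regs_for_payload(int components, int simd_width)
{
    return components * (simd_width / 8);
}

// A SIMD16 vec4 occupies 8 registers even inside an 8-wide shader.
static_assert(regs_for_payload(4, 16) == 8, "SIMD16 vec4 is 8 registers");
static_assert(regs_for_payload(4, 8) == 4, "SIMD8 vec4 is 4 registers");
```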
2012-10-19 | i965/fs: Fix typo in refactor of brw_fs_reg_allocate.cpp. | Eric Anholt | 1 | -1/+1
I'm amazed that my usual warnings check didn't catch this, and that this passed piglit.
2012-10-17 | i965/fs: Statically allocate the reg_sets at context initialization. | Eric Anholt | 1 | -27/+35
Now that we've replaced all the variable settings other than reg_width, it's easy to hang on to this (the expensive part of setting up the allocator). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-10-17 | i965/fs: Allocate registers in the unused parts of the gen7 MRF hack range. | Eric Anholt | 1 | -1/+61
This should also reduce register pressure on gen7+, like the previous commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-10-17 | i965/fs: Reduce the interference between payload regs and virtual GRFs. | Eric Anholt | 1 | -19/+150
Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/- 1.0% (n=15), by eliminating register spilling. (tested by smashing the list of scenes to have all other scenes have 0 duration -- includes additional rendering of scene description text that normally doesn't appear in that scene) v2: Allow allocation of all but g0/g1 of the payload. v3: Pull count_to_loop_end() out to a helper function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, recommended v3)
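The v3 note mentions a count_to_loop_end() helper without showing its body. One plausible shape, sketched under the assumption that it scans forward from a loop's DO to the matching WHILE while tracking nesting; the opcode enum and the function name below are illustrative, not the driver's:

```cpp
#include <cstddef>
#include <vector>

// Illustrative opcodes only; the real driver has its own opcode enum.
enum class OpSketch { Other, Do, While };

// Returns the index of the WHILE that closes the loop opened by the DO
// at `do_ip`, or -1 if the loop is never closed. Nested loops are
// skipped by counting DO/WHILE depth.
int loop_end_ip(const std::vector<OpSketch> &insts, std::size_t do_ip)
{
    int depth = 0;
    for (std::size_t ip = do_ip; ip < insts.size(); ++ip) {
        if (insts[ip] == OpSketch::Do) {
            ++depth;
        } else if (insts[ip] == OpSketch::While && --depth == 0) {
            return static_cast<int>(ip);
        }
    }
    return -1;
}
```

A helper like this lets a payload register's live range be extended to the end of any loop that reads it, which is one way to keep the register from being reused too early.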
2012-10-17 | i965/fs: Expose the payload registers to the register allocator. | Eric Anholt | 1 | -7/+39
For now, nothing else can get allocated over them, but that will change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-10-17 | i965/fs: Remove extra allocation for classes[]. | Eric Anholt | 1 | -1/+1
This was to slot in the magic aligned pairs class, but it got moved to a descriptive name later. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-10-17 | i965/fs: Make the register allocation class_sizes[] choice static. | Eric Anholt | 1 | -60/+41
Based on split_virtual_grfs(), we choose the same set every time, so set it in stone. This will help us avoid regenerating the somewhat expensive class/register set setup every compile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-10-17 | i965: Share the predicate field between FS and VS. | Eric Anholt | 1 | -1/+1
Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a lot like the true/false we had in the FS before. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
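Because the two values line up with false and true, a boolean flag maps onto the shared field directly. A minimal sketch using the values stated in the message; the enum type and the conversion function are hypothetical names added for illustration:

```cpp
// Values taken from the commit message; the enum type itself is a
// stand-in for however the driver actually defines these constants.
enum PredicateSketch {
    BRW_PREDICATE_NONE = 0,
    BRW_PREDICATE_NORMAL = 1,
};

// Hypothetical conversion from the old FS-style boolean.
inline PredicateSketch predicate_from_bool(bool predicated)
{
    return predicated ? BRW_PREDICATE_NORMAL : BRW_PREDICATE_NONE;
}
```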
2012-09-25 | i965: Don't spill "smeared" registers. | Paul Berry | 1 | -0/+15
Fixes an assertion failure when compiling certain shaders that need both pull constants and register spilling: brw_eu_emit.c:204: validate_reg: Assertion `execsize >= width' failed. NOTE: This is a candidate for release branches. Signed-off-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-09-19 | ra: Add q_values parameter to ra_set_finalize() | Tom Stellard | 1 | -1/+1
This allows the user to pass precomputed q values to the allocator. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-08-12 | i965: Add INTEL_DEBUG=perf for failure to compile 16-wide shaders. | Eric Anholt | 1 | -1/+2
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-07-18 | i965/fs: Make register spill/unspill only do the regs for that instruction. | Eric Anholt | 1 | -33/+33
Previously, if we were spilling the result of a texture call, we would store all 4 regs, then for each use of one of those regs as the source of an instruction, we would unspill all 4 regs even though only one was needed. In both lightsmark and l4d2 with my current graphics config, the shaders that produce spilling do so on split GRFs, so this doesn't help them out. However, in a capture of the l4d2 shaders with a different snapshot and playing the game instead of using a demo, it reduced one shader from 2817 instructions to 2179, due to choosing a now-cheaper texture result to spill instead of piles of texcoords. v2: Fix comment noted by Ken, and fix the if condition associated with it for the current state of what constitutes a partial write of the destination. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
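Touching only the register an instruction needs comes down to addressing one register inside the spilled value's scratch slot. A rough sketch of that addressing, assuming each virtual GRF gets a contiguous scratch slot; the struct, function, and parameter names are hypothetical:

```cpp
#include <cassert>

// Hypothetical description of where a spilled virtual GRF lives in
// scratch space: a base byte offset and its size in registers.
struct SpillSlotSketch {
    unsigned base_offset;
    unsigned regs;
};

// Byte offset of a single register within the slot, so a use of one
// component of a 4-register texture result loads just that register
// instead of all four.
unsigned scratch_offset(const SpillSlotSketch &slot, unsigned reg_offset,
                        unsigned bytes_per_reg)
{
    assert(reg_offset < slot.regs);
    return slot.base_offset + reg_offset * bytes_per_reg;
}
```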
2012-07-18 | i965/fs: Rename virtual_grf_next to virtual_grf_count. | Eric Anholt | 1 | -12/+12
"count" is a more useful name, since most of the time we're using it for looping over the variables. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-02-14 | i965/fs: Enable register spilling on gen7 too. | Eric Anholt | 1 | -2/+0
It turns out the same messages work on gen7, we were just being paranoid. Fixes the penumbra shadows mode of Lightsmark since the register allocation fix. NOTE: This is a candidate for release branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-02-10 | i965/fs: Add missing register allocation for 3rd sources. | Eric Anholt | 1 | -0/+2
Our only instruction with a 3rd source so far was linterp, and that value was never register-allocated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-01-30 | i965/fs: Fix rendering corruption in unigine tropics. | Eric Anholt | 1 | -3/+3
We were allocating registers into the MRF hack region, resulting in sparkly rendering in a few of the scenes. We could do better allocation by making an MRF class, having MRFs conflict with the corresponding GRFs, and tracking the live intervals of the "MRF"s and setting up the conflicts. But this is way easier for the moment. NOTE: This is a candidate for the 8.0 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-01-18 | mesa: Make the register allocator allocation take a ralloc context. | Eric Anholt | 1 | -1/+1
This fixes a memory leak on i965 context destruction. NOTE: This is a candidate for the 8.0 branch.
2011-10-27 | i965/gen6+: Parameterize barycentric interpolation modes. | Paul Berry | 1 | -1/+10
This patch modifies the fragment shader back-end so that instead of using a single delta_x/delta_y register pair to store barycentric coordinates, it uses an array of such register pairs, one for each possible interpolation mode. When setting up the WM, we instruct it to only provide the barycentric coordinates that are actually needed by the fragment shader--that is computed by brw_compute_barycentric_interp_modes(). Currently this function returns just BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only interpolation mode we support. However, that will change in a later patch. Reviewed-by: Eric Anholt <eric@anholt.net>
2011-08-30 | i965: Fix Android build by removing relative includes | Chad Versace | 1 | -3/+3
Replace each occurrence of #include "../glsl/*.h" with #include "glsl/*.h" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Chad Versace <chad@chad-versace.us>
2011-08-16 | i965: Create a shared enum for hardware and compiler-internal opcodes. | Eric Anholt | 1 | -17/+3
This should make gdbing more pleasant, and it might be used in sharing part of the codegen between the VS and FS backends.
2011-08-10 | i965: Drop the reg/hw_reg distinction. | Eric Anholt | 1 | -2/+2
"reg" was set in only one case, virtual GRFs pre register allocation, and would be unset and have hw_reg set after allocation. Since we never bothered with looking at virtual GRF number after allocation anyway, just use the same storage and avoid confusion.
2011-08-10 | i965/fs: Factor out the register allocator setup to a separate function. | Eric Anholt | 1 | -66/+82
Besides separating out a logical step of the giant register allocator function, this now communicates a bunch of the allocator information through entries in brw_context, which will make this code partially reusable for caching the expensive allocator setup.
2011-08-10 | i965/fs: Simplify the register allocator using a map from RA reg to GRF. | Eric Anholt | 1 | -41/+38
It's fewer pointers to track, and when we start caching the register set, should be algorithmically better in the cache hit case (lookup in a byte-per-register array, instead of a linear walk through the description of register classes to find how to translate that class).
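The byte-per-register idea can be sketched as a flat table built once at setup time, so translating an allocator register to a hardware GRF is a single array index. This is only an illustration, assuming each class of size N gets one allocator register per possible starting GRF; `RegMapSketch` and `build_reg_map` are hypothetical names:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flat map: index is the allocator's register number,
// value is the first hardware GRF that register occupies.
struct RegMapSketch {
    std::vector<std::uint8_t> ra_reg_to_grf;
};

// Builds the map for the given class sizes. Assumes base_reg_count
// fits in a byte (at most 256 hardware registers).
RegMapSketch build_reg_map(const std::vector<int> &class_sizes,
                           int base_reg_count)
{
    RegMapSketch map;
    for (int size : class_sizes) {
        // One allocator register per starting GRF where the whole
        // contiguous block still fits in the register file.
        for (int start = 0; start + size <= base_reg_count; ++start)
            map.ra_reg_to_grf.push_back(static_cast<std::uint8_t>(start));
    }
    return map;
}
```

After allocation, `map.ra_reg_to_grf[ra_reg]` gives the hardware register without consulting the class descriptions again.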
2011-08-10 | i965/fs: Eliminate the magic nature of virtual GRF 0. | Eric Anholt | 1 | -21/+12
This was a debugging aid at one point -- virtual grf 0 should never be allocated, and it would be used if undefined register access occurred in codegen. However, it made the confusing register allocation code even more confusing by indexing things off of 1 all over.
2011-08-10 | i965/fs: Use the new convenience interface for setting up reg conflicts. | Eric Anholt | 1 | -22/+7
That code I wrote was impenetrable, and hard to write the first time. This makes things a lot more obvious.
2011-07-29 | i965/fs: Stop using the exec_list iterator. | Eric Anholt | 1 | -8/+8
The old style has gone out of favor in the project, but I kept copy and pasting from existing iterator code.
2011-06-24 | i965/gen5: Fix grf_used calculation for 16-wide. | Eric Anholt | 1 | -5/+4
If we happened to allocate a texture result (or other vector) to the highest hardware register slot, and we were in 16-wide, we would under-count the registers used and potentially wrap around to g0 if that allocation crossed a 16-register block boundary. Bad rendering and hangs ensued. Tested-by: Ian Romanick <idr@freedesktop.org>
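The kind of count described here can be illustrated as taking the highest register touched by any allocation, with each virtual GRF's size scaled by the dispatch width. A sketch under that assumption, not the actual fix; `AllocSketch`, `count_grf_used`, and the `reg_width` convention (1 for 8-wide, 2 for 16-wide) are hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record of one allocation: first hardware register and
// size in dispatch-width units.
struct AllocSketch {
    int hw_reg;
    int size;
};

// Returns one past the highest hardware register in use. Forgetting
// the reg_width factor would under-count in 16-wide mode.
int count_grf_used(const std::vector<AllocSketch> &allocs, int reg_width)
{
    int used = 0;
    for (const AllocSketch &a : allocs)
        used = std::max(used, a.hw_reg + a.size * reg_width);
    return used;
}
```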
2011-05-17 | i965: Disable register spilling on Ivybridge for now. | Kenneth Graunke | 1 | -0/+2
The data port messages for this are rather different. For now, fail to compile rather than hanging the GPU. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2011-04-26 | i965/fs: Add support for 16-wide dispatch to the register allocator. | Eric Anholt | 1 | -19/+37
Note that the virtual grfs are in increments of the dispatch_width, not hardware registers -- this makes the 16-wide emit and 8-wide emit mostly the same. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2011-04-17 | i965/fs: Add gen6 register spilling support. | Eric Anholt | 1 | -2/+0
Most of this is code movement to get the scratch space allocated in a shared location. Other than that, the only real changes are that the old oword block messages now operate on oword-aligned areas (with new messages for unaligned access, which we don't do), and that the caching control is in the SFID part of the descriptor instead of message control. Fixes glsl-fs-convolution-1.
2011-03-24 | i965/fs: Make compile failure more verbose with INTEL_DEBUG=wm. | Eric Anholt | 1 | -4/+6
2011-01-31 | Convert everything from the talloc API to the ralloc API. | Kenneth Graunke | 1 | -4/+4
2011-01-21 | glsl, i965: Remove unnecessary talloc includes. | Kenneth Graunke | 1 | -1/+0
These are already picked up by ir.h or glsl_types.h.
2011-01-12 | i965: Clarify when we need to (re-)calculate live intervals. | Eric Anholt | 1 | -0/+4
The ad-hoc placement of recalculation somewhere between when they got invalidated and when they were next needed was confusing. This should clarify what's going on here.
2010-10-26 | i965: Disable register spilling on gen6 until it's fixed. | Eric Anholt | 1 | -1/+1
Avoids GPU hang on glsl-fs-convolution-1.
2010-10-22 | i965: Add support for pull constants to the new FS backend. | Eric Anholt | 1 | -0/+1
Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
2010-10-21 | i965: Add support for register spilling. | Eric Anholt | 1 | -5/+158
It can be tested with if (0) replaced with if (1) to force spilling for all virtual GRFs. Some simple tests work, but large texturing tests fail.
2010-10-21 | i965: Split register allocation out of the ever-growing brw_fs.cpp. | Eric Anholt | 1 | -0/+265