author | Kenneth Graunke <kenneth@whitecape.org> | 2017-04-21 01:28:13 -0700
---|---|---
committer | Emil Velikov <emil.l.velikov@gmail.com> | 2017-04-30 09:46:15 +0100
commit | 36f6fc59cb4b61b1128a961d8808428257849adc |
tree | 901fd656abb13451f29e63208da99933fc36dd7c |
parent | 2bf79cb2f1fddd004c7e33cbe572242660ee64d0 |
i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().

opt_register_coalesce() was optimizing sequences such as:

    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
    mov(8) m4.zw:F, vgrf5.xxxy:F

into:

    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D

This doesn't work: if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy, but the reswizzled MACH expects
.z to contain attr18.x * attr19.x. Bogus results ensue.
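
To make the failure concrete, here is a toy C++ model of the two sequences. It is a sketch under simplifying assumptions, not i965 semantics: the accumulator is modeled as holding each channel's full 64-bit product, and the hypothetical `toy_mach()` just takes the high 32 bits of whatever acc0 holds in each written channel, ignoring its explicit sources. All names (`Vec4`, `toy_mul_to_acc()`, `run_original()`, `run_coalesced()`, and so on) are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: hypothetical names, simplified accumulator semantics.
using Vec4 = std::array<int32_t, 4>;
using Acc4 = std::array<int64_t, 4>;
using Swizzle = std::array<int, 4>;  // per-channel source component: 0=x .. 3=w

constexpr Swizzle SWZ_XYYY = {0, 1, 1, 1};
constexpr Swizzle SWZ_XXXY = {0, 0, 0, 1};
constexpr unsigned WRITEMASK_XY = 0x3;
constexpr unsigned WRITEMASK_ZW = 0xC;

// "mul acc0, a.swz, b.swz": each accumulator channel holds the full
// 64-bit product of the swizzled source components.
Acc4 toy_mul_to_acc(const Vec4 &a, const Vec4 &b, const Swizzle &swz) {
   Acc4 acc{};
   for (int c = 0; c < 4; c++)
      acc[c] = int64_t(a[swz[c]]) * int64_t(b[swz[c]]);
   return acc;
}

// "mach dst.mask, ...": reads acc0 channel-for-channel with no swizzle
// (explicit sources ignored in this toy) and writes the high 32 bits.
void toy_mach(Vec4 &dst, const Acc4 &acc, unsigned writemask) {
   assert(writemask <= 0xF);
   for (int c = 0; c < 4; c++)
      if (writemask & (1u << c))
         dst[c] = int32_t(acc[c] >> 32);
}

// Original: mul acc0 (.xyyy); mach vgrf5.xy; mov m4.zw, vgrf5.xxxy.
Vec4 run_original(const Vec4 &a, const Vec4 &b) {
   const Acc4 acc = toy_mul_to_acc(a, b, SWZ_XYYY);
   Vec4 vgrf5{};
   toy_mach(vgrf5, acc, WRITEMASK_XY);
   Vec4 m4{};
   for (int c = 0; c < 4; c++)
      if (WRITEMASK_ZW & (1u << c))
         m4[c] = vgrf5[SWZ_XXXY[c]];
   return m4;
}

// Coalesced: MACH writes m4.zw directly. Its explicit sources were
// reswizzled to .xxxy, but the MUL still filled acc0 using .xyyy, and the
// implicit accumulator read is not swizzled at all.
Vec4 run_coalesced(const Vec4 &a, const Vec4 &b) {
   const Acc4 acc = toy_mul_to_acc(a, b, SWZ_XYYY);
   Vec4 m4{};
   toy_mach(m4, acc, WRITEMASK_ZW);
   return m4;
}
```

With both sources set to `{1 << 20, 1 << 17, 0, 0}`, `run_original()` gives m4.zw = (256, 4) while `run_coalesced()` gives (4, 4): .w happens to agree, but .z picks up attr18.y * attr19.y instead of attr18.x * attr19.x.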

No change in shader-db on Haswell. Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 2faf227ec2e22c7a37e0a54783a3f0a0062ac852)

Squashed with commit:

i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.

Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 6b10c37b9c3a73add73f444fe1aee73c9ec82c94)

-rw-r--r-- | src/intel/compiler/brw_vec4.cpp | 7 |
1 file changed, 7 insertions, 0 deletions
```diff
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index 0b92ba704e5..0909ddb5861 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -1071,6 +1071,13 @@ vec4_instruction::can_reswizzle(const struct gen_device_info *devinfo,
    if (devinfo->gen == 6 && is_math() && swizzle != BRW_SWIZZLE_XYZW)
       return false;
 
+   /* We can't swizzle implicit accumulator access. We'd have to
+    * reswizzle the producer of the accumulator value in addition
+    * to the consumer (i.e. both MUL and MACH). Just skip this.
+    */
+   if (reads_accumulator_implicitly())
+      return false;
+
    if (!can_do_writemask(devinfo) && dst_writemask != WRITEMASK_XYZW)
       return false;
 
```
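
The squashed follow-up replaces a MACH-specific test with a per-opcode predicate. As a minimal sketch, assuming a hypothetical `toy_opcode` enum and free functions (Mesa's real helper is a member function and may differ in detail), the check and the guard it feeds could look like this; only the opcode list (MAC, MACH, SADA2) is taken from the commit message.

```cpp
// Hypothetical opcode subset for illustration; not Mesa's actual enum.
enum class toy_opcode { MOV, MUL, MAC, MACH, SADA2 };

// True for opcodes that consume acc0 implicitly, i.e. without acc0 appearing
// as an explicit source. Per the commit message, MAC, MACH and SADA2 all
// fall into this class.
constexpr bool toy_reads_accumulator_implicitly(toy_opcode op) {
   switch (op) {
   case toy_opcode::MAC:
   case toy_opcode::MACH:
   case toy_opcode::SADA2:
      return true;
   default:
      return false;
   }
}

// Sketch of the guard the patch adds to can_reswizzle(): an implicit
// accumulator reader is never reswizzled, since the instruction that wrote
// acc0 would need the same swizzle. The other checks in the real function
// (gen6 math, writemask support) are omitted here.
constexpr bool toy_can_reswizzle(toy_opcode op) {
   return !toy_reads_accumulator_implicitly(op);
}
```

Testing a predicate rather than the single MACH opcode means that if MAC or SADA2 is ever emitted, the reswizzle path already refuses to touch it.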