diff options
author | Ian Romanick <ian.d.romanick@intel.com> | 2021-08-02 21:33:17 -0700 |
---|---|---|
committer | Dave Airlie <airlied@redhat.com> | 2021-08-12 06:31:23 +1000 |
commit | e71eb0f2eadd44eb8986200b1600469d75a72e65 (patch) | |
tree | 93d78b3c2192c012ad78590b8bd44c2f8fdc6aef | |
parent | c24604acb85bb3153b4f5a66cea17f78db711b79 (diff) |
intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction. This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation. As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.
No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK. That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).
Shader-db results for ILK and GM45 are below. I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation. This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(
I discovered this bug while working on a couple new optimization
passes. One of the passes attempts to remove condition modifiers that
are never used. The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.
v2: Trivial changes in the unit tests.
v3: Fix type in comment in unit tests. Noticed by Jason and Priit.
v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason.
Fixes: df1aec763eb ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>
Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.
total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).
GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.
total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12330>
-rw-r--r-- | src/intel/compiler/brw_fs.cpp | 12 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_cmod_propagation.cpp | 40 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_cse.cpp | 4 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_dead_code_eliminate.cpp | 11 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_live_variables.cpp | 2 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_lower_regioning.cpp | 2 | ||||
-rw-r--r-- | src/intel/compiler/brw_fs_sel_peephole.cpp | 9 | ||||
-rw-r--r-- | src/intel/compiler/brw_ir_fs.h | 2 | ||||
-rw-r--r-- | src/intel/compiler/brw_ir_performance.cpp | 4 | ||||
-rw-r--r-- | src/intel/compiler/brw_schedule_instructions.cpp | 4 | ||||
-rw-r--r-- | src/intel/compiler/test_fs_cmod_propagation.cpp | 131 |
11 files changed, 184 insertions, 37 deletions
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index add199388c1..2735a430ec3 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -1116,9 +1116,13 @@ fs_inst::flags_read(const intel_device_info *devinfo) const } unsigned -fs_inst::flags_written() const +fs_inst::flags_written(const intel_device_info *devinfo) const { - if ((conditional_mod && (opcode != BRW_OPCODE_SEL && + /* On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented + * using a separte cmpn and sel instruction. This lowering occurs in + * fs_vistor::lower_minmax which is called very, very late. + */ + if ((conditional_mod && ((opcode != BRW_OPCODE_SEL || devinfo->ver <= 5) && opcode != BRW_OPCODE_CSEL && opcode != BRW_OPCODE_IF && opcode != BRW_OPCODE_WHILE)) || @@ -7609,7 +7613,7 @@ needs_src_copy(const fs_builder &lbld, const fs_inst *inst, unsigned i) return !(is_periodic(inst->src[i], lbld.dispatch_width()) || (inst->components_read(i) == 1 && lbld.dispatch_width() <= inst->exec_size)) || - (inst->flags_written() & + (inst->flags_written(lbld.shader->devinfo) & flag_mask(inst->src[i], type_sz(inst->src[i].type))); } @@ -8730,7 +8734,7 @@ fs_visitor::fixup_nomask_control_flow() foreach_inst_in_block_reverse_safe(fs_inst, inst, block) { if (!inst->predicate && inst->exec_size >= 8) - flag_liveout &= ~inst->flags_written(); + flag_liveout &= ~inst->flags_written(devinfo); switch (inst->opcode) { case BRW_OPCODE_DO: diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp b/src/intel/compiler/brw_fs_cmod_propagation.cpp index 435186c1de1..770e417e47f 100644 --- a/src/intel/compiler/brw_fs_cmod_propagation.cpp +++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp @@ -55,7 +55,7 @@ cmod_propagate_cmp_to_add(const intel_device_info *devinfo, bblock_t *block, fs_inst *inst) { bool read_flag = false; - const unsigned flags_written = inst->flags_written(); + const unsigned flags_written = inst->flags_written(devinfo); foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { if (scan_inst->opcode == BRW_OPCODE_ADD && @@ -89,8 +89,8 @@ cmod_propagate_cmp_to_add(const intel_device_info *devinfo, bblock_t *block, * Perhaps (scan_inst->flags_written() & flags_written) != * flags_written? */ - if (scan_inst->flags_written() != 0 && - scan_inst->flags_written() != flags_written) + if (scan_inst->flags_written(devinfo) != 0 && + scan_inst->flags_written(devinfo) != flags_written) goto not_match; /* From the Kaby Lake PRM Vol. 7 "Assigning Conditional Flags": @@ -142,7 +142,7 @@ cmod_propagate_cmp_to_add(const intel_device_info *devinfo, bblock_t *block, } not_match: - if ((scan_inst->flags_written() & flags_written) != 0) + if ((scan_inst->flags_written(devinfo) & flags_written) != 0) break; read_flag = read_flag || @@ -171,7 +171,7 @@ cmod_propagate_not(const intel_device_info *devinfo, bblock_t *block, { const enum brw_conditional_mod cond = brw_negate_cmod(inst->conditional_mod); bool read_flag = false; - const unsigned flags_written = inst->flags_written(); + const unsigned flags_written = inst->flags_written(devinfo); if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ) return false; @@ -195,8 +195,8 @@ cmod_propagate_not(const intel_device_info *devinfo, bblock_t *block, * Perhaps (scan_inst->flags_written() & flags_written) != * flags_written? */ - if (scan_inst->flags_written() != 0 && - scan_inst->flags_written() != flags_written) + if (scan_inst->flags_written(devinfo) != 0 && + scan_inst->flags_written(devinfo) != flags_written) break; if (scan_inst->can_do_cmod() && @@ -209,7 +209,7 @@ cmod_propagate_not(const intel_device_info *devinfo, bblock_t *block, break; } - if ((scan_inst->flags_written() & flags_written) != 0) + if ((scan_inst->flags_written(devinfo) & flags_written) != 0) break; read_flag = read_flag || @@ -285,7 +285,7 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block) } bool read_flag = false; - const unsigned flags_written = inst->flags_written(); + const unsigned flags_written = inst->flags_written(devinfo); foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { if (regions_overlap(scan_inst->dst, scan_inst->size_written, inst->src[0], inst->size_read(0))) { @@ -296,8 +296,8 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block) * Perhaps (scan_inst->flags_written() & flags_written) != * flags_written? */ - if (scan_inst->flags_written() != 0 && - scan_inst->flags_written() != flags_written) + if (scan_inst->flags_written(devinfo) != 0 && + scan_inst->flags_written(devinfo) != flags_written) break; if (scan_inst->is_partial_write() || @@ -396,7 +396,7 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block) * between scan_inst and inst. */ if (!inst->src[0].negate && - scan_inst->flags_written()) { + scan_inst->flags_written(devinfo)) { if (scan_inst->opcode == BRW_OPCODE_CMP) { if ((inst->conditional_mod == BRW_CONDITIONAL_NZ) || (inst->conditional_mod == BRW_CONDITIONAL_G && @@ -408,8 +408,18 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block) break; } } else if (scan_inst->conditional_mod == inst->conditional_mod) { - inst->remove(block, true); - progress = true; + /* On Gfx4 and Gfx5 sel.cond will dirty the flags, but the + * flags value is not based on the result stored in the + * destination. On all other platforms sel.cond will not + * write the flags, so execution will not get to this point. + */ + if (scan_inst->opcode == BRW_OPCODE_SEL) { + assert(devinfo->ver <= 5); + } else { + inst->remove(block, true); + progress = true; + } + break; } else if (!read_flag) { scan_inst->conditional_mod = inst->conditional_mod; @@ -505,7 +515,7 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block) break; } - if ((scan_inst->flags_written() & flags_written) != 0) + if ((scan_inst->flags_written(devinfo) & flags_written) != 0) break; read_flag = read_flag || diff --git a/src/intel/compiler/brw_fs_cse.cpp b/src/intel/compiler/brw_fs_cse.cpp index 66c914bf93f..495f1f95fdd 100644 --- a/src/intel/compiler/brw_fs_cse.cpp +++ b/src/intel/compiler/brw_fs_cse.cpp @@ -332,10 +332,10 @@ fs_visitor::opt_cse_local(const fs_live_variables &live, bblock_t *block, int &i /* Kill all AEB entries that write a different value to or read from * the flag register if we just wrote it. */ - if (inst->flags_written()) { + if (inst->flags_written(devinfo)) { bool negate; /* dummy */ if (entry->generator->flags_read(devinfo) || - (entry->generator->flags_written() && + (entry->generator->flags_written(devinfo) && !instructions_match(inst, entry->generator, &negate))) { entry->remove(); ralloc_free(entry); diff --git a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp index 21f6f903df0..ed7ab3ebdc9 100644 --- a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp +++ b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp @@ -40,11 +40,12 @@ using namespace brw; * Is it safe to eliminate the instruction? */ static bool -can_eliminate(const fs_inst *inst, BITSET_WORD *flag_live) +can_eliminate(const intel_device_info *devinfo, const fs_inst *inst, + BITSET_WORD *flag_live) { return !inst->is_control_flow() && !inst->has_side_effects() && - !(flag_live[0] & inst->flags_written()) && + !(flag_live[0] & inst->flags_written(devinfo)) && !inst->writes_accumulator; } @@ -96,14 +97,14 @@ fs_visitor::dead_code_eliminate() result_live |= BITSET_TEST(live, var + i); if (!result_live && - (can_omit_write(inst) || can_eliminate(inst, flag_live))) { + (can_omit_write(inst) || can_eliminate(devinfo, inst, flag_live))) { inst->dst = fs_reg(spread(retype(brw_null_reg(), inst->dst.type), inst->dst.stride)); progress = true; } } - if (inst->dst.is_null() && can_eliminate(inst, flag_live)) { + if (inst->dst.is_null() && can_eliminate(devinfo, inst, flag_live)) { inst->opcode = BRW_OPCODE_NOP; progress = true; } @@ -118,7 +119,7 @@ fs_visitor::dead_code_eliminate() } if (!inst->predicate && inst->exec_size >= 8) - flag_live[0] &= ~inst->flags_written(); + flag_live[0] &= ~inst->flags_written(devinfo); if (inst->opcode == BRW_OPCODE_NOP) { inst->remove(block, true); diff --git a/src/intel/compiler/brw_fs_live_variables.cpp b/src/intel/compiler/brw_fs_live_variables.cpp index 3a35cac3b76..5ad5b95976a 100644 --- a/src/intel/compiler/brw_fs_live_variables.cpp +++ b/src/intel/compiler/brw_fs_live_variables.cpp @@ -138,7 +138,7 @@ fs_live_variables::setup_def_use() } if (!inst->predicate && inst->exec_size >= 8) - bd->flag_def[0] |= inst->flags_written() & ~bd->flag_use[0]; + bd->flag_def[0] |= inst->flags_written(devinfo) & ~bd->flag_use[0]; ip++; } diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp b/src/intel/compiler/brw_fs_lower_regioning.cpp index 5e0f8664d45..c9ce2a814db 100644 --- a/src/intel/compiler/brw_fs_lower_regioning.cpp +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp @@ -355,7 +355,7 @@ namespace { if (!has_inconsistent_cmod(inst)) inst->conditional_mod = BRW_CONDITIONAL_NONE; - assert(!inst->flags_written() || !mov->predicate); + assert(!inst->flags_written(v->devinfo) || !mov->predicate); return true; } diff --git a/src/intel/compiler/brw_fs_sel_peephole.cpp b/src/intel/compiler/brw_fs_sel_peephole.cpp index 6de5211f56d..540c2c8c21f 100644 --- a/src/intel/compiler/brw_fs_sel_peephole.cpp +++ b/src/intel/compiler/brw_fs_sel_peephole.cpp @@ -63,13 +63,14 @@ using namespace brw; * returns 3. */ static int -count_movs_from_if(fs_inst *then_mov[MAX_MOVS], fs_inst *else_mov[MAX_MOVS], +count_movs_from_if(const intel_device_info *devinfo, + fs_inst *then_mov[MAX_MOVS], fs_inst *else_mov[MAX_MOVS], bblock_t *then_block, bblock_t *else_block) { int then_movs = 0; foreach_inst_in_block(fs_inst, inst, then_block) { if (then_movs == MAX_MOVS || inst->opcode != BRW_OPCODE_MOV || - inst->flags_written()) + inst->flags_written(devinfo)) break; then_mov[then_movs] = inst; @@ -79,7 +80,7 @@ count_movs_from_if(fs_inst *then_mov[MAX_MOVS], fs_inst *else_mov[MAX_MOVS], int else_movs = 0; foreach_inst_in_block(fs_inst, inst, else_block) { if (else_movs == MAX_MOVS || inst->opcode != BRW_OPCODE_MOV || - inst->flags_written()) + inst->flags_written(devinfo)) break; else_mov[else_movs] = inst; @@ -152,7 +153,7 @@ fs_visitor::opt_peephole_sel() if (else_block == NULL) continue; - int movs = count_movs_from_if(then_mov, else_mov, then_block, else_block); + int movs = count_movs_from_if(devinfo, then_mov, else_mov, then_block, else_block); if (movs == 0) continue; diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h index 214b0763d5e..a61e66303b6 100644 --- a/src/intel/compiler/brw_ir_fs.h +++ b/src/intel/compiler/brw_ir_fs.h @@ -378,7 +378,7 @@ public: * Return the subset of flag registers updated by the instruction (either * partially or fully) as a bitset with byte granularity. */ - unsigned flags_written() const; + unsigned flags_written(const intel_device_info *devinfo) const; fs_reg dst; fs_reg *src; diff --git a/src/intel/compiler/brw_ir_performance.cpp b/src/intel/compiler/brw_ir_performance.cpp index edec07a909a..e6caebb5a26 100644 --- a/src/intel/compiler/brw_ir_performance.cpp +++ b/src/intel/compiler/brw_ir_performance.cpp @@ -1375,7 +1375,7 @@ namespace { st, reg_dependency_id(devinfo, brw_acc_reg(8), j)); } - if (const unsigned mask = inst->flags_written()) { + if (const unsigned mask = inst->flags_written(devinfo)) { for (unsigned i = 0; i < sizeof(mask) * CHAR_BIT; i++) { if (mask & (1 << i)) stall_on_dependency(st, flag_dependency_id(i)); @@ -1425,7 +1425,7 @@ namespace { reg_dependency_id(devinfo, brw_acc_reg(8), j)); } - if (const unsigned mask = inst->flags_written()) { + if (const unsigned mask = inst->flags_written(devinfo)) { for (unsigned i = 0; i < sizeof(mask) * CHAR_BIT; i++) { if (mask & (1 << i)) mark_write_dependency(st, perf, flag_dependency_id(i)); diff --git a/src/intel/compiler/brw_schedule_instructions.cpp b/src/intel/compiler/brw_schedule_instructions.cpp index e19a9e5416d..7c6d459e00d 100644 --- a/src/intel/compiler/brw_schedule_instructions.cpp +++ b/src/intel/compiler/brw_schedule_instructions.cpp @@ -1262,7 +1262,7 @@ fs_instruction_scheduler::calculate_deps() } } - if (const unsigned mask = inst->flags_written()) { + if (const unsigned mask = inst->flags_written(v->devinfo)) { assert(mask < (1 << ARRAY_SIZE(last_conditional_mod))); for (unsigned i = 0; i < ARRAY_SIZE(last_conditional_mod); i++) { @@ -1384,7 +1384,7 @@ fs_instruction_scheduler::calculate_deps() } } - if (const unsigned mask = inst->flags_written()) { + if (const unsigned mask = inst->flags_written(v->devinfo)) { assert(mask < (1 << ARRAY_SIZE(last_conditional_mod))); for (unsigned i = 0; i < ARRAY_SIZE(last_conditional_mod); i++) { diff --git a/src/intel/compiler/test_fs_cmod_propagation.cpp b/src/intel/compiler/test_fs_cmod_propagation.cpp index ff4f9785fbb..e157a92b3c3 100644 --- a/src/intel/compiler/test_fs_cmod_propagation.cpp +++ b/src/intel/compiler/test_fs_cmod_propagation.cpp @@ -2482,3 +2482,134 @@ TEST_F(cmod_propagation_test, cmp_to_add_float_le) EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode); EXPECT_EQ(BRW_CONDITIONAL_LE, instruction(block0, 0)->conditional_mod); } + +TEST_F(cmod_propagation_test, prop_across_sel_gfx7) +{ + const fs_builder &bld = v->bld; + fs_reg dest1 = v->vgrf(glsl_type::float_type); + fs_reg dest2 = v->vgrf(glsl_type::float_type); + fs_reg src0 = v->vgrf(glsl_type::float_type); + fs_reg src1 = v->vgrf(glsl_type::float_type); + fs_reg src2 = v->vgrf(glsl_type::float_type); + fs_reg src3 = v->vgrf(glsl_type::float_type); + fs_reg zero(brw_imm_f(0.0f)); + bld.ADD(dest1, src0, src1); + bld.emit_minmax(dest2, src2, src3, BRW_CONDITIONAL_GE); + bld.CMP(bld.null_reg_f(), dest1, zero, BRW_CONDITIONAL_GE); + + /* = Before = + * + * 0: add(8) dest1 src0 src1 + * 1: sel.ge(8) dest2 src2 src3 + * 2: cmp.ge.f0(8) null dest1 0.0f + * + * = After = + * 0: add.ge.f0(8) dest1 src0 src1 + * 1: sel.ge(8) dest2 src2 src3 + */ + + v->calculate_cfg(); + bblock_t *block0 = v->cfg->blocks[0]; + + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(2, block0->end_ip); + + EXPECT_TRUE(cmod_propagation(v)); + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(1, block0->end_ip); + EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 0)->conditional_mod); + EXPECT_EQ(BRW_OPCODE_SEL, instruction(block0, 1)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 1)->conditional_mod); +} + +TEST_F(cmod_propagation_test, prop_across_sel_gfx5) +{ + devinfo->ver = 5; + devinfo->verx10 = devinfo->ver * 10; + + const fs_builder &bld = v->bld; + fs_reg dest1 = v->vgrf(glsl_type::float_type); + fs_reg dest2 = v->vgrf(glsl_type::float_type); + fs_reg src0 = v->vgrf(glsl_type::float_type); + fs_reg src1 = v->vgrf(glsl_type::float_type); + fs_reg src2 = v->vgrf(glsl_type::float_type); + fs_reg src3 = v->vgrf(glsl_type::float_type); + fs_reg zero(brw_imm_f(0.0f)); + bld.ADD(dest1, src0, src1); + bld.emit_minmax(dest2, src2, src3, BRW_CONDITIONAL_GE); + bld.CMP(bld.null_reg_f(), dest1, zero, BRW_CONDITIONAL_GE); + + /* = Before = + * + * 0: add(8) dest1 src0 src1 + * 1: sel.ge(8) dest2 src2 src3 + * 2: cmp.ge.f0(8) null dest1 0.0f + * + * = After = + * (no changes) + * + * On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented + * using a separate cmpn and sel instruction. This lowering occurs in + * fs_vistor::lower_minmax which is called a long time after the first + * calls to cmod_propagation. + */ + + v->calculate_cfg(); + bblock_t *block0 = v->cfg->blocks[0]; + + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(2, block0->end_ip); + + EXPECT_FALSE(cmod_propagation(v)); + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(2, block0->end_ip); + EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_NONE, instruction(block0, 0)->conditional_mod); + EXPECT_EQ(BRW_OPCODE_SEL, instruction(block0, 1)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 1)->conditional_mod); + EXPECT_EQ(BRW_OPCODE_CMP, instruction(block0, 2)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 2)->conditional_mod); +} + +TEST_F(cmod_propagation_test, prop_into_sel_gfx5) +{ + devinfo->ver = 5; + devinfo->verx10 = devinfo->ver * 10; + + const fs_builder &bld = v->bld; + fs_reg dest = v->vgrf(glsl_type::float_type); + fs_reg src0 = v->vgrf(glsl_type::float_type); + fs_reg src1 = v->vgrf(glsl_type::float_type); + fs_reg zero(brw_imm_f(0.0f)); + bld.emit_minmax(dest, src0, src1, BRW_CONDITIONAL_GE); + bld.CMP(bld.null_reg_f(), dest, zero, BRW_CONDITIONAL_GE); + + /* = Before = + * + * 0: sel.ge(8) dest src0 src1 + * 1: cmp.ge.f0(8) null dest 0.0f + * + * = After = + * (no changes) + * + * Do not copy propagate into a sel.cond instruction. While it does modify + * the flags, the flags are not based on the result compared with zero (as + * with most other instructions). The result is based on the sources + * compared with each other (like cmp.cond). + */ + + v->calculate_cfg(); + bblock_t *block0 = v->cfg->blocks[0]; + + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(1, block0->end_ip); + + EXPECT_FALSE(cmod_propagation(v)); + EXPECT_EQ(0, block0->start_ip); + EXPECT_EQ(1, block0->end_ip); + EXPECT_EQ(BRW_OPCODE_SEL, instruction(block0, 0)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 0)->conditional_mod); + EXPECT_EQ(BRW_OPCODE_CMP, instruction(block0, 1)->opcode); + EXPECT_EQ(BRW_CONDITIONAL_GE, instruction(block0, 1)->conditional_mod); +} |