author Alyssa Rosenzweig <alyssa@collabora.com> 2022-02-23 13:50:54 -0500
committer Alyssa Rosenzweig <alyssa@collabora.com> 2022-02-24 14:09:14 -0500
commit eb1479bda22bf80b553a87ab781956dc068d5b19
tree ae0857a5a472efa5f9fe81188dd6087da81bf49b
parent c8437cd415c79ad597b44c9d3c24540c772c5a59
pan/bi: Support message preloading

Preload LD_VAR_IMM or VAR_TEX instructions in the first block of fragment
shaders on v7. Preloaded messages write to fixed registers; when replacing
instructions we insert moves from the registers at the start of the program
and hope coalescing goes to town. (Admittedly we don't do any coalescing
yet...)

The extra moves hurt instruction count in some cases; the win for cycle count
should cancel this out. When we get smarter copy prop or RA, those moves
should go away anyway.

This optimization may hurt register pressure by extending the lifetime of up
to eight registers written in the first block. This is expected to be
acceptable: on a large shader-db, there are no additional spills/fills, and
only two shaders are hurt on thread count.

This optimization only applies to v7, as the hardware was not introduced on
v6 and was removed for Valhall.

total instructions in shared programs: 2451624 -> 2454286 (0.11%)
instructions in affected programs: 909046 -> 911708 (0.29%)
helped: 4719
HURT: 3341
helped stats (abs) min: 1.0 max: 10.0 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.08% max: 33.33% x̄: 6.79% x̃: 3.92%
HURT stats (abs) min: 1.0 max: 50.0 x̄: 2.90 x̃: 2
HURT stats (rel) min: 0.12% max: 66.67% x̄: 6.39% x̃: 3.45%
95% mean confidence interval for instructions value: 0.27 0.39
95% mean confidence interval for instructions %-change: -1.55% -1.11%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total tuples in shared programs: 1969529 -> 1963429 (-0.31%)
tuples in affected programs: 601327 -> 595227 (-1.01%)
helped: 5907
HURT: 1297
helped stats (abs) min: 1.0 max: 8.0 x̄: 1.41 x̃: 1
helped stats (rel) min: 0.07% max: 33.33% x̄: 7.25% x̃: 5.26%
HURT stats (abs) min: 1.0 max: 40.0 x̄: 1.73 x̃: 1
HURT stats (rel) min: 0.16% max: 31.75% x̄: 3.38% x̃: 2.02%
95% mean confidence interval for tuples value: -0.88 -0.81
95% mean confidence interval for tuples %-change: -5.52% -5.15%
Tuples are helped.

total clauses in shared programs: 401689 -> 387830 (-3.45%)
clauses in affected programs: 136944 -> 123085 (-10.12%)
helped: 8427
HURT: 4
helped stats (abs) min: 1.0 max: 4.0 x̄: 1.65 x̃: 2
helped stats (rel) min: 0.49% max: 50.00% x̄: 19.88% x̃: 18.18%
HURT stats (abs) min: 1.0 max: 4.0 x̄: 2.50 x̃: 2
HURT stats (rel) min: 1.96% max: 19.05% x̄: 14.18% x̃: 17.86%
95% mean confidence interval for clauses value: -1.66 -1.63
95% mean confidence interval for clauses %-change: -20.15% -19.58%
Clauses are helped.

total cycles in shared programs: 202735.83 -> 201862.21 (-0.43%)
cycles in affected programs: 16295.46 -> 15421.83 (-5.36%)
helped: 3349
HURT: 1962
helped stats (abs) min: 0.041665999999999315 max: 1.0 x̄: 0.32 x̃: 0
helped stats (rel) min: 0.24% max: 100.00% x̄: 40.77% x̃: 33.33%
HURT stats (abs) min: 0.041665999999999315 max: 1.5833329999999997 x̄: 0.10 x̃: 0
HURT stats (rel) min: 0.09% max: 31.40% x̄: 2.95% x̃: 1.94%
95% mean confidence interval for cycles value: -0.17 -0.16
95% mean confidence interval for cycles %-change: -25.48% -23.76%
Cycles are helped.

total arith in shared programs: 74665.50 -> 74920.00 (0.34%)
arith in affected programs: 16059.92 -> 16314.42 (1.58%)
helped: 860
HURT: 3409
helped stats (abs) min: 0.041665999999999315 max: 0.25 x̄: 0.06 x̃: 0
helped stats (rel) min: 0.24% max: 37.50% x̄: 4.73% x̃: 2.56%
HURT stats (abs) min: 0.041665999999999315 max: 1.5833329999999997 x̄: 0.09 x̃: 0
HURT stats (rel) min: 0.09% max: 100.00% x̄: 8.99% x̃: 4.21%
95% mean confidence interval for arith value: 0.06 0.06
95% mean confidence interval for arith %-change: 5.83% 6.62%
Arith are HURT.

total texture in shared programs: 13083.50 -> 11877 (-9.22%)
texture in affected programs: 1663 -> 456.50 (-72.55%)
helped: 2377
HURT: 3
helped stats (abs) min: 0.5 max: 1.0 x̄: 0.51 x̃: 0
helped stats (rel) min: 6.25% max: 100.00% x̄: 87.12% x̃: 100.00%
HURT stats (abs) min: 0.5 max: 0.5 x̄: 0.50 x̃: 0
HURT stats (rel) min: 0.00% max: 25.00% x̄: 16.67% x̃: 25.00%
95% mean confidence interval for texture value: -0.51 -0.50
95% mean confidence interval for texture %-change: -87.98% -86.00%
Texture are helped.

total vary in shared programs: 10220.62 -> 4183.88 (-59.06%)
vary in affected programs: 10126.50 -> 4089.75 (-59.61%)
helped: 8538
HURT: 0
helped stats (abs) min: 0.125 max: 1.0 x̄: 0.71 x̃: 0
helped stats (rel) min: 7.14% max: 100.00% x̄: 74.74% x̃: 87.50%
95% mean confidence interval for vary value: -0.71 -0.70
95% mean confidence interval for vary %-change: -75.32% -74.16%
Vary are helped.

total quadwords in shared programs: 1766717 -> 1757161 (-0.54%)
quadwords in affected programs: 553801 -> 544245 (-1.73%)
helped: 6760
HURT: 711
helped stats (abs) min: 1.0 max: 11.0 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.09% max: 29.41% x̄: 5.31% x̃: 4.84%
HURT stats (abs) min: 1.0 max: 33.0 x̄: 1.54 x̃: 1
HURT stats (rel) min: 0.10% max: 31.13% x̄: 2.53% x̃: 1.61%
95% mean confidence interval for quadwords value: -1.31 -1.25
95% mean confidence interval for quadwords %-change: -4.67% -4.46%
Quadwords are helped.

total threads in shared programs: 52899 -> 52897 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 2

total preloads in shared programs: 0 -> 116492
preloads in affected programs: 0 -> 116492
helped: 0
HURT: 8604
HURT stats (abs) min: 2.0 max: 24.0 x̄: 13.54 x̃: 14
HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for preloads value: 13.45 13.63
95% mean confidence interval for preloads %-change: 0.00% 0.00%
Preloads are HURT.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
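
The fixed-register layout described in the commit message (up to two preloaded messages, up to eight registers in total) can be sketched as a small standalone helper. This is an illustration only, not code from the patch: `preload_register`, `MAX_PRELOAD_MESSAGES` and `REGS_PER_MESSAGE` are hypothetical names, and the sketch assumes the four-registers-per-message stride that the pass uses when it emits its moves.

```c
#include <assert.h>

/* Hypothetical names for illustration; the patch hard-codes 2 and 4. */
#define MAX_PRELOAD_MESSAGES 2
#define REGS_PER_MESSAGE 4

/*
 * Register that holds word `word` of preloaded message `msg`, assuming
 * message n owns registers [4n, 4n + 3]. The pass inserts one move per
 * written word from this register into the SSA destination.
 */
static unsigned
preload_register(unsigned msg, unsigned word)
{
    assert(msg < MAX_PRELOAD_MESSAGES);
    assert(word < REGS_PER_MESSAGE);

    return (msg * REGS_PER_MESSAGE) + word;
}
```

Under these assumptions the first message occupies r0 to r3 and the second r4 to r7, which matches the "up to eight registers" whose lifetime the commit message says may be extended.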
-rw-r--r-- src/panfrost/bifrost/bi_opt_message_preload.c 141
-rw-r--r-- src/panfrost/bifrost/bifrost.h 1
-rw-r--r-- src/panfrost/bifrost/bifrost_compile.c 11
-rw-r--r-- src/panfrost/bifrost/compiler.h 1
-rw-r--r-- src/panfrost/bifrost/meson.build 1
5 files changed, 155 insertions, 0 deletions
diff --git a/src/panfrost/bifrost/bi_opt_message_preload.c b/src/panfrost/bifrost/bi_opt_message_preload.c
new file mode 100644
index 00000000000..e19eeb4b0ea
--- /dev/null
+++ b/src/panfrost/bifrost/bi_opt_message_preload.c
@@ -0,0 +1,141 @@
+/*
+ * Copyright (C) 2021 Collabora, Ltd.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "compiler.h"
+#include "bi_builder.h"
+
+/* Bifrost v7 can preload up to two messages of the form:
+ *
+ * 1. +LD_VAR_IMM, register_format f32/f16, sample mode
+ * 2. +VAR_TEX, register format f32/f16, sample mode (TODO)
+ *
+ * Analyze the shader for these instructions and push accordingly.
+ */
+
+static bool
+bi_is_regfmt_float(enum bi_register_format regfmt)
+{
+ return (regfmt == BI_REGISTER_FORMAT_F32) ||
+ (regfmt == BI_REGISTER_FORMAT_F16);
+}
+
+/*
+ * Preloaded varyings are interpolated at the sample location. Check if an
+ * instruction can use this interpolation mode.
+ */
+static bool
+bi_can_interp_at_sample(bi_instr *I)
+{
+ /* .sample mode with r61 corresponds to per-sample interpolation */
+ if (I->sample == BI_SAMPLE_SAMPLE)
+ return bi_is_value_equiv(I->src[0], bi_register(61));
+
+ /* If the shader runs with pixel-frequency shading, .sample is
+ * equivalent to .center, so allow .center
+ *
+ * If the shader runs with sample-frequency shading, .sample and .center
+ * are not equivalent. However, the ESSL 3.20 specification
+ * stipulates in section 4.5 ("Interpolation Qualifiers"):
+ *
+ * for fragment shader input variables qualified with neither
+ * centroid nor sample, the value of the assigned variable may be
+ * interpolated anywhere within the pixel and a single value may be
+ * assigned to each sample within the pixel, to the extent permitted
+ * by the OpenGL ES Specification.
+ *
+ * We only produce .center for variables qualified with neither centroid
+ * nor sample, so if .center is specified this section applies. This
+ * suggests that, although per-pixel interpolation is allowed, it is not
+ * mandated ("may" rather than "must" or "should"). Therefore it appears
+ * safe to substitute sample.
+ */
+ return (I->sample == BI_SAMPLE_CENTER);
+}
+
+static bool
+bi_can_preload_ld_var(bi_instr *I)
+{
+ return (I->op == BI_OPCODE_LD_VAR_IMM) &&
+ bi_can_interp_at_sample(I) &&
+ bi_is_regfmt_float(I->register_format);
+}
+
+static bool
+bi_is_var_tex(enum bi_opcode op)
+{
+ return (op == BI_OPCODE_VAR_TEX_F32) || (op == BI_OPCODE_VAR_TEX_F16);
+}
+
+void
+bi_opt_message_preload(bi_context *ctx)
+{
+ unsigned nr_preload = 0;
+
+ /* We only preload from the first block */
+ bi_block *block = bi_start_block(&ctx->blocks);
+ bi_builder b = bi_init_builder(ctx, bi_before_nonempty_block(block));
+
+ bi_foreach_instr_in_block_safe(block, I) {
+ if (!bi_is_ssa(I->dest[0])) continue;
+
+ struct bifrost_message_preload msg;
+
+ if (bi_can_preload_ld_var(I)) {
+ msg = (struct bifrost_message_preload) {
+ .enabled = true,
+ .varying_index = I->varying_index,
+ .fp16 = (I->register_format == BI_REGISTER_FORMAT_F16),
+ .num_components = I->vecsize + 1
+ };
+ } else if (bi_is_var_tex(I->op)) {
+ msg = (struct bifrost_message_preload) {
+ .enabled = true,
+ .texture = true,
+ .varying_index = I->varying_index,
+ .sampler_index = I->sampler_index,
+ .fp16 = (I->op == BI_OPCODE_VAR_TEX_F16),
+ .skip = I->skip,
+ .zero_lod = I->lod_mode
+ };
+ } else {
+ continue;
+ }
+
+ /* Report the preloading */
+ ctx->info.bifrost->messages[nr_preload] = msg;
+
+ /* Replace with moves at the start. Ideally, they will be
+ * coalesced out or copy propagated.
+ */
+ for (unsigned i = 0; i < bi_count_write_registers(I, 0); ++i) {
+ bi_mov_i32_to(&b, bi_word(I->dest[0], i),
+ bi_register((nr_preload * 4) + i));
+ }
+
+ bi_remove_instruction(I);
+
+ /* Maximum number of preloaded messages */
+ if ((++nr_preload) == 2)
+ break;
+ }
+}
diff --git a/src/panfrost/bifrost/bifrost.h b/src/panfrost/bifrost/bifrost.h
index c04b8a61ad4..6dce0c53b38 100644
--- a/src/panfrost/bifrost/bifrost.h
+++ b/src/panfrost/bifrost/bifrost.h
@@ -46,6 +46,7 @@ extern "C" {
#define BIFROST_DBG_NOOPT 0x0100
#define BIFROST_DBG_NOIDVS 0x0200
#define BIFROST_DBG_NOSB 0x0400
+#define BIFROST_DBG_NOPRELOAD 0x0800
extern int bifrost_debug;
diff --git a/src/panfrost/bifrost/bifrost_compile.c b/src/panfrost/bifrost/bifrost_compile.c
index ce51a5a40c6..0450e73cb34 100644
--- a/src/panfrost/bifrost/bifrost_compile.c
+++ b/src/panfrost/bifrost/bifrost_compile.c
@@ -48,6 +48,7 @@ static const struct debug_named_value bifrost_debug_options[] = {
{"noopt", BIFROST_DBG_NOOPT, "Skip optimization passes"},
{"noidvs", BIFROST_DBG_NOIDVS, "Disable IDVS"},
{"nosb", BIFROST_DBG_NOSB, "Disable scoreboarding"},
+ {"nopreload", BIFROST_DBG_NOPRELOAD, "Disable message preloading"},
DEBUG_NAMED_VALUE_END
};
@@ -4012,6 +4013,16 @@ bi_compile_variant_nir(nir_shader *nir,
bi_opt_copy_prop(ctx);
bi_opt_mod_prop_forward(ctx);
bi_opt_mod_prop_backward(ctx);
+
+ /* Push LD_VAR_IMM/VAR_TEX instructions. Must run after
+ * mod_prop_backward to fuse VAR_TEX */
+ if (ctx->arch == 7 && ctx->stage == MESA_SHADER_FRAGMENT &&
+ !(bifrost_debug & BIFROST_DBG_NOPRELOAD)) {
+ bi_opt_dead_code_eliminate(ctx);
+ bi_opt_message_preload(ctx);
+ bi_opt_copy_prop(ctx);
+ }
+
bi_opt_dead_code_eliminate(ctx);
bi_opt_cse(ctx);
bi_opt_dead_code_eliminate(ctx);
diff --git a/src/panfrost/bifrost/compiler.h b/src/panfrost/bifrost/compiler.h
index 664e25a3c7e..f71478e0cfd 100644
--- a/src/panfrost/bifrost/compiler.h
+++ b/src/panfrost/bifrost/compiler.h
@@ -986,6 +986,7 @@ void bi_opt_mod_prop_backward(bi_context *ctx);
void bi_opt_dead_code_eliminate(bi_context *ctx);
void bi_opt_fuse_dual_texture(bi_context *ctx);
void bi_opt_dce_post_ra(bi_context *ctx);
+void bi_opt_message_preload(bi_context *ctx);
void bi_opt_push_ubo(bi_context *ctx);
void bi_opt_reorder_push(bi_context *ctx);
void bi_lower_swizzle(bi_context *ctx);
diff --git a/src/panfrost/bifrost/meson.build b/src/panfrost/bifrost/meson.build
index 1dcd9b572da..eda61e8421a 100644
--- a/src/panfrost/bifrost/meson.build
+++ b/src/panfrost/bifrost/meson.build
@@ -33,6 +33,7 @@ libpanfrost_bifrost_files = files(
'bi_opt_dce.c',
'bi_opt_cse.c',
'bi_opt_push_ubo.c',
+ 'bi_opt_message_preload.c',
'bi_opt_mod_props.c',
'bi_opt_dual_tex.c',
'bi_pack.c',
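
For reference, the condition that gates the new pass in bifrost_compile.c can be restated as a standalone predicate. This is a minimal sketch, not part of the patch: `should_run_message_preload` is a hypothetical helper, the `is_fragment` parameter stands in for the `ctx->stage == MESA_SHADER_FRAGMENT` comparison, and the flag value is copied from the bifrost.h hunk above.

```c
#include <stdbool.h>

/* Value taken from the bifrost.h hunk in this patch. */
#define BIFROST_DBG_NOPRELOAD 0x0800

/*
 * Hypothetical helper mirroring the gate in bi_compile_variant_nir:
 * the pass runs only for v7 fragment shaders, and only when the
 * "nopreload" debug flag is not set.
 */
static bool
should_run_message_preload(unsigned arch, bool is_fragment, int debug_flags)
{
    return arch == 7 && is_fragment &&
           !(debug_flags & BIFROST_DBG_NOPRELOAD);
}
```

Keeping the debug flag in the condition means the optimization can be switched off at run time when bisecting a rendering or performance regression, without rebuilding the driver.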