summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKenneth Graunke <kenneth@whitecape.org>2016-06-06 21:37:34 -0700
committerEmil Velikov <emil.l.velikov@gmail.com>2016-06-15 09:29:11 +0100
commit2b6817c91c7eb5979cf21b3cdc38e3631fde63de (patch)
tree5f65dc9e2082a0bad0a6900c64be89810e227ab3
parent9a118c79e785f9e090afa7292446417aee5656ec (diff)
i965: Use the correct number of threads for compute shaders.
We were programming the number of threads per subslice, when we should have been programming the total number of threads on the GPU as a whole. Thanks to Curro and Jordan for helping track this down! On Skylake GT3e: - Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x. - Improves performance in Synmark's Gl43CSDof by roughly 3.7x. - Improves performance in Synmark's Gl43GSCloth by roughly 1.18x. On Broadwell GT2: - Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x. - Improves performance in Synmark's Gl43CSDof by roughly 2.0x. - Improves performance in Synmark's Gl43GSCloth by 1.47035% +/- 0.255654% (n=25). On Haswell GT3e: - Improves performance in Unreal's Elemental Demo (in GL 4.3 mode) by roughly 1.10x. - Improves performance in Synmark's Gl43CSDof by roughly 1.18x. - Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/- 0.432771% (n=64). On Ivybridge GT2: - Improves performance in Unreal's Elemental Demo (in GL 4.2 mode) by roughly 1.03x. - Improves performance in Synmark's G/43CSDof by roughly 1.25x. - No change in Synmark's Gl43CSCloth (n=28). Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (cherry picked from commit 0fb85ac08d61d365e67c8f79d6955e9f89543560)
-rw-r--r--src/mesa/drivers/dri/i965/gen7_cs_state.c4
1 files changed, 3 insertions, 1 deletions
diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
index 9d83837812a..ba558a64ac9 100644
--- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
@@ -95,7 +95,9 @@ brw_upload_cs_state(struct brw_context *brw)
const uint32_t vfe_num_urb_entries = brw->gen >= 8 ? 2 : 0;
const uint32_t vfe_gpgpu_mode =
brw->gen == 7 ? SET_FIELD(1, GEN7_MEDIA_VFE_STATE_GPGPU_MODE) : 0;
- OUT_BATCH(SET_FIELD(brw->max_cs_threads - 1, MEDIA_VFE_STATE_MAX_THREADS) |
+ const uint32_t subslices = MAX2(brw->intelScreen->subslice_total, 1);
+ OUT_BATCH(SET_FIELD(brw->max_cs_threads * subslices - 1,
+ MEDIA_VFE_STATE_MAX_THREADS) |
SET_FIELD(vfe_num_urb_entries, MEDIA_VFE_STATE_URB_ENTRIES) |
SET_FIELD(1, MEDIA_VFE_STATE_RESET_GTW_TIMER) |
SET_FIELD(1, MEDIA_VFE_STATE_BYPASS_GTW) |