Age  Commit message  Author  Files  Lines
2024-04-01intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*().Francisco Jerez4-94/+44
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01intel/eu/xehp+: Don't initialize mlen and rlen descriptor fields from lsc_msg_desc*().Francisco Jerez1-11/+0
These fields are overlapping with the ones set by brw_message_desc(), so the latter should be used instead. This fixes corruption of the LSC message descriptors when inconsistent values are specified through both helpers, which can happen if the 'inst->mlen' field is modified during optimization (e.g. by opt_split_sends()). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
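The corruption described above can be illustrated with a small sketch. It assumes the message length sits in bits 28:25 and the response length in bits 20:24 of the descriptor; those bit positions and all `sketch_*` names are illustrative assumptions, not the driver's actual helpers. It only shows why OR-ing two values into the same field is unsafe.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: mlen in bits 28:25, rlen in 24:20. */
#define SKETCH_MLEN_SHIFT 25
#define SKETCH_MLEN_MASK  0xfu
#define SKETCH_RLEN_SHIFT 20
#define SKETCH_RLEN_MASK  0x1fu

/* Hypothetical stand-in for any helper that writes mlen and rlen. */
static uint32_t
sketch_desc_with_lengths(uint32_t mlen, uint32_t rlen)
{
   assert(mlen <= SKETCH_MLEN_MASK && rlen <= SKETCH_RLEN_MASK);
   return (mlen << SKETCH_MLEN_SHIFT) | (rlen << SKETCH_RLEN_SHIFT);
}

/* Extract the mlen field back out of a descriptor. */
static uint32_t
sketch_desc_mlen(uint32_t desc)
{
   return (desc >> SKETCH_MLEN_SHIFT) & SKETCH_MLEN_MASK;
}
```

If one helper encoded a stale mlen of 2 and another the updated mlen of 1, OR-ing the two descriptors yields an mlen of 3, which matches neither. Letting a single helper own the field avoids this.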
2024-04-01intel/brw/xehp+: Replace lsc_msg_desc_dest_len()/lsc_msg_desc_src0_len() with helpers to do the computation.Francisco Jerez5-41/+77
We cannot rely on the immediate message descriptor having accurate values for mlen and rlen at the IR level, since they are updated at codegen time via 'inst->mlen' and 'inst->size_written', which could end up with values inconsistent with the message descriptor if e.g. the split sends optimization had an effect. Instead, define helpers that do the computation without relying on the message descriptor, and use the pre-existing brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully equivalent to the lsc helpers deleted here) during disassembly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-03-01intel/brw: Allow CSE on TXF_CMS_W_GFX12_LOGICALKenneth Graunke1-0/+1
This was missed when adding the new XeHP variant of the opcode. Fixes: 261dd6c8 ("intel/compiler: Add new variant for TXF_CMS_W") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2023-11-01iris: handle tile case where cso width, height is zeroTapani Pälli1-0/+3
This patch adds a fallback to calculate_tile_dimensions for this case, which was hit when running CTS tests on simulation. Fixes: d13c81a2c3bf ("iris/xehp: Implement TBIMR tile pass setup and pipeline bandwidth estimation.") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25989>
2023-10-30anv: Add more space for init_render_queue_state() batch (MTL regression)Jordan Justen1-1/+1
It may be some MTL specific code paths, but 7cdacaf4935 is triggering anvil to run out of space when initializing the render batch. Fixes: 7cdacaf4935 ("intel/xehp: Adjust TBIMR performance chicken bits.") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25949>
2023-10-27intel/xehp: Enable TBIMR by default.Francisco Jerez2-2/+2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp+: Use TBIMR tile box check in order to avoid performance regressions.Francisco Jerez3-0/+4
This allows the hardware to behave as if TBIMR was disabled until a polygon is processed which spans at least one tile. This is a rather heavy-handed heuristic meant to prevent regressions in heavily geometry-bound workloads that render large numbers of tiny primitives much smaller than a TBIMR tile. A particularly bad example of this was observed in SoTR, where certain draw calls with a long-running VS and a mostly trivial PS render more triangles than pixels, filling up the URB and TBIMR batch pretty quickly, which causes EU utilization to tank (since once the URB has filled up the parallelism of the VS is limited by the number of polygons that fit in a TBIMR batch at the completion of each tile walk, which isn't a lot in relation to the total EU count of a DG2), and causes the bottleneck to be the rate at which the tile sequencer performs additional tile passes, each one processing a small number (<1024 polygons) of the hundreds of thousands of triangles of the draw call. Enabling this heuristic seems effective at avoiding that scenario in SoTR among other titles (e.g. Total War Warhammer 3), but it's a bit of a compromise since one could imagine cases where TBIMR is helpful even if the geometry doesn't pass the box check, so a better heuristic or a driconf rule may be useful in the future. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp+: Adjust TBIMR batch size based on slice count.Francisco Jerez4-0/+21
This programs a TBIMR batch size equal to 128 polygons per slice in order to match the hardware spec recommendation (BSpec 68436). This has been confirmed to improve performance slightly relative to the hardware default batch size of 256 polygons. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp: Adjust TBIMR performance chicken bits.Francisco Jerez3-0/+24
This enables a couple of TBIMR performance tunables in CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to help slightly with some geometry-bound workloads. TBIMR open batch allows the rasterizer to start working immediately on the first tile of the framebuffer, even before the batch has been closed, which helps reduce the latency cost of the tile walk. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27anv/xehp+: Enable TBIMR in generated draw calls.Francisco Jerez4-1/+10
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27anv/xehp: Implement TBIMR tile pass setup and pipeline bandwidth estimation.Francisco Jerez3-0/+183
This sets up the basic parameters needed for tiled rendering based on a back-of-the-envelope estimate of the amount of memory used by the pixel pipeline during the tile pass. The actual cache footprint of a tile can vary wildly based on runtime factors which aren't easily predictable based on static analysis, so this is only intended to provide a rough approximation within the right order of magnitude. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27iris/xehp: Implement TBIMR tile pass setup and pipeline bandwidth estimation.Francisco Jerez1-0/+93
This sets up the basic parameters needed for tiled rendering based on a back-of-the-envelope estimate of the amount of memory used by the pixel pipeline during the tile pass. The actual cache footprint of a tile can vary wildly based on runtime factors which aren't easily predictable based on static analysis, so this is only intended to provide a rough approximation within the right order of magnitude. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp+: Define driconf option for selectively disabling TBIMR.Francisco Jerez8-0/+12
This may help debugging performance problems in the possible case that TBIMR negatively impacts the performance of some application. It could also allow applying application-specific band-aid fixes in the XML file until a more general workaround is implemented. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp+: Add dynamic state flags controlling whether TBIMR is enabled during 3D primitives.Francisco Jerez5-1/+33
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-10-27intel/xehp+: Import algorithm for TBIMR tiling parameter calculation.Francisco Jerez2-0/+276
This implements a minimalistic algorithm that can be used to obtain an approximate solution for the integer programming problem of finding the optimal tile dimensions based on an estimate of the tile cache consumption per pixel of the current graphics pipeline -- Including the TC footprint of render targets, depth and stencil buffers and their auxiliary surfaces. Considering other (less local) memory accesses performed by the pipeline (like texturing and shader storage) would be useful (and could be considered by this algorithm with little modification), but it would be pretty difficult to estimate the L3 cache consumption per pixel of such accesses based on static analysis of the pipeline state alone without some sort of dynamic feedback. The present algorithm returns a config with tile area large enough to utilize a target fraction of the L3, which can be adjusted to obtain greater/lower utilization of the L3 at the cost of higher/lower risk of L3 cache thrashing respectively. The aspect ratio of the tile layout returned attempts to minimize the number of poorly utilized tiles around the boundaries of the framebuffer (due to partial coverage), since having the tile sequencer process additional tiles comes at a cost due to the latency of the additional passes, even if they're mostly empty. Finally, among the solutions with satisfactory cache footprint and tile count, the tile aspect ratio closest to 1 is returned where possible, since tiles with very high aspect ratios can have a negative impact on cache locality. The algorithm is primarily intended for TBIMR, but it could be used for PTBR as well with little modifications, since the TBIMR-specific assumptions are few and noted in comments below. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
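The selection criteria described above can be sketched as a brute-force search. This is not the algorithm imported by the commit. It is a simplified illustration that assumes a fixed per-pixel footprint, a byte budget standing in for the target L3 fraction, and a tile size granule. All `sketch_*` names and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_tile { uint32_t w, h; };

static uint32_t
sketch_div_round_up(uint32_t a, uint32_t b)
{
   return (a + b - 1) / b;
}

/* Pick the tile size that fits the budget with the fewest tiles,
 * preferring the aspect ratio closest to 1 among ties.
 * Falls back to a single granule if nothing fits. */
static struct sketch_tile
sketch_pick_tile(uint32_t fb_w, uint32_t fb_h, uint32_t bytes_per_pixel,
                 uint64_t budget_bytes, uint32_t granule)
{
   assert(granule > 0 && bytes_per_pixel > 0);
   struct sketch_tile best = { granule, granule };
   uint64_t best_count = UINT64_MAX;

   /* Candidate sizes run up to the first multiple covering the framebuffer. */
   for (uint32_t w = granule; w < fb_w + granule; w += granule) {
      for (uint32_t h = granule; h < fb_h + granule; h += granule) {
         if ((uint64_t)w * h * bytes_per_pixel > budget_bytes)
            break; /* Taller tiles at this width only use more cache. */

         const uint64_t count = (uint64_t)sketch_div_round_up(fb_w, w) *
                                sketch_div_round_up(fb_h, h);
         const uint32_t lo = w < h ? w : h, hi = w < h ? h : w;
         const uint32_t blo = best.w < best.h ? best.w : best.h;
         const uint32_t bhi = best.w < best.h ? best.h : best.w;
         /* hi/lo < bhi/blo, cross-multiplied to stay in integers. */
         const bool squarer = (uint64_t)hi * blo < (uint64_t)bhi * lo;

         if (count < best_count || (count == best_count && squarer)) {
            best = (struct sketch_tile){ w, h };
            best_count = count;
         }
      }
   }
   return best;
}
```

For a 256x256 framebuffer at 4 bytes per pixel with a 16 KiB budget and a granule of 32, the candidates 32x128, 64x64 and 128x32 all need 16 tiles, and the sketch picks 64x64 as the squarest.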
2023-10-27intel/xehp+: Add TBIMR-related genxml definitions.Francisco Jerez1-0/+41
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
2023-09-05intel/dev/xe: Move placeholder subslice info into XEHP_FEATURESJordan Justen1-2/+1
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24418>
2023-07-14iris: migrate WA 14013910100 to use the WA frameworkRohan Garg1-2/+3
Fixes: eeb3f4594d5 ("intel/xehp: Implement XeHP workaround Wa_14013910100.") Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24156>
2023-05-15iris: Init CCS_E to COMPRESSED_NO_CLEAR for XeHPNanley Chery1-7/+20
Use COMPRESSED_NO_CLEAR for the initial CCS aux state instead of COMPRESSED_CLEAR. This removes a dependency on the initial clear color, meaning that some resolves related to clear color management are now avoided. In the Car Chase benchmark, this avoids all 50 CCS resolves. These only happen during the warm-up phase of the benchmark, so I'm not sure there is an impact on FPS. This was tested on a DG2 in small-BAR mode. Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22857>
2023-03-23intel/devinfo: dedicated entries for XeHPLionel Landwerlin1-2/+31
Also fixing the max URB entries for VS stage. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Chuansheng Liu <chuansheng.liu@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21949>
2022-11-02intel/dev: Set has_lsc in XEHP_FEATURES rather than DG2_FEATURESJordan Justen1-1/+1
MTL will want this set as well. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19447>
2022-10-28iris: Enable INTEL_MEASURE for compute dispatches on XeHPNanley Chery1-0/+2
Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19324>
2022-08-23intel/fs: fixup scratch load/store handling on Gfx12.5+Lionel Landwerlin1-32/+34
We did not handle the operation with data size < 4. It works fine on all other messages (global/shared). The initial commit was just too restrictive. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 1e242785c315 ("intel/fs: Implement load/store_scratch on XeHP") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>
2022-07-28intel/eu: Mark header present in URB memory fences on XeHPKenneth Graunke1-1/+1
Fixes the following EU validation error: ERROR: Header must be present for all URB messages. The message header is ignored for URB fence messages, so I doubt that this actually matters in practice. But we should probably mark it as present, because you have to send something, and according to the documentation, there is a message header, it's just ignored. Fixes: e6a9501aa27 ("intel/fs: Add the URB fence message") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28intel/eu: Clarify spec citations for XeHP region restrictionsKenneth Graunke1-3/+8
When this rule started causing issues, I looked it up in the documentation, and found the rule for 64-bit destinations and integer DWord multiplication, but there was no mention of floating point destinations, as the text in brackets suggested. The actual restriction text had been updated, so this led to some confusion where I thought the conditions had been changed in newer docs. However, what's actually going on is that there are two separate conditions, each listed in separate rows of the table. One lists 64-bit destinations or integer DWord multiplication, and the other mentions floating-point destinations. In both cases, the actual restrictions are identical, so we handle them together in the code. Try to update the comment to avoid future confusion. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
2022-07-28intel/eu: Fix XeHP register region validation for hstride == 0Kenneth Graunke1-7/+22
Recently, we started using <1;1,0> register regions for consecutive channels, rather than the <8;8,1> we've traditionally used, as the <1;1,0> encoding can be compacted on XeHP. Since then, one of the EU validator rules has been flagging tons of instructions as errors:

   mov(16) g114<1>F g112<1,1,0>UD { align1 1H I@2 compacted };
   ERROR: Register Regioning patterns where register data bit locations are changed between source and destination are not supported except for broadcast of a scalar.

Our code for this restriction checked three things:

   #1: vstride != width * hstride ||
   #2: src_stride != dst_stride ||
   #3: subreg != dst_subreg

Destination regions are always linear (no replicated values, nor any overlapping components), as they only have hstride. Rule #1 is requiring that the source region be linear as well. Rules #2-3 are straightforward: the subregister must match (for the first channel to line up), and the source/destination strides must match (for any subsequent channels to line up).

Unfortunately, rules #1-2 weren't working when horizontal stride was 0. In that case, regions are linear if width == 1, and the stride between consecutive channels is given by vertical stride instead. So we adjust our src_stride calculation from

   src_stride = hstride * type_size;

to:

   src_stride = (hstride ? hstride : vstride) * type_size;

and adjust rule #1 to allow hstride == 0 as long as width == 1.

While here, we also update the text of the rule to match the latest documentation, which apparently clarifies that it's the location of the LSB of the channel which matters.

Fixes: 3f50dde8b35 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
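The adjusted check can be sketched in isolation as follows. The struct and `sketch_*` function names are hypothetical and do not mirror the validator's real data structures. The sketch also omits the scalar-broadcast exception mentioned in the error text.

```c
#include <assert.h>
#include <stdbool.h>

/* A source region <vstride;width,hstride>, strides in elements. */
struct sketch_region {
   unsigned vstride, width, hstride;
   unsigned type_size; /* bytes per element */
   unsigned subreg;    /* byte offset within the register */
};

/* Rule #1: with hstride == 0 the region is linear only if width == 1. */
static bool
sketch_src_is_linear(const struct sketch_region *src)
{
   if (src->hstride == 0)
      return src->width == 1;
   return src->vstride == src->width * src->hstride;
}

/* Byte stride between consecutive channels; falls back to vstride
 * when hstride is 0, as in the adjusted calculation above. */
static unsigned
sketch_src_stride(const struct sketch_region *src)
{
   assert(src->type_size > 0);
   return (src->hstride ? src->hstride : src->vstride) * src->type_size;
}

/* Rules #1-3 combined: true when the region passes the restriction. */
static bool
sketch_region_ok(const struct sketch_region *src, unsigned dst_hstride,
                 unsigned dst_type_size, unsigned dst_subreg)
{
   const unsigned dst_stride = dst_hstride * dst_type_size;
   return sketch_src_is_linear(src) &&
          sketch_src_stride(src) == dst_stride &&
          src->subreg == dst_subreg;
}
```

With these definitions both a <1;1,0> and an <8;8,1> dword source pass against a <1> dword destination, while a <4;4,2> source is rejected as non-linear.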
2022-06-24iris: Update comment about 2GB dynamic state rangeKenneth Graunke1-4/+9
We tracked this down with the HW teams back in 2020 and there's now a documented workaround. Comments from the HW team say this applies all the way through XeHP but we're not sure beyond that. This is a bug that we hit but the Windows drivers didn't because Jason decided to allocate our memory structures from the top end of the VMA range explicitly to catch bugs like this, while Windows allocates from zero and up, so they would need to allocate more than 2GB of dynamic state before running into it. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4880 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17216>
2022-06-15anv: Move STATE_BASE_ADDRESS programming into init_common_queue_state()Jordan Justen1-32/+32
This is now needed following Ken's 8831cb38aa9. Ref: 8831cb38aa9 ("anv: Stop updating STATE_BASE_ADDRESS on XeHP") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14395>
2022-06-12intel/fs/xehp+: Emit scheduling fence for all NIR barriers on platforms with LSC.Francisco Jerez1-2/+4
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
2022-05-27intel: Only set VectorMaskEnable when neededJason Ekstrand6-9/+29
For cases with lots of very small primitives, this may improve performance because we're not executing those dead channels all the time. Shader-db reports no instruction or cycle-count changes. However, by hacking up the driver to report when this optimization triggers, it appears to affect about 10% of shader-db. v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now, because using VMask on those platforms allows us to perform the eliminate_find_live_channel() optimization. However, XeHP doesn't seem to have packed fragment shader dispatch, so we lose that optimization regardless, and there's no reason not to avoid vmask. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
2022-05-26intel/disasm: add missing handling of <1;1,0>Lionel Landwerlin1-0/+1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 7cd9adeb415e ("intel/compiler: In XeHP prefer <1;1,0> regions before compacting") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16704>
2022-05-25intel/compiler: Move spill/fill tracking to the register allocatorKenneth Graunke5-34/+47
Originally, we had virtual opcodes for scratch access, and let the generator count spills/fills separately from other sends. Later, we started using the generic SHADER_OPCODE_SEND for spills/fills on some generations of hardware, and simply detected stateless messages there. But then we started using stateless messages for other things: - anv uses stateless messages for the buffer device address feature. - nir_opt_large_constants generates stateless messages. - XeHP curbe setup can generate stateless messages. So counting stateless messages is not accurate. Instead, we move the spill/fill accounting to the register allocator, as it generates such things, as well as the load/store_scratch intrinsic handling, as those are basically spill/fills, just at a higher level. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-17intel/perf: deal with OA reports timestamp values on DG2Lionel Landwerlin3-4/+24
OA reports on XeHP have their timestamp shifted to the left by 1. To get that back in the same time domain as the REG_READ you need to shift it back to the right and you're losing the top bit. v2: use ull for 64bit constant (Ian) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
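A minimal sketch of the shift described above, assuming the report carries a 32-bit timestamp field (the field width is an assumption here, and the `sketch_*` names are hypothetical). After shifting right, only 31 bits remain meaningful, so a register-read timestamp has to be truncated the same way before the two can be compared.

```c
#include <stdint.h>

/* Undo the left shift applied to the timestamp in the OA report. */
static uint32_t
sketch_oa_report_timestamp(uint32_t report_ts)
{
   return report_ts >> 1;
}

/* Truncate a 64-bit register-read timestamp to the 31 bits that
 * survive in the report, so both values share one time domain. */
static uint64_t
sketch_truncate_reg_timestamp(uint64_t reg_ts)
{
   return reg_ts & 0x7fffffffull;
}
```

Any comparison between the two sources then has to tolerate wraparound at 31 bits rather than 32.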
2022-05-17intel/decoder: Fix binding table pointer decoding with large offsetsKenneth Graunke1-3/+16
XeHP supports a 20:5 pointer format, so the offset can legitimately be more than UINT16_MAX. Likewise, with 256B binding table mode on Icelake/Tigerlake, we might have 18:8 pointers that exceed UINT16_MAX. Thanks to Felix DeGrood for catching this! Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16538>
2022-05-12anv: Fix INTEL_DEBUG=bat on XeHPKenneth Graunke1-0/+5
We no longer emit STATE_BASE_ADDRESS in every batch on XeHP, so the decoder might not know what the various base addresses are if it's only looking at a single batch. Fortunately, they also never change, so we can just emit them once here. On earlier platforms, initializing them here should be harmless. We'll emit STATE_BASE_ADDRESS if we change them, which will update these. Thanks to Iván Briano for catching this. Fixes: 8831cb38aa9 ("anv: Stop updating STATE_BASE_ADDRESS on XeHP") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16287>
2022-05-02intel/compiler: In XeHP prefer <1;1,0> regions before compactingCaio Oliveira1-0/+24
Ken performed some tests with shader-db to evaluate the effects:
```
Across all 145,848 shaders generated, the results were:
Total bytes compacted before: 3,326,224
Total bytes compacted after: 60,963,280
```
Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>
2022-04-30intel/dev: Compute pixel pipe information based on geometry topology DRM query.Francisco Jerez1-28/+41
This changes the intel_device_info calculation to call an additional DRM query requesting the geometry topology from the kernel, which may differ from the result of the current topology query on XeHP+ platforms with compute-only and 3D-only DSSes. This seems more reliable than the current guesswork done in intel_device_info.c trying to figure out which DSSes are available for the render CS. Cc: 22.1 <mesa-stable> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14143>
2022-04-28isl,iris: Add DG2 CCS modifier support for XeHPNanley Chery2-1/+68
Cc: 22.1 <mesa-stable> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14521>
2022-04-28isl,iris: Add I915_FORMAT_MOD_4_TILED support for XeHPAnuj Phogat2-0/+15
This patch adds Tile 4 modifier support to Mesa and allows Mesa to use Tile 4 on gen12-hp with GBM. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: 22.1 <mesa-stable> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14521>
2022-04-25intel: fixup number of threads per EU on XeHPLionel Landwerlin1-0/+1
Computations for indexing in-memory data structures for ray queries depend on this. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 4f9141607f40 ("intel: Add device info for DG2") Acked-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15925>
2022-04-07intel/compiler: Fix sample_d messages on DG2Ian Romanick1-3/+16
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The maximum number of gradient components and coordinate components should be 2. In spite of this limitation, the Bspec lists a mysterious R component before the min_lod, so the maximum coordinate components is 3. Fixes the following Vulkan CTS failures on DG2: dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment The Fixes: tag below is a bit misleading. This commit fixes some test cases similar to ones fixed by the Fixes: commit. I just want to make sure this commit gets applied everywhere that commit was also applied. Fixes: 635ed58e527 ("intel/compiler: Lower txd for 3D samplers on XeHP.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
2022-03-31nir: intel/compiler: Lower TXD on array surfaces on DG2+Ian Romanick3-2/+9
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube maps and 3D surfaces were already handled, but 1D array and 2D array surfaces were not. Fixes the following Vulkan CTS failures on DG2: dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment The Fixes: tag below is a bit misleading. This commit adds another lowering, similar to the one in the Fixes: commit, that probably should have been added at the same time. I just want to make sure this commit gets applied everywhere that commit was also applied. Fixes: 635ed58e527 ("intel/compiler: Lower txd for 3D samplers on XeHP.") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
2022-03-29anv: Stop updating STATE_BASE_ADDRESS on XeHPKenneth Graunke4-18/+90
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base address for the binding table pool separately from surface states, we don't actually need to update surface state base address anymore. Instead, we can just set STATE_BASE_ADDRESS once at context creation, and never bother updating it again, saving some heavyweight flushes and freeing us from the need for address offsetting trickery. This patch was originally written by Jason Ekstrand, with fixes from Lionel Landwerlin, but was targeting Icelake. Doing it there requires additional changes (15:5 -> 18:8 binding table pointer formats) which also involve some trade-offs, whereas the XeHP change is purely a win, so we'll do it here first. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
2022-03-29intel/decoder: Fix decoder handling of binding table pool alloc on XeHPKenneth Graunke1-1/+1
3DSTATE_BINDING_TABLE_POOL_ALLOC no longer has a "Binding Table Pool Enable" bit. It is always enabled. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15625>
2022-03-26intel/compiler: Use nir_opt_uniform_atomics()Kenneth Graunke1-0/+15
In general, an atomic intrinsic may perform separate atomics for every enabled SIMD channel, as each channel may operate on different memory. However, an extremely common case is for all channels to access the same memory location. In this case, we can simply perform a reduction/scan across the subgroup, and perform one atomic for the whole subgroup, rather than one per channel. For example, if an intrinsic says to take the minimum value of the existing memory and the value in each channel, we can do a thread-local minimum of all enabled channels, then do a single atomic to take the minimum of that and the existing memory. Our hardware doesn't optimize the case where multiple channels ask for atomics on the same memory location; it assumes the compiler will do so. nir_opt_uniform_atomics() uses divergence analysis to detect this case, adds the necessary subgroup operations, and moves the atomic inside a conditional that disables all but a single invocation. It even detects cases where the shader code already performs this kind of optimization, and avoids doing it a second time. This may not be the optimal solution for us. In the backend, we could detect this case and emit send(1) instructions with NoMask, rather than generating if...send(16)...endif, and a lot of unnecessary ALU ops. But it's simple to do, reuses the same path as ACO, and still provides most of the benefit by cutting up to 16x atomics down to a single atomic, which is more merciful to the memory bus. Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP. Improves performance of a customer-internal benchmark on XeHP at 3840x2160 and low settings by approximately 30%. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
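The reduction idea can be illustrated with a host-side simulation. This is not how nir_opt_uniform_atomics() is implemented. It models a 16-channel subgroup as an array plus an execution mask, and counts how many memory operations each strategy issues. All names are hypothetical, and the "atomic" here is an ordinary single-threaded update used only for counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SUBGROUP_SIZE 16

/* Number of simulated atomic operations issued so far. */
static unsigned sketch_atomic_count;

/* Stand-in for one atomic unsigned-min on memory (not truly atomic). */
static void
sketch_atomic_umin(uint32_t *mem, uint32_t value)
{
   assert(mem != NULL);
   sketch_atomic_count++;
   if (value < *mem)
      *mem = value;
}

/* Naive lowering: one atomic per enabled channel. */
static void
sketch_umin_per_channel(uint32_t *mem, const uint32_t *values,
                        uint32_t exec_mask)
{
   for (unsigned ch = 0; ch < SKETCH_SUBGROUP_SIZE; ch++) {
      if (exec_mask & (1u << ch))
         sketch_atomic_umin(mem, values[ch]);
   }
}

/* Uniform-address case: reduce across enabled channels first,
 * then issue a single atomic for the whole subgroup. */
static void
sketch_umin_uniform(uint32_t *mem, const uint32_t *values,
                    uint32_t exec_mask)
{
   if (exec_mask == 0)
      return;

   uint32_t reduced = UINT32_MAX;
   for (unsigned ch = 0; ch < SKETCH_SUBGROUP_SIZE; ch++) {
      if ((exec_mask & (1u << ch)) && values[ch] < reduced)
         reduced = values[ch];
   }
   sketch_atomic_umin(mem, reduced);
}
```

With all 16 channels enabled, the per-channel version issues 16 operations and the reduced version issues one, and both leave the same value in memory.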
2022-03-09intel: Limit Wa_1607854226 to Gfx12.0 onlyKenneth Graunke2-6/+6
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be necessary on XeHP. The other drivers do not appear to be applying this workaround on those parts. As further evidence, we accidentally added the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to GPGPU mode, which would be an incorrect way to implement the workaround, and things seem to be working. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
2022-03-09iris: Use more efficient binding table pointer formats on Icelake+.Kenneth Graunke6-24/+74
Skylake and older use a 15:5 binding table pointer format, which means our binder can be at most 64kB in size. Each binding table within the binder must be aligned to 32B. XeHP uses a new 20:5 binding table format, which allows us to increase the binder size to 1MB while retaining the nice 32B alignment. Larger binders mean fewer stalls as we update the base address for the binder. Icelake and Tigerlake can either use the 15:5 format or an 18:8 format. 18:8 mode requires the base of each binding table to be aligned to 256B instead of 32B, but it gives us a maximum binder size of 512kB. We can store 64 binding table entries in a 256B chunk (256B / 4B = 64), but only 8 entries in a 32B chunk (32B / 4B = 8). Assuming that most binding tables have fewer than 64 entries, this means that with the 18:8 format, we're likely to be able to fit 2048 (512KB / 256B) tables into a buffer before needing to allocate a new one and stall. Technically, the old format could also store 2048 binding tables per buffer as well (64KB / 32B = 2048). However, tables that needed more than 8 entries would need multiple 32B chunks. A single table would take multiple aligned chunks, while with the larger 256B format, it could fit in a single one. This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
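The chunk arithmetic above can be checked with a short sketch. It uses the 4-byte entry size and the binder sizes quoted in the message. The `sketch_*` helpers are hypothetical and assume every table in the binder has the same number of entries.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRY_SIZE 4u /* bytes per binding table entry */

/* Number of aligned chunks one binding table occupies. */
static uint32_t
sketch_chunks_for_table(uint32_t entries, uint32_t align_bytes)
{
   assert(align_bytes > 0);
   const uint32_t bytes = entries * SKETCH_ENTRY_SIZE;
   return (bytes + align_bytes - 1) / align_bytes;
}

/* How many equally sized tables fit in a binder before it must be reset. */
static uint32_t
sketch_tables_per_binder(uint32_t binder_bytes, uint32_t align_bytes,
                         uint32_t entries)
{
   const uint32_t chunks = sketch_chunks_for_table(entries, align_bytes);
   return chunks ? binder_bytes / (chunks * align_bytes) : 0;
}
```

For tables of 64 entries, a 64KB binder with 32B alignment holds 256 tables (8 chunks each), while a 512KB binder with 256B alignment holds 2048, which is where the reduction in binder resets comes from.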
2022-03-01Revert "anv: Require the local heap for CCS on XeHP"Nanley Chery1-18/+3
This reverts commit 382f6ccda8869f72134dbfa9c3cc68a229e01138. The spec requires that all color images created with the same tiling (and a few other properties) support the same memoryTypeBits. So this wasn't a valid change. It also wasn't necessary - we already have a mechanism in anv_BindImageMemory2 for disabling compression if the BO doesn't support it. With this, XeHP passes the tests in dEQP-VK.memory.requirements.*tiling_optimal Fixes: 382f6ccd ("anv: Require the local heap for CCS on XeHP") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
2022-02-23anv: Align state pools to 2MiB on XeHPJordan Justen1-1/+11
Suggested-by: Jason Ekstrand <jason.ekstrand@collabora.com> Fixes: c17e2216dd5 ("anv: Align buffer VMA to 2MiB for XeHP") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15054>