path: root/src/intel
Age | Commit message | Author | Files | Lines
3 hours | intel/mi_builder: Support gen11 command-streamer based register offsets | Jordan Justen | 1 | -8/+61
Reworks: * Automatically apply to any register in the range 0x2000 - 0x4000 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5466>
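The range rule in the rework note above can be sketched as a simple predicate. This is a hypothetical illustration, not the mi_builder code: the function names are invented, the upper bound is assumed to be exclusive, and the base-relative offset is only a guess at what "command-streamer based" means.

```python
# Hypothetical sketch; names, the half-open range, and the offset
# encoding are assumptions, not taken from mi_builder itself.
CS_RANGE_START = 0x2000
CS_RANGE_END = 0x4000  # assumed to be an exclusive upper bound


def is_cs_relative(reg: int) -> bool:
    """Return True if `reg` lies in the 0x2000 - 0x4000 register window."""
    return CS_RANGE_START <= reg < CS_RANGE_END


def cs_relative_offset(reg: int) -> int:
    """Offset of `reg` from the start of the window (assumed encoding)."""
    if not is_cs_relative(reg):
        raise ValueError(f"register {reg:#x} is outside the CS window")
    return reg - CS_RANGE_START
```

Registers outside the window would be left untouched by such a helper, which is what makes an automatic check safe to apply to every register access.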
6 hours | intel/dev: Add device info for ADL-S | Jordan Justen | 1 | -0/+9
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7322>
9 hours | anv: Drop warning about gen12 not being supported | Jordan Justen | 1 | -4/+2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7333>
2 days | anv: report latest extension spec versions | Lionel Landwerlin | 1 | -13/+13
In many cases those revisions happened even before the first public release of the spec and we just forgot to update our numbers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7136>
5 days | anv: Enable stencil buffer compression on Gen12+ | Sagar Ghuge | 1 | -0/+10
v2: (Nanley Chery) - Fix condition check. - Move aux_usage assignment after add_aux_state_tracking_buffer method. v3: (Nanley Chery) - Move stencil condition close to depth block. v4: (Nanley Chery) - Add DEBUG_NO_RBC condition. v5: (Nanley Chery) - Don't add CCS plane explicitly. - Use isl_surf_supports_ccs. v6: - Simplify condition (Nanley Chery) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Pass correct stencil aux usage during MSAA resolve | Sagar Ghuge | 1 | -1/+4
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Return optimal aux state for stencil buffer compression | Sagar Ghuge | 1 | -3/+5
v2: - Assert on aux_supported. (Nanley Chery) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Don't track clear bo for stencil buffer compression | Sagar Ghuge | 1 | -1/+1
On Gen12+, stencil buffer compression does not support fast clear so we don't have to track clear address for it. v2: - Use isl_aux_usage_has_fast_clears (Nanley Chery) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Get aux usage from plane while clearing stencil buffer | Sagar Ghuge | 1 | -1/+3
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Set stencil_aux_usage flag | Sagar Ghuge | 1 | -0/+1
v2: Use image aux usage (Nanley Chery) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Handle compressed stencil buffer transition on Gen12+ | Sagar Ghuge | 1 | -5/+57
Handle compressed stencil buffer transition from one layout to another on gen12+. When stencil compression is enabled, we have to initialize the buffer via stencil clear (HZ_OP) before any renderpass. v2: - Pass predicate bit false to anv_image_ccs_op (Nanley Chery) v3: - update aspect assertion (Nanley Chery) v4: - Make state decision based on anv_layout_to_aux_state instead of anv_layout_to_aux_usage (Sagar Ghuge) v5: - No need to handle stencil CCS resolve case (Jason Ekstrand) - Initialize buffer using HZ_OP (Nanley Chery) v6: (Nanley Chery) - Pass correct layer/level count. - Remove local variable. v7: - Skip stencil initialization with HZ_OP packet if followed by fast clear. (Nanley Chery) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | anv: Return number of layers/levels attached to anv_image | Sagar Ghuge | 1 | -15/+3
Don't check the auxiliary surface's ISL surf in order to return the surface levels/layers; instead, we can return the anv_image parameter. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
5 days | nir: Rename replicated-result dot-product instructions | Ian Romanick | 1 | -3/+3
All these instructions replicate the result of an N-component dot-product to a vec4. Naming them fdot_replicatedN gives the impression that they are some sort of abstract dot-product that replicates the result to a vecN. They also deviate from fdph_replicated... which nobody would reasonably consider naming fdot_replicatedh.
Naming these opcodes fdotN_replicated more closely matches what they are, and it matches the pattern of fdph_replicated.
I believe that the only reason these opcodes were named this way was because it simplified the implementation of the binop_reduce function in nir_opcodes.py. I made some fairly simple changes to that function, and I think the end result is ok.
The bulk of the changes come from the sed rename:
    sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \
        $(grep -r 'fdot_replicated[234]' src/)
v2: Use a named parameter to binop_reduce instead of using isinstance(name, str). Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>
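To see what the sed expression in this commit does to an identifier, here is a small Python equivalent of the same substitution. The helper name `rename_opcode` is made up for the example; only the regular expression comes from the commit message.

```python
import re

# Same pattern as the sed expression: capture the component count (2, 3 or 4)
# and move it in front of the "_replicated" suffix.
_PATTERN = re.compile(r"fdot_replicated([234])")


def rename_opcode(text: str) -> str:
    """Rewrite fdot_replicatedN to fdotN_replicated wherever it occurs."""
    return _PATTERN.sub(r"fdot\1_replicated", text)


if __name__ == "__main__":
    for name in ("fdot_replicated2", "nir_op_fdot_replicated4", "fdph_replicated"):
        print(name, "->", rename_opcode(name))
```

Names that do not match the pattern, such as fdph_replicated, pass through unchanged.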
5 days | anv: fix source/destination layers for 3D blits | Lionel Landwerlin | 1 | -5/+10
When blitting from source depth range [0-3] into destination depth range [0-2], we'll have to use a source layer that is in between 2 layers of the 3D source image. Other than having an incorrect formula, we're also using integers, which prevents us from using the right source layer. v2: Drop + 0.5 on application offsets v3: Reuse num_layers (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3458 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6909>
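The example in this message (source depth 3 scaled into destination depth 2) shows why an integer source layer is not enough. The sketch below uses a generic linear mapping that samples at the centre of each destination layer. It is not the formula from the patch, and both the function name and the centre-sampling convention are assumptions made for illustration.

```python
def source_z(dst_layer: int, src_start: float, src_end: float,
             dst_start: float, dst_end: float) -> float:
    """Map the centre of a destination layer into source depth coordinates.

    Generic linear mapping (assumed, not the driver's formula).
    """
    scale = (src_end - src_start) / (dst_end - dst_start)
    return src_start + (dst_layer - dst_start + 0.5) * scale


if __name__ == "__main__":
    for layer in range(2):
        z = source_z(layer, 0.0, 3.0, 0.0, 2.0)
        # Truncating to an integer loses the fractional position between layers.
        print(f"dst layer {layer}: source z = {z}, truncated = {int(z)}")
```

With a 3:2 ratio the scale factor is 1.5, so the sample positions fall between source layers, and truncating them to integers selects a whole layer rather than the intended in-between position.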
5 days | blorp: allow blits with floating point source layers | Lionel Landwerlin | 4 | -9/+14
The current blorp API only allows source layers for 3D images to be integers. That is causing problems with the Vulkan API where we need to be able to use a 3D layer that could be in between 2 layers. This change allows a floating point value to be passed for blits and internally sets up the input parameters to pass floating point values to kernels. v2: Use tex op to determine what types the coordinates are (Jason) Drop setting params->z (Lionel) v3: Fix nir_texop_txf_ms_mcs op not considered as having integer coords (Lionel) v4: Fix incorrect test on nir_texop_txf_ms_mcs (Ivan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3458 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6909>
5 days | blorp: identify copy kernels in NIR | Lionel Landwerlin | 4 | -5/+26
This was useful in identifying blit vs copy kernels. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6909>
7 days | anv,iris: Use the data cache for UBO pulls on Gen12+ | Jason Ekstrand | 1 | -1/+1
Now that we have the HDC, using the data cache for UBO pulls seems to help things quite a bit:
    GTA V DXVK 104.0%
    Talos Principle GL 102.8%
    Rise of Tomb Raider VK 102.8%
    Dark Souls 3 DXVK 101.4%
    Witcher3 DXVK 101.3%
    Bioshock Infinite GL 100.5%
    Doom 2016 VK 97.7%
Doom is a bit of a loss, but it helps enough other stuff that it's probably worth the hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7230>
8 days | genxml: drop gen10 | Lionel Landwerlin | 6 | -6994/+1
Finishing off the job started in !6899 v2: Remove remaining gen10_pack.h include (Sagar) v3: Forgot isl gen10 removal (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7185>
8 days | anv: Advertise VK_KHR_shader_terminate_invocation | Caio Marcelo de Oliveira Filho | 2 | -0/+8
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7221>
8 days | isl: Enable Tigerlake HDC:L1 caches via MOCS in various cases. | Kenneth Graunke | 2 | -0/+22
Thanks to Felix Degrood for discovering that we missed enabling this additional caching on Tigerlake! Felix also benchmarked the changes.
We now use MOCS 48 (HDC:L1 + L3 + LLC) for render targets, textures, and pull constant buffers. We leave storage buffers & images, as well as stateless messages, using the previous MOCS 2 value. We can't use HDC:L1 with atomics, and we don't know a priori whether storage buffers will be used with atomics or not. Similarly, the Vulkan buffer device address feature allows atomics to be performed on buffers via stateless messages, and we can only control MOCS at the base address level, so we can't do much there.
This is closer to what the Windows Vulkan and OpenGL drivers do, though it isn't quite the same - they also disable LLC in some cases, but we observed this to cause noticeable performance regressions when we tried (though a couple of titles benefited). We may experiment with that in the future.
Improves performance in a number of titles:
- Unreal Engine 4 Shooter Demo [VK]: 11.8%
- Witcher 3 [DXVK]: 3.9%
- Rise of the Tomb Raider [VK]: 1.5%
- Shadow of the Tomb Raider [VK]: 1.0%
- Grand Theft Auto V [DXVK]: 0.8%
We did not observe any performance regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
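The selection policy described in this message can be summarised as a lookup from usage to MOCS index. The sketch below is only an illustration of that policy: the `Usage` enum and `select_mocs` are hypothetical names, not ISL's API, and only the values 48 and 2 come from the commit message.

```python
from enum import Enum, auto


class Usage(Enum):
    """Hypothetical usage categories, named after the commit message."""
    RENDER_TARGET = auto()
    TEXTURE = auto()
    PULL_CONSTANT_BUFFER = auto()
    STORAGE = auto()      # storage buffers and images: may see atomics
    STATELESS = auto()    # stateless messages: may see atomics


MOCS_HDC_L1_L3_LLC = 48   # value quoted in the commit message
MOCS_PREVIOUS = 2         # value quoted in the commit message

# Usages that are never accessed with atomics, so HDC:L1 is safe for them.
_L1_SAFE = {Usage.RENDER_TARGET, Usage.TEXTURE, Usage.PULL_CONSTANT_BUFFER}


def select_mocs(usage: Usage) -> int:
    """Pick the MOCS index for a usage, following the policy described above."""
    return MOCS_HDC_L1_L3_LLC if usage in _L1_SAFE else MOCS_PREVIOUS
```

Keeping the decision in one function mirrors the motivation given for the centralized helper: drivers ask for a MOCS by usage instead of hard-coding the index.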
8 days | isl, anv, iris: Add a centralized helper to select MOCS based on usage | Kenneth Graunke | 13 | -49/+104
On Gen12+, we can enable additional caches in certain usage situations. This routes that decision making to a central place in ISL, based on surface usage flags, and updates both drivers to use it. (i965 doesn't need to change because it doesn't support Gen12.) We continue handling the "external" decision via an anv_mocs() wrapper for now, since we store that flag in anv_bo, which isl doesn't know about. (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not actually sure that would be cleaner.) This patch should not have any functional nor performance effects, as we continue selecting the exact same MOCS values for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
8 days | anv: Set only one ISL usage bit (RT/texture) for CopyBuffer sources | Kenneth Graunke | 1 | -6/+7
Most uses of this function deal with destination buffers, but for copy_buffer_to_image, the buffer is the source, and isn't rendered to. We should avoid setting ISL_SURF_USAGE_RENDER_TARGET_BIT. Also, we should avoid setting ISL_SURF_USAGE_TEXTURE_BIT for the destination, which isn't sampled from. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
8 days | isl: Fix the aux-map encoding for D24_UNORM_X8 | Nanley Chery | 1 | -1/+1
Bspec: 53911 now defines the encoding for this format. Cc: mesa-stable Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7198>
8 days | anv: Implement VariableDescriptorCount | Jason Ekstrand | 5 | -30/+182
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7180>
8 days | anv: Add a descriptor_count to descriptor sets | Jason Ekstrand | 4 | -2/+8
This is useful for asserting in-bounds descriptor set access. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7180>
8 days | anv: Bump the number of update-after-bind descriptors to 1M | Jason Ekstrand | 1 | -7/+12
It's a bit hard to exactly map our implementation to the limits described by Vulkan. The bindless surface handle in the extended message descriptors is 20 bits and it's an index into the table of RENDER_SURFACE_STATE structs that starts at bindless surface base address. This means that we can have at most 1M surface states allocated at any given time. Since most image views take two descriptors, this means we have a limit of about 500K image views. However, since we allocate surface states at vkCreateImageView time, this means our limit is actually something on the order of 500K image views allocated at any time. The actual limit described by Vulkan, on the other hand, is a limit of how many you can have in a descriptor set. Assuming anyone using 1M descriptors will be using the same image view twice a bunch of times (or a bunch of null descriptors), we can safely advertise a larger limit. 1M is what's required by D3D12, so let's advertise that. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3335 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7180>
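The numbers in this message follow directly from the handle width. A short worked calculation is given below; the constant names are invented for the example, and the two-per-image-view figure is the one quoted in the message.

```python
# Worked arithmetic for the limits quoted in the commit message.
BINDLESS_HANDLE_BITS = 20          # width of the bindless surface handle
SURFACE_STATES_PER_IMAGE_VIEW = 2  # "most image views take two descriptors"

# A 20-bit index can address 2**20 surface states ("1M").
max_surface_states = 1 << BINDLESS_HANDLE_BITS

# Two surface states per image view halves that ("about 500K").
max_image_views = max_surface_states // SURFACE_STATES_PER_IMAGE_VIEW

if __name__ == "__main__":
    print(f"surface states: {max_surface_states}")
    print(f"image views:    {max_image_views}")
```

The advertised per-set limit can still be 1M because the same image view (or a null descriptor) can appear many times in a descriptor set without consuming additional surface states.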
9 days | anv: Ignore continue flag in primary cmd buffers | Ricardo Garcia | 1 | -2/+11
According to the Vulkan specification, the VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT flag will be ignored if included in a VkCommandBufferBeginInfo for a primary command buffer. This also implies pBeginInfo->pInheritanceInfo should not be read even if the flag is present, and makes it legal to include the flag knowing it will be ignored. Signed-off-by: Ricardo Garcia <rgarcia@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7128>
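The rule described here amounts to masking one bit before any later code looks at the flags. The sketch below illustrates that idea only. It is not the anv code, and `effective_usage_flags` is a hypothetical helper. The enum values are the ones defined by the Vulkan headers.

```python
# Values as defined by the Vulkan headers.
VK_COMMAND_BUFFER_LEVEL_PRIMARY = 0
VK_COMMAND_BUFFER_LEVEL_SECONDARY = 1
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT = 0x00000002


def effective_usage_flags(level: int, flags: int) -> int:
    """Drop the continue bit for primary command buffers (hypothetical helper).

    Once the bit is cleared, code that only reads inheritance info when the
    bit is set will never touch pInheritanceInfo for a primary buffer.
    """
    if level == VK_COMMAND_BUFFER_LEVEL_PRIMARY:
        flags &= ~VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
    return flags
```

Secondary command buffers keep the flag, since it is meaningful there.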
11 days | intel/isl: Drop redundant unpack of unorm channels | Nanley Chery | 1 | -1/+0
Fixes: 09ced654204 ("intel/isl: Add format conversion code") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7168>
12 days | intel/fs: Handle nir_intrinsic_terminate | Caio Marcelo de Oliveira Filho | 3 | -19/+18
For terminate operation, jump the invocation without predicating on the rest of the quad being disabled -- which is what is done for demote and discard. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7150>
12 days | anv: Go back to using the sampler for UBO pulls | Jason Ekstrand | 1 | -1/+1
This functionally reverts b54d37a8676acbd725ef1817479f2630d3ea95be. This fixes a 12% performance regression in DOOM (2016) on Tigerlake. Fixes: b54d37a8676a "anv: Use the data cache for indirect UBO..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7173>
12 days | intel: Remove Gen10-specific device entries | Ian Romanick | 2 | -62/+0
This enables removal of gen_device_info::is_cannonlake. v2: Remove GEN10_FEATURES and GEN10_HW_INFO macros. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel: Remove Gen10-speicific perf support | Ian Romanick | 4 | -10415/+0
v2: Also update Makefile.sources and Android build files. Noticed by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | anv: Don't generate Gen10-specific functions | Ian Romanick | 8 | -44/+1
v2: Re-wrap lines in meson.build. Suggested by Jason. v3: Also update Makefile.sources and Android build files. Noticed by Lionel. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel/isl: Don't generate Gen10-specific functions | Ian Romanick | 4 | -30/+0
v2: Also update Makefile.sources and Android build files. Noticed by Lionel. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel: Remove Gen10-specific cache config code | Ian Romanick | 1 | -20/+0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel/compiler: Remove Gen10-specific code | Ian Romanick | 8 | -47/+7
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel: Disable all support for Gen10 | Ian Romanick | 1 | -0/+5
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
12 days | intel: Silence many unused parameter warnings in blorp_genX_exec.h | Ian Romanick | 1 | -8/+7
I considered a couple other options (including adding #if / #endif around UNUSED and adding an UNUSED_ON_SOME_GEN), but this seemed the best. There was also at least one other case of having UNUSED on a parameter that is sometimes unused (params in blorp_emit_color_calc_state).
This header gets included in a lot of places (esp. in files that get built per-Gen), so the warnings are repeated a lot.
    In file included from src/mesa/drivers/dri/i965/genX_blorp_exec.c:33:
    src/intel/blorp/blorp_genX_exec.h: In function ‘emit_urb_config’:
    src/intel/blorp/blorp_genX_exec.h:193:48: warning: unused parameter ‘deref_block_size’ [-Wunused-parameter]
      193 | enum gen_urb_deref_block_size *deref_block_size)
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
    src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_fill_vertex_buffer_state’:
    src/intel/blorp/blorp_genX_exec.h:350:52: warning: unused parameter ‘batch’ [-Wunused-parameter]
      350 | blorp_fill_vertex_buffer_state(struct blorp_batch *batch,
          | ~~~~~~~~~~~~~~~~~~~~^~~~~
    src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_emit_surface_state’:
    src/intel/blorp/blorp_genX_exec.h:1403:42: warning: unused parameter ‘aux_op’ [-Wunused-parameter]
     1403 | enum isl_aux_op aux_op,
          | ~~~~~~~~~~~~~~~~^~~~~~
    src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_update_clear_color’:
    src/intel/blorp/blorp_genX_exec.h:1867:46: warning: unused parameter ‘batch’ [-Wunused-parameter]
     1867 | blorp_update_clear_color(struct blorp_batch *batch,
          | ~~~~~~~~~~~~~~~~~~~~^~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
13 days | intel/compiler, anv: Delete cs_prog_data->slm_size | Kenneth Graunke | 3 | -3/+1
cs_prog_data->slm_size is basically redundant with prog_data->total_shared, which is the field that we actually use for controlling the shared local memory size in all drivers. We were still using it in one place for VK_EXT_pipeline_executable_properties, but we should just fix that and delete the field. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7152>
13 days | intel/fs: Allow constant-propagation into SAMPLEINFO and IMAGE_SIZE | Jason Ekstrand | 1 | -0/+2
Without this, we end up with indirect sampler messages all the time because we don't propagate the texture/image BTI. This makes debugging shaders with imageSize or textureSamples in them a pain.
Shader-db results on Ice Lake:
    total instructions in shared programs: 19720612 -> 19720564 (<.01%)
    instructions in affected programs: 4998 -> 4950 (-0.96%)
    helped: 12
    HURT: 0
All affected shaders were compute shaders in Deus Ex: Mankind Divided.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6794>
13 days | isl: Allow CCS for 8bpp surfaces with 3+ miplevels | Nanley Chery | 1 | -11/+0
I can't find a restriction for enabling CCS on these surfaces in recent versions of the Bspec. Since I didn't cite my source, I'm not even sure such a restriction existed in the first place. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7085>
2020-10-13 | intel/fs: Rework scratch handling on Gen9+ | Jason Ekstrand | 2 | -16/+111
The current scratch mechanism uses an MRF hack where we reserve a few GRF registers to treat like the MRF and we collect the data into that MRF region before doing a scratch write. We also use that region for the header for scratch reads. This commit changes things and gets rid of the MRF hack. Instead, we reserve a single register (which RA is free to pick) for the scratch header and use split sends for scratch writes to avoid having to do the copy. This should provide RA with more freedom in the presence of spilling as well as avoid some unnecessary data moves.
In future, the new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own per-thread scratch base address calculations rather than depending on the scratch base address that gets pushed into g0. Having an opcode for this lets us do it once at the top of the shader rather than repeating it at every read/write.
One other noticeable difference is the use of SHADER_OPCODE_SEND. We can get away with this thanks to the fact that we're now using a set to track which instructions are generated by spills and don't rely on the opcodes to find spill/fill instructions. This allows us to avoid adding more virtual opcodes and let the normal code paths handle things like scoreboard dependencies between header setup and the SEND. It also means that post-RA scheduling may be able to space out the header setup MOV and the SEND for better latency hiding.
Shader-db results on Skylake:
    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2
    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1
Shader-db results on Ice Lake:
    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0
    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0
Shader-db results on Tigerlake:
    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0
    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0
Fossil-db results on Ice Lake:
    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)
More practically, this seems to give about a 0.5-1% perf boost in Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
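The percentages in the shader-db tables are plain relative changes, which is easy to recompute when comparing runs. The helper below is a minimal sketch; `pct_change` is a name made up for this example and is not part of shader-db.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # (label, before, after) triples copied from the Skylake results above.
    rows = [
        ("total spills", 12137, 10604),
        ("spills in affected programs", 6685, 5152),
    ]
    for label, before, after in rows:
        print(f"{label}: {before} -> {after} ({pct_change(before, after):.2f}%)")
```

A negative value means fewer spills or fills after the change, which is the desired direction here.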
2020-10-13 | intel/fs/ra: Use a set to track added spill/fill instructions | Jason Ekstrand | 1 | -21/+32
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/fs/ra: Sanity-check our IP counts | Jason Ekstrand | 1 | -0/+8
Starting with e99081e76d4a, we don't re-construct liveness information every time we spill a register. Instead, we're very careful to track which instructions are spill instructions and do not count those toward the IP count, so that we can continue to use the old liveness information even though instructions have been added. This commit adds an assert that sanity-checks that we count the same number of instructions as our liveness information is based on. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
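The invariant being asserted can be sketched in a few lines: instructions recorded in the spill set are skipped when counting, and the result must match the instruction count the liveness analysis was built from. The sketch below is a hypothetical illustration using plain Python values as stand-ins for instructions. Neither function exists in the compiler.

```python
def count_original_ips(instructions, spill_insts) -> int:
    """Count instructions, skipping those added by spilling."""
    return sum(1 for inst in instructions if inst not in spill_insts)


def check_ip_count(instructions, spill_insts, live_num_ips: int) -> int:
    """Assert that the non-spill count matches what liveness was computed on."""
    counted = count_original_ips(instructions, spill_insts)
    assert counted == live_num_ips, (
        f"counted {counted} IPs but liveness is based on {live_num_ips}"
    )
    return counted
```

If a spill or fill instruction were ever emitted without being added to the set, the count would drift and the assertion would fire instead of stale liveness data being used silently.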
2020-10-13 | intel/fs/ra: Store the last non-spill VGRF node | Jason Ekstrand | 1 | -1/+4
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/fs/ra: Refactor handling of Gen7 scratch reads | Jason Ekstrand | 1 | -16/+15
The attempt at de-duplication with the gen7_read Boolean wasn't actually saving us anything. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/fs/ra: Increment spill_offset as part of the emit_spill loop | Jason Ekstrand | 1 | -2/+4
This makes it consistent with our handling of src.offset and with our handling of spill_offset in emit_unspill. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/fs: Add a SCRATCH_HEADER opcode | Jason Ekstrand | 6 | -33/+86
This opcode is responsible for setting up the buffer base address and per-thread scratch space fields of a scratch message header. For the most part, it's a copy of g0 but some messages need us to zero out g0.2 and the bottom bits of g0.5. This may actually fix a bug when nir_load/store_scratch is used. The docs say that the DWORD scattered messages respect the per-thread scratch size specified in gN.3[3:0] in the message header but we've been leaving it zero. This may mean that we've been ignoring any scratch reads/writes from a load/store_scratch intrinsic above the 1KB mark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/fs: Copy the PTSS from g0 for scratch reads/writes | Jason Ekstrand | 1 | -0/+5
In theory, this fixes a bug where we were dropping the PTSS bound on the floor. The hardware docs claim that the A32 DWORD and BYTE scattered read/write messages do a PTSS bounds check. However, in practice, it seems that the hardware ignores the bounds check so this doesn't actually matter. I verified this with the following couple of piglit tests: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399 In practice, this prevents the next commit from making a subtle behavioral change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 | intel/batch_decoder: Don't clame vec4 vs/gs/tcs shaders on Gen11+ | Jason Ekstrand | 1 | -1/+1
Because we hard-coded the default to vec4, any platform where it doesn't have a "Dispatch Mode" field gets vec4 by default. This includes Gen11+ where vec4 is no longer a thing. Change the default so it works on newer hardware. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>