summaryrefslogtreecommitdiff
path: root/src/intel/compiler/brw_fs_nir.cpp
AgeCommit message (Collapse)AuthorFilesLines
5 daysintel/brw: Avoid getting a stride of 0 for nir_intrinsic_exclusive_scanJordan Justen1-1/+1
Ref: 671745b616a ("intel/fs: Don't allow 0 stride on MOV destination") Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>
8 daysintel/brw: Make an fs_builder::SYNC helperKenneth Graunke1-6/+3
We always want a null destination, so this saves some typing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
8 daysintel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLNKenneth Graunke1-1/+1
We no longer support the old LINE+MAC lowering, and we already lower this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always corresponds the PLN. The only catch is that LINTERP's operands are reversed from PLN, so we have to switch them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
8 daysintel/brw: Drop default size of 1 from bld.vgrf() callsKenneth Graunke1-31/+31
This isn't necessary as 1 is the default value for the parameter. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
8 daysintel/brw: Delete fs_visitor::vgrf helperKenneth Graunke1-16/+12
Just use fs_builder::vgrf instead of the older glsl_type-based one. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-03-29intel/brw/xe2+: DPAS must be SIMD16 nowIan Romanick1-4/+4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29nir: intel/brw: Change the order of sources for nir_dpas_intelIan Romanick1-9/+9
It was by pure luck that all sources (and the result) of nir_dpas_intel had the same number of components. It is possible to support matrix sizes where the accumlator matrix and the result matrix are larger (e.g., 16x8 * 8x16 = 16x16). This breaks all of the assumptions of NIR's infrastructure for code generating intrinsics. Fix the by making the accumulator matrix be the first source. The accumulator and the result will always have the same dimensions (due to rules of matrix multiplication) and the same type (due to restructions of the cooperative matrix extension). This forces them to have the same number of components. This doesn't fix all the potential problems. NIR expects that all 0-sized sources will have the same number of components. This just ensures that the result has the correct number of components. Fixes: 6b14da33ad3 ("intel/fs: nir: Add nir_intrinsic_dpas_intel") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-28intel/brw: minor rework to de duplicate variable assignmentRohan Garg1-3/+1
Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>
2024-03-19intel/fs: Avoid generating useless UNDEFs for every SSA defKenneth Graunke1-1/+4
Emitting UNDEF is only necessary when the instructions we generate to produce the NIR def are considered partial writes. By adding a simple check (adapted from fs_inst::is_partial_write()), we can avoid creating loads of unnecessary UNDEFs that we have to clean up later. Our first dead code elimination pass does get rid of them pretty quickly, but this should save memory and time during our first split_virtual_grfs and dead_code_elimination passes. This generates roughly 30% fewer instructions at the beginning. Improves compilation time of shaders: - Rise of the Tomb Raider: -3.51563% +/- 0.103951% (n=7) - Borderlands 3: -3.64422% +/- 0.300951% (n=7). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28169>
2024-03-19intel/brw: Use predicates for quad_vote_any and quad_vote_all when availableCaio Oliveira1-0/+22
Up until Xe2, we can use the predicates ANY4H and ALL4H to achieve the same result with less instructions. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>
2024-03-19intel/brw: Implement quad_vote_any and quad_vote_allCaio Oliveira1-0/+52
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>
2024-03-19intel/fs: Don't allow 0 stride on MOV destinationIan Romanick1-1/+1
Outside SIMD1 instructions, a destination stride of zero doesn't make any sense. When such strides exist, they would be fixed by the FS generator. Currently the only place that intentionally generates such a stride is setup_barrier_message_payload_gfx125, and this commit changes that. The existence of a zero stride that won't really be a zero stride causes a variety of problems with other optimization passes. Those passes don't know that 0 actually means 1, and they make incorrect assumptions about sizes written, etc. The assertion helped catch many bugs in some other work in progress that tries to store convergent values in SIMD8 registers regardless of the dispatch width. That code would accidentally generate destination strides of zero. v2: Check stride differently depending on register file. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>
2024-03-12intel/brw: Emit better code for read_invocation(x, constant)Kenneth Graunke1-1/+7
For something as basic as read_invocation(x, 0), we were emitting: mov(8) vgrf67:D, 0d find_live_channel(8) vgrf236:UD, NoMask broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W This is way overcomplicated - if the invocation is a constant, we can simply emit a single MOV which reads the desired channel index. Not only that, but it's difficult to clean up: 1. If this expression appears multiple times, CSE will find all the redundant emit_uniformize(invocation) and get rid of the duplicate (find_live_channel+broadcast) on future instructions. 2. Copy propagation will put the 0d directly in the first broadcast. 3. Dead code elimination will get rid of the vgrf67 temp holding 0. 4. Algebraic will replace the first broadcast(x, 0) with a MOV. 5. Copy propagation will put the 0d directly in the second broadcast. 6. Dead code elimination will get rid of the vgrf237 temp. 7. Algebraic will replace the second broadcast(x, 0) with a MOV. 8. Copy propagation will finally combine the two MOVs That's at least 7-8 optimization passes and several loops through the same passes just to clean up something we can do trivially. Cuts 25% of the of the optimizer steps in pipeline 22200210259a2c9c of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23). Shortens compilation time of the google-meet-clvk/Relight pipeline by -2.87717% +/- 0.509162% (n=150). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>
2024-03-01intel/compiler: Xe2+ can do URB load/store with a byte offsetRohan Garg1-50/+119
Thanks to Ken for suggesting this URB refactoring change and pointing out that the LSC can operate on the byte offset granularity. This should fix the geometry shader test cases where we have more than 32 vertices since previously we were failing to write the correct control data bits because of incorrect write mask. Shader-db results for Xe2: total instructions in shared programs: 153475 -> 153437 (-0.02%) instructions in affected programs: 1374 -> 1336 (-2.77%) helped: 11 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3 helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70% 95% mean confidence interval for instructions value: -3.92 -2.99 95% mean confidence interval for instructions %-change: -4.10% -2.36% Instructions are helped. total loops in shared programs: 140 -> 140 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 16002649 -> 16002329 (<.01%) cycles in affected programs: 9174 -> 8854 (-3.49%) helped: 11 HURT: 0 helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32 helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85% 95% mean confidence interval for cycles value: -33.56 -24.62 95% mean confidence interval for cycles %-change: -4.48% -3.08% Cycles are helped. total spills in shared programs: 52 -> 52 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 94 -> 94 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 total sends in shared programs: 4240 -> 4240 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Rework: (Sagar) - Adjust offset/indirect offset calculation. - Add shader-db results - Always calculate dword index - Drop changes for indirect writes Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
2024-02-29intel/brw: Remove brw_shader.hCaio Oliveira1-0/+32
Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29intel/brw: Remove extra stage_prog_data field in fs_visitorCaio Oliveira1-2/+2
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29intel/brw: Fold backend_shader into fs_visitorCaio Oliveira1-1/+1
The base class was used when we had vec4, but now we can fold it with its only subclass. Declare fs_visitor now as a struct to be able to forward declare for C code without causing errors due to class/struct being mixed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29intel/fs: add plumbing for embedded samplersLionel Landwerlin1-7/+26
We can address samplers from 3 different locations : - binding table - dynamic state base address - bindless sampler base address (only Gfx11+) Here we allow samplers to be address from the dynamic state base address with the embedded sampler flag. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22151>
2024-02-28intel/brw: Remove Gfx8- fields from *_prog_key structsCaio Oliveira1-9/+1
Those are not used or relevant anymore. Also update Iris accordingly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28intel/brw: Remove F16TO32 and F32TO16 opcodesCaio Oliveira1-5/+5
These are done with MOVs and appropriate types in Gfx9+. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28intel/brw: Remove Gfx8- code from NIR conversionCaio Oliveira1-459/+143
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-27intel/fs: Add fast path for ballot(true)Ian Romanick1-6/+15
This doesn't help very much now. A later commit adds a NIR optimization pass, tentatively called nir_opt_uniform_subgroup, that converts many kinds of subgroup operations to things involving bitCount(ballot(true)). This commit makes a huge difference in the results of that later commit. No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165558033 -> 165557519 (-0.00%) Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00% Totals from 299 (0.05% of 656117) affected shaders: Instrs: 88293 -> 87779 (-0.58%) Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03% v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27intel/fs: Use constant of same type to write flagIan Romanick1-1/+1
Otherwise the compiler generates an extra MOV to load the constant into a register first because reasons. :shrug: vote_any, vote_all, vote_ieq, and vote_feq handling already do this. No shader-db changes on any Intel plaform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165592451 -> 165557937 (-0.02%) Cycles: 15133282615 -> 15133059360 (-0.00%); split: -0.00%, +0.00% Totals from 33779 (5.15% of 656115) affected shaders: Instrs: 4396576 -> 4362062 (-0.79%) Cycles: 86867412 -> 86644157 (-0.26%); split: -0.37%, +0.11% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27intel/fs: Delete stale comment in nir_intrinsic_ballot implementationIan Romanick1-6/+1
Discard actually uses f1.x, so this implementation of ballot is fine. Trivial. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27intel/compiler: Add texture gather offset LOD/Bias message supportSagar Ghuge1-0/+18
v2: (Ian) - Space formatting on conditional statement Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27intel/compiler: Add gather4_i/l/[_c]/b sampler messageSagar Ghuge1-3/+21
v2: (Ian) - Format comment Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27intel/compiler: Adjust sample_b parameter according to new layoutSagar Ghuge1-1/+4
On Xe2+, we need to pack LOD with array index for cube array surfaces, with that mlod parameter gets adjusted to different indices based on the layout. So track if we are packing LOD with array index in fs_inst and propogate that to sampler lowering code to adjust param location. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-20intel/fs: Don't rely on CSE for VARYING_PULL_CONSTANT_LOADKenneth Graunke1-4/+10
In the past, we didn't have a good solution for combining scalar loads with a variable index plus a constant offset. To handle that, we took our load offset and rounded it down to the nearest vec4, loaded an entire vec4, and trusted in the backend CSE pass to detect loads from the same address and remove redundant ones. These days, nir_opt_load_store_vectorize() does a good job of taking those scalar loads and combining them into vector loads for us, so we no longer need to do this trick. In fact, it can be better not to: our offset need only be 4 byte (scalar) aligned, but we were making it 16 byte (vec4) aligned. So if you wanted to load an unaligned vec2, we might actually load two vec4's (___X | Y___) instead of doing a single load at the starting offset. This should also reduce the work the backend CSE pass has to do, since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4. shader-db results on Alchemist: - No changes in SEND count or spills/fills - Instructions: helped 95, hurt 100, +/- 1-3 instructions - Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected) - SIMD32: gained 5, lost 3 fossil-db results on Alchemist: - Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00% - Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16% - SIMD32: Gained 42, lost 26 - Totals from 56285 (8.63% of 652236) affected shaders: - Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03% - Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31% From this we can see that we aren't doing more loads than before and the change is pretty inconsequential, but it requires less optimizing to produce similar results. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
2024-02-14intel/compiler: Rename DISPATCH_MODE_* enums to INTEL_DISPATCH_MODE_*Caio Oliveira1-1/+1
And move to the intel_shader_enums.h file. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14intel/compiler: Rename BRW_WM_MSAA_* enums to INTEL_MSAA_*Caio Oliveira1-5/+5
And move to the intel_shader_enums.h file. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14intel/compiler: Implement nir_intrinsic_load_topology_id_intel for xe2Jordan Justen1-19/+80
Rework: * Sagar: Rework BRW_TOPOLOGY_ID_DSS, BRW_TOPOLOGY_ID_EU_THREAD_SIMD calculations Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27529>
2024-02-12intel/compiler: Use nir_tex_src_backend1 to pack LOD and array indexSagar Ghuge1-1/+5
Since this lowering is totally Intel specific, we don't have to introduce the new texture source. We can use the nir_tex_src_backend1 source to pack LOD/LOD Bias and array index into 32 bit single value. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-02intel/compiler/xe2: Emit texture instructions w/ combined LOD and array indexIan Romanick1-0/+17
The extra assertions are just there to help validate pack_lod_and_array_index (in nir_lower_tex.c). v2: Split got_lod_or_bias into two variables. This simplifies some changes that Sagar is working on. Suggested by Sagar. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
2024-01-25intel: Use hardware generated compute shader local invocation IDsKenneth Graunke1-6/+7
Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>
2024-01-24intel/fs: Fix shift counts for 8- and 16-bit typesIan Romanick1-30/+29
With regards to implicit masking of the shift counts for 8- and 16-bit types, the PRMs are incorrect. They falsely state that on Gen9+ only the low bits of src1 matching the size of src0 (e.g., 4-bits for W or UW src0) are used. The Bspec (backed by data from experimentation) state that 0x3f is used for Q and UQ types, and 0x1f is used for **all** other types. To match the behavior expected for the NIR opcodes, explicit masks for 8- and 16-bit types must be added. This fixes (the updated version, see crucible!138) of func.shader.shift.int16_t on all Intel platforms. According to Karol, this also fixes "integer_ops integer_rotate" tests in OpenCL CTS. No shader-db or fossil-db changes on any Intel platform. Tested-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23001>
2024-01-22intel/fs: Track instance id in gs_thread_payloadSagar Ghuge1-5/+1
This change moves the instance id gs_thread_payload constructor and lowering code will simply use that. Also, this change takes the Xe2 register width in consideration that fixes a couple of tests involving geometry shaders with gl_InvocationID on Xe2. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26960>
2024-01-12intel/compiler/xe2: Fix for the removal of most predication modes.Francisco Jerez1-18/+25
Reworks: * Remove changes to fixup_nomask workaround since it applies only for Gfx12 family. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26860>
2024-01-04intel/compiler: reemit boolean resolve for inverted if on gen5Dave Airlie1-0/+11
Gen5 adds some boolean conversion instructions after nir emits, but that nir srcs don't line up with them, so reemit the boolean conversion if we reemit the inot. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 31b5f5a51f3a ("nir/opt_if: Simplify if's with general conditions") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26782>
2023-12-29anv: Set COMPUTE_WALKER systolic mode enable flagIan Romanick1-0/+1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29intel/fs: nir: Add nir_intrinsic_dpas_intelIan Romanick1-0/+59
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion. v3: Fix float16 destination DPAS on DG2. v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. v5: Rebase on !26323. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-28intel/fs/xe2+: Update for new layout of vertex setup data in PS payload.Francisco Jerez1-4/+21
The interpolation deltas of PS inputs now show up as a 12B vec3 (A0, A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B format with an unused component. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28intel/fs/xe2+: Update poly info PS payload for new multi-polygon dispatch ↵Francisco Jerez1-3/+59
format. This includes the render target array index, viewport index, and front/back facing fields, which are now replicated per pair of subspans in order to support fixed-layout multi-polygon PS dispatch. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28intel/fs/xe2+: Update location of sample ID fields in PS payload.Francisco Jerez1-2/+7
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28intel/fs/xe2+: Update uses of pixel/sample mask from PS thread payload.Francisco Jerez1-3/+8
Note from Caio: proper handling of brw_sample_mask_reg will appear in later patches. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-22intel/fs/gfx12: Implement multi-polygon format of render target array index ↵Francisco Jerez1-1/+20
in PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22intel/fs/gfx12: Implement multi-polygon format of back/front-facing flag in ↵Francisco Jerez1-2/+43
PS payload. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22intel/fs: Pass builder to per_primitive_reg().Francisco Jerez1-1/+1
Matches prototype of interp_reg(), will be useful in a subsequent commit. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22intel/fs: Provide component index explicitly to interp_reg().Francisco Jerez1-9/+9
Main motivation is that for multipolygon PS shaders the i-th plane parameter for the j-th input attribute will no longer necessarily be a scalar, since different channels may be processing different polygons with different input plane parameters, so simply taking a component() of the result of interp_reg() will no longer work. Instead of duplicating the multipolygon handling logic in every caller of interp_reg(), fold the component() call into interp_reg() so we can replace it with multipolygon-correct code more easily. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22intel/fs: Map all TES input attributes to ATTR register number 0.Francisco Jerez1-5/+4
Instead of treating fs_reg::nr as an offset for ATTR registers simply consider different indices as denoting disjoint spaces that can never be accessed simultaneously by a single region. From now on geometry stages will just use ATTR #0 for everything and select specific attributes via offset() with the native dispatch width of the program, which should work on current platforms as well as on Xe2+. See "intel/fs: Map all GS input attributes to ATTR register number 0." for the rationale. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22intel/fs: Map all VS input attributes to ATTR register number 0.Francisco Jerez1-3/+4
Instead of treating fs_reg::nr as an offset for ATTR registers simply consider different indices as denoting disjoint spaces that can never be accessed simultaneously by a single region. From now on geometry stages will just use ATTR #0 for everything and select specific attributes via offset() with the native dispatch width of the program, which should work on current platforms as well as on Xe2+. See "intel/fs: Map all GS input attributes to ATTR register number 0." for the rationale. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>