path: root/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
Age | Commit message | Author | Files | Lines
2020-09-01 | nv50/ir: fix cas lowering for 64 bit | Karol Herbst | 1 | -2/+3
Signed-off-by: Karol Herbst <> Part-of: <>
2020-08-25 | nv50/ir: add nv50_ir_prog_info_out | Karol Herbst | 1 | -3/+3
Split out the output relevant fields from the nv50_ir_prog_info struct in order to have a cleaner separation between the input and output of the compilation. Signed-off-by: Karol Herbst <> Part-of: <>
2020-07-10 | nv50: Clear nv50_ir_prog_info of dead and codegen specific variables | mmenzyns | 1 | -1/+1
These variables are either not used in the code, only assigned but never accessed, or only used inside codegen. This patch will also precede the shader cache, and these variables are useless to cache. Removing or moving them should make things clearer, since nothing left in the structure then goes uncached. Shader cache patch: Signed-off-by: Mark Menzynski <> Reviewed-by: Karol Herbst <> Part-of: <>
2020-06-22 | gv100/ir: fix atom cas | Karol Herbst | 1 | -1/+2
Signed-off-by: Karol Herbst <> Reviewed-by: Ben Skeggs <> Part-of: <>
2020-06-10 | nvir/gv100: initial support | Ben Skeggs | 1 | -0/+2
v2:
- add TargetGV100::isBarrierRequired() for OP_BREV
- use NV50_IR_SUBOP_LOP3_LUT() convenience macro where it makes sense
- separated out nir_lower_idiv into its own commit
- make use of the shared function to generate compiler options
- disable lower_fpow, nir's lowering is broken
v3:
- use replaceCvt() instead of custom NEG/ABS/SAT lowering
v4:
- remove WAR from peephole, not needed now we're using replaceCvt()
Signed-off-by: Ben Skeggs <> Acked-by: Karol Herbst <> Part-of: <>
2020-06-10 | nvir: run replaceZero() before replaceCvt() | Ben Skeggs | 1 | -3/+3
replaceCvt() will miss some cases otherwise. Signed-off-by: Ben Skeggs <> Reviewed-by: Karol Herbst <> Part-of: <>
2020-06-10 | nvir: introduce OP_BREV with lowering to EXTBF_REV for current GPUs | Ben Skeggs | 1 | -0/+11
SM70 has this instruction, but no BFE. Signed-off-by: Ben Skeggs <> Reviewed-by: Karol Herbst <> Part-of: <>
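As a rough illustration of what lowering a bit reverse onto a reversing bitfield extract means, here is a minimal Python model. The names `extbf_rev` and `brev32`, and the "reverse first, then extract" ordering, are assumptions made for this sketch and are not taken from the Mesa source. For a full-width extract the ordering does not matter.

```python
def extbf_rev(value, offset, width):
    """Hypothetical model: reverse the 32 bits of value, then extract a bitfield."""
    value &= 0xFFFFFFFF
    # Reverse by formatting as a 32-character binary string and flipping it.
    reversed_bits = int(f"{value:032b}"[::-1], 2)
    return (reversed_bits >> offset) & ((1 << width) - 1)


def brev32(value):
    """A full 32-bit bit reverse is a reversing extract at offset 0, width 32."""
    return extbf_rev(value, 0, 32)
```

Applying `brev32` twice returns the original value, which is a quick sanity check for any implementation of the operation.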
2019-12-11 | nv50/ir: implement global atomics and handle it for nir | Karol Herbst | 1 | -0/+2
TGSI doesn't have any concept of global memory right now. Signed-off-by: Karol Herbst <> Acked-by: Dave Airlie <>
2019-10-30 | gm107/ir: fix loading z offset for layered 3d image bindings | Ilia Mirkin | 1 | -50/+158
Unfortunately we don't know if a particular load is a real 2d image (as would be a cube face or 2d array element), or a layer of a 3d image. Since we pass in the TIC reference, the instruction's type has to match what's in the TIC (experimentally). In order to properly support bindless images, this also can't be done by looking at the current bindings and generating appropriate code. As a result all plain 2d loads are converted into a pair of 2d/3d loads, with appropriate predicates to ensure only one of those actually executes, and the values are all merged in. This goes somewhat against the current flow, so for GM107 we do the OOB handling directly in the surface processing logic. Perhaps the other gens should do something similar, but that is left to another change. This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS tests like shader_image_load_store.non-layered_binding without breaking anything else. Signed-off-by: Ilia Mirkin <> Cc: "20.0" <>
2019-07-23 | nvc0/ir: Fix assert accessing null pointer | Mark Menzynski | 1 | -1/+1
Bugzilla: Bugzilla: Signed-off-by: Mark Menzynski <> Reviewed-by: Ilia Mirkin <> Reviewed-by: Tobias Klausmann <>
2019-02-06 | gm107/ir: add fp64 rsq | Karol Herbst | 1 | -1/+1
Acked-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-06 | gm107/ir: add fp64 rcp | Karol Herbst | 1 | -1/+1
Acked-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-06 | gk104/ir: Use the new rcp/rsq in library | Karol Herbst | 1 | -1/+1
[imirkin: add a few more "long" prefixes to safen things up] Acked-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-06 | gk110/ir: Use the new rcp/rsq in library | Boyan Ding | 1 | -0/+38
v2: (Karol Herbst <>)
* fix Value setup for the builtins
Signed-off-by: Boyan Ding <> [imirkin: track the fp64 flag when switching ops to calls] Signed-off-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-06 | nvc0: fix 3d images on kepler | Ilia Mirkin | 1 | -20/+26
Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d tiling, they just need the correct inputs. Supply them. We also have to deal with the case where a 2d "layer" of a 3d image is bound. In this case, we supply the z coordinate separately to the shader, which has to optionally treat every 2d case as if it could be a slice of a 3d texture. Signed-off-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-06 | nvc0/ir: fix second tex argument after levelZero optimization | Ilia Mirkin | 1 | -16/+0
We used to pre-set a bunch of extra arguments to a texture instruction in order to force the RA to allocate a register at the boundary of 4. However with the levelZero optimization, which removes a LOD argument when it's uniformly equal to zero, we undid that logic by removing an extra argument. As a result, we could end up with insufficient alignment on the second wide texture argument. Instead we switch to a different method of achieving the same result. The logic runs during the constraint analysis of the RA, and adds unset sources as necessary right before being merged into a wide argument. Fixes MISALIGNED_REG errors in Hitman when run with bindless textures enabled on a GK208. Fixes: 9145873b152 ("nvc0/ir: use levelZero flag when the lod is set to 0") Signed-off-by: Ilia Mirkin <> Cc: 19.0 <>
2019-02-05 | nvc0/ir: replace cvt instructions with add to improve shader performance | Karol Herbst | 1 | -0/+63
Gives a performance boost of 0.2% in pixmark_piano on my gk106, gm204 and gp107, and reduces the number of generated convert instructions by roughly 30% in shader-db.
v2: only for 32 bit operations; move some common code out of the switch; handle OP_SAT with modifiers
v3: only for registers and const memory; rework if clauses; merge isCvt into this patch
v4: merge isCvt into its use
Signed-off-by: Karol Herbst <> Reviewed-by: Ilia Mirkin <>
2018-09-23 | nv50/ir: fix link-time build failure | Rhys Perry | 1 | -1/+1
Seems this fixes linking problems that occur in some situations. Signed-off-by: Rhys Perry <> Reviewed-by: Ilia Mirkin <>
2018-09-22 | nvc0: fix bindless multisampled images on Maxwell+ | Rhys Perry | 1 | -2/+41
NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small anyway. With these changes, TXQ is used to determine the number of samples and the coordinate adjustment information looked up in a small array in the driver constant buffer.
v2: rework to use TXQ and a small array instead of a larger array with an entry for each texture
v3: get rid of the small array and calculate the adjustments in the shader
Signed-off-by: Rhys Perry <> Fixes: c2ae9b40527 ('nvc0: implement multisampled images on Maxwell+') Reviewed-by: Ilia Mirkin <>
2018-08-27 | nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+ | Rhys Perry | 1 | -10/+8
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)
          local  shared     gpr    inst   bytes
helped        1       0     152     274     274
hurt          0       0       0       0       0
Signed-off-by: Rhys Perry <> Reviewed-by: Karol Herbst <>
2018-08-04 | nvc0/ir: return 0 in imageLoad on incomplete textures | Karol Herbst | 1 | -3/+30
We already guarded all OP_SULDP against out of bound accesses, but we ended up just reusing whatever value was stored in the dest registers. Fixes CTS test shader_image_load_store.incomplete_textures
v2: fix for loads not ending up with predicates (bindless_texture)
v3: fix replacing the def
Cc: <> Reviewed-by: Ilia Mirkin <> Signed-off-by: Karol Herbst <>
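The difference between guarding a load and also defining its result can be shown with a small Python model. This is only a sketch of the idea: `guarded_load_stale` and `guarded_load_zeroed` are hypothetical names, and the image is modeled as a plain list rather than a surface.

```python
def guarded_load_stale(image, coord, dest):
    """Predicated load that skips the access out of bounds but leaves dest as it was."""
    if 0 <= coord < len(image):
        return image[coord]
    # The load did not execute, so the old register contents leak through.
    return dest


def guarded_load_zeroed(image, coord):
    """Predicated load that explicitly yields 0 when the access is skipped."""
    if 0 <= coord < len(image):
        return image[coord]
    return 0
```

With `image = [7, 8]` and a stale destination of 99, an out-of-bounds read returns 99 in the first variant and 0 in the second, which is the behaviour the commit message describes.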
2018-08-04 | gm200/ir: add native OP_SQRT support | Karol Herbst | 1 | -0/+3
./GpuTest /test=pixmark_piano 1024x640 30sec: 301 -> 327 points
shader-db:
total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
total gprs used in shared programs    : 647530 -> 647522 (-0.00%)
total shared used in shared programs  : 389120 -> 389120 (0.00%)
total local used in shared programs   : 21064 -> 21064 (0.00%)
total bytes used in shared programs   : 58459304 -> 58288696 (-0.29%)
          local  shared     gpr    inst   bytes
helped        0       0      27    8281    8281
hurt          0       0      21     431     431
v2: use NVISA_GM200_CHIPSET
Reviewed-by: Ilia Mirkin <> Signed-off-by: Karol Herbst <>
2018-07-07 | nvc0/ir: use the combined tid special register | Rhys Perry | 1 | -0/+12
total instructions in shared programs : 5804448 -> 5804690 (0.00%)
total gprs used in shared programs    : 670065 -> 670065 (0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21068 (0.00%)
          local  shared     gpr    inst   bytes
helped        0       0       0       5       5
hurt          0       0       0     191     191
Signed-off-by: Rhys Perry <> Reviewed-by: Karol Herbst <>
2018-07-04 | nvc0: implement multisampled images on Maxwell+ | Rhys Perry | 1 | -29/+2
Changes in v2:
- make loadSuInfo32() protected without making the rest protected
- move NVC0_SU_INFO_* into nv50_ir_lowering_nvc0.h instead of duplicating NVC0_SU_INFO_MS
Signed-off-by: Rhys Perry <> Reviewed-by: Karol Herbst <>
2018-06-14 | nvc0: add support for programmable sample locations | Rhys Perry | 1 | -10/+92
Signed-off-by: Rhys Perry <>
2018-02-21 | nvir/nvc0: fix legalizing of ld unlock c0[0x10000] | Karol Herbst | 1 | -1/+1
We have to increase the file index also for 0x10000 not just for values greater than 0x10000. Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812 Signed-off-by: Karol Herbst <> Reviewed-by: Ilia Mirkin <>
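The boundary condition can be sketched in Python as follows. The function names and the assumption that each constant buffer file covers a 0x10000-byte window are illustrative only, inferred from the offset in the commit subject rather than copied from the driver.

```python
WINDOW = 0x10000  # assumed size of the range addressable through one file index


def legalize_const_ref(file_index, offset):
    """Carry whole windows out of the offset into the file index (fixed form)."""
    if offset >= WINDOW:  # the boundary value itself must also be carried
        file_index += offset >> 16
        offset &= 0xFFFF
    return file_index, offset


def legalize_const_ref_buggy(file_index, offset):
    """Same logic with a strict comparison, which misses offset == WINDOW."""
    if offset > WINDOW:
        file_index += offset >> 16
        offset &= 0xFFFF
    return file_index, offset
```

For `c0[0x10000]` the fixed form yields file index 1 with offset 0, while the strict comparison leaves the reference unchanged with an offset that no longer fits the window.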
2018-02-17 | nvc0: add support for bindless on maxwell+ | Ilia Mirkin | 1 | -9/+23
Signed-off-by: Ilia Mirkin <>
2018-01-07 | nvc0: add bindless image support for kepler | Ilia Mirkin | 1 | -26/+31
A part of the driver constbuf area is allocated for bindless images. Any update requires uploading to all driver constbufs. This also extends the driver constbuf to 64KB, up from 2KB. Signed-off-by: Ilia Mirkin <>
2018-01-07 | nvc0: add support for bindless textures on kepler+ | Ilia Mirkin | 1 | -4/+6
This keeps a list of resident textures (per context), and dumps that list into the active buffer list when submitting. We also treat bindless texture fetches slightly differently, wrt the meaning of indirect, and not requiring the SAMPLER file to be used. Signed-off-by: Ilia Mirkin <>
2018-01-07 | nvc0/ir: safen up lowering logic against overwriting reused values | Ilia Mirkin | 1 | -2/+4
I'm fairly sure both of the changed sites are OK as-is, but they're fragile, so this is just safening them up. Since this is happening pre-ssa, we don't want to be overwriting values that may potentially get used later on. Signed-off-by: Ilia Mirkin <>
2017-12-19 | nvc0/ir: change textureGrad to always use lane 0 as the tex origin | Ilia Mirkin | 1 | -14/+46
Thanks to Karol Herbst for the debugging / tracing work that led to this change. Move to using lane 0 as the "work" lane for the texture. It is unclear why this helps, as that computation should be identical to doing it in the "correct" lane with the properly adjusted quadops. In order to be able to use the lane 0 result, we also have to ensure that lane 0 contains the proper array/indirect/shadow values. This applies to Fermi and Kepler. Maxwell+ may or may not need fixing, but that lowering logic is separate. Fixes KHR-GL45.texture_cube_map_array.sampling Signed-off-by: Ilia Mirkin <>
2017-12-04 | nvc0/ir: Properly lower 64-bit shifts when the shift value is >32 | Pierre Moreau | 1 | -1/+1
Fixes: 61d7676df77 "nvc0/ir: add support for 64-bit shift lowering on SM20/SM30"
Fixes fs-shift-scalar-by-scalar.shader_test from piglit for the current set-up:
uniform int64_t ival -0x7dfcfefbdf6536ff # bit pattern: 0x82030104209ac901
uniform uint64_t uval 0x1400000085010203
uniform int shl 36
uniform int shr 36
uniform int64_t iexpected_shl 0x09ac901000000000
uniform int64_t iexpected_shr -0x7dfcff0 # bit pattern: 0xfffffffff8203010
uniform uint64_t uexpected_shl 0x5010203000000000
uniform uint64_t uexpected_shr 0x0000000001400000
draw rect ortho 12 0 4 4
Signed-off-by: Pierre Moreau <> Reviewed-by: Ilia Mirkin <>
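To make the failing case concrete, here is a minimal Python sketch of a 64-bit shift built from 32-bit halves, with a separate path for shift amounts of 32 or more. It is a generic model of the technique, not the instruction sequence Mesa emits, and the helper names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit pattern into (lo, hi) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(lo, hi):
    """Recombine (lo, hi) 32-bit halves into a 64-bit pattern."""
    return (hi << 32) | lo


def shl64(lo, hi, n):
    """Logical left shift of the pair by n, for 0 <= n < 64."""
    if n == 0:
        return lo, hi
    if n >= 32:
        # Every surviving bit comes from the low half.
        return 0, (lo << (n - 32)) & MASK32
    return (lo << n) & MASK32, ((hi << n) | (lo >> (32 - n))) & MASK32


def shr64(lo, hi, n, signed=False):
    """Right shift of the pair by n, for 0 <= n < 64; arithmetic if signed."""
    fill = MASK32 if signed and (hi & 0x80000000) else 0
    if n == 0:
        return lo, hi
    if n >= 32:
        # The low result comes from the high half; the high result is sign fill.
        k = n - 32
        return ((hi >> k) | (fill << (32 - k))) & MASK32, fill
    new_lo = ((lo >> n) | (hi << (32 - n))) & MASK32
    new_hi = ((hi >> n) | (fill << (32 - n))) & MASK32
    return new_lo, new_hi
```

Feeding in the piglit vectors from the commit message, `join64(*shl64(*split64(0x1400000085010203), 36))` gives 0x5010203000000000 and `join64(*shr64(*split64(0x82030104209ac901), 36, signed=True))` gives 0xfffffffff8203010, matching the expected values listed above.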
2017-08-31 | nvc0/ir: propagate immediates to CALL input MOVs | Tobias Klausmann | 1 | -2/+19
When using builtin functions we have to move the inputs to registers $0 and $1. If one of the input values is an immediate, we fail to propagate it:
...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case:
...
mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs    : 582972 -> 582881 (-0.02%)
total local used in shared programs   : 17960 -> 17960 (0.00%)
          local     gpr    inst   bytes
helped        0      91     112     112
hurt          0       0       0       0
v2: implement some changes proposed by imirkin. The manual deletion of the dead mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call to division function"), as the potentially dead mov is unlinked properly, causing later passes to not notice the mov op at all and thus not clean it up. That makes up a big chunk of the regression the above commit caused. Keep the deletion of the op where it is; deleting it later unnecessarily blows up the size of the change.
Signed-off-by: Tobias Klausmann <> Reviewed-by: Ilia Mirkin <>
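The transformation in the two listings can be modeled with a toy peephole pass in Python. This sketch assumes a simplified instruction form `(op, dst, src)`, with integers standing for immediates and strings for registers, and assumes each register is defined once before it is used. None of these names or structures come from the nv50 IR.

```python
def propagate_call_immediates(instrs, call_inputs=("$r0", "$r1")):
    """Forward immediates into MOVs that set up call inputs; drop dead MOVs."""
    # Map each register to the index of the MOV that loads an immediate into it.
    imm_defs = {}
    for idx, (op, dst, src) in enumerate(instrs):
        if op == "mov" and isinstance(src, int):
            imm_defs[dst] = idx

    # Rewrite call-input MOVs whose source register holds a known immediate.
    out = list(instrs)
    for idx, (op, dst, src) in enumerate(out):
        if op == "mov" and dst in call_inputs and src in imm_defs:
            out[idx] = (op, dst, instrs[imm_defs[src]][2])

    # Remove immediate MOVs whose destination no longer has any reader.
    used = {src for (_, _, src) in out if isinstance(src, str)}
    dead = {idx for reg, idx in imm_defs.items() if reg not in used}
    return [ins for idx, ins in enumerate(out) if idx not in dead]
```

Run on the five-instruction sequence from the commit message (with the call written as `("call", None, None)`), the pass rewrites the `$r1` setup to load 3 directly and deletes the now-unused `mov` into `$r477`. The explicit deletion mirrors the manual cleanup the v2 note says is needed once the values are unlinked.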
2017-08-12 | nvc0/ir: unlink values pre- and post-call to division function | Ilia Mirkin | 1 | -4/+3
While technically correct, this can lead to e.g. getImmediate assuming that it can walk up the value chain. It could be fixed to not do this, but it seems easier and less error-prone to just not link the two values to save on one LValue object. Signed-off-by: Ilia Mirkin <>
2017-05-20 | nvc0/ir: SHLADD's middle source must be an immediate | Ilia Mirkin | 1 | -0/+2
The instruction encodings only allow for immediates. Don't try to replace a zero (which is dumb to have in that op in any case) with RZ. Signed-off-by: Ilia Mirkin <> Cc:
2017-02-09 | nvc0/ir: fix ubo max clamp, reset file index | Ilia Mirkin | 1 | -1/+3
We just increased the max UBO, so we should also increase the clamp that we do for robustness. Similarly, as we're including the fileIndex in the new indirect value, we should reset fileIndex to 0 so that it is not added in a second time. Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <> Cc:
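The double-counting hazard described here can be illustrated with a short Python model. The functions `fold_file_index` and `effective_slot` and the `max_slot` parameter are hypothetical and only stand in for the idea of a static file index plus a clamped indirect value.

```python
def effective_slot(file_index, indirect):
    """Slot that ends up addressed: static file index plus the indirect value."""
    return file_index + indirect


def fold_file_index(file_index, indirect, max_slot):
    """Fold the static index into the indirect value, clamp it, and reset the index."""
    combined = min(file_index + indirect, max_slot)
    # Returning 0 as the new file index keeps it from being added a second time.
    return 0, combined
```

With a file index of 3 and an indirect value of 2, folding gives `(0, 5)` and the effective slot stays 5. Keeping the old index of 3 next to the folded value would address slot 8 instead. An out-of-range indirect value is clamped to `max_slot`, so raising the UBO limit means raising that clamp too.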
2017-02-09 | nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute | Ilia Mirkin | 1 | -25/+22
Kepler and up unfortunately only support up to 8 constbufs. We work around this by loading from constbufs as if they were storage buffers. However we were not consistently applying limits to loads from these buffers. Make sure to do the same thing we do for storage buffers. Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <> Cc:
2017-02-09 | nvc0/ir: make it possible to have the flags def in def0 | Ilia Mirkin | 1 | -1/+1
There's all kinds of logic that doesn't like there being holes in defs or srcs lists. Avoid them. This also fixes the sched logic for maxwell. Signed-off-by: Ilia Mirkin <>
2017-02-09 | nvc0/ir: add support for 64-bit shift lowering on SM20/SM30 | Ilia Mirkin | 1 | -6/+62
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have to do a bit more work to get the job done. Signed-off-by: Ilia Mirkin <>
2017-02-09 | nvc0/ir: add support for all the new int64 tgsi opcodes | Ilia Mirkin | 1 | -1/+66
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that proper optimization might be done.
However this all works and the above can be improved upon later. Signed-off-by: Ilia Mirkin <>
2017-01-16 | nvc0: enable FBFETCH with a special slot for color buffer 0 | Ilia Mirkin | 1 | -4/+16
We don't need to support all the color buffers for advanced blend, just cb0. For Fermi, we use the special binding slots so that we don't overlap with user textures, while Kepler+ gets a dedicated position for the fb handle in the driver constbuf. This logic is only triggered when a FBFETCH is actually present so it should be a no-op most of the time. Signed-off-by: Ilia Mirkin <>
2017-01-12 | nvc0/ir: only try to check for zero LOD if we aren't already forcing it | Ilia Mirkin | 1 | -1/+1
There's a levelZero flag which forces texturing to pick level zero (and not consume an explicit LOD argument). This is set for MS targets, but could also be set for any other incoming instruction. As that is what determines whether a LOD argument is present, check that rather than the more indirect isMS logic. Signed-off-by: Ilia Mirkin <>
2017-01-12 | nv50/ir: do not insert texture barriers on gm107 | Samuel Pitoiset | 1 | -1/+2
It's actually useless to insert those texture barriers post RA because the current control code (ie. st 0x0) will wait for all dependencies before issuing a new instruction. Signed-off-by: Samuel Pitoiset <> Reviewed-by: Ilia Mirkin <> Reviewed-by: Pierre Moreau <>
2016-11-20 | nvc0/ir: use levelZero flag when the lod is set to 0 | Ilia Mirkin | 1 | -6/+42
Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <>
2016-10-19 | nvc0/ir: simplify predicate logic for GK104 atomic operations | Samuel Pitoiset | 1 | -14/+7
The predicate is always CC_NOT_P as defined in processSurfaceCoordsNVE4(), so we only want to emit OR. Signed-off-by: Samuel Pitoiset <> Reviewed-by: Ilia Mirkin <>
2016-10-19 | nvc0/ir: remove useless NVC0LoweringPass::gMemBase | Samuel Pitoiset | 1 | -4/+1
Signed-off-by: Samuel Pitoiset <>
2016-10-18 | gm107/ir: fix texturing with indirect samplers | Ilia Mirkin | 1 | -0/+10
The indirect handle has to come right after the coordinates, so if there was a sample/bias/depth compare/offset, everything would end up being shifted by one argument position. Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <> Cc:
2016-10-12 | nvc0/ir: fix textureGather with a single offset | Ilia Mirkin | 1 | -2/+2
Recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list. Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset") Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <> Cc:
2016-10-10 | nvc0/ir: fix overwriting of value backing non-constant gather offset | Ilia Mirkin | 1 | -2/+2
Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits. This fixes a compilation error observed in F1 2015. Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <> Cc:
2016-09-10 | gm107/ir: allow indirect inputs to be loaded by frag shader | Ilia Mirkin | 1 | -4/+21
Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+. Signed-off-by: Ilia Mirkin <> Reviewed-by: Samuel Pitoiset <>