path: root/src/gallium/drivers/nouveau
Age | Commit message | Author | Files | Lines
2019-01-01 | nv30: disable rendering to 3D textures | Ilia Mirkin | 1 | -0/+6
There's no way to tell the 3D engine about swizzling on such textures. While rendering to NPOT ones may be possible, there's no great way to expose that in gallium, nor would there be any practical benefit. Fixes the non-compressed-format "copyteximage 3D" failures. Something odd going on with the compressed formats. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 | nv30: fix some s3tc layout issues | Ilia Mirkin | 2 | -7/+26
s3tc layouts are a bit finicky - they're packed, but not swizzled. Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
This has been tested to fix xonotic (as well as the s3tc-* piglits) on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 | nv30: use correct helper to get blocks in y direction | Ilia Mirkin | 1 | -1/+1
This doesn't matter since all compressed formats supported by this hardware use square blocks, but best to use the correct helper. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 | nv30: add support for multi-layer transfers | Ilia Mirkin | 1 | -4/+35
This logic mirrors what we do on nv50. The relatively new texture_subdata callback can cause this to happen with 3D textures, which is triggered at least by xonotic, and probably many piglits. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 | nv30: fix rare issue with fp unbinding not finding the bufctx | Ilia Mirkin | 1 | -1/+1
If the last-active context gets deleted, the pushbuf doesn't have a bufctx to reference. Then there could be a sequence of binds which would trigger a reset on that bin before validation was done. Instead we just pass in the bufctx in question directly. All other instances of PUSH_RESET happen strictly after a validation is run. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 | nv30: avoid setting user_priv without setting cur_ctx | Ilia Mirkin | 1 | -3/+1
The whole user_priv thing is a mess, but as long as it's there, it basically has to map 1:1 to the cur_ctx. Unfortunately we were setting user_priv to some context, then that context could get deleted without any draws/validations in it, leading user_priv to become NULL, with cur_ctx still pointing at some old context. Then we wouldn't run the switch logic, which in turn led to a NULL bufctx being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-26 | nv50,nvc0: add missing CAPs for unsupported features | Ilia Mirkin | 2 | -0/+3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-26 | nvc0: enable GL_NV_shader_atomic_float on pre-Maxwell | Ilia Mirkin | 1 | -0/+2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-26 | nv50/ir: add support for converting ATOMFADD to proper ir | Ilia Mirkin | 1 | -0/+4
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-14 | nvc0: always keep TSC slot 0 bound to fix TXF | Ilia Mirkin | 2 | -0/+21
Same as on nv50, the TXF op always uses the TSC bound to slot 0, returning blank values if nothing is bound. An earlier change arranges for the TSC entries list to always have valid data at entry 0, so here we just make use of it. Fixes arb_texture_buffer_object-subdata-sync among others. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-14 | nvc0: replace use of explicit default_tsc with entry 0 | Ilia Mirkin | 6 | -22/+25
This was used for implementing FBFETCH. However that uses TXF, which doesn't do much with a TSC. The only important bit is that sRGB-decoding works as expected, which we can achieve since all samplers we ever generate enable sRGB-decoding. Always point to entry 0 in the TSC table, and ensure that even before it ever gets initialized, the sRGB-decoding enable bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-09 | nv50/ir: fix use-after-free in ConstantFolding::visit | Karol Herbst | 1 | -33/+49
opnd() might delete the passed-in instruction, but it's used through i->srcExists() later in visit.
v2: use continue instead of return
v3: use brackets for the outer if/else chain
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-09 | nouveau: use atomic operations for driver statistics | Karol Herbst | 1 | -3/+4
multiple threads can write to those at the same time Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
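As a rough illustration of why concurrent writers call for atomic updates, the sketch below uses C11 `<stdatomic.h>`. The struct, field names and helper functions are hypothetical and are not taken from the driver, which may well use different atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics block; the field names are illustrative only. */
struct drv_stats {
   atomic_uint_fast64_t tex_upload_count;
   atomic_uint_fast64_t bytes_uploaded;
};

/* Atomic read-modify-write: two threads adding at the same time cannot
 * lose an update, unlike a plain "counter += value". */
static inline void
drv_stats_add(atomic_uint_fast64_t *counter, uint64_t value)
{
   assert(counter != NULL);
   atomic_fetch_add_explicit(counter, value, memory_order_relaxed);
}

/* Relaxed load: good enough for statistics that order nothing else. */
static inline uint64_t
drv_stats_read(atomic_uint_fast64_t *counter)
{
   assert(counter != NULL);
   return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Relaxed ordering is chosen here on the assumption that the counters are only ever read for reporting and never used to synchronize other data.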
2018-12-09 | nv50/ir: initialize relDegree staticly | Karol Herbst | 1 | -7/+16
this race condition is pretty harmless, but also pretty trivial to fix Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-03 | nouveau: set texture upload budget | Ilia Mirkin | 3 | -3/+6
It doesn't seem like the exact number has too much effect on the performance in "teximage". However setting it to just about anything prevents some OOMs from getting hit. These values are not well-tuned, but don't seem too bad. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-03 | nv50,nvc0: add explicit handling of PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET | Ilia Mirkin | 2 | -0/+4
Since the max attrib stride is 2048, the max src offset makes sense as 2047. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-03 | nv50: always keep TSC slot 0 bound | Ilia Mirkin | 3 | -0/+31
All TXF operations implicitly use sampler 0, and fail if it's not bound to anything. This does not happen in LINKED_TSC mode, but we don't currently use this. We ensure that TSC entry at id 0 has the SRGB conversion bit enabled (and all samplers we normally generate will too). Then when the TSC at *slot* 0 (not to be confused with entry 0 in the global TSC table) is unbound, we bind it to entry 0. This way, TXF operations are not dependent on there being a regular sampler bound there. Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are particularly susceptible to this as they don't bind a sampler.) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
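The slot-versus-entry distinction described above can be sketched as follows. This is only a model of the fallback rule, not the driver's code: the struct layout, the slot count and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLER_SLOTS 16   /* hypothetical per-stage slot count */

/* A sampler object; "id" is its index in the global TSC table. */
struct tsc_entry {
   int id;
};

struct stage_tsc_state {
   const struct tsc_entry *slots[MAX_SAMPLER_SLOTS]; /* NULL = unbound */
};

/* Returns the global TSC table id to program for a slot, or -1 when the
 * slot should stay unbound. Slot 0 never stays unbound: it falls back to
 * global entry 0, which is assumed to have sRGB decoding enabled. */
static int
tsc_id_for_slot(const struct stage_tsc_state *st, unsigned slot)
{
   assert(slot < MAX_SAMPLER_SLOTS);
   if (st->slots[slot] != NULL)
      return st->slots[slot]->id;
   if (slot == 0)
      return 0;
   return -1;
}
```

With this rule, a TXF that implicitly reads slot 0 always finds a valid entry, even when the application bound no sampler there (as is the case for texture buffer objects).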
2018-12-02 | nv50,nvc0: Fix gallium nine regression regarding sampler bindings | Karol Herbst | 2 | -16/+12
The new approach is that samplers don't get unbound even if they won't be used in a draw and we should just leave them be as well. Fixes a regression in multiple Windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8 "cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
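One way to read the v2 note is that num_samplers becomes "highest bound slot plus one" rather than "number of states passed in the last bind call". The sketch below models that reading; the struct, the slot limit and the function are hypothetical and do not reproduce the driver's bind path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS 16   /* hypothetical slot limit */

struct stage_samplers {
   const void *bound[MAX_SAMPLERS]; /* NULL = nothing bound */
   unsigned num_samplers;           /* highest bound slot + 1 */
};

/* Bind "count" states starting at "start". Slots outside the range are
 * left untouched, so a shorter bind does not implicitly unbind the
 * higher slots. A NULL "states" array unbinds the given range. */
static void
bind_samplers(struct stage_samplers *s, unsigned start, unsigned count,
              const void *const *states)
{
   unsigned i, highest = 0;

   assert(start + count <= MAX_SAMPLERS);
   for (i = 0; i < count; i++)
      s->bound[start + i] = states ? states[i] : NULL;

   /* Recompute from the whole array rather than from "count". */
   for (i = 0; i < MAX_SAMPLERS; i++)
      if (s->bound[i] != NULL)
         highest = i + 1;
   s->num_samplers = highest;
}
```

For example, binding two samplers and then rebinding only slot 0 leaves num_samplers at 2, because slot 1 is still bound.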
2018-11-24 | nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations | Ilia Mirkin | 1 | -0/+3
dnz flag only applies for multiplications (e.g. to make 0 * Infinity become 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz flag no longer makes sense, and upsets the GM107 emitter (since it looks at the ftz and dnz flags together). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>
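The effect of the flag on a multiply can be modelled in plain C. This is a simplified model based only on the example in the message (0 * Infinity giving 0 instead of NaN); the function names are hypothetical, and details such as the sign of the zero result or NaN operands are not modelled.

```c
#include <assert.h>
#include <math.h>

/* Plain IEEE 754 multiply: 0 * Infinity is NaN. */
static float
mul_ieee(float a, float b)
{
   return a * b;
}

/* Simplified "dnz" model: a zero operand forces a zero result. */
static float
mul_dnz(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}
```

An addition has no such special case to override, which is why the flag carries no meaning once the multiply has been optimized away.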
2018-11-16 | nv50/ir/ra: enforce max register requirement, and change spill order | Ilia Mirkin | 4 | -16/+26
On nv50, certain operations must happen on regs below 64, due to encoding requirements. First of all, we add infrastructure to enforce this. Secondly we change the spill order to first spill RIG nodes that are unconstrained, followed by ones that are. This makes the gamecube logo shadertoy compile properly. Curiously, if we adjust the spill order so that we first spill the constrained RIG nodes instead, the RA also succeeds. However it seems more logical to first spill the unconstrained ones. While we're at it, drop the nv50 max register to reserve r127 as the zero register of last resort (r63 is preferred). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Karol Herbst <kherbst@redhat.com>
2018-11-16 | nv50/ir/ra: improve condition for short regs, unify with cond for 16-bit | Ilia Mirkin | 1 | -7/+7
Instead of the size restriction existing in two places, and potentially being applied twice, we move this together. Ops with 16-bit register addresses can only take a short reg, and ops with immediates can only take a short reg. Of course we leave the immediate 0 in place since we know that it will be replaced by r63/r127 down the line, so don't treat zeroes as an immediate. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-11-16 | nv50/ir: delete MINMAX instruction that is no longer in the BB | Ilia Mirkin | 1 | -1/+1
We removed the op from the BB, but it was still listed in its sources' uses. This could trip up some logic down the line which analyzes all the uses of an l-value, e.g. spilling. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-11-07 | gm107/ir: fix compile time warning in getTEXSMask | Karol Herbst | 1 | -0/+1
In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)': warning: control reaches end of non-void function [-Wreturn-type] Reported-by: Moiman@freenode Fixes: f821e80213e38e93f96255b3deacb737a600ed40 "gm107/ir: use scalar tex instructions where possible" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-11-06 | gm107/ir: use scalar tex instructions where possible | Karol Herbst | 2 | -3/+317
TEXS, TLD4 and TLD4S are variants of tex instructions which are more scalar, which gives RA more freedom and is less likely to insert silly MOVs to satisfy quad registers.
shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs    : 803620 -> 798045 (-0.69%)
total shared used in shared programs  : 639636 -> 639636 (0.00%)
total local used in shared programs   : 24648 -> 24648 (0.00%)
total bytes used in shared programs   : 82103400 -> 81330696 (-0.94%)
          local  shared  gpr   inst   bytes
helped    0      0       3648  10647  10647
hurt      0      0       464   205    205
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-11-06 | nv50/ir: add scalar field to TexInstructions | Karol Herbst | 2 | -1/+6
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-11-06 | nv50/ra: add condenseDef overloads for partial condenses | Karol Herbst | 1 | -8/+21
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-11-06 | nv50/ir: print color masks of tex instructions | Karol Herbst | 1 | -4/+33
v2: print the mask for TXG as well
    make the mask to be printed more mask like
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-10-30 | nouveau: remove unused class member | Eric Engestrom | 1 | -1/+0
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-10-26 | util: Change remaining uint32 cache ids to sha1 | David McFarland | 1 | -14/+15
After discussion with Timothy Arceri. disk_cache_get_function_identifier was using only the first byte of the sha1 build-id. Replace disk_cache_get_function_identifier with implementation from radv_get_build_id. Instead of writing a uint32_t it now writes to a mesa_sha1. All drivers using disk_cache_get_function_identifier are updated accordingly. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")
2018-10-25 | nvc0: increase NOUVEAU_TRANSFER_PUSHBUF_THRESHOLD to 1024 on Kepler+ | Rhys Perry | 4 | -3/+11
Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82% FPS improvement with Dirt Rally on my GTX 1060. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-10-20 | nv50/ir: fix ConstantFolding::createMul for 64 bit muls | Karol Herbst | 1 | -1/+1
Fixes: 2f52925f5c60c72c9389bfdc122c3d5f8e15b25f "nv50/ir: move a * b -> a << log2(b) code into createMul()" Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>
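The rewrite named in the Fixes tag, a * b -> a << log2(b), only holds when b is a power of two, and for 64-bit multiplies the test and the shift both have to be done at 64-bit width. The commit message does not say what exactly was wrong, so the sketch below only illustrates the rewrite itself with 64-bit operands; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If b is a power of two, store log2(b) in *shift and return 1;
 * otherwise return 0. All arithmetic is done on the full 64 bits. */
static int
log2_if_pow2(uint64_t b, unsigned *shift)
{
   unsigned s = 0;

   assert(shift != NULL);
   if (b == 0 || (b & (b - 1)) != 0)
      return 0;
   while ((b >> s) != 1)
      s++;
   *shift = s;
   return 1;
}

/* Multiply that uses a shift when the second operand allows it. */
static uint64_t
mul_u64(uint64_t a, uint64_t b)
{
   unsigned shift;

   if (log2_if_pow2(b, &shift))
      return a << shift;
   return a * b;
}
```

Truncating b to 32 bits before the power-of-two test would misclassify values such as 1 << 40, which is the kind of hazard a 64-bit path has to avoid.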
2018-10-09 | nvc0: fix blitting red to srgb8_alpha | Ilia Mirkin | 1 | -0/+4
For some reason the 2d engine can't handle this. Red formats get special treatment there, so perhaps related. Fixes dEQP-GLES3 tests of the form: dEQP-GLES3.functional.fbo.blit.conversion.r{8,16f,32f}_to_srgb8_alpha8 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org
2018-10-09 | nv50,nvc0: guard against zero-size blits | Ilia Mirkin | 2 | -0/+14
The current state tracker can generate these sometimes. Fixing this is more involved, and due to some integer math we can generate divisions-by-zero. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org
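A guard of this kind can be sketched as an early return before any scale factor is computed. The box struct, the 16.16 fixed-point scale and the function names below are assumptions made for illustration and are not the driver's blit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blit rectangle. */
struct blit_box {
   int x, y, width, height;
};

/* True when either rectangle covers no pixels. */
static bool
blit_is_empty(const struct blit_box *src, const struct blit_box *dst)
{
   return src->width == 0 || src->height == 0 ||
          dst->width == 0 || dst->height == 0;
}

/* Computes the horizontal scale in 16.16 fixed point. Returns false,
 * leaving *scale_fp untouched, when the blit is degenerate, so the
 * division below can never see a zero divisor. */
static bool
blit_scale_x(const struct blit_box *src, const struct blit_box *dst,
             int *scale_fp)
{
   assert(scale_fp != NULL);
   if (blit_is_empty(src, dst))
      return false;
   *scale_fp = (int)((long long)src->width * 65536 / dst->width);
   return true;
}
```

Rejecting the blit up front keeps the check in one place instead of scattering zero tests around each division.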
2018-10-09 | nv50,nvc0: mark RGBX_UINT formats as renderable | Ilia Mirkin | 1 | -4/+4
This helps st/mesa avoid some (apparently) buggy fallbacks. Specifically the CopyTexSubImage fallback tries to read texture A as RGBA_FLOAT and write back that data into the target format, which fails for integer formats which have no appropriate logic to do the conversion. Since integer formats don't blend, there's no harm in the fact that the "A" component gets written anyways. Fixes, among others: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/canvas/tex-2d-rgb8ui-rgb_integer-unsigned_byte.html Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org
2018-10-03 | nouveau: use build-id when available for disk cache | Timothy Arceri | 1 | -7/+7
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-09-23 | nv50/ir: fix link-time build failure | Rhys Perry | 1 | -1/+1
Seems this fixes linking problems that occur in some situations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-09-22 | nvc0: fix bindless multisampled images on Maxwell+ | Rhys Perry | 3 | -5/+45
NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small anyway. With these changes, TXQ is used to determine the number of samples and the coordinate adjustment information is looked up in a small array in the driver constant buffer.
v2: rework to use TXQ and a small array instead of a larger array with an entry for each texture
v3: get rid of the small array and calculate the adjustments in the shader
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: c2ae9b40527 ('nvc0: implement multisampled images on Maxwell+')
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-09-22 | nvc0: warn about changing NVC0_CB_AUX_MP_INFO and NVC0_CB_AUX_DRAW_INFO | Rhys Perry | 1 | -2/+6
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-09-22 | nvc0: Update counter reading shaders to new NVC0_CB_AUX_MP_INFO | Rhys Perry | 1 | -18/+18
Fixes: 66ca7e400b8 ('nvc0: add support for programmable sample locations') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-09-13 | nvir: Always split 64-bit IMAD/IMUL operations | Pierre Moreau | 1 | -1/+1
Those operations do not map to actual hardware instructions, therefore those should always be lowered to 32-bit instructions. Fixes: 009c54aa7af "nv50/ir: Split 64-bit integer MAD/MUL operations" Signed-off-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>
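To show what lowering a 64-bit multiply to 32-bit operations involves, the sketch below builds the low 64 bits of a product from 32-bit halves. It illustrates the arithmetic only and is not the compiler's actual lowering pass; the function names are hypothetical, and mul_hi_u32 stands in for a 32-bit multiply-high operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High 32 bits of a 32x32 unsigned product. */
static uint32_t
mul_hi_u32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of (a_hi:a_lo) * (b_hi:b_lo) using only 32-bit results.
 * The a_hi * b_hi term only affects bits 64 and up, so it is dropped. */
static void
mul64_split(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
            uint32_t *r_lo, uint32_t *r_hi)
{
   assert(r_lo != NULL && r_hi != NULL);
   *r_lo = a_lo * b_lo;
   *r_hi = mul_hi_u32(a_lo, b_lo) + a_lo * b_hi + a_hi * b_lo;
}
```

A 64-bit multiply-add follows the same pattern, with the addend added to the result as a 64-bit value with carry from the low half into the high half.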
2018-09-11 | nv50,nvc0: warn on not-explicitly-handled caps | Ilia Mirkin | 2 | -14/+26
Not handling caps explicitly means that we're likely getting incorrect values -- these need to be reviewed and set appropriately. While we're at it, add in some missing caps, and set all the subpixel stuff to 8 as that seems to be what the blob reports. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-09-07 | gallium: add PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET | Marek Olšák | 3 | -0/+3
2018-09-06 | gallium: enable GL_AMD_depth_clamp_separate on r600, radeonsi | Marek Olšák | 3 | -0/+3
2018-09-06 | gallium: split depth_clip into depth_clip_near & depth_clip_far | Marek Olšák | 3 | -3/+3
for AMD_depth_clamp_separate.
2018-09-04 | gallium: Add a helper for implementing PIPE_CAP_* default values. | Eric Anholt | 3 | -9/+9
One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to.
v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857)
v3: Rebase on 3 new gallium caps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
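The pattern described above, a driver switch whose default case defers to a shared helper, can be sketched as follows. The enum, the values and both function names are hypothetical stand-ins and not the real gallium identifiers.

```c
#include <assert.h>

/* Hypothetical capability enum standing in for PIPE_CAP_*. */
enum cap {
   CAP_MAX_RENDER_TARGETS,
   CAP_SOME_NEW_FEATURE,
   CAP_COUNT
};

/* Shared helper: conservative defaults for every known cap. A newly
 * added cap only needs a value here to work in all drivers. */
static int
default_get_param(enum cap c)
{
   switch (c) {
   case CAP_MAX_RENDER_TARGETS:
      return 1;
   case CAP_SOME_NEW_FEATURE:
      return 0;
   default:
      assert(!"unknown cap");
      return 0;
   }
}

/* Driver: handles only the caps it has an opinion about and defers
 * everything else to the shared defaults. */
static int
driver_get_param(enum cap c)
{
   switch (c) {
   case CAP_MAX_RENDER_TARGETS:
      return 8;
   default:
      return default_get_param(c);
   }
}
```

The driver author can ignore caps they have not reviewed yet, and whoever adds a cap no longer has to touch every driver.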
2018-08-29 | nv50: bump compat glsl level to same as core | Ilia Mirkin | 1 | -1/+1
Passes the compat piglits. I'm sure that there will be odd issues that aren't caught by them, but at least it should basically work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-29 | nvc0: bump compat GLSL version to match core | Ilia Mirkin | 1 | -1/+1
This passes the handful of tests in piglit. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-29 | nv50/ir: silence partitionLoadStore() unused function warning | Rhys Kidd | 1 | -2/+2
Move this now-unused function into the existing comment block, which was its only prior use. ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning: unused function 'partitionLoadStore' [-Wunused-function] partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask) Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code") Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-08-27 | nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+ | Rhys Perry | 2 | -10/+36
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)
          local  shared  gpr  inst  bytes
helped    1      0       152  274   274
hurt      0      0       0    0     0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-08-27 | nv50/ir: optimize multiplication by 16-bit immediates into two xmads | Rhys Perry | 1 | -0/+10
Rather than the usual three that would be created.
total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)
          local  shared  gpr  inst  bytes
helped    1      0       64   1040  1040
hurt      0      0       27   0     0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
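The arithmetic behind needing only two multiply-adds can be checked in C. The model below treats an xmad as a 16-bit by 16-bit multiply plus a 32-bit addend, which is a simplification: the real instruction has operand-selection and shift modes that are not modelled, and the explicit shift here is assumed to be something the hardware can fold in. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified xmad model: 16x16 multiply plus a 32-bit addend,
 * wrapping modulo 2^32. */
static uint32_t
xmad_model(uint16_t a, uint16_t b, uint32_t c)
{
   return (uint32_t)a * (uint32_t)b + c;
}

/* a * imm for a 16-bit immediate, using two multiply-adds:
 *   a * imm = lo16(a) * imm + ((hi16(a) * imm) << 16)   (mod 2^32)
 * A full 32x32 multiply needs a third term for hi16(b), which is zero
 * when b fits in 16 bits. */
static uint32_t
mul_by_imm16(uint32_t a, uint16_t imm)
{
   uint32_t hi = xmad_model((uint16_t)(a >> 16), imm, 0);

   return xmad_model((uint16_t)(a & 0xffff), imm, hi << 16);
}
```

Dropping the term that would multiply by the upper half of the immediate is what saves the third instruction.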