path: root/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
Age | Commit message | Author | Files | Lines
2017-02-09 | nvc0/ir: add support for all the new int64 tgsi opcodes | Ilia Mirkin | 1 | -0/+2
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that proper optimization might be done.
However this all works and the above can be improved upon later.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-20 | nvc0/ir: use levelZero flag when the lod is set to 0 | Ilia Mirkin | 1 | -0/+1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-27 | gm107/ir: add a legalize SSA pass for PFETCH | Samuel Pitoiset | 1 | -1/+1
PFETCH (actually ISBERD in the GM107+ ISA) only accepts a GPR for src0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 | gm107/ir: lower surface operations | Samuel Pitoiset | 1 | -0/+2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 | nvc0/ir: remove unused resource info loading helpers | Samuel Pitoiset | 1 | -4/+0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 | nvc0/ir: refactor the surfaces info loading logic | Samuel Pitoiset | 1 | -1/+1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-19 | gk104/ir: fix tex use generation to be more careful about eliding uses | Ilia Mirkin | 1 | -2/+3
If we have a loop, instructions before the tex might be added as tex uses, and those may in fact dominate all other uses of the tex results. This however doesn't mean that we don't need a texbar after the tex. Only check if uses dominate each other if they are dominated by the tex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-21 | nvc0/ir: add a lowering pass for surfaces on Fermi | Samuel Pitoiset | 1 | -0/+2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-11 | nvc0: fix gl_SampleMaskIn computation | Ilia Mirkin | 1 | -0/+1
The SAMPLEMASK semantic should only return the bits set covered by the current invocation. However we were always retrieving the covmask, which returns the covered samples of the whole pixel. When not doing per-sample invocation, this is precisely what we want. However when doing per-sample invocation, we have to select the sampleid'th bit and only return that. Furthermore, this means that we have to have a 1:1 correlation for invocations and samples.
This fixes most dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* tests. A few failures remain due to disagreements about nr_samples==1 logic as well as what happens with MSAA x2 RTs when the shading fraction is 0.5.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
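The bit selection described in this commit message can be sketched as follows. This is only an illustration of the arithmetic, not the driver's lowering code: the function name `sampleMaskIn` and its parameters are hypothetical, and the real pass emits IR instructions rather than computing the value on the host.

```cpp
#include <cstdint>

// Hypothetical helper; the name and signature are illustrative only
// and do not come from the nouveau driver.
//   covmask   - coverage bits for the whole pixel, one bit per sample
//   sampleId  - index of the sample shaded by this invocation
//   perSample - true when the shader runs once per sample
inline uint32_t sampleMaskIn(uint32_t covmask, unsigned sampleId, bool perSample)
{
   if (!perSample)
      return covmask;                 // per-pixel: all covered samples
   return covmask & (1u << sampleId); // per-sample: only this invocation's bit
}
```

With a coverage mask of 0xA (samples 1 and 3 covered), a per-sample invocation for sample 1 would see 0x2, and one for sample 0 would see 0.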
2016-04-26 | nv50/ir: add support for SULDP -> SULDB conversion | Ilia Mirkin | 1 | -0/+1
This will allow converting surface formats without adding an extra call to our lib.
[hakzsam: make use of this for GK104]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 | nv50/ir: make use of OP_SUQ for surfaces query | Samuel Pitoiset | 1 | -1/+1
This implements RESQ for surfaces, which comes from the imageSize() GLSL built-in. As the dimensions are stored in the driver constant buffer, this only has to be lowered with loads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v2)
2016-04-26 | nv50/ir: add OP_BUFQ for buffers query | Samuel Pitoiset | 1 | -0/+1
TGSI RESQ allows both images and buffers but we have to make a distinction between these two types of resources in our lowering pass. Introducing OP_BUFQ which is a fake operand will allow to implement OP_SUQ for surfaces.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 | nvc0/ir: add atomics support on shared memory for Kepler | Samuel Pitoiset | 1 | -0/+1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 | nvc0/ir: add support for compute UBOs on Kepler | Samuel Pitoiset | 1 | -0/+3
Make sure to avoid out of bounds access in presence of indirect array indexing by loading the size from the driver constant buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 | nvc0/ir: move load/store lowering pass to handleLDST() | Samuel Pitoiset | 1 | -0/+1
Having all this code in a big switch is not really a good practice.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 | nvc0: use a different offset for buffers and surfaces | Samuel Pitoiset | 1 | -3/+9
To not overwrite buffers and surfaces information, we need to use a different offset in the driver constant buffer. Currently, OP_SUQ is only supported for buffers but this will be slightly updated for images support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 | nv50/ir: add atomics support on shared memory for Fermi | Samuel Pitoiset | 1 | -0/+1
Changes from v3:
- move the previous OP_SELP change to the previous commit
Changes from v2:
- make sure the op is OP_SELP when emitting the predicate and add one assert
- use bld.getSSA() for mkOp2()
- add cross edge between tryLockAndSetBB and joinBB
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 | nv50/ir: add SUQ op by reading the info from driver constbuf | Ilia Mirkin | 1 | -0/+1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 | nv50/ir: add support for BUFFER accesses | Ilia Mirkin | 1 | -0/+2
This largely leaves the existing image logic alone. When image support is added this will have to be harmonized somehow.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 | gk104/ir: simplify and fool-proof texbar algorithm | Ilia Mirkin | 1 | -6/+4
With the current algorithm, we only look at tex uses. However there's a write-after-write hazard where we might decide to, on some path, not use a texture's output at all, but instead to write a different value to that register. However without the barrier, the texture might complete later and overwrite that value.
This fixes Unreal Elemental demo on GK110/GK208, flightgear on GK10x, and likely other random-looking failures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-08-20 | nv50/ir: support different unordered_set implementations | Chih-Wei Huang | 1 | -3/+1
If built with the C++11 standard, use std::unordered_set. Otherwise, if built on an old Android version with stlport, use std::tr1::unordered_set with a wrapper class. Otherwise use std::tr1::unordered_set.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
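A minimal sketch of this kind of compile-time selection is shown below. It covers only the C++11 and plain TR1 cases; the stlport wrapper class mentioned in the commit is left out, and the macro name `SKETCH_UNORDERED_SET` is an assumption made for illustration, not the name used in the driver.

```cpp
// Pick an unordered_set implementation based on the language standard.
// SKETCH_UNORDERED_SET is a hypothetical name used only in this example.
#if __cplusplus >= 201103L
#include <unordered_set>
#define SKETCH_UNORDERED_SET std::unordered_set
#else
#include <tr1/unordered_set>
#define SKETCH_UNORDERED_SET std::tr1::unordered_set
#endif

// Returns the number of distinct values among a, b and c,
// using whichever implementation was selected above.
inline unsigned countDistinct(int a, int b, int c)
{
   SKETCH_UNORDERED_SET<int> s;
   s.insert(a);
   s.insert(b);
   s.insert(c);
   return static_cast<unsigned>(s.size());
}
```

Code that uses the set then stays the same regardless of which branch of the preprocessor check was taken.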
2015-08-17 | gm107/ir: avoid letting the lowering pass get out of sync | Ilia Mirkin | 1 | -1/+2
There's a lot of functionality duplicated in the gm107 lowering pass from the nvc0 pass. As that one gets updated, the gm107 one falls behind. Avoid this by sharing the code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-28 | nvc0/ir: flush denorms to zero in non-compute shaders | Ilia Mirkin | 1 | -0/+1
This will set the FTZ flag (flush denorms to zero) on all opcodes that can take it. This resolves issues in Unigine Heaven 4.0 where there were solid-filled boxes popping up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89455
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-09-01 | nvc0/ir: avoid infinite recursion when finding first uses of tex | Ilia Mirkin | 1 | -1/+4
In certain circumstances, findFirstUses could end up doubling back on instructions it had already processed, resulting in an infinite recursion. Avoid this by keeping track of already-visited instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
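The general technique named here, guarding a recursive walk with a set of already-visited nodes, can be sketched as below. The `Node` type and `countReachable` function are toy stand-ins invented for this example; they are not the nv50_ir `Instruction` class or the actual `findFirstUses` implementation.

```cpp
#include <unordered_set>
#include <vector>

// Toy stand-in for an instruction; not the nv50_ir Instruction class.
struct Node {
   int id;
   std::vector<Node *> next; // successors; may form cycles
};

// Counts the nodes reachable from n, visiting each one at most once.
// Without the visited set, a cycle in the graph would recurse forever.
inline int countReachable(Node *n, std::unordered_set<Node *> &visited)
{
   if (!visited.insert(n).second)
      return 0; // already processed on an earlier path
   int total = 1;
   for (Node *s : n->next)
      total += countReachable(s, visited);
   return total;
}
```

The set is passed by reference so that every level of the recursion shares one record of what has been seen.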
2014-05-15 | nvc0: add maxwell (sm50) compiler backend | Ben Skeggs | 1 | -1/+1
The big missing part here is proper sched data calculations, but hopefully the chosen placeholder will be sufficient for now. Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-05-15 | nvc0: move nvc0 lowering pass class definitions into header | Ben Skeggs | 1 | -0/+134
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>