path: root/src/gallium
Age | Commit message | Author | Files | Lines
2015-01-12 | r600g: fix build failure when building the driver without LLVM | Marek Olšák | 1 | -0/+4
2015-01-11 | vc4: Clamp the inputs to the blend equation to [0, 1]. | Eric Anholt | 1 | -1/+10
Fixes the remaining ARB_color_buffer_float rendering tests.
2015-01-11 | vc4: Add a little helper for clamping to [0,1]. | Eric Anholt | 1 | -4/+10
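The two clamping commits above can be pictured with a small sketch. This is not the vc4 code, which emits shader instructions rather than running on the CPU; the names `ex_clamp`, `ex_saturate` and `ex_blend_add` are hypothetical, and the additive blend is only one example of a blend equation.

```c
#include <assert.h>

/* Hypothetical generic clamp; lo must not exceed hi. */
static float ex_clamp(float x, float lo, float hi)
{
    assert(lo <= hi);
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* Hypothetical helper: clamp a value to [0, 1]. */
static float ex_saturate(float x)
{
    return ex_clamp(x, 0.0f, 1.0f);
}

/* Example additive blend where both inputs are clamped before use. */
static float ex_blend_add(float src, float dst)
{
    return ex_saturate(src) + ex_saturate(dst);
}
```

Clamping the inputs (not just the result) matters when the shader outputs values outside [0, 1], as float color buffers allow.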
2015-01-11 | vc4: Fix up statechange management for uncompiled/compiled FS/VS. | Eric Anholt | 2 | -11/+10
No need to recheck the FS compile when the VS source has changed, but there *is* a need to recheck the VS compile when the compiled VS has changed (since the live inputs may change). Fixes es3conform's blend test.
2015-01-11 | vc4: Fix clear color setup for RGB565. | Eric Anholt | 1 | -1/+4
The util_pack_color() thing only sets up the low bits of the union, so only return them, too. Fixes intermittent failure on fbo-alphatest-formats and es3conform's framebuffer-objects test under simulation.
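A minimal sketch of the failure mode described above, assuming a union whose 16-bit member aliases the low bits of a 32-bit member. The union `ex_color` and both functions are hypothetical stand-ins, not the gallium `util_pack_color()` API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a packed-color union: 16-bit formats only write 'us'. */
union ex_color {
    uint16_t us;
    uint32_t ui;
};

/* Pack normalized floats into RGB565 (5 bits red, 6 green, 5 blue). */
static uint16_t ex_pack_rgb565(float r, float g, float b)
{
    assert(r >= 0.0f && r <= 1.0f);
    assert(g >= 0.0f && g <= 1.0f);
    assert(b >= 0.0f && b <= 1.0f);
    uint16_t r5 = (uint16_t)(r * 31.0f + 0.5f);
    uint16_t g6 = (uint16_t)(g * 63.0f + 0.5f);
    uint16_t b5 = (uint16_t)(b * 31.0f + 0.5f);
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}

static uint32_t ex_clear_color_rgb565(float r, float g, float b)
{
    union ex_color uc;
    uc.ui = 0xdeadbeef;              /* simulate stale bits in the union */
    uc.us = ex_pack_rgb565(r, g, b); /* only the 16-bit member is written */
    return uc.us;                    /* return just the bits that were set */
}
```

Returning `uc.ui` instead would leak whatever happened to be in the upper bits, which would explain an intermittent failure.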
2015-01-11 | vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31. | Eric Anholt | 2 | -38/+11
Turns out the save/restore was harmful to code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful (since it means more programs might fail to compile). However, the alternative was causing us trouble where we'd save/restore r3 while it contained a MIN-ed direct texture offset, causing the kernel to fail to validate our shaders (such as in GLB2.7).
2015-01-10 | vc4: Allow dead code elimination of VPM reads. | Eric Anholt | 2 | -1/+44
This gets a bunch of dead reads out of the CSes, which don't read most attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
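As a rough illustration of what dead code elimination of unused reads means, here is a sketch of a single backward liveness pass over a straight-line instruction list. The `ex_inst` structure and `ex_dce()` are hypothetical and far simpler than the vc4 IR; they only show the idea that a read whose result is never used can be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EX_MAX_TEMPS = 16 };

struct ex_inst {
    int dst;               /* temp written, or -1 for none */
    int src[2];            /* temps read, or -1 for unused */
    bool has_side_effects; /* e.g. a store to an output; never removed */
    bool dead;             /* set by ex_dce() */
};

/* Walk backwards, keeping only instructions whose result is still needed.
 * Returns the number of instructions marked dead. */
static int ex_dce(struct ex_inst *insts, size_t count)
{
    bool live[EX_MAX_TEMPS] = { false };
    int removed = 0;

    for (size_t i = count; i-- > 0;) {
        struct ex_inst *inst = &insts[i];
        assert(inst->dst < EX_MAX_TEMPS);

        bool needed = inst->has_side_effects ||
                      (inst->dst >= 0 && live[inst->dst]);
        if (!needed) {
            inst->dead = true;
            removed++;
            continue;
        }

        /* The definition ends the live range; the sources start theirs. */
        if (inst->dst >= 0)
            live[inst->dst] = false;
        for (int s = 0; s < 2; s++) {
            if (inst->src[s] >= 0) {
                assert(inst->src[s] < EX_MAX_TEMPS);
                live[inst->src[s]] = true;
            }
        }
    }
    return removed;
}
```

In a program that reads two attributes but only stores the first, the second read is marked dead.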
2015-01-10 | vc4: Cook up the draw-time VPM setup info during shader compile. | Eric Anholt | 4 | -11/+28
This will give the compiler the chance to dead-code eliminate unused VPM reads. This is particularly a big deal in the CS where a bunch of vattrs are just not going to be used.
2015-01-10 | vc4: Split two notions of instructions having side effects. | Eric Anholt | 5 | -4/+15
Some ops can't be DCEd, while some of the ops that are just important due to the args they have can be.
2015-01-10 | vc4: Redo VPM reads as a read file. | Eric Anholt | 5 | -16/+16
This will let us do copy propagation of the VPM reads.
2015-01-10 | vc4: Fix miscalculation of the VPM space. | Eric Anholt | 1 | -1/+1
We pass in a byte offset, not dword. I'm rather scared that this actually managed to pass piglit, but it does fix gears.
2015-01-10 | vc4: Pack VPM attr contents according to just the size of the attribute. | Eric Anholt | 3 | -11/+9
total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs: 20871 -> 19664 (-5.78%)
2015-01-10 | vc4: Restructure color packing as a series of channel replacements. | Eric Anholt | 4 | -49/+60
I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
2015-01-10 | vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check. | Eric Anholt | 1 | -1/+1
Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're ssa.
2015-01-10 | vc4: Move global seqno short-circuiting to vc4_wait_seqno(). | Eric Anholt | 2 | -6/+3
Any other caller would want it, too.
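The point of moving the short-circuit into the wait function can be sketched as follows. Everything here is hypothetical (`ex_screen`, `ex_kernel_wait`, `ex_wait_seqno`); the real `vc4_wait_seqno()` talks to the kernel, which this sketch replaces with a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ex_screen {
    uint64_t finished_seqno; /* highest seqno known to have completed */
    unsigned kernel_waits;   /* counts simulated kernel round trips */
};

/* Stand-in for the real wait call: pretend the work always completes. */
static bool ex_kernel_wait(struct ex_screen *screen, uint64_t seqno)
{
    (void)seqno;
    screen->kernel_waits++;
    return true;
}

/* Every caller gets the early-out, because it lives in the wait itself. */
static bool ex_wait_seqno(struct ex_screen *screen, uint64_t seqno)
{
    assert(screen != NULL);

    if (screen->finished_seqno >= seqno)
        return true; /* already known to be done, skip the kernel */

    if (!ex_kernel_wait(screen, seqno))
        return false;

    screen->finished_seqno = seqno;
    return true;
}
```

Waiting on an older seqno after a newer one has completed then costs no kernel call at all.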
2015-01-08 | st/wgl: Ignore ulVersion in DrvValidateVersion. | José Fonseca | 1 | -2/+10
We never used ulVersion for proper version checks. Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver started using a different version number, so the handy trick of renaming Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for quick testing of Mesa software renderers stopped working. Reviewed-by: Brian Paul <brianp@vmware.com>
2015-01-07 | freedreno/ir3: fix pos_regid > max_reg | Rob Clark | 4 | -41/+121
We can't (or don't know how to) turn this off. But it can end up being stored to a higher reg # than what the shader uses, leading to corruption. Also we currently aren't clever enough to turn off frag_coord/frag_face if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org this a bit so both vp and fp reg footprint fixup are called by a common fxn used also by ir3_cmdline. Also add a few more output lines for ir3_cmdline to make it easier to see what is going on. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: start on indirect gpr reads | Rob Clark | 3 | -8/+146
Handle TEMP[ADDR[]] src registers by generating a fanin to group array elements, similarly to how texture fetch instructions work. NOTE: For all the scalar instructions generated for a single tgsi vector operation which uses an array src (or possibly even uses the same array as multiple srcs), re-use the same fanin node. Since a vector operation operates on all components at the same time, it should never see more than one version of the same array. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: make reg array dynamic | Rob Clark | 4 | -13/+50
To use fanin's to group registers in an array, we can potentially have a much larger array of registers. Rather than continuing to bump up the array size, just make it dynamically allocated when the instruction is created. Signed-off-by: Rob Clark <robclark@freedesktop.org>
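A minimal sketch of the idea, assuming the register count is known when the instruction is created. The types and functions (`ex_register`, `ex_instruction`, `ex_instr_create`, `ex_reg_add`, `ex_instr_destroy`) are hypothetical and do not mirror the ir3 structures.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_register {
    int num;
};

struct ex_instruction {
    unsigned regs_count;      /* registers added so far */
    unsigned regs_max;        /* capacity chosen at creation time */
    struct ex_register *regs; /* heap array instead of a fixed-size one */
};

/* Allocate an instruction with room for exactly nreg registers. */
static struct ex_instruction *ex_instr_create(unsigned nreg)
{
    assert(nreg > 0);
    struct ex_instruction *instr = calloc(1, sizeof(*instr));
    if (!instr)
        return NULL;
    instr->regs = calloc(nreg, sizeof(instr->regs[0]));
    if (!instr->regs) {
        free(instr);
        return NULL;
    }
    instr->regs_max = nreg;
    return instr;
}

static struct ex_register *ex_reg_add(struct ex_instruction *instr, int num)
{
    assert(instr->regs_count < instr->regs_max);
    struct ex_register *reg = &instr->regs[instr->regs_count++];
    reg->num = num;
    return reg;
}

static void ex_instr_destroy(struct ex_instruction *instr)
{
    if (!instr)
        return;
    free(instr->regs);
    free(instr);
}
```

A fanin grouping a 40-element array then just asks for 40 slots, with no compile-time limit to bump.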
2015-01-07 | freedreno/ir3: simplify RA | Rob Clark | 8 | -777/+622
Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things like:
  MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: regmask support for relative addr | Rob Clark | 2 | -17/+51
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't need to support an arbitrary mask. So for this case use a simple size field rather than a bitmask. Signed-off-by: Rob Clark <robclark@freedesktop.org>
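One way to read the size-versus-bitmask distinction is sketched below. It assumes a directly addressed register carries a per-component write mask, while a relatively addressed one covers a contiguous run described only by its length. All names (`ex_reg`, `ex_regmask`, `ex_regmask_set`) are hypothetical, not the ir3 regmask API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { EX_MAX_REGS = 256 };

struct ex_regmask {
    uint32_t bits[EX_MAX_REGS / 32];
};

struct ex_reg {
    unsigned num;    /* first scalar register */
    bool relative;   /* true for array (address-relative) access */
    uint32_t wrmask; /* direct access: one bit per component */
    unsigned size;   /* relative access: length of the contiguous run */
};

static void ex_regmask_set_bit(struct ex_regmask *m, unsigned n)
{
    assert(n < EX_MAX_REGS);
    m->bits[n / 32] |= 1u << (n % 32);
}

static bool ex_regmask_get_bit(const struct ex_regmask *m, unsigned n)
{
    assert(n < EX_MAX_REGS);
    return (m->bits[n / 32] >> (n % 32)) & 1u;
}

/* Mark every scalar register that 'reg' may touch. */
static void ex_regmask_set(struct ex_regmask *m, const struct ex_reg *reg)
{
    if (reg->relative) {
        /* Arrays can exceed 32 entries, so walk the size, not a mask. */
        for (unsigned i = 0; i < reg->size; i++)
            ex_regmask_set_bit(m, reg->num + i);
    } else {
        for (unsigned i = 0; i < 32; i++)
            if (reg->wrmask & (1u << i))
                ex_regmask_set_bit(m, reg->num + i);
    }
}
```

A 40-element array starting at register 8 marks registers 8 through 47, which a 32-bit mask could not express.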
2015-01-07 | freedreno/ir3: split up ssa_src | Rob Clark | 1 | -23/+34
Slight bit of refactoring that will be needed for indirect gpr addressing (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: drop instr_clone() stuff | Rob Clark | 2 | -49/+17
Unnecessary and overly complicated. And gets in the way for temp arrays (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: runtime enable RA debug for DEBUG builds | Rob Clark | 1 | -1/+6
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: handle relative addr in ir3_dump | Rob Clark | 1 | -1/+8
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: legalize vs unused sam dst components | Rob Clark | 2 | -2/+9
We probably could be more clever elsewhere and mask out components that are not used. But either way, legalize should realize that there is also a write-after-write hazard with texture sample instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | freedreno/ir3: hack for old compiler | Rob Clark | 1 | -0/+23
Old compiler doesn't have ir3_block's.. so we need a special path. This hack can be dropped when ir3_compiler_old is retired. Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 | tgsi: track max array per file | Rob Clark | 2 | -0/+4
NOTE: IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can optionally have them. So we implicitly assume that ArrayID==0 always exists for each file. This is why array_max[file] is never less than zero. You can tell from indirect_files(_read/written) if the legacy array-id zero was actually used. Signed-off-by: Rob Clark <robclark@freedesktop.org>
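The bookkeeping described above amounts to a per-file running maximum that starts at zero. A sketch, with hypothetical names (`ex_scan_info`, `ex_scan_declaration`, the `EX_FILE_*` enum) and only the `array_max` field name taken from the commit message:

```c
#include <assert.h>

enum ex_file {
    EX_FILE_INPUT,
    EX_FILE_OUTPUT,
    EX_FILE_TEMPORARY,
    EX_FILE_COUNT
};

struct ex_scan_info {
    /* Zero-initialized: ArrayID 0 is assumed to exist for every file. */
    int array_max[EX_FILE_COUNT];
};

/* Record one declaration; array_id 0 means "no explicit ArrayID". */
static void ex_scan_declaration(struct ex_scan_info *info,
                                enum ex_file file, int array_id)
{
    assert(file < EX_FILE_COUNT);
    assert(array_id >= 0);
    if (array_id > info->array_max[file])
        info->array_max[file] = array_id;
}
```

Because the maximum starts at zero and only grows, it can never go negative, matching the note above.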
2015-01-07 | tgsi: keep track of read vs written indirects | Rob Clark | 2 | -0/+8
At least temporarily, I still need to fall back to the old compiler for relative dest (for freedreno), but I can do relative src temp. Only a temporary situation, but seems easy/reasonable for tgsi-scan to track this. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com>
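A sketch of how separate read and written masks let a driver make that decision. The field names `indirect_files_read` and `indirect_files_written` follow the wording of the previous commit message; the rest (`ex_indirect_info`, `ex_scan_operand`, `ex_needs_old_compiler`, the `EXI_FILE_*` enum) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum exi_file {
    EXI_FILE_INPUT,
    EXI_FILE_OUTPUT,
    EXI_FILE_TEMPORARY,
    EXI_FILE_COUNT
};

struct ex_indirect_info {
    unsigned indirect_files;         /* files with any indirect access */
    unsigned indirect_files_read;    /* files indirectly read (src) */
    unsigned indirect_files_written; /* files indirectly written (dst) */
};

/* Record one operand; is_dst distinguishes destinations from sources. */
static void ex_scan_operand(struct ex_indirect_info *info,
                            enum exi_file file, bool indirect, bool is_dst)
{
    assert(file < EXI_FILE_COUNT);
    if (!indirect)
        return;
    info->indirect_files |= 1u << file;
    if (is_dst)
        info->indirect_files_written |= 1u << file;
    else
        info->indirect_files_read |= 1u << file;
}

/* Hypothetical driver policy: only relative destinations need fallback. */
static bool ex_needs_old_compiler(const struct ex_indirect_info *info)
{
    return info->indirect_files_written != 0;
}
```

With only a combined mask, a shader that merely reads `TEMP[ADDR[]]` would be indistinguishable from one that writes it.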
2015-01-08 | Revert "radeonsi: reduce the size of si_pm4_state" | Marek Olšák | 2 | -3/+12
This reverts commit 9141d8855555e45a057970e78969e1518ad3617d. It broke OpenCL.
2015-01-07 | radeonsi: Fix crash when destroying si_screen | Tom Stellard | 1 | -2/+4
We were invalidating si_screen:tm by calling r600_destroy_common_screen() which frees the si_screen object. This caused the driver to crash in LLVMDisposeTargetMachine() since we were passing it an invalid pointer. https://bugs.freedesktop.org/show_bug.cgi?id=88170
2015-01-07 | radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders | Marek Olšák | 3 | -4/+12
v2: complete rewrite Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-07 | radeonsi: emit SURFACE_SYNC last | Marek Olšák | 1 | -23/+35
This fixes a case where a transform feedback buffer is fed back as an index buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: flush all CB/DB caches unconditionally when changing the framebuffer | Marek Olšák | 1 | -11/+7
This is easier to read and will work better with shader image stores. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: change TC cache flushing strategy for textures | Marek Olšák | 2 | -4/+6
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: improve and fix streamout flushing | Marek Olšák | 3 | -10/+40
- we don't usually need to flush TC L2
- we should flush KCACHE (not really an issue now since we always flush KCACHE when updating descriptors, but it could be a problem if we used CE, which doesn't require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: use TC L2 for CP DMA operations with shader resources on CIK | Marek Olšák | 3 | -10/+39
So that TC L2 doesn't need to be flushed. The only problem is with index buffers, which don't use TC. A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
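The dirty-flag approach mentioned above might be sketched like this. The sketch assumes the flush is only needed for indexed draws, which is an inference from the index-buffer remark rather than something the commit message states; `ex_context`, `ex_cp_dma_write` and `ex_draw` are hypothetical names, and the flush is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_context {
    bool tc_l2_dirty;    /* CP DMA wrote through TC L2 since last flush */
    unsigned l2_flushes; /* counts simulated TC L2 flushes */
};

/* A CP DMA write that goes through TC L2 leaves data only in the cache. */
static void ex_cp_dma_write(struct ex_context *ctx)
{
    assert(ctx != NULL);
    ctx->tc_l2_dirty = true;
}

/* Index fetch bypasses TC, so flush L2 first if it may hold index data. */
static void ex_draw(struct ex_context *ctx, bool indexed)
{
    assert(ctx != NULL);
    if (indexed && ctx->tc_l2_dirty) {
        ctx->l2_flushes++;
        ctx->tc_l2_dirty = false;
    }
    /* ... emit the draw ... */
}
```

The flush is deferred until a consumer that bypasses the cache actually shows up, so repeated CP DMA writes cost at most one flush.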
2015-01-07 | radeonsi: use TC L2 for updating descriptors on CIK | Marek Olšák | 2 | -5/+10
This allows not flushing TC L2 on CIK later. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: don't use TC L2 for updating descriptors on SI | Marek Olšák | 2 | -2/+14
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA when updating the same memory. The solution for SI is to use uncached access here, because CP DMA doesn't support cached access. CIK will be handled in the next patch. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: only flush the right set of caches for CP DMA operations | Marek Olšák | 9 | -34/+48
That's either framebuffer caches or caches for shader resources. The motivation is that framebuffer caches need to be flushed very rarely here. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: implement separate ICACHE and KCACHE flush for SI | Marek Olšák | 1 | -9/+17
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: add a combined flag for flushing a framebuffer | Marek Olšák | 3 | -20/+10
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: rename flush flags, split the TC flag into L1 and L2 | Marek Olšák | 7 | -91/+109
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | r600g,radeonsi: separate cache flush flags | Marek Olšák | 5 | -26/+39
I will rename them for radeonsi. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | r600g: move r6xx-specific streamout flush flagging into r600g | Marek Olšák | 2 | -9/+7
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: only set BC_OPTIMIZE_DISABLE when necessary | Marek Olšák | 2 | -6/+15
SPI_PS_IN_CONTROL is moved into the SPI mapping state. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: do not define FACE as an ordinary PS input | Marek Olšák | 1 | -1/+2
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: remove flatshade from the shader key | Marek Olšák | 3 | -7/+7
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegen | Marek Olšák | 1 | -6/+10
It doesn't do anything useful. And colors are floating-point, so we can use fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE state only (in the next patch). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-07 | radeonsi: implement VERTEXID_NOBASE and BASEVERTEX system values | Marek Olšák | 1 | -0/+10
Only done for completeness. Not used by anything yet. Tested by advertising PIPE_CAP_VERTEXID_NOBASE. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>