path: root/src/gallium/drivers/vc4
Age | Commit message | Author | Files | Lines
2019-01-08 | nir: rename global/local to private/function memory | Karol Herbst | 1 | -2/+2
The naming is a bit confusing no matter how you look at it. Within SPIR-V, "global" memory is memory accessible from all threads. GLSL "global" memory normally refers to shader-thread-private memory declared at global scope.
As we already use "shared" for memory shared across all threads of a work group, the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system-accessible memory (be it VRAM or system RAM if keeping SVM in mind).
GLSL "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-20 | vc4: Hook up perf_debug() output to GL_ARB_debug_output as well. | Eric Anholt | 2 | -0/+3
This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.
2018-12-20 | vc4: Wire up core pipe_debug_callback | Rhys Kidd | 2 | -0/+14
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-12-19 | vc4: Move the utile load/store functions to a header for reuse by v3d. | Eric Anholt | 2 | -202/+11
These implementations of whole-utile loads/stores would be the same for v3d, though the layout of blocks of utiles has changed.
2018-12-17 | nir/opt_peephole_select: Don't peephole_select expensive math instructions | Ian Romanick | 1 | -1/+1
On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 | nir/opt_peephole_select: Don't try to remove flow control around indirect loads | Ian Romanick | 1 | -1/+1
That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select").
v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
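The two peephole-select changes above both narrow when a branch may be flattened into a select. A simplified Python model of that eligibility decision might look like the following. `indirect_load_ok` comes from the v2 note; `expensive_alu_ok`, the `Instr` representation, the `MAX_INSTRS` limit, and the contents of `EXPENSIVE_OPS` are hypothetical stand-ins, and the real pass inspects NIR instructions rather than this list.

```python
from dataclasses import dataclass

# Hypothetical set of opcodes treated as "expensive math" on some GPU.
EXPENSIVE_OPS = {"fsqrt", "frsq", "frcp", "fexp2", "flog2", "fsin", "fcos"}
MAX_INSTRS = 8  # hypothetical cap on work that may be made unconditional


@dataclass
class Instr:
    op: str                 # e.g. "fadd", "load_deref"
    indirect: bool = False  # True if a load uses a non-constant array index


def block_can_be_flattened(instrs, indirect_load_ok, expensive_alu_ok):
    """Return True if a branch's instructions may run unconditionally so the
    if/else can be replaced by a csel."""
    if len(instrs) > MAX_INSTRS:
        return False
    for instr in instrs:
        if instr.op.startswith("load_") and instr.indirect and not indirect_load_ok:
            # The branch may exist precisely to guard an invalid access.
            return False
        if instr.op in EXPENSIVE_OPS and not expensive_alu_ok:
            # Flattening would pay for the costly op on every invocation.
            return False
    return True
```

For example, a branch containing `Instr("load_deref", indirect=True)` stays as flow control when `indirect_load_ok=False`, while a branch of plain `fadd`s is still flattened.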
2018-12-17 | vc4: Reuse nir_format_convert.h in our blend lowering. | Eric Anholt | 1 | -33/+3
These helpers came along after and have effectively the same implementation.
2018-12-16 | nir: Add a bool to int32 lowering pass | Jason Ekstrand | 1 | -0/+2
We also enable it in all of the NIR drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 | nir: Rename Boolean-related opcodes to include 32 in the name | Jason Ekstrand | 1 | -20/+20
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
    sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
    sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
    sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
    sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
    sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
    sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
    sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
    sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
    sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
    sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
    sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
    sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
    sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
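The `\132` in these sed expressions relies on sed back-references being single digits (`\1` through `\9`), so it means "group 1 followed by the literal text `32`". The `[fiu]ne` rule also rewrites `fneg`/`ineg` into `fne32g`/`ine32g`, which is why the next command repairs them. The Python sketch below replays the same substitutions in the same order with `re.sub` (which, like sed's `g` flag, replaces every match) to show their combined effect; `rename_bool_opcodes` is a hypothetical helper, not part of Mesa.

```python
import re

# The sed rules quoted in the commit message, in order. Python spells the
# back-reference as \g<1> so the trailing "32" is not read as group 132.
SED_RULES = [
    (r"nir_op_ball_fequal", "nir_op_b32all_fequal"),
    (r"nir_op_bany_fnequal", "nir_op_b32any_fnequal"),
    (r"nir_op_ball_iequal", "nir_op_b32all_iequal"),
    (r"nir_op_bany_inequal", "nir_op_b32any_inequal"),
    (r"nir_op_([fiu]lt)", r"nir_op_\g<1>32"),
    (r"nir_op_([fiu]ge)", r"nir_op_\g<1>32"),
    (r"nir_op_([fiu]ne)", r"nir_op_\g<1>32"),
    (r"nir_op_([fiu]eq)", r"nir_op_\g<1>32"),
    # Undo the collateral rename of fneg/ineg caused by the "ne" rule.
    (r"nir_op_([fi])ne32g", r"nir_op_\g<1>neg"),
    (r"nir_op_bcsel", "nir_op_b32csel"),
]


def rename_bool_opcodes(source: str) -> str:
    """Apply every rule to the text, one after another, like chained sed -i calls."""
    for pattern, replacement in SED_RULES:
        source = re.sub(pattern, replacement, source)
    return source
```

For instance, `rename_bool_opcodes("case nir_op_uge: case nir_op_fneg:")` renames the comparison to `nir_op_uge32` and leaves `nir_op_fneg` as it was.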
2018-12-16 | vc4: Use the original bit size when scalarizing uniform loads. | Eric Anholt | 1 | -1/+2
Prevents a regression in jekstrand's 1-bit series.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-07 | vc4: Fix a leak of the transfer helper on screen destroy. | Eric Anholt | 1 | -0/+3
Fixes: d009463a6549 ("vc4: Switch to using u_transfer_helper for MSAA maps.")
2018-12-05 | nir: Make boolean conversions sized just like the others | Jason Ekstrand | 1 | -4/+4
Instead of a single i2b and b2i, we now have i2b32 and b2iN, where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes, but now everything is consistent and booleans aren't a weird special case anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-11-19 | nir: Make nir_lower_clip_vs optionally work with variables. | Kenneth Graunke | 1 | -1/+2
The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods.
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-11-15 | vc4: Don't return a vc4 BO handle on a renderonly screen. | Eric Anholt | 1 | -2/+4
The handles exported need to be on the KMS device's fd; anything else is a failure. Also, this code assumes that the scanout resource has already been created, so assert it.
2018-11-15 | vc4: Make sure we make ro scanout resources for create_with_modifiers. | Eric Anholt | 1 | -1/+9
The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or SHARED, with the theory that given that you've got modifiers, that's all you need. However, we were looking at the tmpl.bind for setting up the KMS handle in the renderonly case, so we'd end up trying to use vc4's handle on the hx8357d fd.
Fixes: 84ed8b67c56b ("vc4: Set shareable BOs as T tiled if possible")
2018-11-02 | vc4: Use the normal simulator ioctl path for CL submit as well. | Eric Anholt | 3 | -13/+5
The simulator no longer needs to look back into the gallium structs.
2018-11-02 | vc4: Maintain a separate GEM mapping of BOs in the simulator. | Eric Anholt | 2 | -42/+58
This will let us avoid looking back into the gallium driver's vc4_bo.
2018-11-02 | vc4: Take advantage of _mesa_hash_table_remove_key() in the simulator. | Eric Anholt | 1 | -4/+2
2018-11-01 | vc4: Drop the winsys_stride relayout in the simluator | Eric Anholt | 5 | -95/+12
Since 0c1dd9dee0da ("broadcom/vc4: Allow importing linear BOs with arbitrary offset/stride."), we have the vc4-side BO properly laid out (assuming it's linear) in the winsys BO so that we can skip this extra copy.
2018-10-30 | util: Move os_misc to util | Dylan Baker | 1 | -1/+1
This is needed by u_debug.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-10-30 | vc4: Fix unused variable warning. | Eric Anholt | 1 | -1/+0
Fixes: bb84fa146f22 ("util: use C99 declaration in the for-loop hash_table_foreach() macro")
2018-10-25 | util: use C99 declaration in the for-loop hash_table_foreach() macro | Eric Engestrom | 5 | -6/+0
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-15 | gallium/ttn: Convert inputs and outputs to derefs of variables. | Eric Anholt | 1 | -3/+4
This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage.
Acked-by: Rob Clark <robdclark@gmail.com>
2018-10-14 | gallium/u_transfer_helper: Add support for separate Z24/S8 as well. | Kenneth Graunke | 1 | -1/+2
u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well.
This patch adds a new flag to lower all depth/stencil formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.)
Reviewed-by: Eric Anholt <eric@anholt.net>
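As a concrete picture of what treating Z24_UNORM_S8_UINT as separate depth and stencil resources involves, the Python sketch below splits packed 32-bit words into a depth plane and a stencil plane and interleaves them back. It assumes each little-endian word holds depth in the low 24 bits and stencil in the high 8 bits; `split_z24s8` and `pack_z24s8` are hypothetical helpers, not u_transfer_helper functions.

```python
import struct


def split_z24s8(packed):
    """Split packed Z24S8 words into 24-bit depth values and 8-bit stencil bytes."""
    count = len(packed) // 4
    words = struct.unpack(f"<{count}I", packed[: count * 4])
    depth = [w & 0xFFFFFF for w in words]    # low 24 bits: depth
    stencil = bytes(w >> 24 for w in words)  # high 8 bits: stencil
    return depth, stencil


def pack_z24s8(depth, stencil):
    """Interleave separate depth and stencil planes back into Z24S8 words."""
    words = [(s << 24) | (d & 0xFFFFFF) for d, s in zip(depth, stencil)]
    return struct.pack(f"<{len(words)}I", *words)
```

`pack_z24s8` is the inverse of `split_z24s8`, which is the round trip a map/unmap of the packed format would need. By the same gallium naming convention, the S8_UINT_Z24_UNORM variant would place stencil in the low 8 bits instead.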
2018-09-21 | vc4: Remove dead i == 0 code from the cos() implementation. | Eric Anholt | 1 | -6/+3
The loop starts at 1.
2018-09-21 | vc4: Fix sin(0.0) and cos(0.0) accuracy to fix SDL rendering rotation. | Eric Anholt | 1 | -26/+40
SDL has some shaders that compute sin(angle) and cos(angle) for a rotation matrix in the VS, and angle is usually 0.0. Our previous implementation had quite a bit of error around 0.0, causing single-pixel rotations at typical window sizes.
SDL2 has changed as of August 28th (commit 12156:e5a666405750) to not need sin/cos in the VS, but we should still fix this for existing implementations or similar patterns that other programs may have.
glsl-cos goes from 32 instructions to 36, but 9 uniforms to 7. glsl-sin goes from 32 instructions to 34, but 8 uniforms to 7. This seems like a fine impact to have for the bugfix.
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Fixes: https://github.com/anholt/mesa/issues/110
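One way to make 0.0 exact is to use an odd polynomial for sin and an even polynomial with constant term 1 for cos: they evaluate to exactly 0.0 and 1.0 at zero, as long as range reduction also leaves 0.0 untouched. The Python sketch below shows that property with a plain Taylor series and a round-to-nearest reduction into [-pi, pi]. It is a generic illustration, not the instruction sequence vc4 emits, and the term counts are assumptions chosen for roughly 1e-6 accuracy.

```python
import math

TWO_PI = 2.0 * math.pi
# Taylor coefficients: sin uses x^(2k+1)/(2k+1)!, cos uses x^(2k)/(2k)!.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(8)]
COS_COEFFS = [(-1) ** k / math.factorial(2 * k) for k in range(9)]


def reduce_angle(x):
    """Map x into [-pi, pi]; an input of 0.0 comes back as exactly 0.0."""
    return x - TWO_PI * round(x / TWO_PI)


def horner(coeffs, x2):
    """Evaluate sum(coeffs[k] * x2**k) with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x2 + c
    return acc


def approx_sin(x):
    x = reduce_angle(x)
    return x * horner(SIN_COEFFS, x * x)  # odd polynomial: exactly 0.0 at x == 0.0


def approx_cos(x):
    x = reduce_angle(x)
    return horner(COS_COEFFS, x * x)      # constant term 1.0: exactly 1.0 at x == 0.0
```

At `x == 0.0` every Horner step multiplies by zero, so the result collapses to the leading coefficient (1.0) with no rounding error, which is the property a rotation of exactly zero needs.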
2018-09-04 | vc4: Drop a bunch of duplicated gallium PIPE_CAP default code. | Eric Anholt | 1 | -179/+0
Now that we have the util function for the default values, we can get rid of the boilerplate.
v2: drop GLSL level in favor of defaults.
v3: Rebase on new gallium caps
2018-09-04 | gallium: Add a helper for implementing PIPE_CAP_* default values. | Eric Anholt | 1 | -2/+2
One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to.
v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857)
v3: Rebase on 3 new gallium caps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
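The pattern described above can be modelled in a few lines: the driver answers only the caps it cares about and forwards everything else to one shared table of defaults. In gallium this happens in C from the default case of each driver's cap query; in this Python sketch the enum members are named after caps from this log, but their values and both function names are hypothetical.

```python
from enum import Enum, auto


class Cap(Enum):
    TEXTURE_MIRROR_CLAMP_TO_EDGE = auto()
    MAX_GS_INVOCATIONS = auto()
    MAX_SHADER_BUFFER_SIZE = auto()
    FRAMEBUFFER_MSAA_CONSTRAINTS = auto()


# Shared defaults, updated by whoever adds a new cap (values are made up).
_DEFAULTS = {
    Cap.TEXTURE_MIRROR_CLAMP_TO_EDGE: 0,
    Cap.MAX_GS_INVOCATIONS: 32,
    Cap.MAX_SHADER_BUFFER_SIZE: 1 << 27,
    Cap.FRAMEBUFFER_MSAA_CONSTRAINTS: 0,
}


def default_get_param(cap):
    """Hypothetical stand-in for the shared default helper."""
    return _DEFAULTS[cap]


def driver_get_param(cap):
    """A driver overrides only the caps it knows about."""
    if cap is Cap.TEXTURE_MIRROR_CLAMP_TO_EDGE:
        return 1
    # Any cap added later falls through here without touching the driver.
    return default_get_param(cap)
```

The design choice is that adding a cap means editing `_DEFAULTS` once rather than every driver, and a new driver can start with an almost empty override list.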
2018-08-24 | gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. | Kenneth Graunke | 1 | -0/+1
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
2018-08-23 | gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE | Marek Olšák | 1 | -0/+2
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-23 | gallium: add PIPE_CAP_MAX_GS_INVOCATIONS | Marek Olšák | 1 | -0/+1
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-08-08 | vc4: Implement texture_subdata() to directly upload tiled data. | Eric Anholt | 1 | -1/+39
This avoids a memcpy into a temporary in the upload path. Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)
2018-08-08 | vc4: Handle partial loads/stores of tiled textures. | Eric Anholt | 3 | -60/+155
Previously, we would load out the tile-aligned area, update the raster copy, and store it back. This was a huge cost for XPutImage calls to the screen under glamor.
Instead, implement a general load/store path that walks over the source x/y writing into the corresponding pixel of the destination (using clever math from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/). If things are aligned, we go through the previous utile-at-a-time loop.
Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
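The linked article describes a masked-increment trick for swizzled addressing: if a pixel's address interleaves the bits of x and y, x can be stepped by one without re-interleaving by computing `(ox - x_mask) & x_mask`, where `ox` holds only the x bits. The Python sketch below demonstrates the trick on a plain Morton (Z-order) layout, which is an assumption for illustration; vc4's actual T-format and LT layouts are arranged differently, and `interleave` and `walk_rect` are hypothetical names.

```python
BITS = 8                       # coordinates up to 255 on each axis
X_MASK = int("01" * BITS, 2)   # x bits at even positions: 0b0101...01
Y_MASK = X_MASK << 1           # y bits at odd positions:  0b1010...10


def interleave(x, y):
    """Reference Morton encoding: bit i of x -> bit 2i, bit i of y -> bit 2i+1."""
    out = 0
    for i in range(BITS):
        out |= ((x >> i) & 1) << (2 * i)
        out |= ((y >> i) & 1) << (2 * i + 1)
    return out


def walk_rect(x0, y0, w, h):
    """Yield (x, y, swizzled_offset) for a sub-rectangle, updating the offset
    incrementally instead of re-interleaving for every pixel."""
    oy = interleave(0, y0)
    ox_start = interleave(x0, 0)
    for y in range(y0, y0 + h):
        ox = ox_start
        for x in range(x0, x0 + w):
            yield x, y, ox | oy
            ox = (ox - X_MASK) & X_MASK  # x += 1 inside the x bit lanes
        oy = (oy - Y_MASK) & Y_MASK      # y += 1 inside the y bit lanes
```

Subtracting the mask fills the other axis's bit positions with ones, so the carry from adding one skips over them; masking afterwards discards those bits again.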
2018-08-08 | vc4: Compile the LT image helper per cpp we might load/store. | Eric Anholt | 1 | -2/+31
For the partial load/store support I'm about to add, we want the memcpy to be compiled out to a single load/store. This should also eliminate the calls to vc4_utile_width/height(). Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
2018-08-08 | vc4: Refactor to reuse the LT tile walking code. | Eric Anholt | 1 | -24/+34
2018-08-07 | vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels. | Eric Anholt | 1 | -1/+2
We won't have an FD if we're just having the server wait on a fence created by eglCreateSyncKHR(). Our seqno fences will happen in order, so server-side waits are no-ops in that case.
Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete
Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
2018-08-07 | vc4: Ignore samplers for finding uniform offsets. | Eric Anholt | 1 | -3/+14
Fixes:
    dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
    dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
    dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
    dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex
Cc: mesa-stable@lists.freedesktop.org
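The failing tests above mix samplers into structs and arrays, where counting a sampler as uniform storage shifts every later member's offset. The Python sketch below models the idea in the subject line under stated assumptions: members arrive already flattened as hypothetical `(name, kind, slots)` tuples, and samplers take no uniform slots. `uniform_offsets` is a hypothetical helper, not vc4's code.

```python
def uniform_offsets(members):
    """Assign slot offsets to flattened uniform members, skipping samplers."""
    offsets = {}
    slot = 0
    for name, kind, slots in members:
        if kind == "sampler":
            # Assumption: samplers occupy no space in the uniform storage.
            continue
        offsets[name] = slot
        slot += slots
    return offsets
```

With `[("s.a", "float", 1), ("s.tex", "sampler", 1), ("s.m", "mat4", 4), ("s.b", "vec4", 1)]`, member `s.b` lands at slot 5 rather than 6, because the sampler before it no longer consumes a slot.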
2018-08-07 | vc4: Extend dumping of uniforms in QIR and in the command stream. | Eric Anholt | 3 | -13/+68
Similar to what I did for V3D, provide some description of the uniforms.
2018-08-07 | vc4: Pull uinfo->data[i] dereference out to the top of the loop. | Eric Anholt | 1 | -20/+18
Reduces the size of vc4_uniforms.o by about 10%. We would basically always end up loading the cacheline of uinfo->data[i] anyway, so it should be good for performance as well as making the code a bit cleaner.
2018-08-07 | vc4: Make sure to emit a tile coordinates between two MSAA loads. | Eric Anholt | 1 | -12/+11
The HW only executes a load once the tile coordinates packet happens, and only tracks one at a time, so by emitting our two MSAA loads back to back we would end up with an undefined color or Z buffer. The simulator doesn't seem to care, but sync up the RCL generation with the kernel anyway.
Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
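The ordering rule in this message can be checked with a toy model: the hardware keeps one pending load and executes it when a tile-coordinates packet arrives. The Python sketch below simulates that rule. The packet names are simplified placeholders rather than vc4's real packet names, the real RCL contains other packets this model ignores, and the model assumes a second pending load simply replaces the first, whereas on hardware the result is described as undefined.

```python
def run_tile(packets):
    """Return the loads that actually execute, given one pending-load slot
    that is flushed whenever a TILE_COORDS packet is seen."""
    pending = None
    executed = []
    for packet in packets:
        if packet.startswith("LOAD_"):
            pending = packet  # model assumption: a later load replaces the earlier one
        elif packet == "TILE_COORDS" and pending is not None:
            executed.append(pending)
            pending = None
    return executed


def emit_loads(loads):
    """Emit each load followed by its own TILE_COORDS packet."""
    packets = []
    for load in loads:
        packets += [load, "TILE_COORDS"]
    return packets
```

Back-to-back loads (`["LOAD_COLOR", "LOAD_ZS", "TILE_COORDS"]`) execute only one of them in this model, while `emit_loads(["LOAD_COLOR", "LOAD_ZS"])` places a coordinates packet after each load, so both run.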
2018-08-07 | vc4: Respect a sampler view's first_layer field. | Eric Anholt | 1 | -1/+3
Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
Cc: mesa-stable@lists.freedesktop.org
2018-08-06 | vc4: Fix a leak of the no-vertex-elements workaround BO. | Eric Anholt | 1 | -0/+2
Fixes: bd1925562ad1 ("vc4: Convert the driver to emitting the shader record using pack macros.")
2018-08-06 | vc4: Fix context creation when syncobjs aren't supported. | Eric Anholt | 1 | -2/+6
Noticed when trying to run current Mesa on rpi's downstream kernel.
Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
2018-08-01 | vc4: Fix automake linking error. | Juan A. Suarez Romero | 1 | -0/+9
    CXXLD gallium_dri.la
    ../../../../src/gallium/drivers/vc4/.libs/libvc4.a(vc4_cl_dump.o): In function `vc4_dump_cl':
    src/gallium/drivers/vc4/vc4_cl_dump.c:45: undefined reference to `clif_dump_init'
    src/gallium/drivers/vc4/vc4_cl_dump.c:82: undefined reference to `clif_dump_destroy'
    ../../../../src/broadcom/cle/.libs/libbroadcom_cle.a(cle_libbroadcom_cle_la-v3d_decoder.o): In function `v3d_field_iterator_next':
    src/broadcom/cle/v3d_decoder.c:902: undefined reference to `clif_lookup_bo'
Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107423
CC: Eric Anholt <eric@anholt.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-07-31 | gallium: add storage_sample_count parameter into is_format_supported | Marek Olšák | 1 | -0/+4
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-07-31 | gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTS | Marek Olšák | 1 | -0/+1
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-07-30 | v3d: Add a separate flag for CLIF ABI output versus human-readable CLs. | Eric Anholt | 1 | -1/+1
A few of the upcoming changes would make the V3D_DEBUG=cl output less readable, so let's make proper CLIF file production be under a separate V3D_DEBUG=clif flag.
2018-07-29 | vc4: Fix meson build when enabled without v3d. | Eric Anholt | 1 | -1/+1
Reported-by: Rob Clark <robdclark@gmail.com>
Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
2018-07-27 | v3d: Stop doing pretty-printed colorful booleans in CLIF output. | Eric Anholt | 1 | -1/+1
The parser wants to see a 1 or 0. We can put "true" and "false" in a comment to clarify that it's a boolean and the parser will skip it.
2018-07-27 | v3d: Move clif dump BO lookup into the clif dumper. | Eric Anholt | 1 | -1/+1
The clif dumper is going to need information about all of our BOs if we're going to dump them for replay purposes.