path: root/src/broadcom
Age | Commit message | Author | Files | Lines
2020-02-06 | broadcom: Fix implicit declaration of ffs for Android build | Jose Maria Casanova Crespo | 1 | -0/+1
Include util/bitscan.h to ensure ffs is available when there is no glibc like in Android. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1983 Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2554> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2554>
2020-02-05 | glsl,nir: Switch the enum representing shader image formats to PIPE_FORMAT. | Eric Anholt | 1 | -220/+63
This means you can directly use format utils on it without having to have your own GL enum to number-of-components switch statement (or whatever) in your vulkan backend. Thanks to imirkin for fixing up the nouveau driver (and a couple of core details). This fixes the computed qualifiers for EXT_shader_image_load_store's non-integer sizeNxM qualifiers, which we don't have tests for. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d) Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
2020-01-23 | util/hash_table: update users to use new optimal integer hash functions | Anthony Pesch | 1 | -13/+1
Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
2020-01-13 | nir/lower_atomics_to_ssbo: Also lower barriers | Jason Ekstrand | 1 | -1/+0
This is more correct for a pass which is supposed to completely lower away atomic counters. It also lets us stop supporting atomic counter barriers in most of the drivers. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 | nir: Rename nir_intrinsic_barrier to control_barrier | Jason Ekstrand | 1 | -1/+1
This is a more explicit name now that we don't want it to be doing any memory barrier stuff for us. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 | nir: Add a new memory_barrier_tcs_patch intrinsic | Jason Ekstrand | 1 | -0/+1
Right now, it's implemented as a no-op for everyone. For most drivers, it's a switch case in the NIR -> whatever which just breaks. For ir3, they already have code to delete tessellation barriers so we just add a case to also delete memory_barrier_tcs_patch. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2019-12-16 | v3d: handle writes to gl_Layer from geometry shaders | Iago Toral Quiroga | 3 | -0/+53
When geometry shaders write a value to gl_Layer that doesn't correspond to an existing layer in the target framebuffer, the rendering behavior is undefined according to the spec. However, there are CTS tests that trigger this scenario on purpose, probably to ensure that nothing terrible happens. For V3D, this situation is problematic because the binner uses the layer index to select the offset to write into the tile state data, and we only allocate tile state for MAX2(num_layers, 1), so we want to make sure we don't produce values that would lead to out-of-bounds writes. The simulator has an assert to catch this. Although we haven't observed issues on actual hardware, it is probably best to play it safe. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
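The guard described above could be sketched roughly as follows. This is a minimal illustration, assuming the policy is to fall back to layer 0 for out-of-range values; the helper name `sanitize_gs_layer` and the fallback choice are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as Mesa's MAX2 macro, redefined so the sketch stands alone. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: returns a layer index that always falls inside the
 * tile state allocated for MAX2(num_layers, 1) layers. Falling back to
 * layer 0 is an assumption; the spec leaves the behavior undefined. */
static uint32_t
sanitize_gs_layer(int32_t written_layer, uint32_t num_layers)
{
        uint32_t allocated = MAX2(num_layers, 1u);

        if (written_layer < 0 || (uint32_t)written_layer >= allocated)
                return 0;

        assert((uint32_t)written_layer < allocated);
        return (uint32_t)written_layer;
}
```

Any in-range value passes through unchanged, so well-behaved shaders see no difference.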
2019-12-16 | v3d: predicate geometry shader outputs inside non-uniform control flow | Iago Toral Quiroga | 1 | -0/+15
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: we always have at least one output segment | Iago Toral Quiroga | 1 | -1/+1
If we program an output size of 0 the simulator asserts. This was not a problem until now because our VS would always have to emit fixed function outputs, however, now that it can be paired with a GS we can end up with a VS shader that no longer emits any outputs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: compute appropriate VPM memory configuration for geometry shader workloads | Iago Toral Quiroga | 2 | -0/+25
Geometry shaders can output many vertices and thus have higher VPM memory pressure as a result. It is possible that geometry shader dispatches that are too wide exceed the maximum available VPM output allocated, in which case we need to reduce the dispatch width until we can fit the VPM memory requirements. Supported dispatch widths for geometry shaders are 16, 8, 4, 1. There is a limit on the number of VPM output sectors that can be used by a geometry shader that we can meet by lowering the dispatch width at compile time. However, at draw time we need to revisit this number and, together with other elements that can contribute to total VPM memory requirements, decide on a configuration that can fit the program into the available VPM memory. Ideally, we also want to aim for not using more than half of the available memory so that we can run a pair of bin and render programs in parallel. v2: fixed language in comment and typo in commit log. (Alejandro) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
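The compile-time part of this (stepping down through the supported widths until the output fits) might look like the sketch below. The linear cost model in `gs_output_sectors` and all names are assumptions for illustration; the commit message does not give the real hardware formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dispatch widths listed in the commit message, widest first. */
static const uint32_t gs_dispatch_widths[] = { 16, 8, 4, 1 };

/* Hypothetical cost model: assume the VPM output sectors needed scale
 * linearly with the dispatch width. */
static uint32_t
gs_output_sectors(uint32_t sectors_per_lane, uint32_t width)
{
        return sectors_per_lane * width;
}

/* Returns the widest supported width whose output fits in max_sectors,
 * or 0 if even 1-way dispatch does not fit. */
static uint32_t
pick_gs_dispatch_width(uint32_t sectors_per_lane, uint32_t max_sectors)
{
        size_t count = sizeof(gs_dispatch_widths) / sizeof(gs_dispatch_widths[0]);

        assert(sectors_per_lane > 0);

        for (size_t i = 0; i < count; i++) {
                uint32_t width = gs_dispatch_widths[i];

                if (gs_output_sectors(sectors_per_lane, width) <= max_sectors)
                        return width;
        }
        return 0;
}
```

The draw-time step would repeat a similar search with the other VPM consumers added to the budget, ideally targeting half of the available memory.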
2019-12-16 | v3d: add 1-way SIMD packing definition | Iago Toral Quiroga | 1 | -0/+1
According to the documentation, the 1-way dispatch width is only supported with geometry shaders. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: implement geometry shader instancing | Iago Toral Quiroga | 3 | -0/+9
v2: - Remove unused field uses_iid from v3d_gs_prog_data (Alejandro) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: fix packet descriptions for geometry and tessellation shaders | Iago Toral Quiroga | 1 | -10/+30
Every code address starts at bit 3 (addresses must be 64-bit aligned), with the first 3 bits used to specify threading and NaN propagation parameters for the shader program. We generally skip "reserved" bits. However, doing this when the reserved field is the last in a struct and it is large enough can make us compute incorrect (smaller) struct sizes, which can lead to corrupt CLs. In particular, the "Tess/Geom Common Params" struct has a reserved field at the end that is 8-bit, so if we don't include this we compute a packet size that is 1 byte smaller than it should be, making the next packet we emit start 1 byte earlier and therefore leading to incorrect CL data from that point forward. The name of one of the fields was not correct. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
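Both points can be illustrated with a small sketch: packing flags into the three low bits of an aligned code address, and deriving a packet size from the last bit any field occupies. The function names, the flag mask name, and the bit positions used in any example are hypothetical; they are not the genxml field definitions.

```c
#include <assert.h>
#include <stdint.h>

/* The low 3 bits of a code-address word carry per-shader flags
 * (threading and NaN propagation in the commit message). */
#define SHADER_FLAGS_MASK 0x7u

static uint32_t
pack_code_address(uint32_t address, uint32_t flags)
{
        /* 64-bit (8-byte) alignment leaves bits 0..2 of the address free. */
        assert((address & SHADER_FLAGS_MASK) == 0);
        assert((flags & ~SHADER_FLAGS_MASK) == 0);
        return address | flags;
}

/* A packet's byte size follows from the highest bit any field occupies,
 * so a trailing reserved field has to be counted as well. */
static uint32_t
packet_size_bytes(uint32_t last_field_end_bit)
{
        return last_field_end_bit / 8 + 1;
}
```

For a struct whose final 8-bit reserved field ends at bit 31, counting the field gives 4 bytes, while skipping it (last field ending at bit 23) gives 3, which is the one-byte shortfall described above.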
2019-12-16 | v3d: add initial compiler plumbing for geometry shaders | Iago Toral Quiroga | 5 | -79/+610
Most of the relevant work happens in the v3d_nir_lower_io. Since geometry shaders can write any number of output vertices, this pass injects a few variables into the shader code to keep track of things like the number of vertices emitted or the offsets into the VPM of the current vertex output, etc. This is also where we handle EmitVertex() and EndPrimitive() intrinsics.

The geometry shader VPM output layout has a specific structure with a 32-bit general header, then another 32-bit header slot for each output vertex, and finally the actual vertex data.

When vertex shaders are paired with geometry shaders we also need to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must not drop varyings other than those used by transform feedback, since these may be read by the binning GS.

v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO lowering so it includes the GS stage (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
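The VPM output layout described above (one general header word, one header word per output vertex, then the vertex data) can be turned into offset arithmetic along these lines. This is only a sketch: it assumes offsets are counted in 32-bit words and that every vertex occupies the same number of data words, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Word 0 is the general header; vertex headers follow immediately. */
static uint32_t
gs_vertex_header_offset(uint32_t vertex)
{
        return 1 + vertex;
}

/* Vertex data starts after the general header and all vertex headers. */
static uint32_t
gs_vertex_data_offset(uint32_t max_vertices, uint32_t words_per_vertex,
                      uint32_t vertex)
{
        assert(vertex < max_vertices);
        return 1 + max_vertices + vertex * words_per_vertex;
}

/* Total output size in 32-bit words for a shader that may emit up to
 * max_vertices vertices. */
static uint32_t
gs_output_size_words(uint32_t max_vertices, uint32_t words_per_vertex)
{
        return 1 + max_vertices + max_vertices * words_per_vertex;
}
```

A lowering pass would keep the current vertex index in an injected variable and derive both offsets from it each time a vertex is emitted.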
2019-12-16 | v3d: remove unused variable | Iago Toral Quiroga | 1 | -4/+1
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: enable debug options for geometry shader dumps | Iago Toral Quiroga | 2 | -10/+12
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: add debug assert | Iago Toral Quiroga | 1 | -0/+1
While lowering vpm outputs we look for the NIR variables matching particular store output instructions and we expect to find a match, so assert on that. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 | v3d: add missing plumbing for VPM load instructions | Iago Toral Quiroga | 2 | -0/+7
We will need to use LDVPMG_IN specifically to read VPM inputs in geometry shaders. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-11 | meson/broadcom: libbroadcom_cle also needs zlib | Dylan Baker | 1 | -1/+1
Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28 ("meson: Add support for the vc4 driver.") Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 | meson/broadcom: libbroadcom_cle needs expat headers | Dylan Baker | 1 | -1/+1
Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28 ("meson: Add support for the vc4 driver.") Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-25 | nir: Add a scheduler pass to reduce maximum register pressure. | Eric Anholt | 1 | -0/+5
This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable. A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed.

Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)

v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.
2019-11-20 | v3d: adds an extra MOV for any sig.ld* | Alejandro Piñeiro | 1 | -4/+19
Specifically when we are in non-uniform control flow, as we would need to set the condition for the last instruction. If (for example) an image atomic load stores its value directly in a NIR register, last_inst would be a nop, and setting the condition on it would fail.

Fixes piglit test: spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f064ada ("v3d: Add support for shader_image_load_store.")

v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-20 | v3d: Fix predication with atomic image operations | Jose Maria Casanova Crespo | 1 | -0/+12
Fixes dEQP test: dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.image_atomic_multiple_interleaved_write_read Fixes piglit test: spec/glsl-es-3.10/execution/cs-image-atomic-if-else.shader_test Fixes: 6281f26f064ada ("v3d: Add support for shader_image_load_store.") Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-14 | util: Move gallium's PIPE_FORMAT utils to /util/format/ | Eric Anholt | 2 | -2/+2
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to move their helpers out of gallium. Since u_format used util_copy_rect(), I moved that in there, too. I've put it in a separate directory in util/ because it's a big chunk of related code, and it's not clear to me whether we might want it as a separate library from libmesa_util at some point. Closes: #1905 Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-31 | v3d: rename vertex shader key (num)_fs_inputs fields | Iago Toral Quiroga | 4 | -10/+11
Until now this made sense because we always paired vertex shaders with fragment shaders, but as soon as we implement geometry and tessellation shaders that will no longer be the case, so rename this to (num_)used_outputs. v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-10-28 | util: rename list_empty() to list_is_empty() | Timothy Arceri | 3 | -4/+4
This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-10-27 | v3d: fix empty-body instruction | Eric Engestrom | 1 | -1/+1
Fixes: 8d43e2b2ded0fe3c82d4 ("meson: add -Werror=empty-body to disallow `if(x);`") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-23 | Revert "v3d: do not report alpha-test as supported" | Erik Faye-Lund | 2 | -0/+11
This reverts commit 9d0523b569bb7208c6e74cafc0f3945415d94336. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
2019-10-21 | nir/lower_idiv: add new llvm-based path | Rhys Perry | 1 | -1/+1
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-18 | broadcom: document known hardware issues for L2T flush command | Iago Toral Quiroga | 1 | -0/+35
Suggested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-10-18 | v3d: add new flag dirty TMU cache at v3d_compiler | Iago Toral Quiroga | 5 | -0/+12
The flag is set for any TMU write on spills and general TMU operations. It is then used as part of v3d_emit_gl_shader_state later.

v2: add a new flag at v3d_compiler instead of dirtying the flag at v3dx if there is any spill (change suggested by Eric, added by Alejandro)
v3: set this for anything that is not a load and do it also in v3d40_vir_emit_image_load_store (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-10-17 | v3d: do not report alpha-test as supported | Erik Faye-Lund | 2 | -11/+0
This triggers lowering in the state-tracker, which makes things a bit simpler. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 | nir: support feeding state to nir_lower_clip_[vg]s | Erik Faye-Lund | 1 | -1/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 | nir: support lowering clipdist to arrays | Erik Faye-Lund | 1 | -2/+3
This allows us to make sure clipdist is emitted as a scalar array rather than two vec4s. This matches SPIR-V semantics, and will be useful for Zink. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 | nir: allow passing alpha-ref state to lowering-code | Erik Faye-Lund | 1 | -1/+1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-10 | nir: add nir_shader_compiler_options::lower_to_scalar | Marek Olšák | 1 | -0/+1
This will replace PIPE_SHADER_CAP_SCALAR_ISA. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-30 | v3d: Enable the late algebraic optimizations to get real subs. | Eric Anholt | 1 | -0/+16
This worked better than my original v3d-local pass for just subs, and is a huge win over not producing subs.

total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
total threads in shared programs: 153784 -> 154104 (0.21%)
total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
total max-temps in shared programs: 904546 -> 895796 (-0.97%)
total spills in shared programs: 4959 -> 4993 (0.69%)
total fills in shared programs: 6558 -> 6670 (1.71%)
total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-23 | broadcom/genxml: Stop manually scrubbing 'α' -> "alpha" | Kenneth Graunke | 1 | -1/+0
'α' has never appeared in any genxml files, so there's no need to replace it with the word "alpha". Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 | nir: allow specifying filter callback in lower_alu_to_scalar | Vasily Khoruzhick | 1 | -1/+1
Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-05 | v3d: writes to magic registers aren't RF writes after THREND | Jose Maria Casanova Crespo | 1 | -1/+3
Shaders must not attempt to write to the register files in the last three instructions, but that doesn't include the magic registers:

    nop ; nop ; thrsw; ldtmu.- *** ERROR ***
    nop ; nop
    nop ; nop

v2: Simplify validation rules. (Eric Anholt)
v3: Adjust validation even more. (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
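The rule can be expressed as a small check over the tail of a program. The sketch below uses a made-up `struct inst_sketch` with two booleans rather than the real QPU instruction encoding, so it only illustrates the distinction between register-file writes and magic-register writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an instruction's destination. */
struct inst_sketch {
        bool writes_dest;   /* instruction writes some destination */
        bool dest_is_magic; /* destination is a magic register, not the RF */
};

/* Returns true when none of the last three instructions writes the
 * register file. Writes to magic registers are allowed. */
static bool
last_three_avoid_rf_writes(const struct inst_sketch *insts, size_t count)
{
        size_t start = count > 3 ? count - 3 : 0;

        assert(insts != NULL || count == 0);

        for (size_t i = start; i < count; i++) {
                if (insts[i].writes_dest && !insts[i].dest_is_magic)
                        return false;
        }
        return true;
}
```

Under this check, a magic-register write in the final three instructions (like the `ldtmu.-` in the listing above) no longer trips validation, while a genuine register-file write still does.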
2019-09-03 | nir: Fix num_ssbos when lowering atomic counters | Connor Abbott | 1 | -2/+1
Otherwise it's impossible to know the maximum SSBO index for both internal TGSI shaders from TTN (which don't have any notion of atomic counters and no offset) as well as shaders from GLSL. I fixed everything I could find while grepping for num_ssbos and num_abos, which hopefully is everything (iris was the only user I could find that uses it in a meaningful way). Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-21 | v3d: Use the correct opcodes for signed image min/max | Jason Ekstrand | 1 | -0/+2
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 | nir: Add explicit signs to image min/max intrinsics | Jason Ekstrand | 2 | -4/+8
This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 | v3d: clamp gl_PointSize to a minimum of 1.0 | Iago Toral Quiroga | 1 | -0/+5
The OpenGL ES spec requires that the value of gl_PointSize is clamped to an implementation-dependent range matching what is advertised by GL_ALIASED_POINT_SIZE_RANGE. For V3D this is [1.0, 512.0], but the hardware won't clamp to the minimum side of the range and won't render points with a size strictly smaller than 1.0 either, so we need to clamp manually. For points larger than the maximum size of the range the hardware clamps automatically. Fixes piglit test: spec/!opengl 2.0/vs-point_size-zero Reviewed-by: Eric Anholt <eric@anholt.net>
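In scalar form, the manual clamp amounts to the sketch below. The real change emits the equivalent operation in the compiled shader; this C helper, with its hypothetical name, only shows the arithmetic, and it deliberately leaves the upper bound alone because the hardware handles that side.

```c
#include <assert.h>

/* Lower bound of the advertised point size range for V3D. */
#define POINT_SIZE_MIN_SKETCH 1.0f

/* Clamps only the minimum side. Written so that a NaN input also
 * yields the minimum (an assumption, not something the commit states). */
static float
clamp_point_size_min(float size)
{
        float clamped = (size >= POINT_SIZE_MIN_SKETCH) ? size
                                                        : POINT_SIZE_MIN_SKETCH;

        assert(clamped >= POINT_SIZE_MIN_SKETCH);
        return clamped;
}
```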
2019-08-13 | v3d: line length style fixes | Iago Toral Quiroga | 1 | -26/+33
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 | v3d: honor the write mask on store operations | Iago Toral Quiroga | 1 | -85/+120
v2:
- Fix incremental update of the const offset when we need to emit a sequence with more than one write because of the writemask.
- Do not move the tmu write emission to a separate helper.
v3:
- Get the store writemask before the loop, use ffs to get the first component to write and clear writemask bits as we process the components (Eric).
- Simplified the code that figured out the number of components for the TMU config based on the number of tmu writes for stores and atomics.
v4:
- Code clean-ups (Eric).

Fixes:
KHR-GLES31.core.shader_image_load_store.advanced-cast-cs
KHR-GLES31.core.shader_image_load_store.advanced-cast-fs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchBuffers-cs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchPrograms-cs
KHR-GLES31.core.shader_storage_buffer_object.basic-operations-case1-cs

Reviewed-by: Eric Anholt <eric@anholt.net>
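The writemask walk described in the v3 notes (find the first enabled component, consume bits as components are processed) could be sketched like this. It assumes each write sequence covers one run of consecutive enabled components; `first_set_bit` is a portable stand-in for ffs(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for ffs(): 1-based index of the lowest set bit,
 * or 0 when no bit is set. */
static int
first_set_bit(uint32_t v)
{
        for (int i = 0; i < 32; i++) {
                if (v & (1u << i))
                        return i + 1;
        }
        return 0;
}

struct write_run {
        uint32_t first_component;
        uint32_t num_components;
};

/* Splits a 4-bit store writemask into runs of consecutive enabled
 * components. Returns the number of runs stored in `runs` (at most 2). */
static uint32_t
split_writemask(uint32_t writemask, struct write_run runs[2])
{
        uint32_t count = 0;

        assert((writemask & ~0xfu) == 0);

        while (writemask) {
                uint32_t first = (uint32_t)first_set_bit(writemask) - 1;
                uint32_t n = 0;

                /* Consume the run, clearing bits as they are processed. */
                while (first + n < 4 && (writemask & (1u << (first + n)))) {
                        writemask &= ~(1u << (first + n));
                        n++;
                }

                runs[count].first_component = first;
                runs[count].num_components = n;
                count++;
        }
        return count;
}
```

Each run would then become one write sequence, with the constant offset advanced by the run's first component times the component size.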
2019-08-13 | v3d: refactor ntq_emit_tmu_general() slightly | Iago Toral Quiroga | 1 | -24/+36
When we implement write masks on store operations we might need to emit multiple write sequences for a given store intrinsic. To make that easier, let's split the emission of the tmud instructions to their own block after we are done with the code that only needs to run once no matter how many write sequences we need to emit. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 | nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo | Rhys Perry | 1 | -1/+1
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 | v3d: use the GPU to record primitives written to transform feedback | Iago Toral Quiroga | 1 | -0/+10
We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive counts to a buffer, including the number of primitives written to transform feedback buffers, which will handle buffer overflow correctly.

There are a couple of caveats with this:

Primitive counters are reset when we emit a 'Tile Binning Mode Configuration' packet, which can happen in the middle of a primitives query, so we need to read the buffer when we submit a job and accumulate the counts in the context so we don't lose them.

We also need to do the same when we switch primitive type during transform feedback so we can compute the correct number of recorded vertices from the number of primitives. This is necessary so we can provide an accurate vertex count for draw from transform feedback.

v2:
- When computing the number of vertices for a primitive, pass in the base primitive, since that is what the hardware will count.
- No need to update primitive counts when switching primitive types if the base primitives are the same.
- Log perf warning when mapping the primitive counts BO for readback (Eric).
- Only emit the primitive counts packet once at job end (Eric).
- Use u_upload mechanism for the primitive counts buffer (Eric).
- Use the XML to generate indices into the primitive counters buffer (Eric).

Fixes piglit tests:
spec/ext_transform_feedback/overflow-edge-cases
spec/ext_transform_feedback/query-primitives_written-bufferrange
spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
spec/ext_transform_feedback/change-size base-shrink
spec/ext_transform_feedback/change-size base-grow
spec/ext_transform_feedback/change-size offset-shrink
spec/ext_transform_feedback/change-size offset-grow
spec/ext_transform_feedback/change-size range-shrink
spec/ext_transform_feedback/change-size range-grow
spec/ext_transform_feedback/intervening-read prims-written

Reviewed-by: Eric Anholt <eric@anholt.net>
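The accumulation step can be sketched as follows: each time the hardware count is read back, it is folded into running totals, and the vertex total is derived from the base primitive type in use at that point. The enum, struct, and function names are hypothetical and do not mirror the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Base primitive types, which is what the hardware counts. */
enum base_prim {
        BASE_PRIM_POINTS,
        BASE_PRIM_LINES,
        BASE_PRIM_TRIANGLES,
};

static uint32_t
verts_per_base_prim(enum base_prim prim)
{
        switch (prim) {
        case BASE_PRIM_POINTS:    return 1;
        case BASE_PRIM_LINES:     return 2;
        case BASE_PRIM_TRIANGLES: return 3;
        }
        assert(!"unknown base primitive");
        return 0;
}

/* Running totals kept on the CPU side so that a hardware counter reset
 * does not lose earlier results. */
struct tf_counter_sketch {
        uint64_t prims_written;
        uint64_t vertices_written;
};

/* Folds a count read back from the buffer into the totals. Intended to be
 * called at job submission and whenever the base primitive type changes. */
static void
accumulate_tf_counts(struct tf_counter_sketch *totals, enum base_prim prim,
                     uint32_t hw_prims)
{
        assert(totals != NULL);
        totals->prims_written += hw_prims;
        totals->vertices_written +=
                (uint64_t)hw_prims * verts_per_base_prim(prim);
}
```

Accumulating at every primitive type switch is what keeps the vertex total correct, since the per-primitive vertex count differs between types.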
2019-08-08 | v3d: add header guards in v3d_packet_helpers.h | Iago Toral Quiroga | 1 | -0/+4
Reviewed-by: Eric Anholt <eric@anholt.net>