path: root/src/gallium/drivers/radeonsi
Age | Commit message | Author | Files | Lines
2019-01-17 | radeonsi/nir: get correct type for images inside structs | Timothy Arceri | 1 | -1/+2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-14 | radeonsi: also apply the GS hang workaround to draws without tessellation | Marek Olšák | 1 | -11/+14
ported from AMDVLK. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-09 | radeonsi: Fix use of 1- or 2- component GL_DOUBLE vbo's. | Mario Kleiner | 1 | -0/+8
With Mesa 18.1, commit be973ed21f6e, si_llvm_load_input_vs() changed the number of source 32-bit wide dword components used for fetching vertex attributes into the vertex shader from a constant 4 to a variable num_channels number, depending on input data format, with some special case handling for input data formats like 64-bit doubles.

In the case of a GL_DOUBLE input data format with one or two components though, e.g. submitted via

a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer);
b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer);

the input format would be SI_FIX_FETCH_RG_64_FLOAT, but no special case handling was implemented for that case, so in the default path the number of 32-bit dwords would be set to the number of float input components derived from info->input_usage_mask.

This ends with corrupted input to the vertex shader, because fetching a 64-bit double from the vbo requires fetching two 32-bit dwords instead of 1, and fetching a two double input requires 4 dword fetches instead of 2, so in these cases the vertex shader receives incomplete/truncated input data:

a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted.
b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned correctly, but v.y is corrupted.

This happens with the standard TGSI IR compiled shaders. Under NIR with R600_DEBUG=nir, we got correct behavior because the current radeonsi nir code always assigns info->input_usage_mask = TGSI_WRITEMASK_XYZW, and thereby always fetches 4 dwords regardless of what the shader actually needs.

Fix this by properly assigning 2 or 4 dword fetches for one or two component GL_DOUBLE input.

Fixes: be973ed21f6e ("radeonsi: load the right number of components for VS inputs and TBOs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
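The dword arithmetic in the message above can be illustrated with a small C sketch. It is not the radeonsi code: `dwords_for_doubles` and `fetch_dwords` are hypothetical helpers that model a vertex fetch as a plain byte copy, assuming an 8-byte `double` and assuming that dwords which are not fetched read back as zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Mesa function): number of 32-bit dwords needed
 * to fetch a 1- or 2-component GL_DOUBLE attribute. Each double is 2 dwords. */
static unsigned dwords_for_doubles(unsigned num_doubles)
{
    assert(num_doubles == 1 || num_doubles == 2);
    return num_doubles * 2;
}

/* Hypothetical model of a vertex fetch: copy only the first num_dwords dwords
 * of a two-double attribute. Dwords that are not fetched stay zero here. */
static void fetch_dwords(const double src[2], unsigned num_dwords, double dst[2])
{
    uint32_t raw[4] = {0};

    assert(num_dwords <= 4);
    memcpy(raw, src, num_dwords * sizeof(uint32_t));
    memcpy(dst, raw, sizeof(raw));
}
```

In this model, fetching a two-double attribute with only 2 dwords delivers the first component and loses the second, which mirrors case b) above. Fetching `dwords_for_doubles(2)` dwords delivers both components.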
2019-01-09 | ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics | Rhys Perry | 1 | -0/+3
Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is enabled with DXVK. It also fixes artifacts with Fallout 4's god rays with DXVK. Various piglit interpolateAt*() tests under NIR are also fixed.

v2:
- formatting fix
- update commit message to include Fallout 4 and the Fixes tag

Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
2019-01-08 | nir: Distinguish between normal uniforms and UBOs | Jason Ekstrand | 1 | -2/+3
Previously, NIR had a single nir_var_uniform mode used for atomic counters, UBOs, samplers, images, and normal uniforms. This commit splits this into nir_var_uniform and nir_var_ubo, where nir_var_uniform is still a bit of a catch-all but nir_var_ubo is specific to UBOs. While we're at it, we also rename shader_storage to ssbo to follow the convention. We need this so that we can distinguish between normal uniforms and UBO access at the deref level without going all the way back to the variable and seeing if it has an interface type. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-02 | radeonsi: always unmap texture CPU mappings on 32-bit CPU architectures | Marek Olšák | 1 | -0/+16
Team Fortress 2 32-bit version runs out of the CPU address space. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 | radeonsi: remove unused variables in si_insert_input_ptr | Marek Olšák | 1 | -3/+1
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 | radeonsi: use u_decomposed_prims_for_vertices instead of u_prims_for_vertices | Marek Olšák | 1 | -1/+3
It seems to be the same, but this doesn't use integer division with a variable divisor. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
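The difference between a variable and a constant divisor can be sketched as follows. This is an illustration only, not the Gallium helpers named in the commit: `enum prim`, `prims_generic` and `prims_decomposed` are hypothetical, and only a handful of primitive types are modeled.

```c
#include <assert.h>

/* Hypothetical primitive enum for this sketch; not Gallium's own type. */
enum prim {
    PRIM_POINTS,
    PRIM_LINES,
    PRIM_LINE_STRIP,
    PRIM_TRIANGLES,
    PRIM_TRIANGLE_STRIP,
    PRIM_COUNT
};

/* Table-driven variant: the divisor comes from a table lookup, so the
 * compiler typically has to emit a real integer division. */
static unsigned prims_generic(enum prim p, unsigned vertices)
{
    static const struct { unsigned min, incr; } info[PRIM_COUNT] = {
        [PRIM_POINTS]         = {1, 1},
        [PRIM_LINES]          = {2, 2},
        [PRIM_LINE_STRIP]     = {2, 1},
        [PRIM_TRIANGLES]      = {3, 3},
        [PRIM_TRIANGLE_STRIP] = {3, 1},
    };

    assert(p < PRIM_COUNT);
    if (vertices < info[p].min)
        return 0;
    return (vertices - info[p].min) / info[p].incr + 1;
}

/* Decomposed variant: each case divides by a compile-time constant, which
 * compilers can usually turn into cheaper multiply/shift sequences. */
static unsigned prims_decomposed(enum prim p, unsigned vertices)
{
    switch (p) {
    case PRIM_POINTS:         return vertices;
    case PRIM_LINES:          return vertices / 2;
    case PRIM_LINE_STRIP:     return vertices >= 2 ? vertices - 1 : 0;
    case PRIM_TRIANGLES:      return vertices / 3;
    case PRIM_TRIANGLE_STRIP: return vertices >= 3 ? vertices - 2 : 0;
    default:                  assert(!"unknown primitive"); return 0;
    }
}
```

For the primitive types modeled here both variants return the same counts, which matches the "seems to be the same" observation in the commit message.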
2019-01-02 | radeonsi: make si_cp_wait_mem more configurable | Marek Olšák | 5 | -8/+8
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 | radeonsi: call si_fix_resource_usage for the GS copy shader as well | Marek Olšák | 1 | -0/+4
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 | radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets | Marek Olšák | 2 | -2/+10
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 | st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st | Timothy Arceri | 1 | -2/+0
This will help the new opt introduced in the following patches allowing us to remove extra duplicate varyings. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 | radeonsi: make use of ac_are_tessfactors_def_in_all_invocs() | Timothy Arceri | 1 | -8/+2
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 | radeonsi: remove unrequired param in si_nir_scan_tess_ctrl() | Timothy Arceri | 3 | -3/+1
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-28 | radeonsi: Enable adaptive_sync by default for radeon | Nicholas Kazlauskas | 1 | -0/+4
It's better to let most applications make use of adaptive sync by default. Problematic applications can be placed on the blacklist or the user can manually disable the feature. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
2018-12-19 | radeonsi: const-ify the si_query_ops | Nicolai Hähnle | 3 | -5/+5
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: split perfcounter queries from si_query_hw | Nicolai Hähnle | 1 | -50/+93
Remove a level of indirection to make the code more explicit -- should make it easier to follow what's going on. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: factor si_query_buffer logic out of si_query_hw | Nicolai Hähnle | 4 | -110/+99
This is a move towards using composition instead of inheritance for different query types. This change weakens out-of-memory error reporting somewhat, though this should be acceptable since we didn't consistently report such errors in the first place. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: move query suspend logic into the top-level si_query struct | Nicolai Hähnle | 3 | -44/+62
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: move remaining perfcounter code into si_perfcounter.c | Nicolai Hähnle | 6 | -127/+643
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: track constant buffer bind history in si_pipe_set_constant_buffer | Nicolai Hähnle | 1 | -2/+3
Other callers of si_set_constant_buffer don't need it. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: use si_set_rw_shader_buffer for setting streamout buffers | Nicolai Hähnle | 1 | -50/+11
Reduce the number of places that encode buffer descriptors. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: add an si_set_rw_shader_buffer convenience function | Nicolai Hähnle | 2 | -45/+64
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: avoid using hard-coded SI_NUM_RW_BUFFERS | Nicolai Hähnle | 1 | -1/+2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: show the fixed function TCS in debug dumps | Nicolai Hähnle | 1 | -2/+8
This is rather important for merged VS/TCS as LSHS shaders... Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: const-ify si_set_tesseval_regs | Nicolai Hähnle | 1 | -2/+2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose | Nicolai Hähnle | 3 | -4/+4
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: don't set RAW_WAIT for CP DMA clears | Nicolai Hähnle | 1 | -1/+2
There is never a read-after-write hazard because the command doesn't read. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available | Nicolai Hähnle | 2 | -5/+15
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: add si_init_draw_functions and make some functions static | Nicolai Hähnle | 4 | -22/+22
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: extract declare_vs_blit_inputs | Nicolai Hähnle | 1 | -18/+25
Prepare for some later refactoring. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 | radeonsi: move SI_FORCE_FAMILY functionality to winsys | Nicolai Hähnle | 1 | -34/+0
This helps some debugging cases by initializing addrlib with slightly more appropriate settings. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-17 | nir/opt_peephole_select: Don't peephole_select expensive math instructions | Ian Romanick | 1 | -1/+1
On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 | nir/opt_peephole_select: Don't try to remove flow control around indirect loads | Ian Romanick | 1 | -1/+1
That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select").

v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
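The policy described in the two peephole_select commits above can be modeled as a simple predicate. This is a toy sketch, not NIR code: `enum instr_kind`, `can_flatten`, the `limit` parameter and the `expensive_alu_ok` name are assumptions made for illustration. Only the `indirect_load_ok` flag name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction classes for this sketch; not NIR's own types. */
enum instr_kind {
    INSTR_CHEAP_ALU,
    INSTR_EXPENSIVE_ALU,
    INSTR_DIRECT_LOAD,
    INSTR_INDIRECT_LOAD
};

/* Decide whether a branch body may be flattened into selects.
 * The caller states which instruction classes its hardware tolerates. */
static bool can_flatten(const enum instr_kind *instrs, size_t count,
                        size_t limit, bool indirect_load_ok,
                        bool expensive_alu_ok)
{
    if (count > limit)
        return false;

    for (size_t i = 0; i < count; i++) {
        /* The branch may exist to guard against an invalid load. */
        if (instrs[i] == INSTR_INDIRECT_LOAD && !indirect_load_ok)
            return false;
        /* Executing costly math unconditionally can be a net loss. */
        if (instrs[i] == INSTR_EXPENSIVE_ALU && !expensive_alu_ok)
            return false;
    }
    return true;
}
```

Passing the two flags separately lets each driver opt in only to the cases that are safe and profitable on its hardware.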
2018-12-16 | nir: Add a bool to int32 lowering pass | Jason Ekstrand | 1 | -0/+2
We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-06 | amd: remove support for LLVM 6.0 | Samuel Pitoiset | 7 | -195/+38
Users are encouraged to switch to LLVM 7.0, released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-11-28 | radeonsi: add memory management stress tests for GDS | Marek Olšák | 2 | -0/+48
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 | radeonsi: allow si_cp_dma_clear_buffer to clear GDS from any IB | Marek Olšák | 4 | -31/+33
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 | winsys/amdgpu,radeon: pass vm_alignment to buffer_from_handle | Marek Olšák | 1 | -1/+4
Acked-by: Christian König <christian.koenig@amd.com>
2018-11-28 | radeonsi: fix is_oneway_access_only for bindless images | Marek Olšák | 1 | -6/+23
2018-11-28 | radeonsi/nir: parse more information about bindless usage | Marek Olšák | 1 | -4/+32
fill more tgsi_shader_info fields.
2018-11-28 | radeonsi: small cleanup for memory opcodes | Marek Olšák | 1 | -9/+4
2018-11-28 | radeonsi: fix is_oneway_access_only for image stores | Marek Olšák | 1 | -12/+37
We need to look at the Dst for image stores.
2018-11-28 | radeonsi: use structured buffer intrinsics for image views | Marek Olšák | 2 | -10/+42
to stop using the workaround in si_make_buffer_descriptor.
2018-11-28 | radeonsi: clean up primitive binning enablement | Marek Olšák | 1 | -11/+16
no change in behavior. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-28 | winsys/amdgpu: explicitly declare whether buffer_map is permanent or not | Nicolai Hähnle | 1 | -1/+2
Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY that specifies whether the caller will use buffer_unmap or not. The default behavior is set to permanent maps, because that's what drivers do for Gallium buffer maps.

This should eliminate the need for hacks in libdrm. Assertions are added to catch when the buffer_unmap calls don't match the (temporary) buffer_map calls.

I did my best to update r600 for consistency (r300 needs no changes because it never calls buffer_unmap), even though the radeon winsys ignores the new flag.

As an added bonus, this should actually improve the performance of the normal fast path, because we no longer call into libdrm at all after the first map, and there's one less atomic in the winsys itself (there are now no atomics left in the UNSYNCHRONIZED fast path).

Cc: Leo Liu <leo.liu@amd.com>

v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
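The bookkeeping idea behind the temporary/permanent distinction can be sketched with a toy buffer. This is not the winsys code: `struct toy_buffer`, `toy_map` and `toy_unmap` are hypothetical, and the "backend call" merely stands in for whatever the real mapping call would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object for this sketch. */
struct toy_buffer {
    unsigned char storage[64];
    void *cpu_ptr;          /* cached CPU pointer, set by the first map */
    unsigned temp_maps;     /* outstanding temporary maps */
    unsigned backend_calls; /* how often the (pretend) backend was asked */
};

/* Map the buffer. Only the first call reaches the backend; later calls
 * reuse the cached pointer. Temporary maps are counted so that a missing
 * or extra unmap can be caught. */
static void *toy_map(struct toy_buffer *buf, bool temporary)
{
    if (!buf->cpu_ptr) {
        buf->cpu_ptr = buf->storage; /* stand-in for the real mapping call */
        buf->backend_calls++;
    }
    if (temporary)
        buf->temp_maps++;
    return buf->cpu_ptr;
}

/* Unmap is only legal for temporary maps; permanent maps never unmap. */
static void toy_unmap(struct toy_buffer *buf)
{
    assert(buf->temp_maps > 0);
    buf->temp_maps--;
}
```

In this model a permanent map costs one backend call in total, and the assertion in `toy_unmap` plays the role of the mismatch checks mentioned in the commit message.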
2018-11-20 | radeonsi: go back to using bottom-of-pipe for beginning of TIME_ELAPSED | Marek Olšák | 1 | -11/+4
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102597 Cc: 18.3 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-20 | radeonsi: don't send data after write-confirm with BOTTOM_OF_PIPE_TS | Marek Olšák | 3 | -9/+5
There are no writes. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-19 | radeonsi: fix an out-of-bounds read reported by ASAN | Nicolai Hähnle | 1 | -0/+4
We read 4 values out of sample_locs_8x, so make sure the array is big enough. Fixes: ac76aeef20 ("radeonsi: switch back to standard DX sample positions") Reviewed-by: Marek Olšák <marek.olsak@amd.com>
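The class of bug described above, and one way to guard against it, can be sketched as follows. The array contents are placeholders rather than real sample positions, and `sample_locs_example`, `NUM_EMITTED_DWORDS` and `emit_sample_locs` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The emit loop below always reads this many dwords. */
#define NUM_EMITTED_DWORDS 4

/* Placeholder values, not real sample positions. The trailing zeros are
 * padding so that every read of the emit loop stays inside the array. */
static const uint32_t sample_locs_example[] = {
    0x11111111u,
    0x22222222u,
    0,
    0,
};

/* Compile-time guard: fail the build if the array ever becomes too small. */
_Static_assert(sizeof(sample_locs_example) / sizeof(sample_locs_example[0]) >=
                   NUM_EMITTED_DWORDS,
               "sample_locs_example must cover every emitted dword");

static void emit_sample_locs(uint32_t out[NUM_EMITTED_DWORDS])
{
    for (unsigned i = 0; i < NUM_EMITTED_DWORDS; i++)
        out[i] = sample_locs_example[i];
}
```

Without the two padding entries the loop would read past the end of the array, which is the kind of access ASAN reports.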
2018-11-14 | radeonsi: fix video APIs on Raven2 | Marek Olšák | 2 | -4/+8
This was missed when I added the new enum. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Leo Liu <leo.liu@amd.com>