path: root/src/gallium/drivers
Age | Commit message | Author | Files | Lines
2019-01-18 | freedreno/a6xx: Turn on texture tiling by default | Kristian H. Kristensen | 7 | -43/+64
The color swap isn't available for tiled formats, and it isn't needed either. We pick one channel order and use it for all non-linear formats.
Signed-off-by: Kristian H. Kristensen
Reviewed-by: Rob Clark
2019-01-18 | freedreno: Synchronize batch and flush for staging resource | Kristian H. Kristensen | 1 | -1/+15
Staging blit downloads would wait on the src resource instead of the staging resource and didn't make sure to submit the blit batch first.
Signed-off-by: Kristian H. Kristensen
Reviewed-by: Rob Clark
2019-01-18 | gm107/ir: disable TEXS for tex with derivAll set | Karol Herbst | 2 | -1/+3
Fixes dEQP tests:
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isamplercube_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usamplercube_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.isampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.usampler3d_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
Fixes: f821e80213e38e93f96255b3deacb737a600ed40 "gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst
Reviewed-by: Ilia Mirkin
2019-01-18 | nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions | Karol Herbst | 1 | -1/+1
Fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3
Signed-off-by: Karol Herbst
Reviewed-by: Ilia Mirkin
2019-01-16 | v3d: Restructure RO allocations using resource_from_handle. | Eric Anholt | 1 | -29/+38
I had bugs in the old path where I was laying out as tiled (so we'd render tiled) but then only allocating space in the shared object for linear rendering. The resource_from_handle makes it so the same layout choices are made in both the import and export scanout cases.
Also fixes a leak of the fd that was tripping up the CTS.
Now that we're checking PIPE_BIND_SHARED to choose to use RO, the DRM_FORMAT_MOD_LINEAR check wasn't needed any more.
Fixes visual corruption and MMU faults in X in renderonly mode.
Fixes: bd09bb1629a7 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
2019-01-16 | v3d: If the modifier is not known on BO import, default to linear for RO. | Eric Anholt | 1 | -1/+3
Part of fixing DRI3 rendering with RO on X11.
Fixes: e113b21cb779 ("v3d: Add renderonly support.")
2019-01-17 | radeonsi/nir: get correct type for images inside structs | Timothy Arceri | 1 | -1/+2
Reviewed-by: Marek Olšák
2019-01-16 | swr/rast: Store cached files in multiple subdirs | Alok Hota | 3 | -38/+52
This improves cache filesystem performance, especially during CI tests.
Also updated the jitcache magic number due to codegen parameter changes.
Removed 2 `if constexpr` uses to avoid requiring C++17.
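As a rough illustration of the multi-subdirectory layout described above, the sketch below spreads cache files across subdirectories keyed on a hash prefix. The hashing scheme, the two-character fan-out, and the name `cache_path` are assumptions made for illustration; the commit message does not say how swr picks its subdirectories.

```python
import hashlib
import os


def cache_path(root, key, fanout_chars=2):
    """Return a sharded path for a cache entry (hypothetical layout).

    The first `fanout_chars` hex digits of the key's SHA-1 pick the
    subdirectory, so entries spread over many small directories instead
    of piling up in a single large one.
    """
    digest = hashlib.sha1(key).hexdigest()
    return os.path.join(root, digest[:fanout_chars], digest)


path = cache_path("jitcache", b"example shader key")
```

With a two-character hex prefix, files land in at most 256 subdirectories, which keeps any single directory listing short.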
2019-01-16 | swr/rast: New execution engine per JIT | Alok Hota | 2 | -42/+65
Fixes relocation errors with LLVM 7.0.0
2019-01-16 | swr/rast: Scope MEM_CLIENT enum for mem usages | Alok Hota | 6 | -40/+38
Avoids confusion with other defaulted integer parameters
- fixed some unspecified usages
- removed unnecessary includes
- removed unnecessary protected access specifier in buckets framework
2019-01-16 | swr/rast: Unaligned and translations in gathers | Alok Hota | 1 | -21/+35
- added graphics address translation in odd gathers
- added support for unaligned gathers in fetch shader
- changed how 2+ GB offsets are handled to make them compatible with unaligned offsets
2019-01-16 | swr/rast: partial support for Tiled Resources | Alok Hota | 2 | -0/+164
- updated sample from TRTT surfaces correctly
- implemented mapped status return for TRTT surfaces
- implemented per-sample instruction minLod clamp
- updated bilinear filter weight calculation to be closer to D3D specs
- implemented "ReducedTexcoordRange" operation from D3D specs to avoid loss of precision on high-value normalized coordinates
2019-01-16 | swr/rast: Add annotator to interleave isa text | Alok Hota | 2 | -3/+36
To make debugging simpler
2019-01-16 | swr/rast: Use gfxptr_t value in JitGatherVertices | Alok Hota | 1 | -18/+16
Use gfxptr_t type value for stream pointer uses in gather and similar calls
2019-01-16 | gallium/swr: Fix multi-context sync fence deadlock. | Bruce Cherniak | 1 | -1/+3
Various recreation scenarios lead to the API thread getting stuck in swr_fence_finish(). This is a multi-context issue, whereby one context overwrites the fence read-value with a previous sync's lesser value. The fence sync value is supposed to be always increasing.
In swr_fence_cb(), only update the "read" value if the new value is greater. (This may seem like we're not waiting on the other context to finish, but had we needed it to finish there would have been a wait prior to submitting a new sync.)
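The "only move the read value forward" rule from the fix above can be sketched as follows. This is a toy model, not the swr code: the class `Fence` and its methods are hypothetical names, and real fences involve threads and callbacks that are left out here.

```python
class Fence:
    """Toy fence with a monotonically increasing read value (hypothetical)."""

    def __init__(self):
        self.write = 0  # last sync value handed out
        self.read = 0   # highest sync value reported as completed

    def submit(self):
        """Hand out the next sync value."""
        self.write += 1
        return self.write

    def on_complete(self, value):
        """Completion callback: never let a stale value move `read` backwards."""
        if value > self.read:
            self.read = value

    def is_finished(self):
        return self.read >= self.write


fence = Fence()
first = fence.submit()
second = fence.submit()
fence.on_complete(second)  # the newer sync completes first
fence.on_complete(first)   # the older sync reports late and is ignored
```

Without the comparison in `on_complete`, the late callback would reset `read` to the older value and a waiter checking `is_finished()` would never see the fence complete.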
2019-01-14 | radeonsi: also apply the GS hang workaround to draws without tessellation | Marek Olšák | 1 | -11/+14
Ported from AMDVLK.
Cc: 18.3
Reviewed-by: Bas Nieuwenhuizen
2019-01-14 | v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear. | Eric Anholt | 1 | -1/+1
We don't have a way to talk to RO about modifiers it can do yet, so assume the minimum.
2019-01-14 | v3d: Add support for shader_image_load_store. | Eric Anholt | 5 | -2/+196
This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).
2019-01-14 | v3d: Add SSBO/atomic counters support. | Eric Anholt | 6 | -1/+102
So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.
2019-01-14 | v3d: Drop the GLSL version level. | Eric Anholt | 1 | -1/+1
This was an arbitrary "we support lots of stuff" value when I started the driver. However, at 400 we expose OES_gpu_shader5, which claims support for dynamically indexing samplers, which the driver doesn't do yet.
2019-01-14 | v3d: Add an isr to the simulator to catch GMP violations. | Eric Anholt | 3 | -0/+39
Otherwise, the simulator raises the GMP interrupt and waits for it to be handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when the violation happens gives us a chance to look at the backtrace of whatever thread triggered the violation.
2019-01-14 | v3d: Add support for GL_ARB_framebuffer_no_attachments. | Eric Anholt | 3 | -2/+19
Fixes dEQP-GLES31.functional.state_query.integer.max_framebuffer_height_getboolean when GLES3 is enabled.
2019-01-14 | v3d: Add support for flushing dirty TMU data at job end. | Eric Anholt | 2 | -0/+20
This will be needed for SSBOs and image_load_store.
2019-01-10 | freedreno/a6xx: fix 3d+tiled layout | Rob Clark | 1 | -34/+52
The last round of fixing 3d layer+level layout skipped the tiled case, since tiled texture support was not in place yet. This finishes the job.
Signed-off-by: Rob Clark
2019-01-10 | freedreno/a6xx: move tile_mode to sampler-view CSO | Rob Clark | 2 | -7/+7
This is known when the CSO is created, so no need to patch it in later.
Also, for smaller textures where the first level is small enough to be linear, it seems like we should set linear tile mode. See: dEQP-GLES3.functional.texture.format.unsized.rgb_unsigned_byte_3d_pot
Signed-off-by: Rob Clark
2019-01-10 | freedreno/a6xx: separate stencil restore/resolve fixes | Rob Clark | 1 | -14/+21
Previously we'd use format/etc from the primary (z32) buffer for the stencil (s8), due to confusion about rsc vs psurf. Rework this to drop extra arg and push down handling of separate stencil case (and make sure we take the fmt from the right place).
This doesn't completely fix separate-stencil, but at least it avoids the GPU scribbling over random other cmdstream buffers and causing a bunch of bogus fails in dEQP.
Signed-off-by: Rob Clark
2019-01-10 | etnaviv: fix typo in cflush_all description | Guido Günther | 1 | -1/+1
Signed-off-by: Guido Günther
Reviewed-by: Christian Gmeiner
2019-01-09 | Revert "llvmpipe: Always return some fence in flush (v2)" | Roland Scheidegger | 1 | -2/+0
This reverts commit f6a6da8131383d8eeee07cd59326a70f4b15866b. With this commit we see massive amounts of asserts triggering in lp_fence_wait(), assert(f->issued), for instance with libgl_xlib state tracker and piglit. Not entirely sure if the assert could just be removed.
2019-01-09 | radeonsi: Fix use of 1- or 2- component GL_DOUBLE vbo's. | Mario Kleiner | 1 | -0/+8
With Mesa 18.1, commit be973ed21f6e, si_llvm_load_input_vs() changed the number of source 32-bit wide dword components used for fetching vertex attributes into the vertex shader from a constant 4 to a variable num_channels number, depending on input data format, with some special case handling for input data formats like 64-bit doubles.
In the case of a GL_DOUBLE input data format with one or two components though, e.g., submitted via ...
a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer);
b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer);
... the input format would be SI_FIX_FETCH_RG_64_FLOAT, but no special case handling was implemented for that case, so in the default path the number of 32-bit dwords would be set to the number of float input components derived from info->input_usage_mask.
This ends with corrupted input to the vertex shader, because fetching a 64-bit double from the vbo requires fetching two 32-bit dwords instead of 1, and fetching a two-double input requires 4 dword fetches instead of 2, so in these cases the vertex shader receives incomplete/truncated input data:
a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted.
b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned correctly, but v.y is corrupted.
This happens with the standard TGSI IR compiled shaders. Under NIR with R600_DEBUG=nir, we got correct behavior because the current radeonsi nir code always assigns info->input_usage_mask = TGSI_WRITEMASK_XYZW, and thereby always fetches 4 dwords regardless of what the shader actually needs.
Fix this by properly assigning 2 or 4 dword fetches for one or two component GL_DOUBLE input.
Fixes: be973ed21f6e ("radeonsi: load the right number of components for VS inputs and TBOs")
Signed-off-by: Mario Kleiner
Cc: Marek Olšák
Signed-off-by: Marek Olšák
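The dword arithmetic in the message above can be written out as a small sketch. The helper `dwords_to_fetch` is a hypothetical name and only models the counting argument (one 64-bit double occupies two 32-bit dwords); it is not the driver's fetch logic, which has further special cases.

```python
def dwords_to_fetch(num_components, bits_per_component):
    """Number of 32-bit dwords needed to fetch a vertex attribute.

    Hypothetical helper: each component needs as many dwords as it has
    32-bit words, so a 64-bit double costs two dwords per component.
    """
    dwords_per_component = max(1, bits_per_component // 32)
    return num_components * dwords_per_component


one_double = dwords_to_fetch(1, 64)   # case a) in the message: 2 dwords
two_doubles = dwords_to_fetch(2, 64)  # case b) in the message: 4 dwords
two_floats = dwords_to_fetch(2, 32)   # plain floats: one dword each
```

Counting only the shader-visible components (1 or 2) for the double cases is the shortfall the commit describes: half of the required dwords are never fetched.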
2019-01-09 | ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics | Rhys Perry | 1 | -0/+3
Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is enabled with DXVK. It also fixes artifacts with Fallout 4's god rays with DXVK. Various piglit interpolateAt*() tests under NIR are also fixed.
v2: formatting fix; update commit message to include Fallout 4 and the Fixes tag
Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
Bugzilla:
Signed-off-by: Rhys Perry
2019-01-09 | llvmpipe: Always return some fence in flush (v2) | Tomasz Figa | 1 | -0/+2
If there is no last fence, due to no rendering happening yet, just create a new signaled fence and return it, to match the expectations of the EGL sync fence API.
Fixes random "Could not create sync fence 0x3003" assertion failures from Skia on Android, coming from the following code:
Reproducible especially with thread count >= 4.
One could make the driver always keep the reference to the last fence, but:
- the driver seems to explicitly destroy the fence whenever a rendering pass completes and changing that would require a significant functional change to the code. (Specifically, in lp_scene_end_rasterization().)
- it still wouldn't solve the problem of an EGL sync fence being created and waited on without any rendering happening at all, which is also likely to happen with Android code pointed to in the commit.
Therefore, the simple approach of always creating a fence is taken, similarly to other drivers, such as radeonsi.
Tested with piglit llvmpipe suite with no regressions and following tests fixed:
egl_khr_fence_sync conformance
eglclientwaitsynckhr_flag_sync_flush
eglclientwaitsynckhr_nonzero_timeout
eglclientwaitsynckhr_zero_timeout
eglcreatesynckhr_default_attributes
eglgetsyncattribkhr_invalid_attrib
eglgetsyncattribkhr_sync_status
v2:
- remove the useless lp_fence_reference() dance (Nicolai),
- explain why creating the dummy fence is the right approach.
Signed-off-by: Tomasz Figa
2019-01-08 | v3d: Enable GL_ARB_texture_gather on V3D 4.x. | Eric Anholt | 1 | -0/+5
This is part of GLES 3.1, and with the NIR lowering we're now passing the GLES31 testcases.
2019-01-08 | freedreno: Move register constant files to src/freedreno. | Bas Nieuwenhuizen | 10 | -22475/+1
This way they can be shared. Build tested with meson, but not too sure on the autotools stuff though.
Reviewed-by: Dylan Baker
Acked-by: Rob Clark
2019-01-08 | nir: rename global/local to private/function memory | Karol Herbst | 2 | -3/+3
The naming is a bit confusing no matter how you look at it. Within SPIR-V, "global" memory is memory accessible from all threads. GLSL "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all threads of a work group, the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind).
GLSL "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst
Reviewed-by: Jason Ekstrand
2019-01-08 | virgl: use primconvert provoking vertex properly | Dave Airlie | 2 | -8/+24
This stores the raster state and calls the correct primconvert interface using the currently bound raster state.
Reviewed-By: Gert Wollny
Signed-off-by: Dave Airlie
2019-01-08 | spirv: Add support for using derefs for UBO/SSBO access | Jason Ekstrand | 1 | -0/+1
For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro
Reviewed-by: Caio Marcelo de Oliveira Filho
Tested-by: Bas Nieuwenhuizen
2019-01-08 | nir: Distinguish between normal uniforms and UBOs | Jason Ekstrand | 1 | -2/+3
Previously, NIR had a single nir_var_uniform mode used for atomic counters, UBOs, samplers, images, and normal uniforms. This commit splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform is still a bit of a catch-all but the nir_var_ubo is specific to UBOs. While we're at it, we also rename shader_storage to ssbo to follow the convention.
We need this so that we can distinguish between normal uniforms and UBO access at the deref level without going all the way back to the variable and seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro
Reviewed-by: Caio Marcelo de Oliveira Filho
2019-01-07 | etnaviv: annotate variables only used in debug build | Lucas Stach | 1 | -7/+4
Some of the status variables in the compiler are only used in asserts and thus may be unused in release builds. Annotate them accordingly to avoid 'unused but set' warnings from the compiler.
Signed-off-by: Lucas Stach
Reviewed-by: Christian Gmeiner
2019-01-07 | etnaviv: enable full overwrite in a few more cases | Lucas Stach | 1 | -4/+7
Take into account the render target format when checking if the color mask affects all channels of the RT. This allows enabling full overwrite in a few cases where a non-alpha format is used.
Signed-off-by: Lucas Stach
Reviewed-by: Christian Gmeiner
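The mask check described above can be illustrated with a short sketch. The bit layout and the names `rt_channels` and `mask_covers_rt` are assumptions for illustration only, and the real driver condition may depend on more state than the color mask.

```python
# Hypothetical channel bits, one per color channel.
R, G, B, A = 1, 2, 4, 8


def rt_channels(has_alpha):
    """Channels that actually exist in the render target format."""
    return R | G | B | (A if has_alpha else 0)


def mask_covers_rt(colormask, channels):
    """True if the color mask writes every channel the format has."""
    return (colormask & channels) == channels


rgb_only_mask = R | G | B
covers_rgbx = mask_covers_rt(rgb_only_mask, rt_channels(has_alpha=False))
covers_rgba = mask_covers_rt(rgb_only_mask, rt_channels(has_alpha=True))
```

A mask that leaves alpha disabled still covers a format without an alpha channel, which is the extra case the commit picks up; comparing against all four channels unconditionally would miss it.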
2019-01-04 | v3d: Fix up VS output setup during precompiles. | Eric Anholt | 1 | -6/+10
I noticed that a VS I was debugging was missing all of its output stores -- outputs_written was for POS, VAR0, VAR3, while the shader's variables were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed to be doing here, but we can just walk the declared variables and avoid both this bug and the emission of extra stvpms for less-than-vec4 varyings.
2019-01-03 | virgl: remove empty file | Gurchetan Singh | 1 | -0/+0
Fixes: 174f53 ("virgl: consolidate transfer code")
Reviewed-by: Erik Faye-Lund
2019-01-03 | virgl: don't flush an empty range | Gurchetan Singh | 1 | -0/+4
Otherwise, the gl-1.0-long-dlist Piglit test crashes.
Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT")
Reported by airlied@
v2: Exit on any invalid range (Erik)
Bugzilla:
Reviewed-by: Dave Airlie
Reviewed-by: Erik Faye-Lund
Tested-by: Jakob Bornecrantz
2019-01-03 | freedreno: fix staging resource size for arrays | Rob Clark | 1 | -2/+10
A 2d-array texture (for example) should get the # of array elements from box->depth, rather than depth0, which is minified.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment with tiled textures.
Reported-by: Kristian H. Kristensen
Signed-off-by: Rob Clark
2019-01-03 | freedreno: remove blit_via_copy_region() | Rob Clark | 1 | -4/+0
If we hit the memcpy() path for copy_region(), that will try to do a transfer_map(), which goes badly for blits to/from staging triggered by transfer_map() or transfer_unmap().
We could possibly add fd_blit2() which has allow_transfer_map param, and call that for staging blits. But I'm not really sure if trying the blit via copy_region() is very useful. At least for newer gens that implement fd_context::blit(), it probably isn't.
Signed-off-by: Rob Clark
2019-01-03 | freedreno/a6xx: rework blitter API | Rob Clark | 1 | -54/+8
Switch over to using fd_context::blit(), in the same way that a5xx does. The previous patch wires fd_resource_copy_region() up to the blitter so a6xx no longer needs to bypass the core layer to accelerate this.
Signed-off-by: Rob Clark
2019-01-03 | freedreno: try blitter for fd_resource_copy_region() | Rob Clark | 1 | -0/+27
Signed-off-by: Rob Clark
2019-01-03 | freedreno: rework blit API | Rob Clark | 8 | -27/+29
First step to unify the way fd5 and fd6 blitter works. Currently a6xx bypasses the blit API in order to also accelerate resource_copy_region(). But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that we can rework resource_copy_region() to have its own fallback path, and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary callers.
Signed-off-by: Rob Clark
2019-01-03 | freedreno: skip depth resolve if not written | Rob Clark | 3 | -4/+14
For multi-pass rendering, it is common to keep the same depth buffer from previous pass, to discard geometry that would be hidden by later draws. In the later passes with depth-test enabled, but depth-write disabled, there is no reason to do gmem2mem resolve.
TODO: probably do something similar for stencil, although the stencil buffer isn't used as commonly these days.
Signed-off-by: Rob Clark
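The idea of tracking whether depth was actually written can be sketched as below. This is a simplified model under stated assumptions: `Batch`, `draw`, and the `COLOR`/`DEPTH` bits are hypothetical names, and clears and stencil are ignored.

```python
# Hypothetical bits naming the buffers a batch must write back to memory.
COLOR, DEPTH = 1, 2


class Batch:
    """Toy batch that records which buffers need a resolve at the end."""

    def __init__(self):
        self.resolve = 0

    def draw(self, depth_test, depth_write):
        self.resolve |= COLOR
        # Depth only changes when the test is on and writes are enabled,
        # so a test-only pass leaves the depth buffer in memory valid.
        if depth_test and depth_write:
            self.resolve |= DEPTH


later_pass = Batch()
later_pass.draw(depth_test=True, depth_write=False)
needs_depth_resolve = bool(later_pass.resolve & DEPTH)
```

In this model a later pass that only tests against an existing depth buffer ends with `needs_depth_resolve` false, so the depth write-back can be skipped.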
2019-01-02 | v3d: Refactor compiler entrypoints. | Eric Anholt | 1 | -26/+6
Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.
2019-01-02 | v3d: Don't forget to include RT writes in precompiles. | Eric Anholt | 1 | -0/+10
Looking at some assembly dumps for an optimization, we were clearly missing important parts of the shader!