summaryrefslogtreecommitdiff
path: root/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
AgeCommit message (Collapse)AuthorFilesLines
2022-11-09nvc0: lookup supported classes instead of determining from chipsetBen Skeggs1-23/+0
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Acked-by: M Henning <drawoc@darkrefraction.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17633>
2022-11-02gallium: split up req_local_memKarol Herbst1-1/+1
This will be required if a frontend has to request additional shared mem on top of the shader declared one, but wants to create the CSO before knowing the total amount. In OpenCL applications can bind additional shared mem through kernel arguments and this happens quite late. Note: Clover sets the req_local_mem incorrectly before so we can leave it as broken. v2: fix panfrost code (Alyssa) Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581>
2022-11-02gallium: drop pipe_compute_state.req_private_memKarol Herbst1-1/+1
nothing used it and nothing will use it, so just drop it and clean up some dead struct fields in drivers. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581>
2022-08-30nvc0: make state handling race freeKarol Herbst1-1/+6
I am not entirely convinced that contexts can't mess up the state of other contexts, but with this we at least turn down the amount of races on the CPU side. If we hit bugs later we can always look into it then and figure out what to fix how. I think we might need a better solution for it in the future as state tracking might need to become more involved, but for now this should be good enough. Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10752>
2022-08-30nouveau: wrap nouveau_pushbuf_refnKarol Herbst1-3/+3
This makes it easier to insert locking code around libdrm. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10752>
2022-08-30nouveau: wrap all nouveau_pushbuf_space callsKarol Herbst1-2/+2
This makes it easier to insert locking code around libdrm. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10752>
2020-08-25nv50/ir: remove symbol table support for compute shadersKarol Herbst1-1/+1
The initial plan was to use this for OpenCL kernels, but back then the plan was to convert from LLVM to TGSI. As it turns out, we didn't went that way. Right now for OpenCL we don't reqiure supporting multiple entry points inside the same binary and if we want to support it later, we can add this back. Signed-off-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4264>
2019-10-07gallium: add PIPE_RESOURCE_FLAG_SINGLE_THREAD_USE to skip util_range lockMarek Olšák1-1/+1
u_upload_mgr sets it, so that util_range_add can skip the lock. The time spent in tc_transfer_flush_region decreases from 0.8% to 0.2% in torcs on radeonsi. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-06nvc0: add compute invocation counterRhys Perry1-0/+32
The strategy is to keep a CPU-side counter of the direct invocations, and a GPU-side counter of the indirect invocations, and then add them together for queries. The specific technique is a macro which multiplies a list of integers together and accumulates the product into SCRATCH registers held inside of the context. Another macro will read those values out and add them to the passed-in cpu-side counter to be stored in a query buffer the same way that all the other statistics are stored. Original implementation by Rhys Perry, redone by Ilia Mirkin to use the SCRATCH temporaries. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-01-27nvc0: don't put text segment into bufctxIlia Mirkin1-1/+4
The text segment is shared among multiple contexts, while each one has its own bufctx. So when reallocating the text segment, some contexts may end up with stale values in their bufctx's. Instead limit the exposure to the bufctx to within a single draw. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-07-30nvc0: serialize before updating some constant buffer bindings on Maxwell+Rhys Perry1-7/+6
To avoid serializing, this has the user constant buffer always be 65536 bytes and enabled unless it's required that something else is used for constant buffer 0. Fixes artifacts with at least XCOM: Enemy Within, 0 A.D. and Unigine Valley, Heaven and Superposition. v2: changed uniform_buffer_bound to be bool instead of a uint32_t v3: remove magic constants v3: remove pointless code in nvc0_validate_driverconst Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100177 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-10nvc0: fix valid range for shader buffersSamuel Pitoiset1-0/+1
When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-11nvc0: use a define for the driver constant buffer sizeSamuel Pitoiset1-4/+4
This might avoid mistakes if the size is bumped in the future. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-02nouveau: Add support for SV_WORK_DIMHans de Goede1-6/+17
Add support for SV_WORK_DIM for nvc0 and nve4. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-01nvc0: fix up image support for allowing multiple samplesIlia Mirkin1-0/+24
Basically we just have to scale up the coordinates and then add the relevant sample offset. The code to handle this was already largely present from Christoph's earlier attempts to pipe images through back in the dark ages, this just hooks it all up. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-05nvc0: do not clear surfaces bins in the validate functionSamuel Pitoiset1-0/+1
We should not call nouveau_bufctx_reset() inside a validate function. This only affects Fermi where images are aliased between 3D and CP. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05nvc0: re-validate images after launching a grid on FermiSamuel Pitoiset1-0/+3
Images invalidation is a bit weird on Fermi and there is already a hack which forces invalidating all images when launching a computer shader to help in fixing 3D<->CP interaction. However, we need to re-validate images for compute because nvc0_compute_invalidate_surfaces() will destroy the previous binding. This is not really good for performance purposes but this might be improved later. This fixes the following piglits: - spec/arb_compute_shader/execution/basic-uniform-access - spec/arb_compute_shader/execution/mutiple-texture-reading - spec/arb_compute_shader/execution/multiple-workgroups - spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04nvc0: mark bound buffer range validIlia Mirkin1-0/+3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-05-28nvc0: do not always invalidate 3D CBs when using computeSamuel Pitoiset1-8/+17
Constant buffers are aliased between 3D and CP on Fermi, but we should only invalidate them when a compute shader actually uses CBs and not all the time after a lauching grid. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-26nvc0: invalidate textures/samplers between 3D and CP on FermiSamuel Pitoiset1-0/+13
Like constant buffers, samplers and textures are aliased on Fermi and we need to invalidate the state when switching from 3D to CP and vice versa. This fixes rendering issues in the UE4 demos. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21nvc0: bind images on fragment and compute shaders for FermiSamuel Pitoiset1-0/+43
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-19nvc0: account for shader-allocated local memory needsIlia Mirkin1-1/+1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-29nvc0: stick compute kernel arguments into uniform_boSamuel Pitoiset1-7/+5
Having one buffer object for input kernel arguments coming from clover and an other one for OpenGL user uniforms is unnecessary. Using the uniform_bo object for both GL/CL uniforms avoids to declare a new BO. This only affects compute programs but it should not hurt anything because the states are dirtied and data will get reuploaded. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26nvc0: reserve an area for surfaces info in the driver constbufSamuel Pitoiset1-2/+2
To process surfaces coordinates from the codegen part, and because some information like the format is not always available (eg. when writeonly is used), we have to stick some surfaces data in the driver constbuf. This is especially true for OpenCL because we don't know the format at shader compile time. This bumps the size of each shader area from 1K to 2K. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-19nvc0: avoid using magic numbers for the uniform_bo offsetsSamuel Pitoiset1-6/+7
Instead make use of constants to improve readability. The first 32 bytes of the driver constant buffer are unknown... This doesn't seem to be used in the codegen part, but if the texBindBase offset is shifted from 0x20 to 0x00, this breaks the universe for really weird reasons. This sounds like to be related to textures. Anyway, name this NVC0_CB_AUX_UNK_INFO and add a todo should be enough for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-09nvc0: add a new validation path for computeSamuel Pitoiset1-26/+20
This makes use of the new state validation interface to be consistent with 3d. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26nvc0: rework nvc0_compute_validate_program()Samuel Pitoiset1-31/+3
Reduce the amount of duplicated code by re-using nvc0_program_validate(). While we are at it, change the prototype to return void and remove nvc0_compute.h which is now useless. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26nvc0: make sure to validate compute global buffers on FermiSamuel Pitoiset1-1/+3
No reason to not validate those global buffers and this might avoid fails if someone try to use the global memory from compute programs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26nvc0: move nvc0_validate_global_residents() to nvc0_compute.cSamuel Pitoiset1-0/+15
While we are at it, rename it to nvc0_compute_validate_globals() and update its prototype. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22nvc0: rename 3d dirty flags to NVC0_NEW_3D_XXXSamuel Pitoiset1-2/+2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22nvc0: prefix compute macros with _CP_ instead of _COMPUTE_Samuel Pitoiset1-1/+1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22nvc0: rename NVXX_COMPUTE to NVXX_CPSamuel Pitoiset1-48/+48
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22nvc0: rename nvc0_context::dirty to nvc0_context::dirty_3dSamuel Pitoiset1-2/+2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: reduce likelihood of collision for real buffers on FermiSamuel Pitoiset1-2/+2
Reduce likelihood of collision with real buffers by placing the hole at the top of the 4G area. This fixes some indirect draw+compute tests with large buffers. Suggested by Ilia Mirkin. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: add support for indirect compute on FermiSamuel Pitoiset1-19/+33
When indirect compute is used, the size of the grid (in blocks) is stored as three integers inside a buffer. This requires a macro to set up GRIDDIM_YX and GRIDDIM_Z. Changes from v2: - do not launch the grid if the number of groups for a dimension is 0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: bind textures/samplers for compute on FermiSamuel Pitoiset1-2/+36
Textures and samplers don't seem to be aliased between COMPUTE and 3D. Changes from v2: - refactor the code to share (almost) the same logic between 3d and compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: bind shader buffers for compute on FermiSamuel Pitoiset1-0/+34
This is loosely based on 3D. Shader buffers are bound on c15 (the driver constbuf) at offset 0x200. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: bind driver constbuf for compute on FermiSamuel Pitoiset1-0/+18
Changes from v3: - add new validation state for COMPUTE driver constbuf Changes from v2: - always bind the driver consts even if user params come in via clover Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21nvc0: bind constant buffers for compute on FermiSamuel Pitoiset1-8/+64
Loosely based on 3D. Changs from v3: - invalidate COMPUTE CBs after validating 3D CBs because they are aliased Changes from v2: - get rid of the 's' param to nvc0_cb_bo_push() because it doesn't matter to upload constbufs for compute using the 3d chan Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13gallium: add a new interface for pipe_context::launch_grid()Samuel Pitoiset1-11/+8
This introduces pipe_grid_info which contains all information to describe a launch_grid call. This will be used to implement indirect compute in the same fashion as indirect draw. Changes from v2: - correctly initialize pipe_grid_info for nv50/nvc0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-12nvc0: do not force re-binding of compute constbufs on FermiSamuel Pitoiset1-1/+1
Re-binding compute constant buffers after launching a grid have no effects because they are not currently validated and because dirty_cp is not updated accordingly. This might also prevent weird future behaviours when UBOs will be bound for compute. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-12nvc0: remove useless goto in nvc0_launch_grid()Samuel Pitoiset1-6/+4
Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05nv50,nvc0: provide debug messages with shader compilation statsIlia Mirkin1-1/+1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-18nvc0: do not bind input params at compute state init on FermiSamuel Pitoiset1-8/+0
It looks like binding a constant buffer on compute overwrites the 3D state. To avoid that, we already re-bind all the 3D constant buffers after launching a compute grid but this is not enough. Binding the constant buffer of input parameters for the compute state at initialization corrupts the 3D constant buffers, and it's just useless to bind it because this is not needed until we really launch a grid. This fixes some piglit regressions related to interpolation tests introduced in "nvc0: enable compute support by default on Fermi". Fixes: 00d6186 (nvc0: enable compute support by default on Fermi) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-10nvc0: make use of NVC0_COMPUTE_CLASS for GF110Samuel Pitoiset1-5/+2
In theory, GF110+ should also support NVC8_COMPUTE_CLASS but, in practice, a ILLEGAL_CLASS dmesg fail appears when using it. This fixes compute support and MP performance counters on GF110. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-07-21nouveau: use bool instead of booleanSamuel Pitoiset1-12/+12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-06-22nvc0: use NV_VRAM_DOMAIN() macroAlexandre Courbot1-1/+1
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative VRAM domains for chips that do not have dedicated video memory. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Martin Peres <martin.peres@free.fr>
2013-12-06nvc0: fixup gk110 and up not being listed in various switch statementsBen Skeggs1-1/+1
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2013-09-11Move nv30, nv50 and nvc0 to nouveau.Johannes Obermayr1-0/+271
It is planned to ship openSUSE 13.1 with -shared libs. nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau related targets. This change makes it possible to easily build one shared libnouveau.so which is then LIBADDed. Also dlopen will be faster for one library instead of three and build time on -jX will be reduced. Whitespace fixes were requested by 'git am'. Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de> Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Acked-by: Ian Romanick <ian.d.romanick@intel.com>