Age | Commit message | Author | Files | Lines
2018-02-17 | nvc0: add support for bindless on maxwell+ | Ilia Mirkin | 4 | -14/+117
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 | gm107/ir: change how SUQ works in preparation for bindless | Ilia Mirkin | 3 | -1/+61
All this information can be retrieved from the TIC directly. Avoid having to dip into the constbuf information about the image. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 | i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+. | Kenneth Graunke | 2 | -1/+32
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. This lets us push up to 4 UBO ranges. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. We also need a brand new kernel that supports context isolation - on older kernels, newly created contexts inherit register state from whatever happened to be running. So, setting this would have catastrophic impact on other drivers such as libva, Beignet, or older Mesa. See commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b where we did this once before, but had to revert it in commit 013d33122028f2492da90a03a. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-02-17 | i965: Stop restoring the default L3 configuration on Kernel 4.16+. | Kenneth Graunke | 3 | -2/+7
Kernel 4.16 has proper context isolation, which means we can change the L3 configuration without worrying about that leaking to other newly created contexts, breaking the assumptions of other userspace. So, disable our workaround to reprogram it back to the default. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-02-17 | nvc0: Use GP100_COMPUTE_CLASS on GP10B | Mikko Perttunen | 1 | -1/+2
GP10B requires the use of GP100_COMPUTE_CLASS instead of GP104_COMPUTE_CLASS as is used for other non-GP100 chips. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-02-17 | i965: Fix aux-surface size check | Daniel Stone | 2 | -3/+12
The previous commit reworked the checks in intel_from_planar() to check the right individual cases for regular/planar/aux buffers, and do size checks in all cases. Unfortunately, the aux size check was broken, and required the aux surface to be allocated with the correct aux stride, but full image height (!). As the ISL aux surface is not recorded in the DRIimage, we cannot easily access it to check. Instead, store the aux size from when we do have the ISL surface to hand, and check against that later when we go to access the aux surface. Signed-off-by: Daniel Stone <daniels@collabora.com> Fixes: c2c4e5bae3ba ("i965: Fix bugs in intel_from_planar") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-17 | radeonsi: implement 32-bit pointers in user data SGPRs (v2) | Marek Olšák | 7 | -59/+141
User SGPRs changes:
  VS: 14 -> 9
  TCS: 14 -> 10
  TES: 10 -> 6
  GS: 8 -> 4
  GSCOPY: 2 -> 1
  PS: 9 -> 5
  Merged VS-TCS: 24 -> 16
  Merged VS-GS: 18 -> 11
  Merged TES-GS: 18 -> 11
SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)
v2:
- the shader cache needs to take address32_hi into account
- set amdgpu-32bit-address-high-bits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
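The drop in user SGPR counts above is consistent with each pointer taking one dword instead of two. The sketch below illustrates that idea only; it assumes the upper 32 bits of every such address equal a single known value (called address32_hi here, after the name in the v2 notes). The helper names pack_pointer32 and unpack_pointer32 are hypothetical and are not taken from the radeonsi sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reduce a 64-bit GPU virtual address to the single
// 32-bit value that would be stored in one user SGPR. This is only valid
// when the buffer lives in the 32-bit address window, i.e. when the upper
// half of the address equals address32_hi.
inline uint32_t pack_pointer32(uint64_t va, uint32_t address32_hi)
{
    assert(static_cast<uint32_t>(va >> 32) == address32_hi);
    return static_cast<uint32_t>(va);
}

// Hypothetical helper: rebuild the full 64-bit address from the stored low
// half and the fixed high half.
inline uint64_t unpack_pointer32(uint32_t lo, uint32_t address32_hi)
{
    return (static_cast<uint64_t>(address32_hi) << 32) | lo;
}
```

Under this assumption, a buffer at 0x0000000100001000 with address32_hi equal to 1 would be passed as the single dword 0x1000.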
2018-02-17 | radeonsi: disallow constant buffers with a 64-bit address in slot 0 | Marek Olšák | 2 | -1/+9
State trackers must use a user buffer or const_uploader, or set pipe_resource::flags same as const_uploader->flags. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | radeonsi: move const_uploader allocations to 32-bit address space | Marek Olšák | 3 | -2/+7
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | winsys/radeon: implement and enable 32-bit VM allocations | Marek Olšák | 3 | -8/+64
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | winsys/radeon: add struct radeon_vm_heap | Marek Olšák | 3 | -36/+47
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | winsys/amdgpu: enable 32-bit VM allocations | Marek Olšák | 1 | -1/+2
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | gallium/radeon: add 32-bit address space heaps | Marek Olšák | 1 | -3/+44
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-17 | ac: query high bits of 32-bit address space | Marek Olšák | 4 | -2/+10
2018-02-17 | gallium: use PIPE_CAP_CONSTBUF0_FLAGS | Marek Olšák | 4 | -5/+27
2018-02-17 | gallium: allow drivers to impose BO flags restrictions on constant buffer 0 | Marek Olšák | 18 | -0/+21
Required by radeonsi for optimal behavior.
2018-02-16 | meson: Add Haiku platform support v4 | Alexander von Gluck IV | 11 | -17/+209
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-02-16 | anv/icl: Add render target flush after uploading binding table | Anuj Phogat | 1 | -0/+20
The PIPE_CONTROL command description says: "Whenever a Binding Table Index (BTI) used by a Render Target Message points to a different RENDER_SURFACE_STATE, SW must issue a Render Target Cache Flush by enabling this bit. When render target flush is set due to new association of BTI, PS Scoreboard Stall bit must be set in this packet." Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Enable float blend optimization | Anuj Phogat | 1 | -1/+1
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Use gen11 functions | Anuj Phogat | 2 | -0/+6
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Build anv libs for gen11 | Anuj Phogat | 4 | -2/+32
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Generate gen11 entry point functions | Anuj Phogat | 1 | -1/+5
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Don't use DISPATCH_MODE_SIMD4X2 | Anuj Phogat | 1 | -0/+5
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Don't use SingleVertexDispatch | Anuj Phogat | 1 | -0/+2
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Don't set ResetGatewayTimer | Anuj Phogat | 1 | -0/+2
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Add #define genX | Anuj Phogat | 1 | -0/+3
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | anv/icl: Add gen11 mocs defines | Anuj Phogat | 1 | -0/+11
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 | i965: Implement GenerateMipmap directly, rather than using Meta. | Kenneth Graunke | 5 | -0/+135
Meta is awful and we'd like to stop using it. Implementing this using BLORP allows us to stop trashing a bunch of GL state every time. This follows the structure of st_generate_mipmap(). compute_num_levels is lifted directly from there. Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3) on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher resolutions). v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5 because it's not implemented yet. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-02-16 | mesa: Move compute_num_levels from st_gen_mipmap.c to mipmap.c. | Kenneth Graunke | 3 | -27/+29
I want to use compute_num_levels inside i965. Rather than duplicating it, move it from mesa/st to core Mesa, and make it non-static. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
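The log does not show what compute_num_levels does internally, so the following is only a rough illustration of the kind of calculation a mipmap generator needs: how many levels a full chain has for a given base size, clamped by a maximum level. The function names and parameters are hypothetical and are not the Mesa implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of levels in a complete mip chain for a base image of the given
// size: floor(log2(largest dimension)) + 1.
inline uint32_t full_chain_levels(uint32_t width, uint32_t height, uint32_t depth)
{
    assert(width > 0 && height > 0 && depth > 0);
    uint32_t size = std::max({width, height, depth});
    uint32_t levels = 1;
    while (size > 1) {
        size >>= 1;   // each level halves the largest dimension, rounding down
        ++levels;
    }
    return levels;
}

// Hypothetical wrapper: total level count when the chain starts at
// base_level and must not go past max_level (both are level indices).
inline uint32_t num_levels_to_generate(uint32_t base_level, uint32_t max_level,
                                       uint32_t width, uint32_t height, uint32_t depth)
{
    uint32_t n = base_level + full_chain_levels(width, height, depth);
    // n >= 1, so n - 1 is the index of the last level in the full chain.
    return std::min(n - 1, max_level) + 1;
}
```

For a 1280x720 base image this sketch yields 11 levels (1280, 640, 320, 160, 80, 40, 20, 10, 5, 2, 1 along the larger axis).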
2018-02-16 | meson: freedreno depends on nir | Dylan Baker | 1 | -0/+1
This fixes a race condition in building targets that link in freedreno. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105120 Fixes: 0bbecc5a8548883f76a7 ("meson: define driver dependencies") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Acked-by: Mark Janes <mark.a.janes@intel.com>
2018-02-16 | swr/rast: blend_epi32() should return Integer, not Float | George Kyriazis | 1 | -1/+1
Fix gcc8 compiler error for KNL. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105029 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Normalize path for debug metadata | George Kyriazis | 1 | -1/+2
The change is in the template gen_llvm.hpp. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Consolidate archrast Draw events | George Kyriazis | 4 | -26/+79
Consolidate archrast draw events into a single draw event with an attribute that represents the type of draw:
- Add handlers for new private proto versions of DrawInstancedEvent, DrawIndexedInstancedEvent, DrawInstancedSplitEvent, and DrawIndexedInstancedSplitEvent
- Convert the draw events to generic DrawInfoEvents
- parse_proto_event_fields() replaces 'AR_DRAW_TYPE' as a field type with 'uint32_t'. This draw type is actually an enum, but can be represented as an unsigned integer.
- is_draw_or_dispatch() recognizes DrawInfoEvent as a draw event
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Add semantics for translating address | George Kyriazis | 2 | -0/+5
Added support for another full translation path in fetch jitter. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Convert C Sampler intrinsics | George Kyriazis | 2 | -0/+19
Convert portions of the C sampler to the rasty SIMD lib. Also fix SRL call with a non-immediate. Don't count on the compiler automagically converting an srli call to srl if the shift count isn't an immediate. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Make SIMDLib templated types easier to use | George Kyriazis | 5 | -298/+307
"typename SIMD_T::TypeName" --> "TypeName<SIMD_T>" Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Be more explicit when fetching next component | George Kyriazis | 2 | -4/+11
Use a new function to denote that we want to get offset to next component and hide the fact that GEP is used underneath. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Fix bug related to passing AR handle | George Kyriazis | 1 | -1/+1
We were passing a garbage handle. Let's not do that. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Fix primitive replication issue in tesselation PA. | George Kyriazis | 2 | -2/+3
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Use llvm intrinsic masked gather | George Kyriazis | 2 | -0/+14
Use llvm intrinsic masked.gather instead of manual unroll for the cases where we have vector of pointers. Improves llvm IR debug experience by reducing a ton of IR to a single intrinsic call. Also seems to reduce overall stack use considerably. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
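For readers unfamiliar with the operation, a masked gather over a vector of pointers can be modeled in scalar code as below. This is a reference model of the semantics only (load where the mask is set, keep a pass-through value elsewhere). It is not the SWR JIT code and does not call LLVM; the function name and signature are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of a masked gather: for each lane, load through the
// lane's pointer if its mask bit is set, otherwise keep the pass-through value.
// Pointers in masked-off lanes are never dereferenced.
template <typename T, std::size_t N>
std::array<T, N> masked_gather(const std::array<const T*, N>& ptrs,
                               const std::array<bool, N>& mask,
                               const std::array<T, N>& passthru)
{
    std::array<T, N> result = passthru;
    for (std::size_t lane = 0; lane < N; ++lane) {
        if (mask[lane]) {
            result[lane] = *ptrs[lane];
        }
    }
    return result;
}
```

A manual unroll emits one such conditional load per lane in the IR, whereas the intrinsic expresses the whole loop as a single call, which matches the debugging benefit the commit message describes.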
2018-02-16 | swr/rast: Misc cleanup | George Kyriazis | 3 | -49/+60
Includes correct detection of clipDistance NaNs when no cullDistance is set. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Renamed variable in vertexbufferstate | George Kyriazis | 3 | -6/+8
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Fix GATHERPS to avoid assertions. | George Kyriazis | 1 | -2/+3
With the pBase type change, LLVM was asserting because of wrong types. Cast appropriately. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: More precise user clip distance interpolation | George Kyriazis | 2 | -17/+4
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Cull prims when all verts have negative clip distances | George Kyriazis | 1 | -0/+4
Performance optimization, and fixes some clipping issues. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
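A minimal scalar sketch of the test named in the subject line, assuming the check is made per clip plane (a primitive lies entirely outside a plane when every vertex has a negative distance to it). The data layout and function name are hypothetical. The actual SWR clipper works on SIMD registers and is not shown in this log.

```cpp
#include <array>
#include <cstddef>

// dist[v][p] is the clip distance of vertex v against user clip plane p.
// Returns true if some plane has every vertex on its negative side, in which
// case the primitive cannot be visible and can be dropped before clipping.
template <std::size_t NumVerts, std::size_t NumPlanes>
bool can_trivially_cull(const std::array<std::array<float, NumPlanes>, NumVerts>& dist)
{
    for (std::size_t p = 0; p < NumPlanes; ++p) {
        bool all_negative = true;
        for (std::size_t v = 0; v < NumVerts; ++v) {
            // A NaN distance compares false here, so in this sketch it counts
            // as "not negative" and prevents culling on this plane.
            if (!(dist[v][p] < 0.0f)) {
                all_negative = false;
                break;
            }
        }
        if (all_negative) {
            return true;
        }
    }
    return false;
}
```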
2018-02-16 | swr/rast: whitespace and comment cleanup | George Kyriazis | 2 | -20/+21
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Fix invalid number of attributes | George Kyriazis | 1 | -1/+1
Fix invalid number of attributes passed into tessellation PA. Needs to take into account any offsets from the shader. Innocuous issue, but removes an assert firing in debug. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Add clipper stats. | George Kyriazis | 4 | -17/+31
Clipper event is now:
event ClipperEvent
{
    uint32_t drawId;
    uint32_t trivialRejectCount;
    uint32_t trivialAcceptCount;
    uint32_t mustClipCount;
};
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
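As an illustration of how such counters might be filled in, the sketch below mirrors the event's fields in a plain C++ struct and increments one counter per primitive. The field names come from the event definition above. The ClipResult enum and the record() helper are hypothetical and are not part of ArchRast.

```cpp
#include <cstdint>

// Plain C++ mirror of the event fields listed in the commit message.
struct ClipperEvent {
    uint32_t drawId;
    uint32_t trivialRejectCount;
    uint32_t trivialAcceptCount;
    uint32_t mustClipCount;
};

// Hypothetical classification of a primitive by the clipper.
enum class ClipResult { TrivialReject, TrivialAccept, MustClip };

// Hypothetical helper: bump the counter matching one primitive's outcome.
inline void record(ClipperEvent& ev, ClipResult result)
{
    switch (result) {
    case ClipResult::TrivialReject: ++ev.trivialRejectCount; break;
    case ClipResult::TrivialAccept: ++ev.trivialAcceptCount; break;
    case ClipResult::MustClip:      ++ev.mustClipCount;      break;
    }
}
```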
2018-02-16 | swr/rast: Separate event types to public and private | George Kyriazis | 7 | -119/+155
Split into two proto files and modify appropriate build rules for configure / scons / meson builds. There are private internal events (proxy) that communicate information from rasterizer to ArchRast. ArchRast can use these events to calculate a final answer and then emit other public events which will be saved to file. Users will use the public proto file and not the private one. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-02-16 | swr/rast: Clean up event types and remove BE events | George Kyriazis | 2 | -80/+0
Begin/End events not needed anymore. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>