Age Commit message Author Files Lines
2017-07-22i965/miptree: Tighten up finish_mcs_writeJason Ekstrand1-7/+8
Multisample surfaces only have a single miplevel so there's no reason to be passing the extra parameters around. It only leads to confusion. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965/miptree: Make aux_state work in terms of logical layersJason Ekstrand1-6/+13
This commit changes layer_range_length to return logical layers and also changes the way we allocate the aux_state field to not allocate extra layers for MCS. This will be important as we're about to start doing significantly more detailed tracking of MCS state. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22intel/blorp: Add a partial resolve pass for MCSJason Ekstrand4-1/+213
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965/miptree: Remove some unneeded restrictionsJason Ekstrand2-11/+4
intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and there's no reason why we can't do CCS_E on window system buffers so long as we resolve. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965/miptree: Stop setting FOR_SCANOUT for renderbuffersJason Ekstrand1-2/+1
Nothing created through intel_miptree_create_for_renderbuffer will ever be exposed externally so there's no need to set FOR_SCANOUT. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965/blorp: Do flushes around depth resolvesJason Ekstrand1-78/+72
It turns out that if you have rendering in-flight with CCS_E enabled and you go to do a depth resolve without flushing, the CCS data may never hit the memory. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965/blorp: Use the renderbuffer format for clearsJason Ekstrand1-1/+9
This fixes the Piglit ARB_texture_views rendering-formats test. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22anv: Predicate fast-clear resolvesNanley Chery3-16/+120
Image layouts only let us know that an image *may* be fast-cleared. For this reason we can end up with redundant resolves. Testing has shown that such resolves can measurably hurt performance and that predicating them can avoid the penalty. v2: - Introduce additional resolve state management function (Jason Ekstrand). - Enable easy retrieval of fast clear state fields. v3: Use more descriptive field enums (Jason) Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
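The idea of predicating a resolve can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `sketch_image`, `sketch_fast_clear`, and `sketch_layout_transition` are hypothetical names, and the real driver evaluates the predicate on the GPU against state stored alongside the image rather than in a C `bool`.

```c
#include <stdbool.h>

/* Hypothetical per-image tracking state, not the anv data structure. */
struct sketch_image {
   bool needs_resolve;  /* set when a fast clear actually happened */
   int resolves_done;   /* counts resolves, for illustration only */
};

static void
sketch_fast_clear(struct sketch_image *image)
{
   image->needs_resolve = true;
}

static void
sketch_layout_transition(struct sketch_image *image)
{
   /* The layout alone only says the image *may* be fast-cleared, so
    * consult the recorded state and skip the redundant resolve. */
   if (!image->needs_resolve)
      return;

   image->resolves_done++;
   image->needs_resolve = false;
}
```

A transition on an image that was never fast-cleared does no work, and two transitions after one fast clear resolve only once.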
2017-07-22intel/blorp: Allow BLORP calls to be predicatedNanley Chery2-0/+6
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Skip some input attachment transitionsNanley Chery1-5/+26
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv: Stop resolving CCS implicitlyNanley Chery3-169/+5
With an earlier patch from this series, resolves are additionally performed on layout transitions. Remove the now unnecessary implicit resolves within render passes. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv: Transition more color buffer layoutsNanley Chery2-28/+169
v2: Expound on comment for the pipe controls (Jason Ekstrand). v3: - Cast base_layer to uint64_t to avoid overflow. - Remove "seems" from the pipe control comment. - Fix clamp of layer_count (Jason Ekstrand). Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
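The v3 notes about casting base_layer to uint64_t and fixing the layer_count clamp can be illustrated with a small sketch. `clamp_layer_count` is a hypothetical helper written for this example, not the function in the patch; it assumes the requested count may be a large sentinel such as UINT32_MAX.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a requested layer range to the layers the
 * image actually has. */
static uint32_t
clamp_layer_count(uint32_t base_layer, uint32_t layer_count,
                  uint32_t image_layers)
{
   if (base_layer >= image_layers)
      return 0;

   /* Widen before adding: base_layer + layer_count can wrap around in
    * 32 bits when layer_count is very large. */
   uint64_t end = (uint64_t)base_layer + layer_count;
   if (end > image_layers)
      end = image_layers;

   return (uint32_t)(end - base_layer);
}
```

Without the cast, a base of 2 plus a count of UINT32_MAX would wrap to 1 and the clamp would see a range that ends before it starts.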
2017-07-22anv/cmd_buffer: Warn about not enabling CCS_ENanley Chery1-5/+7
Use the performance warning infrastructure to provide helpful information when testing applications. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Move aux_usage assignment upNanley Chery1-32/+30
For readability, bring the assignment of CCS closer to the assignment of NONE and MCS. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Always enable CCS_D in render passesNanley Chery2-11/+20
The lifespan of the fast-clear data will surpass the render pass scope. We need CCS_D to be enabled in order to invalidate blocks previously marked as cleared and to sample cleared data correctly. v2: Avoid refactoring. v3: Allow CCS_D for subpass resolves. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Disable CCS on gen7 color attachments upfrontNanley Chery1-11/+5
The next patch enables the use of CCS_D even when the color attachment will not be fast-cleared. Catch the gen7 case early to simplify the changes required. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Ensure fast-clear values are currentNanley Chery1-0/+114
v2: Rewrite functions, change location of synchronization. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/gpu_memcpy: Add a lighter-weight GPU memcpy functionNanley Chery2-0/+45
We'll be performing a GPU memcpy in more places to copy small amounts of data. Add an alternate function that thrashes less state. v2: - Make a new function (Jason Ekstrand). - Move the #define into the function. v3: - Update the function name (Jason). - Update comments. v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand). Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Restrict fast clears in the GENERAL layoutNanley Chery3-0/+40
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand). v3: Allow some fast clears in the GENERAL layout. v4: Remove extra '||' and adjust line break (Jason Ekstrand). Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Don't partially fast clear image layersNanley Chery1-8/+23
v2: Don't pass in the command buffer (Jason Ekstrand). v3: Remove an incorrect assertion and an if condition for gen7. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/cmd_buffer: Initialize the clear values bufferNanley Chery1-1/+78
v2: Rewrite functions. v3 (Jason Ekstrand): - Don't set ResourceMinLOD. - Fix clamp of level_count. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/image: Append CCS/MCS with a fast-clear state bufferNanley Chery2-0/+90
v2: Update comments, function signatures, and add assertions. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv/image: Disable CCS if the image doesn't support renderingNanley Chery1-0/+15
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22intel/isl: Add surface state clear value informationNanley Chery2-0/+13
This will be used to load and store clear values from surface state objects. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22anv: Transition MCS buffers from the undefined layoutNanley Chery3-18/+35
v2: Define MCS buffers with any sample count (Jason) Cc: <mesa-stable@lists.freedesktop.org> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-07-22intel/isl: Tighten up restrictions for CCS on gen7Jason Ekstrand1-7/+23
It may technically be possible to enable some sort of fast-clear support for at least the base slice of a 2D array texture on gen7. However, it's not documented to work, we've never tried to do it in GL, and we have no idea what the hardware does if you turn on CCS_D with arrayed rendering. Let's just play it safe and disallow it for now. If someone really cares that much about gen7 performance, they can come along and try to get it working later.
2017-07-22i965/bufmgr: Add comments about GTT coherency issues.Chris Wilson1-0/+22
(Patch written by Ken, but entirely comments written by Chris.) Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22i965: Drop non-LLC lunacy in the program cache code.Kenneth Graunke3-70/+21
The non-LLC story was a horror show. We uploaded data via pwrite (drm_intel_bo_subdata), which would stall if the cache BO was in use (being read) by the GPU. Obviously, we wanted to avoid that. So, we tried to detect whether the buffer was busy, and if so, we'd allocate a new BO, map the old one read-only (hopefully not stalling), copy all shaders compiled since the dawn of time to the new buffer, upload our new one, toss the old BO, and let the state upload code know that our program cache BO changed. This was a lot of extra data copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline. Not only that, but our rudimentary busy tracking consisted of a flag set at execbuf time, and not cleared until we threw out the program cache BO. So, the first shader upload after any drawing would hit this "abandon the cache and start over" copying path. This is largely unnecessary - it's just ancient and crufty code. We can use the same persistent mapping paths on all platforms. On non-ancient kernels, this will use a write combining map, which should be reasonably fast. One aspect that is worse: we do occasionally grow the program cache BO, and copy the old contents to the newer BO. This will suffer from UC readback performance now. To mitigate this, we use the MOVNTDQA based streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms). Gen4-5 are unfortunately going to be penalized. v2: Add MOVNTDQA path, rebase on other map flag changes. v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson). Reviewed-by: Matt Turner <mattst88@gmail.com>
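The "append through a persistent mapping, copy only when growing" scheme described above can be sketched roughly as follows. Everything here is an assumption made for illustration: `sketch_cache` and `sketch_cache_upload` are hypothetical, heap memory stands in for the mapped BO, and a plain `memcpy` stands in for the streaming copy the commit uses when reading back from the old buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the program cache; map models a persistent
 * mapping of the cache buffer. */
struct sketch_cache {
   unsigned char *map;
   size_t size;
   size_t next_offset;
};

/* Append one program and report its offset. Returns 0 on success. */
static int
sketch_cache_upload(struct sketch_cache *cache, const void *data,
                    size_t len, size_t *offset_out)
{
   if (cache->next_offset + len > cache->size) {
      /* Grow by doubling. This is the only path that reads back the old
       * contents, which is the slow case the commit message mentions. */
      size_t new_size = cache->size ? cache->size : 64;
      while (cache->next_offset + len > new_size)
         new_size *= 2;

      unsigned char *new_map = malloc(new_size);
      if (!new_map)
         return -1;
      if (cache->next_offset)
         memcpy(new_map, cache->map, cache->next_offset);
      free(cache->map);
      cache->map = new_map;
      cache->size = new_size;
   }

   /* Common case: write straight into the mapping, no busy check. */
   memcpy(cache->map + cache->next_offset, data, len);
   *offset_out = cache->next_offset;
   cache->next_offset += len;
   return 0;
}
```

In this sketch an upload never discards earlier programs, so offsets handed out before a grow remain valid relative to the start of the buffer.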
2017-07-22i965: Set MAP_PERSISTENT on program cache buffers.Kenneth Graunke1-4/+8
Chris Wilson pointed out that this mapping really is persistent. Shouldn't actually have any effect today, but best to set it anyway. Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22i965: Correctly set MAP_WRITE when creating the LLC program cache map.Kenneth Graunke1-1/+1
Using a read-only mapping is completely bogus - we use this mapping to write all new shaders to the cache. Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22i965/bufmgr: Use write-combine mappings where availableMatt Turner1-3/+88
Write-combine mappings give much better performance on writes than uncached access through the GTT. Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768 on Apollolake by 3.6086% +/- 0.674193% (n=15). v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind updates, potential for CPU/WC maps failing, and other changes. v3: (by Ken and Chris Wilson) (Ken): Rebase on set_domain -> gem_wait (Chris): Fix up a failed CPU/WC mmapping with a GTT mapping Not all objects will be mappable for direct access by the CPU (either using WC/CPU or WC paths), for example, a dmabuf wrapping an object on a foreign device or an object wrapping access to stolen memory. Since either the physical pages are not known or even do not exist, we need to use the mediated, indirect access via the GTT. (If one day, the kernel does suddenly start providing mediated access via a regular WB/WC mmapping, we no longer need the fallback.) v4: Avoid falling back for MAP_RAW (Chris). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22i965/bufmgr: Skip wait ioctl when not busy.Kenneth Graunke1-0/+4
If the buffer is idle, I915_GEM_WAIT will return immediately, so we may as well skip the ioctl altogether. We can't trust the "idle" flag for external buffers, but for most, it should be fine. Reviewed-by: Matt Turner <mattst88@gmail.com>
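A minimal sketch of that early-out, assuming a hypothetical `sketch_bo` with `idle` and `external` flags; `sketch_wait_ioctl` is a stub that only counts calls and does not talk to the kernel.

```c
#include <stdbool.h>

/* Hypothetical buffer object, not the i965 bufmgr structure. */
struct sketch_bo {
   bool idle;       /* last known state: the GPU is done with the buffer */
   bool external;   /* shared with another process or device */
   int wait_calls;  /* counts simulated ioctls, for illustration only */
};

/* Stub standing in for the wait ioctl. */
static void
sketch_wait_ioctl(struct sketch_bo *bo)
{
   bo->wait_calls++;
   bo->idle = true;
}

static void
sketch_bo_wait(struct sketch_bo *bo)
{
   /* The cached idle flag cannot be trusted for external buffers,
    * because someone else may have queued work on them. */
   if (bo->idle && !bo->external)
      return;

   sketch_wait_ioctl(bo);
}
```

Only the first wait on a busy private buffer reaches the stub; an external buffer reaches it every time.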
2017-07-22i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.Kenneth Graunke1-17/+6
With the advent of asynchronous maps, domain tracking doesn't make a whole lot of sense. Buffers can be in use on both the CPU and GPU at the same time. In order to avoid blocking, we stopped using set_domain for asynchronous mappings, which means that the kernel's tracking has lies. We can't properly track it in userspace either, as the kernel can change domains on us spontaneously (for example, when un-swapping).

According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:

1. pins the backing storage (acquiring pages outside of the struct_mutex)
2. waits either for read/write access, including inter-device waits
3. updates the domain, clflushing as required
4. marks the object as used (for swapping)
5. turns off FBC/PSR/fancy scanout caching

Item (1) is not terribly important. Most BOs are recycled via the BO cache, so they already have pages. Regardless, we fixed this via an initial set_domain in the previous patch.

We implement item (2) with I915_GEM_WAIT. This has one downside: we'll stall unnecessarily if we do a read-only mapping of a buffer that the GPU is reading. I believe this is pretty uncommon. We may want to extend the wait ioctl at some point.

Mesa already does item (3) itself. For cache-coherent buffers (most on LLC systems), we don't need to do any clflushing - the CPU and GPU views are coherent. For non-coherent buffers (most on non-LLC systems), we currently only use the CPU for read-only maps, and we explicitly clflush when necessary.

We don't care about item (4)...swapping has already killed performance. Plus, with async maps, the kernel's domain tracking is already bogus, so it can't do this accurately regardless.

Item (5) should be okay because we avoid cached maps of scanout buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22i965/bufmgr: Allocate BO pages outside of the kernel's locking.Kenneth Graunke1-0/+13
Suggested by Chris Wilson. v2: Set the write domain to 0 (suggested by Chris). Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-23glsl: rework misleading block layout codeTimothy Arceri1-4/+4
From the ARB_uniform_buffer_object spec: ""shared" uniform blocks, the default layout, ..." This doesn't fix anything as the default layout is already applied at this point but fixes the misleading code/comment. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23glsl: remove placeholder commentTimothy Arceri1-4/+0
This was added in 2d03f48a65a666 and seems like it was intended as a TODO comment in a function stub rather than a useful code comment. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22st/mesa: use proper resource target type in st_AllocTextureStorage()Brian Paul1-1/+4
When we validate the texture sample count, pass the correct pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D. Also add more comments about MSAA. No piglit regressions with VMware driver. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22mesa: remove pointless assignments in init_teximage_fields_ms()Brian Paul1-3/+0
The NumSamples and FixedSampleLocation fields are set again later at the end of the function so these earlier assignments aren't needed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22svga: Limit number of immediates in shaderNeha Bhende1-3/+5
imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which is not used very frequently. So allocate it only if lit instruction is used. Tested with mtt piglit and mtt glretrace v2: As per Charmaine's comment Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22svga: fix constant indices for texcoord scale factors and texture buffer sizeCharmaine Lee1-9/+6
This patch fixes the ordering of the constant indices for texcoord scale factor and texture buffer size to match the order they were added to the constant buffer in svga_get_extra_constants_common(). Tested with MTT piglit, glretrace. Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-22svga: fix unnormalized->normalized texture coordinate conversionNeha Bhende3-3/+35
Sometimes, converting unnormalized coordinates to normalized coordinates requires an epsilon value to produce the right texels with nearest filtering. Adding 0.0001 to the coordinates when the min/mag filter is nearest fixes the issue. Fixes piglit test fbo-blit-scaled-linear Tested with mtt-piglit, mtt-glretrace Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
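The effect of the epsilon can be sketched in C. `normalize_texcoord` is a hypothetical helper for illustration; the driver emits the equivalent arithmetic into the translated shader, and this sketch makes no claim about exactly where in that sequence the 0.0001 is added.

```c
#include <stdbool.h>

/* Hypothetical helper: convert an unnormalized coordinate in [0, size]
 * to a normalized one, nudging it slightly when nearest filtering is in
 * use so that a value sitting on a texel boundary selects the expected
 * texel. */
static float
normalize_texcoord(float unnormalized, float size, bool nearest_filter)
{
   const float epsilon = nearest_filter ? 0.0001f : 0.0f;
   return (unnormalized + epsilon) / size;
}
```

With linear filtering the result is the plain quotient; with nearest filtering a coordinate exactly on a boundary lands just past it.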
2017-07-22svga: only support 4x, 8x, 16x msaaBrian Paul1-0/+5
Skip 2x MSAA, for example, since it's seldom used and just bloats the list of pixel formats. Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22mesa: include texture size in error messagesBrian Paul1-4/+5
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22i965: Support the mesa_no_error driconf option.Kenneth Graunke2-0/+4
This allows us to override contexts to use no_error functionality even if the applications themselves do not. Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22anv/blorp: Assert isl_surf_init success in do_buffer_copyJason Ekstrand1-13/+15
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22anv/blorp: Explicitly set row_pitch in do_buffer_copyJason Ekstrand1-1/+1
We have a very specific row pitch that we want and we don't want ISL to be changing it on us so just be explicit about it. Fixes: a40f0430347c07bf2d5794642fe02f5dd248a473 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22i965: Delete gen8_draw_upload.cKenneth Graunke1-0/+0
For some reason we left an empty file, rather than deleting it.
2017-07-21nv50/ir: disable mul+add to mad for precise instructionsKarol Herbst1-2/+3
Fixes misrendering in TombRaider. Fixes KHR-GL44.gpu_shader5.precise_qualifier and KHR-GL45.gpu_shader5.precise_qualifier. v4: disable opt only for MAD, it's fine for SAD Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
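Why merging a multiply and an add can change results, and so why it matters for precise, can be shown with a small numeric sketch. Both functions are hypothetical and unrelated to the nv50 IR code. The second one only approximates a fused multiply-add by computing in double, which is an assumption made to keep the example free of library calls.

```c
/* Two roundings: the product is rounded to float before the add. The
 * volatile store keeps the compiler from contracting the expression
 * into a hardware fused multiply-add. */
static float
mul_then_add(float a, float b, float c)
{
   volatile float product = a * b;
   return product + c;
}

/* Approximation of a fused operation: the float product is exact in
 * double, so the low bits of a * b survive until the final rounding. */
static float
fused_mul_add(float a, float b, float c)
{
   return (float)((double)a * (double)b + (double)c);
}
```

For a = b = 1 + 2^-12 and c = -(1 + 2^-11), the separate multiply rounds away the 2^-24 term and the sum is exactly zero, while the fused form keeps it and returns 2^-24.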
2017-07-21nv50/ir/tgsi: handle precise for most ALU instructionsKarol Herbst1-0/+2
Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21nv50/ir: add precise field to InstructionKarol Herbst2-0/+3
v4: initialize field with NULL Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>