path: root/src/mesa
Age | Commit message | Author | Files | Lines
2013-11-07 | i965/gen7: Expose ARB_shader_atomic_counters. | Francisco Jerez | 2 | -0/+13
Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | Revert "i965: Add support for GL_AMD_performance_monitor on Ironlake." | Kenneth Graunke | 5 | -413/+0
This reverts most of commit 0f2da773070c06b6d20ad264d3abb19c4dfd9761. (I chose to leave the additions to brw_defines.h.)

My previous Ironlake implementation was somewhat broken: counter data was global, rather than per-context. This meant that performance monitors captured data from your compositor, 2D driver, and other 3D programs.

Originally, I believed that Sandybridge and later had an easy way to avoid this problem (setting per-context flags in OACONTROL), while Ironlake did not. So I'd intended to leave it as a known limitation of performance monitoring support on Ironlake. However, this turned out not to be true.

Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep.

To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts.

For occlusion queries, this batch bookending approach is fairly simple: only one occlusion query can be active at a time, and the result is a single integer. Performance monitors are more complex: an arbitrary number of monitors can be active at a time, each monitoring some subset of our ~30 observability counters. Individual monitors can be started and stopped at any point during the batch. Tracking where each monitor started/ended relative to batch flushes ends up being a pain. And you can run out of space in the buffer.

Properly supporting this required some serious rearchitecting of the code. Rather than writing patches to try and morph a broken system into a working one (which operates quite differently), I decided it would be simplest to revert the old code and start fresh. Parts will look familiar, but other parts are new.

I also decided it would be best to include Sandybridge and Ivybridge support from the start, since the newer platforms have added complexity that I wanted to make sure worked. They're also what most people care about these days.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
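The batch bookending idea described above can be illustrated with a minimal sketch. It is not the driver's code: the struct, the function names, and the counter count are hypothetical, and the hardware counters are modelled as a plain array. Each monitor snapshots the shared counters when an interval opens (batch start or monitor begin) and adds the delta when the interval closes (batch end or monitor end), so activity from other contexts between batches is excluded.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 4 /* hypothetical; the message mentions ~30 counters */

/* Hypothetical per-monitor bookkeeping. */
struct monitor {
    uint64_t total[NUM_COUNTERS]; /* sum of deltas over all closed intervals */
    uint64_t start[NUM_COUNTERS]; /* snapshot taken when the interval opened */
    int active;                   /* nonzero while an interval is open */
};

/* Open an interval: called at batch start, or when the monitor begins. */
static void monitor_begin_interval(struct monitor *m, const uint64_t *hw)
{
    for (int i = 0; i < NUM_COUNTERS; i++)
        m->start[i] = hw[i];
    m->active = 1;
}

/* Close an interval: called at batch end, or when the monitor ends. */
static void monitor_end_interval(struct monitor *m, const uint64_t *hw)
{
    assert(m->active);
    for (int i = 0; i < NUM_COUNTERS; i++)
        m->total[i] += hw[i] - m->start[i];
    m->active = 0;
}
```

In this sketch, whatever another program adds to the shared counters between one batch's end snapshot and the next batch's start snapshot never reaches `total`.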
2013-11-07 | st/mesa: Add support for ARB_vertex_type_10f_11f_11f_rev | Fredrik Höglund | 2 | -1/+12
Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-11-07 | mesa: fix return statements in varray.c | Brian Paul | 1 | -2/+2
Return false, not GL_FALSE. Add missing return value. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71359
2013-11-07 | i965: Add an implementation of intel_miptree_map using streaming loads. | Matt Turner | 1 | -0/+85
Improves performance of RoboHornet's 2D Canvas toDataURL benchmark [http://www.robohornet.org/#e=canvastodataurl] by approximately 5x on Baytrail on ChromiumOS. Elapsed time drops by 81.4861% +/- 1.22619% (n=3 s=14.9105, confidence=95%). Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
2013-11-07 | mesa: Add a streaming load memcpy implementation. | Matt Turner | 3 | -1/+127
Uses SSE 4.1's MOVNTDQA instruction (streaming load) to read from uncached memory without polluting the cache. Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
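As a rough illustration of the technique (not the patch itself), the sketch below copies 16-byte blocks using the `_mm_stream_load_si128` intrinsic, which maps to MOVNTDQA. The function name `streaming_load_copy` and its alignment requirements are assumptions made for this example. When the compiler is not targeting SSE 4.1 the sketch falls back to plain `memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical helper: src must be 16-byte aligned and len a multiple of 16. */
static void streaming_load_copy(void *dst, void *src, size_t len)
{
    assert(((uintptr_t)src & 15) == 0 && (len & 15) == 0);
#if defined(__SSE4_1__)
    __m128i *s = src;
    unsigned char *d = dst;

    for (size_t i = 0; i < len / 16; i++) {
        /* MOVNTDQA: on write-combined memory this reads through the
         * streaming load buffer instead of filling the cache. */
        __m128i v = _mm_stream_load_si128(&s[i]);
        _mm_storeu_si128((__m128i *)(d + 16 * i), v);
    }
#else
    /* No SSE 4.1 at compile time: ordinary copy. */
    memcpy(dst, src, len);
#endif
}
```

On ordinary cacheable memory the streaming load behaves like a normal aligned load, so the benefit is expected only when the source is an uncached or write-combined mapping.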
2013-11-07 | i965: Fix 'SIMD16 only' dispatch of fragment shader in case of sample shading | Anuj Phogat | 2 | -14/+25
This patch correctly sets up the Dispatch GRF Start Register in the case of 'SIMD16 only' FS dispatch. This fixes incorrect rendering in the Dolphin emulator with GL_SAMPLE_SHADING enabled. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-08 | i965: Enable ARB_vertex_type_10f_11f_11f_rev on Gen6+. | Chris Forbes | 1 | -0/+1
This theoretically works on earlier hardware as well, but the extension requires at least GL3.0. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-08 | i965: add support for UNSIGNED_INT_10F_11F_11F_REV vertex attribs | Chris Forbes | 1 | -0/+2
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-08 | vbo: add 10_11_11 support to vbo_attrib_tmp | Chris Forbes | 1 | -6/+26
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-08 | mesa: Add support to _mesa_bytes_per_vertex_attrib for 10_11_11 format. | Chris Forbes | 1 | -0/+5
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-08 | mesa: add varray support for UNSIGNED_INT_10F_11F_11F_REV type | Chris Forbes | 1 | -3/+17
V2: fix interaction with VertexAttribFormat, since that landed after this was originally written Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
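For readers unfamiliar with the format handled by this series, here is a sketch of how one UNSIGNED_INT_10F_11F_11F_REV value unpacks into three floats. The helper names are hypothetical and this is not Mesa's implementation. It assumes the usual layout: red in bits 0-10 and green in bits 11-21 as unsigned 11-bit floats (5 exponent bits, 6 mantissa bits), blue in bits 22-31 as an unsigned 10-bit float (5 exponent bits, 5 mantissa bits), all with exponent bias 15.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an unsigned small float (5 exponent bits, mant_bits mantissa
 * bits, bias 15) to a 32-bit float. Hypothetical helper for this sketch. */
static float unsigned_float_to_f32(uint32_t bits, int mant_bits)
{
    assert(mant_bits == 5 || mant_bits == 6);

    uint32_t mant = bits & ((1u << mant_bits) - 1);
    uint32_t exp = (bits >> mant_bits) & 0x1f;
    uint32_t f32;
    float out;

    if (exp == 0) {
        /* Zero or denormal: mant * 2^-(14 + mant_bits). */
        return (float)mant / (float)(1u << (14 + mant_bits));
    }

    if (exp == 31) {
        /* Infinity (mant == 0) or NaN (mant != 0). */
        f32 = 0x7f800000u | (mant << (23 - mant_bits));
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        f32 = ((exp + 112) << 23) | (mant << (23 - mant_bits));
    }

    memcpy(&out, &f32, sizeof out);
    return out;
}

/* Unpack one packed 32-bit value into rgb[0..2]. */
static void unpack_10f_11f_11f_rev(uint32_t v, float rgb[3])
{
    rgb[0] = unsigned_float_to_f32(v & 0x7ff, 6);
    rgb[1] = unsigned_float_to_f32((v >> 11) & 0x7ff, 6);
    rgb[2] = unsigned_float_to_f32((v >> 22) & 0x3ff, 5);
}
```

All three components share a single 32-bit word, so the type occupies 4 bytes per vertex attribute.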
2013-11-08 | mesa: Add extension scaffolding for ARB_vertex_type_10f_11f_11f_rev | Chris Forbes | 2 | -0/+2
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-07 | i965: Avoid flushing the batch for every blorp op. | Eric Anholt | 4 | -17/+50
This brings over the batch-wrap-prevention and aperture space checking code from the normal brw_draw.c path, so that we don't need to flush the batch every time.

There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't high enough up in the state emit sequences -- before, we implicitly had one at the batch flush before any state was emitted, so Mesa's workaround emits didn't really matter. Since the SNB fixes by Ken, I didn't see any regressions after 3 piglit runs.

Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
No statistically significant performance difference on unigine tropics (n=10)
No statistically significant performance difference on openarena (n=755)

The two apps that are hurt happen to include stalls on busy buffer objects, so I think this is an effect of missing out on an opportune flush.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-11-07 | build: Build gen_matypes and matypes.h from src/mesa. | Matt Turner | 5 | -103/+15
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-07 | build: Change HAVE_X86_ASM to mean x86 or x86-64 asm. | Matt Turner | 2 | -3/+6
I want a conditional that says generally "we have x86 assembly" in the next patch. Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-07 | mesa: Enable ARB_vertex_attrib_binding | Fredrik Höglund | 1 | -0/+1
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-07 | mesa: Optimize rebinding the same VBO | Fredrik Höglund | 1 | -2/+5
Check if the new buffer object has the same name as the current buffer object before looking it up. Reviewed-by: Eric Anholt <eric@anholt.net>
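A minimal sketch of the fast path this message describes, under stated assumptions: the structs, the linear-search table standing in for the name hash table, and the `lookups` counter are all hypothetical and exist only to make the saved lookup visible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFFERS 8 /* hypothetical table size */

struct buffer_object {
    unsigned name;
};

struct context {
    struct buffer_object table[MAX_BUFFERS]; /* stand-in for the hash table */
    struct buffer_object *bound;             /* currently bound buffer */
    unsigned lookups;                        /* how often the slow path ran */
};

/* Slow path: search the table by name. */
static struct buffer_object *lookup_buffer(struct context *ctx, unsigned name)
{
    ctx->lookups++;
    for (size_t i = 0; i < MAX_BUFFERS; i++) {
        if (ctx->table[i].name == name)
            return &ctx->table[i];
    }
    return NULL;
}

static void bind_buffer(struct context *ctx, unsigned name)
{
    assert(ctx != NULL);

    /* Name 0 means "no buffer" in this sketch. */
    if (name == 0) {
        ctx->bound = NULL;
        return;
    }

    /* Fast path: the requested name is already bound, skip the lookup. */
    if (ctx->bound != NULL && ctx->bound->name == name)
        return;

    ctx->bound = lookup_buffer(ctx, name);
}
```

Binding the same name twice in a row triggers only one lookup in this sketch; the second call returns from the name comparison.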
2013-11-07 | mesa: Handle zero-stride arrays in _mesa_update_array_max_element() | Fredrik Höglund | 1 | -2/+4
Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Add Get* support for ARB_vertex_attrib_binding | Fredrik Höglund | 3 | -0/+38
Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Add ARB_vertex_attrib_binding | Fredrik Höglund | 16 | -125/+691
update_array() and update_array_format() are changed to update the new attrib and binding states, and the client arrays become derived state. Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-07 | glapi: Add infrastructure for ARB_vertex_attrib_binding | Fredrik Höglund | 3 | -6/+72
Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Make handle_bind_buffer_gen() non-static | Fredrik Höglund | 2 | -11/+22
...and rename it to _mesa_bind_buffer_gen(). This is so the function can be called from _mesa_BindVertexBuffer(). This patch also adds a caller parameter so we can report the right entry point in error messages. Based on a patch by Eric Anholt. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Rename gl_array_object::VertexAttrib to _VertexAttrib | Fredrik Höglund | 13 | -134/+134
This will become derived state as part of the ARB_vertex_attrib_binding support. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Split out the format code from update_array() | Fredrik Höglund | 1 | -57/+93
Split out the code for updating the array format into a new function called update_array_format(). This function will be called by both update_array() and the new glVertexAttrib*Format() entry points in ARB_vertex_attrib_binding. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | mesa: Restore gl_array_object::NewArray | Fredrik Höglund | 4 | -0/+10
This will be used by the ARB_vertex_attrib_binding implementation. This reverts commit db38e9a0e179441f59274f6f2a751912c29872e2. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-11-07 | i965: Use has_surface_tile_offset in depth/stencil alignment workaround. | Kenneth Graunke | 1 | -2/+2
Currently, !has_surface_tile_offset is equivalent to gen == 4 && !is_g4x. We already use it for related checks in brw_wm_surface_state.c, so it makes sense to use it here too. It's simpler and more future-proof. Broadwell also lacks surface tile offsets. With this patch, I won't need to update any generation checking; I can simply not set the flag. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-06 | mesa: add arm64 support | Fabio Pedretti | 1 | -1/+1
Patch from Ubuntu package Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2013-11-06 | i965/gen6: Don't allow SIMD16 dispatch in 4x PERPIXEL mode with computed depth. | Paul Berry | 1 | -1/+33
Hardware docs say we can only use SIMD8 dispatch in this condition. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2013-11-06 | mesa: Build program as part of libmesa. | Matt Turner | 2 | -53/+18
2013-11-06 | mesa: Clean up use of top_srcdir/top_builddir. | Matt Turner | 1 | -11/+4
2013-11-06 | i965: Use unreachable() to silence a compiler warning. | Matt Turner | 1 | -0/+1
Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-11-06 | mesa: Add unreachable() macro. | Matt Turner | 1 | -0/+15
Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
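The log does not show the macro body, so the following is only a sketch of how such a macro is commonly written and how it silences a "control reaches end of non-void function" warning. The name `sketch_unreachable`, its message argument, and the `reg_type` enum are hypothetical; `__builtin_unreachable()` is the GCC/Clang builtin.

```c
#include <assert.h>

/* Hypothetical macro: assert in debug builds, and tell GCC/Clang that
 * control never reaches this point. */
#if defined(__GNUC__)
#define sketch_unreachable(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define sketch_unreachable(msg) assert(!msg)
#endif

enum reg_type { REG_UB, REG_UW, REG_UD }; /* hypothetical enum */

static int reg_type_size(enum reg_type type)
{
    switch (type) {
    case REG_UB: return 1;
    case REG_UW: return 2;
    case REG_UD: return 4;
    }

    /* Every enumerator returns above; without this line the compiler
     * may warn that the function can fall off its end. */
    sketch_unreachable("invalid register type");
}
```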
2013-11-06 | mesa: for GLSL_DUMP_ON_ERROR, also dump the info log | Brian Paul | 1 | -0/+2
Since it's helpful to know why the shader did not compile. Also, call fflush() for Windows. Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-11-06 | i965/fs: Gen4-5: Implement alpha test in shader for MRT | Chris Forbes | 3 | -0/+58
V2: Add comment explaining what emit_alpha_test() is for; fix spurious temp and bogus whitespace. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-06 | i965/fs: Gen4-5: Setup discard masks for MRT alpha test | Chris Forbes | 2 | -2/+2
The same setup is required here as when the user-provided shader explicitly uses KIL or discard. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-06 | i965: Gen4-5: Include alpha func/ref in program key | Chris Forbes | 2 | -0/+18
V2: Better explanation of the rationale for doing this. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-06 | i965: Gen4-5: Don't enable hardware alpha test with MRT | Chris Forbes | 1 | -1/+2
We have to do this in the shader instead, since these gens lack an independent RT0 alpha value in their render target write messages. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-05 | i965: Combine {brw,gen7}_update_texture_buffer_surface() functions. | Kenneth Graunke | 3 | -40/+5
Now that brw_update_texture_buffer_surface() uses the virtual emit_buffer_surface_state() function, it works for Gen7+ too. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-11-05 | i965: Unvirtualize brw_create_constant_surface; delete Gen7+ variant. | Kenneth Graunke | 4 | -45/+17
Now that brw_create_constant_surface uses a virtual function internally, it doesn't need to be virtual itself. We can delete the Gen7+ variant and simplify things. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-11-05 | i965: Use the new emit_buffer_surface_state() vtable entry. | Kenneth Graunke | 1 | -10/+10
This will allow us to combine the Gen4-6 and Gen7 variants of these functions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-11-05 | i965: Virtualize emit_buffer_surface_state(). | Kenneth Graunke | 3 | -4/+20
This entails adding "mocs" and "rw" parameters to the Gen4-5 version. I made it actually pay attention to the rw flag (even though it is always false), but mocs is always ignored. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-11-05 | i965: Fix compiler warning. | Courtney Goeltzenleuchter | 2 | -2/+2
fix: intel_screen.c:1320:4: warning: initialization from incompatible pointer type [enabled by default] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-11-05 | i965: Tell the unit states how many binding table entries we have. | Eric Anholt | 7 | -5/+22
Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to dynamically assign our binding table indices, we didn't really track our binding table count per shader, so we never filled in these fields. Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-11-05 | i965: Fix context initialization after 2f896627175384fd5 | Eric Anholt | 1 | -3/+6
You can't return stack-initialized values and expect anything good to happen. Reviewed-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
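The fix itself is not shown in this log. As a generic illustration of the hazard the message alludes to, the sketch below contrasts returning the address of a local (described in a comment only) with filling storage owned by the caller. The struct, its fields, and the values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings struct for this sketch. */
struct screen_info {
    int gen;
    int max_contexts;
};

/* Broken pattern, described rather than compiled: a function that
 * declares a local struct screen_info, fills it in, and returns its
 * address. The local's lifetime ends at return, so the caller reads
 * dead stack memory. */

/* Safer pattern: the caller owns the storage and passes it in. */
static void init_screen_info(struct screen_info *out)
{
    assert(out != NULL);
    out->gen = 7;          /* made-up values */
    out->max_contexts = 4;
}
```

A caller would declare its own `struct screen_info` and pass its address, so the data stays valid for as long as the caller's object does.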
2013-11-05 | nouveau: Use _NEW_SCISSOR instead of hooking through dd_function_table | Ian Romanick | 1 | -7/+3
This will enable removing the dd_function_table::Scissor hook in the near future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-11-05 | nouveau: Use _NEW_VIEWPORT instead of hooking through dd_function_table | Ian Romanick | 1 | -14/+3
This will enable removing the dd_function_table::DepthRange hook in the near future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-11-05 | radeon / r200: Don't pass unused parameters to radeon_viewport | Ian Romanick | 4 | -4/+14
The x, y, width, and height parameters aren't used by radeon_viewport, so don't pass them. This should make future changes to the dd_function_table::Viewport interface a little easier. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jljusten@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
2013-11-05 | i915: Bring sanity to the Viewport function | Ian Romanick | 4 | -28/+22
The i830 and the i915 driver have the same dd_function_table::Viewport function... it just has two names and lives in two places. Using a single implementation allows cleaning up the saved_viewport nonsense too. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jljusten@gmail.com> Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
2013-11-05 | i965: Eliminate the saved_viewport wrapper | Ian Romanick | 2 | -7/+5
The i965 driver never installed a dd_function_table::Viewport function, so this wrapper never actually did anything. No piglit regressions on IVB on DRI2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jljusten@gmail.com> Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>