summaryrefslogtreecommitdiff
path: root/src/mesa/drivers/dri/i965/gen6_vs_state.c
AgeCommit message (Collapse)AuthorFilesLines
2015-08-18i965/gen6: Resolve GCC sign-compare warning.Rhys Kidd1-1/+1
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c: In function 'gen6_upload_push_constants': mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:85:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (i = 0; i < prog_data->nr_params; i++) { ^ mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:92:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (i = 0; i < prog_data->nr_params; i++) { ^ Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-03-02i965: Rename some PIPE_CONTROL flagsBen Widawsky1-1/+1
I'm not really sure of the origins of the existing flag names. Modern docs have some slightly different names. Having the correct names makes it easier to determine if existing PIPE_CONTROL flag settings are correct, as well as making adding new PIPE_CONTROLs easier. This originally came up while I was trying to implement workarounds and spotted some things called, "flush" which should have been called "invalidate." Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-17i965: Do Sandybridge workaround flushes before each primitive.Kenneth Graunke1-4/+2
Sandybridge requires the post-sync non-zero workaround in a ton of places, and if you ever miss one, the GPU usually hangs. Currently, we try to track exactly when a workaround flush is necessary (via the brw->batch.need_workaround_flush flag). This is tricky to get right, and we've botched it several times in the past. This patch unconditionally performs the post-sync non-zero flush at the start of each primitive's state upload (including BLORP). We drop the needs_workaround_flush flag, and drop all the other callers, as the flush has already been performed. We have no data to indicate that simply flushing all the time will hurt performance, and it has the potential to help stability. v2: Add post-sync workaround to initial GPU state upload to be extra cautious (suggested by Chad Versace). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-04i965: Store floating point mode choice in brw_stage_prog_data.Kenneth Graunke1-5/+1
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB programs so that 0^0 == 1. The choice is based entirely on the shader source language. Previously, our code to determine which mode we wanted was duplicated in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8). The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing as well - we use CurrentProgram (non-derived state), but _Shader (derived state). It also relies on knowing that ARB programs don't use gl_shader_program structures today. The compiler already makes this assumption in a few places, but I'd rather keep that assumption out of the state upload code. With this patch, we select the mode at compile time, and store that choice in prog_data. The state upload code simply uses that decision. This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke1-4/+4
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-02i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke1-3/+3
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
2014-11-29i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke1-9/+11
Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-11-20i965: Skip _mesa_load_state_parameters when there are zero parameters.Kenneth Graunke1-6/+6
Saves a tiny bit of CPU overhead. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Eric Anholt <eric@anholt.net>
2014-09-03i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke1-2/+2
All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-08-14i965: Store uniform constant values in a gl_constant_value instead of floatNeil Roberts1-3/+5
The brw_stage_prog_data struct previously contained an array of float pointers to the values of parameters. These were then copied into a batch buffer to upload the values using a regular assignment. However the float values were also being overloaded to store integer values for integer uniforms. This can break if x87 floating-point registers are used to do the assignment because the fst instruction tries to fix up invalid float values. If an integer constant happened to look like an invalid float value then it would get altered when it was copied into the batch buffer. This patch changes the pointers to be gl_constant_value instead so that the assignment should end up copying without any alteration. This also makes it more obvious that the values being stored here are overloaded for multiple types. There are some static asserts where the values are uploaded to ensure that the size of gl_constant_value is the same as a float. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-02i965: Update a ton of comments about constant buffers.Eric Anholt1-0/+14
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-02i965/gen6+: Merge VS/GS and WM push constant buffer upload paths.Eric Anholt1-20/+41
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-02i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.Eric Anholt1-1/+1
I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-07-02i965: Reuse libdrm's header for AUB definitions.Eric Anholt1-1/+1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-06-26i965: Remove unneeded VS workaround stalls on Baytrail.Greg Hunt1-1/+1
According to the workarounds list, these stalls aren't needed on production Baytrail systems. Piglit confirms that as well. These cause a small slowdown when we are sending a large number of small batches to the GPU. Removing these improves performance by up to 5% on some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom, Multithread, ShMapPcf). Signed-off-by: Gregory Hunt <greg.hunt@mobica.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-02i965: Move push constant state packets to push constant update time.Eric Anholt1-1/+10
-0.553779% +/- 0.423394% effect on cairo-perf-trace runtime on glamor (n=612) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-05-02i965/gen6: Don't update unit state when samplers change.Eric Anholt1-1/+1
There's no remaining dependency between these two packets that I can find. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-25mesa/sso: rename Shader to the pointer _ShaderGregory Hainaut1-1/+1
Basically a sed but shaderapi.c and get.c. get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior shaderapi.c => the old api stil update the Shader object directly V2: formatting improvement V3 (idr): * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c. * Trivial reformatting. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2014-02-22i965: Move the remaining driver debug over to stderr.Eric Anholt1-3/+3
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-02-19i965: Move up duplicated fields from stage-specific prog_data to ↵Francisco Jerez1-5/+5
brw_stage_prog_data. There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare. Reviewed-by: Paul Berry <stereotype441@gmail.com>
2014-01-21mesa: Replace ctx->Shader.Current{Vertex,Fragment,Geometry}Program with an ↵Paul Berry1-1/+1
array. These are replaced with ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In patches to follow, this will allow us to replace a lot of ad-hoc logic with a variable index into the array. With the exception of the changes to mtypes.h, this patch was generated entirely by the command: find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \ -print0 | xargs -0 sed -i \ -e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \ -e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \ -e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g' Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Brian Paul <brianp@vmware.com>
2014-01-20i965: Create a helper function for emitting PIPE_CONTROL flushes.Kenneth Graunke1-9/+4
These days, we need to emit PIPE_CONTROL flushes all over the place. Being able to do that via a single function call seems convenient. Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to do this in substantially fewer places. v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by Eric Anholt). Drop unlikely() from BLT_RING check. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2013-11-05i965: Tell the unit states how many binding table entries we have.Eric Anholt1-1/+3
Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to dynamically assign our binding table indices, we didn't really track our binding table count per shader, so we never filled in these fields. Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-08-31i965/vs: generalize gen6_vs_push_constants in preparation for GS.Paul Berry1-16/+29
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-08-31i965: Make sure constants re-sent after constant buffer reallocation.Paul Berry1-1/+2
The hardware requires that after constant buffers for a stage are allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} command, and prior to execution of a 3DPRIMITIVE, the corresponding stage's constant buffers must be reprogrammed using a 3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command. Previously we didn't need to worry about this, because we only programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged). But now that we reallocate the constant buffers whenever geometry shaders are switched on and off, we need to make sure the constant buffers are reprogrammed. We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to brw->state.dirty.brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-08-31i965: Move data from brw->vs into a base class if gs will also need it.Paul Berry1-10/+13
This paves the way for sharing the code that will set up the vertex and geometry shader pipeline state. v2: Rename the base class to brw_stage_state. Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
2013-08-23i965/vec4: Allow for dispatch_grf_start_reg to vary.Paul Berry1-1/+2
Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data). For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behvaiour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1. This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data. This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1. v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint". Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-08-19i965: Split sampler count variable to be per-stage.Kenneth Graunke1-1/+1
Currently, we only have a single sampler state table shared among all stages, so we just copy wm.sampler_count into vs.sampler_count. In the future, each shader stage will have its own SAMPLER_STATE table, at which point we'll need these separate sampler counts. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-07-15i965: Update workaround flush comments for Gen6 3DSTATE_VS.Kenneth Graunke1-1/+5
Unfortunately, the workaround text never made it into the Sandybridge PRM, so we still have to refer to the BSpec. It also wasn't obvious why we needed this workaround at all, since we don't currently do VS passthrough - but BLORP can turn off the VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-07-09i965: Delete intel_context entirely.Kenneth Graunke1-3/+2
This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2013-07-09i965: Shorten context base class dereference chains.Kenneth Graunke1-2/+2
ctx->DrawBuffer is much more sensible than brw->intel.ctx.DrawBuffer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2013-07-09i965: Pass brw_context to functions rather than intel_context.Kenneth Graunke1-2/+2
This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context. Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw." Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2013-04-11i965/vs: split brw_vs_prog_data into generic and VS-specific parts.Paul Berry1-8/+8
This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-02-11i965: Simplify VS push constant upload code since removal of old path.Eric Anholt1-7/+11
We used to have clip planes optionally included in the push constants, resulting in a variable amount of data uploaded, but no more. This also means less wasted space in the batch for our push constants. v2: Update _NEW_TRANSFORM state bit information. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2012-11-01i965/vs: Remove support for the old parameter layout.Kenneth Graunke1-38/+4
Only the old backend used it. Reviewed-by: Eric Anholt <eric@anholt.net>
2012-10-25i965/vs: Fix debug dumping of VS push constants.Kenneth Graunke1-1/+3
While copying the values into the batch space, we advance the param pointer. The debug code then tries to iterate over all the uploaded values, starting at param...which is now the end of the uploaded data, rather than the start. This patch saves a pointer to the start of push constant space before it gets altered and switches the debug code to use that. Tested by uncommenting the code and examining the output of glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2012-10-15intel: Remove NV_vertex_program support.Eric Anholt1-3/+0
We were holding on to this code because we were aware that NWN 1 had some support for vertex programs -- no other linux programs I've come across would use it (since other software also has ARB_vp or GLSL support). Only, it turns out that NWN doesn't even give us any vertex programs. Given that we have known issues where the extension has never been fully supported, just give up on it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46795 Reviewed-by: Brian Paul <brianp@vmware.com>
2012-08-08intel: Make the length for PIPE_CONTROL explicit.Kenneth Graunke1-1/+1
PIPE_CONTROL has variable length, depending upon generation and whether we want to do 32-bit or 64-bit data writes. Make it explicit, rather than hiding a length of 4 in the #define for _3DSTATE_PIPE_CONTROL. Generated by s/3DSTATE_PIPE_CONTROL/3DSTATE_PIPE_CONTROL | (4 - 2)/g. This is equivalent since the #define used to have | 2 in it. A grep through the sources shows that all instances have been converted, so it's safe to remove the | 2 from the #define. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Eric Anholt <eric@anholt.net>
2012-02-07i965: Rewrite the HiZ opChad Versace1-0/+9
The HiZ op was implemented as a meta-op. This patch reimplements it by emitting a special HiZ batch. This fixes several known bugs, and likely a lot of undiscovered ones too. ==== Why the HiZ meta-op needed to die ==== The HiZ op was implemented as a meta-op, which caused lots of trouble. All other meta-ops occur as a result of some GL call (for example, glClear and glGenerateMipmap), but the HiZ meta-op was special. It was called in places that Mesa (in particular, the vbo and swrast modules) did not expect---and were not prepared for---state changes to occur (for example: glDraw; glCallList; within glBegin/End blocks; and within swrast_prepare_render as a result of intel_miptree_map). In an attempt to work around these unexpected state changes, I added two hooks in i965: - A hook for glDraw, located in brw_predraw_resolve_buffers (which is called in the glDraw path). This hook detected if a predraw resolve meta-op had occurred, and would hackishly repropagate some GL state if necessary. This ensured that the meta-op state changes would not intefere with the vbo module's subsequent execution of glDraw. - A hook for glBegin, implemented by brwPrepareExecBegin. This hook resolved all buffers before entering a glBegin/End block, thus preventing an infinitely recurring call to vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to flush its vertex queue in response to GL state changes. Unfortunately, these hooks were not sufficient. The meta-op state changes still interacted badly with glPopAttrib (as discovered in bug 44927) and with swrast rendering (as discovered by debugging gen6's swrast fallback for glBitmap). I expect there are more undiscovered bugs. Rather than play whack-a-mole in a minefield, the sane approach is to replace the HiZ meta-op with something safer. ==== How it was killed ==== This patch consists of several logical components: 1. Rewrite the HiZ op by replacing function gen6_resolve_slice with gen6_hiz_exec and gen7_hiz_exec. The new functions do not call a meta-op, but instead manually construct and emit a batch to "draw" the HiZ op's rectangle primitive. The new functions alter no GL state. 2. Add fields to brw_context::hiz for the new HiZ op. 3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable. 4. Kill all dead HiZ code: - the function gen6_resolve_slice - the dirty flag BRW_NEW_HIZ - the dead fields in brw_context::hiz - the state packet manipulation triggered by the now removed brw_context::hiz::op - the meta-op workaround in brw_predraw_resolve_buffers (discussed above) - the meta-op workaround brwPrepareExecBegin (discussed above) Note: This is a candidate for the 8.0 branch. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Paul Berry <stereotype441@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327 Reported-by: xunx.fang@intel.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927 Reported-by: chao.a.chen@intel.com Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2012-01-09i965: Remove BRW_NEW_URB_FENCE dirty bit from Gen6+ atoms.Kenneth Graunke1-2/+1
The BRW_NEW_URB_FENCE dirty bit is only flagged by the brw_recalculate_urb_fence state atom which isn't used on Gen6+. Since it's never flagged, there's no reason to depend on it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2011-11-10i965: Put a proper sampler count in 3DSTATE_VS.Kenneth Graunke1-2/+3
See similar code for 3DSTATE_WM. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2011-11-10i965: Rename gen6_*_constants tracked state atoms to "push_constants".Kenneth Graunke1-1/+1
When reading the "brw_wm_constants" and "gen6_wm_constants" atoms side-by-side, I initially failed to notice the crucial difference: the Gen6 atoms are for Push Constants, while brw_wm_constants handles Pull Constants. (Gen4/5 Push Constants are handled by "brw_curbe.") Renaming these should clarify the code and save me from constant confusion over the fact that "gen6_wm_constants" isn't just a newer version of "brw_wm_constants." Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2011-11-10i965: Use 0 for the number of binding table entries in 3DSTATE_(VS|WM).Kenneth Graunke1-5/+2
These fields control how many entries the hardware prefetches into the state cache, so they only impact performance, not correctness. However, it's not clear how to use this in a way that's beneficial. According to the documentation, kernels "using a large number" of entries may wish to program this to zero to avoid thrashing the cache; it's unclear how many is too many. Also, Ironlake's WM was missing this feature entirely---the count had to be zero. The dirty bit tracking to handle this complicates the surface state and binding table setup; removing it should simplify things and make future refactoring easier. So just set 0 for the number of entries rather than trying to compute and track it. Appears to have no impact on Nexuiz and OpenArena on Sandybridge. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
2011-10-31i965/gen6+: Switch GLSL from ALT to IEEE floating point modePaul Berry1-1/+8
i965 graphics hardware has two floating point modes: ALT and IEEE. In ALT mode, floating-point operations never generate infinities or NaNs, and MOV instructions translate infinities and NaNs to finite values. In IEEE mode, infinities and NaNs behave as specified in the IEEE 754 spec. Previously, we used ALT mode for all vertex and fragment programs, whether they were GLSL programs or ARB programs. The GLSL spec is sufficiently vague about how infs and nans are to be handled that it was unclear whether this mode was compliant with the GLSL 1.30 spec or not, and it made it very difficult to test the isinf() and isnan() functions. This patch changes i965 GLSL programs to use IEEE floating-point mode, which is clearly compliant with GLSL 1.30's inf/nan requirements. In addition to making the Piglit isinf and isnan tests pass, this paves the way for future support of the ARB_shader_precision extension. Unfortunately we still have to use ALT floating-point mode when executing ARB programs, because those programs require 0^0 == 1, and i965 hardware generates 0^0 == NaN in IEEE mode. Fixes piglit tests "isinf-and-isnan fs_fbo", "isinf-and-isnan vs_fbo", and {fs,vs}-{isinf,isnan}-{vec2,vec3,vec4}.
2011-10-29i965: Move push constants setup to emit() time.Eric Anholt1-3/+3
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Paul Berry <stereotype441@gmail.com>
2011-10-25i965: Rename (vs|wm)_max_threads to max_(vs|wm)_threads for consistency.Kenneth Graunke1-1/+1
The inconsistency between vs_max_threads and max_vs_entries was rather annoying. I could never seem to remember which one was reversed, which made it harder to find quickly. "Max __ Threads" seems more natural. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
2011-10-24i965: Apply post-sync non-zero workaround to homebrew workaround.Kenneth Graunke1-0/+2
In commit 3e5d3626, Eric added a homebrew workaround to fix GPU hangs in the Mesa "engine" demo and oglc's api-texcoord test. Unfortunately, his PIPE_CONTROL contains a Depth Stall, which necessitates the post-sync non-zero workaround, Fixes GPU hangs in Civilization 4, PlaneShift, and 3DMMES. Hopefully Heroes of Newerth as well, though I haven't tested that. NOTE: This is candidate for the 7.11 branch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40324 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41096 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-and-tested-by: Eric Anholt <eric@anholt.net>
2011-10-13i965 Gen6+: De-compact clip plane constants for old VS backend.Paul Berry1-8/+7
In commit 018ea68d8780ab5baeef0b8122b8410e5e55ae6d, when I de-compacted clip planes on Gen6+, I updated both the old and new VS back-ends to reflect the change in how clip planes are stored, but I failed to change the code in gen6_vs_state.c that uploads clip plane constants when using the old VS back-end. As a result, if the set of enabled clip planes wasn't contiguous starting with 0, then clipping would not occur properly. This patch corrects gen6_vs_state.c to upload clip plane constants in the new de-compacted form. This only affects the old VS back-end (which is used for fixed-function and ARB vertex programs, not for GLSL vertex shaders). Fixes Piglit test fixed-clip-enables. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41603 Reviewed-by: Eric Anholt <eric@anholt.net>
2011-10-05i965 Gen6: Implement gl_ClipVertex.Paul Berry1-1/+2
This patch implements proper support for gl_ClipVertex by causing the new VS backend to populate the clip distance VUE slots using VERT_RESULT_CLIP_VERTEX when appropriate, and by using the untransformed clip planes in ctx->Transform.EyeUserPlane rather than the transformed clip planes in ctx->Transform._ClipUserPlane when a GLSL-based vertex shader is in use. When not using a GLSL-based vertex shader, we use ctx->Transform._ClipUserPlane (which is what we used prior to this patch). This ensures that clipping is still performed correctly for fixed function and ARB vertex programs. A new function, brw_select_clip_planes() is used to determine whether to use _ClipUserPlane or EyeUserPlane, so that the logic for making this decision is shared between the new and old vertex shaders. Fixes the following Piglit tests on i965 Gen6: - vs-clip-vertex-const-accept - vs-clip-vertex-const-reject - vs-clip-vertex-different-from-position - vs-clip-vertex-equal-to-position - vs-clip-vertex-homogeneity - vs-clip-based-on-position - vs-clip-based-on-position-homogeneity - clip-plane-transformation clipvert_pos - clip-plane-transformation pos_clipvert - clip-plane-transformation pos Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad@chad-versace.us>
2011-09-28i965 new VS: don't share clip plane constants in pre-GEN6Paul Berry1-18/+19
In pre-GEN6, when using clip planes, both the vertex shader and the clipper need access to the client-supplied clip planes, since the vertex shader needs them to set the clip flags, and the clipper needs them to determine where to insert new vertices. With the old VS backend, we used a clever optimization to avoid placing duplicate copies of these planes in the CURBE: we used the same block of memory for both the clipper and vertex shader constants, with the clip planes at the front of it, and then we instructed the clipper to read just the initial part of this block containing the clip planes. This optimization was tricky, of dubious value, and not completely working in the new VS backend, so I've removed it. Now, when using the new VS backend, separate parts of the CURBE are used for the clipper and the vertex shader. Note that this doesn't affect the number of push constants available to the vertex shader, it simply causes the CURBE to occupy a few more bytes of URB memory. The old VS backend is unaffected. GEN6+, which does clipping entirely in hardware, is also unaffected. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>