summaryrefslogtreecommitdiff
path: root/src/mesa/drivers/dri/i965/brw_state_upload.c
AgeCommit message (Collapse)AuthorFilesLines
2017-03-29mesa: don't use _NEW_TEXTURE mainly in mesa/mainMarek Olšák1-1/+2
v2: add missing %s Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-13i965: add missing brw_defines.h include in brw_program.cEmil Velikov1-0/+1
File is using MI_LOAD_REGISTER_IMM, GEN7_CACHE_MODE_1 and others as defined in the header. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-06i965: Delete vestiges of resource streamer code.Kenneth Graunke1-4/+0
We never actually used the resource streamer in any shipping build of Mesa. We have no plans to do so in the future. We looked into using it in Vulkan, and concluded that it was unusable. We're not the only ones to arrive at the conclusion that it's not worth using. So, drop the last vestiges of resource streamer support and move on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-01i965: Reduce cross-pollination between the DRI driver and compilerJason Ekstrand1-0/+1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-06i965: Combine the Gen6 SF and Clip viewport atoms.Kenneth Graunke1-2/+1
The next patch will make the guardband calculation dependent on the transformation matrix. Instead of computing it in both atoms, just combine them into a single atom. Cc: "17.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-12-13main: use new driver flag for conservative rasterization stateLionel Landwerlin1-0/+2
Suggested by Marek. v2: Use new driver flag (Marek) v3: Fix i965 comments (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31i965: Move gen8_disable_stages to brw_upload_initial_gpu_stateNanley Chery1-1/+13
3DSTATE_WM_CHROMAKEY isn't programmed anywhere else. 3DSTATE_WM_HZ_OP is programmed, then cleared by blorp during a HZ op, so repeatedly clearing it after every blorp execution is redundant. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31i965: Program 3DSTATE_AA_LINE_PARAMETERS in upload_invariant_stateNanley Chery1-4/+0
This packet is non-pipelined and doesn't ever change across emissions. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-26i965: remove unused BRW_STATE_INTERPOLATION_MAP flagTimothy Arceri1-1/+0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26i965: rewrite brw_setup_vue_interpolation()Timothy Arceri1-3/+2
Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier array in gl_fragment_program which will allow us to remove it. This change also makes the code which is only used by gen4/5 more self contained as it now has its own gen5_fragment_program struct rather than storing the map in brw_context. This means the interpolation map will only get processed once and will get stored in the in memory cache rather than being processed everytime the fs changes. Also by calling this from the fs compile code rather than from the upload code and using the interpolation assigned there we can get rid of the BRW_NEW_INTERPOLATION_MAP flag. It might not seem ideal to add a gen5_fragment_program struct however by the end of this series we will have gotten rid of all the brw_{shader_stage}_program structs and replaced them with a generic brw_program struct so there will only be two program structs which is better than what we have now. V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following patch to fix build error. V3 - Suggestions by Jason: - name struct gen4_fragment_program rather than gen5_fragment_program - don't use enum with memset() - create interp mode set helper and simplify logic to call it - add assert when calling function to show prog will never be NULL for gen4/5 i.e. no Vulkan Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-05i965: Eliminate brw->tes.prog_data pointer.Kenneth Graunke1-1/+0
Just say no to: - brw->tes.base.prog_data = &brw->tes.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tes_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05i965: Eliminate brw->tcs.prog_data pointer.Kenneth Graunke1-1/+0
Just say no to: - brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tcs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke1-3/+6
Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-03i965: Only emit 1 viewport when possible.Kenneth Graunke1-0/+11
In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-27i965: create populate key functions for tcs and tesTimothy Arceri1-16/+2
These will be used by the on disk shader cache. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-31i965: Merge gen7_clip_state atom into gen6_clip_state atom.Kenneth Graunke1-1/+1
The original motivation was that gen6_clip_state ignored _NEW_POLYGON as it didn't care about early culling. The only other change was that Gen6 ignored BRW_NEW_TES_PROG_DATA as it doesn't have tessellation shaders, but listening to this is harmless as it'll never be signalled. Now that we've added _NEW_POLYGON for is_drawing_lines/points, we can merge the two as the distinction is meaningless. This actually fixes a bug, though: Gen8+ was using the gen6_clip_state atom because it doesn't care about early culling, but it also needs BRW_NEW_TES_PROG_DATA, which was missing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-25i965: Upload surface state for non-coherent framebuffer fetch.Francisco Jerez1-0/+4
This iterates over the list of attached render buffers and binds appropriate surface state structures to the binding table block allocated for shader framebuffer read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-14i965: fix compiler warnings for 32bit buildTimothy Arceri1-1/+1
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-23i965: Combine 3DSTATE_STREAMOUT emitters and genX_sol_state atoms.Kenneth Graunke1-1/+1
They're basically the same. Let's avoid the code duplication. v2: Fix SO_BUFFER_ENABLE stuff to only happen on Gen < 8 (caught by Jason Ekstrand). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16i965: Send the minimal number of STATE_BASE_ADDRESS packets.Kenneth Graunke1-12/+2
STATE_BASE_ADDRESS stalls the whole pipeline, and the documentation cautions us to emit it as little as possible for better performance. We recently put some hacks in BLORP to try and avoid emitting it if it was already set correctly. However, this wasn't quite minimal: if BLORP is the first operation (i.e. glClear()), then it would emit it, and subsequent draw calls would emit it again. This caused a small drop in performance in GPUTest Triangle when switching from Meta to BLORP. Unlike most packets, STATE_BASE_ADDRESS isn't influenced by GL state: it needs to be emitted once per batch, before most other commands, or whenever we change the program cache BO. It's also valid in both the 3D and compute pipelines, which makes it even more unique. This patch removes it from the atom mechanism and instead directly calls it as part of every draw, compute dispatch, or BLORP operation. We introduce a new flag indicating that STATE_BASE_ADDRESS has already been emitted this batch, and if so, skip doing it again. When we make a new program cache BO, we simply reset the flag, so the next operation will emit it again. When we flush/reset the batch, we reset the flag. This guarantees that we'll emit STATE_BASE_ADDRESS only when we have to. It's also less code than the old atom mechanism. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16i965: Combine Gen4-7 and Gen8+ state base address emitters.Kenneth Graunke1-2/+2
We're about to start calling it directly, and this means the callers won't have to think about generations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16i965: Move Gen4-5 programs to brw_upload_programs() too.Kenneth Graunke1-5/+6
This way all the programs are in one place again, and it also should make some future STATE_BASE_ADDRESS related changes possible. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-23i965: Introduce state flag for blorpTopi Pohjolainen1-0/+1
In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger re-emission of the entire 3D pipeline on the next draw. However, there are some packets BLORP simply leaves alone, so there's no need to re-emit them. Trying to reduce the set of dirty bits flagged after BLORP runs is tricky. Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any atom which emits a packet that BLORP also emits. When BLORP runs, it will flag BRW_NEW_BLORP, causing those packets to get re-emitted. This also makes it easy to avoid re-emitting specific atoms - we can simply drop the BRW_NEW_BLORP flag on those. To start, we assume that all packets need to be re-emitted. This is the safest approach and closest to the existing code's behavior. Many of these are obviously not required, and can be dropped in subsequent patches. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-11i965: Split brw_upload_texture_surfaces into compute/render atoms.Kenneth Graunke1-2/+2
When uploading state for the compute pipeline, we don't want to look at VS/TCS/TES/GS/FS programs, as they might be stale, and aren't relevant anyway. Likewise, the render pipeline shouldn't look at CS. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-19i965: Implement compute sampler state atom.Francisco Jerez1-0/+2
Fixes a number of GLES31 CTS failures and hangs on various hardware: ES31-CTS.texture_gather.plain-gather-depth-2d ES31-CTS.texture_gather.plain-gather-depth-2darray ES31-CTS.texture_gather.plain-gather-depth-cube ES31-CTS.texture_gather.offset-gather-depth-2d ES31-CTS.texture_gather.offset-gather-depth-2darray ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers ES31-CTS.compute_shader.resources-texture Some of them were actually passing by luck on some generations even though we weren't uploading sampler state tables explicitly for the compute stage, most likely because they relied on the cached sampler state left from previous rendering to be close enough. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725 Reported-by: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-14i965: Add state bit to trigger re-emission of color calculator state.Francisco Jerez1-0/+1
This will be used on Gen8+ to make sure that the color calculator state pointers are re-emitted when switching back to the 3D pipeline after some GPGPU workload due to a hardware workaround. There are other state bits already defined that could be used to achieve the same effect but they all cause a ton of unrelated state to be re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new one, state bits are cheap. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-28i965: Add the TCS/TES state upload atoms to the gen7_atoms list.Kenneth Graunke1-0/+14
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Handle mix-and-match TCS/TES with separate shader objects.Kenneth Graunke1-9/+29
GL_ARB_separate_shader_objects allows the application to mix-and-match TCS and TES programs separately. This means that the interface between the two stages isn't known until the final SSO pipeline is in place. This isn't a great match for our hardware: the TCS and TES have to agree on the Patch URB entry layout. Since we store data as per-patch slots followed by per-vertex slots, changing the number of per-patch slots can significantly alter the layout. This can easily happen with SSO. To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead bitfields in the TCS/TES program keys, introducing program recompiles. brw_upload_programs() decides the layout for both TCS and TES, and passes it to brw_upload_tcs/tes(), which store it in the key. When creating the NIR for a shader specialization, we override nir->info.inputs_read (and friends) to the program key's values. Since everything uses those, no further compiler changes are needed. This also replaces the hack in brw_create_nir(). To avoid recompiles, brw_precompile_tes() looks to see if there's a TCS in the linked shader. If so, it accounts for the TCS outputs, just as brw_upload_programs() would. This eliminates all recompiles in the non-SSO case. In the SSO case, there should only be recompiles when using a TCS and TES that have different input/output interfaces. Fixes Piglit's mix-and-match-tcs-tes test. v2: Pull the brw_upload_programs code into a brw_upload_tess_programs() helper function (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Upload HS push constants whenever default tess. levels change.Kenneth Graunke1-0/+2
When using tessellation on OpenGL without a TCS, default values for gl_TessLevelOuter/gl_TessLevelInner are provided via the API. Core Mesa will flag ctx->DriverFlags.NewDefaultTessLevels whenever those values change. We add a corresponding BRW_NEW_DEFAULT_TESS_LEVELS flag and hook it up to HS push constants (which will be used to upload these default values to the autogenerated TCS). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Consolidate BRW_NEW_TESS_{CTRL,EVAL}_PROGRAM flags.Kenneth Graunke1-4/+3
For several reasons, I don't think it's particularly useful to have separate flags: 1. Most of the time, tessellation shaders are paired, so both will be replaced at the same time. 2. The data layout is tightly coupled. Both need to agree on the number of per-patch slots in the VUE map. Even adding extra TCS outputs that aren't read by the TES will trigger the need for recompiles. 3. The TCS is optional from an API perspective, but required by the hardware whenever tessellation is enabled. So, atoms that deal with the TCS must check brw->tess_eval_program (BRW_NEW_TESS_EVAL_PROGRAM?) rather than brw->tess_ctrl_program to tell whether tessellation is enabled. So, not only is it unlikely to be useful, it's a bit confusing to get right. Simply using one flag for both simplifies this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Only call brw_upload_tcs/tes_prog when using tessellation.Kenneth Graunke1-2/+9
If there's no evaluation shader, tessellation is disabled. The upload functions would just bail. Instead, don't bother calling them. This will simplify the optional-TCS case a bit, as brw_upload_tcs can assume that we're doing tessellation. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Add tessellation control shaders.Kenneth Graunke1-0/+1
The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22i965: Add tessellation evaluation shadersKenneth Graunke1-0/+3
The TES is essentially a post-tessellator VS, which has access to the entire TCS output patch, and a special gl_TessCoord input. Otherwise, they're very straightforward. This patch implements SIMD8 tessellation evaluation shaders for Gen8+. The tessellator can generate a lot of geometry, so operating in SIMD8 mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only 2 vertices per thread). I have another patch which implements SIMD4x2 mode for older hardware (or via an environment variable override). We currently handle all inputs via the pull model. v2: Improve comments (suggested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11i965: Add tessellation shader push constant support.Kenneth Graunke1-0/+2
Based on a patch by Chris Forbes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11i965: Add tessellation shader sampler support.Kenneth Graunke1-0/+2
Based on code by Chris Forbes and Fabian Bieler. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11i965: Add tessellation shader surface support.Kenneth Graunke1-0/+12
This is brw_gs_surface_state.c copy and pasted twice with search and replace. brw_binding_table.c code is similarly copy and pasted. v2: Drop dword_pitch related fields. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-09i965: Hook up L3 partitioning state atom.Francisco Jerez1-0/+4
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09i965: Define and use REG_MASK macro to make masked MMIO writes slightly more ↵Francisco Jerez1-1/+1
readable. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09i965: Define state flag to signal that the URB size has been altered.Francisco Jerez1-0/+1
This will make sure that we recalculate the URB layout anytime the URB size is modified by the L3 partitioning code. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-07i965: Add state bits for tess stagesChris Forbes1-0/+7
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07i965: Add backend structures for tess stagesChris Forbes1-0/+8
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07i965: Create new files for HS/DS/TE state upload code.Kenneth Graunke1-1/+6
For now, this just splits the existing code to disable these stages into separate atoms/files. We can then replace it with real code. v2: Bump the render atoms in this patch so it compiles (in my branch, I'd bumped it in an earlier patch). 61 seems to be the minimum that works, which doesn't match the old value + the number of atoms I added in this patch, so apparently we had some slop before. v3: Actually disable the DS unit on Gen8+. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1] Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-11i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits.Kenneth Graunke1-3/+1
A while back, we moved to directly emitting the Gen7+ state when constructing the binding tables. These flags are only used on Gen4-6, which emit all the binding table pointers at once. We gain nothing by having separate flags, so combine them. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-01i965: Setup pull constant state for compute programsJordan Justen1-0/+2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-30i965/cs: Upload UBO/SSBO surfacesJordan Justen1-0/+2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-29i965/cs: Setup surface binding for gl_NumWorkGroupsJordan Justen1-0/+3
This will only be setup when the prog_data uses_num_work_groups boolean is set. At this point nothing will set uses_num_work_groups, but soon code will set it when emitting code for the intrinsic that loads gl_NumWorkGroups. We can't emit this surface information earlier at the start of the DispatchCompute* call because we may not have generated the program yet. Until we generate the program, we don't know if the gl_NumWorkGroups variable is accessed. We also can't emit the surface as part of the brw_cs_state atom, because we might not need the surface if gl_NumWorkGroups is not used by the program. Lastly, we cannot emit the surface later (after state upload) in the DispatchCompute* call, because it needs to be run before the brw_cs_state atom is emitted, since it changes the surface state. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-26i965: Simplify handling of VUE map changes.Kenneth Graunke1-1/+15
The old code was disasterously complex - spread across multiple atoms which may not even run, inspecting the dirty bits to try and decide whether it was necessary to do checks...storing VS information in brw_context...extra flagging... This code tripped me and Carl up very badly when working on the shader cache code. It's very fragile and hard to maintain. Now that geometry shaders only depend on their inputs and don't have to worry about the VS VUE map, we can dramatically simplify this: just compute the VUE map coming out of the geometry shader stage in brw_upload_programs. If it changes, flag it. Done. v2: Also check vue_map.separable. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-25i965: Implement DriverFlags.NewShaderStorageBufferIago Toral Quiroga1-0/+1
We use the same dirty state for SSBOs and UBOs because they share the same infrastructure. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10i965/cs: Emit texture surfaces to enable CS samplingJordan Justen1-0/+2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10i965: Resolve GCC sign-compare warning.Rhys Kidd1-1/+1
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_control_index': mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:805:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) { ^ mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_source_index': mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:839:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) { ^ mesa/src/mesa/drivers/dri/i965/brw_state_dump.c: In function 'dump_sampler_state': mesa/src/mesa/drivers/dri/i965/brw_state_dump.c:382:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (i = 0; i < size / 16; i++) { ^ mesa/src/mesa/drivers/dri/i965/brw_state_upload.c: In function 'brw_pipeline_state_finished': mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:801:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (i != pipeline) { ^ mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen7_hiz_buf_create': mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1544:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int level = mt->first_level; level <= mt->last_level; ++level) { ^ mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen8_hiz_buf_create': mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1638:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int level = mt->first_level; level <= mt->last_level; ++level) { ^ mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_hiz': mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1771:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int level = mt->first_level; level <= mt->last_level; ++level) { ^ mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1775:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int layer = 0; layer < mt->level[level].depth; ++layer) { ^ Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>