Age  Commit message  Author  Files  Lines
2015-11-09  NO MORE BLUE TRIANGLES!!!!!!!!!!!!!!1111111111111oneoneeleven  [tessquash]  (Kenneth Graunke; 1 file, -1/+1)

    [figlet-style ASCII art]
2015-11-09  LOOOOOOOP  (Kenneth Graunke; 3 files, -3/+13)

2015-11-09  disable single program flow  (Kenneth Graunke; 1 file, -2/+1)

    UFO on IVB doesn't do it.  UFO on SKL doesn't do it.  I can't
    imagine this is actually necessary.  SPF seemed to be papering over
    the previous CSE / WE_all bug.

2015-11-09  HAX: Disable CSE of stuffs  (Kenneth Graunke; 1 file, -2/+0)

    We're losing WE_all, and that's breaking pretty much everything on
    the planet when MPF is on.

2015-11-09  VS min URB hack  (Kenneth Graunke; 1 file, -3/+5)
2015-11-08  i965: Implement SIMD8 tessellation evaluation shader support.  (Kenneth Graunke; 10 files, -10/+208)

    XXX: Missing tess_coord read support.  Should configure for push...

    Signed-off-by: Kenneth Graunke <>
2015-11-08  i965/fs: Implement get_nir_src_imm().  (Kenneth Graunke; 2 files, -0/+11)

    Signed-off-by: Kenneth Graunke <>

2015-11-08  nir: Add helpers for getting input/output intrinsic sources.  (Kenneth Graunke; 2 files, -0/+45)

    With the many variants of IO intrinsics, particular sources are
    often in different locations.  It's convenient to say "give me the
    indirect offset" or "give me the vertex index" and have it just
    work, without having to think about exactly which kind of intrinsic
    you have.

    Signed-off-by: Kenneth Graunke <>
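The idea of these helpers can be sketched as a small lookup that hides which source slot an intrinsic puts a given operand in. This is a minimal illustration, not Mesa's actual NIR API; the enum values and slot assignments below are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical sketch: different IO intrinsics place the indirect
 * offset in different source slots, so a helper answers "which slot?"
 * without callers switching on the intrinsic themselves. */
typedef enum {
   LOAD_INPUT,             /* src[0] = offset                   */
   LOAD_PER_VERTEX_INPUT,  /* src[0] = vertex,  src[1] = offset */
   STORE_OUTPUT,           /* src[0] = value,   src[1] = offset */
} intrinsic_kind;

/* Return the source slot holding the indirect offset, or -1 if none. */
static int indirect_offset_src(intrinsic_kind kind)
{
   switch (kind) {
   case LOAD_INPUT:            return 0;
   case LOAD_PER_VERTEX_INPUT: return 1;
   case STORE_OUTPUT:          return 1;
   default:                    return -1;
   }
}
```

A caller can then ask for "the indirect offset" uniformly, regardless of which intrinsic variant it is holding.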
2015-11-08  i965: Allow indirect GS input indexing in the scalar backend.  (Kenneth Graunke; 4 files, -46/+105)

    This allows arbitrary non-constant indices on GS input arrays, both
    for the vertex index and any array offsets beyond that.  All
    indirects are handled via the pull model.

    We could potentially handle indirect addressing of pushed data as
    well, but it would add additional code complexity, and we usually
    have to pull inputs anyway due to the sheer volume of input data.
    Plus, marking pushed inputs as live due to indirect addressing could
    exacerbate register pressure problems pretty badly.  We'd need to be
    careful.

    Signed-off-by: Kenneth Graunke <>
2015-11-08  i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.  (Kenneth Graunke; 4 files, -5/+10)

    We need to use per-slot offsets when there's non-uniform indexing,
    as each SIMD channel could have a different index.  We want to use
    them for any non-constant index (even if uniform), as it lives in
    the message header instead of the descriptor, allowing us to set
    offsets in GRFs rather than immediates.

    Signed-off-by: Kenneth Graunke <>
2015-11-08  i965: Introduce an INDIRECT_THREAD_PAYLOAD_MOV opcode.  (Kenneth Graunke; 6 files, -0/+61)

    The geometry and tessellation control shader stages both read from
    multiple URB entries (one per vertex).  The thread payload contains
    several URB handles which reference these separate memory segments.

    In GLSL, these inputs are represented as per-vertex arrays; the
    outermost array index selects which vertex's inputs to read.  This
    array index does not necessarily need to be constant.

    To handle that, we need to use indirect addressing on GRFs to select
    which of the thread payload registers has the appropriate URB
    handle.  (This is before we can even think about applying the pull
    model!)

    This patch introduces a new opcode which performs a MOV from a
    source using VxH indirect addressing (which allows each of the 8
    SIMD channels to select distinct data).  It also marks a whole
    segment of the payload as "used", so the register allocator
    recognizes the read and avoids reusing those registers.

    Signed-off-by: Kenneth Graunke <>
2015-11-08  i965/brw_reg: Add a brw_VxH_indirect helper  (Jason Ekstrand; 1 file, -0/+11)

    Reviewed-by: Kenneth Graunke <>

2015-11-08  i965: Split nir_emit_intrinsic by stage with a general fallback.  (Kenneth Graunke; 2 files, -277/+381)

    Many intrinsics only apply to a particular stage (such as discard).
    In other cases, we may want to interpret them differently based on
    the stage (such as load_primitive_id or load_input).

    The current method isn't that pretty - we handle all intrinsics in
    one giant function.  Sometimes we assert on stage, sometimes we
    forget.  Different behaviors are handled via if-ladders based on
    stage.

    This commit introduces new nir_emit_<stage>_intrinsic() functions,
    and makes nir_emit_instr() call those.  In turn, those fall back to
    the generic nir_emit_intrinsic() function for cases they don't want
    to handle specially.

    This makes it clear which intrinsics only exist in one stage, and
    makes it easy to handle inputs/outputs differently for various
    stages.

    Signed-off-by: Kenneth Graunke <>
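The per-stage-handler-with-generic-fallback pattern described above can be sketched as follows. This is an illustrative mock (the stage and intrinsic enums and the string results are invented for the example), not the actual fs_visitor code.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the dispatch pattern: the instruction emitter routes to a
 * per-stage handler; each handler deals with stage-specific intrinsics
 * and falls back to a shared generic handler for everything else. */
typedef enum { STAGE_VS, STAGE_GS, STAGE_FS } stage_t;
typedef enum { INTRIN_DISCARD, INTRIN_LOAD_INPUT } intrin_t;

static const char *emit_generic(intrin_t i)
{
   /* Intrinsics every stage handles the same way live here. */
   return i == INTRIN_LOAD_INPUT ? "generic load_input" : "unhandled";
}

static const char *emit_fs(intrin_t i)
{
   if (i == INTRIN_DISCARD)
      return "fs discard";        /* discard only exists in the FS */
   return emit_generic(i);        /* fall back for everything else */
}

static const char *emit_vs(intrin_t i)
{
   return emit_generic(i);        /* VS has no special cases here */
}

static const char *emit_intrinsic(stage_t s, intrin_t i)
{
   switch (s) {
   case STAGE_FS: return emit_fs(i);
   default:       return emit_vs(i);
   }
}
```

The win is structural: a stage-only intrinsic is visibly owned by one handler, and forgetting a stage check becomes impossible rather than an if-ladder omission.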
2015-11-08  shader time fixes  (Kenneth Graunke; 2 files, -0/+15)

2015-11-04  try to fix minimum DS URB entries  (Kenneth Graunke; 1 file, -1/+3)

    not observed to help anything

2015-11-04  all the UBO!  (Kenneth Graunke; 1 file, -0/+2)

2015-11-04  HAX: GLES31 hackery...for running dEQP.  (Kenneth Graunke; 13 files, -51/+59)

2015-11-04  deqp hack: force gl_PointSize to be read from 3DSTATE_SF  (Kenneth Graunke; 1 file, -1/+1)

    probably wrong, but makes tests pass.  revisit.
2015-11-04  i965: Implement tessellation shaders.  (Kenneth Graunke; 39 files, -82/+2335)

    Written by Chris Forbes, Fabian Bieler, and Kenneth Graunke.

2015-11-04  dump VS VUE map on DEBUG=vs  (Kenneth Graunke; 1 file, -1/+5)

2015-11-04  i965: Implement a get_nir_vertex_index_src() helper.  (Kenneth Graunke; 2 files, -0/+25)

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Implement a get_nir_indirect_src() helper.  (Kenneth Graunke; 2 files, -0/+27)

    Signed-off-by: Kenneth Graunke <>
2015-11-04  i965: Implement ARB_pipeline_statistics_query tessellation counters.  (Kenneth Graunke; 1 file, -4/+4)

    We basically just need to uncomment Ben's code.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Add HS/DS push constant support.  (Kenneth Graunke; 4 files, -0/+66)

    This was probably written by Chris Forbes?

    Signed-off-by: Kenneth Graunke <>
2015-11-04  i965: Add HS/DS sampler support.  (Kenneth Graunke; 3 files, -0/+52)

    Based on code by Chris Forbes and Fabian Bieler.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Add HS/DS surface support.  (Kenneth Graunke; 8 files, -1/+403)

    This is brw_gs_surface_state.c copied and pasted twice with search
    and replace.  The brw_binding_table.c code is similarly copied and
    pasted.

    Signed-off-by: Kenneth Graunke <>
2015-11-04  i965: Create new files for HS/DS/TE state upload code.  (Kenneth Graunke; 9 files, -109/+258)

    For now, this just splits the existing code to disable these stages
    into separate atoms/files.  We can then replace it with real code.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Add a debug function for printing VUE maps.  (Kenneth Graunke; 2 files, -0/+46)

    Signed-off-by: Kenneth Graunke <>
2015-11-04  i965: Add tessellation shader VUE map code.  (Chris Forbes; 2 files, -2/+95)

2015-11-04  i965: URB allocations for tessellation  (Chris Forbes; 3 files, -28/+174)

    V3: Also set up the URB semaphores area.

    Signed-off-by: Chris Forbes <>

2015-11-04  i965: Add state bits for tess stages  (Chris Forbes; 3 files, -0/+26)

    Signed-off-by: Chris Forbes <>

2015-11-04  i965: Add backend structures for tess stages  (Chris Forbes; 6 files, -0/+96)

    Signed-off-by: Chris Forbes <>
2015-11-04  i965: Add INTEL_DEBUG=hs,ds flags for debugging tessellation shaders.  (Kenneth Graunke; 2 files, -2/+6)

    Even though both tessellation shader stages must be used together,
    I still think it makes sense to add separate debug flags for each
    stage.  It makes it possible to read the TCS/HS, rule out problems,
    then read the TES/DS separately, without sifting through as much
    printed text.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Set core tessellation-related limits  (Chris Forbes; 1 file, -4/+22)

    Signed-off-by: Chris Forbes <>
2015-11-04  i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n.  (Kenneth Graunke; 2 files, -1/+10)

    Inspired by a patch by Fabian Bieler.  Fabian defined a
    _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid topology
    type); I instead chose to make a macro that takes an argument.

    He also took the number of patch vertices from _mesa_prim (which was
    set to ctx->TessCtrlProgram.patch_vertices) - I chose to use it
    directly to avoid the need for the VBO patch.

    Signed-off-by: Kenneth Graunke <>

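A macro-with-an-argument of the kind described above could look like this. The 0x20 base value is an assumption for illustration only (the PATCHLIST_1..32 topology types occupy a contiguous range of hardware encodings); check the actual header for the real encoding.

```c
#include <assert.h>

/* Sketch of the parameterized topology macro.  The base encoding below
 * is an illustrative assumption, not taken from the hardware docs. */
#define _3DPRIM_PATCHLIST_BASE 0x20
#define _3DPRIM_PATCHLIST(n) (_3DPRIM_PATCHLIST_BASE + (n) - 1)
```

With this shape, `_3DPRIM_PATCHLIST(ctx->TessCtrlProgram.patch_vertices)` yields the right topology directly, and there is no bogus `_3DPRIM_PATCHLIST_0` value to accidentally emit.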
2015-11-04  i965: Bump the render atoms count.  (Kenneth Graunke; 1 file, -1/+1)

    This avoids having to churn it in tons of patches.

2015-11-04  i965: Request lowering of gl_TessLevel* from float[] to vec4s.  (Kenneth Graunke; 1 file, -0/+2)

    Signed-off-by: Kenneth Graunke <>

2015-11-04  i965: Enable ARB_tessellation_shader on Gen8+.  (Kenneth Graunke; 1 file, -0/+1)

    Signed-off-by: Kenneth Graunke <>
2015-11-04  disable lower outputs to temporaries for tess ctrl outputs  (Kenneth Graunke; 1 file, -0/+3)

    +5 piglits.  generated code competence-------------------------- :/

2015-11-04  NIR TCS OUTPUT SHADOWING  (Kenneth Graunke; 1 file, -0/+81)

    [figlet-style ASCII art]

    but I think it probably works...probably...probably
2015-11-04  nir: Allow output reads and add the relevant intrinsics.  (Kenneth Graunke; 4 files, -12/+24)

    Normally, we rely on nir_lower_outputs_to_temporaries to create
    shadow variables for outputs, buffering the results and writing them
    all out at the end of the program.  However, this is infeasible for
    tessellation control shader outputs.

    Tessellation control shaders can generate multiple output vertices,
    and write per-vertex outputs.  These are arrays indexed by the
    vertex number; each thread only writes one element, but can read any
    other element - including those being concurrently written by other
    threads.  The barrier() intrinsic synchronizes between threads.

    Even if we tried to shadow every output element (which is of dubious
    value), we'd have to read updated values in at barrier() time, which
    means we need to allow output reads.

    Most stages should continue using nir_lower_outputs_to_temporaries(),
    but in theory drivers could choose not to if they really wanted.

    Signed-off-by: Kenneth Graunke <>
2015-11-04  nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.  (Kenneth Graunke; 3 files, -5/+25)

    Similar to nir_load_per_vertex_input, but for outputs.  This is not
    useful in geometry shaders, but will be useful in tessellation
    shaders.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.  (Kenneth Graunke; 1 file, -2/+5)

    Tessellation control shader inputs are an array indexed by the
    vertex number, like geometry shader inputs.  There aren't per-patch
    TCS inputs.

    Tessellation evaluation shaders have both per-vertex and per-patch
    inputs.  Per-vertex inputs get the new intrinsics; per-patch inputs
    continue to use the ordinary load_input intrinsics, as they already
    work like we want them to.

    Signed-off-by: Kenneth Graunke <>
2015-11-04  nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.  (Kenneth Graunke; 2 files, -0/+7)

    These tessellation shader related fields need plumbing through NIR.

    Signed-off-by: Kenneth Graunke <>

2015-11-04  glsl: delete ir_set_program_inouts assert  (Kenneth Graunke; 1 file, -1/+0)

    trips on deqp tests...just marking the whole variable used is okay;
    we don't have varying packing.
2015-11-04  glsl: adjust find-innermost-index to see through swizzles  (Chris Forbes; 1 file, -1/+3)

    This is only used for TCS output assignment validation.  Previously,
    we'd produce an error for assignments like:

        # TCS
        out vec4 a[];
        a[gl_InvocationID].x = ...
2015-11-04  mesa: Remove ES 3.0/3.1 transform feedback primitive counting error.  (Kenneth Graunke; 1 file, -1/+6)

    This is gone in ES 3.2.  I don't see any mention of it going away.
    Presumably it can't work if you're using a GS or TCS/TES.  So just
    ignore it in those cases.

    Signed-off-by: Kenneth Graunke <>
2015-11-04  i965: Add src/dst interference for certain instructions with hazards.  (Kenneth Graunke; 7 files, -35/+123)

    When working on tessellation shaders, I created some vec4 virtual
    opcodes for creating message headers through a sequence like:

        mov(8) g7<1>UD   0x00000000UD  { align1 WE_all 1Q compacted };
        mov(1) g7.5<1>UD 0x00000100UD  { align1 WE_all };
        mov(1) g7<1>UD   g0<0,1,0>UD   { align1 WE_all compacted };
        mov(1) g7.3<1>UD g8<0,1,0>UD   { align1 WE_all };

    This is done in the generator since the vec4 backend can't handle
    align1 regioning.  From the visitor's point of view, this is a
    single opcode:

        hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD

    Normally, there's no hazard between sources and destinations - an
    instruction (naturally) reads its sources, then writes the result to
    the destination.  However, when the virtual instruction generates
    multiple hardware instructions, we can get into trouble.

    In the above example, if the register allocator assigned vgrf7 and
    vgrf8 to the same hardware register, then we'd clobber the source
    with 0 in the first instruction, and read back the wrong value in
    the last one.

    It occurred to me that this is exactly the same problem we have with
    SIMD16 instructions that use W/UW or B/UB types with 0 stride.  The
    hardware implicitly decodes them as two SIMD8 instructions, and with
    the overlapping regions, the first would clobber the second.

    Previously, we handled that by incrementing the live range end IP by
    1, which works, but is excessive: the next instruction doesn't
    actually care about that.  It might also be the end of control flow.
    This might keep values alive too long.  What we really want is to
    say "my source and destinations interfere".

    This patch creates new infrastructure for doing just that, and
    teaches the register allocator to add interference when there's a
    hazard.  For my vec4 case, we can determine this by switching on
    opcodes.  For the SIMD16 case, we just move the existing code there.

    I audited our existing virtual opcodes that generate multiple
    instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
    treatment as well, but no others.

    Signed-off-by: Kenneth Graunke <>
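The fix described above - explicit src/dst interference instead of stretching the live range - can be sketched with a toy interference graph. The data structures and the `has_src_dst_hazard` flag below are illustrative inventions, not Mesa's register allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sketch: a symmetric interference matrix over virtual GRFs.
 * If two vgrfs interfere, the allocator must not assign them to the
 * same hardware register. */
#define MAX_VGRFS 16

static bool interferes[MAX_VGRFS][MAX_VGRFS];

static void add_interference(int a, int b)
{
   interferes[a][b] = interferes[b][a] = true;
}

struct inst {
   bool has_src_dst_hazard;  /* e.g. expands to multiple HW instructions */
   int dst;
   int srcs[2];
   int num_srcs;
};

/* Instead of extending the destination's live range past the
 * instruction, add interference edges between the destination and
 * every source of a hazardous instruction. */
static void add_hazard_interference(const struct inst *in)
{
   if (!in->has_src_dst_hazard)
      return;
   for (int i = 0; i < in->num_srcs; i++)
      add_interference(in->dst, in->srcs[i]);
}

/* Convenience wrapper mirroring the vgrf7/vgrf8 example above. */
static bool hazard_makes_interfere(int dst, int src)
{
   struct inst in = { .has_src_dst_hazard = true, .dst = dst,
                      .srcs = { src, 0 }, .num_srcs = 1 };
   add_hazard_interference(&in);
   return interferes[dst][src];
}
```

This captures the commit's point precisely: the constraint is "dst and srcs may not share a register", nothing about neighboring instructions, so no value is kept alive longer than necessary.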
2015-11-04  i965: Fix scalar VS float[] and vec2[] output arrays.  (Kenneth Graunke; 4 files, -2/+17)

    The scalar VS backend has never handled float[] and vec2[] outputs
    correctly (my original code was broken).  Outputs need to be padded
    out to vec4 slots.

    In fs_visitor::nir_setup_outputs(), we tried to process each vec4
    slot by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.
    However, this is wrong: type_size_scalar() for a float[2] would
    return 2, or for vec2[2] it would return 4.  This looked like a
    single slot, even though in reality each array element would be
    stored in separate vec4 slots.

    Because of this bug, outputs[] and output_components[] would not get
    initialized for the second element's VARYING_SLOT, which meant
    emit_urb_writes() would skip writing them.  Nothing used those
    values, and dead code elimination threw a party.

    To fix this, we introduce a new type_size_vec4_times_4() function
    which pads array elements correctly, but still counts in scalar
    components, generating correct indices in store_output intrinsics.

    Normally, varying packing avoids this problem by turning varyings
    into vec4s.  So this doesn't actually fix any Piglit or dEQP tests
    today.  However, if varying packing is disabled, things would be
    broken.  Tessellation shaders can't use varying packing, so this
    fixes various tcs-input Piglit tests on a branch of mine.

    v2: Shorten the implementation of type_size_4x to a single line
    (caught by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4 rather
    than using type_size_vec4_times_4 and then dividing by 4.

    Signed-off-by: Kenneth Graunke <>

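The sizing difference the commit describes can be shown with a tiny model. This is a simplified sketch (arrays of plain vectors only, signatures invented for illustration), not Mesa's GLSL type machinery.

```c
#include <assert.h>

/* Scalar counting: a float[2] is 2 components, a vec2[2] is 4 - which
 * is what made multi-slot arrays look like a single vec4 slot. */
static int type_size_scalar(int array_elems, int components)
{
   return array_elems * components;
}

/* Padded counting, as in the type_size_vec4_times_4() fix: every array
 * element occupies a full vec4 slot, but sizes are still expressed in
 * scalar components (4 per slot), so store_output indices stay scalar. */
static int type_size_vec4_times_4(int array_elems, int components)
{
   (void)components;  /* each element is padded out to a whole vec4 */
   return array_elems * 4;
}
```

With the scalar count, ALIGN(2, 4) / 4 = 1 slot for a float[2]; with the padded count, ALIGN(8, 4) / 4 = 2 slots, so both array elements get their own VARYING_SLOT and emit_urb_writes() writes both.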
2015-11-04  gles2: Update gl2.h and gl2ext.h to revision 32120  (Kenneth Graunke; 2 files, -9/+1085)