Age | Commit message (Collapse) | Author | Files | Lines |
|
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Previously they (very rarely) used C++isms that prevented them from being
compiled as C. As of this commit, they can be compiled as either C or C++.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
This commit is mostly mechanical except that it changes where we set the
swizzle. Previously, the blorp_surface_info constructor defaulted the
swizzle to SWIZZLE_XYZW. Now, we memset to zero and fill out the swizzle
when we setup the rest of the struct.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Instead of having a virtual member function for getting the WM/PS kernel,
we simply add fields for prog_data and the kernel to brw_blorp_parms and
always make sure those get set as part of the different constructors.
v2: Use use prog_data != NULL to check for a valid program instead of a
magic kernel offset value
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This is identical to the blorp version which only differs in case
fragment shader isn't used. In that case blorp would reset batch
buffer address to zero.
This is not really needed, and having blorp to use base state
address setup that is compatible with normal upload allows one to
skip resetting it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Otherwise clearing with blorp will regress performance in some
synthetic test cases.
v2: Used vsize >= 2 instead of vsize > 0, and updated the comment.
Review by Ken in one of the earlier patches revealed this.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
In case there is no source it means the program does a simple
clear or a resolve. In such case there is no need to program
sampling state or enable pixel kill in fragment shader.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.
However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region. This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.
v2 (Ken): Better description of the change, and do not change vs_size
from 2 to 1.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Haswell GT2 and GT3 have a minimum of 64 entries. Hardcoding 32
is not legal.
v2: Delete stale comment (caught by Alejandro).
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
v2: (by Ken, incorporating feedback from Matt Turner):
- Rewrite the push constant allocation code to be clearer.
- Only apply the minimum VS entries workaround on Gen 8.
v3: (by Ken)
- Fix a bug in v2 where we failed to allocate the full push constant
space when the number of enabled stages didn't divide the available
push constant space evenly. (Any left over space is now allocated
to the PS, as it was in v1.)
- Fix an off-by-one error in v2's number of enabled stages calculation.
- Use DIV_ROUND_UP for nicer formatting.
- Line wrapping fixes.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
|
|
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
The values of intel_mipmap_tree::align_w and ::align_h correspond to the
hardware enums HALIGN_* and VALIGN_*.
See the confusion?
align_h != HALIGN
align_h == VALIGN
Reduce the confusion by renaming the variables to match the hardware
enum names:
git ls-files |
xargs sed -i -e 's/align_w/halign/g' \
-e 's/align_h/valign/g'
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
Switch off hardware-generated binding tables and gather push
constants in the blorp. Blorp requires only a minimal set of
simple constants. There is no need for the extra complexity
to program a gather table entry into the pipeline.
Cc: kenneth@whitecape.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
Note that the magic number of one in gen7 logic is replaced by
BRW_SF_URB_ENTRY_READ_OFFSET ( == 1 also) for clarity.
On gen6 the change from zero to one (BRW_SF_URB_ENTRY_READ_OFFSET)
has no effect for native blorp as blorp doesn't use any
additional attributes. In fact, regular pipeline setup always
uses BRW_SF_URB_ENTRY_READ_OFFSET even when there are no additional
attributes. Hence the change makes the two (blorp and regular)
consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
v2 (Ken): s/use_unorm_coords/non_normalized_coords/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
This was still needed when we had support for blorp clears but now
this is fixed to nop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
We are still allocating a miptree for hiz, but we only use fields from
intel_miptree_aux_buffer. This will allow us to switch over to not
allocating a miptree.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Today we allocate a miptree's for the hiz buffer. We needed this in
the past because we would point the hardware at offsets of the hiz
buffer. Since the hiz format is not documented, this is not a good
idea.
Since moving to support layered rendering on Gen7+, we no longer point
at an offset into the buffer on Gen7+.
Therefore, to support hiz on Gen7+, we don't need a full miptree
structure allocated.
This patch starts to create a new auxiliary buffer structure
(intel_miptree_aux_buffer) that can be a more simplistic miptree
side-band buffer associated with a miptree. (For example, to serve the
needs of the hiz buffer.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
|
|
We will want to setup gen6 separate stencil and hiz miptrees in a
layout that is similar to array_spacing_lod0. This is needed because
gen6 hiz and stencil only support a single mip-level.
In both use cases (gen7+ LOD0 spacing & gen6 separate stencil/hiz),
the array slices will be packed at each LOD without reserving extra
space for LODs within each array slice.
So, we generalize the name of this field and add comments to indicate
the old and new uses.
Motivation for the gen6 change comes from the PRM:
PRM Volume 1, Part 1, 7.18.3.7.2 For separate stencil buffer [DevILK]
to [DevSNB]:
"The separate stencil buffer does not support mip mapping, thus the
storage for LODs other than LOD 0 is not needed."
PRM Volume 2, Part 1, 7.5.3 Hierarchical Depth Buffer
"[DevSNB]: The hierarchical depth buffer does not support the LOD
field, it is assumed by hardware to be zero. A separate
hierarachical depth buffer is required for each LOD used, and the
corresponding buffer’s state delivered to hardware each time a new
depth buffer state with modified LOD is delivered."
v2:
* Rename array_spacing_lod0 to non_mip_arrays
v3:
* Instead, replace array_spacing_lod0 with array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
This simplifies the code, removes use of the old structures, and also
allows us to combine the Gen6 and Gen7+ code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
Note: region->width/height used to reflect the total_width/height padding
of separate stencil, though mt->total_width didn't. region->width/height
was being used in EGL images, where the padded value would have been the
wrong one, so I converted them to use rb->Width/Height.
v2: Drop debug printf that slipped in (caught by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Haswell needs a copy of the sample mask in 3DSTATE_PS; this makes that
convenient.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Performed via:
$ for file in *; do sed -i 's/ *//g'; done
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
BLORP is essential. However, porting it to Gen8 is a huge amount of
work. Disabling it for now allows us to proceed with basic hardware
enablement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
This brings over the batch-wrap-prevention and aperture space checking
code from the normal brw_draw.c path, so that we don't need to flush the
batch every time.
There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't
high enough up in the state emit sequences -- before, we implicitly had
one at the batch flush before any state was emitted, so Mesa's workaround
emits didn't really matter. Since the SNB fixes by Ken, I didn't see any
regressions after 3 piglit runs.
Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
No statistically significant performance difference on unigine tropics
(n=10)
No statistically significant performance difference on openarena (n=755)
The two apps that are hurt happen to include stalls on busy buffer
objects, so I think this is an effect of missing out on an opportune
flush.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
We normally do this, but BLORP was failing to do so in the case where it
disables depth.
Not observed to fix anything yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
|
|
In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's
physical dimensions. (Logical and physical dimensions may differ for
multisample surfaces).
However, in SURFACE_STATE, we always set Width and Height to the slice's
logical dimensions. We should do the same for 3DSTATE_DEPTH_BUFFER,
because the hw docs say so.
No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland.
v2: No Piglit regressions, for real this time.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
|
|
Previously, we would always use the same push constant allocation
regardless of what shader programs were being run: the available push
constant space was split into 2 equal size partitions, one for the
vertex shader, and one for the fragment shader.
Now that we are adding geometry shader support, we need to do
something smarter. This patch adjusts things so that when a geometry
shader is in use, we split the available push constant space into 3
nearly-equal size partitions instead of 2.
Since the push constant allocation is now affected by GL state, it can
no longer be set up by brw_upload_initial_gpu_state(); instead it must
be set up by a state atom.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Previously, we gave all of the URB space (other than the small amount
that is used for push constants) to the vertex shader. However, when
a geometry shader is active, we need to divide it up between the
vertex and geometry shaders.
The size of the URB entries for the vertex and geometry shaders can
vary dramatically from one shader to the next. So it doesn't make
sense to simply split the available space in two. In particular:
- On Ivy Bridge GT1, this would not leave enough space for the worst
case geometry shader, which requires 64k of URB space.
- Due to hardware-imposed limits on the maximum number of URB entries,
sometimes a given shader stage will only be capable of using a small
amount of URB space. When this happens, it may make sense to
allocate substantially less than half of the available space to that
stage.
Our algorithm for dividing space between the two stages is to first
compute (a) the minimum amount of URB space that each stage needs in
order to function properly, and (b) the amount of additional URB space
that each stage "wants" (i.e. that it would be capable of making use
of). If the total amount of space available is not enough to satisfy
needs + wants, then each stage's "wants" amount is scaled back by the
same factor in order to fit.
When only a vertex shader is active, this algorithm produces
equivalent results to the old algorithm (if the vertex shader stage
can make use of all the available URB space, we assign all the space
to it; if it can't, we let it use as much as it can).
In the future, when we need to support tessellation control and
tessellation evaluation pipeline stages, it should be straightforward
to expand this algorithm to cover them.
v2: Use "unsigned" rather than "GLuint".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
IVB/BYT also has the same L3 cacheability control in MOCS as HSW,
so let's make use of it.
pts/xonotic and pts/reaction @ 1920x1080 gain ~4% on my IVB GT2. Most
other things show less gains/no regressions, except furmark which
loses some 10 points.
I didn't have a BYT at hand for testing.
v2: Don't check (brw->gen == 7) in gen7 functions. (chadv)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
We emit these before configuring depth in the normal path, or actually
using the depth buffer in BLORP - we just failed to emit them when
disabling depth altogether.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
Previously we would always find the 2D sub-surface of interest,
and then program the surface to this location. Now we always
program the 3DSTATE_DEPTH_BUFFER at the start of the surface.
To select the lod/slice, we utilize the lod & minimum array
element fields.
As part of this change, we must revert 1f112ccf:
Revert "i965/gen7: Align all depth miplevels to 8 in the X direction."
We also must disable brw_workaround_depthstencil_alignment for
gen >= 7. Now the hardware will handle alignment when rendering
to additional slices/LODs.
v2:
* Merge with recent MOCS changes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
In layered rendering this will be 0. Otherwise it will be the
selected slice.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|