Age | Commit message (Collapse) | Author | Files | Lines |
|
For a variety of reasons mmap (selinux and pax to name
a few) and can fail and with current code. This will
result in a crash in the driver, if not worse.
This has been the case since the inception of the
gallium copy of rtasm.
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73473
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
(cherry picked from commit 4dd445f1cf80292f10eda53665cefc2a674d838d)
|
|
We were calling draw_total_vs_outputs() too early. The call to
draw_pt_emit_prepare() could result in the vertex size changing.
So call draw_total_vs_outputs() after draw_pt_emit_prepare().
This fix would seem to be needed for the non-LLVM code as well,
but it's not obvious. Instead, I added an assertion there to
try to catch this problem if it were to occur there.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72926
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit ad814d04ca5d579538885a595331b5b27caefd2a)
Conflicts:
src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
|
|
This fixes a serious regression introduced
in 4e549ddb500cf677b6fa16d9ebdfa67cc23da097.
Cc: 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit d40532f260c15d56e5fa836147e02c031a999682)
|
|
Prevents a memory leak.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a7653c19a3b1adae162864587a7ab1c17ab256e6)
|
|
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 598f61ba28bcfd220104e18e89973768babeaac3)
|
|
Thanks to Pino Toscano. Patch from Debian package.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 2d77e4f922a8c34541d8b187e171738006bd6f4d)
|
|
There's only one minor functional change, for immediates the pixel offsets
are no longer added since the values are all the same for all elements in
any case (it might be better if those weren't stored as soa vectors in the
first place maybe).
Reviewed-by: Zack Rusin <zackr@vmware.com>
|
|
With this patch, the llvmpipe and draw modules will calculate the depth bias
according to floating point depth buffer semantics described in the
arb_depth_buffer_float specification, when the driver has a z buffer bound
with a format type of UTIL_FORMAT_TYPE_FLOAT.
By default, the driver will use the existing UNORM calculation for depth bias.
A new function, draw_set_zs_format, was added to calculate the Minimum
Resolvable Depth value and floating point depth sense for the draw module.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
|
Patch from Debian package
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
|
|
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch functions,
plus don't name a straight float type float4 which is just confusing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
|
|
SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions per 8-wide
lp_build_minify). This wouldn't work for "generic" 32bit shifts though
since we've got only 24bits of mantissa (actually for left shifts it would
work by using sse41 int mul instead of float mul but not for right shifts).
Note that this has very limited scope for now, since this is only used with
per-pixel lod (otherwise we're avoiding the non-constant shift count by doing
per-quad shifts manually), and only 1d textures even then (though the latter
should change).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
util_format_is_rgba8_variant
Just happened to notice it was missing while looking at it.
|
|
LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the
pretty stack trace opt-in rather than opt-out.
The default value of DisablePrettyStackTrace has changed to true in LLVM
3.4 and newer.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
We can create clip_ptr_type once instead of n times inside the loop.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
|
A convenient front end to indices generate/translate code, for emulating
primitives which are not supported natively by the driver.
This handles saving/restoring index buffer state, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
Add 'start' parameter to generator/translator.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering code, since we need per-texel
weights which our 2d lerp doesn't (and can't) do. And of course the (now
per element) weights need to be adjusted too for it to work.
Invoke the new filtering code whenever there's an edge to keep things simpler,
as it will work for edges too not just corners but of course it's only needed
with corners.
More ugly code for not much gain but at least a hacked up cubemap demo
shows very nice corners now... Not sure yet if and how this should be
configurable...
v2: incorporate feedback from Jose, only use special corner filtering code
when there's a corner not when there's only an edge (as corner filtering code
is slower, though a perf difference was only measureable when always
forcing edge code). Plus some minor style fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
The new function replaces four old functions: set_fragment/vertex/
geometry/compute_sampler_views().
Note: at this time, it's expected that the 'start' parameter will
always be zero.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
|
|
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a corner (meaning we
only have actually 3 samples, and all 3 are on different faces) then
falling off the edge is happening _both_ on x and y axis simultaneously.
There was a noticeable performance hit in mesa's cubemap demo when seamless
filtering was forced on (just below 10 percent or so in a debug build, when
disabling all filtering hacks, otherwise it would probably be a bit more) and
when always doing the logic, hence use a branch which it only does it if any
of the pixels in a quad (or in two quads) actually hit this. With that there
was no measurable performance hit in the cubemap demo (neither in a debug nor
release buidl), but this will vary (cubemap demo very rarely hits edges).
Might also be different on other cpus, as this forces SoA sampling path which
potentially can be quite a bit slower.
Note that as for corners, this code gets all the 3 samples which actually
exist right, and the 4th texel will simply be the same as one of the others,
meaning that filter weights will be a bit wrong. This however should be
enough for full OpenGL (but not d3d10) compliance.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
translate_sse.c contains code for msabi on x86_64, but it appears to be
untested.
Currently arguments 1 and 2 passed to the generated code are moved as 32-bit
quantities into the registers used by sysvabi, irrespective of the architecture.
Since these may be pointers, they must be moved as 64-bit quantities to avoid
truncation.
Commit f4dd0991719ef3e2606920c5100b372181c60899 disabled tranlate_sse.c on MinGW
x86_64, I don't know if was due to this issue, or a different one...
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
Cygwin also uses the msabi calling convention on x86_64, not the sysvabi calling
convention
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
ignored, and an empty message aborts the commit.
|
|
implementation which uses mmap()
The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation
which uses mmap() to allocate an anonymous page with execute permission, rather
than the one which just uses malloc().
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
This reverts commit 94d05bf87a21bd364e84f699a0064e5fba58a6f9 as it has a
few problems:
- it breaks windows builds becuase env[LLVM_CXXFLAGS] is never set there
- it is merging not only rtti, but the whole cxxflags (defines etc)
which has proven to be a source of troubles (breaks debugging etc.)
|
|
During the recent bind_sampler_states() interface change in gallium
we changed the CSO single_sampler_done() function so that if we were
decreasing the number of sampler states bound in the driver, we'd
null-out the "extra/old" sampler states to unbind them. See commit
1e2fbf265.
However, we didn't make the corresponding fix for sampler views.
This caused an assertion to fail in the svga driver which checked
that the number of sampler views matched the number of sampler states.
This patch fixes cso_restore_sampler_views() so that it nulls-out
the extra/old sampler views if the number of new views is less than
the number of current/old views.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
* The rtti fix actually dug up a bug in the scons build scripts.
* Autotools took the LLVM cpp and cxx flags, while scons only took
the cpp flags.
* This grabs the cxx flags and applies them where needed. We may
want to make the same change for the llvm cpp flags in scons.
* The only linux platform I can find with LLVM no-rtti is Ubuntu.
* Fixes bug #70471
Tested-by: Vinson Lee <vlee@freedesktop.org>
|
|
Otherwise (vs_slot < 0) will never be true.
Trivial.
|
|
* As discussed on the mailing list,
forced no-rtti breaks C++ public
API's such as the Haiku C++ libGL.so
* -fno-rtti *can* be still set however
instead of blindly forcing -fno-rtti,
we can rely on the llvm-config
--cppflags output.
If the system llvm is built without
rtti (default), the no-rtti flag will be
present in llvm-config --cppflags
(which we pick up on)
If llvm is built with rtti
(REQUIRES_RTTI=1), then -fno-rtti is
removed from llvm-config --cppflags.
* We could selectively add / remove rtti
from various components, however mixing
rtti and non-rtti code is tricky and
could introduce missing symbols.
* This needs impact tested.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
_GNU_SOURCE appears to not be used reliably. Use _MSC_VER instead so
that MSVC alone is affected.
|
|
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
v2: also get rid of 3 helper functions no longer used.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
v2: based on Brian's feedback, clean up code a bit.
And use sign bit of major axis instead of pre-select s/t/r sign for coord
mirroring (which should be the same in the end, saves 2 ands).
Also fix two bugs with select/mirror of derivatives, the minor axes need to
use major axis sign as well (instead of major derivative axis sign), and
don't mistakenly use absolute values of major derivative and inverse major
values.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
GNU C++ compiler declares the C99 lrint, etc. when _GNU_SOURCE is
defined, but MSVC does not.
Trivial.
|
|
Both the imul_hi and umul_hi are working with this patch.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
The code introduces two new 32bit integer multiplication opcodes which
can be used to produce correct 64 bit results. GLSL, OpenCL and D3D10+
require them. We use two seperate opcodes, because they match the
behavior of GLSL and OpenCL, are a lot easier to add than a single
opcode with multiple destinations and because there's not much (any)
difference wrt code-generation.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
only 8 and 32 bit integers were supported before.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
|
|
This patch adds the lrint, lrintf, llrint, and llrintf rounding utility
functions. When packing unorm depth values, we will round to nearest.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Put 'fragment' in the names. In preparation for upcoming function
renaming.
|
|
The block size for all formats is currently at least 1 byte. Add an
assertion for this.
This should silence several Coverity "Division or modulo by zero"
defects.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
There is an earlier null check for draw so draw could be null here as
well.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|