path: root/src/gallium/drivers/vc4
Age | Commit message | Author | Files | Lines
2015-12-11 | vc4: Add support for multisample framebuffer operations. | Eric Anholt | 7 | -24/+191
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and GL_SAMPLE_ALPHA_TO_COVERAGE. I haven't implemented a dithering function yet, and gallium doesn't give me a good chance to do so for GL_SAMPLE_COVERAGE. (cherry picked from commit a97b40dca4949b5b8b3320e76768e54f430c9e78)
2015-12-11 | vc4: Add a workaround for HW-2905, and an additional failure I saw with MSAA. | Eric Anholt | 1 | -2/+16
I only stumbled on this while experimenting due to reading about HW-2905. I don't know if the EZ disable in the Z-clear is actually necessary, but go with it for now. (cherry picked from commit edc3305de7d749338ad88a949cedfc290a796fe5)
2015-12-11 | vc4: Add support for drawing in MSAA. | Eric Anholt | 6 | -50/+148
(cherry picked from commit edfd4d853a0d26bc0cde811de7b20116db7e66fc)
2015-12-11 | vc4: Add kernel RCL support for MSAA rendering. | Eric Anholt | 5 | -39/+239
(cherry picked from commit e7c8ad0a6c8ba263f29b7c3c5120bc6beabeba7b)
2015-12-11 | vc4: Rename color_ms_write to color_write. | Eric Anholt | 3 | -22/+21
I was thinking this was the only MSAA resolve thing, so it should be noted separately, but actually load/store general also do MSAA resolve. (cherry picked from commit 568d3a8e32109200cc12549d18118b7660be628b)
2015-12-11 | vc4: Allow RCL blits to the edge of the surface. | Eric Anholt | 1 | -2/+8
The recent unaligned fix successfully prevented RCL blits that weren't aligned inside of the surface, but we also want to be able to do RCL blits for the whole surface when the width or height of the surface aren't aligned (we don't care what renders inside of the padding). (cherry picked from commit bf92017ace970104b24219fad0ce5b51bc4509b5)
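A minimal sketch of the alignment rule described above, assuming the tile size is passed in as a parameter (for example 64x64 for non-MSAA rendering) and that the caller has already verified the blit is 1:1 with no scaling. The names `struct box`, `edge_ok`, and `can_rcl_blit` are hypothetical, not the driver's code: each far edge of the blit may either land on a tile boundary or coincide with the surface edge, because the padding beyond the surface edge is never displayed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical blit rectangle in pixels. */
struct box { uint32_t x, y, w, h; };

/* A far edge is acceptable if it is tile aligned, or if it is exactly the
 * surface edge (whatever lands in the padding past it doesn't matter). */
static bool edge_ok(uint32_t end, uint32_t surf_size, uint32_t tile)
{
        return end % tile == 0 || end == surf_size;
}

/* Alignment part of the "can this blit be done with just an RCL?" test. */
static bool can_rcl_blit(struct box b, uint32_t surf_w, uint32_t surf_h,
                         uint32_t tile_w, uint32_t tile_h)
{
        return b.x % tile_w == 0 && b.y % tile_h == 0 &&
               edge_ok(b.x + b.w, surf_w, tile_w) &&
               edge_ok(b.y + b.h, surf_h, tile_h) &&
               b.x + b.w <= surf_w && b.y + b.h <= surf_h;
}
```

With a 100x100 surface and 64x64 tiles, a full-surface blit qualifies even though 100 is not a multiple of 64, while a 50-pixel-wide blit from 0,0 does not.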
2015-12-11 | vc4: Fix check for tile RCL blits with mismatched y. | Eric Anholt | 1 | -1/+1
This was a typo in 3a508a0d94d020d9cd95f8882e9393d83ffac377 that didn't show up in testcases at that moment. (cherry picked from commit 2792d118f17f92b1908e3f0fc735087bb7ea4c38)
2015-12-11 | vc4: Fix compiler warning from size_t change. | Eric Anholt | 1 | -1/+1
I missed this when bringing over the kernel changes. (cherry picked from commit 1529f138fff59bdb857d5f7da0ee2537521d5044)
2015-12-11 | vc4: Fix accidental scissoring when scissor is disabled. | Eric Anholt | 1 | -5/+23
Even if the rasterizer has scissor disabled, we'll have whatever vc4->scissor bounds were last set when someone set up a scissor, so we shouldn't clip to them in that case. Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled. (cherry picked from commit a4eff86f4afb6618aff488e9da5600e33d97a9c3)
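The fix amounts to consulting the stored scissor bounds only when the rasterizer state enables scissoring. The following is a sketch under that reading, with a hypothetical `struct rect` (exclusive max) and `clip_rect()` helper rather than the driver's actual types:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clip rectangle; max coordinates are exclusive. */
struct rect { uint32_t minx, miny, maxx, maxy; };

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Start from the full framebuffer. Only intersect with the saved scissor
 * when scissoring is enabled, so stale bounds from an earlier
 * scissor setup can't clip draws made with scissor disabled. */
static struct rect clip_rect(bool scissor_enabled, struct rect scissor,
                             uint32_t fb_w, uint32_t fb_h)
{
        struct rect r = { 0, 0, fb_w, fb_h };

        if (scissor_enabled) {
                r.minx = max_u32(r.minx, scissor.minx);
                r.miny = max_u32(r.miny, scissor.miny);
                r.maxx = min_u32(r.maxx, scissor.maxx);
                r.maxy = min_u32(r.maxy, scissor.maxy);
        }
        return r;
}
```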
2015-12-11 | vc4: Disable RCL blitting when scissors are enabled. | Eric Anholt | 1 | -0/+3
We could potentially handle scissored blits when they're tile aligned, but it doesn't seem worth it. If you're doing a scissored blit, you're probably a testcase. Fixes piglit's fbo-scissor-blit fbo (cherry picked from commit d16d666776ee12659145f08bd35566dd2cc0f925)
2015-12-11 | vc4: Bring over cleanups from submitting to the kernel. | Eric Anholt | 4 | -87/+78
(cherry picked from commit 0afe83078d10e0d376f7c3e2515ab2682fec0eb1)
2015-12-11 | vc4: Add debug dumping of MSAA surfaces. | Eric Anholt | 2 | -6/+145
(cherry picked from commit a69ac4e89c1c3edc33eb4e9361229a3f25de3ee6)
2015-12-11 | vc4: Add support for laying out MSAA resources. | Eric Anholt | 1 | -5/+20
For MSAA, we store full resolution tile buffer contents, which have their own tiling format. Since they're full resolution buffers, we have to align their size to full tiles. (cherry picked from commit 3c3b1184eb57951c8a40258c9214a1aece1602e6)
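To make the "align to full tiles" step concrete, here is a rough size calculation. It assumes 4x MSAA with 32x32-pixel tiles and stores every sample. Treat those constants and the helper names as illustrative assumptions, not values copied from the driver:

```c
#include <stdint.h>

/* Assumed for illustration: 4 samples per pixel, 32x32-pixel MSAA tiles. */
#define MSAA_TILE_W 32u
#define MSAA_TILE_H 32u
#define MSAA_SAMPLES 4u

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t align_pot(uint32_t v, uint32_t a)
{
        return (v + a - 1) & ~(a - 1);
}

/* Bytes for a full-resolution (per-sample) buffer whose width and height
 * are padded out to whole tiles; cpp is bytes per sample. */
static uint32_t msaa_buffer_size(uint32_t width, uint32_t height, uint32_t cpp)
{
        uint32_t w = align_pot(width, MSAA_TILE_W);
        uint32_t h = align_pot(height, MSAA_TILE_H);

        return w * h * MSAA_SAMPLES * cpp;
}
```

Under these assumptions a 100x50 RGBA8888 surface pads to 128x64 pixels, giving 128 * 64 * 4 * 4 = 131072 bytes.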
2015-12-11 | vc4: Add support for storing sample mask. | Eric Anholt | 5 | -0/+24
From the API perspective, writing 1 bits can't turn on pixels that were off, so we AND it with the sample mask from the payload. (cherry picked from commit 74c4b3b80cc4246fd1eb503d97edb3d293eef5de)
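The rule above reduces to a bitwise AND. In this sketch `effective_sample_mask` is a hypothetical name, and one bit per sample with a 4-sample mask is assumed: the coverage mask from the payload decides which samples are live, and the shader-written mask can only clear bits from it.

```c
#include <stdint.h>

/* One bit per sample (4 samples assumed). A 1 bit written by the shader
 * cannot enable a sample the primitive didn't cover, so AND the masks. */
static uint8_t effective_sample_mask(uint8_t payload_mask, uint8_t shader_mask)
{
        return payload_mask & shader_mask;
}
```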
2015-12-11 | vc4: Fix up tile alignment checks for blitting using just an RCL. | Eric Anholt | 1 | -6/+22
We were checking that the blit started at 0 and was 1:1, but not that it went to the full width of the surface, or that the width was aligned to a tile. We then told it to blit to the full width/height of the surface, causing contents to be stomped in a bunch of MSAA tests that happen to include half-screen-width blits to 0,0. (cherry picked from commit 3a508a0d94d020d9cd95f8882e9393d83ffac377)
2015-12-11 | vc4: Add support for loading sample mask. | Eric Anholt | 6 | -1/+19
(cherry picked from commit a664233042e1ad343184a0c237c3bd7ac5010779)
2015-12-11 | vc4: Use nir_channel() to simplify all of our nir_swizzle() cases. | Eric Anholt | 2 | -6/+5
(cherry picked from commit 4cff16bc3a84569da05e672c8226931678aa62c0)
2015-12-11 | vc4: Fix point size lookup. | Eric Anholt | 1 | -1/+1
I think I may have regressed this in the NIR conversion. TGSI-to-NIR is putting the PSIZ in the .x channel, not .w, so we were grabbing some garbage for point size, which ended up meaning just not drawing points. Fixes glean pointAtten and pointsprite. (cherry picked from commit 81544f231ad6eba1c7eb8b89273c59eb53a25879)
2015-11-29 | vc4: Just put USE_VC4_SIMULATOR in DEFINES. | Eric Anholt | 2 | -5/+0
In the pipe-loader reworks, it was missed in one of the new directories where it was used. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (cherry picked from commit a39eac80fd491abb990b0b77dd5e4adc5b9c53e1)
2015-11-17 | vc4: Don't bother lowering uniforms when the same value is used twice. | Eric Anholt | 1 | -13/+33
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" used to get the absolute value would get lowered. The lowering doesn't try to restrict the lifetime of the lowered uniforms, so register allocation would end up failing because of this on 5 of the tests. (More tests still fail in RA; it looks like we'll need to reduce lowered uniform lifetimes to fix those.) No changes on shader-db, though fewer extra MOVs are generated even on glxgears (MOVs pair well enough that it ends up being the same instruction count).
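A sketch of the decision described above, assuming (as the commit implies) that an instruction can read only one uniform value, so lowering is needed only when two *different* uniforms appear among its sources. The operand encoding here (a uniform slot index, or -1 for a non-uniform source) and the function name are hypothetical:

```c
#include <stdbool.h>

/* srcs[i] is a uniform slot index, or -1 if that source isn't a uniform.
 * Returns true only if the instruction reads two distinct uniforms;
 * reading the same uniform twice (e.g. "fmaxabs dst, uni, uni") is fine. */
static bool needs_uniform_lowering(const int *srcs, int nsrcs)
{
        int first = -1;

        for (int i = 0; i < nsrcs; i++) {
                if (srcs[i] < 0)
                        continue;
                if (first < 0)
                        first = srcs[i];
                else if (srcs[i] != first)
                        return true;
        }
        return false;
}
```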
2015-11-17 | vc4: Fix uniform reordering to support reading the same uniform twice. | Eric Anholt | 1 | -8/+18
This does actually happen in the wild (particularly fabs of a uniform), so we'd like to support it.
2015-11-17 | vc4: Fix documentation on vc4_qir_lower_uniforms.c. | Eric Anholt | 1 | -7/+3
2015-11-17 | vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB. | Eric Anholt | 5 | -0/+26
It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <mesa-stable@lists.freedesktop.org>
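The trick relies on the fact that an unsigned subtraction a - b borrows exactly when a < b, so "a >= b" is the inverse of the carry/borrow flag. The C model below widens to 64 bits to observe the borrow. Whether the QPU's carry flag is set on borrow or on no-borrow is an assumption here; the driver may test the opposite flag condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 32-bit subtract that also reports a borrow out. */
static uint32_t sub_with_borrow(uint32_t a, uint32_t b, bool *borrow)
{
        uint64_t wide = (uint64_t)a - (uint64_t)b;

        *borrow = (wide >> 32) != 0; /* high bits set only when a < b */
        return (uint32_t)wide;
}

/* Unsigned greater-or-equal derived from the borrow flag. */
static bool uge(uint32_t a, uint32_t b)
{
        bool borrow;

        (void)sub_with_borrow(a, b, &borrow);
        return !borrow;
}
```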
2015-11-11 | gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype | Ilia Mirkin | 1 | -0/+1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-09 | vc4: Avoid loading undefined (newly-allocated) FBO contents. | Eric Anholt | 1 | -0/+17
Since X has undefined contents in new pixmaps, it will allocate new textures for an FBO and draw to them without an explicit clear. For VC4, it's much faster to emit a clear than the load of the actual undefined memory contents, so just do that instead.
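The idea can be pictured as a per-surface "has anything been written yet?" flag that selects the tile-buffer setup. The struct, enum, and function below are hypothetical illustrations, not the driver's actual bookkeeping. Because a never-written buffer's contents are undefined, any value is acceptable, so a cheap clear may replace the load from memory.

```c
#include <stdbool.h>

/* Hypothetical per-surface state. */
struct surface_state { bool initialized; };

enum tile_load_op { TILE_LOAD_FROM_MEMORY, TILE_CLEAR };

/* Undefined contents may be replaced with anything, and clearing the
 * tile buffer is cheaper than loading memory, so clear in that case. */
static enum tile_load_op choose_tile_load(const struct surface_state *s)
{
        return s->initialized ? TILE_LOAD_FROM_MEMORY : TILE_CLEAR;
}
```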
2015-11-09 | vc4: Return NULL when we can't make our shadow for a sampler view. | Eric Anholt | 1 | -0/+4
I'm not sure that what the caller does is appropriate (just having a NULL sampler at this slot), but it fixes the immediate crash. Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 | vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails. | Eric Anholt | 2 | -19/+32
I was afraid our callers weren't prepared for this, but it looks like at least for resource creation, mesa/st throws an error appropriately. Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 | vc4: Add CL dumping for GL_ARRAY_PRIMITIVE. | Eric Anholt | 1 | -1/+16
2015-11-09 | vc4: Fix a compiler warning. | Eric Anholt | 1 | -1/+1
2015-11-04 | vc4: When the create ioctl fails, free our cache and try again. | Eric Anholt | 1 | -5/+24
This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <mesa-stable@lists.freedesktop.org>
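The retry pattern described above can be sketched as follows. The kernel ioctl and the userspace BO cache are modeled as a simple byte budget so that the example runs standalone. `create_ioctl`, `cache_free_all`, `bo_alloc`, and the starting values are hypothetical stand-ins, not the real DRM interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: memory the kernel can still hand out, and memory
 * sitting idle in our userspace BO cache (starting values are arbitrary). */
static size_t free_kernel_mem = 10;
static size_t cached_bytes = 20;

static bool create_ioctl(size_t size)
{
        if (size > free_kernel_mem)
                return false;
        free_kernel_mem -= size;
        return true;
}

/* Release every cached BO back to the kernel. */
static void cache_free_all(void)
{
        free_kernel_mem += cached_bytes;
        cached_bytes = 0;
}

/* Try once; if creation fails, empty the cache and retry a single time. */
static bool bo_alloc(size_t size)
{
        if (create_ioctl(size))
                return true;
        cache_free_all();
        return create_ioctl(size);
}
```

Here a 25-byte request fails on the first attempt (only 10 free), succeeds after the 20 cached bytes are released, and leaves 5 bytes free.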
2015-11-04 | vc4: Print the rounded shader size in debug output. | Eric Anholt | 1 | -1/+1
It's surprising to see "0kb" printed in debug output for short shaders, while 4kb alignment won't be surprising.
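A minimal sketch of the rounding, assuming the 4 KiB allocation granularity mentioned above. `shader_size_kb` is a hypothetical helper. Rounding up before dividing is what turns a short shader's "0kb" into "4kb".

```c
#include <stdint.h>

/* Round the byte size up to 4 KiB, then report it in kilobytes. */
static uint32_t shader_size_kb(uint32_t size_bytes)
{
        uint32_t aligned = (size_bytes + 4095u) & ~4095u;

        return aligned / 1024u;
}
```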
2015-11-04 | vc4: Fix dumping the size of BOs allocated/cached. | Eric Anholt | 1 | -2/+2
60MB of cached BOs are a lot less scary than 600MB.
2015-10-29 | vc4: Allow user index buffers, to avoid slow readback for shadow IBs. | Eric Anholt | 4 | -10/+25
Improves low-settings openarena performance by 31.9975% +/- 0.659931% (n=7).
2015-10-28 | gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS | Marek Olšák | 1 | -0/+1
For ARB_copy_image. Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-26 | vc4: Add support for copy propagation with unpack flags present. | Eric Anholt | 2 | -36/+109
total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs: 52971 -> 51582 (-2.62%)
2015-10-26 | vc4: Rewrite the pack instructions as a MOV with a dst pack flag | Eric Anholt | 3 | -37/+18
Another step in reducing the special-casing of instructions.
2015-10-26 | vc4: Move dst pack setup out to a helper function with more asserts. | Eric Anholt | 1 | -10/+22
2015-10-26 | vc4: Switch the unpack ops to being unpack flags on a mov. | Eric Anholt | 6 | -123/+42
This paves the way for copy propagating our unpacks. We end up with a small change on shader-db:
total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)
which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
2015-10-26 | vc4: Drop some confused code about pack/unpack handling. | Eric Anholt | 1 | -23/+4
At one point I thought packs and unpacks were in the same field of the instruction. They aren't. These instructions therefore never cause a pack.
total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs: 15261 -> 15179 (-0.54%)
2015-10-26 | vc4: Reduce MOV special-casing in QIR-to-QPU. | Eric Anholt | 1 | -8/+11
I'm going to introduce some more types of MOV, which also want the elision of raw MOVs.
2015-10-26 | vc4: Fix up the test for whether the unpack can be from r4. | Eric Anholt | 3 | -8/+27
We can do 16a/16b from float as well. No difference on shader-db.
2015-10-26 | vc4: Don't try to follow MOVs across a pack. | Eric Anholt | 1 | -1/+2
2015-10-26 | vc4: Only copy propagate raw MOVs. | Eric Anholt | 1 | -6/+1
No problems being fixed, but needed for the new unpack changes.
2015-10-26 | vc4: If a QIR source has an unpack set, print it. | Eric Anholt | 3 | -3/+13
Not used yet, but will be.
2015-10-24 | vc4: Fix names of the 16-bit unpacks | Eric Anholt | 3 | -6/+6
They're only f16-to-f32 on a float operation, otherwise they're i16-to-i32.
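For the integer case, the 16a/16b unpacks behave like a sign extension of one half of the 32-bit register value. This sketch covers only that integer meaning (the f16-to-f32 path is not modeled). It assumes 16a selects the low half-word and 16b the high half-word. The helper names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x without relying on
 * implementation-defined narrowing casts. */
static int32_t sext16(uint32_t x)
{
        int32_t v = (int32_t)(x & 0xffffu);

        return (v ^ 0x8000) - 0x8000;
}

/* i16-to-i32 reading of the 16a (low half) and 16b (high half) unpacks. */
static int32_t unpack_16a_i(uint32_t reg) { return sext16(reg); }
static int32_t unpack_16b_i(uint32_t reg) { return sext16(reg >> 16); }
```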
2015-10-24 | vc4: Don't try to register coalesce into the VPM across non-raw MOVs. | Eric Anholt | 1 | -1/+1
No known bugs, just something I noticed while updating optimization code for other changes.
2015-10-24 | vc4: Take advantage of the 8888 pack function in pack_unorm_4x8. | Eric Anholt | 1 | -0/+14
One instruction instead of four, and it turns out you do this a lot for the Over operator.
total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs: 318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs: 6434 -> 6076 (-5.56%)
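For reference, this C function gives the result that pack_unorm_4x8 has to produce. It follows GLSL packUnorm4x8() semantics: each component is clamped to [0, 1], scaled by 255, rounded, and packed with component 0 in the lowest byte. It is not the QPU code, and the hardware 8888 pack's exact rounding behavior is an assumption not modeled here. Rounding uses +0.5 and truncation, which avoids a libm dependency.

```c
#include <stdint.h>

/* Reference semantics for packing four floats into one 32-bit word. */
static uint32_t pack_unorm_4x8(const float v[4])
{
        uint32_t packed = 0;

        for (int i = 0; i < 4; i++) {
                float c = v[i];

                if (!(c > 0.0f))        /* clamps negatives (and NaN) to 0 */
                        c = 0.0f;
                if (c > 1.0f)
                        c = 1.0f;
                uint32_t byte = (uint32_t)(c * 255.0f + 0.5f);
                packed |= byte << (8 * i);
        }
        return packed;
}
```

For example, {1.0, 0.0, 0.5, 2.0} packs to 0xff8000ff, with the out-of-range 2.0 clamped to 255.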
2015-10-24 | vc4: Fix the test for skipping raw MOVs. | Eric Anholt | 3 | -1/+10
I don't know what the previous test was trying to do, but it dates back to the first addition of vc4_qpu_emit.c. No change to shader-db.
2015-10-23 | vc4: Convert blending to being done in 4x8 unorm normally. | Eric Anholt | 5 | -51/+276
We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though.
total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs: 327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs: 15580 -> 12766 (-18.06%)
Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
2015-10-23 | vc4: Add QIR/QPU support for the 8-bit vector instructions. | Eric Anholt | 4 | -0/+45