summaryrefslogtreecommitdiff
path: root/Development/Documentation/GlamorPerformance.mdwn
blob: 027c451bfa047b2ac34b9ca35533d3d3a615a3ac (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
Glamor _should_ be able to accelerate all of X, given a sufficiently capable GL implementation. It doesn't, yet, and this document attempts to describe the remaining work.

## Core ops

The two bits of core GC state that tend to cause fallbacks are GCFunction and GXPlaneMask. For the former we try to use glLogicOp, which works for draw calls but not for texture image calls, and doesn't exist in GLES. For the latter we basically just punt if the planemask isn't trivial.

We could probably accelerate both of the above if the GL(SL) below us has something like [[MESA_shader_integer_functions|https://www.opengl.org/registry/specs/MESA/shader_integer_functions.txt]] or [[EXT_framebuffer_fetch|https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_framebuffer_fetch.txt]].

Composite clip walk could likely use [[EXT_window_rectangles|https://www.opengl.org/registry/specs/EXT/window_rectangles.txt]] instead of glScissor.

### Image ops

PutImage is implemented with TexSubImage2D, so is only accelerated for GXcopy operations. If LogicOp works, you could accelerate this by using a temporary texture and drawing through it to the destination.

PutImage can't accelerate XYPixmap because that's not an image format that GL hardware (or anything postdating the Amiga, really) is equipped to deal with. You basically can't accelerate this unless you can accelerate arbitrary planemasks.

PutImage (not ShmPutImage) tends to memcpy once more than a classic X driver because it buffers the image data into the GL command stream, where exa might just blast the data directly into the pixmap. It's difficult to get around this without either a hint to the GL that the upload should be immediate (with all the stall/flush that implies), or some _serious_ surgery to the protocol buffering code.

GetImage falls back for ZPixmap with non-trivial planemask. There's not really a sane way to accelerate that through the GL, but it's cheap to fix up in software by just zeroing out the unset planes before writing to the client.

GetImage falls back entirely for XYPixmap, and again, GL very wisely doesn't believe in XYPixmap. Best approach is likely to download to a temporary as if ZPixmap, and then walk the planemask pulling out a plane at a time into the reply buffer.

### Geometry ops

It's extremely difficult to want to reimplement the arc or polygon code directly in the shader, as neither one sees very much use even in legacy apps. We should profile to make sure the current approach of decomposing to spans on the CPU is "fast enough".

The rectangle ops are probably about as perfect as they're going to get.

### Text ops

TODO

### Span ops

SetSpans, like PutImage, operates on the texture so can't accelerate non-GXcopy operations.

FillSpans might benefit from an alternate calling convention, where rather than being passed a span list the caller asks the driver to allocate the span storage, fills that in, then calls Set or Fill.  This would let us eliminate the copy into the vbo and just store there directly.

GetSpans is irrelevant, you can't hit it unless you're using the mi blit routines, and we're not.

## Render

TODO

## XVideo

TODO