Diffstat (limited to 'IntelPerformance.moin')
-rw-r--r-- IntelPerformance.moin | 42
1 files changed, 0 insertions, 42 deletions
diff --git a/IntelPerformance.moin b/IntelPerformance.moin
deleted file mode 100644
index c063407..0000000
--- a/IntelPerformance.moin
+++ /dev/null
@@ -1,42 +0,0 @@
-== Ideas for improving Intel 3D driver performance ==
-=== 965: Profile URB allocation ===
-How often are we cutting down to minimal URB allocation size?
-
-=== G4x: Use transposed reads ===
-This would cut the URB size from sf->wm, allowing more concurrency. g45-transposed-read branch of ~anholt/mesa
-
-=== 965: Cut down on state flagging in brw_new_batch ===
-We just need to re-emit everything (BRW_NEW_CONTEXT), not re-calculate all state. Must make sure that BRW_NEW_CONTEXT is set where it needs to be. Also, merge this and BRW_NEW_BATCH together.
-
-=== 965: Enable other-sized dispatch in wm ===
-Right now we only enable 16-pixel or 8-pixel dispatch, while creating program binaries with multiple entrypoints for differently-sized dispatch could save us many cycles.
-
-=== 965: Merge brw_wm_glsl.c into brw_wm_emit.c ===
-This would get us 16-wide dispatch in GLSL, which looks like a 10-20% performance win.
-
-=== 915: Avoid no-op updates of non-pipelined state ===
-Calling DrawBuffer with the buffer that is already bound is painful, as it flags us for updating the draw buffer, which is non-pipelined state. This brings the meta clear code from 200 MB/s to 6 MB/s. Something like what 965 does would be better for state tracking in this driver.
-
-=== both: Save state instead of using push/pop in metaops ===
-push/pop are expensive, and if we just kept track of the state in static structures it would be a win.
-
-=== both: Use fps and vps in metaops ===
-This is partially done now, but using fps and vps for metaops lets us push/pop less state and reduces the cost for mesa and 965 to calculate the state updates that result.
-
-=== both: Avoiding CPU-dirty of BOs kicked out of the aperture ===
-Right now when an application exceeds the aperture size, it hits a performance cliff because BOs removed from the aperture have their pages unpinned so they can be swapped. If we kept the BOs pinned but had a memory pressure handler, we could avoid cpu dirtying them, which would remove most of the unbind/rebind thrashing cost.
-
-=== both: Use PPGTT to have a larger aperture size ===
-Pulls us back from the performance cliff in the previous entry. This would also let us successfully render larger FBOs and textures where we fail currently.
-
-=== i965: Avoid creating new VBO until we've sent the last one out in a batchbuffer ===
-Right now when vbo_exec_api.c flushes a set of primitives, it does BufferData(size, NULL) on the VBO, so that you don't block on mapping the old one due to existing rendering. VBOs are relatively huge, so if you're doing tiny draws it's a lot of overhead, keeping us from using real VBOs for vbo_exec.
-
-=== i965: Implement the ranged mapping extension ===
-This would obsolete the previous entry, as the vbo module would start doing what we want for us.
-
-=== i965: Only upload the constant buffer when the contents or the fencing of it has changed. ===
-Right now we upload it if you change programs, fencing, pipelined state, or anything related to transforms or projection. Would this help?
-
-=== i965: Use PIPE_CONTROL ===
-This is supposed to get us better pipelining behavior than MI_FLUSH.
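
The "915: Avoid no-op updates of non-pipelined state" idea in the removed page can be illustrated with a minimal sketch. It is not the driver's code: `sketch_context`, `DIRTY_DRAW_BUFFER`, and `sketch_set_draw_buffer` are hypothetical names made up for this example. The setter compares the requested draw buffer against the one already bound and only raises the dirty flag when they differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; these are not the real i915 structures. */
enum { DIRTY_DRAW_BUFFER = 1u << 0 };

struct sketch_context {
    const void *draw_buffer; /* draw buffer currently bound */
    unsigned dirty;          /* state groups awaiting re-emission */
};

/* Bind a draw buffer. Returns true only if state was actually flagged. */
static bool sketch_set_draw_buffer(struct sketch_context *ctx,
                                   const void *buffer)
{
    assert(ctx != NULL);

    /* Same buffer as before: nothing changed, so do not flag the
     * (expensive, non-pipelined) draw buffer state for re-emission. */
    if (ctx->draw_buffer == buffer)
        return false;

    ctx->draw_buffer = buffer;
    ctx->dirty |= DIRTY_DRAW_BUFFER;
    return true;
}
```

With this check, a meta operation that rebinds the same buffer on every call leaves `dirty` untouched, so only a genuine change pays for the non-pipelined update.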