1 files changed, 68 insertions, 0 deletions
diff --git a/R300FragmentProgramOptimization.mdwn b/R300FragmentProgramOptimization.mdwn
new file mode 100644
index 0000000..192e283
--- /dev/null
+++ b/R300FragmentProgramOptimization.mdwn
@@ -0,0 +1,68 @@
+
+
+# Optimization for r300/r400 fragment shader program
+
+We want a few things, and this work should be done in gallium. It could be done using the software rendering pipe, since that makes debugging easier and there is no r300/r400 gallium driver yet. For the sake of simplicity we will use the ARB fragment & vertex program extensions, as we expect higher-level languages like GLSL to be optimized in a first pass by things like LLVM. We focus on optimizing a bit further the "ASM" we get as output of such a stage. We also don't want one optimization pass to destroy the work done by another. To solve this, I think that trying each permutation of the optimization stages and selecting the one that produces the lowest number of instructions while still fitting the hardware is the best way to keep things clean and simple. 
+## Reshuffle texture instruction
+
+To work around the limit of 4 texture indirections we need to reshuffle texture instructions. For instance, the following program can be rewritten. 
+
+Original program, which does not pass the texture indirection limit (here 5 indirections): 
+
+
+[[!format txt """
+TEMP a, b, c, d;
+# node 0 - 0 indirections
+TEX a, fragment.color, texture[0], 2D;
+# node 1 - 1 indirection
+TEX b, a, texture[1], 2D;
+ADD c, b, 1;
+MUL b, c, b;
+# node 2 - 2 indirections, because c was written in the previous node
+TEX c, fragment.color, texture[2], 2D;
+# node 3 - 3 indirections, because c was written in the previous node
+TEX d, c, texture[3], 2D;
+ADD a, b, d;
+# node 4 (over the limit!) - a was written in the previous node
+TEX result.color, a, texture[4], 2D;
+"""]]
+Reshuffled program, which passes the texture indirection limit (here 3 indirections): 
+[[!format txt """
+TEMP a, b, c, d;
+TEMP _ts_0;
+# node 0 - 0 indirections
+TEX a, fragment.color, texture[0], 2D;
+TEX c, fragment.color, texture[2], 2D;
+# node 1 - 1 indirection
+TEX b, a, texture[1], 2D;
+TEX d, c, texture[3], 2D;
+ADD _ts_0, b, 1;
+MUL b, _ts_0, b;
+ADD a, b, d;
+# node 2 - 2 indirections, because a was written in the previous node
+TEX result.color, a, texture[4], 2D;
+"""]]
+
+## Use native swizzle
+
+Please refer to the hardware documentation to find the native swizzles for r300/r400 hardware. We want to rewrite the asm to take advantage of native swizzles. This stage could be mixed with the scalar/vector optimization pass. 
+
+The following program can be optimized (assuming xyzw, wzyx and wyxz are native but xzwy isn't): 
+[[!format txt """
+TEMP a, b;
+PARAM coef = {0.5, 0.6, 0.7, 0.8};
+ADD a, fragment.color.xzwy, coef.wyxz;
+"""]]
+To: 
+[[!format txt """
+TEMP a, b;
+PARAM coef = {0.5, 0.6, 0.7, 0.8};
+ADD a, fragment.color.wzyx, coef.xyzw;
+"""]]
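+Note that the rewritten ADD stores its result permuted: the original a equals the new a.wyxz, so later reads of a have to use that swizzle (native under the assumption above). The optimization can become more complex if we add negation of individual components to the equation. 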
+## Reshuffle instruction
+
+By reshuffling instructions you can take advantage of the split between the scalar and vector units of the hardware to parallelize computation a bit more. 
+## References
+
+[[http://www.opengl.org/registry/specs/ARB/fragment_program.txt|http://www.opengl.org/registry/specs/ARB/fragment_program.txt]]