author Zhigang Gong <gongzg@freedesktop.org> 2014-12-04 22:01:41 -0800
committer Zhigang Gong <gongzg@freedesktop.org> 2014-12-04 22:01:41 -0800
commit 842cc875b4fbc077738c8d2c57c8f1dd8c8da040
tree 428adee89d5dc6db85112bf980dead806fb4cb49 /Software/Beignet/optimization-guide.mdwn
parent 92d77778b8559775e2cfcad74169ad61baf364c5
Beignet, tweak formats.
Signed-off-by: Zhigang Gong <gongzg@freedesktop.org>
Diffstat (limited to 'Software/Beignet/optimization-guide.mdwn')
-rw-r--r-- Software/Beignet/optimization-guide.mdwn | 17
1 files changed, 8 insertions, 9 deletions
diff --git a/Software/Beignet/optimization-guide.mdwn b/Software/Beignet/optimization-guide.mdwn
index 5f648fbc..d7669708 100644
--- a/Software/Beignet/optimization-guide.mdwn
+++ b/Software/Beignet/optimization-guide.mdwn
@@ -7,10 +7,9 @@ there are some special tips for Beignet optimization.
1. It is recommended to choose multiple of 16 work group size. Too much SLM usage may reduce parallelism at group level.
If kernel uses large amount SLM, it's better to choose large work group size. Please refer the following table for recommendations
- with some SLM usage.
-
-| Amount of SLM | 0 | 4K | 8K | 16K | 32K |
-| WorkGroup size| 16 | 64 | 128 | 256 | 512 |
+ with some SLM usage.
+ | Amount of SLM | 0 | 4K | 8K | 16K | 32K |
+ | WorkGroup size| 16 | 64 | 128 | 256 | 512 |
Actually, a good method is to pass in a NULL local work size parameter to let the driver to determine the best work group size for you.
@@ -30,26 +29,26 @@ there are some special tips for Beignet optimization.
Currently, private buffer access in beignet backend is very slow. Many small private buffer could be optimized by the compiler.
But the following type of dynamic indexed private buffer could not be optimized:
-`
+```c
+
uint private_buffer[32];
for (i = 0; i < xid; i++) {
int dynamic_idx = src[xid];
private_buffer[dynamic_idx % 10] = src1[xid];
...
}
-`
+```
The following case is OK.
-`
+```c
...
uint private_buffer[32];
for (i = 0; i < xid; i++) {
private_buffer[xid % 32] = src1[xid];
...
}
-`
-
+```
1. Use SLM to reduce the memory bandwidth requirement if possible.