~sandmann/pixman - Unnamed repository; edit this file to name it for gitweb.

Age	Commit message	Author	Files	Lines
2012-06-04	upscalesupersampling	Søren Sandmann Pedersen	1	-1/+2

2012-06-04	Enable filtering	Søren Sandmann Pedersen	1	-2/+0

2012-06-04	upscale	Søren Sandmann Pedersen	1	-11/+24

2012-06-04	remove spew	Søren Sandmann Pedersen	1	-4/+0

2012-06-04	point sampling	Søren Sandmann Pedersen	1	-1/+2

2012-06-04	misc fixes	Søren Sandmann Pedersen	1	-8/+11

2012-06-04	upscale demo	Søren Sandmann Pedersen	2	-1/+167

2012-06-02	formatting	Søren Sandmann Pedersen	1	-17/+16

2012-06-02	Don't cache sample rate	Søren Sandmann Pedersen	3	-36/+29

2012-06-02	Port to iters	Søren Sandmann Pedersen	1	-15/+21

2012-06-02	Incorporate supersampling review comments	Adrian Johnson	5	-191/+94
	http://lists.cairographics.org/archives/cairo/2010-August/020655.html
2012-06-02	Implement supersampling for downscaling	Krzysztof Kosiński	4	-6/+434
	Patch from http://lists.cairographics.org/archives/cairo/2010-August/020490.html
2012-06-02	Turn all the fetchers into proper iter_get_scanlines_tbits-iter	Søren Sandmann Pedersen	2	-104/+114
	This avoids an extra indirection and may later on let the functions cache some more information between calls. Also, this lets us get rid of get_scanline_32/64 in pixman_image_t.
2012-05-30	Make use of image flags in mmx and sse2 iterators	Søren Sandmann Pedersen	2	-20/+8
	Now that we have the full image flags available, the SSE2 and MMX iterators can simply check against SAMPLES_COVER_CLIP_NEAREST (which is computed in pixman_image_composite32()) instead of comparing all the x/y/width/height parameters.
2012-05-30	Pass the full image flags to iterators	Søren Sandmann Pedersen	12	-33/+40
	When pixman_image_composite32() is called, some flags are computed that indicate various things about the composite operation that can't be deduced from the image flags themselves. These additional flags are not currently available to iterators. All they can do is read the image flags in image->common.flags. Fix that by passing the info->{src, mask, dest}_flags on to the iterator initialization and storing the flags in the iter struct as "image_flags". At the same time, rename the iterator flags variable to "iter_flags" to avoid confusion.
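As a rough illustration of the idea in this commit message, the sketch below shows an iterator struct that carries both the per-composite image flags and its own iterator flags, plus a check of the kind the SSE2/MMX iterators could then perform. Every type, function, and constant name here (`sketch_iter_t`, `SKETCH_SAMPLES_COVER_CLIP_NEAREST`, `SKETCH_ITER_NARROW`) and every flag value is hypothetical; none of them are pixman's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag values; pixman defines its own constants. */
#define SKETCH_SAMPLES_COVER_CLIP_NEAREST (1u << 0) /* computed per composite */
#define SKETCH_ITER_NARROW                (1u << 0) /* iterator behaviour */

/* Simplified stand-in for an iterator struct. */
typedef struct
{
    uint32_t image_flags; /* image flags plus flags computed for this composite */
    uint32_t iter_flags;  /* flags describing how the iterator should behave */
} sketch_iter_t;

/* Store both flag sets at initialization time so later code can test them. */
static void
sketch_iter_init (sketch_iter_t *iter, uint32_t image_flags, uint32_t iter_flags)
{
    iter->image_flags = image_flags;
    iter->iter_flags = iter_flags;
}

/* One flag test replaces a comparison of x/y/width/height against the image. */
static int
sketch_iter_samples_cover_clip (const sketch_iter_t *iter)
{
    return (iter->image_flags & SKETCH_SAMPLES_COVER_CLIP_NEAREST) ==
           SKETCH_SAMPLES_COVER_CLIP_NEAREST;
}
```

Keeping the two flag sets in separately named fields mirrors the rename to "iter_flags" mentioned above, since both sets would otherwise be easy to confuse.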
2012-05-27	mmx: add missing _mm_empty calls	Matt Turner	1	-0/+5
	Fixes spurious test failures on x86-32.
2012-05-26	mmx: add over_reverse_n_8888	Matt Turner	2	-0/+73
	Loongson: over_reverse_n_8888 = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95 VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s) over_reverse_n_8888 = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66 VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s) image poppler 34.106 35.500 1.48% 6/6 image poppler 29.598 30.835 1.70% 6/6 ARM/iwMMXt: over_reverse_n_8888 = L1: 15.63 L2: 14.33 M: 10.83 ( 27.55%) HT: 9.78 VT: 9.91 R: 9.49 RT: 6.96 ( 69Kops/s) over_reverse_n_8888 = L1: 22.79 L2: 19.40 M: 13.76 ( 34.19%) HT: 11.66 VT: 11.86 R: 11.17 RT: 7.85 ( 75Kops/s) image poppler 38.040 38.606 1.10% 6/6 image poppler 31.686 32.278 0.80% 5/6
2012-05-26	mmx: add add_0565_0565	Matt Turner	1	-0/+86
	Loongson: add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT: 22.84 R: 18.96 RT: 9.79 ( 104Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s) add_0565_0565 = L1: 31.14 L2: 28.87 M: 22.46 ( 28.60%) HT: 18.61 VT: 17.04 R: 15.21 RT: 9.35 ( 90Kops/s)
2012-05-26	fast: add add_0565_0565 function	Matt Turner	1	-0/+44
	I'll need this code for header and tail alignment loops in MMX, so I might as well implement a fast path here.
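For readers unfamiliar with the operation name, here is a minimal scalar sketch of what an ADD of two r5g6b5 pixels computes, assuming ADD means per-channel saturating addition. The helper names are hypothetical, and the actual fast path may be structured differently (for example by expanding to 8888 first); this only models the per-pixel result that head and tail alignment loops would need.

```c
#include <stdint.h>

/* Hypothetical helper: add two channel values and clamp to the channel maximum. */
static uint32_t
sketch_sat_add (uint32_t a, uint32_t b, uint32_t max)
{
    uint32_t sum = a + b;
    return sum > max ? max : sum;
}

/* Sketch: saturating per-channel ADD of two r5g6b5 pixels. */
static uint16_t
sketch_add_0565_0565 (uint16_t s, uint16_t d)
{
    uint32_t r = sketch_sat_add ((s >> 11) & 0x1f, (d >> 11) & 0x1f, 0x1f);
    uint32_t g = sketch_sat_add ((s >> 5) & 0x3f, (d >> 5) & 0x3f, 0x3f);
    uint32_t b = sketch_sat_add (s & 0x1f, d & 0x1f, 0x1f);

    return (uint16_t) ((r << 11) | (g << 5) | b);
}
```

A vectorized path processes several pixels per iteration, so a scalar routine like this is what handles the unaligned pixels at the start and end of each scanline.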
2012-05-26	mmx: implement expand_4x565 in terms of expand_4xpacked565	Matt Turner	1	-27/+59
	Loongson: over_n_0565 = L1: 38.57 L2: 38.88 M: 30.01 ( 20.97%) HT: 23.60 VT: 23.88 R: 21.95 RT: 11.65 ( 113Kops/s) over_n_0565 = L1: 56.28 L2: 55.90 M: 34.20 ( 23.82%) HT: 25.66 VT: 26.60 R: 23.78 RT: 11.80 ( 115Kops/s) over_8888_0565 = L1: 35.89 L2: 36.11 M: 21.56 ( 45.47%) HT: 18.33 VT: 17.90 R: 16.27 RT: 9.07 ( 98Kops/s) over_8888_0565 = L1: 40.91 L2: 41.06 M: 23.13 ( 48.46%) HT: 19.24 VT: 18.71 R: 16.82 RT: 9.18 ( 99Kops/s) over_n_8_0565 = L1: 28.92 L2: 29.12 M: 21.42 ( 30.00%) HT: 18.37 VT: 17.75 R: 16.15 RT: 8.79 ( 91Kops/s) over_n_8_0565 = L1: 32.32 L2: 32.13 M: 22.44 ( 31.27%) HT: 19.15 VT: 18.66 R: 16.62 RT: 8.86 ( 92Kops/s) over_n_8888_0565_ca = L1: 29.33 L2: 29.22 M: 18.99 ( 66.69%) HT: 16.69 VT: 16.22 R: 14.63 RT: 8.42 ( 88Kops/s) over_n_8888_0565_ca = L1: 34.97 L2: 34.14 M: 20.32 ( 71.73%) HT: 17.67 VT: 17.19 R: 15.23 RT: 8.50 ( 89Kops/s) ARM/iwMMXt: over_n_0565 = L1: 29.70 L2: 30.53 M: 24.47 ( 14.84%) HT: 22.28 VT: 21.72 R: 21.13 RT: 12.58 ( 105Kops/s) over_n_0565 = L1: 41.42 L2: 40.00 M: 30.95 ( 19.13%) HT: 27.06 VT: 27.28 R: 23.43 RT: 14.44 ( 114Kops/s) over_8888_0565 = L1: 12.73 L2: 11.53 M: 9.07 ( 16.47%) HT: 9.00 VT: 9.25 R: 8.44 RT: 7.27 ( 76Kops/s) over_8888_0565 = L1: 23.72 L2: 21.76 M: 15.89 ( 29.51%) HT: 14.36 VT: 14.05 R: 12.44 RT: 8.94 ( 86Kops/s) over_n_8_0565 = L1: 6.80 L2: 7.15 M: 6.37 ( 7.90%) HT: 6.58 VT: 6.24 R: 6.49 RT: 5.94 ( 59Kops/s) over_n_8_0565 = L1: 12.06 L2: 11.02 M: 10.16 ( 13.43%) HT: 9.57 VT: 8.49 R: 9.10 RT: 6.86 ( 69Kops/s) over_n_8888_0565_ca = L1: 7.62 L2: 7.01 M: 6.27 ( 20.52%) HT: 6.00 VT: 6.07 R: 5.68 RT: 5.53 ( 57Kops/s) over_n_8888_0565_ca = L1: 13.54 L2: 11.96 M: 9.76 ( 30.66%) HT: 9.72 VT: 8.45 R: 9.37 RT: 6.85 ( 67Kops/s)
2012-05-26	mmx: add and use expand_4xpacked565 function	Matt Turner	2	-6/+59
	Loongson: add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT: 9.74 R: 9.39 RT: 6.05 ( 67Kops/s) add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 11.12 L2: 10.40 M: 8.82 ( 10.65%) HT: 7.98 VT: 7.41 R: 7.57 RT: 5.21 ( 54Kops/s) add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)
2012-05-26	Post-release version bump to 0.27.1	Søren Sandmann Pedersen	1	-2/+2

2012-05-26	Pre-release version bump to 0.26.0	Søren Sandmann Pedersen	1	-2/+2

2012-05-25	Fix MSVC compilation	Ingmar Runge	1	-2/+7
	Only up to three SSE intrinsics supported in function declaration.
2012-05-24	test: Composite with solid images instead of using pixman_image_fill_*	Søren Sandmann Pedersen	2	-11/+15
	There are a couple of places where the test suite uses the pixman_image_fill_* functions to initialize images. These functions can fail, and will do so if the "fast" implementation is disabled. So to make sure the test suite passes even using PIXMAN_DISABLE="fast", use pixman_image_composite32() with a solid image instead of pixman_image_fill_*.
2012-05-23	MIPS: DSPr2: Added bilinear over_8888_8_8888 fast path.	Nemanja Lukic	4	-0/+177
	Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): cairo-perf-trace: [ # ] backend test min(s) median(s) stddev. count [ # ] image: pixman 0.25.3 [ 0] image firefox-fishtank 2289.180 2290.567 0.05% 5/6 Optimized: cairo-perf-trace: [ # ] backend test min(s) median(s) stddev. count [ # ] image: pixman 0.25.3 [ 0] image firefox-fishtank 1700.925 1708.314 0.22% 5/6
2012-05-23	MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines	Nemanja Lukic	1	-32/+28
	In the main loop (unrolled by a factor of 2), instead of negating the mask values multiplied by srca, the value of srca was negated and passed as the alpha argument to the UN8x4_MUL_UN8x4_ADD_UN8x4 macro. Instead of: ma = ~ma; UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s); the code was doing this: ma = ~srca; UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s); The key is substituting registers s0/s1 (containing the srca value) with t0/t1, which contain the mask values multiplied by srca. Register usage is also improved (fewer registers are saved on the stack for the over_n_8888_8888_ca routine). The bug was introduced in commit d2ee5631 and revealed by the composite test.
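The difference between the two variants can be seen in a single-channel scalar model of component-alpha OVER. This is only a sketch: the function names are hypothetical, the real routines are MIPS DSPr2 assembly operating on packed pixels, and the rounding multiply is written out here rather than taken from pixman's macros.

```c
#include <stdint.h>

/* Rounded 8-bit multiply: approximately a * b / 255. */
static uint8_t
sketch_mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;
    return (uint8_t) (((t >> 8) + t) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t
sketch_add_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a + b;
    return (uint8_t) (t > 255 ? 255 : t);
}

/* Intended behaviour: negate the mask after it has been multiplied by srca. */
static uint8_t
sketch_over_ca_fixed (uint8_t src, uint8_t srca, uint8_t mask, uint8_t dst)
{
    uint8_t s = sketch_mul_un8 (src, mask);
    uint8_t ma = sketch_mul_un8 (mask, srca);

    ma = (uint8_t) ~ma;
    return sketch_add_un8 (sketch_mul_un8 (dst, ma), s);
}

/* Behaviour described as the bug: srca itself is negated, ignoring the mask. */
static uint8_t
sketch_over_ca_buggy (uint8_t src, uint8_t srca, uint8_t mask, uint8_t dst)
{
    uint8_t s = sketch_mul_un8 (src, mask);
    uint8_t ma = (uint8_t) ~srca;

    return sketch_add_un8 (sketch_mul_un8 (dst, ma), s);
}
```

With a zero mask and an opaque source, the fixed variant leaves the destination channel untouched, while the buggy variant clears it, which is the kind of discrepancy a compositing test would catch.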
2012-05-20	demos: Add parrot.jpg to EXTRA_DIST	Søren Sandmann Pedersen	1	-1/+1
	Pointed out by Cyril Brulebois.
2012-05-15	configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt	Matt Turner	1	-0/+3
	If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test doesn't catch this, and we get linking errors in the build.
2012-05-15	Post-release version bump to 0.25.7	Søren Sandmann Pedersen	1	-1/+1

2012-05-15	Pre-release version bump to 0.25.6	Søren Sandmann Pedersen	1	-1/+1
	Note that 0.25.4 was a botched release that doesn't have a tag and doesn't correspond to any commit ID. It was however uploaded and announced, so I'll just use the 0.25.6 version number.
2012-05-15	demos/Makefile.am: Add parrot.c to EXTRA_DIST	Søren Sandmann Pedersen	1	-0/+2
	To get 'make distcheck' to pass.
2012-05-11	configure.ac: Rename loongson -> loongson-mmi	Matt Turner	1	-7/+7
	Make it match the other fast paths; the PIXMAN_DISABLE value is already loongson-mmi.
2012-05-11	configure.ac: Fix loongson-mmi out-of-tree builds	Matt Turner	1	-1/+1
	When building out-of-tree, gcc wasn't able to find loongson-mmintrin.h to compile the test program. Add -I$srcdir to CFLAGS to point gcc to it.
2012-05-11	MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths.	Nemanja Lukic	3	-0/+301
	Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): lowlevel-blt-bench: over_n_8_8888 = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64 VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s) over_n_8_0565 = L1: 7.40 L2: 7.23 M: 6.78 ( 17.94%) HT: 6.23 VT: 6.17 R: 6.14 RT: 4.62 ( 37Kops/s) Optimized: lowlevel-blt-bench: over_n_8_8888 = L1: 27.25 L2: 26.24 M: 18.15 ( 72.12%) HT: 14.52 VT: 14.31 R: 13.83 RT: 7.57 ( 48Kops/s) over_n_8_0565 = L1: 18.91 L2: 17.59 M: 15.06 ( 39.90%) HT: 12.18 VT: 11.98 R: 11.83 RT: 6.80 ( 46Kops/s)
2012-05-10	mmx: add and use pack_4x565 function	Matt Turner	1	-55/+52
	The pack_4x565 makes use of the pack_4xpacked565 function which uses pmadd. Some of the speed up is probably attributable to removing the artificial serialization imposed by the vdest = pack_565 (..., vdest, 0); vdest = pack_565 (..., vdest, 1); ... pattern. Loongson: over_n_0565 = L1: 16.44 L2: 16.42 M: 13.83 ( 9.85%) HT: 12.83 VT: 12.61 R: 12.34 RT: 8.90 ( 93Kops/s) over_n_0565 = L1: 42.48 L2: 42.53 M: 29.83 ( 21.20%) HT: 23.39 VT: 23.72 R: 21.80 RT: 11.60 ( 113Kops/s) over_8888_0565 = L1: 15.61 L2: 15.42 M: 12.11 ( 25.79%) HT: 11.07 VT: 10.70 R: 10.37 RT: 7.25 ( 82Kops/s) over_8888_0565 = L1: 35.01 L2: 35.20 M: 21.42 ( 45.57%) HT: 18.12 VT: 17.61 R: 16.09 RT: 9.01 ( 97Kops/s) over_n_8_0565 = L1: 15.17 L2: 14.94 M: 12.57 ( 17.86%) HT: 11.96 VT: 11.52 R: 10.79 RT: 7.31 ( 79Kops/s) over_n_8_0565 = L1: 29.83 L2: 29.79 M: 21.85 ( 30.94%) HT: 18.82 VT: 18.25 R: 16.15 RT: 8.72 ( 91Kops/s) over_n_8888_0565_ca = L1: 15.25 L2: 15.02 M: 11.64 ( 41.39%) HT: 11.08 VT: 10.72 R: 10.02 RT: 7.00 ( 77Kops/s) over_n_8888_0565_ca = L1: 30.12 L2: 29.99 M: 19.47 ( 68.99%) HT: 17.05 VT: 16.55 R: 14.67 RT: 8.38 ( 88Kops/s) ARM/iwMMXt: over_n_0565 = L1: 19.29 L2: 19.88 M: 17.38 ( 10.54%) HT: 15.53 VT: 16.11 R: 13.69 RT: 11.00 ( 96Kops/s) over_n_0565 = L1: 36.02 L2: 34.85 M: 28.04 ( 16.97%) HT: 22.12 VT: 24.21 R: 22.36 RT: 12.22 ( 103Kops/s) over_8888_0565 = L1: 18.38 L2: 16.59 M: 12.34 ( 22.29%) HT: 11.67 VT: 11.71 R: 11.02 RT: 6.89 ( 72Kops/s) over_8888_0565 = L1: 24.96 L2: 22.17 M: 15.11 ( 26.81%) HT: 14.14 VT: 13.71 R: 13.18 RT: 8.13 ( 78Kops/s) over_n_8_0565 = L1: 14.65 L2: 12.44 M: 11.56 ( 14.50%) HT: 10.93 VT: 10.39 R: 10.06 RT: 7.05 ( 70Kops/s) over_n_8_0565 = L1: 18.37 L2: 14.98 M: 13.97 ( 16.51%) HT: 12.67 VT: 10.35 R: 11.80 RT: 8.14 ( 74Kops/s) over_n_8888_0565_ca = L1: 14.27 L2: 12.93 M: 10.52 ( 33.23%) HT: 9.70 VT: 9.90 R: 9.31 RT: 6.34 ( 65Kops/s) over_n_8888_0565_ca = L1: 19.69 L2: 17.58 M: 13.40 ( 42.35%) HT: 11.75 VT: 11.33 R: 11.17 RT: 7.49 ( 73Kops/s)
2012-05-10	configure.ac: make -march=loongson2f come before CFLAGS	Matt Turner	1	-1/+1
	Otherwise we'd have -march=loongson2f being overridden by automake's CFLAGS ordering which causes build failures when -march=<not loongson2f> is specified by the user.
2012-05-10	Add Makefile.win32 and Makefile.win32.common to EXTRA_DIST	Søren Sandmann Pedersen	1	-0/+4
	https://bugs.freedesktop.org/show_bug.cgi?id=46905
2012-05-09	.gitignore: add demos/checkerboard and demos/quad2quad	Matt Turner	1	-0/+2

2012-04-27	mmx: Use wpackhus in src_x888_0565 on iwMMXt	Matt Turner	1	-1/+5
	iwMMXt has an unsigned saturation pack instruction, while MMX/EXT and Loongson don't. ARM/iwMMXt: src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s) src_8888_0565 = L1: 117.91 L2: 83.05 M: 41.52 ( 75.58%) HT: 37.63 VT: 35.40 R: 29.37 RT: 19.39 ( 134Kops/s)
2012-04-27	mmx: add src_8888_0565	Matt Turner	2	-0/+96
	Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf The technique uses the packssdw instruction, which uses signed saturation. This works in their example because they pack 888 to 555, leaving the high bit as zero. For packing to 565, it is unsuitable, so we replace it with an or+shuffle. Loongson: src_8888_0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT: 27.67 R: 26.11 RT: 15.06 ( 135Kops/s) src_8888_0565 = L1: 122.10 L2: 117.53 M: 37.97 ( 78.58%) HT: 33.14 VT: 30.09 R: 29.01 RT: 15.76 ( 139Kops/s) ARM/iwMMXt: src_8888_0565 = L1: 67.88 L2: 56.61 M: 31.20 ( 56.74%) HT: 29.22 VT: 27.01 R: 25.39 RT: 19.29 ( 130Kops/s) src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s)
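The saturation problem mentioned in this message can be shown with a small scalar sketch. It gives a plain shift-and-mask 8888 to 565 conversion and a model of signed 16-bit saturation as a pack instruction like packssdw applies it. The function names are hypothetical, and this is not the pmadd-based vector code the commit adds.

```c
#include <stdint.h>

/* Scalar 8888 -> r5g6b5 conversion: keep the top 5/6/5 bits of each channel. */
static uint16_t
sketch_convert_8888_to_0565 (uint32_t s)
{
    return (uint16_t) (((s >> 3) & 0x001f) |
                       ((s >> 5) & 0x07e0) |
                       ((s >> 8) & 0xf800));
}

/* Model of signed saturation from 32 bits down to 16 bits. */
static int16_t
sketch_signed_saturate_16 (int32_t v)
{
    if (v > 32767)
        return 32767;
    if (v < -32768)
        return -32768;
    return (int16_t) v;
}
```

A 555 pixel never sets bit 15, so it passes through signed saturation unchanged (0x7c00 stays 0x7c00). A 565 pixel with the top red bit set, such as 0xf800, is a positive 32-bit value above 32767 and would be clamped to 0x7fff, corrupting all three channels, which is why a different packing step is needed for 565.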
2012-04-27	mmx: add x8f8g8b8 fetcher	Matt Turner	1	-0/+42
	Loongson: add_x888_x888 = L1: 29.36 L2: 27.81 M: 14.05 ( 38.74%) HT: 12.45 VT: 11.78 R: 11.52 RT: 7.23 ( 75Kops/s) add_x888_x888 = L1: 36.06 L2: 34.55 M: 14.81 ( 41.03%) HT: 14.01 VT: 13.41 R: 13.06 RT: 9.06 ( 90Kops/s) src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s) src_x888_8_x888 = L1: 25.43 L2: 23.51 M: 14.12 ( 44.00%) HT: 13.14 VT: 12.50 R: 11.86 RT: 7.49 ( 76Kops/s) over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s) over_x888_8_0565 = L1: 11.41 L2: 10.97 M: 8.07 ( 22.36%) HT: 7.42 VT: 7.18 R: 6.92 RT: 4.62 ( 52Kops/s) ARM/iwMMXt: add_x888_x888 = L1: 22.10 L2: 18.93 M: 13.48 ( 32.29%) HT: 11.32 VT: 10.64 R: 10.36 RT: 6.51 ( 61Kops/s) add_x888_x888 = L1: 24.26 L2: 20.83 M: 14.52 ( 35.64%) HT: 12.66 VT: 12.98 R: 11.34 RT: 7.69 ( 72Kops/s) src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s) src_x888_8_x888 = L1: 21.23 L2: 19.60 M: 15.41 ( 42.55%) HT: 12.66 VT: 13.30 R: 11.55 RT: 7.32 ( 67Kops/s) over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s) over_x888_8_0565 = L1: 8.35 L2: 7.85 M: 6.68 ( 16.40%) HT: 6.12 VT: 5.97 R: 5.78 RT: 4.03 ( 43Kops/s)
2012-04-27	mmx: add a8 fetcher	Matt Turner	2	-0/+68
	oprofile of xfce4-terminal-a1 210535 9.0407 libpixman-1.so.0.25.3 fetch_scanline_a8 144802 6.0054 libpixman-1.so.0.25.3 mmx_fetch_a8 Loongson: add_8_8_8 = L1: 17.98 L2: 17.28 M: 14.28 ( 19.79%) HT: 11.11 VT: 10.38 R: 9.97 RT: 5.14 ( 55Kops/s) add_8_8_8 = L1: 20.44 L2: 19.65 M: 15.62 ( 21.53%) HT: 12.86 VT: 11.98 R: 11.32 RT: 6.13 ( 64Kops/s) src_8888_8_0565 = L1: 19.97 L2: 18.59 M: 13.42 ( 32.55%) HT: 11.46 VT: 10.78 R: 10.33 RT: 5.87 ( 61Kops/s) src_8888_8_0565 = L1: 21.16 L2: 19.68 M: 13.94 ( 33.64%) HT: 12.31 VT: 11.52 R: 11.02 RT: 6.54 ( 68Kops/s) src_x888_8_x888 = L1: 20.54 L2: 18.88 M: 13.07 ( 40.74%) HT: 11.05 VT: 10.36 R: 10.02 RT: 5.68 ( 60Kops/s) src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s) over_x888_8_0565 = L1: 10.32 L2: 9.85 M: 7.63 ( 21.13%) HT: 6.56 VT: 6.30 R: 6.12 RT: 3.80 ( 43Kops/s) over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s) ARM/iwMMXt: add_8_8_8 = L1: 13.10 L2: 11.67 M: 10.74 ( 13.46%) HT: 8.62 VT: 8.15 R: 7.94 RT: 4.39 ( 44Kops/s) add_8_8_8 = L1: 13.81 L2: 12.79 M: 11.63 ( 13.93%) HT: 9.33 VT: 9.20 R: 9.04 RT: 5.43 ( 52Kops/s) src_8888_8_0565 = L1: 16.62 L2: 15.07 M: 12.52 ( 27.46%) HT: 10.07 VT: 10.17 R: 9.95 RT: 5.64 ( 54Kops/s) src_8888_8_0565 = L1: 16.84 L2: 16.11 M: 13.22 ( 27.71%) HT: 11.74 VT: 10.90 R: 10.80 RT: 6.66 ( 62Kops/s) src_x888_8_x888 = L1: 17.49 L2: 16.22 M: 13.73 ( 38.73%) HT: 10.10 VT: 10.33 R: 9.55 RT: 5.21 ( 52Kops/s) src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s) over_x888_8_0565 = L1: 7.57 L2: 7.29 M: 6.37 ( 15.97%) HT: 5.53 VT: 5.33 R: 5.21 RT: 3.22 ( 35Kops/s) over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s)
2012-04-27	mmx: add r5g6b5 fetcher	Matt Turner	1	-0/+100
	Loongson: add_0565_0565 = L1: 12.73 L2: 12.26 M: 10.05 ( 13.87%) HT: 8.77 VT: 8.50 R: 8.25 RT: 5.28 ( 58Kops/s) add_0565_0565 = L1: 14.04 L2: 13.63 M: 10.96 ( 15.19%) HT: 9.73 VT: 9.43 R: 9.11 RT: 5.93 ( 64Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 10.36 L2: 10.03 M: 9.04 ( 10.88%) HT: 3.11 VT: 7.16 R: 7.72 RT: 5.12 ( 51Kops/s) add_0565_0565 = L1: 10.84 L2: 10.20 M: 9.15 ( 11.46%) HT: 7.60 VT: 7.82 R: 7.70 RT: 5.41 ( 53Kops/s)
2012-04-27	mmx: Use Loongson pextrh instruction in expand565	Matt Turner	2	-0/+15
	Same story as pinsrh in the previous commit. text data bss dec hex filename 25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o 25072 1952 0 27024 6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o -dsll: 95 +dsll: 70 -dsrl: 135 +dsrl: 105 -ldc1: 462 +ldc1: 445 -lw: 721 +lw: 700 +pextrh: 30
2012-04-27	mmx: Use Loongson pinsrh instruction in pack_565	Matt Turner	2	-0/+25
	The pinsrh instruction is analogous to MMX EXT's pinsrw, except like other Loongson vector instructions it cannot access the general purpose registers. In the cases of other Loongson vector instructions, this is a headache, but it is actually a good thing here. Since the instruction is different from MMX, I've named the intrinsic loongson_insert_pi16. text data bss dec hex filename 25976 1952 0 27928 6d18 .libs/libpixman_loongson_mmi_la-pixman-mmx.o 25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o -and: 181 +and: 147 -dsll: 143 +dsll: 95 -dsrl: 87 +dsrl: 135 -ldc1: 523 +ldc1: 462 -lw: 767 +lw: 721 +pinsrh: 35
2012-04-27	mmx: don't pack and unpack src unnecessarily	Matt Turner	1	-47/+35
	The combine function was store8888'ing the result, and all consumers were immediately load8888'ing it, causing lots of unnecessary pack and unpack instructions. It's a very straightforward conversion, except for mmx_combine_over_u and mmx_combine_saturate_u. mmx_combine_over_u was testing the integer result to skip pixels, so we use the is_* functions to test the __m64 data directly without loading it into an integer register. For mmx_combine_saturate_u there's not a lot we can do, since it uses DIV_UN8.
2012-04-27	mmx: introduce is_equal, is_opaque, and is_zero functions	Matt Turner	1	-0/+41
	To be used by the next commit.
2012-04-27	mmx: simplify srcsrcsrcsrc calculation in over_n_8_0565	Matt Turner	1	-7/+3

2012-04-27	mmx: remove unnecessary uint64_t<->__m64 conversions	Matt Turner	1	-3/+1
	Loongson: add_8888_8888 = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT: 22.42 R: 20.74 RT: 13.26 ( 131Kops/s) add_8888_8888 = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT: 24.99 R: 24.61 RT: 14.49 ( 141Kops/s)