Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
http://lists.cairographics.org/archives/cairo/2010-August/020655.html
|
|
Patch from
http://lists.cairographics.org/archives/cairo/2010-August/020490.html
|
|
This avoids an extra indirection and may later on let the functions
cache some more information between calls.
Also, this lets us get rid of get_scanline_32/64 in pixman_image_t.
|
|
Now that we have the full image flags available, the SSE2 and MMX
iterators can simply check against SAMPLES_COVER_CLIP_NEAREST (which
is computed in pixman_image_composite32()) instead of comparing all
the x/y/width/height parameters.
|
|
When pixman_image_composite32() is called some flags are computed that
indicate various things about the composite operation that can't be
deduced from the image flags themselves. These additional flags are
not currently available to iterators. All they can do is read the
image flags in image->common.flags.
Fix that by passing the info->{src, mask, dest}_flags on to the
iterator initialization and store the flags in the iter struct as
"image_flags". At the same time rename the *iterator* flags variable
to "iter_flags" to avoid confusion.
|
|
Fixes spurious test failures on x86-32.
|
|
Loongson:
over_reverse_n_8888 = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95 VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s)
over_reverse_n_8888 = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66 VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s)
image poppler 34.106 35.500 1.48% 6/6
image poppler 29.598 30.835 1.70% 6/6
ARM/iwMMXt:
over_reverse_n_8888 = L1: 15.63 L2: 14.33 M: 10.83 ( 27.55%) HT: 9.78 VT: 9.91 R: 9.49 RT: 6.96 ( 69Kops/s)
over_reverse_n_8888 = L1: 22.79 L2: 19.40 M: 13.76 ( 34.19%) HT: 11.66 VT: 11.86 R: 11.17 RT: 7.85 ( 75Kops/s)
image poppler 38.040 38.606 1.10% 6/6
image poppler 31.686 32.278 0.80% 5/6
|
|
Loongson:
add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT: 22.84 R: 18.96 RT: 9.79 ( 104Kops/s)
ARM/iwMMXt:
add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)
add_0565_0565 = L1: 31.14 L2: 28.87 M: 22.46 ( 28.60%) HT: 18.61 VT: 17.04 R: 15.21 RT: 9.35 ( 90Kops/s)
|
|
I'll need this code for header and tail alignment loops in MMX, so I
might as well implement a fast path here.
|
|
Loongson:
over_n_0565 = L1: 38.57 L2: 38.88 M: 30.01 ( 20.97%) HT: 23.60 VT: 23.88 R: 21.95 RT: 11.65 ( 113Kops/s)
over_n_0565 = L1: 56.28 L2: 55.90 M: 34.20 ( 23.82%) HT: 25.66 VT: 26.60 R: 23.78 RT: 11.80 ( 115Kops/s)
over_8888_0565 = L1: 35.89 L2: 36.11 M: 21.56 ( 45.47%) HT: 18.33 VT: 17.90 R: 16.27 RT: 9.07 ( 98Kops/s)
over_8888_0565 = L1: 40.91 L2: 41.06 M: 23.13 ( 48.46%) HT: 19.24 VT: 18.71 R: 16.82 RT: 9.18 ( 99Kops/s)
over_n_8_0565 = L1: 28.92 L2: 29.12 M: 21.42 ( 30.00%) HT: 18.37 VT: 17.75 R: 16.15 RT: 8.79 ( 91Kops/s)
over_n_8_0565 = L1: 32.32 L2: 32.13 M: 22.44 ( 31.27%) HT: 19.15 VT: 18.66 R: 16.62 RT: 8.86 ( 92Kops/s)
over_n_8888_0565_ca = L1: 29.33 L2: 29.22 M: 18.99 ( 66.69%) HT: 16.69 VT: 16.22 R: 14.63 RT: 8.42 ( 88Kops/s)
over_n_8888_0565_ca = L1: 34.97 L2: 34.14 M: 20.32 ( 71.73%) HT: 17.67 VT: 17.19 R: 15.23 RT: 8.50 ( 89Kops/s)
ARM/iwMMXt:
over_n_0565 = L1: 29.70 L2: 30.53 M: 24.47 ( 14.84%) HT: 22.28 VT: 21.72 R: 21.13 RT: 12.58 ( 105Kops/s)
over_n_0565 = L1: 41.42 L2: 40.00 M: 30.95 ( 19.13%) HT: 27.06 VT: 27.28 R: 23.43 RT: 14.44 ( 114Kops/s)
over_8888_0565 = L1: 12.73 L2: 11.53 M: 9.07 ( 16.47%) HT: 9.00 VT: 9.25 R: 8.44 RT: 7.27 ( 76Kops/s)
over_8888_0565 = L1: 23.72 L2: 21.76 M: 15.89 ( 29.51%) HT: 14.36 VT: 14.05 R: 12.44 RT: 8.94 ( 86Kops/s)
over_n_8_0565 = L1: 6.80 L2: 7.15 M: 6.37 ( 7.90%) HT: 6.58 VT: 6.24 R: 6.49 RT: 5.94 ( 59Kops/s)
over_n_8_0565 = L1: 12.06 L2: 11.02 M: 10.16 ( 13.43%) HT: 9.57 VT: 8.49 R: 9.10 RT: 6.86 ( 69Kops/s)
over_n_8888_0565_ca = L1: 7.62 L2: 7.01 M: 6.27 ( 20.52%) HT: 6.00 VT: 6.07 R: 5.68 RT: 5.53 ( 57Kops/s)
over_n_8888_0565_ca = L1: 13.54 L2: 11.96 M: 9.76 ( 30.66%) HT: 9.72 VT: 8.45 R: 9.37 RT: 6.85 ( 67Kops/s)
|
|
Loongson:
add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT: 9.74 R: 9.39 RT: 6.05 ( 67Kops/s)
add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
ARM/iwMMXt:
add_0565_0565 = L1: 11.12 L2: 10.40 M: 8.82 ( 10.65%) HT: 7.98 VT: 7.41 R: 7.57 RT: 5.21 ( 54Kops/s)
add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)
|
|
|
|
|
|
Only up to three SSE intrinsics supported in function declaration.
|
|
There is a couple of places where the test suite uses the
pixman_image_fill_* functions to initialize images. These functions
can fail, and will do so if the "fast" implementation is disabled.
So to make sure the test suite passes even using
PIXMAN_DISABLE="fast", use pixman_image_composite32() with a solid
image instead of pixman_image_fill_*.
|
|
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
cairo-perf-trace:
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.25.3
[ 0] image firefox-fishtank 2289.180 2290.567 0.05% 5/6
Optimized:
cairo-perf-trace:
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.25.3
[ 0] image firefox-fishtank 1700.925 1708.314 0.22% 5/6
|
|
In main loop (unrolled by factor 2), instead of negating multiplied
mask values by srca, values of srca was negated, and passed as alpha
argument for
UN8x4_MUL_UN8x4_ADD_UN8x4 macro.
Instead of:
ma = ~ma;
UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
Code was doing this:
ma = ~srca;
UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
Key is in substituting registers s0/s1 (containing srca value), with
t0/t1 containing mask values multiplied by srca. Register usage is
also improved (less registers are saved on stack, for
over_n_8888_8888_ca routine).
The bug was introduced in commit d2ee5631 and revealed by composite test.
|
|
Pointed out by Cyril Brulebois.
|
|
If not compiling with -march=iwmmxt, the configure test will still pass,
thinking that the __builtin_arm_* intrinsic is a function instead of
generating a single instruction. Since no linking is done, the configure
test doesn't catch this, and we get linking errors in the build.
|
|
|
|
Note that 0.25.4 was a botched release that doesn't have a tag and
doesn't correspond to any commit ID. It was however uploaded and
announced, so I'll just use the 0.25.6 version number.
|
|
To get 'make distcheck' to pass.
|
|
Make it match with the other fast paths, and the PIXMAN_DISABLE value is
already loongson-mmi.
|
|
When building out-of-tree, gcc wasn't able to find loongson-mmintrin.h
to compile the test program. Add -I$srcdir to CFLAGS to point gcc to it.
|
|
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
lowlevel-blt-bench:
over_n_8_8888 = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64 VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s)
over_n_8_0565 = L1: 7.40 L2: 7.23 M: 6.78 ( 17.94%) HT: 6.23 VT: 6.17 R: 6.14 RT: 4.62 ( 37Kops/s)
Optimized:
lowlevel-blt-bench:
over_n_8_8888 = L1: 27.25 L2: 26.24 M: 18.15 ( 72.12%) HT: 14.52 VT: 14.31 R: 13.83 RT: 7.57 ( 48Kops/s)
over_n_8_0565 = L1: 18.91 L2: 17.59 M: 15.06 ( 39.90%) HT: 12.18 VT: 11.98 R: 11.83 RT: 6.80 ( 46Kops/s)
|
|
The pack_4x565 makes use of the pack_4xpacked565 function which uses pmadd.
Some of the speed up is probably attributable to removing the artificial
serialization imposed by the
vdest = pack_565 (..., vdest, 0);
vdest = pack_565 (..., vdest, 1);
...
pattern.
Loongson:
over_n_0565 = L1: 16.44 L2: 16.42 M: 13.83 ( 9.85%) HT: 12.83 VT: 12.61 R: 12.34 RT: 8.90 ( 93Kops/s)
over_n_0565 = L1: 42.48 L2: 42.53 M: 29.83 ( 21.20%) HT: 23.39 VT: 23.72 R: 21.80 RT: 11.60 ( 113Kops/s)
over_8888_0565 = L1: 15.61 L2: 15.42 M: 12.11 ( 25.79%) HT: 11.07 VT: 10.70 R: 10.37 RT: 7.25 ( 82Kops/s)
over_8888_0565 = L1: 35.01 L2: 35.20 M: 21.42 ( 45.57%) HT: 18.12 VT: 17.61 R: 16.09 RT: 9.01 ( 97Kops/s)
over_n_8_0565 = L1: 15.17 L2: 14.94 M: 12.57 ( 17.86%) HT: 11.96 VT: 11.52 R: 10.79 RT: 7.31 ( 79Kops/s)
over_n_8_0565 = L1: 29.83 L2: 29.79 M: 21.85 ( 30.94%) HT: 18.82 VT: 18.25 R: 16.15 RT: 8.72 ( 91Kops/s)
over_n_8888_0565_ca = L1: 15.25 L2: 15.02 M: 11.64 ( 41.39%) HT: 11.08 VT: 10.72 R: 10.02 RT: 7.00 ( 77Kops/s)
over_n_8888_0565_ca = L1: 30.12 L2: 29.99 M: 19.47 ( 68.99%) HT: 17.05 VT: 16.55 R: 14.67 RT: 8.38 ( 88Kops/s)
ARM/iwMMXt:
over_n_0565 = L1: 19.29 L2: 19.88 M: 17.38 ( 10.54%) HT: 15.53 VT: 16.11 R: 13.69 RT: 11.00 ( 96Kops/s)
over_n_0565 = L1: 36.02 L2: 34.85 M: 28.04 ( 16.97%) HT: 22.12 VT: 24.21 R: 22.36 RT: 12.22 ( 103Kops/s)
over_8888_0565 = L1: 18.38 L2: 16.59 M: 12.34 ( 22.29%) HT: 11.67 VT: 11.71 R: 11.02 RT: 6.89 ( 72Kops/s)
over_8888_0565 = L1: 24.96 L2: 22.17 M: 15.11 ( 26.81%) HT: 14.14 VT: 13.71 R: 13.18 RT: 8.13 ( 78Kops/s)
over_n_8_0565 = L1: 14.65 L2: 12.44 M: 11.56 ( 14.50%) HT: 10.93 VT: 10.39 R: 10.06 RT: 7.05 ( 70Kops/s)
over_n_8_0565 = L1: 18.37 L2: 14.98 M: 13.97 ( 16.51%) HT: 12.67 VT: 10.35 R: 11.80 RT: 8.14 ( 74Kops/s)
over_n_8888_0565_ca = L1: 14.27 L2: 12.93 M: 10.52 ( 33.23%) HT: 9.70 VT: 9.90 R: 9.31 RT: 6.34 ( 65Kops/s)
over_n_8888_0565_ca = L1: 19.69 L2: 17.58 M: 13.40 ( 42.35%) HT: 11.75 VT: 11.33 R: 11.17 RT: 7.49 ( 73Kops/s)
|
|
Otherwise we'd have -march=loongson2f being overridden by automake's
CFLAGS ordering which causes build failures when -march=<not loongson2f>
is specified by the user.
|
|
https://bugs.freedesktop.org/show_bug.cgi?id=46905
|
|
|
|
iwMMXt which has an unsigned saturation pack instruction, while MMX/EXT
and Loongson don't.
ARM/iwMMXt:
src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s)
src_8888_0565 = L1: 117.91 L2: 83.05 M: 41.52 ( 75.58%) HT: 37.63 VT: 35.40 R: 29.37 RT: 19.39 ( 134Kops/s)
|
|
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
The technique uses the packssdw instruction which uses signed
saturatation. This works in their example because they pack 888 to 555
leaving the high bit as zero. For packing to 565, it is unsuitable, so
we replace it with an or+shuffle.
Loongson:
src_8888_0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT: 27.67 R: 26.11 RT: 15.06 ( 135Kops/s)
src_8888_0565 = L1: 122.10 L2: 117.53 M: 37.97 ( 78.58%) HT: 33.14 VT: 30.09 R: 29.01 RT: 15.76 ( 139Kops/s)
ARM/iwMMXt:
src_8888_0565 = L1: 67.88 L2: 56.61 M: 31.20 ( 56.74%) HT: 29.22 VT: 27.01 R: 25.39 RT: 19.29 ( 130Kops/s)
src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s)
|
|
Loongson:
add_x888_x888 = L1: 29.36 L2: 27.81 M: 14.05 ( 38.74%) HT: 12.45 VT: 11.78 R: 11.52 RT: 7.23 ( 75Kops/s)
add_x888_x888 = L1: 36.06 L2: 34.55 M: 14.81 ( 41.03%) HT: 14.01 VT: 13.41 R: 13.06 RT: 9.06 ( 90Kops/s)
src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s)
src_x888_8_x888 = L1: 25.43 L2: 23.51 M: 14.12 ( 44.00%) HT: 13.14 VT: 12.50 R: 11.86 RT: 7.49 ( 76Kops/s)
over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s)
over_x888_8_0565 = L1: 11.41 L2: 10.97 M: 8.07 ( 22.36%) HT: 7.42 VT: 7.18 R: 6.92 RT: 4.62 ( 52Kops/s)
ARM/iwMMXt:
add_x888_x888 = L1: 22.10 L2: 18.93 M: 13.48 ( 32.29%) HT: 11.32 VT: 10.64 R: 10.36 RT: 6.51 ( 61Kops/s)
add_x888_x888 = L1: 24.26 L2: 20.83 M: 14.52 ( 35.64%) HT: 12.66 VT: 12.98 R: 11.34 RT: 7.69 ( 72Kops/s)
src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s)
src_x888_8_x888 = L1: 21.23 L2: 19.60 M: 15.41 ( 42.55%) HT: 12.66 VT: 13.30 R: 11.55 RT: 7.32 ( 67Kops/s)
over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s)
over_x888_8_0565 = L1: 8.35 L2: 7.85 M: 6.68 ( 16.40%) HT: 6.12 VT: 5.97 R: 5.78 RT: 4.03 ( 43Kops/s)
|
|
oprofile of xfce4-terminal-a1
210535 9.0407 libpixman-1.so.0.25.3 fetch_scanline_a8
144802 6.0054 libpixman-1.so.0.25.3 mmx_fetch_a8
Loongson:
add_8_8_8 = L1: 17.98 L2: 17.28 M: 14.28 ( 19.79%) HT: 11.11 VT: 10.38 R: 9.97 RT: 5.14 ( 55Kops/s)
add_8_8_8 = L1: 20.44 L2: 19.65 M: 15.62 ( 21.53%) HT: 12.86 VT: 11.98 R: 11.32 RT: 6.13 ( 64Kops/s)
src_8888_8_0565 = L1: 19.97 L2: 18.59 M: 13.42 ( 32.55%) HT: 11.46 VT: 10.78 R: 10.33 RT: 5.87 ( 61Kops/s)
src_8888_8_0565 = L1: 21.16 L2: 19.68 M: 13.94 ( 33.64%) HT: 12.31 VT: 11.52 R: 11.02 RT: 6.54 ( 68Kops/s)
src_x888_8_x888 = L1: 20.54 L2: 18.88 M: 13.07 ( 40.74%) HT: 11.05 VT: 10.36 R: 10.02 RT: 5.68 ( 60Kops/s)
src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s)
over_x888_8_0565 = L1: 10.32 L2: 9.85 M: 7.63 ( 21.13%) HT: 6.56 VT: 6.30 R: 6.12 RT: 3.80 ( 43Kops/s)
over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s)
ARM/iwMMXt:
add_8_8_8 = L1: 13.10 L2: 11.67 M: 10.74 ( 13.46%) HT: 8.62 VT: 8.15 R: 7.94 RT: 4.39 ( 44Kops/s)
add_8_8_8 = L1: 13.81 L2: 12.79 M: 11.63 ( 13.93%) HT: 9.33 VT: 9.20 R: 9.04 RT: 5.43 ( 52Kops/s)
src_8888_8_0565 = L1: 16.62 L2: 15.07 M: 12.52 ( 27.46%) HT: 10.07 VT: 10.17 R: 9.95 RT: 5.64 ( 54Kops/s)
src_8888_8_0565 = L1: 16.84 L2: 16.11 M: 13.22 ( 27.71%) HT: 11.74 VT: 10.90 R: 10.80 RT: 6.66 ( 62Kops/s)
src_x888_8_x888 = L1: 17.49 L2: 16.22 M: 13.73 ( 38.73%) HT: 10.10 VT: 10.33 R: 9.55 RT: 5.21 ( 52Kops/s)
src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s)
over_x888_8_0565 = L1: 7.57 L2: 7.29 M: 6.37 ( 15.97%) HT: 5.53 VT: 5.33 R: 5.21 RT: 3.22 ( 35Kops/s)
over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s)
|
|
Loongson:
add_0565_0565 = L1: 12.73 L2: 12.26 M: 10.05 ( 13.87%) HT: 8.77 VT: 8.50 R: 8.25 RT: 5.28 ( 58Kops/s)
add_0565_0565 = L1: 14.04 L2: 13.63 M: 10.96 ( 15.19%) HT: 9.73 VT: 9.43 R: 9.11 RT: 5.93 ( 64Kops/s)
ARM/iwMMXt:
add_0565_0565 = L1: 10.36 L2: 10.03 M: 9.04 ( 10.88%) HT: 3.11 VT: 7.16 R: 7.72 RT: 5.12 ( 51Kops/s)
add_0565_0565 = L1: 10.84 L2: 10.20 M: 9.15 ( 11.46%) HT: 7.60 VT: 7.82 R: 7.70 RT: 5.41 ( 53Kops/s)
|
|
Same story as pinsrh in the previous commit.
text data bss dec hex filename
25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
25072 1952 0 27024 6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
-dsll: 95
+dsll: 70
-dsrl: 135
+dsrl: 105
-ldc1: 462
+ldc1: 445
-lw: 721
+lw: 700
+pextrh: 30
|
|
The pinsrh instruction is analogous to MMX EXT's pinsrw, except like
other Loongson vector instructions it cannot access the general purpose
registers. In the cases of other Loongson vector instructions, this is a
headache, but it is actually a good thing here. Since the instruction is
different from MMX, I've named the intrinsic loongson_insert_pi16.
text data bss dec hex filename
25976 1952 0 27928 6d18 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
-and: 181
+and: 147
-dsll: 143
+dsll: 95
-dsrl: 87
+dsrl: 135
-ldc1: 523
+ldc1: 462
-lw: 767
+lw: 721
+pinsrh: 35
|
|
The combine function was store8888'ing the result, and all consumers
were immediately load8888'ing it, causing lots of unnecessary pack and
unpack instructions.
It's a very straight forward conversion, except for mmx_combine_over_u
and mmx_combine_saturate_u. mmx_combine_over_u was testing the integer
result to skip pixels, so we use the is_* functions to test the __m64
data directly without loading it into an integer register.
For mmx_combine_saturate_u there's not a lot we can do, since it uses
DIV_UN8.
|
|
To be used by the next commit.
|
|
|
|
Loongson:
add_8888_8888 = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT: 22.42 R: 20.74 RT: 13.26 ( 131Kops/s)
add_8888_8888 = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT: 24.99 R: 24.61 RT: 14.49 ( 141Kops/s)
|