Age | Commit message | Author | Files | Lines |
|
Test the gradients with various transformations, and test cases where
the gradients are specified with two identical points.
|
|
This function enables floating point traps if possible.
|
|
Also move the ARRAY_LENGTH macro into utils.h so it can be used elsewhere.
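
As an illustration, a macro of this kind is commonly written as below. This is a sketch of the usual idiom; the exact definition in utils.h may differ, and `example_sizes` and `count_example_sizes` are hypothetical names used only here.

```c
#include <stddef.h>

/* Number of elements in a true array. This does not work on pointers,
 * because sizeof would then give the pointer size. */
#define ARRAY_LENGTH(A) (sizeof (A) / sizeof ((A)[0]))

/* Hypothetical table, for illustration only. */
static const int example_sizes[] = { 1, 2, 4, 8, 16 };

static size_t
count_example_sizes (void)
{
    return ARRAY_LENGTH (example_sizes);
}
```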
|
|
Explain how errors are introduced in the computation performed for
radial gradients.
|
|
Change radial gradient computations and definition to reflect the
radial gradients in PDF specifications (see section 8.7.4.5.4,
Type 3 (Radial) Shadings of the PDF Reference Manual).
Instead of having a valid interpolation parameter value for every
point of the plane, define it only for points within the area
covered by the family of circles generated by interpolating or
extrapolating the start and end circles.
Points outside this area are now transparent black (rgba 0 0 0 0).
Points within this area have the color associated with the maximum
value of the interpolation parameter in that point (if multiple
solutions exist within the range specified by the extend mode).
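
The definition above can be sketched as an equation. The symbols are assumptions introduced here, not taken from the patch: c_1, r_1 and c_2, r_2 are the centers and radii of the start and end circles, and p is the point being shaded.

```latex
% Family of circles obtained by interpolating or extrapolating with parameter t
c(t) = c_1 + t\,(c_2 - c_1), \qquad r(t) = r_1 + t\,(r_2 - r_1)

% p is covered when it lies on some circle of the family with non-negative radius
\lVert p - c(t) \rVert^2 = r(t)^2, \qquad r(t) \ge 0

% With p_d = p - c_1, c_d = c_2 - c_1, d_r = r_2 - r_1 this is a quadratic in t
A\,t^2 - 2B\,t + C = 0, \quad
A = c_d \cdot c_d - d_r^2, \quad
B = p_d \cdot c_d + r_1 d_r, \quad
C = p_d \cdot p_d - r_1^2

% Candidate solutions (for A \ne 0)
t = \frac{B \pm \sqrt{B^2 - AC}}{A}
```

The color is taken at the largest t that satisfies r(t) >= 0 and lies within the range allowed by the extend mode. If no such t exists, the point is transparent black.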
|
|
The images are being created with non-NULL data, so we have to free it
ourselves. This is important because the Cygwin tinderbox is running
out of memory and produces this:
mmap failed on 20000 1507328
mmap failed on 40000 1507328
mmap failed on 20000 1507328
mmap failed on 40000 1507328
mmap failed on 40000 1507328
mmap failed on 40000 1507328
http://tinderbox.x.org/builds/2010-10-05-0014/logs/pixman/#check
|
|
We already exit early for DST, but for the HSL operators with
component alpha, we crash at the moment. Fix that by adding a dummy
combine_dst() function.
|
|
Specifically, add transparent black and superluminescent white with
alpha = 0.
|
|
Each test uses the test number as the random number seed; if it
didn't, all the threads would run the same tests since they would all
start from the same seed.
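
A minimal sketch of the idea, assuming a simple linear congruential generator. The type and function names are hypothetical, and the constants are a well-known textbook choice rather than the ones used by the test suite.

```c
#include <stdint.h>

/* Hypothetical per-test generator state, so threads do not share a seed. */
typedef struct { uint32_t state; } test_rng_t;

static void
test_rng_seed (test_rng_t *rng, uint32_t seed)
{
    rng->state = seed;
}

/* One linear congruential step; unsigned overflow wraps modulo 2^32. */
static uint32_t
test_rng_next (test_rng_t *rng)
{
    rng->state = rng->state * 1664525u + 1013904223u;
    return rng->state;
}

/* Each test seeds its own generator with its test number. */
static uint32_t
first_value_for_test (int testnum)
{
    test_rng_t rng;

    test_rng_seed (&rng, (uint32_t) testnum);
    return test_rng_next (&rng);
}
```

Because the seed depends only on the test number, a given test produces the same sequence no matter which thread runs it.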
|
|
Previously this test would try to exhaustively test all combinations
of formats and operators, which meant that it would take hours to run.
Instead, generate images randomly and test compositing those.
Cc: chris@chris-wilson.co.uk
|
|
Previously, this function would evaluate the error under the
assumption that the format was 565 or wider. This patch changes it to
take the actual format into account.
With that fixed, we can turn on testing for the rest of the formats.
Cc: chris@chris-wilson.co.uk
|
|
This function was using the number of bits in a channel as if it were
a mask, which led to many spurious errors. With that fixed, we can
turn on testing for all formats where all channels have 5 or more
bits.
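
To illustrate the distinction, a bit count has to be converted to a mask before it is used as one. The helper below is a hypothetical sketch, not the function touched by this patch.

```c
#include <stdint.h>

/* Converts a channel width in bits to the mask covering that channel.
 * For example, a width of 5 gives 0x1f, whereas using 5 itself as a
 * mask would keep only bits 0 and 2. */
static uint32_t
channel_mask (int n_bits)
{
    if (n_bits >= 32)
        return 0xffffffffu;   /* avoid an undefined shift by 32 */

    return (UINT32_C (1) << n_bits) - 1;
}
```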
Cc: chris@chris-wilson.co.uk
|
|
The first broken optimization is that it checks "a != 0x00" where it
should check "s != 0x00". The other is that it skips the computation
when alpha is 0xff. That is wrong because in the formula:
min (1, (1 - Aa)/Ab)
the render specification states that if Ab is 0, the quotient is
defined to be positive infinity. That is the case even if (1 - Aa) is 0.
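
A floating point sketch of the rule described above. The function name is hypothetical, and the real code works on integer pixel components rather than doubles.

```c
/* Evaluates min (1, (1 - Aa) / Ab), with the quotient defined as
 * positive infinity whenever Ab is 0. */
static double
clamped_quotient (double aa, double ab)
{
    double q;

    /* Ab == 0: the quotient is +infinity by definition, even when
     * (1 - Aa) is also 0, so the minimum is 1. */
    if (ab == 0.0)
        return 1.0;

    q = (1.0 - aa) / ab;
    return q < 1.0 ? q : 1.0;
}
```

Note that Aa == 1 alone does not allow skipping the computation: with Ab == 0 the result is 1, not 0.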
|
|
After fast path cache introduction, the overhead of having this fallback is
insignificant. On the other hand, some of the ARM assembly optimizations (for
example nearest neighbor scaling) do not need NEON.
|
|
Benchmark from Intel Core i7 860:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=1335.29 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=1550.96 MPix/s
== performance of nonscaled src_0565_0565 operation as a reference ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=2401.31 MPix/s
Benchmark from ARM Cortex-A8:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=81.79 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s
== performance of nonscaled src_0565_0565 operation as a reference ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=197.44 MPix/s
|
|
== before ==
outrev_8_0565 = L1: 22.91 L2: 22.40 M: 18.75 ( 10.47%)
HT: 12.62 VT: 12.22 R: 11.32 RT: 5.30 ( 58Kops/s)
== after ==
outrev_8_0565 = L1: 176.27 L2: 151.70 M:108.79 ( 60.81%)
HT: 50.43 VT: 37.16 R: 32.26 RT: 9.62 ( 97Kops/s)
|
|
== before ==
add_0565_8_0565 = L1: 14.05 L2: 14.03 M: 11.57 ( 12.94%)
HT: 8.31 VT: 8.10 R: 7.47 RT: 3.64 ( 42Kops/s)
== after ==
add_0565_8_0565 = L1: 123.36 L2: 94.70 M: 74.36 ( 83.15%)
HT: 31.17 VT: 23.97 R: 21.06 RT: 6.42 ( 70Kops/s)
|
|
Prefetch provides up to 40-50% better performance when working
with large images and/or when having lots of L2 cache misses
on ARM Cortex-A8 @ 720MHz:
== before ==
over_n_8888 = L1: 225.83 L2: 181.02 M: 55.57 ( 41.41%)
HT: 38.96 VT: 36.92 R: 32.84 RT: 14.15 ( 123Kops/s)
over_n_0565 = L1: 153.91 L2: 149.69 M: 83.17 ( 30.95%)
HT: 50.41 VT: 49.15 R: 40.56 RT: 15.45 ( 131Kops/s)
== after ==
over_n_8888 = L1: 222.39 L2: 170.95 M: 76.86 ( 57.27%)
HT: 58.80 VT: 53.03 R: 45.51 RT: 14.13 ( 124Kops/s)
over_n_0565 = L1: 151.87 L2: 149.54 M:125.63 ( 46.80%)
HT: 67.85 VT: 57.54 R: 50.21 RT: 15.32 ( 130Kops/s)
|
|
These minor changes should fix a large number of
macro-declaration-related "syntax error: empty declaration" warnings
which are seen while compiling the code with the Solaris Studio
compiler.
|
|
This was supposedly an optimization, but it has pathological cases
where it definitely isn't. For example a 1 x n image will cause it to
have terrible memory access patterns and to generate a ton of modulus
operations.
Since no one has ever measured whether it actually is an improvement,
and since it is doing the repeating at the wrong stage in the
pipeline, and since with the previous commit it can't be triggered
anymore because we now require SAMPLES_COVER_CLIP for regular fast
paths, just delete it.
|
|
The standard fast paths deal with two kinds of images: solids and
bits. These two image types require different flags, but
PIXMAN_STD_FAST_PATH uses the same ones for both.
This patch makes it so that solid images just get the standard flags,
while bits images must be untransformed and contain the destination clip
within the sample grid.
This means that the old FAST_PATH_COVERS_CLIP flag is now not used
anymore, so it can be deleted.
|
|
This patch removes an unnecessary typecast of MAP_FAILED,
replaces an erroneous free() by the correct munmap() in the
error path for a failing mprotect(), and, finally, removes
redundant calls to mprotect(), which are unnecessary because
munmap() doesn't require any specific memory protection.
|
|
|
|
This inconsistent naming somehow survived the refactoring from a while
back.
|
|
Performance decreases with cache prefetch, especially on
Atom, so remove this code. The experiment follows.
old: 0.19.5-with-cache-prefetch
new: 0.19.5-without-cache-prefetch
CPU: Intel Atom N270@1.6GHz
OS: MeeGo (32 bits)
Speedups
========
image-rgba poppler-0 17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%): 1.16x speedup
image-rgba ocitysmap-0 9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%): 1.09x speedup
image-rgba xfce4-terminal-a1-0 18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%): 1.08x speedup
image-rgba gnome-terminal-vim-0 25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%): 1.07x speedup
image-rgba firefox-talos-gfx-0 57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%): 1.06x speedup
image-rgba firefox-planet-gnome-0 102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%): 1.06x speedup
image-rgba swfdec-giant-steps-0 12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%): 1.06x speedup
CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz
OS: Ubuntu 10.04 (64bits)
Speedups
========
image-rgba ocitysmap-0 2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%): 1.16x speedup
image-rgba swfdec-giant-steps-0 1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%): 1.05x speedup
Signed-off-by: Liu Xinyun <xinyun.liu@intel.com>
Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>
|
|
Not all systems are regular Unices, so let's be careful with the
mmap()-related stuff, which might be unavailable. This patch makes
sure that mmap() and friends are used only when the <sys/mman.h>
header is found.
|
|
Revert this accidentally committed patch.
This reverts commit 19ea0e16b958e5abe491365c203293ab372f3586.
|
|
This hopefully fixes the build failure on OS X.
|
|
OK, here is the work to remove all cache prefetch. Please review it. Thanks.
On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote:
> Liu Xinyun <xinyun.liu@intel.com> writes:
>
> > This patch is to add a new configuration option: enable-cache-prefetch,
> > which is default yes.
> >
> > Here is a link which talks on cache issue.
> > http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
> >
> > When disable it on Atom CPU(configured with --enable-cache-prefetch=no),
> > it will have a little performance gain. Here is the patch.
>
> I think the cache prefetch code should just be deleted outright. No
> benchmarks that I'm aware of show it to be an improvement.
>
>
> Thanks,
> Soren
From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001
From: Liu Xinyun <xinyun.liu@intel.com>
Date: Wed, 22 Sep 2010 00:11:56 +0800
Subject: [PATCH] remove cache prefetch
|
|
|
|
|
|
If the extents of the composite region are broken such that x2 <= x1
or y2 <= y1, then we need to zero the extents before returning so that
the region won't be completely broken when calling
pixman_region32_fini().
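
A minimal sketch of the check, using a hypothetical box type and helper name rather than the actual pixman structures:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a region's extents box. */
typedef struct { int32_t x1, y1, x2, y2; } box_t;

/* Returns true when the extents describe a non-empty box. Otherwise the
 * extents are zeroed so that later cleanup sees a well-formed empty box. */
static bool
validate_extents (box_t *e)
{
    if (e->x2 <= e->x1 || e->y2 <= e->y1)
    {
        e->x1 = e->y1 = e->x2 = e->y2 = 0;
        return false;
    }

    return true;
}
```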
|
|
This test is a modified version of Siarhei's compositor throughput
benchmark. It's expanded with explicit reporting of memory bandwidth
consumption for the M-test, and with an additional 8x8-random test
intended to determine peak ops/sec capability. There are also quite a
lot more operations tested for.
|
|
This patch adds a noinline macro, which expands to compiler-dependent
keywords that tell the compiler to never inline a function.
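
Such a macro is typically defined along these lines. This is a sketch covering only GCC-compatible compilers and MSVC; the set of compilers handled by the actual patch is not stated here, and `add_one` is a hypothetical example function.

```c
/* Expand to the compiler-specific "never inline" keyword, or to
 * nothing when the compiler is not recognized. */
#if defined(__GNUC__)
#  define noinline __attribute__((__noinline__))
#elif defined(_MSC_VER)
#  define noinline __declspec(noinline)
#else
#  define noinline
#endif

/* Hypothetical function that must keep its own stack frame. */
static noinline int
add_one (int x)
{
    return x + 1;
}
```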
|
|
Impending benchmark code will need a function to get current time
in seconds, and this patch introduces such a routine. We try to use
the POSIX gettimeofday() function when available, and fall back to
clock() when not.
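
The fallback can be sketched as below. HAVE_GETTIMEOFDAY is assumed to be defined by the build system when the function is available, and the function name `gettime` is an assumption made for this example.

```c
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Returns the current time in seconds as a double. */
static double
gettime (void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;

    gettimeofday (&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
#else
    /* Fallback: clock() measures processor time, not wall-clock time. */
    return (double) clock () / (double) CLOCKS_PER_SEC;
#endif
}
```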
|
|
The aligned_malloc() routine will be used in more than one test utility.
At least, a low-level blitter benchmark needs it. Therefore, let's make
this function a part of common test utilities code.
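
One possible shape for such a helper, assuming posix_memalign() is available. The signature is a guess made for illustration and may not match the one in the test utilities.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates size bytes aligned to align bytes, or returns NULL.
 * align must be a power of two and a multiple of sizeof (void *).
 * The result is released with free(). */
static void *
aligned_malloc (size_t align, size_t size)
{
    void *result = NULL;

    if (posix_memalign (&result, align, size) != 0)
        return NULL;

    return result;
}
```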
|