Age | Commit message (Collapse) | Author | Files | Lines |
|
|
Allows us to increase the minimum required FreeType version
|
|
|
|
oss-fuzz/57874
|
|
utf16ToUtf8 expects a null-terminated string
|
|
This way people building from master can adapt to the new API already
|
|
The condition u[i] < 0x7F was checked twice.
|
|
|
|
Issue reported and patch suggestion by Samad Koita and Aviral Agarwal
Fixes issue #1477
|
|
|
|
According to the PostScript spec, only DSC comments are allowed in the
header.
%%Creator is the header for the software used to generate the PostScript
file, which is pdftops in this case, and not the generator of the PDF
file as such. If the PDF creator is available, I've chosen to keep it as
a substring in the %%Creator field.
Originates in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068307
|
|
|
|
... and rename it to prependUnicodeByteOrderMark.
Now all unicode code has moved from GooString.h to UTF.h.
|
|
... and rename it to hasUnicodeByteOrderMarkLE.
This allows replacing GooString with std::string in a few places
(in a future commit).
|
|
... and rename it to hasUnicodeByteOrderMark.
This allows replacing GooString with std::string in a few places
(in a future commit).
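A minimal sketch of what byte-order-mark helpers of this kind might look like when written against std::string rather than GooString. Only the three function names come from the commit messages above; the signatures, the bodies, and the assumption that the non-LE variant means the big-endian UTF-16 mark (bytes 0xFE 0xFF) are illustrative and may differ from the actual code in UTF.h.

```cpp
#include <string>
#include <string_view>

// Hypothetical signature: true if s starts with the UTF-16 big-endian BOM (0xFE 0xFF).
inline bool hasUnicodeByteOrderMark(std::string_view s)
{
    return s.size() >= 2 && static_cast<unsigned char>(s[0]) == 0xFE && static_cast<unsigned char>(s[1]) == 0xFF;
}

// Hypothetical signature: true if s starts with the UTF-16 little-endian BOM (0xFF 0xFE).
inline bool hasUnicodeByteOrderMarkLE(std::string_view s)
{
    return s.size() >= 2 && static_cast<unsigned char>(s[0]) == 0xFF && static_cast<unsigned char>(s[1]) == 0xFE;
}

// Hypothetical signature: inserts the big-endian BOM at the front of s.
inline void prependUnicodeByteOrderMark(std::string &s)
{
    s.insert(0, "\xFE\xFF");
}
```

Taking std::string_view in the two predicates lets callers pass a std::string or a string literal without a copy.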
|
|
|
|
When centering vertically, we calculate the y offset based on the height of the text and the annotation.
When doing that we must ignore the border width, otherwise the text is offset downwards.
|
|
The border reduces the available height, so take it into account for the height too, not only the width
|
|
TextOutputDev::getText expects rotated coordinates, e.g. the correct
bounds for an A4 Landscape page are {0, 0, 842, 595}.
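To illustrate what "rotated coordinates" means here, the sketch below computes page bounds after applying the page rotation. The `Rect` struct and the `rotatedPageBounds` helper are hypothetical names introduced only for this example; they are not part of the TextOutputDev API.

```cpp
// Hypothetical stand-in for a page rectangle.
struct Rect
{
    double x1, y1, x2, y2;
};

// Hypothetical helper: bounds of a page with the given unrotated size,
// expressed in rotated coordinates. Rotation is in degrees and is
// normalized to the range [0, 360).
inline Rect rotatedPageBounds(double width, double height, int rotation)
{
    const int r = ((rotation % 360) + 360) % 360;
    if (r == 90 || r == 270) {
        // A quarter turn swaps width and height.
        return { 0, 0, height, width };
    }
    return { 0, 0, width, height };
}
```

For an A4 page (595 x 842 points) rotated by 90 degrees, this yields the {0, 0, 842, 595} rectangle mentioned above.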
|
|
Currently, the "Landscape" with default page rectangle test fails, as the
page orientation is not taken into account.
(Seascape is also incorrect, but as the text lies inside the unrotated
A4 cropbox rectangle (bottom left), the text is extracted.)
|
|
The unit tests only covered extraction from the whole page; make sure
the various cases for smaller selections are also covered.
|
|
When 'CIDSystemInfo' dictionary is absent or
has invalid content, instead of aborting the font
because we cannot read the character collection,
let's assume in that case character collection
to be "Adobe-Identity".
Fixes #1465 - Does not show text of Apple-edited PDFs
|
|
|
|
Use std::string::clear instead. The only difference between the two
is that GooString::clear returns the empty string, whereas
std::string::clear does not. But apparently this feature of
GooString::clear was not used anywhere.
|
|
Starting with C++20, the std::string class has methods
starts_with and ends_with, which do the same thing.
Use those instead.
|
|
|
|
|
|
Deprecated `char_traits` template has been removed in LLVM 19
|
|
A custom target with ALL is always generated, even if the files/outputs
specified with DEPENDS are not changed.
This can be solved by generating the POT files with a custom_command.
The target triggers evaluation of the custom_command, but the latter will
only be run if the dependencies have changed.
Fixes #1479
|
|
|
|
According to the specification, see NOTE 2 in
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G7.3882161
it appears that the clipping path should be reset
when the restore (Q) operator is encountered.
Fixes #739
|
|
|
|
I want to use std::string::starts_with
|
|
The old algorithm restarts the inner loop for the RHS word from the
beginning on each match, i.e. the worst case complexity approaches
O(N^3), while O(N^2) is obviously sufficient for a pairwise compare of
all words. Fortunately, the O(N^2) case hardly ever occurs, as the inner N
is limited by a) the maxBaseIdx, b) removing duplicates from the set.
For some pathological cases this changes the runtime from minutes to
seconds.
See poppler#1173.
|
|
Currently, the word characters are allocated as a struct of arrays,
e.g. text and charcode are allocated separately.
This causes some space (6 pointers, 6 malloc chunk management
words (size_t/flags), alignment, ...) and runtime overhead (6 allocs/
frees per word).
Changing this to an array of structs reduces this overhead. It also allows
being more conservative with allocations, as resizing is less costly, i.e.
starting with a single character allocation instead of 16. It is also more
efficient, as most accesses affect multiple or all attributes, i.e.
values in the same or neighboring CPU cache lines.
Using a std::vector instead of separate raw arrays also reduces code
and manual data management.
The "charPos end index" and trailing "edge" attributes are no
longer stored as an additional entry in the array, but as dedicated
data members, `charPosEnd` and `edgeEnd`.
The memory saving is most notable for short words, but even for words
with 16 characters there are small savings, and still fewer allocations
(1 + 4 allocations instead of 6. Growing is fairly cheap, as the CharInfo
struct is trivially copyable.)
See poppler#1173.
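A rough sketch of the array-of-structs layout described above. The names `CharInfo`, `charPosEnd` and `edgeEnd` come from the commit message; the `Word` class, its `addChar` method, and the exact set of per-character fields are assumptions made for illustration and are simpler than the real word class.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Per-character attributes kept together so that one access touches
// neighboring memory. The field list is hypothetical.
struct CharInfo
{
    unsigned int text; // Unicode code point
    unsigned int charCode; // font character code
    int charPos; // start position of the character in the content stream
    double edge; // leading edge coordinate
};

// Trivially copyable, so growing the vector is cheap.
static_assert(std::is_trivially_copyable_v<CharInfo>);

// Hypothetical word container: one vector instead of several raw arrays.
class Word
{
public:
    void addChar(unsigned int text, unsigned int charCode, int charPos, int charLen, double edgeStart, double edgeStop)
    {
        chars.push_back({ text, charCode, charPos, edgeStart });
        // The trailing values live in dedicated members rather than in
        // an extra array entry.
        charPosEnd = charPos + charLen;
        edgeEnd = edgeStop;
    }

    std::size_t length() const { return chars.size(); }

    std::vector<CharInfo> chars;
    int charPosEnd = 0;
    double edgeEnd = 0.0;
};
```

With this layout a word needs a single heap allocation for its characters, and the vector handles growth and cleanup.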
|
|
emplace_back"
Says modernize-use-emplace
No need to pass the c, we will set it later so we can just use the
default constructed CharCodeToUnicodeString
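The idea of appending a default-constructed element and filling it in afterwards can be sketched as follows. The struct below is a simplified stand-in with hypothetical fields, not the real `CharCodeToUnicodeString` definition, and `appendMapping` is a name invented for this example.

```cpp
#include <vector>

// Simplified, hypothetical stand-in for the real struct.
struct CharCodeToUnicodeString
{
    unsigned int c = 0; // character code
    std::vector<unsigned int> u; // mapped Unicode sequence
};

// Appends a default-constructed entry and sets its fields afterwards.
// Since C++17, emplace_back returns a reference to the new element.
inline CharCodeToUnicodeString &appendMapping(std::vector<CharCodeToUnicodeString> &map, unsigned int code)
{
    CharCodeToUnicodeString &entry = map.emplace_back();
    entry.c = code;
    return entry;
}
```

Calling `emplace_back()` with no arguments constructs the element in place, which avoids building a temporary only to overwrite its members.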
|
|
This commit fixes the "across lines" text
search feature of TextPage::findText() when
the match happens from the last line of a
paragraph to the first line of next paragraph.
Includes tests for this bug.
Fixes #1475
Fixes https://gitlab.gnome.org/GNOME/evince/-/issues/2001
|
|
Redo the fix for issue #157 which is about doing
transparent selection for glyphless documents (e.g.
tesseract scanned documents) because it stopped
working after commit 29f32a47
|
|
|
|
|
|
This reverts commit 9c2cf5608a21b6fb9be4e0c7918d13cd2b652c23.
|