poppler/poppler - The poppler pdf rendering library (mirrored from https://gitlab.freedesktop.org/poppler/poppler)

Age	Commit message	Author	Files	Lines
12 days	Use std::span for some data, size combos	Sune Vuorela	1	-2/+2
	With C++20, we have std::span, which is a nice wrapper around a pointer and a length. Use that rather than carrying them around separately. std::span can also be created transparently from vectors and similar containers.
2024-04-20	Update (C)	Albert Astals Cid	1	-1/+1

2024-04-20	cpp: Fix crash extracting text and font in some files	Albert Astals Cid	1	-1/+1
	Issue reported and patch suggestion by Samad Koita and Aviral Agarwal. Fixes issue #1477.
2024-03-31	Update (C)	Albert Astals Cid	1	-1/+2

2024-03-30	Reduce worst case algorithmic complexity of TextBlock::coalesce	Stefan Brüns	1	-49/+60
	The old algorithm restarts the inner loop for the RHS word from the beginning on each match, i.e. the worst case complexity approaches O(N^3), while O(N^2) is obviously sufficient for a pairwise compare of all words. Fortunately, even O(N^2) is hardly ever reached, as the inner N is limited by a) the maxBaseIdx and b) removing duplicates from the set. For some pathological cases this changes the runtime from minutes to seconds. See poppler#1173.
2024-03-30	Reduce TextWord space and allocation overhead	Stefan Brüns	1	-204/+174
	Currently, the word characters are allocated as a struct of arrays, e.g. text and charcode are allocated separately. This causes some space overhead (6 pointers, 6 malloc chunk management words (size_t/flags), alignment, ...) and runtime overhead (6 allocs/frees per word). Changing this to an array of structs reduces this overhead. It also allows allocations to be more conservative, as resizing is less costly, i.e. starting with a single-character allocation instead of 16. It is also more efficient, as most accesses affect multiple or all attributes, i.e. values in the same or neighboring CPU cache lines. Using a std::vector instead of separate raw arrays also reduces code and manual data management. The "charPos end index" and trailing "edge" attributes are no longer stored as an additional entry in the array, but as dedicated data members, `charPosEnd` and `edgeEnd`. The memory saving is most notable for short words, but even for words with 16 characters there are small savings and still fewer allocations (1 + 4 allocations instead of 6). Growing is fairly cheap, as the CharInfo struct is trivially copyable. See poppler#1173.
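The layout change described above can be sketched roughly as follows. This is an illustration only: the fields of `CharInfo` and the `WordSketch` type are assumptions, not poppler's actual TextWord definition; only the names `CharInfo`, `charPosEnd` and `edgeEnd` come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical per-character record (array-of-structs element).
struct CharInfo
{
    unsigned int text;     // Unicode code point
    unsigned int charCode; // character code in the font
    int charPos;           // start position of the character
    double edge;           // leading edge coordinate
};
// Trivially copyable, so growing the vector is a cheap bulk copy.
static_assert(std::is_trivially_copyable_v<CharInfo>);

// Illustrative word type: one vector instead of several parallel raw arrays.
struct WordSketch
{
    std::vector<CharInfo> chars; // a single allocation holds all attributes
    int charPosEnd = 0;          // formerly an extra trailing array entry
    double edgeEnd = 0.0;        // formerly an extra trailing array entry

    void addChar(unsigned int text, unsigned int charCode, int charPos, int charLen, double edgeStart, double edgeStop)
    {
        chars.push_back({ text, charCode, charPos, edgeStart });
        charPosEnd = charPos + charLen;
        edgeEnd = edgeStop;
    }
};
```

Because all attributes of one character sit next to each other, code that reads several of them per character touches the same or a neighboring cache line.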
2024-03-30	Fix text search across lines between paragraphs	Nelson Benítez León	1	-24/+36
	This commit fixes the "across lines" text search feature of TextPage::findText() when the match runs from the last line of a paragraph to the first line of the next paragraph. Includes tests for this bug. Fixes #1475. Fixes https://gitlab.gnome.org/GNOME/evince/-/issues/2001
2024-03-30	Fix regression on issue #157	Nelson Benítez León	1	-12/+14
	Redo the fix for issue #157, which is about transparent selection for glyphless documents (e.g. tesseract-scanned documents), because it stopped working after commit 29f32a47.
2024-02-01	Update (C)	Albert Astals Cid	1	-0/+1

2024-02-01	More unicode vectors; fewer raw pointers	Sune Vuorela	1	-6/+2

2024-01-23	Update (C)	Albert Astals Cid	1	-0/+1

2024-01-18	TextPage::takeText: reset actualText for the new page	Adam Sampson	1	-0/+2
	actualText has an internal pointer to the TextPage it's writing to, so if you called takeText and then continued to output more pages to the TextOutputDev, their text would be written to the page you'd taken rather than the new one.
2022-08-19	We can use isnan now	Albert Astals Cid	1	-2/+1

2022-05-13	TextPage::coalesce: Fix crash on broken files	Albert Astals Cid	1	-2/+3
	oss-fuzz/47350
2022-04-30	Update (C)	Albert Astals Cid	1	-1/+1

2022-04-26	fix multiline find_text() bug in two column docs	Nelson Benítez León	1	-0/+6
	Fix for a bug in double column documents where some single line matches are wrongly returned as being multiline matches. Includes test case for the bug.
2022-04-26	fix bug in multiline find_text()	Nelson Benítez León	1	-1/+2
	which caused some false positives to be returned. Includes test case for the bug. See original comment about this bug: https://gitlab.gnome.org/GNOME/evince/-/merge_requests/159#note_1431380
2022-03-30	Change GfxFont name into an optional std::string	Albert Astals Cid	1	-1/+1

2022-03-11	Add readability-braces-around-statements	Albert Astals Cid	1	-101/+194

2022-03-10	Update (C) of previous commit	Albert Astals Cid	1	-1/+1

2022-03-09	Replace hand-coded reference counting in GfxFont by std::shared_ptr	Oliver Sander	1	-9/+2

2021-12-07	TextOutputDev: require more spacing between columns	Nelson Benítez León	1	-3/+11
	Require more spacing for adjacent text to be considered a separate column of text. We do that by increasing the 'minColSpacing1' parameter, which sets the distance within which an adjacent word will be pulled into the current block. We provide a way to tweak the default value: double getMinColSpacing1(); void setMinColSpacing1(double val); Fixes issue #1093
2021-11-01	TextOutputDev improvements	Albert Astals Cid	1	-65/+22
	Vectors don't need to be pointers, and they can contain unique_ptr too. Make pools be an array of unique_ptr too. Makes for easier memory management.
2021-10-30	Make makeWordList return a unique_ptr	Albert Astals Cid	1	-3/+3

2021-10-29	Port a few functions from GooString to std::string	Albert Astals Cid	1	-1/+1

2021-10-11	Update (C)	Albert Astals Cid	1	-1/+1

2021-10-11	TextOutputDev: Respect orientation when selecting words	Marek Kasik	1	-33/+142
	Take rotation of text lines into account when visiting selection. This works for text rotated by multiples of 90 degrees. Issue #499
2021-08-29	Update (C)	Albert Astals Cid	1	-0/+1

2021-08-27	Fix up setmode calls	Peter Williams	1	-4/+4
	To compile and work correctly on both Cygwin and MSVC, we should always call the function `_setmode` and check for either `_WIN32` or `__CYGWIN__` being defined. This fixes the MSVC build and corrects some behavior handling output to stdout on Cygwin.
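A hedged sketch of the pattern this message describes. The helper name `setStdoutBinary` is hypothetical, and the use of `fileno` and `O_BINARY` is an assumption about what a typical call site looks like; only `_setmode`, `_WIN32` and `__CYGWIN__` are taken from the message.

```cpp
#include <cassert>
#include <cstdio>

#if defined(_WIN32) || defined(__CYGWIN__)
#    include <fcntl.h> // O_BINARY
#    include <io.h> // _setmode
#endif

// Hypothetical helper: switch stdout to binary mode where the platform
// distinguishes text and binary streams. Returns false if the call fails.
bool setStdoutBinary()
{
#if defined(_WIN32) || defined(__CYGWIN__)
    // Same call on MSVC, MinGW and Cygwin: always _setmode.
    return _setmode(fileno(stdout), O_BINARY) != -1;
#else
    // Other platforms have no text/binary distinction; nothing to do.
    return true;
#endif
}
```

Checking both macros in one place keeps the MSVC and Cygwin builds on the same code path.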
2021-08-27	CI: Enable google-explicit-constructor	Albert Astals Cid	1	-2/+2
	I was doing some refactoring before and was hit by one of the constructors being magically called when I didn't want that. Since we don't really rely on it (it was just used in some of the explicit type conversions), I think it makes sense to enable this check. Also includes 2 small qt6 clang-tidy fixes, because we don't have qt6 on the clang-tidy CI yet. There are 2 potentially source-incompatible changes in the qt frontend, but I really, really hope no one was using the constructors that way.
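To illustrate what the google-explicit-constructor check guards against, here is a small sketch with made-up types (`ImplicitFactor`, `ExplicitFactor`); it is not code from poppler. A single-argument constructor without `explicit` lets the compiler convert silently, which is the "magically called" behaviour mentioned above.

```cpp
#include <cassert>
#include <type_traits>

// Single-argument constructor without 'explicit': implicit conversion allowed.
struct ImplicitFactor
{
    ImplicitFactor(double v) : value(v) { }
    double value;
};

// With 'explicit', the conversion has to be spelled out by the caller.
struct ExplicitFactor
{
    explicit ExplicitFactor(double v) : value(v) { }
    double value;
};

double doubledImplicit(ImplicitFactor f)
{
    return 2 * f.value;
}

double doubledExplicit(ExplicitFactor f)
{
    return 2 * f.value;
}

// The type traits confirm the difference at compile time.
static_assert(std::is_convertible_v<double, ImplicitFactor>);
static_assert(!std::is_convertible_v<double, ExplicitFactor>);
```

`doubledImplicit(2.5)` compiles and silently constructs an `ImplicitFactor`, whereas `doubledExplicit(2.5)` is rejected and has to be written as `doubledExplicit(ExplicitFactor { 2.5 })`.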
2021-04-25	find, glib: Enhance find to support multi-line matching	Nelson Benítez León	1	-33/+149
	On the backend side, adds 3 new parameters to TextPage::findText(): one bool to enable the feature, one out PDFRectangle to store the part of the match that falls on the next line, and one out bool to inform whether a hyphen was present and ignored at the end of the previous match part.
	For the glib binding, this extends the public PopplerRectangle struct by new members to hold additional information about whether the rectangle belongs to a group of rectangles for the same match, and whether a hyphen was ignored at the end of the line. Since PopplerRectangle is public ABI, this is done by making the public PopplerRectangle API return the enlarged struct and internally casting to the new struct when required; the new members are accessible only via accessor functions.
	For the Qt5 and Qt6 bindings, this commit only implements the new flag Poppler::Page::AcrossLines (but no new function and no new return data type). If this flag is passed, the returned list of rectangles will also include rectangles for the second part of across-line matches. This minimal Qt binding still allows for the creation of tests for this feature (using the Qt test framework), which this commit does include. A more complete binding (with a new return type that includes 'matchContinued' and 'ignoredHyphen' boolean fields) is left to the qt backend maintainers if they want to use this feature in e.g. Okular.
	So, as mentioned, this commit incorporates tests for the implemented across-line matching feature, and the tests also check two aspects of this feature:
	- Ignoring the hyphen character while matching when 1) it's the last character of the line and 2) its corresponding matching character in the search term is not a hyphen too.
	- Any whitespace character in the search term is allowed to match at the logical position where the lines split (i.e. what would normally be the newline character in a text file, but PDF text does not include newline characters between lines).
	Regarding the enhancement to the findText() function which implements matching across lines, just two more notes:
	- It won't match text spanning more than two lines, i.e. it only matches text spanning from the end of one line to the start of the next line.
	- It does not support finding backwards: if findText() receives both the <backward> and <matchAcrossLines> parameters as true, it will ignore the <matchAcrossLines> parameter. Implementing <matchAcrossLines> with backwards direction is possible, but it would make an already complex function like findText() even more complex, for little gain, as e.g. Evince does not even use the <backward> parameter of findText().
	Fixes poppler issues #744 and #755. Related Evince issue: https://gitlab.gnome.org/GNOME/evince/issues/333
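The two matching rules (ignored end-of-line hyphen, whitespace matching the line split) can be illustrated with a toy function over plain strings. This is a simplified sketch under stated assumptions, not the findText() implementation: `AcrossLineMatch` and `matchAcrossLines` are hypothetical names, text is plain ASCII, and only matches that actually continue onto the second line are reported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical result type for the toy matcher.
struct AcrossLineMatch
{
    bool found = false;
    std::size_t startInFirstLine = 0;   // offset where the match begins
    std::size_t lengthInSecondLine = 0; // characters matched on the next line
    bool ignoredHyphen = false;         // a trailing hyphen was skipped
};

AcrossLineMatch matchAcrossLines(std::string_view first, std::string_view second, std::string_view term)
{
    for (std::size_t start = 0; start < first.size(); ++start) {
        std::size_t i = start; // index into the first line
        std::size_t t = 0;     // index into the search term
        bool ignoredHyphen = false;
        while (i < first.size() && t < term.size()) {
            const bool lastChar = (i + 1 == first.size());
            if (first[i] == term[t]) {
                ++i;
                ++t;
            } else if (lastChar && first[i] == '-') {
                // Rule 1: a hyphen ending the line is skipped when the
                // corresponding term character is not a hyphen.
                ignoredHyphen = true;
                ++i;
            } else {
                break;
            }
        }
        // The match must reach the end of the first line with part of the
        // term consumed and part of it still left for the second line.
        if (i != first.size() || t == 0 || t == term.size()) {
            continue;
        }
        // Rule 2: one whitespace character in the term may match the split.
        if (std::isspace(static_cast<unsigned char>(term[t]))) {
            ++t;
        }
        const std::string_view rest = term.substr(t);
        if (!rest.empty() && second.substr(0, rest.size()) == rest) {
            return { true, start, rest.size(), ignoredHyphen };
        }
    }
    return {};
}
```

For example, the term "software" matches the lines "this is soft-" and "ware rocks" with the hyphen ignored, and "good day" matches "a good" followed by "day here" because the space in the term lines up with the split.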
2021-04-07	TextOutputDev: Fix crash in malformed file	Albert Astals Cid	1	-1/+1
	oss-fuzz/32952
2021-03-12	TextSelectionDumper: fix word order for RTL text	Nelson Benítez León	1	-2/+6
	This is used by the glib backend (Evince). Fixes issue #53
2021-02-26	Make TextSelectionSizer a bit easier to understand standalone	Albert Astals Cid	1	-4/+9
	Nothing really changes, because it's only used in one place and that place calls getRegion, so there's no leak. But looking at the class standalone, one could think that there would be a leak if getRegion was not called.
2021-02-14	Update (C)	Albert Astals Cid	1	-1/+1

2021-02-14	TextSelectionDumper: Fix getText() for space after word	Nelson Benítez León	1	-1/+1
	Fix TextSelectionDumper::getText() (which is currently only used by the glib frontend) so that it no longer adds a space after a word by default when the word is explicitly marked as not carrying that space via the 'spaceAfter' TextWord field. Fixes issue #1042
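The behaviour being fixed can be sketched with a toy word list. `DumpedWord` and `joinWords` are hypothetical names used for illustration only; just the `spaceAfter` field name comes from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a word with its 'spaceAfter' flag.
struct DumpedWord
{
    std::string text;
    bool spaceAfter;
};

// Join words, adding a separator only where the word asks for one.
std::string joinWords(const std::vector<DumpedWord> &words)
{
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        out += words[i].text;
        // No unconditional space: honour the flag, and never pad the end.
        if (i + 1 < words.size() && words[i].spaceAfter) {
            out += ' ';
        }
    }
    return out;
}
```

With the words "foo" (space after), "bar" (no space after) and "baz", the sketch yields "foo barbaz" rather than "foo bar baz".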
2020-11-28	Fix crash when searching things of length 0	Albert Astals Cid	1	-0/+4

2020-10-29	clang: Warn about weak-vtables	Albert Astals Cid	1	-1/+3

2020-08-27	Update (C)	Albert Astals Cid	1	-1/+1

2020-08-26	TextSelectionPainter: support glyphless fonts	Nelson Benítez León	1	-5/+30
	in text selections, by: - Not drawing characters that use them. - Painting the selection's background as transparent. Fixes issue #157. Based on initial work by Nelson Benitez and changed to not be tesseract-specific by Julian Andres Klode.
2020-07-03	Run clang-format	Albert Astals Cid	1	-4890/+4596
	find . \( -name "*.cpp" -or -name "*.h" -or -name "*.c" -or -name "*.cc" \) -exec clang-format -i {} \; If you reached this file doing a git blame, please see README.contributors (instructions added 2 commits in the future to this one)
2020-05-19	Update (C)	Albert Astals Cid	1	-1/+1

2020-05-19	[TextOutputDev] simplify TextFontInfo::matches(const Ref *ref)	Albert Astals Cid	1	-1/+1

2020-05-19	[cpp] Add the font infos to the text_box object.	suzuki toshiya	1	-0/+4

2020-01-07	Mark some static arrays as const	Albert Astals Cid	1	-1/+1

2020-01-05	Update last commit (C)	Albert Astals Cid	1	-1/+1

2020-01-05	Remove UnicodeMap reference counting	Albert Astals Cid	1	-16/+5
	And make the cache just be "infinite". It's not like we support that many maps or that so many are used in a given session, and if they are, well, it's good we cached them. All the unicode maps we support use about 2MB of memory, but PSOutputDev is the only one that loads "random" unicodeMaps, so to load them all you'd have to print lots of different documents with fonts with lots of different font encodings. That seems like a not very likely situation, and the code gets simplified a bit.
2019-12-05	Move textEOL and textPageBreaks out of GlobalParams to TextOutputDev	Albert Astals Cid	1	-8/+10

2019-12-03	Enable modernize-loop-convert	Albert Astals Cid	1	-11/+5

2019-12-02	enable modernize-redundant-void-arg	Albert Astals Cid	1	-2/+2
	No copyright, it's a mechanical change