path: root/writerfilter
Age | Commit message | Author | Files | Lines
2017-07-18 | A temporary workaround for out-of-order (in-paragraph) tbl on OOXML [cp-5.3-20] | Mike Kaganski | 4 | -0/+35
This allows importing the data in such tables (previously, this text was simply dropped, causing data loss). Layout problems are not fixed yet. Change-Id: Id7422adfe0998d1e2adcd4bf0b0e0a1dd7ed37bf Reviewed-on: https://gerrit.libreoffice.org/40105 Reviewed-by: Aron Budea <aron.budea@collabora.com> Tested-by: Aron Budea <aron.budea@collabora.com>
2017-07-14 | tdf#109053: DOCX: Multipage table is not imported properly | Tamás Zolnai | 1 | -8/+15
Another use case where converting to a "floating table" is not a good idea. In this case we can check whether anything fits in the text area next to the table. If not, we can avoid the floating table conversion. Reviewed-on: https://gerrit.libreoffice.org/39811 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Tamás Zolnai <tamas.zolnai@collabora.com> (cherry picked from commit fc55711f01af172eb3a034454405fa941454c781) Change-Id: I798a2f4c7a9dfe6aecbe4a73e3162b49ea5f0adc Reviewed-on: https://gerrit.libreoffice.org/39930 Reviewed-by: Andras Timar <andras.timar@collabora.com> Tested-by: Andras Timar <andras.timar@collabora.com>
2017-07-12 | Fix tdf#106029 - Add setting XML_doNotExpandShiftReturn when exporting to docx | nikki | 3 | -0/+18
Change-Id: Ie8ffb0f2d5444c6ead13bdc894715c5a2e6d0baa Reviewed-on: https://gerrit.libreoffice.org/36485 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 9ad9c5183f348384b62ec88459a3a5922e423d83) Reviewed-on: https://gerrit.libreoffice.org/39749 (cherry picked from commit a59cf3ecab2f327801c2b580d20df9e8b643cc6c)
2017-07-12 | tdf#109063 DOCX import: consider wrap space for multi-page floattables | Miklos Vajna | 1 | -1/+5
Follow-up to commit 78d1f1c2835b9fae0f91ed771fc1d594c7817502 (fdo#68607 bnc#816593 DomainMapperTableHandler: don't always start a frame, 2013-09-03). It turns out that when there is little space between the table and the edge of the body area, Word performs no wrapping, so we should not convert to a floating table either. The limit seems to be 266 twips (the code uses mm100 units), and it seems to be constant: it does not change if both the table and the page width are changed, nor when the empty paragraph to be wrapped has a different paragraph mark size. For the majority of documents this means no change, as usually there is either no space available for wrapping or a lot more than that. (cherry picked from commit 25445d24cfa87522ee4c47e4aa7e6e816cdc9a36) Conflicts: writerfilter/source/dmapper/PropertyMap.cxx Change-Id: Ibbf7409065ba958854514f23b360be56677c8fe3 Reviewed-on: https://gerrit.libreoffice.org/39828 Reviewed-by: Tamás Zolnai <tamas.zolnai@collabora.com> Tested-by: Tamás Zolnai <tamas.zolnai@collabora.com>
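The threshold described in this entry can be sketched as a small unit conversion plus a comparison. This is only an illustration: the function names, the use of `>=` at the boundary, and the reduction of the decision to two widths are assumptions, not the actual `PropertyMap.cxx` logic.

```python
TWIPS_PER_INCH = 1440
MM100_PER_INCH = 2540

# Limit quoted in the commit message, in twips.
MIN_WRAP_SPACE_TWIPS = 266


def twips_to_mm100(twips: float) -> float:
    """Convert twips (1/1440 inch) to mm100 (1/100 mm)."""
    return twips * MM100_PER_INCH / TWIPS_PER_INCH


def should_convert_to_floating(table_width_mm100: float,
                               text_area_width_mm100: float) -> bool:
    """Hypothetical check: float the table only if enough room is left for wrapping."""
    available = text_area_width_mm100 - table_width_mm100
    return available >= twips_to_mm100(MIN_WRAP_SPACE_TWIPS)
```

In mm100 the limit works out to roughly 469, so a table that leaves 400 mm100 beside it would stay a regular (multi-page capable) table under this sketch.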
2017-07-11 | tdf#108545 show an icon (DOCX inside DOCX) | Szymon Kłos | 3 | -0/+11
If DrawAspect equals "Icon", show an icon instead of the document preview. The document is opened in a separate window, not in-place. Change-Id: I3a8d81e7340b29d247f8ac440c06b0420bb65644 Reviewed-on: https://gerrit.libreoffice.org/39440 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Szymon Kłos <szymon.klos@collabora.com> Reviewed-on: https://gerrit.libreoffice.org/39716 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-07-11 | tdf#108544 edit in window (XLSX inside DOCX) | Szymon Kłos | 1 | -0/+7
Change-Id: If1dd46643dc2ae9cc74ba94038609ae3445a416c Reviewed-on: https://gerrit.libreoffice.org/39706 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Szymon Kłos <szymon.klos@collabora.com> (cherry picked from commit 505ce3a2ba3adeef46daecbf9b14c42cea211408) Reviewed-on: https://gerrit.libreoffice.org/39715 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-07-07 | tdf#108995: take xml:space attribute into account | Mike Kaganski | 2 | -2/+43
See paragraph 2.10 of XML 1.0 specification and 17.3.3.31 of ECMA-376-1:2016 Change-Id: I7f19d3b9cf2ccce88a5fa03022beeb99facc04fe Reviewed-on: https://gerrit.libreoffice.org/39682 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit 7c1a51516aaf2767e43b393259a1ad21570df5fb) Reviewed-on: https://gerrit.libreoffice.org/39688 Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
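As a rough illustration of what honoring `xml:space` can mean for `<w:t>` text, the sketch below keeps whitespace only when the attribute is `preserve`. Trimming leading and trailing whitespace in the default case is an assumption about the intended behavior, and `run_text` is a hypothetical helper, not a writerfilter function.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_text(t_elem: ET.Element) -> str:
    """Return the text of a <w:t> element, honoring xml:space (assumed semantics)."""
    text = t_elem.text or ""
    if t_elem.get(f"{{{XML_NS}}}space") == "preserve":
        return text
    # Default handling: drop leading/trailing whitespace (an assumption).
    return text.strip(" \t\r\n")


run = ET.fromstring(
    f'<w:r xmlns:w="{W_NS}">'
    '<w:t xml:space="preserve"> a </w:t><w:t> b </w:t>'
    '</w:r>'
)
texts = [run_text(t) for t in run.iter(f"{{{W_NS}}}t")]
```

Here the first run keeps its surrounding spaces, while the second is trimmed.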
2017-07-07 | tdf#108714: Also support paragraph-level (line) breaks | Mike Kaganski | 3 | -5/+10
Change-Id: Ida55015363cac3ae29b82a60a9b9a5f1b39086a2 Reviewed-on: https://gerrit.libreoffice.org/39675 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit f95f0ce163743706a3670c6e33593023c22af2ff) Reviewed-on: https://gerrit.libreoffice.org/39677 Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-06-28 | tdf#108714 follow-up: handle deferred break in character group | Mike Kaganski | 1 | -4/+4
If an out-of-order break happens immediately after a table, then in following paragraph group (before character group start) the table level is > 0, and break is ignored. Since out-of-order break only happens at top level, the following character group necessarily designates a new paragraph group, so it's OK to handle that at the character group level, where table level is already updated. Change-Id: Ic1b1bb89e12407b050c2e880ad971794311845a5 Reviewed-on: https://gerrit.libreoffice.org/39347 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit 553204015f954d20db65e6adcda68b823a8ef235) Reviewed-on: https://gerrit.libreoffice.org/39352 Reviewed-by: Andras Timar <andras.timar@collabora.com> Tested-by: Andras Timar <andras.timar@collabora.com>
2017-06-27 | tdf#108806: convert CRLF into space in OOXML text | Mike Kaganski | 1 | -2/+7
Change-Id: I8e2e108a705ecdb55c096a589d83d51c48b0b83c Reviewed-on: https://gerrit.libreoffice.org/39286 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Reviewed-on: https://gerrit.libreoffice.org/39322 Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-06-27 | tdf#108714: allow <w:br> as direct child of <w:body> | Mike Kaganski | 6 | -1/+54
LibreOffice doesn't accept <w:br> element as a child of <w:body>. ECMA-376-1:2016 17.3.3.1 describes br as element of a run content, and points to CT_Br in §A.1. CT_Br may appear only as part of EG_RunInnerContent. In turn, EG_RunInnerContent may appear only inside CT_R. So, using <w:br> outside of <w:r> produces ill-formed OOXML. Open XML SDK 2.5 Productivity Tool for Microsoft Office confirms that, showing OpenXmlUnknownElement error.
However, Word accepts it as direct child of <w:body>. It behaves as if the <w:br> were used as first element in first run of the following <w:p> (thus creating page break after next paragraph). Another Word bug that provokes third-parties to create ill-formed documents, and requires LibreOffice to be bug-to-bug compatible.
This commit makes the following changes:
1. Registers a dedicated complex type CT_Br_OutOfOrder to handle those unusual breaks, with corresponding handler function.
2. In the handler function, saves the gathered property set to parser state to use later in next paragraph group handler. This reproduces Word behaviour.
Change-Id: I5df6927e2de9266b58f87807319ad1c4977e45a7 Reviewed-on: https://gerrit.libreoffice.org/39168 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit a4a1467bc47b81ad68ecad0d5e2e163670582919) Reviewed-on: https://gerrit.libreoffice.org/39303 Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
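The "save now, apply at the next paragraph" idea from this entry can be sketched outside the real parser. The code below is a toy model using `xml.etree`, not the writerfilter handler; `paragraphs_with_breaks` and the tuple layout are hypothetical, and the default break type `textWrapping` is taken from the OOXML schema.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(name: str) -> str:
    """Qualified name in the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def paragraphs_with_breaks(body: ET.Element):
    """Pair each paragraph's text with a break deferred from a body-level <w:br>."""
    pending = None  # break type seen directly under <w:body>, not yet applied
    result = []
    for child in body:
        if child.tag == q("br"):
            pending = child.get(q("type"), "textWrapping")
        elif child.tag == q("p"):
            text = "".join(t.text or "" for t in child.iter(q("t")))
            result.append((pending, text))
            pending = None  # the deferred break is consumed by this paragraph
    return result


body = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:r><w:t>one</w:t></w:r></w:p>'
    '<w:br w:type="page"/>'
    '<w:p><w:r><w:t>two</w:t></w:r></w:p>'
    '</w:body>'
)
pairs = paragraphs_with_breaks(body)
```

In this model the out-of-order page break ends up attached to the paragraph "two", mirroring the description that Word treats it as the first element of the following paragraph's first run.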
2017-06-23 | Related: tdf#108269 DOCM filter: preserve VBA stream | Miklos Vajna | 3 | -3/+30
This is a combination of 3 commits (initial support, then two refactor commits to not duplicate code.)
1st commit: This means 2 new streams when roundtripping DOCM files that actually have macros: word/vbaProject.bin and word/vbaData.xml (+ the relation pointing to the second from the first). Reviewed-on: https://gerrit.libreoffice.org/38360 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 8a59b30bb1af55f7afd8b98e4b60234f98d84c76) Conflicts: sw/qa/extras/ooxmlexport/ooxmlexport9.cxx Change-Id: Iba24eea4c5bca8f743a53027c71ed2aae48f1934
2nd commit: Related: tdf#108269 DOCM filter: reuse oox code for VBA preservation. With this, the project stream import is shared between DOCM and XLSM. Change-Id: I8fbffefc5acf28adea4875fa6bc4148a99b5ebef Reviewed-on: https://gerrit.libreoffice.org/38495 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit e4adb8d9e77bab353dda26375e11a6b7a456368f)
3rd commit: Related: tdf#108269 DOCM filter: reuse oox code for VBA data preservation. Which means the DOCM-specific code to roundtrip VBA things (project, data) can be removed. The oox part has to be extended a bit, as at least for this DOCM bugdoc there is an XML relation of the binary data, while existing shared code assumed the full VBA project is just a single OLE blob. Reviewed-on: https://gerrit.libreoffice.org/38504 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit 0129c2cd9dd95355412b194c595f4b986403ba1e) Conflicts: writerfilter/inc/ooxml/OOXMLDocument.hxx writerfilter/source/ooxml/OOXMLDocumentImpl.hxx Change-Id: I4085e4dba24475e6fd555e5f34fe7ad0f305c57d Reviewed-on: https://gerrit.libreoffice.org/38558 Reviewed-by: Andras Timar <andras.timar@collabora.com> Tested-by: Andras Timar <andras.timar@collabora.com>
2017-06-23 | tdf#108682 DOCX import: fix <w:spacing w:line=...> for negative values | Miklos Vajna | 1 | -3/+13
I didn't find UI in Word to create
    <w:spacing w:line="-260" w:lineRule="auto"/>
The equivalent markup when you set line spacing to exactly 13pt for new documents is:
    <w:spacing w:line="260" w:lineRule="exact"/>
The OOXML spec and Microsoft's implementer notes ([MS-OI29500]) are also pretty silent about what a negative value means. However, if this markup is converted to WW8 by Word, then the WW8 LSPD structure looks like this (as presented by doc-dumper):
    <lspd type="LSPD" offset="5086">
      <dyaLine value="0xfefc"/>
      <fMultLinespace value="0x1"/>
    </lspd>
For the 0xfefc value, the [MS-DOC] spec clearly states that the type of the spacing is "exactly", with a value of 0x10000-0xfefc, i.e. the same 260 twips. Change-Id: I84b485d02dea49c610b6df2e06ccce03e1d29d21 Reviewed-on: https://gerrit.libreoffice.org/39091 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit f575f70b8303ba187f6989920281ff02e7a431c9) Reviewed-on: https://gerrit.libreoffice.org/39162 Reviewed-by: Andras Timar <andras.timar@collabora.com> Tested-by: Andras Timar <andras.timar@collabora.com>
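The arithmetic in this entry can be checked with a short sketch: 0xfefc read as a signed 16-bit integer is -260, and a negative line value is then treated as "exact" spacing of the absolute value. Both helper names are hypothetical; only the numbers come from the commit message.

```python
def to_signed16(raw: int) -> int:
    """Interpret a 16-bit value (such as dyaLine) as a signed integer."""
    return raw - 0x10000 if raw & 0x8000 else raw


def normalize_line_spacing(line: int, line_rule: str):
    """Map a negative w:line to exact spacing of the same magnitude (assumed rule)."""
    if line < 0:
        return abs(line), "exact"
    return line, line_rule
```

With these helpers, `<w:spacing w:line="-260" w:lineRule="auto"/>` and `<w:spacing w:line="260" w:lineRule="exact"/>` normalize to the same pair.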
2017-06-22 | Watermark: auto size in the RTF | Szymon Kłos | 1 | -0/+2
When Watermark size is set to Auto in MSO, the saved value equals 1pt. Before this patch, the Watermark was invisible in this case due to its small size. Change-Id: Ia2028a6547cf98dd31031305bcc5375625b83fe0 Reviewed-on: https://gerrit.libreoffice.org/38883 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-06-15 | Watermark: RTF font import and export | Szymon Kłos | 1 | -2/+39
* font size
* font family
* rotation
* TextPath geometry - working transparency & color
* revert TextBox export removed by mistake
Change-Id: I3f6df86809ae57dc40c275652a96b19d2a3d7eba Reviewed-on: https://gerrit.libreoffice.org/38494 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit dd0df1c8a213ab6f0959145396bc273bf885af39) Signed-off-by: Andras Timar <andras.timar@collabora.com>
2017-06-10 | Watermark: RTF import / export | Szymon Kłos | 1 | -0/+12
* "wzName" should contain shape name
* MS Word watermark has text inside the "gtextUNICODE" (do not create additional shptxt)
Change-Id: I7929ec83a9219d6087d36ccbf6d7e735acf63722 Reviewed-on: https://gerrit.libreoffice.org/38219 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-06-10 | Avoid UBSan warning about negative double -> sal_uInt32 conversion | Stephan Bergmann | 2 | -7/+7
Since ea890b1d4bcd6dd59db9f52dce1609c020804e24 "tdf#108408: support unit specifications for ST_HpsMeasure", the OOXMLUniversalMeasureValue ctor is converting textual data to mnValue via intermediary double instead of sal_Int32, so textual data representing negative values now triggers UBSan warnings (e.g., "writerfilter/source/ooxml/OOXMLPropertySet.cxx:630:43: runtime error: -70 is outside the range of representable values of type 'unsigned int'" during CppunitTest_chart2_export; it appears that, while HpsMeasure may be documented to only cover positive values, TwipsMeasure may be negative). But OOXMLUniversalMeasureValue::mnValue is apparently only used in OOXMLUniversalMeasureValue::getInt, to return an int value, so just change its type. Change-Id: I44eabb78f09100c05cc9d1e79a739648f34ea743 (cherry picked from commit 600ec501bafc691d37078a0ed5b4ca8bf32340f1) Reviewed-on: https://gerrit.libreoffice.org/38632 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-06-09 | tdf#108408: support unit specifications for ST_HpsMeasure | Mike Kaganski | 6 | -21/+70
w:ST_HpsMeasure is defined in ECMA-376 5th ed. Part 1, 17.18.42 as:
  This simple type specifies that its contents contain either:
  * A positive whole number, whose contents consist of a measurement in half-points (equivalent to 1/144th of an inch), or
  * A positive decimal number immediately followed by a unit identifier.
  ...
  This simple type is a union of the following types:
  * The ST_PositiveUniversalMeasure simple type (§22.9.2.12).
  * The ST_UnsignedDecimalNumber simple type (§22.9.2.16).
This patch generalizes OOXMLUniversalMeasureValue to handle standard-defined units, and introduces two typedefed specifications: OOXMLTwipsMeasureValue (which is used where UniversalMeasure was previously used), and new OOXMLHpsMeasureValue. Unit test included. Reviewed-on: https://gerrit.libreoffice.org/38562 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit ea890b1d4bcd6dd59db9f52dce1609c020804e24) Change-Id: Iccc6d46f717cb618381baf89dfd3e4bbb844b4af Reviewed-on: https://gerrit.libreoffice.org/38591 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
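A minimal sketch of the union type quoted above: a bare whole number is taken as half-points, and a decimal with a unit suffix is converted to half-points. The unit list (mm, cm, in, pt, pc, pi) follows the universal-measure pattern as I understand it from the standard; the function name and the rounding policy are assumptions and do not mirror `OOXMLHpsMeasureValue`.

```python
import re

# Half-points per unit: 1 inch = 72 pt = 144 half-points; 1 pica = 12 pt.
HALF_POINTS_PER_UNIT = {
    "mm": 144 / 25.4,
    "cm": 1440 / 25.4,
    "in": 144.0,
    "pt": 2.0,
    "pc": 24.0,
    "pi": 24.0,
}

_WHOLE = re.compile(r"[0-9]+")
_WITH_UNIT = re.compile(r"([0-9]+(?:\.[0-9]+)?)(mm|cm|in|pt|pc|pi)")


def parse_hps_measure(text: str) -> int:
    """Parse an ST_HpsMeasure-like string into half-points (rounded, by assumption)."""
    if _WHOLE.fullmatch(text):
        return int(text)  # already a whole number of half-points
    match = _WITH_UNIT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a positive measure: {text!r}")
    number, unit = match.groups()
    return round(float(number) * HALF_POINTS_PER_UNIT[unit])
```

For example, `"22"` and `"11pt"` both describe an 11 pt font size under this sketch, and a signed value such as `"-3pt"` is rejected because the type only covers positive measures.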
2017-06-09 | tdf#104450: Use Calibri; let LO to fallback to Carlito | Mike Kaganski | 1 | -4/+4
Using Calibri allows keeping the originally intended font on round-trip. If Calibri is absent on a system, LO will fall back to Carlito for rendering, but keep the original font intact. Reviewed-on: https://gerrit.libreoffice.org/38456 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Mike Kaganski <mike.kaganski@collabora.com> (cherry picked from commit dd1ba90f6069b41e3f2c301809afefc6f63da710) Change-Id: I8f29bed29bc7f48912b2637053ff128ea904c7a1 Reviewed-on: https://gerrit.libreoffice.org/38590 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-06-09 | tdf#108350: Use Carlito for DOCX import by default | Mike Kaganski | 1 | -0/+18
In OOXML (i.e. Word since 2007), the default document font is Calibri 11 pt. If a document doesn't contain font information, we should assume our metric-compatible equivalent Carlito to provide best layout match. A unit test included. An existing unit test (testN766487) was corrected to match the font size that Word uses (11; was 12 which doesn't match Word's size). Reviewed-on: https://gerrit.libreoffice.org/38421 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit 5471a5585cba925bb0dcb2dc41e03ad563998166) Change-Id: I3040f235696282dc7a124cd83fb34a6d95a29a17 Reviewed-on: https://gerrit.libreoffice.org/38589 Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-05-30 | tdf#106953 RTF import: fix missing paragraph left margin | Miklos Vajna | 1 | -0/+5
See commit 3915bf2dc877d5f1140798e24933db0f21386a4a (tdf#95376 DOCX import: fix incorrectly indented tab stops, 2016-01-26) for the various sources that can determine the paragraph indentation. In this case the problem was that too aggressive RTF style deduplication removed a direct indent, which then meant a fallback to the ind-from-num value, not to the ind-from-parastyle one. (cherry picked from commit f528f9499bd91b700c549575e88fa102cfffede9) Change-Id: I3b47b2bbeaaedf405baef24505d23dc49bd01865 Reviewed-on: https://gerrit.libreoffice.org/37670 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com> (cherry picked from commit 0022ae02cfea1c5d69d9f4fedeeeb7a30cc4184b)
2017-05-17 | tdf#107889 DOCX import: consider page breaks for multi-page floattables | Miklos Vajna | 5 | -3/+23
This is the DOCX equivalent of commit 6aba29576df7a2a40e54040d4dd09d94d6594741 (tdf#107773 DOC import: consider page breaks for multi-page floattables, 2017-05-11): a specific case where it's clearly superior to import a multi-page floating table as a multi-page one, rather than a floating one. Reviewed-on: https://gerrit.libreoffice.org/37683 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit 659c0227a50d298780d72902314e03df8824bc06) Conflicts: sw/qa/extras/ooxmlexport/ooxmlexport9.cxx writerfilter/source/dmapper/PropertyMap.cxx writerfilter/source/dmapper/PropertyMap.hxx Change-Id: I71a92d2b10e52e505665831caacad2948d22b4e1
2017-05-17 | writerfilter: default break type identified as _nextPage | Justin Luth | 1 | -3/+6
Change-Id: I9247c75819425a97d19c95c48fbaf7a4f8d92c62 Reviewed-on: https://gerrit.libreoffice.org/35379 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Justin Luth <justin_luth@sil.org> (cherry picked from commit 541b377a94fb1247dbf4c39b5bcf55deb8e5ef60)
2017-05-17 | tdf#103931 writerfilter breaktype: same for implicit and explicit | Justin Luth | 1 | -1/+2
MSWord normally does NOT specify "nextPage" for the sectionBreak, since that is the default type. That is imported as BreakType == -1. However, Writer ALWAYS exports the section type name, which of course is imported explicitly. **There is an import hack that treats the very first -1 section as continuous IF there are columns**. Since Writer explicitly defines the section type, these documents import differently. When Writer round-trips these types of files, they get totally messed up in Writer, although they look fine in Word. So, treat both implicit and explicit nextPage identically for bTreatAsContinuous during import. Another unit test demonstrated that headers/footers are lost when treating as continuous, so preventing that situation now also. This fix allows several import-only unit tests to round-trip. Reviewed-on: https://gerrit.libreoffice.org/35013 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 4605bd46984125a99b0e993b71efa6edb411699f) Conflicts: sw/qa/extras/ooxmlexport/ooxmlexport9.cxx Change-Id: I37fa861d82e8da564d28d8e9089fe0f2777650fb
2017-05-10 | tdf#104407 writerfilter: fix crash with null xRangeProperties | Michael Stahl | 1 | -5/+7
The m_xStartingRange is null at this point for whatever reason, and the block immediately above this one already checks xRangeProperties, so let's just do the same here. (Also IsNewDoc(), where the logic between PageDescName and PageNumberOffset presumably shouldn't differ?). (started to crash with abaf6bde4ee91c628bd55a7ec2e876a5d0ecff6e as previously that code was unreachable in RTF import) Change-Id: I20539c3a753ecea357e556ea556c3c26983ce1d1 (cherry picked from commit e4da2e5dfa9e462e0d9c23a1a60caf4b3ef2dc56) Reviewed-on: https://gerrit.libreoffice.org/37305 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com> (cherry picked from commit 8521f4c8fb08aa37912f73a73ba1a34c2ccc97ed)
2017-05-10 | AutoText: add only real AutoText entries | Szymon Kłos | 5 | -11/+35
* add only autoTxT gallery type
* new test with other types of entries
Change-Id: Ibf7751c73dcf3b6ebd69eec5f4931dbeaaf098c8 Reviewed-on: https://gerrit.libreoffice.org/37425 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Szymon Kłos <szymon.klos@collabora.com> Tested-by: Szymon Kłos <szymon.klos@collabora.com> (cherry picked from commit a470d16208a78ae6893d199b3b6bc77a8559b06a) Reviewed-on: https://gerrit.libreoffice.org/37460
2017-05-07 | tdf#107033 DOCX import: fix unexpected missing footnote separator | Miklos Vajna | 3 | -4/+21
Regression from commit 330b860205c7ba69dd6603f65324d0f89ad9cd5f (fdo#68787 DOCX import: handle when w:separator is missing for footnotes, 2013-09-04), the problem was footnote settings were modified also in case there were no footnotes at all in the document. Make the bug scenario and the original one working at the same time by touching footnote settings only in case there is at least one footnote in the current section. (cherry picked from commit e79ef12b7a904f17d4147fa409d055c12b70f952) Change-Id: I163d11769cbd97957662607fbedfba404181e002 Reviewed-on: https://gerrit.libreoffice.org/37228 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit cc6a55d687581db1a174b2a7d01f8a62887b5e24)
2017-05-04 | AutoText: read names of entries | Szymon Kłos | 2 | -0/+21
+ extended model to parse <docPartPr> and <name> marks
+ names are inserted to the document before content of each entry
+ SwDOCXReader interprets first paragraph of each section as a name
Change-Id: Ib7de152ba1c6bea4f4665f98d321019c3f68863e Reviewed-on: https://gerrit.libreoffice.org/37124 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-05-04 | AutoText: Reading multiple entries | Szymon Kłos | 10 | -1/+82
+ each entry is placed in a separate section
+ extended model and dmapper to react on docPart mark
Change-Id: I7e5213a09ae7352d1d09369bd0a209b6d4e18e82 Reviewed-on: https://gerrit.libreoffice.org/37107 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Szymon Kłos <szymon.klos@collabora.com>
2017-05-04 | AutoText: importing docx content | Szymon Kłos | 2 | -0/+8
- passing "ReadGlossaries" flag to the WriterFilter
- if set - WriterFilter reads glossary document instead of the main content
- updated model.xml to read docParts and docPart nodes
- SwDOCXReader adds document content as an AutoText entry
Change-Id: I9a0cc91c793d6accc8461e1c3aca791c5997d497 Reviewed-on: https://gerrit.libreoffice.org/36753 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Szymon Kłos <szymon.klos@collabora.com> Tested-by: Szymon Kłos <szymon.klos@collabora.com>
2017-05-04 | tdf#107104 DOCX drawingML import: fix invisible arrow shape | Miklos Vajna | 1 | -0/+6
This is the drawingML equivalent of commit 3d9ebded1358395ed81db7a63629b046aec2aeac (Misc improvements for docx VML import, 2010-10-06), which made sure that shapes are never invisible just because they have zero height or width. For this particular bugdoc the Word-produced WW8 equivalent width is 20 twips, but let's be consistent with the VML import and just round up to 1 mm100. Also fix two existing tests that wanted to test something else, but implicitly asserted that some shapes indeed have zero width/height. (cherry picked from commit e6e5a68f52f4e06b73f0ece3a3886f3bfc30f56d) Change-Id: I9600424520d0a3deecc711b44622eccc041a59da Reviewed-on: https://gerrit.libreoffice.org/36953 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit 7d3baea4a726d6c0cf6cb0d6a8b2c83cef4f580d)
2017-04-24 | tdf#107116 RTF import: fix missing upper and lower borders around text | Miklos Vajna | 1 | -1/+20
See commit 1be0a3fa9ebb22b607c54b47739d4467acfed259 (n#825305: writerfilter RTF import: override style properties like Word, 2014-06-17) for the context. Here the problem was that various details of the top border were removed during the style deduplication, but not the top border sprm itself. That was interpreted (correctly) by dmapper as "no border", rather than "inherit from style". (cherry picked from commit e9f0d8d02885eca619552b19eab30c1eade9e7ef) Change-Id: I3dec8df789fc7b75fccfff91ce66f457fecd2f6e Reviewed-on: https://gerrit.libreoffice.org/36692 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit c8c90854506cc7f1c3d7084ab97c156aead003e2)
2017-04-18 | tdf#106970 DOCX import: don't collapse para auto space for different nums | Miklos Vajna | 1 | -3/+15
Commit 1bf7f6a1a50ee9f24a3687240fe6ae390b905a6b (tdf#106690 DOCX import: fix automatic spacing before/after numbered para block, 2017-04-04) made sure that autospacing is only collapsed in case the adjacent text nodes both have a numbering rule. It turns out there is an additional condition: even if both text nodes have a numbering rule, do the collapsing only in case they have the same numbering rule. (cherry picked from commit e1c83d0514e6123faa50ad0a7aa6a9031b271c9a) Change-Id: Idb7a2b24d7eaa9094cc36f86b8a483045a33d028 Reviewed-on: https://gerrit.libreoffice.org/36510 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit e57873156d3c04ecc34bb5f38b186ebe29567f0c)
2017-04-12 | tdf#106690 DOCX import: fix automatic spacing before/after numbered para block | Miklos Vajna | 3 | -7/+50
The context is text nodes with automatic before/after spacing and numbering rules set, like:
  A
  * B
  * C
  * D
  E
The correct behavior seems to be (though I haven't found this explicitly written in the OOXML spec) to drop spacing between B and C and C and D, but not before B and not after D. Originally no spacing was dropped, then commit c486e875de7c8e845594f5043a37ee8800865782 (tdf#95031 DOCX import: auto spacing inside numbering means no spacing, 2016-10-18) removed spacing around all B/C/D. Fix the problem by checking the numbering rules and automatic after spacing of the previous paragraph, so spacing before B and after D is not removed. Change-Id: Icbdb36e31057ab0e8ac033888cf5cc7c52dad5d0 Reviewed-on: https://gerrit.libreoffice.org/36062 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit 1bf7f6a1a50ee9f24a3687240fe6ae390b905a6b) Reviewed-on: https://gerrit.libreoffice.org/36142 Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit 776839b8bfc6eed905ce97c6fe32af8deb8d1451)
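The A/B/C/D/E scenario can be modelled in a few lines. The sketch assumes every paragraph has automatic spacing and represents each one only by a numbering-rule id (`None` for unnumbered); it also folds in the same-rule condition from the tdf#106970 follow-up listed earlier in this log. The function name and data layout are hypothetical.

```python
def collapse_auto_spacing(num_rules):
    """Return (keep_before, keep_after) per paragraph.

    num_rules holds one numbering-rule id per paragraph, or None if unnumbered.
    Spacing is dropped only between two adjacent paragraphs sharing the same rule.
    """
    keep = [[True, True] for _ in num_rules]
    for i in range(len(num_rules) - 1):
        current, following = num_rules[i], num_rules[i + 1]
        if current is not None and current == following:
            keep[i][1] = False      # no spacing after the current paragraph
            keep[i + 1][0] = False  # no spacing before the following one
    return [tuple(pair) for pair in keep]


# A, * B, * C, * D, E, where B, C and D share numbering rule 1
spacing = collapse_auto_spacing([None, 1, 1, 1, None])
```

The result keeps the spacing before B and after D while removing it inside the numbered block, which is the behavior the entry describes.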
2017-04-12 | tdf#106692 writerfilter: RTF import: fix \'0d in \leveltext | Michael Stahl | 1 | -2/+4
It's not a newline but yet another one of those bizarre RTF-encodings. (regression from 10e733908038407791f9c14af2a86417cc4a653c) (cherry picked from commit 69b7204164945cfed385d58e64592ce1b17937d7) Reviewed-on: https://gerrit.libreoffice.org/36284 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit fd93d09a5b6226a8297b5dd995301d514ec7b8ca) Change-Id: I568050b031b95ac0b6ebfa1a0c39107e62f68bed
2017-04-04 | writerfilter: DOCX import: fix handling of w:hideMark vs. w:vMerge | Michael Stahl | 1 | -1/+5
The problem is that Writer's layout can't handle the case where cells are vertically merged and the last row has a fixed height; the vertically merged cell will grow up to the height of the other cells in the non- fixed rows plus the fixed row height, but no larger. So for now, avoid setting fixed row heights in this case. (regression from d1278ef4849661b9ae0eb7aaf4d74fbf91ccaf11) Change-Id: Iac3689e0bb0d5b8a62115ca0fb1f2c553a6e6bbc (cherry picked from commit c382c998ffdaf80c10a3f078fb4f0a37224d1158) Reviewed-on: https://gerrit.libreoffice.org/35960 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 7d7d21cfa53c8e80fd4dd0938579d8377da5a840)
2017-03-30 | tdf#106694 RTF import: fix missing paragraph tab position | Miklos Vajna | 1 | -1/+19
The problem here was that while in general paragraph style / direct formatting deduplication is supposed to happen in the tokenizer, paragraph tab positions is an exception, and dmapper expects to see the duplicated tokens. Fix the problem by introducing a blacklist that contains tokens not to deduplicate. Change-Id: I1cca53e99cfdb082df389ff295f3447cc8f9d3b8 Reviewed-on: https://gerrit.libreoffice.org/35790 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org> (cherry picked from commit fea174753b1c6b0882aebb044bf1a1eef6fa50e0)
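The exception list described here can be pictured as a filter over direct formatting: properties identical to the style are dropped, unless they are on the list of tokens that dmapper needs to see again. The dictionary representation, the property names, and `NOT_DEDUPLICATED` are all illustrative assumptions rather than the RTF tokenizer's real data model.

```python
# Hypothetical property names that must survive even when equal to the style.
NOT_DEDUPLICATED = {"tabs"}


def deduplicate_direct_formatting(direct: dict, style: dict) -> dict:
    """Drop direct properties that merely repeat the style, except listed ones."""
    return {
        name: value
        for name, value in direct.items()
        if name in NOT_DEDUPLICATED or name not in style or style[name] != value
    }


direct = {"align": "center", "tabs": [720], "indent": 100}
style = {"align": "center", "tabs": [720], "indent": 0}
remaining = deduplicate_direct_formatting(direct, style)
```

Here `align` is removed as a duplicate, `indent` stays because it differs from the style, and `tabs` stays only because it is on the exception list.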
2017-03-28 | Resolves: tdf#106724 crash when Title property doesn't already exist | Caolán McNamara | 1 | -8/+8
because we just write past the end instead of resizing beforehand (cherry picked from commit 4e32e8900e59f9751a60d9fdef80cdf7d500f72f) Change-Id: I4742980a331b14ca39aff8aa6cfc27db154091ff Reviewed-on: https://gerrit.libreoffice.org/35651 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com> (cherry picked from commit 1255360bffebef0f0521b00c4e5af57e6fe09e6b)
2017-03-17 | tdf#105729 RTF import: \ltrpar should not override \qc from style | Miklos Vajna | 1 | -2/+4
This is similar to commit 92fd894ea18672cba4cf961bdc4c0bc98f168102 (tdf#94435 RTF import: \ltrpar should not override \qc, 2015-10-05), except that here the \qc is inherited from the style, it's not a direct formatting. The problematic code was added in commit 2638faa2e834c2da4c195224fd88d32c29b3d0cc (writerfilter08ooo330: applied patch for writerfilter08, 2010-07-28), and it's not really clear to me what its purpose is, given that the DOC import equivalent in SwWW8ImplReader::Read_ParaBiDi() doesn't set the paragraph alignment. Fix the situation by not touching the paragraph alignment for the RTF case at least. (cherry picked from commit 2cc5f18d10cf6ef1349d9518e6f67977f7c5d9bf) Change-Id: I2baa2c8c8012d972740da7cf3f710117812859b3 Reviewed-on: https://gerrit.libreoffice.org/35190 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Christian Lohmaier <lohmaier+LibreOffice@googlemail.com> (cherry picked from commit b5c4e120540053d0cb737720503cb7038f12d5bd)
2017-03-17 | tdf#103931 DOCX import: fix lost section break | Miklos Vajna | 1 | -0/+1
When there are multiple sections in a document, every <w:p> element triggers a handleLastParagraphInSection() call, and that's how the previous section is ended and the next one is started if necessary. In case the section contains no paragraphs at all, the section was lost on import. Fix this by also calling handleLastParagraphInSection() on <w:sectPr> as well. It's not a problem if there are both <w:p> and <w:sectPr> in a section (which is the usual situation) as only the first call closes the previous section / starts the next one. (cherry picked from commit 6603947329a7b372a173a3c60e013e532d0bc5cf) Change-Id: I64f2c403dcb2ceca76d444ab06df3052235d2795 Reviewed-on: https://gerrit.libreoffice.org/34718 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Christian Lohmaier <lohmaier+LibreOffice@googlemail.com> (cherry picked from commit 1e88c10327642e6867db5708e3fd0fb7065bc74c)
2017-03-14 | vcl PDF import: there is no PNG encoding here | Miklos Vajna | 2 | -2/+2
It was a copy&paste error from xmlsecurity/workben/pdfverify.cxx, which does PNG encoding. Change-Id: I7b5108a7cddffdc859276b656a6e1168f23d3863 (cherry picked from commit 89e339fc1937b7de0d0e1f4ced802db7b4a68a9b)
2017-03-10 | tdf#104287 RTF import: handle bitmap shapes inside tables | Miklos Vajna | 4 | -15/+60
Regression from commit 015fd55c94b7b650ed8e572cafaf3b0f903b01b9 (tdf#96275 RTF import: fix anchor of shapes inside tables, 2016-05-10), the problem was that since shapes inside tables are now buffered, some previously hidden problems in the buffering became visible. For one, there was no code to make sure that a bitmap shape is not appended at the end of the buffer again when it gets re-played. For another, only the bitmap shape itself was buffered, not its size. (cherry picked from commit 8240be9170cc473506531dad2fda82469ae84443) Conflicts: sw/qa/extras/rtfimport/rtfimport.cxx writerfilter/source/rtftok/rtfvalue.cxx writerfilter/source/rtftok/rtfvalue.hxx Change-Id: I04d65eb794ff6b160ef77af85479ba25ea5f8aa7 Reviewed-on: https://gerrit.libreoffice.org/34953 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com> (cherry picked from commit 9a899023db876630b74493da588b4a5490f90894)
2017-02-24 | writerfilter: RTF import: hex-escaped \r and \n create paragraph break | Michael Stahl | 1 | -1/+13
... in Word 2010, while the spec doesn't say what they do. So just handle \'0d and \'0a like \par. This fixes an assert failure on importing lp556169-2.rtf, where insertTextPortion was called with a string containing "\r", which split the paragraph and that messed up the SwPaM. Change-Id: Iee8b5b47e15d18232de841adfbc9c6498727c384 (cherry picked from commit 10e733908038407791f9c14af2a86417cc4a653c) Reviewed-on: https://gerrit.libreoffice.org/34584 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 7a26194b05029f68e58ff71285c7be1c5b4c2c42)
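A minimal sketch of the idea, assuming the hex escapes have already been decoded to bytes: each CR (0x0d) or LF (0x0a) ends the current paragraph instead of being passed on as text. The function name and the string-based interface are hypothetical and are not how the RTF tokenizer is structured.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split already-decoded RTF text into paragraphs,
// treating each CR (0x0d) or LF (0x0a) byte like a \par control word.
std::vector<std::string> splitAtHexLineBreaks(const std::string& rDecoded)
{
    std::vector<std::string> aParagraphs(1); // start with one empty paragraph
    for (char ch : rDecoded)
    {
        if (ch == '\r' || ch == '\n')
            aParagraphs.emplace_back(); // paragraph break instead of text
        else
            aParagraphs.back().push_back(ch);
    }
    assert(!aParagraphs.empty()); // there is always at least one paragraph
    return aParagraphs;
}
```

With this shape, no text portion handed on to the document ever contains "\r" or "\n", which is the property the assert failure described above depended on.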
2017-02-24 | tdf#104081 RTF import: handle \htmautsp | Miklos Vajna | 4 | -4/+10
It's the opposite of OOXML's <w:doNotUseHTMLParagraphAutoSpacing/>, so the default is different. Also adapt the fdo82006 bugdoc where the original bugdoc contained this flag, but the testcase did not. (cherry picked from commit 291c9122b23ce7aa619e828b895b08dcd21bf025) Change-Id: I2fd757a8f95be9b1bee63570c9f587c17d3b22bc Reviewed-on: https://gerrit.libreoffice.org/34568 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com> (cherry picked from commit af9d9f274ff26b462048746069a5bb38493ff115)
2017-02-24 | tdf#106001: Treat CharScaleWidth outliers as 100 in DOCX import | Aron Budea | 1 | -2/+11
The spec limit is [1..600]; sometimes documents contain 0, which, like other values outside the limit, should be treated as 100. Change-Id: I04aec25b638762392de3f9881cd72588f2753e71 Reviewed-on: https://gerrit.libreoffice.org/34341 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit 6e3a84023b46f6be632b43d2f5713d8d4bb2ba62) Reviewed-on: https://gerrit.libreoffice.org/34368 (cherry picked from commit 06c81a3e61e2d5743ffd8a50d85e5ecee989e46b)
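The rule can be written down as a small range check. This is a sketch only; the function name is hypothetical and the actual change in the DOCX import may be structured differently.

```cpp
#include <cassert>

// Hypothetical helper: values outside the spec range [1..600] (including 0)
// fall back to 100, i.e. no character scaling.
int normalizeCharScaleWidth(int nScale)
{
    const int nDefault = 100;
    const int nResult = (nScale < 1 || nScale > 600) ? nDefault : nScale;
    assert(nResult >= 1 && nResult <= 600); // the result is always within the spec range
    return nResult;
}
```

Values inside the range pass through unchanged, so documents that already follow the spec are not affected.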
2017-02-17 | tdf#104181: don't throw on XRelationshipAccess query | Mike Kaganski | 1 | -4/+4
The queries are followed by conditional blocks, so throwing is unnecessary and erroneous (it breaks the parser's internal state). Change-Id: I49917a85e34866a326b4a2edd30e76f130b8ee27 Reviewed-on: https://gerrit.libreoffice.org/33244 Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com> (cherry picked from commit 13827ffa54268dc648e1b72ea574daea02598335)
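The difference between a throwing and a non-throwing interface query can be shown with a plain C++ analogy. This sketch deliberately uses dynamic_cast rather than the UNO API, and all type and function names in it are hypothetical. It only illustrates why a query that is followed by a conditional block has no need to throw.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-ins for a stream object and an optional interface on it.
struct Stream
{
    virtual ~Stream() = default;
};

struct RelationshipAccess
{
    virtual ~RelationshipAccess() = default;
    virtual int count() const = 0;
};

struct PlainStream : Stream
{
};

struct PackageStream : Stream, RelationshipAccess
{
    int count() const override { return 3; }
};

// Throwing query: a stream without the interface raises std::bad_cast and
// aborts whatever the caller was in the middle of doing.
int countRelationsThrowing(Stream& rStream)
{
    return dynamic_cast<RelationshipAccess&>(rStream).count();
}

// Non-throwing query: the result is checked in a conditional block, so a
// missing interface simply skips the block.
int countRelations(Stream& rStream)
{
    if (auto* pAccess = dynamic_cast<RelationshipAccess*>(&rStream))
        return pAccess->count();
    return 0;
}
```

The second form leaves the caller's state untouched when the interface is absent, which matches the reasoning in the commit message above.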
2017-02-07 | writerfilter: remove gperf related declarations | Michael Stahl | 1 | -11/+1
Reportedly gperf 3.1 changes the signature of in_word_set(), where the len parameter changes from unsigned int to size_t. It turns out the only forward declaration for this function is currently unused, so just remove it. Change-Id: Ifbc582cd31ca37fff9ff95a3706ee902ecfe5223 (cherry picked from commit 19c0eff34a5e1de4f3aff723b7750d4e01d4ba6d) Reviewed-on: https://gerrit.libreoffice.org/33969 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-02-06 | tdf#104082 RTF filter: handle user-defined document properties of type number | Miklos Vajna | 2 | -0/+5
Previously only strings were handled, which resulted in not being able to open the bugdoc. (cherry picked from commit fc8c4606e0834cd2128a228c2c95fc7c8f9eb7b1) Change-Id: I2452cbabf48bfaa9f1a3044be4b8cbe4aa9dd0d9 Reviewed-on: https://gerrit.libreoffice.org/33952 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com>
2017-02-02 | tdf#103976 DOCX import: disable incomplete w:before/afterLines style handling | Miklos Vajna | 1 | -2/+8
Regression from commit 9e7eb63989ef1cf4b9a0e0404b84ef890db3d8e3 (DOCX import: parse <w:spacing>'s w:before/afterLines attribute, 2014-10-17). The problem is that OOXML has 3 different attributes for the paragraph bottom margin (and another 3 for the top one), while Writer has just a single one for each. The import filter tries to work out which one of these should have priority and ignore the rest, but this is far more complicated when style inheritance has to be taken into account as well. To avoid the regression, restrict w:before/afterLines handling to the case when it's used as direct formatting, which is why it was introduced in the first place. (cherry picked from commit 353a45aa1b1a15047aa2a92c1383996070e87405) Change-Id: Ie8642c7a9771596def6b8899e098b26c4f8be0b4 Reviewed-on: https://gerrit.libreoffice.org/33774 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Michael Stahl <mstahl@redhat.com>
2017-01-23 | tdf#48658 writerfilter: only set THROUGH wraps as transparent | Justin Luth | 1 | -1/+7
commit 15c3a08b8b1e8060f9659c7bc98480a39d1802c5 set transparency before the wrap type was known (which is good in case wrap type is never defined, and the default wrap type IS through, so that fits) but transparency was never re-evaluated once the wrap type was known. In MSWord, the header is at a lower zOrder than the body, so objects that are OVER the header text are still UNDER the body text. Writer emulates this by insisting that ALL through-wrapped header objects are UNDER the header text. (This ought to only apply to objects that spill into the body text area, but that’s pretty hard to calculate, so transparency was applied to any object anchored in the header.) Change-Id: Ie3916c6b7f3fa80caf5994fd910ba4d4d89ec702 Reviewed-on: https://gerrit.libreoffice.org/33152 Reviewed-by: Justin Luth <justin_luth@sil.org> Tested-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> (cherry picked from commit c0688e8bf047bb123680806317fe040ba2cde407) Reviewed-on: https://gerrit.libreoffice.org/33360 Tested-by: Jenkins <ci@libreoffice.org>
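The decision described above can be sketched as a function that is evaluated once the wrap type is known. This is a simplified illustration, not the importer's code: the enum, the function name and the parameters are hypothetical, and "transparent" is read here as "placed behind the text".

```cpp
#include <cassert>

// Hypothetical wrap types; Unknown means the document never defined one,
// in which case the default (through) applies.
enum class WrapType
{
    Unknown,
    Through,
    Parallel,
    TopAndBottom
};

// Returns true if the object should end up behind the text.
// bBehindRequested is what the document itself asked for.
bool placeBehindText(bool bInHeader, WrapType eWrap, bool bBehindRequested)
{
    const bool bThrough = (eWrap == WrapType::Through || eWrap == WrapType::Unknown);
    if (bInHeader && bThrough)
        return true; // through-wrapped header objects always go under the text
    return bBehindRequested; // otherwise keep the document's own setting
}
```

The point of the change is the second return: a header object whose wrap type turns out not to be through no longer inherits the setting that was applied before the wrap type was known.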