Age | Commit message | Author | Files | Lines
2024-02-13 | tdf#159453 sw floattable: fix unexpected overlap of in-header fly and body text | Miklos Vajna | 3 | -4/+54
Regression from commit e2076cf7a92694bc94bdc9f3173c2bddbe881a89 (tdf#155682 sw floattable: fix DOCX with big pictures causes endless loop, 2023-10-25), the bugdoc's body text was wrapping around the floating table from the header, while the expectation was that the top of the body frame is below the bottom of the header frame. It seems IsFollowingTextFlow is only needed when the relation of the floating table is not "page", and this bugdoc has an explicit vertical relation of page. Solve the problem by limiting the IsFollowingTextFlow=true request for the floating table to the VertOrientRelation=page case, which fixes the bugdoc and keeps the old use-case working. The doc model for the new bugdoc now matches the WW8 import result. Change-Id: Ia3da65cd52d70b357e448a26a50ffb92a39795e6 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/163290 Reviewed-by: Miklos Vajna <vmiklos@collabora.com> Tested-by: Jenkins
2023-04-21 | sw: fix crashtesting assert on fdo44018-2.doc | Michael Stahl | 1 | -1/+9
itrpaint.cxx:435: void SwTextPainter::DrawTextLine(): Assertion `!roTaggedParagraph' failed. The problem is that there are in fact 3 lines with the numbering bullet, which is a problem that existed since LO 4.1, but that only changed the WW8 import so it really exists all the way back to OOo 3.0.1. The SwNumberingPortion is created, then it is cloned for the insertion of the tab, then the 2nd one is deleted (which is expected as it is empty), then due to some ChkFlyUnderflow() SwTextFormatter::FormatLine() resets the m_bNumDone flag and the next line gets another numbering. The m_bNumDone flag must be reset if the numbering portion was deleted, but not otherwise. (regression from commit 9b38beadf9eaf027b201cdf0ecb2bce5611014dd) Change-Id: I575947fdfb8786ad6d0f9e83636c39eb929a1b06 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/150709 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@allotropia.de>
2023-02-24 | tdf#78510 sw,cui: split SvxLRSpaceItem for SwTextNode, SwTextFormatColl | Michael Stahl | 71 | -685/+1607
Leave editengine and non-paragraph usages of SvxLRSpaceItem as-is for now. Add new items RES_MARGIN_LEFT etc., order them so that paragraphs can have 3 consecutive items RES_MARGIN_FIRSTLINE..RES_MARGIN_RIGHT and non-paragraphs also have 2-4 consecutive items RES_MARGIN_RIGHT..RES_MARGIN_LEFT (only the 3 paragraph ones are actually used now). The HTML import filter is particularly annoying because it parses CSS stuff into SfxItemSets without knowing where the items will be applied, so it can't know whether to create SvxLeftMarginItem or SvxTextLeftMarginItem... the split items are created in ParseCSS1_* functions and then converted later if necessary. WW8 import has some weird code as well, SwWW8ImplReader::Read_LR() creates 3 items and then something wants to set every item on its own so SwWW8FltControlStack::SetAttrInDoc() turned out rather weird. Convert the paragraph dialog to handle the split items (by mapping them to SID_ATTR_PARA_FIRSTLINESPACE/SID_ATTR_PARA_LEFTSPACE/ SID_ATTR_PARA_RIGHTSPACE), but the SvxRuler looks a bit more confusing so convert in sw shells for now and leave that for later (also unclear if changing these slot items like SID_ATTR_PARA_LRSPACE breaks any ABIs?). Change-Id: I40431821868fd3e1cceba121b5539ff9ae6befbc Reviewed-on: https://gerrit.libreoffice.org/c/core/+/147024 Tested-by: Michael Stahl <michael.stahl@allotropia.de> Reviewed-by: Michael Stahl <michael.stahl@allotropia.de>
2023-02-20 | tdf#137883 officecfg: ModuleDependendFilterOrder move Word 2010 ... | Michael Stahl | 1 | -1/+1
... filter above Word 2007 one, and also move RTF above binary DOC WW8 formats while at it. Change-Id: Ia7a303ace7a9d5782348c90b0ccd95a47ffd7ac7 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/147211 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@allotropia.de>
2023-01-18 | tdf#153082 writerfilter,sw: import/export locale-dependent TOC ... | Michael Stahl | 4 | -7/+51
... \t style name separators. OOXML says in 17.16.5.68 TOC: \t field-argument Uses paragraphs formatted with styles other than the built-in heading styles. text in this switch's field-argument specifies those styles as a set of comma-separated doublets, with each doublet being a comma-separated set of style name and table of content level. The reality is documented in Word online help: https://support.microsoft.com/en-us/office/field-codes-toc-table-of-contents-field-1f538bc4-60e6-4854-9f64-67754d78d05c?ui=en-US&rs=en-US&ad=US Note: Syntax shown here uses a comma (,) between the Style and Level parameters. A semicolon (;) is also valid, depending on which character is specified as the list separator in your operating system's regional and language settings. Because of language-specific dependencies, we recommend not using the \t switch in templates or documents that are intended for users across multiple language configurations. It's easy enough to recognize both ',' and ';' as separators on import, and unlikely that anybody would use these characters inside a style name; for export, both can't be written and a decision must be made. So do the same thing on export as Word does, assuming most document exchange is between users in the same locale; currently only for "de" locales but more can be added. Interestingly WW8 used to write ';' before 2009 when CWS hb32bugs01 changed it to ','. Change-Id: I2dcfdd009f448f6fae37cbd28929d0bbe504acf9 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/145744 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@allotropia.de>
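The separator handling described above can be sketched as follows. This is a minimal illustration, not the LibreOffice code: the function names are hypothetical, the argument is assumed to be the already-unquoted text of the \t switch, and the rule "';' for German locales, ',' otherwise" is taken from the commit message's remark that export currently special-cases only "de" locales.

```python
import re


def parse_toc_style_levels(argument):
    # Hypothetical helper: split a TOC \t field argument into
    # (style name, level) pairs, accepting both ',' and ';' as separators.
    parts = [p.strip() for p in re.split(r"[,;]", argument) if p.strip()]
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating style names and levels")
    return [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]


def export_separator(language_tag):
    # Assumption: only "de" locales use ';' on export; everything else uses ','.
    return ";" if language_tag.split("-")[0].lower() == "de" else ","


def format_toc_style_levels(pairs, language_tag):
    # Hypothetical helper: write the pairs back using one separator,
    # since both cannot be written at the same time.
    sep = export_separator(language_tag)
    return sep.join(f"{style}{sep}{level}" for style, level in pairs)
```

Import is tolerant of either separator, while export has to commit to one, which is why the locale lookup appears only on the writing side.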
2023-01-18 | tdf#114537 doc import: trim switches before evaluating FIELD_IF | Justin Luth | 3 | -1/+26
Although this function is only used for ww8 import (and qa tests), it is documented as being more generic. So I decided to just trim at the source and not try to introduce any MS-isms into the parse function. Something similar will be needed for DOCX, but DOCX import for FIELD_IF is completely missing. Change-Id: I822b400e3e53abd953f4c382947f0e80ae62b234 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/145691 Tested-by: Jenkins Reviewed-by: Justin Luth <jluth@mail.com>
2022-05-03 | ofz#47198 Use-of-uninitialized-value | Caolán McNamara | 1 | -1/+3
seen in ww8 filter with rName of length 0 in if (rName.startsWithIgnoreAsciiCase("Tms Rmn") Change-Id: Ia8a20971161a44d62ead9bfcef59f86b007fd58b Reviewed-on: https://gerrit.libreoffice.org/c/core/+/133713 Tested-by: Caolán McNamara <caolanm@redhat.com> Reviewed-by: Caolán McNamara <caolanm@redhat.com>
2022-03-22 | tdf#147861 ww8import: use GetFieldResult, not current DocProperty | Justin Luth | 3 | -2/+30
In all the testing I could think of on DOCX and DOC examples (and only a very few exist in the unit tests) the actual value of the DocProperty was irrelevant to what Word shows as the document loads. It always takes the in-document, as-last-seen static text. As a way to hack a fix using existing capabilities, I marked as FIXEDFLD the unknown custom fields that weren't handled separately. That fixes what is displayed as the import value, (which of course means that F9 will no longer return a modification back to the DocProperty value). It also means the (fixed) field is lost on export, but a follow-up patch handles that for DOC/RTF/DOCX. There were NO DI_CUSTOM examples in existing ww8 tests, but: -ooxmlexport8: fdo74745.docx, fdo81486.docx -ooxmlexport10: tdf92157.docx and in these cases the plain text matched the variable anyway, but a manual manipulation showed that LO is importing DOCX wrong as well, so a similar import fix needs to happen for RTF/DOCX. My fear is that there are some special-magic-associations that worked properly the old way by accident that I will break by marking them as fixed. No backporting please since obviously very few people report bugs about fields. Change-Id: I3f167eb3bd570b66ee829241bf9d31d557fc8749 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/131237 Tested-by: Jenkins Reviewed-by: Miklos Vajna <vmiklos@collabora.com> Reviewed-by: Justin Luth <jluth@mail.com>
2021-12-21 | filter: try to detect 0-byte DOC files based on extension | Miklos Vajna | 4 | -2/+33
Commit ae1f51b4888a3aa14837ac6e4083f33b2176ca45 (tdf#123476 filter: try to detect 0-byte files based on extension, 2020-10-28) already implemented this for UNO-based import filters; do the same for built-in filters as well. Another problem in filter/ was to pick the WW6 filter for .doc -- require export+preferred support in the filter to get WW8 instead. An additional filter that may kick in is MS Word 2003 XML: this is avoided by requiring "preferred". Change-Id: I46e280beb5341213b0fe7a09a549b52c0c1ea3f6 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/127219 Reviewed-by: Miklos Vajna <vmiklos@collabora.com> Tested-by: Jenkins
2021-01-22 | tdf#121669 ww8 export: use the "we have equal columns" flag | Justin Luth | 4 | -11/+24
If the columns are marked as AutoWidth, then there is no need to go to the remarkably poor layout code to determine if the columns should be exported as equal. In this case, it appears as if the layout engine hadn't really identified the full width, or evaluated the wish values of each column. This fixes DOCX, DOC, and RTF. Change-Id: I1a1193b65d01e654b3bfbfaee7d8c02a683ae2c0 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/109762 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
2020-11-06 | DOCX import: fix assertion failure when redline ends right before a ToC | Miklos Vajna | 5 | -0/+134
This was always a problem, but now more visible since commit 8b3c861c46ae12d21b7b3a550e2daa21d2006b77 (tdf#89991 DOCX: import Show changes from older formats, 2019-06-13), as it now affects more documents: tracked changes can be hidden by the time the initial layout is created. With that aside, the immediate problem is an assertion failure in InsertCnt_(), because it assumes that an end node for a section has to have a matching pActualSection, created by start node of the same section. This will fail in case the start node is hidden, but not the end node. The deeper problem is that redlines are not supposed to cross section boundaries: if e.g. multiple cells are selected in a table and the user deletes while tracking changes, then the UI creates multiple redlines instead. The problem here is similar: a delete redline ends right before the section start, so when SwNodes::InsertTextSection() inserts a section node, the end of that redline is automatically moved to the start of the section content (its index increases, the actual SwNode doesn't change). Fix the problem by explicitly checking for a redline end at ToX start and moving it back to the end of last content node. This matches the doc model produced by the WW8 import. Change-Id: Ic7b279185a20d2a32abd054d3fc6be530ddde12a Reviewed-on: https://gerrit.libreoffice.org/c/core/+/105412 Reviewed-by: Miklos Vajna <vmiklos@collabora.com> Tested-by: Jenkins
2020-10-19 | tdf#136983 partial revert NFC ww8 cleanup: remove unused variables | Justin Luth | 2 | -1/+10
This is a partial revert of LO 6.2 commit 2ec0cf500222aef55d02df80154b47fbb92970c9 I can't think of any excuse for how I possibly missed that xDocProps was being defined/used outside of this clause. Just plain stupid and blind. The good news is that the create and modified date still seem to be getting saved somehow/somewhere. So it isn't the disaster that it looks like it could have been. Change-Id: I72ef56fa50b9e92e4ce687b132b1919cfae6c1f6 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/103565 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2020-08-27 | related tdf#132149 ww8 export: unit test to prevent bad fix | Justin Luth | 1 | -0/+6
The unit test I am using for this patch writes out landscape attribute, but provides portrait width/height values. I was tempted to just "fix" the values on export, but this existing document shows that w/h trump p/l in LO (and Word does the same). So that should also round-trip, and this test will ensure that keeps happening. Change-Id: Ib55cb799462abd1039ce7c1c935b3f66761a5dc2 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/101479 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2020-07-30 | DOCX import: fix overlapping floating tables when anchored inside a table | Miklos Vajna | 3 | -0/+23
The WW8 import does the same in SwWW8ImplReader::StartTable(), now we're on par with that. Change-Id: I2ce0d96d255d8f405203f36a358559687b36e9e3 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/99762 Reviewed-by: Miklos Vajna <vmiklos@collabora.com> Tested-by: Jenkins
2020-07-13 | tdf#134618 sw: WW8 import: don't insert fieldmark for SHAPE field | Michael Stahl | 1 | -1/+1
Follow DomainMapper_Impl::CloseFieldCommand() and just don't waste effort creating a fieldmark that doesn't provide any benefit. This should avoid any fieldmark related problems introduced in e511a0ca5dde6d731bb126bbfe21768867890102..d9030ad6298e2f49ee63489d6158ea6ad23c0111 Change-Id: I6688dcda1e3b41ac648f3d69740f05d34bb46191 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/98542 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2020-07-07 | tdf#134264 writerfilter: fix DOCX->DOC of ADDRESSBLOCK field | Michael Stahl | 4 | -112/+245
... and other unsupported ones; the problem was that the field got exported with ww::eUNKNOWN = 1, which can't be imported again. Move the ww8 eField enum to include/ so it can be used from writerfilter. (regression from e511a0ca5dde6d731bb126bbfe21768867890102..d9030ad6298e2f49ee63489d6158ea6ad23c0111) Change-Id: I19193392d62fdf0bba01fac2516bafe9fdfa5a99 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/98221 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2020-04-16 | tdf#132094 doc: fix export of fill in wrap-through fly frames | Justin Luth | 3 | -4/+17
This builds on commit 2f13dbac060ae6af7e25ad3eff675cc859cfb3ff by Miklos Vajna on Fri Jun 15 08:49:46 2012 +0100 n#325936 fix ww8 export of fly frames with transparent bg where he wisely and cautiously says Regression from commit ed8b5f2d -- to be safe, reverted only for fly frames in headers. because for some unknown reason, way back in 2002, commit ed8b5f2debac216243930aba0873e0d75de8d0dd forced all frames to specify a background fill. Typically of course this is white, and so who notices? Well, you notice if your frame is transparent, and now the area fill hides something that it is over top of. Like for example a transparent image, where the text wraps through the image. At first I was going to just try and revert everything. Then I decided it likely was a difference between how LO and MSO handled stacking/overlapping things. After that, I was going to just make an exception for eShapeType == mso_sptPictureFrame, but that only seems necessary if there is something underneath. If the something is just a background, that is handled anyway, so really it would only be other shapes or (most importantly) text, so the safest thing is testing wrap through, which there was already a pre-defined variable to reuse (and fix the spelling). Change-Id: I9236579fa692e22205bab5a21c3f9d919f4cf24f Reviewed-on: https://gerrit.libreoffice.org/c/core/+/92215 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de> Reviewed-by: Justin Luth <justin_luth@sil.org>
2020-03-31 | sw: DOCX export: avoid section breaks in text frames | Michael Stahl | 1 | -0/+12
The problem is that if Word reads a w:sectPr that is inside a w:textbox and has a w:headerReference, then Word throws a confusing error reporting a location inside the headerN.xml file and refuses to open the file. It looks like Word doesn't actually support sections inside text frames, although it doesn't complain if the section break doesn't contain a header/footer reference. The WW8 export appears to avoid this by checking that TXT_MAINTEXT == m_nTextTyp and skipping sections otherwise, but the m_nTextTyp doesn't change when exporting a text frame in DOCX case, so let's change that. Possibly this makes m_bFlyFrameGraphic variable redundant, not sure about that. Change-Id: If862b226254983bb608bbce180f4aa2f41721273 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/91421 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2020-03-24 | ofz#21168 sw,writerfilter: limit writerfilter hack to writerfilter | Caolán McNamara | 10 | -2/+46
The problem is that at the end of WW8 import, a delete redline is inserted that ends up calling DeleteAndJoin from inside AppendRedline(). A fly is anchored AT_CHAR at (node 46, offset 0) and the deletion goes from (node 46, offset 0) to (node 48, offset 13) hence the special case check in IsDestroyFrameAnchoredAtChar() for the IsInReading() prevents it from being deleted, and then its anchor is still registered at the node 46 when it gets deleted. So try to restrict the WriterfilterHack to writerfilter, so it won't affect WW8 import. Unfortunately this is far less obvious than expected, because import can happen for creating a new file, in which case it's all done via UNO in writerfilter, or when inserting into an existing file, in which case SwReader::Read() is used. The SwDocShell's pMedium can't be used because in the insert file case it will be the loaded file, not the inserted file. There isn't any obvious alternative to adding a silly UNO property for the writerfilter to use. Change-Id: Ia7fdc9bb1925202f6692ebee6e4b6b1fe50e5345 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/90384 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com>
2020-01-30 | tdf#45589 sw: invalidate on bookmark insertion/deletion | Michael Stahl | 3 | -4/+62
Invalidate the text frames when a bookmark is inserted or deleted; also when MarkManager::repositionMark() changes the positions. The other calls of SetMarkPos()/SetOtherMarkPos() look like they're all from code that corrects positions after text insertions or deletions so no additional invalidate should be necessary there. It turns out that one WW8 document in sw_filters_test wants to insert a bookmark on a SwGrfNode; check for that in makeMark(). Change-Id: I293e6da9042bea5992cb27091b9cff77e5c7961d Reviewed-on: https://gerrit.libreoffice.org/c/core/+/87157 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-12-19 | tdf#129247 writerfilter,sw: improve handling of CONTROL fields | Michael Stahl | 2 | -1/+14
The "CONTROL Forms.CheckBox.1" field has a shape as its result. Previously this was imported as an unknown generic field by writerfilter and exported as a CONTROL field followed by a SHAPE field; the CONTROL field was discarded by Writer on a subsequent import. Now this is exported as nested fields to WW8, i.e., SHAPE inside the result of CONTROL, which is an improvement. Unfortunately the WW8 import discards the result of the CONTROL field, because its field code is written as ww::eUNKNOWN = 1, not ww::eCONTROL = 87. To fix that, set the ODF_ID_PARAM parameter in writerfilter for these fields, which is checked in MSWordExportBase::OutputTextNode(). This reveals that the field code was set wrongly on the fieldmark too; it should be set as an ODF_CODE_PARAM parameter and not as the type. Furthermore the WW8 import needs to allow nested fields in the eCONTROL field. Change-Id: If79a186ea30c3b4a933ba1d8325111215250b833 Reviewed-on: https://gerrit.libreoffice.org/85418 Reviewed-by: Michael Stahl <michael.stahl@cib.de> Tested-by: Michael Stahl <michael.stahl@cib.de>
2019-12-17 | ofz#18534 sw: WW8 import: avoid creating redlines that overlap... | Michael Stahl | 6 | -22/+31
...with fieldmarks, as the editing operations already do. This was triggering ~SwIndexReg assert when creating this redline: $4 = "     \b\nfür \a\003     \b\nKlasse \a\003     \b\t\tSchuljahr \a\003     \b\t\a\003  \b. Halbjahr\nLeistungsbeurteilung lt. Konferenzbeschluss vom " Change-Id: I904be93e044c4b98bb8c806357ed061692303c7a Reviewed-on: https://gerrit.libreoffice.org/85149 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-12-17 | tdf#112202 writerfilter,sw: fix loss of headers | Michael Stahl | 5 | -9/+101
There are several problems here:
* CloseSectionGroup() is not only called for actual sections in the document but also at the end of every special text like comment, footnote, etc; only actual sections can set page styles. Writer comments use editengine so cannot even contain sections.
* With continuous section breaks, headers and footers are inherited from the previous section unless defined by the current section; SwXText::copyText() did not copy the content of the header on page 4 to page 5 correctly because it used an SwXTextCursor to create the selection, which cannot select the table at the start of the header.
* For continuous section breaks, WW8 import filter has a heuristic to find the first page break in the section and set the PageDescName property on that node to apply the page style with the headers of the new section; do something similar in writerfilter SectionPropertyMap::CloseSectionGroup()
Change-Id: I3ebe3d299f83197cbf8f10de46c34de98677626c Reviewed-on: https://gerrit.libreoffice.org/85213 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-11-26 | sw: WW8 import: fix again asserts on fdo45983-1.doc export to ODT | Michael Stahl | 3 | -8/+14
The problem is that we now insert 2 dummy characters at the start of a fieldmark instead of 1, so the checks in RedlineStack::MoveAttrs() were off by 1 and we get the same invalid redline containing the start but not the end of a fieldmark. (regression from 7f2e61f884949ab27bcb7e1a02ece9a5cb4354b9) Change-Id: I9752ca4c3a281539e37ddac4fe811e2f9d7374a6 Reviewed-on: https://gerrit.libreoffice.org/83783 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-11-26 | ofz#19065 sw: invalid fieldmarks created in SwWW8ImplReader::End_Field | Michael Stahl | 3 | -6/+9
The problem is that the check added in commit 06767a5394f1dfba71c4f0a2a07daa5664bdbd01 "sw: WW8: do not create fieldmark with start in frame and end in body" doesn't work as well as imagined; the CheckNodesRange will only check against mismatching top-level and second-level (in the non-body-text top-levels) sections, whereas in this case the start is in one table cell and the end in the next one. So replace that and move the check into MarkManager::makeMark(), so other things than WW8 import are also checked. Change-Id: I2bf32e7b579d87600b6b6718a3222f37c14aa53d Reviewed-on: https://gerrit.libreoffice.org/83585 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-11-19 | ofz#18554 sw: fix Null-dereference due to overlapping fieldmarks | Michael Stahl | 5 | -13/+40
The problem is that the WW8 import wants to set a fieldmark on a range that contains only the CH_TXT_ATR_FIELDEND of another fieldmark: (rr) p io_pDoc->GetNodes()[12]->m_Text.copy(33,10) $30 = "\bÿÿÿ\001ÿÿÿ\001 " MarkManager::makeMark() must check that a new fieldmark never overlaps existing fieldmarks or meta-fields. While at it, it looks like the test in DocumentContentOperationsManager::DelFullPara() can't necessarily use the passed rPam, because it obviously deletes entire nodes, but at least SwRangeRedline::DelCopyOfSection() doesn't even set nContent on rPam. Also, the check in makeMark() triggers an assert in CppunitTest_sw_uiwriter testTextFormFieldInsertion because SwHistoryTextFieldmark::SetInDoc() was neglecting to subtract 1 from the end position for the CH_TXT_ATR_FIELDEND. Change-Id: I46c1955dd8dd422a41dcbb9bc68dbe09075b4922 Reviewed-on: https://gerrit.libreoffice.org/83000 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
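The overlap rule for new fieldmarks can be illustrated with a small range check. This is only a sketch under stated assumptions: positions are modelled as plain integer offsets in half-open ranges, fully nested ranges are assumed to be acceptable, and the function names are hypothetical rather than taken from MarkManager.

```python
def partially_overlaps(a, b):
    # True if the half-open ranges a and b cross each other without
    # one being fully contained in the other.
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end


def can_add_mark(existing_marks, new_mark):
    # Hypothetical check: reject a new mark that crosses any existing one.
    return not any(partially_overlaps(mark, new_mark) for mark in existing_marks)
```

With an existing mark covering offsets 10 to 20, a new range starting at 19 swallows only that mark's final character, the situation the commit message describes for CH_TXT_ATR_FIELDEND, so `can_add_mark([(10, 20)], (19, 25))` is false in this model.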
2019-11-15 | sw: WW8 import: instead of control character insert '?' for footnote | Michael Stahl | 1 | -1/+1
SwWW8ImplReader::ReadChar() inserts a U+0002 control character to temporarily mark a footnote anchor; this is then deleted and replaced with a real footnote hint by SwWW8ImplReader::End_Footnote(). The assumption is that it is necessary to insert a placeholder character to be able to apply formatting to it. But if the document is corrupted, the control character could survive the import, which sounds less than ideal. So either make this magic character more explicit by documenting it in hintids.hxx and removing any outstanding ones at the end of the import, or use a non-offensive character instead; since this should only affect invalid documents, choose the solution with the least effort. Change-Id: I76d396258b32e0f0fb6393942a58a4dc57912211 Reviewed-on: https://gerrit.libreoffice.org/82723 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-11-15 | ofz#18526 sw: WW8 import: don't insert control characters | Michael Stahl | 4 | -14/+47
Sanitize string before calling InsertString(). This segfaults since: commit b522fc0646915d4da94df38dd249c88b28f25be7 Date: Tue Sep 24 18:11:45 2019 +0200 sw: maintain fieldmarks in DeleteRange()/DeleteAndJoin()/ReplaceRange() Change-Id: I9ef73d924420686f6838fa21900ec57b4d25c905 Reviewed-on: https://gerrit.libreoffice.org/81949 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
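Sanitizing a string before insertion might look roughly like the sketch below. The commit message does not say which characters are filtered, so the choice here (drop everything below U+0020 except tab, line feed and carriage return) is an assumption, and `sanitize_for_insert` is a hypothetical name.

```python
# Assumption: these three control characters are legitimate in text and kept.
KEPT_CONTROLS = {"\t", "\n", "\r"}


def sanitize_for_insert(text):
    # Drop C0 control characters that are not in the kept set.
    return "".join(ch for ch in text if ch >= " " or ch in KEPT_CONTROLS)
```

Filtering once at the call site keeps placeholder characters such as U+0002 or the U+0007 cell separator from reaching the text node.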
2019-10-31 | sw: WW8 import: filter control characters in GetFieldResult() | Michael Stahl | 1 | -1/+30
Triggers the assert in SwSubFont::GetTextSize_() on ooo58234-1.doc, which has a field result with ^G cell separators that is converted to SwInputField, which inserts the field result into SwTextNode. Change-Id: Ibdb93390862a11462d62cf744bac912d6009777e Reviewed-on: https://gerrit.libreoffice.org/81788 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-10-23 | sw: WW8 import: be a little more flexible with FORMTEXT fields | Michael Stahl | 2 | -2/+4
The subsequent export change will somehow create things like (rr) p pF->nLCode $1 = 13 (rr) p rStr $2 = " FORMTEXT \001\062\060" ... so be a little less strict with the 0x01. Change-Id: Ie99002d099a3803989b71ae8c26b7f4bfe61c943 Reviewed-on: https://gerrit.libreoffice.org/81083 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-10-23 | sw: WW8: fix the separator position | Michael Stahl | 2 | -3/+5
WW8 inserts the fieldmark at the end of the result, so separator should be at the start; writerfilter inserts fieldmark at the end of command so separator should be at the end. Change-Id: I44c9811139a34f529c553dd2fd46fdaccd554732 Reviewed-on: https://gerrit.libreoffice.org/80674 Tested-by: Jenkins Reviewed-by: Michael Stahl <michael.stahl@cib.de>
2019-10-16 | writerfilter: sync layout-in-cell vs wrap-through behavior with ww8 import | Miklos Vajna | 1 | -1/+1
I removed the same check in the WW8 import in commit d630f69d90f15bc652a62648b05ea515de78d16a (Related: tdf#124601 DOC import: improve fLayoutInCell handling, 2019-09-26). There is no reason the DOCX import shouldn't do the same, just for consistency. Change-Id: I9e56a3fcd0b13ba08e347fbc06b0960ac21b372c Reviewed-on: https://gerrit.libreoffice.org/80856 Tested-by: Jenkins Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
2019-09-09 | tdf#126994 ww8 export: Don't skip TOX end node | Justin Luth | 3 | -3/+9
The section end node processes the section page break, so skipping it after the Table Of Contents meant that a page break here was lost. This fix is specifically for DOCX although it could impact .doc (which already worked, and still does) and .rtf (which probably doesn't work with section end anyway). Ultimately, it just calls OutputEndNode() for an end node, so it shouldn't cause any difficulties. Change-Id: Iabc4a734365febb2b3e3bfed7d3c954b4b01da34 Reviewed-on: https://gerrit.libreoffice.org/78552 Tested-by: Jenkins Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
2019-09-05 | tdf#95848 sw: DOCX export: crude implementation of abstractNum mapping | Michael Stahl | 8 | -6/+97
The abstractNum needs to correspond to a SwList, not to a SwNumRule as it is currently implemented. Add a mapping to MSWordExportBase for "overriding" numbering definitions; these are added to m_pUsedNumTable, which appears to be necessary to interact with DuplicateNumRule(), but here we just add null pointers, because we don't need to modify the SwNumRule, and neither do we want the vector to double-delete it. The mapping is created while iterating over the document, in AttributeOutputBase::ParaNumRule(). It turns out that this approach would work for WW8 too as DuplicateNumRule() was originally added for that format but it won't work easily for RTF; in the DOCX case the WriteNumbering() is called after the main text and footnotes/endnotes, but with RTF it's the other way around :( Change-Id: Ia0409f5ad0b2e089005024ef7f61850a06d4dcbe Reviewed-on: https://gerrit.libreoffice.org/78607 Tested-by: Jenkins Reviewed-by: Michael Stahl <Michael.Stahl@cib.de>
2019-05-25 | revert tdf#123912 ww8 export: re-protect implicit section | Justin Luth | 3 | -7/+3
Revert LO 6.3 bugfix d9a6eab15a747cf4c8a3d04f4b21fe1a1c3d0721 because LO has changed the behaviour of implicit sections when Protect Forms is enabled. Now they are editable, making the Protect Form compatibility option nearly useless. (tdf#122201 commit d9a6eab15a747cf4c8a3d04f4b21fe1a1c3d0721 which was backported to LO 6.2). Since implicit sections are now editable in LO, they should export as editable. See tdf#124451 for a one-sided discussion about this. Since many people may have used this switch as a simple way to create a protected form in the past, document this change and point to sections as the way to set protection natively. Protect Form is now only good for protecting the form field itself from being deleted while the user is filling out the form, and that is true only for legacy sw::mark::IFieldmark form fields which have no UI in LO for adding them (.doc compatibility forms). Change-Id: I938f015fe63c22e831654e96de77b5809bb924ff Reviewed-on: https://gerrit.libreoffice.org/71716 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-03-30 | tdf#123912 ww8 export: re-protect implicit section | Justin Luth | 3 | -1/+21
Sections should retain their protected on/off status regardless of the value of PROTECT_FORM. However, if there ARE no sections, then the implicit section should use the document settings. The same is true for the pseudo -1 section which I believe can only be the last section (the fragment of the implicit section that follows the last real section). This is basically a revert of LO 6.2 commit fa667b6dc410f3af57ef436cc117352c829f95e7, restoring the previous behaviour in the case of the implicit section. Change-Id: If0b473445e0add017504a3cb61b63116f92be5ce Reviewed-on: https://gerrit.libreoffice.org/69957 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-03-16 | tdf#98620 filter\ww8 export: spam Jc if environment defines BiDi | Justin Luth | 3 | -0/+14
If the BiDi value comes from the page style, then MS formats have no idea what to do with it, so those values are written out into the paragraph itself. Since Justify is highly dependent on BiDi in order to understand its meaning, it also needs to be spammed. Change-Id: I7407056573bb115e8bab2dce0070b0a718dcc1eb Reviewed-on: https://gerrit.libreoffice.org/66923 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-03-09 | tdf#98620 filter\ww8 export: spam bidi=LTR also | Justin Luth | 1 | -3/+5
If the AppLanguage is RTL, then it is likely that paragraph styles inheriting from the environment will export as RTL. So, if the paragraph is LTR (from a page style for example), then it needs to be specified in order to override the paragraph style. So, this is not just for RTL, even if it is a default value. Exactly what the paragraph style will become is complicated, and not necessarily the same across the different formats. Thus it is safest to just spam the bidi attributes across any questionable paragraphs. A followup commit will similarly spam Justify in these situations, and again the difference between doc and docx suggests it is better to spam more, rather than less. Change-Id: Id184a983675b147f051821e828583a4bf98b3211 Reviewed-on: https://gerrit.libreoffice.org/66922 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-03-02filter\ww8 export: don't spam RTL if ParaStyle definedJustin Luth1-1/+4
If the paragraph itself inherits BiDi from the environment, this just means that it gets the value of the paragraph style. If paraStyle is defined, then we don't need to spam the bidi property, since it ought to naturally inherit it. I can only see two possible problems. Either my logic is wrong, or else import might not always take the paragraph style into account (for determining the meaning of justify for example). I want to start spamming justify in the case where the BiDi is not specified but is inherited from the environment (page style or AppLanguage). Separating this into multiple patches will help for debugging in case of any regressive tendencies. related to tdf#98620. Change-Id: I36bc63e6659a4b491b5c6f2c99c72ba5bb715a07 Reviewed-on: https://gerrit.libreoffice.org/66921 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-01-25NFC filter\ww8 misc code cleanupsJustin Luth1-6/+3
comment spelling, unnecessary if statement, and clarifying that a variable that I want to use later is never changed. Change-Id: If42ee9cc036188d06ceb858a23724383e3933e18 Reviewed-on: https://gerrit.libreoffice.org/66920 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-01-09tdf#122345 filter/ww8 export: no fake section at end of documentJustin Luth2-0/+6
If Writer has a section that is ending, then the part after that would need to be another section for MS formats. However, if the section ends at the end of the document, there is no need to start a new section. This is particularly important for exporting RTF (preventing accumulating sections/paragraphs), but it also affects docx and doc (without noticeable benefit or harm, but now instead of "fake" section properties it will end with the properties of the real section - which can only be a good thing, right?) This is one step in the right direction for resolving the comment //0xffffffff, what ... is going on with that!, fixme most terribly reinterpret_cast<SwSectionFormat*>(sal_IntPtr(-1)) Change-Id: Ie0641eb78c11103b33e3d849fe0b7935476a6505 Reviewed-on: https://gerrit.libreoffice.org/65974 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2019-01-08tdf#122452 sw_redlinehide: WW8 import/export sets layout show/hide modeMichael Stahl2-4/+11
... and not SetRedlineFlags(), similar to the way the ODF filter does it. The RTF/DOCX filters don't appear to be able to export the show/hide flag currently, they can only handle the enabled flag. Change-Id: I76bc19292882d7de5c28ea6afe0f81eadbd4a04f Reviewed-on: https://gerrit.libreoffice.org/65966 Tested-by: Jenkins Reviewed-by: Michael Stahl <Michael.Stahl@cib.de>
2018-12-20sw filter/ww8 code cleanup: remove duplicate variableJustin Luth1-3/+1
Change-Id: I2be36438ca1ab0646aa8f89dfcb317d6a162b072 Reviewed-on: https://gerrit.libreoffice.org/65266 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2018-12-17sw filter/ww8 code cleanupJustin Luth3-5/+3
cleanup of various nonsense that I ran across. Change-Id: Ib0a2f7bbe1096b36df88bf77de0eb90405c9f677 Reviewed-on: https://gerrit.libreoffice.org/65246 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2018-12-17tdf#121110 ww8import Jc80 justify is absolute, not Bidi relativeJustin Luth5-9/+85
Paragraph justification can be specified either absolutely (in old versions or with eWW8:Jc80) or relatively (with eWW8:Jc). The last processed SPRM wins (I assume). The WW8 format seems to ALWAYS specify Jc80, and that is overwritten by an optional Jc SPRM. I haven't seen Jc processed before a Jc80 SPRM, but if that happens, then the justify would need to be treated as absolute. If for some reason neither of these exists, BiDi will adjust by default only if it is the newer WW8 format. Again, that is an assumption because I haven't seen such a document to test. Change-Id: I966077d743f1d148fe2fb9faba87fbdd8f3507f3 Reviewed-on: https://gerrit.libreoffice.org/63591 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
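The justification rules in this message can be read as a small decision function. The following is a minimal sketch in C++, not the actual ww8 import code: `Sprm`, `Adjust`, `JustifySprm`, `mirror`, and `resolveAdjust` are hypothetical names, "last processed SPRM wins" is the same assumption the message makes, and treating left as the default when neither SPRM is present is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; this does not mirror the real importer classes.
enum class Sprm { Jc80, Jc };               // Jc80: absolute justify, Jc: BiDi-relative justify
enum class Adjust { Left, Center, Right };

struct JustifySprm {
    Sprm kind;
    Adjust value;
};

// Swap left and right; centre does not depend on text direction.
inline Adjust mirror(Adjust a)
{
    if (a == Adjust::Left) return Adjust::Right;
    if (a == Adjust::Right) return Adjust::Left;
    return a;
}

// Resolve the physical alignment of one paragraph.
// sprms:         justify SPRMs in the order they were processed
// isRtl:         paragraph direction is right-to-left
// isNewerFormat: the document uses the newer WW8 format
Adjust resolveAdjust(const std::vector<JustifySprm>& sprms, bool isRtl, bool isNewerFormat)
{
    if (sprms.empty()) {
        // Neither SPRM present: assume a left default, which BiDi adjusts
        // only for the newer format.
        return (isRtl && isNewerFormat) ? Adjust::Right : Adjust::Left;
    }
    const JustifySprm& last = sprms.back();  // last processed SPRM wins (assumed)
    if (last.kind == Sprm::Jc80)
        return last.value;                   // absolute: never mirrored
    return isRtl ? mirror(last.value) : last.value;  // relative: mirrored in RTL
}
```

In the common case described above (Jc80 followed by an optional Jc), the Jc value decides and is mirrored for RTL paragraphs; if Jc80 were ever processed last, the value would be taken as-is.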
2018-12-14sw export: restore ww8 FormatBreak() to pre-2014 logicJustin Luth3-5/+17
In LO 4.3, Feb 2014 commit a31fbb53dba76736b37213b98b64937f05929a67 totally changed the logic of the FormatBreak function, affecting doc and rtf even though the focus was only on docx. This was quickly patched for some specific cases, but the careless changes weren't fully reversed. Doing that now, because reading the code it just seems all wrong.

As I understand it, there seem to typically be two passes - a valid pass for bBreakBefore and then a separate PageAfter pass. When DOC changed to prefer a breakBefore sprm, it removed the bBefore flag, did nothing on the bBreakBefore pass, and on the after pass, nC wasn't defined, so it did nothing extra. Dropping the bBefore flag probably broke the docx case.

Docx commit a31fbb53dba76736b37213b98b64937f05929a67 just blew that all away, and swapped when SectionBreak was called. Another 2014 patch restored the DOC PageBefore behaviour (nC not defined, so nothing happens), but didn't restore the PageAfter behaviour so SectionBreak was still swapped.

So what logically seems to be needed is to restore the bBefore flag (prior to DOC's preference for breakBefore sprm), restore writing PageBreak_After in after pass, and ignore the FollowPageDesc (because it didn't seem to be included purposefully).

PageAfter only seems to be UI possible in a table's text flow. So it is VERY uncommon (no instance at all in existing unit tests.) And, not surprisingly, it doesn't export a page break after the table in doc or docx format anyway. At least now it won't put the break BEFORE the table in docx.

This will restore these previous behaviours:
-doc/rtf: PageAfter no longer written in bBreakBefore stage
-docx:
  -PageAfter no longer written in bBreakBefore stage
  -PageBefore not affected by FollowPageDesc.

PageBefore is generally unaffected by this change, and now the test for page/column break matches again, as would be expected.
Change-Id: I265541a04be49e6b60bfbd84c33ab5783b454058 Reviewed-on: https://gerrit.libreoffice.org/64983 Reviewed-by: Justin Luth <justin_luth@sil.org> Tested-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
2018-12-12tdf#121734: ww8 import: use direct formatting for floating object framesMike Kaganski6-10/+47
... and don't modify standard frame styles to have no borders and padding. This makes the "Frame", "OLE", and "Graphics" frame styles of imported DOC files have the usual settings (for "Frame", it's 1.5 mm padding and all borders set to 0.05 pt black line). All objects that need an invisible frame will have one with all necessary settings set explicitly, which allows such frames to be copied and pasted to other documents without problems. This makes DOC import aligned with DOCX import in this regard. Change-Id: I6f05cf71e63ceccb8e0ddebe23ec41bf69af9b52 Reviewed-on: https://gerrit.libreoffice.org/64992 Tested-by: Jenkins Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
2018-12-05use unique_ptr in ww8 export codeNoel Grandin5-63/+50
Change-Id: I505c8005aebec40b8e812aea10deaf79eb7223ab Reviewed-on: https://gerrit.libreoffice.org/64523 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2018-11-26partial revert tdf79435 doc: round-trip legacy input formfieldsJustin Luth3-33/+26
Apparently import isn't always properly reading some of these strings, so it is just garbage being round-tripped in some cases. Let's just avoid that until such time as import might be fixed. I couldn't readily identify the import problem. It even happens with version eWW8 files and also TestBeltAndBraces() didn't seem to prevent the problem. These crashes are due to reading garbage:
/srv/crashtestdata/files/doc/ooo78311-1.doc -DISTRICT_COURSE_OUTLINE_TEMPLATE.doc
/srv/crashtestdata/files/doc/kde79024-2.doc -Ü2_Blanko.doc
/srv/crashtestdata/files/doc/ooo24395-1.doc -stateapp-emp.doc
/srv/crashtestdata/files/doc/abi9921-1.doc
/srv/crashtestdata/files/doc/ooo59101-1.doc -Hovedblankett.DOC
/srv/crashtestdata/files/doc/fdo48097-1.doc -BR1010.doc
Change-Id: Iceaa53760867f06c73ab900c57f197dbc0fb8e65 Reviewed-on: https://gerrit.libreoffice.org/63938 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>
2018-10-19NFC ww8 cleanup: remove unused variables, simplify, whitespaceJustin Luth2-22/+9
Change-Id: Ib4f100d4019643cde893ef1d8643a5c08b55ff8f Reviewed-on: https://gerrit.libreoffice.org/61951 Tested-by: Jenkins Reviewed-by: Justin Luth <justin_luth@sil.org>