At least make CppunitTest_sc_text_functions_test more resilient to ICU version - libreoffice/core


author	Stephan Bergmann <sbergman@redhat.com>	2022-05-17 09:49:16 +0200
committer	Stephan Bergmann <sbergman@redhat.com>	2022-05-17 15:27:22 +0200
commit	4a23511cdad3a82f7c628426ede3eb0928a3a325 (patch)
tree	973b07e5b472b126ff2aee601e8be4eedcb6e9e3 /offapi/com/sun
parent	0ebc3a389e955f23e216272eac38158a5e5a6309 (diff)

At least make CppunitTest_sc_text_functions_test more resilient to ICU version

61f4250ee9f43902107e4d2e6322cbf54f52dd8e "Make CLEAN fully compliant woth ODFF v1.3" has changed lcl_ScInterpreter_IsPrintable (sc/source/core/tool/interpr1.cxx) to use ICU's u_isdefined to check for Unicode code points of category Cn (i.e., noncharacter or reserved). This is at least questionable, as assignment of code points to that category varies with Unicode versions.

And while <https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/OpenDocument-v1.3-os-part4-formula.html#__RefHeading__1017856_715980110> "Open Document Format for Office Applications (OpenDocument) Version 1.3. Part 4: Recalculated Formula (OpenFormula) Format: 1.4 Normative References" references "The Unicode Standard, Version 5.2.0" (so one might expect CLEAN to use the category classification from that old Unicode version), versions of ICU keep being updated with current Unicode versions' category classifications. For example, the currently bundled external/icu's icu4c-70_1-src.tgz uses Unicode 14 (according to <https://icu.unicode.org/download/70#h.x1orhyniml8k>) for its implementation of u_isdefined. And for --with-system-icu, all that configure.ac apparently requires is that "icu-i18n >= 4.6" (i.e., it will potentially allow the behavior of u_isdefined to vary over a wide range of Unicode versions).

And case in point, 61f4250ee9f43902107e4d2e6322cbf54f52dd8e also added a test to sc/qa/unit/data/functions/text/fods/clean.fods (row 47) that verifies that U+FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA does not get cleaned away by CLEAN. But U+FDCF is only an assigned code point (thus no longer of category Cn) since Unicode 14 (cf. <https://www.unicode.org/charts/PDF/Unicode-14.0/U140-FB50.pdf>), so while builds against external/icu (covering Unicode 14) succeed, --with-system-icu builds like a flatpak build against org.freedesktop.Sdk//21.08, still at ICU 69 and Unicode 13, fail CppunitTest_sc_text_functions_test with

> Testing load file:///run/build/libreoffice//sc/qa/unit/data/functions/text/fods/clean.fods:
> /run/build/libreoffice/sc/qa/unit/functions_test.cxx:43:TextFunctionsTest::testTextFormulasFODS
> double equality assertion failed
> - Expected: 1
> - Actual : 0
> - Delta : 1e-14

(<https://flathub.org/builds/#/builders/11/builds/7103>).

Irrespective of whether using ICU's varying u_isdefined in the implementation of CLEAN is correct, at least make that "doesn't get CLEAN'ed away" test more resilient to what version of ICU is being used, by using U+0F00 TIBETAN SYLLABLE OM, which got added all the way back in Unicode 2, rather than U+FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA, which only got added in Unicode 14.

(And to add insult to injury, in sc/qa/unit/data/functions/text/fods/clean.fods 61f4250ee9f43902107e4d2e6322cbf54f52dd8e encoded U+FDCF in text not as the three UTF-8 bytes 0xEF 0xB7 0x8F, but rather re-encoded as the six bytes 0xC3 0xAF 0xC2 0xB7 0xC2 0x8F, i.e., the three characters U+00EF LATIN SMALL LETTER I WITH DIAERESIS, U+00B7 MIDDLE DOT, U+008F. But I assume that was just a mistake, not something that I should faithfully copy in the file's new version.)

Change-Id: Icc8d879b1397d8292914cbd31708d0c561f3b06e
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/134474
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
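The byte sequences quoted in the commit message can be reproduced with a small standalone sketch. It is not code from the LibreOffice tree: the helper names encodeUtf8 and doubleEncode are hypothetical, and the second one assumes the mistake was the usual one of reading each UTF-8 byte as a separate code point (as an ISO 8859-1 decoder would) and then encoding the result as UTF-8 again.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::vector<std::uint8_t> encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF); // outside the Unicode code space otherwise
    auto byte = [](char32_t v) { return static_cast<std::uint8_t>(v); };
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        out.push_back(byte(cp));
    } else if (cp < 0x800) {
        out.push_back(byte(0xC0 | (cp >> 6)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(byte(0xE0 | (cp >> 12)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(byte(0xF0 | (cp >> 18)));
        out.push_back(byte(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: treat every input byte as its own code point
// (U+0000..U+00FF) and encode each of those as UTF-8 again. This is the
// assumed mechanism behind the six-byte sequence in clean.fods.
std::vector<std::uint8_t> doubleEncode(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint8_t> out;
    for (std::uint8_t b : bytes) {
        const std::vector<std::uint8_t> enc = encodeUtf8(b);
        out.insert(out.end(), enc.begin(), enc.end());
    }
    return out;
}
```

Under that assumption, encodeUtf8(0xFDCF) yields 0xEF 0xB7 0x8F, and doubleEncode applied to those bytes yields 0xC3 0xAF 0xC2 0xB7 0xC2 0x8F, matching the two sequences named above. The replacement character U+0F00 encodes as 0xE0 0xBC 0x80.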

Diffstat (limited to 'offapi/com/sun')

0 files changed, 0 insertions, 0 deletions

