author Hossein <hossein@libreoffice.org> 2022-01-04 21:12:14 +0100
committer Eike Rathke <erack@redhat.com> 2022-01-07 21:38:54 +0100
commit 151c56ed547490a99d912524c0e56b5d6d4a1939
tree b9e7a2c3768d262fc58316db50e2b871d53a6d2b /i18nlangtag/source
parent 9ec6c2989897a2a54b6bcf82f61929c29ac62126
tdf#146084 Don't warn for languages without hyphenation

Upon opening a Writer document containing some languages that do not
use hyphenation, an alert is shown with the text:

'Missing hyphenation data Please install the hyphenation package for
locale "ab_CD".'

in which 'ab_CD' is the locale.

This patch removes the warning for the following languages, which do
not use hyphenation:

* Arabic script languages (except Uighur)
  + Persian (Farsi)
  + Kashmiri
  + Kurdish (Central Kurdish and Southern Kurdish with Arabic script)
  + Punjabi
  + Sindhi
  + Malay
  + Somali
  + Swahili
  + Urdu

  "Words are not hyphenated in Arabic language text, however
  hyphenation is possible for Uighur text written in the Arabic
  script"
  https://www.w3.org/International/i18n-tests/results/word-break-shaping

  The list in the MS documents is lengthy, but some of the languages
  were not available in LibreOffice, so they are omitted:
  https://docs.microsoft.com/en-us/typography/script-development/arabic

  There were languages like Hausa and Kanuri from Nigeria that use
  both Latin and Arabic script, but only the Latin script was listed
  in the LibreOffice languages, so they were also omitted.

* CJK languages
  + Japanese
  + Korean
  + Chinese
  + Yue Chinese

  "CJK languages differ from European languages in that there are no
  hyphenation rules"
  https://tug.org/TUGboat/tb25-0/cho.pdf

* Vietnamese

  "In Vietnamese all words consist of single syllables, so they are
  often very short; hyphenation is not allowed at all."
  https://tug.org/TUGboat/tb29-1/tb91thanh-vntex.pdf

  Hyphenation has been in decline in Vietnamese orthography since 1975
  https://www.quora.com/When-did-hyphenation-decline-in-Vietnamese-orthography

The fix for Japanese (tdf#143422) was previously done in:
53d5555f13371252874ec962dee4643168d26780
and the functionality is preserved with the current patch.

An alternate approach would be to add all the Unicode scripts, specify
the script for each language, and decide based on the script (mostly)
and not (only) on the language.

More information about the hyphenation usage of many scripts can be
found in:
https://r12a.github.io/scripts/

This is the list of Unicode scripts:
https://unicode.org/standard/supported.html
https://en.wikipedia.org/wiki/Script_(Unicode)#List_of_scripts_in_Unicode

Change-Id: I7d2b4ee55a0893d1f0d1f9cd3b7cc037a49589b6
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/126435
Tested-by: Jenkins
Reviewed-by: Eike Rathke <erack@redhat.com>
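
The alternate, script-based approach mentioned in the commit message could look roughly like the sketch below. It is not part of this patch and does not use any LibreOffice API: the Script enum, the table kLangs, and the function usesHyphenationByScript are hypothetical names chosen for illustration, and the table holds only a handful of sample entries. The decision is made mostly from the script, with a per-language exception for Uighur as described above.

```cpp
#include <string_view>

// Hypothetical script codes, named after ISO 15924 (illustration only).
enum class Script { Latn, Arab, Hani, Jpan, Kore };

// Hypothetical table assigning a script to each language tag.
struct LangInfo
{
    std::string_view tag;
    Script script;
};

constexpr LangInfo kLangs[] = {
    { "ar", Script::Arab }, { "fa", Script::Arab }, { "ug", Script::Arab },
    { "zh", Script::Hani }, { "ja", Script::Jpan }, { "ko", Script::Kore },
    { "de", Script::Latn }, { "en", Script::Latn },
};

// Script-level default: in this sketch only Latin script hyphenates.
constexpr bool scriptHyphenates(Script s) { return s == Script::Latn; }

bool usesHyphenationByScript(std::string_view tag)
{
    // Language-level exception: Uighur is written in the Arabic script
    // but can be hyphenated.
    if (tag == "ug")
        return true;
    for (const LangInfo& info : kLangs)
    {
        if (info.tag == tag)
            return scriptHyphenates(info.script);
    }
    // Unknown language: keep the previous behaviour and assume hyphenation.
    return true;
}
```

With this layout, a newly supported language only needs a table entry with its script; exceptions stay explicit and few.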
Diffstat (limited to 'i18nlangtag/source')
-rw-r--r-- i18nlangtag/source/isolang/mslangid.cxx 21
1 files changed, 21 insertions, 0 deletions
diff --git a/i18nlangtag/source/isolang/mslangid.cxx b/i18nlangtag/source/isolang/mslangid.cxx
index 71f6b7b49e66..ad062a8d3dcf 100644
--- a/i18nlangtag/source/isolang/mslangid.cxx
+++ b/i18nlangtag/source/isolang/mslangid.cxx
@@ -165,6 +165,27 @@ LanguageType MsLangId::resolveSystemLanguageByScriptType( LanguageType nLang, sa
return nLang;
}
+// static
+bool MsLangId::usesHyphenation(LanguageType nLang)
+{
+    if (primary(nLang).anyOf(
+            primary(LANGUAGE_ARABIC_PRIMARY_ONLY),
+            primary(LANGUAGE_FARSI),
+            primary(LANGUAGE_KASHMIRI),
+            primary(LANGUAGE_KURDISH_ARABIC_IRAQ),
+            primary(LANGUAGE_PUNJABI),
+            primary(LANGUAGE_SINDHI),
+            primary(LANGUAGE_USER_MALAY_ARABIC_MALAYSIA),
+            primary(LANGUAGE_SOMALI),
+            primary(LANGUAGE_SWAHILI),
+            primary(LANGUAGE_URDU_PAKISTAN))
+        || isCJK(nLang))
+    {
+        return false;
+    }
+    return true;
+}
+
// static
css::lang::Locale MsLangId::Conversion::convertLanguageToLocale(
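
The new function compares only the primary language part of the ID, so every regional variant of a listed language is covered by a single entry. The standalone sketch below illustrates that idea outside of LibreOffice. It assumes the Windows LANGID layout, in which the low 10 bits hold the primary language; LangId, primaryOf, kNoHyphenationPrimaries, and usesHyphenationSketch are hypothetical names, and the list holds only a few sample primaries rather than the full set from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a language ID (not LibreOffice's LanguageType).
using LangId = std::uint16_t;

// Assumption: Windows LANGID layout, primary language in the low 10 bits.
constexpr LangId primaryOf(LangId lang)
{
    return static_cast<LangId>(lang & 0x03FF);
}

// Sample primary language IDs treated as "no hyphenation" in this sketch:
// Arabic, Farsi, Urdu, Chinese, Japanese, Korean.
constexpr LangId kNoHyphenationPrimaries[] = { 0x01, 0x29, 0x20, 0x04, 0x11, 0x12 };

bool usesHyphenationSketch(LangId lang)
{
    const LangId prim = primaryOf(lang);
    for (LangId noHyph : kNoHyphenationPrimaries)
    {
        if (prim == noHyph)
            return false;
    }
    return true;
}
```

For example, 0x0401 (Arabic, Saudi Arabia) and 0x0C01 (Arabic, Egypt) share the primary 0x01, so both are reported as not using hyphenation, while 0x0407 (German, Germany) is not in the list and keeps the default.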