author DaeHyun Sung <sungdh86+git@gmail.com> 2020-08-02 21:00:53 +0900
committer Caolán McNamara <caolanm@redhat.com> 2020-10-12 09:42:34 +0200
commit c8e8860f8b1453f0a51c6202ce8ff90b7c4ba515
tree 31ea1893c914e5ff7874a4dc350dc526c4e21d73 /sw/source/filter/ww8/ww8scan.cxx
parent 306891f88b859c5544b8e1e4a1b2f3bec1896b17
tdf#134742 Distinguishing all CJK fonts such as Noto CJK Fonts.
Distinguish Korean and Japanese fonts among all CJK (Chinese-Japanese-Korean) fonts, such as the Noto CJK and Source Han Sans font series.

Initially, I added a hardcoded script for the "Noto" CJK fonts. The Noto CJK fonts support Simplified Chinese, Traditional Chinese, Traditional Chinese (Hong Kong), Japanese and Korean (they are Pan-CJK fonts). Currently, every Noto CJK font is previewed with '简繁'. However, the KR (Korean, 한국어/한글) and JP (Japanese, 日本語) variants represent Korean (they should show '한글') and Japanese (they should show '日本語'), respectively; the Chinese samples 简 (Simplified Chinese) and 繁 (Traditional Chinese) do not represent them. Likewise, both TC (Traditional Chinese) and HK (Hong Kong) represent 繁, not 简繁, and SC (Simplified Chinese) represents 简, not 简繁.

So I fixed the sample text shown for the Noto CJK font series in LibreOffice's font selection list:

Noto Sans CJK HK 简繁 -> Noto Sans CJK HK 繁
Noto Sans CJK JP 简繁 -> Noto Sans CJK JP 日本語
Noto Sans CJK KR 简繁 -> Noto Sans CJK KR 한글
Noto Sans CJK SC 简繁 -> Noto Sans CJK SC 简
Noto Sans CJK TC 简繁 -> Noto Sans CJK TC 繁
Noto Sans Mono CJK HK 简繁 -> Noto Sans Mono CJK HK 繁
Noto Sans Mono CJK JP 简繁 -> Noto Sans Mono CJK JP 日本語
Noto Sans Mono CJK KR 简繁 -> Noto Sans Mono CJK KR 한글
Noto Sans Mono CJK SC 简繁 -> Noto Sans Mono CJK SC 简
Noto Sans Mono CJK TC 简繁 -> Noto Sans Mono CJK TC 繁
Noto Serif CJK JP 简繁 -> Noto Serif CJK JP 日本語
Noto Serif CJK KR 简繁 -> Noto Serif CJK KR 한글
Noto Serif CJK SC 简繁 -> Noto Serif CJK SC 简
Noto Serif CJK TC 简繁 -> Noto Serif CJK TC 繁

However, that fix only supported the Noto CJK fonts and could not distinguish other CJK fonts. So I changed the code to improve how Korean, Chinese and Japanese fonts are distinguished in general:

1. Remove the hardcoded script for the "Noto" CJK fonts.
2. Add hardcoded sample characters to attemptToDisambiguateHan(UScriptCode eScript, OutputDevice const &rDevice) and change how it distinguishes among Korean, Japanese and Chinese fonts.
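To make step 2 concrete, here is a minimal, self-contained C++ sketch of probe-based disambiguation. It is not LibreOffice's attemptToDisambiguateHan: the font is modelled as a hypothetical `FontCoverage` set of code points instead of an `OutputDevice`, `HanScript` is a hypothetical stand-in for ICU `UScriptCode` values, `sal_Unicode` is aliased to `char16_t`, and the rule "pick a specific script only when exactly one probe set is fully covered" is an assumption. The probe arrays are the ones this commit proposes, and `sampleTextFor` maps the result to the preview strings listed above.

```cpp
#include <set>
#include <string>
#include <vector>

// Stand-in for LibreOffice's sal_Unicode (a UTF-16 code unit).
using sal_Unicode = char16_t;

// Hypothetical result type; the real code works with ICU UScriptCode values.
enum class HanScript { Han, Korean, Japanese, SimplifiedHan, TraditionalHan };

// Hypothetical font model: the set of BMP code points the font has glyphs for.
using FontCoverage = std::set<char16_t>;

// Probe characters proposed in this commit.
const std::vector<sal_Unicode> aKorean = { 0x4E6D, 0x4E76, 0x596C };            // 乭 乶 奬
const std::vector<sal_Unicode> aJapanese = { 0x5968, 0x67A0, 0x9D8F };          // 奨 枠 鶏
const std::vector<sal_Unicode> aTraditionalChinese = { 0x555F, 0x96DE };        // 啟 雞
const std::vector<sal_Unicode> aSimplifiedChinese = { 0x4E61, 0x542F, 0x5956 }; // 乡 启 奖

// True if the font has a glyph for every probe character.
bool hasAllGlyphs(const FontCoverage& rFont, const std::vector<sal_Unicode>& rProbe)
{
    for (sal_Unicode c : rProbe)
        if (rFont.count(c) == 0)
            return false;
    return true;
}

// Assumed rule: commit to a specific script only when exactly one probe set
// is fully covered; otherwise keep the generic Han classification.
HanScript disambiguateHan(const FontCoverage& rFont)
{
    const bool bKore = hasAllGlyphs(rFont, aKorean);
    const bool bJpan = hasAllGlyphs(rFont, aJapanese);
    const bool bHant = hasAllGlyphs(rFont, aTraditionalChinese);
    const bool bHans = hasAllGlyphs(rFont, aSimplifiedChinese);
    const int nMatches = bKore + bJpan + bHant + bHans;
    if (nMatches != 1)
        return HanScript::Han;
    if (bKore) return HanScript::Korean;
    if (bJpan) return HanScript::Japanese;
    if (bHant) return HanScript::TraditionalHan;
    return HanScript::SimplifiedHan;
}

// Preview text for each result, matching the strings listed in the commit.
std::u16string sampleTextFor(HanScript eScript)
{
    switch (eScript)
    {
        case HanScript::Korean:         return u"\uD55C\uAE00";       // 한글
        case HanScript::Japanese:       return u"\u65E5\u672C\u8A9E"; // 日本語
        case HanScript::SimplifiedHan:  return u"\u7B80";             // 简
        case HanScript::TraditionalHan: return u"\u7E41";             // 繁
        case HanScript::Han:            break;
    }
    return u"\u7B80\u7E41";                                           // 简繁
}
```

Under this model, a font covering only the Korean probes yields `HanScript::Korean` and the sample 한글, while a font covering all four probe sets (a true Pan-CJK font) stays `HanScript::Han` and keeps 简繁. Requiring exactly one match is a conservative choice: an ambiguous font falls back to the generic preview instead of being mislabelled.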
Former:
- static const sal_Unicode aKorean[] = { 0x3131 };
- static const sal_Unicode aJapanese[] = { 0x3007, 0x9F9D };
- static const sal_Unicode aTraditionalChinese[] = { 0x570B };
- static const sal_Unicode aSimplifiedChinese[] = { 0x56FD };

Korean: U+3131 ㄱ Hangul Letter Kiyeok
Japanese: U+3007 〇 Ideographic Number Zero and U+9F9D 龝
Traditional Chinese: U+570B 國
Simplified Chinese: U+56FD 国

The problem with that code:

The Japanese probes U+3007 〇 and U+9F9D 龝 are also used in Korean and Chinese.

U+3007 〇
Definition: zero
It is used in all CJK languages (Chinese, Japanese and Korean), and is commonly used for number formatting in MS Excel and LibreOffice.

U+9F9D 龝
Definition: autumn, fall; year
Mandarin Chinese reading: qiū
Korean Hanja reading: 추 (chu)
Japanese kun reading: 'AKI' or 'TOKI'
Japanese on reading: 'SHUU'
It has the same meaning as '秋'.

Korean:
[한자 너 어디 있었니?] 54. 분탕 焚蕩 ("[Hanja, where have you been?] 54. Buntang 焚蕩")
http://www.incheonilbo.com/news/articleView.html?idxno=1019040
참고로 가을날 벼에 달라붙은 메뚜기 모양을 한 글자인 龝(추)는 秋의 고자(古字)로 서예가들이 멋을 부리기 위해 사용하기도 한다.
("For reference, 龝 (chu), a character shaped like a grasshopper clinging to rice on an autumn day, is an archaic form (古字) of 秋 that calligraphers sometimes use for stylistic effect.")

Japanese:
「龝」の漢字‐読み方・意味・部首・画数 - 漢字辞典 ("The kanji 「龝」: readings, meaning, radical and stroke count - Kanji Dictionary")
https://kanjitisiki.com/jis2/2-3/020.html
漢字の「龝」についてです。「秋」の異体字です。 ("About the kanji 「龝」: it is a variant form of 「秋」.")

Chinese:
龝 - 中國哲學書電子化計劃 ("龝 - Chinese Text Project")
https://ctext.org/dictionary.pl?if=gb&char=%E9%BE%9D
《康熙字典·四》: 秋:〔古文〕龝《唐韻》七由切《集韻》《韻會》雌由切《正韻》此由切,音鰌。
("Kangxi Dictionary: 秋, ancient form 龝; fanqie readings 七由切 (Tangyun), 雌由切 (Jiyun, Yunhui) and 此由切 (Zhengyun); pronounced like 鰌.")

Also, neither U+570B 國 nor U+56FD 国 distinguishes the CJK languages, because U+570B 國 is used in Traditional Chinese, Korean and Japanese texts.

U+570B 國
Korean: 國
21國 정상급 26명 온다…평창서 `외교 올림픽` ("26 leaders from 21 countries are coming... a 'diplomatic Olympics' in Pyeongchang")
https://www.mk.co.kr/news/politics/view/2018/01/66693/
핵융합발전 프로젝트 韓國이 주도..."ITER 부품의 70~80% 도맡아" ("Korea leads the nuclear fusion power project... 'handling 70-80% of ITER components'")
http://www.dt.co.kr/contents.html?article_no=2020072802109931731004
Japanese: 國
ORANGE RANGE、母校の吹奏楽部・琉球國祭り太鼓とのライブを公開 ("ORANGE RANGE releases a live performance with their alma mater's brass band and Ryukyu Koku Matsuri Daiko")
https://news.yahoo.co.jp/articles/c6a7e9bb83e46662a8638cd5373a5c71d144cb8b
Traditional Chinese: 國
國家森林遊樂區免費入園一次 上路一週最熱門是這地方 ("One free entry to national forest recreation areas: a week after launch, this is the most popular spot")
https://news.ltn.com.tw/news/life/breakingnews/3237355

Also, U+56FD 国 is used in both Simplified Chinese and Japanese.

U+56FD 国
Japanese: 国
日本人の子ども連れ去りは国ぐるみの誘拐? 批准した国際条約、国内で適用せずは許されるのか ("Is Japanese child abduction a state-sanctioned kidnapping? Can a ratified international treaty be left unapplied at home?")
https://www.47news.jp/news/5057377.html
Simplified Chinese: 国
中国国际云书馆上线运行 ("China International Cloud Library goes online")
http://world.people.com.cn/n1/2020/0726/c1002-31797808.html

My proposed change:
+ static const sal_Unicode aKorean[] = { 0x4E6D, 0x4E76, 0x596C };
+ static const sal_Unicode aJapanese[] = { 0x5968, 0x67A0, 0x9D8F };
+ static const sal_Unicode aTraditionalChinese[] = { 0x555F, 0x96DE };
+ static const sal_Unicode aSimplifiedChinese[] = { 0x4E61, 0x542F, 0x5956 };

The CJK languages share ideographs (Chinese characters). However, for ideographs with the same meaning, the glyph shape and the code point often differ from country to country. In addition, some languages coined their own ideographs, such as Korean-made and Japanese-made ideographs. I added Korean-made ideographs, Japanese-made ideographs, and characters that are used in only one language.

1. Korean-made ideographs: U+4E6D 乭 (read '돌 dol' in Korean; used only in Korean) and U+4E76 乶 (read '볼 bol' in Korean; used only in Korean)
2. Japanese-made ideographs: U+67A0 枠 (read 'waku' in Japanese; used only in Japanese)
3. Used only in Korean or only in Japanese: U+596C 奬 is normally used in Korean; U+5968 奨 and U+9D8F 鶏 are normally used in Japanese.

   Prize, reward:
   Traditional Chinese (繁體中文): U+734E 獎
   Simplified Chinese (简体中文): U+5956 奖
   Korean Hanja (한국 한자/韓國 漢字): U+596C 奬
   Japanese Kanji (日本 漢字): U+5968 奨

   Rooster:
   Traditional Chinese (繁體中文): both 雞 (U+96DE) [read jī] and 鷄 (U+9DC4)
   Simplified Chinese (简体中文): 鸡 (U+9E21) [read jī]
   Korean Hanja (한국 한자/韓國 漢字): 鷄 (U+9DC4) [read 계; Revised Romanization of Korean "gye"]
   Japanese Kanji (日本 漢字): 鶏 (U+9D8F) [read とり (tori) or にわとり (niwatori)]
   Adobe CJK Type Blog - Year of the rooster
   https://blogs.adobe.com/CCJKType/2017/01/year-of-the-rooster.html

4. Used only in Traditional Chinese: U+555F 啟 and U+96DE 雞

   Open, begin:
   Traditional Chinese (繁體中文): U+555F 啟
   Simplified Chinese (简体中文): U+542F 启
   Korean Hanja (한국 한자/韓國 漢字) and Japanese Kanji (日本 漢字): U+5553 啓

   Rooster: 雞 (U+96DE) is used only as a Traditional Chinese form (see the rooster table in item 3).

5. Used only in Simplified Chinese: U+4E61 乡, U+542F 启 and U+5956 奖

   Country, rural, village:
   Traditional Chinese (繁體中文): U+9109 鄉
   Simplified Chinese (简体中文): U+4E61 乡
   Korean Hanja (한국 한자/韓國 漢字): U+9115 鄕
   Japanese Kanji (日本 漢字): U+90F7 郷

   Open, begin: 启 (U+542F) is the Simplified Chinese form (see the table in item 4).
   Prize, reward: 奖 (U+5956) is the Simplified Chinese form (see the table in item 3).

So I checked and built it, and found that it now distinguishes among Korean, Chinese and Japanese fonts across all CJK fonts, such as the Noto CJK and Source Han Sans font series.
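The selection criterion above (a good probe is a form used in only one locale) can be checked mechanically. The following C++ sketch transcribes the commit message's own variant tables into a list of (code point, locale) attestations and tests each probe for uniqueness. The table is only as complete as the commit message; the `KR`/`JP`/`TC`/`SC` enum, the helper names, and recording 〇 and 龝 as used in all four locales are illustrative assumptions, not part of the LibreOffice code.

```cpp
#include <set>
#include <vector>

// Locales named after the Noto CJK suffixes used in the commit message.
enum Locale { KR, JP, TC, SC };

// One attestation: "this code point is a form used in this locale".
struct Attestation
{
    char16_t cp;
    Locale locale;
};

// Transcribed from the commit message's variant tables and usage notes.
const std::vector<Attestation> kAttested = {
    // prize, reward: 獎 奖 奬 奨
    {0x734E, TC}, {0x5956, SC}, {0x596C, KR}, {0x5968, JP},
    // rooster: 雞 and 鷄 (TC), 鸡 (SC), 鷄 (KR), 鶏 (JP)
    {0x96DE, TC}, {0x9DC4, TC}, {0x9E21, SC}, {0x9DC4, KR}, {0x9D8F, JP},
    // open, begin: 啟 启 啓 啓
    {0x555F, TC}, {0x542F, SC}, {0x5553, KR}, {0x5553, JP},
    // country, rural, village: 鄉 乡 鄕 郷
    {0x9109, TC}, {0x4E61, SC}, {0x9115, KR}, {0x90F7, JP},
    // Korean-made 乭 乶, Japanese-made 枠
    {0x4E6D, KR}, {0x4E76, KR}, {0x67A0, JP},
    // former probes: 〇 and 龝 (assumed used in all four), 國, 国
    {0x3007, KR}, {0x3007, JP}, {0x3007, TC}, {0x3007, SC},
    {0x9F9D, KR}, {0x9F9D, JP}, {0x9F9D, TC}, {0x9F9D, SC},
    {0x570B, TC}, {0x570B, KR}, {0x570B, JP},
    {0x56FD, SC}, {0x56FD, JP},
};

// Every locale in which the table records this code point.
std::set<Locale> localesUsing(char16_t cp)
{
    std::set<Locale> result;
    for (const Attestation& a : kAttested)
        if (a.cp == cp)
            result.insert(a.locale);
    return result;
}

// A usable probe is attested in exactly one locale.
bool isUniqueTo(char16_t cp, Locale locale)
{
    return localesUsing(cp) == std::set<Locale>{ locale };
}

// True if every probe in the list is unique to the given locale.
bool allUniqueTo(const std::vector<char16_t>& probes, Locale locale)
{
    for (char16_t cp : probes)
        if (!isUniqueTo(cp, locale))
            return false;
    return true;
}
```

With this table, every new probe (for example `allUniqueTo({0x4E6D, 0x4E76, 0x596C}, KR)`) is unique to its locale, while each former probe (〇, 龝, 國, 国) is attested in two or more locales. A code point missing from the table is reported as not unique, so the check errs on the conservative side.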
Change-Id: Icc1f3ea31227f77c0e3ad0ec3ed03663deedee51
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/99951
Tested-by: Jenkins
Reviewed-by: Caolán McNamara <caolanm@redhat.com>
Diffstat (limited to 'sw/source/filter/ww8/ww8scan.cxx')
0 files changed, 0 insertions, 0 deletions