Simple SAX parser library with added UTF-16 support.

When we build expat internally ("bundled"), we build two variants: one
with an "ASCII" (actually UTF-8) API, and one with a "Unicode" (meaning
UTF-16) API. Additionally, expat is split into two parts,
expat_xmlparse and expat_xmltok. It's the former that has the two
variants, ascii_expat_xmlparse (UTF-8) and expat_xmlparse (UTF-16).

Code that uses expat then declares in its .mk file which one it wants
to use. See the magic in ../RepositoryExternal.mk, where in the
expat_utf16 case -DXML_UNICODE is passed when compiling source code
that wants to use the UTF-16 variant.
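
To make concrete what -DXML_UNICODE actually does: it switches the
XML_Char type in expat's public headers, and with it the character
type of every string the parser hands to your callbacks. Roughly
(paraphrased from expat's headers, not copied verbatim):

    #ifdef XML_UNICODE          /* the "Unicode" (UTF-16) API */
    #  ifdef XML_UNICODE_WCHAR_T
    typedef wchar_t XML_Char;
    #  else
    typedef unsigned short XML_Char;
    #  endif
    #else                       /* the "ASCII" (UTF-8) API */
    typedef char XML_Char;
    #endif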

Now, this sounds fairly clear so far.

But wait. LO can also be configured to use a *system* expat
library. The system expat library is only available in one variant,
the "ASCII" one. (But the library is still called just "libexpat", no
"ascii" in the name; that is just LO/OO's convention.) So how does
this work? How can code that asks for the UTF-16 expat API actually
end up using the "ASCII" (UTF-8) expat API? Well, in the SYSTEM_EXPAT
case no -DXML_UNICODE is passed, so the code needs to check for that
and adapt. So in the system libexpat case, mentioning expat_utf16 in a
.mk file doesn't mean any UTF-16-using libexpat is actually used.
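
What "check and adapt" means in practice is roughly this kind of
thing (a hypothetical handler, only the expat types and the
XML_UNICODE macro are real; the rest is made up for illustration):

    #include <expat.h>

    static void XMLCALL
    characters(void *userData, const XML_Char *s, int len)
    {
    #ifdef XML_UNICODE
        /* bundled UTF-16 build: s is len UTF-16 code units */
    #else
        /* system libexpat: s is len UTF-8 bytes; convert here if
           the caller really wants UTF-16 */
    #endif
        (void)userData; (void)s; (void)len;
    }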

Yeah, this is silly, confusing, etc.

Furthermore, at least Debian actually *does* also have a "Unicode"
expat library, called libexpatw. Debian's LO does not use it, though.
(Using it would require modifications to the LO build machinery.)

Now, if LO manages just fine with only the UTF-8 (or "ASCII") system
libexpat in builds where that is used, why is a separate Unicode one
needed when an internal expat is used? Good question. Next
question. Patches welcome.

From:
[http://expat.sourceforge.net/]