summaryrefslogtreecommitdiff
path: root/unoidl/README.md
blob: 9a2f9d26338253aefdd5a2a57cce140320c9fda9 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
Support for UNOIDL registry formats

Library_unoidl contains the unoidl::Manager and unoidl::Provider implementations
for the following registry formats:

* The new UNOIDL binary types.rdb format.
* The old legacy binary types.rdb format (based on modules [[store]] and
  [[registry]]).
* A source-file format, reading (multiple) UNOIDL entity definitions directly
  from a single .idl source file.
* A source-tree format, reading UNOIDL entity definitions directly from a tree
  of .idl source files rooted at a given directory.  (Where an entity named
  foo.bar.Baz is expected in a file named foo/bar/Baz.idl within that tree.)

(While .idl files still contain #include directives for legacy idlc, the source-
based formats ignore any preprocessing directives starting with "#" in the .idl
files.)  unoidl::Manager::addProvider transparently detects the registry format
for a given URI and instantiates the corresponding provider implementation.

Executable_unoidl-write is a helper tool to convert from any of the registry
formats to the UNOIDL format.  It is used at build-time to compile UNOIDL format
.rdb files (that are used at build-time only, or included in installation sets
in URE or program/types/ or as part of bundled extensions that are created
during the build and not merely included as pre-built .oxt files) from source
.idl files.  (The SDK still uses idlc and generates legacy format .rdb files for
now.)

Executable_unoidl-read is a helper tool to convert from any of the registry
formats to the source-file format.  It can be used manually after a LibreOffice
version update to create new reference registries for Executable_unoidl-check.

Executable_unoidl-check is a helper tool to check that one registry is
backwards-compatible with another registry.  It is used at build-time to detect
inadvertent breakage of the udkapi and offapi APIs.

== Specification of the new UNOIDL types.rdb format ==

The format uses byte-oriented, platform-independent, binary files.  Larger
quantities are stored LSB first, without alignment requirements.  Offsets are
32 bit, effectively limiting the overall file size to 4GB, but that is not
considered a limitation in practice (and avoids unnecessary bloat compared to
64 bit offsets).

Annotations can be added for (non-module) entities and certain parts of such
entities (e.g., both for an interface type definition and for a direct method of
an interface type definition; the idea is that it can be added for direct parts
that forma a "many-to-one" relationship; there is a tradeoff between generality
of concept and size of representation, esp. for the C++ representation types in
namespace unoidl) and consist of arbitrary sequences of name/value strings.
Each name/value string is encoded as a single UTF-8 string containing a name (an
arbitrary sequence of Unicode code points not containing U+003D EQUALS SIGN),
optionally followed by U+003D EQUALS SIGN and a value (an arbitrary sequence of
Unicode code points).  The only annotation name currently in use is "deprecated"
(without a value).

The following definitions are used throughout:

* UInt16: 2-byte value, LSB first
* UInt32: 4-byte value, LSB first
* UInt64: 8-byte value, LSB first
* Offset: UInt32 value, counting bytes from start of file
* NUL-Name: zero or more non-NUL US-ASCII bytes followed by a NUL byte
* Len-String: UInt32 number of characters, with 0x80000000 bit 0, followed by
   that many US-ASCII (for UNOIDL related names) resp. UTF-8 (for annotations)
   bytes
* Idx-String: either an Offset (with 0x80000000 bit 1) of a Len-String, or a
   Len-String
* Annotations: UInt32 number N of annotations followed by N * Idx-String
* Entry: Offset of NUL-Name followed by Offset of payload
* Map: zero or more Entries

The file starts with an 8 byte header, followed by information about the root
map (unoidl-write generates files in a single depth-first pass, so the root map
itself is at the end of the file):

* 7 byte magic header "UNOIDL\xFF"
* version byte 0
* Offset of root Map
* UInt32 number of entries of root Map
...

Files generated by unoidl-write follow that by a

  "\0** Created by LibreOffice " LIBO_VERSION_DOTTED " unoidl-write **\0"

banner (cf. config_host/config_version.h.in), as a debugging aid.  (Old versions
used "reg2unoidl" instead of "unoidl-write" in that banner.)

Layout of per-entry payload in the root or a module Map:

* kind byte:

** 0: module
*** followed by:
**** UInt32 number N1 of entries of Map
**** N1 * Entry

** otherwise:
*** 0x80 bit: 1 if published
*** 0x40 bit: 1 if annotated
*** 0x20 bit: flag (may only be 1 for certain kinds, see below)
*** remaining bits:

**** 1: enum type
***** followed by:
****** UInt32 number N1 of members
****** N1 * tuple of:
******* Idx-String
******* UInt32
******* if annotated: Annotations

**** 2: plain struct type (with base if flag is 1)
***** followed by:
****** if "with base": Idx-String
****** UInt32 number N1 of direct members
****** N1 * tuple of:
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 3: polymorphic struct type template
***** followed by:
****** UInt32 number N1 of type parameters
****** N1 * Idx-String
****** UInt32 number N2 of members
****** N2 * tuple of:
******* kind byte: 0x01 bit is 1 if parameterized type
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 4: exception type (with base if flag is 1)
***** followed by:
****** if "with base": Idx-String
****** UInt32 number N1 of direct members
****** N1 * tuple of:
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 5: interface type
***** followed by:
****** UInt32 number N1 of direct mandatory bases
****** N1 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N2 of direct optional bases
****** N2 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N3 of direct attributes
****** N3 * tuple of:
******* kind byte:
******** 0x02 bit: 1 if read-only
******** 0x01 bit: 1 if bound
******* Idx-String name
******* Idx-String type
******* UInt32 number N4 of get exceptions
******* N4 * Idx-String
******* UInt32 number N5 of set exceptions
******* N5 * Idx-String
******* if annotated: Annotations
****** UInt32 number N6 of direct methods
****** N6 * tuple of:
******* Idx-String name
******* Idx-String return type
******* UInt32 number N7 of parameters
******* N7 * tuple of:
******** direction byte: 0 for in, 1 for out, 2 for in-out
******** Idx-String name
******** Idx-String type
******* UInt32 number N8 of exceptions
******* N8 * Idx-String
******* if annotated: Annotations

**** 6: typedef
***** followed by:
****** Idx-String

**** 7: constant group
***** followed by:
****** UInt32 number N1 of entries of Map
****** N1 * Entry

**** 8: single-interface--based service (with default constructor if flag is 1)
***** followed by:
****** Idx-String
****** if not "with default constructor":
******* UInt32 number N1 of constructors
******* N1 * tuple of:
******** Idx-String
******** UInt32 number N2 of parameters
******** N2 * tuple of
********* kind byte: 0x04 bit is 1 if rest parameter
********* Idx-String name
********* Idx-String type
******** UInt32 number N3 of exceptions
******** N3 * Idx-String
******** if annotated: Annotations

**** 9: accumulation-based service
***** followed by:
****** UInt32 number N1 of direct mandatory base services
****** N1 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N2 of direct optional base services
****** N2 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N3 of direct mandatory base interfaces
****** N3 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N4 of direct optional base interfaces
****** N4 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N5 of direct properties
****** N5 * tuple of:
******* UInt16 kind:
******** 0x0100 bit: 1 if optional
******** 0x0080 bit: 1 if removable
******** 0x0040 bit: 1 if maybedefault
******** 0x0020 bit: 1 if maybeambiguous
******** 0x0010 bit: 1 if readonly
******** 0x0008 bit: 1 if transient
******** 0x0004 bit: 1 if constrained
******** 0x0002 bit: 1 if bound
******** 0x0001 bit: 1 if maybevoid
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 10: interface-based singleton
***** followed by:
****** Idx-String

**** 11: service-based singleton
***** followed by:
****** Idx-String

*** if annotated, followed by: Annotations

Layout of per-entry payload in a constant group Map:

* kind byte:
** 0x80 bit: 1 if annotated
** remaining bits:

*** 0: BOOLEAN
**** followed by value byte, 0 represents false, 1 represents true

*** 1: BYTE
**** followed by value byte, representing values with two's complement

*** 2: SHORT
**** followed by UInt16 value, representing values with two's complement

*** 3: UNSIGNED SHORT
**** followed by UInt16 value

*** 4: LONG
**** followed by UInt32 value, representing values with two's complement

*** 5: UNSIGNED LONG
**** followed by UInt32 value

*** 6: HYPER
**** followed by UInt64 value, representing values with two's complement

*** 7: UNSIGNED HYPER
**** followed by UInt64 value

*** 8: FLOAT
**** followed by 4-byte value, representing values in ISO 60599 binary32 format,
      LSB first

*** 9: DOUBLE
**** followed by 8-byte value, representing values in ISO 60599 binary64 format,
      LSB first

* if annotated, followed by: Annotations