xmerge/source/aportisdoc/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--
 #*************************************************************************
 #
  DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  
  Copyright 2008 by Sun Microsystems, Inc.
 
  OpenOffice.org - a multi-platform office productivity suite
 
  $RCSfile: package.html,v $
 
  $Revision: 1.4 $
 
  This file is part of OpenOffice.org.
 
  OpenOffice.org is free software: you can redistribute it and/or modify
  it under the terms of the GNU Lesser General Public License version 3
  only, as published by the Free Software Foundation.
 
  OpenOffice.org is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU Lesser General Public License version 3 for more details
  (a copy is included in the LICENSE file that accompanied this code).
 
  You should have received a copy of the GNU Lesser General Public License
  version 3 along with OpenOffice.org.  If not, see
  <http://www.openoffice.org/license.html>
  for a copy of the LGPLv3 License.
 
 #*************************************************************************
 -->
<html>
<head>
<title>org.openoffice.xmerge.converter.xml.sxw.aportisdoc package</title>
</head>

<body bgcolor="white">

<p>Provides the tools for doing the conversion of StarWriter XML to
and from AportisDoc format.</p>

<p>It follows the {@link org.openoffice.xmerge} framework for the conversion process.</p>

<p>Since it converts to/from a Palm application format, these converters
follow the <a href=../../../../converter/palm/package-summary.html#streamformat>
<code>PalmDB</code> stream format</a> for writing out to the Palm sync client or
reading in from the Palm sync client.</p>

<p>Note that <code>PluginFactoryImpl</code> also provides a
<code>DocumentMerger</code> object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}.
This functionality was derived from its superclass
{@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory
SxwPluginFactory}.</p>

<h2>AportisDoc pdb format - Doc</h2>

<p>The AportisDoc pdb format is widely used by different Palm applications,
e.g. QuickWord, AportisDoc Reader, MiniWrite, etc.  Note that some
of these applications put tweaks into the format.  The converters will only
support the default AportisDoc format, plus some very minor tweaks to accommodate
other applications.</p>

<p>The text content of the format is plain text, i.e. there are no styles
or structures.  There is no notion of lists, list items, paragraphs,
headings, etc.  The format does have support for bookmarks.</p>

<p>For most Doc applications, the default character encoding supported is
the extended ASCII character set, i.e. ISO-8859-1.  StarWriter XML is in
UTF-8 encoding scheme.  Since UTF-8 encoding scheme covers more characters,
converting UTF-8 strings into extended ASCII would mean that there can be
possible loss of character mappings.</p>

<p>Using JAXP, XML files can be parsed and read in as Java <code>String</code>s
which is in Unicode format, there is no loss of character mapping from UTF-8
to Java Strings.  There is possible loss of character mapping in
converting Java <code>String</code>s to ASCII bytes.  Java characters that
cannot be represented in extended ASCII are converted into the ASCII
character '?' or x3F in hex digit via the <code>String.getBytes(encoding)</code>
API.</p>

<h2>SXW to DOC Conversion</h2>

<p>The <code>DocumentSerializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentSerializer</code>.
This class specifically provides the conversion process from a given
<code>SxwDocument</code> object to DOC formatted records, which are
then passed back to the client via the <code>ConvertData</code> object.</p>

<p>The following XML tags are handled. [Note that some may not be implemented yet.]</p>
<ul>
<li>
    <p>Paragraphs <tt>&lt;text:p&gt;</tt> and Headings <tt>&lt;text:h&gt;</tt></p>

    <p>Heading elements are classified the same as paragraph
    elements since both have the same possible elements inside.
    Their main difference is that they refer to different types
    of style information, which is outside of their element tags.
    Since there are no styles on the DOC format, headings should
    be treated the same way a paragraph is converted.</p>

    <p>For paragraph elements, convert and transfer text nodes
    that are essential.  Text nodes directly contained within paragraph
    nodes are such.  There are also a number of elements that
    a paragraph element may contain.  These are explained in their
    own context.</p>

    <p>At the end of the paragraph, an EOL character is added by
    the converter to provide a separation for each paragraph,
    since the Doc format does not have a notion of a paragraph.</p>
</li>
<li>
    <p>White spaces <tt>&lt;text:s&gt;</tt> and Tabs <tt>&lt;text:tab-stop&gt;</tt></p>

    <p>In SXW, normally 2 or more white-space characters are collapsed into
    a single space character.  In order to make sure that the document
    content really contains those white-space characters, there are special
    elements assigned to them.</p>

    <p>The space element specifies the number of spaces are in it.
    Thus, converting it just means providing the specific number of spaces
    that the element requires.</p>

    <p>There is also the tab-stop element.  This is a bit tricky.  In a
    StarWriter document, tab-stops are specified by a column position.
    A tab is not an exact number of space, but rather a specific column
    positioning.  Say, regular tab-stops are set at every 5th column.
    At column 4, if I hit a tab, it goes to column 5.  At column 1, hitting
    a tab would put the cursor at column 5 as well.  SmartDoc and AporticDoc
    applications goes by columns for the ASCII tab character. The only problem
    is that in StarWriter, one could specify a different tab-stop, but not
    in most of these Doc applications, at least I have not seen one.
    Solution for this is just to go with the converting to the ASCII tab
    character and not do anything for different tab-stop positioning.</p>
</li>
<li>
    <p>Line breaks <tt>&lt;text:line-break&gt;</tt></p>

    <p>To represent line breaks, it is simpliest to just put an ASCII LF
    character.  Note that the side effect of this is that an end of paragraph
    also contains an ASCII LF character.  Thus, for the DOC to SXW conversion,
    line breaks are not distinguishable from specifying the end of a
    paragraph.</p>
</li>
<li>
    <p>Text spans <tt>&lt;text:span&gt;</tt></p>

    <p>Text spans contain text that have different style attributes
    from the paragraphs'.  Text spans can be embedded within another
    text span.  Since it is purely for style tagging, we only needed
    to convert and transfer the text elements within these.</p>
</li>
<li>
    <p>Hyperlinks <tt>&lt;text:a&gt;</tt>

    <p>Convert and transfer the text portion.</p>
</li>
<li>
    <p>Bookmarks <tt>&lt;text:bookmark&gt;</tt> <tt>&lt;text:bookmark-start&gt;</tt>
    <tt>&lt;text:bookmark-end&gt;</tt> [Not implemented yet]</p>

    <p>In SXW, bookmark elements are embedded inside paragraph elements.
    Bookmarks can either mark a text position or a text range. <tt>&lt;text:bookmark&gt;</tt>
    marks a position while the pair <tt>&lt;text:bookmark-start&gt;</tt> and
    <tt>&lt;text:bookmark-end&gt;</tt></p> marks a text range.  The DOC format only
    supports bookmarking a text position.  Thus, for the conversion,
    <tt>&lt;text:bookmark&gt;</tt> and  <tt>&lt;text:bookmark-start&gt;</tt> will both mark
    a text position.</p>
</li>
<li>
    <p>Change Tracking <tt>&lt;text:tracked-changes&gt;</tt>
    <tt>&lt;text:change*&gt;</tt> [Not implemented yet]</p>

    <p>Change tracking elements are not supported yet on the current
    OpenOffice XML filters, will have to watch out on this.  The text
    within these elements have to be interpreted properly during the
    conversion process.</p>
</li>
<li>
    <p>Lists <tt>&lt;text:unordered-list&gt;</tt> and
    <tt>&lt;text:ordered-lists&gt;</tt></p>

    <p>A list can only contain one optional <tt>&lt;text:list-header&gt;</tt>
    and one or more <tt>&lt;text:list-item&gt;</tt> elements.</p>

    <p>A <tt>&lt;text:list-header&gt;</tt> contains one or more paragraph
    elements.  Since there are no styles, the conversion process does not
    do anything special for list headers, conversion for the paragraphs
    within list headers are the same as explained above.</p>

    <p>A <tt>&lt;text:list-item&gt;</tt> may contain one or more of paragraphs,
    headings, list, etc.  Since the Doc format does not support any list
    structure, there will not be any special handling for this element.
    Conversion for elements within it shall be applied according to the
    element type.  Thus, lists with paragraphs within it will result in just
    plain paragraphs.  Sublists will not be identifiable.  Paragraphs in
    sublists will still appear.</p>
</li>
<li>
    <p><tt>&lt;text:section&gt;</tt></p>

    <p>I am not sure what this is yet, will need to investigate more on this.</p>
</li>
</ul>
<p>There may be other tags that will still need to be addressed for this conversion.</p>

<p>Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl}
for details of implementation. It uses <code>DocEncoder</code> class to do the encoding
part.</p>

<h2>DOC to SXW Conversion</h2>

<p>The <code>DocumentDeserializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentDeserializer</code>. It is 
passed the device document in the form of a <code>ConvertData</code> object.
It will then create a <code>SxwDocument</code> object from the conversion of
the DOC formatted records.</p>

<p>The text content of the Doc format will be transferred as text.  Paragraph
elements will be formed based on the existence of an ASCII LF character.  There
will be at least one paragraph element.</p>

<p>Bookmarks in the Doc format will be converted to the bookmark element
<tt>&lt;text:bookmark&gt;</tt> [Not implemented yet].</p>


<h2>Merging changes</h2>

<p>As mentioned above, the <code>DocumentMerger</code> object produced by
<code>PluginFactoryImpl</code> is <code>DocumentMergerImpl</code>.
Refer to the javadocs for that package/class on its merging specifications.
</p>

<h2>TODO list</h2>

<p><ol>
<li>Investigate Palm's with different character encodings.</li>
<li>Investigate other StarWriter XML tags</li>
</ol></p>

</body>
</html>