Diffstat (limited to 'xmerge/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html')
-rw-r--r-- | xmerge/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html | 237 |
1 files changed, 0 insertions, 237 deletions
diff --git a/xmerge/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html b/xmerge/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html
deleted file mode 100644
index 78cfe79bfbbf..000000000000
--- a/xmerge/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html
+++ /dev/null
@@ -1,237 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<!-- - - DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. - - Copyright 2000, 2010 Oracle and/or its affiliates. - - OpenOffice.org - a multi-platform office productivity suite - - This file is part of OpenOffice.org. - - OpenOffice.org is free software: you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License version 3 - only, as published by the Free Software Foundation. - - OpenOffice.org is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License version 3 for more details - (a copy is included in the LICENSE file that accompanied this code). - - You should have received a copy of the GNU Lesser General Public License - version 3 along with OpenOffice.org. If not, see - <http://www.openoffice.org/license.html> - for a copy of the LGPLv3 License.
- ---> -<html> -<head> -<title>org.openoffice.xmerge.converter.xml.sxw.aportisdoc package</title> -</head> - -<body bgcolor="white"> - -<p>Provides the tools for converting StarWriter XML to -and from the AportisDoc format.</p> - -<p>It follows the {@link org.openoffice.xmerge} framework for the conversion process.</p> - -<p>Since it converts to/from a Palm application format, these converters -follow the <a href="../../../../converter/palm/package-summary.html#streamformat"> -<code>PalmDB</code> stream format</a> for writing out to the Palm sync client or -reading in from the Palm sync client.</p> - -<p>Note that <code>PluginFactoryImpl</code> also provides a -<code>DocumentMerger</code> object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}. -This functionality was derived from its superclass -{@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory -SxwPluginFactory}.</p> - -<h2>AportisDoc pdb format - Doc</h2> - -<p>The AportisDoc pdb format is widely used by different Palm applications, -e.g. QuickWord, AportisDoc Reader, MiniWrite, etc. Note that some -of these applications add their own tweaks to the format. The converters will only -support the default AportisDoc format, plus some very minor tweaks to accommodate -other applications.</p> - -<p>The text content of the format is plain text, i.e. there are no styles -or structures. There is no notion of lists, list items, paragraphs, -headings, etc. The format does have support for bookmarks.</p> - -<p>For most Doc applications, the default character encoding supported is -the extended ASCII character set, i.e. ISO-8859-1. StarWriter XML uses the -UTF-8 encoding scheme.
Since UTF-8 can represent far more characters, -converting UTF-8 strings into extended ASCII can lose some character -mappings.</p> - -<p>Using JAXP, XML files can be parsed and read in as Java <code>String</code>s, -which are Unicode, so there is no loss of character mapping from UTF-8 -to Java Strings. Character mappings can, however, be lost when -converting Java <code>String</code>s to ASCII bytes. Java characters that -cannot be represented in extended ASCII are converted into the ASCII -character '?' (0x3F) by the <code>String.getBytes(encoding)</code> -API.</p> - -<h2>SXW to DOC Conversion</h2> - -<p>The <code>DocumentSerializerImpl</code> class implements the -<code>org.openoffice.xmerge.DocumentSerializer</code> interface. -This class specifically provides the conversion process from a given -<code>SxwDocument</code> object to DOC formatted records, which are -then passed back to the client via the <code>ConvertData</code> object.</p> - -<p>The following XML tags are handled. [Note that some may not be implemented yet.]</p> -<ul> -<li> - <p>Paragraphs <tt><text:p></tt> and Headings <tt><text:h></tt></p> - - <p>Heading elements are treated the same as paragraph - elements, since both may contain the same child elements. - Their main difference is that they refer to different types - of style information, which is outside of their element tags. - Since there are no styles in the DOC format, headings should - be converted the same way as paragraphs.</p> - - <p>For paragraph elements, convert and transfer the essential text - nodes, i.e. the text nodes directly contained within paragraph - nodes. There are also a number of elements that - a paragraph element may contain. These are explained in their
These are explained in their - own context.</p> - - <p>At the end of the paragraph, an EOL character is added by - the converter to provide a separation for each paragraph, - since the Doc format does not have a notion of a paragraph.</p> -</li> -<li> - <p>White spaces <tt><text:s></tt> and Tabs <tt><text:tab-stop></tt></p> - - <p>In SXW, normally 2 or more white-space characters are collapsed into - a single space character. In order to make sure that the document - content really contains those white-space characters, there are special - elements assigned to them.</p> - - <p>The space element specifies the number of spaces are in it. - Thus, converting it just means providing the specific number of spaces - that the element requires.</p> - - <p>There is also the tab-stop element. This is a bit tricky. In a - StarWriter document, tab-stops are specified by a column position. - A tab is not an exact number of space, but rather a specific column - positioning. Say, regular tab-stops are set at every 5th column. - At column 4, if I hit a tab, it goes to column 5. At column 1, hitting - a tab would put the cursor at column 5 as well. SmartDoc and AporticDoc - applications goes by columns for the ASCII tab character. The only problem - is that in StarWriter, one could specify a different tab-stop, but not - in most of these Doc applications, at least I have not seen one. - Solution for this is just to go with the converting to the ASCII tab - character and not do anything for different tab-stop positioning.</p> -</li> -<li> - <p>Line breaks <tt><text:line-break></tt></p> - - <p>To represent line breaks, it is simpliest to just put an ASCII LF - character. Note that the side effect of this is that an end of paragraph - also contains an ASCII LF character. 
Thus, for the DOC to SXW conversion, - line breaks cannot be distinguished from the end of a - paragraph.</p> -</li> -<li> - <p>Text spans <tt><text:span></tt></p> - - <p>Text spans contain text whose style attributes differ - from those of the enclosing paragraph. Text spans can be nested within other - text spans. Since spans are purely for style tagging, we only need - to convert and transfer the text within them.</p> -</li> -<li> - <p>Hyperlinks <tt><text:a></tt></p> - - <p>Convert and transfer the text portion.</p> -</li> -<li> - <p>Bookmarks <tt><text:bookmark></tt> <tt><text:bookmark-start></tt> - <tt><text:bookmark-end></tt> [Not implemented yet]</p> - - <p>In SXW, bookmark elements are embedded inside paragraph elements. - Bookmarks can mark either a text position or a text range. <tt><text:bookmark></tt> - marks a position, while the pair <tt><text:bookmark-start></tt> and - <tt><text:bookmark-end></tt> marks a text range. The DOC format only - supports bookmarking a text position. Thus, for the conversion, - <tt><text:bookmark></tt> and <tt><text:bookmark-start></tt> will both mark - a text position.</p> -</li> -<li> - <p>Change Tracking <tt><text:tracked-changes></tt> - <tt><text:change*></tt> [Not implemented yet]</p> - - <p>Change tracking elements are not yet supported by the current - OpenOffice XML filters, so this needs to be watched. The text - within these elements has to be interpreted properly during the - conversion process.</p> -</li> -<li> - <p>Lists <tt><text:unordered-list></tt> and - <tt><text:ordered-list></tt></p> - - <p>A list can contain at most one <tt><text:list-header></tt> - and one or more <tt><text:list-item></tt> elements.</p> - - <p>A <tt><text:list-header></tt> contains one or more paragraph - elements.
Since there are no styles, the conversion process does not - do anything special for list headers; the paragraphs - within list headers are converted as explained above.</p> - - <p>A <tt><text:list-item></tt> may contain one or more paragraphs, - headings, lists, etc. Since the Doc format does not support any list - structure, there is no special handling for this element. - Elements within it are converted according to their - element type. Thus, lists containing paragraphs will result in just - plain paragraphs. Sublists will not be identifiable, although paragraphs in - sublists will still appear.</p> -</li> -<li> - <p><tt><text:section></tt></p> - - <p>I am not sure what this is yet; it needs more investigation.</p> -</li> -</ul> -<p>There may be other tags that still need to be addressed for this conversion.</p> - -<p>Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl} -for details of implementation. It uses the <code>DocEncoder</code> class to do the encoding -part.</p> - -<h2>DOC to SXW Conversion</h2> - -<p>The <code>DocumentDeserializerImpl</code> class implements the -<code>org.openoffice.xmerge.DocumentDeserializer</code> interface. It is -passed the device document in the form of a <code>ConvertData</code> object, -and creates an <code>SxwDocument</code> object by converting -the DOC formatted records.</p> - -<p>The text content of the Doc format is transferred as text. Paragraph -elements are delimited by ASCII LF characters. There -will always be at least one paragraph element.</p> - -<p>Bookmarks in the Doc format will be converted to the bookmark element -<tt><text:bookmark></tt> [Not implemented yet].</p> - - -<h2>Merging changes</h2> - -<p>As mentioned above, the <code>DocumentMerger</code> object produced by -<code>PluginFactoryImpl</code> is <code>DocumentMergerImpl</code>.
-Refer to the javadocs for that class for its merging specification. -</p> - -<h2>TODO list</h2> - -<p><ol> -<li>Investigate Palms with different character encodings.</li> -<li>Investigate other StarWriter XML tags.</li> -</ol></p> - -</body> -</html>
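The package notes above say that Java characters with no extended-ASCII equivalent become '?' (0x3F) when a String is converted to Doc bytes. Below is a minimal sketch of that behaviour, assuming ISO-8859-1 is the target encoding; DocEncodingSketch, toDocBytes and fromDocBytes are hypothetical names, not xmerge API. It calls String.getBytes(Charset) with the Java 7+ StandardCharsets constants instead of getBytes(String encoding), because the JDK Javadoc leaves the unmappable-character behaviour of the String overload unspecified, whereas the Charset overload always substitutes the charset's default replacement byte.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the lossy Unicode to extended-ASCII step; not xmerge code. */
public class DocEncodingSketch {

    /** Encodes a Java (Unicode) String as ISO-8859-1 bytes, the assumed Doc encoding. */
    public static byte[] toDocBytes(String text) {
        // getBytes(Charset) replaces every unmappable character with the
        // charset's replacement byte, which is '?' (0x3F) for ISO-8859-1.
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Decodes Doc bytes back into a Java String. */
    public static String fromDocBytes(byte[] bytes) {
        // Every byte value 0x00-0xFF maps to a character, so decoding is lossless;
        // anything already replaced by '?' stays '?'.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For example, encoding "café €5" and decoding it again gives "café ?5": é fits in ISO-8859-1 (0xE9), the euro sign does not.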
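The SXW to DOC element rules listed in the deleted notes (paragraphs and headings end with an EOL, text:s expands to its space count, text:tab-stop becomes an ASCII tab, text:line-break becomes an LF, spans and hyperlinks contribute only their text, list containers are flattened) can be sketched with the JDK's DOM parser. This is a hedged illustration, not DocumentSerializerImpl: the class and method names are hypothetical, the parser is deliberately left non-namespace-aware so elements are matched by qualified names such as text:p, and the space count is assumed to live in a text:c attribute that defaults to 1. Bookmarks and change tracking are simply ignored here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch of the SXW-to-Doc text rules; not xmerge code. */
public class SxwTextFlattener {

    /** Flattens an SXW body fragment into Doc-style plain text. */
    public static String flatten(String xml) {
        try {
            // Left non-namespace-aware, so getNodeName() yields "text:p", "text:s", ...
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            appendBlocks(doc, out); // the Document's child is the root element itself
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse SXW fragment", e);
        }
    }

    // Block level: each paragraph or heading becomes one LF-terminated line;
    // any other element (lists, list items, sections) is simply descended into.
    private static void appendBlocks(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace between block elements
            }
            String name = child.getNodeName();
            if (name.equals("text:p") || name.equals("text:h")) {
                appendInline(child, out);
                out.append('\n'); // the EOL that separates paragraphs
            } else {
                appendBlocks(child, out);
            }
        }
    }

    // Inline level: text is copied; space, tab-stop and line-break elements are
    // expanded; spans, hyperlinks and other wrappers contribute only their text.
    private static void appendInline(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName();
                if (name.equals("text:s")) {
                    String count = ((Element) child).getAttribute("text:c");
                    int n = count.isEmpty() ? 1 : Integer.parseInt(count);
                    for (int k = 0; k < n; k++) {
                        out.append(' ');
                    }
                } else if (name.equals("text:tab-stop")) {
                    out.append('\t');
                } else if (name.equals("text:line-break")) {
                    out.append('\n');
                } else {
                    appendInline(child, out);
                }
            }
        }
    }
}
```

With this sketch, a paragraph such as <text:p>a<text:s text:c="3"/>b<text:line-break/>c</text:p> flattens to "a   b\nc\n", which also shows why a line break and a paragraph end look identical once converted.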
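The tab-stop discussion in the notes (stops at every 5th column, so a tab pressed at column 4 or at column 1 lands on column 5) comes down to a small piece of arithmetic. The sketch below shows how a Doc viewer with fixed stops might place text after an ASCII tab; per the notes, the converter itself only emits the tab character, so TabStopSketch, nextTabStop and expandTabs are hypothetical helpers, and the 1-based column numbering and fixed interval are assumptions read off that example.

```java
/** Hypothetical sketch of fixed-interval tab-stops; not xmerge code. */
public class TabStopSketch {

    /**
     * Returns the 1-based column a tab moves the cursor to, assuming fixed
     * tab-stops at columns interval, 2*interval, 3*interval, ...
     */
    public static int nextTabStop(int column, int interval) {
        if (column < 1 || interval < 1) {
            throw new IllegalArgumentException("column and interval must be >= 1");
        }
        // Smallest multiple of interval strictly greater than column.
        return (column / interval + 1) * interval;
    }

    /** Expands ASCII tabs in a single line to spaces under the same assumption. */
    public static String expandTabs(String line, int interval) {
        StringBuilder out = new StringBuilder();
        for (char ch : line.toCharArray()) {
            if (ch == '\t') {
                int column = out.length() + 1; // cursor column, 1-based
                int target = nextTabStop(column, interval);
                while (out.length() + 1 < target) {
                    out.append(' '); // pad until the cursor sits on the stop
                }
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

Here expandTabs("abc\tX", 5) returns "abc X", putting X at column 5. A custom StarWriter tab-stop at some other column would be lost, which is the limitation the notes accept.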
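For DOC to SXW, the notes say paragraph elements are formed at each ASCII LF and that there is always at least one paragraph. Below is a minimal sketch of that rule, not DocumentDeserializerImpl itself: DocParagraphSplitter and its methods are hypothetical, the output is a bare string of text:p elements rather than an SxwDocument, only &, < and > are escaped, and runs of spaces are left as-is rather than turned back into text:s elements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the DOC-to-SXW paragraph rule; not xmerge code. */
public class DocParagraphSplitter {

    /** Splits Doc plain text into paragraphs at each ASCII LF (0x0A). */
    public static List<String> split(String docText) {
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < docText.length(); i++) {
            if (docText.charAt(i) == '\n') {
                paragraphs.add(docText.substring(start, i));
                start = i + 1;
            }
        }
        // Text after the last LF becomes the final paragraph; an empty input
        // still yields one (empty) paragraph, so there is always at least one.
        if (start < docText.length() || paragraphs.isEmpty()) {
            paragraphs.add(docText.substring(start));
        }
        return paragraphs;
    }

    /** Wraps each paragraph in a text:p element, escaping XML special characters. */
    public static String toSxwParagraphs(String docText) {
        StringBuilder xml = new StringBuilder();
        for (String paragraph : split(docText)) {
            xml.append("<text:p>").append(escape(paragraph)).append("</text:p>");
        }
        return xml.toString();
    }

    private static String escape(String s) {
        // '&' first, so the entities produced for '<' and '>' are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

split("x\ny\n") returns the two paragraphs "x" and "y" whether the first LF came from a paragraph end or from a text:line-break, which is the ambiguity the notes point out.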