/*
 * Copyright (c) 2004 World Wide Web Consortium,
 *
 * (Massachusetts Institute of Technology, European Research Consortium for
 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Software License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * A <code>LSSerializer</code> provides an API for serializing (writing) a
 * DOM document out into XML. The XML data is written to a string or an
 * output stream. Any changes or fixups made during the serialization affect
 * only the serialized data. The <code>Document</code> object and its
 * children are never altered by the serialization operation.
 * <p> During serialization of XML data, namespace fixup is done as defined in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
 * , Appendix B. [<a href='http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113'>DOM Level 2 Core</a>]
 * allows empty strings as a real namespace URI. If the
 * <code>namespaceURI</code> of a <code>Node</code> is the empty string, the
 * serialization will treat it as <code>null</code>, ignoring the prefix
 * if any.
 * <p> <code>LSSerializer</code> accepts any node type for serialization. For
 * nodes of type <code>Document</code> or <code>Entity</code>, well-formed
 * XML will be created when possible (well-formedness is guaranteed if the
 * document or entity comes from a parse operation and is unchanged since it
 * was created).
 * The serialized output for these node types is either an
 * XML document or an External XML Entity, respectively, and is acceptable
 * input for an XML parser. For all other types of nodes the serialized form
 * is implementation dependent.
 * <p>Within a <code>Document</code>, <code>DocumentFragment</code>, or
 * <code>Entity</code> being serialized, <code>Nodes</code> are processed as
 * follows:
 * <ul>
 * <li> <code>Document</code> nodes are written, including the XML
 * declaration (unless the parameter "xml-declaration" is set to
 * <code>false</code>) and a DTD subset, if one exists in the DOM. Writing a
 * <code>Document</code> node serializes the entire document.
 * </li>
 * <li>
 * <code>Entity</code> nodes, when written directly by
 * <code>LSSerializer.write</code>, output the entity expansion but no
 * namespace fixup is done. The resulting output will be valid as an
 * external entity.
 * </li>
 * <li> If the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-entities'>
 * entities</a>" is set to <code>true</code>, <code>EntityReference</code> nodes are
 * serialized as an entity reference of the form "
 * <code>&entityName;</code>" in the output. Child nodes (the expansion)
 * of the entity reference are ignored. If the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-entities'>
 * entities</a>" is set to <code>false</code>, only the children of the entity reference
 * are serialized. <code>EntityReference</code> nodes with no children (no
 * corresponding <code>Entity</code> node or the corresponding
 * <code>Entity</code> nodes have no children) are always serialized.
 * </li>
 * <li>
 * <code>CDATAsections</code> containing content characters that cannot be
 * represented in the specified output encoding are handled according to the
 * "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-split-cdata-sections'>
 * split-cdata-sections</a>" parameter. If the parameter is set to <code>true</code>,
 * <code>CDATAsections</code> are split, and the unrepresentable characters
 * are serialized as numeric character references in ordinary content. The
 * exact position and number of splits is not specified. If the parameter
 * is set to <code>false</code>, unrepresentable characters in a
 * <code>CDATAsection</code> are reported as
 * <code>"wf-invalid-character"</code> errors if the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-well-formed'>
 * well-formed</a>" is set to <code>true</code>. The error is not recoverable - there is no
 * mechanism for supplying alternative characters and continuing with the
 * serialization.
 * </li>
 * <li> <code>DocumentFragment</code> nodes are serialized by
 * serializing the children of the document fragment in the order they
 * appear in the document fragment.
 * </li>
 * <li> All other node types (Element, Text,
 * etc.) are serialized to their corresponding XML source form.
 * </li>
 * </ul>
 * <p ><b>Note:</b> The serialization of a <code>Node</code> does not always
 * generate a well-formed XML document, i.e. a <code>LSParser</code> might
 * throw fatal errors when parsing the resulting serialization.
 * <p> Within the character data of a document (outside of markup), any
 * characters that cannot be represented directly are replaced with
 * character references. Occurrences of '<' and '&' are replaced by
 * the predefined entities &lt; and &amp;.
 * The other predefined
 * entities (&gt;, &apos;, and &quot;) might not be used, except
 * where needed (e.g. using &gt; in cases such as ']]>'). Any
 * characters that cannot be represented directly in the output character
 * encoding are serialized as numeric character references (and since
 * character encoding standards commonly use hexadecimal representations of
 * characters, using the hexadecimal representation when serializing
 * character references is encouraged).
 * <p> To allow attribute values to contain both single and double quotes, the
 * apostrophe or single-quote character (') may be represented as
 * "&apos;", and the double-quote character (") as "&quot;". New
 * line characters and other characters that cannot be represented directly
 * in attribute values in the output character encoding are serialized as a
 * numeric character reference.
 * <p> Within markup, but outside of attributes, any occurrence of a character
 * that cannot be represented in the output character encoding is reported
 * as a <code>DOMError</code> fatal error. An example would be serializing
 * the element <LaCa\u00f1ada/> with <code>encoding="us-ascii"</code>.
 * This results in the generation of a <code>DOMError</code>
 * "wf-invalid-character-in-node-name" (as proposed in "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-well-formed'>
 * well-formed</a>").
 * <p> When requested by setting the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-normalize-characters'>
 * normalize-characters</a>" on <code>LSSerializer</code> to true, character normalization is
 * performed according to the definition of <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
 * normalized</a> characters included in appendix E of [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] on all
 * data to be serialized, both markup and character data. The character
 * normalization process affects only the data as it is being written; it
 * does not alter the DOM's view of the document after serialization has
 * completed.
 * <p> Implementations are required to support the encodings "UTF-8",
 * "UTF-16", "UTF-16BE", and "UTF-16LE" to guarantee that data is
 * serializable in all encodings that are required to be supported by all
 * XML parsers. When the encoding is UTF-8, whether or not a byte order mark
 * is serialized, or if the output is big-endian or little-endian, is
 * implementation dependent. When the encoding is UTF-16, whether or not the
 * output is big-endian or little-endian is implementation dependent, but a
 * Byte Order Mark must be generated for non-character outputs, such as
 * <code>LSOutput.byteStream</code> or <code>LSOutput.systemId</code>. If
 * the Byte Order Mark is not generated, a "byte-order-mark-needed" warning
 * is reported. When the encoding is UTF-16LE or UTF-16BE, the output is
 * big-endian (UTF-16BE) or little-endian (UTF-16LE) and the Byte Order Mark
 * is not generated. In all cases, the encoding declaration, if
 * generated, will correspond to the encoding used during the serialization
 * (e.g. <code>encoding="UTF-16"</code> will appear if UTF-16 was
 * requested).
 * <p> Namespaces are fixed up during serialization; the serialization process
 * will verify that namespace declarations, namespace prefixes and the
 * namespace URI associated with elements and attributes are consistent. If
 * inconsistencies are found, the serialized form of the document will be
 * altered to remove them. The method used for doing the namespace fixup
 * while serializing a document is the algorithm defined in Appendix B.1,
 * "Namespace normalization", of [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
 * .
 * <p> While serializing a document, the parameter "discard-default-content"
 * controls whether or not non-specified data is serialized.
 * <p> While serializing, errors and warnings are reported to the application
 * through the error handler (<code>LSSerializer.domConfig</code>'s "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
 * error-handler</a>" parameter). This specification in no way tries to define all possible
 * errors and warnings that can occur while serializing a DOM node, but some
 * common error and warning cases are defined. The types (
 * <code>DOMError.type</code>) of errors and warnings defined by this
 * specification are:
 * <dl>
 * <dt><code>"no-output-specified" [fatal]</code></dt>
 * <dd> Raised when
 * writing to a <code>LSOutput</code> if no output is specified in the
 * <code>LSOutput</code>. </dd>
 * <dt>
 * <code>"unbound-prefix-in-entity-reference" [fatal]</code> </dt>
 * <dd> Raised if the
 * configuration parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-namespaces'>
 * namespaces</a>" is set to <code>true</code> and an entity whose replacement text
 * contains unbound namespace prefixes is referenced in a location where
 * there are no bindings for the namespace prefixes.
 * </dd>
 * <dt>
 * <code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported
 * encoding is encountered. </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation specific errors and warnings for any
 * other error and warning cases such as IO errors (file not found,
 * permission denied, ...) and so on.
 * <p>See also the <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>Document Object Model (DOM) Level 3 Load and Save Specification</a>.
 */
public interface LSSerializer {
    /**
     * The <code>DOMConfiguration</code> object used by the
     * <code>LSSerializer</code> when serializing a DOM node.
     * <br> In addition to the parameters recognized by the <a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMConfiguration'>
     * DOMConfiguration</a> interface defined in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the <code>DOMConfiguration</code> objects for
     * <code>LSSerializer</code> add, or modify, the following
     * parameters:
     * <dl>
     * <dt><code>"canonical-form"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] Writes the document according to the rules specified in [<a href='http://www.w3.org/TR/2001/REC-xml-c14n-20010315'>Canonical XML</a>].
     * In addition to the behavior described in "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-canonical-form'>
     * canonical-form</a>" [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , setting this parameter to <code>true</code> will set the parameters
     * "format-pretty-print", "discard-default-content", and
     * "xml-declaration", to <code>false</code>.
     * Setting one of those parameters to
     * <code>true</code> will set this parameter to <code>false</code>.
     * Serializing an XML 1.1 document when "canonical-form" is
     * <code>true</code> will generate a fatal error. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Do not canonicalize the output. </dd>
     * </dl></dd>
     * <dt><code>"discard-default-content"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Use the <code>Attr.specified</code> attribute to decide what attributes
     * should be discarded. Note that some implementations might use
     * whatever information is available to the implementation (e.g. XML
     * schema, DTD, the <code>Attr.specified</code> attribute, and so on) to
     * determine what attributes and content to discard if this parameter is
     * set to <code>true</code>. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] Keep all attributes and all content.</dd>
     * </dl></dd>
     * <dt><code>"format-pretty-print"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Format the output by adding whitespace to produce a pretty-printed,
     * indented, human-readable form. The exact form of the transformations
     * is not specified by this specification. Pretty-printing changes the
     * content of the document and may affect the validity of the document;
     * validating implementations should preserve validity. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Don't pretty-print the result.
     * </dd>
     * </dl></dd>
     * <dt>
     * <code>"ignore-unknown-character-denormalizations"</code> </dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If, while verifying full normalization when [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] is
     * supported, a character is encountered for which the normalization
     * properties cannot be determined, then raise a
     * <code>"unknown-character-denormalization"</code> warning (instead of
     * raising an error, if this parameter is not set) and ignore any
     * possible denormalizations caused by these characters. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>optional</em>] Report a fatal error if a character is encountered for which the
     * processor cannot determine the normalization properties. </dd>
     * </dl></dd>
     * <dt>
     * <code>"normalize-characters"</code></dt>
     * <dd> This parameter is equivalent to
     * the one defined by <code>DOMConfiguration</code> in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * . Unlike in the Core, the default value for this parameter is
     * <code>true</code>. While DOM implementations are not required to
     * support <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
     * normalizing</a> the characters in the document according to appendix E of [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], this
     * parameter must be activated by default if supported. </dd>
     * <dt>
     * <code>"xml-declaration"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If a <code>Document</code>, <code>Element</code>, or <code>Entity</code>
     * node is serialized, the XML declaration, or text declaration, should
     * be included.
     * The version (<code>Document.xmlVersion</code> if the
     * document is a Level 3 document and the version is non-null, otherwise
     * use the value "1.0"), and the output encoding (see
     * <code>LSSerializer.write</code> for details on how to find the output
     * encoding) are specified in the serialized XML declaration. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>required</em>] Do not serialize the XML and text declarations. Report a
     * <code>"xml-declaration-needed"</code> warning if this will cause
     * problems (i.e. the serialized data is of an XML version other than [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>], or an
     * encoding would be needed to be able to re-parse the serialized data). </dd>
     * </dl></dd>
     * </dl>
     */
    public DOMConfiguration getDomConfig();

    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (See section 2.11,
     * "End-of-Line Handling" in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>], if the
     * serialized content is XML 1.0 or section 2.11, "End-of-Line Handling"
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], if the
     * serialized content is XML 1.1). Using other character sequences than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation specific default end-of-line sequence. DOM
     * implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content.
     * Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public String getNewLine();
    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (See section 2.11,
     * "End-of-Line Handling" in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>], if the
     * serialized content is XML 1.0 or section 2.11, "End-of-Line Handling"
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], if the
     * serialized content is XML 1.1). Using other character sequences than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation specific default end-of-line sequence. DOM
     * implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content. Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public void setNewLine(String newLine);

    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied.
     * For
     * example, CDATA sections won't be passed to the filter if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-cdata-sections'>
     * cdata-sections</a>" is set to <code>false</code>.
     */
    public LSSerializerFilter getFilter();
    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, CDATA sections won't be passed to the filter if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-cdata-sections'>
     * cdata-sections</a>" is set to <code>false</code>.
     */
    public void setFilter(LSSerializerFilter filter);

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to the supplied <code>LSOutput</code>.
     * <br> When writing to a <code>LSOutput</code>, the encoding is found by
     * looking at the encoding information that is reachable through the
     * <code>LSOutput</code> and the item to be written (or its owner
     * document) in this order:
     * <ol>
     * <li> <code>LSOutput.encoding</code>,
     * </li>
     * <li>
     * <code>Document.inputEncoding</code>,
     * </li>
     * <li>
     * <code>Document.xmlEncoding</code>.
     * </li>
     * </ol>
     * <br> If no encoding is reachable through the above properties, a
     * default encoding of "UTF-8" will be used. If the specified encoding
     * is not supported an "unsupported-encoding" fatal error is raised.
     * <br> If no output is specified in the <code>LSOutput</code>, a
     * "no-output-specified" fatal error is raised.
     * <br> The implementation is responsible for associating the appropriate
     * media type with the serialized data.
     * <br> When writing to an HTTP URI, an HTTP PUT is performed. When writing
     * to other types of URIs, the mechanism for writing the data to the URI
     * is implementation dependent.
     * @param nodeArg The node to serialize.
     * @param destination The destination for the serialized DOM.
     * @return Returns <code>true</code> if <code>node</code> was
     *   successfully serialized. Return <code>false</code> in case the
     *   normal processing stopped but the implementation kept serializing
     *   the document; the result of the serialization being implementation
     *   dependent then.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public boolean write(Node nodeArg,
                         LSOutput destination)
                         throws LSException;

    /**
     * A convenience method that acts as if <code>LSSerializer.write</code>
     * was called with a <code>LSOutput</code> with no encoding specified
     * and <code>LSOutput.systemId</code> set to the <code>uri</code>
     * argument.
     * @param nodeArg The node to serialize.
     * @param uri The URI to write to.
     * @return Returns <code>true</code> if <code>node</code> was
     *   successfully serialized. Return <code>false</code> in case the
     *   normal processing stopped but the implementation kept serializing
     *   the document; the result of the serialization being implementation
     *   dependent then.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node.
     *   DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public boolean writeToURI(Node nodeArg,
                              String uri)
                              throws LSException;

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to a <code>DOMString</code> that is returned to the caller.
     * The encoding used is the encoding of the <code>DOMString</code> type,
     * i.e. UTF-16. Note that no Byte Order Mark is generated in a
     * <code>DOMString</code> object.
     * @param nodeArg The node to serialize.
     * @return Returns the serialized data.
     * @exception DOMException
     *    DOMSTRING_SIZE_ERR: Raised if the resulting string is too long to
     *   fit in a <code>DOMString</code>.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public String writeToString(Node nodeArg)
                                throws DOMException, LSException;

}
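
The interface above only declares the serializer contract, so a short usage sketch may help show how the pieces fit together. It is not part of the W3C source. The class name `LSSerializerSketch` and its helper methods are hypothetical, and it assumes a JAXP parser whose DOM implementation exposes the "LS" 3.0 feature (the parser bundled with the JDK is expected to). It exercises `getDomConfig`, `setNewLine`, `writeToString`, and `write` with an `LSOutput` whose encoding is set explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

/** Hypothetical helper class; names here are illustrative, not part of the DOM API. */
public class LSSerializerSketch {

    /** Parses XML text into a DOM Document with the default JAXP parser. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    /** Looks up Load and Save support; assumes the implementation offers "LS" 3.0. */
    static DOMImplementationLS loadSave(Document doc) {
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Load and Save is not supported");
        }
        return (DOMImplementationLS) feature;
    }

    /** Serializes to a String with the XML declaration switched off. */
    static String toStringWithoutDeclaration(String xml) {
        Document doc = parse(xml);
        LSSerializer serializer = loadSave(doc).createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // "xml-declaration" = false is a required setting, so no capability check is needed.
        config.setParameter("xml-declaration", Boolean.FALSE);
        // "format-pretty-print" = true is optional, so ask before setting it.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        serializer.setNewLine("\n"); // one of the end-of-line sequences XML recognizes
        return serializer.writeToString(doc);
    }

    /** Serializes to bytes through an LSOutput that requests UTF-8. */
    static byte[] toUtf8Bytes(String xml) {
        Document doc = parse(xml);
        DOMImplementationLS ls = loadSave(doc);
        LSSerializer serializer = ls.createLSSerializer();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8"); // LSOutput.encoding is consulted first
        output.setByteStream(bytes);
        if (!serializer.write(doc, output)) {
            throw new IllegalStateException("serialization did not complete normally");
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<a><b/></a>";
        System.out.println(toStringWithoutDeclaration(xml));
        System.out.println(new String(toUtf8Bytes(xml), StandardCharsets.UTF_8));
    }
}
```

The optional parameter is guarded with `canSetParameter` because, per the parameter table above, only the `false` value of "format-pretty-print" is required; the exact whitespace produced when it is enabled is left to the implementation.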