2 How To Type a GAPDoc Document

In this chapter we give a more formal description of what you need to know to start typing documentation in GAPDoc XML format. Many details were already explained by example in Section 1.2 of the introduction.

We do not answer the question "How to write a GAPDoc document?" in this chapter. You can (hopefully) find an answer to this question by studying the example in the introduction, see 1.2, and learning about more details in the reference Chapter 3.

The definitive source for all details of the official XML standard, with useful annotations, is:

http://www.xml.com/axml/axml.html

Although this document must be quite technical, it is surprisingly readable.

2.1 General XML Syntax

We will now discuss the pieces of text which can occur in a general XML document. We start with those pieces which do not contribute to the actual content of the document.

2.1-1 Head of XML Document

Each XML document should have a head which states that it is an XML document in some encoding and which XML-defined language is used. In case of a GAPDoc document this should always look as in the following example.

Example
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE Book SYSTEM "gapdoc.dtd">

See 2.1-13 for a remark on the encoding statement.

(There may be local entity definitions inside the DOCTYPE statement, see Subsection 2.2-3 below.)

2.1-2 Comments

A comment in XML starts with the character sequence <!-- and ends with the sequence -->. Between these sequences there must not be two adjacent dashes --.

2.1-3 Processing Instructions

A processing instruction in XML starts with the character sequence <? and ends with the sequence ?>. The sequence ?> must not occur within the processing instruction.

And now we turn to those parts of the document which contribute to its actual content.

2.1-4 Names in XML and Whitespace

A name in XML (used for element and attribute identifiers, see below) must start with a letter (in the encoding of the document) or with a colon : or underscore _ character. The following characters may also be digits, dots . or dashes -.
This is a simplified description of the rules in the standard, which uses many Unicode ranges to specify what a letter is.

Sequences consisting only of the following characters are considered whitespace: blanks, tabs, carriage return characters and new line characters.

2.1-5 Elements

The actual content of an XML document consists of elements. An element has some content with a leading start tag (2.1-6) and a trailing end tag (2.1-7). The content can contain further elements, but they must be properly nested. One can define elements whose content is always empty; those elements can also be entered with a single combined tag (2.1-8).

2.1-6 Start Tags

A start tag consists of a less-than-character < directly followed (without whitespace) by an element name (see 2.1-4), optional attributes, optional whitespace, and a greater-than-character >.

An attribute consists of some whitespace and then its name followed by an equal sign =, which is optionally enclosed by whitespace, and the attribute value, which is enclosed either in single or double quotes. The attribute value may not contain the type of quote used as a delimiter or the character <; the character & may only appear to start an entity, see 2.1-9. We describe in 2.1-11 how to enter special characters in attribute values.

Note especially that no whitespace is allowed between the starting < character and the element name. The quotes around an attribute value cannot be omitted. The names of elements and attributes are case sensitive.

2.1-7 End Tags

An end tag consists of the two characters </ directly followed by the element name, optional whitespace and a greater-than-character >.

2.1-8 Combined Tags for Empty Elements

Elements which always have empty content can be written with a single tag. This looks like a start tag (see 2.1-6) except that the trailing greater-than-character > is replaced by the two character sequence />.

2.1-9 Entities

An entity in XML is a macro for some substitution text. There are two types of entities.
A character entity can be used to specify characters in the encoding of the document (this can be useful for entering non-ASCII characters which you cannot manage to type in directly). Character entities are entered with the sequence &#, directly followed by either some decimal digits or an x and some hexadecimal digits, directly followed by a semicolon ;. Using such a character entity is equivalent to typing the corresponding character directly.

Then there are references to named entities. They are entered with an ampersand character & directly followed by a name which is directly followed by a semicolon ;. Such entities must be declared somewhere by giving a substitution text. This text is included in the document and the document is parsed again afterwards. The exact rules are a bit subtle, but you probably want to use this only in simple cases. Predefined entities for GAPDoc are described in 2.1-10 and 2.2-3.

2.1-10 Special Characters in XML

We have seen that the less-than-character < and the ampersand character & start a tag or entity reference in XML. To get these characters into the document text one has to use entity references, namely &lt; to get < and &amp; to get &. Furthermore &gt; must be used to get > when the string ]]> appears in element content (and not as delimiter of a CDATA section explained below).

Another possibility is to use a CDATA statement explained in 2.1-12.

2.1-11 Rules for Attribute Values

Attribute values can contain entities which are substituted recursively. But except for the entity &lt; or a character entity, it is not allowed that a < character is introduced by the substitution (there is no XML parsing for evaluating the attribute value, just entity substitutions).

2.1-12 CDATA

Pieces of text which contain many characters which can be misinterpreted as markup can be enclosed by the character sequences <![CDATA[ and ]]>. Everything between these sequences is considered as content of the document and is not further interpreted as XML text.
All the rules explained so far in this section do not apply to such a part of the document. The only document content which cannot be entered directly inside a CDATA statement is the sequence ]]>. This can be entered as ]]&gt; outside the CDATA statement.

Example
    A nesting of tags like <![CDATA[ <a> <b> </a> </b> ]]> is not allowed.

2.1-13 Encoding of an XML Document

We suggest using the UTF-8 encoding for writing GAPDoc XML documents. But the tools described in Chapter 5 also work with ASCII or the various ISO-8859-X encodings (ISO-8859-1 is also called latin1 and covers most special characters for western European languages).

2.1-14 Well Formed and Valid XML Documents

We want to mention two further important terms which are often used in the context of XML documents. A piece of text becomes a well formed XML document if all the formal rules described in this section are fulfilled.

But this says nothing about the content of the document. To give this content a meaning one needs a declaration of the element and corresponding attribute names as well as of the named entities which are allowed. Furthermore there may be restrictions on how such elements can be nested. This definition of an XML based markup language is done in a document type definition. An XML document which contains only elements and entities declared in such a document type definition and obeys the rules given there is called valid (with respect to this document type definition).

The main file of the GAPDoc package is gapdoc.dtd. This contains such a definition of a markup language. We are not going to explain the formal syntax rules for document type definitions in this section. But in Chapter 3 we will explain enough about it to understand the file gapdoc.dtd and so the markup language defined there.

2.2 Entering GAPDoc Documents

Here are some additional rules for writing GAPDoc XML documents.
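
Before turning to these rules, the syntax pieces from Section 2.1 can be seen together in a small fragment. This is only a sketch: the label values sec:syntax and sec:other are made up for illustration, the fragment needs a surrounding document with the head from 2.1-1, and it has not been checked for validity against gapdoc.dtd.

```xml
<!-- A comment; it must not contain two adjacent dashes. -->
<Section Label="sec:syntax">
  <!-- The ampersand in the heading is entered as an entity. -->
  <Heading>Syntax &amp; Markup</Heading>
  Start and end tags must be <E>properly</E> nested.
  <!-- Combined tags for elements with empty content: -->
  <P/>
  See also <Ref Sect="sec:other"/>.
  <!-- A character entity with hexadecimal digits: -->
  The entity &#xE4; is the same as typing ä directly.
</Section>
```

Note that the attribute values are quoted, that no whitespace follows an opening < character, and that element and attribute names are written with the exact case used in their declaration.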
2.2-1 Other special characters

As GAPDoc documents are used to produce LaTeX and HTML documents, the question arises how to deal with characters with a special meaning for other applications. For example, &, #, $, %, ~, \, {, }, _, ^ and &nbsp; (a non-breakable space, ~ in LaTeX) have a special meaning for LaTeX, and &, <, > have a special meaning for HTML (and XML). In GAPDoc you can usually just type these characters directly; it is the task of the converter programs which translate to some output format to take care of such special characters. The exceptions to this simple rule are:

--  & and < must be entered as &amp; and &lt; as explained in 2.1-10.

--  The content of the GAPDoc elements <M>, <Math> and <Display> is LaTeX code, see 3.8.

--  The content of an <Alt> element with Only attribute contains code for the specified output type, see 3.9-1.

Remark: In former versions of GAPDoc one had to use particular entities for all the special characters mentioned above (&tamp;, &hash;, &dollar;, &percent;, &tilde;, &bslash;, &obrace;, &cbrace;, &uscore;, &circum;, &tlt;, &tgt;). These are no longer needed, but they are still defined for backwards compatibility with older GAPDoc documents.

2.2-2 Mathematical Formulae

Mathematical formulae in GAPDoc are typed as in LaTeX. They must be the content of one of three types of GAPDoc elements concerned with mathematical formulae: Math, Display, and M (see Sections 3.8-1 and 3.8-2 for more details). The first two correspond to LaTeX's math mode and display math mode. The last one is a special form of the Math element type that imposes certain restrictions on the content. On the other hand, the content of an M element is processed in a well defined way for text terminal or HTML output. The Display element also has an attribute such that its content is processed as in M elements.

Note that the content of these elements is LaTeX code, but the special characters < and & for XML must be entered via the entities described in 2.1-10 or by using a CDATA statement, see 2.1-12.
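
The last point can be illustrated with a short fragment. This is a sketch, not taken from the GAPDoc sources; the formulae are arbitrary and only show the two ways of entering a less-than sign inside LaTeX content.

```xml
<!-- Inline formula: the less-than sign is entered via an entity. -->
<M>a &lt; b</M>

<!-- Displayed formula: inside CDATA the character can be typed directly. -->
<Display><![CDATA[ 0 < a < b ]]></Display>
```

The CDATA form is probably more convenient for longer formulae with many < or & characters, while the entity form keeps short formulae compact.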
2.2-3 More Entities

In GAPDoc there are some more predefined entities:

┌─────────────┬─────────┐
│ &GAP;       │ GAP     │
├─────────────┼─────────┤
│ &GAPDoc;    │ GAPDoc  │
├─────────────┼─────────┤
│ &TeX;       │ TeX     │
├─────────────┼─────────┤
│ &LaTeX;     │ LaTeX   │
├─────────────┼─────────┤
│ &BibTeX;    │ BibTeX  │
├─────────────┼─────────┤
│ &MeatAxe;   │ MeatAxe │
├─────────────┼─────────┤
│ &XGAP;      │ XGAP    │
├─────────────┼─────────┤
│ &copyright; │ ©       │
├─────────────┼─────────┤
│ &nbsp;      │         │
├─────────────┼─────────┤
│ &ndash;     │ –       │
└─────────────┴─────────┘

Table: Predefined Entities in the GAPDoc system

Here &nbsp; is a non-breakable space character.

Additional entities are defined for some mathematical symbols, see 3.8 for more details.

One can define further local entities right inside the head (see 2.1-1) of a GAPDoc XML document as in the following example.

Example
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE Book SYSTEM "gapdoc.dtd"
      [ <!ENTITY MyEntity "some longish <E>text</E> possibly with markup">
      ]>

These additional definitions go into the