@documentencoding enc: Set Input Encoding
The @documentencoding command declares the input document encoding, and can also affect the encoding of the output. Write it on a line by itself, with a valid encoding specification following, near the beginning of the file.
@documentencoding enc
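As a minimal sketch of "near the beginning of the file", the conventional Texinfo header below places the declaration alongside the other setup commands. The file name sample.info and the title are hypothetical placeholders, and UTF-8 is just one of the supported values listed below.

```texinfo
\input texinfo   @c -*-texinfo-*-
@c %**start of header
@c Hypothetical output file name and title, for illustration only.
@setfilename sample.info
@c Declare the input encoding on a line by itself, early in the file.
@documentencoding UTF-8
@settitle Sample Manual
@c %**end of header
```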
Texinfo supports these encodings:
US-ASCII
This has no particular effect, but it’s included for completeness.
UTF-8
The encoding of Unicode, the vast global character set, expressed in 8-bit bytes.
ISO-8859-1
ISO-8859-15
ISO-8859-2
These specify the standard encodings for Western European (the first two) and Eastern European languages (the third), respectively. ISO 8859-15 replaces some little-used characters from 8859-1 (e.g., precomposed fractions) with more commonly needed ones, such as the Euro symbol (€).
A full description of the encodings is beyond our scope here; one useful reference is http://czyborra.com/charsets/iso8859.html.
koi8-r
This is the commonly used encoding for the Russian language.
koi8-u
This is the commonly used encoding for the Ukrainian language.
Specifying an encoding enc has the following effects:
In Info output, a so-called ‘Local Variables’ section (see File Variables in The GNU Emacs Manual) is output including enc. This allows Info readers to set the encoding appropriately. It looks like this:
Local Variables:
coding: enc
End:
Also, in Info and plain text output, unless the option --disable-encoding is given to makeinfo, accent constructs and special characters, such as @'e, are output as the actual 8-bit or UTF-8 character in the given encoding where possible.
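For illustration, here is a small input fragment, assuming the document declares UTF-8. The sentence itself is a made-up example; the accent commands @'e and @"u and the special character @ss{} are standard Texinfo. In Info or plain text output, these would be expected to appear as the characters é, ü, and ß, giving "A café in München serves Straßenbahn riders." With --disable-encoding, makeinfo instead falls back to ASCII approximations.

```texinfo
@documentencoding UTF-8
@c Hypothetical sample sentence using accent and special-character commands.
A caf@'e in M@"unchen serves Stra@ss{}enbahn riders.
```

Since the input is declared as UTF-8, the same characters could also be typed directly in the source; the commands keep the source itself pure ASCII.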
In HTML output, a ‘<meta>’ tag is output, in the ‘<head>’ section of the HTML, that specifies enc. Web servers and browsers cooperate to use this information so the correct encoding is used to display the page, if supported by the system. That looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=enc">
In XML and DocBook output, UTF-8 is always used for the output, according to the conventions of those formats.
In TeX output, the characters which are supported in the standard Computer Modern fonts are output accordingly. For example, this means using constructed accents rather than precomposed glyphs. Using a missing character generates a warning message, as does specifying an unimplemented encoding.
Although modern TeX systems support nearly every script in use in the world, this wide-ranging support is not available in texinfo.tex, and it’s not feasible to duplicate or incorporate all that effort. (Our plan to support other scripts is to create a LaTeX back-end to texi2any, where the support is already present.)
For maximum portability of Texinfo documents across the many different user environments in the world, we recommend sticking to 7-bit ASCII in the input unless your particular manual needs a substantial amount of non-ASCII, e.g., it’s written in German. You can use the @U command to insert an occasional needed character (see Inserting Unicode: @U).
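A minimal sketch of this recommendation: the fragment below contains only 7-bit ASCII bytes, yet it still produces a few non-ASCII characters. Accent commands cover common Latin letters, and @U takes a hexadecimal Unicode code point (here U+2248, ALMOST EQUAL TO) for the occasional character with no dedicated command. The sentences are invented for illustration.

```texinfo
@c Every byte of this source is 7-bit ASCII.
Ren@'e Descartes wrote many of his works in Latin.
The sign @U{2248} means ``approximately equal''.
```

How @U characters are rendered still depends on the output format and, for TeX, on font support, so it is best kept for the occasional character rather than running text.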