Next: HTML Cross-reference Mismatch, Previous: HTML Cross-reference Command Expansion, Up: HTML Cross-references [Contents][Index]
Usually, characters other than plain 7-bit ASCII are transformed into
the corresponding Unicode code point(s) in Normalization Form C,
which uses precomposed characters where available. (This is the
normalization form recommended by the W3C and other bodies.) This
holds when that code point is 0xffff or less, as it almost always is.
These will then be further transformed by the rules above into the string ‘_hhhh’, where hhhh is the code point in hex.
For example, combining this rule and the previous section:
@node @b{A} @TeX{} @u{B} @point{}@enddots{} ⇒ A-TeX-B_0306-_2605_002e_002e_002e
Notice: 1) @enddots expands to three periods, which in turn expand
to three ‘_002e’s; 2) @u{B} is a ‘B’ with a breve accent, which does
not exist as a pre-accented Unicode character, and therefore expands
to ‘B_0306’ (B with combining breve).
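The effect of Normalization Form C on accented letters can be checked with a short sketch. It assumes Python's standard unicodedata module; the helper name nfc_code_points is hypothetical and is not part of makeinfo.

```python
import unicodedata


def nfc_code_points(text):
    """Return the four-digit hex code points of text after NFC normalization.

    Hypothetical helper for illustration only; not part of makeinfo.
    """
    return ["%04x" % ord(ch) for ch in unicodedata.normalize("NFC", text)]


# 'e' followed by a combining acute accent has a precomposed form (U+00E9).
print(nfc_code_points("e\u0301"))  # ['00e9']

# 'B' followed by a combining breve has no precomposed form,
# so the combining character U+0306 survives normalization.
print(nfc_code_points("B\u0306"))  # ['0042', '0306']
```

The second case is why the example above yields ‘B_0306’ rather than a single escaped code point.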
When the Unicode code point is above 0xffff, the transformation
is ‘__xxxxxx’, that is, two leading underscores followed by
six hex digits. Since Unicode has declared that their highest code
point is 0x10ffff, this is sufficient. (We felt it was better
to define this extra escape than to always use six hex digits, since
the first two would nearly always be zeros.)
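The two escape forms can be sketched together as follows. This is a minimal illustration, not makeinfo's implementation: the function name escape_node_name is hypothetical, the input is assumed to be the node name after Texinfo commands have been expanded to text, and the rules of the previous section are assumed to be only "keep ASCII letters and digits, turn spaces into hyphens, escape everything else".

```python
import unicodedata


def escape_node_name(expanded):
    """Escape an already command-expanded node name (illustrative sketch).

    Assumptions: ASCII letters and digits are kept, a space becomes '-',
    and any other character is escaped by its code point.
    """
    out = []
    for ch in unicodedata.normalize("NFC", expanded):
        cp = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == " ":
            out.append("-")
        elif cp <= 0xFFFF:
            # One underscore, four lowercase hex digits.
            out.append("_%04x" % cp)
        else:
            # Two underscores, six lowercase hex digits.
            out.append("__%06x" % cp)
    return "".join(out)


# The expanded text of the example node: A, TeX, B + combining breve,
# then U+2605 followed by three periods.
print(escape_node_name("A TeX B\u0306 \u2605..."))
# A-TeX-B_0306-_2605_002e_002e_002e

# A code point above 0xffff uses the longer escape.
print(escape_node_name("\U0001F600"))  # __01f600
```

Because the long form starts with two underscores, it cannot be confused with a four-digit escape when the file name is read back.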
This method works fine if the node name consists mostly of ASCII
characters and contains only a few 8-bit ones. But if the document is
written in a language whose script is not based on the Latin alphabet
(for example, Ukrainian), it will create file names consisting almost
entirely of ‘_xxxx’ notations, which is inconvenient and
all but unreadable. To handle such cases, makeinfo offers
the --transliterate-file-names command line option. This
option enables transliteration of node names into ASCII
characters for the purposes of file name creation and referencing.
The transliteration is based on phonetic principles, which makes the
generated file names more easily understandable.
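To see why the default escaping is unreadable for such scripts, consider a one-word Ukrainian node name. The sketch below applies only the ‘_hhhh’ rule; the helper escape_non_ascii is hypothetical, and it does not attempt to reproduce the transliteration that makeinfo performs.

```python
def escape_non_ascii(name):
    """Escape every character that is not an ASCII letter or digit as _hhhh.

    Hypothetical helper; handles only code points up to 0xffff.
    """
    return "".join(
        ch if ch.isascii() and ch.isalnum() else "_%04x" % ord(ch)
        for ch in name
    )


# "Вступ" (Ukrainian for "Introduction"): every letter gets escaped.
print(escape_non_ascii("\u0412\u0441\u0442\u0443\u043f"))
# _0412_0441_0442_0443_043f
```

A five-letter word becomes a 25-character file name with no recognizable letters, which is the situation the --transliterate-file-names option is meant to avoid.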
For the definition of Unicode Normalization Form C, see Unicode report UAX#15, http://www.unicode.org/reports/tr15/. Many related documents and implementations are available elsewhere on the web.