Normalize the given Unicode string according to the specified method.
The methods are:
NFC, NFD, NFKC and NFKD.
The methods are described in detail in the UAX #15 document, which
can currently be found at
http://www.unicode.org/unicode/reports/tr15/tr15-21.html
A short description:
C and D specify whether to decompose (D) composite characters into
their parts, or to compose (C) sequences of characters into single
composite characters.
K specifies whether to perform only canonical conversion or
compatibility conversion as well. When K is present, compatibility
transformations are performed in addition to the canonical
transformations.
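As an illustration of how the four methods differ, here is a minimal
sketch using Python's standard unicodedata module rather than the
function documented here. The helper code_points and the sample
string are assumptions chosen for the example.

```python
import unicodedata

def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in s]

# U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE followed by
# U+FB01 LATIN SMALL LIGATURE FI.
sample = "\u00c5\ufb01"

# Apply each of the four normalization forms to the same input.
forms = {name: unicodedata.normalize(name, sample)
         for name in ("NFC", "NFD", "NFKC", "NFKD")}

for name, result in forms.items():
    print(name, code_points(result))
```

The D forms split the first character into 'A' plus a combining ring,
and only the K forms replace the ligature with the two letters 'f'
and 'i'.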
In the following text, 'X' denotes the single character 'X', even
if there is more than one character inside the quotation marks.
The reason is that it's somewhat hard to describe Unicode in
ISO-8859-1.
The Unicode Standard defines two equivalences between characters:
canonical equivalence and compatibility equivalence. Canonical
equivalence is a basic equivalency between characters or
sequences of characters.
'Å' and 'A' '° (combining ring above)' are canonically equivalent.
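The equivalence above can be checked with a small sketch, again
assuming Python's unicodedata module. The function name
canonically_equivalent is hypothetical and not part of any library.

```python
import unicodedata

def canonically_equivalent(a, b):
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00c5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
combining = "A\u030a"    # 'A' followed by COMBINING RING ABOVE

# A plain comparison looks at code points only, so the strings differ.
print(precomposed == combining)

# After normalization both spellings compare equal.
print(canonically_equivalent(precomposed, combining))
```

Comparing the NFC forms instead of the NFD forms would work equally
well; what matters is that both strings are normalized with the same
method before comparison.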
For round-trip compatibility with existing standards, Unicode has
encoded many entities that are really variants of existing nominal
characters. The visual representations of these characters are
typically a subset of the possible visual representations of the
nominal character. These are given compatibility decompositions in
the standard. Because the characters are visually distinguished,
replacing a character by a compatibility equivalent may lose
formatting information unless supplemented by markup or styling.
Examples of compatibility equivalences:
Font variants (thin, italic, extra wide characters, etc.)
Circled and squared characters
Superscripts and subscripts ('²' -> '2')
Fractions ('½' -> '1/2')
Other composed characters ('fi' -> 'f' 'i', 'kg' -> 'k' 'g')
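A few of these compatibility mappings can be observed with the
following sketch, which assumes Python's unicodedata module. The
table named examples is made up for the illustration, and the exact
mappings depend on the Unicode data version the library ships with.

```python
import unicodedata

# Each key is a single compatibility character; each value is the
# sequence it is expected to become under NFKD.
examples = {
    "\u00b2": "2",          # SUPERSCRIPT TWO
    "\u00bd": "1\u20442",   # VULGAR FRACTION ONE HALF (uses U+2044 FRACTION SLASH)
    "\u2460": "1",          # CIRCLED DIGIT ONE
    "\ufb01": "fi",         # LATIN SMALL LIGATURE FI
    "\u338f": "kg",         # SQUARE KG
}

for char, expected in examples.items():
    compat = unicodedata.normalize("NFKD", char)
    canon = unicodedata.normalize("NFD", char)
    # The canonical form leaves the character alone; only the
    # compatibility form replaces it.
    print(unicodedata.name(char), canon == char, compat == expected)
```

Note that the fraction decomposes to a FRACTION SLASH (U+2044) rather
than the ASCII '/' character, so a result such as '1/2' may need an
extra replacement step if plain ASCII output is required.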