Encoding

The set encoding command selects a character encoding.

Syntax:

     set encoding {<value>}
     set encoding locale
     show encoding

Valid values are

  default     - tells a terminal to use its default encoding
  iso_8859_1  - the most common Western European encoding prior to UTF-8.
                Known in the PostScript world as 'ISO-Latin1'.
  iso_8859_15 - a variant of iso_8859_1 that includes the Euro symbol
  iso_8859_2  - used in Central and Eastern Europe
  iso_8859_9  - used in Turkey (also known as Latin5)
  koi8r       - popular Unix Cyrillic encoding
  koi8u       - Ukrainian Unix Cyrillic encoding
  cp437       - codepage for MS-DOS
  cp850       - codepage for OS/2, Western Europe
  cp852       - codepage for OS/2, Central and Eastern Europe
  cp950       - MS version of Big5 (emf terminal only)
  cp1250      - codepage for MS Windows, Central and Eastern Europe
  cp1251      - codepage for 8-bit Russian, Serbian, Bulgarian, Macedonian
  cp1252      - codepage for MS Windows, Western Europe
  cp1254      - codepage for MS Windows, Turkish (superset of Latin5)
  sjis        - Shift-JIS Japanese encoding
  utf8        - variable-length (multibyte) representation of the Unicode
                code point for each character

The command set encoding locale differs from the other options: it attempts to determine the current locale from the runtime environment. On most systems this is controlled by the environment variables LC_ALL, LC_CTYPE, or LANG. This mechanism is necessary, for example, to pass multibyte character encodings such as UTF-8 or EUC_JP to the wxt and pdf terminals. This command does not affect the locale-specific representation of dates or numbers. See also set locale and set decimalsign.
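
As a minimal sketch, the locale-based selection can be checked interactively. What show encoding reports depends entirely on the LC_ALL, LC_CTYPE, or LANG settings of the session in which gnuplot was started, so no particular result is assumed here.

     # ask gnuplot to take the encoding from the runtime environment
     set encoding locale
     # report which encoding is now in effect
     show encoding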

Generally you must set the encoding before setting the terminal type, as it may affect the choice of appropriate fonts.
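
The ordering described above might look like the following sketch. The pdfcairo terminal, the output file name example.pdf, and the axis label are illustrative assumptions and are not taken from this section. The sketch also assumes that the script file itself is saved as UTF-8.

     # choose the encoding first, so the terminal can select suitable fonts
     set encoding utf8
     # only then select the terminal and the output file (hypothetical name)
     set terminal pdfcairo
     set output "example.pdf"
     # a label containing non-ASCII characters
     set xlabel "Größe"
     plot sin(x)
     # close the output file
     unset output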