Character code conversion involves conversion between the encoding used inside Emacs and some other encoding. Emacs supports many different encodings, in that it can convert to and from them. For example, it can convert text to or from encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some cases, Emacs supports several alternative encodings for the same characters; for example, there are three coding systems for the Cyrillic (Russian) alphabet: ISO, Alternativnyj, and KOI8.
Most coding systems specify a particular character code for conversion, but some of them leave the choice unspecified—to be chosen heuristically for each file, based on the data.
In general, a coding system doesn't guarantee round-trip identity: decoding a byte sequence using a coding system, then encoding the resulting text in the same coding system, can produce a different byte sequence. However, the following coding systems do guarantee that the byte sequence will be the same as what you originally decoded:
chinese-big5, chinese-iso-8bit, cyrillic-iso-8bit, emacs-mule, greek-iso-8bit, hebrew-iso-8bit, iso-latin-1, iso-latin-2, iso-latin-3, iso-latin-4, iso-latin-5, iso-latin-8, iso-latin-9, iso-safe, japanese-iso-8bit, japanese-shift-jis, korean-iso-8bit, raw-text
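The round-trip guarantee for the coding systems listed above can be checked directly. The following is a minimal sketch: `my-round-trip-p` is a hypothetical helper name, not part of Emacs, and it relies on the standard functions `decode-coding-string` and `encode-coding-string`, which this section does not itself describe.

```elisp
(defun my-round-trip-p (bytes coding-system)
  "Return t if decoding BYTES with CODING-SYSTEM and re-encoding gives BYTES.
BYTES should be a unibyte string.  Hypothetical helper for illustration."
  (let* ((text    (decode-coding-string bytes coding-system))
         (encoded (encode-coding-string text coding-system)))
    (equal bytes encoded)))

;; Bytes #x48 #xE9 are "H" and "é" in ISO Latin-1, one of the coding
;; systems listed above as round-trip safe.
(my-round-trip-p (unibyte-string #x48 #xE9) 'iso-latin-1)
```

For a coding system on the list, this check should succeed for any byte sequence; for other coding systems it may fail on some inputs.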
Encoding buffer text and then decoding the result can also fail to reproduce the original text. For instance, if you encode Latin-2 characters with utf-8 and decode the result using the same coding system, you'll get Unicode characters (of charset mule-unicode-0100-24ff). If you encode Unicode characters with iso-latin-2 and decode the result with the same coding system, you'll get Latin-2 characters.
End of line conversion handles three different conventions used on various systems for representing end of line in files. The Unix convention is to use the linefeed character (also called newline). The DOS convention is to use a carriage-return and a linefeed at the end of a line. The Mac convention is to use just carriage-return.
Base coding systems such as latin-1 leave the end-of-line conversion unspecified, to be chosen based on the data. Variant coding systems such as latin-1-unix, latin-1-dos and latin-1-mac specify the end-of-line conversion explicitly as well. Most base coding systems have three corresponding variants whose names are formed by adding ‘-unix’, ‘-dos’ and ‘-mac’.
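The difference between a base coding system and its end-of-line variants can be seen from Lisp. This is a sketch only: `my-eol-convention` is a hypothetical helper, and `coding-system-eol-type` is a standard Emacs function that is not described in the text above. It returns a vector of the three variants when the convention is unspecified, and otherwise an integer (0 for Unix, 1 for DOS, 2 for Mac).

```elisp
(defun my-eol-convention (coding-system)
  "Return `unix', `dos', `mac' or `unspecified' for CODING-SYSTEM.
Hypothetical helper for illustration."
  (let ((eol (coding-system-eol-type coding-system)))
    (if (vectorp eol)
        'unspecified                  ; base coding system: decided by the data
      (nth eol '(unix dos mac)))))    ; variant: 0, 1 or 2

(my-eol-convention 'latin-1)      ; base coding system
(my-eol-convention 'latin-1-dos)  ; explicit DOS variant

;; Decoding with the DOS variant turns each CR LF pair into a newline;
;; encoding with the Mac variant turns each newline into a lone CR.
(decode-coding-string "one\r\ntwo\r\n" 'latin-1-dos)
(encode-coding-string "one\ntwo\n" 'latin-1-mac)
```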
The coding system raw-text is special in that it prevents character code conversion, and causes the buffer visited with that coding system to be a unibyte buffer. It does not specify the end-of-line conversion, allowing that to be determined as usual by the data, and has the usual three variants which specify the end-of-line conversion. no-conversion is equivalent to raw-text-unix: it specifies no conversion of either character codes or end-of-line.
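As a rough illustration of how raw-text and no-conversion differ in their end-of-line handling, the sketch below queries each with `coding-system-eol-type`, a standard Emacs function that the text above does not describe. It is assumed here to return a vector of variants when the convention is left open and the integer 0 for the Unix convention.

```elisp
;; `raw-text' leaves end-of-line conversion open, so the result here is a
;; vector naming its three variants.
(coding-system-eol-type 'raw-text)

;; `no-conversion' and `raw-text-unix' both fix the Unix convention, which
;; `coding-system-eol-type' reports as the integer 0.
(list (coding-system-eol-type 'no-conversion)
      (coding-system-eol-type 'raw-text-unix))
```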
The coding system emacs-mule specifies that the data is represented in the internal Emacs encoding. This is like raw-text in that no code conversion happens, but different in that the result is multibyte data.
Function: coding-system-get coding-system property
This function returns the specified property of the coding system coding-system. Most coding system properties exist for internal purposes, but one that you might find useful is mime-charset. That property's value is the name used in MIME for the character coding which this coding system can read and write. Examples:
(coding-system-get 'iso-latin-1 'mime-charset)
     ⇒ iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     ⇒ iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     ⇒ koi8-r
The value of the mime-charset property is also defined as an alias for the coding system.
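One way to confirm that a MIME name is an alias is sketched below. `my-mime-alias-p` is a hypothetical helper; `coding-system-p` and `coding-system-aliases` are standard Emacs functions not covered in this section, and the alias list they report may differ between Emacs versions.

```elisp
(defun my-mime-alias-p (coding-system)
  "Return t if CODING-SYSTEM's `mime-charset' value is one of its aliases.
Hypothetical helper for illustration."
  (let ((mime (coding-system-get coding-system 'mime-charset)))
    (and mime
         (coding-system-p mime)                        ; usable as a coding system
         (memq mime (coding-system-aliases coding-system))
         t)))

;; The MIME name of iso-latin-1 is iso-8859-1 (see the examples above).
(my-mime-alias-p 'iso-latin-1)
```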
This document was generated by Eric Reinbold on October 13, 2007 using texi2html 1.78.