[ < ] | [ > ] | [ << ] | [Plus haut] | [ >> ] | [Top] | [Table des matières] | [Index] | [ ? ] |
Emacs tries to recognize which coding system to use for a given text as an integral part of reading that text. (This applies to files being read, output from subprocesses, text from X selections, etc.) Emacs can select the right coding system automatically most of the time—once you have specified your preferences.
Some coding systems can be recognized or distinguished by which byte sequences appear in the data. However, there are coding systems that cannot be distinguished, not even potentially. For example, there is no way to distinguish between Latin-1 and Latin-2; they use the same byte values with different meanings.
Emacs handles this situation by means of a priority list of coding systems. Whenever Emacs reads a file, if you do not specify the coding system to use, Emacs checks the data against each coding system, starting with the first in priority and working down the list, until it finds a coding system that fits the data. Then it converts the file contents assuming that they are represented in this coding system.
The priority list of coding systems depends on the selected language environment (@pxref{Language Environments}). For example, if you use French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use Czech, you probably want Latin-2 to be preferred. This is one of the reasons to specify a language environment.
However, you can alter the coding system priority list in detail with the command M-x prefer-coding-system. This command reads the name of a coding system from the minibuffer, and adds it to the front of the priority list, so that it is preferred to all others. If you use this command several times, each use adds one element to the front of the priority list.
If you use a coding system that specifies the end-of-line conversion type,
such as iso-8859-1-dos
, what this means is that Emacs should attempt
to recognize iso-8859-1
with priority, and should use DOS end-of-line
conversion when it does recognize iso-8859-1
.
Sometimes a file name indicates which coding system to use for the file.
The variable file-coding-system-alist
specifies this correspondence.
There is a special function modify-coding-system-alist
for adding
elements to this list. For example, to read and write all ‘.txt’ files
using the coding system chinese-iso-8bit
, you can execute this Lisp
expression:
(modify-coding-system-alist 'file "\\.txt\\'" 'chinese-iso-8bit) |
The first argument should be file
, the second argument should be a
regular expression that determines which files this applies to, and the
third argument says which coding system to use for these files.
Emacs recognizes which kind of end-of-line conversion to use based on the
contents of the file: if it sees only carriage-returns, or only
carriage-return linefeed sequences, then it chooses the end-of-line
conversion accordingly. You can inhibit the automatic use of end-of-line
conversion by setting the variable inhibit-eol-conversion
to
non-nil
. If you do that, DOS-style files will be displayed with the
‘^M’ characters visible in the buffer; some people prefer this to the
more subtle ‘(DOS)’ end-of-line type indication near the left edge of
the mode line (voir la section eol-mnemonic).
By default, the automatic detection of coding system is sensitive to escape sequences. If Emacs sees a sequence of characters that begin with an escape character, and the sequence is valid as an ISO-2022 code, that tells Emacs to use one of the ISO-2022 encodings to decode the file.
However, there may be cases that you want to read escape sequences in a file
as is. In such a case, you can set the variable
inhibit-iso-escape-detection
to non-nil
. Then the code
detection ignores any escape sequences, and never uses an ISO-2022
encoding. The result is that all escape sequences become visible in the
buffer.
The default value of inhibit-iso-escape-detection
is nil
. We
recommend that you not change it permanently, only for one specific
operation. That's because many Emacs Lisp source files in the Emacs
distribution contain non-ASCII characters encoded in the coding
system iso-2022-7bit
, and they won't be decoded correctly when you
visit those files if you suppress the escape sequence detection.
The variables auto-coding-alist
, auto-coding-regexp-alist
and
auto-coding-functions
are the strongest way to specify the coding
system for certain patterns of file names, or for files containing certain
patterns; these variables even override ‘-*-coding:-*-’ tags in the
file itself. Emacs uses auto-coding-alist
for tar and archive files,
to prevent it from being confused by a ‘-*-coding:-*-’ tag in a member
of the archive and thinking it applies to the archive file as a whole.
Likewise, Emacs uses auto-coding-regexp-alist
to ensure that RMAIL
files, whose names in general don't match any particular pattern, are
decoded correctly. One of the builtin auto-coding-functions
detects
the encoding for XML files.
When you get new mail in Rmail, each message is translated automatically
from the coding system it is written in, as if it were a separate file.
This uses the priority list of coding systems that you have specified. If a
MIME message specifies a character set, Rmail obeys that specification,
unless rmail-decode-mime-charset
is nil
.
For reading and saving Rmail files themselves, Emacs uses the coding system
specified by the variable rmail-file-coding-system
. The default
value is nil
, which means that Rmail files are not translated (they
are read and written in the Emacs internal character code).
[ < ] | [ > ] | [ << ] | [Plus haut] | [ >> ] | [Top] | [Table des matières] | [Index] | [ ? ] |
Ce document a été généré par Eric Reinbold le 23 Février 2009 en utilisant texi2html 1.78.