Emacs can convert unibyte text to multibyte; it can also convert multibyte text to unibyte, though this conversion loses information. In general these conversions happen when inserting text into a buffer, or when putting text from several strings together in one string. You can also explicitly convert a string's contents to either representation.
Emacs chooses the representation for a string based on the text that it is constructed from. The general rule is to convert unibyte text to multibyte text when combining it with other multibyte text, because the multibyte representation is more general and can hold whatever characters the unibyte text has.
When inserting text into a buffer, Emacs converts the text to the buffer's representation, as specified by enable-multibyte-characters in that buffer. In particular, when you insert multibyte text into a unibyte buffer, Emacs converts the text to unibyte, even though this conversion cannot in general preserve all the characters that might be in the multibyte text. The other natural alternative, to convert the buffer contents to multibyte, is not acceptable because the buffer's representation is a choice made by the user that cannot be overridden automatically.
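The two rules above can be summarized in a small Python model. When strings are combined, any multibyte piece makes the result multibyte. When text is inserted, the buffer's representation always wins. The helper names concat_plan and insert_conversion are hypothetical illustrations, not Emacs functions.

```python
from __future__ import annotations
from typing import Optional

# Hypothetical model of the representation-choice rules; not Emacs code.


def concat_plan(parts_multibyte: list[bool]) -> tuple[bool, list[int]]:
    """Return (result_is_multibyte, indices of parts that must be converted).

    A string built from several pieces becomes multibyte as soon as any
    piece is multibyte; the unibyte pieces are then converted to multibyte.
    """
    result_multibyte = any(parts_multibyte)
    to_convert = [i for i, mb in enumerate(parts_multibyte)
                  if result_multibyte and not mb]
    return result_multibyte, to_convert


def insert_conversion(text_multibyte: bool, buffer_multibyte: bool) -> Optional[str]:
    """Return the conversion applied when inserting text into a buffer.

    The buffer's representation (enable-multibyte-characters) always wins,
    even when that means converting multibyte text down to unibyte.
    """
    if text_multibyte == buffer_multibyte:
        return None
    return "to-multibyte" if buffer_multibyte else "to-unibyte"


print(concat_plan([False, True, False]))  # (True, [0, 2])
print(insert_conversion(True, False))     # to-unibyte
```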
Converting unibyte text to multibyte text leaves ASCII characters unchanged, and likewise character codes 128 through 159. It converts the non-ASCII codes 160 through 255 by adding the value nonascii-insert-offset to each character code. By setting this variable, you specify which character set the unibyte characters correspond to (see Character Sets). For example, if nonascii-insert-offset is 2048, which is (- (make-char 'latin-iso8859-1) 128), then the unibyte non-ASCII characters correspond to Latin 1. If it is 2688, which is (- (make-char 'greek-iso8859-7) 128), then they correspond to Greek letters.
Converting multibyte text to unibyte is simpler: it discards all but the low 8 bits of each character code. If nonascii-insert-offset has a reasonable value, corresponding to the beginning of some character set, this conversion is the inverse of the other: converting unibyte text to multibyte and back to unibyte reproduces the original unibyte text.
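The arithmetic in the two paragraphs above can be checked with a small Python sketch. It uses the example offsets 2048 (Latin 1) and 2688 (Greek) from the text, and it applies the "low 8 bits" rule literally. This is a model of the description, not Emacs's implementation.

```python
# Sketch of the documented code arithmetic; not Emacs's implementation.
LATIN1_OFFSET = 2048  # (- (make-char 'latin-iso8859-1) 128), per the text
GREEK_OFFSET = 2688   # (- (make-char 'greek-iso8859-7) 128), per the text


def unibyte_to_multibyte(code: int, offset: int) -> int:
    """Unibyte -> multibyte: 0..159 unchanged, 160..255 shifted by offset."""
    if code < 160:
        return code
    return code + offset


def multibyte_to_unibyte(code: int) -> int:
    """Multibyte -> unibyte, taken literally: keep only the low 8 bits."""
    return code & 0xFF


print(unibyte_to_multibyte(65, LATIN1_OFFSET))   # 65 (ASCII unchanged)
print(unibyte_to_multibyte(233, LATIN1_OFFSET))  # 2281
print(unibyte_to_multibyte(233, GREEK_OFFSET))   # 2921

# With the Latin 1 offset, the round trip reproduces every unibyte code.
assert all(
    multibyte_to_unibyte(unibyte_to_multibyte(b, LATIN1_OFFSET)) == b
    for b in range(256)
)
```

With this literal low-8-bit rule, the round trip works here because 2048 is a multiple of 256. The sketch makes no claim about how Emacs treats other offsets.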
Variable: nonascii-insert-offset
This variable specifies the amount to add to a non-ASCII character when converting unibyte text to multibyte. It also applies when self-insert-command inserts a character in the unibyte non-ASCII range, 128 through 255. However, the functions insert and insert-char do not perform this conversion.
The right value to use to select character set cs is (- (make-char cs) 128). If the value of nonascii-insert-offset is zero, then conversion actually uses the value for the Latin 1 character set, rather than zero.
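The offset rule and its zero special case can be modeled in a few lines of Python. Here make_char_value is a hypothetical stand-in for the integer that (make-char cs) returns. The sample values 2176 and 2816 are not taken from Emacs. They are derived from the text's examples by adding 128 back to 2048 and 2688.

```python
# Hypothetical model of choosing nonascii-insert-offset; not Emacs code.
LATIN1_OFFSET = 2048  # (- (make-char 'latin-iso8859-1) 128), per the text


def offset_for_charset(make_char_value: int) -> int:
    """Mirror of (- (make-char cs) 128)."""
    return make_char_value - 128


def effective_offset(nonascii_insert_offset: int) -> int:
    """A value of zero behaves like the Latin 1 offset, not like zero."""
    if nonascii_insert_offset == 0:
        return LATIN1_OFFSET
    return nonascii_insert_offset


print(offset_for_charset(2176))  # 2048, the Latin 1 value
print(offset_for_charset(2816))  # 2688, the Greek value
print(effective_offset(0))       # 2048
```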
Variable: nonascii-translation-table
This variable provides a more general alternative to nonascii-insert-offset. You can use it to specify independently how to translate each code in the range of 128 through 255 into a multibyte character. The value should be a char-table, or nil. If this is non-nil, it overrides nonascii-insert-offset.
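The precedence described above can be sketched in Python, with a plain dict standing in for the char-table and None playing the role of nil. Two points are assumptions of this sketch rather than documented behavior: codes missing from the dict are left unchanged, and the table entry 233 → 2921 is purely illustrative.

```python
from __future__ import annotations
from typing import Optional


def unibyte_to_multibyte(code: int, offset: int,
                         table: Optional[dict[int, int]] = None) -> int:
    """Convert one unibyte code; a non-nil table overrides the offset.

    `table` is a hypothetical stand-in for nonascii-translation-table.
    """
    if code < 128:
        return code  # ASCII is never translated
    if table is not None:
        # The table covers 128..255 code by code. Leaving codes that are
        # missing from the dict unchanged is an assumption of this sketch.
        return table.get(code, code)
    if code >= 160:
        return code + offset  # the offset only shifts 160..255
    return code


latin1 = 2048
table = {233: 2921}  # illustrative entry only
print(unibyte_to_multibyte(233, latin1))         # 2281 (offset used)
print(unibyte_to_multibyte(233, latin1, table))  # 2921 (table wins)
```

Note that, unlike the offset, the table can also remap codes 128 through 159.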
The next three functions either return the argument string, or a newly created string with no text properties.
Function: string-make-unibyte string
This function converts the text of string to unibyte representation, if it isn't already, and returns the result. If string is a unibyte string, it is returned unchanged. Multibyte character codes are converted to unibyte according to nonascii-translation-table or, if that is nil, using nonascii-insert-offset. If the lookup in the translation table fails, this function takes just the low 8 bits of each character.
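A Python model of the steps just described follows. EString is a hypothetical stand-in for a Lisp string (character codes plus a multibyte flag), and text properties are not modeled. On the offset path, the sketch reads "using nonascii-insert-offset" as subtracting the offset. Its fallback to the low 8 bits when the result is out of range is an assumption, since the text specifies that fallback only for failed table lookups.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EString:
    """Hypothetical stand-in for a Lisp string: codes plus a flag."""
    codes: tuple[int, ...]
    multibyte: bool


def string_make_unibyte(s: EString, offset: int,
                        table: Optional[dict[int, int]] = None) -> EString:
    """Model of string-make-unibyte following the description above."""
    if not s.multibyte:
        return s  # already unibyte: returned unchanged
    # Invert the unibyte -> multibyte table so multibyte codes can be looked up.
    reverse = {mb: ub for ub, mb in table.items()} if table is not None else None
    out = []
    for c in s.codes:
        if c < 128:
            out.append(c)  # ASCII
        elif reverse is not None:
            # A failed lookup keeps just the low 8 bits, as the text says.
            out.append(reverse.get(c, c & 0xFF))
        elif 160 <= c - offset <= 255:
            out.append(c - offset)  # undo nonascii-insert-offset
        else:
            out.append(c & 0xFF)  # out-of-range fallback: an assumption
    return EString(tuple(out), False)


print(string_make_unibyte(EString((72, 2281), True), 2048))
# EString(codes=(72, 233), multibyte=False)
```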
Function: string-make-multibyte string
This function converts the text of string to multibyte representation, if it isn't already, and returns the result. If string is a multibyte string or consists entirely of ASCII characters, it is returned unchanged. In particular, if string is unibyte and entirely ASCII, the returned string is unibyte. (When the characters are all ASCII, Emacs primitives will treat the string the same way whether it is unibyte or multibyte.) If string is unibyte and contains non-ASCII characters, the function unibyte-char-to-multibyte is used to convert each unibyte character to a multibyte character.
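The branching described above, including the case where an all-ASCII unibyte string stays unibyte, can be modeled in Python. EString is a hypothetical stand-in for a Lisp string. The per-character helper models unibyte-char-to-multibyte using the offset only, and the translation table is left out for brevity.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class EString:
    """Hypothetical stand-in for a Lisp string: codes plus a flag."""
    codes: tuple[int, ...]
    multibyte: bool


def unibyte_char_to_multibyte(c: int, offset: int) -> int:
    """Per-character conversion; offset only, translation table omitted."""
    return c + offset if c >= 160 else c


def string_make_multibyte(s: EString, offset: int) -> EString:
    """Model of string-make-multibyte following the description above."""
    if s.multibyte or all(c < 128 for c in s.codes):
        return s  # unchanged: an all-ASCII unibyte string stays unibyte
    return EString(tuple(unibyte_char_to_multibyte(c, offset) for c in s.codes),
                   True)


print(string_make_multibyte(EString((72, 233), False), 2048))
# EString(codes=(72, 2281), multibyte=True)
```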
Function: string-to-multibyte string
This function returns a multibyte string containing the same sequence of character codes as string. Unlike string-make-multibyte, this function unconditionally returns a multibyte string. If string is a multibyte string, it is returned unchanged.
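In the same hypothetical Python model, string-to-multibyte only changes the representation flag. It keeps the character codes as they are and applies no offset, and it does so even for an all-ASCII unibyte string.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class EString:
    """Hypothetical stand-in for a Lisp string: codes plus a flag."""
    codes: tuple[int, ...]
    multibyte: bool


def string_to_multibyte(s: EString) -> EString:
    """Model of string-to-multibyte: same codes, always multibyte."""
    if s.multibyte:
        return s  # already multibyte: returned unchanged
    return EString(s.codes, True)  # no offset, no translation table


print(string_to_multibyte(EString((72, 233), False)))
# EString(codes=(72, 233), multibyte=True)
```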
Function: multibyte-char-to-unibyte char
This function converts the multibyte character char to a unibyte character, based on nonascii-translation-table and nonascii-insert-offset.
Function: unibyte-char-to-multibyte char
This function converts the unibyte character char to a multibyte character, based on nonascii-translation-table and nonascii-insert-offset.
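The two character-level functions above are meant to undo each other. The Python sketch below checks that property over all 256 unibyte codes, using both the Greek offset from the text and a translation table. It follows the same assumptions as the other models here. The "based on nonascii-insert-offset" step is read as adding or subtracting the offset. Codes missing from the table pass through unchanged, and a failed reverse lookup keeps the low 8 bits.

```python
from __future__ import annotations
from typing import Optional


def unibyte_char_to_multibyte(c: int, offset: int,
                              table: Optional[dict[int, int]] = None) -> int:
    """Model: ASCII unchanged; table overrides offset; offset shifts 160..255."""
    if c < 128:
        return c
    if table is not None:
        return table.get(c, c)  # missing entries unchanged (assumption)
    return c + offset if c >= 160 else c


def multibyte_char_to_unibyte(c: int, offset: int,
                              table: Optional[dict[int, int]] = None) -> int:
    """Model: reverse table lookup, else subtract offset, else low 8 bits."""
    if c < 128:
        return c
    if table is not None:
        for ub, mb in table.items():
            if mb == c:
                return ub
        return c & 0xFF  # failed lookup keeps the low 8 bits
    u = c - offset
    return u if 160 <= u <= 255 else c & 0xFF


GREEK = 2688
assert all(multibyte_char_to_unibyte(unibyte_char_to_multibyte(b, GREEK), GREEK) == b
           for b in range(256))
print(unibyte_char_to_multibyte(233, GREEK))   # 2921
print(multibyte_char_to_unibyte(2921, GREEK))  # 233
```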
This document was generated by Eric Reinbold on October 13, 2007 using texi2html 1.78.