Post by Patrick Lamaizière
Where can the definition be found, then?
I'm also having trouble finding code examples for decoding (apart
from something for emacs in Lisp).
Do you want to recode it from scratch?
Otherwise, the best option is to use iconv(3).
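For the iconv(3) route, a minimal sketch could look like the following. It assumes the local iconv implementation recognises the "UTF-7" charset name (GNU libc and GNU libiconv do, others may not), and the helper name utf7_to_utf8 is made up for this example.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode a NUL-terminated UTF-7 string into UTF-8.
 * Returns 0 on success, -1 on failure (conversion not supported,
 * malformed input, or output buffer too small). */
int utf7_to_utf8(const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() wants char **, the input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    int rc = 0;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;    /* keep one byte for the terminator */

    /* Argument order is (tocode, fromcode). */
    cd = iconv_open("UTF-8", "UTF-7");
    if (cd == (iconv_t)-1)
        return -1;            /* assumption failed: UTF-7 is not available */

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        rc = -1;
    /* Flush any pending conversion state. */
    else if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        rc = -1;

    iconv_close(cd);
    *outp = '\0';
    return rc;
}
```

With a working UTF-7 converter, calling utf7_to_utf8("Hi Mom -+Jjo--!", buf, sizeof buf) should leave the UTF-8 text "Hi Mom -☺-!" in buf (this is one of the examples given in RFC 2152).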
Answers to all of this are at http://czyborra.com/utf/ (Unicode
Transformation Formats). Since the site appears to be dead, I am
reproducing the part about UTF-7 here:
* UTF-7
All of the above UTFs produce 8bit bytes that are not in ASCII and
that will get stripped on any terminal that is still set to character
size 7 or any mail gateway that ensures RFC 822's rule that mail
messages have to be in ASCII. To solve that problem, David Goldsmith
and Mark Davis invented a mail-safe transformation format UTF-7. It
was first published in RFC 1642 in 1994, prominently included as
Appendix A.1 in The Unicode Standard, Version 2.0, and now updated in
RFC 2152. It makes partial use of the MIME base64 encoding and goes
roughly like this:
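#include <stdio.h>

static const char base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Write one code point as UTF-7. Renamed from putwchar, which
   collides with the standard function declared in <wchar.h>. */
void put_utf7(unsigned long c)
{
    if (c == '+') {
        /* a literal '+' is escaped as "+-" */
        putchar('+');
        putchar('-');
    }
    else if (c < 0x80) {
        /* ASCII passes through unchanged */
        putchar((int)c);
    }
    else if (c < 0x10000) {
        /* 16 bits -> three base64 digits, the last one zero-padded */
        putchar('+');
        putchar(base64[c >> 10 & 63]);
        putchar(base64[c >> 4 & 63]);
        putchar(base64[c << 2 & 63]);
        putchar('-');
    }
    else if (c < 0x110000) {
        /* UTF-16 surrogate pair: high half in the upper 16 bits,
           low half in the lower 16 bits */
        c = 0xD7C0DC00 + ((c >> 10 << 16) | (c & 0x3FF));
        /* 32 bits -> six base64 digits, the last one zero-padded */
        putchar('+');
        putchar(base64[c >> 26 & 63]);
        putchar(base64[c >> 20 & 63]);
        putchar(base64[c >> 14 & 63]);
        putchar(base64[c >> 8 & 63]);
        putchar(base64[c >> 2 & 63]);
        putchar(base64[c << 4 & 63]);
        putchar('-');
    }
}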
Except for the '+' escaping, ASCII text remains unchanged with
UTF-7. The trailing '-' is optional when the next character is
neither a base64 character nor '-' itself. And by
joining a whole stretch of non-ASCII characters into a larger base64
block you can encode an average of 3 Unicode characters in 8 bytes,
which is much better than the 9 bytes "=E5=A4=A9" for 1 CJK ideograph
in quoted-printable UTF-8.
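To make the "3 characters in 8 bytes" point concrete, here is a rough sketch of an encoder that joins a stretch of non-ASCII characters into one base64 block. The names utf7_encode, put_unit and close_block are invented for this example. Like the per-character sketch it passes every ASCII character through directly and always writes the closing '-', which is simpler than what RFC 2152 allows or requires.

```c
#include <stddef.h>

static const char base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Push one 16-bit unit into the bit buffer and emit every complete
 * 6-bit group; leftover bits (0, 2 or 4) stay in *bits. */
static char *put_unit(char *p, unsigned long unit,
                      unsigned long *bits, int *nbits)
{
    *bits = (*bits << 16) | (unit & 0xFFFF);
    *nbits += 16;
    while (*nbits >= 6) {
        *nbits -= 6;
        *p++ = base64[(*bits >> *nbits) & 63];
    }
    *bits &= (1UL << *nbits) - 1;   /* keep only the leftover bits */
    return p;
}

/* End a base64 block: zero-pad the last group, then write '-'. */
static char *close_block(char *p, unsigned long bits, int nbits)
{
    if (nbits > 0)
        *p++ = base64[(bits << (6 - nbits)) & 63];
    *p++ = '-';
    return p;
}

/* Encode n code points (each below 0x110000) as UTF-7 into dst.
 * dst must hold at least 8 * n + 1 bytes. Returns the length written. */
size_t utf7_encode(const unsigned long *src, size_t n, char *dst)
{
    char *p = dst;
    unsigned long bits = 0;
    int nbits = 0, in_b64 = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long c = src[i];
        if (c < 0x80) {
            if (in_b64) {               /* leave the base64 block */
                p = close_block(p, bits, nbits);
                bits = 0; nbits = 0; in_b64 = 0;
            }
            *p++ = (char)c;
            if (c == '+')               /* literal '+' becomes "+-" */
                *p++ = '-';
        } else {
            if (!in_b64) {              /* open one block for the whole run */
                *p++ = '+';
                in_b64 = 1;
            }
            if (c >= 0x10000) {         /* split into a surrogate pair */
                p = put_unit(p, 0xD7C0 + (c >> 10), &bits, &nbits);
                c = 0xDC00 + (c & 0x3FF);
            }
            p = put_unit(p, c, &bits, &nbits);
        }
    }
    if (in_b64)
        p = close_block(p, bits, nbits);
    *p = '\0';
    return (size_t)(p - dst);
}
```

Three copies of U+5929 (the ideograph from the "=E5=A4=A9" example) come out as "+WSlZKVkp-": 8 base64 bytes for the 3 characters, plus the '+' and '-' around the block.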
However, base64 or 8bit SCSU can achieve much better compression, and
UTF-7 is a bad general-purpose processing format: its flickering
base64 grouping is awkward to program, most ASCII values can stand for
almost any character and there are many different possible UTF-7
encodings of the same character so that UTF-7 is practically
unsearchable without conversion.
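Since the original question was about decoding, here is a rough decoder sketch that also shows the searchability problem. The name utf7_decode is invented for this example, and the code does no strict validation (unpaired surrogates and stray padding bits are simply passed through or dropped), so treat it as an illustration rather than a conforming RFC 2152 decoder.

```c
#include <stddef.h>
#include <string.h>

static const char base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Decode NUL-terminated UTF-7 text into code points.
 * Writes at most max entries to out and returns how many were written. */
size_t utf7_decode(const char *s, unsigned long *out, size_t max)
{
    size_t n = 0;
    unsigned long bits = 0, high = 0;
    int nbits = 0, in_b64 = 0;

    while (*s != '\0' && n < max) {
        unsigned char ch = (unsigned char)*s++;

        if (in_b64) {
            const char *pos = strchr(base64, ch);
            if (pos != NULL) {
                /* collect 6 more bits; emit a unit once 16 are available */
                bits = (bits << 6) | (unsigned long)(pos - base64);
                nbits += 6;
                if (nbits >= 16) {
                    unsigned long unit;
                    nbits -= 16;
                    unit = (bits >> nbits) & 0xFFFF;
                    bits &= (1UL << nbits) - 1;
                    if (unit >= 0xD800 && unit < 0xDC00) {
                        high = unit;            /* wait for the low half */
                    } else if (unit >= 0xDC00 && unit < 0xE000 && high != 0) {
                        out[n++] = 0x10000 + ((high - 0xD800) << 10)
                                           + (unit - 0xDC00);
                        high = 0;
                    } else {
                        out[n++] = unit;
                        high = 0;
                    }
                }
                continue;
            }
            /* any non-base64 character ends the block */
            in_b64 = 0; bits = 0; nbits = 0; high = 0;
            if (ch == '-')
                continue;                       /* terminator is absorbed */
            /* otherwise ch is handled as a direct character below */
        }

        if (ch == '+') {
            if (*s == '-') {                    /* "+-" is a literal '+' */
                out[n++] = '+';
                s++;
            } else {
                in_b64 = 1;
            }
        } else {
            out[n++] = ch;
        }
    }
    return n;
}
```

With this sketch, "a" and "+AGE-" both decode to the single code point U+0061, and "+AGEAYQ-" decodes to two of them: the same letter is spelled "AGE" at the start of a block and "AYQ" in second position, which is why a byte-level search on UTF-7 text is unreliable.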