Penguin
Diff: HowToUnicodeHOWTO
EditPageHistoryDiffInfoLikePages

Differences between current version and predecessor to the previous major change of HowToUnicodeHOWTO.

Other diffs: Previous Revision, Previous Author, or view the Annotated Edit History

Newer page: version 5 Last edited on Monday, October 25, 2004 5:06:27 am by AristotlePagaltzis
Older page: version 4 Last edited on Saturday, September 4, 2004 11:03:42 am by GeorgeLinux Revert
@@ -1,4109 +1 @@
-The Unicode HOWTO  
-  
-  
-  
-----  
-  
-!!!The Unicode HOWTO  
-  
-!!Bruno Haible,  
-<haible@clisp.cons.org>v1., 23 January 2001  
-  
-  
-----  
-''This document describes how to change your Linux system so it uses UTF-8  
-as text encoding. -  
-This is work in progress. Any tips, patches, pointers, URLs are very welcome.''  
-----  
-  
-  
-  
-  
-!!1. Introduction  
-  
-  
-****1.1 Why Unicode?  
-  
-****1.2 Unicode encodings  
-  
-****1.3 Related resources  
-  
-  
-  
-  
-  
-!!2. Display setup  
-  
-  
-****2.1 Linux console  
-  
-****2.2 X11 Foreign fonts  
-  
-****2.3 X11 Unicode fonts  
-  
-****2.4 Unicode xterm  
-  
-****2.5 !TrueType fonts  
-  
-****2.6 Miscellaneous  
-  
-  
-  
-  
-  
-!!3. Locale setup  
-  
-  
-****3.1 Files & the kernel  
-  
-****3.2 Upgrading the C library  
-  
-****3.3 General data conversion  
-  
-****3.4 Locale environment variables  
-  
-****3.5 Creating the locale support files  
-  
-  
-  
-  
-  
-!!4. Specific applications  
-  
-  
-****4.1 Shells  
-  
-****4.2 Networking  
-  
-****4.3 Browsers  
-  
-****4.4 Editors  
-  
-****4.5 Mailers  
-  
-****4.6 Text processing  
-  
-****4.7 Databases  
-  
-****4.8 Other text-mode applications  
-  
-****4.9 Other X11 applications  
-  
-  
-  
-  
-  
-!!5. Printing  
-  
-  
-****5.1 Printing using !TrueType fonts  
-  
-****5.2 Printing using fixed-size fonts  
-  
-****5.3 The classical approach  
-  
-****5.4 No luck with...  
-  
-  
-  
-  
-  
-!!6. Making your programs Unicode aware  
-  
-  
-****6.1 C/C++  
-  
-****6.2 Java  
-  
-****6.3 Lisp  
-  
-****6.4 Ada95  
-  
-****6.5 Python  
-  
-****6.6 !JavaScript/ECMAscript  
-  
-****6.7 Tcl  
-  
-****6.8 Perl  
-  
-****6.9 Related reading  
-  
-  
-  
-  
-  
-!!7. Other sources of information  
-  
-  
-****7.1 Mailing lists  
-  
-----  
-  
-!!1. Introduction  
-  
-  
-  
-  
-  
-  
-  
-!!1.1 Why Unicode?  
-  
-  
-  
-  
-  
-  
-People in different countries use different characters to represent the  
-words of their native languages. Nowadays most applications, including  
-email systems and web browsers, are 8-bit clean, i.e. they can operate on  
-and display text correctly provided that it is represented in an 8-bit  
-character set, like ISO-8859-1.  
-  
-  
-There are far more than 256 characters in the world - think of cyrillic,  
-hebrew, arabic, chinese, japanese, korean and thai -, and new characters  
-are being invented now and then. The problems that come up for users are:  
-  
-  
-  
-  
-  
-****It is impossible to store text with characters from different character  
-sets in the same document. For example, I can cite russian papers in  
-a German or French publication if I use TeX, xdvi and !PostScript,  
-but I cannot do it in plain text.  
-****  
-  
-****As long as every document has its own character set, and recognition  
-of the character set is not automatic, manual user intervention is  
-inevitable. For example, in order to view the homepage of the  
-XTeamLinux distribution  
-http://www.xteamlinux.com.cn/  
-I had to tell Netscape that the web page is coded in GB2312.  
-****  
-  
-****New symbols like the Euro are being invented. ISO has issued a new  
-standard ISO-8859-15, which is mostly like ISO-8859-1 except that it  
-removes some rarely used characters (the old currency sign) and  
-replaced it with the Euro sign. If users adopt this standard, they  
-have documents in different character sets on their disk, and they  
-start having to think about it daily. But computers should make things  
-simpler, not more complicated.  
-****  
-  
-  
-  
-The solution of this problem is the adoption of a world-wide usable character  
-set. This character set is Unicode  
-http://www.unicode.org/.  
-For more info about Unicode, do `man 7 unicode' (manpage contained  
-in the man-pages-1.20 package).  
-  
-  
-  
-  
-!!1.2 Unicode encodings  
-  
-  
-  
-  
-  
-  
-This reduces the user's problem of dealing with character sets to a technical  
-problem: How to transport Unicode characters using the 8-bit bytes?  
-8-bit units are the smallest addressing units of most computers and also the  
-unit used by TCP/IP network connections. The use of 1 byte to represent  
-1 character is, however, an accident of history, caused by the fact that  
-computer development started in Europe and the U.S. where 96 characters were  
-found to be sufficient for a long time.  
-  
-  
-There are basically four ways to encode Unicode characters in bytes:  
-  
-  
-  
-  
-; __UTF-8__:  
-  
-128 characters are encoded using 1 byte (the ASCII characters).  
-1920 characters are encoded using 2 bytes (Roman, Greek, Cyrillic,  
-Coptic, Armenian, Hebrew, Arabic characters).  
-63488 characters are encoded using 3 bytes (Chinese and Japanese among  
-others).  
-The other 2147418112 characters (not assigned yet) can be encoded  
-using 4, 5 or 6 characters.  
-For more info about UTF-8, do `man 7 utf-8' (manpage contained  
-in the man-pages-1.20 package).  
-; __UCS-2__:  
-  
-Every character is represented as two bytes.  
-This encoding can only represent the first 65536 Unicode characters.  
-; __UTF-16__:  
-  
-This is an extension of UCS-2 which can represent 1112064 Unicode  
-characters. The first 65536 Unicode characters are represented as two  
-bytes, the other ones as four bytes.  
-; __UCS-4__:  
-  
-Every character is represented as four bytes.  
-  
-  
-  
-The space requirements for encoding a text, compared to encodings currently  
-in use (8 bit per character for European languages, more for  
-Chinese/Japanese/Korean), is as follows. This has an influence on disk  
-storage space and network download speed (when no form of compression is  
-used).  
-  
-  
-  
-  
-; __UTF-8__:  
-  
-No change for US ASCII, just a few percent more for ISO-8859-1,  
-50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic.  
-; __UCS-2 and UTF-16__:  
-  
-No change for Chinese/Japanese/Korean. 100% more for  
-US ASCII and ISO-8859-1, Greek and Cyrillic.  
-; __UCS-4__:  
-  
-100% more for Chinese/Japanese/Korean. 300% more for US ASCII and  
-ISO-8859-1, Greek and Cyrillic.  
-  
-  
-  
-Given the penalty for US and European documents caused by UCS-2, UTF-16, and  
-UCS-4, it seems unlikely that these encodings have a potential for wide-scale  
-use. The Microsoft Win32 API supports the UCS-2 encoding since 1995 (at  
-least), yet this encoding has not been widely adopted for documents - SJIS  
-remains prevalent in Japan.  
-  
-  
-UTF-8 on the other hand has the potential for wide-scale use, since it  
-doesn't penalize US and European users, and since many text processing  
-programs don't need to be changed for UTF-8 support.  
-  
-  
-In the following, we will describe how to change your Linux system so  
-it uses UTF-8 as text encoding.  
-  
-  
-  
-  
-!Footnotes for C/C++ developers  
-  
-  
-  
-  
-  
-The Microsoft Win32 approach makes it easy for developers to produce  
-Unicode versions of their programs: You "#define UNICODE" at the top  
-of your program and then change many occurrences of `char' to  
-`TCHAR', until your program compiles without warnings. The problem  
-with it is that you end up with two versions of your program: one which  
-understands UCS-2 text but no 8-bit encodings, and one which understands  
-only old 8-bit encodings.  
-  
-  
-Moreover, there is an endianness issue with UCS-2 and UCS-4. The IANA  
-character set registry http://www.iana.org/assignments/character-sets  
-says about ISO-10646-UCS-2: "this needs to specify network byte order: the  
-standard does not specify". Network byte order is big endian. And RFC 2152  
-is even clearer: "ISO/IEC 10646-1:1993(E) specifies that when characters the  
-UCS-2 form are serialized as octets, that the most significant octet appear  
-first."  
-Whereas Microsoft, in its C/C++ development tools, recommends  
-to use machine-dependent endianness (i.e. little endian on ix86 processors)  
-and either a byte-order mark at the beginning of the document, or some  
-statistical heuristics(!).  
-  
-  
-The UTF-8 approach on the other hand keeps `char*' as the standard C  
-string type. As a result, your program will handle US ASCII text,  
-independently of any environment variables, and will handle both  
-ISO-8859-1 and UTF-8 encoded text provided the LANG environment variable  
-is set accordingly.  
-  
-  
-  
-  
-!!1.3 Related resources  
-  
-  
-  
-  
-  
-  
-Markus Kuhn's very up-to-date resource list:  
-  
-  
-* http://www.cl.cam.ac.uk/~mgk25/unicode.html  
-* http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html  
-  
-  
-  
-Roman Czyborra's overview of Unicode, UTF-8 and UTF-8 aware programs:  
-* http://czyborra.com/utf/#UTF-8  
-  
-Some example UTF-8 files:  
-  
-  
-****In Markus Kuhn's ucs-fonts package:  
-quickbrown.txt,  
-UTF-8-test.txt,  
-UTF-8-demo.txt.  
-****  
-  
-* http://www.columbia.edu/kermit/utf8.html  
-* ftp://ftp.cs.su.oz.au/gary/x-utf8.html  
-* ftp://ftp.nid.ru/pub/os/unix/misc/trans111.tar.gz (The file iso10646 in the Kosta Kostis' trans-1.1.1 package )  
-* ftp://ftp.dante.de/pub/tex/info/lwc/apc/utf8.html  
-* http://www.cogsci.ed.ac.uk/~richard/unicode-sample.html  
-  
-  
-  
-  
-  
-  
-  
-----  
-  
-!!2. Display setup  
-  
-  
-  
-  
-  
-We assume you have already adapted your Linux console and X11 configuration  
-to your keyboard and locale. This is explained in the Danish/International  
-HOWTO, and in the other national HOWTOs: Finnish, French, German, Italian,  
-Polish, Slovenian, Spanish, Cyrillic, Hebrew, Chinese, Thai, Esperanto. But  
-please do not follow the advice given in the Thai HOWTO, to pretend you  
-were using ISO-8859-1 characters (U0000..U00FF) when what you are typing  
-are actually Thai characters (U0E01..U0E5B). Doing so will only cause  
-problems when you switch to Unicode.  
-  
-  
-  
-  
-!!2.1 Linux console  
-  
-  
-  
-  
-  
-  
-I'm not talking much about the Linux console here, because on those machines  
-on which I don't have xdm running, I use it only to type my login name,  
-my password, and "xinit".  
-  
-  
-Anyway, the kbd-.99 package  
-ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-.99.tar.gz  
-and a heavily extended version, the console-tools-.2.3 package  
-ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-tools-.2.3.tar.gz  
-contains in the kbd-.99/src/ (or console-tools-.2.3/screenfonttools/)  
-directory two programs: `unicode_start' and `unicode_stop'. When you call  
-`unicode_start', the console's screen output is interpreted as UTF-8. Also,  
-the keyboard is put into Unicode mode (see "man kbd_mode"). In this mode,  
-Unicode characters typed as Alt-x1 ... Alt-xn (where x1,...,xn are digits on  
-the numeric keypad) will be emitted in UTF-8. If your keyboard or, more  
-precisely, your normal keymap has non-ASCII letter keys (like the German  
-Umlaute) which you would like to be !CapsLockable, you need to apply the kernel  
-patch  
-linux-2.2.9-keyboard.diff  
-or  
-linux-2.3.12-keyboard.diff.  
-  
-  
-You will want to use display characters from different scripts on the same  
-screen. For this, you need a Unicode console font. The  
-ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-.99.tar.gz  
-and  
-ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz  
-packages contain a font (!LatArCyrHeb-{08,14,16,19}.psf) which  
-covers Latin, Cyrillic, Hebrew, Arabic scripts. It covers ISO 8859 parts  
-1,2,3,4,5,6,8,9,10 all at once. To install it, copy it to  
-/usr/lib/kbd/consolefonts/ and execute  
-"/usr/bin/setfont /usr/lib/kbd/consolefonts/!LatArCyrHeb-14.psf".  
-  
-  
-A more flexible approach is given by Dmitry Yu. Bolkhovityanov  
-<D.Yu.Bolkhovityanov@inp.nsk.su>  
-in  
-http://www.inp.nsk.su/~bolkhov/files/fonts/univga/index.html  
-and  
-http://www.inp.nsk.su/~bolkhov/files/fonts/univga/uni-vga.tgz.  
-To work around the constraint that a VGA font can only cover 512 characters simultaneously,  
-he provides a rich Unicode font (2279 characters, covering Latin, Greek, Cyrillic, Hebrew,  
-Armenian, IPA, math symbols, arrows, and more) in the typical 8x16 size and a script  
-which permits to extract any 512 characters as a console font.  
-  
-  
-If you want cut&paste to work with UTF-8 consoles, you need the patch  
-linux-2.3.12-console.diff  
-from Edmund Thomas Grimley Evans and Stanislav Voronyi.  
-  
-  
-In April 2000, Edmund Thomas Grimley Evans  
-<edmundo@rano.org>  
-has implemented an UTF-8 console terminal emulator. It uses Unicode fonts  
-and relies on the Linux frame buffer device.  
-  
-  
-  
-  
-!!2.2 X11 Foreign fonts  
-  
-  
-  
-  
-  
-  
-Don't hesitate to install Cyrillic, Chinese, Japanese etc. fonts. Even  
-if they are not Unicode fonts, they will help in displaying Unicode  
-documents: at least Netscape Communicator 4 and Java will make use of  
-foreign fonts when available.  
-  
-  
-The following programs are useful when installing fonts:  
-  
-  
-****"mkfontdir directory"  
-prepares a font directory for use by the X server, needs to be executed  
-after installing fonts in a directory.  
-****  
-  
-****"xset -q | sed -e '1,/^Font Path:/d' | sed -e '2,$d' -e 's/^ //'"  
-displays the X server's current font path.  
-****  
-  
-****"xset fp+ directory"  
-adds a directory to the X server's current font path.  
-To add a directory permanently, add a "!FontPath" line to your  
-/etc/XF86Config file, in section "Files".  
-****  
-  
-****"xset fp rehash"  
-needs to be executed after calling mkfontdir on a directory that is  
-already contained in the X server's current font path.  
-****  
-  
-****"xfontsel"  
-allows you to browse the installed fonts by selecting various font  
-properties.  
-****  
-  
-****"xlsfonts -fn fontpattern"  
-lists all fonts matching a font pattern. Also displays various font  
-properties. In particular, "xlsfonts -ll -fn font" lists the font  
-properties CHARSET_REGISTRY and CHARSET_ENCODING, which together  
-determine the font's encoding.  
-****  
-  
-****"xfd -fn font"  
-displays a font page by page.  
-****  
-  
-  
-  
-The following fonts are freely available (not a complete list):  
-  
-  
-****The ones contained in XFree86, sometimes packaged in separate packages.  
-For example, SuSE has only normal 75dpi fonts in the base `xf86' package.  
-The other fonts are in the packages `xfnt100', `xfntbig', `xfntcyr',  
-`xfntscl'.  
-****  
-  
-****The Emacs international fonts,  
-ftp://ftp.gnu.org/pub/gnu/intlfonts/intlfonts-1.2.tar.gz  
-As already mentioned, they are useful even if you prefer XEmacs to  
-GNU Emacs or don't use any Emacs at all.  
-****  
-  
-  
-  
-  
-  
-!!2.3 X11 Unicode fonts  
-  
-  
-  
-  
-  
-  
-Applications wishing to display text belonging to different scripts (like  
-Cyrillic and Greek) at the same time, can do so by using different X fonts  
-for the various pieces of text. This is what Netscape Communicator and Java  
-do. However, this approach is more complicated, because instead of working  
-with `Font' and `XFontStruct', the programmer has to deal with `XFontSet',  
-and also because not all fonts in the font set need to have the same  
-dimensions.  
-  
-  
-  
-  
-  
-****Markus Kuhn has assembled fixed-width 75dpi fonts with Unicode encoding  
-covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew scripts and  
-many symbols.  
-They cover ISO 8859 parts 1,2,3,4,5,7,8,9,10,13,14,15,16 all at once.  
-These fonts are required for running xterm in utf-8 mode. They are now  
-contained in XFree86 4..1, therefore you need to install them manually  
-only if you have an older XFree86 3.x version.  
-http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz.  
-****  
-  
-****Markus Kuhn has also assembled double-width fixed 75dpi fonts with Unicode  
-encoding covering Chinese, Japanese and Korean. These fonts are contained  
-in XFree86 4..1 as well.  
-http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz  
-****  
-  
-****Roman Czyborra has assembled an 8x16 / 16x16 75dpi font with Unicode encoding  
-covering a huge part of Unicode. Download unifont.hex.gz and hex2bdf from  
-http://czyborra.com/unifont/.  
-It is not fixed-width: 8 pixels wide for European characters, 16 pixels wide  
-for Chinese characters. Installation instructions:  
-  
-  
-$ gunzip unifont.hex.gz  
-$ hex2bdf < unifont.hex > unifont.bdf  
-$ bdftopcf -o unifont.pcf unifont.bdf  
-$ gzip -9 unifont.pcf  
-# cp unifont.pcf.gz /usr/X11R6/lib/X11/fonts/misc  
-# cd /usr/X11R6/lib/X11/fonts/misc  
-# mkfontdir  
-# xset fp rehash  
-  
-  
-  
-****  
-  
-****Primoz Peterlin has assembled an ETL family fonts covering Latin, Greek,  
-Cyrillic, Armenian, Georgian, Hebrew scripts.  
-ftp://ftp.x.org/contrib/fonts/etl-unicode.tar.gz  
-Use the "bdftopcf" program in order to install it.  
-****  
-  
-****Mark Leisher has assembled a proportional, 17 pixel high (12 point), font,  
-called ClearlyU, covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew,  
-Thai, Laotian scripts.  
-http://crl.nmsu.edu/~mleisher/cu.html.  
-Installation instructions:  
-  
-  
-$ bdftopcf -o cu12.pcf cu12.bdf  
-$ gzip -9 cu12.pcf  
-# cp cu12.pcf.gz /usr/X11R6/lib/X11/fonts/misc  
-# cd /usr/X11R6/lib/X11/fonts/misc  
-# mkfontdir  
-# xset fp rehash  
-  
-  
-  
-****  
-  
-  
-  
-  
-  
-  
-  
-  
-!!2.4 Unicode xterm  
-  
-  
-  
-  
-  
-  
-xterm is part of X11R6 and XFree86, but is maintained separately by Tom  
-Dickey.  
-http://www.clark.net/pub/dickey/xterm/xterm.html  
-Newer versions (patch level 146 and above) contain support for converting  
-keystrokes to UTF-8 before sending them to the application running in the  
-xterm, and for displaying Unicode characters that the application outputs  
-as UTF-8 byte sequence. It also contains support for double-wide characters  
-(mostly CJK ideographs) and combining characters, contributed by Robert Brady  
-<robert@suse.co.uk>.  
-  
-  
-To get an UTF-8 xterm running, you need to:  
-  
-  
-****Fetch  
-http://www.clark.net/pub/dickey/xterm/xterm.tar.gz,  
-****  
-  
-****Configure it by calling "./configure --enable-wide-chars ...", then  
-compile and install it.  
-****  
-  
-****Have a Unicode fixed-width font installed. Markus Kuhn's ucs-fonts.tar.gz  
-(see above) is made for this.  
-****  
-  
-****Start "xterm -u8 -fn '-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1'".  
-The option "-u8" turns on Unicode and UTF-8 handling. The font designated  
-by the long "-fn" option is Markus Kuhn's Unicode font. Without this option,  
-the default font called "fixed" would be used, an ISO-8859-1 6x13 font.  
-****  
-  
-****Take a look at the sample files contained in Markus Kuhn's ucs-fonts  
-package:  
-  
-  
-$ cd .../ucs-fonts  
-$ cat quickbrown.txt  
-$ cat utf-8-demo.txt  
-  
-  
-You should be seeing (among others) greek and russian characters.  
-****  
-  
-****To make xterm come up with UTF-8 handling each time it is started,  
-add the lines  
-  
-  
-xterm*utf8: 1  
-xterm*VT100*font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1  
-xterm*VT100*wideFont: -misc-fixed-medium-r-normal-ja-13-125-75-75-c-120-iso10646-1  
-xterm*VT100*boldFont: -misc-fixed-bold-r-semicondensed--13-120-75-75-c-60-iso10646-1  
-  
-  
-to your $HOME/.Xdefaults (for yourself only).  
-For CJK text processing with double-width characters, the following  
-settings are probably better:  
-  
-  
-xterm*VT100*font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1  
-xterm*VT100*wideFont: -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1  
-  
-  
-I don't recommend changing  
-the system-wide /usr/X11R6/lib/X11/app-defaults/XTerm, because then your  
-changes will be erased next time you upgrade to a new XFree86 version.  
-****  
-  
-  
-  
-  
-  
-  
-  
-  
-!!2.5 !TrueType fonts  
-  
-  
-  
-  
-  
-  
-The fonts mentioned above are fixed size and not scalable. For some  
-applications, especially printing, high resolution fonts are necessary,  
-though. The most important type of scalable, high resolution fonts are  
-!TrueType fonts.  
-They are currently supported by  
-  
-  
-****XFree86 4..1; you need to add the line  
-  
-  
-Load "freetype"  
-  
-  
-or  
-  
-  
-Load "xtt"  
-  
-  
-to the "Module" section of your XF86Config file.  
-****  
-  
-****The display engines of other operating systems.  
-****  
-  
-****The yudit editor, see below, and its printing engine.  
-****  
-  
-  
-  
-Some no-cost !TrueType fonts with large Unicode coverage are  
-  
-; __Bitstream Cyberbit__:  
-  
-Covers Roman, Cyrillic, Greek, Hebrew, Arabic, combining diacritical marks,  
-Chinese, Korean, Japanese, and more.  
-  
-  
-Downloadable from  
-ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/Cyberbit.ZIP.  
-It is free for non-commercial purposes.  
-  
-  
-  
-; __Microsoft Arial__:  
-  
-Covers Roman, Cyrillic, Greek, Hebrew, Arabic, some combining diacritical  
-marks, Vietnamese.  
-  
-  
-Downloadable; look on a search engine for ftp-able files called  
-arial.ttf, ariali.ttf, arialbd.ttf,  
-arialbi.ttf.  
-  
-  
-  
-; __Lucida Sans Unicode__:  
-  
-Covers Roman, Cyrillic, Greek, Hebrew, combining diacritical marks.  
-  
-  
-Download: contained in IBM's JDK 1.3.0 for Linux, at  
-http://www.ibm.com/java/jdk/linux130/,  
-or directly downloadable as !LucidaSansRegular.ttf and  
-!LucidaSansOblique.ttf from  
-ftp://ftp.maths.tcd.ie/Linux/opt/IBMJava2-13/jre/lib/fonts/.  
-  
-  
-  
-; __Arphic__:  
-  
-Cover Chinese (both traditional and simplified).  
-  
-  
-Download: at  
-ftp://ftp.gnu.org/non-gnu/chinese-fonts-truetype/.  
-These fonts are truly free.  
-  
-  
-  
-  
-  
-  
-Download locations for these and other !TrueType fonts can be found at  
-Christoph Singer's list of freely downloadable Unicode !TrueType fonts  
-http://www.ccss.de/slovo/unifonts.htm.  
-  
-  
-Truetype fonts are installed similarly to fixed size fonts, except that  
-they go in a separate directory, and that ttmkfdir must be  
-called before mkfontdir:  
-  
-  
-# mkdir -p /usr/X11R6/lib/X11/fonts/truetype  
-# cp /somewhere/Cyberbit.ttf ... /usr/X11R6/lib/X11/fonts/truetype  
-# cd /usr/X11R6/lib/X11/fonts/truetype  
-# ttmkfdir > fonts.scale  
-# mkfontdir  
-# xset fp rehash  
-  
-  
-  
-  
-!TrueType fonts can be converted to low resolution, non-scalable X11 fonts by  
-use of Mark Leisher's ttf2bdf utility  
-ftp://crl.nmsu.edu/CLR/multiling/General/ttf2bdf-2.8-LINUX.tar.gz.  
-For example, to generate a proportional Unicode font for use with cooledit:  
-  
-  
-# cd /usr/X11R6/lib/X11/fonts/local  
-# ttf2bdf ../truetrype/Cyberbit.ttf > cyberbit.bdf  
-# bdftopcf -o cyberbit.pcf cyberbit.bdf  
-# gzip -9 cyberbit.pcf  
-# mkfontdir  
-# xset fp rehash  
-  
-  
-  
-  
-More information about !TrueType fonts can be found in the Linux !TrueType HOWTO  
-http://www.moisty.org/~brion/linux/!TrueType-HOWTO.html.  
-  
-  
-  
-  
-!!2.6 Miscellaneous  
-  
-  
-  
-  
-  
-  
-A small program which tests whether a Linux console or xterm is in UTF-8 mode  
-can be found in the  
-ftp://sunsite.unc.edu/pub/Linux/system/keyboards/x-lt-1.24.tar.gz  
-package by Ricardas Cepas, files testUTF-8.c and testUTF8.c. Most applications  
-should not use this, however: they should look at the environment variables,  
-see section "Locale environment variables".  
-  
-  
-  
-  
-  
-  
-----  
-  
-!!3. Locale setup  
-  
-  
-  
-  
-  
-  
-  
-!!3.1 Files & the kernel  
-  
-  
-  
-  
-  
-  
-You can now already use any Unicode characters in file names. No kernel  
-or file utilities need modifications. This is because file names in the  
-kernel can be anything not containing a null byte, and '/' is used to  
-delimit subdirectories. When encoded using UTF-8, non-ASCII characters  
-will never be encoded using null bytes or slashes. All that happens is  
-that file and directory names occupy more bytes than they contain characters.  
-For example, a filename consisting of five greek characters will appear  
-to the kernel as a 10-byte filename. The kernel does not know (and does  
-not need to know) that these bytes are displayed as greek.  
-  
-  
-This is the general theory, as long as your files stay inside Linux. On  
-filesystems which are used from other operating systems, you have mount  
-options to control conversion of filenames to/from UTF-8:  
-  
-  
-****The "vfat" filesystems has a mount option "utf8".  
-See  
-file:/usr/src/linux/Documentation/filesystems/vfat.txt.  
-When you give an "iocharset" mount option different from the default  
-(which is "iso8859-1"), the results with and without "utf8" are not  
-consistent. Therefore I don't recommend the "iocharset" mount option.  
-****  
-  
-****The "msdos", "umsdos" filesystems have the same mount option, but it  
-appears to have no effect.  
-****  
-  
-****The "iso9660" filesystem has a mount option "utf8".  
-See  
-file:/usr/src/linux/Documentation/filesystems/isofs.txt.  
-****  
-  
-****Since Linux 2.2.x kernels, the "ntfs" filesystem has a mount option  
-"utf8". See  
-file:/usr/src/linux/Documentation/filesystems/ntfs.txt.  
-****  
-  
-The other filesystems (nfs, smbfs, ncpfs, hpfs, etc.) don't convert  
-filenames; therefore they support Unicode file names in UTF-8 encoding only  
-if the other operating system supports them.  
-Recall that to enable a mount option for all future remounts, you add it to  
-the fourth column of the corresponding /etc/fstab line.  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-!!3.2 Upgrading the C library  
-  
-  
-  
-  
-  
-  
-glibc-2.2 supports multibyte locales, in particular UTF-8 locales. But  
-glibc-2.1.x and earlier C libraries do not support it. Therefore you need  
-to upgrade to glibc-2.2. Upgrading from glibc-2.1.x is riskless, because  
-glibc-2.2 is binary compatible with glibc-2.1.x (at least on i386 platforms,  
-and except for IPv6). Nevertheless, I recommend to have a bootable rescue  
-disk handy in case something goes wrong.  
-  
-  
-Prepare the kernel sources. You must have them unpacked and configured.  
-/usr/src/linux/include/linux/autoconf.h must exist. Building the kernel  
-is not needed.  
-  
-  
-Retrieve the glibc sources  
-ftp://ftp.gnu.org/pub/gnu/glibc/,  
-su to root, then unpack, build and install it:  
-  
-  
-# unset LD_PRELOAD  
-# unset LD_LIBRARY_PATH  
-# tar xvfz glibc-2.2.tar.gz  
-# tar xvfz glibc-linuxthreads-2.2.tar.gz -C glibc-2.2  
-# mkdir glibc-2.2-build  
-# cd glibc-2.2-build  
-# ../glibc-2.2/configure --prefix=/usr --with-headers=/usr/src/linux/include --enable-add-ons  
-# make  
-# make check  
-# make info  
-# LC_ALL=C make install  
-# make localedata/install-locales  
-  
-  
-  
-  
-Upgrading from glibc versions earlier than 2.1.x cannot be done this way;  
-consider first installing a Linux distribution based on glibc-2.1.x, and  
-then upgrading to glibc-2.2 as described above.  
-  
-  
-Note that if -- for any reason -- you want to rebuild GCC after having  
-installed glibc-2.2, you need to first apply this patch  
-gcc-glibc-2.2-compat.diff  
-to the GCC sources.  
-  
-  
-  
-  
-!!3.3 General data conversion  
-  
-  
-  
-  
-  
-  
-You will need a program to convert your locally (probably ISO-8859-1) encoded  
-texts to UTF-8. (The alternative would be to keep using texts in different  
-encodings on the same machine; this is not fun in the long run.)  
-One such program is `iconv', which comes with glibc-2.2. Simply use  
-  
-  
-$ iconv --from-code=ISO-8859-1 --to-code=UTF-8 < old_file > new_file  
-  
-  
-  
-  
-Here are two handy shell scripts, called "i2u"  
-i2u.sh  
-(for ISO to UTF conversion) and "u2i"  
-u2i.sh  
-(for UTF to ISO conversion).  
-Adapt according to your current 8-bit character set.  
-  
-  
-If you don't have glibc-2.2 and iconv installed, you can use GNU recode 3.6  
-instead.  
-"i2u"  
-i2u_recode.sh is  
-"recode ISO-8859-1..UTF-8", and  
-"u2i"  
-u2i_recode.sh is  
-"recode UTF-8..ISO-8859-1".  
-ftp://ftp.gnu.org/pub/gnu/recode/recode-3.6.tar.gz  
-  
-Or you can also use CLISP instead. Here are  
-"i2u"  
-i2u.lisp and  
-"u2i"  
-u2i.lisp  
-written in Lisp. Note: You need a CLISP version from July 1999 or newer.  
-ftp://clisp.cons.org/pub/lisp/clisp/source/clispsrc.tar.gz.  
-  
-  
-Other data conversion programs, less powerful than GNU recode, are  
-`trans'  
-ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/trans113.tar.gz,  
-`tcs' from the Plan9 operating system  
-ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/tcs.tar.gz,  
-and  
-`utrans'/`uhtrans'/`hutrans'  
-ftp://ftp.cdrom.com/pub/FreeBSD/distfiles/i18ntools-1..tar.gz  
-by G. Adam Stanislav  
-<adam@whizkidtech.net>.  
-  
-  
-For the repeated conversion of files to UTF-8 from different character sets,  
-a semi-automatic tool can be used:  
-to-utf8  
-presents the non-ASCII parts of a file to the user, lets him decide about the  
-file's original character set, and then converts the file to UTF-8.  
-  
-  
-  
-  
-!!3.4 Locale environment variables  
-  
-  
-  
-  
-  
-  
-You may have the following environment variables set, containing locale  
-names:  
-  
-; __LANGUAGE__:  
-  
-override for LC_MESSAGES, used by GNU gettext only  
-; __LC_ALL__:  
-  
-override for all other LC_* variables  
-; __LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_NUMERIC, LC_MONETARY, LC_TIME__:  
-  
-individual variables for:  
-character types and encoding,  
-natural language messages,  
-sorting rules,  
-number formatting,  
-money amount formatting,  
-date and time display  
-; __LANG__:  
-  
-default value for all LC_* variables  
-  
-(See `man 7 locale' for a detailed description.)  
-  
-  
-Each of the LC_* and LANG variables can contain a locale name of the  
-following form:  
-  
-  
-  
-  
-language [[_territory[[.codeset ]][[@modifier]  
-  
-  
-  
-where language is an  
-ISO 639  
-language code (lower case), territory is an  
-ISO 3166  
-country code (upper case), codeset denotes a character set, and  
-modifier stands for other particular attributes (for example indicating  
-a particular language dialect, or a nonstandard orthography).  
-  
-  
-LANGUAGE can contain several locale names, separated by colons.  
-  
-  
-In order to tell your system and all applications that you are using UTF-8,  
-you need to add a codeset suffix of UTF-8 to your locale names. For example,  
-if you were using  
-  
-  
-LC_CTYPE=de_DE  
-  
-  
-you would change this to  
-  
-  
-LC_CTYPE=de_DE.UTF-8  
-  
-  
-  
-  
-You do ''not'' need to change your LANGUAGE environment variable.  
-GNU gettext in glibc-2.2 has the ability to convert translations to the right  
-encoding.  
-  
-  
-  
-  
-!!3.5 Creating the locale support files  
-  
-  
-  
-  
-  
-  
-You create using localedef the support files for each UTF-8 locale  
-you intend to use, for example:  
-  
-  
-$ localedef -v -c -i de_DE -f UTF-8 de_DE.UTF-8  
-  
-  
-  
-  
-You typically don't need to create locales named "de" or "fr" without  
-country suffix, because these locales are normally only used by the  
-LANGUAGE variable and not by the LC_* variables, and LANGUAGE is only  
-used as an override for LC_MESSAGES.  
-  
-  
-  
-----  
-  
-!!4. Specific applications  
-  
-  
-  
-  
-  
-  
-  
-!!4.1 Shells  
-  
-  
-  
-  
-  
-  
-  
-  
-!bash  
-  
-  
-  
-  
-  
-By default, GNU bash assumes that every character is one byte long and one  
-column wide. A patch for bash 2.04, by Marcin 'Qrczak' Kowalczyk and  
-Ricardas Cepas, teaches bash about multibyte characters in UTF-8 encoding.  
-bash-2.04-diff  
-  
-Double-width characters, combining characters and bidi are not supported by  
-this patch. It seems a complete redesign of the readline redisplay engine is  
-needed.  
-  
-  
-  
-  
-!!4.2 Networking  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-!telnet  
-  
-  
-  
-  
-  
-In some installations, telnet is not 8-bit clean by default.  
-In order to be able to send Unicode keystrokes to the remote host, you need to  
-set telnet into "outbinary" mode.  
-There are two ways to do this:  
-  
-  
-$ telnet -L <host>  
-  
-  
-and  
-  
-  
-$ telnet  
-telnet> set outbinary  
-telnet> open <host>  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-!kermit  
-  
-  
-  
-  
-  
-The communications program C-Kermit  
-http://www.columbia.edu/kermit/ckermit.html,  
-(an interactive tool for connection setup, telnet, file transfer,  
-with support for TCP/IP and serial lines),  
-in versions 7.0 or newer, understands the file and transfer encodings  
-UTF-8 and UCS-2, and understands the terminal encoding UTF-8, and converts  
-between these encodings and many others. Documentation of these features  
-can be found in  
-http://www.columbia.edu/kermit/ckermit2.html#x6.6.  
-  
-  
-  
-  
-!!4.3 Browsers  
-  
-  
-  
-  
-  
-  
-  
-  
-!Netscape  
-  
-  
-  
-  
-  
-Netscape 4.05 or newer can display HTML documents in UTF-8 encoding. All a  
-document needs is the following line between the  
-<head> and </head> tags:  
-  
-  
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">  
-  
-  
-  
-  
-Netscape 4.05 or newer can also display HTML and text files in UCS-2  
-encoding with byte-order mark.  
-  
-  
-  
-http://www.netscape.com/computing/download/  
-  
-  
-  
-!Mozilla  
-  
-  
-  
-  
-  
-Mozilla milestone M16 has much better internationalization than Netscape 4.  
-It can display HTML documents in UTF-8 encoding with support for more  
-languages. Alas, there is a cosmetic problem with CJK fonts: some glyphs  
-can be bigger than the line's height, thus overlapping the previous or next  
-line.  
-  
-  
-  
-http://www.mozilla.org/  
-  
-  
-  
-!Amaya  
-  
-  
-  
-  
-  
-Amaya 4.2.1  
-(  
-http://www.w3.org/Amaya/,  
-http://www.w3.org/Amaya/User/!SourceDist)  
-has now limited handling of UTF-8 encoded HTML pages. It  
-recognizes the encoding, but it displays only ISO-8859-1 and symbol  
-characters; it only ever accesses the fonts  
-  
-  
--adobe-times-*-iso8859-1  
--adobe-helvetica-*-iso8859-1  
--adobe-new century schoolbook-*-iso8859-1  
--adobe-courier-*-iso8859-1  
--adobe-symbol-*-adobe-fontspecific  
-  
-  
-  
-  
-Amaya is in fact a HTML editor, not only a browser. Amaya's strengths among  
-the browsers are its speed, given enough memory, and its rendering  
-of mathematical formulas (MathML support).  
-  
-  
-  
-  
-!lynx  
-  
-  
-  
-  
-  
-lynx-2.8 has an options screen (key 'O') which permits to set the display  
-character set. When running in an xterm or Linux console in UTF-8 mode,  
-set this to "UNICODE UTF-8". Note that for this setting to take effect  
-in the current browser session, you have to confirm on the "Accept Changes"  
-field, and for this setting to take effect in future browser sessions, you  
-have to enable the "Save options to disk" field and then confirm it on  
-the "Accept Changes" field.  
-  
-  
-Now, again, all a document needs is the following line between the  
-<head> and </head> tags:  
-  
-  
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">  
-  
-  
-  
-  
-When you are viewing text files in UTF-8 encoding, you also need to  
-pass the command-line option "-assume_local_charset=UTF-8" (affects only  
-file:/... URLs) or "-assume_charset=UTF-8" (affects all URLs).  
-In lynx-2.8.2 you can alternatively, in the options screen (key 'O'),  
-change the assumed document character set to "utf-8".  
-  
-  
-There is also an option in the options screen, to set the "preferred document  
-character set". But it has no effect, at least with file:/... URLs  
-and with http://... URLs served by apache-1.3..  
-  
-  
-There is a spacing and line-breaking problem, however. (Look at the  
-russian section of x-utf8.html, or at utf-8-demo.txt.)  
-  
-  
-Also, in lynx-2.8.2, configured with --enable-prettysrc, the nice colour  
-scheme does not work correctly any more when the display character set  
-has been set to "UNICODE UTF-8". This is fixed by a simple patch  
-lynx282.diff.  
-  
-  
-The Lynx developers say: "For any serious use of UTF-8 screen output with  
-lynx, compiling with slang lib and -DSLANG_MBCS_HACK is still recommended."  
-  
-  
-Latest stable release:  
-ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz  
-  
-  
-http://lynx.isc.org/  
-  
-General home page:  
-http://lynx.browser.org/  
-  
-  
-http://www.slcc.edu/lynx/  
-  
-Newer development shapshots:  
-http://lynx.isc.org/current/,  
-ftp://lynx.isc.org/current/  
-  
-  
-  
-!w3m  
-  
-  
-  
-  
-  
-w3m by Akinori Ito  
-http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/  
-is a text mode browser for HTML pages and plain-text files.  
-Its layout of HTML tables, enumerations etc. is much prettier than lynx' one.  
-w3m can also be used as a high quality HTML to plain text converter.  
-  
-  
-w3m .1.10 has command line options for the three major Japanese encodings, but  
-can also be used for UTF-8 encoded files. Without command line options,  
-you often have to press Ctrl-L to refresh the display, and line breaking  
-in Cyrillic and CJK paragraphs is not good.  
-  
-  
-To fix this, by Hironori Sakamoto has a patch  
-http://www2u.biglobe.ne.jp/~hsaka/w3m/  
-which adds UTF-8 as display encoding.  
-  
-  
-  
-  
-!Test pages  
-  
-  
-  
-  
-  
-Some test pages for browsers can be found at the pages of Alan Wood  
-http://www.hclrss.demon.co.uk/unicode/#links  
-and James Kass  
-http://home.att.net/~jameskass/.  
-  
-  
-  
-  
-!!4.4 Editors  
-  
-  
-  
-  
-  
-  
-  
-  
-!yudit  
-  
-  
-  
-  
-  
-yudit by Gaacutespaacuter Sinai  
-http://www.yudit.org/  
-is a first-class unicode text editor for the X Window System.  
-It supports simultaneous processing of many languages, input methods,  
-conversions for local character standards.  
-It has facilities for entering text in all languages with only  
-an English keyboard, using keyboard configuration maps.  
-  
-  
-  
-  
-!yudit-1.5  
-  
-  
-  
-  
-  
-It can be compiled in three versions: Xlib GUI, KDE GUI, or Motif GUI.  
-  
-  
-Customization is very easy. Typically you will first customize your font.  
-From the font menu I chose "Unicode". Then, since the command  
-"xlsfonts '*-*-iso10646-1'" still showed some ambiguity, I chose a font  
-size of 13 (to match Markus Kuhn's 13-pixel fixed font).  
-  
-  
-Next, you will customize your input method. The input methods "Straight",  
-"Unicode" and "SGML" are most remarkable. For details about the other  
-built-in input methods, look in /usr/local/share/yudit/data/.  
-  
-  
-To change the default for the next session, edit your $HOME/.yuditrc  
-file.  
-  
-  
-The general editor functionality is limited to editing, cut&paste  
-and search&replace. No undo.  
-  
-  
-  
-  
-!yudit-2.1  
-  
-  
-  
-  
-  
-This version is less easy to learn, because it comes with a homebrewn  
-GUI and no easily accessible help. But it has an undo functionality and  
-should therefore be more usable than version 1.5.  
-  
-  
-  
-  
-!Fonts for yudit  
-  
-  
-  
-  
-  
-yudit can display text using a !TrueType font; see section "!TrueType fonts"  
-above. The Bitstream Cyberbit gives good results. For yudit to find the  
-font, symlink it to /usr/local/share/yudit/data/cyberbit.ttf.  
-  
-  
-  
-  
-!vim  
-  
-  
-  
-  
-  
-vim (as of version 6.0r) has good support for UTF-8: when started in an  
-UTF-8 locale, it assumes UTF-8 encoding for the console and the text files  
-being edited. It supports double-wide (CJK) characters as well and  
-combining characters and therefore fits perfectly into UTF-8 enabled  
-xterm.  
-  
-  
-Installation: Download from  
-http://www.vim.org/.  
-After unpacking the four parts, call ./configure with  
---with-features=big --enable-multibyte arguments  
-(or edit src/Makefile to include the --with-features=big and  
---enable-multibyte options). This will turn on the feature  
-FEAT_MBYTE. Then do "make" and "make install".  
-  
-  
-vim can be used to edit files in other encodings. For example, to edit  
-a BIG5 encoded file: :e ++cc=BIG5 filename. All encoding names  
-supported by iconv are accepted. Plus: vim automatically distinguishes  
-UTF-8 and ISO-8859-1 files without needing any command line option.  
-  
-  
-  
-  
-!cooledit  
-  
-  
-  
-  
-  
-cooledit by Paul Sheer  
-http://www.cooledit.org/  
-is a good text editor for the X Window System. Since version 3.15, it has  
-support for Unicode, including Bidi for Hebrew (but not Arabic).  
-  
-  
-A build error message message about a missing "vga_setpage" function is  
-worked around by adding "-DDO_NOT_USE_VGALIB" to the CFLAGS.  
-  
-  
-To view UTF-8 files in an UTF-8 locale you have to modify a setting in  
-the "Options -> Switches" panel: Enable the checkbox "Display characters  
-outside locale". I also found it necessary to disable "Spellcheck as you  
-type".  
-  
-  
-For viewing texts with both European and CJK characters, cooledit needs a  
-font which contains both, for example the GNU unifont (see section  
-"X11 Unicode fonts"): Start once  
-  
-  
-$ cooledit -fn -gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1  
-  
-  
-cooledit will then use this font in all future invocations.  
-  
-  
-Unfortunately, the only characters that can be entered through the keyboard  
-are ISO-8859-1 characters and, through a cooledit specific compose mechanism,  
-ISO-8859-2 characters. Inputing arbitrary Unicode characters in cooledit is  
-possible, but a bit tedious.  
-  
-  
-  
-  
-!emacs  
-  
-  
-  
-  
-  
-First of all, you should read the section "International Character Set Support"  
-(node "International") in the Emacs manual. In particular, note that you need  
-to start Emacs using the command  
-  
-  
-$ emacs -fn fontset-standard  
-  
-  
-so that it will use a font set comprising a lot of international characters.  
-  
-  
-In the short term, there are two packages for using UTF-8 in Emacs. None  
-of them needs recompiling Emacs.  
-  
-  
-****The emacs-utf package  
-http://www.cs.ust.hk/faculty/otfried/Mule/  
-by Otfried Cheong provides a "unicode-utf8" encoding to Emacs.  
-****  
-  
-****The oc-unicode package  
-http://www.cs.ust.hk/faculty/otfried/Mule/,  
-by Otfried Cheong, an extension of the Mule-UCS package  
-ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-.70.tar.gz  
-(mirrored at  
-http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-.70.tar.gz  
-and  
-ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-.70.tar.gz)  
-by Hisashi Miyashita, provides a "utf-8" encoding to Emacs.  
-****  
-  
-  
-  
-You can use either of these packages, or both together. The advantages  
-of the emacs-utf "unicode-utf8" encoding are: it loads faster, and it deals  
-better with combining characters (important for Thai).  
-The advantage of the Mule-UCS / oc-unicode "utf-8" encoding is: it can apply  
-to a process buffer (such as M-x shell), not only to loading and saving of  
-files; and it respects the widths of characters better (important for  
-Ethiopian). However, it is less reliable: After heavy editing of a file, I  
-have seen some Unicode characters replaced with U+FFFD after the file was  
-saved. (But maybe that were bugs in Emacs 20.5 and 20.6 which are fixed in  
-Emacs 20.7.)  
-  
-  
-To install the emacs-utf package, compile the program "utf2mule" and install  
-it somewhere in your $PATH, also install unicode.el, muleuni-1.el,  
-unicode-char.el somewhere. Then add the lines  
-  
-  
-(setq load-path (cons "/home/user/somewhere/emacs" load-path))  
-(if (not (string-match "XEmacs" emacs-version))  
-(progn  
-(require 'unicode)  
-;(setq unicode-data-path "..../!UnicodeData-3...txt")  
-(if (eq window-system 'x)  
-(progn  
-(setq fontset12  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"))  
-(setq fontset13  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"))  
-(setq fontset14  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"))  
-(setq fontset15  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"))  
-(setq fontset16  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"))  
-(setq fontset18  
-(create-fontset-from-fontset-spec  
-"-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"))  
-; (set-default-font fontset15)  
-))))  
-  
-  
-to your $HOME/.emacs file. To activate any of the font sets, use the Mule  
-menu item "Set Font/!FontSet" or Shift-down-mouse-1. The Unicode coverage  
-may of the font sets at different sizes may depend on the installed fonts;  
- here are screen shots at various sizes of UTF-8-demo.txt (  
-12,  
-13,  
-14,  
-15,  
-16,  
-18)  
-and of the Mule script examples (  
-12,  
-13,  
-14,  
-15,  
-16,  
-18).  
-To designate a font set as the initial font set for the first frame at startup,  
-uncomment the set-default-font line in the code snippet above.  
-  
-  
-To install the oc-unicode package, execute the command  
-  
-  
-$ emacs -batch -l oc-comp.el  
-  
-  
-and install the resulting file un-define.elc, as well as  
-oc-unicode.el, oc-charsets.el, oc-tools.el,  
-somewhere. Then add the lines  
-  
-  
-(setq load-path (cons "/home/user/somewhere/emacs" load-path))  
-(if (not (string-match "XEmacs" emacs-version))  
-(progn  
-(require 'oc-unicode)  
-;(setq unicode-data-path "..../!UnicodeData-3...txt")  
-(if (eq window-system 'x)  
-(progn  
-(setq fontset12  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-12-*-iso10646-*"))  
-(setq fontset13  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-13-*-iso10646-*"))  
-(setq fontset14  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-14-*-iso10646-*"))  
-(setq fontset15  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-15-*-iso10646-*"))  
-(setq fontset16  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-16-*-iso10646-*"))  
-(setq fontset18  
-(oc-create-fontset  
-"-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"  
-"-misc-fixed-medium-r-normal-ja-18-*-iso10646-*"))  
-; (set-default-font fontset15)  
-))))  
-  
-  
-to your $HOME/.emacs file. You can choose your appropriate font set as with  
-the emacs-utf package.  
-  
-  
-In order to open an UTF-8 encoded file, you will type  
-  
-  
-M-x universal-coding-system-argument unicode-utf8 RET  
-M-x find-file filename RET  
-  
-  
-or  
-  
-  
-C-x RET c unicode-utf8 RET  
-C-x C-f filename RET  
-  
-  
-(or utf-8 instead of unicode-utf8, if you prefer oc-unicode/Mule-UCS).  
-  
-  
-In order to start a shell buffer with UTF-8 I/O, you will type  
-  
-  
-M-x universal-coding-system-argument utf-8 RET  
-M-x shell RET  
-  
-  
-(This works with oc-unicode/Mule-UCS only.)  
-  
-  
-There is a newer version Mule-UCS-.81. Unfortunately you need to rebuild emacs  
-from source in order to use it.  
-  
-  
-Note that all this works with Emacs 20 in windowing mode only, not in terminal  
-mode. None of the mentioned packages works in Emacs 21, as of this writing.  
-  
-  
-Richard Stallman plans to add integrated UTF-8 support to Emacs in the long  
-term, and so does the XEmacs developers group.  
-  
-  
-  
-  
-!xemacs  
-  
-  
-  
-  
-  
-(This section is written by Gilbert Baumann.)  
-  
-  
-Here is how to teach XEmacs (20.4 configured with MULE) the UTF-8 encoding.  
-Unfortunately you need its sources to be able to patch it.  
-  
-  
-First you need these files provided by Tomohiko Morioka:  
-  
-  
-  
-http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-21.-b55-emc-b55-ucs.diff  
-and  
-http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-ucs-conv-.1.tar.gz  
-  
-The .diff is a diff against the C sources. The tar ball is elisp code,  
-which provides lots of code tables to map to and from Unicode. As the  
-name of the diff file suggests it is against XEmacs-21; I needed to  
-help `patch' a bit. The most notable difference to my XEmacs-20.4  
-sources is that file-coding.[[ch] was called mule-coding.[[ch].  
-  
-  
-For those unfamilar with the XEmacs-MULE stuff (as I am) a quick  
-guide:  
-  
-  
-What we call an encoding is called by MULE a `coding-system'. The most  
-important commands are:  
-  
-  
-  
-  
-  
-M-x set-file-coding-system  
-M-x set-buffer-process-coding-system [[comint buffers]  
-  
-  
-  
-  
-and the variable `file-coding-system-alist', which guides `find-file'  
-to guess the encoding used. After stuff was running, the very first  
-thing I did was  
-this.  
-  
-  
-This code looks into the special mode line introduced by -*- somewhere  
-in the first 600 bytes of the file about to opened; if now there is a  
-field "Encoding: xyz;" and the xyz encoding ("coding system" in Emacs speak)  
-exists, choose that. So now you could do e.g.  
-  
-  
-  
-  
-  
-;;; -*- Mode: Lisp; Syntax: Common-Lisp; Package: CLEX; Encoding: utf-8; -*-  
-  
-  
-  
-  
-and XEmacs goes into utf-8 mode here.  
-  
-  
-Atfer everything was running I defined \u03BB (greek lambda) as a  
-macro like:  
-  
-  
-  
-  
-  
-(defmacro \u03BB (x) `(lambda .,x))  
-  
-  
-  
-  
-  
-  
-!nedit  
-  
-  
-  
-  
-  
-  
-  
-!xedit  
-  
-  
-  
-  
-  
-With XFree86-4..1, xedit is able to edit UTF-8 files if you set the locale  
-accordingly (see above), and add the line "Xedit*international: true" to  
-your $HOME/.Xdefaults file.  
-  
-  
-  
-  
-!axe  
-  
-  
-  
-  
-  
-As of version 6.1.2, aXe supports only 8-bit locales. If you add the line  
-"Axe*international: true" to your $HOME/.Xdefaults file, it will simply dump  
-core.  
-  
-  
-  
-  
-!pico  
-  
-  
-  
-  
-  
-As of version 4.30, pine cannot be reasonably used to view or edit UTF-8  
-files. In UTF-8 enabled xterm, it has severe redraw problems.  
-  
-  
-  
-  
-!mined98  
-  
-  
-  
-  
-  
-mined98 is a small text editor by Michiel Huisjes, Achim Muumlller and  
-Thomas Wolff.  
-http://www.inf.fu-berlin.de/~wolff/mined98.tar.gz  
-It lets you edit UTF-8 or 8-bit encoded files, in an UTF-8 or 8-bit xterm.  
-It also has powerful capabilities for entering Unicode characters.  
-  
-  
-mined lets you edit both 8-bit encoded and UTF-8 encoded files. By default  
-it uses an autodetection heuristic. If you don't want to rely on heuristics,  
-pass the command-line option -u when editing an UTF-8 file, or  
-+u when editing an 8-bit encoded file. You can change the  
-interpretation at any time from within the editor: It displays the encoding  
-("L:h" for 8-bit, "U:h" for UTF-8) in the menu line. Click on the first  
-of these characters to change it.  
-  
-  
-mined knows about double-width and combining characters and displays them  
-correctly. It also has a special display mode for combining characters.  
-  
-  
-mined also has a scrollbar and very nice pull-down menus. Alas, the "Home",  
-"End", "Delete" keys do not work.  
-  
-  
-  
-  
-!qemacs  
-  
-  
-  
-  
-  
-qemacs .2 is a small text editor by Fabrice Bellard.  
-http://www-stud.enst.fr/~bellard/qemacs/  
-with Emacs keybindings. It runs in an UTF-8 console or xterm, and can edit  
-both 8-bit encoded and UTF-8 encoded files. It still has a few rough edges,  
-but further development is underway.  
-  
-  
-  
-  
-!!4.5 Mailers  
-  
-  
-  
-  
-  
-  
-MIME: RFC 2279 defines UTF-8 as a MIME charset, which can be transported  
-under the 8bit, quoted-printable and base64 encodings. The older MIME  
-UTF-7 proposal (RFC 2152) is considered to be deprecated and should not  
-be used any further.  
-  
-  
-Mail clients released after January 1, 1999, should be capable of sending and  
-displaying UTF-8 encoded mails, otherwise they are considered deficient.  
-But these mails have to carry the MIME labels  
-  
-  
-Content-Type: text/plain; charset=UTF-8  
-Content-Transfer-Encoding: 8bit  
-  
-  
-Simply piping an UTF-8 file into "mail" without caring about the MIME labels  
-will not work.  
-  
-  
-Mail client implementors should take a look at  
-http://www.imc.org/imc-intl/  
-and  
-http://www.imc.org/mail-i18n.html.  
-  
-  
-Now about the individual mail clients (or "mail user agents"):  
-  
-  
-  
-  
-!pine  
-  
-  
-  
-  
-  
-The situation for an unpatched pine version 4.30 is as follows.  
-  
-  
-Pine does not do character set conversions. But it allows you to view  
-UTF-8 mails in an UTF-8 text window (Linux console or xterm).  
-  
-  
-Normally, Pine will warn about different character sets each time you view  
-an UTF-8 encoded mail. To get rid of this warning, choose S (setup), then  
-C (config), then change the value of "character-set" to UTF-8. This option  
-will not do anything, except to reduce the warnings, as Pine has no built-in  
-knowledge of UTF-8.  
-  
-  
-Also note that Pine's notion of Unicode characters is pretty limited: It  
-will display Latin and Greek characters, but not other kinds of Unicode  
-characters.  
-  
-  
-A patch by Robert Brady  
-<robert@suse.co.uk>  
-http://www.ents.susu.soton.ac.uk/~robert/pine-utf8-.1.diff  
-adds UTF-8 support to Pine. With this patch, it decodes and prints headers  
-and bodies properly. The patch depends on the GNOME libunicode  
-http://cvs.gnome.org/lxr/source/libunicode/.  
-  
-  
-However, alignment remains broken in many places; replying to a mail does  
-not cause the character set to be converted as appropriate; and the editor,  
-pico, cannot deal with multibyte characters.  
-  
-  
-  
-  
-!kmail  
-  
-  
-  
-  
-  
-kmail (as of KDE 1.) does not support UTF-8 mails at all.  
-  
-  
-  
-  
-!Netscape Communicator  
-  
-  
-  
-  
-  
-Netscape Communicator's Messenger can send and display mails in UTF-8  
-encoding, but it needs a little bit of manual user intervention.  
-  
-  
-To send an UTF-8 encoded mail: After opening the "Compose" window, but before  
-starting to compose the message, select from the menu  
-"View -> Character Set -> Unicode (UTF-8)". Then compose the message and  
-send it.  
-  
-  
-When you receive an UTF-8 encoded mail, Netscape unfortunately does not  
-display it in UTF-8 right away, and does not even give a visual clue that  
-the mail was encoded in UTF-8. You have to manually select from the menu  
-"View -> Character Set -> Unicode (UTF-8)".  
-  
-  
-For displaying UTF-8 mails, Netscape uses different fonts. You can adjust  
-your font settings in the "Edit -> Preferences -> Fonts" dialog; choose  
-the "Unicode" font category.  
-  
-  
-  
-  
-!emacs (rmail, vm)  
-  
-  
-  
-  
-  
-  
-  
-!mutt  
-  
-  
-  
-  
-  
-mutt-1.2.x, as available from  
-http://www.mutt.org/,  
-has only rudimentary support for UTF-8: it can convert  
-from UTF-8 into an 8-bit display charset. The mutt-1.3.x  
-development branch also supports UTF-8 as the display charset,  
-so you can run Mutt in an UTF-8 xterm, and has thorough support  
-for MIME and charset conversion (relying on iconv).  
-  
-  
-  
-  
-!exmh  
-  
-  
-  
-  
-  
-exmh 2.1.2 with Tk 8.4a1 can recognize and correctly display UTF-8 mails  
-(without CJK characters) if you add the following lines to your  
-$HOME/.Xdefaults file.  
-  
-  
-!  
-! Exmh  
-!  
-exmh.mimeUCharsets: utf-8  
-exmh.mime_utf-8_registry: iso10646  
-exmh.mime_utf-8_encoding: 1  
-exmh.mime_utf-8_plain_families: fixed  
-exmh.mime_utf-8_fixed_families: fixed  
-exmh.mime_utf-8_proportional_families: fixed  
-exmh.mime_utf-8_title_families: fixed  
-  
-  
-  
-  
-  
-  
-!!4.6 Text processing  
-  
-  
-  
-  
-  
-  
-  
-  
-!groff  
-  
-  
-  
-  
-  
-groff 1.16.1, the GNU implementation of the traditional Unix text processing  
-system troff/nroff, can output UTF-8 formatted text. Simply use  
-`groff -Tutf8' instead of `groff -Tlatin1' or  
-`groff -Tascii'.  
-  
-  
-  
-  
-!TeX  
-  
-  
-  
-  
-  
-The teTeX .9 (and newer) distribution contains an Unicode adaptation of TeX,  
-called Omega  
-(  
-http://www.gutenberg.eu.org/omega/,  
-ftp://ftp.ens.fr/pub/tex/yannis/omega).  
-Together with the unicode.tex file contained in  
-utf8-tex-.1.tar.gz  
-it enables you to use UTF-8 encoded sources as input for TeX. A thousand of  
-Unicode characters are currently supported.  
-  
-  
-All that changes is that you run `omega' (instead of `tex') or `lambda'  
-(instead of `latex'), and insert the following lines at the head of  
-your source input.  
-  
-  
-\ocp\TexUTF=inutf8  
-\!InputTranslation currentfile \TexUTF  
-  
-  
-  
-  
-\input unicode  
-  
-  
-  
-  
-Other maybe related links:  
-http://www.dante.de/projekte/nts/NTS-FAQ.html,  
-ftp://ftp.dante.de/pub/tex/language/chinese/CJK/.  
-  
-  
-  
-  
-!!4.7 Databases  
-  
-  
-  
-  
-  
-  
-  
-  
-!PostgreSQL  
-  
-  
-  
-  
-  
-PostgreSQL 6.4 or newer can be built with the configuration option  
---with-mb=UNICODE.  
-  
-  
-  
-  
-!Interbase  
-  
-  
-  
-  
-  
-Borland/Inprise's Interbase 6.0 can store string fields in UTF-8 format  
-if the option "CHARACTER SET UNICODE_FSS" is given.  
-  
-  
-  
-  
-!!4.8 Other text-mode applications  
-  
-  
-  
-  
-  
-  
-  
-  
-!less  
-  
-  
-  
-  
-  
-With  
-http://www.flash.net/~marknu/less/less-358.tar.gz  
-you can browse UTF-8 encoded text files in an UTF-8 xterm or console.  
-Make sure that the environment variable LESSCHARSET is not set (or is set  
-to utf-8). If you also have a LESSKEY environment variable set, also make  
-sure that the file it points to does not define LESSCHARSET. If necessary,  
-regenerate this file using the `lesskey' command, or unset the LESSKEY  
-environment variable.  
-  
-  
-  
-  
-!lv  
-  
-  
-  
-  
-  
-lv-4.49.3 by Tomio Narita  
-http://www.ff.iij4u.or.jp/~nrt/lv/  
-is a file viewer with builtin character set converters. To view UTF-8 files  
-in an UTF-8 console, use "lv -Au8". But it can also be used to view  
-files in other CJK encodings in an UTF-8 console.  
-  
-  
-There is a small glitch: lv turns off xterm's cursor and doesn't turn it on  
-again.  
-  
-  
-  
-  
-!expand  
-  
-  
-  
-  
-  
-Get the GNU textutils-2.0 and apply the patch  
-textutils-2..diff,  
-then configure, add "#define HAVE_FGETWC 1", "#define HAVE_FPUTWC 1" to  
-config.h. Then rebuild.  
-  
-  
-  
-  
-!col, colcrt, colrm, column, rev, ul  
-  
-  
-  
-  
-  
-Get the util-linux-2.9y package, configure it, then define ENABLE_WIDECHAR in  
-defines.h, change the "#if " to "#if 1" in lib/widechar.h. In  
-text-utils/Makefile, modify CFLAGS and LDFLAGS so that they include the  
-directories where libutf8 is installed. Then rebuild.  
-  
-  
-  
-  
-!figlet  
-  
-  
-  
-  
-  
-figlet 2.2 has an option for UTF-8 input: "figlet -C utf8"  
-  
-  
-  
-  
-!Base utilities  
-  
-  
-  
-  
-  
-The Li18nux list of commands and utilities that ought to be made interoperable  
-with UTF-8 is as follows. Useful information needs to get added here; I just  
-didn't get around it yet :-)  
-  
-  
-As of glibc-2.2, regular expressions only work for 8-bit characters.  
-In an UTF-8 locale, regular expressions that contain non-ASCII characters  
-or that expect to match a single multibyte character with "." do not work.  
-This affects all commands and utilities listed below.  
-  
-  
-  
-  
-; __alias__:  
-  
-No info available yet.  
-; __ar__:  
-  
-No info available yet.  
-; __arch__:  
-  
-No info available yet.  
-; __arp__:  
-  
-No info available yet.  
-; __at__:  
-  
-As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be  
-replaced with a use of quotearg.c or an exclude list of the (fixed) list  
-of shell metacharacters. The two uses of %8s in at.c and atd.c are invalid  
-and should become arbitrary length.  
-; __awk__:  
-  
-No info available yet.  
-; __basename__:  
-  
-As of sh-utils-2.0i: OK.  
-; __batch__:  
-  
-No info available yet.  
-; __bc__:  
-  
-No info available yet.  
-; __bg__:  
-  
-No info available yet.  
-; __bunzip2__:  
-  
-No info available yet.  
-; __bzip2__:  
-  
-No info available yet.  
-; __bzip2recover__:  
-  
-No info available yet.  
-; __cal__:  
-  
-No info available yet.  
-; __cat__:  
-  
-No info available yet.  
-; __cd__:  
-  
-No info available yet.  
-; __cflow__:  
-  
-No info available yet.  
-; __chgrp__:  
-  
-As of fileutils-4.0u: OK.  
-; __chmod__:  
-  
-As of fileutils-4.0u: OK.  
-; __chown__:  
-  
-As of fileutils-4.0u: OK.  
-; __chroot__:  
-  
-As of sh-utils-2.0i: OK.  
-; __cksum__:  
-  
-As of textutils-2.0e: OK.  
-; __clear__:  
-  
-No info available yet.  
-; __cmp__:  
-  
-No info available yet.  
-; __col__:  
-  
-No info available yet.  
-; __comm__:  
-  
-No info available yet.  
-; __command__:  
-  
-No info available yet.  
-; __compress__:  
-  
-No info available yet.  
-; __cp__:  
-  
-As of fileutils-4.0u: OK.  
-; __cpio__:  
-  
-No info available yet.  
-; __crontab__:  
-  
-No info available yet.  
-; __csplit__:  
-  
-No info available yet.  
-; __ctags__:  
-  
-No info available yet.  
-; __cut__:  
-  
-No info available yet.  
-; __date__:  
-  
-As of sh-utils-2.0i: OK.  
-; __dd__:  
-  
-As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly.  
-; __df__:  
-  
-As of fileutils-4.0u: OK.  
-; __diff__:  
-  
-As of diffutils-2.7.2: the --side-by-side mode therefore doesn't compute  
-column width correctly.  
-; __diff3__:  
-  
-No info available yet.  
-; __dirname__:  
-  
-As of sh-utils-2.0i: OK.  
-; __domainname__:  
-  
-No info available yet.  
-; __du__:  
-  
-As of fileutils-4.0u: OK.  
-; __echo__:  
-  
-As of sh-utils-2.0i: OK.  
-; __ed__:  
-  
-No info available yet.  
-; __egrep__:  
-  
-No info available yet.  
-; __env__:  
-  
-As of sh-utils-2.0i: OK.  
-; __ex__:  
-  
-No info available yet.  
-; __expand__:  
-  
-No info available yet.  
-; __expr__:  
-  
-As of sh-utils-2.0i: The operators "match", "substr", "index", "length"  
-don't work correctly.  
-; __false__:  
-  
-As of sh-utils-2.0i: OK.  
-; __fc__:  
-  
-No info available yet.  
-; __fg__:  
-  
-No info available yet.  
-; __fgrep__:  
-  
-No info available yet.  
-; __file__:  
-  
-No info available yet.  
-; __find__:  
-  
-As of findutils-4.1.6: The "-iregex" does not work correctly; this needs a  
-fix in function find/parser.c:insert_regex.  
-; __fold__:  
-  
-No info available yet.  
-; __ftp[[BSD]__:  
-  
-No info available yet.  
-; __fuser__:  
-  
-No info available yet.  
-; __gencat__:  
-  
-No info available yet.  
-; __getconf__:  
-  
-No info available yet.  
-; __getopts__:  
-  
-No info available yet.  
-; __gettext__:  
-  
-No info available yet.  
-; __grep__:  
-  
-No info available yet.  
-; __gunzip__:  
-  
-No info available yet.  
-; __gzip__:  
-  
-gzip-1.3 is UTF-8 capable, but it uses only English messages in ASCII  
-charset. Proper internationalization would require: Use gettext. Call  
-setlocale. In function check_ofname (file gzip.c), use the function rpmatch  
-from GNU text/sh/fileutils instead of asking for "y" or "n". The use  
-of strlen in gzip.c:852 is wrong, needs to use the function mbswidth.  
-; __hash__:  
-  
-No info available yet.  
-; __head__:  
-  
-No info available yet.  
-; __hostname__:  
-  
-As of sh-utils-2.0i: OK.  
-; __iconv__:  
-  
-No info available yet.  
-; __id__:  
-  
-As of sh-utils-2.0i: OK.  
-; __ifconfig__:  
-  
-No info available yet.  
-; __imake__:  
-  
-No info available yet.  
-; __ipcrm__:  
-  
-No info available yet.  
-; __ipcs__:  
-  
-No info available yet.  
-; __jobs__:  
-  
-No info available yet.  
-; __join__:  
-  
-No info available yet.  
-; __kill__:  
-  
-No info available yet.  
-; __killall__:  
-  
-No info available yet.  
-; __ldd__:  
-  
-No info available yet.  
-; __less__:  
-  
-No complete info available yet.  
-; __lex__:  
-  
-No info available yet.  
-; __ln__:  
-  
-As of fileutils-4.0u: OK.  
-; __locale__:  
-  
-As of glibc-2.2: OK.  
-; __localedef__:  
-  
-As of glibc-2.2: OK.  
-; __logger__:  
-  
-No info available yet.  
-; __logname__:  
-  
-As of sh-utils-2.0i: OK.  
-; __lp__:  
-  
-No info available yet.  
-; __lpc[[BSD]__:  
-  
-No info available yet.  
-; __lpq[[BSD]__:  
-  
-No info available yet.  
-; __lpr[[BSD]__:  
-  
-No info available yet.  
-; __lprm[[BSD]__:  
-  
-No info available yet.  
-; __lpstat(LEGACY)__:  
-  
-No info available yet.  
-; __ls__:  
-  
-As of fileutils-4.0y: OK.  
-; __m4__:  
-  
-No info available yet.  
-; __mailx__:  
-  
-No info available yet.  
-; __make__:  
-  
-No info available yet.  
-; __man__:  
-  
-No info available yet.  
-; __mesg__:  
-  
-No info available yet.  
-; __mkdir__:  
-  
-As of fileutils-4.0u: OK.  
-; __mkfifo__:  
-  
-As of fileutils-4.0u: OK.  
-; __mkfs__:  
-  
-No info available yet.  
-; __mkswap__:  
-  
-No info available yet.  
-; __more__:  
-  
-No info available yet.  
-; __mount__:  
-  
-No info available yet.  
-; __msgfmt__:  
-  
-No info available yet.  
-; __msgmerge__:  
-  
-No info available yet.  
-; __mv__:  
-  
-As of fileutils-4.0u: OK.  
-; __netstat__:  
-  
-No info available yet.  
-; __newgrp__:  
-  
-No info available yet.  
-; __nice__:  
-  
-As of sh-utils-2.0i: OK.  
-; __nl__:  
-  
-No info available yet.  
-; __nohup__:  
-  
-As of sh-utils-2.0i: OK.  
-; __nslookup__:  
-  
-No info available yet.  
-; __nm__:  
-  
-No info available yet.  
-; __od__:  
-  
-No info available yet.  
-; __passwd[[BSD]__:  
-  
-No info available yet.  
-; __paste__:  
-  
-No info available yet.  
-; __patch__:  
-  
-No info available yet.  
-; __pathchk__:  
-  
-As of sh-utils-2.0i: OK.  
-; __ping__:  
-  
-No info available yet.  
-; __pr__:  
-  
-No info available yet.  
-; __printf__:  
-  
-As of sh-utils-2.0i: OK.  
-; __ps__:  
-  
-No info available yet.  
-; __pwd__:  
-  
-As of sh-utils-2.0i: OK.  
-; __read__:  
-  
-No info available yet.  
-; __reboot__:  
-  
-No info available yet.  
-; __renice__:  
-  
-No info available yet.  
-; __rm__:  
-  
-As of fileutils-4.0u: OK.  
-; __rmdir__:  
-  
-As of fileutils-4.0u: OK.  
-; __sed__:  
-  
-No info available yet.  
-; __shar[[BSD]__:  
-  
-No info available yet.  
-; __shutdown__:  
-  
-No info available yet.  
-; __sleep__:  
-  
-As of sh-utils-2.0i: OK.  
-; __sort__:  
-  
-No info available yet.  
-; __split__:  
-  
-No info available yet.  
-; __strings__:  
-  
-No info available yet.  
-; __strip__:  
-  
-No info available yet.  
-; __stty__:  
-  
-As of sh-utils-2..11: OK.  
-; __su[[BSD]__:  
-  
-No info available yet.  
-; __sum__:  
-  
-As of textutils-2.0e: OK.  
-; __tail__:  
-  
-No info available yet.  
-; __talk__:  
-  
-No info available yet.  
-; __tar__:  
-  
-As of tar-1.13.17: OK, if user and group names are always ASCII.  
-; __tclsh__:  
-  
-No info available yet.  
-; __tee__:  
-  
-As of sh-utils-2.0i: OK.  
-; __telnet__:  
-  
-No info available yet.  
-; __test__:  
-  
-As of sh-utils-2.0i: OK.  
-; __time__:  
-  
-No info available yet.  
-; __touch__:  
-  
-As of fileutils-4.0u: OK.  
-; __tput__:  
-  
-No info available yet.  
-; __tr__:  
-  
-No info available yet.  
-; __true__:  
-  
-As of sh-utils-2.0i: OK.  
-; __tsort__:  
-  
-No info available yet.  
-; __tty__:  
-  
-As of sh-utils-2.0i: OK.  
-; __type__:  
-  
-No info available yet.  
-; __ulimit__:  
-  
-No info available yet.  
-; __umask__:  
-  
-No info available yet.  
-; __umount__:  
-  
-No info available yet.  
-; __unalias__:  
-  
-No info available yet.  
-; __uname__:  
-  
-As of sh-utils-2.0i: OK.  
-; __uncompress__:  
-  
-No info available yet.  
-; __unexpand__:  
-  
-No info available yet.  
-; __uniq__:  
-  
-No info available yet.  
-; __uudecode__:  
-  
-No info available yet.  
-; __uuencode__:  
-  
-No info available yet.  
-; __vi__:  
-  
-No info available yet.  
-; __wait__:  
-  
-No info available yet.  
-; __wc__:  
-  
-As of textutils-2..8: OK.  
-; __who__:  
-  
-As of sh-utils-2.0i: OK.  
-; __wish__:  
-  
-No info available yet.  
-; __write__:  
-  
-No info available yet.  
-; __xargs__:  
-  
-As of findutils-4.1.5: The program uses strstr; a patch has been submitted  
-to the maintainer.  
-; __xgettext__:  
-  
-No info available yet.  
-; __yacc__:  
-  
-No info available yet.  
-; __zcat__:  
-  
-No info available yet.  
-  
-  
-  
-  
-  
-!!4.9 Other X11 applications  
-  
-  
-  
-  
-  
-  
-Owen Taylor is currently developing a library for rendering multilingual  
-text, called pango.  
-http://www.labs.redhat.com/~otaylor/pango/,  
-http://www.pango.org/.  
-  
-  
-  
-  
-  
-  
-----  
-  
-!!5. Printing  
-  
-  
-  
-  
-  
-Since Postscript itself does not support Unicode fonts, the burden of  
-Unicode support in printing is on the program creating the Postscript  
-document, not on the Postscript renderer.  
-  
-  
-The existing Postscript fonts I've seen - .pfa/.pfb/.afm/.pfm/.gsf -  
-support only a small range of glyphs and are not Unicode fonts.  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-!!5.1 Printing using !TrueType fonts  
-  
-  
-  
-  
-  
-  
-Both the uniprint and wprint programs produce good printed output  
-for Unicode plain text. They require a !TrueType font; see section  
-"!TrueType fonts" above. The Bitstream Cyberbit gives good results.  
-  
-  
-  
-  
-!uniprint  
-  
-  
-  
-  
-  
-The "uniprint" program contained in the yudit package can convert a text  
-file to Postscript. For uniprint to find the Cyberbit font, symlink it to  
-/usr/local/share/yudit/data/cyberbit.ttf.  
-  
-  
-  
-  
-!wprint  
-  
-  
-  
-  
-  
-The "wprint" (!WorldPrint) program by Eduardo Trapani  
-http://ttt.esperanto.org.uy/programoj/angle/wprint.html  
-postprocesses Postscript output produced by Netscape Communicator or Mozilla  
-from HTML pages or plain text files.  
-  
-  
-The output is nearly perfect; only in Cyrillic paragraphs the line breaking  
-is incorrect: the lines are only about half as wide as they should be.  
-  
-  
-  
-  
-!Comparison  
-  
-  
-  
-  
-  
-For plain text, uniprint has a better overall layout. On the other hand,  
-only wprint gets Thai output correct.  
-  
-  
-  
-  
-!!5.2 Printing using fixed-size fonts  
-  
-  
-  
-  
-  
-  
-Generally, printing using fixed-size fonts does not give an as professional  
-output as using !TrueType fonts.  
-  
-  
-  
-  
-!txtbdf2ps  
-  
-  
-  
-  
-  
-The txtbdf2ps .7 program by Serge Winitzki  
-http://members.linuxstart.com/~winitzki/txtbdf2ps.html  
-converts a plain text file to Postscript, by use of a BDF font.  
-Installation:  
-  
-  
-# install -m 777 txtbdf2ps-dev.txt /usr/local/bin/txtbdf2ps  
-  
-  
-Example with a proportional font:  
-  
-  
-$ txtbdf2ps -BDF=cyberbit.bdf -UTF-8 -nowrap < input.txt > output.ps  
-  
-  
-Example with a fixed-width font:  
-  
-  
-$ txtbdf2ps -BDF=unifont.bdf -UTF-8 -nowrap < input.txt > output.ps  
-  
-  
-  
-  
-Note: txtbdf2ps does not support combining characters and bidi.  
-  
-  
-  
-  
-!!5.3 The classical approach  
-  
-  
-  
-  
-  
-  
-Another way to print with !TrueType fonts is to convert the !TrueType font to  
-a Postscript font using the ttf2pt1 utility  
-(  
-http://www.netspace.net.au/~mheath/ttf2pt1/,  
-http://quadrant.netspace.net.au/ttf2pt1/,  
-http://ttf2pt1.sourceforge.net/). Details can be  
-found in Julius Chroboczek's "Printing with !TrueType fonts in Unix" writeup,  
-http://www.dcs.ed.ac.uk/home/jec/programs/xfsft/printing.html.  
-  
-  
-  
-  
-!TeX, Omega  
-  
-  
-  
-  
-  
-TODO: CJK, metafont, omega, dvips, odvips, utf8-tex-.1  
-  
-  
-  
-  
-  
-  
-  
-!!DocBook  
-  
-  
-  
-  
-  
-TODO: db2ps, jadetex  
-  
-  
-  
-  
-!groff -Tps  
-  
-  
-  
-  
-  
-"groff -Tps" produces Postscript output. Its Postscript output driver  
-supports only a very limited number of Unicode characters (only what  
-Postscript supports by itself).  
-  
-  
-  
-  
-  
-  
-  
-!!5.4 No luck with...  
-  
-  
-!Netscape's "Print..."  
-  
-  
-  
-  
-  
-As of version 4.72, Netscape Communicator cannot correctly print HTML  
-pages in UTF-8 encoding. You really have to use wprint.  
-  
-  
-  
-  
-!Mozilla's "Print..."  
-  
-  
-  
-  
-  
-As of version M16, printing of HTML pages is apparently not implemented.  
-  
-  
-  
-  
-!html2ps  
-  
-  
-  
-  
-  
-As of version 1.0b1, the html2ps HTML to Postscript converter does not support  
-UTF-8 encoded HTML pages and has no special treatment of fonts: the generated  
-Postscript uses the standard Postscript fonts.  
-  
-  
-  
-  
-!a2ps  
-  
-  
-  
-  
-  
-As of version 4.12, a2ps doesn't support printing UTF-8 encoded text.  
-  
-  
-  
-  
-!enscript  
-  
-  
-  
-  
-  
-As of version 1.6.1, enscript doesn't support printing UTF-8 encoded text.  
-By default, it uses only the standard Postscript fonts, but it can also  
-include a custom Postscript font in the output.  
-  
-  
-  
-  
-  
-  
-----  
-  
-!!6. Making your programs Unicode aware  
-  
-  
-  
-  
-  
-  
-  
-!!6.1 C/C++  
-  
-  
-  
-  
-  
-  
-The C `char' type is 8-bit and will stay 8-bit because it denotes  
-the smallest addressable data unit. Various facilities are available:  
-  
-  
-  
-  
-!For normal text handling  
-  
-  
-  
-  
-  
-The ISO/ANSI C standard contains, in an amendment which was added in 1995,  
-a "wide character" type `wchar_t', a set of functions like those  
-found in <string.h> and <ctype.h> (declared in  
-<wchar.h> and <wctype.h>, respectively), and  
-a set of conversion functions between `char *' and  
-`wchar_t *' (declared in <stdlib.h>).  
-  
-  
-Good references for this API are  
-  
-  
-****the GNU libc-2.1 manual, chapters 4 "Character Handling" and  
-6 "Character Set Handling",  
-****  
-  
-****the manual pages  
-man-mbswcs.tar.gz, now contained in  
-ftp://ftp.win.tue.nl/pub/linux-local/manpages/man-pages-1.29.tar.gz,  
-****  
-  
-****the !OpenGroup's introduction  
-http://www.unix-systems.org/version2/whatsnew/login_mse.html,  
-****  
-  
-****the !OpenGroup's Single Unix specification  
-http://www.UNIX-systems.org/online.html,  
-****  
-  
-****the ISO/IEC 9899:1999 (ISO C 99) standard. The latest draft before it was  
-adopted is called n2794. You find it at  
-ftp://ftp.csn.net/DMK/sc22wg14/review/  
-or  
-http://java-tutor.com/docs/c/.  
-****  
-  
-****Clive Feather's introduction  
-http://www.lysator.liu.se/c/na1.html,  
-****  
-  
-****the Dinkumware C library reference  
-http://www.dinkumware.com/htm_cl/.  
-****  
-  
-  
-  
-Advantages of using this API:  
-  
-  
-****It's a vendor independent standard.  
-****  
-  
-****The functions do the right thing, depending on the user's locale.  
-All a program needs to call is setlocale(LC_ALL,"");.  
-****  
-  
-  
-  
-Drawbacks of this API:  
-  
-  
-****Some of the functions are not multithread-safe, because they keep a hidden  
-internal state between function calls.  
-****  
-  
-****There is no first-class locale datatype. Therefore this API cannot reasonably  
-be used for anything that needs more than one locale or character set at the  
-same time.  
-****  
-  
-****The OS support for this API is not good on most OSes.  
-****  
-  
-  
-  
-  
-  
-!Portability notes  
-  
-  
-  
-  
-  
-A `wchar_t' may or may not be encoded in Unicode; this is  
-platform and sometimes also locale dependent. A multibyte sequence  
-`char *' may or may not be encoded in UTF-8; this is platform  
-and sometimes also locale dependent.  
-  
-  
-In detail, here is what the  
-Single Unix specification  
-says about the `wchar_t' type:  
-''All wide-character codes in a given process consist of an equal number  
-of bits. This is in contrast to characters, which can consist of a  
-variable number of bytes. The byte or byte sequence that represents a  
-character can also be represented as a wide-character code.  
-Wide-character codes thus provide a uniform size for manipulating text  
-data. A wide-character code having all bits zero is the null  
-wide-character code, and terminates wide-character strings. The  
-wide-character value for each member of the Portable Character Set'' (i.e. ASCII) ''will equal its value when used as the lone character in an integer  
-character constant. Wide-character codes for other characters are  
-locale- and implementation-dependent. State shift bytes do not have a  
-wide-character code representation.''  
-  
-  
-One particular consequence is that in portable programs you shouldn't use  
-non-ASCII characters in string literals. That means, even though you  
-know the Unicode double quotation marks have the codes U+201C and U+201D,  
-you shouldn't write a string literal L"\u201cHello\u201d, he said"  
-or "\xe2\x80\x9cHello\xe2\x80\x9d, he said" in C programs. Instead,  
-use GNU gettext, write it as gettext("'Hello', he said"), and create  
-a message database en.po which translates "'Hello', he said" to  
-"\u201cHello\u201d, he said".  
-  
-  
-Here is a survey of the portability of the ISO/ANSI C facilities on various  
-Unix flavours.  
-  
-  
-  
-  
-; __GNU glibc-2.2.x__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything.  
-****  
-  
-****Has five UTF-8 locales.  
-****  
-  
-****mbrtowc works.  
-****  
-  
-; __GNU glibc-2..x, glibc-2.1.x__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions, but no fgetwc/fputwc/wprintf.  
-****  
-  
-****No UTF-8 locale.  
-****  
-  
-****mbrtowc returns EILSEQ for bytes >= 0x80.  
-****  
-  
-; __AIX 4.3__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything.  
-****  
-  
-****Has many UTF-8 locales, one for every country.  
-****  
-  
-****Needs -D_XOPEN_SOURCE=500 in order to define mbstate_t.  
-****  
-  
-****mbrtowc works.  
-****  
-  
-; __Solaris 2.7__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything.  
-****  
-  
-****Has the following UTF-8 locales:  
-en_US.UTF-8, de.UTF-8, es.UTF-8, fr.UTF-8, it.UTF-8, sv.UTF-8.  
-****  
-  
-****mbrtowc returns -1/EILSEQ (instead of -2) for bytes >= 0x80.  
-****  
-  
-; __OSF/1 4.0d__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything.  
-****  
-  
-****Has an add-on universal.utf8@ucs4 locale, see "man 5 unicode".  
-****  
-  
-****mbrtowc does not know about UTF-8.  
-****  
-  
-; __Irix 6.5__:  
-  
-  
-  
-  
-****<wchar.h> and <wctype.h> exist.  
-****  
-  
-****Has wcs/mbs functions and fgetwc/fputwc, but not wprintf.  
-****  
-  
-****Has no multibyte locales.  
-****  
-  
-****Has only a dummy definition for mbstate_t.  
-****  
-  
-****Doesn't have mbrtowc.  
-****  
-  
-; __HP-UX 11.00__:  
-  
-  
-  
-  
-****<wchar.h> exists, <wctype.h> does not.  
-****  
-  
-****Has wcs/mbs functions and fgetwc/fputwc, but not wprintf.  
-****  
-  
-****Has a C.utf8 locale.  
-****  
-  
-****Doesn't have mbstate_t.  
-****  
-  
-****Doesn't have mbrtowc.  
-****  
-  
-  
-  
-  
-As a consequence, I recommend to use the restartable and multithread-safe  
-wcsr/mbsr functions, forget about those systems which don't have them (Irix,  
-HP-UX, AIX), and use the UTF-8 locale plug-in libutf8_plug.so (see below)  
-on those systems which permit you to compile programs which use these  
-wcsr/mbsr functions (Linux, Solaris, OSF/1).  
-  
-  
-A similar advice, given by Sun in  
-http://www.sun.com/software/white-papers/wp-unicode/,  
-section "Internationalized Applications with Unicode", is:  
-  
-  
-''To properly internationalize an application, use the following  
-guidelines:''  
-  
-  
-***#''Avoid direct access with Unicode. This is a task of the platform's  
-internationalization framework.''  
-***#  
-  
-***#''Use the POSIX model for multibyte and wide-character interfaces.''  
-***#  
-  
-***#''Only call the APIs that the internationalization framework  
-provides for language and cultural-specific operations.''  
-***#  
-  
-***#''Remain code-set independent.''  
-***#  
-  
-  
-  
-If, for some reason, in some piece of code, you really have to assume that  
-`wchar_t' is Unicode (for example, if you want to do special treatment of  
-some Unicode characters), you should make that piece of code conditional  
-upon the result of is_locale_utf8(). Otherwise you will mess up  
-your program's behaviour in different locales or other platforms. The  
-function is_locale_utf8 is declared in  
-utf8locale.h  
-and defined in  
-utf8locale.c.  
-  
-  
-  
-  
-!The libutf8 library  
-  
-  
-  
-  
-  
-A portable implementation of the ISO/ANSI C API, which supports 8-bit locales  
-and UTF-8 locales, can be found in  
-libutf8-.7.3.tar.gz.  
-  
-  
-Advantages:  
-  
-  
-****Unicode UTF-8 support now, portably, even on OSes whose multibyte character  
-support does not work or which don't have multibyte/wide character support  
-at all.  
-****  
-  
-****The same binary works in all OS supported 8-bit locales and in UTF-8 locales.  
-****  
-  
-****When an OS vendor adds proper multibyte character support, you can take  
-advantage of it by simply recompiling without -DHAVE_LIBUTF8 compiler option.  
-****  
-  
-  
-  
-  
-  
-!The Plan9 way  
-  
-  
-  
-  
-  
-The Plan9 operating system, a variant of Unix, uses UTF-8 as character  
-encoding in all applications. Its wide character type is called  
-`Rune', not `wchar_t'. Parts of its libraries, written by  
-Rob Pike and Howard Trickey, are available at  
-ftp://ftp.cdrom.com/pub/netlib/research/9libs/9libs-1..tar.gz.  
-Another similar library, written by Alistair G. Crooks, is  
-ftp://ftp.cdrom.com/pub/NetBSD/packages/distfiles/libutf-2.10.tar.gz.  
-In particular, each of these libraries contains an UTF-8 aware regular  
-expression matcher.  
-  
-  
-Drawback of this API:  
-  
-  
-****UTF-8 is compiled in, not optional. Programs compiled in this universe lose  
-support for the 8-bit encodings which are still frequently used in Europe.  
-****  
-  
-  
-  
-  
-  
-!For graphical user interface  
-  
-  
-  
-  
-  
-The Qt-2.0 library  
-http://www.troll.no/  
-contains a fully-Unicode QString class. You can use the member functions  
-QString::utf8 and QString::fromUtf8 to convert to/from UTF-8 encoded text.  
-The QString::ascii and QString::latin1 member functions should not be used  
-any more.  
-  
-  
-  
-  
-!For advanced text handling  
-  
-  
-  
-  
-  
-The previously mentioned libraries implement Unicode aware versions of  
-the ASCII concepts. Here are libraries which deal with Unicode concepts,  
-such as titlecase (a third letter case, different from uppercase and  
-lowercase), distinction between punctuation and symbols, canonical  
-decomposition, combining classes, canonical ordering and the like.  
-  
-  
-  
-  
-; __ucdata-2.4__:  
-  
-The ucdata library by Mark Leisher  
-http://crl.nmsu.edu/~mleisher/ucdata.html  
-deals with character properties, case conversion, decomposition, combining  
-classes. The companion package ure-.5  
-http://crl.nmsu.edu/~mleisher/ure-.5.tar.gz  
-is a Unicode regular expression matcher.  
-  
-  
-  
-; __ustring__:  
-  
-The ustring C++ library by Rodrigo Reyes  
-http://ustring.charabia.net/  
-deals with character properties, case conversion, decomposition, combining  
-classes, and includes a Unicode regular expression matcher.  
-  
-  
-  
-; __ICU__:  
-  
-International Components for Unicode  
-http://oss.software.ibm.com/icu/.  
-IBM's very comprehensive internationalization library featuring Unicode strings,  
-resource bundles, number formatters, date/time formatters, message formatters,  
-collation and more. Lots of supported locales. Portable to Unix and Win32,  
-but compiles out of the box only on Linux libc6, not libc5.  
-  
-  
-  
-; __libunicode__:  
-  
-The GNOME libunicode library  
-http://cvs.gnome.org/lxr/source/libunicode/  
-by Tom Tromey and others. It covers character set conversion, character  
-properties, decomposition.  
-  
-  
-  
-  
-  
-  
-  
-  
-!For conversion  
-  
-  
-  
-  
-  
-Two kinds of conversion libraries, which support UTF-8 and a large number  
-of 8-bit character sets, are available:  
-  
-  
-  
-  
-!iconv  
-  
-  
-  
-  
-  
-The iconv implementation by Ulrich Drepper, contained in the GNU glibc-2.2.  
-ftp://ftp.gnu.org/pub/gnu/glibc/glibc-2.2.tar.gz.  
-The iconv manpages are now contained in  
-ftp://ftp.win.tue.nl/pub/linux-local/manpages/man-pages-1.29.tar.gz.  
-  
-  
-The portable iconv implementation by Bruno Haible.  
-ftp://ftp.ilog.fr/pub/Users/haible/gnu/libiconv-1.5.1.tar.gz  
-  
-The portable iconv implementation by Konstantin Chuguev.  
-<joy@urc.ac.ru>  
-ftp://ftp.urc.ac.ru/pub/local/OS/Unix/converters/iconv-.4.tar.gz  
-  
-Advantages:  
-  
-  
-****iconv is POSIX standardized, programs using iconv to convert from/to UTF-8  
-will also run under Solaris. However, the names for the character sets differ  
-between platforms. For example, "EUC-JP" under glibc is "eucJP" under HP-UX.  
-(The official IANA name for this character set is "EUC-JP", so it's clearly  
-a HP-UX deficiency.)  
-****  
-  
-****On glibc-2.1 systems, no additional library is needed. On other systems, one of  
-the two other iconv implementations can be used.  
-****  
-  
-  
-  
-  
-  
-!librecode  
-  
-  
-  
-  
-  
-librecode by Franccedilois Pinard  
-ftp://ftp.gnu.org/pub/gnu/recode/recode-3.6.tar.gz.  
-  
-  
-Advantages:  
-  
-  
-****Support for transliteration, i.e. conversion of non-ASCII characters  
-to sequences of ASCII characters in order to preserve readability by  
-humans, even when a lossless transformation is impossible.  
-****  
-  
-  
-  
-Drawbacks:  
-  
-  
-****Non-standard API.  
-****  
-  
-****Slow initialization.  
-****  
-  
-  
-  
-  
-  
-!ICU  
-  
-  
-  
-  
-  
-International Components for Unicode 1.7  
-http://oss.software.ibm.com/icu/.  
-IBM's internationalization library also has conversion facilities, declared  
-in `ucnv.h'.  
-  
-  
-Advantages:  
-  
-  
-****Comprehensive set of supported encodings.  
-****  
-  
-  
-  
-Drawbacks:  
-  
-  
-****Non-standard API.  
-****  
-  
-  
-  
-  
-  
-!Other approaches  
-  
-  
-  
-  
-  
-  
-  
-; __libutf-8__:  
-  
-libutf-8 by G. Adam Stanislav  
-<adam@whizkidtech.net>  
-contains a few functions for on-the-fly conversion from/to UTF-8 encoded  
-`FILE*' streams.  
-http://www.whizkidtech.net/i18n/libutf-8-1..tar.gz  
-  
-Advantages:  
-  
-  
-****Very small.  
-****  
-  
-  
-  
-Drawbacks:  
-  
-  
-****Non-standard API.  
-****  
-  
-****UTF-8 is compiled in, not optional. Programs compiled with this library  
-lose support for the 8-bit encodings which are still frequently used in Europe.  
-****  
-  
-****Installation is nontrivial: Makefile needs tweaking, not autoconfiguring.  
-****  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-!!6.2 Java  
-  
-  
-  
-  
-  
-  
-Java has Unicode support built into the language. The type `char' denotes  
-a Unicode character, and the `java.lang.String' class denotes a string  
-built up from Unicode characters.  
-  
-  
-Java can display any Unicode characters through its windowing system AWT,  
-provided that  
-1. you set the Java system property "user.language" appropriately,  
-2. the /usr/lib/java/lib/font.properties.''language'' font set  
-definitions are appropriate, and  
-3. the fonts specified in that file are installed.  
-For example, in order to display text containing japanese characters,  
-you would install japanese fonts and run "java -Duser.language=ja ...".  
-You can combine font sets: In order to display western european, greek  
-and japanese characters simultaneously, you would create a combination  
-of the files "font.properties" (covers ISO-8859-1), "font.properties.el"  
-(covers ISO-8859-7) and "font.properties.ja" into a single file.  
-??This is untested??  
-  
-  
-The interfaces java.io.!DataInput and java.io.!DataOutput have methods called  
-`readUTF' and `writeUTF' respectively. But note that they don't use UTF-8;  
-they use a modified UTF-8 encoding: the NUL character is encoded as the  
-two-byte sequence 0xC0 0x80 instead of 0x00, and a 0x00 byte is added at  
-the end. Encoded this way, strings can contain NUL characters and nevertheless  
-need not be prefixed with a length field - the C <string.h> functions  
-like strlen() and strcpy() can be used to manipulate them.  
-  
-  
-  
-  
-!!6.3 Lisp  
-  
-  
-  
-  
-  
-  
-The Common Lisp standard specifies two character types: `base-char' and  
-`character'. It's up to the implementation to support Unicode or not.  
-The language also specifies a keyword argument `:external-format' to `open',  
-as the natural place to specify a character set or encoding.  
-  
-  
-Among the free Common Lisp implementations, only CLISP  
-http://clisp.cons.org/  
-supports Unicode. You need a CLISP version from March 2000 or newer.  
-ftp://clisp.cons.org/pub/lisp/clisp/source/clispsrc.tar.gz.  
-The types `base-char' and `character' are both equivalent to 16-bit Unicode.  
-The functions char-width and string-width provide an  
-API comparable to wcwidth() and wcswidth().  
-The encoding used for file or socket/pipe I/O can be specified through the  
-`:external-format' argument. The encodings used for tty I/O and the default  
-encoding for file/socket/pipe I/O are locale dependent.  
-  
-  
-Among the commercial Common Lisp implementations:  
-  
-  
-!LispWorks  
-http://www.xanalys.com/software_tools/products/  
-supports Unicode.  
-The type `base-char' is equivalent to ISO-8859-1, and the type `simple-char'  
-(subtype of `character') contains all Unicode characters.  
-The encoding used for file I/O can be specified through the  
-`:external-format' argument, for example '(:UTF-8).  
-Limitations: Encodings cannot be used for socket I/O. The editor cannot edit  
-UTF-8 encoded files.  
-  
-  
-Eclipse  
-http://www.elwood.com/eclipse/eclipse.htm  
-supports Unicode. See  
-http://www.elwood.com/eclipse/char.htm.  
-The type `base-char' is equivalent  
-to ISO-8859-1, and the type `character' contains all Unicode characters.  
-The encoding used for file I/O can be specified through a combination of  
-the `:element-type' and `:external-format' arguments to `open'.  
-Limitations: Character attribute functions are locale dependent. Source and  
-compiled source files cannot contain Unicode string literals.  
-  
-  
-The commercial Common Lisp implementation Allegro CL, in version 6., has  
-Unicode support. The types `base-char' and `character' are both equivalent  
-to 16-bit Unicode. The encoding used for file I/O can be specified through the  
-`:external-format' argument, for example :external-format :utf8.  
-The default encoding is locale dependent. More details are at  
-http://www.franz.com/support/documentation/6./doc/iacl.htm.  
-  
-  
-  
-  
-!!6.4 Ada95  
-  
-  
-  
-  
-  
-  
-Ada95 was designed for Unicode support and the Ada95 standard library  
-features special ISO 10646-1 data types Wide_Character and Wide_String,  
-as well as numerous associated procedures and functions. The GNU Ada95  
-compiler (gnat-3.11 or newer) supports UTF-8 as the external encoding of  
-wide characters. This allows you to use UTF-8 in both source code and  
-application I/O. To activate it in the application, use "WCEM=8" in the  
-FORM string when opening a file, and use compiler option "-gnatW8" if  
-the source code is in UTF-8. See the GNAT  
-(  
-ftp://cs.nyu.edu/pub/gnat/)  
-and Ada95  
-(  
-ftp://ftp.cnam.fr/pub/Ada/PAL/userdocs/docadalt/rm95/index.htm)  
-reference manuals for details.  
-  
-  
-  
-  
-!!6.5 Python  
-  
-  
-  
-  
-  
-  
-Python 2.  
-(  
-http://www.python.org/2./,  
-http://www.python.org/pipermail/python-announce-list/2000-October/000889.html,  
-http://starship.python.net/crew/amk/python/writing/new-python/new-python.html)  
-contains Unicode support. It has a new fundamental data type  
-`unicode', representing a Unicode string, a module `unicodedata' for the  
-character properties, and a set of converters for the most important encodings.  
-See  
-http://starship.python.net/crew/lemburg/unicode-proposal.txt,  
-or the file Misc/unicode.txt in the distribution, for details.  
-  
-  
-  
-  
-!!6.6 !JavaScript/ECMAscript  
-  
-  
-  
-  
-  
-  
-Since !JavaScript version 1.3, strings are always Unicode. There is no  
-character type, but you can use the \uXXXX notation for Unicode characters  
-inside strings. No normalization is done internally, so it expects to receive  
-Unicode Normalization Form C, which the W3C recommends. See  
-http://developer.netscape.com/docs/manuals/communicator/jsref/js13.html#Unicode  
-for details and  
-http://developer.netscape.com/docs/javascript/e262-pdf.pdf  
-for the complete ECMAscript specification.  
-  
-  
-  
-  
-!!6.7 Tcl  
-  
-  
-  
-  
-  
-  
-Tcl/Tk started using Unicode as its base character set with version 8.1.  
-Its internal representation for strings is UTF-8. It supports the \uXXXX  
-notation for Unicode characters. See  
-http://dev.scriptics.com/doc/howto/i18n.html.  
-  
-  
-  
-  
-!!6.8 Perl  
-  
-  
-  
-  
-  
-  
-Perl 5.6 stores strings internally in UTF-8 format, if you write  
-  
-  
-use utf8;  
-  
-  
-at the beginning of your script. length() returns the number of  
-characters of a string. For details, see the Perl-i18n FAQ at  
-http://rf.net/~james/perli18n.html.  
-  
-  
-Support for other (non-8-bit) encodings is available through the iconv  
-interface module  
-http://cpan.perl.org/modules/by-module/Text/Text-Iconv-1.1.tar.gz.  
-  
-  
-  
-  
-!!6.9 Related reading  
-  
-  
-  
-  
-  
-  
-Tomohiro Kubota has written an introduction to internationalization  
-http://www.debian.org/doc/manuals/intro-i18n/.  
-The emphasis of his document is on writing software that runs in any locale,  
-using the locale's encoding.  
-  
-  
-  
-----  
-  
-!!7. Other sources of information  
-  
-  
-  
-  
-  
-  
-  
-!!7.1 Mailing lists  
-  
-  
-  
-  
-  
-  
-Broader audiences can be reached at the following mailing lists.  
-  
-  
-Note that where I write `at', you should write `@'. (Anti-spam device.)  
-  
-  
-  
-  
-!linux-utf8  
-  
-  
-  
-  
-  
-Address: linux-utf8 at nl.linux.org  
-  
-  
-This mailing list is about internationalization with Unicode, and covers  
-a broad range of topics from the keyboard driver to the X11 fonts.  
-  
-  
-Archives are at  
-http://mail.nl.linux.org/linux-utf8/.  
-  
-  
-To subscribe, send a message to majordomo at nl.linux.org  
-with the line "subscribe linux-utf8" in the body.  
-  
-  
-  
-  
-!li18nux  
-  
-  
-  
-  
-  
-Address: linux-i18n at sun.com  
-  
-  
-This mailing list is focused on organizing internationalization work on  
-Linux, and arranging meetings between people.  
-  
-  
-To subscribe, fill in the form at http://www.li18nux.org/  
-and send it to linux-i18n-request at sun.com.  
-  
-  
-  
-  
-!unicode  
-  
-  
-  
-  
-  
-Address: unicode at unicode.org  
-  
-  
-This mailing list is focused on the standardization and continuing development  
-of the Unicode standard, and related technologies, such as Bidi and sorting  
-algorithms.  
-  
-  
-Archives are at  
-ftp://ftp.unicode.org/Public/!MailArchive/,  
-but they are not regularly updated.  
-  
-  
-For subscription information, see  
-http://www.unicode.org/unicode/consortium/distlist.html.  
-  
-  
-  
-  
-!X11 internationalization  
-  
-  
-  
-  
-  
-Address: i18n at xfree86.org  
-  
-  
-This mailing list addresses the people who work on better internationalization  
-of the X11/XFree86 system.  
-  
-  
-Archives are at  
-http://devel.xfree86.org/archives/i18n/.  
-  
-  
-To subscribe, send mail to the friendly person at i18n-request at  
-xfree86.org explaining your motivation.  
-  
-  
-  
-  
-!X11 fonts  
-  
-  
-  
-  
-  
-Address: fonts at xfree86.org  
-  
-  
-This mailing list addresses the people who work on Unicode fonts and the  
-font subsystem for the X11/XFree86 system.  
-  
-  
-Archives are at  
-http://devel.xfree86.org/archives/fonts/.  
-  
-  
-To subscribe, send mail to the overworked person at fonts-request at  
-xfree86.org explaining your motivation .  
-  
-  
-  
-----  
+Describe [HowToUnicodeHOWTO ] here.