Differences between version 12 and predecessor to the previous major change of UnicodeNotes.
Newer page: | version 12 | Last edited on Thursday, August 19, 2004 9:44:29 pm | by JohnMcPherson |
Older page: | version 10 | Last edited on Tuesday, August 3, 2004 8:50:03 pm | by JohnMcPherson |
@@ -6,8 +6,9 @@
With computers taking over the world, something called "Unicode" was developed that attempts to assign a unique number to every character in every language.
So Latin small letter y with acute (ý) has the code 0x00FD, while Latin dotless i (ı) has the code 0x0131. UTF-8 (see the utf-8(7) man page) is a method of encoding the Unicode numbers in a way that is backwards compatible with [Legacy] systems that use ASCII or Latin characters.
The first 256 Unicode characters are identical to Western Latin (ISO 8859-1), of which the first 128 are identical to [ASCII]. All [ASCII] characters are represented exactly the same way in UTF-8. See the UTF-8 FAQ, which has the definitive version online at [http://www.cl.cam.ac.uk/~mgk25/unicode.html|http://www.cl.cam.ac.uk/~mgk25/unicode.html].
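As a rough illustration of the encoding described above, the following sketch prints the UTF-8 bytes for the two example characters and for an ASCII letter. The helper name utf8_hex is made up for this example, and the characters are given as octal byte escapes so that the result does not depend on how this page or your terminal is encoded.

```shell
# utf8_hex is a hypothetical helper: it prints the bytes produced by a
# printf format string as lowercase hex, with no spaces.
utf8_hex() {
  printf "$1" | od -An -tx1 | tr -d ' \t\n'
  echo
}

utf8_hex '\303\275'   # U+00FD, y with acute: two bytes in UTF-8
utf8_hex '\304\261'   # U+0131, dotless i: two bytes in UTF-8
utf8_hex 'A'          # U+0041, plain ASCII: one byte, unchanged
```

The ASCII letter stays a single byte, while each of the two non-ASCII characters takes two bytes.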
+----
!!Terminals
To turn on UTF-8 support in xterm (which must have been compiled with UTF-8 support, xterm version 145 or later), invoke xterm with the -u8 option:%%%
$ xterm -u8
@@ -22,11 +23,27 @@
If you don't specify a font when you start xterm, it will default to "fixed". This font is an "alias" - for the specific font that it maps to, look in /usr/X11R6/lib/fonts/misc/fonts.alias:
...
fixed -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1
...
-You should change that to end with "-iso10646-1" instead, if you have a unicode version of the font installed.
+You should change that to end with "-iso10646-1" instead, if you have a unicode version of the font installed. If you don't have administrator rights, you can make your own alias file, e.g. put
+ fixed -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
+into a file such as $HOME/.fonts/fonts.alias and then put this directory as the first directory on your font path:
+ xset +fp $HOME/.fonts
+and now any new xterms should be able to display more non-ASCII characters.
+
+
+Recent versions of xterm (e.g. v187) create accented Latin characters if you press a letter while the Alt key is held down (e.g. alt+x gives an "ø" character). This will break any text-mode apps that expect alt to do something different (like emacs in text mode). If you want the old-style behaviour, add
+ XTerm*metaSendsEscape: true
+to your $HOME/.Xdefaults file, or run
+ echo 'XTerm*metaSendsEscape: true' | xrdb -merge
+from a command line. (It will take effect for new xterms).
+
+Otherwise, if you do want alt+letters to create accented characters, and you want emacs (in text mode) to handle them, add:
+ (set-keyboard-coding-system 'utf-8)
+ (set-terminal-coding-system 'utf-8)
+to your $HOME/.emacs file.
+
-gnome-terminal from GNOME 2 seems to be different - it appears to restrict your choices of font to name only (ie you can't specify which encoding to use). I'll update this when I figure it out...
!locale
It's a good idea to set some environment variables to tell applications
what language and encoding you prefer. In NewZealand, you should do
@@ -40,11 +57,25 @@
The system administrator can make this the default by putting
LC_ALL=en_NZ.UTF-8
into /etc/environment (create it if it doesn't already exist).
+
+As well as getting utf-8 support, this has the added advantage that locale-aware applications
+will use the correct currency symbol, unit separator, date formatting etc for your locale.
+(Eg, MozillaMail will show dates as dd/mm/yyyy instead of the default US mm/dd/yyyy)
+
+If you don't have a friendly administrator or can't otherwise get root permissions, you should still be
+able to generate a locale if it isn't already installed:
+
+1. Generate a locale, giving an encoding, a locale, and an output directory:
+ mkdir -p ~/pkg/locale/ && localedef -f UTF-8 -i en_NZ ~/pkg/locale/en_NZ.UTF-8
+2. Set your LOCPATH environment variable to point to the correct directory, and select the new locale:
+ echo 'export LOCPATH=~/pkg/locale' >> ~/.bashrc
+ echo 'export LC_ALL=en_NZ.UTF-8' >> ~/.bashrc
The program uxterm is a shell script wrapper that sets up the locale properly then runs xterm with the right parameters.
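To check which setting an application will actually see, it can help to remember the standard precedence: LC_ALL overrides LC_CTYPE, which overrides LANG. The sketch below is only an illustration. The function names effective_ctype and is_utf8_locale are invented here, and judging a locale by the suffix of its name is a simplifying assumption (running "locale charmap" asks the C library directly).

```shell
# Print the locale name that decides character handling.
# POSIX precedence: LC_ALL, then LC_CTYPE, then LANG; "C" if none is set.
effective_ctype() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if a locale name asks for UTF-8.
# Assumption: this looks at the name's suffix only, not at installed locales.
is_utf8_locale() {
  case "$1" in
    *.UTF-8|*.utf8|*.UTF-8@*|*.utf8@*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_utf8_locale "$(effective_ctype)"; then
  echo "UTF-8 locale selected: $(effective_ctype)"
else
  echo "not a UTF-8 locale: $(effective_ctype)"
fi
```

If this reports a non-UTF-8 locale in a new shell, the export lines have probably not been read yet, or another startup file sets LC_ALL afterwards.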
+----
!!The "less" program
"less" looks for an environment variable to determine what is a printable character. The following tells less to display characters for utf-8:
$ LESSCHARSET=utf-8