
Differences between version 23 and predecessor to the previous major change of UnicodeNotes.


Newer page: version 23 Last edited on Thursday, December 28, 2006 4:53:32 am by JohnMcPherson
Older page: version 22 Last edited on Monday, May 8, 2006 10:10:43 am by CraigBox
@@ -5,11 +5,15 @@
  
 With computers taking over the world, a standard called Unicode was developed that attempts to assign a unique number to every character in every language.
 So Latin small y with acute (ý) has the code 0x00FD, while a Latin dotless i (ı) has the code 0x0131. UTF-8 (see the utf-8(7) man page) is a method of encoding Unicode numbers that is backwards compatible with [Legacy] systems that expect ASCII text.
 The first 256 Unicode characters are identical to ISO-8859-1 (Western Latin), of which the first 128 are identical to [ASCII]. All [ASCII] characters are represented exactly the same way in UTF-8; every other character, including the upper half of Latin-1, takes two or more bytes. See the UTF-8 FAQ, which has the definitive version online at [http://www.cl.cam.ac.uk/~mgk25/unicode.html|http://www.cl.cam.ac.uk/~mgk25/unicode.html].
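As a rough illustration of the encoding described above, the following sketch dumps the UTF-8 bytes of a few characters. The helper name utf8_hex is made up for this example, and the sketch assumes the script itself is saved as UTF-8 and that od, tr and sed are available.

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex.
# Assumes this file is stored as UTF-8, so the literals below are UTF-8 bytes.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

ascii_bytes=$(utf8_hex 'A')     # U+0041, plain ASCII: one byte, unchanged
yacute_bytes=$(utf8_hex 'ý')    # U+00FD, in Latin-1 range: two bytes in UTF-8
dotless_bytes=$(utf8_hex 'ı')   # U+0131, outside Latin-1: two bytes in UTF-8

echo "A: $ascii_bytes"      # A: 41
echo "ý: $yacute_bytes"     # ý: c3 bd
echo "ı: $dotless_bytes"    # ı: c4 b1
```

The ASCII letter keeps its single-byte value, while both accented characters become multi-byte sequences in which every byte is above 0x7F, which is why ASCII-only tools pass them through without mistaking them for ASCII characters.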
-  
 ---- 
-!!Terminals 
+!!!Creating accented characters  
+QWERTY keyboards for English speakers don't have separate keys for accented characters, as keyboards for other languages do. However, there are still relatively easy ways to get these characters into your applications:
+# Use a ‘character-picker’ applet or similar in your desktop environment. For example, in GNOME you can add a panel applet called "character palette" that offers a customisable selection of common non-ASCII characters that you can click on to copy to your clipboard.
+# Use a "compose" key. For example, in GNOME's keyboard preferences you can assign a key to be the Compose key. If the Right Alt key is the Compose key, then pressing Right Alt + ' will give the next character an acute accent, if that is a valid combination. E.g. "Compose+`, e" results in è, "Compose+~, n" results in ñ (you have to press Compose + Shift + ` to get the ~), and so on.
+----  
+!!!Terminals
  
 ! Testing your terminal 
 To test if your terminal already supports UTF-8, try running the following command: 
 <pre> 
@@ -112,10 +116,8 @@
 <verbatim> 
 urxvt -fn "xft:Bitstream Vera Sans Mono:pixelsize=16" 
 </verbatim> 
  
-----  
-!!!Terminal programs  
 !!The "less" program 
  
 "less" looks for an environment variable to determine what is a printable character. The following tells less to display characters for utf-8: 
  $ LESSCHARSET=utf-8 
@@ -148,9 +150,9 @@
 To convert between Unicode encodings (e.g. utf-8 or utf-16) and other character sets, use the iconv command. The -t argument is the "to" encoding and -f is the "from" encoding. For example:
  $ iconv -t utf-8 -f iso-8859-1 < somefile.txt > somefile-utf8.txt 
 This is a front end to the iconv(3) library (libiconv) that many recent programs use for handling character encoding and conversion. 
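To check what a conversion actually does, it can help to look at the bytes before and after. The following is a minimal sketch, assuming iconv, od and mktemp are installed; the temporary file and variable names are invented for the example.

```shell
# Build a one-byte ISO-8859-1 sample: octal 375 is 0xFD, "y with acute".
# (An octal escape is used because printf '\x..' is not portable to every sh.)
latin1_file=$(mktemp)
printf '\375' > "$latin1_file"

# Convert to UTF-8 and capture the resulting bytes as hex.
utf8_bytes=$(iconv -f iso-8859-1 -t utf-8 < "$latin1_file" | od -An -tx1 | tr -d ' \n')
echo "as utf-8:   $utf8_bytes"    # as utf-8:   c3bd

# Convert to UTF-8 and back again; the original byte should reappear.
roundtrip=$(iconv -f iso-8859-1 -t utf-8 < "$latin1_file" \
    | iconv -f utf-8 -t iso-8859-1 | od -An -tx1 | tr -d ' \n')
echo "round trip: $roundtrip"     # round trip: fd

rm -f "$latin1_file"
```

The single Latin-1 byte grows to two bytes in UTF-8, so a converted file can be larger than its source even though it contains the same text.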
  
-  
+----  
 !!!Mail clients 
 Mozilla has great charset support, being so new. Netscape >= 4.05 has some support, but it has problems. Mutt can do utf-8, but I haven't been able to get it to show the headers summary correctly. I don't know about kmail, balsa, or evolution, but my guess is that they are new enough to have good support.
  
 !!!X Fonts 
@@ -164,11 +166,12 @@
  
 The [convmv|http://j3e.de/linux/convmv/] utility lets you do a bulk conversion of character sets in file names: 
  
 <verbatim> 
-./convmv -r -f latin1 -t utf8 --notest /array/images/mp3/albums/*</tt>  
+./convmv -r -f latin1 -t utf8 --notest /array/images/mp3/albums/* 
 </verbatim> 
  
 fixed my problem. Thanks to [the Unicode/charsets section of the Samba HOWTO|http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/unicode.html]. 
  
+For the "opposite" problem (that is, you have a Windows machine with a share that is Samba-mounted onto a Linux client, and the non-ASCII characters are getting munged), you need to give Samba some mount options: <tt>iocharset=utf-8</tt> tells Samba to use a utf-8 encoding when presenting filenames to Linux applications, and <tt>codepage=<i>foo</i></tt> tells Samba which encoding the Windows machine is using. If your accents are getting screwed up, try <tt>codepage=850</tt>. Depending on the smbfs version, these values may need to be spelled <tt>utf8</tt> and <tt>cp850</tt> instead.
 ---- 
 CategoryNotes