Diff: UTF - Waikato Linux Users Group

Differences between current version and predecessor to the previous major change of UTF.


Newer page:	version 2	Last edited on Thursday, June 23, 2005 6:12:48 am	by AristotlePagaltzis
Older page:	version 1	Last edited on Thursday, June 23, 2005 5:21:15 am	by AristotlePagaltzis

@@ -2,4 +2,7 @@

A scheme for storing [Unicode] code points as sequences of fixed-size units of storage ("code units"). In general, a single code point may require several code units to express it. A particular [UTF] scheme is named by appending the size of its code unit in bits to the acronym. An example, and the most common scheme in practice, is [UTF-8], which has a number of important advantages over all other [UTF] schemes, such as backward compatibility with ASCII and freedom from byte-order issues.
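To illustrate how the number of code units varies by scheme, here is a minimal sketch in Python (3.9+), assuming its built-in codecs. The `code_units` helper and the `SCHEMES` table are hypothetical names made up for this example; the big-endian codec variants are used only so that no byte-order mark is added to the output.

```python
# Count how many code units each UTF scheme needs for a single code point.
# SCHEMES maps a scheme name to (Python codec name, bytes per code unit).
# The "-be" codecs are used so Python does not prepend a byte-order mark.
SCHEMES = {
    "UTF-8": ("utf-8", 1),
    "UTF-16": ("utf-16-be", 2),
    "UTF-32": ("utf-32-be", 4),
}

def code_units(ch: str) -> dict[str, int]:
    """Return the number of code units each scheme needs to encode ch."""
    return {
        name: len(ch.encode(codec)) // unit_bytes
        for name, (codec, unit_bytes) in SCHEMES.items()
    }

# "A", e-acute, the euro sign, and an emoji outside the 16-bit range.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

For U+1F600, UTF-8 needs four 8-bit units and UTF-16 needs two 16-bit units (a surrogate pair), while UTF-32 always needs exactly one.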

[UTF] schemes are used because, as of this writing, the full range of [Unicode] code points is 0x00000-0x10FFFF, which takes 21 bits to express (rounded up to three bytes, or in practice to a 32-bit integer per code point). Yet most text uses only code points that fit in far fewer bits, so most of that storage is wasted. [UTF] schemes thus provide a highly specialized form of compression that still keeps the resulting text easy to process.
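The size argument can be checked with a short sketch. The hand-rolled `utf8_encode` function below is a hypothetical illustration of the UTF-8 bit layout, not a production encoder (for instance, it does not reject surrogate code points); it is compared against Python's built-in codec to show that ASCII-only text shrinks to a quarter of its fixed 32-bit size.

```python
# Sketch: why a variable-length UTF saves space compared with fixed 32-bit units.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # 21 bits cover every code point

def utf8_encode(cp: int) -> bytes:
    """Illustrative UTF-8 encoder for one code point (no surrogate check)."""
    if cp < 0x80:
        return bytes([cp])                          # up to 7 bits  -> 1 byte
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),             # up to 11 bits -> 2 bytes
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),            # up to 16 bits -> 3 bytes
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= MAX_CODE_POINT:
        return bytes([0xF0 | (cp >> 18),            # up to 21 bits -> 4 bytes
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"{cp:#x} is outside the Unicode range")

text = "Waikato Linux Users Group"
utf8 = b"".join(utf8_encode(ord(c)) for c in text)
assert utf8 == text.encode("utf-8")  # agrees with Python's own codec
print(len(utf8), len(text.encode("utf-32-be")))  # 25 100
```

Each continuation byte starts with the bits `10`, so a decoder can always tell where a character begins; this is part of what keeps UTF-8 text easy to process despite its variable length.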

+

+See also:

+* Tim Bray's essay [Characters vs. Bytes | http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF]