Blame: UTF-8 - Waikato Linux Users Group

Annotated edit history of UTF-8 version 2, including all changes.

Rev	Author	#	Line
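2	AristotlePagaltzis	1	`[UTF-8] is a [Unicode] Transformation Format that encodes every [Unicode] code point as a sequence of bytes. The original design allowed anywhere from 1 to 6 bytes per character; since RFC 3629 restricted the encoding to the range U+0000 through U+10FFFF, no character needs more than 4. It is the most popular member of the [UTF] family, with good reason.`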
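As a rough illustration of the variable-length scheme, the sketch below encodes a few sample characters with Python's built-in `utf-8` codec, which follows the current definition and never emits more than four bytes. The helper name `utf8_bytes` and the sample characters are assumptions chosen for this example only.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# Illustrative samples: ASCII letter, accented Latin letter,
# euro sign, and an emoji from outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in SAMPLES:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), hex {encoded.hex()}")
```

The higher the code point, the more bytes the encoding needs, and the ASCII letter still occupies exactly one.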
1	AristotlePagaltzis	2
2	AristotlePagaltzis	3	Because of the distribution of code points in [Unicode], 90-95% of typical Western-language text requires only one byte per character, and almost 100% of non-punctuation characters require no more than two. It is also directly backwards compatible with [ASCII], which contains only 128 characters: any byte in [UTF-8] text with its high bit clear means exactly what the corresponding [ASCII] character means. It can therefore provide an easy transition to [Unicode] for applications and systems accustomed to 7-bit [ASCII] (for example, [Unix] TwoLetterCommands for stream processing were traditionally very byte-oriented). Unfortunately, it penalizes East Asian scripts with three bytes per character, so users of those scripts often prefer [UTF]-16, which encodes most of their characters in two.
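The size trade-off can be checked with a small sketch like the one below. The function name `encoded_sizes` and the sample strings are hypothetical, picked only to contrast plain ASCII, accented Western text, and Japanese text.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 size, UTF-16 size) in bytes, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_text = "hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

for sample in ["hello", "na\u00efve caf\u00e9", "\u65e5\u672c\u8a9e"]:
    utf8_size, utf16_size = encoded_sizes(sample)
    print(f"{len(sample)} chars: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

For the Western samples UTF-8 comes out smaller, while for the three Japanese characters it takes 9 bytes against 6 for UTF-16.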
1	AristotlePagaltzis	4
2	AristotlePagaltzis	5	`However, because it is byte-oriented, this encoding has important advantages that neither [UTF]-16 nor [UTF]-32 nor any other word-based encoding can offer:`
		6	`* [Endianness] in any environment is completely irrelevant to the meaning of a blob of [UTF-8]-encoded text.`
		7	`* No character other than NUL itself ever requires an all-zero-bits byte to be represented, so strncpy(3) and friends work just fine with [UTF-8] text.`
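Both points in the list above can be observed directly. The following is a minimal sketch using Python's standard codecs; the helper name `contains_nul` and the sample text are assumptions made for illustration.

```python
def contains_nul(text: str, encoding: str) -> bool:
    """Report whether encoding the text produces any all-zero byte."""
    return b"\x00" in text.encode(encoding)


sample = "Unix"

# UTF-16 has two byte orders that yield different byte sequences,
# so a reader must know (or detect) which one was used.
utf16_le = sample.encode("utf-16-le")
utf16_be = sample.encode("utf-16-be")
print(utf16_le.hex(), utf16_be.hex())

# UTF-8 is a plain byte sequence with a single possible form,
# and it contains no zero bytes unless the text itself contains NUL.
print(sample.encode("utf-8").hex())
print(contains_nul(sample, "utf-8"), contains_nul(sample, "utf-16-le"))
```

One caveat that this sketch does not cover: a length-limited copy can still cut a multibyte character in half, so truncation points need care even though NUL termination is safe.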
		8
		9	`Also, because the rules for encoding multibyte characters are strict, it is usually possible to tell with good statistical confidence whether a blob of text is [UTF-8] or not.`
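One simple way to apply this idea is to attempt a strict decode and see whether it succeeds. The sketch below assumes that approach; `looks_like_utf8` is a hypothetical helper, not a function from any library, and it is a heuristic only, since pure ASCII data passes trivially.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# The same word encoded two ways: the Latin-1 form contains a lone 0xE9 byte,
# which UTF-8 would require to be followed by continuation bytes.
as_utf8 = "caf\u00e9".encode("utf-8")
as_latin1 = "caf\u00e9".encode("latin-1")

print(looks_like_utf8(as_utf8))
print(looks_like_utf8(as_latin1))
```

The longer the text and the more non-ASCII bytes it contains, the less likely it is that data in some other encoding passes this check by accident.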
		10
		11	`See also:`
		12	`* UnicodeNotes for hints on using [UTF-8] in [Unix]/[Linux]`
		13	`* utf8(7) for the gory technical details`
		14	`* [UTF]`

Last edited on Thursday, June 23, 2005 6:12:09 am by AristotlePagaltzis
