Differences between version 3 and previous revision of MailBox.

Newer page: version 3, last edited on Sunday, October 17, 2004 6:10:15 am by AristotlePagaltzis
Older page: version 2, last edited on Sunday, October 17, 2004 5:23:08 am by AristotlePagaltzis
@@ -2,13 +2,19 @@
  
 There are many different physical formats for MailBox~es, which typically use some variation on the following schemes: 
  
 A FlatFile: 
- [MBox] is the most common example of this, where all the messages in a MailBox are encoded into a single file. 
+ All the messages in a MailBox are stored in a single file.  
+ [BSD]'s and [Solaris]' [MBox] format is the most common example.
 One file per [Email]: 
- [MH], and MailDir are common examples of this scheme, where the mailbox is a directory , and each message is encoded as a single file. 
+ [MH] and MailDir store each [Email] in a file of its own.  
+ This scales much better than the typical FlatFile format, as it is easier to skip from message to message.  
+ It is also more robust against corruption, since mishaps in any single file can only affect one message at most.  
+ The drawback is that large MailBox~es require opening a lot of files and may heavily tax your FileSystem.  
+ An attempt is often made to solve this by the use of some kind of header index/cache,  
+ but no two programs (or even versions of the same program) agree on the format they use.
 A DataBase: 
  MicrosoftExchange does this, as well as [DBMail] on [Unix]. It is often the backend of choice for WebMail systems, as well. 
  
-There are, of course hybrid approaches, and, of course most of these approaches have various workarounds (indexing, offset tables, header cache files) to overcome the performance problems that they each suffer from. Alas, all of these workarounds and differing approaches tend to be application specific, which makes the vanilla formats more practical most of the time. 
+There are, of course, hybrid approaches, as well as various workarounds for each approach (indexing, offset tables, header cache files) to overcome the performance problems that it suffers from. Alas, all of these workarounds and differing approaches tend to be application-specific, which makes the vanilla formats more practical most of the time. 
  
 For a comparison of several different schemes, have a read of [http://www.washington.edu/imap/documentation/formats.txt.html], or for an even more subjective "discussion", [http://slashdot.org/article.pl?sid=01/01/27/0138202] ;)
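
To make the difference between the FlatFile and one-file-per-[Email] schemes concrete, here is a minimal sketch that uses the mailbox module from Python's standard library to write the same messages into an [MBox] file and into a MailDir, and then counts the files each layout produces. The helper names (make_message, build_layouts, count_files, subject_index), the addresses and the file names are made up for illustration. The subject_index function is only a toy stand-in for the kind of header index/cache mentioned above, not the format of any real mail program.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(n):
    """Build a small test message (hypothetical addresses and subject)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = f"Test message {n}"
    msg.set_content(f"Body of message {n}\n")
    return msg


def build_layouts(root, count=3):
    """Store the same messages as one mbox file and as a Maildir directory."""
    mbox_path = os.path.join(root, "inbox.mbox")
    maildir_path = os.path.join(root, "inbox.maildir")

    flat = mailbox.mbox(mbox_path)            # FlatFile: everything in one file
    per_file = mailbox.Maildir(maildir_path)  # one file per message
    for box in (flat, per_file):
        for n in range(count):
            box.add(make_message(n))
        box.flush()
        box.close()
    return mbox_path, maildir_path


def count_files(path):
    """Count the regular files that make up a mailbox on disk."""
    if os.path.isfile(path):
        return 1
    return sum(len(files) for _, _, files in os.walk(path))


def subject_index(box):
    """Toy header index: map each message key to its Subject header."""
    return {key: msg["Subject"] for key, msg in box.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mbox_path, maildir_path = build_layouts(root)
        print("mbox files:   ", count_files(mbox_path))
        print("maildir files:", count_files(maildir_path))
        box = mailbox.mbox(mbox_path)
        print(subject_index(box))
        box.close()
```

Note that subject_index has to open and parse every message to collect the headers, which for a large MailDir means one file open per message. That is the cost the header caches try to avoid.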