Differences between version 3 and previous revision of MailBox.
Newer page: | version 3 | Last edited on Sunday, October 17, 2004 6:10:15 am | by AristotlePagaltzis |
Older page: | version 2 | Last edited on Sunday, October 17, 2004 5:23:08 am | by AristotlePagaltzis |
@@ -2,13 +2,19 @@
There are many different physical formats for MailBox~es, which typically use some variation on the following schemes:
A FlatFile:
- [MBox] is the most common example of this, where all the messages in a MailBox are encoded into a single file.
+ All the messages in a MailBox are stored in a single file.
+ [BSD]'s and [Solaris]' [MBox] format is the most common example.
One file per [Email]:
- [MH], and MailDir are common examples of this scheme, where the mailbox is a directory, and each message is encoded as a single file.
+ [MH] and MailDir store each [Email] in a file of its own.
+ This scales much better than the typical FlatFile format, as it is easier to skip from message to message.
+ It is also more robust against corruption, since mishaps in any single file can only affect one message at most.
+ The drawback is that large MailBox~es require opening a lot of files and may heavily tax your FileSystem.
+ An attempt is often made to solve this by the use of some kind of header index/cache,
+ but no two programs (or even versions of the same program) agree on the format they use.
A DataBase:
MicrosoftExchange does this, as well as [DBMail] on [Unix]. It is often the backend of choice for WebMail systems, as well.
-There are, of course hybrid approaches, and, of course most of these approaches have various workarounds (indexing, offset tables, header cache files) to overcome the performance problems that they each suffer from. Alas, all of these workarounds and differing approaches tend to be application specific, which makes the vanilla formats more practical most of the time.
+There are, of course, hybrid approaches, as well as various workarounds for each approach (indexing, offset tables, header cache files) to overcome the performance problems that it suffers from. Alas, all of these workarounds and differing approaches tend to be application specific, which makes the vanilla formats more practical most of the time.
For a comparison of several different schemes, have a read of [http://www.washington.edu/imap/documentation/formats.txt.html], or for an even more subjective "discussion", [http://slashdot.org/article.pl?sid=01/01/27/0138202] ;)
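
The difference between the FlatFile and one-file-per-[Email] schemes can be seen on disk with a small sketch. It is not part of either revision; it assumes Python's standard mailbox module, and the file names (inbox.mbox, inbox.maildir), addresses and helper functions are made up for illustration. The same messages are stored once as an MBox and once as a MailDir, and then counted the way each format forces you to: by scanning the whole flat file for "From " separator lines, or by listing a directory.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(n):
    """Build a tiny test message; the addresses are placeholders."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = f"Test message {n}"
    msg.set_content(f"Body of message {n}\n")
    return msg


def build_mailboxes(root, count=3):
    """Store the same messages once as mbox and once as Maildir under root."""
    mbox_path = os.path.join(root, "inbox.mbox")        # hypothetical name
    maildir_path = os.path.join(root, "inbox.maildir")  # hypothetical name

    mbox = mailbox.mbox(mbox_path)
    maildir = mailbox.Maildir(maildir_path, create=True)
    for n in range(count):
        mbox.add(make_message(n))
        maildir.add(make_message(n))
    mbox.flush()
    mbox.close()
    maildir.close()
    return mbox_path, maildir_path


def count_from_lines(mbox_path):
    """Count messages the flat-file way: read every line, look for 'From '."""
    with open(mbox_path, "rb") as fh:
        return sum(1 for line in fh if line.startswith(b"From "))


def list_maildir_files(maildir_path):
    """List message files; Maildir.add() delivers plain messages into 'new'."""
    return sorted(os.listdir(os.path.join(maildir_path, "new")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mbox_path, maildir_path = build_mailboxes(root)
        print("mbox is a single file:", os.path.isfile(mbox_path))
        print("messages found by scanning:", count_from_lines(mbox_path))
        print("files in Maildir 'new':", len(list_maildir_files(maildir_path)))
```

Counting messages in the MBox means reading the entire file, while the MailDir count only needs a directory listing. On the other hand, reading the headers of every MailDir message means opening every one of those files, which is the cost the header index/cache workarounds try to avoid.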