Penguin

A FileAllocationTable (FAT) is a map maintained by the FileSystem on a HardDisk to keep track of which DiskClusters a file has been stored in. If it spans several of them, they need not be sequential, but rather may be widely scattered over the disk as the situation requires.

It is also a synonym for the FileSystem family that uses it.

This is the only type of FileSystem supported by MS-DOS and the MS-DOS based Windows versions. It went through several iterations as typical capacity of storage media increased, each named after the bit size of an array entry in the FAT: FAT12, FAT16, FAT32. It has several design flaws that have a severe negative impact on performance and, worse, reliability.

FAT has been deprecated by MicrosoftCorporation. Since it has no provisions to store per-user permissions, NTFS has always been the default for the Windows NT line of OperatingSystems. Now that the NT descendant Windows 2000 has obsoleted the MS-DOS based Windows flavours, everyone is recommended to use that.

Structure of the FileAllocationTable

The FAT is organized as an array as large as there are clusters on the disk, each entry in the array storing the number of a DiskCluster which contains the next piece of a file. This way, the FileSystem driver can "hop" from cluster to cluster, collecting the pieces of the file. In the original 12-bit version, FAT12, each entry can have one of the following values:

|^Value|^Meaning |000 | The cluster is unused and available |001 - FF6 | The cluster number of the next cluster in the file. |FF7 | The cluster has a bad sector in it and will not be allocated |FF8 - FFF | The last claster of a file

See FAT12, FAT16, and FAT32 for (some) details on the changes during the evolution of FAT.

Filenames under FAT

Until late in its history, a directory entry in FAT consisted of 8 characters for the name of the file and 3 characters for its "extension".

In the MS-DOS 1.0 design of FAT12, there was no support for subdirectories. You could only store files in the root directory of a drive, which had space for 256 entries. The common media at that time were 180Kb floppies.

In order to better support HardDisks, support for a hierarchical directory structure was hacked into MS-DOS 2.0 along with the addition of FAT16. For a long time afterwards, nothing changed.

At long last, with the introduction of Windows 95, MicrosoftCorporation added another ugly hack to FAT16 that allowed multiple directory entries to be considered as one. In this flavour, directory entries that have a previously illegal combination of flags set are considered to belong to the immediately previous valid entry, and contain its LFN, ie "long filename", piecemeal as a series of Unicode UCS-2 characters. Since directory entries are 32 bytes long and long filenames may be up to 256 characters long, several of these LFN entries may exist per file. The directory entry they belong to contains its file's so-called short name.

Around that time, HardDisks started to reach the limits of the FAT16 storage capacity. The final version of this FileSystem, FAT32 was hacked into Windows 95B. The Windows NT line did not have support for this new FAT derivative until Windows 2000 came about.

Problems

Insufficient redundancy

A FAT is unreliable. A single bit error may lead to inconsistencies with several files - they may become crosslinked, with one of the chains pointing to cluster in the other chain. Since allocation of disk space happens in the same step as assigning it to a file, a crash can easily lead to unduly or incompletely allocated space. A crash during rewriting or deleting a file can lead to the a part of the existing chain persisting, without any file pointing to it, thus leaving it inaccessible. Microsoft tried to mitigate these problems by storing a backup version of the table on the partition, but only managed to exacerbate the problem by requiring a fragile procedure to succeed multiple times. In addition, there is no indication of which copy is authoritative, when they differ.

Better FileSystems manage disk space using a separate bitmap for allocation of clusters and Inodes for assigning them. The latest in advances are attempts to replace all of these with BTrees, as in ReiserFS.

FAT location

The entire FileAllocationTable is located en bloc at the beginning of the disk. This means that any access to file MetaData requires moving the HardDisk heads a potentially far distance away. It would have been a better choice to at least place the table in the center of the disk.

Better FileSystems spread their MetaData around the disk, so it is relatively close to the files it belongs to.

Storage Allocation

FAT uses a FirstFit method to allocate DiskClusters to files. According to the "MS-DOS Operating System Programmer's Reference Manual" (for DOS 2.0), "this permits the most efficient utilization[sic? of disk space because clusters made available by erasing files can be allocated for new files." Unfortunately, this also causes severe file fragmentation. To combat fragmentation, defragmentation programs were written - first by third party vendors, but later also by MicrosoftCorporation themselves. These move DiskClusters around so that all files end up in one piece , one after the other, at the beginning of the disk. Unfortunately, FirstFit strategy means that any grown file's new clusters will now end up behind the mass of allocated ones, causing fragmentation to affect performance even more gravely.

Better FileSystems avoid these problems by spreading files around, so there is space left "behind" a file for when it has to be grown. Some also store small files right inside their MetaData structures. The latest in advances are attempts to keep both MetaData and the files themselves in a BTree, as in ReiserFS, thus ensuring that related pieces of data end up close to each other.

Long filename support

This feature's late addition had multiple unpleasant effects:

  1. When iterating over the list of files in a directory, both the short and long name of a file are reported, separately, so care must be taken to avoid processing files twice.

  2. MS-DOS 7.0 was merely hacked to keep the LFN entries intact, but does not use them. Only the native 32-bit FAT driver used by Windows contains code to manipulate them, and provides APIs to do so. Therefor, DOS based programs cannot even be aware of the existence of LFNs unless running in a DOS box under Windows.

  3. A lot of system maintenance utilities from before the release of Windows 95 considered the new LFN entries to be invalid crud as per the previous spec and either refused to work, or, worse, removed them. Since a lot of the system configuration of a Windows 95 installation refers to files by their long names, it won't even boot further than MS-DOS without them. A lot of money was made from selling tool upgrades around that time.

Tips

Use a disk Cache. Even a tiny Cache of a few hundred Kb will often have a dramatic effect when using a FAT FileSystem, due to a synergy of effects of its flaws:

  1. The MetaData tightly packed in a single block, which makes it an ideal candidate for caching.
  2. It is accessed very frequently, which keeps it in cache.
  3. It is often far away from the data itself, so serving it from the Cache saves huge amounts of HardDisk head movements.

Beware though: running the Cache in WriteBack mode will further jeopardize the reliability of the already fragile FAT. Your best bet is to stick with WriteThrough.

Also, never leave the disk Cache active while running system maintenance utilities such as a defragmentation tool.