
Since disks internally use 512-byte sectors to store data anyway, most FileSystems use some multiple of this size (typically 2,048, 4,096, or 8,192 bytes) as the smallest unit in which to store a file; this unit is the cluster. This means that regardless of its size, a file always occupies the smallest multiple of the cluster size that it fits into. File sizes are rarely exact multiples of the cluster size, so the last cluster of a file is usually only partly filled, and the larger the cluster size is, the more space is wasted. At worst, an entire cluster is allocated to store a single byte.
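The rounding described above can be sketched in a few lines of Python. This is only an illustration: the function names `allocated_size` and `slack` are made up for this example, and treating an empty file as occupying no cluster is an assumption, not something every FileSystem does.

```python
def allocated_size(file_size: int, cluster_size: int) -> int:
    """Bytes occupied on disk: file_size rounded up to whole clusters."""
    if file_size == 0:
        return 0  # assumption: an empty file needs no data cluster
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not used by its contents."""
    return allocated_size(file_size, cluster_size) - file_size


if __name__ == "__main__":
    # Compare the waste for a 1-byte file and a 10,000-byte file.
    for cluster_size in (2048, 4096, 8192):
        print(cluster_size, slack(1, cluster_size), slack(10_000, cluster_size))
```

With 8,192-byte clusters, the 10,000-byte file wastes 6,384 bytes, while with 2,048-byte clusters it wastes only 240.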

It may seem that this makes it desirable to choose a cluster size as small as possible, but that is not so. The smaller the clusters are, the more of them there are on a disk, and the more MetaData is needed to keep track of their use. With modern HardDisks offering hundreds of gigabytes of space, this can easily mean keeping track of hundreds of megabytes of MetaData. That can have a dramatic impact on performance, since reading or writing a lot of data also means reading or writing a lot of MetaData, which is typically located at least some distance away from the data on disk.
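A rough back-of-the-envelope calculation shows how quickly the bookkeeping grows as clusters shrink. The sketch below assumes a hypothetical FileSystem that spends a fixed 4 bytes of MetaData per cluster; real FileSystems differ widely, so the figures are only indicative. The names `cluster_count` and `tracking_overhead` are invented for this example.

```python
GIB = 2**30
MIB = 2**20


def cluster_count(disk_bytes: int, cluster_size: int) -> int:
    """Number of whole clusters that fit on the disk."""
    return disk_bytes // cluster_size


def tracking_overhead(disk_bytes: int, cluster_size: int,
                      bytes_per_entry: int = 4) -> int:
    """MetaData bytes needed if every cluster costs bytes_per_entry.

    bytes_per_entry=4 is an assumed, illustrative figure.
    """
    return cluster_count(disk_bytes, cluster_size) * bytes_per_entry


if __name__ == "__main__":
    disk = 200 * GIB  # a disk in the "hundreds of gigabytes" range
    for cluster_size in (512, 4096, 65536):
        overhead = tracking_overhead(disk, cluster_size)
        print(f"{cluster_size:>6} byte clusters: {overhead / MIB:.1f} MiB")
```

Under these assumptions, a 200 GiB disk needs 200 MiB of tracking data with 4,096-byte clusters, eight times as much with 512-byte clusters, and only 12.5 MiB with 64 KiB clusters.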

Over the years, there have been numerous attempts to increase the proximity of data and MetaData in order to combat the effects of this drastic increase in MetaData. The latest approach in modern FileSystems is to use BTrees, which seems to work very well, at the unfortunate cost of much more fragile MetaData structures.

The default cluster size for Ext2/Ext3 is 4,096 bytes (i.e. 8 sectors), but it may be changed at FileSystem creation time. Microsoft tried to overcome the inadequacies of their FAT FileSystems by using huge cluster sizes of up to 64 KB.