Endianness - Waikato Linux Users Group

“There are 10 kinds of people in the world; those who know binary and those who don't.” – Seen on the net.
“There are 01 kinds of people who know binary; little-endians and everyone else.” – zcat(1)

The order of bytes in a word. The names “Big-endian” and “little-endian” originate from the book “Gulliver’s Travels”, where a tribe of tiny people divide themselves into two factions in a ReligiousWar over which end they should cut their eggs open at – the big end, or the little end. In computer terms, big-endian CPUs store the most significant byte at the lowest byte address of a word and progress to less significant bytes at higher addresses, while little-endian machines start with the least significant byte and store progressively more significant ones. A C program demonstrates this:

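#include <stdio.h>

int main( void ) {
   /* unsigned, so 0xaabbccdd fits; assumes int is at least 32 bits wide */
   unsigned int x = 0xaabbccdd;
   const unsigned char *p = (const unsigned char *) &x;
   size_t i;

   /* print the bytes of x in order of increasing memory address,
      separated by single spaces */
   for( i = 0 ; i < sizeof x ; i++ )
      printf( i ? " %02x" : "%02x", p[ i ] );

   printf( "\n" );

   return 0;
}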

On an x86 system, which is a little-endian architecture, this will print dd cc bb aa, showing that the bytes aa bb cc dd were stored in “reverse” order. On a SPARC it prints the more intuitive aa bb cc dd.
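
The same definitions can be read in the other direction. As a minimal sketch (the function names load_le32 and load_be32 are made up for this page, not taken from any library), the following C functions assemble a 32-bit value from four bytes in memory, once under the little-endian rule and once under the big-endian rule. Because they use shifts instead of pointer casts, they should give the same answer whatever the byte order of the machine running them.

```c
#include <stdint.h>

/* Hypothetical helper: interpret four bytes as a little-endian value.
   The byte at the lowest address is the least significant. */
uint32_t load_le32( const unsigned char *p ) {
   return (uint32_t) p[ 0 ]
        | (uint32_t) p[ 1 ] << 8
        | (uint32_t) p[ 2 ] << 16
        | (uint32_t) p[ 3 ] << 24;
}

/* Hypothetical helper: interpret four bytes as a big-endian value.
   The byte at the lowest address is the most significant. */
uint32_t load_be32( const unsigned char *p ) {
   return (uint32_t) p[ 0 ] << 24
        | (uint32_t) p[ 1 ] << 16
        | (uint32_t) p[ 2 ] << 8
        | (uint32_t) p[ 3 ];
}
```

Given the bytes dd cc bb aa in memory, load_le32 yields 0xaabbccdd, while load_be32 yields 0xddccbbaa.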

Little-endian was often used in 8-bit machines, since it makes in-memory addition easy for them: load the first byte, increment it, and if it overflows, fetch the next byte and increment it. Larger architectures that had no 8-bit roots to begin with didn’t care and were usually big-endian. This explains why so many architectures other than x86 adopted big-endian order: SGI’s MIPS systems, IBM’s RS6000, POWER and PowerPC architectures, SunMicrosystems’ SPARC, and others are all big-endian systems. The DEC Alpha was a notable exception: it used little-endian order.
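
The increment described above can be sketched in C. This is only an illustration of the idea, not how any particular 8-bit CPU implements it; the function name increment_le is made up for this page. The point is that the loop walks memory in ascending address order and can stop as soon as a byte does not overflow.

```c
#include <stddef.h>

/* Hypothetical helper: add 1 to an n-byte little-endian integer at p,
   one byte at a time, starting at the lowest address.
   Returns 1 if the carry ran off the most significant byte, else 0. */
int increment_le( unsigned char *p, size_t n ) {
   size_t i;

   for( i = 0 ; i < n ; i++ ) {
      p[ i ] = (unsigned char)( ( p[ i ] + 1 ) & 0xff );
      if( p[ i ] != 0 )
         return 0;   /* this byte did not overflow, so we are done */
      /* the byte wrapped around to 0: carry into the next byte */
   }

   return 1;         /* every byte overflowed */
}
```

On a big-endian layout the same loop would have to start at the highest address and work downwards, so the code needs to know the length of the integer before it can touch the first byte.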

Besides the two obvious and common orders, there are also bizarre ones like the PDP-11’s 02 01 04 03, and various others, sometimes dubbed “middle-endian.” A possible explanation is that there was an attempt at backwards compatibility in the transition from 1-byte to 2-byte words, but not in that from 2-byte to 4-byte words.
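
To make the 02 01 04 03 order concrete, here is a small sketch in C. It assumes the usual description of this layout: a 4-byte value is split into two 2-byte words, the more significant word is stored first, and each word is stored little-endian. The function name store_pdp32 is made up for this page.

```c
#include <stdint.h>

/* Hypothetical helper: store v at p in "middle-endian" order.
   Assumption: the more significant 16-bit word comes first, and
   each 16-bit word is itself stored little-endian. */
void store_pdp32( unsigned char *p, uint32_t v ) {
   uint16_t hi = (uint16_t)( v >> 16 );      /* more significant word */
   uint16_t lo = (uint16_t)( v & 0xffff );   /* less significant word */

   p[ 0 ] = (unsigned char)( hi & 0xff );    /* low byte of high word */
   p[ 1 ] = (unsigned char)( hi >> 8 );      /* high byte of high word */
   p[ 2 ] = (unsigned char)( lo & 0xff );    /* low byte of low word */
   p[ 3 ] = (unsigned char)( lo >> 8 );      /* high byte of low word */
}
```

Storing 0x01020304 this way leaves the bytes 02 01 04 03 in memory, in ascending address order.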

Humans using a left-to-right writing system with Arabic numerals are big-endian: when we write “1234,” the 1 means one thousand and the 4 means four, so we write digits in order from the most to the least significant. Humans using right-to-left writing systems with Arabic numerals, such as Arabic and Hebrew, are little-endian, because they approach a number such as 1234 from the right, i.e. from the least significant digit first. Funnily enough, this means that even though the digits have the same geometrical sequence on paper, they are little-endian in one writing system and big-endian in the other.

Another peculiarity is found in spoken German (or written German with spelled-out numbers): tens and ones are arranged in little-endian order, e.g. “einundzwanzig” means “one-and-twenty.” However, the more significant digits are arranged in big-endian order: “dreihunderteinundzwanzig” means “three-hundred-and-one-and-twenty”. This is also sometimes seen in older English usage, e.g. “four-and-twenty blackbirds baked in a pie”.

Ultimately, it comes down to reading order versus logical consistency. Big-endian matches the order in which most of us read things, from left to right, while little-endian simplifies the relationship between three different numberings:

that of the binary digits of an integer—call this i
that of bits within a byte—call this b
that of bytes within a word—call this B

For instance, the bits in a byte are numbered 0-7. A byte can hold an unsigned integer in the range 0 .. 255. Does the bit numbered 0 represent the 2**0 digit, bit 1 represent 2**1, etc? It might or might not—this is a convention defined by the CPU architecture. And what happens with, say, a two-byte integer? Does byte 0 hold bits 0-7 and byte 1 hold bits 8-15, or vice versa? This is where the endianness of the CPU architecture comes in.

Suppose the numbering of a bit in an N-byte integer is j. In little-endian architectures, the following are always true:

b = j mod 8
B = j div 8
i = j

In big-endian architectures, the situation is more complicated. For example, in the Motorola 680x0 architecture, the first and third of the above equations still hold, while the second one becomes

B = (8 * N - 1 - j) div 8

while with the IBM PowerPC, all three equations are different:

b = 7 - (j mod 8)
B = (8 * N - 1 - j) div 8
i = 8 * N - 1 - j

As you can see, the simplest form of relationships is in the little-endian case.
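
The three sets of equations can be compared side by side with a short C sketch. It takes j to be the position of the binary digit, so that the bit in question has the value 2**j; this reading is an assumption, chosen because it is the one under which the equations above agree with each other. The struct and function names are made up for this page.

```c
/* B = byte within the word, b = bit within the byte,
   i = number of the binary digit, all as the architecture numbers them */
struct bit_position {
   unsigned B;
   unsigned b;
   unsigned i;
};

/* little-endian: none of the numberings depends on the size n */
struct bit_position le_position( unsigned j ) {
   struct bit_position r;
   r.B = j / 8;
   r.b = j % 8;
   r.i = j;
   return r;
}

/* Motorola 680x0 style: only the byte numbering is reversed */
struct bit_position m68k_position( unsigned n, unsigned j ) {
   struct bit_position r;
   r.B = ( 8 * n - 1 - j ) / 8;
   r.b = j % 8;
   r.i = j;
   return r;
}

/* IBM PowerPC style: all three numberings are reversed */
struct bit_position ppc_position( unsigned n, unsigned j ) {
   struct bit_position r;
   r.B = ( 8 * n - 1 - j ) / 8;
   r.b = 7 - ( j % 8 );
   r.i = 8 * n - 1 - j;
   return r;
}
```

For the least significant bit (j = 0) of a two-byte integer, le_position gives B = 0, b = 0, i = 0; m68k_position gives B = 1, b = 0, i = 0; and ppc_position gives B = 1, b = 7, i = 15. Only the little-endian function manages without knowing n.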