
Serialisation refers to the conversion of in-memory data structures to an easily transmittable or storable format which can be deserialised to an exact copy of the original data structure. The name refers to the fact that while the in-memory data structure is usually nested and allows direct access to any of its parts, often scattered over memory, the serialised form is laid out sequentially: it lacks the tree of pointers necessary to easily refer to any part of the structure, but is collected in a single stream of data that is easy to treat as a unit. Transforming one form into the other is mostly straightforward. The main complications are cases such as circular references, where the serialiser has to record that an object has already been written instead of following the reference again.
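As a minimal sketch of the round trip described above, the following uses Python's standard pickle module on a small structure that contains a circular reference. The structure and its field names are made up for illustration.

```python
import pickle

# A nested structure with a cycle: the leaf points back at its parent.
node = {"name": "root", "children": []}
node["children"].append({"name": "leaf", "parent": node})

data = pickle.dumps(node)   # a single sequential byte string
copy = pickle.loads(data)   # the nested structure, rebuilt

# The copy is a separate object, and the cycle is preserved inside it.
assert copy is not node
assert copy["children"][0]["parent"] is copy
```

pickle keeps a memo of objects it has already written, which is one common way of coping with circular references.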

If serialisation is mainly used to persist data across invocations of a program, it is also referred to as persistence.
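Persistence in this sense might look like the sketch below, which keeps a run counter in a file between invocations. The helper names save_state and load_state and the file name are hypothetical, and a temporary directory is used here only to keep the example self-contained.

```python
import os
import pickle
import tempfile

def save_state(path, state):
    """Serialise state to a file so a later invocation can read it."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path, default):
    """Return the stored state, or default if nothing was saved yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default

# A real program would use a fixed, well-known path instead.
path = os.path.join(tempfile.mkdtemp(), "counter.pickle")

state = load_state(path, {"runs": 0})   # first run: falls back to the default
state["runs"] += 1
save_state(path, state)
```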

Within the confines of a single ProgrammingLanguage, many good solutions are available:

  • Perl: Storable and FreezeThaw
  • PHP: serialize()
  • Python: pickle (several protocol versions)

In contrast, solutions spanning multiple languages tend to be domain-specific.

One such domain is marshalling. This is a term used in the context of RPC, where the parameters passed to a function and the value returned from it have to be serialised to be transmitted across a network connection. Special care has to be taken with arguments that are passed by reference, since a pointer is meaningless on the remote side. Some form of IDL is commonly used to describe how a remotely called function's arguments are to be marshalled. Some languages also use an IDL for general serialisation. There are a number of standardised binary serialisation methods:

  • BER/DER, the ASN.1 encodings. These generate fairly compact, portable serialisations.
  • IIOP, used in CORBA. This requires the CORBA IDL to de-/serialise.
  • XDR (SunXDR) was invented at Sun for SunRPC, which is used for things such as NFS.
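To give a feel for what such a binary encoding involves, here is a rough sketch in the style of XDR, which writes integers as 4-byte big-endian values and strings as a 4-byte length followed by the bytes, padded to a multiple of four. The function names and the remote call being marshalled are hypothetical, and encoding the string as UTF-8 is an assumption of this sketch.

```python
import struct

def pack_int(value):
    # Signed 32-bit integer, big-endian.
    return struct.pack(">i", value)

def pack_string(text):
    # Length prefix, then the bytes, zero-padded to a 4-byte boundary.
    raw = text.encode("utf-8")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

def unpack_int(buf, offset):
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4

def unpack_string(buf, offset):
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    raw = buf[start:start + length]
    padded = length + (4 - length % 4) % 4
    return raw.decode("utf-8"), start + padded

# Marshal the arguments of a hypothetical remote call lookup(1000, "alice").
message = pack_int(1000) + pack_string("alice")

# The receiving side has to know the argument types in advance,
# which is the information an IDL provides.
uid, pos = unpack_int(message, 0)
name, pos = unpack_string(message, pos)
```

Note that the byte stream carries no type information of its own. Without the shared description of the call, the receiver could not tell where one argument ends and the next begins.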

In recent times, it has become fashionable to use XML as a serialisation format for RPC, as seen in SOAP, which is mostly pursued by IBM and MicrosoftCorporation, and in XML-RPC. This is a consequence of the challenge that language-neutral serialisation has always posed. XML's verbosity is beneficial for debugging but a pig on bandwidth.
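The following sketch marshals the arguments of a call as XML-RPC using the xmlrpc.client module from Python's standard library, without contacting any server. The method name example.call and the argument values are made up for illustration.

```python
import xmlrpc.client

# Marshal the parameters of a hypothetical call into an XML request body.
args = (2, "three", [4.0, True])
request = xmlrpc.client.dumps(args, methodname="example.call")

# The receiving side unmarshals the same text back into values.
params, method = xmlrpc.client.loads(request)

# Printing the request shows how many bytes of markup surround each value.
print(request)
print(len(request), "bytes for three arguments")
```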

YAML is another text-based serialisation format designed to be language-agnostic and easily human-readable and -writable. Implementations are available for most "agile" languages. It was designed with an eye to use in configuration files, but is not restricted to that purpose.

Another such format is JSON.
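A short sketch with Python's standard json module shows both the convenience and a limit of a language-neutral format: only the types that JSON itself knows survive the round trip, so the result is not always an exact copy. The sample data is made up for illustration.

```python
import json

original = {"name": "example", "tags": ("a", "b"), 1: "one"}

text = json.dumps(original)    # a plain text string
restored = json.loads(text)

# JSON has no tuples and only string keys, so the structure changes shape:
# the tuple comes back as a list and the integer key as the string "1".
assert restored == {"name": "example", "tags": ["a", "b"], "1": "one"}
assert restored != original
```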


CategoryProgramming