Penguin
Note: You are viewing an old revision of this page. View the current version.

There are a number of commonly used ways to represent tree structures in a database, each suited for different access patterns. This page attempts to outline the most common ones, along with what operations are easy and hard to do from a database.

Adjacency List

The Adjacency List model is the most common and most obvious way of representing a tree structure in a database. Unfortunately, retrieving all descendants of a node, usually the most common operation, is difficult and expensive, requiring many queries on large trees.

 Columns   parent_id 
 Cheap operations   Retrieve parent, Retrieve children, Insert, Change parent, Delete 
 Expensive operations   Retrieve tree, Retrieve descendants, Retrieve ancestors 

Adjacency list with root ID

This method is similar to the Adjacency List method, with the exception that it stores the ID of the root of the tree in addition to the ID of the direct parent node. When storing multiple trees in a table, such as threads in a discussion board, this makes retrieving the entire tree easy. Retrieving descendants of a small part of the tree remains difficult, however.

 Columns   root_id, parent_id 
 Cheap operations   Retrieve parent, Retrieve children, Insert, Change parent, Delete, Retrieve tree 
 Expensive operations   Retrieve descendants, Retrieve ancestors 

Ancestor List

This method works by storing in each node a complete list of ancestors, from the root down. The list is typically either stored as a delimited string (eg, "1,24,500") or as an array of IDs. For ease of querying, the id of the node itself is usually included as the last element in the list. This method makes most tree operations easy, but requires the ancestor list for the parent node to be retrieved before a node can be inserted. On deep trees, this method also results in very large lists, resulting in a lot of storage overhead.

 Columns   ancestors (array or string) 
 Cheap operations   Retrieve parent, Retrieve children, Insert, Change parent, Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors 
 Expensive operations   None 

Nested Set

This method works by assigning a 'left' and 'right' ID to each node. Nodes are numbered in depth-first order, so every node is guaranteed to have a 'left' value greater than the 'left' of all its ancestors, and a 'right' value greater than the 'right' of all its ancestors, and sorting a (sub)tree by its 'left' values will return it in depth-first order.

This is also the only method listed here that allows nodes to have an explicit internal ordering of children of each node other than any external ordering provided.

Note that 'retrieve parent' and 'retrieve children' are listed as expensive operations because they are non-optimal. However, they are still no less efficient than 'retrieve ancestors' and 'retrieve descendants'.

'Delete' is listed as a cheap operation under the assumption that 'left' and 'right' IDs do not need to remain contiguous.

 Columns   left, right 
 Cheap operations   Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors 
 Expensive operations   Retrieve parent, Retrieve children, Insert, Change parent 

Nested Set with Adjacency List

This method functions as a combination of the Nested Set and Adjacency List methods, conferring most of the benefits of each.

 Columns   parent_id, left, right 
 Cheap operations   Retrieve parent, Retrieve children, Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors 
 Expensive operations   Insert, Change parent