Penguin

A system which tracks versions, usually of SourceCode, but potentially of any digital content. It allows multiple people to work on a project, automatically coordinating everyone's changes with everyone else's, much like the wiki does. Having a project under version control makes it possible to roll back erroneous changes, find out when bugs or errors were introduced and by whom, maintain multiple branches of the same thing simultaneously without conflicts, and much more. Different VersionControlSystems offer different features in terms of multiple-file handling, MetaData versioning, distributed storage, security goals, etc.
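The basic bookkeeping behind rolling back changes and finding out when something was introduced, and by whom, can be illustrated with a minimal sketch. It assumes a single text file stored as a full snapshot per commit. The `History` class and its methods are hypothetical and do not correspond to any real system, which would typically store deltas and track many files at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class History:
    """Hypothetical in-memory history of one text file: every commit stores a full snapshot."""
    revisions: list[tuple[str, list[str]]] = field(default_factory=list)  # (author, lines)

    def commit(self, author: str, lines: list[str]) -> int:
        """Record a new snapshot and return its revision number (0-based)."""
        self.revisions.append((author, list(lines)))
        return len(self.revisions) - 1

    def checkout(self, rev: int) -> list[str]:
        """Return a copy of the file as it was at revision `rev`."""
        return list(self.revisions[rev][1])

    def rollback(self, author: str, rev: int) -> int:
        """Undo later changes by committing an old snapshot as a new revision."""
        return self.commit(author, self.checkout(rev))

    def introduced(self, line: str) -> tuple[int, str] | None:
        """Find the first revision (and its author) in which `line` appears, if any."""
        for rev, (author, lines) in enumerate(self.revisions):
            if line in lines:
                return rev, author
        return None
```

Note that a rollback here adds a new revision rather than deleting old ones, so the erroneous change stays in the history and can still be traced to its author.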

Popular VersionControlSystems include

See EricRaymond's essay Understanding Version Control for an introduction to the main concepts and a comparison of the major systems. The main stages of development can be summarized as follows:

  1. The design of the earliest systems revolved around versioning a single working copy, directly edited by all users. To prevent simultaneous modification of a single file, files could not be edited without first being checked out, and only one user at a time could check out any given file.

    Having to give each user access to the same machine and FileSystem in order to work on code was natural in the mainframe era, when these systems were designed, but today would obviously be a problem. Also, the requirement to check files out causes a lot of friction even in normal operation, since everyone has to wait on one another – not to mention that someone might forget to check a file back in before leaving on vacation.

  2. The next evolutionary step was to decouple the repository from the working copy, of which there may then be many. The exemplar of this class of systems, known as centralised VCSs, is CVS. It lifts the obvious restrictions of earlier systems with a design in which access to the repository is mediated by a server. Multiple users can collaborate by each checking out their own private working copy of the versioned tree, and checking out no longer implies locking. Instead, checking in changes is simply blocked if someone else has checked in other changes in the meantime. The latecomer has to manually incorporate the upstream changes before being allowed to check in their own.

    This works reasonably well, so CVS was the de facto standard for a decade. However, its single-repository design, adopted by most of the major systems that followed, perpetuates problems harking back to the earlier model and adds new ones.

    Checking in changes under such a system requires a network connection, as do most operations related to the project history, and networked operations are inevitably slow. (Systems like SubVersion try to selectively speed up some of these operations.)

    Anything checked in is always public. This means one has to be very careful about the state of commits, and it makes it impossible to touch up history (eg. to fix common mistakes like forgetting to include a new file in a commit). It also makes branches a big deal: no matter how experimental, their commits have to be published. Branch names are forced into a global namespace, so every experiment ends up visible to everyone.

    Branching is problematic for other reasons too. Most of these systems do not support merging branches very well: after you merge once, the changes from the merged-in branch are mixed in without any record of where they came from, so later attempts to merge the same branch again will result in lots of conflicts. This makes it very difficult to keep branches in synch, yet the further branches fall out of synch, the more effort it takes to merge them. All this adds up to a large barrier, psychological and otherwise, against branching.

    Finally, the single-repository nature means that anyone who wants the safety of revision control needs write access to the same repository. And since branching is badly supported, everyone with access to the repository generally works on the same trunk. This means write access has to be given out selectively to competent contributors, resulting in political headaches within projects, while outsiders are forced to create their patches in an unversioned ghetto.

  3. The solution to all this was to give each collaborator not only a separate working copy, but also a separate repository. This class of system is known as DistributedVersionControlSystems. The technical basis that makes this possible is algorithmic merging: 3-way merging allows non-overlapping changes to be combined automatically, and merge point tracking allows branches to be merged repeatedly without unnecessary conflicts.

    Since each collaborator has their own repository and can make commits, everyone effectively has their own private branch, with full versioning for local changes, and these branches can easily be published and merged. In fact, each collaborator often has several local branches – since merging is easy and branches never need to be published, it is easy to create short-lived branches for experiments or tests, to use them as part of the general workflow (eg. start a new branch for every separate bug fix), or for any other purpose, whether intended for public consumption or not.

    Everyone has full offline access to the project history, and all repository operations (except pushing or pulling changes, obviously) take place at full local disk speed.

    All this immensely accelerates collaborative development and removes the political headaches surrounding commit access.

    From this point on, the evolution of VCSs has basically been about developing increasingly sophisticated architectures for handling merging.
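The exclusive check-out rule of the earliest systems (stage 1) amounts to a per-file lock on a single shared working copy. The following is a minimal, hypothetical model. `LockingRepository`, `check_out`, `edit` and `check_in` are illustrative names, not the interface of any real tool.

```python
class LockingRepository:
    """Hypothetical model of stage 1: one shared working copy, exclusive per-file check-out."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)           # the single working copy everyone edits
        self.locks: dict[str, str] = {}    # filename -> user currently holding the check-out

    def check_out(self, user: str, name: str) -> bool:
        """Grant the lock unless another user holds it; return whether `user` now holds it."""
        holder = self.locks.setdefault(name, user)
        return holder == user

    def edit(self, user: str, name: str, text: str) -> None:
        """Editing is refused unless `user` has the file checked out."""
        if self.locks.get(name) != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        self.files[name] = text

    def check_in(self, user: str, name: str) -> None:
        """Release the lock so the next user can check the file out."""
        if self.locks.get(name) == user:
            del self.locks[name]
```

Everyone else stays blocked on a file until its holder calls `check_in`, which is exactly the friction (and the vacation problem) described under stage 1.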
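In the centralised model (stage 2), checking out no longer locks anything. Instead, the server refuses a commit whose working copy was not based on the latest revision, and the latecomer must update first. A minimal hypothetical sketch, assuming a single file stored as whole snapshots. `CentralServer` and its methods are illustrative only.

```python
class CentralServer:
    """Hypothetical model of stage 2: commits are accepted only on top of the latest revision."""

    def __init__(self, text: str) -> None:
        self.revisions = [text]  # revision n is self.revisions[n]

    def head(self) -> int:
        return len(self.revisions) - 1

    def update(self) -> tuple[int, str]:
        """What a client fetches to refresh its working copy: (revision, contents)."""
        return self.head(), self.revisions[-1]

    def commit(self, base: int, text: str) -> int:
        """Reject the commit if someone else checked in since the client's last update."""
        if base != self.head():
            raise RuntimeError(f"out of date: based on r{base}, server is at r{self.head()}")
        self.revisions.append(text)
        return self.head()
```

Each commit here is a network round trip to the one server, and the history lives only there, which is the root of the stage 2 drawbacks listed above.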
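The 3-way merging mentioned in stage 3 compares both modified versions against their common base. A region changed on only one side takes that side's version, a region changed identically on both sides is taken once, and anything else is a conflict. The sketch below is a simplified line-based merge in the spirit of diff3, using Python's standard `difflib` to align each side with the base. `merge3` and `_unchanged` are hypothetical helpers, not the algorithm of any particular VCS.

```python
from difflib import SequenceMatcher


def _unchanged(base: list[str], other: list[str]) -> dict[int, int]:
    """Map each base line index to its index in `other`, for lines difflib aligns as unchanged."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], int]:
    """Simplified line-based 3-way merge; returns (merged lines, number of conflicts)."""
    in_ours, in_theirs = _unchanged(base, ours), _unchanged(base, theirs)
    merged: list[str] = []
    conflicts = 0
    i = o = t = 0  # current positions in base, ours, theirs
    while True:
        # Advance to the next base line left unchanged by both sides (a sync point).
        j = i
        while j < len(base) and not (j in in_ours and j in in_theirs):
            j += 1
        o_end = in_ours[j] if j < len(base) else len(ours)
        t_end = in_theirs[j] if j < len(base) else len(theirs)
        b_chunk, o_chunk, t_chunk = base[i:j], ours[o:o_end], theirs[t:t_end]
        if o_chunk == b_chunk:            # only they changed this region (or nobody did)
            merged += t_chunk
        elif t_chunk == b_chunk or o_chunk == t_chunk:  # only we changed it, or both identically
            merged += o_chunk
        else:                             # overlapping, different changes
            conflicts += 1
            merged += ["<<<<<<< ours", *o_chunk, "=======", *t_chunk, ">>>>>>> theirs"]
        if j == len(base):
            break
        merged.append(base[j])            # the shared, unchanged line itself
        i, o, t = j + 1, o_end + 1, t_end + 1
    return merged, conflicts


# Non-overlapping edits on the two sides combine without conflict.
base = ["a", "b", "c", "d", "e"]
ours = ["a", "B", "c", "d", "e"]    # we changed line 2
theirs = ["a", "b", "c", "d", "E"]  # they changed line 5
merged, conflicts = merge3(base, ours, theirs)
# merged == ["a", "B", "c", "d", "E"], conflicts == 0
```

Only regions where both sides made different changes need human attention; everything else is resolved by comparison with the base alone.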
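Merge point tracking can be sketched by recording each merge as a commit with two parents, so that the history forms a directed acyclic graph. The base for a 3-way merge is then a nearest common ancestor of the two branch tips. The hypothetical `merge_bases` helper below returns the common ancestors that are not themselves ancestors of another common ancestor. The commit names are invented for illustration.

```python
def ancestors(parents: dict[str, tuple[str, ...]], commit: str) -> set[str]:
    """All commits reachable from `commit` through parent links, including itself."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents: dict[str, tuple[str, ...]], a: str, b: str) -> set[str]:
    """Common ancestors of a and b that are not proper ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# Hypothetical history: trunk t1 and feature f1 both start from root,
# m1 merges f1 into trunk, then both branches move on (t2, f2).
history = {
    "root": (),
    "t1": ("root",),
    "f1": ("root",),
    "m1": ("t1", "f1"),  # merge commit recording both parents
    "f2": ("f1",),
    "t2": ("m1",),
}
first_base = merge_bases(history, "t1", "f1")   # {"root"}
second_base = merge_bases(history, "t2", "f2")  # {"f1"}: the earlier merge point

# Same history, but the merge recorded as an ordinary single-parent commit.
untracked = dict(history, m1=("t1",))
untracked_base = merge_bases(untracked, "t2", "f2")  # {"root"}
```

With the merge recorded, the second merge uses `f1` as its base and only has to consider the feature branch's new commit `f2`. Without that record, as in the systems described under stage 2, the base falls back to `root`, so the already-merged changes from `f1` are compared all over again.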

Each development took a long time, much of which was spent simply coming to recognize that there was a problem that needed to be solved. Many of the new developments remain controversial among adherents of older ways of doing things to this day.


CategoryVersionControl