Optical Music Recognition

While studying at the University of Waikato I worked in the field of Optical Music Recognition (OMR), originally using software called CANTOR, developed by David Bainbridge.

OMR addresses the problem of taking a scanned sheet of music, which a computer initially recognises only as an image, and working out the musical semantics that the image represents.

My thesis, entitled “Coordinating Knowledge to Improve Optical Music Recognition”, was successfully defended in August 2006. A version suitable for viewing online is available for download. Contact me if you would prefer a version more suitable for printing (no coloured text, no hyperlinks on figures and references, etc.).

I am currently (July 2007) in the process of packaging up the source code created for this project, and hope to release it soon. Check back in a week!


A brief outline of the problems can be found in this paper (PDF, 340 kB), presented at the 2001 NZ Computer Science Research Students' Conference.
There are also slides available.

Or view my research proposal (PDF, 170kB).

Graphical Examples from Current Prototype

Here are some examples of output from the pattern recognition stages, based on the design described below. Note that these results are from a prototype system that is still a work in progress.

As of early 2003, feedback from the Assembly stage is used to correctly identify some objects that were previously unclassified. Slides are available (PDF, 280 kB).

Very General Idea

CANTOR currently passes execution through various stages of processing (see figure below). One problem with this approach is that any errors made in the early stages propagate through the later stages, compounding as they go.

Current framework
Framework of most currently existing OMR systems (at least until recently).
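A back-of-envelope sketch of why this matters (the stage names and the 95% per-stage accuracy figure are invented for illustration, not measured from CANTOR): in a strictly one-way pipeline, no later stage can correct an earlier mistake, so per-object accuracy compounds multiplicatively.

```python
# Illustrative only: four hypothetical one-way OMR stages, each assumed
# to handle 95% of objects correctly. Because information flows in one
# direction, errors accumulate with no chance of recovery.
stages = ["staff detection", "segmentation", "classification", "assembly"]
accuracy = 1.0
for stage in stages:
    accuracy *= 0.95
    print(f"after {stage}: {accuracy:.1%} of objects still correct")
# By the assembly stage, only about 81% of objects survive unscathed.
```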

What we would like to do is use facts and information gained in the later stages to provide feedback to the earlier stages, and to use this new context to (hopefully) minimise errors. For example, if the later stages detect an isolated accidental, we can infer that either it is not really an accidental (it was mis-detected and is in fact some other object), or there is a note head somewhere to its right that we failed to detect. Using this information, we could look again in that locality, paying special attention to double-checking both accidentals and note heads.

Proposed framework
What we would like to do.
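The isolated-accidental example above can be sketched in code. This is a hypothetical illustration of the feedback loop, not CANTOR's actual API: the glyph representation, the 30-pixel gap, and the confidence thresholds are all invented for the sake of the example.

```python
# Hypothetical sketch of the feedback idea: an assembly stage notices an
# isolated accidental and asks the recognition stage to look again nearby,
# this time with a lowered acceptance threshold.
from dataclasses import dataclass

@dataclass
class Glyph:
    kind: str         # e.g. "accidental", "notehead"
    x: float          # horizontal position on the staff
    confidence: float # classifier confidence in [0, 1]

def is_isolated(accidental, glyphs, max_gap=30.0):
    """An accidental with no note head shortly to its right is suspicious."""
    return not any(
        g.kind == "notehead" and 0 < g.x - accidental.x <= max_gap
        for g in glyphs
    )

def reexamine(locality, unclassified, lowered_threshold=0.4):
    """Feedback step: reconsider rejected objects inside the locality,
    accepting matches we would have discarded on the first pass."""
    x_min, x_max = locality
    return [
        g for g in unclassified
        if x_min <= g.x <= x_max and g.confidence >= lowered_threshold
    ]

# First pass: a sharp sign was recognised, but the faint note head to its
# right fell below the normal confidence threshold and was rejected.
recognised = [Glyph("accidental", x=100.0, confidence=0.9)]
rejected = [Glyph("notehead", x=118.0, confidence=0.55)]

for acc in [g for g in recognised if g.kind == "accidental"]:
    if is_isolated(acc, recognised):
        # Look again in the region just to the right of the accidental.
        recognised.extend(reexamine((acc.x, acc.x + 30.0), rejected))

print([g.kind for g in recognised])  # ['accidental', 'notehead']
```

The point of the sketch is the direction of information flow: the semantic knowledge that accidentals precede note heads, available only at assembly time, drives a second look at the pixel-level classification stage.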

Previous Work

At the University of Canterbury I completed my honours project in 1999, on musicians performing while reading music from a computer. PDF file (275 kB).

Last modified 18 Jul 2007. Email me - jrm @