TYPE OF PROPOSAL: poster
TITLE: Optical Music Recognition: Stroke Tracing and Reconstruction of
Handwritten Manuscripts
KEYWORDS: document imaging, pattern, music

AUTHOR: Kia Ng
AFFILIATION: University of Leeds
E-MAIL: kia@kcng.org

CONTACT ADDRESS: University of Leeds, Dept of Music / Sch of 
Computing, Leeds LS2 9JT, UK.
FAX NUMBER: +44-113-233-2586
PHONE NUMBER: +44-113-233-2572



Nowadays, the computer is an important instrument in music. It can not only
generate sound (audio synthesis) but is also able to perform a wide range of
time-consuming and repetitive tasks, such as transposition and part
extraction, with speed and accuracy.  However, a score must be represented in
a machine-readable format before any operation can be carried out. Current
input methods, such as using an electronic keyboard, are laborious and
require human intervention. Optical Music Recognition (OMR) provides an
efficient and automatic method to transform paper-based music scores into a
machine representation.

The potential benefits of an Optical Music Recognition system were recognised
over thirty years ago.  A robust OMR system can provide a convenient and
time-saving input method to transform paper-based music scores into a machine
readable format for widely available music software, in the same way as
Optical Character Recognition (OCR) is useful for text processing
applications.

In this paper, we present a brief survey of Optical Music Recognition
developments and currently available commercial packages for printed music
scores, followed by an introduction to handwritten manuscript recognition
with a discussion of the obstacles associated with it.  We then discuss our
framework design and low-level pre-processing modules, and illustrate the
modules with example inputs and outputs.  Our prototype takes a digitised
music-score grey image (300 d.p.i. with 256 grey levels) as input.  An iterative
thresholding method is used to obtain a threshold value, and the image is
binarised.  From the binarised image, the skew of the input, usually
introduced during the digitisation process, is detected automatically
by reference to the music-typographical features of the roughly parallel
stave lines, and the image is deskewed by rotation.  This is followed by layout
analysis to determine the general normalisation factors, including the stave
line thickness.
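
The abstract does not say which iterative thresholding method is used.  As
a minimal sketch, assuming the common scheme in which the threshold is
repeatedly reset to the midpoint of the mean dark and mean light grey values
until it stabilises, binarisation could look as follows.  The function names
and the tolerance are illustrative assumptions, not part of the prototype.

```python
def iterative_threshold(pixels, tol=0.5, max_iter=100):
    """Pick a grey-level threshold by iterating on the two class means.

    `pixels` is a flat sequence of grey values (0 = black, 255 = white).
    Assumed scheme: midpoint of the dark-class and light-class means.
    """
    pixels = list(pixels)
    t = sum(pixels) / len(pixels)  # start from the global mean
    for _ in range(max_iter):
        dark = [p for p in pixels if p <= t]
        light = [p for p in pixels if p > t]
        if not dark or not light:
            break  # uniform image: nothing to separate
        new_t = (sum(dark) / len(dark) + sum(light) / len(light)) / 2
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t


def binarise(image, t):
    """Map a grey image (list of rows) to 1 = ink and 0 = paper."""
    return [[1 if p <= t else 0 for p in row] for row in image]
```

A caller would flatten the image, compute the threshold once, and pass it to
binarise; skew detection would then operate on the resulting 0/1 image.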

Stave line thickness and inter-stave-line spacing are used as normalisation
factors, as stave lines form a grid system for musical symbols, most of which
are related to the geometry of the staves: for example, the height of a note
head must approximate the distance between two stave lines plus the thickness
of the two stave lines.  Hence, the sum of the average distance between two
stave lines and the average stave-line thickness forms the fundamental unit used by
the classification process.
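
How the two averages are measured is not described here.  One plausible
sketch, assuming that the most frequent vertical black run approximates the
stave-line thickness and the most frequent vertical white run approximates
the inter-stave-line spacing, is given below.  The function names are
hypothetical, and the mode is used in place of the average as a stand-in.

```python
from collections import Counter


def vertical_runs(binary):
    """Yield (value, length) for every vertical run in a 0/1 image."""
    height = len(binary)
    width = len(binary[0])
    for x in range(width):
        start = 0
        for y in range(1, height + 1):
            # Close the run at the bottom edge or when the pixel value changes.
            if y == height or binary[y][x] != binary[start][x]:
                yield binary[start][x], y - start
                start = y


def fundamental_unit(binary):
    """Return (thickness, spacing, unit) estimated from run-length modes.

    Assumes the image contains at least one stave, so that both black and
    white runs exist and stave lines dominate the run-length histograms.
    """
    black = Counter()
    white = Counter()
    for value, length in vertical_runs(binary):
        (black if value == 1 else white)[length] += 1
    thickness = black.most_common(1)[0][0]
    spacing = white.most_common(1)[0][0]
    return thickness, spacing, thickness + spacing
```

Widths and heights of segmented objects can then be divided by the returned
unit so that features are independent of the scanning resolution.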

This paper focuses on our stroke-based segmentation approach, which combines
edge detection with mathematical morphology. We make use of the
edges, curvature and variations in relative thickness to extract and segment
the musical objects from the underlying grid (stave lines) and disassemble
them into lower-level graphical primitives, such as vertical and horizontal
lines, curves, ellipses and others.  These primitives are classified using a
k-Nearest-Neighbour (kNN) classifier with a feature vector of simple
measurements such as the aspect ratio and the normalised width and height.  After
recognition, these sub-segmented primitives need to be reconstructed.
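
To illustrate the classification step, the sketch below builds a feature
vector from a primitive's bounding box (aspect ratio, then width and height
divided by the fundamental unit) and labels it by majority vote among its k
nearest training examples.  The Euclidean distance, the value of k and the
training measurements are assumptions made for illustration only.

```python
import math
from collections import Counter


def features(width, height, unit):
    """Aspect ratio plus width and height normalised by the fundamental unit."""
    return (width / height, width / unit, height / unit)


def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest neighbours.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical bounding boxes in pixels, with a fundamental unit of 8 pixels.
UNIT = 8
TRAINING = [
    (features(2, 28, UNIT), "vertical line"),
    (features(2, 30, UNIT), "vertical line"),
    (features(3, 26, UNIT), "vertical line"),
    (features(30, 2, UNIT), "horizontal line"),
    (features(26, 3, UNIT), "horizontal line"),
    (features(28, 2, UNIT), "horizontal line"),
    (features(10, 8, UNIT), "note head"),
    (features(11, 9, UNIT), "note head"),
    (features(10, 9, UNIT), "note head"),
]
```

With this toy training set, knn_classify(features(2, 27, UNIT), TRAINING)
votes for "vertical line".  In practice the features would probably need
rescaling, since the aspect ratio of a long horizontal line is much larger
than the other two components.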

As with other forms of optical document analysis, such as OCR, imperfections
introduced during the printing and digitising processes, which are normally
tolerable to the human eye, can often complicate the recognition process.
Musical notation is highly interconnected, and features may connect
horizontally (for example beams), vertically (for example chords) or
sometimes be overlaid (for example, slurs cutting through stems or bar
lines).  Furthermore, when symbols are grouped (beamed), they may vary in
shape and size: for example, consider the shape of isolated semiquavers and
the many possible appearances of four-semiquaver groups.  To resolve such
ambiguities, contextual information is required: for example, a dot,
classified by the kNN classifier, could be an expression sign or a duration
modifier depending on its relative position with respect to a note-head
nearby.
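
The dot example can be made concrete with a small rule-based sketch.  It
assumes that a dot just to the right of a note-head, at about the same
height, is a duration modifier, and that a dot directly above or below a
note-head is an expression sign such as a staccato mark.  The distance
limits, expressed in fundamental units, are hypothetical values.

```python
def interpret_dot(dot, note_heads, unit):
    """Interpret a dot from its position relative to nearby note-heads.

    `dot` and each note-head are (x, y) centres in pixels; `unit` is the
    fundamental unit in pixels.  Note-heads are examined nearest first.
    """
    def squared_distance(head):
        return (head[0] - dot[0]) ** 2 + (head[1] - dot[1]) ** 2

    for nx, ny in sorted(note_heads, key=squared_distance):
        dx = (dot[0] - nx) / unit  # positive: dot lies to the right
        dy = (dot[1] - ny) / unit  # sign ignored: above and below both count
        if 0.5 < dx <= 2.0 and abs(dy) <= 0.5:
            return "duration dot"
        if abs(dx) <= 0.5 and 0.5 < abs(dy) <= 2.0:
            return "staccato"
    return "unresolved"
```

A dot that matches neither rule is left unresolved rather than guessed, so
that a later stage or a human editor can decide.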

After classification and reconstruction, basic musical syntax and a number of
high-level musical analysis techniques are employed to enhance the
recognition: the reconstructed results are re-examined in the light of the
analysis, and corrections and enhanced guesses are made if necessary.  We
attempt to detect global information such as the key and time signatures and
use them to provide evidence in the detection and correction of possible
mis-recognition.  It is clearly important that the OMR system should
accurately detect the time signature of a piece or section of a piece of
music in order to produce an accurate representation of the musical score.
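
As one illustration of how a detected time signature can provide evidence of
mis-recognition, the sketch below sums the recognised durations in each bar
and flags bars whose total differs from the length implied by the time
signature.  The single-voice representation (durations as fractions of a
semibreve) and the function name are assumptions for this example.

```python
from fractions import Fraction


def suspicious_bars(bars, numerator, denominator):
    """Return indices of bars whose total duration does not fit the metre.

    `bars` is a list of bars, each a list of Fraction durations measured in
    semibreves (for example, a crotchet is Fraction(1, 4)).
    """
    expected = Fraction(numerator, denominator)
    return [
        index
        for index, bar in enumerate(bars)
        if sum(bar, Fraction(0)) != expected
    ]
```

A flagged bar is only a candidate for correction: an anacrusis or a change of
time signature would also produce a mismatch without any recognition error.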

Many factors in the input source (for example, resolution, contrast, and
other inherent complexities) could influence the performance of the automatic
recognition and transcription process, especially for handwritten
manuscripts.  In order to provide flexible and efficient transcriptions, we
are designing a graphical user-interface editor with built-in basic musical
syntax and some contextual intelligence to resolve uncertainties and to assist
transcription and output.  The default output format is currently set to
ExpMIDI, which is compatible with the standard MIDI file format, and is
capable of representing and storing expressive symbols such as accents,
phrase markings and others.

We believe that the application of domain knowledge is essential for complex
document analysis and recognition, and that it is particularly important in
handwritten-manuscript recognition, since it takes years of experience for a
trained copyist or engraver to intelligently decipher poorly or
inconsistently written scores.