Seminars
Unless otherwise stated, all seminars are on Fridays at 4.00pm in Studio F at NYU (35 West 4th St., 8th Floor). If you or anyone you know would like to receive notification about these events, please subscribe to our mailing list by sending a blank email to: join-mtech-research@forums.nyu.edu
Additionally, if you or anyone you know would be interested in speaking on or presenting a relevant topic, please contact Andy Sarroff: andy.sarroff@nyu.edu
FUTURE SEMINARS
ISMIR 2008: Ninth International Conference on Music Information Retrieval
Abstract: The International Conference on Music Information Retrieval (still referred to by its historical acronym, ISMIR) is the first established international forum for research on the organization of music-related data. Given the tremendous growth of digital music and music metadata in recent years, methods for effectively extracting, searching, and organizing music information have received widespread interest from academia and the information and entertainment industries. The purpose of ISMIR is to provide a venue for the exchange of news, ideas, and results through the presentation of original theoretical or practical work. By bringing together researchers and developers, educators and librarians, students and professional users, all working in fields that contribute to this multidisciplinary domain, the conference also serves as a discussion forum, provides introductory and in-depth information on specific domains, and showcases current products.
PAST SEMINARS
Spring 2008 Seminars
Symposium - Interactive visualizations of music
Abstract:
The computer-based mapping of musical information to intuitive visual representations has many applications to the composition, analysis, education and performance of music, to name a few examples. This workshop will feature talks and demonstrations highlighting some of the most recent advances in the field, and will provide a space for discussing the methodological, artistic and scientific implications of this line of work.
Music Tech Open House
Abstract:
The Music Technology Program of New York University’s Steinhardt
School invites you to a showcase of the latest work by our students.
The event consists of posters, performances and interactive
demonstrations spanning a multitude of music and audio related
topics including:
The Megalo Project
Abstract: A digital meta-sketchbook for composers
The aim of the Megalo project is the creation of a digital meta-sketchbook for composers in order to document the creative process in a non-intrusive manner. Unlike paper sketchbooks, this program will be able to reconstruct the score, as well as its ancillary material, including keyboard improvisation, from any moment in its history. It will be possible to trace the history of any note in the score, from addition to deletion. The program will document the composer's usage of the program and record information from a MIDI keyboard by storing time-stamped entries into a MySQL database. By data-mining the database entries, it will be possible to gather detailed temporal information about the compositional process. This information can be used in a wide variety of ways; for example, a query might collect all the chords played in the last 2 hours. The full version of the project will analyze data collected from 20 composers while they each compose a short work for piano; through analyzing this data, I hope to develop generalized algorithms for describing patterns of compositional behavior.
The Hearing Conservation Workshop
Abstract: As professionals and specialists in the audio, music, and acoustics fields, many of us are exposed to unhealthy sound levels. This workshop is targeted to students and professionals in the audio and music industries and will be a rich resource of information for anyone who is concerned about protecting their hearing.
The workshop will begin at 3 PM and there will be an audiologist on site. Pizza and beverages will be served. We encourage all members of the NYU community to attend.
Acoustics and Audio Engineering at The Cooper Union
Abstract: The Albert Nerken School of Engineering at The Cooper Union is home to a laboratory dedicated to research and education related to sound. The Cooper Union Audio Lab (CUAL) hosts extensive scientific instrumentation, digital audio workstations, a portable 5000-watt JBL sound system, musical instruments, a DJ console, and a 900 ft³ full-coverage anechoic chamber. Research conducted in the facility includes topics such as loudspeaker design, musical (instrument) acoustics, and active noise control. The lab also supports several courses that address related topics and frequently involve project work. Past student projects have included breaking wine glasses with sound, articulated speaker arrays, bass trap design, and optimization of acoustical materials. Dr. Abbott serves as the lab’s director and will talk about this unique facility and its activities.
Motion Capture of Piano Performance: Do Movement Strategies and Touch Change Across Tempo?
Abstract: Piano educators disagree on how performers should develop the ability to perform scale passages evenly and dexterously at very fast rates. One side points out the importance of practicing these fast sequences at very slow speeds, while others insist on practicing at the intended fast tempo, easing the task by chopping up the passage into smaller segments. The main argument of the latter is that movement strategies change across different tempi -- as human gait changes from walking to running -- and wrong movements would be learned if fast passages were practiced slowly. Furthermore, how is the pianist’s touch affected by different tempo conditions? In this talk, I will present research on how to tackle such questions with modern optical methods of capturing human body movements. Twelve skilled pianists played simple isochronous melodies at different tempi ranging from medium (500 ms inter-onset interval, IOI) to very fast (75 ms IOI and shorter). A three-dimensional passive motion capture system (Vicon V460, 250 frames per second) tracked the movements of small reflective markers glued on pianists’ finger joints, hand and wrist. Kinematic features of finger and hand movements, such as finger-key landmarks, key-bottom landmarks, finger peak height, or wrist rotation, were computed from the motion trajectories. All measures changed considerably with increasing performance rate, suggesting that pianists do indeed adapt their movements to the required tempo. I will discuss the results in the light of the above-mentioned controversy and reflect on how such motion data could be used in broader contexts of music research.
Large-Scale Music Identification: Algorithms and Applications
Abstract: Music identification is the process of matching an audio stream to a particular song. Previous work has relied on hashing, where an exact or almost-exact match between local features of the test and reference recordings is required. In this work, we present a new approach to music identification based on finite-state transducers and Gaussian mixture models. We apply an unsupervised training process to learn an inventory of music phone units similar to phonemes in speech. We also learn a unique sequence of music units characterizing each song. We further propose a novel application of transducers, and in particular factor transducers, for recognition of music phone sequences. With a database size of 15,000 songs, our system achieves an identification accuracy of 99.5% on undistorted test data, and performs robustly in the presence of noise and distortions. Finally, to guarantee that our transducer representation of music phone sequences will scale gracefully as the number of songs in the database increases, we derive a novel bound on the size of a factor automaton. Specifically, we show that the factor automaton of a set of strings U has at most 2Q - 2 states, where Q is the number of nodes of a prefix-tree representing the strings in U. This new bound is a significant improvement over past results. This is joint work with Mehryar Mohri and Pedro Moreno.
Intelligent Tutoring Systems for Music Theory: A Knowledge-Based Programming Framework
Abstract: This talk presents a programming framework for implementing intelligent tutoring systems in music theory. An Intelligent Tutoring System (ITS) is broadly defined as an interactive computer environment that teaches students how to solve problems in a specific domain. More specifically, the talk describes the Counterpoint Tutor, an ITS that coaches students in species counterpoint. The first half of the talk outlines the general ITS framework that guides the design of the Tutor. We show how the system’s development can be based on detailed analysis of skill acquisition, drawing on techniques from Artificial Intelligence and cognitive psychology. We outline the possible role that the ITS approach may play in the study and development of music theory skills. The second half of the talk shows examples of how the system’s domain knowledge, such as counterpoint rules and procedures, can be readily implemented in the knowledge-based programming language Prolog.
Using acoustic pulse reflectometry to extract information about pipes
Abstract: Acoustic reflectometry is a non-invasive, time-domain method of identifying the geometry of an acoustical space. A sound pulse is injected into a space and the resulting impulse response details particular changes of impedance, which result from a cross-sectional area change or an elbow/T-intersection. Each cause of reflection, known as a scattering junction, has a distinct reflection contour. Previous works were able to identify these scattering junctions via algorithms that attempt to extract particular contours from the impulse response. In the present study, the prominent reflections of the space are observed, isolated, and then compared to a training database of all possible scattering junctions. This method eliminates the necessity to create a contour identification algorithm, as each scattering junction is defined based on its most similar neighbors in the training database. Preliminary results suggest that this computer-learning algorithm can successfully identify reflection contours of a space with varying cross-sectional areas from those that were stored in the training database, which suggests that this method could be a more efficient and versatile alternative to previous identification processes.
Computational modeling of music
Abstract: Two case studies in the computational modeling of music are presented. The first is a generative approach that is integrated into a graphical computer-assisted composition system called Hyperscore. One of the key features of Hyperscore is an automated harmonizer based on a model of functional tonal harmony. Graphical input from a user is mapped to the model, which then generates "grammatical" chord progressions. The second part discusses an analytical approach toward the implementation of a quantitative, parametric model for describing musical tension. Musical tension is a high-level musical concept difficult to formalize due to its subjective and multi-dimensional nature. The model is therefore derived from empirical data. Two experiments that ask subjects to respond to musical tension are described; both studies take into account a number of musical parameters including harmony, pitch height, melodic expectation, dynamics, onset frequency, tempo, and rhythmic regularity. Given the results of these studies, linear and nonlinear models for predicting tension given descriptions of the musical parameters are explored.
MRMR
Abstract: Mrmr is an ongoing open-source research project to develop a standardized set of protocols and syntax conventions to control live installations and multimedia performances via mobile devices. Its usefulness in multimedia performance and installations is particularly evident as it can enable a performer to easily create new touchscreen interfaces to control performances or interactive scenarios wirelessly via their mobile phone or PDA. In addition, Mrmr’s network-based approach is inherently multi-user. Not only can multiple users impact the same performance or installation, but they can do so from different interfaces with separate functionality exposed. For instance, in an audio-visual environment, one user could control the triggering of a number of different audio clips while a second user could control the amount and kind of filtering or effects processing. Meanwhile, a third user could impact the amount of audio-reactivity of the visuals. Creating a new user interface can be achieved either by sending special network messages to a Mrmr-capable mobile device, or by using a simple, graphical “interface builder” application to build an interface and then using it to send the pre-formatted messages to the device.
Modeling Music by Example
Abstract:
Machines have the power and potential to make music. Used as a tool or instrument, they require human input for transforming predefined material (sounds, patterns, algorithms, filters) into original music. However, they have not yet been trained to "create" on their own. This presentation goes through signal-based techniques for how one might go about modeling the life cycle of listening, composing, and performing. This is accomplished through an automatic analysis-synthesis approach that combines perceptual and structural modeling of the musical surface, which leads to a minimal data representation. Both sound and music modeling strategies are considered. Sound demonstrations to illustrate the presentation will include automatic mashups and beatmatching, music textures, cross-synthesis, compression, music restoration, and morphing.
EM localization and separation using interaural level and phase cues
Abstract:
This talk will describe our system for localizing and separating multiple sound sources from a reverberant two-channel recording. The talk begins with a characterization of interaural spectrograms for single source recordings, and a method for constructing probabilistic models of interaural parameters that are localized in both time and frequency. These models are then combined into a mixture model of sources and delays, which reduces the multi-source localization problem to the single source problem. The talk will then outline an Expectation Maximization algorithm for calculating the maximum likelihood parameters of this model, which correspond well with interaural parameters measured in isolation. As a byproduct of fitting this model, the algorithm creates probabilistic spectrogram masks which can be used for source separation. In experiments performed in simulated anechoic and reverberant environments, our system improved the signal-to-noise ratio of target sources by 2.7 and 3.4 dB more than two comparable algorithms on average.
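As a rough illustration of the Expectation Maximization idea mentioned in this abstract, the sketch below fits a two-component, one-dimensional Gaussian mixture to a handful of made-up "interaural cue" values. It is a toy stand-in, not the speakers' model: the data, the function names (`gaussian`, `em_gmm_1d`), the initial values, and the variance floor are all assumptions chosen for illustration. The per-point posteriors it returns play a role loosely analogous to the probabilistic masks described above.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, init_means, n_iter=30, var_floor=1e-4):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper, toy scale)."""
    k = len(init_means)
    means = list(init_means)
    variances = [0.1] * k          # assumed broad starting variance
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var_floor, var)
    # resp holds the responsibilities from the final E-step (soft "masks").
    return weights, means, variances, resp

# Made-up cue values clustered around two hypothetical source positions.
cues = [-0.6, -0.55, -0.5, -0.45, -0.4, 0.5, 0.55, 0.6, 0.65, 0.7]
weights, means, variances, masks = em_gmm_1d(cues, [-1.0, 1.0])
print("estimated means:", means)
```

Each row of `masks` sums to one, so thresholding or weighting by a column assigns observations softly to one source. A real system would apply this per time-frequency cell rather than to ten scalar values.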
Less is more: sparse representations of audio signals in overcomplete dictionaries.
Abstract:
This talk will focus on signal modeling using sparse decompositions in overcomplete dictionaries, with a strong focus on audio signals. In such models, a signal is approximated by combining a small number of elementary waveforms ("atoms"), taken from a very large collection ("dictionary"). This provides extra flexibility (e.g. apparently avoids time-frequency resolution constraints) but comes with increased complexity over standard analysis (e.g. Fourier-based). Greedy techniques have, however, been developed that provide near-optimal decompositions at reasonable computational cost, making them applicable to large-scale multimedia databases. After a general overview, I will discuss two recent applications (to be presented at the WASPAA'07 workshop): fine-grain scalable audio coding and the use of stereo parameters for instrument identification in polyphonic music.
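One well-known greedy technique of this kind is matching pursuit; the abstract does not say which algorithm the speaker uses, so the sketch below is only an illustration of the general idea. The dictionary (spikes plus cosines), the signal, and all function names are assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def build_dictionary(n):
    """Overcomplete dictionary: n spikes followed by n unit-norm cosine atoms."""
    spikes = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    cosines = [normalize([math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
               for k in range(n)]
    return spikes + cosines

def matching_pursuit(signal, atoms, n_iter):
    """Greedily pick the atom most correlated with the residual, then subtract it."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        corrs = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(corrs[j]))
        coef = corrs[best]
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coef))
    return picks, residual

n = 32
atoms = build_dictionary(n)
# Toy signal: one spike (atom 5) plus one cosine (atom n + 4).
signal = [3.0 * s + 2.0 * c for s, c in zip(atoms[5], atoms[n + 4])]
picks, residual = matching_pursuit(signal, atoms, n_iter=5)
print("selected atoms:", [index for index, _ in picks])
print("residual energy:", dot(residual, residual))
```

Because every atom has unit norm, each step removes exactly the squared coefficient from the residual energy, so the signal energy equals the residual energy plus the sum of squared coefficients. The cost per iteration is one inner product per atom, which is what keeps greedy methods tractable for large dictionaries.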
FM Synthesis: 40 Years and Singing Still
Abstract: In 1957, 50 years ago, Max Mathews at Bell Telephone Laboratories wrote the first sound synthesis program, which he went on to develop and later released as Music IV. Because such programs ran on mainframe computers at large institutions, the production of music was very slow and costly. Professor Chowning’s discovery in 1967 of frequency modulation synthesis (computationally efficient, with few but perceptually salient parameters and time-varying spectra) led to a rapid increase in music synthesized by computers; first by software synthesis, then by real-time hardware synthesis ten years later. With the development of MIDI and computer music, the result was widespread use of computers in music that continues to this day. Professor Chowning’s presentation will include sound-synchronous animations that demonstrate the development ranging from the first experiments from 40 years ago, the first breakthrough in 1971 (brass tones, Jean-Claude Risset), the second breakthrough in 1979 (getting the computer to sing) to his most recent composition, Voices.
How good do audio features need to be?
Abstract:
A report on several years of experiments around the notion of audio "features" and the presentation of some recent results on the evaluation of novel feature extractors on a 40,000 title database. This work is in the process of being published but some related material can be found here:
Realtime Multiple-Pitch Observation
Abstract:
In this presentation I will introduce a new approach for realtime multiple-pitch observation of musical instruments. The proposed algorithm is different from others in the literature both in its purpose and approach. It is intended not for continuous multiple-f0 recognition but rather for projection of the ongoing spectrum to pre-learned pitch templates. The decomposition algorithm, on the other hand, does not compromise signal processing models for pitches and consists of a novel algorithm for efficient decomposition of a spectrum using known pitch structures and based on sparse non-negative constraints. I will conclude the presentation with a real-time simulated demo and will showcase several other applications that the algorithm has found in computer music. Finally, the software is provided as MaxMSP and PureData objects for further explorations.
Visually-Driven Strategies of musical composition
Abstract:
In the last century, musical composition has been largely inspired by processes that, at first, don't seem to be related to music. These processes, from Schoenberg's serialism to Xenakis' statistics or Cage's chance music, explore new strategies of sound (or note) organisation. In the recent past, the digitalization of sound combined with the power of computers has enabled the processing of sounds at the sample level, a level of granularity below milliseconds. Technologies have also made it possible to create new interrelationships between sound and the visual modality. This talk will explore the reverse process: visually-driven strategies of composition. I will describe a commission from the National Portrait Gallery in London in which I produced a sonified portrait. The image is treated as a map of constraints for a multi-layer approach. The values extracted from the still image organise the overall form of the sound and also control sound effects such as granular synthesis and frequency filtering. I will go on to present a recent piece based on the sonification of animated images. The work, which draws on the idea of music processes, investigates the similarities between film articulations and musical articulations, and whether a visual art could be transposed into music.
Department of Music and Performing Arts Professions - 35 W. 4th Street, Suite 777 - New York, NY 10012 - (212) 998-5424