Research
Interest Areas
Digital Signal Processing
Machine Learning
Modulation Spectral Features
Audio/Image Data Mining
Music Information Retrieval
Doctoral Thesis
Co-Advisors: Dr. Aaron D. Lanterman and Dr. David V. Anderson
Title: A Framework for Exploiting Modulation Spectral Features in Music Data Mining and Other Applications
Summary: The term "modulation" brings diverse concepts to mind, such as the
transmission of sound over radio waves via amplitude modulation (AM) or frequency modulation (FM), a musician modulating between "keys" within a musical piece, or an audio engineer using special effects to synthesize a sound. The concept of "modulation frequency" is perhaps best known from AM and FM radio, where music and talk shows are broadcast "on air" via a transmitted carrier frequency that is modulated in amplitude or frequency by the message, or modulating, signal. The radio receiver is tuned to the carrier frequency and recovers the broadcast signal by demodulating it into its modulator and carrier parts. The frequency of the modulator is usually much lower than the carrier frequency, so a slowly varying envelope, or modulator, forms in the time domain for AM and in the frequency domain for FM. When a signal is decomposed into frequency bands, demodulated into modulator and carrier pairs, and portrayed in a carrier-frequency-versus-modulator-frequency domain, significant information about the signal can be observed automatically. We refer to this domain as the modulation spectral domain.
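The AM demodulation described above can be sketched in a few lines. This is a toy illustration, not part of the thesis framework: the sample rate, carrier, and modulator frequencies are assumed values, and the envelope is recovered with the standard analytic-signal (Hilbert transform) method.

```python
import numpy as np
from scipy.signal import hilbert

# Toy AM example (illustrative values, not from the thesis):
# a 1000 Hz carrier amplitude-modulated by a slow 5 Hz message.
fs = 8000                       # sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)   # 1 second of samples
fc, fm = 1000.0, 5.0            # carrier and modulator frequencies

modulator = 1.0 + 0.5 * np.cos(2 * np.pi * fm * t)  # slowly varying envelope
signal = modulator * np.cos(2 * np.pi * fc * t)     # AM signal

# Demodulate: the magnitude of the analytic signal recovers the envelope,
# because the modulator varies much more slowly than the carrier.
envelope = np.abs(hilbert(signal))

# The recovered envelope tracks the modulator closely
# (edges excluded to avoid end effects of the Hilbert transform).
err = np.max(np.abs(envelope[200:-200] - modulator[200:-200]))
print(f"max envelope error: {err:.4f}")
```

Because the modulator frequency (5 Hz) is far below the carrier (1000 Hz), the envelope separates cleanly; the same intuition underlies the band-by-band demodulation described above.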
The objective of this thesis is to develop a framework for extracting "modulation spectral features" (features in the modulation spectral domain) from music and other signals. The purpose of extracting features from these signals is to perform data mining tasks such as unsupervised source identification, unsupervised source separation, and audio synthesis. The "modulation spectrum" refers to a windowed Fourier transform taken across time that produces an acoustic-frequency-versus-modulation-frequency representation of a signal. Previous frameworks incorporating the discrete short-time modulation transform (DSTMT) and the modulation spectrum have been designed mostly for filtering speech signals.
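The two-stage computation behind the modulation spectrum can be sketched as follows. This is a simplified illustration, not the thesis's DSTMT: a short-time FFT yields subband magnitude envelopes, and a second FFT across time, per subband, yields the modulation frequencies. The frame length, hop size, and test-signal parameters are assumed values.

```python
import numpy as np

def modulation_spectrum(x, fs, frame_len=256, hop=64):
    """Sketch of an acoustic-frequency vs. modulation-frequency map.

    Step 1: windowed FFT -> subband magnitude envelopes over time.
    Step 2: FFT of each subband envelope across time -> modulation frequencies.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Step 1: short-time magnitude spectrogram (acoustic bins x frames).
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T
    # Step 2: transform each subband envelope across time; removing the
    # per-band mean suppresses the uninformative DC modulation component.
    mod = np.abs(np.fft.rfft(spec - spec.mean(axis=1, keepdims=True), axis=1))
    mod_fs = fs / hop  # envelopes are sampled once per hop
    acoustic_freqs = np.fft.rfftfreq(frame_len, 1 / fs)
    mod_freqs = np.fft.rfftfreq(n_frames, 1 / mod_fs)
    return mod, acoustic_freqs, mod_freqs

# Example: a 1 kHz tone amplitude-modulated at 4 Hz should produce a peak
# near (acoustic 1000 Hz, modulation 4 Hz) in the modulation spectral domain.
fs = 8000
t = np.arange(0, 2.0, 1 / fs)
x = (1 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
mod, af, mf = modulation_spectrum(x, fs)
i, j = np.unravel_index(np.argmax(mod), mod.shape)
print(f"peak at acoustic {af[i]:.0f} Hz, modulation {mf[j]:.2f} Hz")
```

The example shows why this domain is informative: a signal's carrier content and its slower temporal patterns land on separate axes, so sources with different modulation behavior occupy different regions of the map.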
Interest has recently grown in using modulation spectral features for music data mining. The field of music data mining, also known as music information retrieval (MIR), has developed rapidly over the past decade or so (see the International Society for Music Information Retrieval, ISMIR). One driver of this development is the aim to build frameworks that leverage the particular characteristics of music signals rather than simply reusing methods from speech-centered predecessors such as speech recognition, speech synthesis, and speaker identification. This research broadens the perspective and use of an existing modulation filterbank framework by exploiting modulation features well suited to music signals.
More specifically, this research describes the following: the usefulness of the DSTMT and the modulation spectrum for music data mining tasks; an unsupervised source identification method using modulation spectral features; an unsupervised source separation method; an analysis of FM features in an "AM-dominated" modulation spectrum; and other applications. The objective of the unsupervised identification method is to automatically identify distinct sources of varying modulation content, a process that is currently manual and requires prior information about the sources. The objective of the unsupervised source separation method is to blindly separate sources in periodic segments of signals with varying modulation content, or temporal patterns. Together, the unsupervised source identification and source separation methods form a modulation spectral feature framework that may support other applications as well, such as vibrato analysis, quantitative evaluation in music lessons, and modulation analysis of EEG seizure signals.