Thesis of Boyang Gao


Subject:
Structured Semantic Descriptions of Music Titles

Defense date: 15/12/2014

Advisor: Liming Chen
Co-advisor: Emmanuel Dellandréa

Summary:

Automatic music analysis currently faces two main challenges: 1) classifying music into semantic classes, such as emotion and genre classes, and 2) completing the computation in an affordable time on large data sets. To address both problems, we first extract music information at three levels: low, middle and high. We then build separate models on the features of each level to perform classification, and finally fuse the outputs of the three levels to produce the final decision.

Low-level information refers to signal-level features extracted directly from the audio waveform, such as MFCCs. At this level, features are first encoded with the bag-of-words model and then classified by SVMs, as sketched below. To accelerate the bag-of-words computation on large data sets, we propose to rewrite k-means, GMM and MAP in matrix-multiplication form, which can be efficiently sped up by parallel computing frameworks such as GPUs, multi-core CPUs, and Hadoop or Spark clusters.

At the middle level, we exploit musical knowledge such as instrument sounds and music note statistics. We propose to decompose music onto a MIDI dictionary using modified sparse representation methods, illustrated below; note statistics are further incorporated to improve the precision of the decomposition.

At the high level, we plan to use lyrics to extract emotional information directly, based on natural language processing. In the final fusion stage, performance-weighted fusion methods are applied to combine the outputs of the three levels; a minimal sketch of this step is given after the examples below.
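The low-level pipeline can be sketched as follows. The MFCC frames are simulated with random data, the codebook size and feature dimensions are illustrative assumptions, and scikit-learn's SVC stands in for the SVM classifier; none of these reflect the thesis's exact configuration. The key point is that the codebook assignment is written so its dominant cost is one matrix multiplication, which is the form that maps directly onto GPU or multi-core BLAS kernels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Illustrative stand-ins for real MFCC data (assumption) ---
n_tracks, frames_per_track, dim, k = 40, 500, 20, 64
tracks = [rng.normal(size=(frames_per_track, dim)) for _ in range(n_tracks)]
labels = rng.integers(0, 2, size=n_tracks)   # e.g. two emotion classes

# Codebook assumed learned beforehand (here: random frames used as centroids)
all_frames = np.vstack(tracks)
codebook = all_frames[rng.choice(len(all_frames), k, replace=False)]

def bow_histogram(frames, codebook):
    """Hard-assignment bag-of-words written in matrix-multiplication form.

    Squared Euclidean distance decomposes as
        ||x - c||^2 = ||x||^2 - 2 x.c^T + ||c||^2,
    so the frames-by-centroids distance matrix is one GEMM plus two
    broadcasts -- the part that parallel BLAS / GPU kernels accelerate.
    """
    x_sq = (frames ** 2).sum(axis=1, keepdims=True)        # (n, 1)
    c_sq = (codebook ** 2).sum(axis=1)                      # (k,)
    dists = x_sq - 2.0 * frames @ codebook.T + c_sq         # (n, k), one matmul
    assign = dists.argmin(axis=1)
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                                 # L1-normalised histogram

X = np.stack([bow_histogram(t, codebook) for t in tracks])
clf = SVC(kernel="rbf").fit(X, labels)                       # low-level classifier
print("training accuracy:", clf.score(X, labels))
```

The same decomposition carries over to GMM posterior and MAP statistics computation, since the quadratic term of each Gaussian can likewise be expressed through large matrix products.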
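The middle-level step can be illustrated as a sparse coding problem: each magnitude-spectrum frame is approximated by a sparse, non-negative combination of instrument-note templates. The dictionary below is random and the solver is scikit-learn's Lasso with a positivity constraint; the thesis's modified sparse representation and its note-statistics prior are not reproduced here, so this is only a sketch of the decomposition idea.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# --- Hypothetical MIDI dictionary: one spectral template per (instrument, note) ---
n_bins, n_atoms = 513, 128                      # assumption: magnitude-spectrum atoms
midi_dict = np.abs(rng.normal(size=(n_bins, n_atoms)))
midi_dict /= np.linalg.norm(midi_dict, axis=0)  # unit-norm atoms

# Synthetic "music" frame: a few active notes plus a little noise
true_act = np.zeros(n_atoms)
true_act[[12, 45, 88]] = [1.0, 0.6, 0.3]
frame = midi_dict @ true_act + 0.01 * np.abs(rng.normal(size=n_bins))

# Sparse, non-negative decomposition of the frame onto the dictionary
solver = Lasso(alpha=1e-3, positive=True, max_iter=5000)
solver.fit(midi_dict, frame)
activations = solver.coef_

top = np.argsort(activations)[::-1][:5]
print("strongest atoms:", top, activations[top].round(3))
```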
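Finally, a minimal sketch of performance-weighted late fusion, assuming each level outputs class-probability estimates and that each level is weighted by its validation accuracy (one plausible reading of "performance weighted"; other weighting schemes would fit the same skeleton).

```python
import numpy as np

def fuse(prob_list, val_accuracies):
    """Weight each level's class-probability matrix by its validation accuracy,
    then renormalise.  prob_list: list of (n_samples, n_classes) arrays."""
    w = np.asarray(val_accuracies, dtype=float)
    w = w / w.sum()
    fused = sum(wi * p for wi, p in zip(w, prob_list))
    return fused / fused.sum(axis=1, keepdims=True)

# Example: low / middle / high level outputs for 3 samples, 2 classes
low  = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])
mid  = np.array([[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]])
high = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

fused = fuse([low, mid, high], val_accuracies=[0.72, 0.65, 0.80])
print(fused.argmax(axis=1))   # final class decision per sample
```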