| Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization | 
| Mikkel N. Schmidt, Rasmus K. Olsson | 
| Abstract | We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording.
 The method of choice is a sparse non-negative matrix factorization
 algorithm, which in an unsupervised manner can learn sparse representations
 of the data. This is applied to the learning of personalized
 dictionaries from a speech corpus, which in turn are used
 to separate the audio stream into its components. We show that
 computational savings can be achieved by segmenting the training
 data on a phoneme level. To split the data, a conventional speech
 recognizer is used. Both the unsupervised and supervised
 adaptation schemes yield significant improvements in
 terms of the target-to-masker ratio.
 | 
| Type | Conference paper [With referee] | 
| Conference | Interspeech | 
| Year | 2006 | 
| Month | September | 
| Electronic version(s) | [pdf] | 
 | BibTeX data | [bibtex] | 
| IMM Group(s) | Intelligent Signal Processing |
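
The pipeline described in the abstract (learn one sparse, non-negative dictionary per speaker, then decompose the mixture over the concatenated dictionaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the multiplicative update rule, the unit-norm column constraint, the function names `sparse_nmf` and `separate`, the penalty weight `lam`, and the toy magnitude "spectrograms" are all assumptions chosen for illustration, and the phoneme-level segmentation step is omitted.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sparse_nmf(Y, D=None, rank=1, lam=0.1, iters=200, seed=0, eps=1e-9):
    """Approximate Y ~ D H with D, H >= 0 and an L1 penalty (weight lam) on H.

    If D is supplied it is held fixed and only the code matrix H is fitted.
    Assumption: simple multiplicative updates, not the paper's exact rule.
    """
    rng = random.Random(seed)
    learn_D = D is None
    n, m = len(Y), len(Y[0])
    if learn_D:
        D = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    k = len(D[0])
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Code update: lam in the denominator shrinks H towards zero.
        Dt = transpose(D)
        num, den = matmul(Dt, Y), matmul(Dt, matmul(D, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + lam + eps) for j in range(m)]
             for i in range(k)]
        if learn_D:
            # Dictionary update for the unpenalized least-squares term.
            Ht = transpose(H)
            num, den = matmul(Y, Ht), matmul(matmul(D, H), Ht)
            D = [[D[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
                 for i in range(n)]
            # Rescale columns to unit L2 norm, compensating in H so that the
            # product D H is unchanged by the rescaling.
            norms = [sum(D[i][j] ** 2 for i in range(n)) ** 0.5 + eps for j in range(k)]
            D = [[D[i][j] / norms[j] for j in range(k)] for i in range(n)]
            H = [[H[j][t] * norms[j] for t in range(m)] for j in range(k)]
    return D, H

def separate(Y_mix, D1, D2, lam=0.1, iters=200):
    """Fit the mixture over [D1 D2], then rebuild each source from its own atoms."""
    D = [r1 + r2 for r1, r2 in zip(D1, D2)]
    _, H = sparse_nmf(Y_mix, D=D, lam=lam, iters=iters)
    k1 = len(D1[0])
    return matmul(D1, H[:k1]), matmul(D2, H[k1:])

# Toy data (hypothetical): speaker 1 occupies the two low-frequency bins,
# speaker 2 the two high-frequency bins. Rows are bins, columns are frames.
Y1 = [[4.0, 2.0, 3.0], [2.0, 1.0, 1.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Y2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 3.0, 2.0], [2.0, 6.0, 4.0]]

D1, _ = sparse_nmf(Y1, rank=1)   # "personalized" dictionary for speaker 1
D2, _ = sparse_nmf(Y2, rank=1)   # "personalized" dictionary for speaker 2

Y_mix = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Y1, Y2)]
S1, S2 = separate(Y_mix, D1, D2)
```

In this toy setting the two dictionaries do not overlap in frequency, so each estimate recovers its source up to the shrinkage introduced by the sparsity penalty. Real speech spectrograms overlap heavily, which is where the choice of dictionary size and of `lam` would matter.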