Single-ended Prediction of Listening Effort for English Speech
* Presenting author
Abstract: The single-ended (i.e., reference-free) method for predicting listening effort (LE), presented by Huber et al. at DAGA 2019, has been extended to English. The original method uses a deep neural network-based automatic speech recognition (ASR) system trained on German speech. The ASR system's uncertainty in recognizing phonemes shows up as "smeared" phoneme posterior probability representations ("posteriorgrams") and is quantified by the M-Measure proposed by Hermansky et al. The M-Measure has been found to correlate well with mean subjective ratings of perceived LE. For German, a mapping function from the M-Measure to the LE ratings was derived from several listening tests comprising more than 400 test items in total. To extend the method to English speech, a new listening test was conducted with 17 native English speakers, who rated the perceived LE of 146 speech items mixed with various background sounds taken from English and American movies. Again, the M-Measure correlates highly with mean subjective LE ratings, but a different mapping function f: M → LE was found.
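To illustrate how an M-Measure can be computed from a posteriorgram, the following is a minimal Python sketch. It assumes the mean-temporal-distance formulation attributed to Hermansky et al.: for each frame lag Δt, the symmetric Kullback-Leibler divergence between posterior vectors Δt frames apart is averaged over time, and the result is then averaged over a range of lags. The lag range (1 to 50 frames), the synthetic posteriorgrams from the hypothetical helper `demo_posteriorgram`, and the polynomial form of the hypothetical `fit_mapping` are assumptions for illustration only; the abstract does not state which variant Huber et al. use or what form f takes.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-10):
    """Row-wise symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * (log p - log q)).
    return np.sum((p - q) * (np.log(p) - np.log(q)), axis=-1)


def m_measure(posteriors, lags=range(1, 51)):
    """Mean temporal distance of a posteriorgram of shape (T, K).

    For each lag dt, average the divergence between frames t - dt and t,
    then average over all lags. The lag range is an assumption.
    """
    T = posteriors.shape[0]
    per_lag = [
        symmetric_kl(posteriors[:-dt], posteriors[dt:]).mean()
        for dt in lags
        if 0 < dt < T
    ]
    if not per_lag:
        raise ValueError("no lag is shorter than the posteriorgram")
    return float(np.mean(per_lag))


def demo_posteriorgram(T=300, K=40, sharpness=0.95, seg_len=10, seed=0):
    """Hypothetical synthetic posteriorgram: one 'phoneme' per segment.

    The winning class gets probability `sharpness`; the remainder is spread
    uniformly over the other classes. Lower sharpness imitates smearing.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(rng.integers(0, K, size=T // seg_len + 1), seg_len)[:T]
    post = np.full((T, K), (1.0 - sharpness) / (K - 1))
    post[np.arange(T), labels] = sharpness
    return post


def fit_mapping(m_values, le_ratings, degree=1):
    """Hypothetical least-squares polynomial mapping f: M -> LE."""
    return np.poly1d(np.polyfit(m_values, le_ratings, degree))


if __name__ == "__main__":
    sharp = demo_posteriorgram(sharpness=0.95)
    smeared = demo_posteriorgram(sharpness=0.3)
    print("M (sharp):  ", m_measure(sharp))
    print("M (smeared):", m_measure(smeared))
```

With identical phoneme sequences, the sharp posteriorgram yields a larger M than the smeared one, consistent with the idea that smeared posteriors signal recognition uncertainty. In a real setup, the posteriorgram would come from the ASR system's output layer, and f would be fitted to per-item M values and mean LE ratings from a listening test; the polynomial degree here is only a placeholder.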