Abstract
Conventional single- and multi-channel speech enhancement methods aim at improving the signal-to-noise ratio (SNR) of signals captured through distant microphones; they do not specifically target improvements in automatic speech recognition (ASR) performance. We investigate nonlinear multiple regression to extract robust features for ASR. The idea is to approximate the log spectra of a close-talking microphone by effectively combining the log spectra of distant microphones. The devised system turns out to be a generalized log spectral subtraction framework for robust speech recognition. We demonstrate the effectiveness of the proposed approach through extensive evaluations on single- and multi-channel isolated word recognition experiments conducted in 15 real car-driving environments.
Keywords: microphone array, in-car speech recognition, neural network, K-means, beamforming