Assessment of pitch-adaptive front-end signal processing for children's speech recognition
Abstract
On account of large acoustic mismatch, automatic speech recognition (ASR) systems trained on adults' speech data yield poor recognition performance when evaluated on children's speech data. Despite the use of common speaker normalization techniques such as feature-space maximum likelihood linear regression (fMLLR) and vocal tract length normalization (VTLN), a significant gap remains between the recognition rates for matched and mismatched testing. Our earlier work has highlighted the sensitivity of salient front-end features, including the popular Mel-frequency cepstral coefficients (MFCCs), to gross pitch variation across adult and child speakers. Motivated by that, in this work we explore pitch-adaptive front-end signal processing in deriving the MFCC features to reduce their sensitivity to pitch variation. For this purpose, an existing vocoder approach known as STRAIGHT spectral analysis is first employed to obtain a smoothed spectrum devoid of pitch harmonics. Second, a much simpler spectrum smoothing approach exploiting pitch-adaptive liftering is presented. The proposed liftering approach is noted to be less sensitive to pitch estimation errors than the STRAIGHT-based approach. Both approaches result in significant improvements for children's ASR under mismatched conditions. The effectiveness of the proposed adaptive-liftering-based approach is also demonstrated in the context of acoustic modeling paradigms based on the subspace Gaussian mixture model (SGMM) and the deep neural network (DNN). Further, it is shown that the effectiveness of existing speaker normalization techniques remains intact even with the use of the proposed pitch-adaptive MFCCs, thus leading to additional gains.

Highlights
Studying the need for pitch normalization during the front-end speech parameterization step in a children's speech recognition system.
Analyzing the reasons behind the pitch sensitivity of MFCC features.
Exploring the effectiveness of STRAIGHT-based MFCCs in the context of children's ASR.
A novel approach based on adaptive liftering to smooth out the pitch-induced distortions in the magnitude spectra of the speech signal.
Evaluating the explored pitch-adaptive approaches for improving the recognition of children's speech under acoustically mismatched conditions on a DNN-based ASR system.
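
The idea of smoothing the spectrum by pitch-adaptive liftering can be illustrated with a minimal sketch. The abstract does not give the lifter design, so everything below is an assumption made for illustration: the function name `pitch_adaptive_smooth`, the FFT size `n_fft`, and the factor `frac` that places the lifter cutoff below the pitch period are hypothetical, and a plain rectangular low-time lifter stands in for whatever the paper actually uses. The sketch relies only on the general fact that pitch harmonics show up in the cepstrum near a quefrency of one pitch period (`fs / f0` samples), so discarding cepstral coefficients at and above that quefrency leaves a spectral envelope with the harmonic ripple largely removed.

```python
import numpy as np

def pitch_adaptive_smooth(frame, fs, f0, n_fft=1024, frac=0.8):
    """Return a smoothed magnitude spectrum (n_fft // 2 + 1 bins).

    Illustrative only: the cutoff rule (frac * pitch period) and the
    rectangular lifter are assumptions, not the paper's specification.
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed, n_fft)) + 1e-10)

    # Real cepstrum of the frame (symmetric about n_fft / 2).
    cep = np.fft.ifft(log_mag).real

    # Cutoff quefrency tied to the pitch period, in samples.
    cutoff = int(frac * fs / f0)
    cutoff = max(1, min(cutoff, n_fft // 2 - 1))

    # Low-time lifter: keep quefrencies below the cutoff, and their
    # mirror images, so the liftered cepstrum stays symmetric.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0

    # Back to the log-spectral domain, then to linear magnitude.
    smooth_log = np.fft.fft(cep * lifter).real
    return np.exp(smooth_log[: n_fft // 2 + 1])
```

In an MFCC pipeline, a smoothed spectrum of this kind would take the place of the raw short-time magnitude spectrum before the Mel filterbank is applied. Because the cutoff follows the estimated `f0`, a high-pitched child speaker gets a shorter lifter than a low-pitched adult speaker.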