Cochlear implant (CI) users have considerable difficulty understanding speech in reverberant listening environments. Time-frequency (T-F) masking is a common technique that aims to improve speech intelligibility by multiplying the T-F representation of reverberant speech by a matrix of gain values, suppressing the T-F bins dominated by reverberation. Recently proposed mask estimation algorithms use machine learning to distinguish between target speech and reverberant reflections. However, the spectro-temporal structure of speech is highly variable and depends on the underlying phoneme. One potential way to overcome this variability is to use explicit knowledge of phonemic information during mask estimation. This study proposes a phoneme-based mask estimation algorithm in which a separate mask estimation model is trained for each phoneme. Sentence recognition tests were conducted with normal-hearing listeners to determine whether a phoneme-based mask estimation algorithm is beneficial in the ideal scenario where perfect knowledge of the phoneme is available. The phoneme-based masks improved the intelligibility of vocoded speech compared with conventional phoneme-independent masks. These results suggest that a phoneme-based speech enhancement strategy may benefit CI users in reverberant listening environments.
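As a rough illustration of the two ideas described above, the following Python sketch applies a gain matrix to a T-F representation and assembles a mask frame by frame from per-phoneme estimators. It is a minimal sketch under stated assumptions, not the study's implementation: the function names `apply_tf_mask` and `phoneme_based_mask`, the phoneme labels, the toy gain values, and the use of frame magnitudes as features are all hypothetical, and the "estimators" are fixed placeholder functions standing in for trained models.

```python
import numpy as np


def apply_tf_mask(reverberant_tf, mask):
    """Multiply a T-F representation element-wise by a matrix of gains."""
    if reverberant_tf.shape != mask.shape:
        raise ValueError("mask and T-F representation must have the same shape")
    return reverberant_tf * mask


def phoneme_based_mask(features, phoneme_labels, estimators):
    """Build a mask one frame at a time, using the estimator for each frame's phoneme.

    features:       array of shape (n_frames, n_features)
    phoneme_labels: one label per frame (assumed known, as in the ideal scenario)
    estimators:     dict mapping a phoneme label to a callable that returns
                    a gain vector of shape (n_channels,) for one frame
    Returns a mask of shape (n_channels, n_frames).
    """
    columns = [
        estimators[label](frame)
        for frame, label in zip(features, phoneme_labels)
    ]
    return np.stack(columns, axis=1)


# Toy data: 2 frequency channels, 3 time frames (values are made up).
reverberant = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])
features = reverberant.T  # hypothetical choice: frame magnitudes as features
labels = ["aa", "s", "aa"]  # hypothetical per-frame phoneme labels

# Placeholder "models": each phoneme keeps one channel and suppresses the other.
estimators = {
    "aa": lambda frame: np.array([1.0, 0.0]),
    "s": lambda frame: np.array([0.0, 1.0]),
}

mask = phoneme_based_mask(features, labels, estimators)
enhanced = apply_tf_mask(reverberant, mask)
```

In a phoneme-independent scheme, the `estimators` dictionary would collapse to a single model applied to every frame; the phoneme-based variant differs only in choosing the model according to the frame's phoneme label.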