Cochlear 植入处理器中校验语音话音增强用基于电话的代比比率遮罩估计值 (Phoneme-Based Ratio Mask Estimation for Reverberant Speech Enhancement in Cochlear Implant Processors) - 专知论文

会员服务 ·

0

估计/估计量 · 掩码 · 音素 · 语音增强 · INFORMS ·

2021 年 5 月 28 日

Phoneme-Based Ratio Mask Estimation for Reverberant Speech Enhancement in Cochlear Implant Processors

翻译：Cochlear 植入处理器中校验语音话音增强用基于电话的代比比率遮罩估计值

Kevin M. Chu,Leslie M. Collins,Boyla O. Mainsah

Cochlear implant (CI) users have considerable difficulty in understanding speech in reverberant listening environments. Time-frequency (T-F) masking is a common technique that aims to improve speech intelligibility by multiplying reverberant speech by a matrix of gain values to suppress T-F bins dominated by reverberation. Recently proposed mask estimation algorithms leverage machine learning approaches to distinguish between target speech and reverberant reflections. However, the spectro-temporal structure of speech is highly variable and dependent on the underlying phoneme. One way to potentially overcome this variability is to leverage explicit knowledge of phonemic information during mask estimation. This study proposes a phoneme-based mask estimation algorithm, where separate mask estimation models are trained for each phoneme. Sentence recognition tests were conducted in normal hearing listeners to determine whether a phoneme-based mask estimation algorithm is beneficial in the ideal scenario where perfect knowledge of the phoneme is available. The results showed that the phoneme-based masks improved the intelligibility of vocoded speech when compared to conventional phoneme-independent masks. The results suggest that a phoneme-based speech enhancement strategy may potentially benefit CI users in reverberant listening environments.

翻译：Cochlear 植入器(CI) 用户在听觉回旋环境中理解语言方面有相当大的困难。时间频掩码(T-F)是一种常见技术,目的是通过增殖值矩阵,通过增殖值矩阵,增加语音感知性语言,以压制以反动为主的T-F bins。最近提议的遮罩估计算法利用机器学习方法,以区分目标语音和反动反射。然而,语言的光谱-时空结构极易变,取决于基本的电话机。克服这种变异的一个潜在办法是在估计遮罩时利用对语音信息的明确知识。本研究提出一种基于手机的遮罩估计算法,对每个电话机进行单独的遮罩估计模型培训。在正常的收听者中进行了判决识别测试,以确定基于手机的遮罩估计算法是否有利于理想的情景,即对电话机能进行完全的了解。结果显示,基于手机的遮蔽面罩与传统的独立遮罩相比,可以提高语音调调的语音的感知性。结果表明,基于手机的语音增强战略可能有利于用户重新监听力环境。

0

相关内容

估计/估计量

估计/估计量

【开放书】应用信号处理，498页pdf，Applied Signal Processing

【开放书】应用信号处理，498页pdf，Applied Signal Processing

专知会员服务

46+阅读 · 2021年6月15日

【MIT】自监督几何感知，22页ppt，Self-supervised Geometric Perception

【MIT】自监督几何感知，22页ppt，Self-supervised Geometric Perception

专知会员服务

23+阅读 · 2021年6月3日

自然语言生成综述

专知会员服务

65+阅读 · 2021年5月29日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

已删除

将门创投

4+阅读 · 2018年11月20日

Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

Arxiv

0+阅读 · 2021年7月21日

On Prosody Modeling for ASR+TTS based Voice Conversion

On Prosody Modeling for ASR+TTS based Voice Conversion

Arxiv

0+阅读 · 2021年7月20日

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation

Arxiv

0+阅读 · 2021年7月20日

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Arxiv

3+阅读 · 2020年2月2日

FastSpeech: Fast, Robust and Controllable Text to Speech

FastSpeech: Fast, Robust and Controllable Text to Speech

Arxiv

3+阅读 · 2019年5月22日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

Implicit Maximum Likelihood Estimation

Implicit Maximum Likelihood Estimation

Arxiv

7+阅读 · 2018年9月24日

Signal Processing and Piecewise Convex Estimation

Arxiv

4+阅读 · 2018年3月14日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【开放书】应用信号处理，498页pdf，Applied Signal Processing

【开放书】应用信号处理，498页pdf，Applied Signal Processing

专知会员服务

46+阅读 · 2021年6月15日

【MIT】自监督几何感知，22页ppt，Self-supervised Geometric Perception

【MIT】自监督几何感知，22页ppt，Self-supervised Geometric Perception

专知会员服务

23+阅读 · 2021年6月3日

自然语言生成综述

专知会员服务

65+阅读 · 2021年5月29日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

已删除

将门创投

4+阅读 · 2018年11月20日

相关论文

Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

Arxiv

0+阅读 · 2021年7月21日

On Prosody Modeling for ASR+TTS based Voice Conversion

On Prosody Modeling for ASR+TTS based Voice Conversion

Arxiv

0+阅读 · 2021年7月20日

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation

Arxiv

0+阅读 · 2021年7月20日

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Arxiv

3+阅读 · 2020年2月2日

FastSpeech: Fast, Robust and Controllable Text to Speech

FastSpeech: Fast, Robust and Controllable Text to Speech

Arxiv

3+阅读 · 2019年5月22日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

Implicit Maximum Likelihood Estimation

Implicit Maximum Likelihood Estimation

Arxiv

7+阅读 · 2018年9月24日

Signal Processing and Piecewise Convex Estimation

Arxiv

4+阅读 · 2018年3月14日

微信扫码咨询专知VIP会员