This paper shows that StarGAN-VC, a spectral envelope transformation method for non-parallel many-to-many voice conversion (VC), is capable of emotional VC (EVC). Although StarGAN-VC has been shown to enable speaker identity conversion, its capability for EVC of Japanese phrases has not been clarified. In this paper, we describe the direct application of StarGAN-VC to an EVC task with minimal processing of the fundamental frequency (F0) and aperiodicity. Through subjective evaluation experiments, we evaluated how well our StarGAN-EVC system achieves EVC for Japanese phrases. The subjective evaluation was conducted in terms of subjective classification and mean opinion scores (MOS) for neutrality and similarity. In addition, we investigated the interdependence between the source and target emotional domains with respect to the quality of EVC.
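The abstract does not specify what "minimal" fundamental frequency (F0) processing consists of. One common choice in StarGAN-VC pipelines, and purely an assumption here, is the logarithm Gaussian normalized transformation: the voiced log-F0 values are standardized with source-domain statistics and rescaled with target-domain statistics, while aperiodicity is passed through unchanged. The NumPy sketch below illustrates that idea; the function names `log_f0_stats` and `log_gaussian_f0_conversion` are hypothetical and not taken from the paper.

```python
import numpy as np


def log_f0_stats(f0_contours):
    """Mean and standard deviation of log F0 over all voiced frames of one domain.

    Hypothetical helper: f0_contours is a list of 1-D arrays in Hz, where 0 marks
    an unvoiced frame (the convention used by vocoders such as WORLD).
    """
    f0_all = np.concatenate([np.asarray(f, dtype=float) for f in f0_contours])
    log_f0 = np.log(f0_all[f0_all > 0])
    return float(log_f0.mean()), float(log_f0.std())


def log_gaussian_f0_conversion(f0_src, src_stats, tgt_stats):
    """Map a source F0 contour onto the target domain's log-F0 statistics.

    Assumed form: log f0_tgt = (log f0_src - mu_src) / sigma_src * sigma_tgt + mu_tgt,
    applied to voiced frames only; unvoiced frames (0 Hz) stay at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    f0_conv[voiced] = np.exp((log_f0 - mu_s) / sigma_s * sigma_t + mu_t)
    return f0_conv


if __name__ == "__main__":
    # Toy statistics for a hypothetical "neutral" source and "angry" target domain.
    neutral = (np.log(150.0), 0.2)
    angry = (np.log(300.0), 0.2)
    print(log_gaussian_f0_conversion([0.0, 100.0, 200.0, 0.0], neutral, angry))
```

In this toy call both domains share the same log-F0 spread, so the conversion reduces to a constant pitch ratio (here a factor of two on voiced frames). With real data, the statistics would come from `log_f0_stats` over each emotion's training utterances. This choice keeps prosody handling deliberately simple so that the spectral envelope conversion carries most of the emotional change.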