JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, and strives for simplicity and accessibility. JoeyS2T's workflow is self-contained, ranging from data pre-processing through model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and is available at https://github.com/may-/joeys2t.
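As a rough illustration of the speech-oriented components named above (convolutional subsampling, SpecAugment, and CTC loss), the following sketch uses plain PyTorch and torchaudio. It is not JoeyS2T's actual code or API; the class and variable names are hypothetical and the exact hyperparameters are assumptions.

```python
# Illustrative sketch only (not JoeyS2T's API): convolutional subsampling,
# SpecAugment-style masking, and a CTC loss as an auxiliary objective.
import torch
import torch.nn as nn
import torchaudio.transforms as T


class ToySpeechEncoderFront(nn.Module):
    """Conv subsampling + SpecAugment applied before a Transformer encoder."""

    def __init__(self, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        # SpecAugment: mask random frequency bands and time spans (training only).
        self.freq_mask = T.FrequencyMasking(freq_mask_param=27)
        self.time_mask = T.TimeMasking(time_mask_param=100)
        # Two strided convolutions subsample the time axis by a factor of 4.
        self.conv = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(d_model * ((n_mels + 3) // 4), d_model)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, n_mels) log-Mel filterbank features
        x = feats.transpose(1, 2)                 # (batch, n_mels, time)
        if self.training:
            x = self.time_mask(self.freq_mask(x))
        x = self.conv(x.unsqueeze(1))             # (batch, d_model, n_mels/4, time/4)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        return self.proj(x)                       # (batch, time/4, d_model)


# CTC loss over (subsampled) encoder frames; nn.CTCLoss expects time-major input.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
log_probs = torch.randn(50, 4, 32).log_softmax(-1)        # (time, batch, vocab)
targets = torch.randint(1, 32, (4, 12))                    # (batch, target_len)
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

This mirrors the common recipe in end-to-end speech-to-text models, where the CTC objective is combined with the cross-entropy loss of the Transformer decoder and WER is reported as the recognition metric.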