This technical report presents the training methodology and evaluation results of the open-source Jasper-Token-Compression-600M model, released in November 2025. Building on the distillation-based recipes of the earlier English Stella and Jasper models, we extend this approach to a bilingual (English and Chinese) setting and further enhance model performance by incorporating contrastive learning. A key innovation of our model is a one-dimensional convolution-based token compression module. We dynamically adjust the compression rate during training, enabling the model to learn more robust and efficient compressed text representations. By combining knowledge distillation with token compression, we achieve significant improvements in both embedding quality and inference efficiency: our model runs more efficiently than a conventional 0.6B model while achieving performance comparable to that of an 8B model. For more information on the model release, visit: https://huggingface.co/infgrad/Jasper-Token-Compression-600M.
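The abstract does not spell out how the compression module is implemented, so the following is only a minimal, hypothetical PyTorch sketch of what a one-dimensional convolution-based token compressor with a call-time compression rate could look like. The class name, kernel size, and padding scheme are our own assumptions for illustration, not the released code.

```python
# Hypothetical sketch of a 1D-conv token compression module
# (illustrative only; not the released Jasper-Token-Compression-600M code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv1dTokenCompressor(nn.Module):
    """Compresses a sequence of token states with a strided 1D convolution."""

    def __init__(self, hidden_dim: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # A single Conv1d supplies the weights; the stride (compression rate)
        # is chosen at call time, so one set of weights serves every rate.
        self.conv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size)

    def forward(self, hidden_states: torch.Tensor, compression_rate: int) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        x = hidden_states.transpose(1, 2)  # -> (batch, hidden_dim, seq_len)
        # Right-pad so the compressed length is ceil(seq_len / compression_rate).
        pad = (-x.shape[-1]) % compression_rate
        pad += max(self.kernel_size - compression_rate, 0)
        x = F.pad(x, (0, pad))
        x = F.conv1d(x, self.conv.weight, self.conv.bias, stride=compression_rate)
        return x.transpose(1, 2)  # -> (batch, compressed_len, hidden_dim)


# Example: a 512-token sequence compressed at rate 4 yields 128 compressed tokens.
states = torch.randn(2, 512, 1024)
compressor = Conv1dTokenCompressor(hidden_dim=1024)
print(compressor(states, compression_rate=4).shape)  # torch.Size([2, 128, 1024])
```

Because the stride is passed at inference time rather than fixed in the layer, a module of this shape would allow the compression rate to be varied across training steps, which is consistent with the dynamic-rate training described above.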