Spherical Leech Quantization for Visual Tokenization and Generation

Non-parametric quantization has received much attention due to its parameter efficiency and its scalability to large codebooks. In this paper, we present a unified formulation of different non-parametric quantization methods through the lens of lattice coding. The geometry of lattice codes explains why auxiliary loss terms are necessary when training auto-encoders with certain existing lookup-free quantization variants, such as BSQ. As a step forward, we explore a few possible candidates, including random lattices, generalized Fibonacci lattices, and densest sphere packing lattices. Among them, we find that the Leech lattice-based quantization method, dubbed Spherical Leech Quantization ($\Lambda_{24}$-SQ), leads to both a simplified training recipe and an improved reconstruction-compression tradeoff, thanks to its high symmetry and even distribution on the hypersphere. In image tokenization and compression tasks, this quantization approach achieves better reconstruction quality across all metrics than BSQ, the best prior art, while consuming slightly fewer bits. The improvement also extends to state-of-the-art auto-regressive image generation frameworks.
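
The lattice-coding view above can be illustrated with a minimal NumPy sketch, under stated assumptions. Every quantizer is treated as a fixed set of unit-norm codewords on the hypersphere, and an input is mapped to the codeword with the largest cosine similarity (equivalently, the smallest Euclidean distance). For BSQ the codewords are the vectors $\{\pm 1/\sqrt{d}\}^d$, so the nearest one is simply the sign pattern and no search is needed. For $\Lambda_{24}$-SQ we assume the codebook is the 196,560 minimal vectors of the Leech lattice, normalized to unit length and built with the standard construction from the extended binary Golay code. The helper names (`golay_code`, `leech_min_vectors`, `SphericalLeechQuantizer`, `bsq_quantize`) are hypothetical, the brute-force search is for illustration only, and gradients, straight-through estimation, and any efficient decoding used by the actual method are not shown.

```python
import itertools

import numpy as np


def golay_code() -> np.ndarray:
    """All 4096 codewords of the extended binary Golay code as a (4096, 24) 0/1 array."""
    # Generator matrix [I_12 | B]; B borders an 11x11 circulant whose rows are
    # shifts of {0} ∪ {quadratic residues mod 11} = {0, 1, 3, 4, 5, 9}.
    support = {0, 1, 3, 4, 5, 9}
    circ = np.array([[int((j - i) % 11 in support) for j in range(11)] for i in range(11)])
    b = np.zeros((12, 12), dtype=np.int64)
    b[0, 1:] = 1
    b[1:, 0] = 1
    b[1:, 1:] = circ
    gen = np.hstack([np.eye(12, dtype=np.int64), b])
    msgs = (np.arange(4096)[:, None] >> np.arange(12)) & 1  # all 12-bit messages
    return (msgs @ gen) % 2


def leech_min_vectors() -> np.ndarray:
    """The 196,560 Leech minimal vectors, scaled by sqrt(8) to integers (squared norm 32)."""
    code = golay_code()
    # Shape (±4, ±4, 0^22): 276 position pairs x 4 sign choices = 1,104 vectors.
    shape1 = []
    for i, j in itertools.combinations(range(24), 2):
        for si, sj in itertools.product((4, -4), repeat=2):
            v = np.zeros(24, dtype=np.int64)
            v[i], v[j] = si, sj
            shape1.append(v)
    # Shape (±2^8, 0^16): support is an octad (weight-8 codeword), even number of
    # minus signs: 759 octads x 128 sign patterns = 97,152 vectors.
    octads = code[code.sum(axis=1) == 8]                   # (759, 24)
    pos = np.array([np.flatnonzero(o) for o in octads])    # (759, 8) support positions
    signs = np.array([s for s in itertools.product((1, -1), repeat=8) if np.prod(s) == 1])
    shape2 = np.zeros((len(pos), len(signs), 24), dtype=np.int64)
    shape2[np.arange(len(pos))[:, None, None],
           np.arange(len(signs))[None, :, None],
           pos[:, None, :]] = 2 * signs[None]
    # Shape (∓3, ±1^23): (-1)^c for each codeword c, with one coordinate
    # multiplied by -3: 4096 codewords x 24 positions = 98,304 vectors.
    base = 1 - 2 * code                                    # (4096, 24), entries ±1
    mult = np.ones((24, 24), dtype=np.int64) - 4 * np.eye(24, dtype=np.int64)  # -3 on diagonal
    shape3 = base[:, None, :] * mult[None, :, :]           # (4096, 24, 24)
    return np.vstack([np.array(shape1), shape2.reshape(-1, 24), shape3.reshape(-1, 24)])


class SphericalLeechQuantizer:
    """Nearest-neighbour quantizer onto the normalized Leech minimal vectors (forward pass only)."""

    def __init__(self) -> None:
        self.codebook = leech_min_vectors() / np.sqrt(32.0)  # (196560, 24), unit norm

    def __call__(self, z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        z = z / np.linalg.norm(z, axis=-1, keepdims=True)    # project onto the hypersphere
        idx = np.argmax(z @ self.codebook.T, axis=-1)         # max cosine == min distance
        return self.codebook[idx], idx


def bsq_quantize(z: np.ndarray) -> np.ndarray:
    """BSQ-style codebook {±1/sqrt(d)}^d: the nearest codeword to z/||z|| is sign(z)/sqrt(d)."""
    return np.where(z >= 0, 1.0, -1.0) / np.sqrt(z.shape[-1])
```

With this assumed codebook each 24-dimensional token carries $\log_2 196560 \approx 17.58$ bits, whereas a $d$-dimensional BSQ code carries exactly $d$ bits. The sketch favors clarity over speed: the brute-force argmax costs about $24 \times 196560$ multiply-adds per token, which a practical implementation would likely replace with a structured lattice decoder or a batched GPU search.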
