Sparse-voxel (SV) rasterization is a fast, differentiable alternative for optimization-based scene reconstruction, but it tends to underfit low-frequency content, depends on brittle pruning heuristics, and can overgrow in ways that inflate VRAM. We introduce LiteVoxel, a self-tuning training pipeline that makes SV rasterization both steadier and lighter. Our loss is made low-frequency aware via an inverse-Sobel reweighting with a mid-training gamma ramp, shifting the gradient budget to flat regions only after the geometry stabilizes. Adaptation replaces fixed pruning thresholds with depth-quantile logic on the maximum blending weight, stabilized by EMA-hysteresis guards, and refines structure through ray-footprint-based, priority-driven subdivision under an explicit growth budget. Ablations and full-system results on the Mip-NeRF 360 (6 scenes) and Tanks & Temples (3 scenes) datasets show that LiteVoxel mitigates low-frequency-region errors and boundary instability while keeping PSNR/SSIM, training time, and FPS comparable to a strong SVRaster pipeline. Crucially, LiteVoxel reduces peak VRAM by roughly 40-60% and preserves low-frequency detail that prior setups miss, enabling more predictable, memory-efficient training without sacrificing perceptual quality.
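To make the loss reweighting concrete, the following is a minimal sketch of an inverse-Sobel per-pixel weight map with a mid-training gamma ramp, assuming a grayscale target image and a linear ramp schedule; the helper names (`sobel_magnitude`, `inverse_sobel_weights`) and the parameters (`ramp_start`, `ramp_end`, `gamma_max`) are illustrative, not the paper's actual implementation.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel filters (naive loop, for clarity)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img, dtype=np.float32)
    gy = np.zeros_like(img, dtype=np.float32)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.sqrt(gx ** 2 + gy ** 2)

def inverse_sobel_weights(img, step, ramp_start, ramp_end, gamma_max=2.0):
    """Per-pixel loss weights that favor flat (low-gradient) regions.

    gamma ramps linearly from 0 (uniform weighting) to gamma_max over
    [ramp_start, ramp_end] training steps, so the low-frequency emphasis
    only kicks in after geometry has had time to stabilize.
    """
    t = np.clip((step - ramp_start) / max(ramp_end - ramp_start, 1), 0.0, 1.0)
    gamma = t * gamma_max
    g = sobel_magnitude(img)
    g = g / (g.max() + 1e-8)           # normalize gradient magnitude to [0, 1]
    w = (1.0 - g) ** gamma             # flat regions -> weight near 1
    return w / (w.mean() + 1e-8)       # keep the average loss scale fixed
```

Before `ramp_start` the exponent is 0 and the weights are uniform, so early training is unchanged; as gamma ramps up, high-gradient (edge) pixels are progressively downweighted relative to flat regions.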
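One way to read the adaptive pruning rule is sketched below: instead of a fixed threshold, each depth bin prunes voxels whose EMA-smoothed maximum blending weight stays below that bin's quantile for several consecutive updates (the hysteresis guard). All names and parameters here (`q`, `beta`, `patience`) are assumptions for illustration, not the paper's actual values or API.

```python
import numpy as np

def update_prune_mask(max_blend_w, depth_bin, ema, steps_below,
                      q=0.05, beta=0.9, patience=3):
    """Depth-quantile pruning on per-voxel max blending weight.

    max_blend_w : current max blending weight per voxel, shape (N,)
    depth_bin   : integer depth bin per voxel, shape (N,)
    ema         : running EMA of max_blend_w, shape (N,)
    steps_below : consecutive updates each voxel spent below its
                  bin's threshold, shape (N,)
    Returns (prune_mask, ema, steps_below).
    """
    ema = beta * ema + (1.0 - beta) * max_blend_w    # EMA smoothing
    below = np.zeros_like(ema, dtype=bool)
    for b in np.unique(depth_bin):
        sel = depth_bin == b
        thresh = np.quantile(ema[sel], q)            # per-depth-bin quantile
        below[sel] = ema[sel] < thresh
    steps_below = np.where(below, steps_below + 1, 0)
    prune = steps_below >= patience                  # only sustained weakness prunes
    return prune, ema, steps_below
```

The hysteresis counter means a voxel that dips below its bin's quantile for a single noisy update is not pruned; only voxels that stay weak for `patience` consecutive checks are removed, which is the stability property the abstract attributes to the EMA-hysteresis guards.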