Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios, where test distributions often diverge significantly from the training domain. However, conventional approaches typically rely on prior knowledge of the target domain or require model retraining, limiting their practicality in dynamic or resource-constrained environments. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation, but they often fail to capture complex activation distributions and are constrained to specific normalization layers. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis. AQR captures the full shape of activation distributions and generalizes across architectures employing BatchNorm, GroupNorm, or LayerNorm. To address the challenge of estimating distribution tails under varying batch sizes, AQR incorporates a robust tail calibration strategy that improves stability and precision. Our method leverages source-domain statistics computed at training time, enabling unsupervised adaptation without retraining models. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures demonstrate that AQR achieves robust adaptation across diverse settings, outperforming existing test-time adaptation baselines. These results highlight AQR's potential for deployment in real-world scenarios with dynamic and unpredictable data distributions.
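As an illustration of the channel-wise quantile alignment described above, the following is a minimal NumPy sketch, not the authors' implementation. The function names `source_quantiles` and `align_quantiles`, the (N, C, H, W) activation layout, the evenly spaced quantile levels, and the piecewise-linear interpolation between quantiles are all assumptions made for this example. AQR's tail calibration strategy is not reproduced here; values at the extremes are simply mapped onto the outermost source quantiles.

```python
import numpy as np


def source_quantiles(acts, levels):
    """Per-channel quantiles of source activations (hypothetical helper).

    acts:   (N, C, H, W) pre-activations collected on source data
    levels: (K,) increasing quantile levels in [0, 1]
    returns an array of shape (C, K)
    """
    c = acts.shape[1]
    flat = np.moveaxis(acts, 1, 0).reshape(c, -1)   # (C, N*H*W)
    return np.quantile(flat, levels, axis=1).T       # (K, C) -> (C, K)


def align_quantiles(x, source_q, levels):
    """Remap each channel of x so its quantiles match the source quantiles.

    x:        (N, C, H, W) test-time pre-activations
    source_q: (C, K) source quantiles, assumed precomputed at training time
    levels:   (K,) the same quantile levels used for source_q
    """
    n, c, h, w = x.shape
    out = np.empty(x.shape, dtype=np.float64)
    for ch in range(c):
        vals = x[:, ch].reshape(-1)
        # Empirical quantiles of the current test batch for this channel.
        test_q = np.quantile(vals, levels)
        # Monotone piecewise-linear map: test quantile k -> source quantile k.
        out[:, ch] = np.interp(vals, test_q, source_q[ch]).reshape(n, h, w)
    return out
```

Because the mapping is built from the whole quantile curve rather than only a mean and variance, it can correct skew or heavy tails that a BatchNorm statistic update would leave in place. With small batches the outermost quantiles are estimated from very few samples, which is the situation the abstract's tail calibration is meant to address.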