Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect that certain parameters are asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm, establishing asymptotic normality for the limiting distribution of the estimation error where the convergence occurs at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
翻译:自适应采集的数据在现代实践中已变得无处不在。然而,即使看似良性的自适应采样方案也可能引入严重的偏差,导致传统统计推断工具无法适用。这一问题可通过称为稳定性的性质来缓解,该性质表明:若算法采取动作的速率收敛于一个确定性极限,则可预期某些参数具有渐近正态性。基于多臂老虎机场景下的近期研究脉络,我们证明了线性老虎机中线性上置信界(LinUCB)算法满足该性质。在此过程中,我们细致刻画了动作集为单位球时随机设计特征协方差矩阵的特征值与特征向量的行为,证明其可分解为一个锁定真实参数的一秩方向和一个以可预测的$\\sqrt{T}$速率增长的近似各向同性主体。这使得我们能够为LinUCB算法建立中心极限定理,证明估计误差的极限分布具有渐近正态性,且收敛速率为$T^{-1/4}$。由此得到的Wald型置信集与假设检验不依赖于特征协方差矩阵,且渐近上比现有非渐近置信集更紧致。数值模拟结果验证了我们的发现。