Standard Double Machine Learning (DML; Chernozhukov et al., 2018) confidence intervals can exhibit substantial finite-sample coverage distortions when the underlying score equations are ill-conditioned, even if nuisance functions are estimated with state-of-the-art methods. Focusing on the partially linear regression (PLR) model, we show that a simple, easily computed condition number for the orthogonal score, denoted kappa_DML := 1 / |J_theta|, largely determines when DML inference is reliable. Our first result derives a nonasymptotic, Berry-Esseen-type bound showing that the coverage error of the usual DML t-statistic is of order n^{-1/2} + sqrt(n) * r_n, where r_n is the standard DML remainder term summarizing nuisance estimation error. Our second result provides a refined linearization in which both estimation error and confidence interval length scale as kappa_DML / sqrt(n) + kappa_DML * r_n, so that ill-conditioning directly inflates both variance and bias. These expansions yield three conditioning regimes (well-conditioned, moderately ill-conditioned, and severely ill-conditioned) and imply that informative, shrinking confidence sets require kappa_DML = o_p(sqrt(n)) and kappa_DML * r_n -> 0. We conduct Monte Carlo experiments across overlap levels, nuisance learners (OLS, Lasso, random forests), and both low- and high-dimensional (p > n) designs. Across these designs, kappa_DML is highly predictive of finite-sample performance: well-conditioned designs with kappa_DML < 1 deliver near-nominal coverage with short intervals, whereas severely ill-conditioned designs can exhibit large bias and coverage around 40% for nominal 95% intervals, despite flexible nuisance fitting. We propose reporting kappa_DML alongside DML estimates as a routine diagnostic of score conditioning, in direct analogy to condition-number checks and weak-instrument diagnostics in IV settings.
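As a concrete illustration of the proposed diagnostic, the following is a minimal sketch (not the paper's own code) of computing kappa_DML alongside a cross-fitted DML estimate in the PLR model. For the PLR orthogonal score psi = (Y - l(X) - theta * (D - m(X))) * (D - m(X)), the Jacobian J_theta is estimated by the mean squared treatment residual, so kappa_DML = 1 / |J_theta| falls out of quantities DML already computes. The function name `dml_plr_with_kappa` and the scikit-learn-based structure are our own assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


def dml_plr_with_kappa(y, d, X, learner, n_folds=5, seed=0):
    """Cross-fitted DML for the PLR model, returning the point estimate,
    its standard error, and the conditioning diagnostic kappa_DML = 1/|J_theta|.

    `learner` is any scikit-learn regressor, used for both nuisances
    E[Y|X] and E[D|X] (a simplifying assumption of this sketch).
    """
    n = len(y)
    y_res = np.empty(n)
    d_res = np.empty(n)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        # Cross-fitting: nuisances fit on the training folds,
        # residuals formed on the held-out fold.
        ml_l = clone(learner).fit(X[train], y[train])  # approximates E[Y|X]
        ml_m = clone(learner).fit(X[train], d[train])  # approximates E[D|X]
        y_res[test] = y[test] - ml_l.predict(X[test])
        d_res[test] = d[test] - ml_m.predict(X[test])

    j_abs = np.mean(d_res ** 2)                 # estimate of |J_theta|
    theta_hat = np.mean(d_res * y_res) / j_abs  # orthogonalized estimator
    psi = (y_res - theta_hat * d_res) * d_res   # orthogonal score at theta_hat
    se = np.sqrt(np.mean(psi ** 2) / n) / j_abs # sandwich standard error
    kappa_dml = 1.0 / j_abs                     # conditioning diagnostic
    return theta_hat, se, kappa_dml
```

In a well-conditioned design with strong overlap, the treatment residual variance is bounded away from zero and kappa_dml stays small; as overlap degrades, d_res collapses toward zero, j_abs shrinks, and both the standard error and kappa_dml blow up together, matching the kappa_DML / sqrt(n) scaling described above.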