We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantifying their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when the subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theoretical results for honest random forests, together with simulations, show that the modified EL achieves accurate coverage and is reliable in practice relative to existing inference methods.
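To make the two ingredients named above concrete, the following minimal sketch shows (i) an incomplete $U$-statistic, computed by averaging a kernel over randomly drawn subsamples, and (ii) the textbook EL ratio statistic for a mean. It is an illustration under stated assumptions, not the statistic or the modification proposed here: the function names `incomplete_u_statistic` and `el_log_ratio`, the toy kernel (a subsample mean standing in for a tree prediction), and the bisection solver are all hypothetical choices made for this example.

```python
import numpy as np


def incomplete_u_statistic(x, kernel, s, B, rng):
    """Average `kernel` over B random size-s subsamples of x.

    A complete U-statistic would average over all C(n, s) subsets;
    the incomplete version uses only B randomly chosen ones, which is
    the structure of a subsampled ensemble prediction.
    """
    n = len(x)
    values = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=s, replace=False)  # one subsample
        values[b] = kernel(x[idx])
    return values.mean(), values


def el_log_ratio(z, mu, n_iter=200):
    """Textbook EL statistic -2 log R(mu) for the mean of z.

    Solves sum_i d_i / (1 + lam * d_i) = 0 with d_i = z_i - mu by
    bisection, then returns 2 * sum_i log(1 + lam * d_i).
    Returns inf when mu lies outside the convex hull of z.
    """
    d = np.asarray(z, dtype=float) - mu
    if d.min() >= 0.0 or d.max() <= 0.0:
        return np.inf
    # lam must keep every weight positive: 1 + lam * d_i > 0.
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        g = np.sum(d / (1.0 + mid * d))  # strictly decreasing in lam
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)

    # Toy "ensemble": each base learner is the mean of a subsample.
    u_hat, _ = incomplete_u_statistic(x, np.mean, s=20, B=500, rng=rng)
    print("incomplete U-statistic:", u_hat)

    # Classical EL for the mean of i.i.d. data, evaluated at mu = 0.
    print("-2 log R(0):", el_log_ratio(x, 0.0))
```

For i.i.d. data the statistic returned by `el_log_ratio` is asymptotically chi-squared with one degree of freedom at the true mean. Kernel values computed on overlapping subsamples are dependent, so applying this classical statistic to them directly is not justified by the sketch; handling that dependence, and the loss of pivotality under sparse subsampling, is what the framework described above addresses.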