Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates via Generalized Riesz Regression

This study investigates treatment effect estimation in a semi-supervised setting, where we can use not only the standard triple of covariates, treatment indicator, and outcome, but also unlabeled auxiliary covariates. For this problem, we derive efficiency bounds and construct efficient estimators whose asymptotic variance attains those bounds. The analysis covers two data-generating processes: the one-sample setting and the two-sample setting. In the one-sample setting, treatment indicators and outcomes are observed for only part of the dataset; this is also called the censoring setting. In the two-sample setting, there are two independent datasets, one labeled and one unlabeled; this is also called the case-control setting or the stratified setting. In both settings, we find that incorporating the auxiliary covariates can lower the efficiency bound and yields an estimator whose asymptotic variance is smaller than that of an estimator that does not use them.
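
To make the distinction between the two data-generating processes concrete, the following is a minimal simulation sketch, not the paper's own procedure. The function names, the outcome model, the logistic propensity, the constant labeling probability `label_prob`, and the use of `-1` and `NaN` as missing-value markers are all assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_one_sample(n, label_prob=0.3, seed=0):
    """One-sample (censoring) setting: a single dataset in which the
    treatment indicator and outcome are observed only for some units.
    The data-generating model here is hypothetical."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))                      # covariates, always observed
    propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))      # assumed logistic propensity
    d = rng.binomial(1, propensity)                  # treatment indicator
    y = x[:, 0] + 2.0 * d + rng.normal(size=n)       # assumed outcome model
    # Assumption: labels are observed completely at random with a fixed probability.
    observed = rng.binomial(1, label_prob, size=n).astype(bool)
    d_obs = np.where(observed, d, -1)                # -1 marks a missing treatment
    y_obs = np.where(observed, y, np.nan)            # NaN marks a missing outcome
    return x, d_obs, y_obs, observed


def simulate_two_sample(n_labeled, n_unlabeled, seed=0):
    """Two-sample (case-control / stratified) setting: a labeled dataset
    and an independent dataset that contains covariates only."""
    rng = np.random.default_rng(seed)
    x_l = rng.normal(size=(n_labeled, 2))
    propensity = 1.0 / (1.0 + np.exp(-x_l[:, 0]))
    d_l = rng.binomial(1, propensity)
    y_l = x_l[:, 0] + 2.0 * d_l + rng.normal(size=n_labeled)
    x_u = rng.normal(size=(n_unlabeled, 2))          # independent unlabeled covariates
    return (x_l, d_l, y_l), x_u


if __name__ == "__main__":
    x, d_obs, y_obs, observed = simulate_one_sample(1000)
    print("one-sample: labeled units =", int(observed.sum()), "of", len(x))
    (x_l, d_l, y_l), x_u = simulate_two_sample(200, 800)
    print("two-sample: labeled =", len(x_l), "unlabeled =", len(x_u))
```

In the one-sample sketch the number of labeled units is random, whereas in the two-sample sketch the sizes of the labeled and unlabeled datasets are fixed in advance; this is the structural difference the two settings are meant to capture.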
