We develop new methods for integrating experimental and observational data in causal inference. Randomized controlled trials offer strong internal validity, but they are often costly and therefore limited in sample size. Observational data are cheaper and often larger, but they are prone to bias from unmeasured confounders. To harness these complementary strengths, we propose a systematic framework that formulates causal estimation as an empirical risk minimization (ERM) problem. A full model containing the causal parameter is obtained by minimizing a weighted combination of an experimental loss, which captures the validity of the causal parameter, and an observational loss, which captures the fit of the full model. The weight is chosen by cross-validating the causal parameter across experimental folds. Experiments on real and synthetic data demonstrate the efficacy and reliability of our method, and we also provide non-asymptotic error bounds.
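To make the weighted objective and the cross-validation step concrete, here is a minimal sketch under assumptions that are ours, not the paper's. We assume a linear outcome model with mean-squared-error losses. The weight λ multiplies the experimental loss and 1 − λ multiplies the observational loss. The validation criterion for each experimental fold is the squared gap between the fitted causal parameter and a held-out difference-in-means estimate. The names `weighted_erm`, `choose_weight` and `simulate`, and the toy data-generating process, are hypothetical illustrations.

```python
import numpy as np


def weighted_erm(De, ye, Do, yo, lam):
    """Minimize lam * L_exp(theta) + (1 - lam) * L_obs(theta) (assumed form).

    Both losses are mean squared errors of a linear model y ~ D @ theta,
    so the minimizer solves a weighted normal equation.
    Column 0 of each design matrix is the treatment; theta[0] is the causal parameter.
    """
    gram = lam * De.T @ De / len(ye) + (1 - lam) * Do.T @ Do / len(yo)
    rhs = lam * De.T @ ye / len(ye) + (1 - lam) * Do.T @ yo / len(yo)
    return np.linalg.solve(gram, rhs)


def choose_weight(De, ye, Do, yo, lams, n_folds=5, seed=0):
    """Pick lam by K-fold cross-validation of the causal parameter on experimental folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(ye)), n_folds)
    scores = []
    for lam in lams:
        err = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(ye)), held_out)
            # Fit on the remaining experimental folds plus all observational data.
            tau_hat = weighted_erm(De[train], ye[train], Do, yo, lam)[0]
            # Held-out difference in means: unbiased for the effect under randomization.
            a, y = De[held_out, 0], ye[held_out]
            tau_ref = y[a == 1].mean() - y[a == 0].mean()
            err += (tau_hat - tau_ref) ** 2
        scores.append(err / n_folds)
    best = lams[int(np.argmin(scores))]
    return best, scores


def simulate(n, randomized, rng, tau=2.0):
    """Toy data: outcome depends on treatment A, covariates X, and an unmeasured U."""
    X = rng.normal(size=(n, 2))
    U = rng.normal(size=n)  # unmeasured confounder, never passed to the estimator
    if randomized:
        A = rng.binomial(1, 0.5, size=n).astype(float)  # trial: coin-flip assignment
    else:
        A = (U + rng.normal(size=n) > 0).astype(float)  # observational: uptake depends on U
    y = tau * A + X @ np.array([1.0, -1.0]) + U + rng.normal(size=n)
    D = np.column_stack([A, X, np.ones(n)])  # [treatment, covariates, intercept]
    return D, y


rng = np.random.default_rng(0)
De, ye = simulate(2000, randomized=True, rng=rng)    # smaller randomized trial
Do, yo = simulate(20000, randomized=False, rng=rng)  # larger confounded study
lams = [0.0, 0.25, 0.5, 0.75, 1.0]
lam_star, cv_scores = choose_weight(De, ye, Do, yo, lams)
tau_star = weighted_erm(De, ye, Do, yo, lam_star)[0]
tau_obs = weighted_erm(De, ye, Do, yo, 0.0)[0]  # observational data only
print(lam_star, round(tau_star, 2), round(tau_obs, 2))
```

In this toy setting the observational sample is strongly confounded, so the observational-only estimate is biased upward and cross-validation should push the weight toward the experimental loss. The exact weight chosen depends on the seed. Because each held-out reference is independent of the fit and unbiased for the effect, its variance adds the same expected term to every λ's score. The criterion therefore ranks weights by the mean squared error of the causal estimate, up to that constant.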