Causal inference plays a crucial role in scientific research across many disciplines, and estimating causal effects from observational data, particularly the average treatment effect (ATE), has attracted significant attention. Computing the ATE from real-world observational data, however, poses substantial privacy risks to users. Differential privacy, which offers rigorous theoretical guarantees, has emerged as a standard approach to privacy-preserving data analysis. Yet existing differentially private ATE estimators rely on specific assumptions, provide limited privacy protection, or fail to protect all of the information involved. To address these limitations, we introduce PrivATE, a practical ATE estimation framework that ensures differential privacy. Different scenarios call for different levels of privacy protection: in education evaluation, for example, generally only test scores are sensitive, whereas all types of medical record data are usually private. To accommodate these differing requirements, PrivATE provides two levels of privacy protection, label-level and sample-level. By deriving an adaptive matching limit, PrivATE balances noise-induced error against matching error, yielding a more accurate ATE estimate. Our evaluation validates the effectiveness of PrivATE: it outperforms the baselines on all datasets and privacy budgets.
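To make the trade-off between noise-induced error and matching error concrete, the following is a minimal sketch of a label-level private matching estimator. It is not the PrivATE algorithm: it assumes a single scalar covariate, greedy 1-nearest-neighbour matching, outcomes clipped to a known range, the Laplace mechanism, and a matching limit `k` supplied by the caller rather than derived adaptively. All function names and parameters (`match_with_limit`, `private_matching_ate`, `y_lo`, `y_hi`) are hypothetical.

```python
import random


def match_with_limit(x, t, k):
    """Greedy 1-nearest-neighbour matching on a scalar covariate.

    Each unit may serve as a match at most k times. Returns (unit, match)
    index pairs; units with no available match are skipped. Only x and t
    are used, which this sketch treats as public (label-level privacy).
    """
    used = [0] * len(x)
    pairs = []
    for i in range(len(x)):
        candidates = [j for j in range(len(x))
                      if t[j] != t[i] and used[j] < k]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: abs(x[c] - x[i]))
        used[j] += 1
        pairs.append((i, j))
    return pairs


def matching_ate(y, t, pairs):
    """Mean of (treated outcome - control outcome) over the matched pairs."""
    total = 0.0
    for i, j in pairs:
        total += (y[i] - y[j]) if t[i] == 1 else (y[j] - y[i])
    return total / len(pairs)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_matching_ate(x, t, y, k, epsilon, y_lo, y_hi, rng):
    """Matching ATE estimate plus Laplace noise (assumed label-level setting)."""
    y_clipped = [min(max(v, y_lo), y_hi) for v in y]
    pairs = match_with_limit(x, t, k)
    if not pairs:
        raise ValueError("no unit could be matched")
    # One outcome enters the sum once as its own unit and at most k times
    # as someone else's match, always with the same sign.
    sensitivity = (1 + k) * (y_hi - y_lo) / len(pairs)
    estimate = matching_ate(y_clipped, t, pairs)
    return estimate + laplace_noise(sensitivity / epsilon, rng)
```

In this sketch the Laplace scale grows linearly in `k`, while a small `k` forces units onto more distant matches or leaves them unmatched. That is the tension an adaptive matching limit is meant to resolve. A call such as `private_matching_ate(x, t, y, k=2, epsilon=1.0, y_lo=0.0, y_hi=10.0, rng=random.Random(0))` returns one noisy estimate. Floating-point Laplace sampling of this kind is illustrative only and is not hardened for production use.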