Multi-source learning is an emerging area of research in statistics in which information from multiple datasets with heterogeneous distributions is combined to estimate a parameter of interest for a target population whose responses are unobserved. We propose a high-dimensional debiased calibration (HDC) method and a multi-source HDC (MHDC) estimator for general estimating equations. The HDC method achieves Neyman orthogonality for the target parameter through high-dimensional covariate balancing on an augmented set of covariates. It avoids the augmented inverse probability weighting formulation and leads to a simpler optimization algorithm for the target parameter in estimating equations and M-estimation. The proposed MHDC estimator integrates multi-source data while supporting flexible specifications for both the density ratio and the outcome regression models, achieving multiple robustness against model misspecification. We establish its asymptotic normality and propose a specification test for the transferability condition on the multi-source data. Compared with a linear combination of single-source HDC estimators, the MHDC estimator improves efficiency by jointly utilizing all data sources. Simulation studies show that the MHDC estimator accommodates multiple sources and multiple working models effectively and outperforms existing doubly robust estimators for multi-source learning. An empirical analysis of a meteorological dataset demonstrates the utility of the proposed method in practice.
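To make the idea of covariate balancing concrete, the following is a minimal sketch of plain low-dimensional calibration weighting for a single source, not the HDC or MHDC procedure itself. It assumes exponential-tilting (entropy-balancing) weights, a simulated linear outcome, and a damped Newton solver; the function names `calibration_weights` and `_tilt`, the data-generating process, and all tuning constants are hypothetical choices made for illustration.

```python
import numpy as np


def _tilt(z, lam):
    """Return normalized exponential-tilting weights and the log-sum-exp objective."""
    s = z @ lam
    m = s.max()                      # subtract the max for numerical stability
    e = np.exp(s - m)
    return e / e.sum(), m + np.log(e.sum())


def calibration_weights(x_source, x_target, n_iter=200, tol=1e-10):
    """Weights on source rows whose weighted covariate means equal the
    target-sample covariate means (illustrative assumption: exponential tilting)."""
    z = x_source - x_target.mean(axis=0)      # centre source covariates at target means
    lam = np.zeros(z.shape[1])
    for _ in range(n_iter):
        w, f = _tilt(z, lam)
        grad = w @ z                          # weighted mean of centred covariates
        if np.linalg.norm(grad) < tol:
            break
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)   # weighted covariance
        step = np.linalg.solve(hess + 1e-10 * np.eye(len(lam)), grad)
        t = 1.0                               # backtracking line search on the convex dual
        while _tilt(z, lam - t * step)[1] > f - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return _tilt(z, lam)[0]


# Simulated example: the source has outcomes, the target has covariates only.
rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5, 0.25])
x_source = rng.normal(0.0, 1.0, size=(2000, 3))
x_target = rng.normal(0.5, 1.0, size=(2000, 3))      # shifted covariate distribution
y_source = 1.0 + x_source @ beta + rng.normal(size=2000)

w = calibration_weights(x_source, x_target)
estimate = w @ y_source                   # calibrated estimate of the target mean outcome
naive = y_source.mean()                   # ignores the covariate shift
truth = 1.0 + 0.5 * beta.sum()            # target mean under the simulated model

print("balanced means:", w @ x_source)
print("target means:  ", x_target.mean(axis=0))
print("naive, calibrated, truth:", naive, estimate, truth)
```

In this sketch the weights depend only on covariates, so the same weights can be reused for any outcome-based estimating function. Exact balancing on a fixed, small covariate set is the design choice that keeps the example short; it does not reproduce the augmented covariate set, the high-dimensional regularization, or the multi-source combination described in the abstract.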