Privacy restrictions make it difficult for medical institutions to share the patient-level data needed for covariate-adjusted survival analysis. We propose a privacy-preserving framework that estimates balanced Kaplan-Meier curves from distributed observational data without exchanging raw data. Each institution shares only a low-dimensional representation of its covariate matrix, obtained through dimensionality reduction. An analyst then reconstructs an aggregated dataset from these representations, performs propensity score matching, and estimates survival curves. In experiments on simulated datasets and five publicly available medical datasets, the proposed method consistently outperformed single-site analyses. The method handles both horizontally and vertically partitioned data and lets institutions obtain reliable survival curves collaboratively, with minimal communication and without disclosing raw data.
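As a rough illustration of the last two steps of the pipeline (propensity score matching followed by Kaplan-Meier estimation), the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the matching algorithm, so greedy 1:1 nearest-neighbour matching with a caliper is assumed here, and the function names `greedy_match` and `kaplan_meier`, the `caliper` parameter, and the toy inputs are all hypothetical. The propensity scores are taken as given; in the proposed framework they would be estimated from the reconstructed aggregated dataset.

```python
from collections import Counter


def greedy_match(ps_treated, ps_control, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Assumed matching rule (not taken from the paper): each treated
    subject is paired with the closest still-unmatched control whose
    propensity score lies within `caliper`.
    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    Returns a list of (t, S(t)) at each distinct observed event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Toy example with made-up numbers.
pairs = greedy_match([0.30, 0.62], [0.90, 0.33, 0.60])
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In practice the estimator would be applied separately to the matched treated and matched control groups, giving one balanced curve per group. Subjects censored at the same time as an event are treated as still at risk at that time, which is the usual convention.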