Sensitive citizen data, such as social, medical, and fiscal data, is heavily fragmented across public bodies and the private sector. Mining the combined data sets yields insights that would otherwise remain hidden, with applications such as improved healthcare, fraud detection, and evidence-based policy making. (Multi-party) delegated private set intersection (D-PSI) is a privacy-enhancing technology that links data across multiple data providers through a data collector. However, before it can be deployed in these use cases, it must be extended with additional functions, e.g., securely delivering payload only for elements in the intersection. Although there has been recent progress on the communication and computation requirements of D-PSI, these practical obstacles have not yet been addressed. This paper is the result of a collaboration with a governmental organization responsible for collecting, linking, and pseudonymizing data. Based on its requirements, we design a new D-PSI protocol with composable output functions, including encrypted payload and pseudonymized identifiers. We show that our protocol is secure in the standard model against colluding semi-honest data providers and against a non-colluding, possibly malicious independent party, the data collector. The protocol thus makes it possible to privately link and collect data from multiple data providers, and it is suitable for deployment in these public-sector use cases.
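To make the target functionality concrete, the following is a minimal sketch of the output a data collector would be meant to receive: one pseudonymized identifier and the associated payloads for each element held by every data provider, and nothing for elements outside the intersection. It is not the protocol described in the abstract and offers no privacy; it computes the result in the clear, as a trusted party would. The function name `ideal_dpsi_output`, the record layout, the plaintext payloads, and the use of HMAC-SHA256 as the pseudonymization function are all assumptions made for illustration.

```python
import hashlib
import hmac


def ideal_dpsi_output(provider_records, pseudonym_key):
    """Compute, in the clear, the linked output for the data collector.

    provider_records: list of dicts, one per data provider, mapping an
        identifier (str) to that provider's payload (hypothetical layout).
    pseudonym_key: bytes key for the pseudonymization function
        (HMAC-SHA256 here, chosen only for illustration).
    Returns a dict mapping pseudonym -> list of payloads, one per provider.
    """
    if not provider_records:
        return {}

    # Identifiers held by every provider form the intersection.
    common = set(provider_records[0])
    for records in provider_records[1:]:
        common &= set(records)

    output = {}
    for identifier in sorted(common):
        # The collector sees a keyed pseudonym, never the raw identifier.
        pseudonym = hmac.new(
            pseudonym_key, identifier.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Payload is released only for elements in the intersection.
        output[pseudonym] = [records[identifier] for records in provider_records]
    return output


if __name__ == "__main__":
    hospital = {"alice": "diagnosis-A", "bob": "diagnosis-B"}
    tax_office = {"bob": "income-B", "carol": "income-C"}
    linked = ideal_dpsi_output([hospital, tax_office], b"demo-key")
    for pseudonym, payloads in linked.items():
        print(pseudonym[:16], payloads)
```

In this toy run only the record for the shared identifier is linked; the payloads for identifiers held by a single provider never reach the output. A real D-PSI protocol would have to achieve the same result without any party seeing the other parties' identifiers or payloads in the clear.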