Deep hashing improves retrieval efficiency through compact binary codes, yet it introduces severe and often overlooked privacy risks. The ability to reconstruct original training data from hash codes could lead to serious threats such as biometric forgery and privacy breaches. However, model inversion attacks specifically targeting deep hashing models remain unexplored, leaving their security implications unexamined. This research gap stems from the inaccessibility of genuine training hash codes and the highly discrete Hamming space, which together prevent existing methods from adapting to deep hashing. To address these challenges, we propose DHMI, the first diffusion-based model inversion framework designed for deep hashing. DHMI first clusters an auxiliary dataset to derive semantic hash centers as surrogate anchors. It then introduces a surrogate-guided denoising optimization method that leverages a novel attack metric (fusing classification consistency and hash proximity) to dynamically select candidate samples. An ensemble of surrogate models guides the refinement of these candidates, ensuring the generation of high-fidelity and semantically consistent images. Experiments on multiple datasets demonstrate that DHMI successfully reconstructs high-resolution, high-quality images even under the most challenging black-box setting, where no training hash codes are available. Our method outperforms existing state-of-the-art model inversion attacks in black-box scenarios, confirming both its practical efficacy and the critical privacy risks inherent in deep hashing systems.
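As an illustrative sketch only (not the authors' implementation), the fused attack metric described above could score each candidate by combining a classification-consistency term over the surrogate ensemble with the Hamming proximity between the candidate's hash code and its surrogate anchor. All function names, the sign-based binarization, and the weighting scheme `alpha` are assumptions:

```python
import numpy as np

def hamming_proximity(candidate_code, anchor_code):
    """Fraction of matching bits between binary hash codes in {-1, +1}^K (higher = closer)."""
    return float(np.mean(np.sign(candidate_code) == np.sign(anchor_code)))

def classification_consistency(probs_per_surrogate, target_label):
    """Mean probability the surrogate classifiers assign to the target label."""
    return float(np.mean([p[target_label] for p in probs_per_surrogate]))

def attack_score(candidate_code, anchor_code, probs_per_surrogate, target_label, alpha=0.5):
    """Hypothetical fused metric: alpha balances consistency against hash proximity."""
    return (alpha * classification_consistency(probs_per_surrogate, target_label)
            + (1.0 - alpha) * hamming_proximity(candidate_code, anchor_code))

# Example: a 4-bit candidate matching its anchor on 3 of 4 bits,
# scored by two surrogate classifiers over a 2-class problem.
candidate = np.array([1.0, -1.0, 1.0, 1.0])
anchor = np.array([1.0, -1.0, -1.0, 1.0])
probs = [np.array([0.8, 0.2]), np.array([0.6, 0.4])]
score = attack_score(candidate, anchor, probs, target_label=0)
```

Candidates would then be ranked by this score at each denoising step, keeping the top-scoring images for refinement by the surrogate ensemble.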