Facial recognition has become a widely used method for authentication and identification, with applications ranging from secure access to locating missing persons. Its success is largely attributed to deep learning, which leverages large datasets and effective loss functions to learn discriminative features. Despite these advances, facial recognition still faces challenges in explainability, demographic bias, and privacy, as well as in robustness to aging, pose variations, lighting changes, occlusions, and facial expressions. Privacy regulations have also led to the degradation of several datasets, raising legal, ethical, and privacy concerns. Synthetic facial data generation has been proposed as a promising solution: it mitigates privacy issues, enables experimentation with controlled facial attributes, alleviates demographic bias, and provides supplementary data to improve models trained on real data. This study compares the effectiveness of synthetic facial datasets generated with different techniques in facial recognition tasks. We evaluate accuracy, rank-1 and rank-5 identification rates, and the true positive rate at a false positive rate of 0.01% on eight leading datasets, offering a comparative analysis not extensively explored in the literature. The results show that synthetic data can capture realistic variations, while underscoring the need for further research to close the performance gap with real data. Techniques such as diffusion models, GANs, and 3D models show substantial progress; however, challenges remain.
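To make the reported metrics concrete, the following is a minimal sketch of how rank-k identification accuracy and the true positive rate at a fixed false positive rate (for example 0.01%, i.e. 1e-4) are commonly computed from similarity scores. It is not the evaluation code used in this study. The function names `rank_k_accuracy` and `tpr_at_fpr`, the argument layout, and the strict-inequality thresholding convention are all assumptions made for illustration.

```python
import numpy as np


def rank_k_accuracy(similarity, probe_labels, gallery_labels, k=1):
    """Fraction of probes whose true identity is among the k most similar gallery entries.

    Hypothetical helper: `similarity` is assumed to be a (num_probes, num_gallery)
    matrix in which higher values mean more similar.
    """
    similarity = np.asarray(similarity, dtype=float)
    probe_labels = np.asarray(probe_labels)
    gallery_labels = np.asarray(gallery_labels)
    # Indices of the k highest-scoring gallery entries for each probe.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A probe counts as a hit if any of its top-k gallery labels matches its own label.
    hits = (gallery_labels[top_k] == probe_labels[:, None]).any(axis=1)
    return float(hits.mean())


def tpr_at_fpr(genuine_scores, impostor_scores, fpr=1e-4):
    """True positive rate at a threshold whose false positive rate does not exceed `fpr`.

    Hypothetical helper: scores are assumed to be similarities, and a pair is
    accepted when its score is strictly greater than the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending
    # Maximum number of impostor pairs that may be accepted at the target FPR.
    allowed = int(np.floor(fpr * impostor.size))
    # Accepting only scores strictly above this value admits at most `allowed` impostors.
    threshold = impostor[allowed] if allowed < impostor.size else -np.inf
    return float(np.mean(genuine > threshold))
```

With 10,000 impostor pairs, a false positive rate of 0.01% allows at most one impostor pair to be accepted, which is why this metric needs a large number of impostor comparisons to be meaningful.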