PhishSnap: Image-Based Phishing Detection Using Perceptual Hashing

Phishing remains one of the most prevalent online threats, exploiting human trust to harvest sensitive credentials. Existing URL- and HTML-based detection systems struggle against obfuscation and visual deception. This paper presents \textbf{PhishSnap}, a privacy-preserving, on-device phishing detection system based on perceptual hashing (pHash). Implemented as a browser extension, PhishSnap captures a screenshot of each webpage, computes its perceptual hash, and compares that hash against hashes of legitimate page templates to flag visually similar phishing attempts. A \textbf{2024 dataset of 10,000 URLs} (70\%/20\%/10\% train/validation/test) was collected from PhishTank and Netcraft. Some phishing pages had already been taken down and could not be captured, which reduced dataset diversity. The system achieved \textbf{0.79 accuracy}, \textbf{0.76 precision}, and \textbf{0.78 recall}, indicating that visual similarity remains a viable anti-phishing signal. All inference runs locally on the user's device, which preserves privacy and keeps latency low.
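
The hash-and-compare step described above can be illustrated with a minimal sketch. It is not the PhishSnap implementation: it assumes a common DCT-based pHash variant (unnormalized 2-D DCT, 8x8 low-frequency block, median threshold) and a Hamming-distance comparison. The input is assumed to be a screenshot already downscaled to a 32x32 grayscale grid. The function names and the distance threshold of 10 bits are hypothetical choices for illustration, not values reported in the paper.

```python
import math

HASH_SIDE = 8  # keep the 8x8 low-frequency DCT block, giving a 64-bit hash


def dct_low_freq(pixels):
    """Top-left HASH_SIDE x HASH_SIDE block of an unnormalized 2-D DCT-II.

    `pixels` is a square grid (list of rows) of grayscale values, assumed
    to be a screenshot that was already downscaled, e.g. to 32x32.
    """
    n = len(pixels)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)), only for the low frequencies
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(HASH_SIDE)]
    # The DCT is separable: transform along rows first, then along columns.
    rows = [[sum(row[x] * basis[u][x] for x in range(n))
             for u in range(HASH_SIDE)] for row in pixels]
    return [[sum(rows[y][u] * basis[v][y] for y in range(n))
             for u in range(HASH_SIDE)] for v in range(HASH_SIDE)]


def phash(pixels):
    """64-bit perceptual hash: one bit per low-frequency coefficient,
    set when the coefficient exceeds the median of the block."""
    coeffs = [c for row in dct_low_freq(pixels) for c in row]
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_lookalike(page_hash, template_hash, threshold=10):
    """Hypothetical decision rule: hashes within `threshold` bits count
    as visually similar. The threshold is an illustrative assumption."""
    return hamming(page_hash, template_hash) <= threshold
```

Under these assumptions, a page would be flagged when its screenshot hash falls within the threshold of a legitimate template's hash while the page is served from a domain other than the template's own. That domain check is outside this sketch.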
