Large language models (LLMs) exhibit exceptional performance but pose substantial privacy risks due to training data memorization, particularly in healthcare contexts involving imperfect or privacy-sensitive patient information. We present a hierarchical dual-strategy framework for selective knowledge unlearning that precisely removes specialized knowledge while preserving fundamental medical competencies. Our approach combines geometric-constrained gradient updates, which selectively modulate target parameters, with concept-aware token-level interventions that distinguish preservation-critical tokens from unlearning-targeted tokens via a unified four-level medical concept hierarchy. Comprehensive evaluations on the MedMCQA (surgical) and MHQA (anxiety, depression, trauma) datasets demonstrate superior performance, achieving an 82.7% forgetting rate and 88.5% knowledge preservation. Notably, our framework maintains robust privacy guarantees while modifying only 0.1% of parameters, addressing critical needs for regulatory compliance, auditability, and ethical standards in clinical research.
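The abstract describes selective unlearning that touches only a tiny fraction of parameters: parameters most implicated in the forget set receive gradient-ascent updates, while the rest are frozen to preserve retained knowledge. The toy sketch below illustrates that general mechanism on a synthetic linear model; the model, data, mask fraction, and learning rate are all illustrative assumptions, not the paper's actual method or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "model" standing in for an LLM's weight matrix (assumption).
W = rng.normal(size=(50, 50))

# Synthetic stand-ins for a forget set and a retain set (assumption).
X_f, y_f = rng.normal(size=(20, 50)), rng.normal(size=(20, 50))
X_r, y_r = rng.normal(size=(200, 50)), rng.normal(size=(200, 50))

def loss(W, X, y):
    # Mean squared error of the linear map.
    return float(np.mean((X @ W - y) ** 2))

def grad(W, X, y):
    # Gradient of the MSE loss with respect to W.
    return 2.0 * X.T @ (X @ W - y) / len(X)

# Mask the small fraction of parameters with the largest forget-set
# gradient magnitude (1% here for visibility; the paper reports ~0.1%).
g_f = grad(W, X_f, y_f)
k = max(1, int(0.01 * W.size))
thresh = np.partition(np.abs(g_f).ravel(), -k)[-k]
mask = np.abs(g_f) >= thresh

before_forget = loss(W, X_f, y_f)
before_retain = loss(W, X_r, y_r)

# Gradient *ascent* on the forget loss, restricted to masked parameters;
# unmasked parameters are left untouched, preserving retained knowledge.
for _ in range(10):
    W = W + 0.01 * mask * grad(W, X_f, y_f)

after_forget = loss(W, X_f, y_f)
after_retain = loss(W, X_r, y_r)
```

After the loop, the forget-set loss rises (the targeted knowledge degrades) while most parameters, and hence most retain-set behavior, are unchanged. This captures only the parameter-masking half of the framework; the concept-aware token-level interventions would additionally weight individual tokens by their position in the medical concept hierarchy.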