Future improvements in large language model (LLM) services increasingly hinge on access to high-value professional knowledge rather than additional generic web data. However, the providers of this knowledge face a skewed tradeoff between income and risk: they receive only a small share of downstream value yet retain copyright and privacy liability, making them reluctant to contribute their assets to LLM services. Existing techniques do not offer a trustworthy and controllable way to use professional knowledge, because they keep providers in the dark and entangle knowledge parameters with the underlying LLM backbone. In this paper, we present PKUS, the Professional Knowledge Utilization System, which treats professional knowledge as a first-class, separable artifact. PKUS keeps the backbone model on GPUs and encodes each provider's contribution as a compact adapter that executes only inside an attested Trusted Execution Environment (TEE). A hardware-rooted lifecycle protocol, adapter pruning, multi-provider aggregation, and split-execution scheduling together make this design practical at serving time. On SST-2, MNLI, and SQuAD with GPT-2 Large and Llama-3.2-1B, PKUS preserves model utility, matching the accuracy and F1 of full fine-tuning and plain LoRA, while achieving the lowest per-request latency with an 8.1-11.9x speedup over CPU-only TEE inference and naive CPU-GPU co-execution.
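To make the split-execution idea in the abstract concrete, the following is a minimal sketch, assuming a LoRA-style low-rank adapter and approximating the attested TEE with the CPU device. The class name `SplitLoRALinear` and the `gpu_device`/`tee_device` parameters are illustrative assumptions, not the paper's actual API: the frozen backbone weight runs its large matmul on the GPU, while the provider's compact adapter never leaves the (simulated) TEE and only activations cross the boundary.

```python
import torch
import torch.nn as nn


class SplitLoRALinear(nn.Module):
    """Linear layer whose frozen base weight runs on the GPU while the
    provider's low-rank adapter (A, B) stays on a TEE-resident device
    (approximated here by the CPU)."""

    def __init__(self, base: nn.Linear, rank: int = 8,
                 gpu_device: str = "cuda", tee_device: str = "cpu"):
        super().__init__()
        self.base = base.to(gpu_device)          # public backbone weights
        for p in self.base.parameters():
            p.requires_grad_(False)              # backbone stays frozen
        # Compact adapter parameters never leave the TEE device.
        self.lora_a = nn.Parameter(
            torch.randn(rank, base.in_features, device=tee_device) * 0.01)
        self.lora_b = nn.Parameter(
            torch.zeros(base.out_features, rank, device=tee_device))
        self.gpu_device = gpu_device
        self.tee_device = tee_device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Backbone path: large matmul on the GPU.
        y_base = self.base(x.to(self.gpu_device))
        # Adapter path: small low-rank matmuls inside the (simulated) TEE.
        x_tee = x.to(self.tee_device)
        delta = (x_tee @ self.lora_a.T) @ self.lora_b.T
        # Merge the two paths; only activations cross the GPU/TEE boundary.
        return y_base + delta.to(self.gpu_device)


if __name__ == "__main__":
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    layer = SplitLoRALinear(nn.Linear(768, 768), rank=8, gpu_device=dev)
    out = layer(torch.randn(2, 16, 768))
    print(out.shape)  # torch.Size([2, 16, 768])
```

In this sketch the adapter contributes only two rank-8 matmuls per layer, which is why keeping it inside the TEE adds little latency compared with running the whole model there.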