Black-Box Tuning for Language-Model-as-a-Service

Extremely large pre-trained language models (PTMs) such as GPT-3 are usually released as a service, allowing users to design task-specific prompts and query the PTMs through black-box APIs. In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of the PTMs are usually unavailable. Can we optimize the task prompts by accessing only the model inference APIs? This paper proposes a black-box tuning framework that optimizes the continuous prompt prepended to the input text via derivative-free optimization. Instead of optimizing in the original high-dimensional prompt space, which is intractable for traditional derivative-free optimization, we optimize in a randomly generated subspace, which is feasible because of the low intrinsic dimensionality of large PTMs. Experimental results show that black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompts and GPT-3's in-context learning, but also surpasses its gradient-based counterparts, i.e., prompt tuning and full model tuning.
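
The subspace idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name a specific derivative-free optimizer, so a simple elitist random search stands in for one, and `toy_api_loss`, `make_projection`, `project`, `black_box_tune`, and the toy dimensions are all hypothetical names and values chosen for illustration. The only point is that the search runs over a small vector `z`, a fixed random matrix `A` maps `z` to the full prompt, and the optimizer sees nothing but loss values returned by the black box.

```python
import random


def make_projection(full_dim, sub_dim, rng):
    """Fixed random Gaussian matrix A with shape (full_dim, sub_dim)."""
    scale = 1.0 / sub_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(sub_dim)]
            for _ in range(full_dim)]


def project(A, z):
    """Map a low-dimensional vector z to a full-size prompt vector A @ z."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def black_box_tune(loss_fn, A, sub_dim, rng,
                   iterations=100, population=20, sigma=0.5):
    """Elitist random search over z; it only ever calls loss_fn (no gradients)."""
    best_z = [0.0] * sub_dim
    best_loss = loss_fn(project(A, best_z))
    for _ in range(iterations):
        improved = False
        for _ in range(population):
            candidate = [zj + sigma * rng.gauss(0.0, 1.0) for zj in best_z]
            loss = loss_fn(project(A, candidate))
            if loss < best_loss:
                best_z, best_loss, improved = candidate, loss, True
        if not improved:
            sigma *= 0.9  # shrink the step size when a whole round fails
    return best_z, best_loss


rng = random.Random(0)
FULL_DIM, SUB_DIM = 64, 4  # toy sizes, assumed for illustration only
A = make_projection(FULL_DIM, SUB_DIM, rng)

# Hypothetical stand-in for the inference API: squared distance between the
# submitted prompt and a hidden target prompt that lies in the subspace.
hidden_target = project(A, [rng.gauss(0.0, 1.0) for _ in range(SUB_DIM)])


def toy_api_loss(prompt):
    return sum((p - t) ** 2 for p, t in zip(prompt, hidden_target))


initial_loss = toy_api_loss(project(A, [0.0] * SUB_DIM))
best_z, best_loss = black_box_tune(toy_api_loss, A, SUB_DIM, rng)
print(f"loss before: {initial_loss:.4f}, after: {best_loss:.4f}")
```

In a real LMaaS setting, `toy_api_loss` would be replaced by a call that sends the projected prompt to the service and computes a task loss from the returned predictions; the search loop itself would not need to change.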
