Despite the impressive generative abilities of black-box large language models (LLMs), their inherent opacity hinders further advances in capabilities such as reasoning, planning, and personalization. Existing work enhances LLM capabilities via domain-specific adaptation, which requires additional training on accessible model parameters, an option that is infeasible for black-box LLMs. To address this challenge, we introduce Matryoshka Pilot (M-Pilot), a lightweight white-box LLM controller that guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs. Specifically, we treat the black-box LLM as an environment and M-Pilot as a policy that provides intermediate guidance through prompts to drive the black-box LLM. M-Pilot is trained to steer the outputs of the black-box LLM toward alignment with preferences during iterative interaction, enabling controllable multi-turn generation and self-improvement in optimizing the intermediate guidance. Empirical evaluations on diverse tasks demonstrate that our method effectively enhances the capabilities of black-box LLMs on complex, long-horizon tasks. Our code is publicly available at: https://github.com/lichangh20/Matryoshka.
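The controller-as-policy setup described above can be sketched minimally as follows. This is an illustrative stand-in only, not the authors' implementation: `controller` and `generator` are stub functions where a real system would call a trained white-box LM and a black-box LLM API, and the decomposition logic is purely schematic.

```python
def controller(task, history):
    """Policy (white-box): emit the next intermediate guidance prompt.

    A stub; a real controller would be a trained small LM conditioning
    on the task and the interaction history.
    """
    step = len(history) + 1
    return f"Step {step}: address sub-goal {step} of task '{task}'"


def generator(prompt):
    """Environment (black-box): produce an intermediate output for a prompt.

    A stub; a real generator would be a black-box LLM queried via prompts.
    """
    return f"[generator output for: {prompt}]"


def decompose_and_solve(task, max_turns=3):
    """Multi-turn loop: the controller decomposes the task into a series of
    intermediate guidance prompts that drive the black-box generator."""
    history = []
    for _ in range(max_turns):
        guidance = controller(task, history)
        output = generator(guidance)
        history.append((guidance, output))
    return history


trace = decompose_and_solve("plan a trip")
```

In this sketch, training the controller would correspond to updating `controller` so that the resulting `trace` aligns with preferences over the final output, while `generator` stays frozen.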