In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $\mathcal{L}(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $\mathcal{L}(s)$, effectively sampling from a target distribution proportional to $\mathcal{L}(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample-complexity bounds for black-box deep generative models. We therefore propose a novel learning assumption, which we term \emph{coarse learnability}: an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, both for standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions in response to zeroth-order feedback to solve combinatorial optimization problems.
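The target distribution $\mathcal{L}(s) \cdot e^{-T \cdot d(s)}$ and the Metropolis-Hastings correction can be illustrated with a toy sketch. All specifics below are illustrative assumptions, not the paper's construction: the state space is bitstrings, a uniform Bernoulli model stands in for the generative prior $\mathcal{L}$, and $d$ simply counts 1-bits. Because the proposal here equals the prior, the independence-sampler acceptance ratio collapses to $\exp(-T\,(d(s') - d(s)))$.

```python
import math
import random

random.seed(0)

# Toy state space: bitstrings of length n. The cost d counts 1-bits,
# so (in this hypothetical setup) the optimum is the all-zeros string.
n = 8
T = 2.0  # temperature in the target  L(s) * exp(-T * d(s))

def d(s):
    return sum(s)

def sample_prior():
    # Stand-in for the generative prior L(.): i.i.d. Bernoulli(0.5) bits,
    # under which every bitstring is equally likely.
    return tuple(random.randint(0, 1) for _ in range(n))

# Independence Metropolis-Hastings with the prior as proposal.
# Target pi(s) ∝ L(s) * exp(-T * d(s)); proposal q = L, so the
# acceptance probability is min(1, exp(-T * (d(s_new) - d(s_cur)))).
s_cur = sample_prior()
samples = []
for t in range(2000):
    s_new = sample_prior()
    accept_prob = min(1.0, math.exp(-T * (d(s_new) - d(s_cur))))
    if random.random() < accept_prob:
        s_cur = s_new
    if t >= 1000:  # discard a burn-in period
        samples.append(d(s_cur))

avg = sum(samples) / len(samples)
print(round(avg, 2))  # well below the prior mean of n/2 = 4
```

Under the prior alone the expected cost is $n/2 = 4$; the corrected chain concentrates on low-cost strings, so the running average of $d$ drops well below that. Swapping in an LLM for `sample_prior` and a black-box evaluator for `d` gives the shape of the setting studied here, though the paper's algorithm additionally relearns the proposal iteratively.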