We initiate the study of active learning algorithms for classifying strategic agents. Active learning is a well-established framework in machine learning in which the learner selectively queries labels, often achieving substantially higher accuracy and efficiency than classical supervised methods-especially in settings where labeling is costly or time-consuming, such as hiring, admissions, and loan decisions. Strategic classification, however, addresses scenarios where agents modify their features to obtain more favorable outcomes, resulting in observed data that is not truthful. Such manipulation introduces challenges beyond those in learning from clean data. Our goal is to design active and noise-tolerant algorithms that remain effective in strategic environments-algorithms that classify strategic agents accurately while issuing as few label requests as possible. The central difficulty is to simultaneously account for strategic manipulation and preserve the efficiency gains of active learning. Our main result is an algorithm for actively learning linear separators in the strategic setting that preserves the exponential improvement in label complexity over passive learning previously obtained only in the non-strategic case. Specifically, for data drawn uniformly from the unit sphere, we show that a modified version of the Active Perceptron algorithm [DKM05,YZ17] achieves excess error $ε$ using only $\tilde{O}(d \ln \frac{1}ε)$ label queries and incurs at most $\tilde{O}(d \ln \frac{1}ε)$ additional mistakes relative to the optimal classifier, even in the nonrealizable case, when a $\tildeΩ(ε)$ fraction of inputs have inconsistent labels with the optimal classifier. The algorithm is computationally efficient and, under these distributional assumptions, requires substantially fewer label queries than prior work on strategic Perceptron [ABBN21].
翻译:我们首次针对策略性智能体的分类问题展开主动学习算法的研究。主动学习是机器学习中一个成熟的框架,学习者在此框架下有选择性地查询标签,通常能比传统监督方法获得显著更高的准确率和效率——特别是在标注成本高昂或耗时的场景中,例如招聘、录取和贷款决策。然而,策略性分类处理的是智能体为获得更有利结果而修改其特征的场景,这导致观测数据并非真实数据。此类操纵带来的挑战超出了从干净数据中学习的范畴。我们的目标是设计在策略性环境中仍保持有效的主动且抗噪声的算法——这些算法能在尽可能减少标签请求的同时,准确分类策略性智能体。核心难点在于同时考虑策略性操纵并保持主动学习的效率增益。我们的主要成果是一种在策略性设置下主动学习线性分类器的算法,该算法保持了在标签复杂度上相对于被动学习的指数级改进,而此前这一改进仅在非策略性案例中实现。具体而言,对于从单位球面上均匀抽取的数据,我们证明了改进版的主动感知机算法[DKM05,YZ17]仅使用$\tilde{O}(d \ln \frac{1}ε)$次标签查询即可达到$ε$的过剩误差,并且相对于最优分类器最多产生$\tilde{O}(d \ln \frac{1}ε)$次额外错误,即使在不可实现的情况下(当有$\tildeΩ(ε)$比例的输入与最优分类器的标签不一致时)也是如此。该算法计算高效,并且在上述分布假设下,所需的标签查询数量显著少于先前关于策略感知机的研究[ABBN21]。