We consider the problem of actively learning an unknown binary decision tree using only membership queries, a setting in which the learner must reason about a large hypothesis space while maintaining formal guarantees. Rather than enumerating candidate trees or relying on heuristic impurity or entropy measures, we encode the entire space of bounded-depth decision trees symbolically as SAT formulas. We propose a symbolic method for active learning of decision trees in which approximate model counting estimates how much each potential query would shrink the hypothesis space, enabling near-optimal query selection without full model enumeration. The resulting learner incrementally strengthens a CNF representation based on observed query outcomes, and the approximate model counter ApproxMC is invoked to quantify the remaining version space in a sound and scalable manner. When the ApproxMC count stagnates, a functional equivalence check verifies that all remaining hypotheses are functionally identical. Experiments show that the method reliably converges to the correct tree after only a handful of queries, while retaining a rigorous SAT-based foundation suitable for formal analysis and verification.
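The query-selection principle described above can be illustrated with a minimal, self-contained sketch. This is not the paper's SAT encoding: the hypothesis space here is a small explicit set of boolean functions standing in for bounded-depth decision trees, and exact counting of the surviving version space stands in for ApproxMC's approximate count. The `halving_learner` name and the toy two-input domain are assumptions made for the illustration.

```python
# Sketch of version-space halving via membership queries. Exact counting of
# the remaining hypotheses plays the role that approximate model counting
# (ApproxMC over a CNF encoding) plays in the actual method.
from itertools import product

def halving_learner(hypotheses, inputs, oracle):
    """Ask, at each step, the membership query whose answer removes the most
    hypotheses in the worst case, until one hypothesis remains."""
    version_space = list(hypotheses)
    queries = 0
    while len(version_space) > 1:
        def worst_case_remaining(x):
            # Hypotheses answering "yes" on x; the worst case keeps the
            # larger of the yes/no sides, so a balanced split is best.
            yes = sum(1 for h in version_space if h(x))
            return max(yes, len(version_space) - yes)
        x = min(inputs, key=worst_case_remaining)
        answer = oracle(x)                      # membership query
        version_space = [h for h in version_space if bool(h(x)) == bool(answer)]
        queries += 1
    return version_space[0], queries

# Toy hypothesis space: all 16 boolean functions of two boolean inputs.
inputs = list(product([0, 1], repeat=2))
hypotheses = [
    (lambda table: (lambda x: table[x]))(dict(zip(inputs, bits)))
    for bits in product([0, 1], repeat=4)
]
target = lambda x: x[0] and not x[1]            # hidden function to recover
learned, n_queries = halving_learner(hypotheses, inputs, target)
```

Because every query here splits the 16-hypothesis space exactly in half, the learner recovers the target in four queries, mirroring the near-optimal behavior the abstract attributes to count-guided selection.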