Program synthesis has emerged as a successful approach to the image parsing task. Most prior works rely on a two-step scheme: supervised pretraining of a Seq2Seq model on synthetic programs, followed by reinforcement learning (RL) fine-tuning on real reference images. Fully unsupervised approaches promise to train the model directly on the target images without requiring curated pretraining datasets; however, they struggle with the inherent sparsity of meaningful programs in the search space. In this paper, we present the first unsupervised algorithm capable of parsing constructive solid geometry (CSG) images into a context-free grammar (CFG) without pretraining, using a non-differentiable renderer. To tackle the \emph{non-Markovian} sparse reward problem, we combine three key ingredients: (i) a grammar-encoded tree LSTM that ensures program validity, (ii) entropy regularization, and (iii) sampling without replacement from the CFG syntax tree. Empirically, our algorithm recovers meaningful programs in large search spaces (up to $3.8 \times 10^{28}$). Moreover, although our approach is fully unsupervised, it generalizes better than supervised methods on the synthetic 2D CSG dataset. On the 2D computer-aided design (CAD) dataset, our approach significantly outperforms the supervised pretrained model and is competitive with the RL fine-tuned model.
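To make ingredient (i) concrete, the sketch below illustrates the general idea of grammar-encoded decoding: masking out tokens that would violate the CFG before sampling at each step, so every sampled program is syntactically valid by construction. This is a minimal illustration, not the paper's implementation; the postfix CSG vocabulary, \texttt{valid\_token\_mask}, and \texttt{sample\_program} are hypothetical names chosen for this example, and the decoder is replaced by a uniform stand-in.

\begin{verbatim}
import torch

# Illustrative CFG for 2D CSG programs in postfix form (assumed, not
# the paper's grammar): S -> S S op | primitive
VOCAB = ["circle", "square", "triangle",
         "union", "intersect", "subtract", "STOP"]
PRIMITIVES = {0, 1, 2}   # terminal shape tokens (push one operand)
OPERATORS = {3, 4, 5}    # binary boolean operators (pop two, push one)
STOP = 6

def valid_token_mask(stack_depth: int, max_len: int, step: int) -> torch.Tensor:
    """Boolean mask over the vocabulary: True where the next token keeps
    the postfix program derivable from the grammar."""
    mask = torch.zeros(len(VOCAB), dtype=torch.bool)
    remaining = max_len - step
    # A primitive is allowed only if enough steps remain to reduce the
    # stack back to a single shape (each operator reduces depth by one).
    if stack_depth + 1 <= remaining:
        for i in PRIMITIVES:
            mask[i] = True
    # An operator needs two operands on the stack.
    if stack_depth >= 2:
        for i in OPERATORS:
            mask[i] = True
    # STOP is valid exactly when one complete shape remains.
    if stack_depth == 1:
        mask[STOP] = True
    return mask

def sample_program(logits_fn, max_len: int = 9):
    """Sample a grammar-valid program by setting the logits of invalid
    productions to -inf before the softmax at every decoding step."""
    tokens, stack_depth = [], 0
    for step in range(max_len):
        logits = logits_fn(tokens)  # (|V|,) logits from the decoder
        mask = valid_token_mask(stack_depth, max_len, step)
        logits = logits.masked_fill(~mask, float("-inf"))
        tok = torch.distributions.Categorical(logits=logits).sample().item()
        if tok == STOP:
            break
        stack_depth += 1 if tok in PRIMITIVES else -1
        tokens.append(tok)
    return [VOCAB[t] for t in tokens]

# Stand-in for a trained decoder: uniform logits over the vocabulary.
print(sample_program(lambda toks: torch.zeros(len(VOCAB))))
\end{verbatim}

Under this masking, the model never wastes probability mass on syntactically invalid sequences, which is what makes the sparse-reward search tractable enough for the entropy-regularized, without-replacement sampling described above.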