Symbolic Regression (SR) is a type of regression analysis that automatically finds the mathematical expression that best fits the data. Current SR methods still rely largely on search strategies, so a sample-specific model must be optimized for every expression, which significantly limits generalization and efficiency. Inspired by the fact that humans can infer a mathematical expression from its curve, we propose the Symbolic Expression Transformer (SET), a sample-agnostic model that approaches SR from the perspective of computer vision. Specifically, the collected data are represented as images, and an image captioning model is employed to translate the images into symbolic expressions. We release a large-scale dataset with no overlap between the training and testing sets in the image domain. Our results demonstrate the effectiveness of SET and suggest that image-based models are a promising direction for solving the challenging SR problem.
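To make the idea of representing collected data as images concrete, the following is a minimal sketch of how sampled (x, y) points might be rasterized into a binary image before being passed to an image captioning model. The abstract does not specify the rendering procedure, so the function name `rasterize_curve`, the 64 x 64 resolution, and the min-max normalization of both axes are illustrative assumptions, not the authors' method.

```python
import numpy as np

def rasterize_curve(x, y, size=64):
    """Render sampled (x, y) points as a size x size binary image.

    Hypothetical helper: resolution and normalization are assumptions,
    not details taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    img = np.zeros((size, size), dtype=np.uint8)

    def to_pixels(v):
        # Min-max scale values to integer pixel indices in [0, size - 1].
        span = v.max() - v.min()
        if span == 0:
            # A constant signal is drawn on the middle row or column.
            return np.full(v.shape, size // 2, dtype=int)
        return np.round((v - v.min()) / span * (size - 1)).astype(int)

    cols = to_pixels(x)
    rows = (size - 1) - to_pixels(y)  # flip so larger y appears higher up
    img[rows, cols] = 1               # mark every sampled point
    return img

# Illustrative input: samples of y = x**2 on [-1, 1]
xs = np.linspace(-1.0, 1.0, 200)
image = rasterize_curve(xs, xs ** 2)
```

Note that min-max scaling discards the absolute scale of the data, so a real pipeline would need some other way to preserve or encode axis ranges if constants in the expression are to be recovered.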