Modeling real-world phenomena is a focus of many science and engineering efforts, such as ecological modeling and financial forecasting. An accurate model of a complex, dynamic system improves understanding of the underlying processes and supports more efficient use of resources. Towards this goal, knowledge-driven modeling builds a model from human expertise, yet the resulting model is often suboptimal. At the opposite extreme, data-driven modeling learns a model directly from data, but it requires extensive data and is prone to overfitting. We focus on an intermediate approach, model revision, in which prior knowledge and data are combined to achieve the best of both worlds. In this paper, we propose a genetic model revision framework based on tree-adjoining grammar (TAG) guided genetic programming (GP). The framework uses the TAG formalism and GP operators to incorporate prior knowledge and to make data-driven revisions that remain consistent with that knowledge. It is designed to address the high computational cost of evolutionary modeling of complex systems. Via a case study on the challenging problem of river water quality modeling, we show that the framework efficiently learns an interpretable model with higher modeling accuracy than existing methods.