Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Using the information observed so far to infer the complete information is called nowcasting. In practice, incomplete information is often a consequence of reporting or observation delays. In this paper, we propose an expectation-maximisation (EM) framework for nowcasting that uses machine learning techniques to model both the occurrence and the reporting process of events. We allow for the inclusion of covariate information specific to the occurrence and reporting periods, as well as characteristics of the entity for which events occurred. We demonstrate how the maximisation step and the information flow between EM iterations can be tailored to leverage the predictive power of neural networks and (extreme) gradient boosting machines (XGBoost). With simulation experiments, we show that we can effectively model both the occurrence and the reporting of events when dealing with high-dimensional covariate information. In the presence of non-linear effects, we show that our methodology outperforms existing EM-based nowcasting frameworks that use generalised linear models in the maximisation step. Finally, we apply the framework to the reporting of Argentinian Covid-19 cases, where the XGBoost-based approach again performs best.
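To make the EM mechanics concrete, here is a minimal Python sketch of EM nowcasting on a reporting triangle. It is not the authors' method. It is a simplified, covariate-free illustration that assumes the count N[t, d] of events occurring in period t and reported d periods later is an independent Poisson variable with mean lam[t] * p[d], where p is a single delay distribution summing to one. The E-step replaces each not-yet-reported cell by its expected count, and the M-step refits lam and p on the completed table in closed form. In the framework described above, the M-step is where a neural network or XGBoost model using occurrence, reporting and entity covariates would be fitted instead. The names `triangle_mask` and `em_nowcast` are hypothetical.

```python
import numpy as np


def triangle_mask(T, D):
    """Boolean mask of the cells observed at the end of period T - 1.

    Cell (t, d) counts events that occurred in period t and were reported
    d periods later, so it is observed only if t + d <= T - 1.
    """
    return np.add.outer(np.arange(T), np.arange(D)) <= T - 1


def em_nowcast(counts, observed, n_iter=2000, tol=1e-10):
    """EM for N[t, d] ~ Poisson(lam[t] * p[d]) with sum(p) == 1 (assumed model).

    counts: (T, D) array; entries where `observed` is False are ignored.
    Returns lam (expected total events per occurrence period, i.e. the
    nowcast) and p (the estimated reporting-delay distribution).
    """
    counts = np.asarray(counts, dtype=float)
    T, D = counts.shape
    # Initialise with observed row sums (+1 avoids zero intensities)
    # and a uniform delay distribution.
    lam = np.where(observed, counts, 0.0).sum(axis=1) + 1.0
    p = np.full(D, 1.0 / D)
    for _ in range(n_iter):
        # E-step: under independent Poisson cells, the expected value of an
        # unreported cell is simply lam[t] * p[d].
        filled = np.where(observed, counts, np.outer(lam, p))
        # M-step: closed-form Poisson MLE on the completed table.
        # (A covariate-based model would be refitted here instead.)
        lam_new = filled.sum(axis=1)
        p_new = filled.sum(axis=0) / filled.sum()
        change = max(np.abs(lam_new - lam).max(), np.abs(p_new - p).max())
        lam, p = lam_new, p_new
        if change < tol:
            break
    return lam, p


# Usage on noise-free synthetic data: unreported cells are set to NaN
# and are never read by em_nowcast.
T, D = 3, 3
lam_true = np.array([100.0, 200.0, 300.0])
p_true = np.array([0.5, 0.3, 0.2])
mask = triangle_mask(T, D)
observed_counts = np.where(mask, np.outer(lam_true, p_true), np.nan)
lam_hat, p_hat = em_nowcast(observed_counts, mask)
print(np.round(lam_hat, 3), np.round(p_hat, 3))
```

Because the E-step only needs expected counts, replacing the closed-form M-step with a regressor trained under a Poisson loss (for example, XGBoost's `count:poisson` objective) is the natural place to bring in covariates. How the paper tailors the information flow between iterations is not captured by this sketch.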