We developed MLHO (pronounced "melo"), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, together with feature and model selection, to predict the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases these predictions on data from patients' medical records that predate their COVID-19 infection. MLHO's architecture enables parallel, outcome-oriented model calibration, in which different statistical learning algorithms and feature vectors are tested simultaneously to improve the prediction of health outcomes. Using clinical and demographic data from a cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes with about 600 features representing patients' pre-COVID health records and demographics. The mean AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.80 and 0.81 for ICU admission, hospitalization, and ventilation. We broadly describe the clusters of features used in modeling and their relative influence on the prediction of each outcome. Our results demonstrate that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, incorporating past clinical records is vital for a reliable prediction model. As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks such as MLHO are crucial for improving our readiness to confront potential future waves of COVID-19, as well as other novel infectious diseases that may emerge.
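The idea of testing several algorithms and feature vectors side by side and keeping the combination with the best AUC ROC can be illustrated with a minimal sketch. This is not MLHO's implementation: the function names `auc_roc` and `select_best`, the toy patient rows, the column names, and the fixed scoring functions (which stand in for trained models) are all assumptions made for illustration. A real pipeline would fit each model and compute the AUC ROC on held-out data.

```python
from itertools import product


def auc_roc(labels, scores):
    """AUC ROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_best(rows, labels, feature_sets, scorers):
    """Score every (feature set, scorer) pair and return the best by AUC ROC."""
    results = []
    for (fs_name, cols), (m_name, scorer) in product(
        feature_sets.items(), scorers.items()
    ):
        scores = [scorer([row[c] for c in cols]) for row in rows]
        results.append((auc_roc(labels, scores), fs_name, m_name))
    return max(results), results


# Hypothetical toy data: age plus a count of prior admissions (assumed columns).
rows = [
    {"age": 34, "prior": 0},
    {"age": 71, "prior": 2},
    {"age": 58, "prior": 5},
    {"age": 62, "prior": 0},
    {"age": 45, "prior": 1},
    {"age": 80, "prior": 1},
]
labels = [0, 1, 1, 0, 0, 1]  # 1 = adverse outcome (illustrative only)

feature_sets = {
    "demographics": ["age"],
    "demographics+history": ["age", "prior"],
}
# Fixed scoring rules standing in for trained models (an assumption).
scorers = {"sum": sum, "max": max}

best, results = select_best(rows, labels, feature_sets, scorers)
print(best)
```

On this toy data, the combination that adds the history column to age ranks every positive case above every negative one, which loosely mirrors the abstract's point that past clinical records add predictive value beyond demographics. The candidates here are evaluated in a simple loop; since each pair is independent, the same evaluation could be distributed across workers.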