Multi-trait genome-wide association studies (GWAS) use multivariate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analyses of individual traits. Reverse regression, in which the genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach for multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We analyzed different machine learning methods (ridge regression, naive Bayes/independent univariate, random forests and support vector machines) for reverse regression in multi-trait GWAS, using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains to evaluate the methods. We found that genotype prediction performance, measured by root mean squared error (RMSE), distinguished between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes, with complementary findings across methods. Code to reproduce the analysis is available at https://github.com/michoel-lab/Reverse-Pred-GWAS
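
To make the reverse-regression setup concrete, the following is a minimal sketch, not the authors' implementation (which is in the repository linked above). It regresses one variant's genotype on many traits with ridge regression and scores held-out genotype prediction by RMSE. Everything specific here is an assumption made for illustration: the simulated data, the 0/1 genotype coding, the number of linked traits, the effect size, the regularization strength `lam`, and the helper names `solve_linear`, `fit_ridge_dual`, `rmse` and `simulate`. The dual form of ridge regression is used because it only requires solving a system of size samples by samples, which suits the setting where traits outnumber samples.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge_dual(X, y, lam):
    """Ridge coefficients via the dual form w = X^T (X X^T + lam I)^{-1} y."""
    n, p = len(X), len(X[0])
    K = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, y)
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def simulate(n, p, n_linked, effect, rng):
    """Hypothetical data: the first n_linked traits depend on the genotype."""
    genotypes, traits = [], []
    for _ in range(n):
        g = rng.choice([0.0, 1.0])  # assumed 0/1 genotype coding
        row = [(effect * (g - 0.5) if k < n_linked else 0.0) + rng.gauss(0.0, 1.0)
               for k in range(p)]
        genotypes.append(g)
        traits.append(row)
    return traits, genotypes


rng = random.Random(0)
n_traits = 60  # more traits than training samples
X_train, g_train = simulate(40, n_traits, 5, 2.0, rng)
X_test, g_test = simulate(20, n_traits, 5, 2.0, rng)

# Reverse regression: the genotype is the response, the traits are the features.
g_mean = sum(g_train) / len(g_train)
weights = fit_ridge_dual(X_train, [g - g_mean for g in g_train], lam=10.0)
g_pred = [g_mean + sum(w * x for w, x in zip(weights, row)) for row in X_test]

model_rmse = rmse(g_test, g_pred)
baseline_rmse = rmse(g_test, [g_mean] * len(g_test))  # predict the training mean
print(f"ridge RMSE: {model_rmse:.3f}  baseline RMSE: {baseline_rmse:.3f}")
```

In this toy setting, a held-out RMSE below the mean-only baseline would indicate that the traits jointly carry information about the variant, and the entries of `weights` play the role of the model feature coefficients discussed above.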