Electronic health records (EHRs) linked with familial relationship data offer a unique opportunity to investigate the genetic architecture of complex phenotypes at scale. However, existing methods for estimating heritability and coheritability often fail to account for the intricacies of familial correlation structures, heterogeneity across phenotype types, and computational scalability. We propose a robust and flexible statistical framework for jointly estimating heritability and genetic correlation among continuous and binary phenotypes in EHR-based family studies. Our approach builds on multi-level latent variable models to decompose phenotypic covariance into interpretable genetic and environmental components, incorporating both within- and between-family variation. We derive iterative algorithms based on generalized estimating equations (GEE) for parameter estimation. Simulation studies under various parameter configurations demonstrate that our estimators are consistent and yield valid inference across a range of realistic settings. Applying our methods to real-world EHR data from a large urban health system, we identify significant genetic correlations between mental health conditions and endocrine/metabolic phenotypes, supporting hypotheses of shared etiology. This work provides a scalable and rigorous framework for coheritability analysis in high-dimensional EHR data and facilitates the identification of shared genetic influences in complex disease networks.