Psychiatric comorbidity is clinically significant yet challenging to diagnose because multiple disorders co-occur. To address this, we develop a novel approach that integrates synthetic patient electronic medical record (EMR) construction with multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline designed to ensure clinical relevance and diversity. Our multi-agent framework translates the clinical interview protocol into a hierarchical state machine and context tree, supporting over 130 diagnostic states while maintaining clinical standards. Through this process, we construct PsyCoTalk, the first large-scale dialogue dataset supporting comorbidity, containing 3,000 multi-turn diagnostic dialogues validated by psychiatrists. Compared to real-world clinical transcripts, PsyCoTalk exhibits high structural and linguistic fidelity in terms of dialogue length, token distribution, and diagnostic reasoning strategies, and licensed psychiatrists confirm the realism and diagnostic validity of the dialogues. The dataset offers a valuable resource for psychiatric comorbidity research aimed at improving diagnostic accuracy and treatment planning, and it enables the development and evaluation of models capable of multi-disorder psychiatric screening in a single conversational pass.
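To make the idea of a hierarchical state machine with a context record concrete, the following is a minimal sketch and not the PsyCoTalk implementation. The class `State`, the functions `count_states` and `run_interview`, the state names, and the questions are all hypothetical, and a flat dictionary stands in for the context tree. The sketch assumes that a disorder-level screening state gates its symptom-level sub-states, so a negative screen skips the whole module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    """One node in a hypothetical interview hierarchy."""
    name: str
    question: str = ""  # empty for purely structural nodes
    children: List["State"] = field(default_factory=list)


def count_states(state: State) -> int:
    """Count this state plus all of its descendants."""
    return 1 + sum(count_states(child) for child in state.children)


def run_interview(root: State, answer_fn: Callable[[State], bool]) -> Dict[str, bool]:
    """Walk the hierarchy depth-first, recording each answer.

    A state with a question acts as a gate: its sub-states are visited
    only when the answer is positive.
    """
    context: Dict[str, bool] = {}  # flat stand-in for a context tree
    stack = [root]
    while stack:
        state = stack.pop()
        if state.question:
            context[state.name] = answer_fn(state)
            if not context[state.name]:
                continue  # negative answer: skip this module's sub-states
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(state.children))
    return context


# Hypothetical two-module hierarchy (illustrative content only).
root = State("interview", children=[
    State("depression_screen", "Low mood most days?", [
        State("anhedonia", "Loss of interest?"),
        State("sleep", "Sleep disturbance?"),
    ]),
    State("anxiety_screen", "Excessive worry?", [
        State("worry", "Worry hard to control?"),
    ]),
])

# Scripted patient answers, keyed by state name.
answers = {"depression_screen": True, "anhedonia": True, "sleep": False}
context = run_interview(root, lambda s: answers.get(s.name, False))
```

In this toy run the anxiety screen is negative, so its sub-state `worry` is never asked and does not appear in `context`, while both depression sub-states are recorded. Covering several disorders in one pass then amounts to adding further modules under the root.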