Electroencephalography (EEG) provides a non-invasive, highly accessible, and cost-effective approach for detecting Alzheimer's disease (AD). However, existing methods, whether based on handcrafted feature engineering or standard deep learning, face two major challenges: 1) the lack of large-scale EEG-AD datasets for robust representation learning, and 2) the absence of a dedicated deep learning pipeline for subject-level detection, which is more clinically meaningful than the commonly used sample-level detection. To address these gaps, we have curated the world's largest EEG-AD corpus to date, comprising 2,255 subjects. Leveraging this unique data corpus, we propose LEAD, the first large-scale foundation model for EEG analysis in dementia. Our approach provides an innovative framework for subject-level AD detection, including: 1) a comprehensive preprocessing pipeline such as artifact removal, resampling, and filtering, and a newly proposed multi-scale segmentation strategy, 2) a subject-regularized spatio-temporal transformer trained with a novel subject-level cross-entropy loss and an indices group-shuffling algorithm, and 3) AD-guided contrastive pre-training. We pre-train on 12 datasets (3 AD-related and 9 non-AD) and fine-tune/test on 4 AD datasets. Compared with 10 baselines, LEAD consistently obtains superior subject-level detection performance under the challenging subject-independent cross-validation protocol. On the benchmark ADFTD dataset, our model achieves an impressive subject-level Sensitivity of 90.91% under the leave-one-subject-out (LOSO) setting. These results strongly validate the effectiveness of our method for real-world EEG-based AD detection. Source code: https://github.com/DL4mHealth/LEAD
翻译:暂无翻译