We introduce DiffKnock, a diffusion-based knockoff framework for high-dimensional feature selection with finite-sample false discovery rate (FDR) control. DiffKnock addresses two key limitations of existing knockoff methods: preserving complex feature dependencies and detecting non-linear associations. Our approach trains diffusion models to generate valid knockoffs and uses neural network--based gradient and filter statistics to construct antisymmetric feature importance measures. Through simulations, we showed that DiffKnock achieved higher power than autoencoder-based knockoffs while maintaining target FDR, indicating its superior performance in scenarios involving complex non-linear architectures. Applied to murine single-cell RNA-seq data of LPS-stimulated macrophages, DiffKnock identifies canonical NF-$\kappa$B target genes (Ccl3, Hmox1) and regulators (Fosb, Pdgfb). These results highlight that, by combining the flexibility of deep generative models with rigorous statistical guarantees, DiffKnock is a powerful and reliable tool for analyzing single-cell RNA-seq data, as well as high-dimensional and structured data in other domains.
翻译:暂无翻译