Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to \emph{jointly} ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present \textbf{PhysicoGPTCR}, a large generative protein Transformer that is \emph{dual-conditioned} on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR--peptide--HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
翻译:暂无翻译