Understanding how mutations affect protein structure is essential to progress in computational biology and bioinformatics. We introduce PRIMRose, a novel approach that predicts an energy value for each residue of a mutated protein sequence. Unlike previous models that assess global energy shifts, our method analyzes the localized energetic impact of double amino acid insertions or deletions (InDels) at the individual residue level, enabling residue-specific insights into structural and functional disruption. We implement a convolutional neural network (CNN) architecture to predict the energy change of each residue in a mutated protein. We train our model on datasets constructed from nine proteins, grouped into three categories: one with exhaustive double InDel mutations, another with approximately 145k randomly sampled double InDel mutations, and a third with approximately 80k randomly sampled double InDel mutations. Our model achieves high predictive accuracy across a range of energy metrics calculated by the Rosetta molecular modeling suite and reveals localized factors that influence model performance, such as solvent accessibility and secondary structure context. This per-residue analysis offers new insights into the mutational tolerance of specific regions within proteins and provides more interpretable and biologically meaningful predictions of the effects of InDels.
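To make the idea of per-residue prediction concrete, the following is a minimal sketch of how a single one-dimensional convolution maps a mutated sequence to one value per residue. It is not the PRIMRose architecture: the one-hot encoding, the kernel width of 5, the random weights, the example sequence, and the helper names (`one_hot`, `conv1d_same`, `predict_per_residue`) are all assumptions made for illustration. The mutant is built by deleting two adjacent residues, which is only one possible reading of "double InDel".

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector (assumed encoding)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [
        [1.0 if index[aa] == j else 0.0 for j in range(len(AMINO_ACIDS))]
        for aa in sequence
    ]


def conv1d_same(features, kernel, bias=0.0):
    """1D convolution with zero padding, giving one output value per residue.

    features: list of L vectors, each of length C.
    kernel:   list of K weight vectors, each of length C (K odd).
    """
    half = len(kernel) // 2
    out = []
    for pos in range(len(features)):
        total = bias
        for k, weights in enumerate(kernel):
            src = pos + k - half
            if 0 <= src < len(features):  # positions outside the sequence act as zeros
                total += sum(w * x for w, x in zip(weights, features[src]))
        out.append(total)
    return out


def predict_per_residue(sequence, kernel, bias=0.0):
    """Hypothetical stand-in for a trained model: one score per residue."""
    return conv1d_same(one_hot(sequence), kernel, bias)


# Untrained, randomly initialised kernel of width 5 (illustrative only).
rng = random.Random(0)
kernel = [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(5)]

wild_type = "MKTAYIAKQR"                # hypothetical sequence
mutant = wild_type[:4] + wild_type[6:]  # hypothetical deletion of two adjacent residues
scores = predict_per_residue(mutant, kernel)
print(len(mutant), len(scores))
```

Because the convolution is padded to preserve length, the output always has exactly one entry per residue of the mutated sequence, which is the property a per-residue energy predictor needs. A trained model would stack several such layers and fit the weights to Rosetta-derived per-residue energies.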