Little research has explored the connection between the expressive ability and the generalization ability of low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves the generalization of both Convolutional Neural Networks (CNNs) and Transformers by encouraging convergence to locally flat minima. However, the relationship between sharpness and generalization has not been fully explored for LoRA, owing to the lack of tools for either empirically seeking flat minima or developing theoretical analyses. In this work, we propose Flat Minima LoRA (FMLoRA) and its efficient variant, EFMLoRA, to seek flat minima for LoRA. Concretely, we theoretically demonstrate that perturbations in the full parameter space can be transferred to the low-rank subspace, which eliminates the potential interference introduced by perturbing the multiple matrices of the low-rank subspace separately. Extensive experiments on large language models and vision-language models demonstrate that EFMLoRA achieves optimization efficiency comparable to that of LoRA while attaining comparable or even better performance. For example, on the GLUE benchmark with RoBERTa-large, EFMLoRA outperforms LoRA and full fine-tuning by 1.0% and 0.5% on average, respectively. On vision-language models, e.g., Qwen-VL-Chat, it yields improvements of 1.5% and 1.0% on the SQA and VizWiz datasets, respectively. These empirical results also verify that the generalization of LoRA is closely related to sharpness, a connection overlooked by previous methods.
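To make the core idea concrete, the following is a minimal PyTorch sketch of one way a SAM-style ascent perturbation could be computed in the full (merged) weight space W0 + BA and applied there, rather than perturbing the low-rank factors A and B separately. This is an illustrative assumption based on the description above, not the paper's actual FMLoRA/EFMLoRA implementation; all names (`LoRALinear`, `sam_step`, `rho`) are hypothetical.

```python
# Sketch (not the paper's code): a SAM-style perturbation is computed for the
# merged weight W = W0 + B @ A (full parameter space) and injected there,
# avoiding interference between separate perturbations of A and B.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        # Frozen pretrained weight W0
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # Trainable low-rank factors A, B
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.w_perturb = None  # SAM perturbation held in the full weight space

    def forward(self, x):
        merged = self.weight + self.B @ self.A   # W = W0 + BA
        if self.w_perturb is not None:
            merged = merged + self.w_perturb     # perturbed weight W + eps
        return x @ merged.t()


def sam_step(layer, loss_fn, x, y, rho=0.05):
    """One SAM-style update: the ascent perturbation lives in the full
    weight space, but only the low-rank factors A, B receive gradients."""
    # 1) Gradient of the loss w.r.t. the merged weight
    merged = (layer.weight + layer.B @ layer.A).detach().requires_grad_(True)
    loss = loss_fn(x @ merged.t(), y)
    (grad_w,) = torch.autograd.grad(loss, merged)

    # 2) Normalized ascent perturbation eps in the full space
    layer.w_perturb = rho * grad_w / (grad_w.norm() + 1e-12)

    # 3) Descent step on A and B evaluated at the perturbed point
    loss = loss_fn(layer(x), y)
    loss.backward()
    layer.w_perturb = None  # perturbation is discarded after the step


# Usage sketch: a single training step with synthetic data
layer = LoRALinear(16, 16, rank=4)
opt = torch.optim.SGD([layer.A, layer.B], lr=1e-2)
x, y = torch.randn(8, 16), torch.randn(8, 16)
opt.zero_grad()
sam_step(layer, nn.functional.mse_loss, x, y, rho=0.05)
opt.step()
```

The sketch is only meant to show where the perturbation is applied; the efficiency techniques that distinguish EFMLoRA from FMLoRA are not depicted here.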