Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval-Augmented Generation (RAG) that adapts general-purpose LLMs for text-only music question answering (MQA) tasks. RAG supplies LLMs with external knowledge by retrieving relevant context information at answer-generation time. To optimize RAG for the music domain, we (1) propose MusWikiDB, a music-specialized vector database for the retrieval stage, and (2) utilize context information during both inference and fine-tuning to effectively transform general-purpose LLMs into music-specific models. Our experiments demonstrate that MusT-RAG significantly outperforms traditional fine-tuning approaches in enhancing LLMs' music domain adaptation capabilities, showing consistent improvements across both in-domain and out-of-domain MQA benchmarks. Additionally, MusWikiDB proves substantially more effective than general Wikipedia corpora, delivering superior performance and computational efficiency.
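To make the retrieve-then-generate flow concrete, below is a minimal sketch of a RAG pipeline of the kind the abstract describes. Every specific here is an illustrative assumption rather than the authors' implementation: the sentence-transformers encoder, the FAISS inner-product index standing in for MusWikiDB, the toy passages, and the prompt template are all placeholders.

```python
# Minimal RAG sketch: embed passages, retrieve by similarity, and
# prepend the retrieved context to the question before the LLM call.
# All component choices below are assumptions for illustration only.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

# Toy music-domain passages standing in for MusWikiDB entries.
passages = [
    "A fugue is a contrapuntal composition built on a recurring subject.",
    "The mixolydian mode is a major scale with a lowered seventh degree.",
]
index = faiss.IndexFlatIP(embedder.get_sentence_embedding_dimension())
emb = embedder.encode(passages, normalize_embeddings=True)
index.add(np.asarray(emb, dtype=np.float32))

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype=np.float32), k)
    return [passages[i] for i in ids[0]]

question = "What characterizes the mixolydian mode?"
context = "\n".join(retrieve(question))
# The same context-augmented prompt format can be reused to build
# fine-tuning examples, mirroring the dual use described above.
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```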