This study investigates the attribution patterns underlying Chain-of-Thought (CoT) reasoning in multilingual LLMs. While prior work demonstrates that CoT prompting improves task performance, concerns remain about the faithfulness and interpretability of the generated reasoning chains. To assess these properties across languages, we apply two complementary attribution methods, ContextCite for step-level attribution and Inseq for token-level attribution, to the Qwen2.5-1.5B-Instruct model on the MGSM benchmark. Our experiments yield three key findings: (1) attribution scores disproportionately emphasize the final reasoning step, particularly in incorrect generations; (2) structured CoT prompting significantly improves accuracy, but primarily for high-resource Latin-script languages; and (3) controlled perturbations via negation and distractor sentences reduce both model accuracy and attribution coherence. These results underscore the limitations of CoT prompting, particularly with respect to multilingual robustness and interpretive transparency.
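To make the token-level attribution setup concrete, the following minimal sketch shows how Inseq can be applied to the Qwen2.5-1.5B-Instruct model on a single MGSM item; the Hugging Face dataset ID (juletxara/mgsm), the "saliency" attribution method, the prompt template, and the generation settings are illustrative assumptions, not the exact configuration used in this study.

```python
# Minimal sketch of token-level attribution with Inseq on one MGSM item.
# Assumptions (not from the paper): dataset ID "juletxara/mgsm", the
# "saliency" attribution method, and a simple zero-shot CoT prompt.
import inseq
from datasets import load_dataset

# Load one English MGSM question (other language configs exist as well).
example = load_dataset("juletxara/mgsm", "en", split="test")[0]
prompt = f"Question: {example['question']}\nLet's think step by step.\n"

# Wrap the instruct model with a gradient-based attribution method.
model = inseq.load_model("Qwen/Qwen2.5-1.5B-Instruct", "saliency")

# Generate a reasoning chain and attribute each generated token
# back to the prompt tokens.
out = model.attribute(input_texts=prompt,
                      generation_args={"max_new_tokens": 256})

# Inspect the per-token attribution scores (HTML in notebooks, text in terminals).
out.show()
```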