Quantum computing is an emerging field recognized for the significant speedup that quantum algorithms offer over classical computing. However, designing and implementing quantum algorithms is challenging due to the complex nature of quantum mechanics and the need for precise control over quantum states. Despite significant advances in AI, there has been a lack of datasets tailored specifically to this purpose. In this work, we introduce QCircuitBench, the first benchmark dataset designed to evaluate AI's capability in designing and implementing quantum algorithms using quantum programming languages. Unlike using AI to write traditional code, this task is fundamentally more complicated because of the highly flexible design space. Our key contributions include: 1. A general framework that formulates the key features of quantum algorithm design for Large Language Models. 2. Implementations of quantum algorithms ranging from basic primitives to advanced applications, spanning 3 task suites, 25 algorithms, and 120,290 data points. 3. Automatic validation and verification functions that allow iterative evaluation and interactive reasoning without human inspection. 4. Promising potential as a training dataset, demonstrated by preliminary fine-tuning results. We observe several interesting experimental phenomena: LLMs tend to exhibit consistent error patterns, and fine-tuning does not always outperform few-shot learning. Overall, QCircuitBench is a comprehensive benchmark for LLM-driven quantum algorithm design, and it reveals the limitations of LLMs in this domain.
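To make the task concrete, the sketch below shows the kind of problem QCircuitBench targets: implementing a textbook quantum algorithm (here, Deutsch-Jozsa) in a quantum programming language and checking the result automatically by simulation. The use of Qiskit, the particular oracle, and the helper names are illustrative assumptions for this abstract only, not the dataset's actual task format or verification code.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector


def deutsch_jozsa(oracle: QuantumCircuit, n: int) -> QuantumCircuit:
    """Build a Deutsch-Jozsa circuit for an n-qubit oracle plus one ancilla."""
    qc = QuantumCircuit(n + 1)
    qc.x(n)                          # prepare the ancilla in |1>
    qc.h(range(n + 1))               # Hadamards on all qubits
    qc.compose(oracle, inplace=True) # apply the (hypothetical) oracle U_f
    qc.h(range(n))                   # Hadamards on the input register
    return qc


def is_constant(qc: QuantumCircuit, n: int) -> bool:
    """Automatic check by simulation: the oracle is constant iff the input
    register ends up in the all-zero state."""
    state = Statevector.from_instruction(qc)
    probs = state.probabilities(range(n))  # marginal over the input qubits
    return probs[0] > 0.99


# Hypothetical balanced oracle: f(x) = x_0, realized as a CNOT onto the ancilla.
n = 2
oracle = QuantumCircuit(n + 1)
oracle.cx(0, n)

qc = deutsch_jozsa(oracle, n)
print("constant" if is_constant(qc, n) else "balanced")  # expected: balanced
```

In the benchmark itself, such circuit construction would be produced by the LLM and judged by the dataset's own validation and verification functions rather than by this hand-written check.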