QSpark: Enhancing Qiskit Code Generation with Advanced AI

TLDR: QSpark is an AI coding assistant designed to generate more reliable Qiskit quantum code. Developed by researchers at Toronto Metropolitan University, it fine-tunes a 32-billion-parameter language model using two reinforcement learning methods, GRPO and ORPO: ORPO learns from pairs of preferred and rejected code outputs, while GRPO rewards candidate outputs according to their simulated execution results. QSpark outperforms both general-purpose and specialized models on the Qiskit HumanEval benchmark, with its gains concentrated in basic and intermediate quantum programming tasks, a step toward making quantum programming more accessible and efficient.

Quantum computing holds immense promise for solving complex problems, but programming these advanced machines remains a significant challenge. Even with high-level frameworks like IBM’s Qiskit, developing correct and optimized quantum programs requires specialized expertise, making it an error-prone process. While large language models (LLMs) have revolutionized classical software development, applying them to quantum programming introduces unique hurdles due to distinct languages, libraries, and the scarcity of training data.

To address this gap, researchers from Toronto Metropolitan University (Kiana Kheiri, Aamna Aamir, Andriy Miranskyy, and Chen Ding) have introduced QSpark, a Qiskit-based quantum computing coding assistant. The tool aims to make quantum programming more accessible and efficient by fine-tuning a code-focused language model specifically for Qiskit. The details of their work can be found in their research paper, “QSpark: Towards Reliable Qiskit Code Generation.”

The core of QSpark is a 32-billion-parameter large language model, Qwen2.5-Coder-32B, which was fine-tuned using two distinct reinforcement learning (RL) methods: Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO). These methods were chosen to refine the model’s behavior and improve the quality of the generated quantum code.

To train QSpark, the team built a dataset of 522 Qiskit programming tasks through an automated pipeline: collecting Qiskit code samples, extracting relevant functions, annotating them, validating their correctness with simulation-based unit tests, and deduplicating entries. Each task was assigned a difficulty level (Basic, Intermediate, or Advanced) based on factors such as circuit depth, gate complexity, and the use of quantum-specific concepts.
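The validation and deduplication steps of such a pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the helper names (`normalize`, `passes_tests`, `build_dataset`) and the task layout are hypothetical, and plain Python functions stand in for Qiskit circuits and simulator-based tests.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Strip comments and collapse whitespace so trivial variants compare equal.

    Crude on purpose: the regex would also strip a '#' inside a string literal.
    """
    no_comments = re.sub(r"#.*", "", source)
    return re.sub(r"\s+", " ", no_comments).strip()


def passes_tests(solution: str, test: str) -> bool:
    """Run a candidate solution and its unit test in a fresh namespace.

    A real pipeline would sandbox this step and run the tests on a simulator.
    """
    namespace = {}
    try:
        exec(solution, namespace)
        exec(test, namespace)
    except Exception:
        return False
    return True


def build_dataset(candidates):
    """Keep only candidates that pass their tests and are not duplicates."""
    seen = set()
    kept = []
    for task in candidates:
        key = hashlib.sha256(normalize(task["solution"]).encode()).hexdigest()
        if key in seen or not passes_tests(task["solution"], task["test"]):
            continue
        seen.add(key)
        kept.append(task)
    return kept


# Hypothetical candidates: a correct task, a near-duplicate, and a failing one.
candidates = [
    {"solution": "def add(a, b):\n    return a + b\n",
     "test": "assert add(1, 2) == 3"},
    {"solution": "def add(a, b):  # same thing\n    return a + b\n",
     "test": "assert add(1, 2) == 3"},
    {"solution": "def add(a, b):\n    return a - b\n",
     "test": "assert add(1, 2) == 3"},
]
dataset = build_dataset(candidates)
print(len(dataset))
```

Hashing a normalized form of the source is one simple way to catch near-identical entries; the article does not say which deduplication criterion the team actually used.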

The two reinforcement learning strategies, GRPO and ORPO, target different aspects of quantum code quality. ORPO focuses on aligning the model with human-like coding preferences, emphasizing readability and maintainability. It learns from pairwise comparisons where a ‘chosen’ (preferred) code output is favored over a ‘rejected’ (suboptimal) one. GRPO, on the other hand, improves execution fidelity by ranking multiple candidate code outputs generated for a given prompt. It assigns rewards based on simulation results, guiding the model to produce more executable and resource-efficient quantum circuits.
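To make the two training signals concrete, here is a minimal Python sketch of the core quantity in each method as defined in the original GRPO and ORPO formulations. GRPO scores each candidate relative to the other candidates sampled for the same prompt, and ORPO adds an odds-ratio penalty to the usual supervised loss on the chosen output. The reward values and log-probabilities below are made up for illustration, and the function names are hypothetical; the article does not describe QSpark's exact reward design or hyperparameters.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its own group."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


def log_odds(avg_logprob):
    """log(p / (1 - p)) where p = exp(avg_logprob).

    avg_logprob is the length-normalized log-likelihood of a sequence and
    must be strictly negative so that p < 1.
    """
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def orpo_preference_term(chosen_logprob, rejected_logprob):
    """ORPO odds-ratio term: -log(sigmoid(log-odds gap)).

    The full ORPO loss adds this term, scaled by a weight, to the
    negative log-likelihood of the chosen output.
    """
    margin = log_odds(chosen_logprob) - log_odds(rejected_logprob)
    return math.log1p(math.exp(-margin))


# Hypothetical rewards for four candidates generated from one prompt,
# e.g. scored by whether the circuit ran and how efficient it was.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# Hypothetical average log-probabilities of a chosen and a rejected output.
penalty = orpo_preference_term(chosen_logprob=-0.2, rejected_logprob=-1.5)

print([round(a, 3) for a in advantages])
print(round(penalty, 4))
```

Candidates that beat their group's average get a positive advantage and are reinforced, while the ORPO term shrinks as the model assigns higher odds to the chosen output than to the rejected one.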

QSpark was evaluated on the Qiskit HumanEval (QHE) benchmark, where both fine-tuned variants improved on existing models. The ORPO-tuned model achieved a Pass@1 accuracy of 56.29%, outperforming the specialized Granite-8B-QK model by nearly 10 percentage points and surpassing all general-purpose LLMs tested. The GRPO-tuned model reached 49.00% Pass@1, also exceeding all general-purpose models. Both variants also performed well on the original HumanEval benchmark, suggesting that preference optimization can improve general code generation as well.
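For readers unfamiliar with the metric, Pass@1 is the k = 1 case of pass@k, the probability that at least one of k sampled solutions passes a task's unit tests. The sketch below uses the standard unbiased estimator introduced with the original HumanEval benchmark. The per-task counts are invented for illustration, and the article does not state how many samples per task were drawn in the QSpark evaluation.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average the per-task estimates over the whole benchmark."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: (samples generated, samples passed) for four tasks,
# with a single sample drawn per task.
results = [(1, 1), (1, 0), (1, 1), (1, 0)]
score = benchmark_pass_at_k(results, k=1)
print(f"Pass@1 = {score:.2%}")
```

With a single sample per task, Pass@1 reduces to the fraction of tasks solved on the first attempt.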

Broken down by difficulty level, the GRPO-tuned model did best on basic tasks, passing 42 out of 54, while the ORPO-tuned model led on intermediate tasks, passing 41 out of 68. However, neither variant, nor any of the baselines, solved any of the five advanced tasks, highlighting the persistent difficulty of complex quantum reasoning.
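Converting the reported counts into pass rates makes the gap between difficulty levels easier to see. The short Python snippet below does only that arithmetic, using the figures quoted above; the dictionary layout is just an illustrative choice.

```python
# (passed, total) pairs as reported in the article.
reported = {
    ("GRPO", "Basic"): (42, 54),
    ("ORPO", "Intermediate"): (41, 68),
    ("GRPO", "Advanced"): (0, 5),
    ("ORPO", "Advanced"): (0, 5),
}


def pass_rate(passed, total):
    """Percentage of tasks passed."""
    return 100.0 * passed / total


rates = {key: round(pass_rate(*counts), 1) for key, counts in reported.items()}

for (model, level), rate in rates.items():
    print(f"{model:5s} {level:13s} {rate:5.1f}%")
```

That works out to roughly 77.8% on basic tasks for GRPO and 60.3% on intermediate tasks for ORPO, against 0% on advanced tasks for both.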

The researchers acknowledge several challenges, including inconsistencies in benchmark datasets and the absence of publicly released evaluation scripts, which necessitated the development of their own benchmarking tools. Despite these hurdles, QSpark’s practical utility was validated under realistic run-time conditions. The ongoing work aims to integrate GRPO and ORPO into a unified reward framework, broaden the training dataset, and develop more robust, automated evaluation pipelines to support consistent testing and comparison in the evolving field of quantum LLM research.

Ultimately, QSpark represents a significant step towards making quantum programming more accessible and reliable, bridging the gap between advanced AI and the complex demands of quantum software development.
