TLDR: A new research paper introduces NeuComBack, a benchmark dataset for evaluating LLM-based IR-to-assembly compilation. It also proposes a self-evolving prompt optimization method that allows LLMs to learn from their self-debugging traces, significantly improving the functional correctness and performance of generated assembly code across x86_64 and aarch64 architectures, often outperforming traditional compilers like clang-O3.
Compilers are the unsung heroes of software, translating the human-readable code we write into instructions that computers can understand. However, these essential systems are incredibly complex and demand extensive human expertise to build and maintain. Imagine a world where creating compilers for new computer architectures or discovering groundbreaking optimization techniques was much simpler. This is the promise of Neural Compilation, a new approach leveraging the power of Large Language Models (LLMs).
Neural Compilation aims to simplify compiler development by using LLMs to directly translate high-level code or intermediate representations (IR) into low-level assembly code. This method offers two significant advantages: it can drastically reduce the time and effort needed to create compilers for new Instruction Set Architectures (ISAs), and it can uncover novel optimization strategies by processing code as text, understanding its semantics in a way traditional compilers might not.
Despite its exciting potential, Neural Compilation faces significant hurdles. One major challenge is the lack of dedicated benchmarks and robust evaluation methods to objectively measure progress. Another is consistently improving the reliability and performance of the assembly code generated by LLMs.
Addressing these critical issues, a new research paper introduces NeuComBack, a novel benchmark dataset specifically designed for the IR-to-assembly compilation task. This dataset, derived from existing benchmarks like ExeBench and TSVC, provides a diverse set of programs to systematically evaluate the fundamental compilation and optimization capabilities of LLMs. NeuComBack is divided into two levels: Level 1 for fundamental compilation correctness and Level 2 for assessing optimization potential, particularly with complex loop structures.
The researchers also propose a self-evolving prompt optimization method called QiMeng-NeuComBack. The technique lets an LLM iteratively refine the prompt it works from by learning from its past self-debugging attempts: the model analyzes its own errors and the corrections that fixed them, then extracts insights that improve how it generates assembly code in the future. The process has two stages: an offline learning stage, in which the prompt evolves, and an online inference stage, in which the refined prompt is used for generation and optimization.
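The two-stage process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `self_debug`, `evolve_prompt`, and `DebugTrace` are hypothetical, and the `toy_*` functions are stand-ins for a real LLM call and a real assemble-and-test step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DebugTrace:
    """Record of one self-debugging run (hypothetical structure)."""
    ir: str
    failed_attempts: List[Tuple[str, str]] = field(default_factory=list)  # (assembly, error)
    fixed: bool = False


def self_debug(generate: Callable[[str, str, str], str],
               check: Callable[[str, str], str],
               prompt: str, ir: str,
               max_rounds: int = 3) -> Tuple[Optional[str], DebugTrace]:
    """Generate assembly; on failure, retry with the error message as feedback."""
    trace = DebugTrace(ir=ir)
    feedback = ""
    for _ in range(max_rounds):
        asm = generate(prompt, ir, feedback)
        error = check(ir, asm)  # empty string means the tests passed
        if not error:
            trace.fixed = True
            return asm, trace
        trace.failed_attempts.append((asm, error))
        feedback = error
    return None, trace


def evolve_prompt(prompt: str, traces: List[DebugTrace],
                  extract_insight: Callable[[DebugTrace], str]) -> str:
    """Offline stage: add insights from runs that failed first and were later fixed."""
    for trace in traces:
        if trace.failed_attempts and trace.fixed:
            insight = extract_insight(trace)
            if insight and insight not in prompt:
                prompt += "\n- " + insight
    return prompt


# --- Toy stand-ins (assumptions for illustration only) ---
def toy_generate(prompt: str, ir: str, feedback: str) -> str:
    # A real system would call an LLM here.
    if "emit .globl" in prompt or feedback:
        return ".globl f\nf:\n  ret"
    return "f:\n  ret"


def toy_check(ir: str, asm: str) -> str:
    # A real system would assemble, link, and run test cases here.
    return "" if ".globl" in asm else "missing .globl directive"


def toy_extract(trace: DebugTrace) -> str:
    # A real system would ask the LLM to summarize the trace.
    return "emit .globl for every function"


base_prompt = "Translate the given IR to assembly."
# Offline learning: collect self-debugging traces, then evolve the prompt.
train_traces = [self_debug(toy_generate, toy_check, base_prompt, ir)[1]
                for ir in ["ir_a", "ir_b"]]
learned_prompt = evolve_prompt(base_prompt, train_traces, toy_extract)
# Online inference: the evolved prompt is used for new programs.
asm, online_trace = self_debug(toy_generate, toy_check, learned_prompt, "ir_new")
print(learned_prompt)
print("failed attempts online:", len(online_trace.failed_attempts))
```

In this toy setup the baseline prompt needs one round of debugging per program, while the evolved prompt passes on the first attempt. That is the effect the offline stage is meant to produce, shown here on made-up data.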
Experiments with state-of-the-art LLMs, including DeepSeek-R1, demonstrated the effectiveness of this approach. On the x86_64 architecture, the functional correctness rate of LLM-generated assembly code rose from 44% to 64%, a gain of 20 percentage points. On aarch64, correctness rose from 36% to 58%, a gain of 22 percentage points. Among the correctly generated x86_64 programs using this method, 87.5% surpassed the performance of code compiled by clang-O3, a highly optimized traditional compiler. Note that this figure applies only to the programs that were compiled correctly, not to the full benchmark. The consistent improvements across architectures and program types support the method's effectiveness and its potential for low-level neural compilation.
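To make the two reported quantities concrete, here is a minimal sketch of how a correctness rate and a "beats the baseline among correct programs" rate could be computed. The `Result` structure, the function names, and the use of running time as the performance measure are assumptions for illustration, and the sample data is invented, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    name: str
    correct: bool                # did the generated assembly pass all tests?
    llm_time: Optional[float]    # running time in seconds; None if incorrect
    baseline_time: float         # running time of the baseline compiler's output


def correctness_rate(results: List[Result]) -> float:
    """Fraction of all programs whose generated assembly is functionally correct."""
    return sum(r.correct for r in results) / len(results)


def faster_than_baseline_rate(results: List[Result]) -> float:
    """Fraction of the CORRECT programs that run faster than the baseline."""
    correct = [r for r in results if r.correct]
    if not correct:
        return 0.0
    faster = sum(1 for r in correct if r.llm_time < r.baseline_time)
    return faster / len(correct)


# Invented sample data, for illustration only.
sample = [
    Result("p1", True, 0.8, 1.0),
    Result("p2", True, 1.2, 1.0),
    Result("p3", False, None, 1.0),
    Result("p4", False, None, 1.0),
]
print(correctness_rate(sample))           # 2 of 4 programs are correct
print(faster_than_baseline_rate(sample))  # 1 of the 2 correct programs is faster
```

The second metric is conditional on correctness, so a high value does not by itself mean that most programs in the benchmark beat the baseline.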
The paper highlights that LLMs can achieve these superior optimizations by reducing instruction counts and leveraging vector instructions, as seen in case studies like functions s452 and s332. The learned prompts themselves incorporate detailed rules covering formatting, syntax, and semantics, guiding the LLM towards more accurate and performant code generation. This research marks a significant step forward in making compilers more accessible and efficient through the power of artificial intelligence. You can read the full research paper for more details: QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code.


