TLDR: AwareCompiler is an agentic framework for compiler optimization that uses a combined knowledge-data-driven approach to enhance program performance. It addresses challenges like semantic misalignment, inefficient agent-compiler interaction, and reward sparsity through structured knowledge integration, adaptive pass generation, and a hybrid training pipeline. Experiments show it significantly outperforms existing methods in code size reduction and generates more reliable optimizations.
Compiler optimization is a critical process in modern computing, aiming to improve how programs perform by transforming their underlying code. Imagine a program as a set of instructions; compiler optimization rearranges and refines these instructions to make the program run faster, use less memory, or be more efficient. Historically, this has been a complex task, often relying on human experts or predefined rules, which can be time-consuming and not always adaptable to new challenges.
Recently, large language models (LLMs) have shown promise in automating software optimization. These advanced AI models can understand and generate code, and even suggest optimization strategies. However, they face significant hurdles. One major issue is “semantic misalignment,” where the LLM might suggest plausible but incorrect optimization steps because it doesn’t fully grasp the intricate relationship between abstract program ideas and concrete optimization actions. Another challenge is inefficient interaction with compiler environments, often leading to trial-and-error approaches. Lastly, the vast number of possible optimization choices makes it hard for LLMs to get clear feedback, a problem known as “reward sparsity.”
Introducing AwareCompiler
A new framework called AwareCompiler has been introduced to tackle these challenges. It is an agentic system for compiler optimization, meaning it plans and acts autonomously rather than following a fixed script. AwareCompiler uses a “synergistic knowledge-data-driven” approach, combining structured knowledge with empirical data to make context-aware optimization decisions. This allows the system to generate effective optimization sequences dynamically.
AwareCompiler’s innovations are built on three pillars:
- Structured Knowledge Integration and Dataset Construction: The framework builds a comprehensive knowledge base that bridges the gap between how programs are represented and how optimizations work. This knowledge base includes empirical knowledge (patterns from past optimizations), symbolic knowledge (rules about how optimization passes depend on or conflict with each other), and negative knowledge (sequences that cause problems). Alongside this, a high-quality dataset is created to train the agent, capturing program features, reasoning processes, optimal pass sequences, and their effects.
- Knowledge-driven Adaptive Pass Generation: AwareCompiler empowers its agents with “context awareness.” It extracts critical features from a program, like instruction count or memory access patterns. Then, it retrieves relevant information from its knowledge base, using a ranking system to find the most suitable optimization strategies. Finally, it generates an optimal sequence of optimization passes that minimizes code size while adhering to all necessary constraints.
- Data-driven Hybrid Training Pipeline: The system is trained in two stages. First, Supervised Fine-Tuning (SFT) teaches the model basic optimization patterns and how to interact with the knowledge base. Second, Reinforcement Learning (RL) refines the agent’s decision-making by exploring different optimization paths and receiving rewards. This reward system is carefully designed to encourage correct formatting, valid optimization passes, and actual performance improvements, effectively addressing the reward sparsity problem.
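The three reward components named in the last pillar (correct formatting, valid passes, and actual performance improvement) can be sketched as a single scoring function. This is a minimal illustration, not the paper's reward: the `PASSES:` response format, the weights, the `KNOWN_PASSES` subset, and the function names are all assumptions made for the example.

```python
from typing import List, Optional

# Illustrative subset of LLVM pass names; a real compiler accepts many more.
KNOWN_PASSES = {"mem2reg", "instcombine", "simplifycfg", "gvn", "dce"}


def parse_passes(response: str) -> Optional[List[str]]:
    """Return the pass list if the response matches the assumed format, else None."""
    prefix = "PASSES:"  # hypothetical output format, chosen for this sketch
    if not response.startswith(prefix):
        return None
    passes = [p.strip() for p in response[len(prefix):].split(",") if p.strip()]
    return passes or None


def composite_reward(
    response: str,
    size_before: int,
    size_after: int,
    w_format: float = 0.2,  # assumed weights, not taken from the paper
    w_valid: float = 0.3,
    w_perf: float = 0.5,
) -> float:
    """Combine format, validity, and code-size terms into one scalar reward."""
    passes = parse_passes(response)
    if passes is None:
        return 0.0  # malformed output earns nothing

    reward = w_format  # the response was at least well formed

    # Partial credit for the fraction of passes the compiler would accept.
    valid_fraction = sum(p in KNOWN_PASSES for p in passes) / len(passes)
    reward += w_valid * valid_fraction

    # The performance term only counts when every pass is valid.
    if valid_fraction == 1.0 and size_before > 0:
        reduction = (size_before - size_after) / size_before
        reward += w_perf * max(0.0, reduction)
    return reward
```

Because the format and validity terms pay out before any code-size gain is measured, the agent receives a graded signal even on attempts that do not shrink the program, which is one plausible way such a design eases reward sparsity.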
Performance and Impact
Extensive experiments on standard benchmarks show that AwareCompiler reduces code size more than existing methods, including traditional heuristic optimizations and other LLM-assisted approaches. For instance, it achieved reductions comparable to expert-level optimizations and improved markedly over models like GPT-5 and DeepSeek-V3, even with smaller model sizes. This success highlights its ability to perform dynamic, context-aware optimizations, generating valid and effective pass sequences for complex tasks without relying on brute-force methods.
AwareCompiler also achieves a high success rate in generating valid optimization passes, particularly on benchmarks like cBench and CHStone, indicating its effectiveness in overcoming the semantic misalignment challenge. An ablation study further confirmed that both the knowledge-driven reasoning and the data-driven training pipeline are crucial to its performance; removing either component led to a noticeable drop in effectiveness.
A case study illustrated AwareCompiler’s agentic workflow: when initial heuristic attempts failed to improve performance, the agent consulted its knowledge base. The knowledge base suggested a specific pass, which, when integrated, led to a 3.2% improvement. This demonstrates how the framework can adapt and learn from failures, leveraging its integrated knowledge to achieve better results.
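The workflow in this case study (try a heuristic, then fall back to ranked knowledge-base retrieval when it does not help) might look roughly like the sketch below. Everything here is a toy stand-in: the knowledge-base entries, the feature names, the cosine-similarity ranking, the negative-sequence list, and the `measure_size` callback are assumptions for illustration, not the paper's implementation.

```python
import math

# Toy knowledge base: program features paired with a pass sequence that
# worked for similar programs. All entries are hypothetical.
KNOWLEDGE_BASE = [
    {"features": {"instructions": 1200, "loads": 300, "branches": 90},
     "passes": ["mem2reg", "gvn"]},
    {"features": {"instructions": 200, "loads": 20, "branches": 60},
     "passes": ["simplifycfg", "jump-threading"]},
]

# "Negative knowledge": contiguous sub-sequences assumed to cause problems.
NEGATIVE_SEQUENCES = [("loop-unroll", "loop-unroll")]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(a.get(k, 0) ** 2 for k in keys))
    norm_b = math.sqrt(sum(b.get(k, 0) ** 2 for k in keys))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contains_negative(passes):
    """True if any known-bad sub-sequence appears contiguously in passes."""
    for bad in NEGATIVE_SEQUENCES:
        n = len(bad)
        if any(tuple(passes[i:i + n]) == bad for i in range(len(passes) - n + 1)):
            return True
    return False


def retrieve(features, top_k=1):
    """Rank entries by feature similarity and drop known-bad sequences."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: cosine_similarity(features, entry["features"]),
        reverse=True,
    )
    return [e["passes"] for e in ranked if not contains_negative(e["passes"])][:top_k]


def optimize(features, heuristic_passes, measure_size):
    """Try the heuristic first; consult the knowledge base only if it fails.

    measure_size is a caller-supplied function (a stand-in for the compiler)
    that maps a pass list to the resulting code size.
    """
    best_passes, best_size = [], measure_size([])
    heuristic_size = measure_size(heuristic_passes)
    if heuristic_size < best_size:
        best_passes, best_size = list(heuristic_passes), heuristic_size
    else:
        for candidate in retrieve(features):
            size = measure_size(candidate)
            if size < best_size:
                best_passes, best_size = candidate, size
    return best_passes, best_size
```

In a real system `measure_size` would invoke the compiler and the similarity ranking would likely be learned rather than hand-written; the sketch only shows the control flow of falling back to retrieved knowledge after a failed attempt.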
In conclusion, AwareCompiler represents a significant step forward in automated compiler optimization. By combining structured knowledge with data-driven learning, it addresses long-standing challenges in the field, paving the way for more efficient, flexible, and automated compiler architectures. For more details, refer to the original research paper.


