TLDR: GEAK is an AMD framework that uses AI agents and Large Language Models (LLMs) to automatically generate highly efficient GPU kernels in the Triton language for AMD Instinct GPUs. It significantly outperforms direct LLM prompting and other methods, achieving higher correctness and speedup by employing an iterative refinement process with Generator, Evaluator, Reflector, and Optimizer modules. The framework also introduces new, robust benchmarks for evaluating AI-generated GPU code and has been open-sourced to foster community collaboration.
The world of artificial intelligence (AI) is constantly evolving, and with it, the demand for highly efficient and specialized software that can run on powerful Graphics Processing Units (GPUs). As AI workloads become more complex and diverse, there’s a growing need to automate the creation of low-level GPU programs, known as kernels, to ensure top-notch performance and productivity.
Traditionally, developing these kernels requires significant manual effort and expert knowledge to optimize them for specific hardware. However, major tech companies and research institutions are now heavily investing in AI-driven code generation for GPUs, aiming to reduce this manual work while achieving performance levels comparable to human experts.
One language that has gained popularity for this kind of AI-driven kernel generation is Triton, a Python-based language for GPU programming that strikes a good balance between performance and ease of coding.
Introducing GEAK: AMD’s Innovative AI Agent
In this landscape, Advanced Micro Devices, Inc. (AMD) has introduced a groundbreaking framework called GEAK (Generating Efficient AI-centric GPU Kernels). GEAK is an agent-based system that leverages cutting-edge Large Language Models (LLMs) to automatically generate high-performing Triton code specifically for AMD GPUs, including the powerful AMD Instinct™ MI300X and MI250.
GEAK stands out because it uses a sophisticated reasoning loop, inspired by Reflexion-style feedback mechanisms, to refine the generated code. This means the AI agent doesn’t just generate code once; it iteratively improves it based on evaluation and reflection.
How GEAK Works: A Modular Approach
The GEAK system is built with four core modules that work together in a pipeline (a minimal sketch of the loop follows this list):
- Generator: This module creates the initial code based on a user’s request and any relevant context.
- Evaluator: It tests the generated code for correctness and performance. If the code fails, it provides error traces.
- Reflector: This module analyzes failed code and error traces to identify issues and suggest solutions, feeding this feedback back to the Generator.
- Optimizer: For functionally correct code, the Optimizer module devises strategies to enhance its performance, focusing on speed and efficiency.
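To make the flow concrete, here is a minimal Python sketch of how such a generate-evaluate-reflect-optimize loop could be wired together. All class names, method names, and the control flow are illustrative assumptions, not GEAK's actual API:

```python
# Minimal sketch of a Generator/Evaluator/Reflector/Optimizer loop.
# All names and fields here are hypothetical, not GEAK's actual API.

def agent_loop(task, generator, evaluator, reflector, optimizer,
               max_iters=10):
    feedback = None
    for _ in range(max_iters):
        # Generator produces (or revises) a Triton kernel candidate.
        code = generator.generate(task, feedback=feedback)

        # Evaluator checks correctness and measures performance.
        result = evaluator.run(code)

        if not result.passed:
            # Reflector turns error traces into actionable feedback
            # that is fed into the next generation attempt.
            feedback = reflector.analyze(code, result.error_trace)
            continue

        # For functionally correct code, the Optimizer proposes
        # performance-oriented rewrites (e.g., better memory access).
        return optimizer.improve(code, result.perf_profile)

    return None  # no correct kernel found within the iteration budget
```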
To further boost its capabilities, GEAK incorporates several techniques (a sketch of the prompt assembly and debugging trap follows this list):
- 1-shot prompting: A similar, existing Triton code example is included in the prompt to guide the LLM.
- Knowledge Injection: The prompt is enriched with domain-specific guidance on writing efficient Triton kernels and with hardware specifications.
- Reflexion: Enables self-correction and iterative refinement based on evaluation feedback.
- LLM as Optimizer: An LLM component identifies performance improvements for functionally correct code.
- Debugging Trap: A mechanism that prevents the agent from getting stuck on persistent bugs.
- Parallel Scaling: Multiple GEAK instances run simultaneously to generate diverse, and potentially better, candidate kernels.
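The first two techniques amount to careful prompt construction, and the debugging trap is essentially a stopping rule. The sketch below shows one plausible way to implement both; the prompt layout, guideline text, and the repeated-error threshold are assumptions for illustration, not GEAK's exact implementation:

```python
# Hypothetical sketch of 1-shot prompting with knowledge injection,
# plus a simple "debugging trap" stopping rule. Layout and thresholds
# are illustrative assumptions, not GEAK's exact implementation.

TRITON_GUIDELINES = """\
- Use masks on tl.load/tl.store to guard out-of-bounds accesses.
- Prefer contiguous (coalesced) memory access within a block.
"""

def build_prompt(task_description, similar_kernel_example, hw_specs):
    # Knowledge injection: guidelines + hardware specs go into the prompt.
    # 1-shot prompting: one similar existing kernel guides the LLM.
    return (
        f"You write efficient Triton kernels for AMD GPUs.\n"
        f"Hardware: {hw_specs}\n"
        f"Guidelines:\n{TRITON_GUIDELINES}\n"
        f"Example of a similar kernel (1-shot):\n{similar_kernel_example}\n"
        f"Task:\n{task_description}\n"
    )

MAX_SAME_ERROR = 3  # debugging trap: give up on a stuck bug

def should_abandon(error_history):
    # If the last few attempts failed with the identical error, abandon
    # this line of refinement instead of looping on the same bug.
    recent = error_history[-MAX_SAME_ERROR:]
    return len(recent) == MAX_SAME_ERROR and len(set(recent)) == 1
```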
Performance and Benchmarks
The researchers evaluated GEAK using two benchmark suites: a revised version of TritonBench (TritonBench-revised) and a new set of real-world kernels from open-source AMD ROCm repositories, called the ROCm Triton Benchmark. These benchmarks measure three key aspects (a computation sketch follows this list):
- Call Accuracy: How often the generated kernels compile and run without errors.
- Execution Accuracy: The percentage of kernels that pass all unit tests.
- Speedup: How much faster the AI-generated kernels run compared to reference kernels.
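As a rough illustration, the three metrics could be aggregated from per-kernel results as sketched below. The field names, and the assumption that speedup is averaged over correct kernels only, are illustrative, not taken from the paper:

```python
# Sketch of aggregating the three benchmark metrics from per-kernel
# results. Field names and averaging conventions are assumptions.

def summarize(results):
    n = len(results)
    call_acc = sum(r["compiled_and_ran"] for r in results) / n
    exec_acc = sum(r["passed_all_tests"] for r in results) / n
    # Speedup relative to the reference implementation's latency,
    # computed here over correct kernels only.
    correct = [r for r in results if r["passed_all_tests"]]
    speedups = [r["ref_latency"] / r["gen_latency"] for r in correct]
    avg_speedup = sum(speedups) / len(speedups) if speedups else 0.0
    return call_acc, exec_acc, avg_speedup
```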
The results are impressive. GEAK significantly outperformed direct prompting of state-of-the-art LLMs (like GPT-4.1, Gemini 2.5 Pro, and Claude 3.7 Sonnet). While direct prompting often yielded less than 15% correctness, GEAK achieved up to 54.89% execution accuracy on TritonBench-revised and 63.33% on the ROCm Triton Benchmark. Furthermore, GEAK-generated kernels demonstrated an average speedup of up to 2.59 times over their reference counterparts.
A detailed study on a specific kernel, ‘test_triton_flip.py’ from the ROCm Triton Benchmark, showed GEAK’s generated code achieved a 2.26x speedup. This was attributed to GEAK’s optimized memory access patterns, better memory efficiency, explicit masking, and coalesced memory access, which reduced memory bandwidth usage and register pressure compared to expert-written code.
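For intuition about what "explicit masking" and "coalesced memory access" look like in Triton, here is a generic kernel that reverses a 1D tensor. This is an illustrative sketch, not the code GEAK generated for `test_triton_flip.py`:

```python
# Illustrative Triton flip kernel (not GEAK's generated code):
# contiguous loads, explicit tail masking, mirrored stores.
import torch
import triton
import triton.language as tl

@triton.jit
def flip_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements               # explicit masking at the tail
    x = tl.load(x_ptr + offsets, mask=mask)   # contiguous (coalesced) load
    # Each element lands at its mirrored index; within a block the
    # destination addresses still form one contiguous, reversed range.
    tl.store(out_ptr + (n_elements - 1 - offsets), x, mask=mask)

def flip(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    flip_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out
```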
The study also highlighted the benefits of increasing computational resources during inference. Both sequential scaling (more iterations of refinement) and parallel scaling (running multiple GEAK instances) led to substantial improvements in accuracy and performance, demonstrating the framework’s flexibility and robustness across different hardware platforms.
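Parallel scaling, in particular, is straightforward to sketch: run several independent agent instances and keep the fastest correct kernel. The `run_geak_instance` function and candidate fields below are hypothetical stand-ins for one full pipeline run:

```python
# Sketch of parallel scaling: launch several independent agent runs
# and keep the fastest correct candidate. `run_geak_instance` and the
# candidate attributes are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor

def parallel_scale(task, run_geak_instance, n_instances=8):
    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        candidates = list(pool.map(
            lambda seed: run_geak_instance(task, seed=seed),
            range(n_instances),
        ))
    correct = [c for c in candidates if c is not None and c.passed]
    # Pick the candidate with the lowest measured latency.
    return min(correct, key=lambda c: c.latency) if correct else None
```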
Conclusion and Future Outlook
GEAK represents a significant step forward in automating the generation of efficient GPU kernels. By combining advanced LLMs with a structured, agent-based framework, it iteratively refines code for both correctness and performance without needing additional training. The introduction of new, robust benchmarks further solidifies the evaluation of AI-generated GPU code.
AMD has open-sourced the GEAK agent implementation and evaluation framework, inviting the open-source community to contribute and accelerate the development of GPU kernels. This initiative aims to foster innovation and collaboration, ultimately improving the efficiency of training and inference for large-scale AI models. You can find more details in the original research paper.