TLDR: APRIL is a novel AI-driven approach for synthesizing software APIs that integrates Large Language Models (LLMs) with Automatic Prompt Optimization (APO) and Reinforcement Learning from Verifiable Rewards (RLVR). APO refines prompts for LLMs, while RLVR fine-tunes the models based on objective test outcomes, ensuring functional correctness. Evaluated on 81 real-world scientific Python APIs, APRIL achieved a 93.8% success rate in generating correct, test-passing APIs, significantly outperforming baseline LLMs. The method also leverages an AI agent (Gemini CLI) for automated test suite generation, streamlining the entire synthesis and validation process.
Modern software development relies heavily on Application Programming Interfaces (APIs), but assembling new ones from large libraries is a daunting task. Traditional methods often require extensive exploration and detailed, hand-crafted specifications. Large language models (LLMs) have shown promise in generating code from natural language, but they frequently produce incorrect code (hallucinations) and lack up-to-date contextual information.
A new approach called APRIL (API Synthesis with Automatic Prompt Optimization and Reinforcement Learning) aims to overcome these challenges. Developed by researchers at The University of Texas at Austin, APRIL integrates LLM-based synthesis with two key techniques: Automatic Prompt Optimization (APO) and Reinforcement Learning from Verifiable Rewards (RLVR). This combination creates an efficient and robust pipeline for API synthesis.
APO works by iteratively refining the prompts given to a frozen LLM. Think of it as automatically finding the best way to ask the LLM to generate the correct code, without changing the LLM itself. RLVR, on the other hand, fine-tunes the LLM’s policy directly towards functional correctness. It uses objective, programmatically checkable signals, such as the outcomes of unit tests and static analysis diagnostics, to provide feedback. This means the model learns to produce code that actually works and passes tests, rather than just code that looks syntactically plausible.
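To make the two ideas concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy target function `combine`, a hand-written test list `TESTS`, and a stand-in `fake_llm` in place of a real frozen model; all of these names are hypothetical. The reward is the fraction of tests a candidate passes (0 if it does not build), and the prompt "optimizer" simply keeps the candidate prompt with the highest reward. In RLVR the same kind of reward would drive updates to the model's weights, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test oracle: (arguments, expected result) pairs for the target API.
TESTS: List[Tuple[tuple, float]] = [
    ((1.0, 2.0), 3.0),
    ((0.0, 0.0), 0.0),
    ((-1.5, 1.5), 0.0),
]


def verifiable_reward(source: str, func_name: str) -> float:
    """Return the fraction of TESTS the candidate passes; 0.0 if it fails to build."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)  # "build" step: define the candidate function
    except Exception:
        return 0.0
    func = namespace.get(func_name)
    if not callable(func):
        return 0.0
    passed = 0
    for args, expected in TESTS:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(TESTS)


def fake_llm(prompt: str) -> str:
    """Stand-in for a frozen LLM: returns canned code depending on the prompt."""
    if "sum" in prompt:
        return "def combine(a, b):\n    return a + b\n"
    return "def combine(a, b):\n    return a * b\n"


def optimize_prompt(
    candidates: List[str], generate: Callable[[str], str]
) -> Tuple[str, float]:
    """Keep the prompt whose generated code earns the highest reward."""
    scored = [(verifiable_reward(generate(p), "combine"), p) for p in candidates]
    best_reward, best_prompt = max(scored, key=lambda s: s[0])
    return best_prompt, best_reward


prompts = [
    "Write combine(a, b).",
    "Write combine(a, b) that returns the sum of a and b.",
]
best_prompt, best_reward = optimize_prompt(prompts, fake_llm)
print(best_prompt, best_reward)
```

The model itself is never modified here; only the prompt changes, which is the distinction the article draws between APO and RLVR.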
The APRIL methodology addresses several critical aspects of API synthesis. It requires a test oracle (a comprehensive test suite to validate the API), a method signature (specifying parameters and return types), a library of components (from which the API is assembled), and a set of input-output examples. A significant innovation in APRIL is the use of an AI agent, specifically Gemini CLI, to automatically generate these crucial test suites. This agent iteratively refines the tests based on feedback until they are comprehensive and of high quality.
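The four inputs listed above can be pictured as one task record. The sketch below is illustrative only: the `SynthesisTask` class, its field names, and the toy `mean_of` target are assumptions made for this example and do not come from the paper. Python built-ins stand in for library components so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class SynthesisTask:
    """Hypothetical container for the inputs an API synthesis run needs."""
    signature: str                           # method signature of the target API
    components: List[str]                    # library functions the API may be built from
    examples: List[Tuple[tuple, Any]]        # input-output examples
    test_oracle: Callable[[Callable], bool]  # True if the candidate passes the test suite


def oracle(candidate: Callable) -> bool:
    """Toy test suite for the target API."""
    return candidate([2.0, 4.0]) == 3.0 and candidate([5.0]) == 5.0


task = SynthesisTask(
    signature="def mean_of(values: list[float]) -> float",
    components=["sum", "len"],
    examples=[(([1.0, 2.0, 3.0],), 2.0)],
    test_oracle=oracle,
)


def mean_of(values):
    """A candidate API assembled from the allowed components."""
    return sum(values) / len(values)


def is_accepted(task: SynthesisTask, candidate: Callable) -> bool:
    """Accept a candidate only if it matches the examples and passes the oracle."""
    examples_ok = all(candidate(*args) == expected for args, expected in task.examples)
    return examples_ok and task.test_oracle(candidate)


accepted = is_accepted(task, mean_of)
print(accepted)
```

In the workflow the article describes, the test oracle would be produced by the AI agent rather than written by hand as it is here.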
The researchers evaluated APRIL on 81 real-world APIs from widely used scientific Python libraries, including NumPy, SciPy, and scikit-learn. When benchmarked against instruction-tuned LLMs guided by expert prompts, APRIL demonstrated substantial improvements. The results showed that all LLM-generated APIs built and executed successfully (100% executability). More impressively, 93.8% of these APIs passed their full validation suites, indicating a high level of functional correctness.
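The article gives the pass rate as a percentage but not as a raw count. Assuming the 93.8% figure is computed over all 81 APIs and rounded to one decimal place, a quick check shows which whole-number counts are consistent with it.

```python
TOTAL_APIS = 81
REPORTED_RATE = 93.8  # percent, as reported in the article

# Every count k whose pass rate rounds to the reported figure.
matches = [
    k for k in range(TOTAL_APIS + 1)
    if round(100 * k / TOTAL_APIS, 1) == REPORTED_RATE
]
print(matches)
```

Under that assumption only one count fits: 76 of the 81 APIs passing their full validation suites, with 5 falling short.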
Compared with a baseline model using a manually engineered prompt, APRIL improved success rates on every benchmark: by 16.6% for NumPy, 14.9% for scikit-learn, and 16.7% for SciPy. This suggests that integrating APO and RLVR strengthens the LLM’s ability to synthesize accurate and reliable APIs.
The study also assessed the test-generation capability of Gemini CLI, finding that it generated an average of 8.1 tests per API and required only about 2.2 iterations to converge to a comprehensive test suite. This underscores Gemini CLI’s potential as an enabling component for automated API synthesis pipelines and rigorous evaluation workflows.
In conclusion, APRIL offers a practical and scalable path for component-based API synthesis in large libraries. By combining LLMs with automatic prompt optimization and reinforcement learning from verifiable rewards, it reduces reliance on exhaustive search and manual component specification, making API creation more efficient and reliable.


