Enhancing API Creation with AI: A New Approach Combining Prompt Optimization and Reinforcement Learning

TLDR: APRIL is a novel AI-driven approach for synthesizing software APIs that integrates Large Language Models (LLMs) with Automatic Prompt Optimization (APO) and Reinforcement Learning from Verifiable Rewards (RLVR). APO refines prompts for LLMs, while RLVR fine-tunes the models based on objective test outcomes, ensuring functional correctness. Evaluated on 81 real-world scientific Python APIs, APRIL achieved a 93.8% success rate in generating correct, test-passing APIs, significantly outperforming baseline LLMs. The method also leverages an AI agent (Gemini CLI) for automated test suite generation, streamlining the entire synthesis and validation process.

Modern software development relies heavily on Application Programming Interfaces (APIs), but assembling new ones from vast libraries can be daunting. Traditional methods often require extensive exploration of the library and detailed, hand-crafted specifications. Large language models (LLMs) have shown promise in generating code from natural language, but they frequently hallucinate incorrect code and lack up-to-date contextual information.

A new approach called APRIL (API Synthesis with Automatic Prompt Optimization and Reinforcement Learning) aims to overcome these challenges. Developed by researchers at The University of Texas at Austin, APRIL integrates LLM-based synthesis with two key techniques: Automatic Prompt Optimization (APO) and Reinforcement Learning from Verifiable Rewards (RLVR). This combination creates an efficient and robust pipeline for API synthesis.

APO works by iteratively refining the prompts given to a frozen LLM. Think of it as automatically finding the best way to ask the LLM to generate the correct code, without changing the LLM itself. RLVR, on the other hand, fine-tunes the LLM’s policy directly towards functional correctness. It uses objective, programmatically checkable signals, such as the outcomes of unit tests and static analysis diagnostics, to provide feedback. This means the model learns to produce code that actually works and passes tests, rather than just code that looks syntactically plausible.
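To make the two ideas concrete, here is a minimal sketch, not the APRIL implementation. It assumes a reward equal to the fraction of unit tests a candidate passes, and it stands in for APO with a much simpler step: choosing the best of a fixed list of prompts rather than rewriting them iteratively. The names `verifiable_reward`, `optimize_prompt`, and `toy_model` are hypothetical, and `toy_model` is a hard-coded stub that plays the role of the frozen LLM. In an RLVR setup, a reward like this would serve as the training signal; the sketch only computes it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test case format: (positional args, expected result).
TestCase = Tuple[tuple, object]


def verifiable_reward(source: str, func_name: str, tests: List[TestCase]) -> float:
    """Return the fraction of tests the candidate passes (0.0 if it fails to load)."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)  # illustration only; a real pipeline would sandbox this
        func = namespace[func_name]
    except Exception:
        return 0.0
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests)


def optimize_prompt(
    prompts: List[str],
    generate: Callable[[str], str],
    func_name: str,
    tests: List[TestCase],
) -> Tuple[str, float]:
    """Pick the prompt whose generated code earns the highest reward."""
    scored = [(verifiable_reward(generate(p), func_name, tests), p) for p in prompts]
    best_reward, best_prompt = max(scored, key=lambda pair: pair[0])
    return best_prompt, best_reward


def toy_model(prompt: str) -> str:
    """Stub for a frozen LLM: the output depends only on the prompt wording."""
    if "absolute" in prompt:
        return "def dist(a, b):\n    return abs(a - b)\n"
    return "def dist(a, b):\n    return a - b\n"


tests = [((3, 1), 2), ((1, 3), 2), ((5, 5), 0)]
best_prompt, best_reward = optimize_prompt(
    [
        "Compute the difference of a and b.",
        "Compute the absolute difference of a and b.",
    ],
    toy_model,
    "dist",
    tests,
)
```

The model never changes in this sketch; only the prompt does. That is the property APO relies on. The same reward function could score samples during fine-tuning, where the model's weights change instead.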

The APRIL methodology addresses several critical aspects of API synthesis. It requires a test oracle (a comprehensive test suite to validate the API), a method signature (specifying parameters and return types), a library of components (from which the API is assembled), and a set of input-output examples. A significant innovation in APRIL is the use of an AI agent, specifically Gemini CLI, to automatically generate these crucial test suites. This agent iteratively refines the tests based on feedback until they are comprehensive and of high quality.
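The four inputs listed above can be pictured as a single task record. The sketch below is an assumption about how they might be bundled, not the paper's data model: `SynthesisTask`, `check_candidate`, and the `norm2` example are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class SynthesisTask:
    """Hypothetical container for the inputs an API synthesis task needs."""
    signature: str                                 # parameters and return type
    components: Dict[str, Callable[..., Any]]      # library pieces to assemble from
    examples: List[Tuple[tuple, Any]]              # (args, expected output) pairs
    oracle: Callable[[Callable[..., Any]], bool]   # test suite: True if all tests pass


def check_candidate(task: SynthesisTask, candidate: Callable[..., Any]) -> bool:
    """Accept a candidate only if it matches every example and passes the oracle."""
    examples_ok = all(candidate(*args) == expected for args, expected in task.examples)
    return examples_ok and task.oracle(candidate)


def oracle(f: Callable[..., Any]) -> bool:
    # A tiny stand-in for a full test suite: zero input and symmetry.
    return f(0.0, 0.0) == 0.0 and f(3.0, 4.0) == f(4.0, 3.0)


task = SynthesisTask(
    signature="norm2(x: float, y: float) -> float",
    components={"sqrt": math.sqrt, "fsum": math.fsum},
    examples=[((3.0, 4.0), 5.0), ((0.0, 0.0), 0.0)],
    oracle=oracle,
)


def candidate(x: float, y: float) -> float:
    # Assembled only from the components named in the task.
    sqrt, fsum = task.components["sqrt"], task.components["fsum"]
    return sqrt(fsum([x * x, y * y]))


accepted = check_candidate(task, candidate)
```

Keeping the examples and the oracle separate mirrors the description above: the examples pin down specific input-output behavior, while the oracle checks broader properties.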

The researchers evaluated APRIL on 81 real-world APIs from widely used scientific Python libraries, including NumPy, SciPy, and scikit-learn. When benchmarked against instruction-tuned LLMs guided by expert prompts, APRIL demonstrated substantial improvements. The results showed that all LLM-generated APIs built and executed successfully (100% executability). More impressively, 93.8% of these APIs passed their full validation suites, indicating a high level of functional correctness.
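The article gives the pass rate as a percentage rather than a count. Assuming the 93.8% is computed over all 81 APIs and rounded to one decimal place, a quick check shows which count fits. The figure it finds is inferred from those two numbers, not quoted from the paper.

```python
TOTAL_APIS = 81


def success_rate(passed: int, total: int = TOTAL_APIS) -> float:
    """Percentage of APIs passing, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Find every count of passing APIs that rounds to the reported 93.8%.
matching = [n for n in range(TOTAL_APIS + 1) if success_rate(n) == 93.8]
print(matching)  # [76]
```

Under that assumption, 76 of the 81 APIs passed their full validation suites and 5 did not.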

Compared to a baseline model using a manually engineered prompt, APRIL improved success rates on every benchmark: by 16.6% for NumPy, 14.9% for scikit-learn, and 16.7% for SciPy. This highlights the effectiveness of integrating APO and RLVR in enhancing the LLM’s ability to synthesize accurate and reliable APIs.

The study also assessed the test-generation capability of Gemini CLI, finding that it generated an average of 8.1 tests per API and required only about 2.2 iterations to converge to a comprehensive test suite. This underscores Gemini CLI’s potential as an enabling component for automated API synthesis pipelines and rigorous evaluation workflows.
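The iterate-until-converged behavior described above can be sketched as a feedback loop. This is an illustration under stated assumptions, not how Gemini CLI works internally. The agent is a stub (`stub_agent`), and "comprehensive" is modeled as covering a hypothetical set of required behaviors; the article does not say which quality criteria the researchers used.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical test record: (behavior tag it covers, test name).
Test = Tuple[str, str]


def refine_tests(
    propose: Callable[[Set[str]], List[Test]],
    required: Set[str],
    max_iters: int = 5,
) -> Tuple[List[Test], int]:
    """Ask the agent for tests until every required behavior is covered."""
    tests: List[Test] = []
    missing = set(required)
    for iteration in range(1, max_iters + 1):
        tests.extend(propose(missing))        # agent sees what is still uncovered
        covered = {tag for tag, _ in tests}
        missing = set(required) - covered     # feedback for the next round
        if not missing:
            return tests, iteration
    return tests, max_iters


def stub_agent(missing: Set[str]) -> List[Test]:
    """Stub agent: writes at most two new tests per round."""
    return [(tag, f"test_{tag}") for tag in sorted(missing)[:2]]


suite, iterations = refine_tests(
    stub_agent, {"empty_input", "negative_values", "large_array"}
)
```

With three required behaviors and two tests per round, the stub converges on the second iteration. A real agent would also have its tests executed, with failures fed back alongside coverage gaps.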

In conclusion, APRIL offers a practical and scalable path for component-based API synthesis in large libraries. By combining the power of LLMs with automatic prompt optimization and reinforcement learning from verifiable rewards, it significantly reduces reliance on exhaustive search and manual component specification, making API creation more efficient and reliable.
