TL;DR: RAMP is a novel, lightweight, multi-agent framework for Automated Program Repair (APR) in Ruby. It uses collaborative agents for feedback-driven, iterative bug fixing: generating tests, reflecting on errors, and refining candidate fixes, without relying on large multilingual repair databases or costly fine-tuning. RAMP achieves 67% pass@1 on the Ruby portion of the XCodeEval benchmark, outperforming existing methods and converging quickly. It is particularly effective on ‘wrong answer’, ‘compilation’, and ‘runtime’ errors, establishing a new foundation for LLM-based debugging in under-studied languages.
Software development often involves the time-consuming and error-prone task of finding and fixing bugs. While traditional Automated Program Repair (APR) methods exist, the rise of Large Language Models (LLMs) has opened new avenues for more flexible and context-aware solutions. However, many LLM-based APR approaches are computationally expensive, require extensive fine-tuning, or focus on a limited set of programming languages, often overlooking languages like Ruby.
Ruby, despite its widespread use in web development and the persistent debugging challenges faced by its developers, has received little attention in APR research. Addressing this gap, a new framework called RAMP (Ruby Automated Multi-agent Program repair) has been introduced. RAMP is a lightweight, feedback-driven system that treats program repair as an iterative process tailored specifically to Ruby.
What is RAMP and How Does it Work?
RAMP distinguishes itself by avoiding reliance on large multilingual repair databases or costly fine-tuning. Instead, it operates directly on Ruby code using lightweight prompting and test-driven feedback. The framework employs a team of collaborative agents, each with a specialized role, to generate targeted tests, reflect on errors, and refine candidate fixes until a correct solution is found. This multi-agent workflow allows for deeper semantic reasoning while remaining cost-efficient.
The core of RAMP’s methodology involves an iterative loop coordinated by four specialized agents:
- Feedback Integrator Agent: This agent initiates the process by hypothesizing the potential cause of a bug in natural language. It also updates this reflection based on execution traces and error logs during subsequent iterations, guiding the repair process.
- Test Designer Agent: Responsible for generating a compact yet diverse set of guiding test cases (basic, edge, and large-scale inputs). These tests are crucial for evaluating candidate repairs and providing feedback without the computational expense of running a large benchmark suite.
- Programmer Agent: This agent generates candidate repair programs. It receives the problem context, buggy code, and prior reflections, and is prompted to reason about the bug before proposing a fix. It iteratively refines solutions based on feedback.
- Test Executor Agent: A non-LLM component, this Python script executes the candidate Ruby code against the generated test cases. It captures outputs, exceptions, and exit statuses, providing verdicts and traces that inform the Feedback Integrator Agent.
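To make the executor's role concrete, here is a minimal sketch of what such a component might look like: a script that runs a candidate Ruby program against the generated test cases using a system Ruby interpreter and records verdicts and traces. The function name, test-case format, and timeout value are illustrative assumptions, not details taken from the paper.

```python
import subprocess

def run_candidate(ruby_source: str, test_cases: list[dict], timeout: float = 5.0) -> list[dict]:
    """Run a candidate Ruby program against generated test cases.

    Each test case is assumed to be a dict with 'input' (stdin text) and
    'expected' (expected stdout text). Returns one verdict per test.
    """
    verdicts = []
    for case in test_cases:
        try:
            proc = subprocess.run(
                ["ruby", "-e", ruby_source],   # execute the candidate with the system Ruby
                input=case["input"],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            passed = proc.returncode == 0 and proc.stdout.strip() == case["expected"].strip()
            verdicts.append({
                "passed": passed,
                "stdout": proc.stdout,
                "stderr": proc.stderr,         # error trace fed back to the Feedback Integrator
                "exit_status": proc.returncode,
            })
        except subprocess.TimeoutExpired:
            verdicts.append({"passed": False, "stdout": "", "stderr": "TIME_LIMIT_EXCEEDED", "exit_status": None})
    return verdicts
```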
The process continues until a candidate repair passes all generated tests or an iteration budget is exhausted. Only then is the solution validated against hidden benchmark tests.
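Putting the pieces together, the iteration loop could be sketched roughly as follows. The reflect, design_tests, and propose_fix functions stand in for the LLM-backed agents and are hypothetical placeholders rather than the paper's actual API; run_candidate is the executor sketched above.

```python
def repair(problem: str, buggy_code: str, max_iterations: int = 5) -> str:
    """Conceptual repair loop: reflect -> test -> propose fix -> execute -> repeat."""
    reflection = reflect(problem, buggy_code, feedback=None)   # Feedback Integrator: initial bug hypothesis
    tests = design_tests(problem, buggy_code)                  # Test Designer: basic, edge, large-scale cases
    candidate = buggy_code

    for _ in range(max_iterations):
        candidate = propose_fix(problem, candidate, reflection)  # Programmer: reason about the bug, emit a fix
        verdicts = run_candidate(candidate, tests)               # Test Executor (see sketch above)
        if all(v["passed"] for v in verdicts):
            return candidate                                     # only now checked against hidden benchmark tests
        reflection = reflect(problem, candidate, feedback=verdicts)  # update hypothesis from traces and error logs

    return candidate  # best effort once the iteration budget is exhausted
```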
Performance and Key Insights
Evaluated on the XCodeEval benchmark, RAMP achieved a pass@1 score of 67% on Ruby, significantly outperforming prior approaches like LANTERN (61.7%) and other prompting baselines. A notable aspect of RAMP is its rapid convergence, often finding solutions within five iterations. Ablation studies confirmed that both test generation and self-reflection are critical drivers of its performance, especially for models like DeepSeekCoder.
RAMP proved particularly effective at repairing programs that initially produced ‘WRONG_ANSWER’ (68.5% repaired), ‘COMPILATION_ERROR’ (66.7% repaired), and ‘RUNTIME_ERROR’ (60.4% repaired). However, it struggled more with resource-related failures like ‘TIME_LIMIT_EXCEEDED’.
When analyzing performance across different problem categories, RAMP achieved perfect success on problems tagged with ‘geometry’ and ‘strings’. It also showed strong performance on ‘brute force’, ‘dynamic programming (dp)’, ‘math’, ‘games’, and ‘graphs’. Conversely, it faced challenges with advanced or niche categories such as ‘binary search’, ‘bitmasks’, ‘matrices’, and ‘graph matchings’, which often require highly precise reasoning and domain-specific knowledge.
Practicality and Future Directions
The framework demonstrates a strong balance between accuracy and computational efficiency, offering a practical solution for Ruby APR. Its design also allows relatively easy adaptation to other programming languages by swapping the test executor and updating the few-shot examples; RAMP has already shown promising results on C++, for instance.
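As a rough illustration of what such a swap could involve (the mapping below is hypothetical, not taken from the paper), the main language-specific piece is the command the executor uses to run a candidate program:

```python
# Hypothetical illustration: adapting the executor to another language largely
# means changing how a candidate source string is run (or compiled and run).
RUN_COMMANDS = {
    "ruby":   lambda src: ["ruby", "-e", src],
    "python": lambda src: ["python3", "-c", src],
    # A compiled language such as C++ would add a compile step
    # (e.g. invoking a compiler on a temp file) before the run command.
}
```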
The introduction of RAMP provides new insights into multi-agent repair strategies and lays a foundation for extending LLM-based debugging tools to under-studied languages. Future research aims to further enhance domain-specific reasoning and improve the reliability of the generated tests to strengthen RAMP’s iterative repair loop. For more details, you can read the full research paper here.


