LinkAnchor: An Autonomous Agent for Smarter Software Issue Tracking

TLDR: LinkAnchor is a new autonomous LLM-based agent designed to accurately link software issues to their resolving code commits. It overcomes limitations of previous methods by dynamically accessing relevant data (commit history, issue comments, code) without exceeding token limits, and by directly pinpointing the correct commit instead of evaluating all possible pairs. LinkAnchor requires no task-specific training, outperforms state-of-the-art approaches by a significant margin, and demonstrates strong generalizability and cost-effectiveness in real-world scenarios.

In software development, keeping track of changes and fixes is crucial for efficient project management and for maintaining software quality. One significant challenge is accurately linking reported issues, such as bugs or feature requests, to the specific code changes (commits) that resolve them. This task, known as Issue-to-Commit Link Recovery (ILR), is often difficult: studies show that fewer than half of issues are correctly linked to their corresponding commits on platforms like GitHub.

Traditional methods for ILR, many of which rely on artificial intelligence and machine learning, face several hurdles. A major limitation is their inability to process all available information, such as extensive commit histories, detailed issue comments, or large code repositories, due to constraints like limited context windows in models. Furthermore, many existing approaches evaluate issue-commit pairs individually, which becomes highly impractical for large software projects with thousands of commits.

To address these challenges, a new autonomous agent called LinkAnchor has been introduced. LinkAnchor is the first of its kind to use a large language model (LLM) to tackle the issue-to-commit link recovery problem. Its ‘lazy-access’ architecture lets the underlying LLM retrieve contextual data, such as commit details, issue discussions, and code files, only when it needs them, so it can draw on a rich body of software development data without exceeding token limits.
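The lazy-access idea can be illustrated with a minimal Python sketch. It is not LinkAnchor's implementation: the `LazyContext` class, the loader names, and the sample data are all hypothetical, and the only point is that a data source is fetched when it is requested rather than loaded into the prompt up front.

```python
from typing import Callable, Dict, List


class LazyContext:
    """Holds one loader per data source and calls it only on request."""

    def __init__(self, loaders: Dict[str, Callable[[], str]]) -> None:
        self._loaders = loaders
        self.loaded: List[str] = []  # which sources were actually fetched

    def get(self, name: str) -> str:
        text = self._loaders[name]()  # fetched on demand, not in advance
        self.loaded.append(name)
        return text


# Hypothetical data sources standing in for a real repository and tracker.
loaders = {
    "issue_description": lambda: "Parser crashes on empty input",
    "commit_log": lambda: "a1 Fix null check in parser\nb2 Update docs",
    "source_file": lambda: "def parse(text): ...",
}

ctx = LazyContext(loaders)
description = ctx.get("issue_description")
log = ctx.get("commit_log")
# "source_file" was never requested, so it never entered the context.
```

In this sketch the context grows only with what was asked for, which is the property that keeps a long-running agent under its token budget.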

LinkAnchor also stands out because it pinpoints the commit that resolves an issue directly, rather than exhaustively scoring every candidate commit. This makes it much more efficient on real-world repositories. The agent is designed as a ready-to-use tool, initially tested on GitHub and Jira, and easily extendable to other platforms.
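The difference between scoring every candidate and narrowing the search first can be shown with a toy Python comparison. Everything here is an assumption made for illustration: the commits and issue are invented, a simple word-overlap score stands in for a model's judgment, and the date-window filter is just one possible narrowing step, not the strategy LinkAnchor's LLM actually follows.

```python
from datetime import date

# Invented sample data.
COMMITS = [
    {"sha": "a1", "day": date(2024, 1, 3), "message": "Add CSV export"},
    {"sha": "b2", "day": date(2024, 2, 10), "message": "Fix crash on empty input in parser"},
    {"sha": "c3", "day": date(2024, 2, 11), "message": "Bump version"},
    {"sha": "d4", "day": date(2024, 3, 20), "message": "Refactor parser tests"},
]
ISSUE = {"title": "parser crash on empty input", "closed": date(2024, 2, 12)}


def overlap(a: str, b: str) -> int:
    """Count shared lowercase words; a stand-in for a learned scorer."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def exhaustive(issue, commits):
    """Score every commit, return (best sha, number of commits scored)."""
    scored = [(overlap(issue["title"], c["message"]), c["sha"]) for c in commits]
    return max(scored)[1], len(scored)


def narrowed(issue, commits, window_days=7):
    """Keep only commits shortly before the issue closed, then score those."""
    near = [c for c in commits
            if 0 <= (issue["closed"] - c["day"]).days <= window_days]
    scored = [(overlap(issue["title"], c["message"]), c["sha"]) for c in near]
    return max(scored)[1], len(scored)


best_all, scored_all = exhaustive(ISSUE, COMMITS)
best_near, scored_near = narrowed(ISSUE, COMMITS)
```

Both routes pick the same commit here, but the narrowed route scores half as many candidates. On a repository with thousands of commits, that gap is what makes exhaustive pairwise scoring impractical.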

The core of LinkAnchor’s functionality lies in its ability to grant the LLM on-demand access to various project data sources through specialized function calls. These functions are categorized into Git functions (for commit history and details), Issue functions (for issue titles, descriptions, and comments), Codebase functions (for exploring code definitions and documentation), and Control functions (for managing the interaction flow). For instance, the LLM can ask for commits by a specific author or inspect lines of code at a particular point in time.
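A function-calling layer of this kind can be sketched in Python as a small registry plus a dispatcher. The function names, categories assigned to them, and sample data are hypothetical placeholders, not LinkAnchor's actual API; the sketch only shows how a structured request from a model could be routed to a deterministic function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Invented sample data standing in for a repository and an issue tracker.
COMMITS = [
    {"sha": "a1", "author": "alice", "message": "Fix null check in parser"},
    {"sha": "b2", "author": "bob", "message": "Update docs"},
    {"sha": "c3", "author": "alice", "message": "Refactor tokenizer"},
]
ISSUE = {"title": "Parser crashes on empty input",
         "comments": ["Seen on v1.2", "alice is on it"]}


@dataclass
class Tool:
    category: str  # "git", "issue", "codebase", or "control"
    fn: Callable[..., object]


def commits_by_author(author: str) -> List[str]:
    """Git function: SHAs of commits written by one author."""
    return [c["sha"] for c in COMMITS if c["author"] == author]


def issue_comments() -> List[str]:
    """Issue function: the comments on the issue under investigation."""
    return list(ISSUE["comments"])


def finish(sha: str) -> str:
    """Control function: the model reports its final answer."""
    return sha


TOOLS: Dict[str, Tool] = {
    "commits_by_author": Tool("git", commits_by_author),
    "issue_comments": Tool("issue", issue_comments),
    "finish": Tool("control", finish),
}


def dispatch(call: dict) -> object:
    """Route a model-issued call such as {'name': ..., 'arguments': {...}}."""
    tool = TOOLS[call["name"]]
    return tool.fn(**call.get("arguments", {}))


alice_commits = dispatch({"name": "commits_by_author",
                          "arguments": {"author": "alice"}})
```

Keeping the functions deterministic and leaving only the choice of which one to call to the model is one way to get the balance between fixed tooling and LLM autonomy.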

A notable advantage of LinkAnchor is that it does not require task-specific training, as it is built on a pre-trained, general-purpose LLM like ChatGPT-4o-nano. This makes it immune to inaccuracies often found in manually generated training datasets used by other methods. By framing ILR as a search problem, LinkAnchor avoids the need to evaluate every single commit, significantly reducing computational overhead.

Evaluations show that LinkAnchor significantly outperforms state-of-the-art ILR approaches, achieving improvements of 60% to 262% in Hit@1 scores across various case study projects. Even when compared against other models’ Hit@10 scores (meaning the correct commit is found within the top 10 predictions), LinkAnchor’s single prediction often performs better. This robust and consistent performance across different projects highlights its adaptability to varying project contexts, unlike methods that rely on fixed feature sets.
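For readers unfamiliar with the metric, Hit@k can be computed as in the Python sketch below. The predictions, ground-truth labels, and the two scores passed to `relative_improvement` are made-up values for illustration, not results from the evaluation, and the sketch assumes the reported gains are relative improvements over a baseline score.

```python
from typing import List, Sequence


def hit_at_k(ranked: Sequence[List[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of issues whose true commit is among the top k predictions."""
    hits = sum(1 for preds, truth in zip(ranked, truths) if truth in preds[:k])
    return hits / len(truths)


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100


# Invented ranked predictions for three issues, best guess first.
predictions = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
truths = ["a", "y", "q"]

hit1 = hit_at_k(predictions, truths, 1)  # only the first issue is a hit
hit2 = hit_at_k(predictions, truths, 2)  # the second issue now counts too
gain = relative_improvement(0.25, 0.40)  # hypothetical scores
```

A system that returns a single prediction can only be scored at Hit@1, which is why matching another model's Hit@10 with one guess is a notable result.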

LinkAnchor’s generalizability was further demonstrated by testing it on new, unseen data from 120 randomly selected GitHub issues resolved after the LLM’s training cut-off date. It successfully linked 107 of these issues, achieving an impressive 89% accuracy. This indicates its strong real-world utility and adaptability across diverse codebases and programming languages like Python and Go.

From a cost perspective, LinkAnchor is also efficient. The median time to link an issue to its resolving commit was found to be 23 seconds, consuming approximately 115,000 tokens, which translates to about 0.01 US dollars per issue. This is significantly faster and more practical than traditional methods that might require hours of training and lengthy inference times for large repositories.
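The reported figures can be turned into a rough budget with a few lines of Python. The three constants come from the paragraph above; the batch estimate is an assumption, since it treats the median time as a typical per-issue value, treats the token and cost figures as per-issue constants, and processes issues one after another.

```python
MEDIAN_SECONDS = 23          # reported median time per issue
TOKENS_PER_ISSUE = 115_000   # reported approximate token consumption
USD_PER_ISSUE = 0.01         # reported approximate cost

# Blended price per million tokens implied by the two figures above.
implied_usd_per_million_tokens = USD_PER_ISSUE / TOKENS_PER_ISSUE * 1_000_000


def batch_estimate(num_issues: int) -> dict:
    """Rough sequential estimate for a batch of issues (an assumption)."""
    return {
        "usd": num_issues * USD_PER_ISSUE,
        "minutes": num_issues * MEDIAN_SECONDS / 60,
        "tokens": num_issues * TOKENS_PER_ISSUE,
    }


estimate = batch_estimate(120)
```

Under these assumptions, a batch of 120 issues would cost a little over one US dollar and take about 46 minutes of sequential processing.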

The development of LinkAnchor offers valuable insights for future research in LLM-based agents. Its success underscores the importance of on-demand access to data, scalable context handling through pagination and feedback pruning, and a balanced approach between deterministic functions and LLM autonomy. LinkAnchor is publicly available as a ready-to-use tool, along with its replication package.
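Pagination and feedback pruning can be sketched in Python as follows. This is one plausible reading of those terms, not LinkAnchor's code: `paginate` returns a slice of results plus a flag saying whether more exist, and `prune_history` shortens older tool outputs while keeping the most recent ones intact. Both function names and all parameters are hypothetical.

```python
from typing import List


def paginate(items: List[str], page: int, page_size: int = 2) -> dict:
    """Return one page of results and whether further pages exist."""
    start = page * page_size
    return {
        "items": items[start:start + page_size],
        "page": page,
        "has_more": start + page_size < len(items),
    }


def prune_history(history: List[str], keep_last: int = 2,
                  max_chars: int = 10) -> List[str]:
    """Truncate older tool outputs; keep the latest `keep_last` in full."""
    cutoff = len(history) - keep_last
    pruned = []
    for i, entry in enumerate(history):
        if i < cutoff and len(entry) > max_chars:
            pruned.append(entry[:max_chars] + "...")
        else:
            pruned.append(entry)
    return pruned


first_page = paginate(list("abcde"), page=0)
last_page = paginate(list("abcde"), page=2)
history = ["x" * 50, "short", "y" * 50, "z" * 50]
pruned = prune_history(history)
```

Paging bounds the size of any single tool response, while pruning bounds how much earlier feedback is carried forward, so the two together keep the conversation within a fixed context budget.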

Nikhil Patel
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
