Ericsson’s Journey into AI-Powered Code Review Automation

TLDR: Ericsson has developed and evaluated a lightweight automated code review tool leveraging Large Language Models (LLMs) and static program analysis. The system extracts the ‘enclosing method’ of modified code lines to provide context to the LLM, which then generates concise and relevant reviews. Preliminary evaluations with expert developers show promising results in reducing cognitive burden and improving efficiency, despite some areas for improvement. The project aims to integrate seamlessly into existing workflows and has a future roadmap including advanced prompting, RAG, and multi-agent systems.

Code review is a cornerstone of software quality assurance, alongside testing and static analysis. However, it often demands significant time and expertise from senior developers, creating a bottleneck in the development lifecycle and diverting them from primary tasks like writing new features and fixing bugs. Recognizing this challenge, Ericsson has explored the use of Large Language Models (LLMs) to automate the code review process.

In their recent work, Ericsson describes their experience in developing a lightweight tool that combines LLMs with static program analysis. The goal is to alleviate the cognitive burden on experienced developers by providing timely and consistent feedback as code is committed to Git and submitted for review through Gerrit.

A Lightweight Approach to Automated Review

Unlike some approaches that require extensive and costly pre-training or fine-tuning of LLMs, Ericsson opted for a more agile, lightweight method. Their solution focuses on intelligently preparing the input for the LLM. When a developer modifies Java code, the tool uses static program analysis (specifically, the Tree-sitter parser) to identify the ‘enclosing method’, that is, the specific function or method that contains the changed lines. This contextual information is crucial for the LLM to generate relevant and accurate reviews.
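To make the idea of an enclosing method concrete, here is a minimal Python sketch. It does not use Tree-sitter, which is the parser the article names. It substitutes a simple brace-matching heuristic so the example stays dependency-free. The function name `enclosing_method`, the regular expression, and the sample class are all assumptions made for illustration. The heuristic ignores braces inside strings and comments, signatures split across lines, and anonymous classes, all of which a real parser would handle.

```python
import re

# Keywords whose "(...) {" blocks must not be mistaken for method headers.
CONTROL_KEYWORDS = {"if", "for", "while", "switch", "catch", "try", "synchronized"}

# Heuristic: text ending in `name(args) [throws X] {` is treated as a method header.
METHOD_HEADER = re.compile(
    r"([A-Za-z_]\w*)\s*\([^()]*\)\s*(?:throws\s+[\w.,\s]+)?\{\s*$"
)


def enclosing_method(source: str, target_line: int):
    """Return (start_line, end_line, text) of the innermost method-like block
    containing target_line (1-based), or None if no such block is found."""
    lines = source.splitlines()
    stack = []  # one entry per open brace: (line_no, looks_like_method)
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line):
            if ch == "{":
                header = line[: col + 1]
                match = METHOD_HEADER.search(header)
                is_method = bool(match) and match.group(1) not in CONTROL_KEYWORDS
                stack.append((line_no, is_method))
            elif ch == "}" and stack:
                start, is_method = stack.pop()
                # Inner blocks close first, so the first hit is the innermost method.
                if is_method and start <= target_line <= line_no:
                    return start, line_no, "\n".join(lines[start - 1 : line_no])
    return None


# Hypothetical Java class used only to exercise the sketch.
SAMPLE_JAVA = """\
public class Calc {
    private int total;

    public int add(int a, int b) {
        int sum = a + b;
        if (sum > 100) {
            sum = 100;
        }
        return sum;
    }

    public void reset() {
        total = 0;
    }
}
"""

# Line 7 is `sum = 100;`, which sits inside add().
result = enclosing_method(SAMPLE_JAVA, 7)
```

A changed line outside any method, such as the field declaration on line 2, yields `None` here. A real tool would need a fallback for that case, for example sending the surrounding class header instead.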

The team experimented with various prompting strategies, moving beyond simple requests like “Please generate a code review for the following code.” They found that effective prompts needed to ensure reviews were concise, human-like, focused on the enclosing method (to prevent ‘hallucinations’ of irrelevant code elements), and free of newly generated code. Post-processing steps were also implemented to refine the LLM’s output, including summarizing and ranking reviews, and validating them with human experts to recalibrate prompts.
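The four prompt requirements listed above can be encoded as explicit rules in a prompt template. The following Python sketch shows one way to do that. The rule wording, the name `build_review_prompt`, and the prompt layout are assumptions for illustration. The article does not reproduce Ericsson's actual prompts.

```python
# Hypothetical rule wording; each rule mirrors one requirement from the article.
REVIEW_RULES = (
    "Keep the review concise, in a few short sentences.",
    "Write the way a human reviewer would comment to a colleague.",
    "Comment only on the method shown; do not mention code that is not in it.",
    "Do not write new or rewritten code in your answer.",
)


def build_review_prompt(method_source: str, changed_lines: list[int]) -> str:
    """Assemble a review prompt around the enclosing method of a change."""
    rules = "\n".join(f"- {rule}" for rule in REVIEW_RULES)
    lines = ", ".join(str(n) for n in changed_lines) or "unknown"
    return (
        "You are reviewing a Java change.\n"
        f"Rules:\n{rules}\n"
        f"Changed line numbers: {lines}\n"
        "Method under review:\n"
        f"{method_source}\n"
        "Review:"
    )


prompt = build_review_prompt("int f() { return 1; }", [3, 4])
```

Keeping the rules in a separate tuple makes it easier to adjust one constraint at a time when human feedback suggests the prompt needs recalibrating.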

Practical considerations were central to their design. The tool aims to generate reviews that are relevant, concise, and accurate, while being fast, cost-efficient, and easy to integrate into existing development workflows. Security, logging feedback for continuous improvement, and incorporating human validation were also key aspects.

The Automated Code Review Pipeline

The process begins by extracting the latest code changes from the Gerrit API. For each change, the system identifies the modified files and their diffs. The critical step then involves extracting the enclosing Java function for each diff, providing the necessary context. This contextualized code snippet is then fed to an LLM, such as Code Llama, with a suitable prompt. Finally, the LLM’s generated review undergoes post-processing, which includes presenting, saving, and summarizing the feedback. The tool is integrated into developers’ workflows via a web-based user interface and a plugin for Visual Studio Code.
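The middle of this pipeline can be sketched in Python under a few assumptions: the diff for one file is already available as unified-diff text, and context extraction and the LLM call are passed in as plain functions. The names `changed_new_lines`, `review_change`, `extract_context`, and `ask_llm` are hypothetical. No Gerrit or model API is called here.

```python
import re

# Matches a unified-diff hunk header such as "@@ -4,3 +4,4 @@" and captures
# the starting line number in the new version of the file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_new_lines(unified_diff: str) -> list[int]:
    """Return line numbers (in the new file) of lines added by a single-file diff."""
    changed = []
    new_line = None  # stays None until the first hunk header is seen
    for line in unified_diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))
        elif new_line is None:
            continue  # file headers before the first hunk
        elif line.startswith("+"):
            changed.append(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers are not in the new file
        else:
            new_line += 1  # context line
    return changed


def review_change(unified_diff, extract_context, ask_llm):
    """Run diff -> context -> LLM, asking once per distinct context snippet."""
    reviews, seen = [], set()
    for line_no in changed_new_lines(unified_diff):
        context = extract_context(line_no)
        if context is None or context in seen:
            continue
        seen.add(context)
        reviews.append(ask_llm(context))
    return reviews


# Hypothetical diff used only to exercise the sketch.
SAMPLE_DIFF = """\
--- a/Calc.java
+++ b/Calc.java
@@ -4,3 +4,4 @@
     public int add(int a, int b) {
-        return a + b;
+        int sum = a + b;
+        return sum;
     }
"""

lines_touched = changed_new_lines(SAMPLE_DIFF)
```

Deduplicating on the context snippet means two edits inside the same method produce a single review request. That is one plausible way to keep the tool fast and cheap, not necessarily how Ericsson's tool behaves.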

Evaluating the Solution

Ericsson conducted surveys with experienced developers to evaluate their automated code review system. The evaluation addressed three key questions:

  • How good is the code review generation by the LLMs? Experts reviewed LLM-generated feedback for 10 Java code snippets. While there were positive comments (e.g., appreciating suggestions for meaningful variable names), there were also neutral and negative remarks. Negative feedback highlighted issues like incorrect variable types, irrelevant or incorrect reviews, missing abstractions, or excessive verbosity.
  • Which LLM produced the best review? In a pairwise comparison, experts compared reviews from several smaller LLMs (Llama 2 13B, Code Llama 13B, Llama 2 7B, Code Llama 7B). Preliminary results suggested that the Code Llama 13B model performed better than the others.
  • How good is the code review tool in practice? Nine expert developers used the tool for fifteen days. Four out of nine agreed the tool saved them time and improved their overall coding efficiency. Common criticisms included reviews merely explaining the code, being factually incorrect, or focusing on irrelevant areas. Usage frequency varied, with two developers using it regularly and five sometimes.
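A pairwise comparison like the one in the second question can be summarized by counting how often each model's review was preferred. The Python sketch below shows that tally. The function name `rank_by_pairwise_wins` is an assumption, and the votes use placeholder model names and made-up preferences. They are not the study's data, and the article does not say how Ericsson aggregated the judgments.

```python
from collections import Counter


def rank_by_pairwise_wins(votes):
    """Rank models by number of pairwise wins.

    Each vote is (model_a, model_b, winner), where winner is one of the two.
    Ties in win count are broken alphabetically by model name.
    """
    wins = Counter()
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not part of the pair")
        # Touch both keys so models with zero wins still appear in the ranking.
        wins[model_a] += 0
        wins[model_b] += 0
        wins[winner] += 1
    return sorted(wins.items(), key=lambda item: (-item[1], item[0]))


# Placeholder votes with placeholder model names, for illustration only.
PLACEHOLDER_VOTES = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_x"),
    ("model_y", "model_z", "model_z"),
]

ranking = rank_by_pairwise_wins(PLACEHOLDER_VOTES)
```

With only a handful of experts and snippets, a raw win count like this is best read as a preliminary signal, which matches the article's framing of the results.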

Additional experiments showed the LLM could find relatively easy logical bugs in adversarial prompting scenarios and generally stuck to commenting only on changed lines. The time taken for LLMs to generate reviews was consistently around 5-6 seconds, regardless of snippet length.

Looking Ahead

Ericsson’s study demonstrates a practical, lightweight approach to automated code review using LLMs that can be integrated into existing development systems without expensive fine-tuning or reliance on external black-box tools. The initial results are promising, validating that the method can reduce redundant feedback and enhance usability.

The research is ongoing, with a roadmap for future improvements. This includes expanding user surveys and experimenting with new LLMs and advanced prompting strategies such as zero-shot, few-shot, and chain-of-thought prompting. Future phases will explore Retrieval-Augmented Generation (RAG) and Graph-RAG to provide more context from documentation and past reviews, and even develop a multi-agent framework in which specialized AI agents handle different review tasks and continuously learn from feedback. The ultimate goal is seamless integration with various internal software engineering tools at Ericsson. You can read the full research paper here.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
