
A Two-Stage Approach to Enhance LLM Code Translation Accuracy and Readability

TLDR: F2STRANS is a novel two-stage framework designed to improve code translation by Large Language Models (LLMs). It first focuses on ‘functional learning’ to ensure the correctness of translated code by using high-quality, functionally consistent code pairs. Subsequently, ‘style learning’ enhances code readability by incorporating both positive and negative stylistic examples. This approach allows smaller LLMs like Qwen1.5B to achieve superior performance compared to larger models such as GPT-4 in various code translation scenarios, addressing critical challenges in real-world software development.

Large Language Models (LLMs) have made significant progress in translating code from one programming language to another. This task is crucial for updating applications or migrating software. However, a major hurdle remains: ensuring the translated code is not only functionally correct but also easy to read and maintain. Poorly structured or inconsistent code can be a significant burden for developers, often taking more time to understand than to write from scratch.

Addressing these challenges, researchers have introduced a new approach called F2STRANS. This method is designed to progressively enhance LLMs’ performance in code translation by focusing on two key aspects: functional correctness and code readability. The framework operates in two distinct stages.

Functional Learning: Ensuring Correctness

The first stage, functional learning, aims to optimize the accuracy of the translated code. It does this by using high-quality pairs of source and target code. These pairs are carefully selected from online programming platforms, ensuring that both the original and translated code snippets produce identical outputs for the same inputs. This process involves a ‘relevance-driven code pair selection’ to find similar solutions and ‘differential testing’ to verify that the code pairs behave exactly the same way. By fine-tuning LLMs on this meticulously curated data, F2STRANS ensures that the translated code retains its original functionality.
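The differential-testing idea described above can be sketched in a few lines. This is an illustrative stand-in, not the paper's actual harness: the helper names (`differential_test`, `input_gen`) are invented here, and real code pairs would be executed as separate programs rather than as Python callables.

```python
import random

def differential_test(src_fn, tgt_fn, input_gen, trials=100):
    """Hypothetical check that a source/target code pair is functionally
    consistent: both must produce identical outputs for the same inputs."""
    for _ in range(trials):
        args = input_gen()
        if src_fn(*args) != tgt_fn(*args):
            return False  # behaviors diverge: reject this code pair
    return True  # all trials agree: keep the pair as training data

# Toy example: a "source" and a "translated" implementation of list summation.
src = lambda xs: sum(xs)
tgt = lambda xs: __import__("functools").reduce(lambda a, b: a + b, xs, 0)

# Random input generator covering empty and non-empty lists.
gen = lambda: ([random.randint(-100, 100) for _ in range(random.randint(0, 10))],)
print(differential_test(src, tgt, gen))  # consistent pair -> True
```

Only pairs that survive this kind of behavioral check would enter the functional-learning fine-tuning set.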

Style Learning: Improving Readability

Even if code is functionally correct, it might lack readability due to inconsistencies in variable naming, function signatures, or overall structure. The second stage, style learning, tackles this by improving the aesthetic and structural quality of the translated code. This stage incorporates both ‘positive’ and ‘negative’ style examples. Positive examples are translations that maintain stylistic consistency with the source code, often generated by a powerful LLM like Qwen32B and selected through a ‘style consensus selection’ mechanism. Negative examples are translations that deviate stylistically. By learning from both good and bad examples, the model learns to recognize and prioritize maintaining stylistic consistency, making the translated code much more readable.
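One plausible way to realize a 'style consensus selection' step is sketched below. The scoring heuristic (identifier overlap between source and candidate) and the function names are assumptions for illustration; the paper's actual mechanism may score consistency differently.

```python
import re

def identifiers(code: str) -> set:
    """Crude identifier extraction via regex (illustrative only)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))

def style_consensus_select(source: str, candidates: list) -> str:
    """Hypothetical selection step: pick the candidate translation whose
    identifiers overlap most with the source, as a rough proxy for
    stylistic consistency (naming, structure)."""
    def overlap(cand: str) -> float:
        a, b = identifiers(source), identifiers(cand)
        return len(a & b) / max(len(a | b), 1)
    return max(candidates, key=overlap)

source = "def total_price(items): return sum(items)"
cands = [
    "int total_price(vector<int> items) { ... }",  # keeps source naming
    "int f(vector<int> v) { ... }",                # renames everything
]
print(style_consensus_select(source, cands))  # picks the naming-consistent candidate
```

The selected translation would serve as a positive style example, while stylistically divergent candidates (like the second one above) could serve as negatives.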

A New Benchmark for Evaluation

To rigorously test F2STRANS and overcome limitations of existing benchmarks (like outdated code or insufficient test cases), the researchers developed a new, comprehensive code translation benchmark. This new benchmark includes up-to-date source code, extensive test cases, and manually annotated ground-truth translations, allowing for thorough evaluations of both functional accuracy and stylistic quality. The benchmark covers 20 diverse code translation scenarios across five programming languages: C, C++, Go, Java, and Python.
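Functional accuracy on such a benchmark is typically an all-tests-pass criterion per problem. The scoring loop below is a minimal sketch under that assumption, with translations modeled as Python callables for brevity; it is not the benchmark's actual harness.

```python
def functional_accuracy(translations, test_suites):
    """Hypothetical scoring loop: a translation counts as correct only
    if it passes every (args, expected) test case in its suite."""
    passed = 0
    for fn, suite in zip(translations, test_suites):
        if all(fn(*args) == expected for args, expected in suite):
            passed += 1
    return passed / len(translations)

# Toy example: the second "translation" is functionally wrong.
adds = [lambda a, b: a + b, lambda a, b: a - b]
suite = [((1, 2), 3), ((0, 5), 5)]
print(functional_accuracy(adds, [suite, suite]))  # -> 0.5
```

Extensive test cases per problem matter precisely because a single weak test can let a subtly wrong translation count as correct.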

Impressive Results

Experiments conducted on both the new benchmark and traditional datasets demonstrate that F2STRANS significantly improves code translation performance. Remarkably, this approach enables smaller LLMs, such as Qwen1.5B, to outperform larger, more established models like prompt-enhanced Qwen32B and even GPT-4 on average across the 20 code translation scenarios. This highlights the effectiveness of the function-to-style guiding paradigm in making LLMs more efficient and capable for code translation tasks.

A detailed look into the components of F2STRANS through ablation studies revealed that each part, from the relevance-driven data selection to the specific loss functions used in style learning, contributes significantly to the overall performance gains. The research also showed that the style guidance is particularly impactful, improving the correction rate for compilation errors by teaching LLMs to avoid superficial code errors by adhering to source code style.
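The article does not spell out the style-learning loss, but learning from paired positive and negative examples is commonly framed as a preference objective. The function below sketches one such formulation (a DPO-style margin over log-probabilities of the style-consistent vs. style-inconsistent translation); the paper's exact loss may differ, and all names here are illustrative.

```python
import math

def preference_style_loss(lp_pos, lp_neg, ref_pos, ref_neg, beta=0.1):
    """Illustrative preference loss: push the model to assign relatively
    higher likelihood to the style-consistent translation (lp_pos) than
    to the style-inconsistent one (lp_neg), measured against a frozen
    reference model's log-probs (ref_pos, ref_neg)."""
    margin = beta * ((lp_pos - ref_pos) - (lp_neg - ref_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# A wider margin in favor of the positive example yields a lower loss.
print(preference_style_loss(5.0, -5.0, 0.0, 0.0))
print(preference_style_loss(0.0, 0.0, 0.0, 0.0))  # no preference -> ln(2)
```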

This work represents a significant step forward in making LLM-generated code translations more reliable and developer-friendly, paving the way for their more effective adoption in real-world software development and maintenance. For more details, you can refer to the original research paper.

Ananya Rao (https://blogs.edgentiq.com)
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach out to her at: [email protected]
