
Enhancing Decompilation for Executable Code with Contextual Learning

TLDR: ICL4Decomp is a novel hybrid decompilation framework that uses in-context learning (ICL) to guide large language models (LLMs) in generating re-executable source code from binaries. It combines two complementary variants: retrieved code examples (ICL4D-R) and natural-language descriptions of compiler optimization rules (ICL4D-O). The framework achieves roughly a 40% average improvement in re-executability over state-of-the-art methods, especially for optimized binaries, while also mitigating common decompilation errors and demonstrating strong robustness across varying program complexity.

Decompilation, the process of converting low-level binary code back into high-level source code, is a crucial task in software security analysis, reverse engineering, and understanding malware when original source code is unavailable. However, this process has long been plagued by a significant challenge: the inability of existing techniques to produce source code that can be successfully recompiled and re-executed, especially for optimized binaries.

Traditional decompilers, like Hex-Rays and Ghidra, often struggle with optimized code because compiler optimizations discard vital semantic information such as variable types, control-flow constructs, and meaningful names. While these tools can generate reasonable code for unoptimized binaries, they frequently fail when optimizations are applied, leading to code that cannot be compiled or misinterprets the original developer’s intent.

Recent advances in large language models (LLMs) have introduced neural approaches to decompilation. These models can generate semantically plausible code, but that code often cannot actually be recompiled and executed. This limitation stems from the LLMs’ difficulty in recovering lost semantic cues without specific contextual guidance.

Introducing ICL4Decomp: A Context-Guided Approach

To tackle these persistent challenges, researchers Xiaohan Wang, Yuxin Hu, and Kevin Leach from Vanderbilt University have proposed a novel hybrid decompilation framework called ICL4Decomp. This framework leverages in-context learning (ICL) to guide LLMs in generating re-executable source code. ICL4Decomp significantly improves the re-executability of decompiled code by integrating two complementary knowledge sources:

  • ICL4D-R: Retrieved-Exemplar In-Context Decompilation: This variant uses semantically similar binary-source code pairs retrieved from a large corpus. By exposing the LLM to concrete examples of how assembly code translates into source code, it helps the model understand correct decompilation patterns.
  • ICL4D-O: Optimization Rule-based In-Context Decompilation: This approach augments the LLM’s prompt with natural-language descriptions of compiler optimization rules. This allows the model to reason about complex, non-local transformations introduced by compilers, such as loop unrolling or variable coalescing, which often confuse other decompilers.
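The retrieval step behind ICL4D-R can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` here is a toy bag-of-tokens stand-in for whatever semantic code encoder the authors use, and the corpus and prompt formats are assumptions.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-tokens vector; a real system would use a learned code encoder.
    return Counter(text.split())

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_exemplars(target_asm, corpus, k=2):
    """Rank (assembly, source) pairs by similarity to the target assembly."""
    q = embed(target_asm)
    ranked = sorted(corpus, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]

def build_prompt(target_asm, exemplars):
    """Prepend retrieved asm/source pairs so the LLM sees worked translations."""
    parts = [f"Assembly:\n{asm}\nSource:\n{src}\n" for asm, src in exemplars]
    parts.append(f"Assembly:\n{target_asm}\nSource:\n")
    return "\n".join(parts)
```

The key design point is that nothing here requires retraining: the same frozen LLM adapts to each new binary purely through the exemplars placed in its prompt.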

The ICL4Decomp framework operates end-to-end. Given a target binary function, it constructs an informative context by selecting relevant examples or applicable rules, then conditions the language model on this context to generate the corresponding source code. This design combines the flexibility of ICL (adapting to arbitrary binary inputs without retraining) with the interpretability of well-defined compilation rules.
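The rule-based variant can be pictured as a lookup from the binary's optimization level to natural-language rule descriptions that are prepended to the prompt. The rule texts and the level-to-rule mapping below are illustrative placeholders, not the paper's actual rule set.

```python
# Illustrative rule descriptions; the paper's actual rule set is not reproduced here.
OPT_RULES = {
    "inlining": "Small callees may have been inlined; one long straight-line body "
                "can correspond to several source-level functions.",
    "unrolling": "Repeated near-identical instruction blocks may be an unrolled "
                 "loop; re-roll them into a single source loop.",
    "vectorization": "SIMD instructions may implement an element-wise scalar loop "
                     "in the original source.",
}

def rules_for(opt_level):
    """Pick rule descriptions that plausibly apply at a given -O level."""
    selected = []
    if opt_level >= 1:
        selected.append(OPT_RULES["inlining"])
    if opt_level >= 2:
        selected.append(OPT_RULES["unrolling"])
    if opt_level >= 3:
        selected.append(OPT_RULES["vectorization"])
    return selected

def build_rule_prompt(target_asm, opt_level):
    """Prepend applicable optimization rules so the model can 'undo' them."""
    header = "\n".join(f"Rule: {r}" for r in rules_for(opt_level))
    return f"{header}\nAssembly (-O{opt_level}):\n{target_asm}\nSource:\n"
```

Because the rules are plain English rather than retrieved code, this variant trades some stability for interpretability: a human can read exactly which transformations the model was told to reason about.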

Remarkable Improvements in Re-executability

The evaluation of ICL4Decomp across multiple datasets (ExeBench and HumanEval-Decompile), various optimization levels (O0 to O3), and compilers (GCC and Clang) has yielded impressive results. The framework demonstrated an average increase of approximately 40% in re-executability over state-of-the-art decompilation methods. These gains were particularly significant at higher optimization levels, where compiler transformations introduce greater semantic ambiguity and structural complexity.
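Re-executability is typically measured by recompiling each decompiled candidate and checking that it reproduces the expected I/O behavior. A minimal harness sketch, with the compile and run steps injected as callables — the function names and the exact metric definition here are assumptions, not the benchmarks' own harness:

```python
def check_reexecutable(source, compile_fn, run_fn, io_tests):
    """True iff `source` compiles and reproduces every expected output.

    compile_fn(source) -> a runnable artifact, or None on compile failure
    run_fn(artifact, stdin_text) -> captured stdout
    """
    artifact = compile_fn(source)
    if artifact is None:
        return False
    return all(run_fn(artifact, stdin) == expected for stdin, expected in io_tests)

def re_executability_rate(candidates, compile_fn, run_fn, suites):
    """Fraction of decompiled candidates that compile and pass their test suite."""
    results = [check_reexecutable(src, compile_fn, run_fn, tests)
               for src, tests in zip(candidates, suites)]
    return sum(results) / len(results)
```

In practice `compile_fn` would shell out to GCC or Clang at the matching optimization level, and `run_fn` would execute the binary in a sandbox; the metric is simply the pass fraction over all functions.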

ICL4Decomp also proved robust across all optimization levels, indicating its effectiveness in handling diverse compilation transformations. Furthermore, the research showed that in-context learning helps mitigate specific categories of decompilation errors. ICL4D-R, for instance, substantially reduced syntax and declaration errors, improving structural and symbolic consistency. ICL4D-O, while less stable, tended to produce more localized and repairable errors.

The framework’s robustness was further highlighted by its consistent outperformance of baselines across functions of varying program complexity, including those with higher cyclomatic complexity and lines of code. This suggests that contextual guidance helps the model maintain control-flow and data-flow coherence even in intricate programs.
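Cyclomatic complexity, the measure referenced above, is commonly approximated as one plus the number of decision points in a function. A rough token-counting sketch for C-like code (a real analysis would build the control-flow graph rather than count tokens):

```python
# Decision-point tokens for a C-like language; a crude approximation.
DECISION_TOKENS = ("if", "for", "while", "case", "&&", "||", "?")

def cyclomatic_complexity(source):
    """Approximate McCabe complexity as 1 + number of decision points."""
    # Split parentheses off so tokens like `if` and `&&` stand alone.
    tokens = source.replace("(", " ").replace(")", " ").split()
    return 1 + sum(tokens.count(t) for t in DECISION_TOKENS)
```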

This groundbreaking work marks a significant step towards achieving truly re-executable source code from binaries, bridging a critical gap in software security and reverse engineering. For more in-depth information, you can read the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
