Unlocking Program Paths: AUTO STUB’s AI-Powered Stubs

TLDR: AUTO STUB is a novel approach that uses Genetic Programming to automatically generate symbolic stubs for external functions encountered during symbolic execution. This addresses a major limitation in software testing where external functions act as ‘black boxes,’ hindering analysis. By generating training data from random inputs and outputs, AUTO STUB’s AI derives expressions that approximate function behavior, allowing symbolic execution to continue without manual intervention. The system achieves over 90% accuracy for 55% of evaluated functions, enabling the exploration of previously intractable program paths and revealing language-specific edge cases crucial for software testing.

Software testing is a critical process for ensuring the reliability and security of applications. One powerful technique used in this domain is symbolic execution, which systematically explores program paths by representing inputs as symbols rather than concrete values. This method can uncover bugs and vulnerabilities that traditional testing might miss. However, symbolic execution faces a significant hurdle when it encounters ‘external functions’: parts of a program that rely on native methods, third-party libraries, or uninstrumented code. Such functions act like black boxes. The engine cannot reason about their internal behavior, so the analysis of any code that depends on their results stalls.

The Challenge of External Functions in Symbolic Execution

Imagine a program that checks a user’s input using a function called verify_input. If this function is external, symbolic execution cannot determine the relationship between the user’s input and the function’s output. This means it can’t explore different scenarios, like what happens if verify_input returns true or false, effectively blocking the analysis of subsequent code. Current solutions often involve manual intervention, where developers write ‘symbolic stubs’ – simplified models that approximate the external function’s behavior. This process is time-consuming, prone to errors, and requires deep understanding of the external code, making it a bottleneck in comprehensive software testing.
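The effect of a stub on path coverage can be illustrated with a small sketch. Everything here is hypothetical: `check_login` stands in for the program under test, the rule inside `verify_input_stub` is invented for illustration, and a brute-force enumeration over short strings plays the role that a constraint solver would play in a real symbolic execution engine.

```python
from itertools import product

def check_login(user_input, verify_input):
    """Program under test: its two branches depend on the external call."""
    if verify_input(user_input):
        return "granted"
    return "denied"

def verify_input_stub(s):
    """Hypothetical stub: a simple expression the analysis can reason about."""
    return len(s) >= 3 and s[0] == "a"

def inputs_per_branch(verify, alphabet="ab", max_len=3):
    """Find one input per branch by enumeration (a stand-in for a solver)."""
    found = {}
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            # Keep only the first input that reaches each branch.
            found.setdefault(check_login(candidate, verify), candidate)
    return found

witnesses = inputs_per_branch(verify_input_stub)
```

With the stub in place, both branches of `check_login` have a concrete witness input. Without any model of `verify_input`, the analysis has no basis for choosing inputs that steer the program into either branch.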

Introducing AUTO STUB: An Automated Solution

To overcome this limitation, researchers have developed AUTO STUB, a novel approach that automates the creation of these symbolic stubs using Genetic Programming. Genetic Programming is a type of machine learning inspired by biological evolution, where computer programs ‘evolve’ over generations to solve a specific task. AUTO STUB runs as part of the symbolic execution process. When an external function is encountered, AUTO STUB first generates a diverse set of random inputs and observes the corresponding outputs from the actual external function. This input-output data then serves as training material for Genetic Programming.
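The data collection step described above can be sketched as follows. This is a minimal illustration, not AUTO STUB's implementation: `collect_samples`, `make_input`, and `external_abs` are hypothetical names, and skipping inputs that raise an exception is an assumption made to keep the sketch short.

```python
import math
import random

def collect_samples(external_fn, make_input, n=200, seed=0):
    """Call a black-box function on random inputs and record (input, output) pairs."""
    rng = random.Random(seed)  # fixed seed so the training set is reproducible
    samples = []
    for _ in range(n):
        x = make_input(rng)
        try:
            y = external_fn(x)
        except Exception:
            continue  # assumption: inputs the function rejects are dropped
        samples.append((x, y))
    return samples

def external_abs(x):
    """Hypothetical external function; the sampler treats it as opaque."""
    return math.fabs(x)

samples = collect_samples(external_abs, lambda rng: rng.uniform(-1e3, 1e3))
```

The sampler never inspects the function body. It only needs to be able to call the function, which is why the approach applies to native or uninstrumented code.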

How AUTO STUB Leverages Genetic Programming

Genetic Programming in AUTO STUB works by searching for mathematical or logical expressions that accurately mimic the relationship between the observed inputs and outputs. These expressions, once found, become the symbolic stubs. The system uses Grammar-Guided Genetic Programming (G3P) to ensure that the generated expressions are syntactically correct and maintain type consistency across different data types (like integers, floating-point numbers, and strings). A wide range of operators, including mathematical, logical, and string manipulation functions, are used as building blocks for these expressions. The ‘fitness’ of an expression is measured by how well its predicted outputs match the actual outputs, using metrics like Normalized Root Mean Squared Error for numbers, classification accuracy for Booleans, and Levenshtein distance for strings.
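The three fitness metrics named above can be written down in a few lines. This is a sketch under stated assumptions: the article does not say how the root mean squared error is normalized, so dividing by the range of the actual outputs is an assumption, and the function names are illustrative.

```python
import math

def nrmse(predicted, actual):
    """Root mean squared error, normalized by the range of the actual values
    (the choice of normalizer is an assumption)."""
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    spread = max(actual) - min(actual)
    # Fall back to the plain RMSE when all actual values are identical.
    return math.sqrt(mse) / spread if spread else math.sqrt(mse)

def accuracy(predicted, actual):
    """Fraction of Boolean predictions that match the observed outputs."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```

For the two error metrics a lower score means a better candidate, while for accuracy a higher score is better, so a search procedure has to treat the direction of each metric consistently.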

Behind the Scenes: Data Generation and Evaluation

To make the generated stubs robust, AUTO STUB generates its training inputs deliberately rather than uniformly at random. For numerical types, it uses stratified sampling to cover a wide range of magnitudes and adds special values such as NaN (Not-a-Number), Infinity, and the minimum and maximum values of each type. For strings, it creates random sequences of varying lengths. This broad coverage matters because Genetic Programming can only learn behavior that appears in its training data. The system was evaluated on a benchmark of 273 Java methods from internal libraries, focusing on primitives and mathematical operations. All selected methods are free of side effects and return primitive or string types.
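One way to realize this kind of input generation is sketched below. The strata (one per power of ten), the number of samples per stratum, and the string alphabet are all assumptions chosen for illustration. The article does not specify AUTO STUB's actual parameters.

```python
import math
import random
import string
import sys

# Special values added on top of the random samples. Note that
# sys.float_info.min is the smallest positive *normal* double.
SPECIAL_DOUBLES = [0.0, -0.0, math.nan, math.inf, -math.inf,
                   sys.float_info.max, -sys.float_info.max, sys.float_info.min]

def sample_doubles(per_stratum=5, max_exponent=12, seed=0):
    """Draw the same number of values from each order of magnitude."""
    rng = random.Random(seed)
    values = list(SPECIAL_DOUBLES)
    for exp in range(-max_exponent, max_exponent):
        for _ in range(per_stratum):
            magnitude = rng.uniform(10.0 ** exp, 10.0 ** (exp + 1))
            values.append(magnitude if rng.random() < 0.5 else -magnitude)
    return values

def sample_strings(count=20, max_len=16, seed=0):
    """Random strings whose lengths vary from 0 to max_len."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + " "
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
            for _ in range(count)]

doubles = sample_doubles()
strings = sample_strings()
```

Sampling per magnitude avoids a known weakness of uniform sampling over a wide interval, where almost every draw lands in the top one or two orders of magnitude and small values are never seen.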

Real-World Impact and Accuracy

The results are promising. AUTO STUB approximated external functions with over 90% accuracy for 55% of the functions evaluated, significantly outperforming a random baseline, which indicates that its targeted search strategy is effective. For instance, AUTO STUB inferred language-specific behaviors, such as how Java handles Double.isNaN(double), by generating an expression that captures the unique properties of NaN. Such insights help identify edge cases and potential bugs in software. Most generated stubs worked with symbolic execution engines without problems. However, some challenges arose when the underlying SMT (Satisfiability Modulo Theories) solver interpreted language-specific semantics (like NaN or Infinity) differently than Java, highlighting an area for future refinement.
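The article does not show the expression AUTO STUB produced for Double.isNaN(double). One expression with the required property, offered here only as an illustration, is `x != x`: under IEEE 754 comparison rules, NaN is the only floating-point value that is not equal to itself. The sketch below checks such a candidate against Python's `math.isnan`, which serves as a stand-in for the Java method.

```python
import math

def is_nan_stub(x: float) -> bool:
    """Candidate stub: true only for NaN, since NaN compares unequal to itself."""
    return x != x

# Score the candidate against the reference function on a few probe inputs.
inputs = [0.0, -1.5, math.inf, -math.inf, math.nan, 1e308]
matches = sum(is_nan_stub(v) == math.isnan(v) for v in inputs)
stub_accuracy = matches / len(inputs)
```

An expression like this also shows how solver mismatches can arise. In the SMT-LIB floating-point theory, the generic `=` treats NaN as equal to itself, while `fp.eq` follows the IEEE rule, so a translation that picks the wrong operator would give the stub a different meaning than it has in Java. Whether this is the specific mismatch the researchers encountered is not stated in the article.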

Navigating Limitations and Future Directions

Currently, AUTO STUB is limited to stateless functions, meaning it cannot handle objects that retain internal state, such as StringBuilder. Extending its capabilities to stateful objects remains an open challenge, potentially by approximating sequences of calls rather than single functions. Additionally, the generated symbolic stubs are intentionally kept computationally simple to ensure fast solving, so they can approximate functions of regular complexity but not Turing-complete behaviors such as loops or recursion. Despite these limitations, AUTO STUB represents a significant step forward in automating software testing, making symbolic execution more practical and less reliant on manual effort. For more technical details, you can refer to the full research paper: AUTO STUB: Genetic Programming-Based Stub Creation for Symbolic Execution.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
