TL;DR: A new research paper introduces CRABS, a strategy that combines syntactic analysis with Large Language Models (LLMs) to understand Python notebooks. It addresses challenges like re-execution difficulties and LLM limitations by first bounding potential data flows with syntactic analysis, then using an LLM to resolve remaining ambiguities cell-by-cell. This method achieves high accuracy in identifying information flows and execution dependencies, significantly outperforming direct LLM analysis and mitigating issues like hallucinations and long-context problems.
Understanding how data and operations flow within Python notebooks is crucial for evaluating, reusing, and adapting them for new tasks. However, re-executing these notebooks to inspect their inner workings is often impractical: unresolved data and software dependencies frequently cause errors. While Large Language Models (LLMs) have shown promise in understanding code without execution, they often falter on realistic notebooks, exhibiting issues like ‘hallucinations’ (identifying non-existent variables) and struggling with long contexts, especially in larger notebooks.
To tackle these limitations, a new approach called CRABS (Capture and Resolve Assisted Bounding Strategy) has been proposed. CRABS introduces a novel ‘pincer strategy’ that combines limited syntactic analysis with the semantic comprehension capabilities of LLMs. The goal is to generate an information flow graph and a cell execution dependency graph for a given notebook, making its internal logic clear without needing to run the code.
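To make the second output concrete: if cell 3 reads a variable written by cell 2, and cell 2 reads one written by cell 1, then cell 3 transitively depends on cell 1 as well. A minimal sketch of that transitive closure, assuming a simple cell-number-to-dependencies mapping (the representation is our own, not the paper's):

```python
def transitive_deps(direct):
    """Transitively close direct cell dependencies:
    cell -> every cell that must have run before it."""
    closed = {cell: set(deps) for cell, deps in direct.items()}
    changed = True
    while changed:  # propagate until a fixed point is reached
        changed = False
        for cell, deps in closed.items():
            inherited = set().union(*(closed.get(d, set()) for d in deps)) if deps else set()
            if not inherited <= deps:
                deps |= inherited
                changed = True
    return closed

# Cell 2 needs cell 1; cell 3 needs cell 2 -> cell 3 also needs cell 1.
deps = transitive_deps({1: set(), 2: {1}, 3: {2}})
```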
How CRABS Works
CRABS operates in two distinct phases:
1. Syntactic Phase: This initial phase performs a shallow syntactic analysis of the Python notebook’s code. By examining the Abstract Syntax Tree (AST), CRABS creates two estimates of the inter-cell input/output (I/O) sets: a ‘lower estimate’ (representing certain, unambiguous flows) and an ‘upper estimate’ (a superset including both certain and ambiguous flows). This step effectively ‘bounds’ the problem, narrowing down the possibilities for data flow.
2. Semantic-aware Phase: The ambiguities identified between the lower and upper estimates are then presented to an LLM. Using a cell-by-cell, zero-shot prompting approach, the LLM resolves these uncertainties. This involves asking specific, binary (yes/no) questions about whether a particular data object is an input or an output candidate for a given cell. This focused questioning strategy is key to mitigating the LLM’s long-context challenges and preventing hallucinations, as all potential inputs and outputs are already derived from the syntactic analysis.
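The paper's actual analysis is more involved, but the bounding idea can be sketched with Python's standard `ast` module. In this illustration (the function and dictionary keys are our own names), a plain assignment is a certain write, while a method call like `items.append(...)` may or may not mutate its receiver, so it lands only in the upper estimate; the gap between the two estimates is exactly what would be handed to the LLM as yes/no questions:

```python
import ast

def cell_io_estimates(cell_src):
    """Shallow AST pass over one notebook cell: split names into
    certain flows (lower estimate) and certain-plus-ambiguous
    flows (upper estimate)."""
    tree = ast.parse(cell_src)
    reads, writes, ambiguous = set(), set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)   # plain assignment: certain output
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)    # plain use: certain input
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            # obj.method(...) might mutate obj in place, so obj is
            # only an *output candidate*: upper estimate, not lower.
            ambiguous.add(node.value.id)
    lower = {"in": reads - writes, "out": set(writes)}
    upper = {"in": reads | ambiguous, "out": writes | ambiguous}
    return lower, upper

lower, upper = cell_io_estimates("total = price * qty\nitems.append(total)")
# upper["out"] - lower["out"] == {"items"}: the one ambiguity an LLM
# would be asked to resolve ("is items an output of this cell? yes/no").
```

This is only a sketch of the bounding strategy: it ignores scoping, aliasing, and execution order within a cell, which the paper's fuller syntactic analysis must handle.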
Demonstrated Effectiveness
The effectiveness of CRABS was evaluated on a dataset of 50 highly up-voted Kaggle notebooks, chosen as representative of real-world data science and machine learning workflows. These notebooks were manually annotated to establish a ground truth for information flows and transitive dependencies.
The results were impressive. CRABS achieved average F1 scores of 98% for identifying cell-to-cell information flows and 99% for identifying transitive cell execution dependencies. Furthermore, 37 out of 50 (74%) of the information flow graphs and 41 out of 50 (82%) of the cell execution dependency graphs generated by CRABS exactly matched the ground truth. The LLM alone correctly resolved 1397 out of 1425 (98%) of the ambiguities presented to it.
Compared to a baseline approach where an LLM was prompted to analyze entire notebooks directly, CRABS showed significant improvements. The baseline often failed to understand longer notebooks (20% of the dataset) due to long-context issues and frequently hallucinated variables. CRABS, by contrast, yielded non-zero scores for all notebooks and demonstrated substantial percentage-point increases in F1 score, accuracy, and exact match rates for both information flow and cell execution dependency graphs.
An ablation study further confirmed the critical roles of both the syntactic phase and the cell-by-cell prompting strategy, showing a notable drop in performance when either component was removed.
In conclusion, CRABS offers a robust and effective method for understanding Python notebooks without the need for re-execution. By strategically combining symbolic (syntactic) analysis with neural (LLM) capabilities, this ‘pincer strategy’ represents a promising direction for integrating different AI methods to solve complex code understanding tasks. You can read the full research paper here.


