TL;DR: A new research paper introduces CRABS, a strategy that combines syntactic analysis with Large Language Models (LLMs) to understand Python notebooks. It addresses challenges like re-execution difficulties and LLM limitations by first bounding potential data flows with syntactic analysis, then using an LLM to resolve remaining ambiguities cell-by-cell. This method achieves high accuracy in identifying information flows and execution dependencies, significantly outperforming direct LLM analysis and mitigating issues like hallucinations and long-context problems.
Understanding how data and operations flow within Python notebooks is crucial for evaluating, reusing, and adapting them for new tasks. However, re-executing these notebooks to inspect their inner workings is often impractical: unresolved data and software dependencies frequently cause errors. While Large Language Models (LLMs) have shown promise in understanding code without execution, they often falter on realistic notebooks, exhibiting issues like ‘hallucinations’ (identifying non-existent variables) and struggling with long contexts, especially in larger notebooks.
To tackle these limitations, a new approach called CRABS (Capture and Resolve Assisted Bounding Strategy) has been proposed. CRABS introduces a novel ‘pincer strategy’ that combines limited syntactic analysis with the semantic comprehension capabilities of LLMs. The goal is to generate an information flow graph and a cell execution dependency graph for a given notebook, making its internal logic clear without needing to run the code.
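To make the second output concrete: if cell 3 reads a variable written by cell 2, and cell 2 reads one written by cell 1, then cell 3 transitively depends on cell 1 as well. A minimal sketch of that transitive closure, assuming a simple cell-number-to-dependencies mapping (the representation is our own, not the paper's):

```python
def transitive_deps(direct):
    """Transitively close direct cell dependencies:
    cell -> every cell that must have run before it."""
    closed = {cell: set(deps) for cell, deps in direct.items()}
    changed = True
    while changed:  # propagate until a fixed point is reached
        changed = False
        for cell, deps in closed.items():
            inherited = set().union(*(closed.get(d, set()) for d in deps)) if deps else set()
            if not inherited <= deps:
                deps |= inherited
                changed = True
    return closed

# Cell 2 needs cell 1; cell 3 needs cell 2 -> cell 3 also needs cell 1.
deps = transitive_deps({1: set(), 2: {1}, 3: {2}})
```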
How CRABS Works
CRABS operates in two distinct phases:
1. Syntactic Phase: This initial phase performs a shallow syntactic analysis of the Python notebook’s code. By examining the Abstract Syntax Tree (AST), CRABS creates two estimates of the inter-cell input/output (I/O) sets: a ‘lower estimate’ (representing certain, unambiguous flows) and an ‘upper estimate’ (a superset including both certain and ambiguous flows). This step effectively ‘bounds’ the problem, narrowing down the possibilities for data flow.
2. Semantic-aware Phase: The ambiguities identified between the lower and upper estimates are then presented to an LLM. Using a cell-by-cell, zero-shot prompting approach, the LLM resolves these uncertainties. This involves asking specific, binary (yes/no) questions about whether a particular data object is an input or an output candidate for a given cell. This focused questioning strategy is key to mitigating the LLM’s long-context challenges and preventing hallucinations, as all potential inputs and outputs are already derived from the syntactic analysis.
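The paper's actual analysis is more involved, but the bounding idea can be sketched with Python's standard `ast` module. In this illustration (the function and dictionary keys are our own names), a plain assignment is a certain write, while a method call like `items.append(...)` may or may not mutate its receiver, so it lands only in the upper estimate; the gap between the two estimates is exactly what would be handed to the LLM as yes/no questions:

```python
import ast

def cell_io_estimates(cell_src):
    """Shallow AST pass over one notebook cell: split names into
    certain flows (lower estimate) and certain-plus-ambiguous
    flows (upper estimate)."""
    tree = ast.parse(cell_src)
    reads, writes, ambiguous = set(), set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)   # plain assignment: certain output
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)    # plain use: certain input
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            # obj.method(...) might mutate obj in place, so obj is
            # only an *output candidate*: upper estimate, not lower.
            ambiguous.add(node.value.id)
    lower = {"in": reads - writes, "out": set(writes)}
    upper = {"in": reads | ambiguous, "out": writes | ambiguous}
    return lower, upper

lower, upper = cell_io_estimates("total = price * qty\nitems.append(total)")
# upper["out"] - lower["out"] == {"items"}: the one ambiguity an LLM
# would be asked to resolve ("is items an output of this cell? yes/no").
```

This is only a sketch of the bounding strategy: it ignores scoping, aliasing, and execution order within a cell, which the paper's fuller syntactic analysis must handle.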
Demonstrated Effectiveness
The effectiveness of CRABS was evaluated on a dataset of 50 highly up-voted Kaggle notebooks, chosen as representative of real-world data science and machine learning workflows. These notebooks were manually annotated to establish a ground truth for information flows and transitive dependencies.
The results were impressive. CRABS achieved average F1 scores of 98% for identifying cell-to-cell information flows and 99% for identifying transitive cell execution dependencies. Furthermore, 37 out of 50 (74%) of the information flow graphs and 41 out of 50 (82%) of the cell execution dependency graphs generated by CRABS exactly matched the ground truth. The LLM alone correctly resolved 1397 out of 1425 (98%) of the ambiguities presented to it.
Compared to a baseline approach where an LLM was prompted to analyze entire notebooks directly, CRABS showed significant improvements. The baseline often failed to understand longer notebooks (20% of the dataset) due to long-context issues and frequently hallucinated variables. CRABS, by contrast, yielded non-zero scores for all notebooks and demonstrated substantial percentage-point increases in F1 score, accuracy, and exact match rates for both information flow and cell execution dependency graphs.
An ablation study further confirmed the critical roles of both the syntactic phase and the cell-by-cell prompting strategy, showing a notable drop in performance when either component was removed.
In conclusion, CRABS offers a robust and effective method for understanding Python notebooks without the need for re-execution. By strategically combining symbolic (syntactic) analysis with neural (LLM) capabilities, this ‘pincer strategy’ represents a promising direction for integrating different AI methods to solve complex code understanding tasks. You can read the full research paper here.


