NoteEx: Visualizing Your Data Science Thoughts for Better AI Assistance

TLDR: NoteEx is a new JupyterLab extension that helps data analysts use Large Language Models (LLMs) more effectively for Exploratory Data Analysis (EDA). It addresses the problem of LLMs receiving incorrect or insufficient context from long, messy notebooks by providing a visual, flowchart-style interface. This interface allows users to map out their analytical thought process (mental model), specify relationships between code cells, and easily select relevant data and code for LLMs. A user study showed that NoteEx significantly improved user engagement, mental model retention, and the quality of LLM responses compared to traditional notebooks.

In the fast-evolving world of data science, computational notebooks like Jupyter have become indispensable tools for Exploratory Data Analysis (EDA). These notebooks allow data analysts to write and execute code, visualize results, and document their findings in an interactive environment. With the rise of Large Language Models (LLMs) such as ChatGPT and GitHub Copilot, the potential for AI assistance in EDA has grown considerably, promising to streamline tasks like code generation and result interpretation.

However, this promise often hits a snag: LLMs need precise, task-relevant context to provide high-quality responses. As notebooks grow longer and more complex, analysts struggle to keep track of their ‘mental model’ – their internal, often non-linear understanding of the analysis workflow. This disconnect means LLMs frequently receive either too little or too much context, leading to inaccurate suggestions, frustrating debugging, and a lot of tedious prompt engineering.

The Challenge of Context

Traditional methods for context selection fall short. Automatic selection, often based on explicit code dependencies, overlooks the analyst’s evolving thought process. For instance, an analyst might experiment with slightly different data loading cells using the same variable names, making it difficult for an LLM to pick the correct one. Manual context selection, while offering control, becomes a time-consuming and cognitively demanding chore in large, messy notebooks, requiring users to remember cell relationships, variable definitions, and execution statuses.
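To make the ambiguity concrete, here is a minimal sketch of a dependency-based selector. It is not NoteEx's code or any real tool's algorithm: the notebook contents, the `load` call, and the helper names `assigned_names` and `candidate_definers` are all hypothetical, chosen only to show why "pick the closest earlier definition" can disagree with what the analyst had in mind.

```python
import ast

# Hypothetical notebook: two cells load data into the same variable name.
cells = {
    1: "df = load('sales_2023.csv')",
    2: "df = load('sales_2023_cleaned.csv')",
    3: "summary = df.describe()",
}

def assigned_names(source: str) -> set[str]:
    """Return every name that a cell's source binds (assignment targets)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names

def candidate_definers(name: str, before_cell: int) -> list[int]:
    """All earlier cells that define `name`, in notebook order."""
    return [
        cell_id
        for cell_id, source in sorted(cells.items())
        if cell_id < before_cell and name in assigned_names(source)
    ]

candidates = candidate_definers("df", before_cell=3)

# A purely code-based heuristic: take the closest earlier definition.
# Nothing in the code says whether the analyst actually meant cell 1 or cell 2.
naive_choice = candidates[-1]

print(candidates, naive_choice)
```

Both cells are valid answers from the code's point of view, so the choice has to come from somewhere else, which is the gap a user-specified dependency is meant to fill.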

Researchers identified four key challenges through a formative study:

An analyst’s mental model constantly evolves, making context selection difficult.
Cell-specific details like execution status, order, outputs, and errors are crucial but hard to recall.
Data variables are often implicit, making them hard to find and use as context.
Manual context curation is burdensome, and automatic selection frequently fails to capture user intent.

Introducing NoteEx: A Visual Solution

To address these issues, a new JupyterLab extension called NoteEx has been developed. NoteEx aims to bridge the gap between an analyst’s mental model and the LLM’s need for precise context by providing an interactive, visual environment. It’s designed to help analysts externalize their thought processes and specify analysis dependencies, leading to more accurate and relevant LLM responses.

NoteEx features three main components:

The Canvas View: This is the heart of NoteEx, offering a flowchart-style visualization of the EDA workflow. Each node on the canvas represents a notebook cell, displaying its order, a snippet of its content, a preview of its output, and its execution status (green for success, red for error, orange for unexecuted or changed). Analysts can draw links between these nodes to represent their mental model dependencies, which are often non-linear and conceptual, unlike strict code dependencies. This visual map helps users maintain an overview of their analysis, even in very long notebooks, and easily navigate between related cells.
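One way to picture what a canvas node carries is a small data model like the sketch below. The class and field names (`Status`, `CellNode`, `depends_on`, `snippet`) are assumptions made for illustration; the article only states what a node displays and which colors mark each status, not how NoteEx stores it.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    """Execution status mapped to the colors described above."""
    SUCCESS = "green"
    ERROR = "red"
    STALE = "orange"  # unexecuted, or changed since the last run

@dataclass
class CellNode:
    order: int                      # position of the cell in the notebook
    source: str                     # full cell content
    status: Status = Status.STALE   # a new cell has not been executed yet
    output_preview: str = ""
    # User-drawn links: orders of the cells this one conceptually depends on.
    depends_on: list[int] = field(default_factory=list)

    def snippet(self, width: int = 30) -> str:
        """First line of the cell, truncated for display on the node."""
        stripped = self.source.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        return first_line[:width]

load = CellNode(order=1, source="import pandas as pd\ndf = pd.read_csv('x.csv')")
plot = CellNode(order=2, source="df.plot()", status=Status.SUCCESS, depends_on=[1])

print(load.snippet(), load.status.value, plot.depends_on)
```

Keeping the links as data on the node, separate from anything inferred from the code, is what lets them express conceptual relationships rather than strict code dependencies.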

The Data Information View: This panel provides a clear list of all data variables defined within the notebook. For each variable, it shows its name, structure, and highlights the specific cells in the Canvas View where it is defined or used. This makes variables first-class entities, allowing analysts to quickly identify and select them as part of the LLM’s context without endless scrolling or guessing exact names.
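A rough sketch of how such a variable listing could be built with Python's standard `ast` module follows. This is an assumption about one possible approach, not a description of NoteEx's implementation, and the function name `index_variables` and the sample cells are hypothetical. It only tracks simple name bindings and ignores imports, function definitions, and attribute assignments.

```python
import ast
from collections import defaultdict

def index_variables(cells: dict[int, str]) -> dict[str, dict[str, list[int]]]:
    """Map each variable defined in the notebook to the cells that define or use it."""
    index = defaultdict(lambda: {"defined_in": [], "used_in": []})
    for cell_id, source in sorted(cells.items()):
        defined, used = set(), set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    defined.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    used.add(node.id)
        for name in defined:
            index[name]["defined_in"].append(cell_id)
        for name in used:
            index[name]["used_in"].append(cell_id)
    # Keep only names defined somewhere in the notebook, which drops builtins
    # such as `sorted` that are used but never assigned.
    return {name: info for name, info in index.items() if info["defined_in"]}

# Hypothetical three-cell notebook.
cells = {
    1: "raw = [3, 1, 2]",
    2: "clean = sorted(raw)",
    3: "total = sum(clean)",
}

index = index_variables(cells)
for name, info in index.items():
    print(name, info)
```

With an index like this, selecting a variable as LLM context becomes a lookup rather than a scroll through the notebook.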

The LLM-Assistant View: When an analyst wants AI help, they can open this view from an active cell. NoteEx intelligently suggests a set of task-relevant cells and data variables based on the mental model dependencies specified in the Canvas View. Users can then easily add or remove items from this suggested context, ensuring the LLM receives precisely what it needs. This significantly reduces the effort of crafting prompts and leads to more accurate and useful AI-generated code or explanations.
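The suggestion step can be thought of as walking the user-drawn links upstream from the active cell. The sketch below assumes a plain breadth-first traversal; the article does not say which algorithm NoteEx uses, and the `links` mapping and `suggest_context` function are hypothetical.

```python
from collections import deque

# Hypothetical user-drawn links: cell -> cells it conceptually depends on.
links = {
    1: [],    # load data
    2: [1],   # clean data
    3: [1],   # abandoned experiment branching from the load cell
    4: [2],   # plot the cleaned data
}

def suggest_context(active_cell: int, links: dict[int, list[int]]) -> list[int]:
    """Collect every cell reachable upstream of the active cell."""
    seen = set()
    queue = deque(links.get(active_cell, []))
    while queue:
        cell = queue.popleft()
        if cell in seen:
            continue
        seen.add(cell)
        queue.extend(links.get(cell, []))
    return sorted(seen)

suggested = suggest_context(4, links)

# The user can still adjust the suggestion, for example by dropping a cell.
final_context = [cell for cell in suggested if cell != 1]

print(suggested, final_context)
```

Because cell 3 is not upstream of cell 4, the abandoned experiment stays out of the suggested context even though it sits between the relevant cells in notebook order.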

Real-World Impact

A user study comparing NoteEx to a baseline (standard JupyterLab with a Copilot-like chat) demonstrated significant improvements. Participants found NoteEx more engaging, easier to use, and more effective for maintaining their mental models. The visual nature of the Canvas View helped them understand both their own and others’ analysis workflows more efficiently. When interacting with the LLM, NoteEx users crafted much shorter prompts, required fewer clarifications, and received higher-quality responses because the LLM had a better understanding of their intent through the curated context.

NoteEx not only streamlines the technical aspects of data analysis but also enhances the cognitive process, allowing analysts to focus on insights rather than wrestling with tools or AI prompts. It represents a significant step forward in making LLM-assisted EDA more intuitive, efficient, and aligned with how data scientists actually think and work. To learn more about this tool, see the full research paper.
