CelloAI: A Local LLM Assistant for High Energy Physics Software Development

TLDR: CelloAI is a locally hosted AI coding assistant for High Energy Physics (HEP) that uses Large Language Models (LLMs) with retrieval-augmented generation (RAG) to improve software development. It addresses challenges like porting legacy code and sparse documentation by offering Doxygen-style comment generation, file summaries, and syntax-aware code generation with callgraph knowledge. Its unique CelloRetriever uses separate code/text collections and pattern matching for accurate context. Evaluations show CelloAI significantly enhances code understanding and generation, especially for porting GPU kernels, while ensuring data privacy and reducing costs.

Next-generation High Energy Physics (HEP) experiments are set to produce an unprecedented amount of data. To handle this, High Performance Computing (HPC) is becoming essential. However, integrating HPC into HEP faces significant challenges, primarily due to the difficulty of adapting existing software to new, diverse computing architectures and the lack of comprehensive documentation for complex scientific codebases.

To address these issues, researchers have introduced CelloAI, a locally hosted coding assistant. CelloAI combines Large Language Models (LLMs) with retrieval-augmented generation (RAG) to improve code documentation and generation within the HEP community. Because it is deployed locally, CelloAI ensures data privacy, eliminates the recurring costs associated with cloud-based services, and provides access to large context windows without relying on external dependencies.

CelloAI’s Core Capabilities

CelloAI focuses on two primary use cases: code documentation and code generation, each supported by specialized components:

Code Documentation: The assistant can automatically generate Doxygen-style comments for functions and classes by pulling relevant information from various RAG sources like papers, posters, and presentations. It also creates file-level summaries and offers an interactive chatbot to answer code comprehension questions.
Code Generation: For generating code, CelloAI uses advanced syntax-aware chunking strategies. These strategies ensure that syntactic boundaries are preserved during the embedding process, which greatly improves the accuracy of retrieval in large codebases. The system also incorporates knowledge from callgraphs to maintain awareness of code dependencies during modifications, providing AI-generated suggestions for performance optimization and accurate refactoring.
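To make the idea of syntax-aware chunking concrete, here is a minimal sketch. It is not CelloAI's implementation: it uses Python's standard-library `ast` module as a stand-in parser and chunks a small Python sample, whereas HEP codebases are largely C++ and CUDA. The function names `chunk_by_definition` and `chunk_fixed_window` and the `SAMPLE` source are hypothetical, chosen only for illustration.

```python
import ast


def chunk_by_definition(source: str) -> list[str]:
    """Return one chunk per top-level function or class, never splitting a definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the def/class line, so start from the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks


def chunk_fixed_window(source: str, window: int) -> list[str]:
    """Baseline for comparison: cut every `window` lines regardless of syntax."""
    lines = source.splitlines()
    return ["\n".join(lines[i : i + window]) for i in range(0, len(lines), window)]


# Hypothetical sample source, used only to show where the chunk boundaries fall.
SAMPLE = '''\
def energy(px, py, pz, m):
    p2 = px * px + py * py + pz * pz
    return (p2 + m * m) ** 0.5

class Hit:
    def __init__(self, x, y):
        self.x = x
        self.y = y
'''

if __name__ == "__main__":
    for chunk in chunk_by_definition(SAMPLE):
        print(chunk, end="\n---\n")
```

With a two-line fixed window, the first chunk of `SAMPLE` ends before the `return` statement of `energy`, so a retriever would hand the model half a function. The definition-based chunker returns the function and the class as two complete units.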

The development of scientific HPC software presents unique hurdles that differentiate it from general-purpose programming. These include limited training data for LLMs, the need for absolute correctness (which conflicts with the probabilistic nature of LLM outputs), and the complex, domain-specific frameworks prevalent in HEP. CelloAI tackles these by using RAG enhanced with HEP-specific data, intelligent code chunking, and integrating callgraph knowledge to understand codebase relationships.

How CelloAI Works: The CelloRetriever

At the heart of CelloAI is the CelloRetriever, a specialized mechanism designed to assemble contextually relevant code fragments and explanatory text for the language model. It optimizes retrieval performance by:

Separate Collections: Storing code and text embeddings in distinct collections within ChromaDB. This allows for independent tuning and ensures a balanced context for queries that need both algorithmic understanding and conceptual explanations.
Syntax-aware Code Chunking: Unlike conventional fixed-window chunking that can fragment code, CelloAI uses a Tree-sitter driven strategy. This ensures that complete, self-contained units like full function or class definitions are retrieved, leading to higher accuracy in code generation.
Unified Retrieval with Pattern Matching: CelloRetriever integrates retrieved code and text. It uses a lightweight pattern-matching pass to identify and re-order candidates, ensuring that semantically dissimilar but algorithmically relevant chunks are prioritized.
Automatic Prompt Enhancement with Callgraphs: To prevent LLMs from suggesting changes that could break dependencies, CelloAI automatically adds contextual dependency information from callgraphs to prompts. This “two-hop lineage” (immediate callers and callees) provides the LLM with visibility into the broader execution flow, enhancing safety and explainability.
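The callgraph-based prompt enhancement described in the last item can be pictured with a short sketch. This is an illustration under stated assumptions, not CelloAI's code: the callgraph is assumed to be a plain dictionary mapping each function to the functions it calls, and the names `lineage`, `enhance_prompt`, and every entry in `CALLGRAPH` are hypothetical.

```python
def lineage(callgraph: dict[str, list[str]], target: str) -> tuple[list[str], list[str]]:
    """Return (immediate callers, immediate callees) of `target`."""
    callers = sorted(f for f, callees in callgraph.items() if target in callees)
    callees = sorted(set(callgraph.get(target, [])))
    return callers, callees


def enhance_prompt(prompt: str, callgraph: dict[str, list[str]], target: str) -> str:
    """Prepend dependency context for `target` to the user's prompt."""
    callers, callees = lineage(callgraph, target)
    context = (
        f"Dependency context for `{target}`:\n"
        f"- called by: {', '.join(callers) or 'none'}\n"
        f"- calls: {', '.join(callees) or 'none'}\n"
    )
    return context + "\n" + prompt


# Hypothetical callgraph: each key calls the functions listed in its value.
CALLGRAPH = {
    "simulate_event": ["deposit_energy", "write_output"],
    "deposit_energy": ["lookup_table", "scale_hit"],
    "lookup_table": [],
}

if __name__ == "__main__":
    print(enhance_prompt("Optimize deposit_energy for GPU execution.", CALLGRAPH, "deposit_energy"))
```

With the caller and callee names placed ahead of the request, the model can see that changing the signature of `deposit_energy` would affect `simulate_event`, which is the kind of dependency break the callgraph context is meant to prevent.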

Evaluating CelloAI’s Performance

CelloAI was evaluated on real-world HEP applications from the ATLAS, CMS, and DUNE experiments, including FastCaloSim, Patatrack, P2R, and WireCell. The evaluations confirmed the benefits of its design choices:

Separate collections proved crucial for balanced retrieval, preventing scenarios where only code or only text dominated the context.
Syntax-aware chunking significantly reduced code fragmentation, ensuring that LLMs received complete and accurate code snippets.
CelloRetriever’s pattern matching consistently boosted retrieval effectiveness across various embedding models, demonstrating its robustness.
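One way to picture a lightweight pattern-matching pass over retrieved candidates is sketched below. The article does not specify which patterns CelloRetriever uses, so this is an assumption: identifier-like tokens (snake_case or camelCase) are pulled from the query, and candidates that mention them are moved ahead of candidates that do not. The names `query_identifiers` and `rerank` are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def query_identifiers(query: str) -> set[str]:
    """Keep tokens that look like code identifiers (snake_case or camelCase)."""
    return {
        token
        for token in IDENTIFIER.findall(query)
        if "_" in token or re.search(r"[a-z][A-Z]", token)
    }


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Move candidates that mention identifiers from the query to the front.

    `candidates` is assumed to arrive ordered by embedding similarity.
    """
    names = query_identifiers(query)

    def hits(chunk: str) -> int:
        return sum(1 for name in names if re.search(rf"\b{re.escape(name)}\b", chunk))

    # sorted() is stable, so candidates with equal hit counts keep their similarity order.
    return sorted(candidates, key=lambda chunk: -hits(chunk))


if __name__ == "__main__":
    query = "How does simulate_hits fill the energy table?"
    by_similarity = [
        "The energy table stores the calorimeter response.",
        "void simulate_hits(float* e) { /* ... */ }",
        "void write_output() {}",
    ]
    for chunk in rerank(query, by_similarity):
        print(chunk)
```

In this toy case the function body ranks below a prose sentence on similarity alone, but the exact identifier match moves it to the front, which mirrors the goal of prioritizing chunks that are algorithmically relevant even when they are semantically dissimilar to the query.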

In terms of practical application, CelloAI provides an out-of-the-box utility for generating Doxygen-style comments and file summaries, significantly reducing manual documentation effort. For code generation, particularly in porting GPU kernels from CUDA to OpenMP, CelloAI’s retrieval pipeline dramatically increased the coverage of kernels that open-weight LLMs could successfully process. While “easy” kernels compiled reliably, “moderate” and “hard” kernels still present challenges, mainly due to complex memory mapping requirements and the inherent stochasticity of LLM generation.

The research paper, available at arXiv:2508.16713, highlights that CelloAI is a significant step towards an agentic framework that can port GPU kernels across different backends, write unit tests from physics specifications, and execute and benchmark code while feeding back compiler diagnostics and performance metrics. This iterative approach aims to constrain the stochasticity of LLM generation and enable the development of optimal kernels for scientific computing.
