TLDR: CelloAI is a locally hosted AI coding assistant for High Energy Physics (HEP) that uses Large Language Models (LLMs) with retrieval-augmented generation (RAG) to improve software development. It addresses challenges like porting legacy code and sparse documentation by offering Doxygen-style comment generation, file summaries, and syntax-aware code generation with callgraph knowledge. Its unique CelloRetriever uses separate code/text collections and pattern matching for accurate context. Evaluations show CelloAI significantly enhances code understanding and generation, especially for porting GPU kernels, while ensuring data privacy and reducing costs.
Next-generation High Energy Physics (HEP) experiments are set to produce an unprecedented amount of data. To handle this, High Performance Computing (HPC) is becoming essential. However, integrating HPC into HEP faces significant challenges, primarily due to the difficulty of adapting existing software to new, diverse computing architectures and the lack of comprehensive documentation for complex scientific codebases.
To address these issues, researchers have introduced CelloAI, a locally hosted coding assistant. CelloAI combines Large Language Models (LLMs) with retrieval-augmented generation (RAG) to improve code documentation and code generation within the HEP community. Because it is deployed locally, it keeps code and data private, avoids the recurring costs of cloud-based services, and provides large context windows without depending on external services.
CelloAI’s Core Capabilities
CelloAI focuses on two primary use cases: code documentation and code generation, each supported by specialized components:
- Code Documentation: The assistant can automatically generate Doxygen-style comments for functions and classes by pulling relevant information from various RAG sources like papers, posters, and presentations. It also creates file-level summaries and offers an interactive chatbot to answer code comprehension questions.
- Code Generation: For code generation, CelloAI uses syntax-aware chunking strategies that preserve syntactic boundaries during embedding, which improves retrieval accuracy in large codebases. The system also incorporates callgraph knowledge so that it stays aware of code dependencies during modifications, and it offers AI-generated suggestions for performance optimization and refactoring.
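To make the difference between syntax-aware and fixed-window chunking concrete, here is a minimal sketch. It is not CelloAI's implementation: it uses Python's standard `ast` module as a stand-in for a real syntax parser, and the function names (`chunk_by_definition`, `chunk_fixed_window`) and the sample source are hypothetical.

```python
import ast


def chunk_by_definition(source: str) -> list[dict]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based and end_lineno is inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({"name": node.name, "kind": type(node).__name__, "text": text})
    return chunks


def chunk_fixed_window(source: str, window: int) -> list[str]:
    """Baseline: cut the file every `window` lines, ignoring syntax."""
    lines = source.splitlines()
    return ["\n".join(lines[i : i + window]) for i in range(0, len(lines), window)]


# Hypothetical sample file, used only for illustration.
SAMPLE = '''\
def deposit_energy(cell, energy):
    """Add energy to a calorimeter cell."""
    cell["e"] += energy
    return cell


class Shower:
    def __init__(self):
        self.hits = []

    def add(self, hit):
        self.hits.append(hit)
'''

syntax_chunks = chunk_by_definition(SAMPLE)
window_chunks = chunk_fixed_window(SAMPLE, window=4)

for chunk in syntax_chunks:
    print(chunk["kind"], chunk["name"], len(chunk["text"].splitlines()), "lines")
```

Each syntax-aware chunk is a complete definition that can be embedded and retrieved on its own, whereas the four-line windows split the `Shower` class across two chunks, neither of which is valid code by itself.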
The development of scientific HPC software presents unique hurdles that differentiate it from general-purpose programming. These include limited training data for LLMs, the need for absolute correctness (which conflicts with the probabilistic nature of LLM outputs), and the complex, domain-specific frameworks prevalent in HEP. CelloAI tackles these with RAG enhanced by HEP-specific data, syntax-aware code chunking, and callgraph knowledge that captures the relationships within a codebase.
How CelloAI Works: The CelloRetriever
At the heart of CelloAI is the CelloRetriever, a specialized mechanism designed to assemble contextually relevant code fragments and explanatory text for the language model. It optimizes retrieval performance by:
- Separate Collections: Storing code and text embeddings in distinct collections within ChromaDB. This allows for independent tuning and ensures a balanced context for queries that need both algorithmic understanding and conceptual explanations.
- Syntax-aware Code Chunking: Unlike conventional fixed-window chunking that can fragment code, CelloAI uses a Tree-sitter driven strategy. This ensures that complete, self-contained units like full function or class definitions are retrieved, leading to higher accuracy in code generation.
- Unified Retrieval with Pattern Matching: CelloRetriever integrates retrieved code and text. It uses a lightweight pattern-matching pass to identify and re-order candidates, ensuring that semantically dissimilar but algorithmically relevant chunks are prioritized.
- Automatic Prompt Enhancement with Callgraphs: To prevent LLMs from suggesting changes that could break dependencies, CelloAI automatically adds contextual dependency information from callgraphs to prompts. This “two-hop lineage” (immediate callers and callees) provides the LLM with visibility into the broader execution flow, enhancing safety and explainability.
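The callgraph-based prompt enhancement described in the last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not CelloAI's code: the callgraph is assumed to be available as a plain caller-to-callees dictionary, only immediate callers and callees are collected, and every function name (`lineage`, `enhance_prompt`, and the entries in `CALLGRAPH`) is hypothetical.

```python
def lineage(callgraph: dict[str, list[str]], target: str) -> tuple[list[str], list[str]]:
    """Return (immediate callers, immediate callees) of `target`."""
    callees = sorted(callgraph.get(target, ()))
    callers = sorted(fn for fn, called in callgraph.items() if target in called)
    return callers, callees


def enhance_prompt(prompt: str, callgraph: dict[str, list[str]], target: str) -> str:
    """Prepend dependency context for `target` to the user's prompt."""
    callers, callees = lineage(callgraph, target)
    header = [
        f"Function under modification: {target}",
        "Called by: " + (", ".join(callers) or "(none)"),
        "Calls: " + (", ".join(callees) or "(none)"),
        "Keep the signature compatible with every caller listed above.",
    ]
    return "\n".join(header) + "\n\n" + prompt


# Hypothetical callgraph: each key calls the functions in its list.
CALLGRAPH = {
    "run_event": ["simulate_shower", "write_output"],
    "simulate_shower": ["sample_energy", "deposit_hits"],
    "validate": ["simulate_shower"],
}

enhanced = enhance_prompt(
    "Port simulate_shower to OpenMP offload.", CALLGRAPH, "simulate_shower"
)
print(enhanced)
```

With the callers and callees spelled out in the prompt, the model can see which functions would break if it changed the target's signature or behavior.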
Evaluating CelloAI’s Performance
CelloAI was evaluated on real-world HEP applications from the ATLAS, CMS, and DUNE experiments, including FastCaloSim, Patatrack, P2R, and WireCell. The evaluations confirmed the benefits of its design choices:
- Separate collections proved crucial for balanced retrieval, preventing scenarios where only code or only text dominated the context.
- Syntax-aware chunking significantly reduced code fragmentation, ensuring that LLMs received complete and accurate code snippets.
- CelloRetriever’s pattern matching consistently boosted retrieval effectiveness across various embedding models, demonstrating its robustness.
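One way to picture a pattern-matching pass of this kind is a re-ranking step that boosts retrieved chunks containing code symbols named in the query. The sketch below is an assumption about how such a pass could look, not the method used in CelloRetriever: the symbol heuristic, the `boost` weight, the similarity scores, and the candidate chunks are all made up for illustration.

```python
import re

# Matches identifiers, optionally qualified with C++-style "::" scopes.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*")


def looks_like_symbol(token: str) -> bool:
    """Heuristic: snake_case, scoped, or camelCase tokens are code symbols."""
    camel = any(c.islower() for c in token) and any(c.isupper() for c in token[1:])
    return "_" in token or "::" in token or camel


def query_symbols(query: str) -> list[str]:
    return [tok for tok in IDENTIFIER.findall(query) if looks_like_symbol(tok)]


def rerank(query: str, candidates: list[dict], boost: float = 0.5) -> list[dict]:
    """Sort candidates by embedding similarity plus a bonus per exact symbol match."""
    symbols = query_symbols(query)

    def score(candidate: dict) -> float:
        hits = sum(
            1 for s in symbols
            if re.search(rf"\b{re.escape(s)}\b", candidate["text"])
        )
        return candidate["similarity"] + boost * hits

    return sorted(candidates, key=score, reverse=True)


# Hypothetical retrieval results with made-up similarity scores.
CANDIDATES = [
    {"id": "overview.md", "similarity": 0.82,
     "text": "The simulation fills a hit buffer on the GPU for each event."},
    {"id": "kernels.cu", "similarity": 0.55,
     "text": "__global__ void simulate_hits(Hit* buf, int n) { /* ... */ }"},
    {"id": "io.cpp", "similarity": 0.60,
     "text": "void write_output(const Event& ev);"},
]

ranked = rerank("How does simulate_hits fill the hit buffer on the GPU?", CANDIDATES)
print([c["id"] for c in ranked])
```

Here the kernel definition has the lowest embedding similarity to the natural-language question, yet it moves to the top because it contains the exact symbol the query asks about.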
In terms of practical application, CelloAI provides an out-of-the-box utility for generating Doxygen-style comments and file summaries, which reduces manual documentation effort. For code generation, particularly in porting GPU kernels from CUDA to OpenMP, CelloAI’s retrieval pipeline increased the coverage of kernels that open-weight LLMs could successfully process. “Easy” kernels compiled reliably, but “moderate” and “hard” kernels remained challenging, mainly because of complex memory-mapping requirements and the inherent stochasticity of LLM generation.
The research paper, available at arXiv:2508.16713, highlights that CelloAI is a significant step towards an agentic framework that can port GPU kernels across different backends, write unit tests from physics specifications, and execute and benchmark code while feeding back compiler diagnostics and performance metrics. This iterative approach aims to constrain the stochasticity of LLM generation and enable the development of optimal kernels for scientific computing.


