TLDR: JetBrains has introduced Mellum, a family of 4-billion-parameter, open-weight code completion models designed for interactive use in their IDEs. Mellum addresses limitations of general LLMs by focusing on production-grade constraints like low latency, reasonable model size, and permissively licensed training data. Through a multi-stage training pipeline including pre-training, supervised fine-tuning with project context, and direct preference optimization, Mellum achieves superior code completion quality and stopping behavior. Both offline and online evaluations demonstrate its effectiveness, significantly boosting developer productivity in real-world scenarios. Mellum is open-sourced under the Apache-2.0 license, offering a pragmatic blueprint for specialized, efficient AI tools in software development.
JetBrains has unveiled its new family of open-weight code completion models, named Mellum, designed to provide production-grade, contextual code completion directly within Integrated Development Environments (IDEs). This initiative aims to bridge the gap between advanced research prototypes and practical, real-world applications for hundreds of thousands of developers.
Addressing the Challenges of Modern Code Completion
While large, general-purpose language models (LLMs) have gained popularity, they often fall short for in-IDE code completion due to several practical limitations. These include restrictive licensing, high serving costs and latency, inconsistent output formats, and a lack of editor-critical behaviors like understanding code below the caret or handling partial tokens. Furthermore, the lack of transparency in training data and irregular updates can pose model governance risks for IDE vendors.
Mellum models were developed specifically to overcome these hurdles. They are purpose-built for multi-line code completion, trained from scratch on permissively licensed public code, and later released openly. The design prioritizes low latency for real-time suggestions, a reasonable model size to fit cost-efficient GPUs, and a widely adopted architecture for optimized training and inference.
The Mellum Model Family: Architecture and Training
The Mellum models feature 4 billion parameters and adopt a Llama-style architecture, making them compatible with high-load inference frameworks. They were pre-trained on approximately 4 trillion tokens of multi-language code, supplemented with natural-language data from Wikipedia to improve completion of comments and string literals. A custom tokenizer with a 49,152-token vocabulary was created, and the models support a context window of 8,192 tokens.
The training pipeline is a multi-stage process:
- Pre-training: Focused on general code understanding across various languages, syntax, patterns, and programming concepts. This stage also incorporated a “fill-in-the-middle” (FIM) transformation, where code examples are split into prefix, middle, and suffix, and the model learns to predict the missing middle part.
- Supervised Fine-tuning (SFT): This stage tuned the base model specifically for the code completion task. It used more realistic FIM examples, focusing on semantically whole parts like function or loop bodies, rather than random chunks. Crucially, it integrated project-level contextual information using strategies like IoU (Intersection over Union) similarity, path distance, and RAG (Retrieval-Augmented Generation) to find relevant code chunks across multiple files in a project.
- Direct Preference Optimization (DPO): The final stage aimed to align the model with real-world user preferences, improving readability and utility. This involved sampling diverse outputs from the SFT model and using an LLM-as-a-Judge procedure to create a dataset of “good” and “bad” generations. The DPO training helped the model produce more compact, readable code and suppress unhelpful outputs like placeholder comments.
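The fill-in-the-middle transformation used in pre-training can be sketched as follows. The split points and the special token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are illustrative assumptions; the paper's exact tokenizer symbols and splitting policy are not reproduced here.

```python
import random

# Illustrative FIM sentinel tokens; Mellum's actual tokenizer symbols may differ.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_example(code: str, rng: random.Random) -> str:
    """Split `code` into prefix/middle/suffix and emit a PSM-ordered training string."""
    # Pick two cut points; everything between them becomes the "middle" to predict.
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # Prefix-Suffix-Middle ordering: the model sees both sides of the gap,
    # then learns to generate the missing middle after the final sentinel.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

At inference time the same format is filled with the code before and after the caret, and the model's generation after `<fim_middle>` becomes the completion.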
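Of the SFT context strategies above, IoU similarity can be approximated as a Jaccard score over token sets, as in this sketch. The whitespace tokenization and the choice of `k` are simplifications for illustration, not the paper's exact recipe.

```python
def iou(a: str, b: str) -> float:
    """Intersection-over-Union (Jaccard) similarity of two code chunks' token sets."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def rank_context(current: str, candidates: list[str], k: int = 2) -> list[str]:
    """Return the k project chunks most similar to the code around the caret."""
    return sorted(candidates, key=lambda c: iou(current, c), reverse=True)[:k]
```

In a real pipeline the ranked chunks from other project files are prepended to the FIM prompt, teaching the model to exploit cross-file context.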
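The DPO objective behind the final stage can be illustrated numerically. The log-probabilities below are made-up inputs, and `beta` is a standard DPO hyperparameter, not a value reported by the paper.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one (good, bad) completion pair.

    Pushes the policy to prefer the "good" completion relative to the frozen
    reference (SFT) model: loss = -log sigmoid(beta * (policy margin - ref margin)).
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference the margin is zero and the loss is log 2; raising the probability of the "good" generation relative to the "bad" one drives the loss down, which is how the LLM-as-a-Judge preference pairs shape stopping behavior.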
Rigorous Evaluation and Real-World Impact
JetBrains conducted extensive evaluations, both offline and online, to validate Mellum’s performance. On the proprietary JetComplete benchmark, which mirrors real-world IDE usage, the Mellum models, especially after SFT and DPO, outperform even larger open-source models in code completion quality. The evaluation used metrics including Exact Match, chrF++, and a custom KK score that correlates highly with human judgment. The DPO stage notably improved the model’s “stopping behavior,” leading to more concise and relevant suggestions.
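Of the metrics above, Exact Match is straightforward to sketch (chrF++ and the KK score require tooling and judge data not reproduced here). The whitespace normalization below is our assumption about how completions are compared, not the benchmark's documented rule.

```python
def exact_match(pred: str, target: str) -> bool:
    """True if a prediction matches the ground-truth completion, ignoring edge whitespace."""
    return pred.strip() == target.strip()

def exact_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, target) pairs that match exactly."""
    return sum(exact_match(p, t) for p, t in pairs) / len(pairs)
```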
Online evaluations, using telemetry from production deployments in JetBrains IDEs, further confirmed Mellum’s impact. Metrics such as Ratio of Completed Code (RoCC) and Acceptance Rate (AR) demonstrated substantial increases with cloud code completion powered by Mellum variants. For instance, RoCC for Python increased from 0.25 to 0.39, and for Java, from 0.42 to 0.49, highlighting a significant boost in developer productivity.
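As we read the metric definitions, RoCC measures the share of written code that came from accepted completions, and AR the share of shown suggestions that were accepted. The event schema below is a hypothetical simplification of the IDE telemetry, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class CompletionEvent:
    shown: bool      # suggestion was displayed to the user
    accepted: bool   # user accepted the suggestion
    chars: int       # length of the suggested completion in characters

def acceptance_rate(events: list[CompletionEvent]) -> float:
    """AR: accepted suggestions divided by shown suggestions."""
    shown = [e for e in events if e.shown]
    return sum(e.accepted for e in shown) / len(shown)

def ratio_of_completed_code(events: list[CompletionEvent], typed_chars: int) -> float:
    """RoCC: characters from accepted completions over all characters written."""
    completed = sum(e.chars for e in events if e.accepted)
    return completed / (completed + typed_chars)
```

Under this reading, the reported jump in Python RoCC from 0.25 to 0.39 means completions went from producing a quarter to nearly two-fifths of the code typed in the editor.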
A Blueprint for Open, Production-Grade AI
The Mellum project underscores the importance of task-specific models and disciplined data curation for production-grade AI tools. By open-sourcing Mellum models under the Apache-2.0 license on HuggingFace, JetBrains provides a reproducible reference for practitioners and makes AI-assisted coding more accessible to organizations with strict privacy concerns or limited resources for large-scale model deployment. This approach demonstrates that compact, specialized models can deliver substantial productivity gains while meeting stringent product constraints, paving the way for more efficient and democratized AI tools in software development. You can find the full research paper here.


