TLDR: JetBrains has introduced Mellum, a family of 4-billion-parameter, open-weight code completion models designed for interactive use in their IDEs. Mellum addresses limitations of general LLMs by focusing on production-grade constraints like low latency, reasonable model size, and permissively licensed training data. Through a multi-stage training pipeline including pre-training, supervised fine-tuning with project context, and direct preference optimization, Mellum achieves superior code completion quality and stopping behavior. Both offline and online evaluations demonstrate its effectiveness, significantly boosting developer productivity in real-world scenarios. Mellum is open-sourced under the Apache-2.0 license, offering a pragmatic blueprint for specialized, efficient AI tools in software development.
JetBrains has unveiled its new family of open-weight code completion models, named Mellum, designed to provide production-grade, contextual code completion directly within Integrated Development Environments (IDEs). This initiative aims to bridge the gap between advanced research prototypes and practical, real-world applications for hundreds of thousands of developers.
Addressing the Challenges of Modern Code Completion
While large, general-purpose language models (LLMs) have gained popularity, they often fall short for in-IDE code completion due to several practical limitations. These include restrictive licensing, high serving costs and latency, inconsistent output formats, and a lack of editor-critical behaviors like understanding code below the caret or handling partial tokens. Furthermore, the lack of transparency in training data and irregular updates can pose model governance risks for IDE vendors.
Mellum models were developed specifically to overcome these hurdles. They are purpose-built for multi-line code completion, trained from scratch on permissively licensed public code, and later released openly. The design prioritizes low latency for real-time suggestions, a reasonable model size to fit cost-efficient GPUs, and a widely adopted architecture for optimized training and inference.
The Mellum Model Family: Architecture and Training
The Mellum models feature 4 billion parameters and adopt a Llama-style architecture, making them compatible with high-load inference frameworks. They were pre-trained on approximately 4 trillion tokens of multi-language code, supplemented with natural-language data from Wikipedia to improve completion of comments and string literals. A custom tokenizer with a 49,152-token vocabulary was created, and the models support a context window of 8,192 tokens.
The training pipeline is a multi-stage process:
- Pre-training: Focused on general code understanding across various languages, syntax, patterns, and programming concepts. This stage also incorporated a “fill-in-the-middle” (FIM) transformation, where code examples are split into prefix, middle, and suffix, and the model learns to predict the missing middle part.
- Supervised Fine-tuning (SFT): This stage tuned the base model specifically for the code completion task. It used more realistic FIM examples, focusing on semantically whole parts like function or loop bodies, rather than random chunks. Crucially, it integrated project-level contextual information using strategies like IoU (Intersection over Union) similarity, path distance, and RAG (Retrieval-Augmented Generation) to find relevant code chunks across multiple files in a project.
- Direct Preference Optimization (DPO): The final stage aimed to align the model with real-world user preferences, improving readability and utility. This involved sampling diverse outputs from the SFT model and using an LLM-as-a-Judge procedure to create a dataset of “good” and “bad” generations. The DPO training helped the model produce more compact, readable code and suppress unhelpful outputs like placeholder comments.
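The fill-in-the-middle transformation used in pre-training can be sketched as follows. The split points and the special token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are illustrative assumptions; the paper's exact tokenizer symbols and splitting policy are not reproduced here.

```python
import random

# Illustrative FIM sentinel tokens; Mellum's actual tokenizer symbols may differ.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_example(code: str, rng: random.Random) -> str:
    """Split `code` into prefix/middle/suffix and emit a PSM-ordered training string."""
    # Pick two cut points; everything between them becomes the "middle" to predict.
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # Prefix-Suffix-Middle ordering: the model sees both sides of the gap,
    # then learns to generate the missing middle after the final sentinel.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

At inference time the same format is filled with the code before and after the caret, and the model's generation after `<fim_middle>` becomes the completion.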
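Of the SFT context strategies above, IoU similarity can be approximated as a Jaccard score over token sets, as in this sketch. The whitespace tokenization and the choice of `k` are simplifications for illustration, not the paper's exact recipe.

```python
def iou(a: str, b: str) -> float:
    """Intersection-over-Union (Jaccard) similarity of two code chunks' token sets."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def rank_context(current: str, candidates: list[str], k: int = 2) -> list[str]:
    """Return the k project chunks most similar to the code around the caret."""
    return sorted(candidates, key=lambda c: iou(current, c), reverse=True)[:k]
```

In a real pipeline the ranked chunks from other project files are prepended to the FIM prompt, teaching the model to exploit cross-file context.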
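The DPO objective behind the final stage can be illustrated numerically. The log-probabilities below are made-up inputs, and `beta` is a standard DPO hyperparameter, not a value reported by the paper.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one (good, bad) completion pair.

    Pushes the policy to prefer the "good" completion relative to the frozen
    reference (SFT) model: loss = -log sigmoid(beta * (policy margin - ref margin)).
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference the margin is zero and the loss is log 2; raising the probability of the "good" generation relative to the "bad" one drives the loss down, which is how the LLM-as-a-Judge preference pairs shape stopping behavior.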
Rigorous Evaluation and Real-World Impact
JetBrains conducted extensive evaluations, both offline and online, to validate Mellum’s performance. On the proprietary JetComplete benchmark, which mirrors real-world IDE usage, the Mellum models, especially after SFT and DPO, outperform even larger open-source models in code completion quality. The evaluation used metrics including Exact Match, chrF++, and a custom KK score that correlates highly with human judgment. The DPO stage notably improved the model’s “stopping behavior,” leading to more concise and relevant suggestions.
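Of the metrics above, Exact Match is straightforward to sketch (chrF++ and the KK score require tooling and judge data not reproduced here). The whitespace normalization below is our assumption about how completions are compared, not the benchmark's documented rule.

```python
def exact_match(pred: str, target: str) -> bool:
    """True if a prediction matches the ground-truth completion, ignoring edge whitespace."""
    return pred.strip() == target.strip()

def exact_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, target) pairs that match exactly."""
    return sum(exact_match(p, t) for p, t in pairs) / len(pairs)
```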
Online evaluations, using telemetry from production deployments in JetBrains IDEs, further confirmed Mellum’s impact. Metrics such as Ratio of Completed Code (RoCC) and Acceptance Rate (AR) demonstrated substantial increases with cloud code completion powered by Mellum variants. For instance, RoCC for Python increased from 0.25 to 0.39, and for Java, from 0.42 to 0.49, highlighting a significant boost in developer productivity.
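As we read the metric definitions, RoCC measures the share of written code that came from accepted completions, and AR the share of shown suggestions that were accepted. The event schema below is a hypothetical simplification of the IDE telemetry, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class CompletionEvent:
    shown: bool      # suggestion was displayed to the user
    accepted: bool   # user accepted the suggestion
    chars: int       # length of the suggested completion in characters

def acceptance_rate(events: list[CompletionEvent]) -> float:
    """AR: accepted suggestions divided by shown suggestions."""
    shown = [e for e in events if e.shown]
    return sum(e.accepted for e in shown) / len(shown)

def ratio_of_completed_code(events: list[CompletionEvent], typed_chars: int) -> float:
    """RoCC: characters from accepted completions over all characters written."""
    completed = sum(e.chars for e in events if e.accepted)
    return completed / (completed + typed_chars)
```

Under this reading, the reported jump in Python RoCC from 0.25 to 0.39 means completions went from producing a quarter to nearly two-fifths of the code typed in the editor.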
A Blueprint for Open, Production-Grade AI
The Mellum project underscores the importance of task-specific models and disciplined data curation for production-grade AI tools. By open-sourcing Mellum models under the Apache-2.0 license on HuggingFace, JetBrains provides a reproducible reference for practitioners and makes AI-assisted coding more accessible to organizations with strict privacy concerns or limited resources for large-scale model deployment. This approach demonstrates that compact, specialized models can deliver substantial productivity gains while meeting stringent product constraints, paving the way for more efficient and democratized AI tools in software development. You can find the full research paper here.


