TLDR: Code4Me V2 is an open-source, research-oriented code completion plugin for JetBrains IDEs. It features inline completion, a chat assistant, and a modular, transparent data collection framework, allowing researchers fine-grained control over telemetry. It aims to overcome the limitations of proprietary AI coding tools by providing an extensible platform for studying human-AI interaction in software development, demonstrating industry-comparable performance and positive user feedback.
The world of software development is rapidly evolving with the integration of AI-powered code completion tools. Tools like GitHub Copilot and JetBrains AI Assistant have shown significant promise in boosting developer productivity, with some studies reporting up to 55% faster task completion. However, this advancement comes with a significant challenge for the academic community: the vast majority of these powerful systems are proprietary and closed-source. Researchers therefore lack access to crucial user interaction data, transparency into model decisions, and control over experimental conditions, making it difficult to conduct reproducible studies on human-AI interaction.
Addressing this critical gap, a team of researchers from Delft University of Technology has introduced Code4Me V2, an innovative, research-oriented, open-source code completion platform. Designed as a plugin for JetBrains IDEs, Code4Me V2 aims to democratize access to the tools and data necessary for rigorous academic research in AI-assisted software development.
What is Code4Me V2?
Code4Me V2 is more than just a code completion tool; it’s a comprehensive platform built with researchers in mind. It features inline code completion, providing suggestions directly within the editor, and a context-aware chat assistant for more interactive support. Its fundamental contribution lies in its modular and transparent data collection framework. This framework gives researchers unprecedented, fine-grained control over how telemetry and contextual data are gathered, enabling them to design and execute detailed studies on developer behavior and AI interaction.
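To make "fine-grained control over telemetry" concrete, here is a minimal sketch of what a study-level telemetry configuration could look like. All names here are illustrative assumptions, not Code4Me V2's actual API:

```python
# Hypothetical sketch: a researcher-facing config that toggles which
# signals a study collects. Field names are invented for illustration.
from dataclasses import dataclass

@dataclass
class TelemetryConfig:
    """Which signals a study collects; each flag is opt-in."""
    log_keystrokes: bool = False
    log_completion_shown: bool = True
    log_completion_accepted: bool = True
    log_chat_messages: bool = False
    context_window_lines: int = 20  # lines of code context attached to each event

    def enabled_signals(self) -> list[str]:
        # Report only the boolean flags that are switched on.
        return [name for name, value in vars(self).items()
                if isinstance(value, bool) and value]

config = TelemetryConfig(log_chat_messages=True)
print(config.enabled_signals())
# → ['log_completion_shown', 'log_completion_accepted', 'log_chat_messages']
```

The point of such a design is that a study's data-collection footprint is declared in one place, which also makes it easy to document in an ethics or consent form.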
The platform is built on a robust client-server architecture. The client, a lightweight JetBrains IDE plugin, handles the user interface and dispatches requests. The server, a Python application, manages user authentication, stores data in a PostgreSQL database, and performs the computationally intensive AI model inference. This separation ensures minimal performance overhead on the developer’s IDE, as heavy lifting is offloaded to the server.
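The division of labor described above, a thin client dispatching requests while the server authenticates, infers, and logs, can be sketched roughly as follows. The payload shape, token scheme, and function names are assumptions for illustration, not the real Code4Me V2 protocol:

```python
# Illustrative sketch of the client/server split: the IDE client only
# builds and dispatches a request; the server authenticates, runs (stub)
# model inference, and records telemetry. All names are invented.
import json

TELEMETRY_DB: list[dict] = []          # stand-in for the PostgreSQL store
VALID_TOKENS = {"researcher-token-123"}

def run_model(prefix: str) -> str:
    """Stub for server-side model inference (the computationally heavy part)."""
    return "world()" if prefix.endswith("hello_") else "pass"

def handle_completion_request(raw_request: str) -> dict:
    request = json.loads(raw_request)
    if request.get("token") not in VALID_TOKENS:        # user authentication
        return {"status": 401, "completion": None}
    completion = run_model(request["prefix"])           # inference stays server-side
    TELEMETRY_DB.append({"event": "completion_served",  # telemetry stored centrally
                         "prefix_len": len(request["prefix"])})
    return {"status": 200, "completion": completion}

# Client side: build a lightweight payload and dispatch it.
payload = json.dumps({"token": "researcher-token-123", "prefix": "hello_"})
print(handle_completion_request(payload))
# → {'status': 200, 'completion': 'world()'}
```

Because the client does nothing beyond serialization and dispatch, the IDE stays responsive regardless of how expensive the model is.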
Empowering Research Through Modularity and Transparency
One of Code4Me V2’s standout features is its exceptional extensibility. The frontend architecture is highly modular, centered around “Modules” that can be responsible for data collection, update processes, or post-acceptance actions. Researchers can easily implement new telemetry (e.g., tracking copy-paste events) by simply creating a new module and registering it, without needing to modify the core codebase. This drastically reduces the engineering effort required to set up new experiments, allowing researchers to focus on experimental design and data analysis.
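The "register a module without touching the core" pattern the authors describe can be illustrated with a small registry sketch. The real plugin targets JetBrains IDEs (i.e., Kotlin/Java), so this Python sketch only conveys the idea; every name in it is invented:

```python
# Sketch of a module registry: new data-collection modules hook into
# editor events by registering themselves, with no core changes.
from typing import Callable

MODULE_REGISTRY: dict[str, Callable[[dict], None]] = {}
EVENT_LOG: list[tuple[str, dict]] = []

def register_module(event: str):
    """Decorator: attach a data-collection module to an editor event."""
    def wrap(handler: Callable[[dict], None]):
        MODULE_REGISTRY[event] = handler
        return handler
    return wrap

def dispatch(event: str, data: dict) -> None:
    """Core dispatch loop; it knows nothing about individual modules."""
    if event in MODULE_REGISTRY:
        MODULE_REGISTRY[event](data)

# A researcher adds copy-paste tracking purely by registering a module:
@register_module("paste")
def track_paste(data: dict) -> None:
    EVENT_LOG.append(("paste", {"chars": len(data["text"])}))

dispatch("paste", {"text": "import numpy as np"})
print(EVENT_LOG)
# → [('paste', {'chars': 18})]
```

The core codebase only ever calls `dispatch`; adding or removing a telemetry module is a one-file change, which is what keeps per-experiment engineering effort low.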
Furthermore, Code4Me V2 includes a dedicated Analytics Platform. This subsystem processes the collected telemetry—including user interactions, model generations, and contextual data—into research-quality metrics. It provides endpoints for calculating time-series aggregates, acceptance statistics, and latency percentiles, supporting both descriptive and comparative analysis. The platform also supports A/B testing and configuration management, enabling controlled experiments and side-by-side comparisons of model performance.
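The kinds of metrics such an analytics subsystem computes, acceptance statistics and latency percentiles, are straightforward to sketch. The event schema below is an assumption for illustration, not Code4Me V2's actual data model:

```python
# Hedged sketch of research-quality metrics over completion telemetry:
# acceptance rate and latency percentiles. Event fields are invented.
from statistics import quantiles

events = [
    {"shown": True, "accepted": True,  "latency_ms": 150.0},
    {"shown": True, "accepted": False, "latency_ms": 210.0},
    {"shown": True, "accepted": True,  "latency_ms": 180.0},
    {"shown": True, "accepted": False, "latency_ms": 900.0},
]

def acceptance_rate(events: list[dict]) -> float:
    """Fraction of shown completions that were accepted."""
    shown = [e for e in events if e["shown"]]
    return sum(e["accepted"] for e in shown) / len(shown)

def latency_percentile(events: list[dict], pct: int) -> float:
    """pct-th percentile of end-to-end latency, e.g. pct=50 for the median."""
    cuts = quantiles([e["latency_ms"] for e in events], n=100)
    return cuts[pct - 1]

print(f"acceptance rate: {acceptance_rate(events):.0%}")
print(f"p50 latency: {latency_percentile(events, 50):.1f} ms")
```

A/B comparisons then reduce to computing these aggregates per experimental arm and testing the difference, which is exactly what per-configuration endpoints make convenient.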
Performance and User Validation
The platform has undergone preliminary evaluation for performance, stability, and research suitability. By offloading model inference to the server, Code4Me V2 minimizes client-side overhead. For code completion, it achieves an average end-to-end latency of 186.31 ms, well within the acceptable range for a non-disruptive user experience. Chat completions have a higher latency, averaging 8369.78 ms (roughly 8.4 seconds), due to the more complex, longer-form nature of the task.
User studies were conducted in two phases. An initial formative study with four expert researchers from the AI for Software Engineering (AI4SE) faculty provided strong validation for the platform’s core premise. Participants praised its modularity and extensibility, expressing confidence in adapting it for their research. Subsequent iterations based on this feedback led to a secondary evaluation with four daily users, who experienced fewer issues and further highlighted the tool’s usefulness. One user even suggested an “Agent” feature, underscoring the need for academic tools to quickly adapt to evolving domains.
Looking Ahead
While Code4Me V2 offers significant advancements, the researchers acknowledge areas for future improvement. These include refining the user interface for experiment configuration, enhancing completion speed and model quality to match commercial tools, and implementing project-wide context retrieval for even more accurate suggestions. The team actively invites community contributions to ensure the platform continues to meet the evolving needs of the AI4SE community.
Code4Me V2 represents a crucial step towards fostering open science in AI-assisted software development. By providing a transparent, controllable, and extensible platform, it empowers researchers to delve deeper into the complex dynamics of human-AI collaboration in programming. More information about the tool can be found on its official website, and the full research paper is available here: Code4Me V2 Research Paper.


