TLDR: A new framework, Meta-Prompted Code Optimization (MPCO), tackles a key obstacle in optimizing code with multiple large language models (LLMs): a prompt tuned for one model often fails on others. MPCO automatically generates high-quality, context-aware prompts by integrating project, task, and LLM-specific information. Evaluated on real-world codebases, MPCO achieved performance improvements of up to 19.06%, demonstrating that comprehensive context is essential and that all major LLMs tested can serve effectively as meta-prompters, making multi-LLM code optimization practical for industrial use.
Large language models, or LLMs, are increasingly being used for automating code optimization. However, a significant challenge arises for industrial platforms that use multiple LLMs: a prompt that works well for one LLM often doesn’t work for others. This means developers have to spend a lot of time and effort creating specific prompts for each LLM and task combination, a problem the researchers call the “cross-model prompt engineering bottleneck.” This issue severely limits how widely multi-LLM optimization systems can be used in real-world production environments.
To tackle this, a new framework called Meta-Prompted Code Optimization, or MPCO, has been introduced. MPCO is designed to automatically generate high-quality, task-specific prompts that work across different LLMs, all while meeting the strict efficiency demands of industrial applications. The core idea behind MPCO is “meta-prompting,” where a higher-level LLM dynamically creates context-aware optimization prompts. It does this by combining various pieces of information: project details, specific task requirements, and characteristics unique to each LLM. This framework is seamlessly integrated into the ARTEMIS industrial platform, which handles automated validation and scaling of the optimized code.
The effectiveness of MPCO was tested through a comprehensive evaluation on five real-world codebases, involving 366 hours of runtime benchmarking. The results showed overall performance improvements of up to 19.06% over existing baseline methods, and a detailed analysis revealed that 96% of the top-performing optimizations came from genuinely meaningful code edits rather than minor tweaks. Ablation and sensitivity studies further showed that integrating comprehensive context is crucial for effective meta-prompting, and that all three major LLMs tested (GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro) can serve effectively as meta-prompters, offering valuable insights for industry practitioners.
The Challenge of Cross-Model Prompt Engineering
The success of LLM-based code optimization has led to specialized industrial platforms, such as TurinTech AI’s ARTEMIS, which deploy multiple LLMs to find optimal code optimizations. However, a fundamental problem emerged: a prompt that works perfectly with one LLM might produce poor results with another. For instance, GPT-4o might perform best with its own prompt but degrade significantly when using a prompt optimized for Claude or Gemini. These inconsistencies are due to differences in how LLMs are trained, how they handle text, and their default behaviors. This necessitates creating and maintaining distinct prompts for every LLM-task-project combination, which is impractical for large-scale industrial deployment.
How MPCO Works: A Four-Stage Approach
MPCO operates through a streamlined four-stage workflow to ensure complete automation from identifying performance issues to validating optimized code:
Stage 1: Profiling and Bottleneck Identification. The process begins by pinpointing performance bottlenecks in software projects using industry-standard profilers like Intel VTune Profiler for C++ and Speedscope for Python. The top 10 most performance-critical code snippets are then selected for optimization.
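To make the hotspot selection concrete, here is a minimal Python sketch, assuming the profiler run has already been flattened into per-function self-time entries; the file format and function names are illustrative, not ARTEMIS's actual interface.

```python
# Illustrative Stage 1 hotspot selection (file format and names are assumptions,
# not the ARTEMIS implementation). Expects a flattened profile export of the
# form [{"function": str, "self_time_ms": float}, ...].
import json
from typing import List, Tuple

def top_hotspots(profile_path: str, n: int = 10) -> List[Tuple[str, float]]:
    """Return the n functions with the highest self time."""
    with open(profile_path) as f:
        entries = json.load(f)
    ranked = sorted(entries, key=lambda e: e["self_time_ms"], reverse=True)
    return [(e["function"], e["self_time_ms"]) for e in ranked[:n]]

# Example: pick the 10 most expensive functions as optimization candidates.
# candidates = top_hotspots("profile_flat.json", n=10)
```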
Stage 2: Context Collection and Meta-Prompt Generation. This is where MPCO’s innovation shines. It rapidly gathers comprehensive context from ARTEMIS’s existing metadata store. This includes project context (name, description, languages), task context (optimization goals, constraints), and LLM context (model strengths and limitations). A meta-prompting LLM then uses this structured information to generate a specialized prompt for the target LLM in a single step, eliminating manual tuning.
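The following is a minimal sketch of how such a meta-prompt might be assembled from the three context types. The single-call structure follows the description above, but the template wording is an assumption, not the paper's actual prompt.

```python
# Sketch of Stage 2 meta-prompt assembly. The template wording is an
# assumption; the real MPCO template is defined inside the framework.
META_PROMPT_TEMPLATE = """You are a prompt engineer for code optimization.
Project context: {project}
Task context: {task}
Target LLM characteristics: {target_llm}
Write one optimization prompt tailored to this target LLM. Return only the prompt."""

def build_meta_prompt(project: str, task: str, target_llm: str) -> str:
    """Combine project, task, and LLM context into a single meta-prompt."""
    return META_PROMPT_TEMPLATE.format(
        project=project, task=task, target_llm=target_llm)

# The meta-prompting LLM then generates the specialized prompt in one call,
# e.g. specialized_prompt = call_llm("meta-prompter", build_meta_prompt(...)),
# where call_llm is whatever client wrapper the platform already uses.
```

Because all three context types are filled in automatically from the metadata store, no per-LLM manual tuning is needed at this stage.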
Stage 3: Multi-LLM Code Optimization. The generated meta-prompts are sent to multiple target LLMs simultaneously. Each LLM receives its specific prompt along with the bottleneck code snippet and produces a candidate optimization. The framework then creates a new version of the original code repository in which only the optimized snippet is replaced, ensuring consistent validation conditions.
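A sketch of this fan-out and patching step is shown below; `call_llm` is a placeholder for whatever LLM client the platform uses, and the simple string-replacement patching is an assumption made for brevity.

```python
# Sketch of Stage 3: fan out prompts to several target LLMs, then build a
# variant repository that differs only in the optimized snippet.
import shutil
from pathlib import Path

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for the platform's LLM client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def optimize_across_llms(snippet: str, prompts: dict[str, str]) -> dict[str, str]:
    """prompts maps target model name -> its specialized prompt; returns
    model name -> the optimized snippet proposed by that model."""
    return {model: call_llm(model, f"{prompt}\n\nCode to optimize:\n{snippet}")
            for model, prompt in prompts.items()}

def make_variant_repo(repo: Path, rel_path: str, original: str,
                      optimized: str, out: Path) -> Path:
    """Copy the repository and replace only the bottleneck snippet so that
    everything else stays identical for validation."""
    shutil.copytree(repo, out)
    target = out / rel_path
    target.write_text(target.read_text().replace(original, optimized, 1))
    return out
```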
Stage 4: Performance Evaluation and Validation. Finally, the optimized code variants are automatically validated using ARTEMIS’s isolated cloud services. This involves compiling the code, running unit tests to ensure functional correctness, executing performance benchmarks, and collecting runtime metrics. Only functionally correct and performance-validated optimizations are accepted, maintaining high industrial reliability standards.
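As a rough illustration of this acceptance gate, the sketch below builds, tests, and benchmarks one variant and keeps it only if it is both correct and faster; the `make` targets are placeholders, since the actual validation runs inside ARTEMIS's isolated cloud services.

```python
# Sketch of the Stage 4 acceptance gate (command names are placeholders).
import subprocess
import time
from pathlib import Path

def validate_variant(repo: Path, baseline_runtime_s: float) -> dict | None:
    """Build, test, and benchmark a variant repo; return metrics only if the
    variant compiles, passes its tests, and beats the baseline runtime."""
    if subprocess.run(["make", "build"], cwd=repo).returncode != 0:
        return None                      # fails to compile -> reject
    if subprocess.run(["make", "test"], cwd=repo).returncode != 0:
        return None                      # unit tests fail -> reject
    start = time.perf_counter()
    if subprocess.run(["make", "benchmark"], cwd=repo).returncode != 0:
        return None                      # benchmark failed -> reject
    runtime = time.perf_counter() - start
    if runtime >= baseline_runtime_s:
        return None                      # no measurable improvement -> reject
    improvement = 100 * (baseline_runtime_s - runtime) / baseline_runtime_s
    return {"runtime_s": runtime, "improvement_pct": improvement}
```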
Key Findings from the Evaluation
The research addressed three main questions:
RQ1: MPCO’s Effectiveness. MPCO consistently outperformed traditional baseline prompting methods like Chain-of-Thought, Few-Shot, and Contextual Prompting. It achieved the best average rank across all systems and the highest mean performance improvement in four out of five systems, demonstrating its ability to overcome the cross-model prompt engineering challenge.
RQ2: Importance of Contextual Components. An analysis of MPCO with missing contextual components (no project, task, or LLM context) showed that the full MPCO template consistently performed best. This highlights that integrating comprehensive context—including project details, task objectives, and LLM-specific characteristics—is crucial for optimal meta-prompting effectiveness.
RQ3: Sensitivity to Meta-Prompter LLM. The study found that while GPT-4o performed slightly better as a meta-prompter, all three major LLMs (GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro) could effectively generate high-quality optimization prompts. This means organizations can choose their meta-prompting LLM based on practical factors like cost and availability without significantly compromising MPCO’s benefits.
Practical Implications for Industry
The study provides several actionable guidelines for industrial deployment:
- Maintain comprehensive context integration (project, task, and LLM context) as removing any part can significantly degrade performance.
- Automate context collection by integrating with existing platform services using a modular, API-driven pipeline and structured data schemas (see the sketch after this list).
- When selecting a meta-prompting LLM, prioritize deployment considerations such as cost, availability, and integration requirements over minor performance differences, as all major providers offer comparable effectiveness.
- For more complex optimization scenarios, consider advanced meta-prompting techniques like agent-based or iterative approaches, understanding that they may involve a trade-off with computational resources.
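As an illustration of the structured data schemas mentioned above, a minimal context schema might look like the following; the field names are assumptions based on the context types described in this article, not ARTEMIS's actual data model.

```python
# Hypothetical structured schema for automated context collection; field
# names are assumptions based on the context types described above.
from dataclasses import dataclass, field

@dataclass
class ProjectContext:
    name: str
    description: str
    languages: list[str] = field(default_factory=list)

@dataclass
class TaskContext:
    objective: str                                        # e.g. "reduce runtime of a hotspot"
    constraints: list[str] = field(default_factory=list)  # e.g. "preserve the public API"

@dataclass
class LLMContext:
    model: str                                            # e.g. "gpt-4o"
    strengths: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

# Populating these records from existing platform services (repository metadata,
# task tracker, model registry) keeps the meta-prompting pipeline fully automated.
```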
MPCO represents a significant step forward in automating code optimization for multi-LLM industrial platforms, reducing manual prompt engineering overhead and enabling more efficient and scalable software development. The full research paper provides further details.


